CN-115906852-B - Compliance checking method for private data based on knowledge graph

Abstract

The invention belongs to the technical field of compliance checking for private data and provides a knowledge-graph-based compliance checking method for private data. The method automatically detects whether data is used in a non-compliant way. Rules for compliance detection are extracted from the acquired privacy policy data: to express the sentences of the unstructured text formally, entities are identified first; relations among the identified entities are then mined from the privacy policy; a rule base consisting of triples is built; and a knowledge graph is constructed from it. The processing of the private data is obtained from the audit log of a data processing engine, the metadata in it is found, and a directed acyclic graph representing the data processing, called the data processing graph, is established. The data processing graph is matched against the knowledge graph to judge whether the data processing conforms to the rules defined by the privacy policy and to find the reason for any non-compliance.

Inventors

  • YANG HAOMIAO
  • BAI XUEJUN
  • Lu Ruiheng
  • Yi Kelai
  • YANG JUNZE
  • MA YONG
  • LIU AO
  • GONG LI
  • LIU YIYAO
  • WANG YINGZHE
  • WU NINGNING
  • ZHANG XIAOLEI

Assignees

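  • University of Electronic Science and Technology of China (电子科技大学)
  • Civil Aviation Chengdu Electronic Technology Co., Ltd. (民航成都电子技术有限责任公司)
  • The Second Research Institute of Civil Aviation Administration of China (中国民用航空总局第二研究所)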

Dates

Publication Date
20260512
Application Date
20221214

Claims (4)

  1. A compliance checking method for private data based on a knowledge graph, characterized by comprising the following steps: Step 1, acquiring privacy policy data; Step 2, extracting rules for compliance detection from the acquired privacy policy data, wherein, in order to express the sentences of the unstructured text formally, entities are identified first to obtain their entity types; Step 3, according to the identified entities, mining the relations among different entities from the privacy policy, constructing a rule base composed of triples, and constructing a knowledge graph through entity alignment; Step 4, obtaining the processing of the private data from the audit log of the data processing engine, finding the metadata in it, and establishing a directed acyclic graph representing the data processing, referred to as the data processing graph; Step 5, matching the data processing graph against the knowledge graph, judging whether the data processing conforms to the rules defined by the privacy policy, and finding the reason for any non-compliance; wherein Step 5 comprises the following substeps: Step 5.1, retrieving the data use method from the data processing graph and traversing the relations of the rules in the knowledge graph; if the data use method matches none of these relations, returning that the operation mode is non-compliant; Step 5.2, if a relation identical to the data use method exists, taking out the head node and the tail node of that relation in the knowledge graph; Step 5.3, according to the type attribution relations in the knowledge graph, finding all parent-type entities of the head entity and of the tail entity, which together with the head entity and the tail entity respectively form two entity sets; Step 5.4, taking from the data processing graph the data-using role and the data object connected to the data operation mode, and checking whether the two sets contain that role and that data object respectively; if so, the data-using role and the data object are compliant; if not, searching the knowledge graph for the next relation identical to the data use method and executing Step 5.3 again, and, once the traversal is finished, returning that the data-using role and the data object are non-compliant; Step 5.5, finding the matching role and data object in the two sets, taking out the purpose adjacent to the matched relation in the knowledge graph, and comparing it with the purpose of data use in the data processing graph; if they are the same, returning that the data processing is compliant; if they differ, continuing the traversal from Step 5.3, and, once the traversal is finished, returning that the purpose of use is non-compliant.
  2. The compliance checking method for private data based on a knowledge graph according to claim 1, wherein Step 2 comprises the following substeps: Step 2.1, splitting the privacy policy into sentences to obtain n complete sentences; Step 2.2, segmenting each complete sentence into words using an existing word segmentation technique to obtain a number of phrases; Step 2.3, taking all phrases belonging to a sentence as the input of a pre-trained model to obtain a vector representation of each phrase, together with a vector corresponding to the sentence start symbol; Step 2.4, for each sentence, aggregating its phrase vector representations and inputting the result into a linear neural network, treated as a classification problem, to judge whether the sentence contains a data processing rule; if it does not, no further operation is performed on the sentence; if it does, performing Step 2.5; Step 2.5, inputting the vector of each phrase into an entity recognition model and obtaining, through normalization, the tag probabilities of the different entity types; Step 2.6, inputting the entity-type tag probabilities of all phrases in the sentence into a CRF layer; because the tag probabilities do not take into account the sequential relation between the entity types of neighboring phrases, the CRF layer learns the entity-type sequences of the sentences in the training set and adjusts the tag probabilities so that the entity-type sequence of the phrases in the sentence is reasonable and consistent with the grammatical structure, yielding the final entity type of each phrase.
  3. The compliance checking method for private data based on a knowledge graph according to claim 1, wherein Step 3 comprises the following substeps: Step 3.1, for each sentence, combining each entity of a given entity type with the entities of the other entity types to obtain all possible entity pairs; Step 3.2, performing relation identification for each entity pair, first determining whether a subordinate relation exists between the two entities and, if it does, constructing a triple representing that the type of one entity is a subtype of the type of the other; Step 3.3, if no subordinate relation exists between the two entities, extracting the processing relation metadata between them, the relation metadata containing two relation types that respectively represent the mode and the purpose of compliant data processing, and constructing a triple in which the first entity represents the role performing the data processing and the second entity represents the processed data object; Step 3.4, forming a rule base from the triples and aligning the entities of the rule base, wherein, in order to find entities that are expressed differently but refer to the same thing, entity alignment is performed by judging the similarity of the entity vector representations, and the rule base is fused into a complete knowledge graph.
  4. The compliance checking method for private data based on a knowledge graph according to claim 1, wherein Step 4 comprises the following substeps: Step 4.1, when the enterprise's data platforms and systems use data, information about the data processing begins to be generated at the task submission stage; the data processing engine comprises the programs that execute data operations in the platform or system; the processing of the private data is obtained from the audit log of the data processing engine and, taking the operation executed on a single data object as the unit, is divided into processing records for single data objects, each comprising the role using the data, the data object operated on, the method of operating on the data, and the purpose of using the data; Step 4.2, performing text replacement on the obtained data processing records; Step 4.3, performing word segmentation and vector representation on the replaced data processing records and then deriving the directed acyclic graph, which is formulated as a multi-task multi-label classification problem, wherein the first node represents the role using the data, the second node represents the data object, and the edge represents the method of operating on the data and the purpose of data use.
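
The matching procedure of Step 5 in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Rule`, `Processing`, `with_parent_types`, and `check_compliance`, the sample entities, and the returned strings are all hypothetical. It assumes each rule has already been extracted into a (role, method, purpose, data object) record, that subtype triples are stored as a child-to-parents mapping, and that one data processing record stands in for a single edge of the data processing graph. As in the claim, the parent types of the rule's head and tail nodes are collected and the role and data object of the processing record are looked up in those sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One rule edge of the knowledge graph (hypothetical layout)."""
    role: str         # head node: who may process the data
    method: str       # relation: how the data may be processed
    purpose: str      # purpose attached to the relation
    data_object: str  # tail node: which data may be processed


@dataclass(frozen=True)
class Processing:
    """One edge of the data processing graph, taken from an audit log."""
    role: str
    method: str
    purpose: str
    data_object: str


def with_parent_types(entity, subtype_of):
    """Return the entity together with all of its ancestor types (Step 5.3)."""
    seen = {entity}
    stack = [entity]
    while stack:
        current = stack.pop()
        for parent in subtype_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def check_compliance(processing, rules, subtype_of):
    """Match one processing record against the rules (Steps 5.1 to 5.5)."""
    # Steps 5.1 and 5.2: keep only rules whose relation equals the method.
    candidates = [r for r in rules if r.method == processing.method]
    if not candidates:
        return "non-compliant: method"

    entities_matched = False
    for rule in candidates:
        # Step 5.3: expand head and tail nodes with their parent types.
        roles = with_parent_types(rule.role, subtype_of)
        objects = with_parent_types(rule.data_object, subtype_of)
        # Step 5.4: the role and data object must both be covered.
        if processing.role not in roles or processing.data_object not in objects:
            continue
        entities_matched = True
        # Step 5.5: compare the purpose attached to the matched rule.
        if rule.purpose == processing.purpose:
            return "compliant"

    if entities_matched:
        return "non-compliant: purpose"
    return "non-compliant: role or data object"


# Made-up sample data, for illustration only.
SUBTYPE_OF = {
    "advertising partner": ["third party"],
    "email address": ["contact information"],
}
RULES = [
    Rule("advertising partner", "collect", "personalized advertising", "email address"),
]

record = Processing("advertising partner", "collect", "credit scoring", "email address")
print(check_compliance(record, RULES, SUBTYPE_OF))
```

Returning a distinct reason for each failure mirrors the claim's requirement that the check report why a processing record is non-compliant, not only that it is.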

Description

Compliance checking method for private data based on knowledge graph

Technical Field

The invention belongs to the technical field of compliance checking for private data, and provides a knowledge-graph-based compliance checking method for private data.

Background

In recent years, emerging internet technologies represented by big data and artificial intelligence have advanced continuously. Efficient information processing makes people's lives more convenient, but leakage of sensitive personal information causes many problems. In the information age, incidents of private data leakage, abuse, and improper use keep occurring, and non-compliant use of data has had a serious impact on society. Countries attach increasing importance to data security, and data compliance is gradually moving to the center of public attention. Data compliance means that companies and industry organizations comply with data usage regulations and collect, use, store, and manage data lawfully. As all sectors of society pay more attention to the issue, laws have been enacted at home and abroad, such as the European Union's General Data Protection Regulation and, in China, the Data Security Law and the Personal Information Protection Law. These laws support China's digital economy and also form the basis of compliance detection.

Compliance detection generally involves two aspects. The first is identifying privacy policies: the main work is to extract the security rules in the relevant privacy policies, identify the access control policies they contain, and carry out compliance detection based on the identified rules. Privacy policies matter to both enterprises and users: users can learn from the policy how their data is used, and enterprises can conduct self-checks by formulating a privacy policy.
The second aspect is the consistency of data use: as data processing rules become stricter and more standardized, inconsistency between information processing and privacy rules has attracted the attention of many scholars, who study how to judge whether the processing of private data is compliant.

For the first aspect of private data compliance detection, namely the extraction of security rules, extracted rules reduce the user's reading burden, help the user quickly understand how private data is processed, and help enterprises check whether actual data use is consistent with the predefined policy. The technical difficulty lies in extracting rules for compliance detection from complex and variable unstructured text. Manual extraction is highly accurate but far too inefficient for the continuous stream of new privacy policies, and it cannot cope with the data scale and pace of development of the present day, so automated methods are receiving more and more attention. For automated rule extraction, existing methods generally combine machine learning with predefined templates: first, a machine learning algorithm is used to obtain the hierarchical syntactic structure; second, rules are generated from the privacy policy according to the templates and keywords. This approach has obvious drawbacks. Rules extracted through manually defined templates are incomplete, because the templates cannot cover all sentences. Identifying rules from sentences through machine learning requires a large amount of training data, which generally has to be labeled manually, does not scale, and is difficult to apply to other categories of privacy policies.

In terms of data processing compliance, i.e. checking whether the use of data falls within the scope of the rules, existing methods generally either use purpose-aware approaches to determine whether the processing of data serves the stated purpose, or rely on data flows and business flows. However, these methods describe behavior abstractly, are not tied to privacy policies, and cannot be combined with the rules for processing private data.

To solve these problems, we propose a compliance checking method for private data based on a knowledge graph. The method expresses the rules implicit in privacy policies in a knowledge graph through automated means, so as to cope with the complexity and variability of privacy policies; it extracts the metadata of data processing, expresses it as a directed acyclic graph, matches it against the rules in the knowledge graph, and judges whether the processing is compliant.

Disclosure of Invention

The present invention has been developed in view of the problems of existing compliance detection methods for private data. The invention provides a compliance checking method for private data based on a know