CN-122021646-A - Privacy policy compliance assessment method, device, medium and product based on large model
Abstract
The application discloses a privacy policy compliance assessment method, device, medium and product based on a large model, and relates to the technical field of computer information processing. The method intelligently partitions the privacy policy into text blocks, which reduces the amount of data the large model must process in a single pass. This avoids the hard truncation that occurs when the input text is too long, and also avoids the long-text forgetting problem, in which the attention and recall of the large model's core algorithm for distant positions degrade markedly on very long texts. In addition, by constructing an evaluation knowledge graph and a navigation semantic tree, and by mapping legal rules onto specific text through vector matching and screening, each evaluation rule can be traced to the specific clauses of the source policy, which meets the requirement of generating specific evaluation records in an evaluation task.
Inventors
- ZHANG MENG
- LI JINYU
- LI CHAO
- LIU QI
- YANG HAOYU
- LIU ZHAOMAN
- BI SHANSHAN
- WANG QIAN
Assignees
- 上海计算机软件技术开发中心
Dates
- Publication Date
- 20260512
- Application Date
- 20260203
Claims (10)
- 1. A privacy policy compliance assessment method based on a large model, characterized by comprising the following steps: constructing an evaluation knowledge graph based on a first file, wherein the evaluation knowledge graph comprises hierarchical concept nodes constructed based on the first file, association edges exist among the hierarchical concept nodes and link them, and the hierarchical concept nodes comprise primary concept nodes, secondary concept nodes and tertiary concept nodes; constructing a navigation semantic tree for the privacy policy based on the evaluation knowledge graph, specifically comprising: performing intelligent partitioning on the privacy policy to obtain text blocks, and vectorizing the text blocks to obtain block vectors; selecting templates from the case template set of the secondary concept nodes and vectorizing them to obtain template vectors, calculating the similarity between the block vectors and the template vectors, and screening out matching results that meet the standard according to the similarity; constructing the navigation semantic tree according to the hierarchical outline of the privacy policy, the matching results and a preset prompt; and generating a structured evaluation record by using a large model based on the evaluation knowledge graph and the navigation semantic tree.
- 2. The large model-based privacy policy compliance assessment method according to claim 1, wherein the attribute set of primary concept nodes comprises a primary concept node unique identifier, a primary concept node name and terms of the first file, the attribute set of secondary concept nodes comprises a secondary concept node unique identifier, a secondary concept node name, the associated primary concept node unique identifier and the case template set, and the attribute set of tertiary concept nodes comprises a tertiary concept node unique identifier, a tertiary concept node name, the associated secondary concept node unique identifier and an assessment rule.
- 3. The large model-based privacy policy compliance assessment method according to claim 2, wherein performing intelligent partitioning on the privacy policy to obtain text blocks and vectorizing the text blocks to obtain block vectors comprises: converting the privacy policy into plain text content; performing paragraph segmentation on the plain text content to obtain paragraph blocks; calculating the character length of each paragraph block; if the character length of the paragraph block is less than or equal to a standard character length, outputting the paragraph block as a text block; if the character length of the paragraph block is greater than the standard character length, splitting the paragraph block at specific punctuation marks to generate sub-paragraph blocks whose character length is less than or equal to the standard character length, and outputting the sub-paragraph blocks as text blocks; and calling an embedding model to vectorize each text block to obtain the block vectors.
- 4. The large model-based privacy policy compliance assessment method according to claim 2, wherein selecting templates from the case template set of the secondary concept nodes and vectorizing them to obtain template vectors, calculating the similarity between the block vectors and the template vectors, and screening out matching results that meet the standard according to the similarity comprises: selecting at least one template from the case template set of the secondary concept nodes; calling an embedding model to vectorize the template to obtain the template vector; calculating the cosine similarity between the block vector and the template vector; screening out the text blocks whose cosine similarity is greater than a preset threshold; and outputting the matching result corresponding to each such text block, wherein the matching result comprises the text block, the secondary concept node, the template, the cosine similarity, and the correspondence between the template and the cosine similarity.
- 5. The large model-based privacy policy compliance assessment method according to claim 2, wherein constructing the navigation semantic tree according to the hierarchical outline of the privacy policy, the matching results and the preset prompt comprises: calling a large model to extract the hierarchical outline from the privacy policy according to the preset prompt, wherein the hierarchical outline comprises a plurality of hierarchical titles; labeling each text block, according to the matching results and the hierarchical outline, with a title label corresponding to the hierarchical outline and a concept label corresponding to the evaluation knowledge graph, wherein the concept label comprises the primary concept node and the secondary concept node; dividing the text block into single sentences, reading all evaluation rules under the secondary concept node of the text block to which each sentence belongs, and labeling each sentence, according to the evaluation rules, with a tertiary concept node label or a no-match marker; and taking the title of the privacy policy as the root node, the sentences as branch nodes, and the title labels, concept labels and tertiary concept node labels corresponding to the sentences as leaf nodes, and constructing the navigation semantic tree according to the hierarchical relation of the hierarchical outline and the sentence order.
- 6. The large model-based privacy policy compliance assessment method according to claim 5, wherein generating a structured evaluation record by using a large model based on the evaluation knowledge graph and the navigation semantic tree comprises: traversing the tertiary concept nodes in the evaluation knowledge graph and, for each tertiary concept node, searching the navigation semantic tree for the related sentences and the evaluation rules; invoking the large model, through a prompt, to generate a formatted evaluation record based on the evaluation rules corresponding to the tertiary concept node; and extracting and storing the evaluation record by using a regular expression.
- 7. The large model-based privacy policy compliance assessment method according to claim 6, further comprising generating a new, compliant privacy policy based on the evaluation records, wherein the evaluation records comprise positive evaluation records and negative evaluation records, and the specific steps comprise: invoking the large model to process the negative evaluation records according to a rectification prompt so as to generate rectification suggestions; and invoking the large model to generate the new privacy policy from the privacy policy, the positive evaluation records and the rectification suggestions according to a generation prompt.
- 8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the steps of the large model-based privacy policy compliance assessment method of any of claims 1-7.
- 9. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the large model based privacy policy compliance assessment method of any of claims 1-7.
- 10. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the large model based privacy policy compliance assessment method of any of claims 1-7.
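The partitioning steps recited in claim 3 can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the claims do not fix the standard character length or the set of punctuation marks, so `STANDARD_LEN` and `SPLIT_PUNCT` are assumed values, all function names are hypothetical, and the hard cut applied to a sentence that is itself longer than the limit is an added fallback. The embedding call is omitted.

```python
import re

# Assumed values: the claims name a "standard character length" and
# "specific punctuation marks" without fixing either.
STANDARD_LEN = 200
SPLIT_PUNCT = "。！？；.!?;"


def split_paragraphs(plain_text: str) -> list[str]:
    """Split plain text into non-empty paragraph blocks at line breaks."""
    return [p.strip() for p in re.split(r"\n+", plain_text) if p.strip()]


def split_long_paragraph(paragraph: str, max_len: int = STANDARD_LEN) -> list[str]:
    """Split an over-long paragraph after punctuation and pack pieces greedily."""
    # Zero-width split keeps each punctuation mark with the sentence it ends.
    pattern = rf"(?<=[{re.escape(SPLIT_PUNCT)}])"
    pieces = [s for s in re.split(pattern, paragraph) if s]
    blocks: list[str] = []
    current = ""
    for piece in pieces:
        # Fallback (an assumption): hard-cut a sentence that alone exceeds the limit.
        while len(piece) > max_len:
            if current:
                blocks.append(current)
                current = ""
            blocks.append(piece[:max_len])
            piece = piece[max_len:]
        # Start a new sub-paragraph block when the next piece would not fit.
        if len(current) + len(piece) > max_len:
            blocks.append(current)
            current = ""
        current += piece
    if current:
        blocks.append(current)
    return blocks


def partition_policy(plain_text: str, max_len: int = STANDARD_LEN) -> list[str]:
    """Return text blocks whose character length never exceeds max_len."""
    blocks: list[str] = []
    for paragraph in split_paragraphs(plain_text):
        if len(paragraph) <= max_len:
            blocks.append(paragraph)  # short paragraph is output unchanged
        else:
            subs = split_long_paragraph(paragraph, max_len)
            blocks.extend(s.strip() for s in subs if s.strip())
    return blocks
```

Packing several sentences into one sub-paragraph block, instead of emitting one block per sentence, keeps blocks close to the length limit; the claims leave this choice open.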
Description
Privacy policy compliance assessment method, device, medium and product based on large model
Technical Field
The application relates to the technical field of computer information processing, in particular to a privacy policy compliance assessment method, device, medium and product based on a large model.
Background
In the digital age, data has become a core factor of production whose supporting role across industries is increasingly prominent, and data security risks have risen sharply as well. Security incidents such as data leakage, abuse and tampering frequently threaten personal privacy and trade secrets, can escalate to a strategic level affecting national security, and pose a serious challenge to the stable operation of new digital infrastructure. Privacy policy texts tend to be verbose: a single privacy policy for a complex business often contains tens of thousands of words. Conventional large-model-based approaches to policy compliance assessment suffer from the long-text forgetting problem and cannot trace an assessment rule back to the specific clauses of the source policy, so they struggle to meet the requirement of generating specific assessment records in an assessment task.
Disclosure of Invention
The application aims to provide a privacy policy compliance assessment method, device, medium and product based on a large model that effectively addresses the long-text problem and establishes a traceable mapping, so that any judgment made during assessment can be traced back to a specific sentence of the original text, meeting the strict requirement of assessment tasks for an evidence chain.
In order to achieve the above object, the present application provides the following solutions: The application provides a privacy policy compliance assessment method based on a large model, comprising: constructing an evaluation knowledge graph based on a first file, wherein the evaluation knowledge graph comprises hierarchical concept nodes constructed based on the first file, association edges exist among the hierarchical concept nodes and link them, and the hierarchical concept nodes comprise primary concept nodes, secondary concept nodes and tertiary concept nodes; constructing a navigation semantic tree for the privacy policy based on the evaluation knowledge graph, specifically comprising: performing intelligent partitioning on the privacy policy to obtain text blocks, and vectorizing the text blocks to obtain block vectors; selecting templates from the case template set of the secondary concept nodes and vectorizing them to obtain template vectors, calculating the similarity between the block vectors and the template vectors, and screening out matching results that meet the standard according to the similarity; constructing the navigation semantic tree according to the hierarchical outline of the privacy policy, the matching results and a preset prompt; and generating a structured evaluation record by using a large model based on the evaluation knowledge graph and the navigation semantic tree.
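The matching step between text blocks and the case templates of secondary concept nodes can be sketched as follows. This is an illustration under stated assumptions, not the method as implemented by the applicant: `embed` is a toy bag-of-words stand-in for the embedding model, the threshold of 0.5 is arbitrary, and the class, field and function names are hypothetical. Only the attributes listed for a secondary concept node and the cosine-similarity screening come from the text.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SecondaryConcept:
    """Secondary concept node with the attributes named in the text."""
    node_id: str        # unique identifier of this node
    name: str
    primary_id: str     # identifier of the associated primary concept node
    case_templates: list[str] = field(default_factory=list)


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_blocks(blocks: list[str], concepts: list[SecondaryConcept],
                 threshold: float = 0.5) -> list[dict]:
    """Keep (block, node, template) triples whose similarity exceeds threshold."""
    results = []
    for block in blocks:
        block_vec = embed(block)
        for concept in concepts:
            for template in concept.case_templates:
                similarity = cosine(block_vec, embed(template))
                if similarity > threshold:
                    results.append({
                        "text_block": block,
                        "secondary_node": concept.node_id,
                        "template": template,
                        "similarity": similarity,
                    })
    return results
```

Each result keeps the text block, the node, the template and the score together, so a later evaluation step can trace a judgment back to the block that triggered it.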
Optionally, the attribute set of the primary concept node comprises a primary concept node unique identifier, a primary concept node name and the clause of the first file; the attribute set of the secondary concept node comprises a secondary concept node unique identifier, a secondary concept node name, the associated primary concept node unique identifier and a case template set; and the attribute set of the tertiary concept node comprises a tertiary concept node unique identifier, a tertiary concept node name, the associated secondary concept node unique identifier and an evaluation rule.
Optionally, performing intelligent partitioning on the privacy policy to obtain text blocks and vectorizing the text blocks to obtain block vectors comprises: converting the privacy policy into plain text content; performing paragraph segmentation on the plain text content to obtain paragraph blocks; calculating the character length of each paragraph block; if the character length of the paragraph block is less than or equal to a standard character length, outputting the paragraph block as a text block; if the character length of the paragraph block is greater than the standard character length, splitting the paragraph block at specific punctuation marks to generate sub-paragraph blocks whose character length is less than or equal to the standard character length, and outputting the sub-paragraph blocks as text blocks; and calling an embedding model to vectorize each text block to obtain the block vectors.
Optionally, the step of selecting a template from the case template set of the secondary concept node and vectorizing to obtain a template vector, calculating the similarity between the block vect