CN-121765770-B - Privacy information protection method and system based on privacy disclosure evaluation
Abstract
The invention relates to the technical field of data privacy protection and discloses a privacy information protection method and system based on privacy disclosure evaluation. The method collects heterogeneous original interaction data from multiple service scenes, fuses and cleans it into a user privacy behavior portrait, and identifies core data nodes and associated edges within the portrait. Differentiated sensitivity weights are configured for the privacy category of each node, dynamic evaluation grids are generated accordingly, and the grids are aggregated into a panoramic risk topological graph. By simulating the circulation and aggregation of information in the topological graph, a multi-dimensional leakage probability surface is calculated, from which a privacy vulnerability index is determined. Finally, protection response action sequences are derived for each privacy category in combination with risk tolerance boundaries and integrated into an executable reinforcement scheme. The method can evaluate the complex leakage risks caused by data relevance and achieve adaptive active protection.
Inventors
- XU XIAOYAN
- SUN KAILONG
- LIN XIAOTIAN
- LI HAO
- ZHANG CHI
Assignees
- 杭州美宿在途网络科技有限公司
Dates
- Publication Date: 20260512
- Application Date: 20260303
Claims (9)
- 1. A privacy information protection method based on privacy disclosure evaluation, comprising: collecting an original interaction data stream related to a target user, wherein the original interaction data stream comprises heterogeneous data generated under various service scenes; fusing and cleaning the original interaction data stream to form a structured user privacy behavior portrait; based on the user privacy behavior portrait, identifying core data nodes and associated edges forming a privacy leakage path; for each core data node, configuring a differentiated sensitivity measurement weight according to the privacy category to which its data attributes belong, and generating a corresponding dynamic evaluation grid for each privacy category according to the sensitivity measurement weight, wherein generating the dynamic evaluation grid comprises: analyzing the data attributes of the core data node and classifying the core data node into one of a plurality of preset privacy categories; obtaining the access frequency and the derived data quantity of the core data node in a preset historical period, and calculating a dynamic adjustment factor therefrom; weighting and fusing a reference weight coefficient with the corresponding dynamic adjustment factor to obtain the final sensitivity measurement weight of each privacy category; determining the grid division density of each privacy category according to the magnitude of the sensitivity measurement weight, wherein the higher the sensitivity measurement weight, the higher the grid division density of the corresponding privacy category; and, for each privacy category, establishing hierarchically structured evaluation grid units in the data value domain space according to the corresponding grid division density, each evaluation grid unit recording the data value range it covers and its current risk state, thereby forming the dynamic evaluation grid; aggregating the dynamic evaluation grids of all privacy categories to generate a panoramic risk topological graph; simulating the circulation and aggregation process of the privacy information in the panoramic risk topological graph, and calculating a multi-dimensional leakage probability surface; determining a privacy vulnerability index at the current moment according to the multi-dimensional leakage probability surface; and deriving a necessary protection response action sequence for each privacy category in combination with a preset risk tolerance boundary, and integrating all the protection response action sequences to form an executable privacy reinforcement scheme.
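Claim 1 above does not fix a concrete fusion formula or density rule. As a minimal Python sketch, assuming a linear fusion of the reference weight coefficient with a dynamic adjustment factor derived from access frequency and derived-data quantity, and a grid density that grows with the resulting weight (all coefficients and normalization constants are illustrative assumptions, not part of the patent):

```python
def dynamic_adjustment_factor(access_freq, derived_count,
                              freq_norm=100.0, derived_norm=50.0):
    """Scale historical activity into a [0, 1] adjustment factor (assumed form)."""
    return min(1.0, 0.5 * access_freq / freq_norm + 0.5 * derived_count / derived_norm)

def sensitivity_weight(reference_weight, adjustment, alpha=0.7):
    """Weighted fusion of the reference coefficient and the dynamic factor."""
    return alpha * reference_weight + (1 - alpha) * adjustment

def grid_density(weight, base_cells=4, max_cells=64):
    """Higher sensitivity weight -> finer grid division density."""
    return min(max_cells, base_cells + round(weight * (max_cells - base_cells)))

# Example: a category with reference weight 0.8, 40 accesses and
# 10 derived records in the historical period.
w = sensitivity_weight(0.8, dynamic_adjustment_factor(40, 10))
print(w, grid_density(w))
```

The evaluation grid units themselves would then be built by subdividing each category's value domain into `grid_density(w)` hierarchical cells, each cell tracking its value range and current risk state.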
- 2. The privacy information protection method according to claim 1, wherein aggregating the dynamic evaluation grids of all privacy categories to generate the panoramic risk topological graph comprises: extracting the evaluation grid units with recorded risk states in each dynamic evaluation grid as grid units to be aggregated; establishing directional connection links between grid units to be aggregated of different privacy categories according to their association relations, wherein each directional connection link has a weight attribute whose magnitude represents the propagation strength of privacy risk along that link; and laying out and rendering all grid units to be aggregated and the established directional connection links in a unified topological space, with the grid units to be aggregated as topological nodes and the directional connection links as topological edges, thereby generating the panoramic risk topological graph, wherein each topological node in the panoramic risk topological graph stores the identifier of its source privacy category and the risk state value of the original grid unit.
- 3. The privacy information protection method based on privacy disclosure evaluation according to claim 1, wherein simulating the circulation and aggregation process of the privacy information in the panoramic risk topological graph and calculating the multi-dimensional leakage probability surface comprise: setting at least one simulated leakage source point in the panoramic risk topological graph, wherein the simulated leakage source point corresponds to a potential privacy information exposure entrance in actual service; taking the simulated leakage source point as a starting point, performing multiple rounds of risk diffusion simulation in the panoramic risk topological graph according to the propagation strength and direction indicated by the weight attributes of the topological edges; recording the number of steps and the accumulated strength with which a risk signal reaches each topological node in each round of simulation; counting the frequency with which the risk signal reaches each topological node over all simulation rounds and converting the frequency into an initial leakage probability; correcting the initial leakage probability with the risk state value of the original grid unit stored in the topological node to obtain a final leakage probability value for each topological node; and classifying all topological nodes according to the privacy category to which they belong and their position in the value domain space, and fitting the final leakage probability values of the topological nodes of the same privacy category into a continuous surface in three-dimensional space, namely the multi-dimensional leakage probability surface.
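The multi-round diffusion of claim 3 can be read as a Monte Carlo reachability estimate on the weighted topological graph. A minimal sketch, assuming each edge "fires" per round with probability equal to its weight and that reach frequency approximates the initial leakage probability (the graph, weights, and correction rule below are illustrative assumptions):

```python
import random
from collections import defaultdict

def simulate_diffusion(edges, source, rounds=2000, max_steps=6, seed=42):
    """Monte Carlo risk diffusion: returns reach frequency per node as
    the initial leakage probability. edges = [(src, dst, weight), ...]."""
    rng = random.Random(seed)
    graph = defaultdict(list)
    for src, dst, w in edges:
        graph[src].append((dst, w))
    reach_count = defaultdict(int)
    for _ in range(rounds):
        frontier, reached = {source}, set()
        for _ in range(max_steps):
            nxt = set()
            for node in sorted(frontier):          # deterministic iteration
                for dst, w in graph[node]:
                    if dst not in reached and rng.random() < w:
                        nxt.add(dst)               # risk signal propagates
            reached |= nxt
            frontier = nxt
            if not frontier:
                break
        for node in reached:
            reach_count[node] += 1
    return {n: c / rounds for n, c in reach_count.items()}

def corrected_probability(initial, risk_state):
    """Correct the initial probability with the node's stored risk state
    (an assumed multiplicative rule)."""
    return min(1.0, initial * (0.5 + risk_state))

edges = [("A", "B", 0.9), ("B", "C", 0.5), ("A", "C", 0.2)]
probs = simulate_diffusion(edges, "A")
```

Here node C's probability combines the direct path A→C with the two-step path A→B→C, which is exactly the aggregation effect single-point evaluation misses; fitting these per-node values over the value domain would yield the claimed probability surface.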
- 4. The privacy information protection method based on privacy disclosure evaluation according to claim 1, wherein collecting the original interaction data stream related to the target user comprises: asynchronously collecting original data packets containing click streams, transaction records, position tracks and device information from the application clients, server logs and network probes used by the target user; performing timestamp alignment and format standardization on the collected original data packets, and removing noise data and invalid fields therefrom to form standard data records; and associating and binding the standard data records from different data sources with the unique identity of the target user according to preset user entity resolution rules, thereby generating the original interaction data stream arranged in time sequence.
- 5. The privacy information protection method based on privacy disclosure evaluation according to claim 4, wherein fusing and cleaning the original interaction data stream to form the structured user privacy behavior portrait comprises: performing behavior slicing on the original interaction data stream, wherein each behavior slice covers one complete user operation session; extracting, from each behavior slice, the operation type, operation object and operation environment context that characterize the user intention, to form behavior tuples; comparing the behavior tuples across multiple consecutive behavior slices to identify repetitive behavior patterns and abnormally deviating behaviors; matching the identified behavior patterns against a preset privacy behavior knowledge base and marking the behavior tuples related to sensitive information; and constructing the user privacy behavior portrait as a tree structure with the target user identity as the root node and the marked sensitive behavior tuples and their associated context information as branches and leaves, wherein each node of the user privacy behavior portrait comprises a behavior type, data content and a sensitivity level label.
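The portrait construction in claim 5 amounts to filtering behavior tuples against a knowledge base and hanging the sensitive ones off a per-user tree. A small sketch, where the knowledge-base patterns and sensitivity levels are purely illustrative assumptions:

```python
# Hypothetical privacy behavior knowledge base: "op_type:op_object" -> level.
SENSITIVE_PATTERNS = {"submit_form:payment": "high", "share:location": "medium"}

def build_portrait(user_id, behavior_tuples):
    """Tree portrait: the user identity is the root node; only behavior
    tuples matching the knowledge base become branches/leaves."""
    portrait = {"user": user_id, "children": []}
    for op_type, op_object, context in behavior_tuples:
        level = SENSITIVE_PATTERNS.get(f"{op_type}:{op_object}")
        if level:  # mark and keep sensitive-information tuples only
            portrait["children"].append({
                "behavior_type": op_type,
                "data_content": op_object,
                "context": context,
                "sensitivity": level,
            })
    return portrait

p = build_portrait("u42", [("click", "banner", "web"),
                           ("share", "location", "mobile")])
```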
- 6. The privacy information protection method according to claim 5, wherein identifying the core data nodes and associated edges constituting the privacy leakage path based on the user privacy behavior portrait comprises: traversing the tree structure of the user privacy behavior portrait and preliminarily screening the nodes whose sensitivity level labels exceed a preset threshold as candidate data nodes; analyzing the association relations among the candidate data nodes, and, if two candidate data nodes are consecutive in time sequence and have an information-deduction or information-enhancement relation in semantics, establishing a directional connection between them as a candidate associated edge; calculating the connection closeness of each candidate associated edge, the connection closeness being determined comprehensively from the frequency of behavior transfer between the two nodes, the data similarity and the time interval; and screening the candidate associated edges whose connection closeness exceeds a preset value, together with the candidate data nodes they connect, and confirming them respectively as the core associated edges and core data nodes, which jointly form a plurality of potential privacy leakage paths.
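Claim 6 combines transfer frequency, data similarity and time interval into a single closeness score, but gives no formula. One plausible sketch, with the term weights and normalization constants as stated assumptions:

```python
def connection_closeness(transfer_freq, data_similarity, time_gap_s,
                         max_freq=50.0, max_gap_s=3600.0):
    """Illustrative closeness in [0, 1]: frequent, semantically similar,
    temporally close node pairs score higher. The 0.4/0.4/0.2 weights
    are assumptions, not disclosed by the patent."""
    freq_term = min(1.0, transfer_freq / max_freq)
    time_term = max(0.0, 1.0 - time_gap_s / max_gap_s)
    return 0.4 * freq_term + 0.4 * data_similarity + 0.2 * time_term

# An edge with 25 transfers, similarity 0.8, half an hour apart:
c = connection_closeness(25, 0.8, 1800)
is_core_edge = c > 0.5   # preset closeness threshold (assumed)
```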
- 7. The privacy information protection method according to claim 1, wherein determining the privacy vulnerability index at the current moment according to the multi-dimensional leakage probability surface comprises: performing an integral operation on the multi-dimensional leakage probability surface corresponding to each privacy category to obtain the overall leakage risk quantity of that privacy category; acquiring the average leakage risk quantity of each privacy category of the target user over a historical period, comparing the current overall leakage risk quantity with the corresponding historical average, and calculating a risk change ratio; weighting and summing the risk change ratios according to the sensitivity measurement weight of each privacy category to obtain a composite value; and inputting the composite value into a preset index mapping function and outputting the normalized privacy vulnerability index, wherein the value range of the privacy vulnerability index is zero to one.
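Claim 7's pipeline (integrate the surface, compare with the historical average, weight the change ratios, normalize to [0, 1]) can be sketched as follows, assuming a discrete grid-cell sum for the integral and a logistic function as the index mapping function (both assumptions, since the patent leaves the mapping unspecified):

```python
import math

def overall_risk(prob_surface_cells):
    """Approximate the surface integral by summing grid-cell probabilities."""
    return sum(prob_surface_cells)

def vulnerability_index(current_risks, historical_avgs, weights):
    """Weight the per-category risk change ratios, then map the composite
    value into [0, 1] with a logistic function centered at ratio 1.0
    (so "no change versus history" yields an index of 0.5)."""
    ratios = [c / h for c, h in zip(current_risks, historical_avgs)]
    composite = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return 1.0 / (1.0 + math.exp(-(composite - 1.0)))

# Two categories: one doubled its risk, one halved it.
v = vulnerability_index([2.0, 0.5], [1.0, 1.0], [0.7, 0.3])
```

Any monotone map onto [0, 1] would satisfy the claim; the logistic choice simply keeps the index sensitive near the historical baseline.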
- 8. The privacy information protection method according to claim 7, wherein deriving the necessary protection response action sequence for each privacy category in combination with the preset risk tolerance boundary, and integrating all protection response action sequences to form the executable privacy reinforcement scheme, comprises: comparing the privacy vulnerability index with the preset risk tolerance boundary, wherein the risk tolerance boundary comprises a plurality of level thresholds; determining the currently required risk treatment level according to the comparison result; ranking the privacy categories by their overall leakage risk quantities; for a specified number of the highest-ranked privacy categories, selecting a group of ordered protection operations from a preset response action library according to the currently required risk treatment level, to form the protection response action sequence of each such privacy category, wherein the protection operations comprise data desensitization, access interception, permission revocation or log enhancement; resolving conflicts between the time schedules and resource occupation of the protection response action sequences of the privacy categories, and formulating a unified execution order and resource allocation plan; and finally integrating the protection response action sequences of the privacy categories into an executable privacy reinforcement scheme containing specific operation instructions, execution conditions and expected metrics.
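The selection logic of claim 8 reduces to: threshold the vulnerability index into a treatment level, rank categories by overall risk, and attach that level's ordered operations to the top categories. A sketch, where the thresholds, action library contents and `top_n` cutoff are illustrative assumptions:

```python
# Hypothetical response action library: treatment level -> ordered operations.
ACTION_LIBRARY = {
    1: ["log_enhancement"],
    2: ["log_enhancement", "data_masking"],
    3: ["data_masking", "access_interception", "permission_revocation"],
}

def treatment_level(vuln_index, thresholds=(0.3, 0.6)):
    """Compare the index against the risk tolerance boundary's level thresholds."""
    return 1 + sum(vuln_index >= t for t in thresholds)

def reinforcement_scheme(vuln_index, category_risks, top_n=2):
    """Rank categories by overall leakage risk and attach the action
    sequence of the currently required treatment level to the top ones."""
    level = treatment_level(vuln_index)
    ranked = sorted(category_risks, key=category_risks.get, reverse=True)[:top_n]
    return {cat: ACTION_LIBRARY[level] for cat in ranked}

scheme = reinforcement_scheme(0.7, {"identity": 0.9, "location": 0.4, "finance": 0.8})
```

A real implementation would additionally schedule these sequences against shared resources and attach execution conditions and expected metrics, as the claim requires.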
- 9. A privacy information protection system based on privacy disclosure evaluation, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the privacy information protection method based on privacy disclosure evaluation according to any one of claims 1 to 8.
Description
Privacy information protection method and system based on privacy disclosure evaluation
Technical Field
The invention relates to the technical field of data privacy protection, in particular to a privacy information protection method and system based on privacy disclosure evaluation.
Background
In current privacy protection practice, mainstream methods mostly rely on a preset static rule base or fixed sensitive-data identification, and typically apply a unified encryption or desensitization policy to the identified sensitive information. Faced with heterogeneous interaction data from a variety of service scenarios, such methods have difficulty effectively assessing the dynamic relevance and context sensitivity within the data. Their protection actions often lag behind the actual evolution of privacy risk, and they lack the ability to carry out real-time, differentiated risk assessment according to the actual value and association relations of the data in a specific business scene. Another common shortcoming of the prior art is that risk assessment models are mostly discrete and isolated: they treat each type of privacy data as an independent assessment object and calculate its individual leakage risk value. Such methods cannot describe or quantify the linkage risk caused by the aggregation and transmission of privacy information in a complex data flow network. When an attacker exploits the associations among multiple low-sensitivity data nodes to mount an inference attack, a traditional protection system based on single-point evaluation easily fails and cannot warn in advance of the derived privacy leakage paths produced by data aggregation.
Disclosure of Invention
The invention aims to provide a privacy information protection method and system based on privacy disclosure evaluation that solve the problems described in the background above.
In order to achieve the above object, the present invention provides a privacy information protection method based on privacy disclosure evaluation, the method comprising: collecting an original interaction data stream related to a target user, wherein the original interaction data stream comprises heterogeneous data generated under various service scenes; fusing and cleaning the original interaction data stream to form a structured user privacy behavior portrait; based on the user privacy behavior portrait, identifying core data nodes and associated edges forming a privacy leakage path; for each core data node, configuring a differentiated sensitivity measurement weight according to the privacy category to which its data attributes belong, and generating a corresponding dynamic evaluation grid for each privacy category according to the sensitivity measurement weight; aggregating the dynamic evaluation grids of all privacy categories to generate a panoramic risk topological graph; simulating the circulation and aggregation process of the privacy information in the panoramic risk topological graph, and calculating a multi-dimensional leakage probability surface; determining a privacy vulnerability index at the current moment according to the multi-dimensional leakage probability surface; and deriving a necessary protection response action sequence for each privacy category in combination with a preset risk tolerance boundary, and integrating all the protection response action sequences to form an executable privacy reinforcement scheme.
Preferably, configuring the differentiated sensitivity measurement weights according to the privacy categories to which the data attributes belong, and generating corresponding dynamic evaluation grids for each privacy category according to the sensitivity measurement weights, includes: analyzing the data attributes of the core data node and classifying the core data node into one of a plurality of preset privacy categories; obtaining the access frequency and the derived data quantity of the core data node in a preset historical period, and calculating a dynamic adjustment factor therefrom; weighting and fusing a reference weight coefficient with the corresponding dynamic adjustment factor to obtain the final sensitivity measurement weight of each privacy category; determining the grid division density of each privacy category according to the magnitude of the sensitivity measurement weight, wherein the higher the sensitivity measurement weight, the higher the grid division density of the corresponding privacy category; and establishing hierarchically structured evaluation grid units in the data value domain space according to the grid division density corresponding to each privacy category, each evaluation grid unit recording the data value range it covers and its current risk state, thereby forming the dynamic evaluation grid. Preferably, aggregating the dynamic evaluation grids of all privacy categories to generate the panoramic risk topological graph,