CN-121997378-A - Local differential privacy map data synthesis algorithm based on neighborhood structure
Abstract
The invention discloses a local differential privacy graph data synthesis algorithm based on a neighborhood structure, which comprises the steps of obtaining node degrees and an adjacency list after disturbance of each user, calculating a projection threshold value for restraining triangular counting sensitivity, guiding the user to calculate and report core neighborhood structure information of the user by utilizing the projection threshold value and a disturbance mechanism, carrying out joint optimization on two neighborhood statistics of the collected degrees and the triangular counting value, correcting structural inconsistency of the two neighborhood statistics, completing generation of target statistics, grouping all nodes by utilizing the corrected target statistics, sequentially executing an edge generation algorithm in groups and among groups, preferentially meeting triangular counting targets, meeting the requirement of the degree values, and completing construction of a synthesized graph meeting local differential privacy protection. The invention obviously improves the availability and fidelity of the synthetic image data by maintaining the key neighborhood structure of the image under strict privacy protection.
Inventors
- GAO YUNJUN
- ZHANG ZHIKUN
- DONG JIAWEI
- YUAN QUAN
Assignees
- 浙江大学
Dates
- Publication Date
- 20260508
- Application Date
- 20260410
Claims (10)
- 1. The local differential privacy map data synthesis algorithm based on the neighborhood structure is characterized by comprising the following steps of: (1) Acquiring node degrees and an adjacency list of each user subjected to local differential privacy disturbance, and calculating a projection threshold value for restricting triangle counting sensitivity based on the noisy number; (2) Projecting the local data of each user by using the projection threshold, calculating and generating a noisy triangular count value, and executing joint optimization operation to correct the inconsistency of the neighborhood structure information so as to obtain the target statistics of each user; (3) And grouping all user nodes according to the target statistics, and sequentially constructing edges in the groups to meet the triangular counting target and edges between the groups to meet the degree target so as to realize the generation of the synthetic graph.
- 2. The local differential privacy map data synthesis algorithm based on the neighborhood structure according to claim 1, wherein the specific implementation manner of the step (1) is as follows: s11, each user node acquires the local true degree and the true adjacency list; s12, disturbing the true degree by using a Laplace mechanism to generate a noisy degree; s13, disturbing the real adjacency list by utilizing a random response mechanism to generate a noisy adjacency list; S14, collecting noise values and noise adjacency lists reported by all users, and obtaining a noise value sequence and a noise adjacency matrix through splicing to form a complete noise statistic set; and S15, determining a projection threshold value of each user based on the generated noise number and privacy budget allocation by solving an optimization problem, namely searching a threshold value for minimizing the sum of information loss errors caused by projection and errors introduced by noise.
- 3. The local differential privacy map data synthesis algorithm based on neighborhood structure according to claim 2, wherein the Laplace mechanism in step S12 first sets a privacy budget for degree collection The user is from the scale parameter Sampling a random noise value from the Laplace distribution of the adjacent list and adding it to the true degree to obtain the noise number, wherein the random response mechanism in the step S13 firstly sets the privacy budget for the adjacent list collection For each bit in the real adjacency list, probability is used Keeping the original value unchanged to probability Inverting the value to generate a noisy adjacency list, wherein the projection threshold value expression in the step S15 is Wherein For the projection threshold of user i, For the number of noisy versions of user i, Is a correction factor to be optimized.
- 4. A local differential privacy map data synthesis algorithm according to claim 3, wherein the correction factor Obtained by minimizing the following objective function: ; Wherein: Is an objective function.
- 5. The local differential privacy map data synthesis algorithm based on the neighborhood structure according to claim 3, wherein the specific implementation manner of the step (2) is as follows: S21, distributing the noisy adjacent matrix and the projection threshold to corresponding users; S22, after receiving the projection threshold, the user node performs projection operation on the real adjacency list of the user node, reduces the adjacency relationship of the user node and generates a trimmed local view; s23, the user node combines the trimmed local view with the noisy adjacent matrix to form a mixed view, and further calculates a local triangular count value of the view and performs unbiased correction; S24, the user node utilizes privacy budget and Laplace mechanism to disturb the triangular count value, generates a noisy triangular count value and reports the noisy triangular count value; S25, aggregating noisy triangular count values reported by all users, and checking whether the neighborhood structural information of each user meets structural constraint conditions or not according to the neighborhood structural information of each user, wherein the neighborhood structural information is characterized by neighborhood statistics formed by pairing noisy count values and noisy triangular count values; S26, for the neighborhood statistics which do not meet the structural constraint, executing a constraint optimization process to find a new set of target statistics which meet the structural constraint.
- 6. The local differential privacy map data synthesis algorithm based on the neighborhood structure according to claim 5, wherein in the step S23, the user node uses the noisy adjacency matrix to query whether a connection edge exists between any two neighbors in the local view, if so, it is determined that the two neighbors and the user node form a triangle, and the number of the triangle is counted by traversing the triangle count value, and unbiased correction is performed on the triangle count value based on the turn-over probability of the adjacency table, in the step S24, the privacy budget for the triangle count is set first For any user i, it is from the scale parameters of Sampling a random noise value in the Laplace distribution, and adding the random noise value into the triangular count value to obtain the noisy triangular count value The structural constraint conditions in the step S25 are: And is also provided with And Is non-negative, the optimization in step S26 is performed by minimizing the objective function Solving to obtain target statistics Wherein And The target degree and target triangular count value for user i respectively, And The weight value is set and inversely proportional to the laplace distribution scale parameters for degree and trigonometric count plus noise, respectively.
- 7. The local differential privacy map data synthesis algorithm based on the neighborhood structure according to claim 6, wherein the specific implementation manner of the step (3) is as follows: s31, grouping all user nodes according to the target triangular count value so that nodes in the same group have similar neighborhood structure targets; S32, checking all the packets in order to ensure the effectiveness and homogeneity of the packets, sorting the packets with the number of nodes not reaching a preset minimum scale according to a target triangular count value, and merging the nodes with adjacent packets in sequence; s33, for each group, determining the generation probability of the connection edges in the group according to the total target triangle quantity, and establishing a dense local neighborhood structure based on the generation probability; s34, after the inner edge of the group is constructed, calculating the generated degree of each node, and comparing the degree with the target degree to obtain the residual degree of each node; S35, constructing a tree array for each group, and maintaining the surplus degree of each node in the group; S36, traversing all different grouping pairs, and calculating the candidate quantity of edges to be generated between the grouping pairs proportionally according to the total residual degree of the two groups of internal nodes; S37, for any group pair, selecting nodes for inter-group connection edges in two groups, wherein the probability of the selected nodes is proportional to the residual degree of the nodes; S38, updating the surplus degree of the two nodes on the connecting edge, and synchronously adjusting the tree array of the group in which the surplus degree is positioned; And S39, repeating the steps S37 to S38 until all the grouping pairs are traversed, generating connecting edges with corresponding candidate numbers, and finally completing the construction of the synthetic graph meeting the target statistics.
- 8. The local differential privacy map data synthesis algorithm based on neighborhood structure according to claim 7, wherein the probability of generation in the step S33 is determined by a relational expression Wherein k is the number of nodes in the group and q is the generation probability, and the candidate number of the generated edges in the step S36 is proportional to the product of the total residual degree of the nodes in the groups.
- 9. A computer device comprising a memory and a processor, wherein the memory has a computer program stored therein, and the processor is configured to execute the computer program to implement the local differential privacy map data synthesis algorithm based on the neighborhood structure according to any one of claims 1 to 8.
- 10. A computer readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement the local differential privacy map data synthesis algorithm based on a neighborhood structure according to any one of claims 1 to 8.
Description
Local differential privacy map data synthesis algorithm based on neighborhood structure Technical Field The invention belongs to the technical field of data privacy, and particularly relates to a local differential privacy map data synthesis algorithm based on a neighborhood structure. Background The graph data synthesis under privacy protection is one of key technologies of graph data analysis and sharing, and by generating a synthetic graph with similar statistical characteristics and without revealing individual privacy, data support can be provided for downstream tasks such as community discovery, anomaly detection, personalized recommendation and the like. Particularly, when processing real image data containing sensitive information such as user relationship, behavior pattern and the like, anonymization or desensitization processing is particularly important before release, and aims to balance data utility and privacy protection requirements. The local differential privacy model is paid attention to because of unique advantages, allows data to be disturbed before leaving user equipment, and even a data collector cannot acquire original sensitive information, so that various inference attacks are effectively resisted, and powerful privacy guarantee is provided. However, in order to achieve high-level privacy protection, a local differential privacy mechanism has to inject a considerable degree of noise into the data, which brings a serious challenge to synthesizing high-fidelity graph data, so that how to accurately capture and reproduce the complex structure of the graph in a noisy environment becomes a critical subject to be solved in the field. Existing methods have difficulty balancing data availability with privacy protection. On one hand, the existing method is limited to collecting the degree distribution of nodes, and is robust to noise, but because the degree carries limited graph structure information, the high-order topological characteristics such as community structure, transitivity, clustering coefficients and the like cannot be effectively captured, so that obvious structural differences exist between the synthesized graph and the real graph, and the usability of the synthesized graph in complex analysis is limited. On the other hand, the existing method directly issues a complete adjacency list, and though the complete adjacency list can retain more comprehensive structural information in theory, a large amount of noise needs to be injected under the local differential privacy to meet strict privacy requirements, so that the original structure is seriously damaged, and the usability of the synthesized data is extremely poor. In order to balance the two, researchers try to adopt the characteristics of intermediate granularity, for example, a method is adopted to enrich the characteristic representation of nodes through an iterative process, the nodes are classified based on the degrees of the nodes, category labels of neighbor nodes of each node are counted in each iteration to form a neighbor category distribution vector, the neighbor category distribution vector is sent to a data collector after being disturbed, and the data collector groups the nodes according to the collected noisy vectors and generates a graph according to the grouping. Although the method contains more neighborhood information than single degree, the method is essentially of indirect statistics of neighbor attributes, and can not sense whether connection exists between neighbors, so that the method can not accurately infer and restore high-order characteristics such as triangle count, cluster coefficient and the like which are critical to a local dense structure, and the final composite graph has obvious deviation from a real graph in terms of community structure and local aggregation. Disclosure of Invention In order to cope with the real scene of privacy-protected graph data synthesis, the invention provides a local differential privacy graph data synthesis algorithm based on a neighborhood structure, which fully utilizes local information of nodes to capture a local dense structure of an original graph, and generates a synthetic graph with similar statistical characteristics based on collected information, so that the usability of the synthetic graph is improved on the premise of guaranteeing individual privacy. A local differential privacy map data synthesis algorithm based on a neighborhood structure comprises the following steps: (1) Acquiring node degrees and an adjacency list of each user subjected to local differential privacy disturbance, and calculating a projection threshold value for restricting triangle counting sensitivity based on the noisy number; (2) Projecting the local data of each user by using the projection threshold, calculating and generating a noisy triangular count value, and executing joint optimization operation to correct the inconsistency of the neighborhood structure information so a