CN-121413031-B - Data privacy protection method, system, computer equipment and storage medium

Abstract

The invention discloses a data privacy protection method, a data privacy protection system, computer equipment and a storage medium. The method comprises the following steps: obtaining, from a client, original graph data G to be privacy-protected; initializing a GNN model and pre-training it on the original graph data G; setting a forgetting target and further iteratively training and optimizing the pre-trained GNN model based on contrastive learning to obtain an optimized GNN model; and inputting new graph data to be privacy-protected, transmitted by the client, into the optimized GNN model and outputting the privacy-protected graph data. By constructing a layer-by-layer forgetting operator and an improved contrastive learning framework, the invention jointly eliminates both the direct association of sensitive edges and their propagated influence through the graph topology, so that accurate and efficient graph data forgetting is achieved while preserving the overall performance of the model, thereby providing effective data privacy protection.

Inventors

LIU FAN
CHEN LINA
WANG SIYI
CHE XUAN
CHEN JINGFEI
GU HUANHUAN
LI QIANMU
HONG QIANG
SHI ZONGSHENG

Assignees

Nanjing University of Science and Technology (南京理工大学)

Dates

Publication Date: 20260505
Application Date: 20251224

Claims (10)

1. A data privacy protection method, the method comprising the following steps:
Step 1, acquiring original graph data G to be privacy-protected from a client;
Step 2, initializing a GNN model and pre-training it on the original graph data G;
Step 3, setting a forgetting target, and further iteratively training and optimizing the pre-trained GNN model based on contrastive learning to obtain an optimized GNN model;
Step 4, inputting new graph data to be privacy-protected, transmitted by the client, into the optimized GNN model, and outputting the privacy-protected graph data;
wherein step 3 specifically comprises:
Step 3-1, setting the number of training rounds T and determining the target edge set to be forgotten in the graph data G; in each training round, removing the target edge set from the graph data to generate updated graph data;
Step 3-2, inputting the updated graph data into the pre-trained GNN model to obtain the base node embedding Z;
Step 3-3, using a layer-by-layer forgetting operator to correct the base node embedding sequentially from the shallow layers to the deep layers of the GNN model, generating a final embedding;
Step 3-4, constructing contrastive sample pairs, namely constructing a positive sample pair and a negative sample pair according to the deleted target edge;
Step 3-5, constructing a joint loss function comprising a forgetting loss function and a performance maintenance loss function, wherein the forgetting loss function forces the model to ignore the existence of a sensitive edge, namely a target edge to be forgotten, by maximizing the similarity probability of the positive sample pair;
Step 3-6, training and optimizing the pre-trained GNN model based on the contrastive sample pairs and the joint loss function;
Step 3-7, repeating steps 3-2 to 3-6 for iterative training until the number of training rounds reaches T or a preset convergence condition is met, and finally outputting the optimized GNN model;
and wherein step 3-3 specifically comprises:
defining a conditionally activated forgetting operator for each layer of the GNN model, the operator being triggered, and performing a parameterized transformation, only in the local neighborhood of the target sensitive edge; for a deleted edge, the conditionally activated mapping function of the forgetting operator at a given layer of the GNN is defined as follows: [formula not preserved in the source text]; where the formula involves a parameterized transformation function; the learnable weight matrix of the layer, which is shared within that layer; the dimension of that weight matrix; the hop-limited neighborhood subgraph of the deleted edge at that layer; the feature of a node at that layer; and the forgetting operator of that layer;
determining an optimal neighborhood hop count to balance the forgetting effect against computational overhead;
applying the corresponding forgetting operator at each layer, sequentially from the shallow layers to the deep layers of the GNN model, correcting the node embeddings in the neighborhood of the sensitive edge layer by layer so as to gradually decouple the association, and outputting the final embedding after all layers have been corrected.
2. The data privacy protection method according to claim 1, wherein in step 2 the pre-training on the original graph data G is performed using the Adam optimizer until the model converges.
3. The data privacy protection method according to claim 1, wherein the contrastive sample pairs in step 3-4 are constructed as follows: the positive sample pair consists of a deleted edge and a randomly sampled invalid edge that does not exist in the graph data G; the negative sample pair consists of the deleted edge and a valid edge retained in the graph data G.
4. The data privacy protection method according to claim 3, wherein the joint loss function in step 3-5 is specifically: [formula not preserved in the source text]; where the formula involves the joint loss function, the forgetting loss function, the performance maintenance loss function, and a balance coefficient.
5. The data privacy protection method according to claim 4, wherein the balance coefficient [value not preserved in the source text].
6. The data privacy protection method according to claim 4, wherein the forgetting loss function is: [formula not preserved in the source text]; where the formula involves the edge representation vectors of the deleted edge and of the randomly sampled invalid edge in the positive sample pair, each computed from the embeddings of the edge's two end nodes; the edge representation vectors of the deleted edge and of the valid edge in the negative sample pair; a cosine similarity function; and a temperature parameter that controls the sharpness of the similarity distribution.
7. The data privacy protection method according to claim 6, wherein the performance maintenance loss function is: [formula not preserved in the source text]; where the formula involves the mean square error between the layer features of the neighborhood nodes of the deleted edge after the edge is deleted and the layer features of the same nodes before the edge is deleted.
8. A data privacy protection system based on the method of any one of claims 1 to 7, characterized in that the system comprises: a first module for obtaining original graph data G to be privacy-protected from a client; a second module for initializing a GNN model and pre-training it on the original graph data G; a third module for setting a forgetting target and further iteratively training and optimizing the pre-trained GNN model based on contrastive learning to obtain an optimized GNN model; and a fourth module for inputting new graph data to be privacy-protected, transmitted by the client, into the optimized GNN model and outputting the privacy-protected graph data.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method according to any one of claims 1 to 7.
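
As a non-limiting illustration of claims 3 to 7, the following Python sketch builds one positive and one negative sample pair and evaluates a joint loss. The claim formulas are not legible in this text, so every concrete form below is an assumption rather than the claimed formula: the edge representation is taken as the element-wise product of the two end-node embeddings, the forgetting loss as an InfoNCE-style ratio -log(exp(s+/tau) / (exp(s+/tau) + exp(s-/tau))), and the joint loss as the forgetting loss plus a balance coefficient times the performance maintenance loss. All function names, the temperature 0.5 and the coefficient 0.5 are hypothetical.

```python
import numpy as np


def sample_invalid_edge(num_nodes, original_edges, rng):
    """Randomly sample a node pair that is NOT an edge of the original graph G."""
    existing = {frozenset(e) for e in original_edges}
    while True:
        u, v = rng.choice(num_nodes, size=2, replace=False)
        if frozenset((int(u), int(v))) not in existing:
            return int(u), int(v)


def edge_repr(z, edge):
    """Edge vector from its two end-node embeddings (assumed: element-wise product)."""
    u, v = edge
    return z[u] * z[v]


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def forgetting_loss(z, deleted, invalid, valid, tau=0.5):
    """Assumed InfoNCE-style loss: the deleted edge should look like a non-edge
    (positive pair) rather than like a retained edge (negative pair)."""
    anchor = edge_repr(z, deleted)
    pos = np.exp(cosine(anchor, edge_repr(z, invalid)) / tau)
    neg = np.exp(cosine(anchor, edge_repr(z, valid)) / tau)
    return float(-np.log(pos / (pos + neg)))


def performance_loss(h_after, h_before, neighborhood):
    """Mean square error between neighborhood features after and before deletion."""
    idx = sorted(neighborhood)
    return float(np.mean((h_after[idx] - h_before[idx]) ** 2))


def joint_loss(l_forget, l_perf, lam=0.5):
    """Assumed combination: forgetting loss + lam * performance maintenance loss."""
    return l_forget + lam * l_perf


# Toy data: a 5-node path graph; edge (1, 2) is the sensitive edge to forget.
rng = np.random.default_rng(0)
original_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
deleted, valid = (1, 2), (3, 4)
invalid = sample_invalid_edge(5, original_edges, rng)

z_before = rng.normal(size=(5, 8))                      # embeddings before deletion
z_after = z_before + 0.01 * rng.normal(size=(5, 8))     # embeddings after deletion

l_f = forgetting_loss(z_after, deleted, invalid, valid)
l_p = performance_loss(z_after, z_before, {0, 1, 2, 3})
total = joint_loss(l_f, l_p)
print(f"forgetting={l_f:.4f} performance={l_p:.6f} joint={total:.4f}")
```

In a training loop, `total` would be the quantity minimized in step 3-6; the sketch only evaluates it once on fixed embeddings and does not perform any gradient update.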

Description

Data privacy protection method, system, computer equipment and storage medium

Technical Field

The invention belongs to the technical field of artificial intelligence security and data privacy protection, and particularly relates to a data privacy protection method, system, computer equipment and storage medium.

Background

Graph neural networks (GNNs) are an important tool for processing graph-structured data and have shown strong capability in fields such as social network analysis, recommendation systems and knowledge graphs. As data privacy regulations mature, how to remove specific users or sensitive relationship data from a trained graph neural network while maintaining the overall performance of the model has become a key problem to be solved. Early graph data forgetting methods mainly adopt exact forgetting schemes, for example achieving complete forgetting through data partitioning and model retraining. Although such methods guarantee the forgetting effect, they require the complete original training data, their computational cost grows linearly with the data scale, and they are difficult to deploy in practice. To address the efficiency problem, recent research has proposed approximate forgetting methods based on influence functions, which estimate the influence of data points on the model by computing parameter gradients. Although these methods reduce the computational cost, they have the following limitations: 1) only the direct influence of the target data is considered and the topological propagation effect in the graph structure is ignored, so forgetting is incomplete; 2) no mechanism protects the neighborhood structure when sensitive edges are removed, so model performance degrades significantly; and 3) a good balance between privacy protection and model utility is difficult to achieve.
The fundamental problem of existing methods is that they do not fully consider how the topological characteristics of graph data interact with neural network parameter updates. In particular, deleting an edge in the graph affects, through multi-layer network propagation, the representations of the nodes within its multi-hop neighborhood, whereas existing approaches lack targeted handling of such propagation effects. Therefore, a new graph data forgetting method that simultaneously achieves forgetting thoroughness, computational efficiency and performance maintenance needs to be designed in order to realize effective data privacy protection.

Disclosure of Invention

The present invention aims to solve the above problems of the prior art and provides a data privacy protection method, system, computer device and storage medium. By constructing a layer-by-layer forgetting operator and an improved contrastive learning framework, the invention jointly eliminates both the direct association of sensitive edges and their propagated influence through the graph topology, so that accurate and efficient graph data forgetting is achieved while preserving the overall performance of the model, thereby providing effective data privacy protection.
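
As a non-limiting illustration of the propagation effect and of a layer-by-layer forgetting operator, the following Python sketch first collects the nodes within a given hop count of a deleted edge and then applies a correction only to those nodes, once per layer. It is a simplified reading under stated assumptions: the neighborhood is computed on the graph after the edge has been removed, the parameterized transformation is taken to be `tanh(h @ W)` with one weight matrix per layer, and the message-passing layers of the GNN itself are omitted. The names `k_hop_neighborhood` and `apply_forgetting_operator` are hypothetical.

```python
from collections import deque

import numpy as np


def k_hop_neighborhood(adj, edge, k):
    """Return the nodes within k hops of either endpoint of `edge` (BFS)."""
    u, v = edge
    dist = {u: 0, v: 0}
    queue = deque([u, v])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def apply_forgetting_operator(h, neighborhood, weight):
    """Conditionally activated correction: transform only the rows of nodes in
    `neighborhood`; every other node keeps its features unchanged."""
    out = h.copy()
    idx = sorted(neighborhood)
    out[idx] = np.tanh(h[idx] @ weight)  # assumed parameterized transformation
    return out


# Path graph 0-1-2-3-4-5 after the sensitive edge (2, 3) has been removed.
adj = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3, 5], 5: [4]}
deleted_edge = (2, 3)
hood = k_hop_neighborhood(adj, deleted_edge, k=1)

rng = np.random.default_rng(0)
h = rng.normal(size=(6, 4))                                    # base node embedding
weights = [0.1 * rng.normal(size=(4, 4)) for _ in range(2)]    # one matrix per layer

# Apply the operator layer by layer, from the shallow layer to the deep layer.
h_final = h
for w in weights:
    h_final = apply_forgetting_operator(h_final, hood, w)

print(sorted(hood))
```

Restricting the transformation to the hop-limited neighborhood is what keeps the cost of each correction proportional to the neighborhood size rather than to the whole graph; a larger hop count covers more of the propagated influence at a higher computational cost.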
The technical solution for achieving the purpose of the invention is as follows. In one aspect, a data privacy protection method is provided, the method comprising the following steps:
Step 1, acquiring original graph data G to be privacy-protected from a client;
Step 2, initializing a GNN model and pre-training it on the original graph data G;
Step 3, setting a forgetting target, and further iteratively training and optimizing the pre-trained GNN model based on contrastive learning to obtain an optimized GNN model;
Step 4, inputting new graph data to be privacy-protected, transmitted by the client, into the optimized GNN model, and outputting the privacy-protected graph data.
Further, in step 2, the Adam optimizer is used to pre-train on the original graph data G until the model converges.
Further, step 3 specifically includes:
Step 3-1, setting the number of training rounds T and determining the target edge set to be forgotten in the graph data G; in each training round, removing the target edge set from the graph data to generate updated graph data;
Step 3-2, inputting the updated graph data into the pre-trained GNN model to obtain the base node embedding Z;
Step 3-3, using a layer-by-layer forgetting operator to correct the base node embedding sequentially from the shallow layers to the deep layers of the GNN model, generating a final embedding;
Step 3-4, constructing contrastive sample pairs, namely constructing a positive sample pair and a negative sample pair according to the deleted target edge;
Step 3-5, constructing a joint loss function comprising a forgetting loss function and a performance maintenance loss function, wherein the forgetting loss function forces the model to ignore the existence of a sensitive edge, namely a target edge to be forgotten, by maximiz