
CN-121980603-A - Privacy compliance auditing method based on graph distance self-coding model

CN 121980603 A

Abstract

The invention provides a privacy compliance auditing method based on a graph distance self-coding (autoencoder) model. The model comprises a graph neural network encoder and a dual decoder, and is trained on a set of samples together with positive and negative sample pairs. Each sample contains a target node's subgraph and node attributes; the encoder maps these to a latent-space representation of the node, and the dual decoder reconstructs the target node information from that representation. Model parameters are updated with three optimization targets: minimizing the reconstruction loss between each sample's target node information and its reconstruction, minimizing the latent perturbation distance between the latent-space representations of the two nodes in a positive sample pair, and maximizing the input-space distance between the latent-space representations of the two nodes in a negative sample pair. By mapping subgraphs and node attributes into a shared latent space, the invention improves the accuracy of perturbation-distance measurement and the success rate of inference attacks.
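To make the architecture described in the abstract concrete, here is a minimal NumPy sketch (not part of the patent text) of an encoder/dual-decoder arrangement: a one-layer mean-aggregation encoder maps a subgraph and node attributes to latent representations, a structure decoding module reconstructs edge probabilities, and a feature decoding module reconstructs node attributes. All layer sizes, the aggregation rule, and the inner-product structure decoder are illustrative assumptions, not the patent's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(adj, x, w):
    """One-layer mean-aggregation GNN encoder (illustrative stand-in
    for the patent's graph neural network encoder)."""
    deg = adj.sum(axis=1, keepdims=True) + 1.0   # +1 counts the self-loop
    h = (adj @ x + x) / deg                      # mean of self + neighbours
    return np.tanh(h @ w)                        # latent representation z

def decode_structure(z):
    """Structure decoding module: inner-product edge probabilities."""
    logits = z @ z.T
    return 1.0 / (1.0 + np.exp(-logits))         # sigmoid, values in (0, 1)

def decode_features(z, w_out):
    """Feature decoding module: linear reconstruction of node attributes."""
    return z @ w_out

# Toy subgraph: 4 nodes, 3-dim attributes, 2-dim latent space (assumed sizes).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 0],
                [0, 1, 0, 0]], dtype=float)
x = rng.normal(size=(4, 3))
w_enc = rng.normal(size=(3, 2)) * 0.1
w_dec = rng.normal(size=(2, 3)) * 0.1

z = encode(adj, x, w_enc)
a_hat = decode_structure(z)
x_hat = decode_features(z, w_dec)
print(z.shape, a_hat.shape, x_hat.shape)   # (4, 2) (4, 4) (4, 3)
```

The dual decoder thus reconstructs both views of the target node information — the adjacency structure and the node attributes — from one shared latent representation, which is what allows the two spaces to be compared under a single distance.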

Inventors

  • XU BINGBING
  • LI HAOWEN
  • SU DU
  • SUN FEI
  • CAO QI
  • SHEN HUAWEI
  • CHENG XUEQI

Assignees

  • Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所)

Dates

Publication Date
2026-05-05
Application Date
2025-12-26

Claims (10)

  1. A training method for a graph distance self-coding model, the model comprising a graph neural network encoder and a dual decoder, the training method comprising: S1, acquiring a training set comprising a plurality of samples, each sample comprising target node information, the information comprising a subgraph of a target node and its node attributes; S2, constructing positive and negative sample pairs based on the training set, wherein a positive sample pair consists of a sample from the training set and a perturbed sample obtained by applying perturbation processing to that sample, and a negative sample pair consists of the subgraphs and node attributes of two different target nodes in the training set; S3, training the model using the training set and the positive and negative sample pairs to obtain a trained model, wherein training the model comprises: S31, encoding the target node information with the graph neural network encoder to obtain a latent-space representation of the target node, and reconstructing the target node information from that latent-space representation with the dual decoder; S32, updating the model parameters with the optimization targets of minimizing the reconstruction loss between a sample's target node information and the reconstructed target node information, minimizing the latent perturbation distance between the latent-space representations of the two target nodes in a positive sample pair, and maximizing the input-space distance between the latent-space representations of the two target nodes in a negative sample pair.
  2. The training method of claim 1, wherein the subgraph of the target node in each sample includes the edges and neighbor nodes connected to that node; the dual decoder includes a structure decoding module, and in S31, reconstructing the target node information includes: reconstructing the subgraph from the latent-space representation of the target node through the structure decoding module, the reconstructed subgraph comprising reconstructed edges and neighbor nodes connected to the target node; the dual decoder further includes a feature decoding module, and in S31, reconstructing the target node information further includes: reconstructing the node attributes from the latent-space representation of the target node through the feature decoding module.
  3. The training method according to claim 2, wherein in S32 the reconstruction loss comprises: an attribute reconstruction loss between the node attributes of the sample's target node and the reconstructed node attributes, and a graph-structure reconstruction loss between the subgraph of the sample's target node and the reconstructed subgraph.
  4. The training method according to claim 3, wherein the reconstruction loss is calculated as follows: L_rec = L_struct + L_attr, with L_struct = −(1/|E⁺|) Σ_{(i,j)∈E⁺} log p_ij − (1/|E⁻|) Σ_{(i,j)∈E⁻} log(1 − p_ij) and L_attr = (1/N) Σ_{i=1}^{N} ‖x_i − x̂_i‖², wherein L_rec denotes the reconstruction loss; L_struct denotes the graph-structure reconstruction loss; (i, j) ∈ E⁺ denotes an edge between the i-th node and the j-th node present in the target node information of the positive sample; E⁺ denotes the edge set of the positive sample and |E⁺| its total number of edges; p_ij denotes the predicted probability that an edge exists between the i-th node and the j-th node; (i, j) ∈ E⁻ denotes an edge between the i-th node and the j-th node present in the target node information of the negative sample; E⁻ denotes the edge set of the negative sample and |E⁻| its total number of edges; L_attr denotes the attribute reconstruction loss; N denotes the total number of samples; i denotes the sample index; x_i denotes the node attributes of the i-th sample; and x̂_i denotes the reconstructed node attributes of the i-th sample.
  5. The training method of claim 1, wherein S32 comprises constructing a contrastive learning loss function, such that minimizing its value achieves the optimization objective of minimizing the latent perturbation distance and maximizing the input-space distance, the contrastive learning loss function being: L_con = −(1/N) Σ_{i=1}^{N} log [ exp(−d(z_i, z′_i)) / (exp(−d(z_i, z′_i)) + Σ_{j≠i} exp(−d(z_i, z′_j))) ], wherein L_con denotes the contrastive learning loss function; N denotes the total number of samples; i denotes the sample index; z_i denotes the latent-space representation of the target node of the i-th sample; z′_i denotes the latent-space representation of the target node of the perturbed sample of the i-th sample; j denotes a sample index different from i; d(z_i, z′_i) denotes the latent perturbation distance between the latent-space representations of the two target nodes in a positive sample pair; z′_j denotes the latent-space representation of the target node of the perturbed sample of the j-th sample; and d(z_i, z′_j) denotes the input-space distance between the latent-space representations of the two target nodes in a negative sample pair.
  6. The training method according to claim 2, wherein in S2 the perturbation processing comprises modifying a node attribute in the target node information and/or modifying the subgraph of a node in the target node information to obtain the target node information of the perturbed sample; modifying a node attribute comprises randomly masking the node attribute; and modifying the subgraph of a node comprises randomly deleting an edge connected to the node or randomly adding an edge connected to the node.
  7. The training method according to any one of claims 1-6, wherein the graph distance self-coding model is applied to the field of social network recommendation, and the training set is a privacy-use-compliance training set for the field of social network recommendation; the target node information of each sample in the training set comprises a target node representing a target user in the social network, with the subgraph and node attributes representing the target user's personal privacy data.
  8. A privacy compliance auditing method based on a graph distance self-coding model, applied to privacy compliance auditing scenarios in the field of social network recommendation, the method comprising: obtaining target node information to be inferred, the information comprising a subgraph and node attributes, wherein the target node represents a target user in a social network and the subgraph and node attributes represent the target user's personal privacy data; acquiring a privacy-use-compliance training set for the field of social network recommendation, obtaining a trained graph distance self-coding model using the training set according to the method of any one of claims 1-7, and obtaining a latent-space representation of the target node with the model's graph neural network encoder from the target node information to be inferred; calculating, with a black-box decision-boundary attack algorithm, the minimum latent perturbation distance satisfying a condition, the distance being calculated from the latent-space representation of the target node and the latent-space representation after perturbation processing, and the condition comprising that the model's dual decoder can reconstruct the target node information to be inferred from the perturbed latent-space representation; performing membership inference according to the minimum latent perturbation distance, wherein the target node information to be inferred is regarded as a sample of the training set when the latent perturbation distance exceeds a preset threshold, and otherwise is not so regarded; and determining privacy-use compliance information when the target node information is regarded as a sample of the training set, and privacy-use non-compliance information when it is not.
  9. A computer-readable storage medium having stored thereon a computer program executable by a processor to perform the steps of the method of any one of claims 1-8.
  10. An electronic device, comprising: one or more processors; and a memory for storing executable instructions; wherein the one or more processors are configured to implement the steps of the method of any one of claims 1-8 by executing the executable instructions.
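The formula images accompanying claims 4 and 5 do not survive in this text version. As a hedged illustration only, the NumPy sketch below implements one standard reading of the two objectives: a reconstruction loss combining binary cross-entropy over edges with a mean-squared attribute error (in the spirit of claim 4), and an InfoNCE-style contrastive loss that pulls each node's latent representation toward its perturbed view and pushes it away from other samples' perturbed views (in the spirit of claim 5). The exact functional forms, distance choice, and sizes are assumptions, not the patent's published equations.

```python
import numpy as np

def reconstruction_loss(p_pos, p_neg, x, x_hat):
    """Assumed form of claim 4's loss: binary cross-entropy over edges
    predicted present (p_pos) and absent (p_neg), plus a mean-squared
    attribute reconstruction error."""
    l_struct = -np.mean(np.log(p_pos)) - np.mean(np.log(1.0 - p_neg))
    l_attr = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    return l_struct + l_attr

def contrastive_loss(z, z_pert):
    """Assumed InfoNCE-style form of claim 5's loss: minimize the latent
    perturbation distance d(z_i, z'_i) of each positive pair while
    maximizing the distances d(z_i, z'_j), j != i, of negative pairs."""
    n = z.shape[0]
    total = 0.0
    for i in range(n):
        d_pos = np.sum((z[i] - z_pert[i]) ** 2)            # positive pair
        d_neg = np.array([np.sum((z[i] - z_pert[j]) ** 2)  # negative pairs
                          for j in range(n) if j != i])
        total += -np.log(np.exp(-d_pos) /
                         (np.exp(-d_pos) + np.sum(np.exp(-d_neg))))
    return total / n

# Toy latents: 5 samples in a 2-dim latent space (assumed sizes).
rng = np.random.default_rng(1)
z = rng.normal(size=(5, 2))
z_pert = z + 0.01 * rng.normal(size=(5, 2))   # mildly perturbed views
l_con = contrastive_loss(z, z_pert)
l_rec = reconstruction_loss(np.array([0.9, 0.8]), np.array([0.1, 0.2]),
                            z, z + 0.1)
print(l_rec > 0.0, l_con > 0.0)   # True True
```

In this formulation both terms are non-negative by construction, so the combined training objective of claims 1 and 5 — reconstruction loss plus contrastive loss — can be minimized jointly by gradient descent on the encoder and dual-decoder parameters.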

Description

Privacy compliance auditing method based on graph distance self-coding model

Technical Field

The invention relates to the technical field of artificial intelligence security and privacy protection, in particular to information security in graph neural networks, and more particularly to a privacy compliance auditing method based on a graph distance self-coding model.

Background

Graph neural networks (GNNs) have been widely used in tasks such as node classification, link prediction, and recommender systems owing to their excellent performance in modeling graph-structured data (e.g., social networks, citation networks, knowledge graphs). Because they exhibit strong fitting capability on many practical tasks, GNNs tend to implicitly encode rich information about their training data. However, this very capability exposes graph neural networks to increasingly severe risks of privacy disclosure, especially under membership inference attacks (MIA). The goal of a membership inference attack is to determine whether a particular sample was used to train the target model; in practice, an attacker can further infer user identity, behavioral preferences, and even sensitive attributes from this information. Past research on MIA focused mainly on non-graph models (e.g., CNNs, MLPs), but in recent years it has gradually been extended to graph neural networks, revealing that GNNs are also vulnerable in terms of privacy protection. MIA research targeting GNNs therefore has theoretical significance as well as practical value for enhancing the credibility and safety of models.
Most existing MIA methods for GNNs rely on the model's posterior output as the attack cue, i.e., they infer training-set membership by accessing the model's predicted probability distribution over samples (e.g., softmax scores). In actual deployment, however, for privacy protection or due to system limitations, the server often provides only the final classification result (the label) — the "label-only output" scenario. In this case posterior information is unavailable, rendering many existing inference attack methods ineffective. To address this problem, an emerging class of label-only attack methods has been proposed that approximates membership through substitute features, for example by reasoning over the minimum perturbation distance from the original sample to the decision boundary (i.e., the interface between different classes in feature space), which may also be viewed as the minimum adversarial perturbation commonly used in adversarial attacks. The core assumption is that training samples usually lie farther from the model's decision boundary, so their predictions are harder to change by perturbation; this idea has been preliminarily verified in non-graph settings.
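The label-only attack idea described above — membership signaled by a sample's distance to the decision boundary — can be sketched as follows. This is a simplified random-direction search against a black-box, label-only classifier, standing in for full decision-boundary attacks such as HopSkipJump; the toy classifier, step size, and trial count are all assumptions for illustration.

```python
import numpy as np

def boundary_distance(predict, x, step=0.05, max_norm=5.0, trials=32, rng=None):
    """Estimate the minimum perturbation norm that flips a black-box,
    label-only classifier's decision on x. Larger distances suggest the
    sample lies deeper inside its class region -- the core label-only
    membership-inference signal. A crude sketch, not a full attack."""
    rng = rng or np.random.default_rng(0)
    base = predict(x)
    best = max_norm
    for _ in range(trials):
        d = rng.normal(size=x.shape)
        d /= np.linalg.norm(d)          # random unit direction
        r = step
        while r < best:                 # only probe below the current best
            if predict(x + r * d) != base:   # label flipped: boundary crossed
                best = r
                break
            r += step
    return best

# Toy linear classifier (assumed): label = 1 iff the first coordinate > 0.
predict = lambda v: int(v[0] > 0)
deep = boundary_distance(predict, np.array([2.0, 0.0]))     # far from boundary
shallow = boundary_distance(predict, np.array([0.1, 0.0]))  # near boundary
print(deep > shallow)
```

Under the assumption above, a membership decision would threshold this estimated distance: samples whose distance exceeds the threshold are judged to be training-set members. The patent's contribution is to measure this distance in a shared latent space rather than directly on the non-Euclidean graph input.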
When this idea is applied to privacy compliance auditing under GNN membership inference attacks in the label-only scenario, the non-Euclidean structure of graph data brings new challenges: 1) compared with simple pixel/vector perturbations in Euclidean space, perturbations of a graph must simultaneously account for complex changes in adjacency relations and node attributes, making the perturbation distance on graph structures difficult to define and compute; and 2) it is difficult to establish a comparable, unified metric between the graph-structure space and the node-attribute space, so the actual perturbation distance of a graph node is hard to reflect effectively. These challenges leave privacy compliance audits based on GNN membership inference attacks in the label-only scenario without efficient and viable tools. It should be noted that this background is provided only to describe information relevant to the invention and to facilitate understanding of its technical solution; it does not imply that such information is necessarily prior art. Absent evidence that the related information was published before the filing date of the present application, it should not be considered prior art.

Disclosure of Invention

Accordingly, the invention aims to overcome the defects of the prior art and provide a privacy compliance auditing method based on a graph distance self-coding model. The invention is realized by the following technical scheme: According to a first aspect of the invention, a training method for a graph distance self-coding model is provided; the model comprises a graph neural network encoder and a dual decoder, and the training method comprises step S1, acquiring a training set, wherein each sample comprises a