CN-121981169-A - System and method for optimizing graph self-coding mask strategy for link prediction task

Abstract

The invention discloses a graph self-coding mask strategy optimization system and method for a link prediction task. The system comprises a mask generation module, an adaptive mask strategy module, and a graph self-coding module. The mask generation module calculates importance scores of the reference edges that connect two documents in an original document reference network. The adaptive mask strategy module dynamically adjusts the mask strategy according to the importance scores to generate a mask edge set. The graph self-coding module masks the original document reference network according to the mask edge set to obtain a masked graph structure, performs node representation learning and link prediction on the masked graph structure, and optimizes model parameters using a loss function.

Inventors

LI JIE
LI JINCHENG
CHENG GAOFENG
ZHAO KE
LUO RONG
YANG SIMIN
LIU ZEYI
CUI ZHEN
DENG YUQIU
QIU ZHANHONG
ZHANG HUASEN

Assignees

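Beijing University of Posts and Telecommunications (北京邮电大学)
Unit 91054 of the Chinese People's Liberation Army (中国人民解放军91054部队)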

Dates

Publication Date: 20260505
Application Date: 20260126

Claims (8)

1. A graph self-coding mask strategy optimization system for a link prediction task, characterized by comprising a mask generation module, an adaptive mask strategy module, and a graph self-coding module; the mask generation module is used for calculating importance scores of reference edges connecting two documents in an original document reference network; the adaptive mask strategy module is used for dynamically adjusting a mask strategy according to the importance scores to generate a mask edge set; and the graph self-coding module is used for masking the original document reference network according to the mask edge set to obtain a masked graph structure, performing node representation learning and link prediction on the masked graph structure, and optimizing model parameters using a loss function.
2. The graph self-coding mask strategy optimization system for a link prediction task of claim 1, wherein the importance score is: [equation omitted in source] where $s_i$ denotes the importance score, $\min$ denotes the minimum value, $D_u$ denotes the reference frequency of document $u$, and $D_v$ denotes the reference frequency of document $v$.
3. The graph self-coding mask strategy optimization system for a link prediction task of claim 1, wherein the workflow of the adaptive mask strategy module comprises: sorting the reference edges in descending order of importance score and selecting the top $K$ edges to form an important edge set $Y$; increasing the mask probability of the edges in the important edge set $Y$ to optimize training, wherein the mask probability set of all edges is: [equation omitted in source] where $P$ denotes the set of mask probabilities of all edges, the initial probability values are sampled from a uniform distribution before each round of training, $\beta$ denotes the additional mask probability assigned to high-scoring edges, and $e_i$ denotes the $i$-th edge in the graph structure; and sorting all edges by their probability values in the mask probability set and masking them from high to low, to obtain the mask edge set to be masked.
4. The graph self-coding mask strategy optimization system for a link prediction task of claim 1, wherein the workflow of the graph self-coding module comprises: processing the original document reference network based on the mask edge set to construct a masked graph structure: [equation omitted in source] whose terms denote, in order, the masked graph structure, the set of documents, and the set of reference relationships retained after the masking operation; performing representation learning on the document nodes in the masked graph structure through an encoder to obtain latent representations of the documents, and then constructing feature representations of the reference edges by combining the representations of the connected document nodes; based on the feature representations of the reference edges, calculating the existence probability of each edge using a decoder: [equation omitted in source] whose terms denote, in order, the initial representation of edge $e_{v,u}$, the number of layers $K$ of the graph neural network, the $i$-th-layer node vector representation of document $v$ learned by the graph neural network encoder, the $j$-th-layer node vector representation of document $u$ learned by the encoder, the Hadamard product, the probability that a reference exists between document $u$ and document $v$, and the multi-layer perceptron (MLP); based on the existence probability, constructing a reconstruction loss function: [equation omitted in source] whose terms denote, in order, the reconstruction loss function, the positive-sample reference relationships, the negative-sample reference relationships, the average reconstruction loss over the positive sample set, and the average reconstruction loss over the negative sample set; and optimizing model parameters based on the reconstruction loss function.
5. A graph self-coding mask strategy optimization method for a link prediction task, applied to the system of any one of claims 1 to 4, characterized by comprising the following steps: calculating importance scores of reference edges connecting two documents in an original document reference network; dynamically adjusting a mask strategy according to the importance scores to generate a mask edge set; and masking the original document reference network according to the mask edge set to obtain a masked graph structure, performing node representation learning and link prediction on the masked graph structure, and optimizing model parameters using a loss function.
6. The graph self-coding mask strategy optimization method for a link prediction task of claim 5, wherein the importance score is: [equation omitted in source] where $s_i$ denotes the importance score, $\min$ denotes the minimum value, $D_u$ denotes the reference frequency of document $u$, and $D_v$ denotes the reference frequency of document $v$.
7. The graph self-coding mask strategy optimization method for a link prediction task of claim 5, wherein the method of generating the mask edge set comprises: sorting the reference edges in descending order of importance score and selecting the top $K$ edges to form an important edge set $Y$; increasing the mask probability of the edges in the important edge set $Y$ to optimize training, wherein the mask probability set of all edges is: [equation omitted in source] where $P$ denotes the set of mask probabilities of all edges, the initial probability values are sampled from a uniform distribution before each round of training, $\beta$ denotes the additional mask probability assigned to high-scoring edges, and $e_i$ denotes the $i$-th edge in the graph structure; and sorting all edges by their probability values in the mask probability set and masking them from high to low, to obtain the mask edge set to be masked.
8. The graph self-coding mask strategy optimization method for a link prediction task of claim 5, wherein the method for optimizing model parameters comprises: processing the original document reference network based on the mask edge set to construct a masked graph structure: [equation omitted in source] whose terms denote, in order, the masked graph structure, the set of documents, and the set of reference relationships retained after the masking operation; performing representation learning on the document nodes in the masked graph structure through an encoder to obtain latent representations of the documents, and then constructing feature representations of the reference edges by combining the representations of the connected document nodes; based on the feature representations of the reference edges, calculating the existence probability of each edge using a decoder: [equation omitted in source] whose terms denote, in order, the initial representation of edge $e_{v,u}$, the number of layers $K$ of the graph neural network, the $i$-th-layer node vector representation of document $v$ learned by the graph neural network encoder, the $j$-th-layer node vector representation of document $u$ learned by the encoder, the Hadamard product, the probability that a reference exists between document $u$ and document $v$, and the multi-layer perceptron (MLP); based on the existence probability, constructing a reconstruction loss function: [equation omitted in source] whose terms denote, in order, the reconstruction loss function, the positive-sample reference relationships, the negative-sample reference relationships, the average reconstruction loss over the positive sample set, and the average reconstruction loss over the negative sample set; and optimizing model parameters based on the reconstruction loss function.
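
The mask-generation steps recited in the claims (score each edge, boost the top-K edges, mask from the highest probability down) can be sketched in Python as follows. This is an illustrative sketch only, not the patented implementation. Because the equations are not reproduced in this text, it assumes the importance score is s_i = min(D_u, D_v), with D_u taken to be the number of reference edges touching document u, and that the boost is additive (initial uniform sample plus β for edges in Y). The function names and the `mask_ratio` parameter are hypothetical; the claims do not state how many edges are masked.

```python
import random
from collections import Counter


def importance_scores(edges):
    """Score each edge (u, v).

    ASSUMPTION: s_i = min(D_u, D_v), where D_x is the number of edges
    touching document x. The source text omits the actual formula.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return [min(degree[u], degree[v]) for u, v in edges]


def mask_probabilities(edges, scores, top_k, beta, rng):
    """Build the mask probability of every edge.

    ASSUMPTION: each edge gets a fresh uniform sample in [0, 1), and the
    top_k highest-scoring edges (the set Y) receive an additive bonus beta.
    """
    order = sorted(range(len(edges)), key=lambda i: scores[i], reverse=True)
    important = set(order[:top_k])  # indices of the edges in Y
    return [
        rng.random() + (beta if i in important else 0.0)
        for i in range(len(edges))
    ]


def select_mask_edges(edges, probs, mask_ratio):
    """Mask edges from the highest probability down.

    `mask_ratio` is a hypothetical knob fixing how many edges are masked.
    Returns (masked edge set, list of edges kept in the masked graph).
    """
    n_mask = int(len(edges) * mask_ratio)
    order = sorted(range(len(edges)), key=lambda i: probs[i], reverse=True)
    masked = {edges[i] for i in order[:n_mask]}
    kept = [e for e in edges if e not in masked]
    return masked, kept
```

Calling `mask_probabilities` again before each training round redraws the uniform samples, so the masked set changes from round to round while edges in Y remain more likely to be masked.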

Description

System and method for optimizing graph self-coding mask strategy for link prediction task

Technical Field

The invention belongs to the technical field of graph self-encoders, and particularly relates to a graph self-coding mask strategy optimization system and method for a link prediction task.

Background

In a literature citation network (also called a citation network), the graph structure provides an intuitive and powerful representation of an academic knowledge system. Nodes in the network represent academic documents (e.g., research papers, journal articles, or conference papers), and each edge represents a citation relationship between documents. Specifically, if document A cites document B, the graph contains a directed edge pointing from node A to node B. Based on this structure, the goal of link prediction is to use known citation relationships to infer which unconnected document pairs may have a potential citation association. The technology has important application value in fields such as accurate document recommendation and academic knowledge discovery.

However, acquiring such data is costly, and label scarcity is a common problem. To make full use of massive unlabeled data, self-supervised graph pre-training has become a core technology. Generative methods, represented by the graph masked self-encoder, are favored because they need no complex data augmentation design and let the model learn the structure and features of the graph autonomously through a mask reconstruction mechanism; this idea follows the success of BERT in natural language processing and of self-encoders in the vision field. The masking strategy is the core design of the self-encoder and directly determines pre-training efficiency and the quality of the learned representations. Two common masking methods are random masking and random-walk masking.
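
For orientation, the two baseline masking methods named above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not code from the patent: the citation network is assumed to be a list of unique directed `(citing, cited)` pairs, the function names and parameters are hypothetical, and the random-walk variant masks only the edges traversed by a single walk.

```python
import random


def random_edge_mask(edges, mask_ratio, rng):
    """Random masking: mask a fixed fraction of edges, chosen uniformly."""
    n_mask = int(len(edges) * mask_ratio)
    masked = set(rng.sample(edges, n_mask))
    kept = [e for e in edges if e not in masked]
    return masked, kept


def random_walk_edge_mask(edges, start, walk_length, rng):
    """Random-walk masking: mask the edges traversed by one walk from `start`."""
    # Adjacency list of outgoing citations for each document.
    out_neighbors = {}
    for u, v in edges:
        out_neighbors.setdefault(u, []).append(v)

    masked = set()
    node = start
    for _ in range(walk_length):
        neighbors = out_neighbors.get(node)
        if not neighbors:  # dead end: this document cites nothing
            break
        nxt = rng.choice(neighbors)
        masked.add((node, nxt))
        node = nxt
    kept = [e for e in edges if e not in masked]
    return masked, kept
```

Neither function looks at how important an edge is: the first treats every edge identically, and the second sees only the neighborhood reachable from its start node.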
With these two masking methods, the graph self-encoder can improve its understanding of graph data during training and achieve reasonable performance on representation learning tasks, but both have clear limitations: random masking is data-inefficient and yields poor representation quality because it masks indiscriminately, while random-walk masking has poor global awareness and generalization because its field of view is limited and its views overlap. These problems make it difficult for existing methods to meet practical application requirements, so a better masking strategy is needed to improve the performance of the graph self-encoder.

Disclosure of Invention

In current graph self-encoder tasks, although masking promotes the model's learning of graph structure and features to a certain extent, the following defects remain in practical application:

(1) The importance of edges in the graph is not fully considered. Existing masking methods fail to distinguish the importance of different edges in the graph. For example, random masking treats all edges alike, whereas random-walk masking focuses mainly on local topology; neither can explicitly identify and preferentially mask the critical edges. As a result, the model may focus too much on secondary parts of the graph during training and fail to capture its global key information, which degrades the quality of graph representation learning.

(2) Poor task fit and no dynamic difficulty adjustment. Existing masking methods do not follow an easy-to-hard masking schedule, even though such dynamic adjustment facilitates progressive learning of the model. Instead, current random masking and random-walk masking methods generally mask edges randomly in a fixed proportion during training and do not adjust dynamically according to task difficulty.
For example, in the early stages of model training, the task may be too difficult and the masking too complex, so the model struggles to learn effectively; in the later stages, the masking strategy may be too simple to pose a sufficient challenge, which harms the learning efficiency and final performance of the model.

In order to solve these problems, the invention provides the following scheme: a graph self-coding mask strategy optimization system for a link prediction task, comprising a mask generation module, an adaptive mask strategy module, and a graph self-coding module; the mask generation module is used for calculating importance scores of reference edges connecting two documents in the original document reference network; the adaptive mask strategy module is used for dynamically adjusting a mask strategy according to the importance scores to generate a mask edge set; the graph self-coding module is used for carrying out mask processing on the original document reference network a