CN-122020151-A - Graph data augmentation and graph neural network training method, device and equipment
Abstract
The embodiments of this specification disclose a method, a device and equipment for graph data augmentation and graph neural network training. In the augmentation scheme, the graph data comprises a plurality of nodes and edges between the nodes, and the scheme comprises: determining a designated node in the graph data and neighbor nodes of the designated node; selecting some of the neighbor nodes as nodes to be augmented; selecting, as a target node, a node on the path corresponding to the node to be augmented in the graph data whose distance is smaller than a preset threshold; deleting the edge between the node to be augmented and the designated node; and generating a new edge between the target node and the designated node, to generate augmented graph data.
Inventors
- HU BINBIN
- Bo Deyu
- ZHANG ZHIQIANG
- SHI CHUAN
- WANG XIAO
- ZHOU JUN
Assignees
- 支付宝(杭州)数字服务技术有限公司
- 北京邮电大学
Dates
- Publication Date
- 20260512
- Application Date
- 20220321
Claims (16)
- 1. A method of augmenting graph data, the graph data comprising a plurality of nodes and edges between the nodes, the method comprising: determining a designated node in the graph data and neighbor nodes of the designated node; selecting some of the neighbor nodes as nodes to be augmented; selecting, as a target node, a node on the path corresponding to the node to be augmented in the graph data whose distance from the node to be augmented is smaller than a preset threshold; and deleting the edge between the node to be augmented and the designated node, and generating a new edge between the target node and the designated node, to generate augmented graph data.
- 2. The method of claim 1, wherein selecting some of the neighbor nodes as nodes to be augmented specifically comprises: determining, among the neighbor nodes, a probability corresponding to each node through a Bernoulli distribution; and taking the nodes whose probability exceeds a preset probability as the nodes to be augmented.
- 3. The method of claim 1, wherein selecting, as the target node, a node on the path corresponding to the node to be augmented in the graph data whose distance from the node to be augmented is smaller than a preset threshold specifically comprises: selecting one node from the neighbor nodes of the node to be augmented as the target node.
- 4. A graph neural network training method corresponding to graph data, the graph data comprising a plurality of nodes and edges between the nodes, the method comprising: determining a designated node in the graph data and neighbor nodes of the designated node, wherein the neighbor nodes of the designated node in the graph data are taken as first neighbor nodes; performing augmentation according to the first neighbor nodes to add new neighbor nodes to the designated node and generate augmented graph data, wherein the neighbor nodes of the designated node in the augmented graph data are taken as second neighbor nodes; judging whether the designated node has a label; if no label exists, generating a pseudo label for the designated node according to the prediction results of the second neighbor nodes; and supervising the prediction results of the second neighbor nodes according to the pseudo label, to train the graph neural network corresponding to the graph data.
- 5. The method of claim 4, wherein performing augmentation according to the first neighbor nodes to add new neighbor nodes to the designated node and generate augmented graph data specifically comprises: selecting some of the first neighbor nodes as nodes to be augmented; selecting, as a target node, a node on the path corresponding to the node to be augmented in the graph data whose distance from the node to be augmented is smaller than a preset threshold; and deleting the edge between the node to be augmented and the designated node, and generating a new edge between the target node and the designated node, to generate augmented graph data.
- 6. The method of claim 4, wherein generating a pseudo label for the designated node according to the prediction results of the second neighbor nodes specifically comprises: determining the average of the prediction results of the second neighbor nodes, and taking the average as the pseudo label of the designated node.
- 7. The method of claim 4, wherein generating a pseudo label for the designated node according to the prediction results of the second neighbor nodes specifically comprises: determining the average of the prediction results of the second neighbor nodes, and taking the average as an initial pseudo label of the designated node; and performing entropy reduction on the initial pseudo label, and taking the resulting low-entropy pseudo label as the pseudo label of the designated node.
- 8. The method of claim 7, wherein performing entropy reduction on the initial pseudo label specifically comprises: determining a low-entropy control factor corresponding to the initial pseudo label; and reprocessing the initial pseudo label according to the low-entropy control factor to amplify the differences between the outputs of the dimensions of the initial pseudo label, so as to obtain the low-entropy pseudo label.
- 9. The method of claim 8, wherein the low-entropy control factor has a value in the range (0, 1), and reprocessing the initial pseudo label according to the low-entropy control factor to amplify the differences between the outputs of the dimensions of the initial pseudo label, so as to obtain the low-entropy pseudo label, specifically comprises: determining the reciprocal of the low-entropy control factor; using the reciprocal as the exponent of the output of each dimension of the initial pseudo label to obtain a power term for each dimension; for each dimension, determining the proportion of the power term of that dimension in the sum of the power terms of all dimensions, and taking the proportion as the reprocessed output of that dimension; and obtaining the low-entropy pseudo label from the reprocessed outputs of the dimensions.
- 10. The method of claim 4, wherein supervising the prediction results of the second neighbor nodes according to the pseudo label specifically comprises: traversing each designated node for which no label exists; for the prediction result of each second neighbor node of the traversed designated node, generating a divergence term between the prediction result and the corresponding pseudo label, and summing these divergence terms to obtain the divergence term corresponding to the designated node; determining a first loss term by summing and averaging the divergence terms corresponding to the designated nodes; and supervising the prediction results of the second neighbor nodes according to the first loss term.
- 11. The method of claim 4, wherein after judging whether the designated node has a label, the method further comprises: if a label exists, determining, after the augmented graph data is generated, a second loss term through cross entropy according to the prediction result of each designated node and its label; and supervising the prediction results of the nodes in the augmented graph data according to the second loss term.
- 12. The method of claim 4, further comprising: determining that the augmented graph data has been used for training the graph neural network in the current iteration; and re-augmenting the neighbor nodes of the designated node, and using the re-augmented graph data for dynamic training of the graph neural network in the next iteration.
- 13. An apparatus for augmenting graph data, the graph data comprising a plurality of nodes and edges between the nodes, the apparatus comprising: a node determining module, which determines a designated node in the graph data and neighbor nodes of the designated node; a node selection module, which selects some of the neighbor nodes as nodes to be augmented; a distance determining module, which selects, as a target node, a node on the path corresponding to the node to be augmented in the graph data whose distance from the node to be augmented is smaller than a preset threshold; and an edge reconstruction module, which deletes the edge between the node to be augmented and the designated node, and generates a new edge between the target node and the designated node, to generate augmented graph data.
- 14. A graph neural network training apparatus corresponding to graph data, the graph data comprising a plurality of nodes and edges between the nodes, the apparatus comprising: a node determining module, which determines a designated node in the graph data and neighbor nodes of the designated node, wherein the neighbor nodes of the designated node in the graph data are taken as first neighbor nodes; a neighbor augmentation module, which performs augmentation according to the first neighbor nodes to add new neighbor nodes to the designated node and generate augmented graph data, wherein the neighbor nodes of the designated node in the augmented graph data are taken as second neighbor nodes; a judging module, which judges whether the designated node has a label; a pseudo label generation module, which generates a pseudo label for the designated node according to the prediction results of the second neighbor nodes if the judging module determines that no label exists; and a supervision and training module, which supervises the prediction results of the second neighbor nodes according to the pseudo label, to train the graph neural network corresponding to the graph data.
- 15. A device for augmenting graph data, the graph data comprising a plurality of nodes and edges between the nodes, the device comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform: determining a designated node in the graph data and neighbor nodes of the designated node; selecting some of the neighbor nodes as nodes to be augmented; selecting, as a target node, a node on the path corresponding to the node to be augmented in the graph data whose distance from the node to be augmented is smaller than a preset threshold; and deleting the edge between the node to be augmented and the designated node, and generating a new edge between the target node and the designated node, to generate augmented graph data.
- 16. A graph neural network training device corresponding to graph data, the graph data comprising a plurality of nodes and edges between the nodes, the device comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform: determining a designated node in the graph data and neighbor nodes of the designated node, wherein the neighbor nodes of the designated node in the graph data are taken as first neighbor nodes; performing augmentation according to the first neighbor nodes to add new neighbor nodes to the designated node and generate augmented graph data, wherein the neighbor nodes of the designated node in the augmented graph data are taken as second neighbor nodes; judging whether the designated node has a label; if no label exists, generating a pseudo label for the designated node according to the prediction results of the second neighbor nodes; and supervising the prediction results of the second neighbor nodes according to the pseudo label, to train the graph neural network corresponding to the graph data.
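The pseudo-label steps recited in claims 6 to 10 can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the function names, the parameter name `tau` for the low-entropy control factor, and the choice of KL divergence as the divergence term are assumptions made for illustration, since the claims do not name a specific divergence.

```python
import math


def average_predictions(preds):
    """Element-wise mean of the second neighbors' class-probability vectors (claim 6)."""
    n = len(preds)
    return [sum(p[k] for p in preds) / n for k in range(len(preds[0]))]


def sharpen(p, tau):
    """Entropy reduction (claims 8-9): raise each output to the power 1/tau,
    then divide by the sum of the powered outputs."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    powered = [x ** (1.0 / tau) for x in p]
    total = sum(powered)
    return [x / total for x in powered]


def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred). Using KL here is an assumption; the claims only say 'divergence'."""
    return sum(t * math.log((t + eps) / (q + eps))
               for t, q in zip(target, pred) if t > 0)


def consistency_loss(neighbor_preds_per_node, tau=0.5):
    """First loss term (claim 10): for each unlabeled designated node, sum the
    divergences between its pseudo label and each second neighbor's prediction,
    then average the per-node sums over all such nodes."""
    per_node = []
    for preds in neighbor_preds_per_node:
        pseudo = sharpen(average_predictions(preds), tau)
        per_node.append(sum(kl_divergence(pseudo, q) for q in preds))
    return sum(per_node) / len(per_node)
```

With `tau = 0.5`, an averaged prediction of `[0.6, 0.4]` is squared and renormalized, so the larger output grows and the smaller one shrinks, which is the amplification of differences described in claim 8.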
Description
Graph data augmentation and graph neural network training method, device and equipment
This application is a divisional application of the invention patent application No. 202210277845.X, entitled "Graph data augmentation and graph neural network training method, device and equipment", filed on March 21, 2022.
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular to a method, an apparatus, and a device for graph data augmentation and graph neural network training.
Background
With the development of computer and internet technologies, artificial intelligence has been applied in many fields, greatly improving working efficiency. The graph neural network (Graph Neural Network, GNN) is a connectionist model that captures the dependency relationships in a graph through message passing between the nodes of the graph, and it performs well in processing unstructured data. Traditional graph neural networks perform well in supervised learning but are less satisfactory in semi-supervised learning, where they suffer from problems such as over-fitting. A more accurate graph neural network scheme with better generalization capability is therefore needed for semi-supervised learning.
Disclosure of Invention
One or more embodiments of the present disclosure provide a method, an apparatus, a device, and a storage medium for graph data augmentation and graph neural network training, to address the technical problem that a more accurate graph neural network scheme with better generalization capability is needed for semi-supervised learning.
To solve the above technical problem, one or more embodiments of the present specification are implemented as follows:
One or more embodiments of the present specification provide a method of augmenting graph data, the graph data comprising a plurality of nodes and edges between the nodes, the method comprising: determining a designated node in the graph data and neighbor nodes of the designated node; selecting some of the neighbor nodes as nodes to be augmented; selecting, as a target node, a node on the path corresponding to the node to be augmented in the graph data whose distance from the node to be augmented is smaller than a preset threshold; and deleting the edge between the node to be augmented and the designated node, and generating a new edge between the target node and the designated node, to generate augmented graph data.
One or more embodiments of the present specification provide a graph neural network training method corresponding to graph data, the graph data comprising a plurality of nodes and edges between the nodes, the method comprising: determining a designated node in the graph data and neighbor nodes of the designated node, wherein the neighbor nodes of the designated node in the graph data are taken as first neighbor nodes; performing augmentation according to the first neighbor nodes to add new neighbor nodes to the designated node and generate augmented graph data, wherein the neighbor nodes of the designated node in the augmented graph data are taken as second neighbor nodes; judging whether the designated node has a label; if no label exists, generating a pseudo label for the designated node according to the prediction results of the second neighbor nodes; and supervising the prediction results of the second neighbor nodes according to the pseudo label, to train the graph neural network corresponding to the graph data.
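The edge-replacement augmentation summarized above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the applicants' implementation: the graph is assumed undirected and stored as a dict of neighbor sets, the function name `augment_neighbors` and the parameter `p` (the Bernoulli selection probability) are hypothetical, the distance threshold is taken so that only one-hop neighbors of the node to be augmented qualify as targets (as in claim 3), and excluding nodes already adjacent to the designated node is an added simplification.

```python
import random


def augment_neighbors(adj, v, p=0.5, rng=None):
    """Return a new adjacency dict in which some edges (v, u) are replaced
    by edges (v, w), where w is a neighbor of u. The input is not modified."""
    rng = rng or random.Random()
    new_adj = {node: set(nbrs) for node, nbrs in adj.items()}
    for u in sorted(adj[v]):
        # Bernoulli draw: neighbor u becomes a node to be augmented with probability p.
        if rng.random() >= p:
            continue
        # Candidate targets: nodes one hop from u, excluding v itself and
        # nodes already adjacent to v (assumption, avoids duplicate edges).
        candidates = sorted(adj[u] - {v} - new_adj[v])
        if not candidates:
            continue
        w = rng.choice(candidates)
        # Delete edge (v, u) and create edge (v, w), keeping both directions in sync.
        new_adj[v].discard(u)
        new_adj[u].discard(v)
        new_adj[v].add(w)
        new_adj[w].add(v)
    return new_adj
```

Because the function returns a fresh copy, it can be called again on the original graph in each training iteration to obtain a different augmented graph, in the spirit of the re-augmentation described in claim 12.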
One or more embodiments of the present specification provide an apparatus for augmenting graph data, the graph data comprising a plurality of nodes and edges between the nodes, the apparatus comprising: the node determining module is used for determining a designated node in the graph data and neighbor nodes of the designated node; the node selection module selects some of the neighbor nodes as nodes to be augmented; the distance determining module is used for selecting, as a target node, a node on the path corresponding to the node to be augmented in the graph data whose distance from the node to be augmented is smaller than a preset threshold; and the edge reconstruction module deletes the edge between the node to be augmented and the designated node, and generates a new edge between the target node and the designated node, to generate augmented graph data. One or more embodiments of the present specification provide a graph neural network training apparatus corresponding to graph data, the graph data comprising a plurality of nodes and edges between the nodes, the apparatus comprising: the node determining module is used for determining a designated node in the graph data and neighbor nodes of the designated node, wherein the neighbor nodes of the designated node in the graph data are taken as first neighbor nodes; The neighbor augmentation module is used for augmenting according to the first neighbor node, adding a ne