US-20260127409-A1 - PARTITIONED TRAINING VIA ONE-HOP HISTORICAL GRADIENTS
Abstract
Systems and techniques that facilitate partitioned training via one-hop historical gradients are provided. In various embodiments, a system can access a graph. In various aspects, the system can train a graph neural network on partitions of the graph, based on historical gradients of partition-wise training losses with respect to one-hop node embeddings.
Inventors
- Thanh Lam Hoang
- Marcos Martínez Galindo
- Marco Luca Sbodio
- Raúl Fernández Díaz
- Mykhaylo Zayats
- Rodrigo Hernan Ordonez-Hurtado
- Vanessa Lopez Garcia
Assignees
- INTERNATIONAL BUSINESS MACHINES CORPORATION
Dates
- Publication Date: 2026-05-07
- Application Date: 2024-11-06
Claims (20)
- 1. A system, comprising: a processor that executes computer-executable components stored in a non-transitory computer-readable memory, the computer-executable components comprising: an access component that accesses a graph; and a training component that trains a graph neural network on partitions of the graph, based on historical gradients of partition-wise training losses with respect to one-hop node embeddings.
- 2. The system of claim 1, wherein a first partition of the graph and a second partition of the graph have a one-hop node, wherein the one-hop node corresponds to an historical gradient value and to an historical gradient count, wherein the historical gradient value is initially zero, and wherein the historical gradient count is initially zero.
- 3. The system of claim 2, wherein, while training the graph neural network on the first partition, the training component: adds to the historical gradient value of the one-hop node a gradient of a first loss of the first partition with respect to a first embedding of the one-hop node; and increments the historical gradient count.
- 4. The system of claim 3, wherein, while training the graph neural network on the second partition, the training component: adds to the historical gradient value a gradient of a second loss of the second partition with respect to a second embedding of the one-hop node; and increments the historical gradient count.
- 5. The system of claim 4, wherein, while training the graph neural network on a third partition to which the one-hop node belongs, the training component: updates the graph neural network based on a third loss of the third partition and based on a product between the historical gradient value and a reciprocal of the historical gradient count.
- 6. The system of claim 5, wherein the training component: resets the historical gradient value to zero and the historical gradient count to zero, in response to updating the graph neural network based on the third loss.
- 7. The system of claim 1, wherein the partitions of the graph are disjoint.
- 8. The system of claim 1, wherein the partitions of the graph are overlapping.
- 9. The system of claim 1, wherein the computer-executable components comprise: an execution component that executes, post-training, the graph neural network on another graph, thereby yielding an inferencing task result for the another graph.
- 10. A computer-implemented method, comprising: accessing, by a device operatively coupled to a processor, a graph; and training, by the device, a graph neural network on partitions of the graph, based on historical gradients of partition-wise training losses with respect to one-hop node embeddings.
- 11. The computer-implemented method of claim 10, wherein a first partition of the graph and a second partition of the graph have a one-hop node, wherein the one-hop node corresponds to an historical gradient value and to an historical gradient count, wherein the historical gradient value is initially zero, and wherein the historical gradient count is initially zero.
- 12. The computer-implemented method of claim 11, wherein, while training the graph neural network on the first partition, the device: adds to the historical gradient value of the one-hop node a gradient of a first loss of the first partition with respect to a first embedding of the one-hop node; and increments the historical gradient count.
- 13. The computer-implemented method of claim 12, wherein, while training the graph neural network on the second partition, the device: adds to the historical gradient value a gradient of a second loss of the second partition with respect to a second embedding of the one-hop node; and increments the historical gradient count.
- 14. The computer-implemented method of claim 13, wherein, while training the graph neural network on a third partition to which the one-hop node belongs, the device: updates the graph neural network based on a third loss of the third partition and based on a product between the historical gradient value and a reciprocal of the historical gradient count.
- 15. The computer-implemented method of claim 14, wherein the device: resets the historical gradient value to zero and the historical gradient count to zero, in response to updating the graph neural network based on the third loss.
- 16. The computer-implemented method of claim 10, wherein the partitions of the graph are disjoint.
- 17. The computer-implemented method of claim 10, wherein the partitions of the graph are overlapping.
- 18. The computer-implemented method of claim 10, further comprising: executing, by the device and post-training, the graph neural network on another graph, thereby yielding an inferencing task result for the another graph.
- 19. A computer program product for facilitating partitioned training via one-hop historical gradients, the computer program product comprising a non-transitory computer-readable memory having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: access a graph; and train a graph neural network on partitions of the graph, based on historical gradients of partition-wise training losses with respect to one-hop node embeddings or based on gradients of in-partition node embeddings with respect to learnable parameters of the graph neural network.
- 20. The computer program product of claim 19, wherein the program instructions are further executable to cause the processor to: execute, post-training, the graph neural network on another graph, thereby yielding an inferencing task result for the another graph.
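By way of non-limiting illustration, the per-node bookkeeping recited in claims 2 through 6 can be sketched in Python as follows. The class and method names (OneHopHistoricalGradientRegistry, accumulate, averaged, reset) are illustrative assumptions rather than identifiers from the disclosure:

```python
from collections import defaultdict
import torch

class OneHopHistoricalGradientRegistry:
    """Illustrative sketch: per one-hop node, an accumulated historical
    gradient value and a historical gradient count, both initially zero
    (claim 2)."""

    def __init__(self, embedding_dim: int):
        self.dim = embedding_dim
        # Missing keys default to a zero vector and a zero count.
        self.value = defaultdict(lambda: torch.zeros(embedding_dim))
        self.count = defaultdict(int)

    def accumulate(self, node_id: int, grad: torch.Tensor) -> None:
        # Claims 3-4: add the gradient of the current partition's loss
        # with respect to this node's embedding, and increment the count.
        self.value[node_id] = self.value[node_id] + grad.detach()
        self.count[node_id] += 1

    def averaged(self, node_id: int) -> torch.Tensor:
        # Claim 5: the historical gradient value multiplied by the
        # reciprocal of the historical gradient count.
        c = self.count[node_id]
        return self.value[node_id] / c if c > 0 else torch.zeros(self.dim)

    def reset(self, node_id: int) -> None:
        # Claim 6: zero both quantities once an update has consumed them.
        self.value[node_id] = torch.zeros(self.dim)
        self.count[node_id] = 0
```

Under this sketch, averaged returns exactly the product between the historical gradient value and the reciprocal of the historical gradient count recited in claim 5.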
Description
BACKGROUND

The subject disclosure relates to training of graph neural networks.

SUMMARY

The following presents a summary to provide a basic understanding of one or more embodiments. This summary is not intended to identify key or critical elements, or delineate any scope of the particular embodiments or any scope of the claims. Its sole purpose is to present concepts in a simplified form as a prelude to the more detailed description that is presented later.

In one or more embodiments described herein, systems, computer-implemented methods, computer program products, or apparatuses that can facilitate partitioned training via one-hop historical gradients are described.

According to one or more embodiments, a system is provided. In various aspects, the system can comprise a processor that can execute computer-executable components stored in a non-transitory computer-readable memory. In various instances, the computer-executable components can comprise an access component that can access a graph. In various cases, the computer-executable components can comprise a training component that can train a graph neural network on partitions of the graph, based on historical gradients of partition-wise training losses with respect to one-hop node embeddings.

In various aspects, a first partition of the graph and a second partition of the graph can have a one-hop node, the one-hop node can correspond to an historical gradient value and to an historical gradient count, the historical gradient value can be initially zero, and the historical gradient count can be initially zero. In various instances, while training the graph neural network on the first partition, the training component can add to the historical gradient value of the one-hop node a gradient of a first loss of the first partition with respect to a first embedding of the one-hop node, and can increment the historical gradient count. In various cases, while training the graph neural network on the second partition, the training component can add to the historical gradient value a gradient of a second loss of the second partition with respect to a second embedding of the one-hop node, and can increment the historical gradient count. In various aspects, while training the graph neural network on a third partition to which the one-hop node belongs, the training component can update the graph neural network based on a third loss of the third partition and based on a product between the historical gradient value and a reciprocal of the historical gradient count. In various instances, such training can cause the graph neural network to learn more quickly (e.g., using fewer training epochs) than it otherwise could.

In various aspects, the above-described system can be reformulated, reformatted, or otherwise implemented as a computer-implemented method or as a computer program product.
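One non-limiting way such a partition-wise training step might be realized is sketched below in PyTorch. The sketch assumes the registry class sketched after the claims, together with a hypothetical partition object exposing features, edge_index, train_ids, labels, one-hop node indexes (one_hop_local, one_hop_ids) and owned node indexes (owned_local, owned_ids); none of these names are identifiers from the disclosure:

```python
import torch

def train_step_on_partition(model, optimizer, loss_fn, part, registry):
    """Sketch of one partition-wise step: compute the partition loss,
    record gradients at one-hop nodes, and fold averaged historical
    gradients back in for the nodes this partition owns."""
    optimizer.zero_grad()

    # Forward pass over the partition's subgraph: one embedding per node.
    embeddings = model(part.features, part.edge_index)
    loss = loss_fn(embeddings[part.train_ids], part.labels)

    # Record d(loss)/d(embedding) for each one-hop node of this partition
    # (claims 3-4), retaining the graph for the backward pass below.
    emb_grad = torch.autograd.grad(loss, embeddings, retain_graph=True)[0]
    for local_idx, node_id in zip(part.one_hop_local, part.one_hop_ids):
        registry.accumulate(node_id, emb_grad[local_idx])

    # For nodes belonging to this partition, add a surrogate inner-product
    # term whose gradient with respect to the model parameters equals the
    # averaged historical gradient chained through dh_v/dtheta (claim 5),
    # then reset the node's history (claim 6).
    surrogate = embeddings.new_zeros(())
    for local_idx, node_id in zip(part.owned_local, part.owned_ids):
        if registry.count[node_id] > 0:
            g_avg = registry.averaged(node_id)  # constant w.r.t. theta
            surrogate = surrogate + (embeddings[local_idx] * g_avg).sum()
            registry.reset(node_id)

    (loss + surrogate).backward()
    optimizer.step()
```

In this sketch, the surrogate term makes the single backward pass deposit, for each owned node, the averaged historical gradient chained through the embedding's dependence on the model parameters, which corresponds to updating based on both the partition's own loss and the product between the historical gradient value and the reciprocal of the historical gradient count.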
DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a block diagram of an example, non-limiting system that facilitates partitioned training via one-hop historical gradients in accordance with one or more embodiments described herein.

FIG. 2 illustrates a block diagram of an example, non-limiting system including a plurality of partitions, a one-hop historical gradient value registry, and a one-hop historical gradient count registry that facilitates partitioned training via one-hop historical gradients in accordance with one or more embodiments described herein.

FIG. 3 illustrates an example, non-limiting block diagram showing how a graph can be broken into a plurality of partitions in accordance with one or more embodiments described herein.

FIG. 4 illustrates an example, non-limiting block diagram of a one-hop historical gradient value registry and a one-hop historical gradient count registry in accordance with one or more embodiments described herein.

FIGS. 5-9 illustrate example, non-limiting block diagrams showing how a graph neural network can be trained on a partition in accordance with one or more embodiments described herein.

FIGS. 10-13 illustrate example, non-limiting block diagrams showing how a one-hop historical gradient value and a one-hop historical gradient count of a particular node can change during training in accordance with one or more embodiments described herein.

FIGS. 14-17 illustrate flow diagrams of example, non-limiting computer-implemented methods that facilitate partitioned training via one-hop historical gradients in accordance with one or more embodiments described herein.

FIGS. 18-20 illustrate example, non-limiting experimental results in accordance with one or more embodiments described herein.

FIG. 21 illustrates a flow diagram of an example, non-limiting computer-implemented method that facilitates partitioned training via one-hop historical gradients in accordance with one or more embodiments described herein.

FIG. 22 illustrates a block diagram of an example, non-limiting operating environment in which one or more embodiments described herein can be facilitated.