CN-119939320-B - Open set cross-network node classification method and device


Abstract

The application discloses an open-set cross-network node classification method and device in the field of machine learning, built on a separate-then-adapt framework. First, adversarial learning constructs a coarse boundary that separates the unknown class from the known classes; pseudo labels are then assigned and the model is trained iteratively by self-training, so that a progressively more accurate boundary between the known classes and the unknown class is obtained. Second, in the domain adaptation stage, negative domain adaptation coefficients are assigned to nodes of the unknown class and positive domain adaptation coefficients are assigned to nodes of the known classes, so that the known-class nodes of the target network are aligned with the source network while the unknown-class nodes of the target network are pushed away from the source network. This achieves adversarial domain alignment that excludes the unknown class and thereby classifies open-set cross-network nodes with higher accuracy.
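
The coefficient assignment in the domain adaptation stage can be illustrated with a minimal Python sketch. It assumes a binary domain discriminator whose per-node cross-entropy loss is multiplied by the node's domain adaptation coefficient. The function names, the coefficient magnitudes (+1 and -1), and the convention that index `num_known` denotes the unknown class are illustrative assumptions, not details taken from the application.

```python
import numpy as np

def domain_adaptation_coefficients(pseudo_labels, num_known, pos=1.0, neg=-1.0):
    """Return one domain adaptation coefficient per target node.

    Assumption: labels 0..num_known-1 are known classes and label
    num_known marks the unknown class. The magnitudes pos/neg are
    hypothetical; the source only states their signs.
    """
    pseudo_labels = np.asarray(pseudo_labels)
    return np.where(pseudo_labels == num_known, neg, pos)

def weighted_domain_loss(prob_source, domain_labels, coefficients):
    """Coefficient-weighted binary cross-entropy of a domain discriminator.

    prob_source:   predicted probability that each node comes from the source network
    domain_labels: 1 for source nodes, 0 for target nodes
    coefficients:  per-node domain adaptation coefficients
    """
    p = np.clip(np.asarray(prob_source, dtype=float), 1e-7, 1 - 1e-7)
    y = np.asarray(domain_labels, dtype=float)
    bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(np.mean(np.asarray(coefficients, dtype=float) * bce))

# Four target nodes; the third is pseudo-labeled as unknown (label 3 with 3 known classes).
coef = domain_adaptation_coefficients([0, 2, 3, 1], num_known=3)
loss = weighted_domain_loss([0.5, 0.5, 0.5, 0.5], [0, 0, 0, 0], coef)
```

Under this sketch, an encoder trained adversarially against the discriminator would receive gradients of opposite sign for known-class and unknown-class target nodes, which is one plausible way to align the former with the source network while pushing the latter away.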

Inventors

SHEN XIAO
CHEN ZHIHAO
ZHOU XI

Assignees

Hainan University (海南大学)

Dates

Publication Date: 20260508
Application Date: 20250127

Claims (10)

1. A method for open-set cross-network node classification, the method comprising: acquiring node embeddings from the attributes and the adjacency matrix of the nodes in a target network by using a graph neural network encoder; obtaining classification prediction probabilities of the nodes from the node embeddings by using a neighborhood aggregation node classifier; in a separation stage, performing adversarial training on the graph neural network encoder and the neighborhood aggregation node classifier, coarsely separating the known classes from the unknown class, and marking the nodes with classification labels of a known class or the unknown class; in a domain adaptation stage, clustering the nodes in the target network based on the nodes of known categories in a source network and the nodes of unknown category in the target network, and marking the nodes with clustering labels of a known category or the unknown category; assigning pseudo labels to the nodes based on the classification labels and the clustering labels, and iteratively training the graph neural network encoder and the neighborhood aggregation node classifier by self-training according to the pseudo labels; and classifying the nodes in the target network based on the trained graph neural network encoder and neighborhood aggregation node classifier; wherein the source network and the target network are different citation datasets, each node represents a paper, each edge represents a citation relation between papers, each node has an attribute vector and a classification label, the attribute vector consists of keywords extracted from the title of the paper, and the classification label is the research field of the paper corresponding to the node.
2. The method of claim 1, wherein the neighborhood aggregation node classifier is constructed using a single-layer multi-head attention network, and wherein the output dimension of the neighborhood aggregation node classifier is one greater than the number of known categories in the source network.
3. The method of claim 1, wherein performing adversarial training on the graph neural network encoder and the neighborhood aggregation node classifier comprises: training the neighborhood aggregation node classifier so that the classification prediction probability that each node in the target network belongs to the unknown class approaches a fixed threshold between 0 and 1, and training the graph neural network encoder to maximize the error rate of the neighborhood aggregation node classifier.
4. The method of claim 1, wherein clustering the nodes in the target network based on the nodes of known categories in the source network and the nodes of unknown category in the target network comprises: clustering the nodes in the target network according to the node embeddings to obtain a plurality of clusters, the clusters comprising clusters in one-to-one correspondence with the known categories in the source network and one cluster corresponding to the unknown category; wherein an initial centroid of the cluster corresponding to the unknown category is determined from a set of unknown-category nodes, the nodes in the set being selected from the nodes in the target network according to their classification prediction probabilities for the unknown category.
5. The method of claim 1, wherein assigning pseudo labels to the nodes based on the classification labels and the clustering labels comprises: assigning the pseudo label to a node when the classification label and the clustering label of the node are consistent.
6. The method according to any one of claims 1 to 4, further comprising: based on the pseudo labels, assigning a negative domain adaptation coefficient to nodes of the unknown class and a positive domain adaptation coefficient to nodes of the known classes.
7. An open-set cross-network node classification apparatus, the apparatus comprising: a graph neural network encoder for acquiring node embeddings from the attributes and the adjacency matrix of the nodes in a target network; a neighborhood aggregation node classifier for obtaining classification prediction probabilities of the nodes from the node embeddings; a domain matching module, a clustering module, and a pseudo-label module, wherein, in a separation stage, the graph neural network encoder and the neighborhood aggregation node classifier undergo adversarial training, the known classes and the unknown class are coarsely separated, and the nodes are marked with classification labels of a known class or the unknown class; and an output module for classifying the nodes in the target network based on the trained graph neural network encoder and neighborhood aggregation node classifier; wherein the source network and the target network are different citation datasets, each node represents a paper, each edge represents a citation relation between papers, each node has an attribute vector and a classification label, the attribute vector consists of keywords extracted from the title of the paper, and the classification label is the research field of the paper corresponding to the node.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
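
Claims 3 to 5 can be illustrated with a minimal Python sketch. It assumes that the "approach a fixed threshold" objective of claim 3 is a binary cross-entropy between the unknown-class probability and the threshold, that the unknown-class node set of claim 4 is the `top_m` target nodes with the highest unknown-class probability, and that `-1` marks a node without a pseudo label. These choices and all function names are hypothetical; the claims do not specify them.

```python
import numpy as np

def unknown_threshold_loss(p_unknown, threshold=0.5):
    """Binary cross-entropy between the unknown-class probability and a
    fixed threshold in (0, 1); it is smallest when p_unknown == threshold.
    Assumed form of the classifier objective in claim 3; the encoder
    would be trained to maximize it.
    """
    p = np.clip(np.asarray(p_unknown, dtype=float), 1e-7, 1 - 1e-7)
    return float(np.mean(-threshold * np.log(p) - (1 - threshold) * np.log(1 - p)))

def init_unknown_centroid(embeddings, p_unknown, top_m):
    """Initial centroid of the unknown cluster (claim 4), taken here as the
    mean embedding of the top_m nodes by unknown-class probability.
    The top-m selection rule is an assumption.
    """
    idx = np.argsort(-np.asarray(p_unknown, dtype=float))[:top_m]
    return np.asarray(embeddings, dtype=float)[idx].mean(axis=0)

def assign_pseudo_labels(class_labels, cluster_labels, no_label=-1):
    """Keep a label only where classifier and clustering agree (claim 5);
    all other nodes receive no_label (a hypothetical marker).
    """
    class_labels = np.asarray(class_labels)
    cluster_labels = np.asarray(cluster_labels)
    return np.where(class_labels == cluster_labels, class_labels, no_label)

# Toy usage with three target nodes in a 2-D embedding space.
centroid = init_unknown_centroid([[0, 0], [2, 2], [4, 4]], [0.1, 0.9, 0.8], top_m=2)
pseudo = assign_pseudo_labels([0, 1, 3, 2], [0, 2, 3, 2])
```

Nodes left without a pseudo label would simply be skipped in the next self-training round under this sketch.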

Description

Open set cross-network node classification method and device

Technical Field

The application relates to the technical field of machine learning, and in particular to an open-set cross-network node classification method and device.

Background

In graph data analysis, labels are often expensive, limited, or even unavailable. Cross-network node classification has received a great deal of attention in graph machine learning in recent years; its objective is to transfer knowledge learned from a source network with rich node labels in order to predict node labels in a target network that lacks labels. Existing cross-network node classification methods are almost all designed under the closed-set assumption, i.e., they require the source and target networks to share exactly the same class label space. In real-world practice, however, the target network may contain new categories that are not present in the source network. For example, in cross-platform user interest prediction for online social networks, users in a newly formed target social network may have new interest categories that do not appear in a mature source social network.

As shown in Fig. 1, in the open-set cross-network node classification problem, the target network contains not only all known categories of the source network but also "unknown" categories that do not occur in the source network. The goals are to (1) classify the target-network nodes that belong to known classes of the source network into the corresponding known classes, and (2) detect the target-network nodes that belong to an unknown class.

Solving the open-set cross-network node classification problem involves two major challenges:

1. Since the target network is completely unlabeled, we cannot know which of its nodes belong to known classes that appear in the source network and which belong to the new "unknown" class. How to construct a boundary that separates the known-class nodes from the "unknown"-class nodes in the target network is therefore a major challenge.
2. Distribution differences between networks may prevent a model trained on the source network from being applied directly to the target network. Because the target network contains an unknown class that does not appear in the source network, directly aligning the distributions of the source and target networks, as earlier closed-set cross-network node classification methods do, would align the unknown-class distribution of the target network with the known-class distributions of the source network. This causes negative transfer and makes the unknown-class nodes of the target network harder to identify. How to align the distribution of the target network with that of the source network while excluding the unknown class of the target network is therefore another challenge.

Disclosure of Invention

Based on the above, it is necessary to provide an open-set cross-network node classification method and device that address the open-set cross-network node classification problem.

In a first aspect, the present application provides an open-set cross-network node classification method.
The method comprises the following steps: acquiring node embeddings from the attributes and the adjacency matrix of the nodes in the target network by using a graph neural network encoder; obtaining classification prediction probabilities of the nodes from the node embeddings by using a neighborhood aggregation node classifier; in the separation stage, performing adversarial training on the graph neural network encoder and the neighborhood aggregation node classifier, coarsely separating the known classes from the unknown class, and marking the nodes with classification labels of a known class or the unknown class; in the domain adaptation stage, clustering the nodes in the target network based on the nodes of known categories in the source network and the nodes of unknown category in the target network, and marking the nodes with clustering labels of a known category or the unknown category; and classifying the nodes in the target network based on the trained graph neural network encoder and neighborhood aggregation node classifier.

In one embodiment, the neighborhood aggregation node classifier is constructed using a single-layer multi-head attention network, and the output dimension of the neighborhood aggregation node classifier is one greater than the number of known categories in the source network.

In one embodiment, performing adversarial training on the