CN-122020475-A - Graph anomaly detection method based on topological anomaly quantification


Abstract

The invention discloses a graph anomaly detection method based on topological anomaly quantification and belongs to the technical field of computer data processing. It addresses the problem that existing generative pseudo-anomaly methods lack an anomaly metric and therefore produce low-quality pseudo-samples and are sensitive to pseudo-label noise. In the technical scheme, a comprehensive anomaly score is first calculated from the local topological structure of each node to quantify its degree of anomaly, and pseudo-abnormal nodes are generated accordingly to construct an enhanced training graph. A graph neural network then makes a preliminary prediction and identifies high-risk nodes. A topological anomaly enhancement module performs label optimization on the high-risk nodes based on their neighborhood category distribution and generates virtual center nodes to construct enhanced connections. Finally, a graph anomaly detection model trained on the optimized graph is output. The method reduces dependence on labeled data, improves detection accuracy when anomaly labels are scarce, and is suitable for financial risk control, network security and equipment failure prediction.

Inventors

Cui Caixia
Fan Yangrui

Assignees

Taiyuan Normal University (太原师范学院)

Dates

Publication Date: 20260512
Application Date: 20260202

Claims (10)

1. A graph anomaly detection method based on topological anomaly quantification, characterized by comprising the following steps: S1, receiving input graph data comprising nodes and edges, wherein the input graph data comprises labeled normal nodes; S2, for the nodes in the input graph data, calculating a comprehensive anomaly score based on the local topological structure of each node; S3, generating pseudo-abnormal nodes according to the comprehensive anomaly score, and constructing an enhanced training graph containing the labeled normal nodes and the pseudo-abnormal nodes; S4, performing preliminary training on the enhanced training graph by using a graph neural network to obtain a prediction probability distribution of the nodes; S5, identifying high-risk nodes with high prediction uncertainty based on the prediction probability distribution; S6, performing label optimization on the high-risk nodes based on their neighborhood category distribution, and generating a virtual center node for each category to construct virtual connections; and S7, training a final graph anomaly detection model on the graph with the optimized labels and the added virtual connections, and outputting anomaly scores of the nodes.
2. The graph anomaly detection method based on topological anomaly quantification of claim 1, wherein in S1, receiving the input graph data comprises constructing a K-hop neighborhood for each node to extract local subgraph structure information.
3. The graph anomaly detection method based on topological anomaly quantification of claim 1, wherein in S2, calculating the comprehensive anomaly score based on the local topological structure comprises: calculating a boundary score of the node to measure how sparsely the node is connected to the normal nodes; calculating a proxy isolation score of the node to measure the structural isolation of the node; and performing weighted fusion of the boundary score and the proxy isolation score to obtain the comprehensive anomaly score.
4. The graph anomaly detection method based on topological anomaly quantification of claim 1, wherein in S3, generating the pseudo-abnormal nodes comprises sorting all nodes by the comprehensive anomaly score, selecting the nodes whose score is higher than a preset threshold, and marking the selected nodes as the pseudo-abnormal nodes.
5. The graph anomaly detection method based on topological anomaly quantification of claim 1, wherein in S5, identifying the high-risk nodes comprises: calculating the prediction entropy of each node from the prediction probability distribution as the uncertainty of the node; calculating a risk score of the node according to the mean uncertainty of the predicted category to which the node belongs; and identifying the nodes whose risk score exceeds a threshold as the high-risk nodes.
6. The graph anomaly detection method based on topological anomaly quantification of claim 1, wherein in S6, performing label optimization comprises: if the support of the neighbor nodes for the opposite category is higher than their support for the currently predicted category, flipping the label of the high-risk node to the opposite category.
7. The graph anomaly detection method based on topological anomaly quantification of claim 1, wherein in S6, generating a virtual center node to construct a virtual connection comprises: establishing a connecting edge between a node and the virtual center node according to a connection probability calculated based on whether the predicted category of the virtual center node is different from the category to which the virtual center node belongs.
8. The graph anomaly detection method based on topological anomaly quantification of claim 1, wherein in S7, training the final graph anomaly detection model comprises constructing a joint loss function that combines a classification loss and a regularization loss for training the graph anomaly detection model.
9. An electronic device comprising a processor and a memory storing computer program instructions, wherein the processor, when executing the computer program instructions, implements the graph anomaly detection method based on topological anomaly quantification of any one of claims 1-8.
10. A computer storage medium having stored thereon computer program instructions which, when executed by a processor, implement the graph anomaly detection method based on topological anomaly quantification of any one of claims 1-8.
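
The scoring, selection and label-optimization steps recited in claims 3 to 6 can be illustrated with a minimal Python sketch. The claims give no formulas, so every concrete choice below is an assumption made for illustration only: the boundary score is taken as the share of a node's neighbors that are not labeled normal, the proxy isolation score as 1 / (1 + degree), the fusion weight `alpha` and both thresholds are arbitrary, the risk score is the node's prediction entropy divided by the mean entropy of its predicted category, and labels are binary (0 = normal, 1 = abnormal). All function names are hypothetical.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def comprehensive_scores(nodes, adj, normal, alpha=0.5):
    """Weighted fusion of a boundary score and a proxy isolation score (assumed forms)."""
    scores = {}
    for n in nodes:
        nbrs = adj.get(n, set())
        # Assumed boundary score: share of neighbors NOT labeled normal
        # (1.0 when the node has no neighbors at all).
        boundary = (1.0 - sum(m in normal for m in nbrs) / len(nbrs)) if nbrs else 1.0
        # Assumed proxy isolation score: decreases as the degree grows.
        isolation = 1.0 / (1.0 + len(nbrs))
        scores[n] = alpha * boundary + (1.0 - alpha) * isolation
    return scores


def select_pseudo_anomalies(scores, threshold):
    """Sort nodes by score (descending) and keep those above the threshold."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [n for n in ranked if scores[n] > threshold]


def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def high_risk_nodes(probs, threshold=1.5):
    """Flag nodes whose entropy is large relative to the mean entropy of their predicted category."""
    pred = {n: max(range(len(p)), key=p.__getitem__) for n, p in probs.items()}
    ent = {n: entropy(p) for n, p in probs.items()}
    per_class = defaultdict(list)
    for n in probs:
        per_class[pred[n]].append(ent[n])
    mean = {c: sum(v) / len(v) for c, v in per_class.items()}
    return [n for n in probs
            if mean[pred[n]] > 0.0 and ent[n] / mean[pred[n]] > threshold]


def refine_labels(pred, high_risk, adj):
    """Flip a high-risk node's binary label when more neighbors support the opposite category."""
    refined = dict(pred)
    for n in high_risk:
        labeled = [m for m in adj.get(n, ()) if m in pred]
        opposite = sum(pred[m] != pred[n] for m in labeled)
        if opposite > len(labeled) - opposite:
            refined[n] = 1 - pred[n]  # binary labels assumed: 0 = normal, 1 = abnormal
    return refined


if __name__ == "__main__":
    adj = build_adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
    scores = comprehensive_scores(["a", "b", "c", "d", "e"], adj, normal={"a", "b", "c"})
    print(select_pseudo_anomalies(scores, threshold=0.5))
```

In this toy graph the node `e` has no edges, so both assumed scores reach their maximum for it and it is the only node selected as pseudo-abnormal at a threshold of 0.5. Normalizing each node's entropy by its category mean is one plausible reading of claim 5; other risk definitions would fit the claim wording equally well.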

Description

Graph anomaly detection method based on topological anomaly quantification

Technical Field

The invention belongs to the technical field of computer data processing, and particularly relates to a graph anomaly detection method based on topological anomaly quantification.

Background

With the rapid development of artificial intelligence and graph neural network technology, analysis of graph-structured data is widely applied in key fields such as financial risk control, network intrusion detection and equipment fault prediction. Graph anomaly detection, an important research direction, aims to identify nodes that deviate from the normal pattern in structure or attributes, and is therefore significant for guaranteeing system safety and reliability. Current methods in the field fall mainly into unsupervised and semi-supervised methods. Unsupervised methods typically rely on the inherent structure of the graph to define anomalies, for example by using a graph autoencoder or a contrastive-learning-based model and measuring the degree of anomaly by node reconstruction error or representation difference. Although such methods do not depend on label information, they rest entirely on structure and features, so they struggle to distinguish true semantic anomalies from rare but genuinely normal node patterns. Their detection results are therefore ambiguous and can produce many false positives or false negatives in practice. Semi-supervised methods use limited labeled data to classify nodes with a graph neural network in order to achieve more targeted detection. These methods alleviate, to some extent, the difficulty of distinguishing false anomalies from true anomalies.
However, in real-world scenarios such as financial risk control or equipment monitoring, abnormal nodes are extremely rare, accurate and reliable anomaly labels are costly to obtain, and anomaly patterns are often complex and varied, so the performance of semi-supervised methods drops markedly when labeled data is insufficient. Meanwhile, unavoidable label errors or pseudo-label noise introduced during labeling further interfere with model training and damage both training stability and final detection accuracy. To address the scarcity of anomalous samples, some generative graph anomaly detection methods have been proposed. These methods expand the training set by synthesizing pseudo-abnormal nodes in order to improve the discrimination ability of the model. Existing generation strategies mainly comprise feature interpolation and noise perturbation. Feature interpolation generates samples by interpolating between the features of normal nodes. Although this increases feature diversity, the generated samples are often too smooth and lack the boundary and outlier characteristics of true abnormal samples, so it is difficult to effectively expand the decision boundary of the model. Noise perturbation simulates anomalies by adding random noise to the features or structure of normal nodes, on the assumption that abnormal nodes deviate significantly from normal nodes in feature or structure space. However, because the perturbation process lacks both a quantitative measure of the "degree of abnormality" and directional guidance, generation is highly random and the generated pseudo-abnormal nodes are often not representative enough. They cannot effectively simulate real, complex anomaly patterns, and the model cannot obtain a high-quality learning signal from them.
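
The two baseline generation strategies described above can be sketched in a few lines of Python. This is an illustration of the general techniques only, not code from any cited method: the function names, the mixing coefficient `lam`, and the Gaussian noise model with standard deviation `sigma` are all assumptions.

```python
import random


def interpolate_features(x_a, x_b, lam):
    """Feature interpolation: convex combination of two normal nodes' feature vectors."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def perturb_features(x, sigma, rng):
    """Noise perturbation: add independent Gaussian noise to each feature (assumed noise model)."""
    return [v + rng.gauss(0.0, sigma) for v in x]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    x_a, x_b = [0.0, 0.0], [2.0, 4.0]
    # With 0 <= lam <= 1 the result always lies between the two inputs, so it
    # never leaves the region already occupied by normal nodes.
    print(interpolate_features(x_a, x_b, 0.5))
    # The noise has no preferred direction and no notion of how anomalous
    # the perturbed node ends up being.
    print(perturb_features(x_a, 0.1, rng))
```

The sketch makes the criticism in the text concrete: an interpolated sample is confined to the segment between two normal samples, while a perturbed sample moves in a random direction by a random amount, with nothing that ties the result to a measured degree of abnormality.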
In addition, both semi-supervised and generative methods are generally sensitive to pseudo-label noise and to perturbations of the graph structure itself. Many methods train directly on inferred or synthesized pseudo-labels without any mechanism for evaluating label reliability or correcting errors, and the risk of mislabeling is particularly pronounced when boundary or isolated nodes exist in the graph. At the same time, the graph topology itself may be locally perturbed by data acquisition errors or system dynamics, which makes discrimination based on the original, possibly noisy topology fragile. Without systematic label risk estimation and topology enhancement, the bias introduced by erroneous labels is continuously amplified during training and produces misleading model updates, so that the model may ultimately train slowly or converge to a suboptimal solution. Therefore, exploring a unified optimization strategy that improves both the generation quality of pseudo-abnormal samples and the robustness of the model to label and structural noise becomes a key for e