CN-122023931-A - Graph data anomaly detection method, system, equipment and medium


Abstract

The invention provides a graph data anomaly detection method, system, equipment and medium, belonging to the field of anomaly detection. The method comprises the following steps: inputting an anomaly graph to be purified and a reference anomaly graph into an anomaly detection model simultaneously, and training the anomaly detection model; obtaining node comparison scores using the trained anomaly detection model; calculating edge interference scores for the anomaly graph to be purified from the node comparison scores; ranking the edge interference scores, identifying the top K ranked edges as interference edges, and deleting the interference edges to obtain a rough clean graph; and inputting the rough clean graph and the reference anomaly graph into the anomaly detection model simultaneously, and iterating the model training process, the node comparison score calculation process and the interference edge deletion process until the number of deleted interference edges reaches a set threshold, to obtain a target clean graph.
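The iterative purification loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `score_nodes` callback stands in for "train the anomaly detection model and compute node comparison scores", and the rule that an edge's interference score is the sum of its endpoints' node scores is an assumption made only for this example, since the abstract does not state how edge scores are derived. All function and parameter names are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[int, int]


def edge_interference_scores(edges: Set[Edge],
                             node_score: Dict[int, float]) -> Dict[Edge, float]:
    # ASSUMPTION: edge score = sum of the endpoint node comparison scores.
    # The source only says edge scores are computed from node scores.
    return {e: node_score[e[0]] + node_score[e[1]] for e in edges}


def purify(edges: Set[Edge],
           score_nodes: Callable[[Set[Edge]], Dict[int, float]],
           k: int,
           max_deleted: int) -> Set[Edge]:
    """Repeatedly delete the top-k scored edges until max_deleted is reached."""
    current = set(edges)  # work on a copy; the input graph is left untouched
    deleted = 0
    while deleted < max_deleted and current:
        # Placeholder for: retrain the model on (current graph, reference
        # graph) and compute node comparison scores.
        node_score = score_nodes(current)
        scores = edge_interference_scores(current, node_score)
        # Rank edges by descending score; ties are broken by the edge tuple.
        ranked = sorted(scores, key=lambda e: (-scores[e], e))
        budget = min(k, max_deleted - deleted)
        to_remove = ranked[:budget]
        current.difference_update(to_remove)
        deleted += len(to_remove)
    return current


def toy_scorer(_edges: Set[Edge]) -> Dict[int, float]:
    # Hypothetical fixed scores; a real scorer would depend on the graph.
    return {0: 0.1, 1: 0.2, 2: 0.9, 3: 0.8}


if __name__ == "__main__":
    graph = {(0, 1), (1, 2), (2, 3), (0, 3)}
    print(sorted(purify(graph, toy_scorer, k=1, max_deleted=2)))
```

Passing the scorer as a callback keeps the loop independent of any particular model, which makes the stopping rule (total deleted edges reaching the threshold) easy to check in isolation.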

Inventors

JIN DI
CAO JINGYI
WANG XIAOBAO
FENG BINGDAO
HE DONGXIAO
WANG ZHEN

Assignees

Tianjin University (天津大学)

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-11
Priority Date: 2025-02-13

Claims (8)

1. A graph data anomaly detection method, characterized by comprising the following steps: obtaining an anomaly graph to be purified and a reference anomaly graph of academic paper citation relation graph data, wherein each node of the academic paper citation relation graph data represents a paper, each edge represents a citation relation between papers, the citation relations of the anomaly graph include abnormal citation relations, and the nodes of the anomaly graph have abnormal paper features; inputting the anomaly graph to be purified and the reference anomaly graph into an anomaly detection model simultaneously and training the anomaly detection model, namely: performing random walks on the anomaly graph to be purified and the reference anomaly graph using the anomaly detection model, extracting node subgraphs from each of the two graphs, forming a first positive example pair from a node and its own node subgraph and a first negative example pair from the node and the node subgraph of another node, and forming a second positive example pair from the node and the node representation updated within its own subgraph and a second negative example pair from the node and a node representation updated within another subgraph; using the trained anomaly detection model, obtaining, for the reference anomaly graph, the difference between the similarity score of the first negative example pair and the similarity score of the first positive example pair, and the difference between the similarity score of the second negative example pair and the similarity score of the second positive example pair, and calculating a node comparison score from these differences; calculating edge interference scores for the anomaly graph to be purified from the node comparison scores, ranking the edge interference scores, identifying the top K ranked edges as interference edges, and deleting the interference edges to obtain a rough clean graph; and inputting the rough clean graph and the reference anomaly graph into the anomaly detection model simultaneously, and iterating the anomaly detection model training process, the node comparison score calculation process and the interference edge deletion process until the number of deleted interference edges reaches a set threshold, to obtain a target clean graph of academic paper citation relations.
2. The graph data anomaly detection method according to claim 1, wherein obtaining the two differences for the reference anomaly graph using the trained anomaly detection model and calculating the node comparison score from them specifically comprises: determining an anomaly score at the subgraph-node contrast scale from the difference between the similarity score of the first negative example pair and the similarity score of the first positive example pair of the reference anomaly graph; determining an anomaly score at the node-node contrast scale from the difference between the similarity score of the second negative example pair and the similarity score of the second positive example pair of the reference anomaly graph; and calculating the node comparison score from the anomaly score at the subgraph-node contrast scale and the anomaly score at the node-node contrast scale.
3. The graph data anomaly detection method according to claim 2, wherein calculating the node comparison score from the anomaly score at the subgraph-node contrast scale and the anomaly score at the node-node contrast scale specifically comprises: computing a weighted sum of the two anomaly scores to obtain the node comparison score.
4. The graph data anomaly detection method according to claim 1, wherein the node comparison score is obtained by the following formula: [formula not reproduced in the source text], where the symbols of the formula denote, respectively, the node contrast score of round r and the mean of the node contrast scores, and R denotes the total number of rounds.
5. The graph data anomaly detection method according to claim 1, wherein, when the anomaly graph to be purified and the reference anomaly graph are simultaneously input into the anomaly detection model and the anomaly detection model is trained, the loss function used is: [formula not reproduced in the source text], where the four loss terms of the formula denote, respectively, the node-subgraph contrast loss in the reference anomaly graph, the node-subgraph contrast loss in the rough clean graph, the node-node contrast loss in the reference anomaly graph, and the node-node contrast loss in the rough clean graph, and α and β denote the balance parameters for the different views and the different scales, respectively.
6. A graph data processing system, comprising: a model training module, used for obtaining an anomaly graph to be purified and a reference anomaly graph of academic paper citation relation graph data, wherein each node of the academic paper citation relation graph data represents a paper, each edge represents a citation relation between papers, the citation relations of the anomaly graph include abnormal citation relations, and the nodes of the anomaly graph have abnormal paper features; and for inputting the anomaly graph to be purified and the reference anomaly graph into an anomaly detection model simultaneously and training the anomaly detection model, namely: performing random walks on the anomaly graph to be purified and the reference anomaly graph using the anomaly detection model, extracting node subgraphs from each of the two graphs, forming a first positive example pair from a node and its own node subgraph and a first negative example pair from the node and the node subgraph of another node, and forming a second positive example pair from the node and the node representation updated within its own subgraph and a second negative example pair from the node and a node representation updated within another subgraph; a node comparison score acquisition module, used for obtaining, with the trained anomaly detection model and for the reference anomaly graph, the difference between the similarity score of the first negative example pair and the similarity score of the first positive example pair, and the difference between the similarity score of the second negative example pair and the similarity score of the second positive example pair, and for calculating a node comparison score from these differences; an interference edge deletion module, used for calculating edge interference scores for the anomaly graph to be purified from the node comparison scores, ranking the edge interference scores, identifying the top K ranked edges as interference edges, and deleting the interference edges to obtain a rough clean graph; and a target clean graph acquisition module, used for inputting the rough clean graph and the reference anomaly graph into the anomaly detection model simultaneously, and iterating the anomaly detection model training process, the node comparison score calculation process and the interference edge deletion process until the number of deleted interference edges reaches a set threshold, to obtain a target clean graph of academic paper citation relations.
7. A computer device comprising a memory storing a computer program and a processor for running the computer program in the memory to perform the graph data anomaly detection method of any one of claims 1-5.
8. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded by a processor for executing the graph data anomaly detection method of any one of claims 1 to 5.
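The two-scale scoring of claims 2 and 3 can be illustrated with a short sketch. It assumes the similarity scores of the positive and negative pairs are already available as plain floats, and it uses a single hypothetical `weight` parameter (with the complementary weight `1 - weight`) for the weighted sum; the claims state only that the sum is weighted, so this particular weighting is an assumption for illustration.

```python
def contrast_difference(neg_sim: float, pos_sim: float) -> float:
    """Anomaly score at one contrast scale: negative-pair similarity minus
    positive-pair similarity. A normal node resembles its own context more
    than a foreign one, so the difference tends to be lower for normal nodes."""
    return neg_sim - pos_sim


def node_comparison_score(neg_sub: float, pos_sub: float,
                          neg_node: float, pos_node: float,
                          weight: float = 0.5) -> float:
    """Weighted sum of the subgraph-node and node-node anomaly scores.

    ASSUMPTION: the weights are `weight` and `1 - weight`; the source does
    not give the actual weights.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    score_sub = contrast_difference(neg_sub, pos_sub)     # first pair type
    score_node = contrast_difference(neg_node, pos_node)  # second pair type
    return weight * score_sub + (1.0 - weight) * score_node


if __name__ == "__main__":
    # Hypothetical similarity values chosen only for the demonstration.
    print(node_comparison_score(0.8, 0.2, 0.6, 0.4, weight=0.5))
```

Setting `weight` to 1.0 or 0.0 reduces the score to a single contrast scale, which is a convenient way to inspect each scale separately.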

Description

Graph data anomaly detection method, system, equipment and medium

Technical Field

The invention belongs to the field of anomaly detection, and particularly relates to a graph data anomaly detection method, system, equipment and medium.

Background

With the rapid growth in the number of academic papers, the citation relationships between papers have become an important basis for measuring academic influence, describing knowledge propagation, and supporting academic evaluation and recommendation systems. However, citation behavior is not always driven entirely by academic dependency; unnatural or strategic citation behavior exists in practice and creates unusual structural patterns in the citation network. Such abnormal citation relationships distort the objective picture of a paper's influence and in turn affect key decisions such as academic evaluation, journal and institution evaluation, and the allocation of research resources. Detecting anomalies in paper citation relationships is therefore important for maintaining the fairness, credibility and healthy development of the academic ecosystem.

A graph is a basic data structure consisting of nodes and edges, and plays a vital role in representing relationships in different fields, such as recommendation systems, social network analysis, and financial risk assessment. Within graph analysis, graph anomaly detection (GAD) has become a critical research area that aims to identify patterns differing significantly from the majority. Detecting such anomalies reveals potential irregularities in the data and enables proactive intervention, thereby protecting data integrity. This capability has far-reaching applications, particularly in fraud detection, the discovery of brain pathology mechanisms, and network intrusion prevention.
Early methods employed shallow mechanisms such as ego-network analysis, residual analysis and CUR decomposition to detect anomalies. However, these methods cannot capture the complex relationships inherent in graph data, which limits their ability to detect complex anomalies. With the advent of deep learning, some studies have used graph neural networks (GNNs) to reconstruct structure and node features, taking the reconstruction error as the basis for anomaly identification. Although these approaches represent progress, they require prohibitive amounts of memory. In addition, the convolution operation in GNNs may smooth out anomaly signals and make anomalous nodes less distinctive, thereby reducing detection accuracy.

More recently, researchers have applied contrastive learning to anomaly detection: subgraphs are sampled by random walk with restart (RWR), positive instance pairs are formed between nodes and their own local subgraphs, negative instance pairs are formed with different subgraphs, and the degree of anomaly of a node is evaluated from the difference in similarity between its positive and negative instance pairs. However, in existing academic paper citation relation graphs, the interference edges introduced by abnormal citation relationships reduce the accuracy of graph anomaly detection. The model then has difficulty accurately identifying abnormal citation relationships and the corresponding abnormal papers, which undermines the reliability of academic anomaly detection results.
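The subgraph sampling by random walk with restart mentioned above can be sketched as follows. This is a generic illustration of RWR sampling under stated assumptions, not the specific sampler of any cited method: the graph is an adjacency-list dictionary, and the `restart_prob`, `size` and `max_steps` parameters and their default values are hypothetical.

```python
import random
from typing import Dict, List, Optional


def rwr_subgraph(adj: Dict[int, List[int]],
                 start: int,
                 size: int,
                 restart_prob: float = 0.5,
                 max_steps: int = 1000,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Collect up to `size` distinct nodes around `start` by random walk
    with restart. The start node is always the first element."""
    rng = rng or random.Random()
    visited = [start]   # insertion order, start node first
    seen = {start}
    current = start
    for _ in range(max_steps):
        if len(visited) >= size:
            break
        neighbors = adj.get(current, [])
        # Restart at the start node with probability restart_prob, or when
        # the walk is stuck at a node without neighbors.
        if not neighbors or rng.random() < restart_prob:
            if not adj.get(start):
                break   # isolated start node: nothing more to sample
            current = start
            continue
        current = rng.choice(neighbors)
        if current not in seen:
            seen.add(current)
            visited.append(current)
    return visited


if __name__ == "__main__":
    # Toy path graph 0 - 1 - 2 - 3 (hypothetical data).
    toy_adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(rwr_subgraph(toy_adj, start=0, size=3, rng=random.Random(0)))
```

In a contrastive setup of the kind described, the sampled node list for a target node would serve as its local subgraph for the positive pair, while a subgraph sampled around a different node would serve for the negative pair.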
Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a graph data anomaly detection method, which comprises the following steps: obtaining an anomaly graph to be purified and a reference anomaly graph of academic paper citation relation graph data, wherein each node of the academic paper citation relation graph data represents a paper, each edge represents a citation relation between papers, the citation relations of the anomaly graph include abnormal citation relations, and the nodes of the anomaly graph have abnormal paper features; inputting the anomaly graph to be purified and the reference anomaly graph into an anomaly detection model simultaneously and training the anomaly detection model, namely: performing random walks on the anomaly graph to be purified and the reference anomaly graph using the anomaly detection model, extracting node subgraphs from each of the two graphs, forming a first positive example pair from a node and its own node subgraph and a first negative example pair from the node and the node subgraph of another node, and forming a second positive example pair from the node and the node representation updated within its own subgraph and a second negative example pair from the node and a node representation updated within another subgraph; obtaining a difference between the similarity score of the first negative example pair and the similarity