CN-122023881-A - Graph-level anomaly detection method and electronic device

Abstract

The application relates to the technical field of graph machine learning, and in particular to a graph-level anomaly detection method and an electronic device. The method comprises: obtaining a source domain graph dataset and a target domain graph dataset, wherein the source domain graph dataset comprises source domain attribute graphs and label information for a labeled subset of those graphs, the target domain graph dataset comprises completely unlabeled target domain attribute graphs, the source domain attribute graphs comprise both normal graphs and anomaly graphs, and the target domain attribute graphs comprise anomaly graphs; training a graph-level anomaly detection model comprising a decoupled sample generation module and a prototype self-supervision module, wherein the former decouples the data of the two domains to obtain decoupled graphs and the latter performs self-supervised learning and cross-domain alignment on the decoupled graphs; and finally performing graph-level anomaly detection using the trained model. The method thereby addresses the problems that, in graph-level anomaly detection, valid and invalid information are difficult to decouple and source domain knowledge is difficult to transfer to the target domain.

Inventors

  • WAN HAI
  • NI ZHIBIN
  • ZHANG CHENGHAO
  • ZHAO XIBIN

Assignees

  • Tsinghua University (清华大学)

Dates

Publication Date
2026-05-12
Application Date
2025-12-31

Claims (10)

  1. A graph-level anomaly detection method, characterized by comprising the following steps: acquiring a source domain graph dataset and a target domain graph dataset, wherein the source domain graph dataset comprises source domain attribute graphs and label information for a labeled subset of the source domain attribute graphs, the target domain graph dataset comprises completely unlabeled target domain attribute graphs, the source domain attribute graphs comprise both normal graphs and anomaly graphs, and the target domain attribute graphs comprise anomaly graphs; training a graph-level anomaly detection model using the source domain graph dataset and the target domain graph dataset, wherein the graph-level anomaly detection model comprises a decoupled sample generation module and a prototype self-supervision module, and wherein, during training, the decoupled sample generation module decouples the source domain graph dataset and the target domain graph dataset to obtain decoupled graphs, and the prototype self-supervision module performs self-supervised learning and cross-domain alignment on the decoupled graphs; and performing graph-level anomaly detection using the trained graph-level anomaly detection model.
  2. The graph-level anomaly detection method of claim 1, wherein, during training, the source domain graph dataset and the target domain graph dataset are input into a decoupled sample generation model; the decoupled sample generation model outputs a target loss for combining the label signal with the target domain attribute graphs, and forms a source domain dataset and a target domain dataset in the process of generating combined samples; and the target loss, the source domain dataset and the target domain dataset are input into a prototype self-supervision model, the prototype self-supervision model outputs a training total loss, and model parameters of the graph-level anomaly detection model are updated according to the training total loss.
  3. The graph-level anomaly detection method of claim 2, wherein the data processing flow of the decoupled sample generation model comprises: obtaining an objective function of the decoupled sample generation model; inputting the source domain graph dataset and the target domain graph dataset into the objective function, and sequentially performing correlation processing, decoupling processing, compression processing, reconstruction processing and cross-domain processing on the objective function; and obtaining a correlation processing loss, a decoupling processing loss, a compression processing loss, a reconstruction processing loss and a cross-domain processing loss, and calculating the target loss from the correlation processing loss, the decoupling processing loss, the compression processing loss, the reconstruction processing loss and the cross-domain processing loss.
  4. The graph-level anomaly detection method of claim 3, wherein the expression of the target loss is: [formula missing in source], wherein the symbols of the expression respectively represent the target loss, the correlation processing loss, the decoupling processing loss, the compression processing loss, a consistency loss, the reconstruction processing loss and the cross-domain processing loss; DSG represents the decoupled sample generation module; and, among the subscripts, [symbol missing] represents correlation, [symbol missing] represents decoupling, MI represents compression, con represents consistency, recon represents reconstruction and cro represents cross-domain.
  5. The graph-level anomaly detection method of claim 2, wherein the data processing flow of the prototype self-supervision module comprises: generating an initial source domain prototype set and an initial target domain prototype set from the combined samples; generating a source domain union set from the initial source domain prototype set and the source domain dataset; generating a target domain union set from the initial target domain prototype set and the target domain dataset; and calculating the training total loss from the target loss, the source domain union set and the target domain union set.
  6. The graph-level anomaly detection method of claim 5, wherein calculating the training total loss from the target loss, the source domain union set and the target domain union set comprises: calculating a first similarity distribution vector from the source domain union set and the target domain features; calculating a second similarity distribution vector from the target domain union set and the source domain features; calculating a prototype loss by minimizing the entropy of each of the first similarity distribution vector and the second similarity distribution vector; calculating a prediction loss from the source domain union set and the target domain union set; and calculating the training total loss from the target loss, the prototype loss and the prediction loss.
  7. The graph-level anomaly detection method of claim 6, wherein the source domain and the target domain are aligned using prototypes, and the expression of the prototype loss is: [formula missing in source], wherein the symbols of the expression respectively represent the prototype loss, the first similarity distribution vector, the second similarity distribution vector, the similarity of the source domain features to the target domain union set, the similarity of the target domain features to the source domain union set, the number of source domain samples and the number of target domain samples; i represents an index, S represents the source domain, T represents the target domain, H represents entropy and H() represents the entropy calculation.
  8. The graph-level anomaly detection method of claim 6, wherein mutual information is used to maximize the association between the labels and the extracted key information, and the expression of the prediction loss is: [formula missing in source], wherein the symbols of the expression respectively represent the prediction loss, the number of all unlabeled samples, the predicted distribution of the model on the i-th unlabeled sample, the graph embedding of the i-th unlabeled sample and the category label of the i-th sample; i represents the index of the unlabeled sample, H represents entropy and H() represents the entropy calculation.
  9. The graph-level anomaly detection method of claim 6, wherein the expression of the total loss is: [formula missing in source], wherein the symbols of the expression respectively represent the total loss, the target loss, the prototype loss, the prediction loss and the trade-off coefficient.
  10. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the graph-level anomaly detection method of any one of claims 1-9.
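
The entropy-based prototype loss and the weighted total loss described in claims 6, 7 and 9 can be illustrated with a minimal sketch. The formulas themselves are not recoverable from this record, so everything specific below is an assumption for illustration only: the dot-product similarity, the softmax with a `temperature` parameter, the unweighted sum of the two averaged entropies, and the hypothetical trade-off coefficients `lam_proto` and `lam_pred`. It is not the patented formulation.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw similarity scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(dist):
    """Shannon entropy H(p) = -sum(p * log p), natural logarithm."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def dot(u, v):
    """Dot product, used here as an assumed similarity measure."""
    return sum(a * b for a, b in zip(u, v))


def similarity_distribution(feature, union_set, temperature=1.0):
    """Similarity of one feature vector to every element of a union set."""
    return softmax([dot(feature, m) for m in union_set], temperature)


def prototype_loss(source_feats, target_feats, source_union, target_union,
                   temperature=1.0):
    """Average entropy of both cross-domain similarity distributions.

    Assumption: target features are compared with the source union set,
    source features with the target union set, and the two averages are
    simply added.
    """
    h_target = [entropy(similarity_distribution(f, source_union, temperature))
                for f in target_feats]
    h_source = [entropy(similarity_distribution(f, target_union, temperature))
                for f in source_feats]
    return sum(h_target) / len(h_target) + sum(h_source) / len(h_source)


def total_loss(target_loss, proto_loss, pred_loss,
               lam_proto=1.0, lam_pred=1.0):
    """Weighted sum of the three losses (coefficients are hypothetical)."""
    return target_loss + lam_proto * proto_loss + lam_pred * pred_loss
```

Minimizing the entropy pushes each feature toward a confident match with a few elements of the other domain's union set, which is one common way to encourage cross-domain alignment without target labels.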

Description

Graph-level anomaly detection method and electronic device

Technical Field

The present application relates to the field of graph machine learning, and in particular to a graph-level anomaly detection method and an electronic device.

Background

Graph-Level Anomaly Detection (GLAD) aims to identify anomalous graphs within a set of graphs. As an important direction in data mining, GLAD is widely used in fields such as financial networks, drug discovery and malware detection. Although many methods have been proposed, most are unsupervised and therefore achieve lower detection accuracy than supervised methods, while manually labeling large-scale graph data within a single domain is costly and difficult in practice. Unsupervised domain adaptation provides a feasible path for GLAD by transferring knowledge from a fully labeled source domain to an unlabeled target domain, and has attracted wide attention in academia and industry. In some practical applications, however, even providing large-scale labels in the source domain is costly, and a few-shot unsupervised setting can effectively avoid the need for large-scale labeling. Yet in the GLAD context the few-shot unsupervised setting suffers from label scarcity, domain shift and related problems, which make it difficult for graph-level anomaly detection techniques to decouple valid from invalid information and to transfer source domain knowledge to the target domain.

Disclosure of Invention

The application provides a graph-level anomaly detection method and an electronic device, which address the problems that, in graph-level anomaly detection, valid and invalid information are difficult to decouple and source domain knowledge is difficult to transfer to the target domain.
An embodiment of the first aspect of the application provides a graph-level anomaly detection method comprising the following steps: obtaining a source domain graph dataset and a target domain graph dataset, wherein the source domain graph dataset comprises source domain attribute graphs and label information for a labeled subset of those graphs, the target domain graph dataset comprises completely unlabeled target domain attribute graphs, the source domain attribute graphs comprise both normal graphs and anomaly graphs, and the target domain attribute graphs comprise anomaly graphs; training a graph-level anomaly detection model using the source domain graph dataset and the target domain graph dataset, wherein the graph-level anomaly detection model comprises a decoupled sample generation module and a prototype self-supervision module, and wherein, during training, the decoupled sample generation module decouples the source domain graph dataset and the target domain graph dataset to obtain decoupled graphs and the prototype self-supervision module performs self-supervised learning and cross-domain alignment on the decoupled graphs; and performing graph-level anomaly detection using the trained graph-level anomaly detection model.
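
The data setup described above (a partially labeled source domain and a completely unlabeled target domain) can be sketched as a small data structure. This is only an illustration under stated assumptions: the names `AttributedGraph`, `NORMAL`, `ANOMALY` and `summarize_domains` are hypothetical and do not come from the application, and a missing label is modeled as `None`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical label encoding, chosen only for this sketch.
NORMAL, ANOMALY = 0, 1


@dataclass
class AttributedGraph:
    """One attribute graph: per-node feature vectors plus an edge list."""
    node_features: List[List[float]]
    edges: List[Tuple[int, int]]
    label: Optional[int] = None  # None means the graph is unlabeled


def summarize_domains(source: List[AttributedGraph],
                      target: List[AttributedGraph]) -> Dict[str, int]:
    """Check the labeling assumptions and count graphs per category."""
    if any(g.label is not None for g in target):
        raise ValueError("target domain graphs must be completely unlabeled")
    labeled = [g for g in source if g.label is not None]
    if not labeled:
        raise ValueError("source domain needs at least one labeled graph")
    return {
        "source_total": len(source),
        "source_labeled": len(labeled),
        "source_labeled_normal": sum(g.label == NORMAL for g in labeled),
        "source_labeled_anomaly": sum(g.label == ANOMALY for g in labeled),
        "target_total": len(target),
    }
```

A training routine would consume both lists together, using the few source labels for supervision and treating every target graph as unlabeled.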
According to one embodiment of the application, during training, the source domain graph dataset and the target domain graph dataset are input into a decoupled sample generation model; the decoupled sample generation model outputs a target loss for combining the label signal with the target domain attribute graphs, and forms a source domain dataset and a target domain dataset in the process of generating combined samples; the target loss, the source domain dataset and the target domain dataset are input into a prototype self-supervision model, the prototype self-supervision model outputs a training total loss, and model parameters of the graph-level anomaly detection model are updated according to the training total loss.
According to one embodiment of the application, the data processing flow of the decoupled sample generation model comprises: obtaining an objective function of the decoupled sample generation model; inputting the source domain graph dataset and the target domain graph dataset into the objective function, and sequentially performing correlation processing, decoupling processing, compression processing, reconstruction processing and cross-domain processing on the objective function; and obtaining a correlation processing loss, a decoupling processing loss, a compression processing loss, a reconstruction processing loss and a cross-domain processing loss, and calculating the target loss from the correlation processing loss, the decoupling processing loss, the compression processing loss, the reconstruction processing loss and the cross-domain processing loss.
According to one embodiment of the application, the expression for the target loss is: [formula missing in source], wherein [symbol missing] represents the target loss, DSG represents the decoupled sample generation module, [symbol missing] represents the correlation processi