
CN-121980469-A - Knowledge graph analysis data governance anomaly detection and recognition method


Abstract

The invention relates to the technical field of data processing, and in particular to a data governance anomaly detection and identification method for knowledge graph analysis. The method comprises: acquiring anomaly candidate entities during anomaly screening; taking the relationship types among the anomaly candidate entities as the subject of analysis, performing association pattern mining on the anomaly candidate entities, and recording the pattern analysis result of each anomaly candidate entity in an industry knowledge graph; based on the pattern analysis results, extracting the node position corresponding to each anomaly candidate entity, and determining the anomaly attribution graph corresponding to each anomaly candidate entity according to the direction of its connecting lines and its node position; performing reverse tracing on the anomaly attribution graph to determine the update range corresponding to each anomaly candidate entity; and computing the range intersection of multiple anomaly candidate entities from their update ranges, and determining the update mode of the industry knowledge graph according to the node data corresponding to the range intersection. The method thereby improves the accuracy and efficiency of data anomaly detection and processing.
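
The final step, computing the range intersection of several anomaly candidate entities, can be illustrated with a minimal Python sketch. It assumes that each update range is represented as a set of node identifiers; the function name `range_intersection` and the sample identifiers are hypothetical and do not come from the patent text.

```python
def range_intersection(update_ranges: dict[str, set[str]]) -> set[str]:
    """Return the nodes that appear in every entity's update range.

    `update_ranges` maps an anomaly candidate entity id to the set of
    node ids in its update range (an assumed representation).
    """
    if not update_ranges:
        return set()
    # set.intersection accepts any number of sets and keeps the common members.
    return set.intersection(*update_ranges.values())


# Hypothetical update ranges for three anomaly candidate entities.
update_ranges = {
    "e1": {"n1", "n2", "n3"},
    "e2": {"n2", "n3", "n4"},
    "e3": {"n2", "n3", "n5"},
}
shared_nodes = range_intersection(update_ranges)
print(sorted(shared_nodes))
```

Under this representation, the node data at `shared_nodes` would be the input for choosing the update mode of the industry knowledge graph.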

Inventors

  • DONG SHA
  • ZHANG HENGJING
  • MA CHENGYING
  • QIAN LONG
  • ZHOU BINZHAN
  • YE MINGCHEN
  • ZHANG GAN

Assignees

  • Zhejiang University (浙江大学)

Dates

Publication Date
2026-05-05
Application Date
2026-03-25

Claims (10)

  1. A data governance anomaly detection and identification method for knowledge graph analysis, characterized by comprising the following steps: acquiring, based on entity data of an industry knowledge graph, anomaly candidate entities during anomaly screening, together with the relationship types and anomaly types corresponding to the anomaly candidate entities; taking the relationship types among the anomaly candidate entities as the subject of analysis, performing association pattern mining on the anomaly candidate entities, and recording the pattern analysis result of each anomaly candidate entity in the industry knowledge graph; based on the pattern analysis results of the anomaly candidate entities, extracting the node position corresponding to each anomaly candidate entity, and determining the anomaly attribution graph corresponding to each anomaly candidate entity according to the direction of its connecting lines and its node position; performing reverse tracing on the anomaly attribution graph to determine the update range corresponding to each anomaly candidate entity; and computing the range intersection of multiple anomaly candidate entities from the update ranges of the anomaly candidate entities, and determining the update mode of the industry knowledge graph according to the node data corresponding to the range intersection.
  2. The data governance anomaly detection and identification method for knowledge graph analysis according to claim 1, wherein acquiring the anomaly candidate entities during anomaly screening further comprises: extracting a plurality of anomaly candidate entities from the entity data of the current industry knowledge graph, and determining the entity identifiers of the anomaly candidate entities during anomaly screening; and grouping any two anomaly candidate entities using the mapping relationship between the entity identifiers and the industry knowledge graph, and labeling the relationship types among the anomaly candidate entities according to the groups to which they belong.
  3. The data governance anomaly detection and identification method for knowledge graph analysis according to claim 2, wherein grouping any two anomaly candidate entities comprises: according to the mapping relationship between the current entity identifiers and the industry knowledge graph, when any two anomaly candidate entities correspond to the same node in the industry knowledge graph, treating the corresponding anomaly candidate entities as one group; when any two anomaly candidate entities do not correspond to the same node in the industry knowledge graph, grouping the corresponding anomaly candidate entities according to their anomaly types; and for the remaining data, grouping the anomaly candidate entities according to the minimum attribution unit of each anomaly candidate entity in the industry knowledge graph.
  4. The data governance anomaly detection and identification method for knowledge graph analysis according to claim 1, wherein performing association pattern mining on the anomaly candidate entities further comprises: when the relationship types of the anomaly candidate entities are known, dividing the anomaly candidate entities into a same-node association group, in which multiple anomalies are associated with the same node, a same-anomaly-type group, associated through anomaly-type semantics, and a structural association group, associated through community structure; for the same-node association group, comparing the anomaly types of the anomaly candidate entities one by one based on preset causal deduction rules, determining a causal relationship chain of the anomaly candidate entities, and taking the causal relationship chain as the pattern analysis result of the same-node association group; for the same-anomaly-type group, clustering by semantic features, and connecting the corresponding anomaly candidate entities according to the clusters formed by the semantic features to obtain the pattern analysis result of the same-anomaly-type group; and for the structural association group, extracting the common features of the anomaly candidate entities, and taking the common features as the pattern analysis result of the structural association group.
  5. The data governance anomaly detection and identification method for knowledge graph analysis according to claim 4, wherein comparing the anomaly types of the anomaly candidate entities one by one further comprises: checking the target node corresponding to the same-node association group, and, when the current anomaly types are not the same, traversing all anomaly types of the target node and identifying anomaly pairs that have a direct business logic association; for each anomaly pair, judging the causal direction according to the preset causal deduction rules, and generating the causal relationship chain corresponding to the target node from the causal directions; and when the current anomaly types are the same, extracting the semantic features of each anomaly candidate entity, connecting the corresponding anomaly candidate entities according to the differences in their semantic features, and taking the connection as the output causal relationship chain.
  6. The data governance anomaly detection and identification method for knowledge graph analysis according to claim 1, wherein determining the anomaly attribution graph corresponding to each anomaly candidate entity comprises: determining the node position of the current anomaly candidate entity from the pattern analysis result, and determining the direction of the connecting lines of the anomaly candidate entities in sequence, according to the processing order of the anomaly candidate entities in the pattern analysis result; and determining the connectivity among the anomaly candidate entities according to the direction of their connecting lines, and connecting the anomaly candidate entities at each connectivity level according to the connectivity level of each anomaly candidate entity, to obtain the output anomaly attribution graph.
  7. The data governance anomaly detection and identification method for knowledge graph analysis according to claim 6, wherein connecting the anomaly candidate entities at each connectivity level further comprises: collecting all anomaly candidate entities participating in the connection, and setting the connectivity levels of the anomaly candidate entities in sequence according to their corresponding pattern analysis results; and for anomaly candidate entities at the same connectivity level, when a connection relationship exists between them, connecting the corresponding anomaly candidate entities step by step in descending order of their corresponding data ranges.
  8. The data governance anomaly detection and identification method for knowledge graph analysis according to claim 1, wherein determining the update range corresponding to each anomaly candidate entity further comprises: setting all anomaly candidate entities in the anomaly attribution graph as tracing starting points, taking the connecting line direction corresponding to the anomaly candidate entities as the reverse tracing direction, and obtaining a plurality of path sequences by breadth-first search; cross-verifying the path sequences according to their end points to obtain a verified list of affected nodes; and, taking the current industry knowledge graph as the reference, mapping the list of affected nodes into the industry knowledge graph, calculating the node reconstruction proportion and the relationship reconstruction proportion respectively, and taking the calculated node reconstruction proportion and relationship reconstruction proportion as the output update range.
  9. The data governance anomaly detection and identification method for knowledge graph analysis according to claim 8, wherein cross-verifying the path sequences further comprises: determining, according to the path sequence corresponding to each anomaly type, the time order in which each node in each path sequence is updated; traversing the path sequences in time order, and checking the minimum update time of the path sequences under time-series analysis; and taking the minimum update time as the time traversal window, counting the total update time required by all path sequences, and completing the cross-verification of each path sequence through the time constraint of each path sequence.
  10. The data governance anomaly detection and identification method for knowledge graph analysis according to claim 1, wherein determining the update mode of the industry knowledge graph further comprises: when any anomaly candidate entity exists in the range intersection, checking the number of nodes and the anomaly types in the range intersection; and configuring the update modes at the range intersection in sequence, using the proportion of each anomaly type in the range intersection and the number of nodes, in combination with the association pattern in the current scenario.
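
The reverse tracing and update-range calculation of claim 8 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: edges of the anomaly attribution graph are assumed to point from cause to effect, so tracing walks from effect back to cause; the relationship reconstruction proportion is assumed to be the share of relations that touch at least one affected node; and the cross-verification step of claim 9 is omitted. The names `reverse_trace` and `update_range` are hypothetical.

```python
from collections import deque


def reverse_trace(edges, start):
    """Breadth-first trace against edge direction from `start`.

    `edges` is a list of (cause, effect) pairs (assumed orientation).
    Returns a dict mapping each reached node to the path sequence
    leading to it from `start`.
    """
    predecessors = {}
    for cause, effect in edges:
        predecessors.setdefault(effect, []).append(cause)

    paths = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for cause in predecessors.get(node, []):
            if cause not in paths:  # first visit is the shortest path in BFS
                paths[cause] = paths[node] + [cause]
                queue.append(cause)
    return paths


def update_range(affected_nodes, graph_nodes, graph_edges):
    """Return (node reconstruction proportion, relation reconstruction proportion).

    A relation counts as affected when either endpoint is affected
    (an assumption; the source does not define the proportion).
    """
    affected = set(affected_nodes) & set(graph_nodes)
    node_ratio = len(affected) / len(graph_nodes) if graph_nodes else 0.0
    touched = [e for e in graph_edges if e[0] in affected or e[1] in affected]
    relation_ratio = len(touched) / len(graph_edges) if graph_edges else 0.0
    return node_ratio, relation_ratio


# Hypothetical attribution graph and industry knowledge graph.
attribution_edges = [("a", "b"), ("b", "c"), ("d", "c")]
graph_nodes = ["a", "b", "c", "d", "e", "f", "g", "h"]
graph_edges = attribution_edges + [("e", "f"), ("g", "h")]

paths = reverse_trace(attribution_edges, "c")
ratios = update_range(paths.keys(), graph_nodes, graph_edges)
print(paths)
print(ratios)
```

In this example the trace from `c` reaches `b`, `d`, and `a`, so four of the eight nodes and three of the five relations fall inside the update range.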

Description

Knowledge graph analysis data governance anomaly detection and recognition method

Technical Field

The invention relates to the technical field of data processing, and in particular to a data governance anomaly detection and identification method for knowledge graph analysis.

Background

In practical applications of knowledge graphs, growing data complexity inevitably leads to data quality defects such as abnormal node attributes, broken relationships, and missing data. If these anomalies are not identified and handled in time, the structure of the knowledge graph becomes disordered, which affects its operational stability.

One existing method constructs a knowledge graph, selects initial nodes from the knowledge graph for layered random walks, records the node sequences of the walks to obtain a node sequence set, converts each node sequence in the set into a pattern to obtain a pattern set, and performs abnormal pattern mining on the pattern set to obtain abnormal structures.

Chinese patent publication No. CN120407270A discloses an operation and maintenance root cause localization method and system combined with a knowledge graph. First, a historical operation and maintenance log data set of a target system is obtained; it includes a plurality of log sequences, each composed of system operation state records and abnormal event marks. Entity relationship recognition is performed on the historical log data set to generate an operation and maintenance knowledge graph. A pre-trained node feature extraction model is then invoked to extract attribute features from the nodes corresponding to system component entities in the knowledge graph, generating a node operation state feature set and a node environment association feature set. Multi-level feature fusion is performed on these two feature sets to generate a root cause localization feature set. Finally, abnormal event backtracking analysis is performed on the root cause localization feature set to generate a root cause localization result for the target system and a corresponding set of operation and maintenance optimization strategies.
In the prior art, abnormal structures are mined by matching updated subgraphs during abnormal pattern mining, or root causes are located during anomaly detection by configuring identifiers such as environment and component identifiers. However, the prior art focuses on the preliminary processing of detection data in data governance scenarios. It does not consider the scope of influence of each abnormal pattern or the data that must be updated within that scope. As a result, content missing from anomaly detection steps such as root cause localization affects the coupling between nodes in the knowledge graph and reduces the stability and efficiency of its use.

Disclosure of Invention

To solve the above technical problems, the technical solution adopted by the invention is a data governance anomaly detection and identification method for knowledge graph analysis, comprising the following steps.

Based on entity data of an industry knowledge graph, anomaly candidate entities are acquired during anomaly screening, together with the relationship types and anomaly types corresponding to the anomaly candidate entities.

The relationship types among the anomaly candidate entities are taken as the subject of analysis, association pattern mining is performed on the anomaly candidate entities, and the pattern analysis result of each anomaly candidate entity in the industry knowledge graph is recorded.

Based on the pattern analysis results of the anomaly candidate entities, the node position corresponding to each anomaly candidate entity is extracted, and the anomaly attribution graph corresponding to each anomaly candidate entity is determined according to the direction of its connecting lines and its node position.

Reverse tracing is performed on the anomaly attribution graph to determine the update range corresponding to each anomaly candidate entity.

The range intersection of multiple anomaly candidate entities is computed from the update ranges of the anomaly candidate entities, and the update mode of the industry knowledge graph is determined according to the node data corresponding to the range intersection.

The method has the following advantages. First, it extracts the anomaly candidate entities, their relationships, and their anomaly types, performs association pattern mining based on the relationship type, co