CN-122027463-A - Knowledge-graph-based base station fault root cause positioning method and system

Abstract

The invention belongs to the intersection of communication network operation and maintenance and artificial intelligence, and in particular relates to a knowledge-graph-based base station fault root cause positioning method and system. The method constructs a communication network knowledge graph whose edges carry causal direction attributes and propagation weights; performs adaptive spatio-temporal clustering convergence on alarm data; maps the resulting alarm events onto the knowledge graph to activate a causal propagation subgraph; performs direction-aware multi-hop message passing along the causal relationship edges through a graph attention network to calculate root cause probability scores; outputs the top-ranked nodes as root causes; and continuously updates the graph weights through an expert feedback closed loop.
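
The expert feedback closed loop named at the end of the abstract (and detailed in claim 6) can be illustrated with a minimal Python sketch. The source states only that propagation weights are enhanced or reduced; the bounded update rule, the `step` value, the [0, 1] weight range and every name below are assumptions made for illustration, not the patented implementation.

```python
from typing import Dict, Sequence, Tuple

Edge = Tuple[str, str]  # (upstream node, downstream node); naming is hypothetical


def clip(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Keep a propagation weight inside the assumed range [low, high]."""
    return max(low, min(high, value))


def apply_feedback(
    weights: Dict[Edge, float],
    predicted_path: Sequence[Edge],
    confirmed: bool,
    corrected_path: Sequence[Edge] = (),
    step: float = 0.1,  # hypothetical update step, not given in the source
) -> Dict[Edge, float]:
    """Return a new weight table after one confirmation or correction mark.

    Every edge on either path is assumed to exist in `weights` already.
    """
    updated = dict(weights)  # leave the caller's table untouched
    if confirmed:
        # Confirmation mark: strengthen each edge on the confirmed path.
        for edge in predicted_path:
            updated[edge] = clip(updated[edge] + step * (1.0 - updated[edge]))
    else:
        # Correction mark: weaken the rejected path ...
        for edge in predicted_path:
            updated[edge] = clip(updated[edge] * (1.0 - step))
        # ... and strengthen the path of the root cause named by the expert.
        for edge in corrected_path:
            updated[edge] = clip(updated[edge] + step * (1.0 - updated[edge]))
    return updated


if __name__ == "__main__":
    table = {("power", "bbu"): 0.5, ("bbu", "cell"): 0.5, ("fiber", "bbu"): 0.2}
    # The expert rejects the "power" root cause and names "fiber" instead.
    print(apply_feedback(table, [("power", "bbu")], False, [("fiber", "bbu")]))
```

The updated table would then be written back to the knowledge graph so that later inference runs read the adjusted weights. A bounded rule of this kind is only one possible choice; it keeps weights from leaving the assumed range after repeated feedback.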

Inventors

Chen Jiechuang
Chen Youwen
Zhong Dele

Assignees

广州广杰网络科技有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-04-09

Claims (10)

1. A knowledge-graph-based base station fault root cause positioning method, characterized by comprising the following steps: constructing a knowledge graph of the communication network domain, in which base station equipment, network element topology, alarm types and fault modes are modeled as entity nodes, and physical connection, logical bearing, alarm triggering and causal propagation between entities are modeled as relationship edges, wherein each relationship edge carries a causal direction attribute and a propagation weight; performing spatio-temporal clustering convergence on alarm data received within a preset time window, identifying sets of homologous alarms and aggregating each set into an alarm event; mapping the alarm event to the corresponding entity nodes in the knowledge graph, and activating a causal propagation subgraph associated with the alarm event; performing multi-hop message passing along the causal relationship edges of the causal propagation subgraph through a graph attention network, and calculating probability scores of candidate root cause nodes; and sorting the candidate root cause nodes in descending order of probability score, and outputting the top-ranked nodes as the fault root cause positioning result.
2. The method according to claim 1, wherein, in constructing the knowledge graph of the communication network domain, alarm suppression relationships are modeled as causal direction inference signals, specifically comprising: collecting alarm suppression rules recorded by a network management system in the communication network, wherein each alarm suppression rule indicates the set of child alarms that are automatically masked when a parent alarm is generated; modeling the direction from the entity corresponding to the parent alarm to the entity corresponding to the child alarm in the alarm suppression rule as the direction attribute of a causal propagation relationship edge, wherein the parent alarm entity is marked as an upstream causal node and the child alarm entity is marked as a downstream causal node; and calculating the initial propagation weight of the causal propagation relationship edge based on the historical co-occurrence frequency of the parent-child alarm pair in the alarm suppression rule.
3. The method according to claim 2, further comprising, before modeling the direction from the entity corresponding to the parent alarm to the entity corresponding to the child alarm in the alarm suppression rule as the direction attribute of a causal propagation relationship edge, a cross-vendor alarm semantic alignment step: extracting the alarm name, alarm category and alarm parameter fields from the alarm data of each vendor's network management system; constructing a unified alarm semantic space, and encoding the heterogeneous alarm texts of all vendors into semantic vectors through a pre-trained language model; in the unified alarm semantic space, mapping semantically equivalent heterogeneous alarms to the same standard alarm entity, using a cosine similarity threshold as the criterion; and writing the cross-vendor mapping result back to the alarm type entity attributes of the knowledge graph.
4. The method according to claim 3, wherein, in performing spatio-temporal clustering convergence on alarm data received within a preset time window, the preset time window is an adaptive window based on causal propagation delay estimation, specifically comprising: reading the propagation weight of a causal propagation relationship edge in the knowledge graph and the topological hop count between the network element nodes connected by the relationship edge; estimating the expected value and variance of the causal propagation delay from the propagation weight and the topological hop count; setting the width of the time window to the sum of the expected value of the causal propagation delay and a preset multiple of its standard deviation; and performing density clustering on the alarm data falling within the adaptive time window, using the topological distance between alarm locations and the time difference between alarm occurrence times as the metric, and aggregating density-reachable alarms into the same alarm event.
5. The method according to claim 4, wherein, in performing multi-hop message passing along the causal relationship edges of the causal propagation subgraph through the graph attention network, the calculation of the attention coefficients of the graph attention network incorporates a causal direction encoding, specifically comprising: extracting the causal direction attribute of each causal relationship edge in the causal propagation subgraph and encoding it into a direction vector; concatenating the direction vector with the source node feature vector and the target node feature vector, and calculating a raw attention coefficient through a learnable attention parameter matrix; applying a causal direction mask to the raw attention coefficient so as to attenuate the causal direction message passing attention coefficient to a preset attenuation factor; and normalizing the attenuated attention coefficients, and updating the hidden state representation of each node hop by hop along the causal direction until a preset number of propagation hops is reached.
6. The method according to claim 5, further comprising, after sorting the candidate root cause nodes in descending order of probability score, an expert feedback closed-loop update step: receiving a confirmation mark or a correction mark applied by operation and maintenance personnel to the fault root cause positioning result; when a confirmation mark is received, increasing the propagation weight of each relationship edge on the causal propagation path in the knowledge graph on which the confirmed root cause node lies; when a correction mark is received, decreasing the propagation weight of each relationship edge on the causal propagation path on which the rejected root cause node lies, and at the same time increasing the propagation weight of each relationship edge on the path on which the correct root cause node specified by the operation and maintenance personnel lies; and writing the updated propagation weights back to the knowledge graph for use in calculating the attention coefficients of the graph attention network in subsequent fault root cause positioning.
7. The method according to claim 1, wherein outputting the top-ranked nodes as the fault root cause positioning result further comprises: extracting from the knowledge graph the causal propagation paths from the root cause node corresponding to the fault root cause positioning result to each alarm event node; and rendering the causal propagation paths as a fault propagation link visualization, wherein the visualization is annotated with the alarm time, alarm level and propagation weight of each node on the path.
8. The method according to claim 1, wherein outputting the top-ranked nodes as the fault root cause positioning result further comprises: querying a preset fault handling plan library based on the fault root cause positioning result to obtain a handling plan matching the fault mode of the root cause node; and automatically generating a work order according to the handling plan, and dispatching the work order to the professional team corresponding to the root cause node.
9. The method according to claim 1, wherein the reference value of the preset time window is 5 min, the preset number of propagation hops is 3 to 5, and the top 3 to top 5 candidate root cause nodes by probability score are output.
10. A knowledge-graph-based base station fault root cause positioning system for implementing the method of any one of claims 1-9, comprising: a knowledge graph construction module, configured to construct a knowledge graph of the communication network domain, in which base station equipment, network element topology, alarm types and fault modes are modeled as entity nodes, and physical connection, logical bearing, alarm triggering and causal propagation between entities are modeled as relationship edges, wherein each relationship edge carries a causal direction attribute and a propagation weight; an alarm convergence module, configured to perform spatio-temporal clustering convergence on alarm data received within a preset time window, identify sets of homologous alarms and aggregate each set into an alarm event; an alarm mapping module, configured to map the alarm event to the corresponding entity nodes in the knowledge graph and activate a causal propagation subgraph associated with the alarm event; a root cause reasoning module, configured to perform multi-hop message passing along the causal relationship edges of the causal propagation subgraph through a graph attention network and calculate probability scores of candidate root cause nodes; and a result output module, configured to sort the candidate root cause nodes in descending order of probability score and output the top-ranked nodes as the fault root cause positioning result.
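
The direction-aware attention step recited in claim 5 can be illustrated with a minimal pure-Python sketch. It rests on several assumptions that the claims do not settle: the direction attribute is encoded as a two-element one-hot vector, the learnable attention parameters are stood in for by a fixed vector `attn`, the causal direction mask is read as scaling messages that travel against an edge's causal direction by a preset factor `decay`, and a node's new hidden state is the attention-weighted mean of its incoming messages. All names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
# (source, target, is_causal): is_causal is False for a message that would
# travel against the causal direction of the edge (an assumed encoding).
Edge = Tuple[str, str, bool]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def attention_hop(
    hidden: Dict[str, Vector],
    edges: List[Edge],
    attn: Vector,        # stand-in for learnable parameters; length 2*dim + 2
    decay: float = 0.1,  # hypothetical preset attenuation factor
) -> Dict[str, Vector]:
    """One hop of direction-masked attention message passing."""
    incoming: Dict[str, List[Tuple[float, Vector]]] = {}
    for src, dst, is_causal in edges:
        direction = [1.0, 0.0] if is_causal else [0.0, 1.0]  # direction vector
        features = hidden[src] + hidden[dst] + direction      # concatenation
        raw = leaky_relu(sum(a * f for a, f in zip(attn, features)))
        # Direction mask: scale the unnormalized weight of anti-causal messages.
        score = math.exp(raw) * (1.0 if is_causal else decay)
        incoming.setdefault(dst, []).append((score, hidden[src]))

    updated = dict(hidden)  # nodes with no incoming message keep their state
    for dst, messages in incoming.items():
        total = sum(score for score, _ in messages)  # normalization constant
        updated[dst] = [
            sum(score / total * vec[i] for score, vec in messages)
            for i in range(len(hidden[dst]))
        ]
    return updated


def propagate(
    hidden: Dict[str, Vector],
    edges: List[Edge],
    attn: Vector,
    hops: int = 3,       # claim 9 gives 3 to 5 hops as the preset range
    decay: float = 0.1,
) -> Dict[str, Vector]:
    """Repeat the masked attention update for a preset number of hops."""
    for _ in range(hops):
        hidden = attention_hop(hidden, edges, attn, decay)
    return hidden
```

With `decay` set to 1.0 the mask has no effect and the update reduces to ordinary softmax attention over a node's neighbors; smaller values progressively suppress messages that run against the causal direction. A trained model would learn `attn` and would normally add a feature projection and a readout layer that turns hidden states into root cause probability scores, none of which is shown here.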

Description

Knowledge-graph-based base station fault root cause positioning method and system

Technical Field

The invention belongs to the intersection of communication network operation and maintenance and artificial intelligence, and in particular relates to a knowledge-graph-based base station fault root cause positioning method and system.

Background

With the large-scale deployment of 5th-generation mobile communication networks, the number of base stations of a single provincial operator has reached the hundreds of thousands, and the volume of alarm data generated every day can reach the millions. Complex spatio-temporal associations and causal propagation relationships exist among these massive alarms: a fault in a single piece of equipment can propagate step by step along the network topology and trigger hundreds of derived alarms. When facing an alarm storm, operation and maintenance personnel have to rely on manual experience and check alarms one by one; the average fault positioning time is as long as 40 minutes, which seriously affects network service quality and user experience.

Existing fault root cause positioning technology has mainly developed along two paths. The first path is association analysis based on expert rules. This approach defines an alarm association rule base in advance, matches received alarms against the rules according to a time window and topological distance, and outputs the root cause candidates that match fault modes in the rule base. For example, the patent application with publication number EP3796176A1 discloses a fault root cause analysis method that characterizes the relationships between alarm events by extracting their feature vectors, and then judges whether an alarm event is a root cause alarm event based on a preset classification model.
In that method, three dimensions (temporal association, topological association and text similarity) are jointly considered in the alarm aggregation stage, a random forest classification algorithm is used for root cause identification, and the classification model is continuously trained on expert-annotated data. However, this approach essentially models root cause positioning as a classification problem over feature vectors, so its results depend heavily on the quality of the feature engineering and the sufficiency of the training samples. When alarm features are similar but the causal propagation paths differ, for example when alarms of the same kind propagate from two different fault sources to the same sink node, the method cannot readily distinguish the true root cause because it lacks causal direction information, and its positioning accuracy is therefore unstable.

The second path is inference based on the network topology. The patent with publication number US7043661B2 discloses a topology-based inference engine for network fault root cause analysis, which constructs a network topology graph, defines alarm propagation rules on the graph, and traverses topology paths to discover root causes. The system clusters incoming alarms into alarm groups according to arrival time and topological distance, each alarm group corresponding to a potential fault event, and then traverses the topology graph according to predefined consequence scenarios. The method works well in transmission networks with well-defined topological relationships, but its reasoning capability depends heavily on manually defined consequence scenario rules; when faced with complex cascading fault propagation across multiple network layers, the maintenance cost of the rule base rises sharply and novel fault modes are difficult to cover.
In addition, the method introduces no quantitative characterization of causal relationships: topological edges represent only physical connections and do not distinguish causal direction, so the reasoning lacks directional guidance and the search space explodes in large-scale networks.

The patent with publication number CN105677759B discloses an alarm association analysis method for information communication networks, in which statistical associations among alarms are mined through alarm data preprocessing and multidimensional association analysis. However, the method is still based on association mining over statistical frequencies; it neither builds a structured knowledge graph to explicitly characterize causal propagation mechanisms nor introduces graph neural networks to automatically learn deep causal associations between alarms.

Taken together, the prior art shares a common deep-seated bottleneck: root cause positioning is modeled either as a classification problem based on alarm features or as a rule-based matching problem, and the causal propagation structure information contained in the communication network is not fully exploited. The alert feature and statistical assoc