CN-122019240-A - Root cause positioning method, equipment and medium based on large model


Abstract

The invention provides a root cause positioning method, equipment and medium based on a large model, and relates to the technical field of data processing. The method comprises: determining initial abnormal nodes and their corresponding initial abnormal scores based on the call chain graph of a fault and the abnormal index data; determining target abnormal nodes, and obtaining the corresponding predicted abnormal scores and node selection description texts, according to the call chain paths from the starting point of the call chain graph to the initial abnormal nodes, the node identifiers of the initial abnormal nodes, the abnormal index data, the target abnormal logs, the historical fault cases and the large model; obtaining the target abnormal score of each target abnormal node by combining the initial abnormal score, the predicted abnormal score and an adjustment weight obtained based on the node selection description text; and sorting the target abnormal nodes by target abnormal score to obtain the root cause positioning result. The method fuses the fault call chain topology and quantitative index data, makes effective use of the fault semantic information in unstructured text such as logs, and improves the accuracy of root cause positioning.

Inventors

WEN JIANBO
HUANG JIALE
GAO DONG
LIU CHUNLEI
ZHANG XI
JIN YANGQI

Assignees

中航信移动科技股份有限公司

Dates

Publication Date: 20260512
Application Date: 20260408

Claims (10)

1. A root cause positioning method based on a large model, characterized by comprising the following steps: S1, determining a plurality of initial abnormal nodes based on a call chain graph of a fault and abnormal index data corresponding to the fault, and acquiring an initial abnormal score corresponding to each initial abnormal node; S2, acquiring a key data set based on each initial abnormal node, wherein the key data set comprises a node identifier of each initial abnormal node, a plurality of call chain paths from the starting point of the call chain graph to the initial abnormal node, and the abnormal index data, target abnormal log and historical fault case corresponding to each initial abnormal node; S3, inputting target prompt words into a large model, the large model screening at least one target abnormal node from all the initial abnormal nodes and outputting an abnormal recognition result, wherein the abnormal recognition result comprises a node identifier of each target abnormal node, a predicted abnormal score corresponding to the node identifier of each target abnormal node, and a node selection description text; S4, for each target abnormal node, acquiring a target abnormal score corresponding to the target abnormal node based on the initial abnormal score and the predicted abnormal score corresponding to the target abnormal node and an adjustment weight corresponding to the predicted abnormal score; S5, sorting all the target abnormal nodes in descending order of target abnormal score to obtain a root cause positioning result.
2. The root cause localization method based on a large model according to claim 1, further comprising, after step S1 and before step S2: S01, for each initial abnormal node, acquiring the longest path from the starting point of the call chain graph to the initial abnormal node based on the call chain graph, and taking the longest path as the candidate call path corresponding to the initial abnormal node; S02, performing de-duplication processing on all candidate call paths according to a preset de-duplication rule, and taking the candidate call paths remaining after de-duplication as the call chain paths from the starting point of the call chain graph to the initial abnormal nodes, wherein the preset de-duplication rule is: if $L_j$ is a continuous sub-path of $L_e$, $L_j$ is deleted, where $L_j$ is the $j$-th candidate call path, $L_e$ is the $e$-th candidate call path, $1 \le j \le m$, $e$ is not more than $j$, and $m$ is the total number of candidate call paths.
3. The root cause positioning method based on a large model according to claim 1, wherein for each initial abnormal node, an abnormal log of the initial abnormal node in a target time period is determined as a target abnormal log corresponding to the initial abnormal node, an ending time point of the target time period is a fault occurrence time point, and a duration of the target time period is a preset duration.
4. The root cause positioning method based on the large model according to claim 1, wherein the historical fault cases corresponding to the initial abnormal nodes are obtained by inquiring a fault case database, the fault case database comprises a plurality of fault cases, and the fault cases comprise fault description information and node identifications of root cause nodes corresponding to the fault description information.
5. The large model-based root cause positioning method according to claim 4, wherein if the node identification of the initial abnormal node is identical to the node identification of the root cause node in the fault case, the fault case is regarded as a historical fault case corresponding to the initial abnormal node.
6. The root cause positioning method based on a large model according to claim 1, wherein the target anomaly score $H_i$ corresponding to the $i$-th target anomaly node $C_i$ satisfies: $H_i = W_1 \times F_i + W_2 \times A_i \times \alpha$, where $W_1$ is the importance weight corresponding to the initial anomaly score, $F_i$ is the score obtained by normalizing the initial anomaly score corresponding to $C_i$, $W_2$ is the importance weight corresponding to the predicted anomaly score, $A_i$ is the predicted anomaly score corresponding to $C_i$, $\alpha$ is the adjustment weight corresponding to the predicted anomaly score, $1 \le i \le n$, and $n$ is the number of target anomaly nodes.
7. The root cause positioning method based on a large model according to claim 6, wherein $0 \le \alpha \le 1$, and the larger $\alpha$ is, the higher the reliability of the predicted anomaly score.
8. The large model-based root cause positioning method according to claim 1, wherein the starting point of the call chain graph is the service interface or execution unit through which the user request or system event associated with the fault first enters the distributed system.
9. A non-transitory computer readable storage medium, wherein the storage medium has stored therein a computer program that is loaded and executed by a processor to implement the large model-based root cause localization method of any one of claims 1-8.
10. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, wherein the processor implements the large model based root cause localization method as claimed in any one of claims 1-8 when the computer program is executed by the processor.
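
By way of illustration only, the de-duplication rule of claim 2 and the score fusion of claims 6 and 7 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the representation of a path as a tuple of node identifiers, the min-max normalization used for $F_i$, the default weights $W_1 = W_2 = 0.5$, and a predicted score $A_i$ already on a 0 to 1 scale are all hypothetical choices that the claims do not specify. The sketch also removes a path whenever it is a contiguous sub-path of any other candidate, without regard to the index relation between $j$ and $e$, and takes $\alpha$ as a value supplied by the caller.

```python
from typing import Dict, List, Sequence, Tuple

Path = Tuple[str, ...]  # a call chain path as an ordered tuple of node identifiers


def is_contiguous_subpath(short: Sequence[str], long: Sequence[str]) -> bool:
    """Return True if `short` occurs as a run of consecutive nodes inside `long`."""
    n, m = len(short), len(long)
    if n == 0 or n > m:
        return False
    return any(tuple(long[i:i + n]) == tuple(short) for i in range(m - n + 1))


def dedupe_paths(paths: List[Path]) -> List[Path]:
    """Drop every candidate path that is a contiguous sub-path of another candidate."""
    # Assumption: exact duplicates are collapsed first, keeping first-seen order.
    unique = list(dict.fromkeys(paths))
    return [
        p for p in unique
        if not any(p != q and is_contiguous_subpath(p, q) for q in unique)
    ]


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Assumed normalization for F_i; the claims only say the score is normalized."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {node: 1.0 for node in scores}
    return {node: (value - lo) / (hi - lo) for node, value in scores.items()}


def rank_root_causes(
    initial: Dict[str, float],    # initial abnormal score per initial abnormal node
    predicted: Dict[str, float],  # A_i per target abnormal node (assumed in [0, 1])
    alpha: float,                 # adjustment weight, 0 <= alpha <= 1 (claim 7)
    w1: float = 0.5,              # W_1, hypothetical default
    w2: float = 0.5,              # W_2, hypothetical default
) -> List[Tuple[str, float]]:
    """Compute H_i = W_1*F_i + W_2*A_i*alpha and sort in descending order of H_i."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not predicted:
        return []
    # Assumption: normalization is taken over the target abnormal nodes only.
    f = min_max_normalize({node: initial[node] for node in predicted})
    fused = {node: w1 * f[node] + w2 * predicted[node] * alpha for node in predicted}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)
```

In this sketch, a smaller `alpha` shrinks the contribution of the large model's predicted score, so the ranking falls back toward the initial metric-based score when the node selection description text is judged less reliable.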

Description

Root cause positioning method, equipment and medium based on large model

Technical Field

The present invention relates to the field of data processing technologies, and in particular to a root cause positioning method, equipment, and medium based on a large model.

Background

In modern distributed systems and microservice architectures, service call relationships are increasingly complex, and a fault often involves multiple components and levels. Existing root cause positioning methods rely mainly on analyzing the fault call chain topology and quantitative index data: an abnormal score is calculated for each node in the fault call chain topology by a preset scoring algorithm, all nodes are sorted in descending order of abnormal score, and the first several nodes in the sorted result are selected as the root cause positioning result.

However, this approach has the following technical problem: because it relies mainly on the fault call chain topology and quantitative index data, it is difficult to effectively analyze the key fault semantics contained in unstructured text such as logs and alarms. The basis for judging the root cause of the fault is therefore insufficient, the ranking of the nodes in the root cause positioning result deviates from the priority of the true root cause node, and the accuracy and reliability of root cause positioning are reduced.

Disclosure of Invention

To address the above technical problems, the invention adopts the following technical solution.

According to a first aspect of the present invention, there is provided a root cause positioning method based on a large model, the method comprising the following steps:

S1, determining a plurality of initial abnormal nodes based on the call chain graph of a fault and the abnormal index data corresponding to the fault, and acquiring the initial abnormal score corresponding to each initial abnormal node, wherein the initial abnormal nodes are nodes in the call chain graph.
S2, acquiring a key data set based on each initial abnormal node, wherein the key data set comprises a node identifier of each initial abnormal node, a plurality of call chain paths from the starting point of the call chain graph to the initial abnormal node, and the abnormal index data, target abnormal log and historical fault case corresponding to each initial abnormal node.
S3, inputting target prompt words into a large model, the large model screening at least one target abnormal node from all the initial abnormal nodes and outputting an abnormal recognition result, wherein the abnormal recognition result comprises a node identifier of each target abnormal node, a predicted abnormal score corresponding to the node identifier of each target abnormal node, and a node selection description text, and the target prompt words are constructed based on the key data set and a preset prompt word template.
S4, for each target abnormal node, acquiring a target abnormal score corresponding to the target abnormal node based on the initial abnormal score and the predicted abnormal score corresponding to the target abnormal node and the adjustment weight corresponding to the predicted abnormal score, wherein the adjustment weight corresponding to the predicted abnormal score is obtained based on the node selection description text.
S5, sorting all the target abnormal nodes in descending order of target abnormal score to obtain a root cause positioning result.

According to a second aspect of the present invention, there is provided a non-transitory computer readable storage medium having stored therein a computer program that is loaded and executed by a processor to implement the foregoing method.

According to a third aspect of the present invention, there is provided an electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, the processor implementing the foregoing method when executing the computer program.

The invention has at least the following beneficial effects:

The invention provides a root cause positioning method, equipment and medium based on a large model. In the method, a plurality of initial abnormal nodes are determined based on the call chain graph of a fault and the abnormal index data corresponding to the fault, and the initial abnormal score corresponding to each initial abnormal node is obtained; target prompt words are constructed based on a key data set and a preset prompt word template and are input into the large model; and the large model screens at least one target abnormal node out of all the initial abnormal nodes and outputs an abnormal recognition result. The key data set comprises the node identifiers of the initial abnormal nodes, a plurality of call chain paths from the starting point of the call chain graph to the initial abnormal nodes and