CN-121996459-A - Fault analysis method, device, equipment and storage medium based on large model

Abstract

The application discloses a fault analysis method, device, equipment and storage medium based on a large model. The method comprises: generating a target system topology structure corresponding to a target application system based on system state data of at least two subsystems in the target application system and the system call relationships among the subsystems; marking the fault nodes in the target system topology structure and generating fault prompt words according to the fault information of the fault nodes; and inputting the marked target system topology structure and the fault prompt words into a fault analysis model to perform fault analysis on the fault nodes. This technical solution improves the accuracy and efficiency of fault analysis.

Inventors

  • CHENG JIE
  • ZHOU XUKANG

Assignees

  • 北京博睿宏远数据科技股份有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-01-22

Claims (10)

  1. A large model-based fault analysis method, comprising: generating a target system topology structure corresponding to a target application system based on system state data of at least two subsystems in the target application system and the system call relationships among the subsystems, wherein the target application system is an application system in which a system fault has occurred; marking the fault nodes in the target system topology structure, and generating fault prompt words according to the fault information of the fault nodes; and inputting the marked target system topology structure and the fault prompt words into a fault analysis model to perform fault analysis on the fault nodes, wherein the fault analysis model is a pre-trained artificial intelligence model.
  2. The method of claim 1, wherein inputting the marked target system topology structure and the fault prompt words into the fault analysis model to perform fault analysis on the fault nodes comprises: performing semantic analysis and intent recognition on the fault prompt words using a large language model, and determining the fault occurrence position and the intent recognition result corresponding to the fault prompt words; traversing the target system topology structure according to the intent recognition result, with the fault occurrence position as the traversal starting point, and determining at least one candidate fault link; and traversing the link nodes in the candidate fault link, determining the abnormal link nodes whose system state data is abnormal, and generating, using the large language model, at least one candidate fault root cause according to the abnormal link nodes and the traversal starting point.
  3. The method of claim 2, further comprising, after generating the candidate fault root cause: generating a causal analysis instruction according to the candidate fault root cause, inputting the causal analysis instruction into the large language model to perform causal analysis, and generating a causal analysis result, wherein the causal analysis comprises causal logic analysis and causal fact analysis of the candidate fault root cause, the causal logic analysis is used to determine the word-order relation probability between a cause word and an effect word in the candidate fault root cause, and the causal fact analysis is used to determine whether the causal logic between the cause-effect pairs in the candidate fault root cause is supported by evidence; and determining a causal confidence for each candidate fault root cause according to the causal analysis result, and determining the target fault root cause from the candidate fault root causes according to the causal confidence.
  4. The method of claim 2, wherein traversing the link nodes in the candidate fault link comprises: during traversal of the candidate fault link, for the link node currently being traversed, calling the node interface corresponding to that link node and accessing the system state data corresponding to that link node.
  5. The method of claim 1, wherein the marked target system topology structure can be used to characterize the fault occurrence position and the degree of abnormality of a fault node, and the fault propagation path on which the fault node lies.
  6. The method of claim 1, wherein the target system topology structure can be updated in real time, the observation entities of the target system topology structure include hosts, processes, service instances and interfaces, and different observation entities have different system state data.
  7. A large model-based fault analysis apparatus, comprising: a system topology module, configured to generate a target system topology structure corresponding to a target application system based on system state data of at least two subsystems in the target application system and the system call relationships among the subsystems, wherein the target application system is an application system in which a system fault has occurred; a prompt word generation module, configured to mark the fault nodes in the target system topology structure and generate fault prompt words according to the fault information of the fault nodes; and a fault analysis module, configured to input the marked target system topology structure and the fault prompt words into a fault analysis model to perform fault analysis on the fault nodes, wherein the fault analysis model is a pre-trained artificial intelligence model.
  8. An electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the large model-based fault analysis method of any of claims 1-6.
  9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the large model-based fault analysis method of any of claims 1-6.
  10. A computer program product comprising a computer program which, when executed by a processor, implements the large model-based fault analysis method of any of claims 1-6.
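
The traversal recited in claims 2 and 4 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the function names, the dictionary-based call graph, the downstream-only traversal direction, the `max_depth` limit, and the `fetch_state` and `is_abnormal` callbacks are all assumptions made for illustration. The large language model steps (intent recognition and root cause generation) are deliberately left out; the sketch only prepares the data such a model might receive.

```python
from collections import deque


def candidate_fault_links(call_graph, start, max_depth=5):
    """Enumerate call paths that begin at the fault occurrence position.

    Assumption: the intent recognition result asks for downstream traversal;
    the claims leave the traversal direction to that result.
    """
    links = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        # Skip nodes already on the path so cyclic call graphs terminate.
        successors = [n for n in call_graph.get(path[-1], []) if n not in path]
        if not successors or len(path) >= max_depth:
            links.append(path)  # path cannot or may not be extended further
            continue
        for node in successors:
            queue.append(path + [node])
    return links


def abnormal_link_nodes(link, fetch_state, is_abnormal):
    """Visit each link node, read its state through its (hypothetical)
    node interface, and keep the nodes whose state is abnormal."""
    return [node for node in link if is_abnormal(fetch_state(node))]


def root_cause_requests(links, start, fetch_state, is_abnormal):
    """Pair the traversal starting point with the abnormal nodes on each link.

    Each returned item is what a large language model could be asked to
    turn into a candidate fault root cause.
    """
    requests = []
    for link in links:
        abnormal = abnormal_link_nodes(link, fetch_state, is_abnormal)
        if abnormal:
            requests.append(
                {"start": start, "link": link, "abnormal_nodes": abnormal}
            )
    return requests
```

In this sketch `fetch_state` stands in for the node interface call of claim 4, so state data is read lazily, one node at a time, rather than loaded for the whole topology up front.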

Description

Fault analysis method, device, equipment and storage medium based on large model

Technical Field

The embodiments of the application relate to the field of computer technology, and in particular to a fault analysis method, device, equipment and storage medium based on a large model.

Background

With the widespread adoption of cloud-native and microservice architectures, enterprise applications often consist of hundreds or thousands of loosely coupled services, and call links and dependencies grow exponentially. Any single-point failure can spread rapidly along the dependency links, causing "alarm storms" and significantly lengthening the mean time to repair.

Existing fault root cause analysis methods still have obvious shortcomings in rule maintenance, alarm noise reduction, adaptability to dynamic topologies, handling of data drift, evaluation of result interpretability, and other aspects. They struggle to meet the ongoing requirements of modern large-scale distributed systems for high accuracy, low mean time to repair, and low maintenance cost.

Disclosure of Invention

The application provides a fault analysis method, device, equipment and storage medium based on a large model, so as to improve the accuracy and efficiency of fault analysis.

According to an aspect of the present application, there is provided a large model-based fault analysis method, the method including:

generating a target system topology structure corresponding to a target application system based on system state data of at least two subsystems in the target application system and the system call relationships among the subsystems, wherein the target application system is an application system in which a system fault has occurred;

marking the fault nodes in the target system topology structure, and generating fault prompt words according to the fault information of the fault nodes; and

inputting the marked target system topology structure and the fault prompt words into a fault analysis model to perform fault analysis on the fault nodes, wherein the fault analysis model is a pre-trained artificial intelligence model.

According to another aspect of the present application, there is provided a large model-based fault analysis apparatus including:

a system topology module, configured to generate a target system topology structure corresponding to a target application system based on system state data of at least two subsystems in the target application system and the system call relationships among the subsystems, wherein the target application system is an application system in which a system fault has occurred;

a prompt word generation module, configured to mark the fault nodes in the target system topology structure and generate fault prompt words according to the fault information of the fault nodes; and

a fault analysis module, configured to input the marked target system topology structure and the fault prompt words into a fault analysis model to perform fault analysis on the fault nodes, wherein the fault analysis model is a pre-trained artificial intelligence model.

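
The first two steps of the method (building the topology, then marking fault nodes and generating fault prompt words) can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the structure the application prescribes: the `Node` class, the function names, the `is_faulty` predicate, and the wording of the prompt are all hypothetical, and the fault analysis model itself is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One subsystem in the topology (hypothetical structure)."""
    name: str
    state: dict                      # system state data, e.g. {"error_rate": 0.4}
    faulty: bool = False             # set when the node is marked as a fault node
    callees: list = field(default_factory=list)


def build_topology(state_data, call_relations):
    """Create one node per subsystem and link nodes by call relationship."""
    nodes = {name: Node(name, state) for name, state in state_data.items()}
    for caller, callee in call_relations:
        nodes[caller].callees.append(callee)
    return nodes


def mark_fault_nodes(nodes, is_faulty):
    """Mark each node whose state satisfies the assumed fault predicate
    and return the names of the marked nodes."""
    marked = []
    for node in nodes.values():
        node.faulty = is_faulty(node.state)
        if node.faulty:
            marked.append(node.name)
    return marked


def build_fault_prompt(nodes, fault_names):
    """Render the fault information of the marked nodes as prompt text."""
    lines = [f"Fault at {n}: state={nodes[n].state}" for n in fault_names]
    return "Analyze the root cause of the following faults.\n" + "\n".join(lines)
```

The marked topology and the prompt text returned here correspond to the two inputs that the method passes to the fault analysis model.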
According to another aspect of the present application, there is provided an electronic device including: one or more processors; and a memory for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement any of the large model-based fault analysis methods provided by the embodiments of the present application.

According to another aspect of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements any of the large model-based fault analysis methods provided by the embodiments of the present application.

According to another aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements any of the large model-based fault analysis methods provided by the embodiments of the present application.

According to the application, a unified system state representation of the target application system is constructed from the system state data of the subsystems, establishing a unified data entry point for subsequent fault analysis. The analysis no longer relies on the judgment result of any single data dimension. Instead, the cross-dimensional, multi-angle target system topology structure effectively improves the system's ability to perceive and locate complex fault scenarios, so that the large language model generates more reasonable and more complete fault localization explanations during reasoning, which improves the accuracy and efficiency of fault analysis.

Drawings

FIG. 1 is a flow chart of a large model-based fault analysis method provided in accordance with a first embodiment of the present application;

FIG. 2 is a flow chart of a fa