CN-121980531-A - Emergency decision bidirectional deduction method, device and equipment

Abstract

Embodiments of the application provide an emergency decision bidirectional deduction method, device and equipment. In the embodiments, acquired multi-source data is aligned on a common time axis to construct a multi-dimensional feature vector with a unified time sequence. The multi-dimensional feature vector is input into a causal sounding model, in which a dual-track spatio-temporal topology sub-network performs anomaly propagation analysis on the forward influence and the reverse influence between the service-related vectors and the device-related vectors in the multi-dimensional feature vector to obtain an anomaly score list. Based on an inverse-state deduction core sub-network in the causal sounding model, a virtual intervention test is designed, in a digital twin mirror environment of the target system, for candidate root cause nodes determined from the anomaly score list, and a probability-weighted fault-graded causal path is generated based on the dynamic influence of the virtual intervention test on key indicators of the target system.

Inventors

  • WANG DONGFANG
  • ZHANG YAPU
  • Jia Yewu
  • FAN YIMENG
  • HAN YANKUN
  • WANG PULING
  • XU DONGLIANG

Assignees

  • China Mobile Online Services Co., Ltd. (中移在线服务有限公司)
  • China Mobile Communications Group Co., Ltd. (中国移动通信集团有限公司)

Dates

Publication Date
2026-05-05
Application Date
2026-01-26

Claims (10)

  1. An emergency decision bidirectional deduction method, characterized by comprising the following steps: collecting device monitoring data, device attribute data, service dependency data and service flow data of a target system, and aligning the collected multi-source data on a common time axis to construct a multi-dimensional feature vector with a unified time sequence; inputting the multi-dimensional feature vector into a causal sounding model, and performing, based on a dual-track spatio-temporal topology sub-network in the causal sounding model, anomaly propagation analysis on the forward influence and the reverse influence between the service-related vectors and the device-related vectors in the multi-dimensional feature vector, to obtain an anomaly score list; and, based on an inverse-state deduction core sub-network in the causal sounding model, designing, in a digital twin mirror environment of the target system, a virtual intervention test for candidate root cause nodes determined from the anomaly score list, and generating a probability-weighted fault-graded causal path based on the dynamic influence of the virtual intervention test on key indicators of the target system.
  2. The method of claim 1, wherein performing the anomaly propagation analysis on the forward influence and the reverse influence between the service-related vectors and the device-related vectors in the multi-dimensional feature vector, based on the dual-track spatio-temporal topology sub-network in the causal sounding model, to obtain the anomaly score list comprises: based on a physical layer of the dual-track spatio-temporal topology sub-network, performing time-series modeling and anomaly aggregation on the device-related vectors, and outputting a device sequence carrying anomaly signals; based on a service layer of the dual-track spatio-temporal topology sub-network, performing graph convolution propagation and anomaly aggregation on the service-related vectors, and outputting an anomaly score sequence of instance nodes; and, based on a coupling layer of the dual-track spatio-temporal topology sub-network, bidirectionally propagating and fusing the device sequence carrying anomaly signals and the anomaly score sequence of the instance nodes to generate the anomaly score list.
  3. The method of claim 2, wherein the bidirectional propagation and fusion based on the coupling layer of the dual-track spatio-temporal topology sub-network to generate the anomaly score list comprises: calculating, through a cross-attention mechanism in the coupling layer, forward propagation weights from the physical layer to the service layer and backward propagation weights from the service layer to the physical layer; based on the forward propagation weights, transmitting the anomaly signals in the device sequence carrying anomaly signals to the related service layer nodes; based on the backward propagation weights, transmitting the anomaly scores in the anomaly score sequence of the instance nodes to the related physical layer nodes; and fusing the cross-layer anomaly information received by each node with that node's intra-layer anomaly information to generate a unified anomaly score list.
  4. The method of claim 3, wherein the cross-layer propagation mechanism comprises: propagating an anomaly signal along a forward propagation path, in the direction from device layer nodes to service layer nodes, based on time-series characteristics of the physical state of the devices, so as to model the causal effect of the anomaly signal on service indicators; and propagating an anomaly signal along a reverse propagation path, in the direction from service layer nodes to device layer nodes, based on contextual characteristics of the service logic, so as to capture the reverse effect of a service anomaly on the underlying devices.
  5. The method of claim 4, wherein a graph convolution operation is performed on a service layer sub-graph of a heterogeneous hierarchical graph structure in the dual-track spatio-temporal topology sub-network; after each graph convolution operation, cross-attention is computed between the current service layer node feature from the forward propagation path and the corresponding service layer node feature vector from the reverse propagation path; and, based on the result of the cross-attention computation, a comprehensive representation of the service layer node is updated to identify latent dual-track dependency patterns between the device-related vectors and the service-related vectors.
  6. The method of claim 1, wherein designing, based on the inverse-state deduction core sub-network in the causal sounding model and in the digital twin mirror environment of the target system, the virtual intervention test for the candidate root cause nodes determined from the anomaly score list comprises: based on the inverse-state deduction core sub-network in the causal sounding model, creating, in the digital twin mirror environment of the target system, a system mirror copy containing the current device resource state and service flow state; performing, in the system mirror copy, virtual intervention operations on the candidate root cause nodes determined from the anomaly score list, the operations including forward repair, reverse corruption, and combined intervention; recording the distribution changes of key indicators in the system mirror copy before and after the virtual intervention operations are performed; and quantifying, based on the distribution changes, an intervention effect value of the virtual intervention operations.
  7. The method of claim 6, wherein generating the probability-weighted fault-graded causal path comprises: constructing a confounding relationship graph and a corresponding feature matrix based on historical fault case data and the features related to the candidate root cause nodes; using the causal judgment results of the historical cases in the historical fault case data as labels, performing confounder importance analysis on the feature matrix to obtain confounder importance scores; combining the intervention effect value with the confounder importance scores to determine a final causal probability weight for each candidate root cause node; and generating the probability-weighted fault-graded causal path based on the final causal probability weights.
  8. The method of claim 1, wherein the method further comprises: inputting the fault-graded causal path into a virtual-real fusion modeling scenario; in the virtual-real fusion scenario, reconstructing the production environment topology of the target system to scale by means of containerization technology based on the fault-graded causal path, and, using a portion of real-time traffic cloned from the target system, injecting a simulated fault corresponding to the fault-graded causal path; loading a plurality of preset emergency handling strategy templates and deducing them in parallel in the reconstructed simulation environment; and recording and analyzing the performance indicators of each strategy template during the deduction, and outputting multiple groups of deduction schemes that include three-dimensional evaluation indicators of service recovery degree, resource consumption ratio and operation complexity.
  9. The method of claim 8, further comprising: inputting the fault-graded causal path into an L1 agent, which decomposes it into a prioritized sequence of emergency sub-goals; inputting the emergency sub-goal sequence and the multiple groups of deduction schemes into an L2 agent; and generating, by the L2 agent, a target emergency strategy through an ecological cluster evolution game mechanism within a base strategy source pool obtained by fusing a historical strategy library with fragments of the deduction schemes, taking as the optimization target a three-dimensional fitness defined according to the three-dimensional evaluation indicators of the deduction schemes, wherein the three-dimensional fitness comprises service recovery degree, resource consumption ratio and operation complexity.
  10. The method of claim 9, wherein generating the target emergency strategy through the ecological cluster evolution game mechanism comprises: constructing, by the L2 agent, a candidate strategy tree based on the base strategy source pool; classifying the strategies in the candidate strategy tree into a conservative ecological cluster, an aggressive ecological cluster and a hybrid ecological cluster; based on the three-dimensional fitness, performing the evolution operations of crossover, mutation and forced reset on the strategies in each ecological cluster; and dynamically adjusting the proportion of each ecological cluster during evolution according to the average fitness of the strategies in that cluster, and outputting the strategy with the highest fitness found during evolution as the target emergency strategy.
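
The bidirectional propagation and fusion recited in claim 3 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the node embeddings, the use of plain scaled dot-product attention without learned projections, the blending factor `alpha`, and the function names `attention_weights` and `fuse_scores` are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_weights(queries, keys):
    """Row i holds the scaled dot-product attention of query i over all keys."""
    scale = math.sqrt(len(keys[0]))
    return [softmax([dot(q, k) / scale for k in keys]) for q in queries]


def fuse_scores(device_emb, device_score, service_emb, service_score, alpha=0.5):
    """Blend each node's own anomaly score with the score it receives across layers.

    alpha (assumed) weights the intra-layer score; 1 - alpha weights the
    cross-layer score. Returns (node_name, fused_score) pairs, highest first.
    """
    # Forward weights (physical layer -> service layer): each service node
    # attends over the device nodes.
    forward = attention_weights(service_emb, device_emb)
    # Backward weights (service layer -> physical layer): each device node
    # attends over the service nodes.
    backward = attention_weights(device_emb, service_emb)

    service_cross = [dot(w, device_score) for w in forward]
    device_cross = [dot(w, service_score) for w in backward]

    fused = {}
    for i, (own, cross) in enumerate(zip(device_score, device_cross)):
        fused[f"device-{i}"] = alpha * own + (1 - alpha) * cross
    for j, (own, cross) in enumerate(zip(service_score, service_cross)):
        fused[f"service-{j}"] = alpha * own + (1 - alpha) * cross
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Toy input: two device nodes and three service instance nodes.
ranking = fuse_scores(
    device_emb=[[1.0, 0.0], [0.0, 1.0]],
    device_score=[0.9, 0.1],
    service_emb=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    service_score=[0.2, 0.1, 0.3],
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy input the strongly anomalous device node stays at the top of the list, while the service nodes whose embeddings are closest to it are lifted by the forward weights. A real coupling layer would learn the query and key projections rather than use raw embeddings.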

Description

Emergency decision bidirectional deduction method, device and equipment

Technical Field

The present application relates to the field of wireless communications technologies, and in particular, to a method, an apparatus, and a device for bidirectional deduction of an emergency decision.

Background

With the deepening of enterprise digital transformation, the complexity, scale and business dependence of IT systems keep increasing, and their stable and continuous operation has become a key component of an enterprise's core competitiveness. IT operation and maintenance, particularly emergency handling, serves as a core line of defense for service continuity, system stability and data security, and its value has been upgraded from basic technical assurance to strategic service support. When sudden faults occur, efficient emergency response capability directly determines the duration of the service interruption and the scale of the economic loss, which drives operation and maintenance technology to evolve faster toward intelligence, automation and high reliability.
At present, mainstream IT operation and maintenance emergency technology shows a trend toward deep integration of intelligence and automation, and falls into two types of methods. The first is automated response and log analysis: tools collect logs centrally and perform association analysis, fault recovery procedures are triggered automatically according to preset rules, and call chain tracing is used to locate abnormal nodes in a micro-service architecture. The second is AI-driven prediction and diagnosis: machine learning models are trained on historical operation and maintenance data to predict hardware faults or performance bottlenecks and to generate repair suggestions automatically. However, when facing increasingly complex cross-layer and dynamic fault scenarios, the prior art still has significant shortcomings, particularly in fault root cause localization. Existing methods generally rely on static topological association or statistical correlation models, so the modeling dimension is single and the influence between the physical state of devices and upper-layer service logic is difficult to quantify accurately. The located root cause is merely a suspicious node under statistical association, which includes a large amount of spurious-correlation interference, lacks physical interpretability and verifiable reliability, and leads to a high mislocalization rate.
Disclosure of Invention

The application provides a method, a device and equipment for bidirectional deduction of emergency decisions, which address the following problems in the related art: fault root causes are located by relying on static topological association or statistical correlation models; the modeling dimension is single; the influence between the physical state of devices and upper-layer service logic is difficult to quantify accurately; and the located root cause is merely a suspicious node under statistical association, which includes a large amount of spurious-correlation interference, lacks physical interpretability and verifiable reliability, and leads to a high mislocalization rate.

An embodiment of the application provides an emergency decision bidirectional deduction method, which comprises the following steps: collecting device monitoring data, device attribute data, service dependency data and service flow data of a target system, and aligning the collected multi-source data on a common time axis to construct a multi-dimensional feature vector with a unified time sequence; inputting the multi-dimensional feature vector into a causal sounding model, and performing, based on a dual-track spatio-temporal topology sub-network in the causal sounding model, anomaly propagation analysis on the forward influence and the reverse influence between the service-related vectors and the device-related vectors in the multi-dimensional feature vector, to obtain an anomaly score list; and, based on an inverse-state deduction core sub-network in the causal sounding model, designing, in a digital twin mirror environment of the target system, a virtual intervention test for candidate root cause nodes determined from the anomaly score list, and generating a probability-weighted fault-graded causal path based on the dynamic influence of the virtual intervention test on key indicators of the target system.
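
The time axis alignment in the first step can be sketched as follows. This is an illustrative Python example under stated assumptions, not the method fixed by the application: the input layout (a mapping from source name to sorted `(timestamp, value)` pairs), the fixed-step grid, the last-observation-carried-forward rule, and the name `align_to_grid` are all hypothetical.

```python
from bisect import bisect_right


def align_to_grid(sources, start, end, step):
    """Align several timestamped series onto one uniform time grid.

    sources: dict mapping a source name to a list of (timestamp, value)
             pairs sorted by timestamp (assumed input layout).
    Returns (names, grid, rows), where rows[i] is the feature vector at
    grid[i], ordered like names. Each feature takes the most recent sample
    at or before the grid time; None means no sample has arrived yet.
    """
    names = sorted(sources)
    # Precompute the timestamp list of every source for binary search.
    times = {name: [ts for ts, _ in sources[name]] for name in names}
    grid = list(range(start, end + 1, step))
    rows = []
    for t in grid:
        row = []
        for name in names:
            idx = bisect_right(times[name], t) - 1
            row.append(sources[name][idx][1] if idx >= 0 else None)
        rows.append(row)
    return names, grid, rows


# Toy input: one device metric and one service metric sampled at different times.
sources = {
    "device_cpu": [(0, 0.20), (7, 0.90)],
    "service_latency_ms": [(3, 120.0), (10, 480.0)],
}
names, grid, rows = align_to_grid(sources, start=0, end=10, step=5)
for t, row in zip(grid, rows):
    print(t, dict(zip(names, row)))
```

Carrying the last observation forward keeps every row on the same timestamps, so device-related and service-related features can be compared at each step. Other choices, such as interpolation or windowed aggregation, would fit the same interface.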
The embodiment of the application also provides an emergency decision bidirectional deduction device, which comprises: The feature vector construction module is used for acquiring equipment monitoring data, equipment attribute data, service dependency relationship data and service flow data of the target