CN-121711265-B - Verification cascade failure early warning method and device for complex simulation system


Abstract

The invention relates to a verification cascade failure early warning method and device for a complex simulation system. The method comprises: constructing a multi-layer check network according to the various check tasks of the complex simulation system; dividing each check task into a plurality of consecutive time slices according to the multi-layer check network, wherein each time slice corresponds to one complete run and index acquisition process of the complex simulation system; calculating the state transition probabilities among the nodes of the multi-layer check network and using a Bayesian network to obtain the failure probability of each check node state in the multi-layer check network; performing failure evaluation on the joint states of the check nodes on key propagation paths to obtain a joint failure probability; and constructing a three-level early warning mechanism according to the failure probability and the joint failure probability, so that early warning state samples are iteratively learned by an optimization evaluation algorithm to generate a cascade failure early warning scheme. The method can improve the verification efficiency and adaptability of the complex simulation system.

Inventors

JIA CHUNBO
WANG BINGLIN
DONG JIE
LIU JIAN
SUN JIANGUO
LIU LONGBO
LIU ZHENMING
CHEN XUGUANG
WANG SHUO

Assignees

National University of Defense Technology, Chinese People's Liberation Army (中国人民解放军国防科技大学)

Dates

Publication Date: 20260508
Application Date: 20260212

Claims (7)

1. A verification cascade failure early warning method for a complex simulation system, characterized in that the method comprises: constructing a multi-layer check network according to the various check tasks of the complex simulation system; dividing each check task into a plurality of consecutive time slices according to the multi-layer check network, wherein each time slice corresponds to one complete run and index acquisition process of the complex simulation system; calculating the state transition probabilities among the nodes of the multi-layer check network, and using a Bayesian network to obtain the failure probability of each check node state in the multi-layer check network; performing failure evaluation on the joint states of the check nodes on a key propagation path to obtain a joint failure probability; and constructing a three-level early warning mechanism according to the failure probability and the joint failure probability, so that early warning state samples are iteratively learned by an optimization evaluation algorithm to generate a cascade failure early warning scheme; wherein the three-level early warning mechanism comprises a node-level early warning, a path-level early warning and a system-level early warning; when the failure probability of a check node exceeds a preset threshold and the error index corresponding to that check node shows an ascending trend, the node-level early warning is triggered: [formula omitted in source] wherein the formula is defined in terms of: an error-intensity normalization function; the preset threshold for the failure probability; the check link index of the i-th check node in the current check period; the check link index of the i-th check node in the preceding check period; and the minimum variation of the error index; the accumulated risk value of the key propagation path is calculated according to the propagation strength weights in the multi-layer check network and the failure probability of each check node: [formula omitted in source] wherein the formula is defined in terms of: the accumulated risk value of the key propagation path; the propagation strength weight between the j-th check node and the k-th check node; and the failure probability of the k-th check node in the time slice; when the accumulated risk value exceeds a path-level threshold, it is determined that the current key propagation path is in a high-risk failure condition, the path-level early warning is triggered, and the path is identified as a high-risk error conduction channel; a system instability index is generated by weighting and aggregating the failure probabilities of all check nodes: [formula omitted in source] wherein the formula is defined in terms of: the system instability index; and the weight of the i-th check node; when the system instability index exceeds a system-level threshold, the system-level early warning is triggered and an emergency intervention flow is started; and the state variables of all check nodes that triggered the node-level early warning, the path-level early warning and the system-level early warning are taken as early warning state samples, and the early warning state samples are iteratively learned by an optimization evaluation algorithm to generate the cascade failure early warning scheme.
2. The method of claim 1, wherein constructing the multi-layer check network according to the various check tasks of the complex simulation system comprises: designing, according to the various check tasks of the complex simulation system, a multi-layer check network consisting of a plurality of check nodes and directed edges, wherein the check nodes comprise a code implementation check node, an algorithm convergence check node, a numerical error control check node, a physical consistency check node and an uncertainty quantification check node, each corresponding to one of the check tasks; and each directed edge takes the check period of the corresponding check task as its propagation unit, represents the direction of error propagation, and is assigned a propagation strength weight and a propagation delay parameter.
3. The method of claim 1, wherein dividing each check task into a plurality of consecutive time slices according to the multi-layer check network, each time slice corresponding to one complete run and index acquisition process of the complex simulation system, comprises: dividing each check task into a plurality of consecutive time slices according to the state variables of the check nodes in the multi-layer check network: [formula omitted in source] wherein the state variable of the i-th check node in the time slice takes one of three values: a first value indicating that the check link index of the current time slice meets the preset threshold and no obvious problem exists; a second value indicating that the check link index of the current time slice slightly exceeds the preset threshold and recovery by local adjustment is possible; and a third value indicating that the check link index of the current time slice deviates seriously from the preset threshold, or that two consecutive periods are abnormal, or that a key criterion is not met, so that the verification function is lost; each time slice corresponds to one complete run and index acquisition process of the complex simulation system, and the state of the complex simulation system is updated at the end of each time slice.
4. The method of claim 3, wherein calculating the state transition probabilities among the nodes of the multi-layer check network comprises: calculating the state transition probabilities among the nodes of the multi-layer check network using a two-time-slice transition network: [formula omitted in source] wherein the formula is defined in terms of: the state transition probability; the state variable of the i-th check node in the time slice; the target state variable of the current check node; the set of parent nodes of the state variable; a multi-class probability normalization function; the intrinsic bias term of the i-th check node; the propagation strength weight between the j-th check node and the i-th check node; a state mapping function; the propagation delay between the j-th check node and the i-th check node; [one definition illegible in source]; an error-intensity normalization function; the check link index of the i-th check node; and the i-th check node.
5. The method of claim 4, wherein using a Bayesian network to obtain the failure probability of each check node state in the multi-layer check network comprises: normalizing the multi-source observation indexes of each check node collected at the end of each check period and then using the Bayesian network to calculate the failure probability of the check node state in the multi-layer check network: [formula omitted in source] wherein the formula is defined in terms of: the failure probability; the state variable of the i-th check node in the time slice; and the observation sequence from the 1st check period to the current check period.
6. The method of claim 5, wherein performing failure evaluation on the joint states of the check nodes on the key propagation path to obtain the joint failure probability further comprises: for a key propagation path, identified with the Bayesian network, along which an error spreads through a specific chain of links, calculating the joint failure probability of all check nodes on the key propagation path using joint distribution decomposition and a message passing algorithm: [formula omitted in source] wherein the formula is defined in terms of the state variables of the 1st, 2nd, 3rd and 4th check nodes in the time slice.
7. A verification cascade failure early warning device for a complex simulation system, characterized in that the device comprises: a check network construction module, configured to construct a multi-layer check network according to the various check tasks of the complex simulation system; a time slice dividing module, configured to divide each check task into a plurality of consecutive time slices according to the multi-layer check network, wherein each time slice corresponds to one complete run and index acquisition process of the complex simulation system; a node failure verification module, configured to calculate the state transition probabilities among the nodes of the multi-layer check network and to use a Bayesian network to obtain the failure probability of each check node state in the multi-layer check network; a path failure verification module, configured to perform failure evaluation on the joint states of the check nodes on a key propagation path to obtain a joint failure probability; and an early warning evaluation module, configured to construct a three-level early warning mechanism according to the failure probability and the joint failure probability, so that early warning state samples are iteratively learned by an optimization evaluation algorithm to generate a cascade failure early warning scheme; wherein the three-level early warning mechanism comprises a node-level early warning, a path-level early warning and a system-level early warning; when the failure probability of a check node exceeds a preset threshold and the error index corresponding to that check node shows an ascending trend, the node-level early warning is triggered: [formula omitted in source] wherein the formula is defined in terms of: an error-intensity normalization function; the preset threshold for the failure probability; the check link index of the i-th check node in the current check period; and the check link index of the i-th check node in the preceding check period; the accumulated risk value of the key propagation path is calculated according to the propagation strength weights in the multi-layer check network and the failure probability of each check node: [formula omitted in source] wherein the formula is defined in terms of: the accumulated risk value of the key propagation path; the propagation strength weight between the j-th check node and the k-th check node; and the failure probability of the k-th check node in the time slice; when the accumulated risk value exceeds a path-level threshold, it is determined that the current key propagation path is in a high-risk failure condition, the path-level early warning is triggered, and the path is marked as a high-risk error conduction channel; a system instability index is generated by weighting and aggregating the failure probabilities of all check nodes: [formula omitted in source] wherein the formula is defined in terms of: the system instability index; and the propagation strength weight of the i-th check node; when the system instability index exceeds a system-level threshold, the system-level early warning is triggered and an emergency intervention flow is started; and the state variables of all check nodes corresponding to the node-level early warning, the path-level early warning and the system-level early warning are taken as early warning state samples, and the early warning state samples are iteratively learned by an optimization evaluation algorithm to generate the cascade failure early warning scheme.
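
The node-level failure probability described in claims 3 to 5 can be illustrated with a minimal Python sketch. The formulas themselves are not reproduced in this text, so everything below is an assumption made for illustration: the state labels (`normal`, `degraded`, `failed`), the function names, the way upstream pressure enters the scores, and the use of a softmax as the multi-class probability normalization function. It is not the claimed formula.

```python
import math

# Illustrative state labels; the source does not name the three states.
STATES = ("normal", "degraded", "failed")


def softmax(scores):
    """Multi-class probability normalization over raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transition_probs(bias, parent_weights, parent_severities):
    """Assumed transition model for one check node.

    bias: intrinsic bias per state, in the order of STATES.
    parent_weights: propagation strength weight of each incoming edge.
    parent_severities: mapped (already delayed) parent states in [0, 1].
    """
    pressure = sum(w * s for w, s in zip(parent_weights, parent_severities))
    # Assumption: upstream pressure boosts worse states in proportion to rank.
    scores = [b + rank * pressure for rank, b in enumerate(bias)]
    return softmax(scores)


def bayes_update(predicted, likelihood):
    """Posterior over states: predicted prior times observation likelihood."""
    unnormalized = [p * l for p, l in zip(predicted, likelihood)]
    evidence = sum(unnormalized)
    return [u / evidence for u in unnormalized]


def failure_probability(bias, parent_weights, parent_severities, likelihood):
    """Probability of the 'failed' state after one predict-and-update step."""
    predicted = transition_probs(bias, parent_weights, parent_severities)
    posterior = bayes_update(predicted, likelihood)
    return posterior[STATES.index("failed")]
```

Calling `failure_probability` once per time slice, with the likelihood derived from the normalized observation indexes of that slice, gives a per-node value that the three-level early warning mechanism can consume.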

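The three-level early warning mechanism of claim 1 can likewise be sketched in Python. The claimed formulas are not reproduced in this text, so the function names, the identity default for the error-intensity normalization, the edge-wise sum used for the path risk, and the division by the total weight in the system index are all assumptions for illustration only.

```python
def node_alert(p_fail, index_now, index_prev, p_threshold, min_delta,
               normalize=lambda x: x):
    """Node level: failure probability over threshold AND a rising error index."""
    rising = normalize(index_now) - normalize(index_prev) > min_delta
    return p_fail > p_threshold and rising


def path_risk(path, edge_weights, p_fail):
    """Path level: sum of edge weight times failure probability of the edge target.

    path: ordered list of check node names along the key propagation path.
    edge_weights: {(upstream, downstream): propagation strength weight}.
    """
    return sum(edge_weights[(j, k)] * p_fail[k] for j, k in zip(path, path[1:]))


def system_instability(p_fail, node_weights):
    """System level: weighted aggregate of all node failure probabilities.

    Dividing by the total weight is an assumption that keeps the index in [0, 1].
    """
    total = sum(node_weights[n] for n in p_fail)
    return sum(node_weights[n] * p_fail[n] for n in p_fail) / total


# Hypothetical example values, not taken from the source.
p_fail = {"code": 0.2, "convergence": 0.6, "physics": 0.7}
edge_weights = {("code", "convergence"): 0.5, ("convergence", "physics"): 0.8}
node_weights = {"code": 1.0, "convergence": 1.0, "physics": 2.0}

risk = path_risk(["code", "convergence", "physics"], edge_weights, p_fail)
instability = system_instability(p_fail, node_weights)
```

Comparing `risk` with a path-level threshold and `instability` with a system-level threshold yields the path-level and system-level warnings; the state variables of the nodes involved would then form the early warning state samples.
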
Description

Verification cascade failure early warning method and device for complex simulation system

Technical Field

The invention relates to the technical field of simulation system verification, in particular to a verification cascade failure early warning method and device for a complex simulation system.

Background

In the digital engineering practice of complex systems with high security levels, model reliability is a fundamental premise for supporting design optimization, state monitoring and emergency decision-making. The existing model verification and validation (V&V) framework covers multiple links, such as code implementation verification, algorithm convergence analysis, numerical error control, physical consistency verification and uncertainty quantification. With the development of automated testing and continuous integration (CI/CD) technology, some systems can already execute verification tasks as a pipeline and output structured indicators (such as the grid convergence index GCI, fidelity indexes and posterior coverage).

However, existing verification systems still have a significant limitation: each verification link is treated as a static, isolated check point, and the dynamic characteristics of the verification flow are not systematically monitored or managed. In fact, the verification process is essentially a strongly coupled, nonlinear, delayed dynamic process. A minor error in an upstream link may be conducted and amplified through multiple links and ultimately cause the downstream verification to fail completely, which is the "cascade failure" phenomenon. For example, a boundary condition error in the code implementation may not be exposed immediately, but may induce numerical anomalies in the subsequent algorithm convergence check, which in turn leads to an overall misalignment of the model during the physical verification phase, forming a typical cascade failure of the verification process.
In recent years, some studies have attempted to introduce graph neural networks or rule engines to monitor the verification state, but these methods have difficulty handling observation noise, model uncertainty and temporal causality effectively. Although existing V&V technology is mature in its local verification capability, it still has the following key defects:

1. Verification links operate in isolation and lack whole-process monitoring. Each verification task is executed independently and cannot sense that an anomaly in an upstream link is being transmitted downstream, so problems are discovered late; backtracking analysis is often carried out only after the final model fails verification, which is inefficient.
2. Dynamic risk modeling is lacking. The dependency relationships and influence strengths among the verification links are not quantified, so the knock-on impact of a single link failure on overall verification reliability is difficult to evaluate.
3. Early warning capability is lacking. Existing systems generally raise an alarm only after verification has failed and cannot provide prediction of risk transmission paths or graded early warning.
4. Active blocking capability is lacking. Even when a problem is discovered, no automatic strategy guides the allocation of limited verification resources to repair high-risk links first, so intervention is costly and ineffective.

Disclosure of Invention

Based on the above, it is necessary to provide a verification cascade failure early warning method and device for a complex simulation system that can improve the verification efficiency and adaptability of the complex simulation system.

A verification cascade failure early warning method for a complex simulation system comprises the following steps:

Constructing a multi-layer check network according to the various check tasks of the complex simulation system.

Dividing each check task into a plurality of consecutive time slices according to the multi-layer check network, wherein each time slice corresponds to one complete run and index acquisition process of the complex simulation system.

Calculating the state transition probabilities among the nodes of the multi-layer check network, and using a Bayesian network to obtain the failure probability of each check node state in the multi-layer check network.

Performing failure evaluation on the joint states of the check nodes on key propagation paths to obtain a joint failure probability.

Constructing a three-level early warning mechanism according to the failure probability and the joint failure probability, so that early warning state samples are iteratively learned by an optimization evaluation algorithm to generate a cascade failure early warning scheme.

A verification cascade failure early warning device for a complex simulation system, the device comprising: and the verification network construction module is used for construc