CN-121980302-A - Intelligent operation and maintenance and fault self-healing method for industrial software cloud base based on digital twin
Abstract
The invention discloses a digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base, and relates to the technical field of digital twins. A layered digital twin model is constructed and dynamically updated and, combined with independent visualization modules for the physical layer, the logic layer and the application layer, accurately maps and visually displays the operating state of the cloud base. Reference values in a dynamic reference library are calibrated in real time, so that anomalies in each layer are identified quickly and graded early warnings are generated, which improves the timeliness of anomaly response. A dynamically updated fault knowledge graph is constructed from historical fault data and combined with multidimensional association analysis and a confidence verification mechanism to locate a unique fault cause, which shortens the fault diagnosis cycle and improves the accuracy of cause location. A strategy library is constructed from the fault knowledge graph and historical self-healing cases; candidate strategies are verified by simulation on the digital twin model to ensure that they are effective before they are issued and executed, thereby achieving intelligent, automatic fault repair and keeping the cloud base running stably.
Inventors
- WANG RUILI
- WANG HONGLEI
- FENG LIMING
- ZHAO SHUO
- LIN LUAN
- LI JIMIN
Assignees
- 内蒙古科学技术研究院
- 北京十沣科技有限公司
Dates
- Publication Date: 20260505
- Application Date: 20251222
Claims (10)
- 1. A digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base, characterized by comprising the following steps: acquiring physical structure data and real-time running state data of the industrial software cloud base, constructing a layered digital twin model of the cloud base, and dynamically updating the layered digital twin model by combining geometric parameters, performance parameters and topological relations of the physical base; based on the constructed layered digital twin model, mapping the actual running state of the industrial software cloud base in real time and carrying out simulation deduction, and constructing a visual monitoring interface for visual display, wherein the visual monitoring interface follows the physical layer, logic layer and application layer architecture of the layered digital twin model and contains three corresponding independent visualization modules, state data of the three visualization modules are updated synchronously based on a model precision calibration period, abnormal states deviating from a normal range are identified, and hierarchical early warning information is generated; constructing a fault knowledge graph, extracting abnormal index data corresponding to the abnormal state, performing multidimensional data association analysis, screening entity data associated with the abnormal index, generating a fault analysis data set, and locating a unique fault cause based on the fault knowledge graph and the fault analysis data set; and constructing a self-healing strategy library based on the cause nodes of the fault knowledge graph and historical self-healing cases, matching a self-healing strategy according to the located fault cause, inputting the self-healing strategy into the layered digital twin model for simulation verification, obtaining an effective self-healing strategy, and issuing the effective self-healing strategy to the cloud base for execution.
- 2. The digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base as claimed in claim 1, wherein constructing the layered digital twin model of the cloud base comprises the following steps: extracting physical structure data of the industrial software cloud base, wherein the physical structure data comprise physical structure parameters and hardware performance parameters of hardware components, and constructing a geometric simulation model of the cloud base by combining network topology data of the cloud base to generate physical layer state data; collecting the virtualized resource configuration, network routing rules, data transmission protocols and software dependency relations of the cloud base, constructing a logic topology model of the cloud base, simulating the resource scheduling flow, data transmission paths and software interaction logic, and generating logic layer configuration data; acquiring the deployment architecture, service call links, business process configuration and performance index thresholds of the industrial software, constructing an application running model of the cloud base, and associating the physical layer data with the logic layer configuration data; and establishing a mapping relation between the business performance indexes and both the state data of the geometric simulation model and the configuration data of the logic topology model, to generate the layered digital twin model of the cloud base.
- 3. The digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base as claimed in claim 2, wherein constructing the layered digital twin model of the cloud base further comprises: establishing a precision calibration rule for the geometric simulation model, and acquiring real-time hardware operation parameters from the physical layer data at a preset extraction time interval; comparing the real-time hardware operation parameters with the hardware operation state of the cloud base output by the geometric simulation model, and correcting the hardware performance parameter mapping coefficient of the geometric simulation model based on the comparison result; acquiring the virtualized resource configuration and software dependency relations of the cloud base, and constructing a configuration data check library based on the dependency relations; extracting configuration data of the logic topology model, comparing the configuration data with the rules of the configuration data check library, and generating configuration correction suggestions; when the virtualized resources of the cloud base change, automatically updating the resource scheduling simulation parameters and data transmission path parameters of the logic topology model; and establishing a performance index threshold cooperative adjustment mechanism based on the mapping relation, and, when the hardware performance parameters of the physical layer model or the resource configuration of the logic topology model change, recalculating the business performance index thresholds of the application layer model through the cooperative adjustment mechanism and synchronously updating the business performance monitoring rules of the application layer model.
- 4. The digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base as claimed in claim 3, wherein correcting the hardware performance parameter mapping coefficient of the geometric simulation model based on the comparison result comprises the following steps: extracting the current hardware operation state simulation value output by the geometric simulation model; retrieving the current real-time hardware operation parameter corresponding to the current hardware operation state simulation value, and taking the absolute difference between the two to obtain the current absolute error value; dividing the absolute error value by the current real-time hardware operation parameter to obtain an error rate; retrieving a preset basic correction step $\alpha$ from a database, wherein $0 < \alpha \le 0.5$; retrieving the error rate from the previous correction of the hardware performance parameter mapping coefficient as the previous error rate; adjusting the preset basic correction step $\alpha$ by using the error rate and the previous error rate to obtain an adjusted correction step $\alpha_t$; and correcting the hardware performance parameter mapping coefficient by using the adjusted correction step $\alpha_t$.
- 5. The digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base as claimed in claim 4, wherein correcting the hardware performance parameter mapping coefficient by using the adjusted correction step $\alpha_t$ comprises: retrieving the hardware operation state simulation value output by the geometric simulation model and the real-time hardware operation parameter from the previous correction of the hardware performance parameter mapping coefficient, and using them as a reference hardware operation state simulation value and a reference real-time hardware operation parameter; obtaining an error trend factor $r$ by combining the reference hardware operation state simulation value and the reference real-time hardware operation parameter with the current real-time hardware operation parameter and the current hardware operation state simulation value; retrieving the adjusted correction step $\alpha_t$ and the coefficient value $k_0$ that the hardware performance parameter mapping coefficient took after the previous correction; and correcting the hardware performance parameter mapping coefficient, starting from the coefficient value $k_0$, by using the error trend factor $r$ and the adjusted correction step $\alpha_t$.
- 6. The digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base as claimed in claim 1, wherein identifying abnormal states deviating from the normal range comprises the following steps: based on the data characteristics of the physical layer, logic layer and application layer of the layered digital twin model, combining historical operation data with industry standards to establish a dynamic reference library for each layer; and associating the physical layer hardware performance parameters, logic layer configuration data and application layer business performance index thresholds of the layered digital twin model with the dynamic reference library of the corresponding layer, and calibrating the reference values of each dynamic reference library in real time.
- 7. The digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base as claimed in claim 6, wherein identifying abnormal states deviating from the normal range further comprises: acquiring physical layer data of the layered digital twin model, comparing the physical layer data with the reference value of the physical layer dynamic reference library, and marking a physical layer parameter abnormality if the data exceed the reference value for a plurality of consecutive acquisition periods; based on the physical layer geometric simulation model of the layered digital twin model, monitoring target operation states of the cloud base hardware devices, including a device offline state, an interface connection interruption state or a physical position deviation state, and marking a physical layer device state abnormality if any target operation state is detected; acquiring logic layer resource occupation data and transmission link data of the layered digital twin model, comparing them with the reference values of the logic layer dynamic reference library, and marking a logic layer resource abnormality or transmission abnormality if the resource occupation, transmission delay or packet loss rate exceeds the reference value or the transmission link path deviates from the preset route; extracting the rules of the configuration data check library, performing a compliance check on the logic layer configuration data, and marking a logic layer configuration abnormality if the configuration violates the rules or a dependent component is missing; acquiring application layer business performance data of the layered digital twin model, comparing it with the reference value of the application layer dynamic reference library, and marking an application layer performance abnormality if the business performance data exceed the reference value or show an abnormal trend; and monitoring the state of the application layer call link, including microservice call failure, call timeout and link interruption, in combination with state data showing no abnormality in the physical layer and the logic layer, and marking an application layer link abnormality if any such link state problem exists.
- 8. The digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base as claimed in claim 1, wherein generating the hierarchical early warning information further comprises determining the root cause of an abnormality based on the marked abnormal state data: acquiring marked application layer abnormal state data, comprising application layer performance abnormalities and application layer link abnormalities, and associating it with logic layer resource occupation data and physical layer hardware state data; based on the association results with the logic layer and the physical layer, if the logic layer resource occupation exceeds the reference value of the logic layer dynamic reference library and the physical layer hardware parameter exceeds the reference value of the physical layer dynamic reference library, determining that the application layer abnormality is caused by underlying resources; if the logic layer resource occupation data and the physical layer hardware state data conform to the reference values of the corresponding dynamic reference libraries, determining that the application itself is abnormal; acquiring marked logic layer abnormal state data, comprising logic layer resource abnormalities, transmission abnormalities and configuration abnormalities, and associating it with physical layer network device state data; and, based on the association result with the physical layer, if the physical layer network device parameter is lower than the reference value of the physical layer dynamic reference library, determining that the logic layer abnormality is caused by physical layer hardware, and if the physical layer network device data conform to the reference value of the physical layer dynamic reference library, determining that the logic layer protocol configuration is abnormal.
- 9. The digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base as claimed in claim 1, wherein constructing the fault knowledge graph comprises the following steps: collecting historical fault data of the industrial software cloud base to form a knowledge data source; defining target entities and the association relations between them in a knowledge graph template, wherein the target entities comprise fault types, abnormal indexes, cause entities, layered data nodes and self-healing strategies; and converting the knowledge data source into target entities and association relations in the knowledge graph to generate an initial fault knowledge graph, establishing a real-time update mechanism, and automatically updating the target entity attributes and association relations in the knowledge graph.
- 10. The digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base as claimed in claim 1, wherein locating the unique fault cause based on the fault knowledge graph and the fault analysis data set specifically comprises the following steps: acquiring the fault analysis data set, semantically matching it against the abnormal index entities and layered data node entities in the fault knowledge graph, and determining the graph node corresponding to each item in the data set; generating at least one potential cause reasoning path based on the matched graph nodes, the association relations between abnormal indexes and causes in the fault knowledge graph, and the association relations between layered data nodes and causes; verifying the rationality of each reasoning path according to the association relations among the physical layer, logic layer and application layer of the layered digital twin model; matching the corresponding fault models in the fault knowledge graph according to the potential cause reasoning paths that pass rationality verification, and extracting the corresponding diagnostic knowledge from the graph based on the selected fault models; extracting supplementary data according to the extracted diagnostic knowledge, fuzzy-matching the supplementary data against the standard diagnostic data in the fault knowledge graph, and calculating the matching similarity of the supplementary data; calculating the confidence of the current fault cause according to a confidence transfer algorithm, by combining the supplementary data matching similarity with the abnormal index weights in the fault analysis data set; if the confidence is lower than a preset confidence threshold, adjusting the retrieval range of the supplementary data and matching again, and if the confidence is higher than the preset confidence threshold, continuing the search from the layer where the current potential cause is located to the next layer, based on the linkage relations among the physical layer, logic layer and application layer of the layered digital twin model; based on the fault analysis data set, calculating the matching degree of each potential cause that passes confidence verification, and selecting the potential cause with the highest matching degree; acquiring the corresponding layered data from the visual monitoring interface, performing entity verification on the selected potential cause, confirming that it explains all abnormal phenomena, and locating the unique fault cause; inputting the unique fault cause into the layered digital twin model, reproducing the fault occurrence chain, and verifying the causal relationship between the fault cause and the abnormal phenomena according to the reproduction result; based on the reproduced fault chain, analyzing the degree of influence and the conduction path of the fault cause on the physical layer, logic layer and application layer by combining the layered association data displayed by the visual monitoring interface, and generating a fault influence chain report; and acquiring the association relations between causes and adapted self-healing strategies in the fault knowledge graph, analyzing how much of the fault influence chain each candidate self-healing strategy blocks by combining the fault influence chain report, calculating the suitability of the candidate strategies, and synchronizing the calculation results to the visual monitoring interface.
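The mapping coefficient correction of claims 4 and 5 can be illustrated with a short Python sketch. The claims fix only the absolute error, the error rate, the range of the basic step (0 < alpha <= 0.5), and the inputs used for the adjusted step, the error trend factor r and the previous coefficient k0. They do not state the formulas that combine them, so the three rules marked ASSUMED below, and all names such as `Sample` and `corrected_coefficient`, are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    real: float  # measured real-time hardware operation parameter (assumed nonzero)
    sim: float   # simulation value output by the geometric simulation model


def error_rate(s: Sample) -> float:
    """Absolute error divided by the measured value, as described in claim 4."""
    return abs(s.real - s.sim) / abs(s.real)


def adjusted_step(alpha: float, rate: float, prev_rate: float) -> float:
    """ASSUMED rule: scale alpha by the ratio of current to previous error rate,
    then clamp the result so it stays inside (0, 0.5]."""
    if not 0.0 < alpha <= 0.5:
        raise ValueError("base step alpha must satisfy 0 < alpha <= 0.5")
    ratio = rate / prev_rate if prev_rate > 0 else 1.0
    return min(0.5, max(1e-6, alpha * ratio))


def trend_factor(cur: Sample, ref: Sample) -> float:
    """ASSUMED rule: signed relative error of the current sample, halved when
    the error changed sign since the reference (previous) correction."""
    signed = (cur.real - cur.sim) / abs(cur.real)
    flipped = (cur.real - cur.sim) * (ref.real - ref.sim) < 0
    return 0.5 * signed if flipped else signed


def corrected_coefficient(k0: float, alpha: float, cur: Sample, ref: Sample) -> float:
    """ASSUMED update: k = k0 * (1 + alpha_t * r)."""
    alpha_t = adjusted_step(alpha, error_rate(cur), error_rate(ref))
    return k0 * (1.0 + alpha_t * trend_factor(cur, ref))


if __name__ == "__main__":
    previous = Sample(real=100.0, sim=80.0)  # reference values from the last correction
    current = Sample(real=100.0, sim=90.0)   # the error has halved since then
    print(corrected_coefficient(k0=1.0, alpha=0.2, cur=current, ref=previous))
```

Under these assumed rules the step shrinks as the error rate falls, so the coefficient settles instead of oscillating; other adjustment rules would be equally consistent with the claim wording.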
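The anomaly marking of claim 7 and the root cause rules of claim 8 reduce to a few comparisons against reference values, sketched below in Python. The function names, the returned labels and the "undetermined" fallback are assumptions introduced here; claim 8 does not say what happens when only one of the two lower layers exceeds its reference, and "conforms to the reference value" is interpreted as "not lower than the reference" for network device parameters.

```python
from typing import Sequence


def exceeds_for_consecutive_periods(values: Sequence[float],
                                    reference: float,
                                    periods: int) -> bool:
    """Claim 7: mark a parameter abnormal only if it exceeds the reference
    value for `periods` consecutive acquisition periods."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    run = 0
    for value in values:
        run = run + 1 if value > reference else 0
        if run >= periods:
            return True
    return False


def classify_application_anomaly(logic_over_reference: bool,
                                 physical_over_reference: bool) -> str:
    """Claim 8: attribute a marked application layer anomaly."""
    if logic_over_reference and physical_over_reference:
        return "caused_by_underlying_resources"
    if not logic_over_reference and not physical_over_reference:
        return "application_itself"
    # ASSUMPTION: the mixed case is not covered by the claim text.
    return "undetermined"


def classify_logic_anomaly(network_device_param: float, reference: float) -> str:
    """Claim 8: attribute a marked logic layer anomaly."""
    if network_device_param < reference:
        return "caused_by_physical_hardware"
    # ASSUMPTION: any value at or above the reference counts as conforming.
    return "logic_protocol_configuration"


if __name__ == "__main__":
    cpu_usage = [0.50, 0.90, 0.95, 0.92]  # hypothetical samples, reference 0.80
    print(exceeds_for_consecutive_periods(cpu_usage, 0.80, periods=3))
    print(classify_application_anomaly(True, True))
    print(classify_logic_anomaly(network_device_param=0.4, reference=1.0))
```

Requiring several consecutive exceedances before marking an anomaly filters out single-sample spikes, which is presumably why the claim speaks of a plurality of consecutive acquisition periods.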
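The confidence check in claim 10 combines a fuzzy matching similarity with abnormal index weights and compares the result with a preset threshold. The claim names a "confidence transfer algorithm" without defining it, so the Python sketch below substitutes a plain weighted average and uses the standard library's `difflib.SequenceMatcher` for the fuzzy match. Both substitutions, the use of the confidence itself as the matching degree, and all function names are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# (supplementary data, standard diagnostic data, abnormal index weight)
Evidence = Tuple[str, str, float]


def similarity(supplementary: str, standard: str) -> float:
    """Fuzzy matching similarity in [0, 1] (stand-in for the patent's matcher)."""
    return SequenceMatcher(None, supplementary.lower(), standard.lower()).ratio()


def confidence(evidence: List[Evidence]) -> float:
    """ASSUMED rule: similarity averaged with the abnormal index weights."""
    total_weight = sum(weight for _, _, weight in evidence)
    if total_weight <= 0:
        raise ValueError("abnormal index weights must sum to a positive value")
    weighted = sum(similarity(a, b) * weight for a, b, weight in evidence)
    return weighted / total_weight


def select_cause(candidates: Dict[str, List[Evidence]],
                 threshold: float) -> Optional[str]:
    """Keep causes whose confidence reaches the threshold and return the best
    one, or None so the caller can widen the supplementary data search."""
    scores = {cause: confidence(items) for cause, items in candidates.items()}
    passed = {cause: score for cause, score in scores.items() if score >= threshold}
    if not passed:
        return None
    return max(passed, key=lambda cause: passed[cause])


if __name__ == "__main__":
    candidates = {
        "disk_failure": [("disk io timeout", "disk io timeout", 2.0)],
        "route_misconfiguration": [("disk io timeout", "bgp session down", 1.0)],
    }
    print(select_cause(candidates, threshold=0.8))
```

Returning `None` when no candidate passes mirrors the claim's branch in which the retrieval range of the supplementary data is adjusted and matching is repeated.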
Description
Intelligent operation and maintenance and fault self-healing method for industrial software cloud base based on digital twin
Technical Field
The invention relates to the technical field of digital twins, and in particular to a digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base.
Background
Traditional operation and maintenance of industrial software relies on manual, experience-based troubleshooting. The overall operating state of the cloud base cannot be grasped comprehensively in real time, anomalies are discovered late, systematic knowledge accumulation and association analysis are lacking, self-healing strategies are not verified for suitability against complex faults, the self-healing success rate is limited, and service interruptions occur easily.
For example, Chinese patent application publication No. CN111596604A discloses a digital-twin-based intelligent fault diagnosis and self-healing control system for engineering equipment, which comprises a physical entity module, a data acquisition module, an information processing module, a fault diagnosis module, a self-healing control module and a digital twin module, wherein the data acquisition module acquires operating data of the engineering equipment in the physical entity module in real time and transmits the data to the digital twin module for digital twin simulation of the engineering equipment. That application can improve fault prediction accuracy, reduce the fault occurrence rate, reduce equipment maintenance cost and enhance the operating stability and robustness of the equipment, but it is still difficult to adapt to the operation and maintenance scenario of an industrial software cloud base:
1. Its operation and maintenance of engineering equipment focuses on state simulation of physical entities and does not cover the logic layer and application layer specific to an industrial software cloud base, so it cannot realize multi-level linked data mapping and can hardly reflect the operating state of the cloud base's complex architecture comprehensively;
2. Its fault diagnosis lacks systematic knowledge accumulation and association analysis capability, introduces no fault knowledge graph and relies only on a single diagnosis logic applied after data processing, so cause location is insufficiently accurate and efficient when facing the complex causal relationships among the many fault types of a cloud base;
3. Its self-healing control lacks a strategy simulation verification mechanism and executes repair operations directly; when the cloud base suffers multi-level associated faults, an improper strategy can easily spread the fault or interrupt service, so the high-stability, high-complexity operation and maintenance requirements of an industrial software cloud base cannot be met.
Disclosure of Invention
The invention aims to provide a digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base. By dynamically integrating the physical structure and real-time operating data of the cloud base through a layered digital twin model, the method accurately maps the operating state and monitors anomalies in real time; hierarchical early warning lets operation and maintenance respond more quickly; and a self-healing strategy library built from cause nodes and historical cases enables intelligent self-healing of faults. The method improves the intelligence, precision and efficiency of operation and maintenance of the industrial software cloud base, reduces labor cost, and ensures stable and reliable operation of industrial software, thereby solving the problems described in the background.
To achieve the above purpose, the invention provides the following technical solution:
A digital-twin-based intelligent operation and maintenance and fault self-healing method for an industrial software cloud base comprises the following steps: acquiring physical structure data and real-time running state data of the industrial software cloud base, constructing a layered digital twin model of the cloud base, and dynamically updating the layered digital twin model by combining geometric parameters, performance parameters and topological relations of the physical base; based on the constructed layered digital twin model, mapping the actual running state of the industrial software cloud base in real time and carrying out simulation deduction, constructing a visual monitoring interface for visual display, identifying abnormal states deviating from a normal range, and generating hierarchical early warning information; constructing a fault knowledge graph, extracting abnormal index data corresponding to the abnormal state, performing
multidimensional data associa