CN-122020377-A - Multi-mode data fusion fault prediction method and device
Abstract
The application provides a fault prediction method and device based on multi-modal data fusion. The method acquires multi-modal data from a monitoring area and processes it to obtain feature information for each modality. Based on that feature information, a pre-trained large language model performs cross-modal reasoning analysis to produce a first hidden danger prediction result, while a trained deep learning model performs spatial hidden danger pattern analysis on the same feature information to produce a second hidden danger prediction result. The two results are bidirectionally integrated, and fault early warning information is generated from the integration result. Through dynamic fusion of multi-modal data, large-model intelligent reasoning, and two-way verification with deep learning prediction, the method achieves high-precision real-time early warning and intelligent traceability of safety risks in chemical industry parks.
Inventors
- YANG MENG
- ZHANG HONGYAN
Assignees
- China Mobile Group Liaoning Co., Ltd. (中国移动通信集团辽宁有限公司)
- China Mobile Communications Group Co., Ltd. (中国移动通信集团有限公司)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-01-27
Claims (10)
- 1. A fault prediction method for multi-modal data fusion, the method comprising: collecting multi-modal data of a monitoring area; processing the multi-modal data to obtain feature information of each modality; performing cross-modal reasoning analysis with a pre-trained large language model based on the feature information of each modality to obtain a first hidden danger prediction result; inputting the feature information of each modality into a trained deep learning model for spatial hidden danger pattern analysis to obtain a second hidden danger prediction result, wherein the trained deep learning model is trained with a clustering algorithm based on historical hidden danger data of each grid cell of the monitoring area over T consecutive historical time periods; and bidirectionally integrating the first hidden danger prediction result and the second hidden danger prediction result, and generating fault early warning information according to the integration result.
- 2. The method of claim 1, wherein the multi-modal data includes at least structured physical sensing data as well as unstructured visual, audio, and text data; and processing the multi-modal data to obtain the feature information of each modality includes: extracting key visual feature vectors from the visual data with a lightweight convolutional neural network model; extracting audio feature vectors from the audio data with the Mel-frequency cepstral coefficient (MFCC) method; performing real-time semantic analysis on the text data with a lightweight large language model and extracting keyword embedding vectors; and filtering abnormal values from the physical sensing data, standardizing units, and packaging the data into data frames of a preset format.
- 3. The method of claim 1 or 2, wherein after processing the multi-modal data to obtain the feature information of each modality, the method further comprises: assigning a credibility fusion weight to the feature information of each modality, wherein the credibility fusion weight is calculated based on the acquisition frequency, data precision, data stability, and importance attention weight of the corresponding modal data within a preset time period.
- 4. The method of claim 3, wherein the credibility fusion weight is calculated as: w_i = α·f_i + β·p_i + γ·s_i + δ·a_i, wherein z denotes the total number of categories of modal data; w_i denotes the credibility fusion weight of the i-th category of modal data (i = 1, ..., z); f_i denotes the acquisition frequency of the i-th category of modal data; p_i denotes the acquisition precision of the i-th category of modal data; s_i denotes the stability of the i-th category of modal data; a_i denotes the attention weight of the i-th category of modal data; and α, β, γ, δ are weight coefficients satisfying α + β + γ + δ = 1.
- 5. The method of claim 1, wherein before inputting the feature information of each modality into the trained deep learning model for spatial hidden danger pattern analysis to obtain the second hidden danger prediction result, the method further comprises: dividing the monitoring area into a plurality of grid cells and, based on a history database, counting the historical hidden danger data of each grid cell over T consecutive historical time periods to obtain a historical hidden danger sequence of length T for each grid cell; and using the historical hidden danger sequences of the grid cells as training samples, performing unsupervised learning on the training samples with a clustering algorithm, and dividing all grid cells into a plurality of hidden danger grade categories to obtain the trained deep learning model.
- 6. The method of claim 5, wherein performing unsupervised learning on the training samples with a clustering algorithm and dividing all grid cells into a plurality of hidden danger grade categories comprises: calculating the center vector of the historical hidden danger sequences of all grid cells as the first clustering center; selecting, as the next clustering center, the historical hidden danger sequence of the grid cell with the largest sum of Euclidean distances to all determined clustering centers; repeating the selecting step until the number of clustering centers reaches a preset value q; and assigning each grid cell to the hidden danger grade category represented by its nearest clustering center according to the minimum-Euclidean-distance principle.
- 7. The method of claim 5, wherein inputting the feature information of each modality into the trained deep learning model for spatial hidden danger pattern analysis to obtain the second hidden danger prediction result comprises: extracting the position identifier of a target grid cell from the feature information of each modality; retrieving the historical hidden danger sequence of the target grid cell from pre-generated historical hidden danger data according to the position identifier; and inputting the historical hidden danger sequence of the target grid cell into the trained deep learning model to obtain the current hidden danger category of the target grid cell output by the model as the second hidden danger prediction result.
- 8. A fault prediction apparatus for multi-modal data fusion, the apparatus comprising: an acquisition unit for collecting multi-modal data of a monitoring area; a processing unit for processing the multi-modal data to obtain feature information of each modality; an analysis unit for performing cross-modal reasoning analysis with a pre-trained large language model based on the feature information of each modality to obtain a first hidden danger prediction result, and for inputting the feature information of each modality into a trained deep learning model for spatial hidden danger pattern analysis to obtain a second hidden danger prediction result; and an integration unit for bidirectionally integrating the first hidden danger prediction result and the second hidden danger prediction result and generating fault early warning information according to the integration result.
- 9. An electronic device, characterized in that the electronic device comprises a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method of any one of claims 1-7 when executing the program stored in the memory.
- 10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-7.
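The clustering procedure of claims 5 and 6 (mean vector as the first center, farthest-point selection for subsequent centers, nearest-center assignment) can be sketched as follows. This is a minimal illustration in Python/NumPy under stated assumptions: Euclidean distance over fixed-length hidden danger sequences, and all function and variable names are illustrative, not from the application.

```python
import numpy as np

def select_cluster_centers(sequences, q):
    """Select q clustering centers from historical hidden-danger sequences.

    The first center is the mean vector of all sequences; each subsequent
    center is the sequence whose sum of Euclidean distances to all centers
    chosen so far is largest, until q centers exist (per claim 6).
    """
    centers = [sequences.mean(axis=0)]  # first center: mean of all sequences
    while len(centers) < q:
        # Sum of distances from every sequence to all chosen centers.
        dist_sum = sum(np.linalg.norm(sequences - c, axis=1) for c in centers)
        centers.append(sequences[int(np.argmax(dist_sum))])
    return np.stack(centers)

def assign_hazard_classes(sequences, centers):
    """Assign each grid cell to its nearest center (minimum Euclidean distance)."""
    dists = np.linalg.norm(sequences[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)
```

With four grid cells forming two well-separated groups and q = 2, the two groups receive distinct hidden danger grade labels.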
Description
Multi-mode data fusion fault prediction method and device
Technical Field
The application relates to the technical fields of industrial Internet and safety production, and in particular to a fault prediction method and device based on multi-modal data fusion.
Background
Fault prediction technology is important for improving the reliability of industrial systems and reducing operation and maintenance costs by analyzing equipment operation data and identifying fault symptoms in advance. Prior-art schemes mainly include methods based on statistics, machine learning, deep learning, and time-series analysis. For example, some schemes collect multi-source sensor data and, after preprocessing and feature extraction, use dynamic models for trend prediction and hidden danger analysis; some use feature screening and lightweight LSTM networks for equipment life prediction; and others use recurrent neural network models for time-series feature extraction and identification of network security events.
However, the prior art has significant limitations. First, in multi-modal data fusion, existing methods do not fully account for dynamic weight changes among different modalities (such as physical sensing, visual, audio, and text data), so the fusion effect is unstable across application scenarios and overall system performance suffers. Second, in anomaly identification and feature screening, the prior art relies on static threshold judgments or static weight calculations, which are difficult to adapt to nonlinear anomaly patterns or dynamic changes in equipment operating states in complex environments, limiting the real-time performance and accuracy of prediction models.
Moreover, in hidden danger identification and analysis, existing schemes often rely on a single type of model (such as a time-series model), lack effective capture of the complex semantic relationships in multi-modal data, and lack dynamic modeling of the correlations among different modalities in the multi-dimensional analysis stage, so early warning accuracy and traceability are insufficient for cross-modal complex faults or attack behaviors. The prior art therefore struggles to meet the requirements of high-precision, real-time early warning and intelligent traceability of safety risks in high-risk scenarios such as chemical industry parks.
Disclosure of Invention
The embodiments of the application aim to provide a fault prediction method and device for multi-modal data fusion, which achieve high-precision, real-time early warning and intelligent traceability of chemical-industry-park safety risks through dynamic fusion of multi-modal data, large-model intelligent reasoning, and two-way verification with deep learning prediction.
In a first aspect, a fault prediction method for multi-modal data fusion is provided. The method may include: collecting multi-modal data of a monitoring area; processing the multi-modal data to obtain feature information of each modality; performing cross-modal reasoning analysis with a pre-trained large language model based on the feature information of each modality to obtain a first hidden danger prediction result; inputting the feature information of each modality into a trained deep learning model for spatial hidden danger pattern analysis to obtain a second hidden danger prediction result, wherein the trained deep learning model is trained with a clustering algorithm based on historical hidden danger data of each grid cell of the monitoring area over T consecutive historical time periods; and bidirectionally integrating the first hidden danger prediction result and the second hidden danger prediction result, and generating fault early warning information according to the integration result.
In one possible implementation, the multi-modal data includes at least structured physical sensing data as well as unstructured visual, audio, and text data. Processing the multi-modal data to obtain the feature information of each modality includes: extracting key visual feature vectors from the visual data with a lightweight convolutional neural network model; extracting audio feature vectors from the audio data with the Mel-frequency cepstral coefficient (MFCC) method; performing real-time semantic analysis on the text data with a lightweight large language model and extracting keyword embedding vectors; and filtering abnormal values from the physical sensing data, standardizing units, and packaging the data into data frames of a preset format.
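The physical-sensing branch of the processing step above (filtering abnormal values, standardizing units, packaging into fixed-format frames) can be sketched as follows. This is a minimal sketch, assuming a z-score outlier rule, a multiplicative unit-conversion factor, and a fixed frame length; none of these specific choices is stated in the application.

```python
import numpy as np

def preprocess_sensor_data(readings, scale=1.0, z_thresh=3.0, frame_len=8):
    """Illustrative preprocessing of structured physical sensing data.

    Steps: convert units via a scale factor, drop readings more than
    z_thresh standard deviations from the mean (assumed outlier rule),
    then pack the cleaned stream into fixed-length frames.
    """
    x = np.asarray(readings, dtype=float) * scale      # unit standardization
    mu, sigma = x.mean(), x.std()
    if sigma > 0:
        x = x[np.abs(x - mu) <= z_thresh * sigma]      # filter abnormal values
    n_frames = len(x) // frame_len                     # complete frames only
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)
```

For example, a stream of sixteen in-range readings plus one extreme spike yields two clean frames of eight values each, with the spike discarded.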
In one possible implementation, after processing the multi-modal data to obtain the feature information of each modality, the method further includes: assigning a credibility fusion weight to the feature information of each modality, wherein the credibility fusion weight is calculated based on the acquisition frequency, data precision, data stability, and importance attention weight of the corresponding modal data within a preset time period.
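The credibility fusion weight described above can be sketched as follows. The application's exact formula is not legible in this text, so the convex linear combination below, its coefficients, and the assumption that all four factors are normalized to [0, 1] are assumptions for illustration only.

```python
def credibility_weight(freq, precision, stability, attention,
                       alpha=0.25, beta=0.25, gamma=0.25, delta=0.25):
    """Hypothetical credibility fusion weight for one modality.

    Assumed form: a convex combination of acquisition frequency, data
    precision, data stability, and importance attention weight, each
    normalized to [0, 1], with coefficients summing to 1.
    """
    assert abs(alpha + beta + gamma + delta - 1.0) < 1e-9
    return alpha * freq + beta * precision + gamma * stability + delta * attention
```

With equal coefficients, a modality whose four normalized factors are 0.8, 0.9, 0.7, and 0.6 receives a fusion weight of 0.75.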