CN-122027500-A - Equipment fault prediction method and system based on MQTT and big data analysis


Abstract

The invention discloses an equipment fault prediction method and system based on MQTT and big data analysis. In the method, an edge acquisition end acquires equipment operation parameters through a BMC, an OS and a sensor interface, dynamically adjusts the acquisition frequency based on the data change amplitude, and uploads the parameters through the MQTT protocol. The cloud end constructs a dynamic physical topology model, performs spatial alignment and feature fusion on the multi-source heterogeneous data, and inputs the fusion feature matrix into a causal-enhanced Transformer prediction model, which outputs predicted data for a future preset time window. The reconstruction error between the actual feature data and the predicted data is calculated to generate a fault risk score, and a hierarchical response strategy is triggered according to the score: when the score exceeds a threshold value, an instruction is sent to a CPLD through an I2C bus and a bandwidth degradation or hardware isolation operation is executed. The invention realizes closed-loop operation and maintenance from data perception to active control, and improves prediction accuracy and handling efficiency.

Inventors

WU JINGSHENG
WEI QINYU

Assignees

广州米智技术有限公司
珠海微米物联科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260228

Claims (10)

1. An equipment fault prediction method based on MQTT and big data analysis, characterized by comprising the following steps: S1, an edge acquisition end acquires equipment operation parameters through a baseboard management controller (BMC), an operating system (OS) and a sensor interface, performs missing value filling and correlation fitting on the operation parameters, dynamically adjusts the acquisition frequency according to the data change amplitude, and uploads the processed data to a cloud end through the MQTT protocol; S2, the cloud end receives the data, constructs a dynamic physical topology model, and performs spatial alignment and feature fusion on the uploaded multi-source heterogeneous data based on the dynamic physical topology model to generate a fusion feature matrix; S3, the fusion feature matrix is input into a causal-enhanced Transformer prediction model, which outputs prediction data of the equipment in a future preset time window; S4, actual feature data in the current time window is acquired, a reconstruction error between the actual feature data and the prediction data output in step S3 is calculated, and a fault risk score is generated by combining the absolute value of the equipment operation parameters; and S5, a hierarchical response strategy is triggered according to the fault risk score, and when the score exceeds a preset threshold value, a control instruction is sent to a complex programmable logic device (CPLD) through an I2C bus to execute a bandwidth degradation or hardware isolation operation.
2. The method according to claim 1, wherein dynamically adjusting the acquisition frequency according to the data change amplitude in step S1 specifically includes: updating the maximum value and the minimum value of the operation data in real time, and normalizing each item of operation data based on the maximum value and the minimum value; calculating the standard deviation of a preset number of the latest normalized values, and taking the standard deviation as the data change amplitude; and adjusting the data acquisition frequency according to the data change amplitude, wherein the data acquisition frequency is directly proportional to the data change amplitude; and wherein the processing of missing data comprises calculating the correlation degree between any two columns of data, and, if the correlation degree is larger than a preset threshold value, selecting the other column of existing data with the maximum correlation degree to perform linear proportional fitting.
3. The method according to claim 1, wherein constructing the dynamic physical topology model in step S2 specifically includes: acquiring the physical connection relations and signal transmission direction identifiers of the equipment; analyzing the connection weights among devices by combining the light attenuation characteristic values of a distributed optical fiber sensor, to construct a weighted graph model; and mapping the multi-source heterogeneous data to the spatial nodes of the weighted graph model to generate a fusion feature matrix with spatio-temporal labels.
4. The method of claim 1, wherein the construction method and reasoning process of the causal-enhanced Transformer prediction model comprise: constructing a causal graph based on equipment mechanisms or expert knowledge, the causal graph characterizing causal links between sensor parameters and potential faults; converting the causal graph into an attention mask matrix; and embedding the attention mask matrix into a self-attention layer of a Transformer model, thereby constraining the model to attend only to feature paths with causal correlation, and predicting the future state by using the processed feature paths.
5. The method according to claim 1, wherein executing the bandwidth degradation or hardware isolation operation in step S5 specifically includes: calculating a device health score according to the fault risk score and the equipment operation parameters; when the health score is smaller than a first preset threshold value and larger than or equal to a second preset threshold value, triggering the bandwidth degradation operation, in which the link width is adjusted through a PCIe switch configuration register to reduce the performance load of the equipment; and when the health score is smaller than the second preset threshold value, triggering the hardware isolation operation, in which a power cut-off instruction is sent to the CPLD to drive a MOSFET switch to cut off the auxiliary power supply interface of the equipment.
6. An equipment fault prediction system based on MQTT and big data analysis, characterized by being adapted to implement the method according to any of claims 1-5, comprising: an edge acquisition layer, which comprises a BMC module, an operating system interface and various sensors, and is used for acquiring equipment operation parameters, executing frequency adjustment and data preprocessing, and publishing data to the network through the MQTT protocol; a cloud processing layer, which comprises a data storage module, a topology construction module, a model training module and a prediction reasoning module, wherein the topology construction module is used for generating the dynamic physical topology model; and a response control layer, which comprises a hierarchical response strategy module and a hardware control module, wherein the hardware control module is connected with the CPLD through an I2C bus and is used for executing a bandwidth adjustment or power cut-off operation according to the prediction result.
7. The system of claim 6, wherein the edge acquisition layer further comprises a frequency adjustment unit, the frequency adjustment unit specifically comprising: a data conversion unit for performing normalization conversion; an amplitude calculation unit for calculating a standard deviation as the data change amplitude; and an adjustment execution unit for adjusting the acquisition frequency in proportion to the amplitude.
8. The system of claim 6, wherein the cloud processing layer further comprises: a causal reasoning module for constructing a causal graph based on historical data and expert knowledge and generating an attention mask; and a feature fusion module for extracting feature vectors from images and fusing them with the time-series data to construct a feature matrix.
9. An electronic device comprising a processor, a memory and a computer program stored on the memory, wherein the processor implements the method of any of claims 1-5 when executing the computer program.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-5.
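The mask construction described in claim 4 can be illustrated with a minimal sketch. It assumes that each attention token corresponds to one sensor feature and that a token may attend to itself and to its direct causes in the causal graph; the claim does not fix either choice. The function names and the plain-list representation are hypothetical, chosen so that the example needs only the standard library.

```python
import math


def causal_mask(names, edges):
    """Build mask[i][j], True when feature j may be attended to by feature i.

    Assumption: a feature attends to itself and to its direct causes only.
    `edges` is a list of (cause, effect) pairs taken from the causal graph.
    """
    index = {name: i for i, name in enumerate(names)}
    size = len(names)
    mask = [[i == j for j in range(size)] for i in range(size)]
    for cause, effect in edges:
        mask[index[effect]][index[cause]] = True
    return mask


def masked_softmax(scores, mask):
    """Row-wise softmax that gives exactly zero weight to masked-out entries."""
    weights = []
    for row, allowed in zip(scores, mask):
        # The diagonal is always allowed, so `kept` is never empty.
        kept = [s for s, ok in zip(row, allowed) if ok]
        peak = max(kept)
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def attend(queries, keys, values, mask):
    """Single-head scaled dot-product attention restricted by the causal mask."""
    dim = len(keys[0])
    scores = [
        [sum(q * k for q, k in zip(qv, kv)) / math.sqrt(dim) for kv in keys]
        for qv in queries
    ]
    weights = masked_softmax(scores, mask)
    width = len(values[0])
    return [
        [sum(w * v[c] for w, v in zip(row, values)) for c in range(width)]
        for row in weights
    ]
```

With the hypothetical features `fan_rpm`, `cpu_temp` and `link_errors` and the edges `fan_rpm -> cpu_temp -> link_errors`, the `link_errors` row can draw on `cpu_temp` and itself but never directly on `fan_rpm`, which is the sense in which the mask restricts attention to causally linked feature paths.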
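The two-threshold response of claim 5 can be sketched as follows. The claims do not give the health score formula, so the weighted combination in `health_score`, the 0-100 scale and the threshold values 60 and 30 are all assumptions made for illustration. The PCIe register write and the I2C command to the CPLD are represented by caller-supplied callbacks rather than real hardware calls.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    DEGRADE_BANDWIDTH = "degrade_bandwidth"
    ISOLATE_HARDWARE = "isolate_hardware"


def health_score(risk, load_ratio, risk_weight=0.8):
    """Hypothetical health score on a 0-100 scale (100 means healthy).

    `risk` is the fault risk score and `load_ratio` stands in for the
    equipment operation parameters; both are clamped to [0, 1].
    """
    risk = min(max(risk, 0.0), 1.0)
    load_ratio = min(max(load_ratio, 0.0), 1.0)
    penalty = risk_weight * risk + (1.0 - risk_weight) * load_ratio
    return 100.0 * (1.0 - penalty)


def select_action(health, first_threshold=60.0, second_threshold=30.0):
    """Map a health score to a response tier using the two thresholds."""
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    if health < second_threshold:
        return Action.ISOLATE_HARDWARE
    if health < first_threshold:
        # second_threshold <= health < first_threshold
        return Action.DEGRADE_BANDWIDTH
    return Action.NONE


def respond(health, degrade_link, cut_power):
    """Run the callback for the selected tier and return the chosen action.

    `degrade_link` would narrow the PCIe link width; `cut_power` would send
    the power cut-off command to the CPLD. Both are supplied by the caller.
    """
    action = select_action(health)
    if action is Action.DEGRADE_BANDWIDTH:
        degrade_link()
    elif action is Action.ISOLATE_HARDWARE:
        cut_power()
    return action
```

Keeping the hardware operations behind callbacks lets the tiering logic be exercised without a BMC or CPLD attached. A health score exactly equal to the second threshold still falls in the degradation tier, matching the "larger than or equal to" wording of the claim.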

Description

Equipment fault prediction method and system based on MQTT and big data analysis
Technical Field
The invention relates to the technical field of predictive maintenance of equipment, in particular to an equipment fault prediction method and system based on MQTT and big data analysis.
Background
With the deep integration of Internet of Things, big data and artificial intelligence technologies, predictive maintenance of equipment has become a key technology for guaranteeing the stable and efficient operation of modern industry, data centers and communication systems. However, the current mainstream technical solutions still face many bottlenecks and challenges along the full link from data acquisition to final decision execution, which manifest at the following levels:
1. Bottleneck of data acquisition and transmission efficiency
At the data acquisition end, most existing schemes adopt fixed-frequency periodic polling or simple static threshold alarming, and do not fully consider the dynamic characteristics of equipment operation data. As disclosed in prior art 1 (publication No. CN118740598A), the value density of the data is not uniformly distributed: high-frequency acquisition wastes resources when the device is operating steadily, while low-frequency sampling may miss key signs when the state changes abruptly. In addition, missing data caused by asynchronous acquisition and transmission packet loss is common in sensor networks. Existing simple interpolation methods (such as filling directly with the previous value) make no use of the potential correlation among data, so data quality is difficult to ensure and the accuracy of subsequent analysis is affected.
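The approach that claim 2 sets against these limitations (running min-max normalization, standard deviation of the latest values as the change amplitude, and correlation-based proportional filling) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the window size, the frequency floor `base_hz`, the cap `max_hz`, the gain and the correlation threshold are hypothetical, and the frequency grows linearly with the amplitude above the floor rather than being strictly proportional to it, so that sampling never stops completely.

```python
import math
from collections import deque
from statistics import pstdev


class AdaptiveSampler:
    """Scales the sampling rate with the spread of recent normalized readings."""

    def __init__(self, window=5, base_hz=1.0, gain_hz=20.0, max_hz=10.0):
        # All four parameters are illustrative assumptions.
        self.lo = math.inf
        self.hi = -math.inf
        self.recent = deque(maxlen=window)
        self.base_hz = base_hz
        self.gain_hz = gain_hz
        self.max_hz = max_hz

    def update(self, value):
        """Record one reading and return the sampling frequency to use next."""
        self.lo = min(self.lo, value)
        self.hi = max(self.hi, value)
        self.recent.append(value)
        span = self.hi - self.lo
        if span == 0:
            amplitude = 0.0
        else:
            # Min-max normalize the latest readings, then take their spread.
            normalized = [(v - self.lo) / span for v in self.recent]
            amplitude = pstdev(normalized)
        return min(self.max_hz, self.base_hz + self.gain_hz * amplitude)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def fill_missing(columns, target, threshold=0.8):
    """Fill None entries of `target` from its most correlated column.

    `columns` maps a column name to a list of floats or None. If no other
    column correlates above `threshold`, the target is returned unchanged.
    """
    tgt = columns[target]
    best_name, best_r = None, threshold
    for name, col in columns.items():
        if name == target:
            continue
        pairs = [(x, y) for x, y in zip(col, tgt) if x is not None and y is not None]
        if len(pairs) < 2:
            continue
        r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
        if r > best_r:
            best_name, best_r = name, r
    if best_name is None:
        return list(tgt)
    ref = columns[best_name]
    pairs = [(x, y) for x, y in zip(ref, tgt) if x is not None and y is not None]
    sum_xx = sum(x * x for x, _ in pairs)
    if sum_xx == 0:
        return list(tgt)
    # Linear proportional fit y = k * x (least squares through the origin).
    k = sum(x * y for x, y in pairs) / sum_xx
    return [k * x if y is None and x is not None else y for x, y in zip(ref, tgt)]
```

With the default parameters, a sampler fed a constant reading stays at the 1 Hz floor, while one fed readings alternating between 0 and 10 reaches the 10 Hz cap.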
At the data transmission layer, when facing massive and scattered equipment nodes, the traditional HTTP/HTTPS-based request-response mode or customized private protocols can hardly meet the requirements of low bandwidth consumption, low power consumption, highly concurrent connections and massive topic management in industrial Internet of Things scenarios, which restricts the efficient aggregation of real-time data.
2. Difficulty of multi-modal data fusion and anomaly localization
Modern equipment state monitoring increasingly relies on multi-source heterogeneous sensors, such as digital time-series signals (vibration, temperature, current), image/video signals (thermal imaging, appearance monitoring) and fiber optic sensing signals (light attenuation, strain). As described in prior art 2 (publication No. CN119538185B), existing methods attempt to fuse multi-modal data, but usually adopt simple feature concatenation or post-decision fusion, and fail to deeply mine the spatio-temporal correlation and physical coupling relationships between modalities. More importantly, as indicated in prior art 3 (publication No. CN120389938B), existing schemes tend to ignore or simplify the critical spatial constraints of the physical connection topology of the devices and the signal propagation paths. Abnormal signals generated by faults (such as vibration and heat) are conducted along the physical structure, and a static or simplified topology model cannot accurately describe this dynamic process. As a result, in complex systems (such as communication networks and server clusters), detected faults are difficult to trace accurately to the specific faulty device or board, the false alarm and missed alarm rates are high, and the practical value of early warning is reduced.
3. Lack of model interpretability and lag in decision execution
Although AI models typified by deep learning have made remarkable progress in the accuracy of fault prediction, their "black box" characteristics have become a core obstacle restricting their wide deployment in practical industrial scenarios. As emphasized by prior art 4 (publication No. CN120416067B), existing models typically output only an abstract fault probability and cannot provide a reasonable explanation of why the prediction was made. Operation and maintenance personnel cannot know which sensor readings and which operation modes led to the high-risk judgment, so the prediction result is difficult to convert into targeted inspection or maintenance actions, and the personnel's trust in the intelligent system is weakened. On the other hand, in the decision execution link, the existing technology stack shows an obvious disconnect along the "perception-decision-execution" chain. Most predictive systems stop at generating alarm work orders, respond late and rely on human intervention, as described in prior art 5 (publication No. CN120474953B). Even where a few schemes integrate control interfaces, hard isolation (such as direct power-off) is performed only after faults occur, and a progressive, active intervention mechanism based on predicted probability (such as dynamic performance degradation and resource reconfiguration) is lacking, so that the faults cannot be prospectively inhibited fr