CN-119892662-B - DCS network fault detection and intelligent reasoning method and device
Abstract
The disclosure belongs to the technical field of nuclear power, and particularly relates to a DCS network fault detection and intelligent reasoning method and device. The method combines a deep learning model with knowledge graph technology to improve the accuracy and the level of intelligence of DCS network fault detection. Specifically, the disclosure collects traffic data and device status data in a DCS network through DPI technology and the SNMP protocol, and extracts fault-related features from log data using natural language processing. The extracted high-dimensional information is then compressed into a low-dimensional embedding vector matrix by a knowledge graph embedding method, and this matrix is input into a deep learning model for fault detection and prediction.
Inventors
- FENG WEI
- LU WEIWEI
- TONG HANG
- HONG SHIXIN
- ZHANG DENG
- WANG WEIGUO
- Shi Haozhong
Assignees
- 中核武汉核电运行技术股份有限公司
- 中核国电漳州能源有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20241220
Claims (9)
- 1. A DCS network fault detection and intelligent reasoning method, characterized by comprising the following steps: step 1, data collection and preprocessing, wherein terminal equipment collects data from a DCS network, and a feature set is extracted from the data after cleaning and classification; step 2, knowledge graph construction, comprising: step 21, domain knowledge extraction, namely extracting domain knowledge related to network faults from related literature, technical documents and expert knowledge of industrial control systems; step 22, knowledge representation, namely representing knowledge in a standard language and constructing a graph of entity nodes and relationship edges; step 23, knowledge graph construction and updating, comprising: step 231, knowledge acquisition, namely automatically extracting knowledge from unstructured data through text mining and an expert system; step 232, knowledge fusion and reasoning, namely combining an existing industrial control system fault knowledge base, measuring the credibility of triples containing head entities, relations and tail entities in the knowledge graph by using the scoring function of a ComplEx model, mapping the entities and relations to a complex space, then calculating the scores of the triples through a complex dot product, and automatically generating reasoning rules; step 3, deep learning model training and prediction, namely performing fault detection based on a deep learning model, using a multi-layer LSTM or a Transformer-based model, wherein the input data comprise network traffic, device status and log data and are combined with the features generated by the knowledge graph, and the output comprises fault occurrence probability, fault type prediction and cause localization; wherein step 232 comprises: step 2321, inputting data in the form of triples into the ComplEx embedding model, wherein each triple consists of a head entity h, a relation r and a tail entity t and describes one fact in the knowledge graph; in the complex space, the head entity h and the tail entity t are represented as complex vectors Hk and Tk, respectively, the complex representation of the head entity h comprises a real part and an imaginary part, denoted Re(h) and Im(h), respectively, and the relation r is likewise mapped to a complex representation; step 2322, projecting Hk and Tk into the complex space by the ComplEx model, and determining the internal relation of the triple (h, r, t) by calculating the vector dot product of the head entity and the tail entity in the complex space; step 2323, the output of the ComplEx model is an embedding vector matrix serving as the feature generated by the knowledge graph, wherein each row corresponds to the embedded representation of one entity or relation.
- 2. The method of claim 1, wherein step 1 comprises: step 11, data acquisition, wherein the terminal equipment collects data through a DCS switch gateway system and stores the collected data by category, wherein the data comprise network traffic, device status logs, alarm information and historical fault records; the terminal equipment acquires network traffic data using DPI technology and stores it by timestamp, wherein the network traffic data comprise IP packets, protocol type, traffic volume and communication delay; the terminal equipment periodically collects device status data via SNMP or a dedicated API, wherein the device status data comprise CPU utilization, memory usage and network interface status; the terminal equipment is provided with a log collector, obtains log data at regular intervals, and archives and analyzes the log data through a log management system, wherein the log data comprise system logs, application logs and device logs; the terminal equipment extracts historical fault data from a fault record library to generate a structured data table, wherein the historical fault data comprise fault occurrence time, type, scope of impact and fault recovery time; step 12, data cleaning, wherein the terminal equipment uses a data cleaning tool to remove abnormal peaks in the network traffic by setting a threshold, and smooths the delay data using a sliding window method; step 13, the terminal equipment extracts key features from the cleaned data using a feature extraction tool.
- 3. The method according to claim 2, wherein step 13 comprises: step 131, the terminal equipment determines high-dimensional features of the cleaned data, wherein the high-dimensional features comprise the rate of change of traffic patterns, statistics of inter-packet time intervals, and analysis of device response delay; step 132, the terminal equipment reduces the dimensionality of the high-dimensional features using PCA or LDA, retaining the most representative feature set; step 133, the terminal equipment labels the data using pre-stored fault records and, to address the scarcity of fault samples, performs data augmentation with SMOTE to balance the dataset.
- 4. The method of claim 1, wherein step 3 further comprises: step 31, model selection and design; step 311, model architecture selection, namely selecting an LSTM or GRU model based on the time-series characteristics of DCS network data, and designing a multi-layer network structure to improve the time-series prediction capability of the model; step 312, input feature setting, namely combining the features extracted by the data preprocessing module with the knowledge graph reasoning results as the input of the model, wherein the input features comprise network traffic features, device status features, log features and reasoning features generated by the knowledge graph; step 313, output setting, wherein the output of the model comprises fault occurrence probability, fault type prediction and fault cause localization, multi-class fault types are predicted using a Softmax function, and detailed fault analysis is provided in combination with the interpretable output of the knowledge graph; step 32, model training; step 321, training data preparation, namely dividing the labeled dataset generated by the data preprocessing module into a training set, a validation set and a test set; step 322, hyperparameter optimization, namely determining the optimal hyperparameter configuration of the model through grid search or Bayesian optimization; step 323, training the model in a GPU-accelerated environment using the distributed training framework TensorFlow, periodically evaluating model performance on the validation set, and preventing overfitting through early stopping (EarlyStopping); step 33, model validation and testing; step 331, validation and testing, namely evaluating model performance using a confusion matrix, ROC curve, AUC and F1-score; step 332, model improvement, namely retraining and optimizing the model for fault types that performed poorly during validation, and continuously updating the model to adapt to new data characteristics using transfer learning.
- 5. The method of claim 1, further comprising step 4, system integration and deployment, comprising: step 41, real-time fault detection system, namely constructing a data stream processing pipeline, inputting real-time network traffic and device status data into the trained model, and automatically triggering an alarm and notifying relevant personnel when an anomaly is detected; step 42, cooperation between the knowledge graph and the model, namely performing in-depth analysis of detected faults in combination with the knowledge graph, providing explanations of fault causes and handling suggestions, and continuously updating the model and the knowledge graph by collecting new data.
- 6. The method of claim 1, further comprising step 5, performance optimization and evaluation, comprising: step 51, performance optimization, namely optimizing the model architecture for the characteristics of the DCS network, for example by adding a delay-sensitive mechanism or a method for handling incomplete data, and accelerating the response speed of real-time detection to fit the resource constraints of the DCS system; step 52, system evaluation, namely simulating different fault scenarios in a DCS network environment with a simulation tool to comprehensively evaluate the real-time detection capability of the system, deploying the system in an actual DCS network, collecting long-term operation data, and evaluating the stability, detection accuracy and response speed of the system under different load conditions.
- 7. A DCS network fault detection and intelligent reasoning device, the device comprising: a data collection and preprocessing module, used for collecting data from the DCS network and extracting a feature set from the data after cleaning and classification; knowledge graph construction, comprising: a domain knowledge extraction module, used for extracting domain knowledge related to network faults from related literature, technical documents and expert knowledge of industrial control systems; a knowledge representation module, used for representing knowledge in a standard language and constructing a graph of entity nodes and relationship edges; knowledge graph construction and updating, comprising: a knowledge acquisition module, used for automatically extracting knowledge from unstructured data through text mining and an expert system; a knowledge fusion and reasoning module, used for combining an existing industrial control system fault knowledge base, measuring the credibility of triples containing head entities, relations and tail entities in the knowledge graph by using the scoring function of a ComplEx model, mapping the entities and relations to a complex space, calculating the scores of the triples through a complex dot product, and automatically generating reasoning rules; a deep learning model training and prediction module, used for performing fault detection based on a deep learning model, using a multi-layer LSTM or a Transformer-based model, wherein the input data comprise network traffic, device status and log data and are combined with the features generated by the knowledge graph, and the output comprises fault occurrence probability, fault type prediction and cause localization; wherein the knowledge fusion and reasoning module comprises: an input module, used for inputting data in the form of triples into the ComplEx embedding model, wherein each triple consists of a head entity h, a relation r and a tail entity t and describes one fact in the knowledge graph; in the complex space, the head entity h and the tail entity t are represented as complex vectors Hk and Tk, respectively, the complex representation of the head entity h comprises a real part and an imaginary part, denoted Re(h) and Im(h), respectively, and the relation r is likewise mapped to a complex representation; a processing module, used for projecting Hk and Tk into the complex space by the ComplEx model, and determining the internal relation of the triple (h, r, t) by calculating the vector dot product of the head entity and the tail entity in the complex space; and an output module, used for taking the output of the ComplEx model, an embedding vector matrix, as the feature generated by the knowledge graph, wherein each row corresponds to the embedded representation of one entity or relation.
- 8. A DCS network fault detection and intelligent reasoning device, the device comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to perform the method of any one of claims 1 to 6.
- 9. A non-transitory computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 6.
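The triple scoring recited in step 232 of claim 1 can be illustrated with a small sketch. The claims do not spell out the formula, so the code below assumes the commonly used ComplEx score, Re(sum over k of h_k * r_k * conj(t_k)). The function name, entity names and toy embedding values are hypothetical and chosen only for illustration; a real system would learn the embeddings from the knowledge graph.

```python
# Minimal sketch of ComplEx-style triple scoring (assumed formula, toy values).
# Nothing here is taken from the patent beyond the idea of scoring (h, r, t)
# with a dot product in complex space.

def complex_score(head, relation, tail):
    """Return Re(sum_k head_k * relation_k * conj(tail_k))."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("embeddings must share one dimension")
    total = sum(h * r * t.conjugate() for h, r, t in zip(head, relation, tail))
    return total.real

# Hypothetical 2-dimensional complex embeddings.
controller = [1 - 1j, 0.5 + 0.5j]
switch = [1 + 1j, 0.5 - 0.5j]
feeds = [1j, 1j]  # purely imaginary relation, so the score is antisymmetric

forward = complex_score(controller, feeds, switch)   # (controller, feeds, switch)
backward = complex_score(switch, feeds, controller)  # (switch, feeds, controller)
print(forward, backward)
```

With these toy values the forward triple scores higher than the reversed one, which shows how a complex-valued relation can distinguish the direction of a relationship. Stacking the learned entity and relation vectors row by row would give an embedding matrix of the kind described in step 2323.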
Description
DCS network fault detection and intelligent reasoning method and device
Technical Field
The disclosure belongs to the technical field of nuclear power, and particularly relates to a DCS network fault detection and intelligent reasoning method and device.
Background
A DCS (distributed control system) is an important component of industrial control systems and is widely used in industrial automation processes. Fault detection in the DCS network is a key link in guaranteeing the safe and stable operation of an industrial process. However, as DCS networks grow in size and complexity, conventional fault detection methods struggle to cope with dynamically changing network environments and complex fault modes.
In DCS network fault detection, accurately identifying network traffic anomalies, device status anomalies and potential threats in logs is central to ensuring safe operation of the system. Traditional methods generally depend on rules and threshold settings and have difficulty capturing complex fault characteristics and patterns, so the need for intelligent and accurate fault detection technology is increasingly urgent.
In the field of network security, knowledge graph embedding has gradually become a key technology for improving the efficiency and accuracy of deep learning models. However, conventional embedding methods such as TransE show limitations when handling complex entities and relationships. Specifically, although simple and computationally efficient, such methods can only handle linear relationships; for network security knowledge graphs involving multiple relationship types and complex structures, they often fail to capture nonlinear associations between entities. As a result, the model performs less than ideally on tasks such as threat detection and fault prediction.
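One well-known instance of the TransE limitation mentioned above can be shown with a toy example. TransE models a relation as a translation, scoring a triple by how close head + relation lands to tail. The sketch below assumes the L1-distance form of that score; the entity names and vectors are hypothetical. It shows that a relation vector tuned to make a triple hold in one direction cannot also make the reversed triple hold unless the two entities collapse to the same point.

```python
# Illustrative sketch only: toy 2-d vectors, L1-distance TransE score.
# Entity names and values are hypothetical, not taken from the disclosure.

def transe_score(head, relation, tail):
    """TransE plausibility: negative L1 distance between head + relation and tail."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))

switch_a = [1, 0]
switch_b = [0, 1]

# Choose the relation vector so that (switch_a, linked_to, switch_b) fits exactly.
linked_to = [t - h for h, t in zip(switch_a, switch_b)]

forward = transe_score(switch_a, linked_to, switch_b)   # best possible score
backward = transe_score(switch_b, linked_to, switch_a)  # penalized reverse direction
print(forward, backward)
```

A symmetric relation such as a bidirectional link would need both directions to score well, which a single translation vector cannot provide for distinct entities.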
Disclosure of Invention
In order to overcome the problems in the related art, a DCS network fault detection and intelligent reasoning method and device are provided.
According to an aspect of the disclosed embodiments, there is provided a DCS network fault detection and intelligent reasoning method, the method including: step 1, data collection and preprocessing, wherein terminal equipment collects data from a DCS network, and a feature set is extracted from the data after cleaning and classification; step 2, knowledge graph construction, comprising: step 21, domain knowledge extraction, namely extracting domain knowledge related to network faults from related literature, technical documents and expert knowledge of industrial control systems; step 22, knowledge representation, namely representing knowledge in a standard language and constructing a graph of entity nodes and relationship edges; step 23, knowledge graph construction and updating, comprising: step 231, knowledge acquisition, namely automatically extracting knowledge from unstructured data through text mining and an expert system; step 232, knowledge fusion and reasoning, namely combining an existing industrial control system fault knowledge base, measuring the credibility of triples containing head entities, relations and tail entities in the knowledge graph by using the scoring function of a ComplEx model, mapping the entities and relations to a complex space, then calculating the scores of the triples through a complex dot product, and automatically generating reasoning rules; step 3, deep learning model training and prediction, namely performing fault detection based on a deep learning model, using a multi-layer LSTM or a Transformer-based model, wherein the input data comprise network traffic, device status and log data, and fault occurrence probability, fault type prediction and cause localization are output in combination with the features generated by the knowledge graph.
In one possible implementation, step 1 includes: step 11, data acquisition, wherein the terminal equipment collects data through a DCS switch gateway system and stores the collected data by category, wherein the data comprise network traffic, device status logs, alarm information and historical fault records; the terminal equipment acquires network traffic data using DPI technology and stores it by timestamp, wherein the network traffic data comprise IP packets, protocol type, traffic volume and communication delay; the terminal equipment periodically collects device status data via SNMP or a dedicated API, wherein the device status data comprise CPU utilization, memory usage and network interface status; the terminal equipment is provided with a log collector, obtains log data at regular intervals, and archives and analyzes the log data through a log management system, wherein the log data comprise a system log, an appli