US-12619227-B2 - Method and system for causal inference and root cause identification in industrial processes


Abstract

Fault diagnosis in industry typically involves identifying the key variables/sensors bearing the fault signature, classifying the detected fault into known fault classes, and detecting the root causes/sources of the fault. This disclosure relates to a method and system for deep learning-based causal inference on multivariate time-series data of abnormal events and failures in industrial manufacturing processes and equipment. The system generates causal networks for nonlinear and non-stationary multivariate time-series data. The causal network of a fault in a dynamic, non-stationary and nonlinear complex process or system is learned from observed data, without any prior process knowledge. The causal networks of faults are identified in real time using a deep learning-based causal network learning technique. The system identifies causal connections and temporal lag information among variables to generate a directed causal graph of the fault, called the causal network, which is used to identify fault propagation paths and root cause variables.
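The abstract's final step, reading root cause variables and fault propagation paths off the directed causal graph, can be illustrated as an upstream traversal. This is a minimal sketch under stated assumptions, not the patented traversal: it assumes the graph is given as (source, target, lag) edges that already passed the causality test, and it treats any variable with no unvisited causal parent as a root cause. The function names and the coke-oven variable names in the usage example are hypothetical.

```python
from collections import defaultdict


def build_parents(edges):
    """Map each variable to its causal parents.

    edges: iterable of (source, target, lag) tuples, e.g. the source-target
    pairs that passed the causality test, with their estimated lags.
    """
    parents = defaultdict(list)
    for src, dst, lag in edges:
        parents[dst].append((src, lag))
    return parents


def propagation_paths(parents, fault_var):
    """Return every acyclic path from a root cause to fault_var.

    Walks upstream (against edge direction) from the faulty variable. A node
    with no parent outside the current path ends the walk and is treated as a
    root cause (an assumption of this sketch). Each result is
    (path from root cause to fault_var, summed lag along the path).
    """
    paths = []

    def walk(node, path, total_lag):
        upstream = [(p, lag) for p, lag in parents.get(node, []) if p not in path]
        if not upstream:
            paths.append((list(reversed(path)), total_lag))
            return
        for p, lag in upstream:
            walk(p, path + [p], total_lag + lag)

    walk(fault_var, [fault_var], 0)
    return paths


def root_causes(paths):
    """Distinct first nodes of the propagation paths, sorted for stable output."""
    return sorted({path[0] for path, _ in paths})


if __name__ == "__main__":
    # Hypothetical coke-oven variables; lags are in sampling intervals.
    edges = [
        ("coal_moisture", "flue_temp", 2),
        ("flue_temp", "coke_quality", 3),
        ("coking_time", "coke_quality", 1),
    ]
    paths = propagation_paths(build_parents(edges), "coke_quality")
    for path, lag in paths:
        print(" -> ".join(path), f"(total lag {lag})")
    print(root_causes(paths))  # ['coal_moisture', 'coking_time']
```

Summing the per-edge lags gives a rough estimate of how long a disturbance at each root cause takes to reach the faulty variable. Enumerating all paths is exponential in the worst case, which is acceptable for the small fault-specific graphs this sketch targets.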

Inventors

TANMAYA SINGHAL
KALYANI BHARAT ZOPE
SRI HARSHA NISTALA
Venkataramana Runkana

Assignees

TATA CONSULTANCY SERVICES LIMITED

Dates

Publication Date: 20260505
Application Date: 20230707
Priority Date: 20220721

Claims (20)

1. A processor-implemented method for causal network learning and identifying one or more root cause variables and corresponding one or more fault propagation paths for faults in industrial processes, comprising steps of: receiving, via an input/output interface, a multivariate time-series data of one or more sensors from a plurality of sources, wherein the multivariate time-series data comprises a plurality of variables recorded for a plurality of time instances; pre-processing, via one or more hardware processors, the received multivariate time-series data using a plurality of pre-processing techniques to remove noise and spurious values, impute/interpolate missing values, and resample the received multivariate time-series data to a uniform frequency; detecting, via the one or more hardware processors, at least one fault in the pre-processed multivariate time-series data and a plurality of fault variables responsible for the at least one fault using a fault detection and diagnosis technique; dividing, via the one or more hardware processors, the pre-processed multivariate time-series data of the plurality of fault variables along the time axis into a plurality of windows of a predefined dimension, wherein each of the plurality of windows is further split into one or more sources and a target variable; training, via the one or more hardware processors, a deep neural network for causal inference using each of the plurality of windows and a stochastic proximal gradient descent (SPGD) technique with an adaptive learning rate; extracting, via the one or more hardware processors, one or more encoding parameters from one or more encoding layers of the deep neural network and one or more time lag decomposition parameters from one or more time lag decomposition layers of the deep neural network; computing, via the one or more hardware processors, causality scores for each of the one or more sources corresponding to the target variable using a Frobenius norm of the one or more encoding parameters to obtain a causality score for one or more source-target variable pairs; applying, via the one or more hardware processors, a causality significance test on the computed one or more causality scores to identify a threshold value to select one or more source-target variable pairs having a causality score above the identified threshold in the multivariate time-series data; determining, via the one or more hardware processors, a temporal lag based on a contribution to the time lag decomposition parameters corresponding to each of the one or more selected source-target variable pairs having a causality score above the threshold; generating, via the one or more hardware processors, a causal structure of fault propagation from the selected one or more source-target variable pairs and the determined temporal lags; predicting a fault score for a current time instance and one or more future time instances, wherein, in response to detecting a fault using the predicted fault score, fault localization, generation of a causal structure and identification of one or more root cause variables are performed in advance; identifying, via the one or more hardware processors, the one or more root cause variables and one or more fault propagation paths for faults in industrial processes based on traversing the generated causal structure; identifying one or more faulty data instances to identify a plurality of faulty variables responsible for the detected fault by one of a visual inspection and traversing the causal network using one or more graph traversal techniques; sending the plurality of faulty variables and data corresponding to the plurality of faulty variables to the causal network to learn the causal network of the one or more fault propagation paths; and recommending one or more corrective actions related to one of a shutdown, a maintenance and an optimization to one or more operators of the industrial processes based on the identified root cause variables and fault propagation paths.
2. The processor-implemented method of claim 1, wherein the deep neural network comprises one or more encoding layers and one or more time lag decomposition layers with one or more penalized causal parameters, and one or more forecasting layers.
3. The processor-implemented method of claim 1, wherein the fault detection and diagnosis technique comprises one of a statistical, a machine learning and a deep learning technique, comprising a principal component analysis (PCA), a Mahalanobis distance, an isolation forest, an elliptical envelope, a K-nearest neighbors, a multilayer perceptron, a long short-term memory autoencoder (LSTM-AE), and a convolutional neural network autoencoder (CNN-AE).
4. The processor-implemented method of claim 1, wherein the one or more source variables comprise the multivariate time-series data corresponding to one or more previous time instances.
5. The processor-implemented method of claim 1, wherein the target variable comprises the multivariate time-series data corresponding to a current time instance.
6. The processor-implemented method of claim 1, wherein one or more causal parameters are penalized while training the deep neural network to isolate one or more source-target variable pairs having a causality score above the identified threshold in the multivariate time-series data.
7. The processor-implemented method of claim 1, wherein the non-convex optimization function is trained using the SPGD algorithm with an adaptive learning rate.
8. A system for causal network learning and identifying one or more root cause variables and corresponding one or more fault propagation paths for faults in industrial processes, comprising: an input/output interface to receive a multivariate time-series data of one or more sensors from a plurality of sources, wherein the multivariate time-series data comprises a plurality of variables recorded for a plurality of time instances; a memory in communication with the one or more hardware processors, wherein the one or more hardware processors are configured to execute programmed instructions stored in the memory to: pre-process the received multivariate time-series data using a plurality of pre-processing techniques to remove noise and spurious values, impute/interpolate missing values and resample the received multivariate time-series data to a uniform frequency; detect at least one fault in the pre-processed multivariate time-series data and a plurality of fault variables responsible for the at least one fault using a fault detection and diagnosis technique; divide the pre-processed multivariate time-series data of the plurality of fault variables along the time axis into a plurality of windows of a predefined dimension, wherein each of the plurality of windows is further split into one or more sources and a target variable; train a deep neural network for causal inference using each of the plurality of windows and a stochastic proximal gradient descent (SPGD) technique with an adaptive learning rate; extract one or more encoding parameters from one or more encoding layers of the deep neural network and one or more time lag decomposition parameters from one or more time lag decomposition layers of the deep neural network; compute causality scores for each of the one or more sources corresponding to the target variable using a Frobenius norm of the one or more encoding parameters to obtain one or more source-target variable pairs; apply a causality significance test on the computed one or more causality scores to identify a threshold value to select one or more source-target variable pairs having a causality score above the identified threshold in the multivariate time-series data; determine a temporal lag based on a contribution to the time lag decomposition parameters corresponding to each of the one or more selected source-target variable pairs having a causality score above the threshold; generate a causal structure of fault propagation from the selected one or more source-target variable pairs and the determined temporal lags; predict a fault score for a current time instance and one or more future time instances, wherein, in response to detecting a fault using the predicted fault score, fault localization, generation of a causal structure and identification of one or more root cause variables are performed in advance; identify the one or more root cause variables and one or more fault propagation paths for faults in industrial processes based on traversing the generated causal structure; identify one or more faulty data instances to identify a plurality of faulty variables responsible for the detected fault by one of a visual inspection and traversing the causal network using one or more graph traversal techniques; send the plurality of faulty variables and data corresponding to the plurality of faulty variables to the causal network to learn the causal network of the one or more fault propagation paths; and recommend one or more corrective actions related to one of a shutdown, a maintenance and an optimization to one or more operators of the industrial processes based on the identified root cause variables and fault propagation paths.
9. The system of claim 8, wherein the deep neural network comprises one or more encoding layers and one or more time lag decomposition layers with one or more penalized causal parameters, and one or more forecasting layers.
10. The system of claim 8, wherein the fault detection and diagnosis technique comprises one of a statistical, a machine learning and a deep learning technique, comprising a principal component analysis (PCA), a Mahalanobis distance, an isolation forest, an elliptical envelope, a K-nearest neighbors, a multilayer perceptron, a long short-term memory autoencoder (LSTM-AE), and a convolutional neural network autoencoder (CNN-AE).
11. The system of claim 8, wherein the one or more source variables comprise the multivariate time-series data corresponding to one or more previous time instances.
12. The system of claim 8, wherein the target variable comprises the multivariate time-series data corresponding to a current time instance.
13. The system of claim 8, wherein one or more causal parameters are penalized while training the deep neural network to isolate one or more source-target variable pairs having a causality score above the identified threshold in the multivariate time-series data.
14. The system of claim 8, wherein the non-convex optimization function is trained using the SPGD algorithm with an adaptive learning rate.
15. A non-transitory computer readable medium storing one or more instructions which, when executed by one or more processors on a system, cause the one or more processors to perform a method comprising: receiving, via an input/output interface, a multivariate time-series data of one or more sensors from a plurality of sources, wherein the multivariate time-series data comprises a plurality of variables recorded for a plurality of time instances; pre-processing the received multivariate time-series data using a plurality of pre-processing techniques to remove noise and spurious values, impute/interpolate missing values and resample the received multivariate time-series data to a uniform frequency; detecting at least one fault in the pre-processed multivariate time-series data and a plurality of fault variables responsible for the at least one fault using a fault detection and diagnosis technique; dividing the pre-processed multivariate time-series data of the plurality of fault variables along the time axis into a plurality of windows of a predefined dimension, wherein each of the plurality of windows is further split into one or more sources and a target variable; training a deep neural network for causal inference using each of the plurality of windows and a stochastic proximal gradient descent (SPGD) technique with an adaptive learning rate; extracting one or more encoding parameters from one or more encoding layers of the deep neural network and one or more time lag decomposition parameters from one or more time lag decomposition layers of the deep neural network; computing causality scores for each of the one or more sources corresponding to the target variable using a Frobenius norm of the one or more encoding parameters to obtain one or more source-target variable pairs; applying a causality significance test on the computed one or more causality scores to identify a threshold value to select one or more source-target variable pairs having a causality score above the identified threshold in the multivariate time-series data; determining a temporal lag based on a contribution to the time lag decomposition parameters corresponding to each of the one or more selected source-target variable pairs having a causality score above the threshold; generating a causal structure of fault propagation from the selected one or more source-target variable pairs and the determined temporal lags; predicting a fault score for a current time instance and one or more future time instances, wherein, in response to detecting a fault using the predicted fault score, fault localization, generation of a causal structure and identification of one or more root cause variables are performed in advance; identifying the one or more root cause variables and one or more fault propagation paths for faults in industrial processes based on traversing the generated causal structure; identifying one or more faulty data instances to identify a plurality of faulty variables responsible for the detected fault by one of a visual inspection and traversing the causal network using one or more graph traversal techniques; sending the plurality of faulty variables and data corresponding to the plurality of faulty variables to the causal network to learn the causal network of the one or more fault propagation paths; and recommending one or more corrective actions related to one of a shutdown, a maintenance and an optimization to one or more operators of the industrial processes based on the identified root cause variables and fault propagation paths.
16. The non-transitory computer readable medium of claim 15, wherein the deep neural network comprises one or more encoding layers and one or more time lag decomposition layers with one or more penalized causal parameters, and one or more forecasting layers.
17. The non-transitory computer readable medium of claim 15, wherein the fault detection and diagnosis technique comprises one of a statistical, a machine learning and a deep learning technique, comprising a principal component analysis (PCA), a Mahalanobis distance, an isolation forest, an elliptical envelope, a K-nearest neighbors, a multilayer perceptron, a long short-term memory autoencoder (LSTM-AE), and a convolutional neural network autoencoder (CNN-AE).
18. The non-transitory computer readable medium of claim 15, wherein the one or more source variables comprise the multivariate time-series data corresponding to one or more previous time instances.
19. The non-transitory computer readable medium of claim 15, wherein the target variable comprises the multivariate time-series data corresponding to a current time instance.
20. The non-transitory computer readable medium of claim 15, wherein one or more causal parameters are penalized while training the deep neural network to isolate one or more source-target variable pairs having a causality score above the identified threshold in the multivariate time-series data.
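The scoring steps recited in claim 1 (splitting windows into sources and a target, SPGD training with penalized causal parameters, Frobenius-norm causality scores, thresholding, and lag selection) can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the claims do not disclose layer shapes, the penalty, the significance test, or the adaptive learning-rate rule. The sketch therefore assumes a group-lasso penalty on each source's encoding-layer weight block, uses a fixed score threshold as a stand-in for the causality significance test, and takes the lag with the largest absolute lag-decomposition weight as the temporal lag. All function names are hypothetical, and the network's forward pass and loss gradient are not shown.

```python
import math

# Hypothetical helpers. The group-lasso penalty, the fixed threshold and the
# argmax lag rule are assumptions of this sketch, not disclosed details.


def make_windows(series, max_lag):
    """Slide over a (time x variables) series and return (source, target) pairs.

    The source holds the max_lag previous rows; the target is the current row.
    """
    return [(series[t - max_lag:t], series[t]) for t in range(max_lag, len(series))]


def frobenius(block):
    """Frobenius norm of a weight block given as a list of rows."""
    return math.sqrt(sum(w * w for row in block for w in row))


def group_prox(block, step, penalty):
    """Proximal operator of a group-lasso penalty on one source's weight block.

    Shrinks the whole block toward zero and zeroes it exactly when its norm is
    at most step * penalty.
    """
    norm = frobenius(block)
    if norm <= step * penalty:
        return [[0.0 for _ in row] for row in block]
    scale = 1.0 - step * penalty / norm
    return [[scale * w for w in row] for row in block]


def spgd_update(block, grad, step, penalty):
    """One stochastic proximal gradient step: a gradient step on the smooth
    loss, then proximal shrinkage. `step` would come from an adaptive
    learning-rate rule, which is not modelled here."""
    moved = [[w - step * g for w, g in zip(w_row, g_row)]
             for w_row, g_row in zip(block, grad)]
    return group_prox(moved, step, penalty)


def causal_edges(encoding, lag_weights, target, threshold):
    """Score each source by the Frobenius norm of its trained encoding block,
    keep sources scoring above threshold, and attach the 1-based lag with the
    largest absolute lag-decomposition weight.

    encoding[src]: trained encoding-layer weight block for source src.
    lag_weights[src]: one weight per lag 1..K for source src.
    Returns (source, target, score, lag) tuples.
    """
    edges = []
    for src, block in encoding.items():
        score = frobenius(block)
        if score > threshold:
            contrib = [abs(w) for w in lag_weights[src]]
            lag = 1 + contrib.index(max(contrib))
            edges.append((src, target, score, lag))
    return edges


if __name__ == "__main__":
    series = [[1, 2], [3, 4], [5, 6], [7, 8]]
    print(make_windows(series, 2))  # [([[1, 2], [3, 4]], [5, 6]), ([[3, 4], [5, 6]], [7, 8])]
    encoding = {"x1": [[3.0, 4.0]], "x2": [[0.05, 0.0]]}
    lag_weights = {"x1": [0.1, 0.9, 0.2], "x2": [1.0, 0.0, 0.0]}
    print(causal_edges(encoding, lag_weights, "y", threshold=1.0))  # [('x1', 'y', 5.0, 2)]
```

The group proximal step is what makes the Frobenius norm informative: a source whose whole weight block is driven exactly to zero scores 0 and drops out of the causal graph, whereas a plain L2 penalty would only shrink it.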

Description

CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY

This U.S. patent application claims priority under 35 U.S.C. § 119 to Indian Application number 202221041919, filed on Jul. 21, 2022. The entire contents of the aforementioned application are incorporated herein by reference.

TECHNICAL FIELD

The disclosure herein generally relates to the field of industrial data analytics and, specifically, to a method and system for causal network learning and identifying one or more root cause variables and one or more fault propagation paths for faults in industrial processes and equipment.

BACKGROUND

Effective fault detection and diagnosis is a key step towards predictive and prescriptive maintenance of industrial processes and equipment. Here, fault diagnosis is an umbrella term that typically involves identifying the key variables/sensors bearing the fault signature (fault localization/isolation), classifying the detected fault into one or more known fault classes (fault classification), and detecting the root cause/source of the fault (causality analysis or root cause analysis, RCA). Of these, real-time root cause identification of faults or abnormal events is a key requirement for industries, as it gives plant operators and managers an opportunity to address the problem in real time, before the fault progresses and leads to a failure. For example, in coke oven batteries, whenever there is a change in coke quality (a process fault), operators want to know whether it is due to changes in coking parameters, the health of the coke ovens, or the chemical composition of the coal blend, so that appropriate corrective action can be taken. Given the growing need for real-time process monitoring and reliable fault detection and diagnosis in industries, and the large number of process variables, it is not practical for domain experts or plant operators to localize the fault or identify the causal network of variables responsible for the fault in real time.
Most of the time, causal networks are generated using prior process knowledge or manually by subject matter experts (SMEs). There are two types of approaches for learning causal networks and root cause identification (RCI) of faults, namely knowledge-based and data-driven approaches. Knowledge-based methods (such as failure mode and effects analysis, FMEA) require a priori knowledge of faults/failures and of the relationship between faults and observations (symptoms). While such knowledge can be derived from a fundamental understanding of the process, sources of domain knowledge, and experience with the process, the initial effort required for this approach is significant, and the gathered knowledge may not be exhaustive, leading to missed or incorrect identifications in some cases. On the other hand, data-driven methods rely entirely on historical and current operating data and can be applied with minimal initial effort. However, existing data-driven techniques cannot effectively identify nonlinear relationships among the variables and cannot deal with non-stationarity in the data. Therefore, these techniques are not effective for learning causal networks of nonlinear and non-stationary industrial processes.

SUMMARY

Embodiments of the disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one embodiment, a system and method for causal network learning and identifying one or more root cause variables and one or more fault propagation paths for faults in industrial processes is provided. In one aspect, a processor-implemented method for causal network learning and identifying one or more root cause variables and one or more fault propagation paths for faults in industrial processes is provided.
The processor-implemented method includes one or more steps such as receiving a multivariate time-series data of one or more sensors from a plurality of sources, pre-processing the received multivariate time-series data using a plurality of pre-processing techniques to remove noise and spurious values, imputing/interpolating missing values of the plurality of received data, and resampling the received multivariate time-series data to a uniform frequency. Further, the processor-implemented method includes detecting at least one fault in the pre-processed multivariate time-series data and a plurality of fault variables responsible for the at least one fault using a fault detection and diagnosis technique, dividing the pre-processed multivariate time-series data of the plurality of fault variables along the time axis into a plurality of windows of a predefined dimension, training a deep neural network for causal inference using each of the plurality of windows and a stochastic proximal gradient descent (SPGD) technique with an adaptive learning rate, extracting one or more encoding parameters from one or more encoding layers of the deep neural network and one or more lag decomposition parameters from one or more lag decomposition layers of the