US-12625757-B1 - System and method for automated system behavior prediction and fault mitigation
Abstract
A configurable, data-driven computer software program that predicts system behavior and detects anomalies within these behavior predictions. The software program is configured by a data set specified by the user or another computer program and utilizes system data provided by the user or the target system itself. The program predicts system behavior, estimates outlying predictions, and then automatically labels the predictions and outliers. Once system predictions are labeled, the program uses outlier recognition to classify predictions as expected (nominal) or unexpected (anomalous) behavior. Users can configure the program to perform various tasks, such as testing other software programs for defects, as well as detecting and mitigating system errors and faults in real time on target systems.
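The predict, estimate-outliers, and auto-label sequence described in the abstract can be illustrated with a minimal sketch. The abstract does not name an outlier detection algorithm, so the median absolute deviation (MAD) test used here is an assumption chosen for illustration, and the function name `auto_label` and the parameter `z_threshold` are hypothetical.

```python
from statistics import median


def auto_label(predicted, measured, z_threshold=3.5):
    """Label each sample 'nominal' or 'anomalous' from its prediction delta.

    Assumption: outliers are found with a robust z-score based on the
    median absolute deviation (MAD). The source does not specify this.
    """
    # Delta between what the model predicted and what was observed.
    deltas = [m - p for p, m in zip(predicted, measured)]
    center = median(deltas)
    mad = median(abs(d - center) for d in deltas)
    # 1.4826 scales the MAD to match a standard deviation for normal data.
    # Guard against a zero MAD to avoid division by zero.
    scale = 1.4826 * mad if mad > 0 else 1e-12
    labels = []
    for d in deltas:
        robust_z = abs(d - center) / scale
        labels.append("anomalous" if robust_z > z_threshold else "nominal")
    return deltas, labels


# Example: the last measurement departs sharply from its prediction.
deltas, labels = auto_label(
    predicted=[1.0, 2.0, 3.0, 4.0, 5.0],
    measured=[1.1, 1.9, 3.05, 4.0, 9.0],
)
```

Labels produced this way could then serve as training targets for a classifier, which is the role the abstract assigns to the automatic labeling step.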
Inventors
- Amanda Acevedo
- Alex Clanton Stevens
Assignees
- VEDO SYSTEMS
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2025-01-29
Claims (20)
- 1. A method, the method comprising: (a) automatically generating, on a controller having a processor and memory coupled to a physical target system and its actuators, AI models in which system behaviors are predicted and anomalous behaviors are classified using telemetry at a control loop of at least a system update; (b) comparing, in real time within the control loop, predicted target variables to measured target variables to compute deltas; (c) processing the deltas through an inverse neural network whose input layer corresponds to outputs of a prediction model and whose output layer corresponds to control inputs of the target system, to identify a root cause and determine mitigation inputs; and (d) within the control loop, automatically transmitting the mitigation inputs through an actuator interface to the target system under safety constraints, thereby resolving a detected fault without human intervention, and maintaining execution of steps (b)-(d) within a predefined control-loop timing threshold.
- 2. The method of claim 1, wherein automated fault detection comprises processing telemetry at the control loop to compute deltas and providing, by a classifier model executed within the control loop, classifier confidence values, and automatically providing a fault state that performs steps (c) and (d) when the classifier confidence values are over a threshold set by safety constraints.
- 3. The method of claim 2, wherein the inverse neural network identifies a root cause and computes mitigation inputs that are validated by the prediction model within the control loop, and the mitigation inputs are applied without human intervention and subject to actuator and safety constraints.
- 4. A system for automated fault detection and resolution, the system comprising: (i) a non-transitory computer readable medium containing executable instructions; (ii) a processor that executes the executable instructions; (iii) an automated fault detection and resolution software module embedded in a target system and configured to detect faults and generate and apply resolutions in the target system; (iv) a computer program that, when executed, implements: (1) a configuration command to automatically configure a fault prediction neural network based on configuration parameters including system inputs and targets; (2) ingestion of telemetry at a control loop and prediction of target variables; (3) generation of deltas between predicted and measured target variables; (4) an inverse neural network that applies deltas in target variables to actuator control inputs, and generation of mitigation inputs subject to safety constraints; and (5) automatic application of the mitigation inputs via an actuator interface to resolve the fault in the target system within the control loop.
- 5. The system of claim 4, wherein the computer program further comprises: instructions to ingest measured target variables at the control loop; a prediction model implemented as a neural network; and a classifier model implemented as a neural network, the prediction model and classifier model executing within a control loop.
- 6. The system of claim 5, wherein the computer program further comprises: instructions to online train the prediction model using system behavior datasets while maintaining the control loop; instructions to generate predicted datasets and deltas; instructions to identify nominal and anomalous outputs based on the deltas; instructions to auto label nominal and anomalous outputs; and instructions to update classifier weights with the auto labels.
- 7. The system of claim 4, wherein training of a fault prediction model uses all available telemetry via an online training that adapts to a control loop timing threshold.
- 8. The system of claim 4, further comprising instructions to label at least one data value as an outlier when an outlier detection algorithm identifies no outliers, the labeling being limited to a threshold number of labels per control loop used to maintain classifier robustness.
- 9. The system of claim 4, wherein upon detection of a fault the computer program: determines expected inputs from observed behavior by providing deltas through the inverse neural network; computes differences between observed behavior and inputs required to reproduce the behavior; validates mitigation inputs by passing them through a prediction model; and automatically communicates the validated mitigation inputs through an actuator interface to implement a mitigation solution within the control loop.
- 10. The system of claim 4, further comprising instructions to incrementally train model parameters based on system behavior datasets and deploy updated model parameters within the control loop after a safety validation.
- 11. The system of claim 10, wherein the incremental training is performed while maintaining execution within the control loop.
- 12. The system of claim 4, further comprising a graphical user interface configured to enable non-machine learning experts to select inputs and targets, set threshold safety constraints and estimated loop utilization prior to deploying configuration changes to the embedded module.
- 13. The system of claim 4, wherein outputs of a prediction model and a classifier model are combined such that mitigation inputs are issued only when predicted deltas are greater than certain behavior thresholds and a classifier confidence is greater than a threshold, thereby reducing false positives.
- 14. The system of claim 9, wherein the computer program further: adjusts predicted behavior by applying observed behavior deltas; generates corrected inputs via the inverse neural network; validates the corrected inputs via the prediction model; and performs iterative refinement over a threshold number of control loop cycles until a satisfactory outcome is achieved.
- 15. The system of claim 14, wherein an input layer of the inverse neural network corresponds to an output layer of the prediction model and an output layer of the inverse neural network corresponds to control inputs of the target system, thereby providing model space deltas to actuator space commands.
- 16. The system of claim 4, further comprising instructions to simulate the target system with actuator constraints as the embedded module, for testing of mitigation performance before field deployment.
- 17. The system of claim 4, wherein configuration parameters include user defined inputs, targets, training data complexity, model scaling coefficients, and layer sizes.
- 18. The system of claim 4, wherein a prediction model is biased toward lower fidelity prediction to intentionally generate larger deltas that, when averaged, increase training signal to noise for a classifier neural network and thereby reduce false positives.
- 19. The system of claim 4, wherein an auto labeling module together with a classifier neural network establishes a dynamic threshold for fault detection based on a weighted estimate of delta statistics computed each control loop cycle.
- 20. The system of claim 4, wherein the target system comprises a satellite with thrusters, and the computer program: ingests thruster commands and three axis force telemetry at the control loop; configures prediction model inputs as thruster commands and targets as force axis values; computes deltas between predicted and measured force values; and, upon detecting a fault, provides the deltas through the inverse neural network to generate per thruster mitigation commands subject to thruster cycle control safety constraints, and transmits the commands via the actuator interface within the control loop.
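As an informal illustration of steps (b) through (d) recited in claim 1, the sketch below runs one control loop cycle: it computes deltas, passes them through an inverse model, clamps the result to actuator limits, and validates the candidate inputs with the prediction model. It is a simplified sketch under stated assumptions and is not the claimed implementation: the two neural networks are replaced by hypothetical linear stand-ins (`linear_predict`, `linear_invert`), the safety constraint is assumed to be a simple clamp, and validation assumes the fault persists as an additive offset on the targets. All names, including `LoopConfig` and `control_loop_step`, are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class LoopConfig:
    delta_threshold: float  # fault declared when any |delta| exceeds this
    actuator_min: float     # lower safety limit applied to every control input
    actuator_max: float     # upper safety limit applied to every control input


def control_loop_step(
    inputs: Vector,
    measured: Vector,
    predict: Callable[[Vector], Vector],
    invert: Callable[[Vector], Vector],
    cfg: LoopConfig,
) -> Optional[Vector]:
    """Return mitigation inputs for one cycle, or None when none are issued."""
    # (b) Compare predicted and measured target variables.
    predicted = predict(inputs)
    deltas = [m - p for p, m in zip(predicted, measured)]
    if max(abs(d) for d in deltas) <= cfg.delta_threshold:
        return None  # nominal cycle, nothing to mitigate

    # (c) Map target-space deltas to control-input corrections.
    corrections = invert(deltas)
    candidate = [u - c for u, c in zip(inputs, corrections)]

    # Safety constraint (assumed form): clamp each input to actuator limits.
    candidate = [min(max(u, cfg.actuator_min), cfg.actuator_max) for u in candidate]

    # Validate with the prediction model, assuming the fault persists as the
    # same additive offset on the targets.
    expected_after = [p + d for p, d in zip(predict(candidate), deltas)]
    residual = max(abs(e - p) for e, p in zip(expected_after, predicted))
    if residual > cfg.delta_threshold:
        return None  # the clamped command would not restore expected behavior

    # (d) The caller would transmit these through the actuator interface.
    return candidate


# Hypothetical linear stand-ins for the prediction and inverse networks.
def linear_predict(u: Vector) -> Vector:
    return [2.0 * u[0], 0.5 * u[1]]


def linear_invert(d: Vector) -> Vector:
    return [d[0] / 2.0, d[1] / 0.5]


cfg = LoopConfig(delta_threshold=0.1, actuator_min=-5.0, actuator_max=5.0)
mitigation = control_loop_step([1.0, 2.0], [2.6, 1.0], linear_predict, linear_invert, cfg)
```

Returning `None` both for nominal cycles and for candidates that fail validation keeps the sketch conservative: nothing is sent to the actuators unless the prediction model agrees that the command should restore the expected behavior. The control-loop timing threshold recited in claim 1 is not modeled here.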
Description
CROSS REFERENCE TO RELATED APPLICATIONS
This patent application claims priority to provisional patent application entitled A SYSTEM AND METHOD FOR AUTOMATED SYSTEM BEHAVIOR PREDICTION AND FAULT MITIGATION by Acevedo filed on Feb. 3, 2024, and assigned Ser. No. 63/549,451, which is hereby incorporated by reference in its entirety.
The present invention relates to the fields of system behavior prediction (including off-nominal behaviors) and system recovery upon detection of faulted system behavior. The term ‘anomalies’ is used throughout to indicate off-nominal behavior, and ‘faults’ is used to indicate anomalies that result in the failure of a running system in some way, either by degraded system performance or potential total failure of a given system.
BACKGROUND OF THE INVENTION
As software systems have evolved due to increased autonomy and the integration of AI and ML, system developers and operators rely more and more on various systems and methods to predict system behaviors and manage system faults. Predictions of nominal behavior and detection of erroneous or faulty behavior are used during the different phases of system development to ensure the successful creation and operation of products. Detection of and responses to system faults during operations are critical to addressing issues that can impact human safety and the achievement of operational objectives, as well as the financial wellbeing of all involved. Outdated systems and methods from the prior art are costly and time-consuming to develop and use, negating potential benefits of the technologies involved.
Outdated Systems and Methods
Although software systems have become more complex, the systems and methods used to predict system behaviors and manage system faults have not evolved in kind. Many of these solutions apply antiquated approaches that do not benefit from advancements in technology, including but not limited to AI and ML.
Traditional System Behavior and/or Fault Detection (Non-AI, General Capability)
Using the aerospace industry as an example, prediction of system behavior is “baked” into system design in most cases. Developers commonly use various analyses and assumptions to identify anticipated system behavior for standard and extreme operational scenarios. Once they are confident they have identified the system's performance envelope, they will develop a system that fits within these bounds. The resulting system is limited in its adaptability, resulting in substantial rework as designs evolve and operational capabilities change. New systems that operate similarly to others in similar environments typically require significant updates to existing systems and methods for prediction and fault management, or the development of new ones altogether.
Developers and operators will also use tools built with models that predict system behavior for various tasks, such as system testing and monitoring, as well as operator training. These models often duplicate those created for the target system, increasing cost and maintenance across platforms. Significant benefit is provided by systems and methods that can generically predict behavior for any system without requiring a priori knowledge of system performance capabilities. There is also a significant benefit provided by systems and methods that can be used on multiple platforms for system development and operation.
Traditional AI Fault Detection (General Capability)
Additionally, methods in these areas that do utilize Artificial Intelligence (AI) and Machine Learning (ML) rely on outdated AI and ML approaches. For example, FIG. 16 illustrates a typical workflow for traditional ML fault detection models. This workflow results in a unique fault detection application that is specific to the target system. After product requirements are defined and the ML software is designed, software engineers begin writing ML code for the prediction model.
This effort includes coding the shape, size, and complexity of the traditional prediction model, wherein the traditional ML prediction model is a neural network. A shape for the neural network, chosen by software engineers, is typically customized for each target system to ensure accuracy of the generated prediction values. The largest drawbacks to this workflow are the significant effort required to develop custom code and the subsequent limitations to code reuse. Another drawback to traditional ML fault detection models is the requirement for vast quantities of data to train and validate them. FIG. 18 illustrates a typical data flow employed in ML model training. Training data are generated by applying sample system operation data or by generating simulated system data. Generating these data is time consuming, generally requiring multiple passes to ensure sufficient data have been provided and used for training the ML model. Furthermore, labeling of data usually requires manual input from developers on what data va