US-12619887-B2 - Systems and methods for an automated data science process


Abstract

Example implementations described herein are directed to systems and methods for generation and deployment of automated and autonomous self-learning machine learning models, which can include generating a predictive model and a prescriptive model through an offline learning process at a first system; controlling operations of a second system through deploying the predictive model and the prescriptive model to the second system; and autonomously updating the predictive model and the prescriptive model from feedback from the second system through an online learning process while the prescriptive model and the predictive model are deployed on the second system.

Inventors

Yongqiang Zhang
Wei Lin
William SCHMARZO

Assignees

HITACHI VANTARA LLC

Dates

Publication Date: 20260505
Application Date: 20200820

Claims (13)

1 . A method for generation and deployment of automated and autonomous self-learning machine learning models, the method comprising: generating a predictive model and a prescriptive model through an offline learning process at a first system; controlling operations of a second system through deploying the predictive model and the prescriptive model to the second system, wherein the controlling operations of the second system through deploying the predictive model and the prescriptive model to the second system comprises: deploying the predictive model and the prescriptive model to be online and configured to intake real-time data from the second system; generating predictions through the predictive model from the real-time data; generating prescriptive actions through the prescriptive model from the predictions; and controlling the operations of the second system according to the prescriptive actions; and autonomously updating the prescriptive model and the predictive model from feedback from the second system through an online learning process while the prescriptive model and the predictive model are deployed on the second system.
2 . The method of claim 1 , wherein the controlling operations of the second system through deploying the predictive model and the prescriptive model to the second system comprises executing an automated application of prescriptive actions generated from the prescriptive model to change operations of the second system.
3 . The method of claim 1 wherein the generating the predictive model and the prescriptive model through the offline learning process at the first system comprises: generating, from a solution configuration file, a descriptive component configured to conduct descriptive analysis; generating, from the solution configuration file and the descriptive analysis, an exploratory component configured to conduct exploratory analysis; generating, from the solution configuration file and the exploratory analysis, a predictive component configured to incorporate one or more machine learning libraries specified in the solution configuration file to generate the predictive model; and generating, from the solution configuration file and the predictive model, a prescriptive component configured to map prescriptive actions to results from the predictive model to generate a prescriptive model.
4 . The method of claim 1 , wherein the autonomously updating the predictive model and the prescriptive model from the feedback from the second system through the online learning process while the predictive model and the prescriptive model are deployed on the second system comprises: determining an error based on a difference between the feedback from the second system and a prediction from the predictive model associated with the controlling of the operations; retraining, at the second system, the predictive model and the prescriptive model based on real time data, the error, and the feedback through a continuous learning process while the predictive model and the prescriptive model are deployed at the second system; and for the retrained predictive model and the retrained prescriptive model having better performance than the predictive model and the prescriptive model, deploying the retrained prescriptive model and the retrained predictive model to the second system.
5 . The method of claim 4 , wherein the retraining, at the second system, the predictive model and the prescriptive model based on the real time data, the error, and the feedback through the continuous learning process while the predictive model and the prescriptive model are deployed at the second system comprises: distributing machine learning processes for generating a retrained predictive model and a retrained prescriptive model into a plurality of local models associated with sub-systems of the second system; iteratively ensembling the plurality of local models to generate a plurality of retrained predictive models and a plurality of retrained prescriptive models and selecting ones of the generated plurality of retrained predictive models and generated plurality of retrained prescriptive models to be distributed back into the machine learning processes; and ensembling the plurality of local models to generate the retrained predictive model and the retrained prescriptive model.
6 . The method of claim 4 , wherein the retraining, at the second system, the predictive model and the prescriptive model based on the real time data, the error and the feedback through the continuous learning process while the predictive model and the prescriptive model are deployed at the second system comprises: retraining the predictive model and the prescriptive model based on the real time data, error and feedback through one or more of a reinforcement learning process or a transfer learning process.
7 . A system for generation and deployment of automated and autonomous self-learning machine learning models, the system comprising: a first system, comprising a processor configured to: generate a predictive model and a prescriptive model through an offline learning process; control operations of a second system through deploying the predictive model and the prescriptive model to the second system, wherein the processor is configured to control operations of the second system through deploying the predictive model and the prescriptive model to the second system by: deploying the predictive model and the prescriptive model to be online and configured to intake real-time data from the second system; generating predictions through the predictive model from the real-time data; generating prescriptive actions through the prescriptive model from the predictions; and controlling the operations of the second system according to the prescriptive actions; and autonomously update the prescriptive model and the predictive model from feedback from the second system through an online learning process while the prescriptive model and the predictive model are deployed on the second system.
8 . The system of claim 7 , wherein the processor is configured to control operations of the second system through deploying the predictive model and the prescriptive model to the second system by executing an automated application of prescriptive actions generated from the prescriptive model to change operations of the second system.
9 . The system of claim 7 , wherein the processor is configured to generate the predictive model and the prescriptive model through the offline learning process at the first system by: generating, from a solution configuration file, a descriptive component configured to conduct descriptive analysis; generating, from the solution configuration file and the descriptive analysis, an exploratory component configured to conduct exploratory analysis; generating, from the solution configuration file and the exploratory analysis, a predictive component configured to incorporate one or more machine learning libraries specified in the solution configuration file to generate the predictive model; and generating, from the solution configuration file and the predictive model, a prescriptive component configured to map prescriptive actions to results from the predictive model to generate a prescriptive model.
10 . The system of claim 7 , wherein the processor is configured to autonomously update the predictive model and the prescriptive model from the feedback from the second system through the online learning process while the predictive model and the prescriptive model are deployed on the second system by: determining an error based on a difference between the feedback from the second system and a prediction from the prediction model associated with the controlling of the operations; wherein the second system is configured to retrain the predictive model and the prescriptive model based on real time data, the error, and the feedback through a continuous learning process while the predictive model and the prescriptive model are deployed at the second system; and for the retrained predictive model and the retrained prescriptive model having better performance than the predictive model and the prescriptive model, deploying the retrained prescriptive model and the retrained predictive model to the second system.
11 . The system of claim 10 , wherein the second system is configured to retrain the predictive model and the prescriptive model based on the real time data, the error, and the feedback through the continuous learning process while the predictive model and the prescriptive model are deployed at the second system by: distributing machine learning processes for generating a retrained predictive model and a retrained prescriptive model into a plurality of local models associated with sub-systems of the second system; iteratively ensembling the plurality of local models to generate a plurality of retrained predictive models and a plurality of retrained prescriptive models and selecting ones of the generated plurality of retrained predictive models and generated plurality of retrained prescriptive models to be distributed back into the machine learning processes; and ensembling the plurality of local models to generate the retrained predictive model and the retrained prescriptive model.
12 . The system of claim 11 , wherein the second system is configured to retrain the predictive model and the prescriptive model based on the real time data, the error and the feedback through the continuous learning process while the predictive model and the prescriptive model are deployed at the second system by: retraining the predictive model and the prescriptive model based on the real time data, error and feedback through one or more of a reinforcement learning process or a transfer learning process.
13 . A non-transitory computer readable medium, storing instructions for generation and deployment of automated and autonomous self-learning machine learning models, the instructions comprising: generating a predictive model and a prescriptive model through an offline learning process at a first system; controlling operations of a second system through deploying the predictive model and the prescriptive model to the second system, wherein the controlling operations of the second system through deploying the predictive model and the prescriptive model to the second system comprises: deploying the predictive model and the prescriptive model to be online and configured to intake real-time data from the second system; generating predictions through the predictive model from the real-time data; generating prescriptive actions through the prescriptive model from the predictions; and controlling the operations of the second system according to the prescriptive actions; and autonomously updating the prescriptive model and the predictive model from feedback from the second system through an online learning process while the prescriptive model and the predictive model are deployed on the second system.
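
The control-and-update loop recited in claims 1 and 4 (predict from real-time data, prescribe an action, compute an error from feedback, retrain, and deploy the retrained model only if it performs better) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the disclosure: the one-parameter `PredictiveModel`, the threshold rule in `prescribe`, the gradient-descent `retrain`, and the mean-absolute-error comparison. For brevity the sketch retrains only the predictive model and compares performance on the same history used for retraining, whereas the claims recite retraining both models.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PredictiveModel:
    """Hypothetical one-parameter model: prediction = weight * reading."""
    weight: float

    def predict(self, reading: float) -> float:
        return self.weight * reading


def prescribe(prediction: float, threshold: float = 10.0) -> str:
    """Hypothetical prescriptive rule mapping a prediction to an action."""
    return "throttle" if prediction > threshold else "hold"


def mean_abs_error(model: PredictiveModel,
                   history: List[Tuple[float, float]]) -> float:
    """Average absolute gap between predictions and observed feedback."""
    return sum(abs(model.predict(x) - y) for x, y in history) / len(history)


def retrain(model: PredictiveModel,
            history: List[Tuple[float, float]],
            lr: float = 0.01) -> PredictiveModel:
    """One pass of stochastic gradient descent on squared error."""
    w = model.weight
    for x, y in history:
        w -= lr * 2 * (w * x - y) * x
    return PredictiveModel(w)


def online_step(model: PredictiveModel,
                reading: float,
                feedback: float,
                history: List[Tuple[float, float]]):
    """One cycle: predict, prescribe, measure error, retrain, maybe redeploy."""
    prediction = model.predict(reading)
    action = prescribe(prediction)      # the action that would control the system
    error = feedback - prediction       # difference between feedback and prediction
    history.append((reading, feedback))
    candidate = retrain(model, history)
    # Swap in the retrained model only when it outperforms the deployed one.
    if mean_abs_error(candidate, history) < mean_abs_error(model, history):
        model = candidate
    return model, action, error
```

Calling `online_step` once per incoming reading, and passing the returned model into the next call, keeps the deployed model in service while a candidate is evaluated against it.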

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This Application is a 371 National Phase Patent Application of International Application Number PCT/US2020/047221, filed on Aug. 20, 2020, the entire contents of which are incorporated herein by reference.

BACKGROUND

Field

The present disclosure is generally directed to data science techniques, and more specifically, to data science processes and automated machine learning (AutoML).

Related Art

In the related art, the data science process defines the methodology to deliver analytics solutions and intelligent applications efficiently and effectively. Automated Machine Learning (AutoML) is a system or framework that can automatically build model(s) for the given data, which automates the maximum number of steps in an ML pipeline, minimizes the human effort, and improves the model performance.

In related art implementations, there are AutoML frameworks that involve a predictive component. The AutoML frameworks of the related art involve static implementations that are specifically tailored to a particular ML implementation, and are not generally customizable. Moreover, such related art implementations do not utilize an automated data science process.

SUMMARY

Related art implementations do not have descriptive, exploratory, prescriptive, automation, or autonomous learning components. Example implementations described herein are directed to facilitating these components, in addition to the predictive component, through a unified, highly customizable and extensible framework configured to provide an automated data science process.

The conventional data science process has several problems. Firstly, the conventional data science process is not comprehensive enough to support value-driven tasks. There is a need to have a comprehensive process that allows data science practitioners to understand specifically how to drive value. Secondly, the conventional data science process only focuses on the offline process.
No online data science process has been proposed in the related art, nor is there a data science process that combines both an offline process and an online process. There is a need to have a data science process that supports both an offline process and an online process. The online process can be critical for a real-world system since it facilitates the automation and autonomous learning to obtain the best-fit models based on the real-time data in a dynamic system. Further, the conventional data science process requires human beings to perform a significant amount of manual work. There is a need for a system that automatically performs the tasks in the data science process.

Related art implementations of AutoML frameworks have several deficiencies. Firstly, the related art AutoML frameworks only handle the “predictive” aspect of the data science tasks, while the generic work in other components of the data science process is not automated. There is a need for a system that automates the generic work in the data science process. Secondly, in related art implementations, each AutoML library only supports one underlying machine learning library. There is a need to facilitate a unified and extensible system to support various machine learning libraries.

To address the above needs, the example implementations described herein involve a comprehensive data science system that is descriptive, exploratory, predictive, prescriptive, automated, and autonomous, as well as configured to support value-driven tasks. This process involves an offline process and an online process that are seamlessly integrated as one whole system.

In example implementations described herein, there is an offline process that defines the methodologies and workflows for all data science tasks against historical data. This process corresponds to the first four components to facilitate the Descriptive, Exploratory, Predictive, and Prescriptive aspects of the system.
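
The chaining of the four offline components, each generated from a solution configuration and the output of the preceding component, can be illustrated with a minimal sketch. The `CONFIG` dictionary is a hypothetical stand-in for a solution configuration file; its keys, the `builtin_least_squares` library name, and the threshold-based action mapping are all assumptions made for illustration rather than details from the disclosure.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, float]  # (feature, target) pairs of historical data

# Hypothetical stand-in for the solution configuration file.
CONFIG: Dict[str, dict] = {
    "predictive": {"library": "builtin_least_squares"},
    "prescriptive": {"threshold": 5.0,
                     "above": "reduce_load",
                     "below": "no_action"},
}


def descriptive(config: dict, rows: List[Row]) -> dict:
    """Descriptive component: summary statistics of the historical data.

    `config` is unused here; it is accepted only to mirror the chain in
    which every component is generated from the configuration.
    """
    return {"n": len(rows),
            "mean_x": mean(x for x, _ in rows),
            "mean_y": mean(y for _, y in rows)}


def exploratory(config: dict, rows: List[Row], summary: dict) -> dict:
    """Exploratory component: co-variation computed around the means."""
    mx, my = summary["mean_x"], summary["mean_y"]
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    return {**summary, "sxy": sxy, "sxx": sxx}


def predictive(config: dict, findings: dict) -> Callable[[float], float]:
    """Predictive component: fit a model with the library named in config."""
    if config["predictive"]["library"] != "builtin_least_squares":
        raise ValueError("unsupported library in this sketch")
    slope = findings["sxy"] / findings["sxx"]
    intercept = findings["mean_y"] - slope * findings["mean_x"]
    return lambda x: slope * x + intercept


def prescriptive(config: dict,
                 model: Callable[[float], float]) -> Callable[[float], str]:
    """Prescriptive component: map model results to actions."""
    rule = config["prescriptive"]
    return lambda x: (rule["above"] if model(x) > rule["threshold"]
                      else rule["below"])


def build_offline(config: dict, rows: List[Row]):
    """Run the four components in order against historical data."""
    summary = descriptive(config, rows)
    findings = exploratory(config, rows, summary)
    model = predictive(config, findings)
    policy = prescriptive(config, model)
    return model, policy
```

Dispatching on the `library` entry is one way a single framework could support several underlying machine learning libraries; this sketch implements only one branch.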
In example implementations described herein, there is an online process that defines the methodologies and workflows for all data science tasks against real-time data. This process corresponds to the last four components to facilitate the Predictive, Prescriptive, Automation, and Autonomous aspects of the system.

In example implementations described herein, there is an automated system for the data science process in which AutoML is applied to each step of the process to reduce the manual work and optimize outcomes. A unified, customizable and extensible framework is introduced and provided to support the system.

Aspects of the present disclosure can include a method for generation and deployment of automated and autonomous self-learning machine learning models, the method involving generating a predictive model and a prescriptive model through an offline learning process at a first system; controlling operations of a second system through deploying the predictive model and the prescriptive model to the second system; and autonomously updating the prescriptive model and the