US-12626183-B2 - Training machine learning models


Abstract

A method, a computer system, and a computer program product for training machine learning models. The present invention may include coupling the machine learning system to a network and receiving a new estimator not included in the list of estimators, together with its respective documentation. The present invention may include adding the new estimator to the list stored in memory. The present invention may include reading the documentation and providing the machine learning process tool with respective extracted data. The present invention may include adapting at least one training data set. The present invention may include training at least a subset of the machine learning models by using the new estimator.

Inventors

Lukasz G. Cmielowski
Szymon Kucharczyk
Kiran A. Kate
Daniel Jakub Ryszka

Assignees

INTERNATIONAL BUSINESS MACHINES CORPORATION

Dates

Publication Date: 20260512
Application Date: 20210614

Claims (17)

1. A method for training machine learning models in a machine learning system using a machine learning process tool of the machine learning system, the machine learning process tool providing, when performing the training, at least one training data set out of a group of training data sets as an input to a respective estimator of a machine learning model, the estimator being selected out of a list of estimators stored in a memory of the machine learning process system, the method comprising: coupling, via an Application Programming Interface (API), the machine learning system to a network, wherein the machine learning system is comprised of a receiving module and a reading module; receiving, by the machine learning system, via the network, a new estimator not included in the list of estimators and a respective documentation; adding the new estimator to the list stored in the memory; reading, by the reading module, the documentation and providing the machine learning process tool with respective extracted data, wherein the data is extracted from a machine-readable format of the documentation and is comprised of a plurality of hyperparameters and corresponding values; adapting, by the machine learning process tool, at least one training data set out of the group of training data sets to the new estimator based on the extracted data by adapting one or more hyperparameters and values of the at least one training data set to include parameter ranges and parameter distributions with the corresponding values from the documentation of the new estimator; training at least a subset of the machine learning models by using the new estimator, with the at least one training data set as an input, the training resulting in an output score from each of the machine learning models of the subset; selecting a machine learning model from the machine learning models of the subset based on a ranking of the machine learning models of the subset according to the input and the output score obtained during the training; and generating a final output from the machine learning model using the new estimator in response to a task identified by a user.
2. The method of claim 1, further comprising: defining rules to be fulfilled with respect to each estimator, including any new estimator that may be accepted, and the adapting at least one training data set being made in accordance with the rules; and filtering out the new estimator if an adapting of at least one training data set in accordance with the rules is not possible or if the new estimator is not included in a list of allowed estimators, the list of allowed estimators being stored in the memory.
3. The method of claim 2, wherein the rules are defined by at least one of a format of hyperparameters and one or more values of the hyperparameters, the hyperparameters being defined by variable schemes.
4. The method of claim 3, further comprising: updating the schemes upon loading of at least one or more new estimators or inclusion of a new software module.
5. The method of claim 1, wherein the receiving module receives the new estimator from the API and sends the new estimator and the respective documentation to a reading module, and wherein the reading module includes a converting module.
6. The method of claim 5, further comprising: converting, by the converting module of the reading module, at least part of the documentation of the new estimator into the machine-readable format, the machine-readable format being JavaScript Object Notation®, JSON.
7. The method of claim 1, wherein the extracted data is extracted from the documentation of the new estimator by the reading module, the reading module performing one or more methods, including at least one or more of, get default hyperparameters, get estimator parameters, get estimator parameter ranges, get estimator parameters distribution, get operator type.
8. A computer system for machine learning by using machine learning models and by using at least one training data set as an input to a respective estimator of a machine learning model, the estimator being selected out of a list of estimators, the computer system including: a memory storing the list of estimators and a group of training data sets, and a machine learning process tool configured for defining the group of machine learning models, the machine learning process tool being configured to provide, when performing the training, at least one training data set out of the group of training data sets as an input to a respective estimator of the machine learning model; the computer system being configured for: coupling, via an Application Programming Interface (API), the machine learning system to a network, wherein the machine learning system is comprised of a receiving module and a reading module; receiving, by the machine learning system, via the network, a new estimator not included in the list of estimators and a respective documentation; adding the new estimator to the list stored in the memory; reading, by the reading module, the documentation and providing the machine learning process tool with respective extracted data, wherein the data is extracted from a machine-readable format of the documentation and is comprised of a plurality of hyperparameters and corresponding values; adapting, by the machine learning process tool, at least one training data set out of the group of training data sets to the new estimator based on the extracted data by adapting one or more hyperparameters and values of the at least one training data set to include parameter ranges and parameter distributions with the corresponding values from the documentation of the new estimator; training at least a subset of the machine learning models by using the new estimator, with the at least one training data set as an input, the training resulting in an output score from each of the machine learning models of the subset; selecting a machine learning model from the machine learning models of the subset based on a ranking of the machine learning models of the subset according to the input and the output score obtained during the training; and generating a final output from the machine learning model using the new estimator in response to a task identified by a user.
9. The computer system of claim 8, further comprising: defining rules to be fulfilled with respect to each estimator, including any new estimator that may be accepted, and the adapting at least one training data set being made in accordance with the rules; and filtering out the new estimator if an adapting of at least one training data set in accordance with the rules is not possible or if the new estimator is not included in a list of allowed estimators, the list of allowed estimators being stored in the memory.
10. The computer system of claim 9, wherein the rules are defined by at least one of a format of hyperparameters and one or more values of the hyperparameters, the hyperparameters being defined by variable schemes.
11. The computer system of claim 10, further comprising: updating the schemes upon loading of at least one or more new estimators or inclusion of a new software module.
12. The computer system of claim 11, wherein the receiving module receives the new estimator from the API and sends the new estimator and the respective documentation to a reading module, and wherein the reading module includes a converting module, and wherein the converting module converts at least part of the documentation of the new estimator into the machine-readable format, the machine-readable format being JavaScript Object Notation®, JSON.
13. A computer program product comprising: a machine learning process tool of a machine learning system, the machine learning process tool providing, when performing a training, at least one training data set out of a group of training data sets as an input to a respective estimator of a machine learning model, the estimator being selected out of a list of estimators stored in a memory of the machine learning process system; a computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code being configured to implement a method comprising: coupling, via an Application Programming Interface (API), the machine learning system to a network, wherein the machine learning system is comprised of a receiving module and a reading module; receiving, by the machine learning system, via the network, a new estimator not included in the list of estimators and a respective documentation; adding the new estimator to the list stored in the memory; reading, by the reading module, the documentation and providing the machine learning process tool with respective extracted data, wherein the data is extracted from a machine-readable format of the documentation and is comprised of a plurality of hyperparameters and corresponding values; adapting, by the machine learning process tool, at least one training data set out of the group of training data sets to the new estimator based on the extracted data by adapting one or more hyperparameters and values of the at least one training data set to include parameter ranges and parameter distributions with the corresponding values from the documentation of the new estimator; training at least a subset of the machine learning models by using the new estimator, with the at least one training data set as an input, the training resulting in an output score from each of the machine learning models of the subset; selecting a machine learning model from the machine learning models of the subset based on a ranking of the machine learning models of the subset according to the input and the output score obtained during the training; and generating a final output from the machine learning model using the new estimator in response to a task identified by a user.
14. The computer program product of claim 13, further comprising: defining rules to be fulfilled with respect to each estimator, including any new estimator that may be accepted, and the adapting at least one training data set being made in accordance with the rules; and filtering out the new estimator if an adapting of at least one training data set in accordance with the rules is not possible or if the new estimator is not included in a list of allowed estimators, the list of allowed estimators being stored in the memory.
15. The computer program product of claim 14, wherein the rules are defined by at least one of a format of hyperparameters and one or more values of the hyperparameters, the hyperparameters being defined by variable schemes.
16. The computer program product of claim 15, further comprising: updating the schemes upon loading of at least one or more new estimators or inclusion of a new software module.
17. The computer program product of claim 16, wherein the receiving module receives the new estimator from the API and sends the new estimator and the respective documentation to a reading module, and wherein the reading module includes a converting module, and wherein the converting module converts at least part of the documentation of the new estimator into the machine-readable format, the machine-readable format being JavaScript Object Notation®, JSON.
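
For illustration only, the reading, filtering, and adding steps recited in the claims might be sketched in Python as follows. The claims do not specify a JSON schema or any function names, so the field names (`operator_type`, `range`, `distribution`), the class `EstimatorDoc`, the helpers `read_documentation`, `satisfies_rules`, and `register`, and the estimator name `ExampleBoostedTrees` are all hypothetical. The rule check shown is one possible rule, not the claimed one.

```python
import json
from dataclasses import dataclass

# Hypothetical machine-readable documentation for a new estimator.
# The schema is an assumption made for this sketch.
DOC_JSON = """
{
  "name": "ExampleBoostedTrees",
  "operator_type": "classifier",
  "hyperparameters": {
    "learning_rate": {"default": 0.1, "range": [0.001, 1.0], "distribution": "loguniform"},
    "n_estimators": {"default": 100, "range": [10, 500], "distribution": "uniform"}
  }
}
"""

# Assumed set of distributions the process tool knows how to sample from.
KNOWN_DISTRIBUTIONS = {"uniform", "loguniform"}


@dataclass
class EstimatorDoc:
    """Extracted data handed to the process tool (names are hypothetical)."""
    name: str
    operator_type: str
    hyperparameters: dict

    def get_default_hyperparameters(self):
        return {k: v["default"] for k, v in self.hyperparameters.items()}

    def get_parameter_ranges(self):
        return {k: tuple(v["range"]) for k, v in self.hyperparameters.items()}

    def get_parameter_distributions(self):
        return {k: v["distribution"] for k, v in self.hyperparameters.items()}


def read_documentation(text):
    """Parse the JSON documentation into an EstimatorDoc."""
    raw = json.loads(text)
    return EstimatorDoc(raw["name"], raw["operator_type"], raw["hyperparameters"])


def satisfies_rules(doc):
    """Assumed rule: each default lies in its range and the distribution is known."""
    for spec in doc.hyperparameters.values():
        low, high = spec["range"]
        if not low <= spec["default"] <= high:
            return False
        if spec["distribution"] not in KNOWN_DISTRIBUTIONS:
            return False
    return True


def register(doc, estimator_list, allowed):
    """Add the estimator to the list, or filter it out and return False."""
    if doc.name not in allowed or not satisfies_rules(doc):
        return False
    estimator_list.append(doc.name)
    return True
```

In this sketch, an estimator that is missing from the allow-list, or whose documentation fails the rule check, is filtered out before it reaches the list of estimators, which mirrors the filtering described in claims 2, 9, and 14.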

Description

BACKGROUND

Embodiments of the present invention relate to the field of machine learning (ML) and specifically to the training of machine learning models.

ML models are data files used by hardware and/or software entities, or by hardware and/or software systems that run dedicated software, to produce a specific kind of output when an input having a predetermined format is provided. An ML model is composed of different stages, the entirety of the stages being called a pipeline. Two different ML models differ in the definition of at least one of these stages. Different ML models may also use completely different learning methods, such as “Linear Regression”, “Logistic Regression”, “Decision Tree”, “Boosting”, and the like.

Internal parameter values of an ML model need to be defined before the ML model can be used for a specific task. These parameters are defined by training the ML model, with training data used as inputs.

So-called estimators may represent a first stage of the ML models. An estimator is an algorithm that transforms the input training data into a single value or into multiple values for use by the subsequent stages in the ML model.

Training a group of different ML models, each having a different estimator, makes it possible to rank the models, e.g., by a ranking score, and to select the ML model having the optimum rank, e.g., the highest ranking score. After training with suitable training data, the ML model that is optimal for a desired application or task can then be used.

With regard to a specific task, a user may first preselect the learning method applied by the ML model. The estimator to be chosen is then not arbitrary: prior to training, a user might apply predetermined criteria for a first selection of estimators that appear well suited to the chosen learning method.

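
The ranking of models that differ only in their estimator can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the toy data, the two candidate estimators (`fit_mean`, `fit_line`), and the use of negative mean squared error on held-out data as the ranking score. The text does not prescribe any particular score.

```python
from statistics import mean

# Toy regression data (assumed for illustration): y is roughly 2 * x.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
holdout = [(5.0, 10.1), (6.0, 11.8)]


def fit_mean(data):
    """Baseline estimator: always predict the mean training target."""
    avg = mean(y for _, y in data)
    return lambda x: avg


def fit_line(data):
    """Least-squares line through the training points."""
    mx = mean(x for x, _ in data)
    my = mean(y for _, y in data)
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x, _ in data))
    return lambda x: my + slope * (x - mx)


def score(model, data):
    """Ranking score: negative mean squared error, so higher is better."""
    return -mean((model(x) - y) ** 2 for x, y in data)


def rank_models(estimators, train_data, eval_data):
    """Train one model per estimator and sort by descending score."""
    results = [(name, score(fit(train_data), eval_data))
               for name, fit in estimators.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


ranking = rank_models({"mean_baseline": fit_mean, "linear": fit_line},
                      train, holdout)
best_name, best_score = ranking[0]
```

On this toy data the linear estimator ranks first, because the mean baseline cannot follow the upward trend in the held-out points.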
If, for instance, the target variable is discrete (categorical, nominal, ordinal), the methods “Logistic Regression”, “Naive Bayes classifier”, “Support Vector Machines”, “Decision trees”, “Boosted trees”, “Random forest”, “Neural networks”, and “K-Nearest neighbors” can be used. With a large number of input data, “SGD” (“Stochastic Gradient Descent”) might be helpful. “K-Nearest neighbors” is effective if the training data is large and noisy. If, on the contrary, the target variable is continuous, then a regression algorithm might need to be used, selected for example from the group comprising “Linear regression”, “Polynomial regression”, “Ridge Regression”, “Lasso Regression”, and “ElasticNet regression”.

As to the estimators, for example, for an ML model using the learning method “Gaussian Naive Bayes”, the estimator should output an average value μ, a variance σ², and the further statistical value P(Y). For the learning method “Logistic Regression”, the estimator stage has to optimize an objective function using gradient descent. For “Decision Tree” and “Boosting with decision stumps” as learning methods, well-known algorithms are available on the marketplace. For the learning method “K-Nearest neighbors”, the estimator must store all training data to classify new points; K is chosen using cross-validation. For the learning method “Support Vector Machines”, a quadratic program has to be solved to find a boundary that maximizes a margin.

It might be that a user does not have an optimum estimator available on his or her system. However, many dedicated estimators are available in the marketplace from third parties, i.e., from suppliers that supply these estimators for specific other ML systems. Since different ML systems may be incompatible with one another, the estimators of a first system generally cannot simply be used by a second system.

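
As a minimal sketch of what a “Gaussian Naive Bayes” estimator stage outputs, the following Python code computes the per-class mean μ, variance σ², and class prior P(Y) for a single one-dimensional feature. The function name `estimate_gaussian_nb`, the toy data, and the choice of the maximum-likelihood (population) variance are assumptions made for this example, not details taken from the text.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def estimate_gaussian_nb(samples):
    """Return {label: (mu, sigma2, prior)} for one-dimensional features.

    samples is a list of (feature_value, label) pairs.
    """
    by_label = defaultdict(list)
    for x, label in samples:
        by_label[label].append(x)

    total = len(samples)
    params = {}
    for label, values in by_label.items():
        mu = fmean(values)
        sigma2 = pvariance(values, mu)  # population variance around mu
        prior = len(values) / total     # P(Y = label)
        params[label] = (mu, sigma2, prior)
    return params


# Toy data: two samples of class "a", three samples of class "b".
data = [(1.0, "a"), (3.0, "a"), (10.0, "b"), (12.0, "b"), (14.0, "b")]
params = estimate_gaussian_nb(data)
```

For class "a" this yields μ = 2.0, σ² = 1.0, and P(Y) = 0.4; later stages of the model would combine these values with the Gaussian density to classify new points.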
SUMMARY

Various embodiments provide a method for training machine learning (ML) models in an ML system by means of an ML process tool of the ML system, the ML process tool providing, when performing the training, at least one training data set out of a group of training data sets as an input to a respective estimator of an ML model, the estimator being selected out of a list of estimators stored in a memory of the ML process system. The various embodiments also provide a respective process system and computer program product, all as described by the subject matter of the independent claims. Advantageous embodiments are described in the dependent claims. Embodiments of the present invention can be freely combined with each other if they are not mutually exclusive.

In one embodiment, a method trains ML models in an ML system by means of an ML process tool of the ML system, the ML process tool providing, when performing the training, at least one training data set out of a group of training data sets as an input to a respective estimator of an ML model, the estimator being selected out of a list of estimators stored in a memory of the ML process system. The method includes coupling the ML sy