US-12626141-B2 - Automated generation of machine learning models
Abstract
This document relates to automated generation of machine learning models, such as neural networks. One example system includes a hardware processing unit and a storage resource. The storage resource can store computer-readable instructions that cause the hardware processing unit to perform an iterative model-growing process that involves modifying parent models to obtain child models. The iterative model-growing process can also include selecting candidate layers to include in the child models based at least on weights learned in an initialization process of the candidate layers. The system can also output a final model selected from the child models.
Inventors
- Debadeepta Dey
- Hanzhang Hu
- Richard A. Caruana
- John C. Langford
- Eric J. Horvitz
Assignees
- Microsoft Technology Licensing, LLC
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2022-12-13
Claims (20)
- 1. A method performed on a computing device, the method comprising: outputting a graphical user interface having a first graphical element for designating an operation search space of operations available to be performed by candidate layers in an iterative model-growing process; receiving first user input via the first graphical element, the first user input identifying a set of multiple operations to include in the operation search space; adding an initial parent model to a parent model pool; performing two or more iterations of the iterative model-growing process, the iterative model-growing process comprising: selecting a particular parent model having a plurality of layers from the parent model pool; inserting a plurality of candidate layers into the particular parent model, respective candidate layers being configured to perform respective operations selected from the set of multiple operations identified by the first user input; initializing the plurality of candidate layers to obtain learned weights for the candidate layers during training while the plurality of candidate layers are connected to the particular parent model, the plurality of candidate layers being initialized while maintaining weights of the plurality of layers of the particular parent model; based at least on the learned weights of the plurality of candidate layers, selecting less than all of the plurality of candidate layers as selected candidate layers to include in each child model of a plurality of child models for subsequent training, each respective child model including the plurality of layers of the particular parent model and one or more of the selected candidate layers; training the plurality of child models having the one or more selected candidate layers, the training resulting in trained child models; evaluating the trained child models using one or more criteria; and based at least on the evaluating, designating an individual trained child model as a new parent model and adding the new parent model to the parent model pool; and after the two or more iterations, selecting at least one trained child model as a final model and outputting the final model.
- 2. The method of claim 1, further comprising: receiving second user input via a second graphical element of the graphical user interface, wherein the second user input designates a default model, designates a randomly-generated model, or navigates to an existing model to use as the initial parent model.
- 3. The method of claim 1, wherein the first user input received via the first graphical element of the graphical user interface selects at least two different convolution operations, and the respective operations performed by the respective candidate layers are selected randomly from the operation search space.
- 4. The method of claim 1, wherein the first user input received via the first graphical element of the graphical user interface selects at least two different pooling operations, and the respective operations performed by the respective candidate layers are selected randomly from the operation search space.
- 5. The method of claim 1, further comprising: receiving second user input directed to a second graphical element of the graphical user interface, the second user input identifying a specified amount of computational resources to use for the iterative model-growing process; and responsive to expending the specified amount of computational resources, stopping the iterative model-growing process and selecting the final model.
- 6. The method of claim 5, wherein the second user input directed to the second graphical element designates a number of GPU-days to expend for the iterative model-growing process.
- 7. The method of claim 5, wherein the second user input directed to the second graphical element designates a length of time to expend for the iterative model-growing process.
- 8. The method of claim 1, further comprising: receiving second user input directed to a second graphical element of the graphical user interface, the second user input directed to the second graphical element identifying model size as a particular criterion for evaluating the trained child models; and designating the individual trained child model as the new parent model based at least on a model size of the individual trained child model.
- 9. The method of claim 1, further comprising: receiving second user input directed to a second graphical element of the graphical user interface, the second user input directed to the second graphical element specifying connectivity parameters for the child models; and generating the child models according to the connectivity parameters.
- 10. The method of claim 9, wherein the connectivity parameters specified by the second user input indicate a number of previous layers that contribute to the one or more selected candidate layers of the child models.
- 11. The method of claim 9, wherein the connectivity parameters specified by the second user input directed to the second graphical element indicate whether skip connections are employed in the child models.
- 12. The method of claim 1, wherein the first user input selects a group of at least two different convolutional kernel sizes available to be performed by the respective candidate layers when performing the iterative model-growing process.
- 13. The method of claim 1, wherein the first user input selects a group of at least two different convolutional stride sizes available to be performed by the respective candidate layers when performing the iterative model-growing process.
- 14. The method of claim 1, wherein the first user input selects a group of pooling operations available to be performed by the respective candidate layers when performing the iterative model-growing process, the group including at least a max pooling operation and an average pooling operation.
- 15. The method of claim 1, wherein the first user input selects a group of at least two different pooling window sizes available to be performed by the respective candidate layers when performing the iterative model-growing process.
- 16. A system comprising: a hardware processing unit; and a storage resource storing computer-readable instructions which, when executed by the hardware processing unit, cause the hardware processing unit to: receive first user input via a first graphical element of a graphical user interface, the first user input identifying a set of multiple operations to include in an operation search space for an iterative model-growing process; add an initial parent model to a parent model pool; perform two or more iterations of the iterative model-growing process, the iterative model-growing process comprising: selecting a particular parent model having a plurality of layers from the parent model pool; inserting a plurality of candidate layers into the particular parent model, respective candidate layers being configured to perform respective operations selected from the set of multiple operations identified by the first user input; initializing the plurality of candidate layers to obtain learned weights for the candidate layers during training while the plurality of candidate layers are connected to the particular parent model, the plurality of candidate layers being initialized while maintaining weights of the plurality of layers of the particular parent model; based at least on the learned weights of the plurality of candidate layers, selecting less than all of the plurality of candidate layers as selected candidate layers to include in each child model of a plurality of child models for subsequent training, each respective child model including the plurality of layers of the particular parent model and one or more of the selected candidate layers; training the plurality of child models having the one or more selected candidate layers, the training resulting in trained child models; evaluating the trained child models using one or more criteria; and based at least on the evaluating, designating an individual trained child model as a new parent model and adding the new parent model to the parent model pool; and after the two or more iterations, select at least one trained child model as a final model and output the final model.
- 17. The system of claim 16, wherein the operation search space includes multiple convolution operations and multiple pooling operations designated by the first user input that is received via the first graphical element of the graphical user interface.
- 18. The system of claim 16, wherein the computer-readable instructions, when executed by the hardware processing unit, cause the hardware processing unit to: configure the respective candidate layers to perform respective operations randomly selected from the set of multiple operations identified by the first user input.
- 19. The system of claim 18, wherein the first user input directed to the first graphical element identifies at least two different convolution operations and at least two different pooling operations from which the respective operations are randomly selected.
- 20. The system of claim 19, wherein the first user input directed to the first graphical element specifies at least two different window sizes and at least two different strides for the at least two different convolution operations.
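The iterative model-growing process recited in claim 1 can be pictured as a simple loop. The following is a minimal, self-contained Python sketch in which a "model" is just a list of operation names and training is simulated; every identifier, the two-candidate cutoff, and the size-based scoring below are illustrative assumptions, not the patent's actual implementation.

```python
import random

# Operation search space assembled from user input (cf. claims 1 and 12-15):
# convolutions over chosen kernel sizes plus max/average pooling.
OPERATION_SEARCH_SPACE = ["conv3x3", "conv5x5", "maxpool2x2", "avgpool2x2"]

def initialize_candidates(parent, k=4):
    """Attach k candidate layers whose operations are drawn randomly from the
    search space (cf. claims 3-4) and return simulated learned weights for
    each; the parent's own weights are left untouched during this step."""
    ops = [random.choice(OPERATION_SEARCH_SPACE) for _ in range(k)]
    learned_weights = [random.random() for _ in ops]
    return ops, learned_weights

def evaluate(model):
    """One possible criterion (cf. claim 8): prefer smaller models."""
    return -len(model)

def grow(initial_parent, iterations=2):
    parent_pool = [list(initial_parent)]   # add the initial parent to the pool
    trained_children = []
    for _ in range(iterations):
        parent = random.choice(parent_pool)            # select a parent
        ops, weights = initialize_candidates(parent)   # initialize candidates
        # Keep fewer than all candidates, ranked by their learned weights.
        ranked = sorted(range(len(ops)), key=lambda i: weights[i], reverse=True)
        children = [parent + [ops[i]] for i in ranked[:2]]
        trained_children.extend(children)              # stand-in for training
        new_parent = max(children, key=evaluate)       # evaluate, promote best
        parent_pool.append(new_parent)                 # grow the parent pool
    return max(trained_children, key=evaluate)         # output the final model

final_model = grow(["input", "conv3x3"])
```

Under the size criterion above, the returned model is always one of the first-generation children: the initial two-layer parent plus a single selected candidate layer.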
Description
BACKGROUND
Traditionally, machine learning models were manually constructed by experts who would define a structure of the model and then use automated techniques for model training. As machine learning models have grown more complex, various attempts have been made to automate the process of generating machine learning models. However, these efforts have met with limited success.
SUMMARY
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
The description generally relates to techniques for automated generation of machine learning models. One example includes a method or technique that can be performed on a computing device. The method or technique can include performing two or more iterations of an iterative model-growing process. The iterative model-growing process can include selecting a particular parent model from a parent model pool of one or more parent models, generating a plurality of candidate layers, and initializing the plurality of candidate layers while reusing learned parameters and/or structure of the particular parent model. The iterative model-growing process can also include selecting particular candidate components such as layers to include in child models for training. Respective child models can include the particular parent model and one or more of the particular candidate layers or other structures. The iterative model-growing process can also include training the plurality of child models to obtain trained child models, and evaluating the trained child models using one or more criteria.
The iterative model-growing process can also include designating an individual trained child model as a new parent model based at least on the evaluating and adding the new parent model to the parent model pool. The method or technique can also include selecting at least one trained child model as a final model after the two or more iterations, and outputting the final model.
Another example includes a system that entails a hardware processing unit and a storage resource. The storage resource can store computer-readable instructions which, when executed by the hardware processing unit, cause the hardware processing unit to perform an iterative model-growing process that involves modifying parent models to obtain child models. The iterative model-growing process can include selecting candidate layers to include in the child models based at least on weights learned in an initialization process of the candidate layers. The computer-readable instructions can also cause the hardware processing unit to output a final model selected from the child models.
Another example includes a computer-readable storage medium storing instructions which, when executed by a processing device, cause the processing device to perform acts. The acts can include performing two or more iterations of an iterative model-growing process. The iterative model-growing process can include selecting a particular parent model from a parent model pool of one or more parent models, initializing a plurality of candidate layers, and selecting a plurality of child models for training. Respective child models can include a structure inherited from the particular parent model and at least one of the candidate layers. The iterative model-growing process can also include training the plurality of child models to obtain trained child models, and designating an individual trained child model as a new parent model based at least on one or more criteria.
The iterative model-growing process can also include adding the new parent model to the parent model pool. The acts can also include selecting at least one trained child model as a final model after the two or more iterations, and outputting the final model.
The above listed examples are intended to provide a quick reference to aid the reader and are not intended to define the scope of the concepts described herein.
BRIEF DESCRIPTION OF THE DRAWINGS
The Detailed Description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of similar reference numbers in different instances in the description and the figures may indicate similar or identical items.
FIG. 1 illustrates an example method or technique for automated generation of machine learning models, consistent with some implementations of the present concepts.
FIG. 2 illustrates an example approach for generating candidate layers of a machine learning model, consistent with some implementations of the present concepts.
FIG. 3 illustrates an example approach for initializing candidate layers of a machine learning model, consistent with some implementations of the present concepts.
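The Summary above describes initializing candidate layers while reusing, and not updating, the parent model's learned parameters, then deciding which candidates to keep based on the weights they learn. A minimal numeric sketch of that idea in pure Python follows; the scalar "parent", the parallel candidate branch, the gradient-descent settings, and the 0.5 selection threshold are all illustrative assumptions, not the patent's actual training procedure.

```python
# Illustrative sketch: the "parent" is a frozen scalar weight, the candidate
# layer is a single trainable scalar on a parallel branch, and training is
# plain gradient descent on squared error.

def train_candidate(parent_w, data, steps=200, lr=0.1):
    """Learn a candidate weight c so that parent_w*x + c*x fits the targets,
    without ever updating parent_w (the parent stays frozen)."""
    c = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, y in data:
            pred = parent_w * x + c * x   # parent branch plus candidate branch
            grad += 2.0 * (pred - y) * x  # gradient of squared error w.r.t. c
        c -= lr * grad / len(data)
    return c

parent_w = 1.0                                   # frozen parent weight
data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0)]   # targets need an extra gain of 2

learned_w = train_candidate(parent_w, data)      # converges near 2.0
# Keep the candidate only if its learned weight is significant.
selected = abs(learned_w) > 0.5
```

Because the candidate's learned weight settles far from zero here, this candidate would be among those selected for inclusion in a child model; a candidate whose weight stayed near zero would be discarded.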