US-12626119-B2 - Task-adaptive architecture for few-shot learning

US 12626119 B2

Abstract

Meta-training an artificial neural cell for use in a few-shot learner, wherein the meta-training includes: executing a Neural Architecture Search (NAS) to automatically learn an architecture of the artificial neural cell; training adaptive controllers that are comprised in the architecture of the artificial neural cell, wherein each of the adaptive controllers is configured to adapt the architecture of the artificial neural cell to a few-shot learning task; and regressing the architecture of the artificial neural cell from support data of the few-shot learning task, through the adaptive controllers. Generating the few-shot learner based on the meta-trained artificial neural cell, to form an Artificial Neural Network (ANN).

Inventors

  • Eliyahu Schwartz
  • Leonid Karlinsky
  • Sivan Doveh

Assignees

  • INTERNATIONAL BUSINESS MACHINES CORPORATION

Dates

Publication Date
2026-05-12
Application Date
2020-11-29

Claims (20)

  1. A method comprising operating at least one hardware processor to: meta-train an artificial neural cell for use in a few-shot learner, wherein said meta-training comprises: executing a Neural Architecture Search (NAS) to automatically learn an architecture of the artificial neural cell, training adaptive controllers that are comprised in the architecture of the artificial neural cell, wherein each of the adaptive controllers is configured to adapt the architecture of the artificial neural cell to a few-shot learning task by re-wiring the architecture, wherein the re-wiring of the architecture comprises effectively changing a corresponding directed acyclic graph of operations within the artificial neural cell, and regressing the architecture of the artificial neural cell from support data of the few-shot learning task, through the adaptive controllers; and generate the few-shot learner based on the meta-trained artificial neural cell, to form an Artificial Neural Network (ANN).
  2. The method of claim 1, further comprising adapting the architecture of the meta-trained artificial neural cell to a new few-shot learning task, wherein said generating of the few-shot learner comprises connecting multiple ones of the meta-trained artificial neural cell having an adapted architecture, to form the ANN.
  3. The method of claim 1, further comprising: adapting the architecture of the meta-trained artificial neural cell to a new few-shot learning task.
  4. The method of claim 3, further comprising training the few-shot learner in a new few-shot learning task, wherein said training of the few-shot learner is devoid of fine-tuning.
  5. The method of claim 1, wherein the NAS is a Differentiable NAS (D-NAS).
  6. The method of claim 5, wherein the architecture of the artificial neural cell comprises an adaptive block structured as a Directed Acyclic Graph (DAG) having nodes and edges, in which: each of the nodes defines a feature map calculated as a combination of those of the edges which are directed at the respective node; each of the edges is associated with a respective one of the adaptive controllers; and each of the edges defines a mixed operation controlled by the respective adaptive controller.
  7. The method of claim 6, wherein each of the mixed operations comprises: a set of search space operations; and a mixing coefficient of the search space operations.
  8. The method of claim 7, wherein: said meta-training further comprises optimizing the mixing coefficient; and said training of the adaptive controllers comprises optimizing a modifier that is configured to modify the mixing coefficient respective of the few-shot learning task, so as to enhance performance of the few-shot learning task.
  9. The method of claim 8, wherein: each of the adaptive controllers uses Global Average Pooling (GAP) and applies a Multi-Layer Perceptron (MLP) to produce the modifier.
  10. A system comprising: (a) at least one hardware processor; and (b) a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by said at least one hardware processor to: meta-train an artificial neural cell for use in a few-shot learner, wherein the meta-training comprises: executing a Neural Architecture Search (NAS) to automatically learn an architecture of the artificial neural cell, training adaptive controllers that are comprised in the architecture of the artificial neural cell, wherein each of the adaptive controllers is configured to adapt the architecture of the artificial neural cell to a few-shot learning task by re-wiring the architecture, wherein the re-wiring of the architecture comprises effectively changing a corresponding directed acyclic graph of operations within the artificial neural cell, and regressing the architecture of the artificial neural cell from support data of the few-shot learning task, through the adaptive controllers; and generate the few-shot learner based on the meta-trained artificial neural cell, to form an Artificial Neural Network (ANN).
  11. The system of claim 10, wherein the program code is further executable to: adapt the architecture of the meta-trained artificial neural cell to a new few-shot learning task.
  12. The system of claim 11, wherein the program code is further executable to train the few-shot learner in a new few-shot learning task, wherein said training of the few-shot learner is devoid of fine-tuning.
  13. The system of claim 10, wherein the NAS is a Differentiable NAS (D-NAS).
  14. The system of claim 13, wherein the architecture of the artificial neural cell comprises an adaptive block structured as a Directed Acyclic Graph (DAG) having nodes and edges, in which: each of the nodes defines a feature map calculated as a combination of those of the edges which are directed at the respective node; each of the edges is associated with a respective one of the adaptive controllers; and each of the edges defines a mixed operation controlled by the respective adaptive controller.
  15. The system of claim 14, wherein each of the mixed operations comprises: a set of search space operations; and a mixing coefficient of the search space operations.
  16. The system of claim 15, wherein: said meta-training further comprises optimizing the mixing coefficient; and said training of the adaptive controllers comprises optimizing a modifier that is configured to modify the mixing coefficient respective of the few-shot learning task, so as to enhance performance of the few-shot learning task.
  17. The system of claim 16, wherein: each of the adaptive controllers uses Global Average Pooling (GAP) and applies a Multi-Layer Perceptron (MLP) to produce the modifier.
  18. A computer program product comprising a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by at least one hardware processor to: meta-train an artificial neural cell for use in a few-shot learner, wherein said meta-training comprises: executing a Neural Architecture Search (NAS) to automatically learn an architecture of the artificial neural cell, training adaptive controllers that are comprised in the architecture of the artificial neural cell, wherein each of the adaptive controllers is configured to adapt the architecture of the artificial neural cell to a few-shot learning task by re-wiring the architecture, wherein the re-wiring of the architecture comprises effectively changing a corresponding directed acyclic graph of operations within the artificial neural cell, and regressing the architecture of the artificial neural cell from support data of the few-shot learning task, through the adaptive controllers; and generate the few-shot learner based on the meta-trained artificial neural cell, to form an Artificial Neural Network (ANN).
  19. The computer program product of claim 18, wherein the program code is further executable to: adapt the architecture of the meta-trained artificial neural cell to a new few-shot learning task.
  20. The computer program product of claim 19, wherein the program code is further executable to train the few-shot learner in a new few-shot learning task, wherein said training of the few-shot learner is devoid of fine-tuning.
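The adaptive cell recited in claims 6-9 can be sketched as follows: each DAG edge holds a mixture of candidate search-space operations, and a per-edge controller pools the task's support features (GAP) and applies a small MLP to produce a modifier that shifts the meta-trained mixing coefficients for that task. A minimal NumPy sketch; the shapes, the two-layer MLP, and the toy operation set are illustrative assumptions, not the patent's exact design:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class AdaptiveMixedOp:
    """One DAG edge: a mixed operation whose mixing coefficients (alpha)
    are shifted, per task, by an adaptive controller (GAP + MLP over the
    support features). Illustrative sketch, not the claimed implementation."""

    def __init__(self, ops, dim, hidden=8, rng=None):
        rng = rng or np.random.default_rng(0)
        self.ops = ops                           # candidate search-space operations
        self.alpha = np.zeros(len(ops))          # meta-trained mixing coefficients
        self.w1 = rng.standard_normal((dim, hidden)) * 0.1   # controller MLP, layer 1
        self.w2 = rng.standard_normal((hidden, len(ops))) * 0.1  # layer 2

    def modifier(self, support_feats):
        pooled = support_feats.mean(axis=0)      # global average pooling over support
        return np.tanh(pooled @ self.w1) @ self.w2  # task-specific shift of alpha

    def __call__(self, x, support_feats):
        # Mix the operations with task-adapted weights (claim 8's modified coefficient).
        weights = softmax(self.alpha + self.modifier(support_feats))
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```

Changing which operations receive near-zero weight per task is what "effectively changing the directed acyclic graph of operations" (claim 1) amounts to in this reading.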

Description

BACKGROUND

The invention relates to the field of few-shot learning, a type of machine learning.

Few-shot learning (FSL) in general, and few-shot classification (FSC) in particular, have seen much progress recently. Few-shot learning involves situations where inference has to be made on the basis of only a handful of examples, as opposed to the traditional requirement in machine learning to learn from a vast number of examples, typically in the hundreds or thousands. In different FSC applications, label complexity ranges from image-level class labels (‘classification’), to labeled bounding boxes (‘detection’), to labeled pixel masks (‘segmentation’).

A popular approach in FSC is meta-learning, or ‘learning-to-learn.’ In meta-learning, the inputs are not images per se, but rather a set of few-shot tasks, {Ti}, where each K-shot/N-way task contains a small number K (usually 1-5, possibly a few more) of labeled support images and some number of unlabeled query images for each of the N categories (or ‘classes’) of the task. The goal of meta-learning is to find a base model that can transfer well to tasks built from novel, previously unseen categories, in which only a small number of examples per category is available. For example, using few-shot classification, a base model that was meta-learned from images of dogs, cats, and birds may be transferred to a task in which images of bears and rodents (the novel categories) require classification.

While many different FSL methods have been proposed, one of the key factors leading to higher FSL performance is surprisingly simple: the backbone neural network architecture used to embed the images of the few-shot tasks. While the first works on FSL resorted to small architectures with just a few convolution layers, recent works show that large architectures pre-trained on the training portion of FSL datasets produce strong features that are more easily transferable to novel few-shot tasks.
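The K-shot/N-way episodic setup described above can be illustrated with a short sketch. The dataset layout (a mapping from class label to examples) and the function name are illustrative assumptions, not part of the patent:

```python
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15, seed=None):
    """Sample one K-shot/N-way few-shot task: N classes, K labeled support
    examples and n_query unlabeled-at-inference query examples per class."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(dataset), n_way)          # N novel categories
    support, query = [], []
    for label in classes:
        examples = rng.sample(dataset[label], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]   # K shots per class
        query += [(x, label) for x in examples[k_shot:]]     # evaluation queries
    return support, query
```

Meta-learning then iterates over many such episodes {Ti} drawn from training classes, so that the base model transfers to episodes built from previously unseen classes.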
Hand-in-hand with the growing sophistication of FSC methods, some general factors affecting their performance have become apparent. One such factor is the Convolutional Neural Network (CNN) backbone architecture at the basis of modern FSC methods. So far, in many FSC approaches, the backbone architectures have been chosen rather arbitrarily, by re-using the most popular modern classification architectures. Under this setup, meta-learning only seeks the best transferable parameters, while the backbone architecture itself remains pre-determined and fixed. Few approaches have actually attempted to optimize the backbone architecture used for FSC, leaving much to be desired. There remains a need for effective meta-learning-based methods that enable a learned architecture to adapt itself to novel few-shot tasks.

The foregoing examples of the related art and limitations related therewith are intended to be illustrative and not exclusive. Other limitations of the related art will become apparent to those of skill in the art upon a reading of the specification and a study of the figures.

SUMMARY

The following embodiments and aspects thereof are described and illustrated in conjunction with systems, tools and methods which are meant to be exemplary and illustrative, not limiting in scope.

One embodiment provides a method comprising: meta-training an artificial neural cell for use in a few-shot learner, wherein the meta-training includes: executing a Neural Architecture Search (NAS) to automatically learn an architecture of the artificial neural cell; training adaptive controllers that are comprised in the architecture of the artificial neural cell, wherein each of the adaptive controllers is configured to adapt the architecture of the artificial neural cell to a few-shot learning task; and regressing the architecture of the artificial neural cell from support data of the few-shot learning task, through the adaptive controllers.
Generating the few-shot learner based on the meta-trained artificial neural cell, to form an Artificial Neural Network (ANN).

Another embodiment provides a system comprising: (a) at least one hardware processor; and (b) a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by said at least one hardware processor to: meta-train an artificial neural cell for use in a few-shot learner, wherein the meta-training includes: executing a Neural Architecture Search (NAS) to automatically learn an architecture of the artificial neural cell; training adaptive controllers that are comprised in the architecture of the artificial neural cell, wherein each of the adaptive controllers is configured to adapt the architecture of the artificial neural cell to a few-shot learning task; and regressing the architecture of the artificial neural cell from support data of the few-shot learning task, through the adaptive controllers. Generate the few-shot learner based on the meta-trained artificial neural cell, to form an Artificial Neural Network (ANN).
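Claim 2's step of forming the ANN by connecting multiple copies of the meta-trained, task-adapted cell can be sketched as follows; the sequential wiring, the `cell_factory` callable, and the function name are illustrative assumptions (the patent does not prescribe how the cells are connected):

```python
def build_few_shot_learner(cell_factory, support_feats, n_cells=4):
    """Generate the few-shot learner: instantiate several copies of the
    meta-trained cell and chain them into one network. Each cell adapts
    itself to the task via the support features it receives."""
    cells = [cell_factory() for _ in range(n_cells)]

    def network(x):
        for cell in cells:
            x = cell(x, support_feats)   # each cell conditions on support data
        return x

    return network
```

Because each cell re-derives its architecture from the support data, moving to a new few-shot task only requires swapping in that task's support features, with no fine-tuning of weights (claims 4, 12, 20).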