CN-114118196-B - Method and apparatus for training a model for image classification
Abstract
Methods and apparatus for training a model for image classification are provided. A training method for a model for image classification includes receiving a first set of image samples of a base class of a base task, training the model based on the first set of image samples to obtain base classification weights for the base class of the base task from the trained model, and sequentially receiving a plurality of new tasks. Upon receipt of any one of the plurality of new tasks, the method includes receiving a second set of image samples of a new class of that new task, training a weight generator based on the base classification weights, one or more other classification weights for new classes of one or more previously received new tasks of the plurality of new tasks, and one or more of the first set of image samples and the second set of image samples, to obtain new classification weights for the new class of that new task, and updating the model with the new classification weights for the new class of that new task.
Inventors
- Mustafa Eyre Hamid
- CUI YUZHEN
- LI ZHENGYUAN
- WANG SIJIA
Assignees
- Samsung Electronics Co., Ltd. (三星电子株式会社)
Dates
- Publication Date: 2026-04-21
- Application Date: 2021-08-12
- Priority Date: 2021-01-22
Claims (20)
- 1. A training method for a model for image classification, comprising: receiving a first image sample set of a base class of a base task; training a model based on the first image sample set to obtain a base classification weight for the base class of the base task from the trained model; sequentially receiving a plurality of new tasks; and upon receiving any one of the plurality of new tasks: receiving a second image sample set of a new class of the any one new task; training a weight generator based on the base classification weight, one or more other classification weights for a new class of one or more previously received other new tasks of the plurality of new tasks, and one or more of the first image sample set and the second image sample set, to obtain a new classification weight for the new class of the any one new task, using the base classification weight and the one or more other classification weights as inputs to the weight generator; and updating the model with the new classification weight for the new class of the any one new task.
- 2. The training method of claim 1, further comprising training the weight generator using a random number of base classes selected from the base classes and a fake new task of a fake new class selected from the base classes, or using a fixed number of fake new tasks of a fake new class selected from the base classes.
- 3. The training method of claim 2, wherein training the weight generator comprises determining an average cross-entropy loss using samples randomly selected from the image sample sets of the classes used to train the weight generator.
- 4. The training method of claim 1, wherein the updated model is used to classify a first sample set of the second image sample set of the any one new task into the new class.
- 5. The training method of claim 1, wherein training the model based on the first set of image samples comprises extracting features from the first set of image samples and training the model based on the extracted features.
- 6. The training method of claim 1, wherein training the weight generator comprises: extracting features from a second sample set of the second image sample set of the any one new task; and generating, by the weight generator, the new classification weights using the extracted features, the base classification weights, and the one or more other new classification weights.
- 7. The training method of claim 6, wherein the number of the one or more other new tasks is less than or equal to three.
- 8. The training method of claim 1, wherein training the weight generator comprises: extracting features from a second sample set of the second image sample set of the any one new task; and generating, by the weight generator, the new classification weights using the extracted features and the classification weights of classes selected from the base class and the new class of the one or more other new tasks.
- 9. The training method of claim 8, wherein, for each new task, a random number of classes is selected for the classification weights used to generate the new classification weights.
- 10. The training method of any one of claims 1 to 9, wherein the weight generator is a bi-directional attention weight generator or a self-attention weight generator.
- 11. A user device for training a model for image classification, comprising: a processor; and a non-transitory computer-readable storage medium storing instructions that, when executed, cause the processor to: receive a first image sample set of a base class of a base task; train a model based on the first image sample set to obtain a base classification weight for the base class of the base task from the trained model; sequentially receive a plurality of new tasks; and upon receiving any one of the plurality of new tasks: receive a second image sample set of a new class of the any one new task; train a weight generator based on the base classification weight, one or more other classification weights for a new class of one or more previously received other new tasks of the plurality of new tasks, and one or more of the first image sample set and the second image sample set, to obtain a new classification weight for the new class of the any one new task, using the base classification weight and the one or more other classification weights as inputs to the weight generator; and update the model with the new classification weight for the new class of the any one new task.
- 12. The user device of claim 11, wherein the processor is further configured to train the weight generator using a random number of base classes selected from the base classes and fake new tasks of fake new classes, or using a fixed number of fake new tasks of fake new classes selected from the base classes.
- 13. The user device of claim 12, wherein, in training the weight generator, the processor is further configured to determine an average cross-entropy loss using samples randomly selected from the image sample sets of the classes used to train the weight generator.
- 14. The user device of claim 11, wherein the updated model is used to classify a first sample set of the second image sample set of the any one new task into the new class.
- 15. The user device of claim 11, wherein training a model based on the first set of image samples comprises extracting features from the first set of image samples and training the model based on the extracted features.
- 16. The user device of claim 11, wherein, in training the weight generator, the processor is further configured to: extract features from a second sample set of the second image sample set of the any one new task; and generate, by the weight generator, the new classification weights using the extracted features, the base classification weights, and the one or more other new classification weights.
- 17. The user device of claim 16, wherein the number of the one or more other new tasks is less than or equal to three.
- 18. The user device of claim 11, wherein, in training the weight generator, the processor is further configured to: extract features from a second sample set of the second image sample set of the any one new task; and generate, by the weight generator, the new classification weights using the extracted features and the classification weights of classes selected from the base class and the new class of the one or more other new tasks.
- 19. The user device of claim 18, wherein, for each new task, a random number of classes is selected for the classification weights used to generate the new classification weights.
- 20. The user device of any one of claims 11 to 19, wherein the weight generator is a bi-directional attention weight generator or a self-attention weight generator.
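The data flow recited in claims 1 and 11 (new tasks arrive one at a time, and the weights of the base classes and of earlier new classes are inputs to the weight generator) might be sketched as follows. This is an illustrative sketch only and not the claimed implementation: the single-head dot-product attention in `generate_weight`, the projection matrices `w_q` and `w_k`, the use of a mean-feature prototype, and all shapes are assumptions, random arrays stand in for extracted image features, and training of the generator parameters is omitted.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def generate_weight(prototype, known_weights, w_q, w_k):
    """Toy attention-based weight generator (hypothetical form, untrained).

    prototype:     (d,) mean feature of the new class's samples.
    known_weights: (m, d) base weights plus weights of earlier new classes.
    Returns the prototype plus an attention-weighted mix of known weights.
    """
    query = w_q @ prototype                              # (d,)
    keys = known_weights @ w_k.T                         # (m, d)
    attn = softmax(keys @ query / np.sqrt(len(query)))   # (m,)
    return prototype + attn @ known_weights              # (d,)

rng = np.random.default_rng(0)
d, n_base = 8, 5
# Generator parameters; in practice these would be learned, here they are random.
w_q, w_k = rng.normal(size=(d, d)), rng.normal(size=(d, d))
weights = rng.normal(size=(n_base, d))  # stand-in for base classification weights

# Three new tasks arrive sequentially; each brings two new classes (assumed).
for task in range(3):
    new_ws = []
    for _ in range(2):
        features = rng.normal(size=(5, d))  # stand-in for extracted features
        prototype = features.mean(axis=0)
        new_ws.append(generate_weight(prototype, weights, w_q, w_k))
    # Update the model: the new weights join the classifier and become
    # generator inputs for every later task.
    weights = np.vstack([weights, *new_ws])

print(weights.shape)  # (11, 8)
```

After the loop, the classifier holds the 5 base weights plus 6 generated weights, so each later task sees a larger set of known weights than the one before it.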
Description
Method and apparatus for training a model for image classification
The present application is based on and claims priority to U.S. Provisional Patent Application No. 63/071,067, filed with the U.S. Patent and Trademark Office (USPTO) on August 27, 2020, and to U.S. Non-Provisional Patent Application No. 17/156,126, filed with the USPTO on January 22, 2021, the contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates generally to machine learning methods, and more particularly, to methods and apparatus for training models for image classification.
Background
In the field of machine learning, it may be difficult to accumulate enough data to improve the accuracy of a model. In limited-data scenarios, few-shot (few-sample) learning algorithms have been employed to discover patterns in data and make inferences. This technique is commonly used in the field of computer vision for categorizing or classifying photographs.
In an N-way K-shot few-shot learning task (where N is the number of classes and K is the number of samples (or images) in each class), a small training set D is provided. The training set size is |D| = N·K. A base training set D_0 may be utilized to learn transferable knowledge for improved few-shot learning. The base training set D_0 contains a large number of labeled samples from a large number of classes. However, the classes in the base training set D_0 are different from the classes in the training set D. Thus, traditional few-shot learning trains models with a small amount of training data or samples and without the base classes.
An episode represents a training and testing pair of a few-shot learning task. FIG. 1 is a diagram illustrating an episodic few-shot learning method.
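As a rough illustration of the N-way K-shot setup and the notion of an episode described above, the following sketch draws one episode from a labeled pool. It is not taken from the disclosure: the function name `sample_episode`, the dictionary layout of the data, the per-class query count, and the toy values are all hypothetical.

```python
import random

def sample_episode(samples_by_class, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode: a support set and a query set.

    samples_by_class: dict mapping class label -> list of samples
                      (hypothetical layout).
    Returns (support, query), each a list of (sample, label) pairs.
    """
    classes = rng.sample(sorted(samples_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Draw K support samples plus n_query held-out query samples.
        picked = rng.sample(samples_by_class[label], k_shot + n_query)
        support += [(s, label) for s in picked[:k_shot]]
        query += [(s, label) for s in picked[k_shot:]]
    return support, query

# Toy pool: 5 classes with 10 placeholder "images" each.
data = {c: [f"img_{c}_{i}" for i in range(10)] for c in range(5)}
support, query = sample_episode(
    data, n_way=3, k_shot=2, n_query=1, rng=random.Random(0)
)
print(len(support), len(query))  # 6 3
```

The support set has N·K = 3·2 = 6 samples, matching the training set size |D| = N·K, and the query samples are disjoint from the support samples.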
The first training task 102, the second training task 104, and the first test task 106 each include a respective support set 108, 110, and 112 having three classes (N=3) and two samples (images) per class (K=2). The first training task 102, the second training task 104, and the first test task 106 each also include a respective query set 114, 116, and 118 having three samples (images). The classes in each of the first training task, the second training task, and the first test task are different.
Both metric-based training algorithms and gradient-based training algorithms have been developed on top of the episodic learning framework. For example, a self-supervised loss may be added to the feature extractor training process to achieve robust semantic feature learning and improve few-shot classification. Furthermore, Wasserstein-based methods may be added to better align the distribution of features with the distribution of the classes under consideration. However, as described above, conventional few-shot learning does not consider the base classes used in training.
Few-shot learning without forgetting the base classes has been developed to classify new classes when only a small number of labeled samples are provided for the new classes, while also preserving the ability to classify the base classes on which the feature embedding network is trained. For example, the feature embedding network and the classification weights for the base classes are pre-trained by conventional supervised learning and then fixed. FIG. 2 is a diagram illustrating few-shot learning without forgetting the base classes, focused on generating classification weights for new classes. The sample or test image 202 is provided to a feature extractor 204, and the feature extractor 204 outputs features of the sample to a classifier 206.
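The arrangement described for FIG. 2, in which features from a fixed feature extractor are passed to a classifier covering both base and new classes, might be sketched as follows. This is a minimal illustration and not the disclosed implementation: the dot-product (linear) classifier, the array shapes, and the names `classify`, `w_base`, and `w_new` are assumptions, and random arrays stand in for real extracted features and weights.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify(features, w_base, w_new):
    """Probabilities over base classes followed by new classes.

    features: (batch, d) output of a frozen feature extractor.
    w_base:   (n_base, d) pre-trained base classification weights, kept fixed.
    w_new:    (n_new, d) classification weights for the new classes.
    """
    weights = np.concatenate([w_base, w_new], axis=0)
    return softmax(features @ weights.T)

rng = np.random.default_rng(0)
d, n_base, n_new = 8, 5, 3
features = rng.normal(size=(4, d))      # stand-in for extracted features
w_base = rng.normal(size=(n_base, d))   # stand-in for pre-trained weights
w_new = rng.normal(size=(n_new, d))     # stand-in for new-class weights
probs = classify(features, w_base, w_new)
print(probs.shape)  # (4, 8)
```

Because `w_base` is never modified, adding rows to `w_new` extends the classifier to new classes without retraining the base-class weights.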
The classifier 206 obtains base classification weights (or classification weights for the base classes) 208 from training data 210 of the base classes. The few-shot classification weight generator 212 generates new classification weights (or classification weights for new classes) 214 for the limited training data 216 of the new classes and provides the new classification weights 214 to the classifier 206.
More specifically, for the few-shot classification weight generator 212, the weight imprinting method calculates prototypes of the new classes from the pre-trained feature embedding network and uses them as the classification weights for the new classes. Further, a weight generator that takes as input the new class prototypes and the classification weights 208 for the base classes learns to generate the classification weights 214 for the new classes, using an attention-based mechanism to exploit the relationship between the base classes and the new classes. Based on the base classification weights 208 and the new classification weights 214, the classifier outputs probabilities for the base classes and the new classes for the sample 202. Furthermore, new classification weights can be trained by gradient-based optimiza