CN-116188900-B - Small sample image classification method based on global and local feature augmentation
Abstract
The invention discloses a small sample image classification method based on global and local feature augmentation. The method comprises: dividing an image dataset into a training set, a validation set and a test set, and further dividing the test set into a support set and a query set; preprocessing all image data; pre-training on the preprocessed training set to obtain an optimal feature extractor; extracting the feature map of each image in the support set with the feature extractor and performing channel fusion to obtain an augmentation set S1 of the support-set image samples; performing local background smoothing to obtain an augmentation set S2 of the support-set image samples; extracting the feature map of each image in the query set with the feature extractor; and classifying according to the distances between these feature maps and the image features in the support set and the augmentation sets S1 and S2 to obtain a prediction label. The invention can classify common small sample images, fine-grained small sample images and multiple types of images with high precision and efficiency without introducing excessive parameters.
Inventors
- LI WENBIN
- SHI BOYAO
- HUO JING
- GAO YANG
Assignees
- Nanjing University (南京大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20230110
Claims (10)
- 1. A small sample image classification method based on global and local feature augmentation, characterized by comprising the following steps: dividing the image dataset into a training set, a validation set and a test set, and further dividing the test set into a support set S and a query set Q; preprocessing the images in the training set, validation set and test set to the required resolution; pre-training on the preprocessed training set, namely: first randomly extracting small batches of image data from the training set multiple times; second, creating enhanced copies of the images by rotation transformation and training several feature extractors on the enhanced image data, using an auxiliary rotation-angle prediction loss to shape an optimal output manifold of the data and strengthen the generalization ability of the feature extractors; then, with the orthogonal regularization method of a semantic orthogonal learning framework, computing the correlation between all channels of the feature maps of the training images extracted by the feature extractors and constraining it toward the identity matrix; and finally evaluating the feature extractors on the validation set and selecting the optimal feature extractor; extracting, with the optimal feature extractor obtained by training, the feature map of each image in the support set S, measuring channel importance by channel weight, and fusing the k channels with the smallest weights with globally perceived channels of other image features to form an augmentation set S1 of the support-set image samples; extracting, with the trained feature extractor, the feature map of each image in the support set S, selecting foreground and background clusters of local descriptors by clustering, smoothing the background locally with the center of the background cluster, and taking the smoothed image features as an augmentation set S2 of the support-set image samples; and extracting, with the trained feature extractor, the feature map of each image in the query set Q, and classifying according to the distances between this feature map and the image features in the support set S and the augmentation sets S1 and S2 to obtain the prediction label.
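The rotation-based enhanced copies used in the pre-training step can be sketched as follows. This is a minimal NumPy illustration (the helper name `rotation_copies` is hypothetical, not from the patent): each image yields four rotated copies with angle labels 0–3 for the auxiliary rotation-angle prediction loss.

```python
import numpy as np

def rotation_copies(image):
    """Create the four rotated copies of an image (0, 90, 180, 270 degrees)
    together with rotation-angle labels for the auxiliary prediction loss."""
    copies = [np.rot90(image, k) for k in range(4)]  # H x W x C arrays
    labels = np.arange(4)                            # angle class k means k * 90 degrees
    return copies, labels

# toy 2x2 single-channel image
img = np.arange(4).reshape(2, 2, 1)
copies, labels = rotation_copies(img)
```

In practice the copies and their angle labels would be fed to the feature extractor alongside the classification loss; the details of that loss are not shown here.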
- 2. The small sample image classification method based on global and local feature augmentation as claimed in claim 1, wherein the image dataset is a small sample image classification dataset miniImageNet, tieredImageNet or CIFAR-FS, or the fine-grained classification benchmark dataset CUB Birds.
- 3. The method for classifying small sample images based on global and local feature augmentation as claimed in claim 1, wherein the images in the training set, the validation set and the test set are all scaled to a resolution of 84 × 84 per image.
- 4. The method for classifying small sample images based on global and local feature augmentation as claimed in claim 1, wherein small batches of image data are randomly extracted from the training set a plurality of times, denoted X, and the extracted feature map is denoted F; the similarity between each pair of channels of the feature map F is computed as: M_ij = (F_i · F_j) / (‖F_i‖ ‖F_j‖); wherein F_i and F_j are the i-th and j-th channels of the feature map F, i and j are channel indices, and ‖·‖ computes the norm of a matrix, thereby obtaining the similarity matrix M of F; the similarity matrix M is drawn toward the identity matrix I, with the calculation formula: L_orth = ‖M − I‖_F; wherein L_orth is the loss function.
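The channel-orthogonality constraint of claim 4 can be sketched in NumPy as below. This is a minimal illustration (the function name `orthogonal_regularization` is hypothetical): each channel is flattened, pairwise cosine similarities form the matrix M, and the loss is the Frobenius norm of M minus the identity.

```python
import numpy as np

def orthogonal_regularization(feature_map):
    """Channel-wise orthogonality loss: the cosine-similarity matrix M
    between all channels is pushed toward the identity matrix."""
    c = feature_map.shape[0]
    flat = feature_map.reshape(c, -1)                  # one row per channel
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                                # c x c similarity matrix M
    return np.linalg.norm(sim - np.eye(c))             # Frobenius norm ||M - I||_F

# orthonormal channels give (near) zero loss; correlated channels do not
ortho = np.eye(4).reshape(4, 2, 2)
rng = np.random.default_rng(0)
noisy = np.abs(rng.standard_normal((4, 2, 2)))
loss_ortho = orthogonal_regularization(ortho)
loss_noisy = orthogonal_regularization(noisy)
```

In training this loss would be added, suitably weighted, to the classification and rotation-prediction losses.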
- 5. The method for classifying small sample images based on global and local feature augmentation as claimed in claim 1, wherein the feature extractor obtained by training extracts, under the corresponding test set, the feature map F of each image in the support set S; for each image feature, channel importance is measured by channel weight, and the k channels with the smallest weights are selected to compose a channel feature-label pair (F_k, y), wherein F_k denotes the top-k smallest-weight channels of the feature map F; then k channels are randomly selected from the feature maps of other image data as (G_k, y'), wherein G_k denotes k randomly selected channels with class label y'; the augmentation set is formed after fusion, with the process: F̂_c = G_c if channel c is among the k replaced channels, otherwise F̂_c = F_c; ŷ = λ·y + (1 − λ)·y'; wherein F̂ is the fused feature and λ is a compromise between the selected class label y' and the original class label y that controls label retention of the augmented samples; finally, the new fused samples are taken as the augmentation set S1 corresponding to the support-set image samples, i.e. S1.
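A minimal NumPy sketch of the channel-fusion augmentation in claim 5 (the helper name `channel_fusion` and the fixed donor-channel selection are illustrative assumptions, not the patent's exact procedure): the k lowest-weight channels are overwritten with channels from another image's feature map, and the label becomes a λ-weighted mixture.

```python
import numpy as np

def channel_fusion(feat, other_feat, weights, y, y_other, k=2, lam=0.7):
    """Replace the k lowest-weight channels of `feat` by k randomly chosen
    channels of another image's feature map, and mix the one-hot labels
    with trade-off lam (label retention of the augmented sample)."""
    c = feat.shape[0]
    low = np.argsort(weights)[:k]                    # k channels with smallest weight
    donor = np.random.default_rng(0).choice(c, size=k, replace=False)
    fused = feat.copy()
    fused[low] = other_feat[donor]                   # inject donor channels
    label = lam * np.asarray(y) + (1 - lam) * np.asarray(y_other)
    return fused, label

# toy example: 4 channels of 2x2 features, two classes (one-hot labels)
feat = np.zeros((4, 2, 2))
other = np.ones((4, 2, 2))
weights = np.array([0.1, 0.9, 0.8, 0.2])
fused, label = channel_fusion(feat, other, weights,
                              y=np.array([1.0, 0.0]),
                              y_other=np.array([0.0, 1.0]))
```

Here channels 0 and 3 (the two smallest weights) are replaced by all-ones donor channels, and the mixed label keeps 70% of the original class.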
- 6. The method for classifying small sample images based on global and local feature augmentation as claimed in claim 1, wherein the feature extractor obtained by training extracts, under the corresponding test set, the feature map of each image in the support set S and represents it as a collection of local descriptors X = {x_1, …, x_n}, wherein x_j is the j-th local descriptor; the local descriptors are clustered into two clusters C_1 and C_2 by a clustering algorithm, wherein C_i is the i-th cluster formed by aggregation; the cluster whose local descriptors have the larger total weight is taken as the foreground cluster C_f and the other as the background cluster C_b; the center of each cluster is denoted c_i, wherein the cluster center c_i of the i-th cluster is calculated as: c_i = (1/|C_i|) Σ_{x∈C_i} x; wherein C_i denotes the i-th cluster and x denotes a local descriptor belonging to it; finally, the local descriptors in the background cluster are replaced with the center of that cluster to smooth the local background, as shown in the following formula: x̂_j = x_j if x_j ∈ C_f, and x̂_j = c_b if x_j ∈ C_b; wherein x_j is a local descriptor of the foreground cluster C_f, c_b is the cluster center of the background cluster, and x̂_j is the j-th local descriptor after background smoothing; the smoothed image features are used as the augmentation set S2 corresponding to the support-set image samples, i.e. S2.
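The background smoothing of claim 6 can be sketched as below. This is a simplified NumPy illustration (the helper name `smooth_background`, the 2-means clustering, and the use of descriptor norms as the "weight" are assumptions for the sketch, since the patent does not fix the clustering algorithm here): descriptors are split into two clusters, the cluster with the larger total weight is treated as foreground, and every background descriptor is replaced by its cluster center.

```python
import numpy as np

def smooth_background(descriptors, n_iter=10):
    """Cluster local descriptors into two groups with a simple 2-means,
    treat the cluster with larger total norm as foreground, and replace
    each background descriptor by the background cluster center."""
    norms = np.linalg.norm(descriptors, axis=1)
    # init centers at the smallest- and largest-norm descriptors
    centres = descriptors[[norms.argmin(), norms.argmax()]].astype(float).copy()
    assign = np.zeros(len(descriptors), dtype=int)
    for _ in range(n_iter):                               # plain 2-means iterations
        d = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for i in range(2):
            if (assign == i).any():
                centres[i] = descriptors[assign == i].mean(axis=0)
    fg = max((0, 1), key=lambda i: norms[assign == i].sum())  # foreground cluster
    bg = 1 - fg
    smoothed = descriptors.astype(float).copy()
    smoothed[assign == bg] = centres[bg]              # background -> cluster center
    return smoothed

# toy descriptors: two strong foreground points, two near-zero background points
desc = np.array([[10.0, 10.0], [9.0, 11.0], [0.1, 0.0], [0.0, 0.1]])
smoothed = smooth_background(desc)
```

After smoothing, the two background descriptors collapse onto the background cluster center while the foreground descriptors are untouched.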
- 7. The method for classifying small sample images based on global and local feature augmentation as claimed in claim 1, wherein the feature extractor obtained by training extracts, under the corresponding test set, the feature q of each image in the query set Q; then, according to the image features in the support set S and the obtained augmentation sets S1 and S2, the feature mean μ_c of each category c is computed; the distances between q and the means of all classes are calculated, and the classification probability distribution p is obtained through softmax, with the calculation process: p(y = c | q) = exp(−d(q, μ_c)) / Σ_{c′} exp(−d(q, μ_{c′})); wherein μ_c is the feature mean of class c, exp denotes the exponential function, and d(·, ·) computes the distance between feature vectors, commonly the Euclidean distance.
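The prototype-distance classification of claim 7 can be sketched in NumPy as below (the function name `classify` is hypothetical): class means are computed from the support and augmented features, and a softmax over negative Euclidean distances yields the probability distribution.

```python
import numpy as np

def classify(query_feat, support_feats, support_labels, n_classes):
    """Compute per-class feature means (prototypes) from support features,
    then softmax over negative Euclidean distances to the prototypes."""
    protos = np.stack([support_feats[support_labels == c].mean(axis=0)
                       for c in range(n_classes)])
    d = np.linalg.norm(query_feat[None] - protos, axis=1)   # Euclidean distances
    logits = -d
    p = np.exp(logits - logits.max())                       # stable softmax
    p /= p.sum()
    return p

# toy 2-way example: two support samples per class
support_feats = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]])
support_labels = np.array([0, 0, 1, 1])
p = classify(np.array([0.0, 1.0]), support_feats, support_labels, 2)
```

In the patented method the support features would include the augmented samples from S1 and S2 when forming the class means; here only plain support features are shown.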
- 8. A small sample image classification system based on global and local feature augmentation, comprising: an image dataset dividing module for dividing the image dataset into a training set, a validation set and a test set, and further dividing the test set into a support set S and a query set Q; an image data preprocessing module for preprocessing the images in the training set, validation set and test set to the required resolution; a pre-training module for pre-training on the image data of the preprocessed training set, namely: first randomly extracting small batches of image data from the training set multiple times; second, creating enhanced copies of the images by rotation transformation and training several feature extractors on the enhanced image data, using an auxiliary rotation-angle prediction loss to shape an optimal output manifold of the data and strengthen the generalization ability of the feature extractors; then, with the orthogonal regularization method of a semantic orthogonal learning framework, computing the correlation between all channels of the feature maps of the training images extracted by the feature extractors and constraining it toward the identity matrix; and selecting the optimal feature extractor by evaluating the feature extractors on the validation set; and a test module for extracting, with the optimal feature extractor obtained by training, the feature map of each image in the support set S, measuring channel importance by channel weight, and fusing the k channels with the smallest weights with globally perceived channels of other image features to form an augmentation set S1 of the support-set image samples; extracting, with the trained feature extractor, the feature map of each image in the support set S, selecting foreground and background clusters of local descriptors by clustering, smoothing the background locally with the center of the background cluster, and taking the smoothed image features as an augmentation set S2 of the support-set image samples; and extracting, with the trained feature extractor, the feature map of each image in the query set Q, and classifying according to the distances between this feature map and the image features in the support set S and the augmentation sets S1 and S2 to obtain the prediction label.
- 9. An apparatus for a small sample image classification method based on global and local feature augmentation, comprising a memory and a processor, wherein: A memory for storing a computer program capable of running on the processor; A processor for performing the steps of a small sample image classification method based on global and local feature augmentation as claimed in any one of claims 1-7 when said computer program is run.
- 10. A storage medium having stored thereon a computer program which, when executed by at least one processor, implements the steps of a small sample image classification method based on global and local feature augmentation as claimed in any one of claims 1 to 7.
Description
Small sample image classification method based on global and local feature augmentation

Technical Field

The invention relates to computer vision technology, in particular to a small sample image classification method based on global and local feature augmentation.

Background

Deep learning based approaches have achieved dramatic results in various image understanding tasks. However, these successes often require a huge amount of labeled training data: in image classification, training a reliable convolutional neural network often requires hundreds or even thousands of training images per class. In many practical cases, annotation of the data is expensive and only limited labeled samples are accessible, which severely impacts the performance of deep learning models. In contrast, humans are fully capable of learning a new visual concept from one or a few examples and quickly transferring it to new data. To overcome this challenge, small sample (few-shot) learning has been developed, which aims at reliable and rapid learning from limited data and has attracted extensive attention from the research community. Current advanced methods for classifying small sample images can be broadly divided into three categories: (1) metric learning based methods; (2) optimization based methods; (3) fine-tuning based methods. A metric learning based small sample classification method aims at learning an embedding space in which data from different categories can be distinguished by a simple distance metric. Many methods have been proposed in this category. MatchingNet (matching networks) uses a novel nearest-neighbor method with an embedded feature extractor and combines the advantages of parametric and non-parametric methods for classification.
ProtoNet (prototypical networks) takes the mean of the feature vectors within a class as the prototype of that class and classifies test samples according to their distances from the prototypes of the different classes. RelationNet (relation network) classifies test samples by learning a deep metric function instead of the fixed metric functions used in earlier small sample approaches. Furthermore, many approaches apply a local feature representation to few-shot learning (FSL) rather than a global feature representation. DN4 finds the category closest to the input image by comparing local descriptors between the image and each category, where each local descriptor corresponds to a local region of the image; classification compares the similarity between the test image and the local descriptors of each category with a k-nearest-neighbor (KNN) algorithm. DC-IMP directly studies local activations and fuses the results to learn task-specific features. CrossTransformers finds a coarse spatial correspondence between the query and the labeled images, and then computes the distance between spatially corresponding features for final classification. Optimization based small sample classification methods aim at learning a good initialization so that a model can quickly adapt to unseen new tasks through a series of training tasks. As a representative, MAML (model-agnostic meta-learning) follows a pure meta-training scheme with second-order gradients and learns to adapt to new tasks quickly with only a small number of gradient updates; it can be applied to any model, learns a very good model initialization, and converges quickly with a small number of samples.
Reptile (first-order meta-learning), like first-order MAML, uses only first-order gradient information to adjust and update parameters: it repeatedly samples a task to train on and continuously moves the initialization toward the weights obtained from that task's training. ANIL (almost no inner loop) further explores the effectiveness of MAML and largely eliminates the inner loop: it only updates the parameters of the network head during training and testing, restricting the inner-loop update to the last layer so as to match the performance of MAML at a lower computational cost. LEO (latent embedding optimization) learns a low-dimensional semantic embedding that decouples optimization based meta-learning from the high-dimensional space of model parameters, making the algorithm more suitable for small sample problems. MetaOptNet (meta-learning with differentiable convex optimization) utilizes a convex linear classifier, i.e., the implicit derivative of the optimality conditions of the convex problem of a linear support vector machine (SVM), and a dual formula of the optimal probl