CN-121437867-B - Small sample target detection method, device and storage medium based on style augmentation


Abstract

The invention provides a small sample target detection method based on style augmentation. The method comprises: extracting content features of a query image and style features of a support image; generating a style-augmented sample by fusing the content features and the style features; extracting embedded features of the style-augmented sample; calculating the angular similarity between the embedded features and the center vector of each category; applying an angular margin to the angular similarity of the target category; scaling the angular similarities of all categories by a scaling factor; outputting the category label of the target with a classifier; and outputting the bounding box of the target with a detection head to obtain the final detection result. By constructing a technical framework that jointly optimizes data-level style augmentation and feature-level discriminative constraints, the invention significantly expands the distribution diversity of the training data without damaging foreground semantics and establishes a strongly discriminative metric in the high-dimensional feature space, thereby improving the detection accuracy and robustness of the model in data-scarce scenarios.
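The style-fusion step can be illustrated with a minimal sketch, assuming the adaptive instance normalization named in claim 1 takes its commonly used form: each content channel is shifted and rescaled to the mean and standard deviation of the matching style channel. The function names `channel_stats` and `adain`, the list-of-channels layout, the `eps` stabilizer, and the toy values are all assumptions made for illustration and are not taken from the patent.

```python
import math

def channel_stats(channel, eps=1e-5):
    """Return (mean, std) of one channel given as a flat list of activations."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    # eps keeps the standard deviation strictly positive (assumed stabilizer)
    return mean, math.sqrt(var + eps)

def adain(content, style, eps=1e-5):
    """Re-normalize each content channel to the style channel's statistics.

    content, style: lists of channels, each channel a flat list of floats.
    """
    fused = []
    for c_ch, s_ch in zip(content, style):
        mu_c, sigma_c = channel_stats(c_ch, eps)
        mu_s, sigma_s = channel_stats(s_ch, eps)
        # Whiten with the content statistics, then re-color with the style statistics
        fused.append([sigma_s * (v - mu_c) / sigma_c + mu_s for v in c_ch])
    return fused

# Toy feature maps with two channels of four activations each (hypothetical values)
content = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0]]
style = [[10.0, 20.0, 30.0, 40.0], [5.0, 5.0, 7.0, 9.0]]
fused = adain(content, style)
```

In this sketch the fused channels take on the style statistics while the relative ordering of activations within each content channel is preserved, which is the sense in which content structure survives the style transfer.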

Inventors

FENG YONG
HUANG YAN
WU ANG

Assignees

武汉卓目科技股份有限公司

Dates

Publication Date: 20260508
Application Date: 20251230

Claims (10)

1. A small sample target detection method based on style augmentation, characterized by comprising the following steps: S1, feature extraction: encoding an input query image and an input support image respectively, and extracting content features of the query image and style features of the support image; S2, style fusion: fusing the content features and the style features by adaptive instance normalization to generate a style-augmented sample; S3, similarity measurement: feeding the style-augmented sample into a detection backbone network to extract embedded features, and calculating the angular similarity between the embedded features and a predefined center vector of each category; S4, discriminative constraint: applying an angular margin to the angular similarity of the target category, and scaling the angular similarities of all categories by a scaling factor; S5, detection output: outputting the category label of the target through a classifier based on the scaled similarity of each category, and outputting the bounding box of the target with a detection head to obtain the final detection result.
2. The small sample target detection method based on style augmentation as claimed in claim 1, wherein in step S1, a pretrained convolutional neural network is used as a shared encoder to encode the query image $I_q$ and the support image $I_s$, content features $F_c^i$ and style features $F_s^i$ are extracted at $L$ network layers respectively, where $i = 1, \dots, L$, and the channel-wise mean and standard deviation of the features of each layer are calculated.
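3. The small sample target detection method based on style augmentation as claimed in claim 2, wherein in step S2, the adaptive instance normalization operation is performed according to the following formula: $\mathrm{AdaIN}(F_c^i, F_s^i) = \sigma(F_s^i)\,\dfrac{F_c^i - \mu(F_c^i)}{\sigma(F_c^i)} + \mu(F_s^i)$; where $F_c^i$ and $F_s^i$ are the content features and style features of the i-th layer, $\mathrm{AdaIN}(\cdot)$ denotes the adaptive instance normalization operation, $\mu(\cdot)$ denotes the mean, and $\sigma(\cdot)$ denotes the standard deviation.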
4. The small sample target detection method based on style augmentation as claimed in claim 3, wherein in step S2, a learnable fusion coefficient is introduced for each layer during multi-layer feature fusion; the fused feature of the i-th layer is determined by the fusion coefficient of that layer; and the fusion coefficient is computed with a sigmoid activation function and a learnable weight matrix and is used to control the balance between content preservation and style injection.
5. The small sample target detection method based on style augmentation as claimed in claim 1, wherein step S3 specifically comprises: performing L2 normalization on the embedded feature $x$ and the class center vector $W_j$ respectively: $\hat{x} = x / \lVert x \rVert_2$, $\hat{W}_j = W_j / \lVert W_j \rVert_2$; where $\hat{x}$ and $\hat{W}_j$ denote the normalized embedded feature and the normalized j-th class center vector, and $\lVert \cdot \rVert_2$ denotes the L2 norm of a vector; and calculating the cosine between the normalized embedded feature and the class center vector as the angular similarity: $\cos\theta_j = \hat{W}_j^{\top}\hat{x}$; where $\theta_j$ is the angle between the vectors $\hat{x}$ and $\hat{W}_j$.
6. The small sample target detection method based on style augmentation as claimed in claim 1, wherein step S4 specifically comprises: for the target class $y$, replacing its angular similarity $\cos\theta_y$ with $\cos(\theta_y + m)$, where $m$ is an additive angular margin; and multiplying the angular similarities of all classes by a scaling factor $s$ to obtain the scaled similarity of each class as the classification logit.
7. The small sample target detection method based on style augmentation as claimed in claim 6, wherein in step S5, the class label of the target is output through the classifier as follows: using the ArcFace function as the classifier, the classification logits are converted into class prediction probabilities: $P(y \mid x) = \dfrac{e^{s\cos(\theta_y + m)}}{e^{s\cos(\theta_y + m)} + \sum_{j \neq y} e^{s\cos\theta_j}}$; where $P(y \mid x)$ denotes the predicted probability that the embedded feature $x$ belongs to the true class $y$.
8. The small sample target detection method based on style augmentation as claimed in claim 7, wherein in step S5, the detection head comprises a region proposal network and a regression branch; the region proposal network is used for generating candidate regions on the feature map; the regression branch is used for predicting the position coordinates (x, y, w, h) of each candidate region, where x and y denote the center coordinates and w and h denote the width and height of the candidate region; and a non-maximum suppression algorithm is applied, based on the class prediction probabilities and the predicted position coordinates, to select the final detection result (box, cls), where box denotes the bounding box of the target and cls denotes the class label of the target.
9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the object detection method according to any of claims 1 to 8.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the object detection method according to any one of claims 1 to 8.
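
The similarity measurement and discriminative constraint of claims 5 to 7 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the claimed implementation: the function names, the margin `m = 0.5`, the scale `s = 30.0`, and the two-class toy data are hypothetical, and the sketch does not handle the case where the angle plus the margin exceeds π.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def margin_logits(x, centers, target, m=0.5, s=30.0):
    """Scaled cosine logits with an additive angular margin on the target class."""
    x_hat = l2_normalize(x)
    logits = []
    for j, w in enumerate(centers):
        w_hat = l2_normalize(w)
        cos_theta = sum(a * b for a, b in zip(x_hat, w_hat))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if j == target:
            # Replace cos(theta_y) with cos(theta_y + m) for the target class only
            cos_theta = math.cos(math.acos(cos_theta) + m)
        logits.append(s * cos_theta)
    return logits

def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embedding and two class centers (hypothetical values)
x = [3.0, 4.0]
centers = [[1.0, 0.0], [0.0, 1.0]]
plain = margin_logits(x, centers, target=1, m=0.0)
with_margin = margin_logits(x, centers, target=1, m=0.5)
probs = softmax(with_margin)
```

With the margin applied, the target-class logit is lower than its plain cosine logit while the other logits are unchanged, so the embedding has to lie closer in angle to its class center to reach the same predicted probability. This is how the margin encourages compact classes and wider separation between them.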

Description

Small sample target detection method, device and storage medium based on style augmentation

Technical Field

The invention relates to the technical field of image detection and recognition, and in particular to a small sample target detection method, device and storage medium based on style augmentation.

Background

In recent years, deep learning has achieved remarkable success in target detection, but its performance depends heavily on large-scale, high-quality labeled data. In practical application scenarios such as remote sensing, industrial quality inspection, and medical image analysis, collecting and labeling enough data for a large number of new target classes is costly and sometimes infeasible. This strong dependence on data greatly limits the adoption of traditional deep detection models.

To solve this problem, small sample object detection (FSOD) was developed. The core idea is to pre-train the model on base classes that have a large number of samples so that it acquires a generic feature representation, and then to adapt it quickly to each new class using only a small number of samples (the "support samples"). While this paradigm eases the dependence on data, existing approaches still face significant challenges when adapting to new classes.

First, at the data level, the limited number of samples leads to severely insufficient intra-class diversity. The model easily overfits the few samples seen in training and generalizes poorly to different shapes, illumination conditions, and backgrounds of targets of the same class. Traditional data augmentation methods (such as flipping and rotation) provide only limited linear transformations, while methods based on generative models struggle to increase background diversity while keeping the semantic structure of foreground objects unchanged.

Second, at the feature discrimination level, conventional softmax loss functions in a high-dimensional feature space tend to learn separable features but cannot ensure that the learned features are sufficiently discriminative: features of the same class are not compact enough, and features of different classes lack sufficient separation. This problem is greatly amplified in the few-sample setting, where limited samples make it extremely difficult to control intra-class variance and maintain inter-class distinction, and it ultimately causes substantial inter-class confusion at test time.

In summary, existing few-sample target detection techniques have two main bottlenecks: the data augmentation mode is limited and cannot balance sample diversity with semantic consistency, and the feature embedding space lacks effective discriminative constraints, which leads to intra-class dispersion and inter-class confusion. A method that jointly addresses data diversity and feature discrimination is therefore urgently needed to improve the robustness and generalization capability of the model in data-scarce scenarios.

Disclosure of Invention

The invention provides a small sample target detection method, device and storage medium based on style augmentation, which address the following problems of the prior art when training samples are extremely scarce: how to significantly increase background and environmental diversity without destroying foreground semantics so as to improve model generalization, and how to establish effective discriminative constraints in the high-dimensional embedding space so as to improve intra-class compactness and inter-class separability.

The technical scheme of the invention is realized as follows. A first aspect of the invention provides a small sample target detection method based on style augmentation, comprising the following steps: S1, feature extraction: encoding an input query image and an input support image respectively, and extracting content features of the query image and style features of the support image; S2, style fusion: fusing the content features and the style features by adaptive instance normalization to generate a style-augmented sample; S3, similarity measurement: feeding the style-augmented sample into a detection backbone network to extract embedded features, and calculating the angular similarity between the embedded features and a predefined center vector of each category; S4, discriminative constraint: applying an angular margin to the angular similarity of the target category, and scaling the angular similarities of all categories by a scaling factor; S5, detection output: outputting class labels of t