CN-115995020-B - Small target detection algorithm based on full convolution


Abstract

The invention provides a small target detection algorithm based on full convolution, comprising a convolutional network model part, a loss calculation part and a training and parameter adjustment part. The convolutional network model part extracts image features and predicts targets. The loss calculation part computes the prediction loss during training to obtain the gradients that guide the network model's weight learning. The training and parameter adjustment part feeds labeled data into the network model for forward inference, backpropagates the gradient of the loss function, and adjusts the network learning rate and the data set according to the validation accuracy to obtain the optimal model weights. The invention uses multi-scale feature fusion to improve feature extraction for targets of different scales, uses dual-scale target prediction to address the problem that small targets account for little of the information in a feature map and are easily overwhelmed by large-target features, and computes the loss from the true target boundary to guide the network to learn target boundary features more accurately.

Inventors

GAO MING
MIAO GONGXUN
XIONG YINGCHAO
XU JIAWEI

Assignees

中孚安全技术有限公司

Dates

Publication Date: 20260505
Application Date: 20221229

Claims (4)

1. A small target detection algorithm based on full convolution, characterized by comprising a convolutional network model part, a loss calculation part and a training and parameter adjustment part; the convolutional network model part comprises a backbone network module, a multi-scale feature fusion module and a dual-scale prediction module, wherein the backbone network module uses a backbone network to extract features of the image at successive scales; the convolutional network model part is used for extracting image features and predicting targets; the loss calculation part is used for computing the prediction loss during training to obtain the gradients that guide the network model's weight learning; the training and parameter adjustment part is used for feeding labeled data into the network model for forward inference, backpropagating the gradient of the loss function, and finally adjusting the network learning rate and the data set according to the validation accuracy; the specific steps of the algorithm are as follows:
Step S1, constructing the network model;
Step S2, constructing a loss function based on the target boundary distance: because a fully convolutional network predicts a class for every pixel, scattered speckle errors are more likely to occur, so the loss function based on the target boundary distance is defined as follows: [formula not reproduced in the source text] where L_pixelloss denotes the loss of each pixel, classes denotes the set of all classes, y_true denotes the label of the pixel for a given class, y_pred denotes the predicted score of the pixel for that class, and the weight coefficient is determined by the distance to the nearest connected domain of that class in the label, computed as follows: [formula not reproduced in the source text] where the upper branch applies when pixel i lies in the label region of the class to which y_true belongs, the lower branch applies when pixel i does not lie in the label region of the class to which y_true belongs, and dis denotes the distance from pixel i to the nearest label region of the class to which y_true belongs; the weight further penalizes mispredictions far from the correct label region;
Step S3, training and parameter adjustment:
Step S31, collecting the target image data required by the task and annotating it in the semantic segmentation label format to obtain the data set required for training;
Step S32, dividing the data set into a training set, a validation set and a test set in proportion, typically 7:1:2, adjusted according to the amount of data;
Step S33, feeding the training set into the network model constructed in step S1 for forward computation to obtain prediction results, computing gradients with the loss function constructed in step S2, and backpropagating them to adjust the model parameters;
Step S34, after training for several batches, adjusting the learning rate according to the accuracy on the validation set, while checking whether the decrease in model loss is accompanied by an increase in validation accuracy, so as to avoid overfitting;
Step S35, finally, testing the models obtained over multiple training rounds on the test set, and selecting the best network model to save as the result model for subsequent small target detection inference.
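The boundary-distance weighting of step S2 can be illustrated with a minimal sketch. The formulas themselves are not reproduced in this text, so everything specific here is an assumption: the per-class binary cross-entropy form of the pixel loss, the linear weight `1 + alpha * dis` outside the label region, the city-block distance metric, and the names `distance_to_region`, `boundary_weight`, `pixel_loss` and `alpha` are all hypothetical. The sketch only reproduces the stated behavior: weight 1 inside the label region, and a weight that grows with distance outside it.

```python
import math
from collections import deque


def distance_to_region(mask):
    """City-block distance from each pixel to the nearest pixel where mask is 1.

    Pixels stay at math.inf when the class has no label region at all.
    """
    h, w = len(mask), len(mask[0])
    dist = [[math.inf] * w for _ in range(h)]
    queue = deque()
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[r][c] = 0
                queue.append((r, c))
    # Multi-source breadth-first search outward from the label region.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and dist[nr][nc] == math.inf:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def boundary_weight(dis, alpha=0.5):
    """Assumed weight: 1 inside the label region, growing linearly outside it."""
    return 1.0 if dis == 0 else 1.0 + alpha * dis


def pixel_loss(y_true, y_pred, dis, alpha=0.5, eps=1e-7):
    """Assumed per-pixel loss: distance-weighted binary cross-entropy per class.

    y_true, y_pred and dis are lists indexed by class.
    """
    total = 0.0
    for t, p, d in zip(y_true, y_pred, dis):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        bce = -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
        total += boundary_weight(d, alpha) * bce
    return total
```

With this assumed form, a false positive three pixels away from the label region costs more than the same false positive one pixel away, which is the constraint on distant mispredictions that the claim describes.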
2. The small target detection algorithm based on full convolution according to claim 1, wherein the backbone network module is specifically as follows: a batch image pixel matrix I is input into the backbone network module: I = [B, C, H, W], where B is the number of images in the batch, C is the number of channels (an input image usually has 3 channels, carrying the R (red), G (green) and B (blue) color features), H is the image height and W is the image width; the backbone network module outputs features at three different scales, C3, C4 and C5, where 3, 4 and 5 denote the power of 2 by which the feature matrix is downsampled, i.e. downsampling factors of 8, 16 and 32.
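The scale convention of claim 2 can be checked with a short shape calculation. This is a sketch under assumptions: the claim does not give the channel widths of C3, C4 and C5, so `channels_per_level` is a hypothetical parameter, and the function name `backbone_feature_shapes` is illustrative. It also assumes H and W are divisible by 32.

```python
def backbone_feature_shapes(batch, height, width, channels_per_level):
    """Return the [B, C, H, W] shape of C3, C4 and C5 for a given input size.

    Level k is downsampled by a factor of 2**k relative to the input image.
    channels_per_level maps level -> channel count (hypothetical values).
    """
    shapes = {}
    for level in (3, 4, 5):
        stride = 2 ** level
        shapes[f"C{level}"] = (
            batch,
            channels_per_level[level],
            height // stride,
            width // stride,
        )
    return shapes
```

For example, a batch of eight 512×512 images gives spatial sizes of 64×64, 32×32 and 16×16 for C3, C4 and C5 respectively.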
3. The small target detection algorithm based on full convolution according to claim 1, wherein the multi-scale feature fusion module is specifically as follows: the input feature matrix is split into two parts along the channel dimension; convolution and related operations are applied to the first part, while the second part is passed directly through a shortcut connection to the end of the module and concatenated with the result of the first part, finally yielding feature matrices at two scales, P3 and P4.
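The split, shortcut and concatenate pattern of claim 3 can be sketched on plain Python lists. This is a toy illustration, not the claimed module: a feature is represented as a list of channels, the split point at half the channels and the concatenation order are assumptions, and the `transform` argument stands in for the unspecified "convolution and related operations".

```python
def split_transform_concat(channels, transform):
    """Split channels in two, transform the first half, concatenate both.

    channels:  list of per-channel feature maps (each a 2D list).
    transform: stand-in for the convolution branch; takes and returns a list
               of channels.
    """
    mid = len(channels) // 2
    first, second = channels[:mid], channels[mid:]
    # The second half bypasses the transform (shortcut) and is appended as-is.
    return transform(first) + second


def double_values(channels):
    """Placeholder for the convolution branch: scales every value by 2."""
    return [[[2 * v for v in row] for row in ch] for ch in channels]
```

Calling `split_transform_concat(feature, double_values)` on a four-channel feature changes the first two channels and leaves the last two untouched, which shows how the shortcut half reaches the output unmodified.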
4. The small target detection algorithm based on full convolution according to claim 1, wherein the dual-scale prediction module is specifically as follows: a general semantic segmentation prediction structure is applied to P3 and P4 to obtain two prediction results at different scales, implemented as follows: a target whose connected domain area is larger than 32×32 is treated as a large target and is predicted from the P4 feature matrix to obtain the large target result R4; a target whose connected domain area is smaller than 32×32 is treated as a small target and is predicted from the P3 feature matrix to obtain the small target result R3; when the loss between the prediction results and the labels is computed, different label maps are generated for large and small targets: the small target label map is used when computing the loss of the P3 prediction, and the large target label map is used when computing the loss of the P4 prediction; in the small target label map, small targets form the first-level label area and large targets form the second-level label area; in the training stage, the loss of every pixel in the first-level label area is computed normally, whereas in the second-level label area no loss is computed for pixels predicted as background, so that targets not belonging to the current scale are not predicted and the influence of feature conflicts is avoided; the connected domains that satisfy the small target rule are extracted from the small target result map R3 and overlaid onto the large target result map R4 to obtain the final prediction result R.
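The loss masking and the final merge of R3 onto R4 described in claim 4 can be sketched as follows. Several points are assumptions: both result maps are taken to be class-id maps already resized to the same resolution with 0 as background, connected domains are found on the binary foreground with 4-connectivity, and the names `connected_components`, `loss_mask` and `merge_predictions` are hypothetical.

```python
from collections import deque

SMALL_AREA = 32 * 32  # area threshold separating small and large targets


def connected_components(mask):
    """Return the 4-connected foreground components as lists of (row, col)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(comp)
    return components


def loss_mask(region_level, predicted_class):
    """1 if the pixel contributes to the loss, else 0.

    region_level: 0 background, 1 first-level area, 2 second-level area.
    In the second-level area, pixels predicted as background are skipped.
    """
    return 0 if region_level == 2 and predicted_class == 0 else 1


def merge_predictions(r3, r4, small_area=SMALL_AREA):
    """Overlay the small connected domains of R3 onto R4 to obtain R."""
    result = [row[:] for row in r4]
    foreground = [[1 if v else 0 for v in row] for row in r3]
    for comp in connected_components(foreground):
        if len(comp) < small_area:  # satisfies the small target rule
            for y, x in comp:
                result[y][x] = r3[y][x]
    return result
```

The `small_area` parameter defaults to 32×32 as in the claim; passing a smaller value makes the behavior easy to check on toy maps.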

Description

Small target detection algorithm based on full convolution

Technical Field

The invention belongs to the technical field of small target detection, and particularly relates to a small target detection algorithm based on full convolution.

Background

Image-based target detection is an important research focus in the field of computer vision. In the COCO target detection dataset, targets with a pixel area smaller than 32×32 are regarded as small targets, and this type of target has long been difficult to handle because it carries little information. In recent years, advances in computing equipment and deep learning theory have greatly improved the accuracy of target detection, and research on small target detection has gradually begun to show results.

Research aimed at the characteristics of small targets follows four main directions. First, because small targets occupy few pixels, data augmentation strategies are used to increase their presence in the image, such as random cropping, random scaling, copying and pasting target regions, and generating small targets with a GAN. Second, because small targets have a small area, multi-scale feature extraction fuses feature matrices of different depths and scales in the backbone network to improve the extraction of information at different scales, as in FPN and its various derivatives. Third, global image feature fusion takes into account the relationships between targets and the scene and among targets; for example, a fish is more likely to appear in water than in the sky.
Such methods generally use channel attention and spatial attention mechanisms to learn and exploit global feature information. Fourth, anchor-free mechanisms are adopted for small targets. In anchor-based target detection methods, the anchor sizes are generally set from prior experience, and the content inside each anchor is used for classification or regression. These methods require a division into positive and negative samples, and for a small target a slight offset causes a large fluctuation in the intersection-over-union (IoU), which makes small targets difficult to learn. After anchor-free target detection methods appeared, researchers improved small target detection, for example by using an enhanced feature extraction network to directly predict the center point and size of the target box.

Methods based on data augmentation, multi-scale feature extraction and global feature fusion bring general performance gains in target detection and can be used flexibly as plug-in modules in various detection models. Anchor-free target detection algorithms remove the need for predefined anchors and, at the level of the prediction mechanism, alleviate the imbalance of positive samples for small targets. However, anchor-free detection methods still generally use the target's circumscribed (bounding) rectangle to compute the box IoU that produces the loss guiding network learning.
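The sensitivity of IoU to small offsets noted above can be made concrete with a short calculation. The box sizes and the 2-pixel shift are illustrative values, not taken from the source, and the helper names `iou` and `iou_after_shift` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def iou_after_shift(size, shift):
    """IoU between a size x size box and the same box shifted along x."""
    truth = (0, 0, size, size)
    pred = (shift, 0, size + shift, size)
    return iou(truth, pred)
```

For the same 2-pixel horizontal shift, a 10×10 box drops to an IoU of about 0.67 (80/120), while a 100×100 box stays at about 0.96 (9800/10200). A fixed IoU threshold for positive samples therefore rejects slightly misaligned small targets far more readily than large ones.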
Disclosure of Invention

The invention provides a small target detection algorithm based on full convolution. It uses multi-scale feature fusion to improve feature extraction for targets of different scales, uses dual-scale target prediction to address the problem that small targets account for little of the information in a feature map and are easily overwhelmed by large-target features, and computes the loss from the true target boundary to guide the network to learn target boundary features more accurately.

The invention adopts the following technical scheme to solve the technical problem: the small target detection algorithm based on full convolution comprises a convolutional network model part, a loss calculation part and a training and parameter adjustment part; the convolutional network model part comprises a backbone network module, a multi-scale feature fusion module and a dual-scale prediction module, wherein the backbone network module uses a backbone network to extract features of the image at successive scales; the convolutional network model part is used for extracting image features and predicting targets; the loss calculation part is used for computing the prediction loss during training to obtain the gradients that guide the network model's weight learning; the training and parameter adjustment part is used for feeding labeled data into the network model for forward inference, backpropagating the gradient of the loss function, a