CN-121305045-B - Remote sensing small target detection method and device based on density-oriented attention enhancement

Abstract

A remote sensing small target detection method and device based on density-oriented attention enhancement. The method comprises: obtaining a preprocessed remote sensing image data set; constructing a remote sensing small target detection model based on density-oriented attention enhancement, in which a backbone network, a deformable encoder, a feature alignment module, a density map generation module, a feature guide enhancement module, a deformable decoder and an output layer are connected in sequence from the input layer; inputting a training set into the constructed model for training to obtain a trained model; verifying the trained model with a verification set to obtain the final model; and inputting a test set into the final model to recognize the different targets in the test set. The invention improves the accuracy and reliability of small target detection while retaining the advantages of the end-to-end detection framework.
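
As a rough illustration of the density-guided idea, the sketch below re-weights one feature channel with a density map so that locations likely to contain small targets are amplified. The residual rule `feature * (1 + density)`, the helper name `density_guided_enhance` and the toy 2x2 values are assumptions made for illustration; the source does not specify the exact operation used by the feature guide enhancement module.

```python
from typing import List

Grid = List[List[float]]  # one 2-D feature channel or density map


def density_guided_enhance(feature: Grid, density: Grid) -> Grid:
    """Re-weight a feature channel by a density map of the same shape.

    Assumed rule (not taken from the source): out = feature * (1 + density),
    so locations with zero density pass through unchanged.
    """
    same_rows = len(feature) == len(density)
    same_cols = all(len(f) == len(d) for f, d in zip(feature, density))
    if not (same_rows and same_cols):
        raise ValueError("feature and density must have the same shape")
    return [
        [f * (1.0 + d) for f, d in zip(f_row, d_row)]
        for f_row, d_row in zip(feature, density)
    ]


# Hypothetical toy inputs: density values are probabilities in [0, 1].
feature = [[1.0, 2.0], [3.0, 4.0]]
density = [[0.0, 0.5], [1.0, 0.0]]
enhanced = density_guided_enhance(feature, density)
print(enhanced)  # [[1.0, 3.0], [6.0, 4.0]]
```

The residual form keeps the original features intact where the density map is zero, which is one common way to avoid suppressing background context entirely.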

Inventors

LIU JINGJING
WANG HUIQIONG
SUN LI
SONG MINGLI
LI CHUNGUANG

Assignees

Zhejiang University (浙江大学)

Dates

Publication Date: 20260508
Application Date: 20251205

Claims (6)

1. A remote sensing small target detection method based on density-oriented attention enhancement, characterized by comprising the following steps: Step 1, acquiring a preprocessed remote sensing image data set and dividing it into a training set, a verification set and a test set according to a proportion, wherein Step 1 comprises: Step 1.1, acquiring remote sensing images containing small targets, annotating the class label and bounding box coordinates of each small target, and constructing a data set; Step 1.2, preprocessing the images in the data set, including size normalization, data augmentation and pixel value normalization; Step 1.3, dividing the preprocessed data set into a training set, a verification set and a test set according to a set proportion; Step 2, constructing a remote sensing small target detection model based on density-oriented attention enhancement, wherein a backbone network, a deformable encoder, a feature alignment module, a density map generation module, a feature guide enhancement module, a deformable decoder and an output layer are connected in sequence from the input layer; the backbone network is used for extracting multi-scale features; the deformable encoder is used for preliminarily enhancing the multi-scale features; the feature alignment module is used for spatially aligning and fusing the multi-scale features; the density map generation module is used for generating, from the fused features, a density map reflecting the distribution probability of small targets; the feature guide enhancement module is used for applying attention enhancement to the multi-scale features again under the guidance of the density map; the deformable decoder is used for receiving the finally enhanced multi-scale features and outputting a prediction result through a decoding operation; and the output layer is used for outputting the final target classes and bounding box coordinates; Step 3, inputting the training set into the constructed remote sensing small target detection model for training to obtain a trained remote sensing small target detection model; and Step 4, inputting the test set into the final remote sensing small target detection model to identify the different targets in the test set.
2. The remote sensing small target detection method based on density-oriented attention enhancement according to claim 1, characterized in that, in the structure of the remote sensing small target detection model: the backbone network adopts a ResNet architecture comprising five convolution stages and sequentially outputs multi-scale feature maps P2, P3, P4 and P5, wherein P2 retains small target detail information and P5 contains deep semantic information; the deformable encoder is composed of a plurality of deformable attention layers, each comprising a multi-head deformable self-attention module and a feed-forward network, and enhances the P2-P5 features output by the backbone network to generate optimized multi-scale features S2-S5; the feature alignment module sequentially comprises a first convolution layer, a bilinear interpolation up-sampling layer, a feature concatenation layer, a second convolution layer, a deformable convolution layer and a residual connection layer; the density map generation module sequentially comprises a third convolution layer, three dilated convolution layers, a spatial attention sub-module and a fourth convolution enhancement layer for output conversion, each attention layer comprises a multi-head deformable self-attention module and a feed-forward network, the two feature guide sub-layers comprise a multi-scale feature graph, the two-frame and the two-head-profile sub-modules are connected with each other in a cascade mode, and the two-frame-type error prediction modules comprise a full-rate prediction frame, and the two-rate-frame-rate prediction sub-modules are formed by the two-frame-rate prediction error rate prediction modules are connected with each other, and the two-frame-rate prediction sub-frame modules comprise a full-rate prediction frame.
3. The remote sensing small target detection method based on density-oriented attention enhancement according to claim 2, wherein the process of building the remote sensing small target detection model further comprises: setting the number of attention heads and the feature dimension of the deformable encoder and the deformable decoder; designing the kernel size, stride and number of kernels of each convolution layer according to the input and functional requirements of each module of the network; and setting the number of target query vectors of the deformable decoder according to the distribution density of targets in the data set.
4. The remote sensing small target detection method based on density-oriented attention enhancement according to claim 3, wherein Step 3 comprises: Step 3.1, inputting the training set into the model in batches for forward propagation, wherein each input image is processed by the backbone network, the deformable encoder, the feature alignment module, the density map generation module, the feature guide enhancement module and the deformable decoder to output target class probabilities and bounding box predictions; Step 3.2, calculating a loss function, wherein Focal Loss is used as the classification loss, GIoU Loss is used as the regression loss, and the total loss is the weighted sum of the classification loss and the regression loss; Step 3.3, using an AdamW optimizer, setting an initial learning rate, a weight decay coefficient and a learning rate schedule, and iteratively updating the model parameters; Step 3.4, during training, periodically evaluating the performance of the model on the verification set and saving the model weights with the best performance; and Step 3.5, taking the model with the best performance on the verification set as the final remote sensing small target detection model.
5. A remote sensing small target detection device based on density-oriented attention enhancement, comprising a memory and one or more processors, the memory having executable code stored therein, wherein the one or more processors, when executing the executable code, implement the remote sensing small target detection method based on density-oriented attention enhancement of any one of claims 1-4.
6. A computer readable storage medium having stored thereon a program which, when executed by a processor, implements the remote sensing small target detection method based on density-oriented attention enhancement of any one of claims 1-4.
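
The loss described in Step 3.2 of claim 4 (Focal Loss for classification, GIoU Loss for regression, combined as a weighted sum) can be sketched for a single prediction as follows. This is a minimal illustration using the standard definitions of both losses; the function names, the `alpha` and `gamma` defaults, and the weights `w_cls` and `w_reg` are assumptions, since the claims do not state numeric values.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed x1 < x2 and y1 < y2


def focal_loss(p: float, target: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one predicted positive-class probability p."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp so log() stays finite
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def giou_loss(pred: Box, gt: Box) -> float:
    """1 - GIoU for two axis-aligned boxes with positive area."""
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    inter_w = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    inter_h = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = inter_w * inter_h
    union = area_p + area_g - inter
    # Area of the smallest box enclosing both inputs.
    enclose = (max(pred[2], gt[2]) - min(pred[0], gt[0])) * (
        max(pred[3], gt[3]) - min(pred[1], gt[1])
    )
    giou = inter / union - (enclose - union) / enclose
    return 1.0 - giou


def total_loss(p: float, target: int, pred: Box, gt: Box,
               w_cls: float = 1.0, w_reg: float = 1.0) -> float:
    """Weighted sum of classification and regression losses (weights assumed)."""
    return w_cls * focal_loss(p, target) + w_reg * giou_loss(pred, gt)


print(giou_loss((0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0)))  # 0.0 for a perfect match
```

Unlike plain IoU, the GIoU term still produces a useful signal when the predicted and ground-truth boxes do not overlap, because the enclosing-box penalty grows with their separation. That can matter for small targets, where a slight offset may already leave the boxes disjoint.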

Description

Remote sensing small target detection method and device based on density-oriented attention enhancement

Technical Field

The invention relates to the intersection of remote sensing, computer vision and target detection, and in particular to a method and a device for detecting remote sensing small targets based on density-oriented attention enhancement.

Background

In complex and diverse remote sensing application scenarios, small target detection is a core requirement in fields such as environmental monitoring, disaster early warning and land-use planning. Remote sensing images feature wide coverage, highly complex backgrounds and large differences in target scale. In scenes such as ocean monitoring and farmland parcel recognition in particular, targets such as ships and small buildings often occupy few pixels and carry sparse feature information. The DETR model, based on the Transformer architecture, avoids the problem of anchor box design bias through an end-to-end detection mechanism, and its global attention mechanism can capture the correlations between targets and complex backgrounds in remote sensing images. However, it still has bottlenecks in remote sensing small target detection.

First, spatial misalignment arises during multi-scale feature fusion. Existing methods directly fuse features from different levels of the backbone network and ignore the spatial misalignment between feature maps caused by downsampling. This geometric distortion prevents shallow details from being accurately matched with deep semantics and severely restricts the positioning accuracy of small targets.

Second, the attention mechanisms employed tend to lack effective guidance. The global attention computation of DETR is costly, and without guidance it is difficult to focus automatically on targets that occupy a very small proportion of pixels in complex backgrounds, so small target features are drowned out and the miss rate is high.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a remote sensing small target detection method and device based on density-oriented attention enhancement, which take remote sensing small targets under complex backgrounds as the detection object and jointly design feature alignment, density guidance and attention enhancement, so that the detection accuracy and recall rate of small targets are effectively improved while detection efficiency is maintained.

The invention provides a remote sensing small target detection method based on density-oriented attention enhancement, which comprises the following steps: Step 1, acquiring a preprocessed remote sensing image data set and dividing it into a training set, a verification set and a test set according to a proportion; Step 2, constructing a remote sensing small target detection model based on density-oriented attention enhancement, in which a backbone network, a deformable encoder, a feature alignment module, a density map generation module, a feature guide enhancement module, a deformable decoder and an output layer are connected in sequence from the input layer; Step 3, inputting the training set into the constructed remote sensing small target detection model for training to obtain a trained remote sensing small target detection model; and Step 4, inputting the test set into the final remote sensing small target detection model to identify the different targets in the test set.
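
The proportional division in Step 1 can be sketched as below. This is a minimal illustration, assuming an 8:1:1 ratio, a fixed shuffle seed, hypothetical image identifiers and a hypothetical helper named `split_dataset`; the source only says the data set is divided according to a set proportion.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    samples: Sequence[str],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),  # assumed 8:1:1 split
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle samples reproducibly and split into train / verification / test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical sample identifiers standing in for annotated images.
image_ids = [f"img_{i:03d}" for i in range(100)]
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))  # 80 10 10
```

Giving the remainder to the test set guarantees that every sample lands in exactly one subset even when the ratios do not divide the data set size evenly.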
Further, Step 1 includes: Step 1.1, acquiring remote sensing images containing small targets, annotating the class label and bounding box coordinates of each small target, and constructing a data set; Step 1.2, preprocessing the images in the data set, including size normalization, data augmentation and pixel value normalization; and Step 1.3, dividing the preprocessed data set into a training set, a verification set and a test set according to a set proportion.

Further, the backbone network in Step 2 is used for extracting multi-scale features, the deformable encoder is used for preliminarily enhancing the multi-scale features, the feature alignment module is used for spatially aligning and fusing the multi-scale features, the density map generation module is used for generating, from the fused features, a density map reflecting the distribution probability of small targets, the feature guide enhancement module is used for applying attention enhancement to the multi-scale features again under the guidance of the density map, the deformable decoder is used for receiving the finally enhanced multi-scale features and outputting a prediction result through a decoding operation, an