CN-122024045-A - Remote sensing image detection method based on posterior dynamic query and density attention
Abstract
The invention discloses a remote sensing image detection method based on posterior dynamic query and density attention. Built on the RT-DETR model, the method provides a density-adaptive attention (DAA) module. The module predicts the regional target density through a density estimation network, uses a large convolution kernel in high-density regions to expand the receptive field and capture target associations, uses a small convolution kernel in low-density regions to retain target details, and combines multi-branch feature processing with a weighted fusion strategy to allocate computing resources dynamically and suppress redundant background noise. In addition, a Posterior Dynamic Query Decoder (PDQD) is designed: a newly added cross-attention module realizes dynamic interaction between queries and image features, and a multi-round iterative update mechanism continuously optimizes the query vectors based on preliminary prediction results, enhancing feature capture for small targets and occluded regions. By integrating posterior dynamic query decoding with the density-adaptive attention mechanism, the method effectively improves target detection performance on remote sensing images.
Inventors
- ZHANG XIAODONG
- XUE JIAHUI
Assignees
- Shenyang University of Technology (沈阳工业大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260121
Claims (10)
- 1. A remote sensing image detection method based on posterior dynamic query and density attention, characterized by comprising the following steps: step 1, acquiring a remote sensing image and preprocessing it to obtain a standardized image; step 2, constructing a baseline model framework based on RT-DETR-R18, inputting the standardized image into a backbone network, and extracting multi-scale basic features; step 3, enhancing the basic features through a density-adaptive attention module, which dynamically adjusts the feature extraction strategy according to the regional target density, to obtain a fused feature map; step 4, inputting the fused feature map into a posterior dynamic query decoder, and mining key features in complex scenes through dynamic query generation and an iterative update mechanism; step 5, based on the prediction result output by the decoder, matching queries to ground-truth targets with the Hungarian algorithm, computing a comprehensive loss function to optimize the model parameters, and outputting a final detection result.
- 2. The remote sensing image detection method based on posterior dynamic query and density attention according to claim 1, wherein in step 1 the acquired remote sensing image is preprocessed, the preprocessing comprising image resolution standardization and pixel intensity normalization to ensure a consistent data distribution; comprehensive data augmentation is applied to improve the generalization capability of the model, the augmentation comprising random 90-degree rotation, random horizontal flipping, brightness and contrast adjustment, and Gaussian noise addition; and all processed images and their corresponding target annotations are stored in a standardized format to form the standardized images.
- 3. The remote sensing image detection method based on posterior dynamic query and density attention according to claim 1, wherein in step 2 the baseline model architecture comprises a backbone network, a high-efficiency encoder, and an initial decoder; the backbone network adopts the ResNet-18 architecture to extract multi-scale basic features from the image, and features from different levels are preliminarily fused through a multi-scale fusion network; the encoder adopts Transformer encoder layers to process the deep features output by the backbone network, and positional encoding is introduced to inject spatial position information, enhancing the model's perception of spatial relationships between targets.
- 4. The remote sensing image detection method based on posterior dynamic query and density attention according to claim 1, wherein in step 3 the density-adaptive attention module comprises a density estimation network, a multi-branch feature processing unit, and a density-weighted fusion module: the density estimation network performs density prediction on the input feature map through convolution operations and outputs a target density score for each region; the multi-branch feature processing unit has three parallel branches using 3×3, 5×5, and 7×7 convolutions respectively, and all branches use depthwise separable convolution to reduce the parameter count; in high-density regions the weight of the 7×7 large-kernel branch is increased to enlarge the receptive field and capture target association information, in low-density regions the weight of the 3×3 small-kernel branch is increased to preserve small-target details, and in medium-density regions receptive field and detail features are fused through multi-branch weighting, realizing dynamic allocation of computing resources; the density-weighted fusion module performs weighted fusion of the output features of the branches, dynamically adjusts the weights of the feature information, supplements key features through a residual connection, enhances the adaptability of the model to complex backgrounds and density changes, and produces an enhanced feature map.
- 5. The remote sensing image detection method based on posterior dynamic query and density attention according to claim 4, wherein the enhanced feature map output by the density-adaptive attention module and the shallow and middle-layer features output by the backbone network undergo multi-stage linked fusion using bidirectional top-down and bottom-up fusion logic: first, the enhanced feature map is upsampled by bilinear interpolation and concatenated along the channel dimension with the shallow and middle-layer feature maps; after the channel dimension is adjusted by a 1×1 convolution, complementary information is strengthened by a sparse-attention-guided fusion unit, which outputs multi-scale fused features; the receptive field is then further enlarged by dilated convolution layers with dilation rates of 2, 4, and 6 respectively, which fuse semantic and detail information from different levels in parallel; and finally the multi-scale fused feature map is output.
- 6. The remote sensing image detection method based on posterior dynamic query and density attention according to claim 1, wherein in step 4 global feature encoding is performed on the fused feature map to generate an initial set of query vectors, a confidence score is computed for each query vector through an uncertainty evaluation function, high-confidence query vectors are selected at the 80% confidence level and redundant invalid queries are removed, and the selected query vectors are input into the posterior dynamic query decoder, which comprises a cross-attention module, a dynamic query update mechanism, and an anchor box prediction unit: the cross-attention module performs a first cross-attention computation between the query vectors and the fused feature map, realizes preliminary matching between queries and image features, and outputs a preliminary prediction result; the dynamic query update mechanism invokes the decoder over multiple iterative rounds in the inference stage, computes the feature response weights of the target regions based on the preliminary prediction result, dynamically adjusts the feature distribution of the query vectors, and continuously optimizes the query vectors, thereby breaking through the interaction limit imposed by a fixed number of layers and fully mining the feature details of complex scenes; and finally the target category labels and bounding box coordinates are generated by the anchor box prediction unit.
- 7. The remote sensing image detection method based on posterior dynamic query and density attention according to claim 1, wherein in step 5 the Hungarian algorithm is used to match queries to ground-truth target boxes, the matching criterion combines category consistency and bounding-box intersection over union, and the comprehensive loss function comprises a classification loss, a bounding-box regression loss, and a matching loss, specifically: the classification loss adopts the cross-entropy loss and measures the accuracy of the category prediction: $L_{cls} = -\sum_{i=1}^{N} \log p_i(c_i)$; wherein $p_i(c_i)$ is the probability that the model predicts category $c_i$ for the $i$-th query, and $N$ is the number of queries; the bounding-box regression loss adopts the smooth L1 loss and measures the prediction accuracy of the bounding box: $L_{box} = \sum_{i=1}^{N} \mathrm{SmoothL1}(\hat{b}_i - b_i)$; wherein $\hat{b}_i$ and $b_i$ are respectively the predicted box and ground-truth box coordinates of the $i$-th query; the matching loss $L_{match}$ measures the quality of the matching between queries and target boxes and is computed from $c_j$ and $\hat{c}_j$, respectively the true category and the predicted category of the $j$-th target, and from $b_j$ and $\hat{b}_j$, respectively the ground-truth bounding box and the predicted bounding box; the total loss function combines $L_{cls}$, $L_{box}$, and $L_{match}$; the model parameters are optimized by minimizing the total loss function, and the image detection result, comprising the target category, the bounding-box coordinates, and the confidence score, is finally output.
- 8. The remote sensing image detection method based on posterior dynamic query and density attention according to claim 7, wherein in step 5 post-processing optimization is further performed on the image detection result: duplicate detection boxes are removed with a non-maximum suppression algorithm whose IoU threshold is set to 0.5, resolving the duplicate-labeling problem caused by overlapping targets; a confidence compensation mechanism is introduced, in which small targets with a confidence between 0.3 and 0.5 undergo secondary feature verification, and the detection reliability of low-confidence small targets is improved by matching the corresponding regional features in the feature map; and finally a standardized detection result is output.
- 9. A remote sensing image detection system using the remote sensing image detection method based on posterior dynamic query and density attention as claimed in claim 1, characterized by comprising a backbone network, a high-efficiency hybrid encoder, a cross-level feature fusion module, an uncertainty query selection module, a posterior dynamic query decoder, and a result post-processing module, wherein the modules are seamlessly connected through feature channels.
- 10. A computer readable storage medium containing a computer program which, when executed by a processor, is capable of implementing the method for detecting a remote sensing image based on posterior dynamic query and density attention as set forth in claim 1, processing the remote sensing image and generating a detection result.
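By way of illustration only, the pixel normalization and the four augmentation operations recited in claim 2 might be sketched as follows. The parameter ranges (contrast factor, brightness shift, noise standard deviation), the function names, and the use of NumPy are assumptions made for this sketch and are not specified by the claims.

```python
import numpy as np

def normalize(image):
    """Scale uint8 pixel intensities to float32 values in [0, 1]."""
    return image.astype(np.float32) / 255.0

def augment(image, rng):
    """Apply the four augmentations of claim 2 to an HxWxC image in [0, 1]."""
    # Random 90-degree rotation: k quarter turns, k drawn from {0, 1, 2, 3}.
    k = int(rng.integers(0, 4))
    out = np.rot90(image, k=k, axes=(0, 1))
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Brightness/contrast adjustment: out = alpha * out + beta (assumed ranges).
    alpha = rng.uniform(0.8, 1.2)
    beta = rng.uniform(-0.1, 0.1)
    out = alpha * out + beta
    # Additive Gaussian noise (assumed standard deviation).
    out = out + rng.normal(0.0, 0.02, size=out.shape)
    return np.clip(out, 0.0, 1.0).astype(np.float32)

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # stand-in image
augmented = augment(normalize(raw), rng)
```

In a real pipeline the rotation and flip would also have to be applied to the bounding-box annotations so that labels stay aligned with the image; that step is omitted here.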
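The density-weighted multi-branch fusion of claim 4 can be illustrated with a small NumPy sketch. Several simplifications are assumed here and are not taken from the claims: the learned density estimation network is replaced by a sigmoid of the channel mean, the learned depthwise separable convolutions are replaced by fixed 3×3, 5×5, and 7×7 mean filters, and the mapping from density to branch weights is a softmax over the distance to three assumed density centers (0, 0.5, 1).

```python
import numpy as np

def box_filter(x, k):
    """Depthwise k x k mean filter on a (C, H, W) array with zero padding."""
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, w = x.shape[1:]
    out = np.zeros_like(x)
    for dy in range(k):
        for dx in range(k):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / (k * k)

def branch_weights(density, tau=0.25):
    """Softmax weights for the 3x3, 5x5, 7x7 branches from a density map in [0, 1]."""
    centers = np.array([0.0, 0.5, 1.0]).reshape(3, 1, 1)  # assumed centers
    logits = -np.abs(density[None] - centers) / tau
    logits -= logits.max(axis=0, keepdims=True)
    w = np.exp(logits)
    return w / w.sum(axis=0, keepdims=True)  # shape (3, H, W)

def density_adaptive_attention(x):
    """Fuse three receptive-field branches per location and add a residual."""
    density = 1.0 / (1.0 + np.exp(-x.mean(axis=0)))  # stand-in density estimator
    w = branch_weights(density)
    branches = [box_filter(x, k) for k in (3, 5, 7)]
    fused = sum(w[i][None] * b for i, b in enumerate(branches))
    return fused + x, density, w  # residual connection

x = np.random.default_rng(0).normal(size=(4, 8, 8))
enhanced, density, weights = density_adaptive_attention(x)
```

With this weighting, a location whose density score is near 0 is dominated by the 3×3 branch and a location near 1 by the 7×7 branch, while intermediate scores blend all three, mirroring the low-, medium-, and high-density behavior described in the claim.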
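The query selection and multi-round update of claim 6 might look roughly like the sketch below. It reads the 80% criterion as "keep the 80% of queries with the highest confidence", which is only one possible interpretation, and it uses single-head scaled dot-product attention without learned projections plus a residual update as a stand-in for the decoder; none of these choices is specified by the claims.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def select_queries(queries, confidence, keep_ratio=0.8):
    """Keep the highest-confidence fraction of queries (assumed reading of 80%)."""
    n_keep = max(1, int(round(keep_ratio * len(queries))))
    order = np.argsort(-confidence)[:n_keep]
    return queries[order]

def cross_attention(queries, features):
    """Queries (N, d) attend to flattened image features (M, d)."""
    d = queries.shape[1]
    attn = softmax(queries @ features.T / np.sqrt(d), axis=-1)  # (N, M)
    return attn @ features, attn

def posterior_refine(queries, features, rounds=3):
    """Repeat cross-attention, feeding each round's result back into the queries."""
    attn = None
    for _ in range(rounds):
        context, attn = cross_attention(queries, features)
        queries = queries + context  # residual update (assumed rule)
    return queries, attn

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 16))      # flattened fused feature map
initial_queries = rng.normal(size=(10, 16))
confidence = rng.random(10)                # stand-in uncertainty scores

selected = select_queries(initial_queries, confidence)
refined, attn = posterior_refine(selected, features, rounds=3)
```

The number of rounds is a free parameter at inference time, which is the point of the mechanism: interaction depth is no longer tied to a fixed number of decoder layers.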
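As a hedged illustration of the matching and loss computation in claim 7, the sketch below assigns ground-truth targets to queries and then evaluates a cross-entropy term and a smooth L1 term on the matched pairs. Two substitutions should be noted: an exhaustive search over permutations stands in for the Hungarian algorithm (it returns the same optimal assignment but is only practical for a handful of boxes), and the matching cost is assumed to be the negative class probability minus the IoU. The separate matching loss term and any loss weights are left out because the source text does not give them.

```python
import math
from itertools import permutations

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss summed over the four box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total

def match(pred_probs, pred_boxes, gt_labels, gt_boxes):
    """Return a tuple whose j-th entry is the query index assigned to target j."""
    best, best_cost = None, math.inf
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        # Assumed cost: reward class agreement and box overlap.
        cost = sum(-pred_probs[q][gt_labels[j]] - iou(pred_boxes[q], gt_boxes[j])
                   for j, q in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best

def detection_loss(pred_probs, pred_boxes, gt_labels, gt_boxes):
    """Cross-entropy plus smooth L1 over matched query/target pairs."""
    assignment = match(pred_probs, pred_boxes, gt_labels, gt_boxes)
    cls = -sum(math.log(pred_probs[q][gt_labels[j]]) for j, q in enumerate(assignment))
    box = sum(smooth_l1(pred_boxes[q], gt_boxes[j]) for j, q in enumerate(assignment))
    return cls + box, assignment

pred_probs = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]           # three queries, two classes
pred_boxes = [(10, 10, 20, 20), (0, 0, 10, 10), (30, 30, 40, 40)]
gt_labels = [0, 1]
gt_boxes = [(0, 0, 10, 10), (11, 11, 21, 21)]
loss, assignment = detection_loss(pred_probs, pred_boxes, gt_labels, gt_boxes)
```

Here target 0 is assigned to query 1 and target 1 to query 0; the third query stays unmatched and, in a full DETR-style loss, would be supervised toward a background class, which this sketch omits.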
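The post-processing of claim 8 can be sketched as greedy non-maximum suppression at an IoU threshold of 0.5, followed by flagging small, low-confidence detections for secondary verification. The class-agnostic suppression, the 32×32-pixel area bound used to decide what counts as a small target, and the function names are assumptions of this sketch; the verification step itself (matching regional features in the feature map) is not shown.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score, label) tuples, highest score first."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not overlap any kept box too strongly.
        if all(iou(det[0], k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept

def flag_for_verification(detections, low=0.3, high=0.5, max_area=32 * 32):
    """Select small detections whose confidence lies in [low, high]."""
    flagged = []
    for box, score, label in detections:
        area = (box[2] - box[0]) * (box[3] - box[1])
        if low <= score <= high and area < max_area:  # assumed small-target bound
            flagged.append((box, score, label))
    return flagged

detections = [
    ((0, 0, 10, 10), 0.9, "plane"),
    ((1, 1, 11, 11), 0.8, "plane"),    # overlaps the first box with IoU of about 0.68
    ((50, 50, 60, 60), 0.4, "ship"),   # small and low-confidence
]
kept = nms(detections)
to_verify = flag_for_verification(kept)
```

The second box is suppressed as a duplicate of the first, and the remaining low-confidence box is the only one passed on for secondary verification.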
Description
Remote sensing image detection method based on posterior dynamic query and density attention
Technical Field
The invention relates to the field of remote sensing image target detection, in particular to a remote sensing image detection method based on posterior dynamic query and density attention. The method is suitable for accurately detecting targets in remote sensing images of complex scenes and can be widely applied in fields such as urban planning, military monitoring, agricultural monitoring, and post-disaster assessment.
Background
With the rapid development of remote sensing technology, it has been widely applied in many fields. In important fields such as urban planning, military monitoring, agricultural monitoring, and post-disaster assessment in particular, remote sensing images have become an indispensable tool. In these complex scenes, however, accurately detecting target information in remote sensing images, and especially distinguishing and identifying targets against complex backgrounds, has gradually become key to improving intelligent decision-making and data analysis capabilities.
A complex background refers to image interference in a remote sensing image caused by various external factors, including illumination changes, climatic conditions, natural occlusion, and uneven target density in the image. Illumination changes, shadows, and weather (for example rain, fog, and snow) often greatly reduce image visibility and target recognizability. In environments such as cities, interference from dense buildings and dynamic elements such as traffic often leaves target boundaries unclear or confused with the background, further increasing the difficulty of target detection.
First, non-uniform target density in remote sensing images is a typical problem. In wide-area remote sensing images, the target distribution is often highly non-uniform: some regions may be very dense with targets, while other regions may contain only isolated targets or none at all. Such density differences make it difficult for a detection algorithm to handle targets in different regions equally well. In low-density regions, small targets are easily missed or misjudged; in high-density regions, occlusion or overlap between targets may occur, making detection difficult.
Second, the complex background in remote sensing images further exacerbates the difficulty of target detection. In natural environments, remote sensing images are often disturbed by factors such as cloud cover, building shadows, and vegetation. These background elements easily produce visual features similar to those of the targets, which seriously hampers target feature extraction and reduces detection accuracy. In high-resolution remote sensing images in particular, target boundaries are unclear, and complex background noise and occlusion blur the visual features of the targets further, so that representation capability is insufficient.
In the original RT-DETR model, the interaction between the query vectors and the features is limited to a fixed number of layers and cannot be dynamically adjusted according to the preliminary prediction once it has been output, so feature mining is insufficient in complex scenes and the representation capability of the model is limited. In addition, the AIFI module adopts processing logic with a fixed receptive field and cannot adapt to scenes in which the target density varies sharply: target association information is difficult to capture in high-density regions, and redundant background noise is easily introduced in low-density regions, which degrades detection accuracy and wastes computing resources. Therefore, a remote sensing image detection technology that can adapt to complex scenes and improve detection accuracy and robustness is needed.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide a remote sensing image detection method based on posterior dynamic query and density attention, so as to solve the problems that existing remote sensing image detection has limited representation capability in complex scenes and adapts poorly to uneven target density.
In order to achieve the above purpose, the invention adopts the following technical scheme: in a first aspect, a remote sensing image detection method based on posterior dynamic query and density self-adaptive attention includes: step 1, acquiring a remote sensing image and preprocessing it to obtain a standardized image; step 2, constructing a baseline model framework based on RT-DETR-R18, inputting a standardized image