CN-121982569-A - Remote sensing image rotation target detection method and system based on joint attention mechanism
Abstract
The invention provides a remote sensing image rotated-target detection method and system based on a joint attention mechanism. The method comprises: performing feature extraction on acquired remote sensing image data with a convolutional neural network to obtain a multi-scale feature map; applying semantic-spatial coupling to the multi-scale feature map through the joint attention mechanism to obtain a target orientation feature descriptor; refining the multi-scale feature map to obtain a refined feature map; generating rotated candidate boxes from the refined feature map; performing adaptive sample screening and rotation alignment on the rotated candidate boxes; and predicting on the rotation-aligned features with an equivariant geometric convolution detection head to output the category information and rotated bounding box parameters of each target.
Inventors
- Xia Caifeng
- Gao Hongwei
- Yang Wei
- Liu Bo
Assignees
- Shenyang Ligong University (沈阳理工大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-23
Claims (10)
- 1. A remote sensing image rotated-target detection method based on a joint attention mechanism, characterized by comprising the following steps: acquiring remote sensing image data to be detected; performing feature extraction on the remote sensing image data with a preset convolutional neural network to obtain a multi-scale feature map; applying semantic-spatial coupling to the multi-scale feature map through a joint attention mechanism to obtain a target orientation feature descriptor, and refining the multi-scale feature map based on the target orientation feature descriptor to obtain a refined feature map; generating rotated candidate boxes from the refined feature map, and performing adaptive sample screening on the rotated candidate boxes to obtain screened rotated candidate boxes; performing rotation alignment on the features corresponding to the screened rotated candidate boxes to obtain candidate region features; and predicting on the candidate region features with an equivariant geometric convolution detection head, outputting the category information and rotated bounding box parameters of each target as the rotated-target detection result of the remote sensing image data.
- 2. The method of claim 1, wherein performing feature extraction with a preset convolutional neural network based on the remote sensing image data to obtain a multi-scale feature map comprises: inputting the remote sensing image data into a preset convolutional neural network backbone for feature extraction to obtain multi-level feature maps corresponding to different network levels; dividing the multi-level feature maps by depth to obtain deep feature maps and shallow feature maps; and performing cross-scale feature fusion on the deep and shallow feature maps through a feature pyramid network to generate the multi-scale feature map; wherein a deep feature map has rich semantic information and low spatial resolution, and a shallow feature map has weak semantic information and high spatial resolution.
- 3. The method of claim 1, wherein applying semantic-spatial coupling to the multi-scale feature map through a joint attention mechanism to obtain a target orientation feature descriptor, and refining the multi-scale feature map based on the target orientation feature descriptor to obtain a refined feature map, comprises: performing multi-angle rotation pooling on the multi-scale feature map to obtain feature responses under different rotated views; concatenating the feature responses under the different rotated views to generate the target orientation feature descriptor; performing global average pooling on the multi-scale feature map to obtain an average feature vector; performing global max pooling on the multi-scale feature map to obtain a global feature vector; fusing the target orientation feature descriptor, the average feature vector and the global feature vector to obtain a fused feature vector, and activating the fused feature vector through a multi-layer perceptron to generate a channel attention map; generating a spatial attention map based on the channel attention map; and adaptively adjusting the fusion ratio of the channel attention map and the spatial attention map through a learnable gating factor, and performing weighted fusion of the adjusted channel and spatial attention maps with the multi-scale feature map to output the refined feature map; wherein the target orientation feature descriptor represents the potential global orientation information of targets in the remote sensing image data.
- 4. The method of claim 3, wherein generating a spatial attention map based on the channel attention map comprises: weighting the multi-scale feature map by the channel attention map to obtain a channel-weighted feature map; performing average pooling and max pooling on the channel-weighted feature map along the channel dimension to obtain the corresponding pooled feature maps; spatially upsampling the channel attention map to obtain an upsampled channel attention map; and concatenating the pooled feature maps with the upsampled channel attention map, and generating the spatial attention map from the concatenated feature map through a convolution operation.
- 5. The method of claim 1, wherein generating rotated candidate boxes from the refined feature map comprises: presetting rotated anchor boxes with different scales, aspect ratios and rotation angles at a plurality of spatial positions of the refined feature map; and generating the corresponding rotated candidate boxes from the rotated anchor boxes and the refined feature map; wherein a rotated candidate box comprises one or more of the position information, scale information and rotation angle information of a target.
- 6. The method of claim 1, wherein adaptively screening the rotated candidate boxes to obtain screened rotated candidate boxes comprises: obtaining geometric parameters of each rotated candidate box and its response information on the refined feature map; evaluating the sample validity of the rotated candidate box according to the geometric parameters and the response information; and adaptively screening the rotated candidate boxes according to their sample validity to obtain the screened rotated candidate boxes; wherein the geometric parameters comprise one or more of aspect ratio information, angle information and center coordinate information, and the response information comprises feature response strength and confidence.
- 7. The method of claim 1, wherein performing rotation alignment on the features corresponding to the screened rotated candidate boxes to obtain candidate region features comprises: determining a rotated region of interest according to the position information and rotation angle information of a screened rotated candidate box; extracting region features from the refined feature map based on the rotated region of interest; and performing rotation alignment on the region features to obtain the candidate region features.
- 8. The method of claim 1 or 7, wherein predicting on the candidate region features with an equivariant geometric convolution detection head and outputting the category information and rotated bounding box parameters of each target as the rotated-target detection result of the remote sensing image data comprises: decomposing the rotated region features into a radial component and a tangential component; parameterizing the convolution kernel sampling points of the equivariant geometric convolution detection head based on the position information and rotation angle information of the screened rotated candidate boxes combined with the radial and tangential components, thereby generating a geometric offset and a modulation scalar for the convolution kernel sampling points; convolving the candidate region features according to the geometric offset and modulation scalar of the convolution kernel sampling points to obtain prediction features; and outputting, from the prediction features, the category information and rotated bounding box parameters of each target as the rotated-target detection result of the remote sensing image data.
- 9. A remote sensing image rotated-target detection system based on a joint attention mechanism, comprising: a data acquisition module for acquiring remote sensing image data to be detected; a feature extraction module for performing feature extraction with a preset convolutional neural network based on the remote sensing image data to obtain a multi-scale feature map; a coupling processing module for applying semantic-spatial coupling to the multi-scale feature map through a joint attention mechanism to obtain a target orientation feature descriptor, and refining the multi-scale feature map based on the target orientation feature descriptor to obtain a refined feature map; a sample screening module for generating rotated candidate boxes from the refined feature map and performing adaptive sample screening on them to obtain screened rotated candidate boxes; and a target detection module for performing rotation alignment on the features corresponding to the screened rotated candidate boxes to obtain candidate region features, predicting on the candidate region features with an equivariant geometric convolution detection head, and outputting the category information and rotated bounding box parameters of each target as the rotated-target detection result of the remote sensing image data.
- 10. The system of claim 9, wherein the feature extraction module comprises: a feature layering sub-module for inputting the remote sensing image data into a preset convolutional neural network backbone for feature extraction to obtain multi-level feature maps corresponding to different network levels; a level division sub-module for dividing the multi-level feature maps by depth to obtain deep feature maps and shallow feature maps; and a feature fusion sub-module for performing cross-scale feature fusion on the deep and shallow feature maps through a feature pyramid network to generate the multi-scale feature map; wherein a deep feature map has rich semantic information and low spatial resolution, and a shallow feature map has weak semantic information and high spatial resolution.
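The multi-angle rotation pooling, descriptor fusion and gated attention of claims 3-4 can be illustrated with a minimal NumPy sketch. This is not the patent's implementation: the 45-degree rotated views are approximated here with 90-degree rotations (`np.rot90`), the MLP weights are random stand-ins, and the spatial attention map and gating factor `g` are crude illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def orientation_descriptor(fmap):
    """Multi-angle rotation pooling (claim 3, sketched with 90-degree views):
    rotate the C x H x W feature map, global-max-pool each view, concatenate."""
    views = [np.rot90(fmap, k, axes=(1, 2)) for k in range(4)]
    return np.concatenate([v.max(axis=(1, 2)) for v in views])  # shape (4*C,)

def channel_attention(fmap, w1, w2):
    """Fuse the orientation descriptor with global avg/max pooled vectors,
    pass through a 2-layer MLP, and squash with a sigmoid (claim 3)."""
    d = orientation_descriptor(fmap)        # (4C,)
    avg = fmap.mean(axis=(1, 2))            # (C,) global average pooling
    mx = fmap.max(axis=(1, 2))              # (C,) global max pooling
    fused = np.concatenate([d, avg, mx])    # (6C,) fused feature vector
    h = np.maximum(fused @ w1, 0.0)         # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ w2)))  # (C,) channel attention map

C, H, W = 8, 16, 16
fmap = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((6 * C, 32)) * 0.1  # illustrative MLP weights
w2 = rng.standard_normal((32, C)) * 0.1

ca = channel_attention(fmap, w1, w2)         # per-channel weights in (0, 1)

# Gated fusion of channel- and spatially-attended maps (claim 3); in
# training, g would be a learned, sigmoid-activated scalar.
g = 0.7
sa = np.abs(fmap).mean(axis=0, keepdims=True)  # placeholder spatial attention
sa = sa / sa.max()
refined = g * (ca[:, None, None] * fmap) + (1 - g) * (sa * fmap)
print(refined.shape)  # (8, 16, 16)
```

The descriptor concatenates pooled responses from several rotated views, so a strongly oriented target contributes to the channel weights regardless of which view it dominates.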
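The rotated anchor layout of claim 5 reduces to a Cartesian product of spatial positions, scales, aspect ratios and angles. The sketch below enumerates (cx, cy, w, h, angle) tuples on a feature-map grid; the stride, scale, ratio and angle sets are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np
from itertools import product

def rotated_anchors(feat_h, feat_w, stride=8,
                    scales=(32, 64), ratios=(1.0, 3.0),
                    angles=(0.0, 45.0, 90.0, 135.0)):
    """Preset rotated anchor boxes (cx, cy, w, h, angle_deg) at every
    spatial position of a refined feature map (claim 5)."""
    anchors = []
    for y, x in product(range(feat_h), range(feat_w)):
        cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
        for s, r, a in product(scales, ratios, angles):
            w, h = s * np.sqrt(r), s / np.sqrt(r)  # keep area ~= s * s
            anchors.append((cx, cy, w, h, a))
    return np.array(anchors)

a = rotated_anchors(4, 4)
print(a.shape)  # (4*4 positions * 2 scales * 2 ratios * 4 angles, 5) = (256, 5)
```

A regression head then refines each tuple against the refined feature map to produce the rotated candidate boxes.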
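Claim 6's adaptive screening can be sketched as scoring each candidate from its geometry and feature response, then thresholding the scores adaptively. The validity formula and the mean-plus-std threshold below are illustrative assumptions (an ATSS-like choice); the patent excerpt does not fix either rule.

```python
import numpy as np

def adaptive_screen(conf, aspect_ratio, angle_err_deg):
    """Score rotated candidates from response confidence and geometry
    (claim 6), then keep those above an adaptive mean+std threshold.
    Both the validity formula and the threshold are assumptions."""
    # Tolerate angle error more for elongated boxes, whose IoU is
    # hypersensitive to small angular offsets.
    tol = 1.0 / (1.0 + angle_err_deg / (5.0 * np.sqrt(aspect_ratio)))
    validity = conf * tol
    thr = validity.mean() + validity.std()
    keep = validity >= thr
    return keep, validity

rng = np.random.default_rng(1)
conf = rng.uniform(0.1, 0.9, size=100)   # feature response confidence
ar = rng.uniform(1.0, 8.0, size=100)     # aspect ratios
ang = rng.uniform(0.0, 15.0, size=100)   # angle error vs. assigned GT, degrees
keep, v = adaptive_screen(conf, ar, ang)
print(keep.sum(), "of", keep.size, "candidates kept")
```

The point of scaling the angle tolerance by aspect ratio is exactly the failure mode the background describes: a fixed IoU threshold would discard elongated candidates for angular offsets it happily tolerates on square ones.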
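The rotation alignment of claim 7 amounts to sampling a regular grid inside the rotated region of interest and bilinearly interpolating the feature map, so the extracted patch is always in the box's own frame. The sketch below assumes a single-channel map and a 4x4 output; function names and the sampling layout are illustrative, not the patent's.

```python
import numpy as np

def bilinear(f, y, x):
    """Bilinearly sample 2-D map f at fractional (y, x), zero padding."""
    H, W = f.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < H and 0 <= xx < W:
                out += (1 - abs(y - yy)) * (1 - abs(x - xx)) * f[yy, xx]
    return out

def rotated_roi_align(fmap, cx, cy, w, h, theta_deg, out_size=4):
    """Sample an out_size x out_size grid inside the rotated box and
    interpolate the feature map (claim 7, sketched)."""
    t = np.deg2rad(theta_deg)
    us = (np.arange(out_size) + 0.5) / out_size - 0.5  # in [-0.5, 0.5)
    out = np.empty((out_size, out_size))
    for i, v in enumerate(us):
        for j, u in enumerate(us):
            # Box-frame offsets -> image frame via the box rotation.
            x = cx + u * w * np.cos(t) - v * h * np.sin(t)
            y = cy + u * w * np.sin(t) + v * h * np.cos(t)
            out[i, j] = bilinear(fmap, y, x)
    return out

rng = np.random.default_rng(0)
fmap = rng.standard_normal((32, 32))
patch = rotated_roi_align(fmap, 16.0, 16.0, 12.0, 6.0, 30.0)
print(patch.shape)  # (4, 4)
```

Because the sampling grid rotates with the box, the downstream detection head sees the target in a canonical orientation, which is what makes the shared classifier and regressor angle-agnostic.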
Description
Remote sensing image rotation target detection method and system based on joint attention mechanism
Technical Field
The invention relates to the technical field of computer vision and artificial intelligence, and in particular to a remote sensing image rotated-target detection method and system based on a joint attention mechanism.
Background
With the rapid development of high-resolution Earth observation technology, remote sensing image processing has become a core supporting technology for key tasks such as maritime traffic supervision, urban planning monitoring, disaster emergency assessment, and military reconnaissance. Unlike natural scene images, remote sensing images are usually captured from a top-down view, in which objects of interest (such as ships, vehicles and bridges) exhibit random orientations, large aspect ratios and dense distributions. Rotated-object detection, i.e. accurately locating objects in images and predicting their rotated bounding boxes, has therefore become a research hotspot in computer vision and artificial intelligence.
In existing rotated-object detection techniques, the pose of an object is typically modeled as a geometric parameter in Euclidean space. The angular space, however, is periodic and topologically discontinuous, and most existing deep learning detection frameworks reuse the feature extraction and regression paradigm of horizontal object detection, which suffers from the following problems. In the feature extraction stage, the prior art generally uses standard convolutional neural networks; standard convolution is translation-equivariant but not rotation-equivariant.
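The background's claim that standard convolution is translation- but not rotation-equivariant can be checked numerically: shifting the input shifts the response identically, while rotating the input does not simply rotate the response. The tiny valid-mode convolution below is a hand-rolled stand-in for one CNN layer; the shapes and seed are arbitrary.

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 2-D cross-correlation with 'valid' padding (one CNN layer,
    no deep-learning framework needed)."""
    kh, kw = k.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))  # asymmetric kernel, like a learned one

# Translation equivariance: shifting the input shifts the output.
shifted = np.roll(x, 1, axis=1)
assert np.allclose(conv2d_valid(shifted, k)[:, 1:],
                   conv2d_valid(x, k)[:, :-1])

# No rotation equivariance: conv(rot90(x)) != rot90(conv(x)) in general.
lhs = conv2d_valid(np.rot90(x), k)
rhs = np.rot90(conv2d_valid(x, k))
print(np.allclose(lhs, rhs))  # False for an asymmetric kernel
```

Equality in the last check would require a rotation-symmetric kernel, which learned filters almost never are; this is the feature misalignment the invention targets.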
When a target in an image is rotated by an arbitrary angle, the convolution kernel cannot extract a consistent feature representation, so the feature response fluctuates sharply with the target's angle (feature misalignment), which in turn causes classification ambiguity and reduced localization accuracy. In the feature enhancement stage, prior attention mechanisms generally process channel attention and spatial attention as two independent serial steps. This decoupled design ignores the fact that both the semantic saliency and the spatial distribution of a target are modulated by its orientation, so feature enhancement is weak; in particular, for targets with large aspect ratios, the attention module cannot adapt its attended region to the target's rotation state, the feature variance across viewpoints increases, and the complete features of targets in extreme poses are hard to capture. In the candidate box sampling stage, the prior art mainly relies on an IoU statistical mean or a fixed threshold (e.g. the ATSS strategy) to divide positive and negative samples. For the high-aspect-ratio targets common in remote sensing images, IoU is extremely sensitive to angular deviation: a small angular offset can cause a sharp drop in IoU, so geometrically hard samples carrying key regression information are wrongly filtered out or under-weighted, the gradient direction becomes noisy during training, and the model converges poorly.
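The claim that IoU collapses under small angular offsets for elongated boxes can also be checked numerically. The sketch below rasterizes rotated rectangles on a fine grid (a crude but self-contained stand-in for exact rotated IoU) and compares a 1:1 box against a 10:1 box under the same 5-degree offset; grid resolution and box shapes are illustrative.

```python
import numpy as np

def rect_mask(cx, cy, w, h, theta_deg, grid=400, extent=20.0):
    """Boolean occupancy mask of a rotated rectangle on a grid covering
    [-extent, extent]^2."""
    t = np.deg2rad(theta_deg)
    xs = np.linspace(-extent, extent, grid)
    X, Y = np.meshgrid(xs, xs)
    # Rotate grid points into the box frame, then test the half-widths.
    u = (X - cx) * np.cos(t) + (Y - cy) * np.sin(t)
    v = -(X - cx) * np.sin(t) + (Y - cy) * np.cos(t)
    return (np.abs(u) <= w / 2) & (np.abs(v) <= h / 2)

def riou(b1, b2):
    """Rasterized IoU of two (cx, cy, w, h, angle_deg) rotated boxes."""
    m1, m2 = rect_mask(*b1), rect_mask(*b2)
    return (m1 & m2).sum() / (m1 | m2).sum()

# Identical 5-degree angular error, two aspect ratios.
iou_squat = riou((0, 0, 8, 8, 0), (0, 0, 8, 8, 5))    # 1:1 box
iou_long = riou((0, 0, 30, 3, 0), (0, 0, 30, 3, 5))   # 10:1 box
print(f"1:1 box IoU: {iou_squat:.2f}, 10:1 box IoU: {iou_long:.2f}")
```

The square box keeps a high IoU while the elongated box loses a large fraction of its overlap from the same angular error, which is why fixed IoU thresholds misjudge elongated candidates during sample assignment.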
In summary, existing rotated-target detection techniques struggle to adapt to the complex characteristics of rotated targets in remote sensing images; their detection accuracy and robustness are insufficient and cannot meet the requirements of practical application scenarios.
Disclosure of Invention
To solve these problems, the invention provides a remote sensing image rotated-target detection method based on a joint attention mechanism, comprising the following steps: acquiring remote sensing image data to be detected; performing feature extraction on the remote sensing image data with a preset convolutional neural network to obtain a multi-scale feature map; applying semantic-spatial coupling to the multi-scale feature map through a joint attention mechanism to obtain a target orientation feature descriptor, and refining the multi-scale feature map based on the target orientation feature descriptor to obtain a refined feature map; generating rotated candidate boxes from the refined feature map, and performing adaptive sample screening on the rotated candidate