CN-122023770-A - Method for detecting rotated targets along a principal direction using deformable quadrilateral attention


Abstract

The invention discloses a method for detecting rotated targets along a principal direction using deformable quadrilateral attention, and belongs to the technical field of computer vision. The method comprises the steps of: 1) loading a rotated-target dataset, preprocessing the images, and converting them into the model training format; 2) in the detection head, mapping a fixed window into a quadrilateral adapted to the geometry of the target via a dynamically predicted transformation matrix in a deformable quadrilateral attention module; 3) integrating a principal component analysis module into the detection head, computing the principal direction and the rotated bounding box from the vertex coordinates of the target quadrilateral; and 4) feeding the test set into the trained model and outputting the bounding box (including angle), category, and confidence score of each rotated target. Compared with existing methods, the method improves mAP (mean average precision) by 3.2%-5.6% on the DOTA-v1.0 and HRSC2016 datasets respectively, markedly enhances the robustness of detecting irregular, multi-pose rotated targets (such as ships and airplanes), and is suited to remote sensing and aerial target detection scenarios.

Inventors

  • KUANG LIDAN
  • XIE JUNWU

Assignees

  • Changsha University of Science and Technology (长沙理工大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-23

Claims (3)

  1. A method for detecting rotated targets along a principal direction using deformable quadrilateral attention, comprising the steps of: first, acquiring a remote sensing image dataset and preprocessing the input data, including image normalization, rotation augmentation, random cropping, and label-box conversion, converting the original target labels into a quadrilateral format to suit the rotated-target detection task; in the preprocessing step, data augmentation comprises horizontal flipping, vertical flipping, and random rotation to improve the generalization ability of the model; third, inputting the feature map output by the backbone network into the detection head's deformable quadrilateral attention module, which predicts transformation parameter vectors to generate a composite projective transformation matrix, maps a fixed rectangular window into a target quadrilateral window, performs grid sampling on the feature map via a sampling function, and computes attention weights to aggregate quadrilateral features; inputting the quadrilateral features output by the deformable quadrilateral attention module into a principal-direction analysis branch, which computes the principal direction angle of the target via principal component analysis and generates rotated-box parameters, including center-point coordinates, width, height, and angle; inputting the rotated-box features into a detection head network and outputting the target category and the bounding-box prediction, wherein the detection head network has a dual-branch structure and the first branch performs target classification, outputting the class probability of each candidate region; adopting the AdamW optimizer for backpropagation and parameter updates, repeating training until the iteration count reaches a preset maximum, yielding the optimal rotated-target detection model; and eighth, inputting test images into the trained model and outputting detection results containing rotated boxes.
  2. The method according to claim 1, wherein in the detection head of the deformable quadrilateral attention module, the input feature map X ∈ R^(H×W×C) (H and W being the height and width of the feature map, C the number of channels) is passed through global average pooling, a LeakyReLU activation, and a 1×1 convolution network to predict a 9-dimensional transformation parameter vector θ = Conv(LeakyReLU(AvgPool(X))), where Conv denotes the 1×1 convolution operation and LeakyReLU is the activation function applied after the average pooling; the feature map X is divided into a plurality of basic windows of size w×w; from θ, a scaling matrix, a shear matrix, a rotation matrix, a translation matrix, and a perspective matrix are computed, and the composite transformation matrix T is finally obtained by multiplying the basic transformation matrices; given the transformation matrix T, the final coordinates of the target quadrilateral are computed by the standard projective procedure, i.e., a homogeneous transformation followed by perspective division, the homogeneous coordinates serving as intermediate values.
  3. The method according to claim 1, wherein the principal-direction analysis module takes the four vertex coordinates of the obtained quadrilateral, first expresses them as a matrix, and centers the data by subtracting the mean of the vertex coordinates along the x and y axes; the covariance matrix of the centered data is computed, and eigenvalue decomposition of the covariance matrix yields the eigenvector corresponding to the largest eigenvalue, whose components determine the principal direction angle; the width w of the rotated bounding box along the principal direction and its height h perpendicular to the principal direction are computed by projecting the four vertex coordinates onto the two principal axes, w and h being the range (maximum minus minimum) of the projection coordinates along each axis; based on the center-point coordinates, w, h, and the principal direction angle, the precise rotated-bounding-box parameters are generated, the displacements of the bounding-box corners from the center being determined accordingly.
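The principal-direction computation described in claim 3 can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the patent's code: the function name and the use of `arctan2` and `eigh` are assumptions, while the steps (centering, covariance, eigendecomposition, projection ranges) follow the claim.

```python
# Illustrative sketch of claim 3: rotated bounding box from the four
# vertices of a quadrilateral via principal component analysis.
# Function and variable names are assumptions for illustration.
import numpy as np

def rotated_box_from_quad(vertices):
    """Return (cx, cy, w, h, angle) estimated by PCA over the vertices."""
    P = np.asarray(vertices, dtype=float)      # shape (4, 2)
    center = P.mean(axis=0)                    # quadrilateral centroid
    Q = P - center                             # center the data
    C = Q.T @ Q / Q.shape[0]                   # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues ascending
    v = eigvecs[:, -1]                         # eigenvector of largest eigenvalue
    angle = np.arctan2(v[1], v[0])             # principal direction angle
    # Project centered vertices onto the principal axis and its normal;
    # each box extent is the range (max - min) of the projections.
    u = np.array([np.cos(angle), np.sin(angle)])
    n = np.array([-np.sin(angle), np.cos(angle)])
    proj_u, proj_n = Q @ u, Q @ n
    w = proj_u.max() - proj_u.min()
    h = proj_n.max() - proj_n.min()
    return center[0], center[1], w, h, angle
```

For an axis-aligned rectangle the sketch recovers the rectangle's own center, width, and height, with the angle aligned to the long side (up to a 180° ambiguity inherent to eigenvectors).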

Description

Method for detecting rotated targets along a principal direction using deformable quadrilateral attention

Technical Field

The invention relates to the technical field of computer vision, in particular to a rotated-target detection method based on deformable quadrilateral attention and principal component analysis, suitable for detecting rotated targets in scenes such as remote sensing images and aerial images.

Background

Targets in remote sensing images (e.g., aircraft, ships) typically appear in a variety of orientations and shapes, and conventional horizontal bounding boxes can hardly meet accurate localization requirements, so rotated bounding boxes must be introduced. Most existing rotated-target detection methods add angle information to a horizontal box but cannot fully exploit the geometric characteristics of the rotated box, so detection accuracy for irregular targets is insufficient. For example, R-CNN based frameworks rely on complex region proposal networks, while Transformer-based approaches use a fixed-window attention mechanism, limiting the model's adaptability to target shape and orientation. Furthermore, complex backbone networks may introduce noisy features that degrade detection performance. Therefore, a method that dynamically adapts to the geometric characteristics of the target and improves detection accuracy is needed.

Disclosure of Invention

The invention aims to overcome the shortcomings of prior object detection techniques and provide a rotated-target detection method based on Deformable Quadrilateral Attention (DQA) and Principal Component Analysis (PCA).
According to the method, the attention region is dynamically adjusted by the deformable quadrilateral attention module, angle prediction is optimized in combination with the principal component analysis module, and the accuracy and robustness of rotated-target detection are markedly improved. The flow of the method is shown in fig. 1, and the specific implementation steps are as follows:

First, loading the rotated-target detection data from the corresponding database.

Second, preprocessing the images and converting the input images into the model training format. Preprocessing comprises (2.1) image scaling: scaling the image to the 1024×1024 model input size; (2.2) normalization: normalizing pixel values to the range [0, 1]; (2.3) channel conversion: converting the image from BGR to RGB format; and (2.4) label-box conversion: converting the original target labels into a quadrilateral format to suit the rotated-target detection task.

Third, inputting the preprocessed image data into the feature extraction network, wherein (3.1) the feature extraction network takes an image of size 1024×1024×3; after the first convolution layer (kernel size 3, stride 2, padding 1) it outputs a feature map of size 512×512 with 64 channels, which is passed to the second convolution layer; (3.2) after the second convolution layer, a feature map of size 256×256 with 128 channels is output and passed to the third convolution layer; (3.3) after the third convolution layer, a feature map of size 128×128 with 256 channels is output and passed to a region proposal network (RPN) to generate candidate regions; and (3.4) the region proposal network outputs multiple candidate regions, from which fixed-size region features are extracted by the RoI Align layer, yielding feature maps of size 7×7 with 256 channels. These feature maps are input directly to the deformable quadrilateral attention module.

Fourth, the feature maps output by the feature extraction network are input into the deformable quadrilateral attention module (DQA), as shown in fig. 2: (4.1) window partitioning and feature extraction: the input feature map X ∈ R^(H×W×C) (H and W being the height and width of the feature map, C the number of channels) is divided into basic windows of size w×w, and query Q, key K, and value V features are extracted from each window; (4.2) transformation parameter prediction: from the feature map, a 9-dimensional transformation parameter vector is predicted by global average pooling, a LeakyReLU activation function, and a 1×1 convolutional network; (4.3) composite projective transformation matrix generation: from the nine learnable parameters generated by the above equation, the following five basic transformation matrices are obtained by equation (3), wherein each basic transformation
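Step (4.3) can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the parameter names (sx, sy, hx, hy, theta, tx, ty, px, py), their split of the nine parameters into scaling (2), shear (2), rotation (1), translation (2), and perspective (2), and the composition order of the five matrices are all assumptions; the description cut off before specifying them.

```python
# Illustrative sketch (not the patent's code) of composing five basic
# 3x3 homogeneous transforms from a 9-dimensional parameter vector and
# mapping the corners of a fixed window to a target quadrilateral.
# Parameter names and composition order are assumptions.
import numpy as np

def quad_from_params(params, corners):
    """Map window corners (N, 2) to a quadrilateral via a composite
    projective transform built from 9 predicted parameters."""
    sx, sy, hx, hy, theta, tx, ty, px, py = params
    S = np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]])       # scaling
    H = np.array([[1, hx, 0], [hy, 1, 0], [0, 0, 1]])       # shear
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])        # rotation
    Tr = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]])      # translation
    P = np.array([[1, 0, 0], [0, 1, 0], [px, py, 1]])       # perspective
    T = P @ Tr @ R @ H @ S                                  # composite matrix
    pts = np.hstack([corners, np.ones((len(corners), 1))])  # homogeneous coords
    out = pts @ T.T                                         # apply transform
    return out[:, :2] / out[:, 2:3]                         # perspective divide
```

With identity parameters (unit scales, all other parameters zero) the corners are returned unchanged, which is a convenient sanity check for any chosen composition order.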