CN-122023769-A - Aerial small target detection method applied to unmanned aerial vehicle
Abstract
A small aerial target detection method applied to an unmanned aerial vehicle comprises the following steps: first, creating an improved RFA-DETR model comprising a DDCSM module, an HWD-TPA module, a BDFEAN network module and an HLACI module; second, preprocessing an aerial image captured by the unmanned aerial vehicle and passing the processed image to the improved RFA-DETR model; third, extracting dynamic features with the DDCSM module; fourth, performing feature encoding and preliminary fusion; fifth, performing frequency-domain contrast-driven feature enhancement with the HWD-TPA module; sixth, performing spatial-frequency dual-domain feature fusion and enhancement with the BDFEAN network; seventh, performing hierarchical context integration with the HLACI module; and eighth, performing target detection and outputting the final detection result. The invention constructs a high-precision, high-efficiency end-to-end small target detection framework. Applying the model to small targets in unmanned aerial vehicle aerial images significantly improves detection precision and efficiency while keeping the model lightweight, which improves the effectiveness and practical value of the method.
Inventors
- WANG HONGYA
- GUO YOUQUAN
- CUI RUI
- YANG HEHONG
- ZHAO LIJUAN
- YU LIXIA
Assignees
- Huaiyin Institute of Technology (淮阴工学院)
Dates
- Publication Date
- 20260512
- Application Date
- 20260116
Claims (10)
- 1. An aerial small target detection method applied to an unmanned aerial vehicle, characterized by comprising the following steps: step one, creating an improved RFA-DETR model, wherein the model comprises a DDCSM module, an HWD-TPA module, a BDFEAN network module and an HLACI module; step two, preprocessing an aerial image of the unmanned aerial vehicle and passing the processed image to the improved RFA-DETR model; step three, performing dynamic feature extraction with the DDCSM module; step four, performing feature encoding and preliminary fusion; step five, performing frequency-domain contrast-driven feature enhancement with the HWD-TPA module; step six, performing spatial-frequency dual-domain feature fusion and enhancement with the BDFEAN network module; step seven, performing hierarchical context integration with the HLACI module; and step eight, performing target detection and outputting the final detection result.
- 2. The aerial small target detection method applied to an unmanned aerial vehicle according to claim 1, wherein creating the improved RFA-DETR model in step one optimizes the RT-DETR model as follows: in the feature extraction stage, a DDCSM module replaces the ResNet backbone network of the RT-DETR model, and the network's sensitivity to small targets is enhanced through a dynamic receptive field and a multi-path feature interaction mechanism, wherein the DDCSM module combines the CSP (cross-stage partial) design with a dynamic receptive field strategy, so that the spatial range of feature extraction is adjusted adaptively and the parameter count and computational complexity are reduced while the expressive capacity of the model is maintained; in the feature fusion and enhancement stage, three complementary modules are introduced, namely the HWD-TPA module, the BDFEAN network module and the HLACI module: the HWD-TPA module decomposes the features in the frequency domain through a Haar wavelet transform and uses a dual-path attention mechanism, driven by foreground high-frequency information and background low-frequency information respectively, to adaptively enhance the boundaries and internal details of small targets; the BDFEAN module constructs a multi-path, multi-domain feature enhancement framework that enhances features in the spatial domain and the frequency domain simultaneously, improving the feature expression of small targets; and the HLACI module combines a local-global dual attention mechanism with an adaptive feature guidance strategy to enhance the model's ability to detect small targets; the optimized improved RFA-DETR model comprises four core components: (1) a dynamic backbone network based on DDCSM, (2) a contrast-driven feature aggregation module based on HWD-TPA, (3) a spatial-frequency dual-domain attention network based on BDFEAN, and (4) a hierarchical attention fusion module based on HLACI.
- 3. The aerial small target detection method applied to an unmanned aerial vehicle according to claim 1 or 2, wherein the dynamic feature extraction with the DDCSM module in step three operates as follows: channel mapping is performed by a 1×1 convolution that maps the input features to a higher-dimensional representation space; the features are split into two branches, the first branch passing the original feature information through directly and the second branch performing dynamic feature extraction with DIMBlock modules connected in series; the DIMBlock module fuses Transformer and CNN designs and adopts a dual-branch architecture comprising a feature mixing branch and a feature transformation branch, each assisted by a residual connection; given the input features, the workflow of the DIMBlock module is expressed by two equations (not reproduced here), in which the two learnable layer-scale parameters are initialized to small values to keep residual learning stable in the initial training stage; DropPath is a stochastic depth regularization technique that improves model generalization by randomly dropping some paths during training; BN denotes batch normalization; DIM captures multi-scale features by first splitting the features evenly along the channel dimension and then processing each group with a differently configured DynamicInceptionDWConv2d; and ConvolutionalGLU introduces a gating mechanism so that the network adaptively selects and strengthens informative feature channels while suppressing irrelevant noisy features.
- 4. The aerial small target detection method applied to an unmanned aerial vehicle according to claim 3, wherein the DIMBlock module comprises three core components: DIM multi-scale extraction, DIDWConv dynamic receptive field convolution, and the ConvolutionalGLU gating mechanism; the DIDWConv module integrates three forms of depthwise convolution, namely a standard square convolution such as 3×3 and two strip convolutions, 1×11 and 11×1, to capture feature patterns in different directions and at different scales; given the input features, dynamic kernel weights are generated by global average pooling followed by a 1×1 convolution, yielding a tensor W of size 3C×1×1, which is reshaped into three tensors; a Softmax function is then applied to the three tensors to obtain three attention weight coefficients, this workflow being expressed by three equations (not reproduced here); the three forms of depthwise convolution are then applied and fused by weighting, expressed by a further equation (not reproduced here), in which the remaining symbols denote the square convolution kernel size and the strip convolution kernel size, and GAP denotes global average pooling.
- 5. The aerial small target detection method applied to an unmanned aerial vehicle according to claim 1, wherein step three outputs multi-scale features S3, S4 and S5, whose feature maps correspond to different spatial resolutions and semantic levels; and in step four an AIFI module is applied to the highest-level feature S5 for attention-based intra-scale feature interaction, and a CCFF module fuses the three scales S3, S4 and S5 to output the encoded multi-scale features P3, P4 and P5.
- 6. The aerial small target detection method applied to an unmanned aerial vehicle according to claim 1, wherein the HWD-TPA module constructs a cascaded dual-path attention mechanism: the high-frequency path first focuses on target boundaries and texture details to enhance the local features of small targets, and the low-frequency path then focuses on global context and uses structural information to further refine the target representation; the HWD-TPA module guides feature aggregation with frequency-domain information obtained by wavelet decomposition; given the input features, the input is first processed by two 3×3 convolution blocks connected in series, and a Haar wavelet transform is then applied to decompose the features, the HaarWaveletConv module implementing a differentiable version of the two-dimensional Haar wavelet transform for this frequency-domain decomposition (equation not reproduced here), which yields a low-frequency background component and a high-frequency foreground component; the features are rearranged into spatial-channel order and a value projection is generated, after which a sliding-window unfold operation is applied to the value features to produce local receptive fields (three equations not reproduced here), the quantities defined there being the number of attention heads, the kernel size, the number of spatial positions after unfolding, and the sliding-window unfold operation; a two-stage attention mechanism follows, in which the high-frequency foreground attention stage first captures detail features (equation not reproduced here), the quantities defined there being a channel rearrangement operation, average pooling, the Softmax function and a linear projection; feature aggregation is then performed (equation not reproduced here) using a projection operation, a fold operation and the weighted attention; similarly, the low-frequency background attention stage enhances context awareness (three equations not reproduced here), the quantities defined there being a feature unfold operation, a linear transformation of the background features, and two distinct channel rearrangement operations.
- 7. The aerial small target detection method applied to an unmanned aerial vehicle according to claim 1, wherein the BDFEAN network module in step six comprises two fusion paths, a top-down path in the FPN part of the network and a bottom-up path in the PAN part of the network; in the top-down path, the features at every level are channel-aligned by 1×1 convolutions, and MultiScalePCA modules then fuse the features of adjacent levels, each module computing channel attention weights through an adaptive 1D convolution and performing weighted fusion with the upsampled low-resolution features; in the bottom-up path of the PAN part, MultiScalePCA_Down modules perform downsampling feature fusion that preserves important low-level feature information, and an FSA module further enhances the discriminative power of the features, completing a bidirectional feature extraction network.
- 8. The aerial small target detection method applied to an unmanned aerial vehicle according to claim 7, wherein the MultiScalePCA module introduces an adaptive channel attention mechanism so that the network dynamically learns the importance of features at different scales, and uses a one-dimensional convolution to model long-range dependencies among channels, realizing adaptive upsampling feature fusion based on channel attention; the MultiScalePCA_Down module is dedicated to feature fusion in the bottom-up path, receives two feature maps of different scales as input, and performs efficient downsampling feature fusion through channel attention and a downsampling convolution; the FSA module consists of two parallel branches, an adaptive global frequency filter (AGF) and spatial attention (SA), so that the network uses complementary information from the frequency domain and the spatial domain simultaneously, wherein the SA branch emphasizes salient regions from a local spatial perspective and the AGF branch improves the saliency of small targets from a global spectral perspective, so that the feature representation is enhanced in both the spatial domain and the frequency domain.
- 9. The aerial small target detection method applied to an unmanned aerial vehicle according to claim 8, wherein the AGF branch selectively enhances the feature spectrum through a frequency-domain transform and adaptive filtering: it converts the features to the frequency domain, constructs low-pass and high-pass frequency masks, extracts the low-frequency and high-frequency components respectively, applies a learnable complex-valued weight filter to the low-frequency part, and finally returns to the spatial domain through the inverse transform; the SA branch generates a spatial attention map from channel statistics, emphasizing spatial context.
- 10. The aerial small target detection method applied to an unmanned aerial vehicle according to claim 1, wherein the HLACI module in step seven receives two feature maps of different levels and, after feature projection, enhances the features through parallel multi-path processing; the basic feature fusion path combines the two features by a simple and effective additive fusion and then extracts mixed features through a grouped convolution (equation not reproduced here), the grouped convolution having its own parameters and 4 groups; the local-global attention enhancement path applies local attention (receptive field 2) and global attention (receptive field 4) to each projected feature (two equations not reproduced here), the local and global attention functions each being an attention mechanism with its own parameter set, implemented as LocalGlobalAttention; for the input features, the feature map is first partitioned into regions of a given size, regional statistical features are then extracted by computing statistics for each region, and a nonlinear feature transformation then enhances the feature expression through multi-layer perceptrons while generating attention weights that weight the enhanced features (three equations not reproduced here), where the two transformations are multi-layer perceptrons and LayerNorm is the layer normalization operation; next, the feature correlation with a learnable prompt vector is computed to realize adaptive feature screening, using a normalized result and element-wise multiplication (equation not reproduced here); the feature size is then restored by a learnable transformation matrix and an upsampling operation, involving a shape reconstruction operation, bilinear interpolation upsampling and a convolution (two equations not reproduced here); finally, the HLACI module concatenates all enhanced features and fuses them through a series of efficient convolution operations; HLACI thus introduces a dual-path attention mechanism that captures both local fine-grained features (receptive field 2) and global context information (receptive field 4), expressed formally by a fusion equation (not reproduced here) in which a feature fusion operation combines the outputs of the local and global attention functions.
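Claim 6 relies on a two-dimensional Haar wavelet transform to separate low-frequency and high-frequency content. The following is a minimal sketch of the standard single-level 2D Haar decomposition in plain Python, given only to illustrate that separation. The function name `haar_dwt2`, the band names and the orthonormal 1/2 scaling are assumptions of this sketch; the claims do not state how the HaarWaveletConv module scales its outputs or how it combines the three detail bands into the high-frequency foreground component.

```python
from typing import List, Tuple

Grid = List[List[float]]


def haar_dwt2(x: Grid) -> Tuple[Grid, Grid, Grid, Grid]:
    """Single-level 2D Haar transform of a grid with even height and width.

    Returns (low, dx, dy, dxy), each half the size of x in both directions.
    Hypothetical helper for illustration; not the patent's HaarWaveletConv.
    """
    height, width = len(x), len(x[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must both be even")
    low, dx, dy, dxy = [], [], [], []
    for i in range(0, height, 2):
        rows = ([], [], [], [])
        for j in range(0, width, 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + c + d) / 2)  # low-pass: scaled block sum
            rows[1].append((a - b + c - d) / 2)  # left column minus right column
            rows[2].append((a + b - c - d) / 2)  # top row minus bottom row
            rows[3].append((a - b - c + d) / 2)  # diagonal difference
        for band, row in zip((low, dx, dy, dxy), rows):
            band.append(row)
    return low, dx, dy, dxy


# A single 2x2 block: the low band carries the scaled sum,
# the other three bands carry the differences.
low, dx, dy, dxy = haar_dwt2([[1, 2], [3, 4]])
```

With this scaling the transform preserves energy: for the 2×2 block above, the squared inputs and the squared outputs both sum to 30. On a constant region all three detail bands are zero, which is why the low band is commonly read as smooth background and the detail bands as edges and texture.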
Description
Aerial small target detection method applied to unmanned aerial vehicle
Technical Field
The invention relates to the technical field of small target detection, and in particular to an aerial small target detection method applied to an unmanned aerial vehicle.
Background
Traditional small target detection methods for unmanned aerial vehicle aerial images rely mainly on hand-crafted feature extraction techniques such as the scale-invariant feature transform (SIFT) and speeded-up robust features (SURF). With the rapid development of machine learning and deep learning, small target detection in unmanned aerial vehicle aerial images has seen major breakthroughs. Current deep-learning-based target detection algorithms fall mainly into two classes: two-stage detectors and single-stage detectors. Two-stage detectors such as the R-CNN series first generate candidate regions and then perform classification and bounding-box regression on those regions. Such methods generally achieve higher detection accuracy and a lower miss rate, but they are slow and computationally demanding, which makes them unsuitable for real-time detection. Single-stage detectors such as the YOLO series and SSD predict target positions and categories directly; they are fast and computationally light, but their accuracy is relatively low. As the Transformer architecture has been applied to vision, Transformer-based detectors such as the DETR family provide end-to-end solutions that do not require non-maximum suppression (NMS). Despite this progress, small target detection in unmanned aerial vehicle aerial images still faces many challenges, for example: (1) Small targets carry insufficient feature information that is easily lost: as network depth increases and convolutions are applied repeatedly, small targets tend to lose a large amount of key feature information, which makes them difficult to detect and identify in high-level feature maps. (2) Complex background interference and dense small-target scenes: aerial images typically contain a large amount of background information, such as buildings, trees and roads, and this complex, varied background can interfere with the correct detection of small targets. In addition, targets in aerial images are often densely distributed, with severe occlusion and overlap, which increases the detection difficulty.
Disclosure of Invention
To address the technical problem that small targets are difficult to detect in unmanned aerial vehicle aerial photography scenes, this technical scheme provides an aerial small target detection method applied to an unmanned aerial vehicle. An improved RFA-DETR model is created, and the limitations of traditional detectors are effectively addressed through the DDCSM module, the HWD-TPA module, the BDFEAN network module and the HLACI module in the model. The invention is realized by the following technical scheme: an aerial small target detection method applied to an unmanned aerial vehicle comprises the following steps: step one, creating an improved RFA-DETR model, wherein the model comprises a DDCSM module, an HWD-TPA module, a BDFEAN network module and an HLACI module; step two, preprocessing an aerial image of the unmanned aerial vehicle and passing the processed image to the improved RFA-DETR model; step three, performing dynamic feature extraction with the DDCSM module; step four, performing feature encoding and preliminary fusion; step five, performing frequency-domain contrast-driven feature enhancement with the HWD-TPA module; step six, performing spatial-frequency dual-domain feature fusion and enhancement with the BDFEAN network module; step seven, performing hierarchical context integration with the HLACI module; and step eight, performing target detection and outputting the final detection result.
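The eight steps above define a fixed processing order. The following is a minimal sketch of that ordering only, with strings standing in for feature tensors. Every function here is a hypothetical stub introduced for illustration, and the scale names S3 to S5 and P3 to P5 are taken from claim 5; nothing in this sketch reflects the internal computation of the modules.

```python
from typing import Dict

Features = Dict[str, str]  # feature name -> provenance string (stand-in for tensors)


def preprocess(image_id: str) -> Features:
    # Step two: resizing and normalisation would happen here (details assumed).
    return {"image": f"pre({image_id})"}


def ddcsm_backbone(feats: Features) -> Features:
    # Step three: the backbone emits three scales, S3, S4 and S5.
    return {name: f"DDCSM[{name}]({feats['image']})" for name in ("S3", "S4", "S5")}


def encode_and_fuse(feats: Features) -> Features:
    # Step four: AIFI acts on S5 only, then CCFF fuses S3, S4, S5 into P3, P4, P5.
    s5 = f"AIFI({feats['S5']})"
    inputs = ", ".join([feats["S3"], feats["S4"], s5])
    return {name: f"CCFF[{name}]({inputs})" for name in ("P3", "P4", "P5")}


def wrap(stage: str, feats: Features) -> Features:
    # Steps five to seven are modelled only as named wrappers around each scale.
    return {name: f"{stage}({value})" for name, value in feats.items()}


def run(image_id: str) -> Features:
    # Step one (model creation) corresponds to composing the stages in this order.
    feats = encode_and_fuse(ddcsm_backbone(preprocess(image_id)))
    for stage in ("HWD-TPA", "BDFEAN", "HLACI"):
        feats = wrap(stage, feats)
    return feats  # Step eight: a detection head would consume these features.


features = run("img0")
```

Reading any entry of `features` from the outside in gives the stage order in reverse: HLACI wraps BDFEAN, which wraps HWD-TPA, which wraps the CCFF output built from the DDCSM scales.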
Further, creating the improved RFA-DETR model in step one optimizes the RT-DETR model as follows: in the feature extraction stage, a DDCSM module replaces the ResNet backbone network of the RT-DETR model, and the network's sensitivity to small targets is enhanced through a dynamic receptive field and a multi-path feature interaction mechanism, wherein the DDCSM module combines the CSP (cross-stage partial) design with a dynamic receptive field strategy, so that the spatial range of feature extraction is adjusted adaptively and the parameter count and computational complexity are reduced while the expressive capacity of the model is maintained; in the feature fusion and enhancement stage, three complementary modules, namely the HWD-TPA module, the BDFEAN network module and the HLACI module, are introduced; the HWD-TPA module decomposes the features in the frequency domain through a Haar wavelet transform, a dual-path attention mechanism is designed and driven by foreground high-frequ