CN-122024105-A - Light-weight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction

CN 122024105 A

Abstract

The invention relates to a lightweight infrared weak unmanned aerial vehicle (UAV) target detection method based on multi-domain feature interaction, and belongs to the technical field of computer vision and intelligent monitoring. The method first constructs an infrared weak UAV target detection data set. A central directional difference convolution operator in a target saliency contrast enhancement module then extracts the dual gradient and intensity characteristics of the infrared target to generate a saliency-enhanced feature map. Stacked lightweight phantom attention blocks in the backbone feature extraction network perform step-by-step downsampling and feature abstraction on the saliency feature map to obtain a feature pyramid. A multi-domain feature interaction aggregation module then performs cross-level fusion of the features in the pyramid, and the aggregated feature map is fed into a decoupled detection head, which outputs the final detection result. The invention effectively addresses the missing texture of weak small targets, strong background interference, and the heavy computational load of detection models, and markedly improves inference speed while maintaining high detection accuracy.
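The central directional difference convolution at the heart of the saliency-enhancement step can be illustrated with a minimal NumPy sketch. This is a hypothetical reconstruction, not the patent's implementation: the 3×3 neighborhood, the blend weight `theta`, and the function name are assumptions for illustration only.

```python
import numpy as np

def central_difference_conv(x, w, theta=0.7):
    """Blend of a plain convolution (intensity term) and a central-difference
    convolution (gradient term), sketching the patent's central directional
    difference operator. x: 2-D image, w: 3x3 kernel, theta in [0, 1]."""
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            patch = x[i:i + 3, j:j + 3]
            vanilla = np.sum(w * patch)               # intensity response
            diff = np.sum(w * (patch - patch[1, 1]))  # gradient response
            out[i, j] = theta * diff + (1 - theta) * vanilla
    return out
```

With `theta = 1` the operator responds only to local gradients (target edges and texture); with `theta = 0` it reduces to a plain convolution over raw intensity. The patent's module adaptively blends the two.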

Inventors

  • TANG LUN
  • LU BAOSHENG
  • CHEN QIANBIN

Assignees

  • Chongqing University of Posts and Telecommunications (重庆邮电大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-05

Claims (10)

  1. A lightweight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction, characterized by comprising the following steps: S1, establishing an infrared UAV detection data set, preprocessing the data with a virtual-real combination strategy, and performing cluster analysis on the ground-truth bounding boxes in the data set with the K-means clustering algorithm to generate a prior anchor box set adapted to the scale distribution of weak small infrared targets; S2, constructing a target saliency contrast enhancement module that computes pixel gradient changes and intensity information in the local neighborhood of the image through a central directional difference convolution operator, generating a saliency feature map containing the edge texture and contrast information of the infrared target; S3, constructing a backbone feature extraction network based on lightweight phantom attention blocks, and performing step-by-step downsampling and feature abstraction on the saliency feature map through the stacked lightweight phantom attention blocks to obtain a feature pyramid; S4, constructing a multi-domain feature interaction aggregation module, and performing cross-level fusion of the features in the feature pyramid based on a joint space-channel calibration mechanism to generate an aggregated feature map; S5, feeding the aggregated feature map into a decoupled detection head, predicting the class confidence of the UAV target through the classification branch and the bounding-box offsets through the regression branch, and outputting the final detection result.
  2. The lightweight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction according to claim 1, wherein constructing the data set in step S1 comprises: virtual-real combined data enhancement, in which a simulation platform synthesizes infrared thermal-imaging video streams under different weather conditions and against different backgrounds, infrared data are collected in real scenes, and the two sources together form the infrared data set; and K-means anchor box clustering, in which the width and height of every ground-truth annotation box are extracted, a distance metric over box width and height is defined, and the K-means clustering algorithm is run with the set of all ground-truth box width-height pairs as input, outputting k cluster centres, i.e. k prior anchor width-height sizes adapted to the small-target scale distribution.
  3. The lightweight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction according to claim 2, wherein in step S2 the target saliency contrast enhancement module first applies a central directional difference convolution, whose output at position $p_0$ of the feature map is $y_{cd}(p_0)=\sum_{p_n\in\mathcal{R}} w(p_n)\,\big(x(p_0+p_n)-x(p_0)\big)$, where $p_n$ denotes the relative position index within the local neighborhood, $\mathcal{R}$ is the local neighborhood coordinate set, $w(p_n)$ denotes the weight of the differential convolution kernel at position $p_n$, $x(p_0)$ denotes the pixel value of the input feature map at the centre position $p_0$, and $x(p_0+p_n)$ denotes the pixel value of the input feature map at neighborhood position $p_0+p_n$; a hyper-parameter $\theta\in[0,1]$ is then set to adaptively balance the retained background context information against the fine gradient information of the target, giving the final output $y(p_0)=\theta\,y_{cd}(p_0)+(1-\theta)\,y_{conv}(p_0)$, where $y_{conv}(p_0)$ denotes the output of a standard convolution at position $p_0$; the larger $\theta$ is, the more the operator attends to edges and texture, and the smaller $\theta$ is, the more it attends to overall brightness.
  4. The lightweight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction, wherein in step S3 the lightweight phantom attention blocks stacked in the backbone feature extraction network generate phantom feature maps by linear transformation, embed a channel attention mechanism to dynamically adjust the feature weights, and output a multi-scale feature pyramid set; each lightweight phantom attention block comprises a phantom feature generation unit and an ECA attention unit; the phantom feature generation unit first generates a set of intrinsic feature maps from the input features, then generates phantom features by linear transformation and concatenates the intrinsic and phantom features to obtain the complete features; the ECA attention unit first computes an adaptive convolution kernel size, then performs a one-dimensional convolution to capture cross-channel interaction information, generates the weight information, and applies the weights to the feature map to obtain the final feature pyramid.
  5. The lightweight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction according to claim 4, wherein the phantom feature generation unit of step S3 performs the following steps: let the number of input channels be $c_{in}$ and the number of output channels be $c_{out}$; first, a standard convolution generates $m$ intrinsic feature maps $Y'$ via $Y' = X * f + b$, where $Y'$ is the intrinsic feature map, $f$ denotes the convolution kernel weights, $b$ is the bias term, and $m = c_{out}/s$; a linear transformation is then applied to $Y'$ to obtain the remaining phantom feature maps $y_{ij}=\Phi_{i,j}(y'_i)$, $i=1,\dots,m$, $j=1,\dots,s-1$, where $s$ is the compression ratio and $\Phi_{i,j}$ denotes the linear transformation generating the $j$-th phantom feature map from the $i$-th intrinsic map $y'_i$; finally, the intrinsic features $Y'$ and the phantom features are concatenated along the channel dimension to obtain the complete features $Y$.
  6. The lightweight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction according to claim 4, wherein the ECA attention unit of step S3 executes as follows: first, global average pooling is applied to $Y$ to obtain the channel descriptor $z$; the adaptive convolution kernel size $k$ is computed with the mapping function $k=\psi(C)=\big|\tfrac{\log_2 C}{\gamma}+\tfrac{b}{\gamma}\big|_{odd}$, where $\gamma$ is a hyper-parameter controlling the proportional relationship between the channel dimension $C$ and the kernel size, $b$ is an offset constant, and $|\cdot|_{odd}$ denotes taking the nearest odd number; a one-dimensional convolution of size $k$ is performed to capture cross-channel interaction information and the weights $\omega$ are generated through a Sigmoid activation, $\omega=\sigma(\mathrm{Conv1D}_k(z))$; finally, the weights are applied to the feature map to obtain the output $\tilde{Y}=\omega\otimes Y$.
  7. The lightweight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction according to claim 4, wherein in step S4 the multi-domain feature interaction aggregation module introduces a joint space-channel calibration mechanism that generates an attention mask from the spatial detail information of shallow features and uses it to guide the semantic reconstruction of deep features into an aggregated feature map; the feature pyramid output by the backbone feature extraction network is taken as a set of features of different resolutions, and a bottom-up spatial-attention guidance path is adopted; for adjacent low-level features $F_l$ and high-level features $F_h$, the fusion process is defined as follows: first, the spatial attention mask of the low-level features is computed as $M_l=\sigma(\mathrm{Conv}(F_l))$; the high-level features are then upsampled and fused with the weighted low-level features, $F_{agg}=\mathrm{Up}(F_h)+\mathrm{Conv}_{1\times1}(M_l\odot F_l)$, where $\mathrm{Conv}_{1\times1}$ denotes a dimension-reduction convolution, $\odot$ denotes element-wise multiplication, and $M_l$ is used to suppress the background noise region in $F_l$.
  8. The lightweight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction according to claim 7, wherein in step S5 a decoupled head structure is adopted, i.e. separate convolution branches are used for the classification and regression tasks, and the total loss function of the training phase is defined as $L_{total}=\lambda_1 L_{box}+\lambda_2 L_{obj}+\lambda_3 L_{cls}$, where $L_{obj}$ is the confidence loss, $L_{cls}$ is the classification loss, $L_{box}$ is the bounding-box regression loss, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are the balance weight coefficients of the bounding-box regression loss, the confidence loss, and the classification loss, respectively.
  9. The lightweight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction according to claim 8, wherein in step S5 the bounding-box regression loss function is expressed as $L_{box}=1-\mathrm{IoU}+\frac{\rho^2(b,\,b^{gt})}{c^2}+\alpha v$, where $\mathrm{IoU}$ is the intersection-over-union of the predicted and ground-truth boxes, $b$ and $b^{gt}$ are the centre points of the predicted and ground-truth boxes respectively, $\rho(\cdot)$ is the Euclidean distance, $c$ is the diagonal length of the smallest enclosing rectangle containing the two boxes, $w$ and $h$ are the box width and height, $\alpha$ is a weight parameter, and $v=\frac{4}{\pi^2}\big(\arctan\frac{w^{gt}}{h^{gt}}-\arctan\frac{w}{h}\big)^2$ measures aspect-ratio consistency.
  10. The lightweight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction according to claim 9, wherein in step S5 a virtual-real combined transfer-learning strategy is adopted: a virtual simulation engine generates a synthetic infrared UAV data set covering different poses and backgrounds for pre-training to obtain the model initialization weights, after which fine-tuning training is performed with the infrared data set collected in real scenes.
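The bounding-box regression loss of claim 9 matches the standard CIoU formulation. A self-contained sketch follows; the `[x1, y1, x2, y2]` box format and the small epsilon in the denominator of alpha are assumptions made for illustration, not details taken from the patent.

```python
import numpy as np

def ciou_loss(box_p, box_g):
    """CIoU bounding-box regression loss (standard formulation).
    box_p, box_g: predicted and ground-truth boxes as [x1, y1, x2, y2]."""
    # intersection over union
    x1 = max(box_p[0], box_g[0]); y1 = max(box_p[1], box_g[1])
    x2 = min(box_p[2], box_g[2]); y2 = min(box_p[3], box_g[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter)
    # squared centre distance over squared enclosing-box diagonal
    cp = ((box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2)
    cg = ((box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2)
    rho2 = (cp[0] - cg[0]) ** 2 + (cp[1] - cg[1]) ** 2
    ex1 = min(box_p[0], box_g[0]); ey1 = min(box_p[1], box_g[1])
    ex2 = max(box_p[2], box_g[2]); ey2 = max(box_p[3], box_g[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # aspect-ratio consistency term v and its weight alpha
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    v = (4 / np.pi ** 2) * (np.arctan(wg / hg) - np.arctan(wp / hp)) ** 2
    alpha = v / ((1 - iou) + v + 1e-9)  # epsilon for numerical stability
    return 1 - iou + rho2 / c2 + alpha * v
```

For identical boxes the loss is zero (IoU = 1, zero centre distance, matching aspect ratio); for disjoint boxes it exceeds 1, so the gradient still pulls the predicted box toward the target even without overlap.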

Description

Light-weight infrared weak unmanned aerial vehicle target detection method based on multi-domain feature interaction

Technical Field

The invention belongs to the technical field of computer vision and intelligent monitoring, and relates to a lightweight infrared weak UAV target detection method based on multi-domain feature interaction. It is particularly suited to real-time feature extraction and accurate identification of long-distance, low-speed, small UAV targets against complex backgrounds (such as cloud cover, trees, and building edges) and in low signal-to-noise environments.

Background

With the explosive growth of the unmanned aerial vehicle (UAV) industry, UAVs are increasingly common in aerial photography, logistics, agricultural plant protection, and other fields. However, "black-flying" (unauthorized) drones pose serious challenges to airspace security, privacy protection, and critical infrastructure. In anti-UAV systems, infrared thermal imaging is one of the core means of detecting small low-altitude targets, owing to its all-weather operation, resistance to smoke interference, strong concealment, and long working range. Although infrared detection has unique advantages, infrared weak UAV target detection still faces the following significant technical bottlenecks in practice. First, the target features are extremely weak: limited by the imaging mechanism of infrared sensors and the demands of long-range detection, a UAV target usually occupies only tens of pixels (sometimes fewer than 10×10) in an infrared image, lacks color, texture, and geometric information, appears as a mere "spot", and is easily submerged in sensor noise. Second, background interference is complex.
In urban or field environments, moving cloud edges, building outlines, tree shadows, and high-brightness heat sources (e.g., chimneys) tend to form high-frequency signals in the image, so conventional threshold-based or simple morphological detection methods produce extremely high false-alarm rates. Third, model computation conflicts with real-time requirements: existing high-precision deep-learning detectors (such as Faster R-CNN and the standard YOLO series) generally rely on large parameter counts and heavy computational overhead to guarantee feature-extraction capability, and are difficult to deploy on the embedded edge-computing devices (such as the NVIDIA Jetson series and FPGAs) at the front end of an anti-UAV system; conversely, traditional lightweight networks (e.g., MobileNet-SSD) tend to trade feature-extraction depth for speed, causing a significant rise in the miss rate for weak small targets. Fourth, the feature-fusion mechanism is imperfect: existing feature pyramid network (FPN) structures typically fuse deep semantic features with shallow detail features by simple addition or concatenation, but because the information of small infrared targets is severely diluted in deep networks, such simple fusion cannot effectively use shallow high-resolution spatial information to calibrate deep semantic features, and localization accuracy suffers. Among prior approaches to these problems, some schemes enhance image contrast through super-resolution to improve target recognition but add inference latency that degrades real-time detection, while others strengthen target feature expression through elaborately designed attention mechanisms but cannot simultaneously satisfy the requirement for a lightweight model, hindering deployment on embedded devices.
A detection method is therefore needed that mechanically enhances the contrast of weak small infrared targets while guaranteeing real-time performance through a lightweight design.

Disclosure of Invention

In view of the above, the invention aims to provide a lightweight infrared weak UAV target detection method based on multi-domain feature interaction. A target saliency contrast enhancement module (TSCM) is built by integrating a physical model of local image-contrast enhancement into the front-end design of a convolutional neural network; a lightweight phantom attention block (LGAB), combining the ideas of the Ghost module and the ECA attention mechanism, is designed as the backbone; and a spatially guided multi-domain feature interaction aggregation module (MFIAM) is provided. The method greatly reduces computational complexity while significantly improving detection robustness for weak small infrared targets. In order to achieve the above purpose, the present invention provides the following technical solution: a light infrared weak unmanned aerial vehicle target