
CN-122024102-A - Multi-spectrum iteration enhancement-based method for detecting aerial photographing target of unmanned aerial vehicle in severe environment

CN 122024102 A

Abstract

The invention discloses a method, based on multispectral iterative enhancement, for detecting unmanned aerial vehicle (UAV) aerial-photography targets in severe environments, and belongs to the technical field of computer vision. The method comprises: acquiring RGB images and infrared images under severe conditions such as rain, fog, and night, and constructing a data set; constructing a UAV aerial-photography target detection model that achieves efficient deep fusion of visible light and infrared multispectral features by introducing a novel iterative hierarchical attention and differential enhancement fusion framework, while redesigning the tail enhancement unit of the HGBlock module in the backbone network; training the UAV aerial-photography target detection model with the data set; and finally detecting UAV aerial-photography targets and evaluating model performance with the trained detection model. Through a fusion scheme combining interactive alignment, differential enhancement, and iterative feedback, the method improves both the target detection performance and the real-time capability of the UAV under complex weather and illumination conditions.

Inventors

  • LIU JIAXING
  • JIANG MINGXIN

Assignees

  • 淮阴工学院 (Huaiyin Institute of Technology)

Dates

Publication Date
2026-05-12
Application Date
2026-02-03

Claims (10)

  1. A method for detecting unmanned aerial vehicle aerial-photography targets in a severe environment based on multispectral iterative enhancement, characterized by comprising the following steps: S1, collecting visible light images and infrared images in a severe environment, and constructing a data set; S2, constructing an unmanned aerial vehicle aerial-photography target detection model, wherein the model comprises a dual-branch RT-DETR backbone network, a neck, a detection head and a decoder; the dual-branch RT-DETR backbone network comprises a visible light branch and an infrared branch and is used for processing the visible light images and the infrared images; the model further comprises an iterative hierarchical attention and differential enhancement fusion framework, which comprises a hierarchical interactive attention fusion module and a differential feature enhancement module and optimizes target feature extraction in combination with an iterative optimization feedback mechanism, so as to gradually suppress redundant information; S3, training the unmanned aerial vehicle aerial-photography target detection model obtained in step S2 by using the data set obtained in step S1; and S4, detecting unmanned aerial vehicle aerial-photography targets by using the model trained in step S3, and evaluating the performance of the model.
  2. The method for detecting unmanned aerial vehicle aerial-photography targets in a severe environment based on multispectral iterative enhancement according to claim 1, characterized in that the specific processing procedure of the dual-branch RT-DETR backbone network comprises: first, performing preliminary feature extraction on the input image through an HGStem module, and further deepening the feature representation through a plurality of cascaded HGBlock modules; then, receiving the multi-scale features from the backbone network through the iterative hierarchical attention and differential enhancement fusion framework and subjecting them to cross-modal and cross-level deep fusion.
  3. The method for detecting unmanned aerial vehicle aerial-photography targets in a severe environment based on multispectral iterative enhancement according to claim 1, wherein the dual-branch RT-DETR backbone network optimizes the HGBlock modules by adopting SWCEBlock modules, and the feature processing procedure of a SWCEBlock module comprises: performing multi-frequency-domain processing on the input features by using wavelet convolution, and performing convolution operations on each frequency band of the feature map at different scales to obtain multi-frequency enhanced features; processing the multi-frequency enhanced features through the projection branch and the gated interaction branch of a Star-type dual-branch interaction structure, and obtaining a channel-reweighted feature map by element-wise multiplication; and performing channel-weighted recalibration on the channel-reweighted feature map by using a channel recalibration mechanism, and performing output alignment with a 1×1 convolution.
  4. The method for detecting unmanned aerial vehicle aerial-photography targets in a severe environment based on multispectral iterative enhancement according to claim 3, wherein the procedure of processing the multi-frequency enhanced features through the projection branch and the gated interaction branch of the Star-type dual-branch interaction structure comprises: first, the projection branch linearly maps the multi-frequency enhanced features with a 1×1 point-wise convolution and performs channel mixing; second, the gated interaction branch extracts a gating response for modulation from the multi-frequency enhanced features through depthwise convolution, corrected by batch normalization; and finally, the two branches interact through element-wise multiplication.
  5. The method for detecting unmanned aerial vehicle aerial-photography targets in a severe environment based on multispectral iterative enhancement according to claim 3, characterized in that the implementation of the channel recalibration mechanism comprises: first, processing the channel-reweighted feature map with global average pooling to obtain a global descriptor for each channel; then, linearly mapping the global descriptors through a fully connected layer to generate channel weights; and after obtaining the channel weights, performing channel-weight calibration on the feature map to obtain a weighted feature map.
  6. The method for detecting unmanned aerial vehicle aerial-photography targets in a severe environment based on multispectral iterative enhancement according to claim 1, characterized in that the iterative optimization process of the iterative hierarchical attention and differential enhancement fusion framework comprises: in each iteration, using the hierarchical interactive attention fusion module to perform cross-modal interactive alignment and attention fusion on the visible light features and infrared features from the dual-branch backbone network, generating a cross-modal enhanced representation; performing differential complementary modeling on the cross-modal enhanced representation through the differential feature enhancement module, and feeding the differential information back into the input of the next round; and after several rounds of iteration, fusing and outputting the final enhanced features of the two modalities as input features for subsequent modules.
  7. The method for detecting unmanned aerial vehicle aerial-photography targets in a severe environment based on multispectral iterative enhancement according to claim 6, wherein using the hierarchical interactive attention fusion module to perform cross-modal interactive alignment and attention fusion on the visible light features and infrared features from the dual-branch backbone network and generating a cross-modal enhanced representation comprises: a front-stage cross-modal interaction stage, in which symmetric bidirectional cross-modal attention establishes visible-infrared complementary associations and completes feature alignment to obtain enhanced features; a local attention enhancement stage, in which each position of the enhanced features is weighted by local attention and adjusted according to the relative importance of different regions in the input feature map, outputting locally enhanced features; and a multi-scale fusion and global modulation output stage, in which the number of channels of the fused feature map is adjusted by a 1×1 convolution, multi-scale spatial modeling is then performed by several depthwise separable convolutions with different dilation rates, multi-scale fusion features are obtained by channel-wise concatenation, and global modulation coefficients for the visible light and infrared modalities are then generated by a global attention mechanism and applied to the multi-scale fusion features in residual fashion to output the final fusion features.
  8. The method for detecting unmanned aerial vehicle aerial-photography targets in a severe environment based on multispectral iterative enhancement according to claim 6, characterized in that the differential feature enhancement module works iteratively as follows: step 1, computing difference features between the visible light modality feature map and the infrared modality feature map of the current iteration, the difference features being adjusted by a learnable weighting coefficient; step 2, after passing the difference features through a multi-layer perceptron and layer normalization, feeding them back into the current visible light modality feature map and infrared modality feature map, dynamically adjusted by a learnable weighting parameter, to obtain the visible light modality feature map and infrared modality feature map for the next iteration; and step 3, repeating steps 1 and 2 for a preset number of iterations, and adding the visible light modality feature map and the infrared modality feature map obtained in the final iteration to obtain the fused feature map.
  9. An electronic device comprising a processor and a memory, the memory storing program code that, when executed by the processor, causes the processor to perform the steps of the multispectral-iterative-enhancement-based method for detecting an unmanned aerial vehicle aerial-photography target in a severe environment according to any one of claims 1 to 8.
  10. A storage medium storing a computer program or instructions which, when run on a computer, perform the steps of the multispectral-iterative-enhancement-based method for detecting an unmanned aerial vehicle aerial-photography target in a severe environment according to any one of claims 1 to 8.
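The Star-type dual-branch interaction of claims 3 and 4 can be sketched in plain NumPy. This is a minimal illustration under assumed shapes, not the patented implementation: the projection branch is a 1×1 point-wise convolution (pure channel mixing), the gating branch is a 3×3 depthwise convolution followed by a per-channel normalization used here as a stand-in for batch normalization, and the two branches are combined by element-wise multiplication. All function and parameter names are hypothetical.

```python
import numpy as np

def pointwise_conv(x, w):
    # x: (C, H, W); w: (C_out, C_in). A 1x1 convolution mixes channels at each pixel.
    C, H, W = x.shape
    return (w @ x.reshape(C, -1)).reshape(w.shape[0], H, W)

def depthwise_conv3x3(x, k):
    # x: (C, H, W); k: (C, 3, 3). Each channel is convolved with its own kernel ("same" padding).
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(C):
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + 3, j:j + 3] * k[c])
    return out

def star_interaction(x, w_proj, k_dw, eps=1e-5):
    proj = pointwise_conv(x, w_proj)         # projection branch: 1x1 linear channel mixing
    gate = depthwise_conv3x3(x, k_dw)        # gating branch: depthwise spatial response
    mu = gate.mean(axis=(1, 2), keepdims=True)
    var = gate.var(axis=(1, 2), keepdims=True)
    gate = (gate - mu) / np.sqrt(var + eps)  # normalization (stand-in for batch norm)
    return proj * gate                       # element-wise two-branch interaction
```

The element-wise product lets the spatially aware gating response modulate the channel-mixed projection, which is the "star" operation the claim describes.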
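The channel recalibration mechanism of claim 5 (global average pooling, a fully connected mapping, then per-channel reweighting) follows the familiar squeeze-and-excitation pattern. The sketch below is an assumption-laden NumPy rendering: the sigmoid gate and single fully connected layer are choices of this sketch, not details confirmed by the claim, and `channel_recalibrate` is a hypothetical name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_recalibrate(x, w_fc):
    # x: (C, H, W) channel-reweighted feature map; w_fc: (C, C) fully connected weights.
    desc = x.mean(axis=(1, 2))          # global average pooling: one global descriptor per channel
    weights = sigmoid(w_fc @ desc)      # fully connected linear mapping -> channel weights (sigmoid assumed)
    return x * weights[:, None, None]   # calibrate each channel by its learned weight
```

Because the sigmoid keeps every weight in (0, 1), recalibration can only attenuate channels, which matches the mechanism's role of suppressing less informative channels.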
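The iterative loop of the differential feature enhancement module (claims 6 and 8) can likewise be sketched on flattened feature vectors. Here `alpha` and `beta` stand in for the learnable weighting coefficient and feedback parameter, `w1`/`w2` are the weights of a one-hidden-layer MLP, and the additive injection of the feedback into both modality features is an assumption of this sketch; the claim does not fix the exact form.

```python
import numpy as np

def layer_norm(v, eps=1e-5):
    # Normalize a vector to zero mean and (near-)unit variance.
    return (v - v.mean()) / np.sqrt(v.var() + eps)

def differential_enhance(f_vis, f_ir, w1, w2, alpha=0.5, beta=0.1, n_iter=3):
    """Iterative differential enhancement of visible and infrared feature vectors."""
    for _ in range(n_iter):
        diff = alpha * (f_vis - f_ir)                     # step 1: weighted modality difference
        fb = layer_norm(w2 @ np.maximum(w1 @ diff, 0.0))  # MLP (ReLU hidden layer) + layer norm
        f_vis = f_vis + beta * fb                         # step 2: feed the difference back
        f_ir = f_ir + beta * fb                           #         into both modality features
    return f_vis + f_ir                                   # step 3: fuse by element-wise addition
```

With `n_iter=0` the function degenerates to plain additive fusion, which makes the contribution of the iterative differential feedback easy to isolate in experiments.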

Description

Multi-spectrum iteration enhancement-based method for detecting aerial photographing target of unmanned aerial vehicle in severe environment

Technical Field

The invention relates to the technical field of computer vision, and in particular to a method, based on multispectral iterative enhancement, for detecting unmanned aerial vehicle (UAV) aerial-photography targets in severe environments.

Background

With the rapid development of UAV technology, UAVs have been widely used in disaster relief, agricultural monitoring, environmental protection, security surveillance, and other fields; in severe environments in particular, target detection technology plays an important role. However, conventional target detection methods, especially single-modality methods based on visible light (RGB) images, are often degraded by environmental factors in complex conditions such as low light, rain, and fog, so the detection results are unsatisfactory. RGB images struggle with complex backgrounds and low signal-to-noise ratios, which limits their application.

To overcome the limitations of single-modality methods, multispectral target detection has become a research hotspot in recent years. By combining RGB images with infrared (IR) images, multispectral target detection can take full advantage of both modalities: under low light and severe weather, the IR image provides thermal radiation information that helps detect targets, while the RGB image provides detailed texture information. This complementarity maintains high detection accuracy in severe environments and improves robustness and reliability.

Despite the significant advantages of cross-modal fusion techniques, challenges remain in practical applications. First, differences in imaging principles and feature extraction between modalities complicate efficient fusion. Second, interference from background noise and redundant information, especially in complex environments, may impair the extraction of target information. In addition, cross-modal fusion must process large amounts of data, increasing computational complexity and potentially creating real-time detection bottlenecks. Finally, environmental adaptability is poor: in bad weather such as low illumination, rain, and fog, the quality of the modal information degrades and the detection results suffer.

Existing multispectral target detection techniques, although they have made some progress in handling harsh environments such as low light, rainy and foggy weather, and complex backgrounds, have several key drawbacks that limit their performance in practical applications. First, most existing cross-modal feature fusion methods rely on simple concatenation or weighted fusion, which cannot effectively handle the differences between RGB and IR images. Because the two modalities differ markedly in imaging principle and feature extraction, such simple fusion leads to insufficient feature alignment and makes complementary information hard to capture effectively, ultimately harming detection accuracy, especially under low signal-to-noise ratios and complex backgrounds. Second, to enhance detection performance, existing methods typically adopt deeply stacked convolutional networks and multi-scale feature fusion strategies; although these strategies enhance the context-modeling capability of the model, they also significantly increase computation and reduce inference speed, so real-time requirements cannot be met, especially on embedded devices where the computational burden is excessive. Furthermore, the prior art is not sufficiently effective at handling background noise and redundant information: in severe environments, especially under rain, fog, and low light, background noise seriously affects detection accuracy, and while some approaches attempt noise suppression, they lack efficient mechanisms to dynamically adjust for differences between modalities, so target information cannot be extracted efficiently. Finally, existing cross-modal fusion techniques fail to effectively suppress background noise shared between modalities; the retained noise and redundant information further dilute target features and reduce detection accuracy.

Therefore, a method for detecting UAV aerial-photography targets in severe environments based on multispectral iterative enhancement is provided.

Disclosure of Invention

The invention aims to provide a multispectral-iterative-enhancement-based method for detecting UAV aerial-photography targets in severe environments, so as to improve the target detection precision of an unmanned aerial vehicle