CN-121999400-A - Enhanced detection method and system for small target of unmanned aerial vehicle

CN121999400A

Abstract

The invention relates to the technical field of unmanned aerial vehicle (UAV) detection and provides an enhanced detection method and system for small UAV-view targets, comprising: step 1, acquiring UAV aerial images and inputting them into an improved YOLO11s backbone network for feature extraction, outputting feature maps at four levels P2, P3, P4 and P5; step 2, constructing a hierarchical expansion path aggregation network (HEPAN) in the neck network, compressing the channels of each feature-map level through four 1×1 convolutions, and constructing a bidirectional top-down then bottom-up fusion path to realize full interaction of cross-scale information; step 3, embedding an efficient multi-scale attention (EMA) module in the bottom-up path of HEPAN to recalibrate the P3 and P4 layer features; step 4, inputting the fused feature maps into a multi-scale detection head including a newly added P2 small-target detection head, and outputting the target detection result. The invention achieves improved detection of small targets.

Inventors

  • FU GUI
  • LI JUNYI
  • ZHANG XIAOQIANG
  • LIU LIWEN
  • DONG LU

Assignees

  • 中国民用航空飞行学院 (Civil Aviation Flight University of China)

Dates

Publication Date
2026-05-08
Application Date
2026-04-09

Claims (7)

  1. An enhanced detection method for small unmanned aerial vehicle (UAV) targets, characterized by comprising the following steps: step 1, acquiring a UAV aerial image, inputting it into an improved YOLO11s backbone network for feature extraction, and outputting feature maps at four levels P2, P3, P4 and P5; step 2, constructing a hierarchical expansion path aggregation network (HEPAN) in the neck network, compressing the channels of each feature-map level through four 1×1 convolutions, and constructing a bidirectional top-down then bottom-up fusion path to realize full interaction of cross-scale information; step 3, embedding an efficient multi-scale attention (EMA) module in the bottom-up path of HEPAN to recalibrate the P3 and P4 layer features; and step 4, inputting the fused feature maps into a multi-scale detection head including a newly added P2 small-target detection head, and outputting the target detection result.
  2. The enhanced detection method for small UAV targets according to claim 1, wherein in the improved YOLO11s backbone network, feature extraction is performed through C3k2 modules; a P2 detection layer is added while the P3, P4 and P5 detection layers of YOLO11s are retained, and the P2 feature map is taken from the output after the second downsampling in the backbone network.
  3. The enhanced detection method for small UAV targets according to claim 2, wherein in hierarchical channel compression, each input feature layer F_l is compressed by a 1×1 convolution to obtain the compressed feature F'_l; the process is described as F'_l = Conv_{1×1}(F_l). By selectively reducing the number of channels, the model can focus on the most discriminative spatial and semantic information in the multi-scale detection task while reducing the computational burden.
  4. The enhanced detection method for small UAV targets according to claim 3, wherein in HEPAN a cross-layer skip connection is further provided for the shallow feature F_l, whose gradient is transferred to multiple subsequent fusion nodes by ∂L/∂F_l = Σ_{k=1}^{K} (∂L/∂N_k)·(∂N_k/∂F_l), where L is the loss function, k is the index of a subsequent fusion node N_k reached from the current layer F_l via the cross-layer connection, and K is the total number of connected nodes; the fine-grained spatial information of the high-resolution P2 layer and the global semantic information of the deep P5 layer are fused through this cross-layer aggregation mechanism, enhancing the model's adaptability to scale variation.
  5. The enhanced detection method for small UAV targets according to claim 4, wherein the EMA module effectively captures cross-channel and spatial information while remaining lightweight through grouped convolution and a parallel-branch design: the input feature map X ∈ R^{C×H×W} is first divided into g groups along the channel dimension, and each group is fed into two parallel branches, branch A capturing cross-channel dependence with a 1×1 convolution and branch B extracting spatial local structure with a 3×3 convolution; the two outputs are aggregated through a cross-space learning mechanism, in which the output of branch A is globally pooled to generate channel weights, these weights are multiplied channel-by-channel with the output of branch B and then fused with branch B's original output, the channel count is restored by a 1×1 convolution, and an attention weight map is generated by Sigmoid activation; finally, the map is multiplied element-by-element with the input X to obtain the recalibrated output Y, with dimensions unchanged throughout.
  6. The method of claim 5, wherein the EMA modules are placed in the bottom-up path after the C3k2 outputs of P3 and P4, respectively.
  7. An enhanced detection system for small UAV targets, characterized in that the system executes the enhanced detection method for small UAV targets according to any one of claims 1 to 6.
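The hierarchical channel compression in claim 3, one 1×1 convolution per pyramid level, is equivalent to a per-pixel linear map over the channel axis. A minimal numpy sketch follows; the channel widths per level and the 96-channel target are illustrative assumptions, not values from the patent:

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise (1x1) convolution: x is (C_in, H, W), w is (C_out, C_in).
    Equivalent to applying the same linear channel projection at every pixel."""
    c_in, h, wdt = x.shape
    # flatten spatial dims, project channels, restore spatial dims
    y = w @ x.reshape(c_in, h * wdt)
    return y.reshape(w.shape[0], h, wdt)

rng = np.random.default_rng(0)
# hypothetical channel widths for the P2..P5 levels before compression
levels = {"P2": 64, "P3": 128, "P4": 256, "P5": 512}
compressed = {}
for name, c_in in levels.items():
    x = rng.standard_normal((c_in, 8, 8))
    w = rng.standard_normal((96, c_in)) * 0.01  # compress every level to 96 channels
    compressed[name] = conv1x1(x, w)

for name, f in compressed.items():
    print(name, f.shape)  # every level now has 96 channels
```

Compressing all four levels to a common width is what lets the subsequent bidirectional fusion add or concatenate features across scales without per-node channel adapters.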

Description

Enhanced detection method and system for small target of unmanned aerial vehicle

Technical Field

The invention relates to the technical field of unmanned aerial vehicle (UAV) detection, and in particular to an enhanced detection method and system for small UAV-view targets.

Background

In recent years, UAVs have been widely applied in fields such as smart cities, traffic monitoring, agricultural plant protection and disaster relief, owing to their flexibility, wide coverage and low cost. The high-resolution cameras carried by UAVs can acquire massive aerial images in real time, providing a rich data source for target detection. However, images from the UAV viewpoint have distinctive characteristics: target scale varies drastically, and many targets (such as pedestrians and vehicles) occupy only tens or even a few pixels; small targets (area smaller than 32×32 pixels) usually account for more than 70% of all targets; targets are densely distributed with severe occlusion; and the background is complex and changeable, with considerable inter-class interference. These factors sharply degrade the performance of general object detectors in UAV scenes, making missed and false detections prominent. Target detection algorithms fall roughly into two-stage models (such as R-CNN and Faster R-CNN) and single-stage models (such as the YOLO series). YOLO11, one of the latest and most practical object detectors in the series, adopts C3k2 modules, a PANet neck structure and a three-layer P3/P4/P5 detection head, and performs excellently on general tasks.
However, for UAV aerial images it has the following shortcomings: first, the scale coverage of the feature pyramid is not fine enough, as the receptive field of the smallest detection layer P3 is too large to capture small-target details; second, PANet's cross-scale fusion is simplistic, with insufficient interaction between semantic and spatial information; finally, it lacks adaptive enhancement of target saliency against complex backgrounds, so background noise is easily misdetected as a target.

Disclosure of Invention

The invention provides an enhanced detection method and system for small UAV targets that overcomes some or all of the above defects of the prior art. The enhanced detection method comprises: step 1, acquiring a UAV aerial image, inputting it into an improved YOLO11s backbone network for feature extraction, and outputting feature maps at four levels P2, P3, P4 and P5; step 2, constructing a hierarchical expansion path aggregation network (HEPAN) in the neck network, compressing the channels of each feature-map level through four 1×1 convolutions, and constructing a bidirectional top-down then bottom-up fusion path to realize full interaction of cross-scale information; step 3, embedding an efficient multi-scale attention (EMA) module in the bottom-up path of HEPAN to recalibrate the P3 and P4 layer features; and step 4, inputting the fused feature maps into a multi-scale detection head including a newly added P2 small-target detection head, and outputting the target detection result.
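The four-step pipeline above can be sketched as a minimal runnable skeleton. All components here are illustrative stubs under assumed strides (4/8/16/32) and channel widths; they show the data flow only, not the patent's actual implementation:

```python
import numpy as np

def backbone(image):
    """Step 1: emit P2..P5 feature maps at assumed strides 4, 8, 16, 32."""
    h, w = image.shape[:2]
    return [np.zeros((c, h // s, w // s))
            for c, s in [(64, 4), (128, 8), (256, 16), (512, 32)]]

def compress(feats, out_c=96):
    """Step 2a: four 1x1 convolutions giving a uniform channel width
    (modeled here as a trivial channel projection placeholder)."""
    return [np.repeat(f.mean(axis=0, keepdims=True), out_c, axis=0) for f in feats]

def fuse_bidirectional(feats):
    """Steps 2b/3: top-down then bottom-up fusion, with EMA recalibrating
    P3 and P4 on the bottom-up path (identity placeholder here)."""
    return feats

def heads(feats):
    """Step 4: one detection head per level, including the new P2 head
    (stand-in returning feature shapes instead of box/class predictions)."""
    return [f.shape for f in feats]

image = np.zeros((256, 256, 3))
outputs = heads(fuse_bidirectional(compress(backbone(image))))
print(outputs)
```

The key structural point the skeleton preserves is that four levels, not three, flow through the neck, so the highest-resolution P2 map survives to the detection stage.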
Preferably, in the improved YOLO11s backbone network, feature extraction is performed through C3k2 modules; a P2 detection layer is added while the P3, P4 and P5 detection layers of YOLO11s are retained, and the P2 feature map is taken from the output after the second downsampling in the backbone network. Preferably, in hierarchical channel compression, each input feature layer F_l is compressed by a 1×1 convolution to obtain the compressed feature F'_l, described as F'_l = Conv_{1×1}(F_l); by selectively reducing the number of channels, the model can focus on the most discriminative spatial and semantic information in the multi-scale detection task while reducing the computational burden. Preferably, in HEPAN a cross-layer skip connection is also provided for the shallow feature F_l, whose gradient may be transferred to multiple subsequent fusion nodes by ∂L/∂F_l = Σ_{k=1}^{K} (∂L/∂N_k)·(∂N_k/∂F_l), where L is the loss function, k is the index of a subsequent fusion node N_k reached from the current layer F_l via the cross-layer connection, and K is the total number of connected nodes; the fine-grained spatial information of the high-resolution P2 layer and the global semantic information of the deep P5 layer are fused through this cross-layer aggregation mechanism, enhancing the model's adaptability to scale variation. Preferably, the EMA module effectively captures cross-channel and spatial information while remaining lightweight through grouped convolution and a parallel-branch design.
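The EMA recalibration described in claim 5 can be sketched in numpy. This is a simplified illustration under stated assumptions: random weights stand in for learned ones, the 3×3 branch is modeled as a fixed spatial smoothing, and the group count and tensor sizes are arbitrary; it reproduces the claimed structure (group split, 1×1 and 3×3 branches, pooled channel weights, Sigmoid attention map, element-wise rescaling with shape preserved), not the patent's trained module:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1x1(x, w):
    # pointwise conv: (C_in, H, W) with weights (C_out, C_in) -> (C_out, H, W)
    c, h, wdt = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wdt)

def conv3x3_smooth(x):
    # fixed 3x3 averaging as a stand-in for branch B's learned 3x3 convolution
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(x)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += pad[:, 1 + dy:1 + dy + x.shape[1], 1 + dx:1 + dx + x.shape[2]]
    return out / 9.0

def ema_block(x, groups=4, seed=0):
    """Simplified EMA recalibration: split channels into groups, run a 1x1
    (cross-channel) and a 3x3 (spatial) branch, pool branch A into channel
    weights, fuse with branch B, form a Sigmoid attention map, rescale x."""
    rng = np.random.default_rng(seed)
    c, h, w = x.shape
    gc = c // groups
    out = np.empty_like(x)
    for g in range(groups):
        xg = x[g * gc:(g + 1) * gc]
        a = conv1x1(xg, rng.standard_normal((gc, gc)) * 0.1)  # branch A
        b = conv3x3_smooth(xg)                                # branch B
        cw = a.mean(axis=(1, 2), keepdims=True)   # global pool -> channel weights
        fused = cw * b + b                        # cross-space fusion with branch B
        attn = sigmoid(conv1x1(fused, rng.standard_normal((gc, gc)) * 0.1))
        out[g * gc:(g + 1) * gc] = xg * attn      # recalibrate; shape unchanged
    return out

x = np.random.default_rng(1).standard_normal((16, 8, 8))
y = ema_block(x)
print(y.shape)
```

Because the attention map passes through a Sigmoid, every element of the output is the input scaled by a factor in (0, 1), which is what allows the module to suppress background responses without changing tensor dimensions.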