CN-121999389-A - Method for accurately detecting weak feature targets in optical images under airborne conditions
Abstract
The invention discloses a method for accurately detecting weak feature targets in optical images under airborne conditions. The unmanned aerial vehicle inputs the captured optical image into a target feature extraction network, which outputs a target feature map; the target feature map is input into a feature pyramid network, which outputs a fused feature map; and the fused feature map is input into a target region-of-interest network, which outputs the positions of targets in the image. A densely nested attention network is established as the target feature extraction network, and informative features are enhanced by an attention mechanism, so that the detection accuracy of small targets is improved.
Inventors
- LIN DEFU
- YIN XINGYU
- JIN REN
- CHU ZHAOCHEN
- YU YINAN
Assignees
- Beijing Institute of Technology (北京理工大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-11-01
Claims (7)
- 1. A method for accurately detecting weak feature targets in optical images under airborne conditions, characterized by comprising the following steps: the unmanned aerial vehicle inputs the captured optical image into a target feature extraction network, and the target feature extraction network outputs a target feature map; the target feature map is input into a feature pyramid network, and the feature pyramid network outputs a fused feature map; and the fused feature map is input into a target region-of-interest network, and the target region-of-interest network outputs the positions of targets in the image.
- 2. The method for accurately detecting weak feature targets in optical images under airborne conditions according to claim 1, wherein a densely nested attention network is established as the target feature extraction network; the densely nested attention network is a multi-layer nested network formed by stacking a plurality of U-shaped sub-networks, each U-shaped sub-network comprises an encoder and a decoder, and the encoder and the decoder are connected by skip connections; a node is provided between the encoder and decoder of each U-shaped sub-network, and the nodes of adjacent U-shaped sub-networks are skip-connected so that each decoder can receive encoder output features from the U-shaped sub-networks of adjacent layers.
- 3. The method for accurately detecting weak feature targets in optical images under airborne conditions according to claim 2, wherein the output of a node in the densely nested attention network is expressed as: L_{i,j} = P_max(F(L_{i-1,j})), j = 0, where i denotes the i-th downsampling layer along the encoder, j is the j-th convolution node of the encoder of the U-shaped sub-network, L_{i,j} denotes the output of the j-th convolution node of the i-th downsampling layer, P_max(·) denotes max pooling with a stride of 2, F(·) denotes multiple concatenated convolutional layers of the same convolutional block, μ(·) denotes the upsampling layer, and [·,·] denotes the concatenation layer.
- 4. The method for accurately detecting weak feature targets in optical images under airborne conditions according to claim 2, wherein the densely nested attention network further enhances informative features through an attention mechanism.
- 5. The method for accurately detecting weak feature targets in optical images under airborne conditions according to claim 4, wherein a channel attention network is provided in the densely nested attention network, expressed as: M_c(L) = σ[MLP(P_max(L)) + MLP(P_avg(L))], where M_c(L) is an intermediate parameter of the channel attention map, σ denotes the sigmoid function, MLP denotes a multi-layer perceptron, P_avg(·) denotes average pooling with a stride of 2, L denotes the feature map output by the densely nested attention network, ⊗ denotes element-wise multiplication, and L' denotes the output of the channel attention network.
- 6. The method for accurately detecting weak feature targets in optical images under airborne conditions according to claim 5, wherein a spatial attention network cascaded with the channel attention network is provided in the densely nested attention network, the spatial attention network being expressed as: M_s(L') = σ[f_{7×7}([P_max(L'), P_avg(L')])], where M_s(L') is an intermediate parameter of the spatial attention map, f_{7×7} denotes a convolution operation with a filter size of 7×7, and L'' denotes the output of the spatial attention network, i.e., the target feature map.
- 7. The method for accurately detecting weak feature targets in optical images under airborne conditions according to claim 1, wherein the target region-of-interest network comprises a region-of-interest bounding box sub-network, a region-of-interest feature sub-network, a dynamic instance interaction head, and a prediction head network.
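As a non-authoritative illustration of the encoder node update in claim 3 (the j = 0 branch only: a block of stacked convolutions F(·) followed by stride-2 max pooling P_max(·)), the following PyTorch sketch may help; the channel counts, block depth, and the class name ConvBlock are assumptions added for illustration, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """F(.): several stacked convolutional layers of one convolutional block (depth=2 is an assumption)."""
    def __init__(self, in_ch, out_ch, depth=2):
        super().__init__()
        layers = []
        for k in range(depth):
            layers += [nn.Conv2d(in_ch if k == 0 else out_ch, out_ch, 3, padding=1),
                       nn.BatchNorm2d(out_ch),
                       nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return self.body(x)

def encoder_node(prev_feature, block):
    """L_{i,0} = P_max(F(L_{i-1,0})): encoder node of the i-th downsampling layer (j = 0 branch)."""
    return nn.functional.max_pool2d(block(prev_feature), kernel_size=2, stride=2)  # P_max, stride 2

if __name__ == "__main__":
    x = torch.randn(1, 16, 64, 64)     # L_{i-1,0}: output of the previous downsampling layer
    block = ConvBlock(16, 32)
    y = encoder_node(x, block)         # L_{i,0}
    print(y.shape)                     # torch.Size([1, 32, 32, 32])
```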
Description
Method for accurately detecting weak feature targets in optical images under airborne conditions

Technical Field
The invention relates to a method for accurately detecting weak feature targets in optical images under airborne conditions, and belongs to the technical field of aircraft control.

Background
In unmanned aerial vehicle cluster applications, the detection of weak-feature unmanned aerial vehicle targets from an airborne air-to-air viewing angle is a core technology for target situation awareness. When detecting targets under airborne conditions, the target is relatively far from the own-side aircraft, so its imaging scale under the airborne viewing angle is usually small, and it belongs to the class of small targets, namely targets for which the ratios of the bounding-box width and height to the image width and height are smaller than 0.1. However, existing general-purpose detection methods suffer from insufficient feature extraction capability when processing small-scale targets. For example, YOLOv3 and YOLOv4 use Darknet-53 as the backbone network, whose 5 downsampling layers reduce the feature map size to 1/32 of the original, and in CenterNet the backbone network DLA-34 has a downsampling ratio of 1/8. For a small target with an average scale of 25.5×16.4 pixels in the FL-Drones dataset, the target occupies less than 1 pixel on the feature map after the multiple downsampling of these backbone networks (at a 1/32 ratio, a 25.5×16.4 target spans roughly 0.8×0.5 pixels), so the feature extraction capability is limited and the features of small-scale targets are difficult to extract accurately. Although some methods achieve good detection performance on small targets, they all demand large amounts of computation, which an unmanned-aerial-vehicle onboard computer can hardly supply. For example, the Dogfight method uses optical flow to estimate the motion pattern of the unmanned aerial vehicle and obtains good small-target detection results, but once deployed on an onboard computer its running frame rate is below 5 fps, which hardly meets real-time requirements. As another example, the HyperNet method gains strong feature extraction capability through a multi-scale fusion network, accurately captures the features of small-scale targets, and improves small-target detection performance, but the repeated computation across different scales increases memory and computational cost, making it difficult to meet real-time deployment requirements in an airborne environment. Therefore, intensive study of methods for accurately detecting weak-feature targets in optical images under airborne conditions is necessary to solve the above problems.

Disclosure of Invention
In order to overcome the above problems, the present inventors have conducted intensive studies and propose a method for accurately detecting weak feature targets in optical images under airborne conditions, comprising the following steps: the unmanned aerial vehicle inputs the captured optical image into a target feature extraction network, and the target feature extraction network outputs a target feature map; the target feature map is input into a feature pyramid network, and the feature pyramid network outputs a fused feature map; and the fused feature map is input into a target region-of-interest network, and the target region-of-interest network outputs the positions of targets in the image.
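A minimal sketch of the three-stage flow just described (feature extraction, feature pyramid fusion, region-of-interest head) is given below; the three stage modules here are stand-ins for illustration only and are not the concrete networks defined in this patent.

```python
import torch
import torch.nn as nn

class WeakTargetDetector(nn.Module):
    """Illustrative wiring of the three stages; each stage is passed in as a module."""
    def __init__(self, feature_extractor, feature_pyramid, roi_head):
        super().__init__()
        self.feature_extractor = feature_extractor  # e.g. the densely nested attention network
        self.feature_pyramid = feature_pyramid      # feature pyramid network
        self.roi_head = roi_head                    # target region-of-interest network

    def forward(self, image):
        target_feature_map = self.feature_extractor(image)            # step 1: target feature map
        fused_feature_map = self.feature_pyramid(target_feature_map)  # step 2: fused feature map
        positions = self.roi_head(fused_feature_map)                  # step 3: target positions
        return positions

if __name__ == "__main__":
    # Stand-in stages so the sketch runs end to end; the real stages are the networks of the patent.
    extractor = nn.Conv2d(3, 8, 3, padding=1)
    pyramid = nn.Conv2d(8, 8, 3, padding=1)
    head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4))  # 4 box coordinates
    detector = WeakTargetDetector(extractor, pyramid, head)
    print(detector(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 4])
```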
In a preferred embodiment, a densely nested attention network is established as the target feature extraction network; the densely nested attention network is a multi-layer nested network formed by stacking a plurality of U-shaped sub-networks, each U-shaped sub-network comprising an encoder and a decoder connected by skip connections. A node is provided between the encoder and decoder of each U-shaped sub-network, and the nodes of adjacent U-shaped sub-networks are skip-connected so that each decoder can receive encoder output features from the U-shaped sub-networks of adjacent layers. In a preferred embodiment, the output of a node in the densely nested attention network is expressed as: L_{i,j} = P_max(F(L_{i-1,j})), j = 0, where i denotes the i-th downsampling layer along the encoder, j is the j-th convolution node of the encoder of the U-shaped sub-network, L_{i,j} denotes the output of the j-th convolution node of the i-th downsampling layer, P_max(·) denotes max pooling with a stride of 2, F(·) denotes multiple concatenated convolutional layers of the same convolutional block, μ(·) denotes the upsampling layer, and [·,·] denotes the concatenation layer. In a preferred embodiment, the densely nested attention network further enhances informative features through an attention mechanism. In a preferred embodiment, a channel attention network is provided in the densely nested attention network, the channel attention
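A minimal sketch of the cascaded channel and spatial attention specified in claims 5 and 6, under a CBAM-style reading, follows; the global pooling in the channel branch (used here in place of the stride-2 pooling stated in the claims), the MLP reduction ratio, and all layer sizes are illustrative assumptions rather than details fixed by the patent.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """M_c(L) = sigmoid(MLP(maxpool(L)) + MLP(avgpool(L))); L' = M_c(L) * L element-wise."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, L):
        b, c, _, _ = L.shape
        max_desc = torch.amax(L, dim=(2, 3))   # max-pooled channel descriptor
        avg_desc = torch.mean(L, dim=(2, 3))   # average-pooled channel descriptor
        M_c = torch.sigmoid(self.mlp(max_desc) + self.mlp(avg_desc)).view(b, c, 1, 1)
        return M_c * L                         # L': channel-reweighted feature map

class SpatialAttention(nn.Module):
    """M_s(L') = sigmoid(f_7x7([maxpool_c(L'), avgpool_c(L')])); L'' = M_s(L') * L' element-wise."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)  # f_7x7

    def forward(self, L_prime):
        max_map = torch.amax(L_prime, dim=1, keepdim=True)     # channel-wise max map
        avg_map = torch.mean(L_prime, dim=1, keepdim=True)     # channel-wise average map
        M_s = torch.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))
        return M_s * L_prime                                    # L'': target feature map

if __name__ == "__main__":
    L = torch.randn(1, 32, 64, 64)
    L2 = SpatialAttention()(ChannelAttention(32)(L))  # channel attention then spatial attention, cascaded
    print(L2.shape)                                   # torch.Size([1, 32, 64, 64])
```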