CN-121982433-A - Progressive unmanned aerial vehicle detection method and system based on super-resolution guidance

CN121982433A

Abstract

The invention discloses a progressive unmanned aerial vehicle (UAV) detection method and system based on super-resolution guidance. Addressing the low detection accuracy of low-resolution UAV images and the insufficient feature synergy, pseudo-texture interference, and computational redundancy of the traditional serial super-resolution + detection framework, the method proposes a dual-resolution feature-synergy framework: features are extracted in parallel by fixed-scale and random-scale branches, the random-scale features guide the enhancement of the fixed-scale features, a candidate-box mapping mechanism focuses processing on the target region, and feature-level local super-resolution enhancement is applied only to that region, avoiding the wasted computation of full-image super-resolution. Spatial consistency and semantic alignment of the target region across the two branches are enforced by a cross-scale mapping loss and a cosine-similarity loss. The invention thus achieves deep coupling of super-resolution reconstruction and UAV detection, effectively suppresses pseudo-texture interference while improving recognition of low-resolution UAV targets, and improves detection accuracy.

Inventors

  • Wen Zhiwei
  • He Wenping
  • Ye Jianbiao
  • Guan Yuhao
  • Dai Dekun
  • Wu Xiande
  • Zou Fan
  • Wang Shuangbin

Assignees

  • 浙江华是科技股份有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-04-08

Claims (10)

  1. A progressive unmanned aerial vehicle detection method based on super-resolution guidance, characterized by comprising the following steps: S1, acquiring an original unmanned aerial vehicle image, processing it at a fixed resolution and at a random resolution respectively, inputting both images into a backbone network of a detection model, and correspondingly extracting a fixed-scale feature map and a random-scale feature map; S2, performing scale alignment and feature interaction between the random-scale feature map and the fixed-scale feature map to generate an enhanced fixed-scale feature map; S3, predicting a fixed-scale candidate box and a random-scale candidate box based on the enhanced fixed-scale feature map and the random-scale feature map respectively, mapping the fixed-scale candidate box onto the random-scale feature map, and calculating a candidate-box mapping loss value from the mapping result and the random-scale candidate box; S4, cropping local feature maps from the fixed-scale feature map and the aligned random-scale feature map according to the mapped candidate box and the random-scale candidate box, and performing super-resolution enhancement on the local feature maps to obtain two super-resolved local feature maps; S5, calculating a semantic consistency loss value from the fixed-scale candidate box and the two super-resolved local feature maps using a cosine-similarity loss function, and updating the parameters of the detection model in combination with the candidate-box mapping loss value; S6, repeating steps S1-S5 until the sum of all loss values converges, obtaining a final detection model; S7, acquiring an unmanned aerial vehicle image to be detected, processing it at the fixed resolution only, inputting it into the final detection model for detection, and outputting the unmanned aerial vehicle category and candidate-box coordinates.
  2. The method according to claim 1, wherein S2 comprises: performing scale alignment between the random-scale feature map and the fixed-scale feature map in a feature fusion module of the detection model to obtain an aligned feature map; performing element-wise multiplication of the aligned feature map and the fixed-scale feature map to obtain a modulated feature map; superposing the modulated feature map onto the fixed-scale feature map element by element to obtain a preliminary enhanced feature map; and applying a ReLU activation to the preliminary enhanced feature map followed by batch normalization to obtain the enhanced fixed-scale feature map.
  3. The method according to claim 1, wherein S3 comprises: predicting target candidate boxes, together with their coordinates and confidence scores, from the enhanced fixed-scale feature map and the random-scale feature map respectively through a coarse detection head of the detection model; calculating a scale conversion factor from the ratio of the fixed resolution to the random resolution, and mapping the fixed-scale candidate box onto the random-scale feature map according to the scale conversion factor to obtain a mapped candidate box; performing boundary verification on the mapped candidate boxes and discarding those that exceed the extent of the random-scale feature map; calculating the intersection-over-union (IoU) between the screened mapped candidate boxes and the random-scale candidate boxes, and retaining the mapped candidate boxes whose IoU is greater than or equal to a preset IoU threshold; and calculating the candidate-box mapping loss value from the retained mapped candidate boxes and the random-scale candidate boxes.
  4. The method according to claim 3, wherein the candidate-box mapping loss value is calculated according to the following formula: L_roi = α × L_reg + β × L_iou, where L_reg = 0.5 × (GT − Pred)² when |GT − Pred| ≤ 1, and L_reg = |GT − Pred| − 0.5 when |GT − Pred| > 1 (the smooth-L1 form), and L_iou = −ln(IoU); here α is a first preset weight coefficient, β is a second preset weight coefficient, L_reg is the first coordinate regression loss value, L_iou is the intersection-over-union loss value, L_roi is the candidate-box mapping loss value, GT denotes the coordinates of the retained mapped candidate box, Pred denotes the coordinates of the random-scale candidate box, and IoU is the mean intersection-over-union between the retained mapped candidate boxes and the random-scale candidate boxes.
  5. The method according to claim 3, wherein S4 comprises: cropping corresponding local feature maps from the fixed-scale feature map and the random-scale feature map aligned with it, according to the coordinates of the retained mapped candidate boxes and the random-scale candidate boxes; inputting the cropped local feature maps into a local super-resolution module for feature-level super-resolution enhancement to obtain a mapped local enhanced feature map and a random-scale local enhanced feature map, wherein the local super-resolution module comprises a channel-attention sub-module and a residual-learning sub-module; performing noise filtering on the mapped local enhanced feature map and the random-scale local enhanced feature map; and performing pseudo-texture suppression on the filtered mapped local enhanced feature map and the filtered random-scale local enhanced feature map to obtain a mapped super-resolved local feature map and a random-scale super-resolved local feature map.
  6. The method of claim 5, wherein S5 comprises: aligning the fixed-scale candidate box with the mapped super-resolved local feature map and the random-scale super-resolved local feature map to obtain an aligned fixed-scale candidate box, an aligned mapped super-resolved local feature map, and an aligned random-scale super-resolved local feature map; flattening the aligned fixed-scale candidate box, the aligned mapped super-resolved local feature map, and the aligned random-scale super-resolved local feature map into a fixed-scale feature vector, a mapped super-resolved local feature vector, and a random-scale super-resolved local feature vector; calculating the semantic consistency loss value from these three feature vectors according to a cosine-similarity loss function; inputting the aligned mapped super-resolved local feature map and the aligned random-scale super-resolved local feature map into a fine detection head of the detection model to predict target candidate boxes together with their coordinates, categories, and category confidence scores; calculating a category loss value and a second coordinate regression loss value from the predicted target candidate boxes, their coordinates, categories, and category confidence scores, and the manually annotated target boxes, coordinates, and categories of the original unmanned aerial vehicle image; computing a target loss value from the semantic consistency loss value, the candidate-box mapping loss value, the category loss value, and the second coordinate regression loss value; and back-propagating the target loss value to update the parameters of the detection model.
  7. A progressive unmanned aerial vehicle detection system based on super-resolution guidance, characterized by comprising: a feature extraction unit for acquiring an original unmanned aerial vehicle image, processing it at a fixed resolution and at a random resolution respectively, inputting both images into a backbone network of a detection model, and correspondingly extracting a fixed-scale feature map and a random-scale feature map; a feature enhancement unit for performing scale alignment and feature interaction between the random-scale feature map and the fixed-scale feature map to generate an enhanced fixed-scale feature map; a coarse prediction unit for predicting a fixed-scale candidate box and a random-scale candidate box based on the enhanced fixed-scale feature map and the random-scale feature map respectively, mapping the fixed-scale candidate box onto the random-scale feature map, and calculating a candidate-box mapping loss value from the mapping result and the random-scale candidate box; a super-resolution enhancement unit for cropping local feature maps from the fixed-scale feature map and the aligned random-scale feature map according to the mapped candidate box and the random-scale candidate box, and performing super-resolution enhancement on the local feature maps to obtain two super-resolved local feature maps; a reverse updating unit for calculating a semantic consistency loss value from the fixed-scale candidate box and the two super-resolved local feature maps according to a cosine-similarity loss function, and updating the parameters of the detection model in combination with the candidate-box mapping loss value; a repeated training unit for iterating the feature extraction unit, the feature enhancement unit, the coarse prediction unit, the super-resolution enhancement unit, and the reverse updating unit until the sum of all loss values converges, obtaining a final detection model; and a detection unit for acquiring an unmanned aerial vehicle image to be detected, processing it at the fixed resolution only, inputting it into the final detection model for detection, and outputting the unmanned aerial vehicle category and candidate-box coordinates.
  8. The system of claim 7, wherein the feature enhancement unit comprises: an alignment subunit for performing scale alignment between the random-scale feature map and the fixed-scale feature map in a feature fusion module of the detection model to obtain an aligned feature map; a multiplication subunit for performing element-wise multiplication of the aligned feature map and the fixed-scale feature map to obtain a modulated feature map; a superposition subunit for superposing the modulated feature map onto the fixed-scale feature map element by element to obtain a preliminary enhanced feature map; and an enhancement subunit for applying a ReLU activation to the preliminary enhanced feature map followed by batch normalization to obtain the enhanced fixed-scale feature map.
  9. The system of claim 7, wherein the coarse prediction unit comprises: a screening subunit for predicting target candidate boxes, together with their coordinates and confidence scores, from the enhanced fixed-scale feature map and the random-scale feature map respectively through a coarse detection head of the detection model; a mapping subunit for calculating a scale conversion factor from the ratio of the fixed resolution to the random resolution and mapping the fixed-scale candidate box onto the random-scale feature map according to the scale conversion factor to obtain a mapped candidate box; a rejection subunit for performing boundary verification on the mapped candidate boxes and discarding those that exceed the extent of the random-scale feature map; a retention subunit for calculating the intersection-over-union (IoU) between the screened mapped candidate boxes and the random-scale candidate boxes and retaining the mapped candidate boxes whose IoU is greater than or equal to a preset IoU threshold; and a calculation subunit for calculating the candidate-box mapping loss value from the retained mapped candidate boxes and the random-scale candidate boxes.
  10. The system of claim 9, wherein the candidate-box mapping loss value is calculated according to the following formula: L_roi = α × L_reg + β × L_iou, where L_reg = 0.5 × (GT − Pred)² when |GT − Pred| ≤ 1, and L_reg = |GT − Pred| − 0.5 when |GT − Pred| > 1, and L_iou = −ln(IoU); here α is a first preset weight coefficient, β is a second preset weight coefficient, L_reg is the first coordinate regression loss value, L_iou is the intersection-over-union loss value, L_roi is the candidate-box mapping loss value, GT denotes the coordinates of the retained mapped candidate box, Pred denotes the coordinates of the random-scale candidate box, and IoU is the mean intersection-over-union between the retained mapped candidate boxes and the random-scale candidate boxes.
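The fusion pipeline of claims 2 and 8 (align, element-wise multiply, residual add, ReLU, batch normalization) can be sketched as follows. This is not the patent's implementation; the nearest-neighbour alignment and single-sample normalization here are illustrative stand-ins, since the patent does not specify how scale alignment or batch normalization are realized.

```python
import numpy as np

def align(feat, target_hw):
    """Resize a (C, H, W) feature map to target_hw by nearest-neighbour
    sampling (a stand-in for the patent's unspecified scale alignment)."""
    c, h, w = feat.shape
    th, tw = target_hw
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return feat[:, rows][:, :, cols]

def enhance_fixed_scale(fixed, rand, eps=1e-5):
    """Claim-2 pipeline: align the random-scale map to the fixed-scale map,
    modulate by element-wise multiplication, add back the fixed-scale map,
    apply ReLU, then per-channel normalization."""
    aligned = align(rand, fixed.shape[1:])
    modulated = aligned * fixed           # element-wise modulation
    enhanced = modulated + fixed          # element-wise superposition
    enhanced = np.maximum(enhanced, 0.0)  # ReLU
    # per-channel normalization (batch norm with an effective batch of 1)
    mean = enhanced.mean(axis=(1, 2), keepdims=True)
    std = enhanced.std(axis=(1, 2), keepdims=True)
    return (enhanced - mean) / (std + eps)
```

The output keeps the fixed-scale spatial size, so it can feed the coarse detection head unchanged.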
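The candidate-box mapping loss of claims 4 and 10 combines a smooth-L1 regression term with a −ln(IoU) term. A minimal NumPy sketch, assuming axis-aligned boxes in (x1, y1, x2, y2) form and per-coordinate averaging of the regression term (the patent does not state the reduction), could look like this:

```python
import numpy as np

def smooth_l1(gt, pred):
    """Smooth-L1 term: 0.5*d^2 for |d| <= 1, |d| - 0.5 otherwise."""
    d = np.abs(gt - pred)
    return np.where(d <= 1.0, 0.5 * d ** 2, d - 0.5)

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def candidate_box_mapping_loss(mapped, random, alpha=1.0, beta=1.0):
    """L_roi = alpha * L_reg + beta * L_iou, with GT = mapped boxes,
    Pred = random-scale boxes, and IoU averaged over retained pairs."""
    mapped = np.asarray(mapped, float)
    random = np.asarray(random, float)
    l_reg = smooth_l1(mapped, random).mean()
    mean_iou = np.mean([iou(a, b) for a, b in zip(mapped, random)])
    l_iou = -np.log(mean_iou)
    return alpha * l_reg + beta * l_iou
```

For perfectly overlapping boxes both terms vanish, so the loss is zero; any offset between the mapped and random-scale boxes yields a positive loss.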
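Claims 1, 5, and 6 name a cosine-similarity loss for semantic consistency but give no formula. A common choice, assumed here purely for illustration, is L = 1 − cos(u, v), averaged over the two super-resolved branches compared against the fixed-scale vector:

```python
import numpy as np

def cosine_consistency_loss(fixed_vec, mapped_sr_vec, rand_sr_vec, eps=1e-8):
    """Semantic-consistency sketch: mean of (1 - cosine similarity) between
    the fixed-scale feature vector and each super-resolved feature vector.
    The exact form of the patent's cosine-similarity loss is not disclosed."""
    def cos(u, v):
        u = np.ravel(u).astype(float)
        v = np.ravel(v).astype(float)
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))
    return 0.5 * ((1.0 - cos(fixed_vec, mapped_sr_vec))
                  + (1.0 - cos(fixed_vec, rand_sr_vec)))
```

The loss is near zero when all three flattened feature vectors point in the same direction and approaches 2 when the super-resolved features are anti-aligned with the fixed-scale features, which is what drives the two branches toward semantic agreement.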

Description

Progressive unmanned aerial vehicle detection method and system based on super-resolution guidance

Technical Field

The invention relates to the technical field of computer vision and target detection, in particular to a progressive unmanned aerial vehicle detection method and system based on super-resolution guidance.

Background

Unmanned aerial vehicles are widely used in many fields owing to their flexibility, wide coverage, and other advantages, and their monitoring relies on ground monitors. In the visual monitoring of unmanned aerial vehicles by a ground monitor, monitoring accuracy directly determines the monitoring effect. However, when a ground monitor observes an unmanned aerial vehicle in flight, objective factors such as long monitoring distance and shooting-angle offset mean that the acquired images often suffer from low resolution and a lack of target feature information, which poses great challenges for subsequent detection. To address the difficulty of detecting unmanned aerial vehicles in low-resolution images, the prior art usually adopts a serial framework of super-resolution reconstruction followed by target detection: the image resolution is first raised by a super-resolution module, and the processed image is then fed into a detection module to complete the identification.
This serial framework has two drawbacks. On the one hand, it ignores the intrinsic mismatch between the super-resolution reconstruction objective and the unmanned aerial vehicle detection objective: reconstruction focuses on pixel-level image fidelity, while detection focuses on semantic-level feature extraction, so the misaligned objectives cooperate poorly, and the serial structure also contains a large amount of redundant computation, making the overall computational efficiency very low. On the other hand, existing super-resolution modules use image reconstruction quality as their only training target; the texture details they generate may not match the requirements of unmanned aerial vehicle recognition and may even introduce pseudo textures and extra noise, and this invalid or interfering information can seriously disturb the discrimination logic of the subsequent detection module, further reducing detection accuracy. Therefore, how to break through the limitations of the existing serial framework, achieve efficient cooperation between super-resolution reconstruction and unmanned aerial vehicle detection, and improve the feature recognition of low-resolution unmanned aerial vehicle targets while avoiding pseudo-texture interference and improving computational efficiency has become a key problem in current ground-monitor drone surveillance scenarios.

Disclosure of Invention

The embodiments of the invention provide a progressive unmanned aerial vehicle detection method and system based on super-resolution guidance, intended to solve the low target-detection accuracy caused in the prior art by the inability of super-resolution reconstruction and unmanned aerial vehicle detection to cooperate efficiently.
To achieve this aim, in one aspect the invention provides a progressive unmanned aerial vehicle detection method based on super-resolution guidance, comprising: S1, acquiring an original unmanned aerial vehicle image, processing it at a fixed resolution and at a random resolution respectively, inputting both images into a backbone network of a detection model, and correspondingly extracting a fixed-scale feature map and a random-scale feature map; S2, performing scale alignment and feature interaction between the random-scale feature map and the fixed-scale feature map to generate an enhanced fixed-scale feature map; S3, predicting a fixed-scale candidate box and a random-scale candidate box based on the enhanced fixed-scale feature map and the random-scale feature map respectively, mapping the fixed-scale candidate box onto the random-scale feature map, and calculating a candidate-box mapping loss value from the mapping result and the random-scale candidate box; S4, cropping local feature maps from the fixed-scale feature map and the aligned random-scale feature map and performing super-resolution enhancement to obtain two super-resolved local feature maps; S5, calculating a semantic consistency loss value from the fixed-scale candidate box and the two super-resolved local feature maps, repeating the training steps, and obtaining a final