CN-122023762-A - Multi-mode fusion unmanned aerial vehicle small target robust detection and recognition method and system
Abstract
The invention relates to the technical fields of artificial intelligence and low-altitude safety protection, and in particular to a multi-modal fusion method and system for robust detection and recognition of small unmanned aerial vehicle (UAV) targets. The method comprises multi-modal data acquisition and preprocessing, dual-branch feature extraction and cross-modal alignment, multi-modal feature fusion, target detection and recognition decision, and result feedback and model optimization. By integrating visible-light, infrared, and radio-frequency multi-source information and adopting an improved GD-YOLO network architecture with an aggregation-distribution fusion mechanism, effective extraction and complementary fusion of small-target features are achieved. The system comprises a multi-modal data acquisition module, a data preprocessing module, a dual-branch feature extraction module, a multi-modal feature fusion module, and a target detection and recognition decision module, and offers high detection precision, strong environmental adaptability, high real-time performance, and low deployment cost. The invention can be widely applied to low-altitude safety protection scenarios such as airport clearance-area monitoring and the protection of critical facilities.
Inventors
- LIU JIANXUN
- JIN HAO
- LIN HUI
- Wen Yalin
- ZHANG XUEYING
- LI HUIBO
- SHU YU
- GAO MIN
- ZHU FUJIAN
- GUO XIAOLEI
Assignees
- Electronic Science Research Institute of China Electronics Technology Group Corporation (中国电子科技集团有限公司电子科学研究院)
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-16
Claims (10)
- 1. A multi-modal fusion method for robust detection and recognition of small unmanned aerial vehicle targets, characterized by comprising the following steps: S1, synchronously acquiring multi-modal data of a target area in real time, preprocessing the multi-modal data, and storing the data in a buffer queue, wherein the multi-modal data comprise visible-light images, infrared images, and radio-frequency signal data; S2, receiving the image data with a visual branch to output a visual feature vector, receiving the radio-frequency signal data with a radio-frequency branch to output a radio-frequency feature vector, and performing cross-modal feature alignment on the visual feature vector and the radio-frequency feature vector; S3, splitting the aligned features into low-order features and high-order features according to feature level, generating low-order and high-order fusion feature vectors with the corresponding aggregation-distribution branches, and obtaining a multi-modal fusion feature vector after weighted fusion; S4, inputting the multi-modal fusion feature vector into a CLLAHead detection head for screening to obtain an optimal detection box, and performing fine-grained UAV model classification and recognition based on the feature vector corresponding to the optimal detection box; and S5, calculating a threat level by combining the recognized UAV model, flight speed, track, and distance from the warning area.
- 2. The method according to claim 1, further comprising: S6, pushing the detection and recognition results to a monitoring center and a downstream drive-away execution system, updating model parameters through online learning, expanding a dynamic feature database, and ensuring detection stability through multi-frame association tracking.
- 3. The method according to claim 1, wherein in step S1 the preprocessing comprises: sequentially performing white-balance adjustment, adaptive histogram equalization, and 3×3 Gaussian filtering on the visible-light image; performing non-uniformity correction, temperature calibration, adaptive temperature-threshold segmentation, and morphological opening on the infrared image; and filtering and denoising the radio-frequency signal, converting it into spectrum data by fast Fourier transform, and extracting center-frequency, bandwidth, and modulation-mode features.
- 4. The method according to claim 3, wherein in step S1 the preprocessing further comprises performing coordinate calibration on the visual data based on spatial registration parameters, and storing the standardized and aligned modal feature data in a cache queue.
- 5. The method according to claim 4, wherein in step S2 the visual branch receives the image data obtained by fusing the visible-light and infrared images, performs a slicing operation on the input image through a Focus module, replaces the conventional downsampling operation with dual-scale dilated convolution, extracts the second- to fifth-layer multi-scale visual features through an improved GD-YOLO backbone network, and outputs a visual feature vector; the improved GD-YOLO backbone network is optimized based on the YOLOv architecture and adopts a darknet structure.
- 6. The method according to claim 5, wherein in step S2 the radio-frequency branch receives the preprocessed radio-frequency signal data, performs multiple rounds of one-dimensional convolution on the radio-frequency spectrum data with a one-dimensional convolutional network to extract spectrum features, modulation features, and temporal features respectively, and outputs a radio-frequency feature vector after integration; and the cross-modal feature alignment calculates semantic correlation weights between the visual features and the radio-frequency features through an attention mechanism model, matches and adjusts the semantic layers of the two types of features according to the correlation weights, and outputs the aligned visual feature vector and radio-frequency feature vector.
- 7. The method according to claim 1, wherein step S3 comprises the following sub-steps: receiving the aligned visual feature vector and radio-frequency feature vector and combining them into a bimodal aligned feature set; splitting the aligned features into low-order features and high-order features according to feature level, and inputting them into the corresponding aggregation-distribution branches respectively; in the low-order aggregation-distribution branch, performing size alignment on the low-order features by average pooling, performing deep feature fusion with multi-layer re-parameterized convolution blocks, and injecting local detail features through an attention mechanism to generate a low-order fusion feature vector; in the high-order aggregation-distribution branch, performing pooling-based dimension reduction on the high-order semantic features, realizing cross-modal semantic fusion through a Transformer block, performing convolutional dimension reduction and splitting on the fused features, and combining the semantic fusion features with the local features to generate a high-order fusion feature vector; and assigning adaptive weights to the low-order and high-order fusion features according to environmental conditions and feature confidence, and generating the final multi-modal fusion feature vector after weighted summation.
- 8. The method according to claim 1, wherein step S4 comprises the following sub-steps: inputting the multi-modal fusion feature vector into the CLLAHead detection head, and generating target candidate boxes by combining multi-level feature extraction with an attention mechanism and optimizing a distribution focal loss function; screening candidate boxes whose confidence exceeds a set threshold, performing non-maximum suppression, and retaining the optimal detection box; performing fine-grained UAV model classification and recognition based on the feature vector corresponding to the optimal detection box, and distinguishing UAVs from interfering objects through feature matching and motion-characteristic analysis; calculating a threat-level score by combining the target model, flight speed, track, and distance from the warning area, and dividing the score into three grades of low, medium, and high; and performing confidence calibration on the threat-level score according to environmental complexity and the detection scene, and outputting the detection result in a formatted manner.
- 9. The method according to claim 8, wherein the set threshold is dynamically adjustable.
- 10. A multi-modal fusion system for robust detection and recognition of small unmanned aerial vehicle targets, configured to implement the method of any one of claims 1 to 9, the system comprising: a multi-modal data acquisition module, integrating a sensing device and a stabilizing mechanism, wherein the sensing device comprises a high-definition visible-light camera, a thermal infrared imager, and a wide-band radio-frequency antenna array, and the stabilizing mechanism is a three-axis stabilized turntable, for synchronously acquiring multi-modal data of the target area; a data preprocessing module, connected with the multi-modal data acquisition module, for performing multi-modal preprocessing, coordinate calibration, and standardization on the multi-modal data and storing the processed data in a cache queue; a dual-branch feature extraction module, connected with the data preprocessing module, comprising a visual feature extraction sub-module, a radio-frequency feature extraction sub-module, and a cross-modal alignment sub-module, for extracting multi-scale visual features and radio-frequency features and realizing cross-modal semantic alignment; a multi-modal feature fusion module, connected with the dual-branch feature extraction module, adopting an aggregation-distribution mechanism and comprising a low-order aggregation-distribution branch and a high-order aggregation-distribution branch, for generating the multi-modal fusion feature vector; a target detection and recognition decision module, connected with the multi-modal feature fusion module, integrating a CLLAHead detection head, a non-maximum suppression unit, a fine-grained classification sub-module, an interfering-object discrimination sub-module, and a threat-level evaluation sub-module, for outputting formatted detection results; and a result feedback and model optimization module, respectively connected with the target detection and recognition decision module and the monitoring center, for updating model parameters online and maintaining the dynamic feature database.
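The threat-grading step of claims 1 and 8 can be illustrated with a minimal sketch. The linear scoring form, the weights, the 30 m/s and 1000 m normalization constants, and the 0.3/0.6 grade thresholds below are all illustrative assumptions; the patent specifies only that the recognized model, flight speed, track, and distance from the warning area are combined and graded into low, medium, and high.

```python
def threat_score(model_risk, speed_mps, distance_m, track_toward_zone,
                 w_model=0.4, w_speed=0.3, w_dist=0.2, w_track=0.1):
    """Combine recognized-model risk, flight speed, distance to the
    warning area, and track direction into one score in [0, 1].
    Weights and normalization constants are illustrative assumptions."""
    speed_term = min(speed_mps / 30.0, 1.0)          # saturate at 30 m/s
    dist_term = max(0.0, 1.0 - distance_m / 1000.0)  # closer -> higher
    track_term = 1.0 if track_toward_zone else 0.0   # heading at the zone
    return (w_model * model_risk + w_speed * speed_term
            + w_dist * dist_term + w_track * track_term)

def threat_grade(score, low=0.3, high=0.6):
    """Map a score to the three grades of claim 8 (thresholds assumed)."""
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"
```

For example, a high-risk model flying at 25 m/s, 200 m from the warning area and heading toward it, grades as "high", while a low-risk model drifting away at 5 m/s from 900 m grades as "low". The confidence calibration by environmental complexity mentioned in claim 8 would adjust these scores further and is not modeled here.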
Description
Multi-mode fusion unmanned aerial vehicle small target robust detection and recognition method and system

Technical Field
The invention relates to the technical fields of artificial intelligence and low-altitude safety protection, and in particular to a multi-modal fusion method and system for robust detection and recognition of small unmanned aerial vehicle (UAV) targets.

Background
With the rapid iteration and wide adoption of UAV technology, UAVs have been deeply applied in fields such as logistics transportation, power-line inspection, and aerial videography, but unauthorized low-altitude flight also poses a serious threat to public safety. According to statistics, hundreds of safety incidents caused by UAV interference, such as aviation accidents and intrusions into critical facilities, have occurred worldwide, with direct economic losses reaching hundreds of billions of dollars. The low-altitude protection pressure on sensitive areas such as airport clearance areas, nuclear power stations, and government agencies continues to increase, and efficient, reliable UAV detection and recognition technology is needed as a safety guarantee.
Current mainstream UAV detection technology has formed four technical routes, each with its own advantages and disadvantages. Radar-based detection offers long detection range and strong resistance to weather interference, but its recognition capability for Low-altitude, Low-speed, Small (LSS) targets is weak, and ground clutter in urban environments causes a high false-alarm rate. Detection based on radio-frequency signals has strong specificity and a low false-alarm rate, but cannot cover UAVs in autonomous or radio-silent flight and imposes strict real-time requirements on signal analysis. Acoustic detection is low-cost but limited in working distance, and its detection performance is severely disturbed in high-noise urban environments. Detection based on optical images is highly intuitive and information-rich, but is sensitive to illumination conditions, suffers degraded performance in complex environments such as night and haze, and faces inherent technical bottlenecks in small-target detection scenarios.
The core defects of the prior art further restrict the practical effectiveness of low-altitude security. First, small-target detection performance is insufficient: when processing UAV targets smaller than 32×32 pixels, traditional deep-learning algorithms such as YOLO and Faster R-CNN struggle to detect them effectively, because feature information is scarce and downsampling causes feature loss, resulting in a high missed-detection rate. Second, environmental adaptability is poor: a single-modality system can hardly cope with complex scenarios such as illumination changes, bad weather, and electromagnetic interference, and an effective multi-modal complementarity mechanism is lacking. Third, the false-alarm rate is too high: interferers in the low-altitude environment, such as birds, balloons, and falling leaves, have motion characteristics similar to those of UAVs and are difficult for traditional algorithms to distinguish. Fourth, real-time performance can hardly meet application requirements: owing to the high computational complexity of multi-modal data processing, the processing delay on embedded devices often exceeds 200 milliseconds, making real-time early warning and response impossible and seriously affecting the practicability of the system.
Disclosure of the Invention
The invention aims to overcome the defects of the prior art by providing a multi-modal fusion method and system for robust detection and recognition of small UAV targets, so as to address the core defects of existing UAV detection technology, namely insufficient small-target detection performance, poor environmental adaptability, an excessively high false-alarm rate, and real-time performance that can hardly meet application requirements, thereby realizing high-precision, highly robust, and highly real-time detection and recognition of small low-altitude UAV targets while optimizing deployment cost and full-life-cycle benefit. To achieve the above purpose, the invention provides the following technical solutions. According to one aspect of the invention, a multi-modal fusion method for robust detection and recognition of small UAV targets is provided, comprising the following steps: S1, synchronously acquiring multi-modal data of a target area in real time, preprocessing the multi-modal data, and storing the data in a buffer queue, wherein the multi-modal data comprises