CN-122024104-A - Anti-unmanned aerial vehicle tracking method and device based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling
Abstract
The invention relates to an anti-unmanned aerial vehicle (anti-UAV) tracking method based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling, belonging to the field of computer vision. The method comprises: acquiring synchronized infrared and visible-light video streams from an anti-UAV system and initializing a target template frame and a current search frame; extracting multi-scale latent feature representations of the infrared and visible-light images with a diffusion model, and performing feature decoupling and detail preservation through a dense reversible neural network; using an entropy-guided adaptive search-region adjustment strategy to compute the conditional information entropy of the dual-modal features and dynamically adjust the expansion factor of the search region so that the target always remains in the field of view; constructing a dense attention mask module that fuses the multi-modal features, generates a target segmentation mask, and guides feature focusing through the mask; and outputting a target classification score map and bounding-box regression parameters with a prediction head, optimizing the network with a coverage-type minimum point distance intersection-over-union (c-MPDIoU) loss function, and outputting the target position.
Inventors
- Tang Lun
- Du Tanxi
- Chen Qianbin
Assignees
- Chongqing University of Posts and Telecommunications (重庆邮电大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-02-05
Claims (9)
- 1. An anti-unmanned aerial vehicle tracking method based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling, characterized by comprising the following steps: S1, acquiring synchronized infrared and visible-light video streams of an anti-unmanned aerial vehicle system, and initializing a target template frame and a current search frame; S2, constructing a dual-stream feature extraction network based on a diffusion model, inputting the infrared and visible-light images into respective diffusion denoising branches, extracting multi-scale latent feature representations, and performing feature decoupling and detail preservation through a dense reversible neural network module; S3, using an entropy-guided adaptive search-region adjustment strategy to compute the conditional information entropy of the bimodal features to evaluate modal reliability, and dynamically adjusting the expansion factor of the search region in combination with the target existence probability so that the target always remains in the field of view; S4, constructing a dense attention mask module, fusing the multi-modal features to generate a target segmentation mask, and guiding feature focusing through the mask; S5, outputting a target classification score map and bounding-box regression parameters with a prediction head, optimizing the network with a coverage-type minimum point distance intersection-over-union (c-MPDIoU) loss function, and outputting the final target position.
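The per-frame flow of steps S2-S5 can be sketched as a minimal Python orchestration. All component functions (`extract`, `adjust_search`, `fuse_mask`, `predict`) are hypothetical stand-ins for the modules described in the claim, not the patent's implementation:

```python
import numpy as np

def track_step(template, search_ir, search_vis,
               extract, adjust_search, fuse_mask, predict):
    """One tracking iteration following steps S2-S5 of claim 1.
    The component callables are illustrative placeholders."""
    f_ir = extract(search_ir)              # S2: diffusion-based features, IR branch
    f_vis = extract(search_vis)            # S2: visible-light branch
    factor = adjust_search(f_ir, f_vis)    # S3: entropy-guided search factor
    fused, mask = fuse_mask(f_ir, f_vis)   # S4: dense attention mask fusion
    score_map, bbox = predict(extract(template), fused * mask)  # S5: prediction head
    return factor, score_map, bbox
```

With stub components, `track_step` simply threads the search frames through the pipeline and returns the updated search factor, score map, and box.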
- 2. The anti-unmanned aerial vehicle tracking method based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling according to claim 1, wherein in step S2 the dual-stream feature extraction network based on the diffusion model comprises a forward diffusion process and a reverse feature extraction process: the forward process gradually adds Gaussian noise to the input image $x_0$ over $t$ time steps; the noise image $x_t$ is expressed as $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, where $\epsilon$ is standard Gaussian noise and $\bar{\alpha}_t$ is the cumulative noise variance coefficient; the reverse process uses a denoising network $\epsilon_\theta$ to predict the noise and extract latent features $F$; the feature extraction formula is defined as $F = \mathrm{Concat}\big(h_1(x_t,t),\dots,h_L(x_t,t)\big)$, where $h_l$ denotes an intermediate-layer feature map of the denoising network and $\mathrm{Concat}(\cdot)$ is the channel-wise concatenation operation.
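The forward process in this claim is the standard DDPM noising step $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$. A minimal NumPy sketch; the noise schedule `betas` and function name are illustrative assumptions, not from the patent:

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng):
    """Sample q(x_t | x_0): add Gaussian noise at time step t."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]        # cumulative noise variance coefficient
    eps = rng.standard_normal(x0.shape)      # standard Gaussian noise
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps
```

For small `t` (little accumulated noise) the output stays close to `x0`; for large `t` it approaches pure Gaussian noise, which is what the reverse denoising network learns to invert.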
- 3. The anti-unmanned aerial vehicle tracking method based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling according to claim 1, wherein in step S2 the dense reversible neural network module adopts an affine coupling layer structure: the input feature $x$ is split into two parts $x_1$ and $x_2$, and the output $(y_1, y_2)$ is computed as $y_1 = x_1$, $y_2 = x_2 \odot \exp\big(s(x_1)\big) + t(x_1)$, where $s(\cdot)$ and $t(\cdot)$ are arbitrary convolutional neural network transform functions.
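The key property of the affine coupling layer is exact invertibility, which is what makes the module "reversible" and detail-preserving. A minimal NumPy sketch under the assumption that `s_fn` and `t_fn` stand in for the patent's (unspecified) convolutional transforms:

```python
import numpy as np

def affine_coupling_forward(x, s_fn, t_fn):
    """y1 = x1; y2 = x2 * exp(s(x1)) + t(x1)."""
    x1, x2 = np.split(x, 2, axis=-1)        # split features into two halves
    y2 = x2 * np.exp(s_fn(x1)) + t_fn(x1)   # affine transform conditioned on x1
    return np.concatenate([x1, y2], axis=-1)

def affine_coupling_inverse(y, s_fn, t_fn):
    """Exact inverse: x2 = (y2 - t(y1)) * exp(-s(y1))."""
    y1, y2 = np.split(y, 2, axis=-1)
    x2 = (y2 - t_fn(y1)) * np.exp(-s_fn(y1))
    return np.concatenate([y1, x2], axis=-1)
```

Because the inverse reuses `s_fn` and `t_fn` as-is, the transforms themselves need not be invertible; no information is lost regardless of how expressive they are.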
- 4. The anti-unmanned aerial vehicle tracking method based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling according to claim 1, wherein in step S3 the entropy-guided adaptive search-region adjustment strategy is computed as follows: first, the conditional information entropy of each modality is estimated with the classification prediction head to quantify the uncertainty of modality $m \in \{ir, vis\}$, where $ir$ denotes the infrared image and $vis$ the visible-light image: $H(Y \mid F_m) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{y} p(y \mid f_i^m)\log p(y \mid f_i^m)$, where $F_m$ is the feature of modality $m$, $y$ is the category label, $p(y \mid f_i^m)$ is the probability distribution output by the classification head, $f_i^m$ is the $i$-th feature vector of modality $m$, and $N$ is the total number of feature vectors of modality $m$; then, the fused target existence probability $P_t$ is computed, and the search factor $\alpha_t$ is dynamically calculated in combination with the modal uncertainty: $\alpha_t = \begin{cases} \min\big(\alpha_{t-1} + s\,(1+\lambda\bar{H}),\ \alpha_{\max}\big), & P_t < \tau \\ \max\big(\alpha_{t-1} - s,\ \alpha_0\big), & P_t \ge \tau \end{cases}$, where $\alpha_{t-1}$ is the current search factor, $s$ is the step size, $\bar{H}$ is the bimodal average normalized information entropy, $\lambda$ is the entropy weight coefficient, $\alpha_{\max}$ is a preset maximum search factor threshold, $\tau$ is the confidence threshold, and $\alpha_0$ is the reference search factor.
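The entropy estimate and search-factor update can be sketched in NumPy. The exact update rule is elided in the published text, so the piecewise rule below (expand when confidence is low, faster when entropy is high; shrink back toward a base factor otherwise) is an illustrative reconstruction, and all default parameter values are assumptions:

```python
import numpy as np

def normalized_entropy(probs):
    """Mean Shannon entropy of N class distributions, normalized to [0, 1]."""
    p = np.clip(probs, 1e-12, 1.0)
    h = -(p * np.log(p)).sum(axis=-1)              # entropy per feature vector
    return float(h.mean() / np.log(p.shape[-1]))   # divide by log(#classes)

def update_search_factor(alpha, p_target, h_ir, h_vis,
                         step=0.1, lam=0.5, alpha_max=6.0,
                         tau=0.6, alpha_base=2.0):
    """Entropy-weighted adaptive search-factor update (illustrative rule)."""
    h_bar = 0.5 * (h_ir + h_vis)   # bimodal average normalized entropy
    if p_target < tau:             # low confidence: expand, more if entropy is high
        return min(alpha + step * (1.0 + lam * h_bar), alpha_max)
    return max(alpha - step, alpha_base)  # confident: shrink toward base factor
```

Under this rule, unreliable modalities (high entropy) accelerate search-region growth, capturing the claim's idea that expansion should respond to modal uncertainty and not to classification confidence alone.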
- 5. The anti-unmanned aerial vehicle tracking method based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling according to claim 1, wherein in step S4 the dense attention mask module fuses the infrared and visible-light features with a cross-attention mechanism to generate the fused feature $F_{fuse}$: $Q = W_Q F_{ir}$, $K = W_K F_{vis}$, $V = W_V F_{vis}$, $F_{fuse} = \mathrm{Conv}\Big(\mathrm{Concat}\big(F_{ir},\ \mathrm{softmax}\big(QK^{T}/\sqrt{d_k}\big)V\big)\Big)$, where $F_{ir}$ denotes the infrared feature, $F_{vis}$ denotes the visible-light feature, $W$ represents a linear transformation, $Q$ is the query vector, $K$ is the key vector, $V$ is the value vector, $d_k$ is the key dimension, $T$ denotes the matrix transpose, softmax is the normalized exponential function, Concat denotes concatenation along the channel dimension, and Conv denotes a convolution operation; a spatial mask $M$ is generated by Sigmoid activation and used to weight the fused feature: $M = \mathrm{Sigmoid}\big(\mathrm{Conv}(F_{fuse})\big)$, $F' = M \odot F_{fuse}$, where $\odot$ denotes element-wise multiplication.
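The cross-attention fusion and sigmoid masking can be sketched in NumPy on 2-D feature matrices (tokens × channels). The choice of infrared as query and visible light as key/value, and the plain matrix projections standing in for the Conv/linear layers, are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilized exponent
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(f_ir, f_vis, Wq, Wk, Wv):
    """Query from infrared; key/value from visible light; channel concat."""
    Q, K, V = f_ir @ Wq, f_vis @ Wk, f_vis @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # scaled dot-product attention
    return np.concatenate([f_ir, attn @ V], axis=-1)  # Concat along channels

def sigmoid_mask(logits):
    """Spatial mask M in (0, 1) for element-wise weighting F' = M * F."""
    return 1.0 / (1.0 + np.exp(-logits))
```

Each row of the attention matrix sums to 1, so attended visible-light features are a convex combination per infrared token before concatenation.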
- 6. The anti-unmanned aerial vehicle tracking method based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling according to claim 1, wherein in step S5 the coverage-type minimum point distance intersection-over-union (c-MPDIoU) loss function is defined as follows: for a micro unmanned aerial vehicle target, a distance penalty term is constructed based on the minimum bounding rectangle, and the loss function is computed as $\mathcal{L}_{c\text{-}MPDIoU} = 1 - IoU + \frac{d_1^2}{w_c^2 + h_c^2} + \frac{d_2^2}{w_c^2 + h_c^2}$, where $IoU$ is the intersection-over-union of the predicted box and the ground-truth box, $d_1$ and $d_2$ are the Euclidean distances between the top-left corners and between the bottom-right corners of the predicted and ground-truth boxes respectively, and $w_c$ and $h_c$ are the width and height of the minimum rectangle tightly enclosing both the predicted and ground-truth boxes; the total loss function $\mathcal{L}$ consists of the classification loss, the c-MPDIoU regression loss, and the mask loss: $\mathcal{L} = \lambda_1 \mathcal{L}_{cls} + \lambda_2 \mathcal{L}_{c\text{-}MPDIoU} + \lambda_3 \mathcal{L}_{mask}$, where $\mathcal{L}_{cls}$ is the classification focal loss, $\mathcal{L}_{c\text{-}MPDIoU}$ is the coverage-type minimum point distance IoU regression loss, $\mathcal{L}_{mask}$ is the mask loss, and $\lambda_1$, $\lambda_2$, $\lambda_3$ are the weight coefficients of the classification loss, the c-MPDIoU regression loss, and the mask loss, respectively.
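The distinguishing feature of this loss is normalizing the corner-distance penalty by the diagonal of the minimum enclosing rectangle of the two boxes (rather than the full-image size, as in MPDIoU), so the penalty stays meaningful for tiny targets. A minimal sketch for boxes in `[x1, y1, x2, y2]` format; the function name and epsilon are assumptions:

```python
def c_mpdiou_loss(pred, gt, eps=1e-12):
    """c-MPDIoU loss: 1 - IoU + corner-distance penalties normalized
    by the diagonal of the minimum rectangle enclosing both boxes."""
    # intersection-over-union
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + eps)
    # squared distances between matching corners (top-left, bottom-right)
    d1_sq = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2_sq = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    # minimum enclosing rectangle: the "coverage-type" normalizer
    wc = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hc = max(pred[3], gt[3]) - min(pred[1], gt[1])
    norm = wc ** 2 + hc ** 2 + eps
    return 1.0 - iou + d1_sq / norm + d2_sq / norm
```

The loss is zero for a perfect match and grows with both overlap loss and corner displacement, with the penalty scale tied to the boxes themselves rather than the image.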
- 7. An apparatus comprising an image acquisition module, a processor, an input device, an output device, and a memory, which are interconnected, wherein the image acquisition module is configured to acquire synchronized infrared and visible-light video streams, the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the anti-unmanned aerial vehicle tracking method based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling of any one of claims 1-6.
- 8. A computer-readable storage medium having a computer program stored thereon which, when executed by a processor, implements the anti-unmanned aerial vehicle tracking method based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling of any one of claims 1-6.
- 9. A computer program product comprising a computer program which, when executed by a processor, implements the anti-unmanned aerial vehicle tracking method based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling of any one of claims 1-6.
Description
Anti-unmanned aerial vehicle tracking method and device based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling
Technical Field
The invention belongs to the technical fields of computer vision, deep learning, and optoelectronic signal processing, and relates to an anti-unmanned aerial vehicle tracking method and device based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling.
Background
With the rapid development of the low-altitude economy, miniature unmanned aerial vehicles (UAVs) are increasingly widely used in fields such as aerial photography and logistics, but they also bring safety hazards such as unauthorized 'black flights' that disrupt air traffic and privacy violations. Anti-UAV tracking technology, as a core link of a countermeasure system, is responsible for continuously locking the target position in complex dynamic scenes. Existing anti-UAV tracking methods mainly face four challenges. First, single-modality limitations: visible-light images fail at night or under smoke occlusion, while infrared images lack texture detail. Although existing multi-modal fusion methods try to combine the two, the pronounced modal heterogeneity between modalities means that direct fusion often leads to feature mutual exclusion and even a '1+1 < 1' effect. Second, loss of high-frequency information during feature extraction: the edge and texture details of a miniature UAV are easily lost in the downsampling process of a traditional convolutional neural network (CNN) or Transformer. Diffusion models, despite their powerful generation and feature-representation capabilities, have not yet been effectively applied to feature extraction in real-time tracking tasks.
Furthermore, small targets easily move out of view. Existing local-search trackers (e.g., FocusTrack) tend to lose the target when it moves out of a fixed search region under severe motion or camera shake. Although FocusTrack proposes adaptive search-region adjustment (SRA), it relies only on a single classification confidence and does not consider how environmental imaging quality (e.g., modality information imbalance) interferes with that confidence. Finally, small-target regression accuracy is low: a UAV target occupies a very small fraction of the image, the conventional IoU loss is insensitive to slight target offsets, and the penalty term of existing improved versions (such as MPDIoU) fails for very small targets because the denominator is usually the full-image size.
Disclosure of Invention
In view of the above, the present invention aims to provide an anti-unmanned aerial vehicle tracking method and device based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling.
In order to achieve the above purpose, the present invention provides the following technical solutions: In a first aspect, the present invention provides an anti-unmanned aerial vehicle tracking method based on multi-modal diffusion feature enhancement and entropy-guided adaptive sampling, comprising the following steps: S1, acquiring synchronized infrared and visible-light video streams of an anti-unmanned aerial vehicle system, and initializing a target template frame and a current search frame; S2, constructing a dual-stream feature extraction network based on a diffusion model, inputting the infrared and visible-light images into respective diffusion denoising branches, extracting multi-scale latent feature representations, and performing feature decoupling and detail preservation through a dense reversible neural network module; S3, using an entropy-guided adaptive search-region adjustment strategy to compute the conditional information entropy of the bimodal features to evaluate modal reliability, and dynamically adjusting the expansion factor of the search region in combination with the target existence probability so that the target always remains in the field of view; S4, constructing a dense attention mask module, fusing the multi-modal features to generate a target segmentation mask, and guiding feature focusing through the mask; S5, outputting a target classification score map and bounding-box regression parameters with a prediction head, optimizing the network with a coverage-type minimum point distance intersection-over-union loss function, and outputting the final target position. Further, in step S2, the dual-stream feature extraction network based on the diffusion model includes a forward diffusion process and a reverse feature extraction process: the forward process gradually adds Gaussian noise to the input image over t time steps; the noise image