CN-121999328-A - Multi-mode tracking algorithm dynamic fusion decision method based on background complexity perception
Abstract
The invention discloses a multi-modal tracking algorithm dynamic fusion decision method based on background complexity perception, relating to the technical field of computer vision and multi-sensor fusion. The method comprises: collecting multi-source modal data and preprocessing it to generate input data in a unified format; constructing a multi-dimensional background complexity quantization model that extracts texture entropy TE, motion interference MI and an illumination fluctuation coefficient LF, obtaining a background complexity value BCV through adaptive weighting by the entropy weight method, and dividing complexity levels; dynamically adapting the fusion strategy to the complexity level, adopting feature-level weighted fusion at the low complexity level, and starting decision-level fusion with a modality redundancy check mechanism at the medium and high complexity levels; counting tracking performance indices over a sliding window and updating the model weighting coefficients and fusion strategy parameters online through adaptive gradient descent; and outputting a target tracking result that adapts to scene changes. The invention improves the adaptability and robustness of multi-modal tracking in complex scenes while balancing real-time performance and reliability.
Inventors
- XIE CANRONG
- WU ZHIWEN
- HU YICHAN
- CHEN QINGTING
- LI YANG
- NONG LANGQI
Assignees
- Guangxi University
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-22
Claims (10)
- 1. A multi-modal tracking algorithm dynamic fusion decision method based on background complexity perception, characterized by comprising the following steps: synchronously acquiring multi-source modal data from visible-light, infrared and millimeter-wave radar sensors, and generating multi-modal input data in a unified format after time alignment, spatial registration and noise suppression; constructing a multi-dimensional background complexity quantization model, extracting the texture entropy TE, motion interference MI and illumination fluctuation coefficient LF of the multi-modal input data, obtaining a background complexity value BCV through adaptive weighting by the entropy weight method, and dividing the scene into three complexity levels, low, medium and high, based on dynamic thresholds; adopting feature-level weighted fusion at the low complexity level, and starting decision-level fusion and introducing a modality redundancy check mechanism at the medium and high complexity levels; based on tracking performance indices counted over a sliding window, updating the weighting coefficients of the complexity quantization model and the fusion strategy parameters online through an adaptive gradient descent method, and outputting a target tracking result that adapts to scene changes.
- 2. The multi-modal tracking algorithm dynamic fusion decision method based on background complexity perception according to claim 1, wherein the preprocessing of the multi-source modal data comprises: using a time synchronization module to align the timestamps of the visible-light, infrared and millimeter-wave radar data, keeping the time deviation within a preset tolerance; projecting the millimeter-wave radar point cloud data onto the image pixel plane through a coordinate-system transformation matrix to complete spatial registration, the transformation matrix being determined by the sensor extrinsic calibration result; applying bilateral filtering to suppress noise in the infrared data, Gaussian filtering to smooth the visible-light data, and pass-through filtering to remove invalid points from the radar point cloud data; and finally generating a multi-modal data matrix of unified dimensions (illustrative sketch after the claims).
- 3. The multi-modal tracking algorithm dynamic fusion decision method based on background complexity perception according to claim 2, wherein the texture entropy TE is calculated as follows: a background region of preset size around the target center is selected as the analysis window; the distribution of gray values within the window is counted to obtain a gray histogram, and the occurrence probability $p_i$ of each gray level is calculated, where $p_i$ is the proportion of pixels of the $i$-th gray level within the analysis window; the texture entropy is calculated by the formula $TE = -\sum_{i=1}^{L} p_i \log_2 p_i$, where $L$ is the total number of gray levels in the background region; the larger the texture entropy value, the more complex the background texture and the greater the interference with tracking (illustrative sketch after the claims).
- 4. The multi-modal tracking algorithm dynamic fusion decision method based on background complexity perception according to claim 3, wherein the motion interference MI is calculated as follows: the optical flow field of the background region is solved with the LK (Lucas-Kanade) optical flow method, yielding for each background pixel a two-dimensional motion vector $(u_i, v_i)$, where $u_i$ and $v_i$ are the pixel's motion speeds in the horizontal and vertical directions respectively; the modulus of each motion vector is calculated as $m_i = \sqrt{u_i^2 + v_i^2}$, where $m_i$ represents the magnitude of the motion vector; the motion interference is the standard deviation of the moduli, $MI = \sigma(\{m_i\})$, where $\sigma(\cdot)$ is the standard-deviation function; the larger the motion interference value, the more moving objects are present in the background and the higher the occlusion risk to the tracked target (illustrative sketch after the claims).
- 5. The multi-modal tracking algorithm dynamic fusion decision method based on background complexity perception according to claim 4, wherein the illumination fluctuation coefficient LF is calculated as follows: the background regions of the current frame and the previous $N$ frames are selected, where $N$ is a preset positive integer representing the number of historical frames participating in the calculation; the gray mean $\mu_t$ of each frame's background region is calculated, where $\mu_0$ is the gray mean of the current frame's background region; the average of the previous $N$ frames' gray means is taken as the reference mean $\bar{\mu} = \frac{1}{N}\sum_{t=1}^{N}\mu_t$; the illumination fluctuation coefficient is calculated by the formula $LF = \frac{|\mu_0 - \bar{\mu}|}{\bar{\mu}}$; when the illumination fluctuation coefficient exceeds a preset threshold, an illumination mutation is determined and the medium-complexity-level fusion strategy is triggered (illustrative sketch after the claims).
- 6. The multi-modal tracking algorithm dynamic fusion decision method based on background complexity perception according to claim 5, wherein the background complexity value BCV adopts adaptive weighting based on the entropy weight method, specifically: min-max normalization is applied to the texture entropy TE, motion interference MI and illumination fluctuation coefficient LF to obtain normalized results $x'_1, x'_2, x'_3$, where $x'_1$ corresponds to the normalized TE, $x'_2$ to the normalized MI, and $x'_3$ to the normalized LF; the entropy weight coefficient of each index is calculated as follows: the entropy of the $j$-th index is $e_j = -\frac{1}{\ln n}\sum_{i=1}^{n} p_{ij}\ln p_{ij}$, where $n$ is the number of sample frames, $p_{ij} = x'_{ij} / \sum_{i=1}^{n} x'_{ij}$, and $x_{ij}$ is the original value of the $j$-th index in the $i$-th frame; the difference coefficient of the $j$-th index is $d_j = 1 - e_j$, and the larger the difference coefficient, the higher the discriminative power of the index for complexity; the entropy weight coefficient of the $j$-th index is $w_j = d_j / \sum_{k=1}^{3} d_k$, satisfying $\sum_{j=1}^{3} w_j = 1$; the background complexity value is calculated by the formula $BCV = w_1 x'_1 + w_2 x'_2 + w_3 x'_3$; the dynamic thresholds comprise a first threshold $T_1$ and a second threshold $T_2$: when $BCV < T_1$ the complexity level is low, when $T_1 \le BCV < T_2$ it is medium, and when $BCV \ge T_2$ it is high; $T_1$ and $T_2$ can be dynamically adjusted according to the tracking accuracy requirements of the actual application scenario (illustrative sketch after the claims).
- 7. The multi-modal tracking algorithm dynamic fusion decision method based on background complexity perception according to claim 1, wherein the feature-level weighted fusion is implemented as follows: deep feature vectors are extracted from each modality's data through a deep convolutional neural network, where the feature vectors of the visible-light and infrared data are output by the network's fully connected layer and the feature vector of the radar point cloud data is output by a point-cloud feature extraction sub-network; the modal feature vectors are concatenated into a fused feature matrix; a 1×1 convolution kernel is applied to compress the dimensionality of the fused feature matrix and eliminate feature redundancy; fusion weights are dynamically allocated based on the peak-to-sidelobe ratio PSR of each modality's response map, the weight coefficients being calculated by the formula $w_i = \frac{PSR_i}{\sum_{j=1}^{M} PSR_j}$, where $w_i$ is the weight of the $i$-th modality, $PSR_i$ is the peak-to-sidelobe ratio of the $i$-th modality's response map, and $M$ is the number of modalities; the higher the PSR value, the more prominent the peak of the response map and the smaller the interference; the compressed feature matrices are weighted and fused according to these weights to output the feature-level fusion result (illustrative sketch after the claims).
- 8. The multi-modal tracking algorithm dynamic fusion decision method based on background complexity perception according to claim 1, wherein the decision-level fusion comprises two stages, modality-independent decision and improved evidence-theory fusion: in the modality-independent decision stage, the visible-light modality uses a correlation filtering algorithm to output a tracking bounding box and confidence, the infrared modality uses a particle filtering algorithm to output a tracking bounding box and confidence, and the radar modality uses a Kalman filtering algorithm to output a target position and confidence, where the confidence is each modality's reliability assessment of its own tracking result; in the improved evidence-theory fusion stage, the confidence of each modality serves as an evidence body, and the basic probability assignment function is corrected by introducing a conflict coefficient $K$ over the frame of discernment $\Theta$; the corrected evidence bodies are then combined, and the fused target position, bounding box and comprehensive confidence are output (illustrative sketch after the claims).
- 9. The multi-modal tracking algorithm dynamic fusion decision method based on background complexity perception according to claim 1, wherein the specific flow of the modality redundancy check mechanism is as follows: the overlap ratio IOR between each modality's independently output tracking result and the decision-level fusion result is calculated by the formula $IOR = \frac{S(B_m \cap B_f)}{S(B_m \cup B_f)}$, where $B_m$ is the tracking bounding-box region of a single modality, $B_f$ is the tracking bounding-box region of the fusion result, $S(B_m \cap B_f)$ is the intersection area of the two regions, and $S(B_m \cup B_f)$ is their union area; when $IOR$ is smaller than a preset overlap threshold, the modality is judged abnormal; if the corrected overlap ratio is still smaller than the preset overlap threshold, the modality is temporarily masked until its overlap ratio over consecutive frames meets the preset overlap threshold, whereupon its participation right is restored (illustrative sketch after the claims).
- 10. The multi-modal tracking algorithm dynamic fusion decision method based on background complexity perception according to claim 1, wherein the online optimization update process comprises: setting the sliding-window size to a preset number of frames and counting the tracking performance indices within the window, the indices comprising the center location error CLE and the overlap ratio IOR; calculating the average overlap ratio $\overline{IOR}$ and average center location error $\overline{CLE}$ within the window; constructing the loss function $L = \lambda(1 - \overline{IOR}) + (1 - \lambda)\frac{\overline{CLE}}{D}$, where $\lambda$ is a balance coefficient and $D$ is the diagonal pixel length of the image; updating the weighting coefficients by adaptive gradient descent, $w' = w - \eta \nabla L(w)$, where $\nabla L(w)$ is the gradient of the loss function, $\eta$ is the adaptive learning rate, and $w$ and $w'$ are the weighting coefficients before and after the update respectively; the weighting parameters in the fusion strategy and the conflict-coefficient threshold in the evidence theory are updated synchronously, realizing online adaptive optimization of the model parameters (illustrative sketch after the claims).
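The preprocessing of claim 2 maps onto standard filtering primitives. Below is a minimal Python sketch assuming OpenCV and NumPy; the function name `preprocess`, the filter parameters, and the z-range bounds of the pass-through filter are illustrative choices, not values specified by the patent.

```python
import cv2
import numpy as np

def preprocess(visible, infrared, radar_points, z_min=-2.0, z_max=5.0):
    """Per-modality noise suppression per claim 2 (sketch).

    visible/infrared: uint8 grayscale frames; radar_points: (N, 3) xyz array.
    z_min/z_max are placeholder bounds for the pass-through filter.
    """
    # Gaussian smoothing for the visible-light frame.
    visible_f = cv2.GaussianBlur(visible, (5, 5), 0)
    # Bilateral filtering suppresses noise while preserving infrared edges.
    infrared_f = cv2.bilateralFilter(infrared, d=9, sigmaColor=75, sigmaSpace=75)
    # Pass-through filter: drop radar points outside a height window.
    mask = (radar_points[:, 2] >= z_min) & (radar_points[:, 2] <= z_max)
    return visible_f, infrared_f, radar_points[mask]
```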
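The texture entropy reconstructed in claim 3 is the Shannon entropy of the analysis window's gray histogram. A minimal NumPy sketch, assuming an 8-bit grayscale window; `texture_entropy` is a hypothetical name.

```python
import numpy as np

def texture_entropy(window, levels=256):
    """TE = -sum_i p_i * log2(p_i) over the window's gray histogram."""
    hist = np.bincount(window.ravel(), minlength=levels).astype(np.float64)
    p = hist / hist.sum()      # p_i: pixel proportion of the i-th gray level
    p = p[p > 0]               # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())
```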
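The motion interference of claim 4 is the standard deviation of optical-flow magnitudes over the background. The sketch below uses OpenCV's sparse Lucas-Kanade tracker on a regular grid of background points as a stand-in for a dense LK flow field; the grid spacing and function name are illustrative.

```python
import cv2
import numpy as np

def motion_interference(prev_gray, curr_gray, step=8):
    """MI = std of LK motion-vector magnitudes over a background grid."""
    h, w = prev_gray.shape
    # Sample a regular grid of background points as (x, y) for LK flow.
    ys, xs = np.mgrid[step // 2:h:step, step // 2:w:step]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
    pts = pts.astype(np.float32).reshape(-1, 1, 2)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    flow = (nxt - pts).reshape(-1, 2)[status.ravel() == 1]  # (u_i, v_i)
    mags = np.linalg.norm(flow, axis=1)   # m_i = sqrt(u_i^2 + v_i^2)
    return float(np.std(mags))            # MI = sigma({m_i})
```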
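The LF of claim 5 compares the current frame's background gray mean against the mean of the previous N frames. A trivial sketch, assuming the per-frame background means have already been computed:

```python
import numpy as np

def illumination_fluctuation(bg_means):
    """bg_means: background gray means, oldest first; last entry is the
    current frame, the preceding N entries are the history frames."""
    mu_now = float(bg_means[-1])
    mu_ref = float(np.mean(bg_means[:-1]))     # reference mean over N frames
    return abs(mu_now - mu_ref) / (mu_ref + 1e-12)  # LF
```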
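Claim 6's weighting is the textbook entropy weight method. A sketch, assuming a history matrix of raw (TE, MI, LF) samples; the thresholds t1 and t2 are placeholder values, since the patent leaves them application-dependent.

```python
import numpy as np

def entropy_weights(X):
    """X: (n_frames, 3) raw TE/MI/LF samples. Returns entropy-method weights."""
    n = X.shape[0]
    Xn = (X - X.min(0)) / (np.ptp(X, 0) + 1e-12)   # min-max normalization
    P = (Xn + 1e-12) / (Xn + 1e-12).sum(0)         # p_ij
    E = -(P * np.log(P)).sum(0) / np.log(n)        # entropy e_j of each index
    d = 1.0 - E                                    # difference coefficients d_j
    return d / d.sum()                             # weights w_j, summing to 1

def bcv_level(te, mi, lf, X_hist, t1=0.35, t2=0.7):
    """BCV = w1*x'1 + w2*x'2 + w3*x'3, then threshold into low/medium/high."""
    w = entropy_weights(X_hist)
    x = (np.array([te, mi, lf]) - X_hist.min(0)) / (np.ptp(X_hist, 0) + 1e-12)
    bcv = float(w @ np.clip(x, 0.0, 1.0))
    return bcv, ("low" if bcv < t1 else "medium" if bcv < t2 else "high")
```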
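The PSR-driven weights of claim 7 can be computed directly from each modality's correlation response map. A sketch assuming 2-D NumPy response maps; the sidelobe-exclusion radius is an illustrative choice.

```python
import numpy as np

def psr(response, exclude=5):
    """Peak-to-sidelobe ratio: (peak - mean(sidelobe)) / std(sidelobe)."""
    r0, c0 = np.unravel_index(np.argmax(response), response.shape)
    peak = response[r0, c0]
    mask = np.ones_like(response, dtype=bool)
    # Exclude a square neighborhood around the peak from the sidelobe.
    mask[max(0, r0 - exclude):r0 + exclude + 1,
         max(0, c0 - exclude):c0 + exclude + 1] = False
    side = response[mask]
    return float((peak - side.mean()) / (side.std() + 1e-12))

def fusion_weights(responses):
    """w_i = PSR_i / sum_j PSR_j over the M modality response maps."""
    scores = np.array([psr(r) for r in responses])
    return scores / scores.sum()
```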
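The exact correction formula of claim 8 is not reproduced in the source text. The sketch below shows one common conflict-handling variant (Yager-style, reassigning the conflict mass K to the frame of discernment Θ) for two evidence bodies over singleton hypotheses, purely as context for how a conflict coefficient enters the combination; it is not asserted to be the patent's formula.

```python
from itertools import product

def combine_yager(m1, m2, frame=("target", "background")):
    """Combine two mass functions over singleton hypotheses plus "Theta"
    (the whole frame of discernment); conflict mass K goes to Theta."""
    out = {h: 0.0 for h in list(frame) + ["Theta"]}
    K = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        if a == "Theta":
            inter = b                  # Theta ∩ B = B
        elif b == "Theta" or a == b:
            inter = a                  # A ∩ Theta = A, or A ∩ A = A
        else:
            inter = None               # disjoint singletons: conflict
        if inter is None:
            K += ma * mb
        else:
            out[inter] += ma * mb
    out["Theta"] += K                  # reassign the conflict mass to Theta
    return out, K
```

For example, combining m1 = {"target": 0.7, "background": 0.2, "Theta": 0.1} with m2 = {"target": 0.5, "background": 0.4, "Theta": 0.1} gives K = 0.7*0.4 + 0.2*0.5 = 0.38, which is added to Θ.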
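The overlap ratio of claim 9 is the standard intersection-over-union of two boxes. A sketch with corner-format boxes; the 0.3 threshold is illustrative.

```python
def ior(box_a, box_b):
    """IOR = area(A ∩ B) / area(A ∪ B); boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_abnormal(modal_box, fused_box, threshold=0.3):
    """Flag a modality whose box diverges from the fused result."""
    return ior(modal_box, fused_box) < threshold
```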
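Claim 10's update is a gradient step on the windowed loss. Since the loss depends on the weights only through the tracker's behavior, the sketch below uses a central-difference numerical gradient of a caller-supplied loss callable; the learning rate, step size and renormalization are placeholder choices.

```python
import numpy as np

def window_loss(iors, cles, lam, diag):
    """L = lam * (1 - mean IOR) + (1 - lam) * mean(CLE) / D."""
    return lam * (1.0 - np.mean(iors)) + (1.0 - lam) * np.mean(cles) / diag

def gradient_step(w, loss_fn, lr=0.01, eps=1e-4):
    """One descent step w' = w - eta * grad L(w), with a numerical gradient
    of the caller-supplied window-loss callable loss_fn(w)."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        dw = np.zeros_like(w)
        dw[i] = eps
        grad[i] = (loss_fn(w + dw) - loss_fn(w - dw)) / (2 * eps)
    w_new = np.clip(w - lr * grad, 1e-6, None)
    return w_new / w_new.sum()   # keep weights positive and summing to 1
```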
Description
Multi-mode tracking algorithm dynamic fusion decision method based on background complexity perception

Technical Field
The invention relates to the technical field of computer vision and multi-sensor fusion, and in particular to a method for achieving high-precision target tracking over visible-light, infrared and millimeter-wave radar multi-modal data based on multi-dimensional background complexity quantification, a dynamically adapted fusion strategy and online parameter optimization.

Background
Target tracking is one of the core technologies in the field of computer vision and is widely applied in key scenarios such as autonomous driving, UAV surveillance and public safety; its core requirement is to accurately locate a target in consecutive video frames and maintain tracking stability. Single-modality tracking methods rely on data from a single sensor and are easily limited in complex real environments: visible-light features fail in low-light and night scenes, the infrared modality lacks detail information, and the positioning accuracy of the millimeter-wave radar modality is limited, making it difficult to meet the tracking requirements of diverse scenarios. Multi-modal tracking can break through the performance bottleneck of a single modality by fusing the complementary information of different sensors such as visible light, infrared and millimeter-wave radar, significantly improving tracking robustness, and has become a research hotspot in recent years. As application scenarios continue to expand, the requirements on multi-modal tracking algorithms in terms of complex-environment adaptability, real-time performance and accuracy keep rising, driving innovation and optimization of fusion strategies.

One of the core challenges of multi-modal tracking is the efficient fusion of multi-source data. The prior art mainly adopts fixed fusion strategies, in single forms such as feature-level fusion and decision-level fusion. Feature-level fusion directly concatenates or weight-combines the features of each modality; although computationally efficient, it ignores redundant information and conflicting data between modalities and easily introduces interference in background-clutter scenes. Decision-level fusion lets each modality decide independently and then synthesizes the results; although it can reduce the influence of single-modality errors, its computational complexity is higher and it is difficult to meet real-time requirements. The common problem of these fixed fusion strategies is the lack of adaptability to dynamic scene changes: the same fusion logic is applied regardless of background complexity, causing efficiency redundancy in simple scenes and insufficient accuracy in complex scenes, so that tracking performance and computational efficiency cannot both be achieved. Dynamic change of the background environment is a key factor affecting multi-modal tracking performance, and scene complexity in practical applications is affected by many factors, including the degree of background texture clutter, the number of moving interference targets, and illumination intensity fluctuation.

Most conventional multi-modal tracking algorithms do not build a systematic background complexity quantization model and struggle to accurately describe dynamic scene characteristics through simple environment judgments or empirically tuned threshold parameters alone. Some algorithms attempt to introduce a scene-adaptive mechanism but rely on single-dimensional environmental indices, such as strategies adjusted by illumination change or occlusion conditions, and lack multi-dimensional comprehensive evaluation, so scene-judgment accuracy is insufficient. In addition, parameter updates in existing algorithms are mostly offline preconfiguration or fixed-step adjustments that cannot be optimized in real time from tracking-performance feedback, making it difficult to adapt to complex and changeable practical scenarios and limiting the robustness and generalization capability of the tracking algorithm. Although multi-modal tracking technology has made certain progress, in practical application it still faces several problems to be solved: first, the spatio-temporal alignment precision of multi-modal data is insufficient, and differences in sampling frequency and resolution between sensors easily cause data mismatch that degrades the fusion effect; second, background complexity assessment lacks a scientific quantification method and an adaptive weighting mechanism, making it difficult to accurately reflect the degree of interference a scene imposes on tracking; third, the dynamic adaptability of the fusion strategy to scene complexity is insufficient, fusion