CN-121982585-A - Target tracking method, device and equipment based on multi-unmanned aerial vehicle feature interaction
Abstract
The application relates to a target tracking method, device and equipment based on multi-unmanned aerial vehicle (multi-UAV) feature interaction. In the method, a UAV in a cluster acquires a real-time image sequence and marks the target in the first frame to obtain a target template image, and also acquires the real-time image sequences and target template images of the other UAVs in the cluster. A feature extraction unit extracts features from the next frame of the UAV's own image sequence, the other UAVs' image sequences and the target template images, yielding self-search features, other search features, self target template features and other target template features. The self-search, other search and other target template features are input into a UAV feature propagation unit, which enhances the self-search features with the other search features and the other template features to obtain a first and a second enhanced feature. Similarity estimation between each enhanced feature and the self target template features yields a first and a second target response map, which a localization-aware response map association unit fuses by adaptively coordinating their key information into a single target response map. A detection head then outputs the target template image for the frame following the first frame, so that subsequent frames are tracked in turn, improving real-time tracking accuracy.
Inventors
- SUN HAO
- WU HAN
- JI KEFENG
- KUANG GANGYAO
Assignees
- National University of Defense Technology, People's Liberation Army of China (中国人民解放军国防科技大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20260122
Claims (9)
- 1. A target tracking method based on multi-unmanned aerial vehicle feature interaction, the method being implemented in each unmanned aerial vehicle in an unmanned aerial vehicle cluster, the method comprising: acquiring a real-time image sequence, wherein the real-time image sequence comprises multiple frames ordered in time, marking a target to be tracked in the first frame image to obtain a target template image, and simultaneously acquiring the real-time image sequences and target template images of the other unmanned aerial vehicles in the cluster; using a feature extraction unit to respectively extract features from the next frame image of the self real-time image sequence and of the other real-time image sequences, the self target template image and the other target template images, to obtain self-search features, other search features, self target template features and other target template features; inputting the self-search features, the other search features and the other target template features into an unmanned aerial vehicle feature propagation unit, and enhancing the self-search features with the other search features and the other target template features, respectively, to obtain a first enhanced feature and a second enhanced feature; performing similarity estimation with the first enhanced feature, the second enhanced feature and the self target template features to obtain a first target response map and a second target response map over the search area, and then using a localization-aware response map association unit to adaptively coordinate the key information in the first and second target response maps to obtain a target response map; and inputting the target response map into a detection head to obtain the target template image in the frame following the first frame of the self real-time image sequence, and sequentially obtaining the target template image in each frame of the self real-time image sequence to realize target tracking.
- 2. The target tracking method based on multi-unmanned aerial vehicle feature interaction according to claim 1, wherein, in the unmanned aerial vehicle feature propagation unit: a temporal consistency enhancement branch uses the other target template features to enhance the consistent target characterization in the self-search features, obtaining the first enhanced feature; and a spatial complementary fusion branch combines the other search features, which carry target information from the different unmanned aerial vehicle viewing angles, to promote the target characterization in the self-search features, obtaining the second enhanced feature.
- 3. The target tracking method based on multi-unmanned aerial vehicle feature interaction according to claim 2, wherein, in the temporal consistency enhancement branch: the self-search features, after passing through a linear layer, are processed by parallel forward and backward branches to obtain a forward feature and a backward feature, respectively; the other target template features, after passing through a linear layer, are added to the self-search features that have passed through the linear layer, and the sum is fed to a local-feature-guided activation layer to generate an adjustment coefficient; and the adjustment coefficient is multiplied with the forward feature and the backward feature respectively to realize feature weighting, the weighted results are added, passed through a linear layer, and added to the initial self-search features to obtain the first enhanced feature.
- 4. The target tracking method based on multi-unmanned aerial vehicle feature interaction according to claim 3, wherein the local-feature-guided activation layer comprises a two-dimensional convolution block, a Sigmoid linear block and a second two-dimensional convolution block connected in sequence.
- 5. The target tracking method based on multi-unmanned aerial vehicle feature interaction according to claim 4, wherein, in the spatial complementary fusion branch: the self-search features, after passing through a linear layer, are processed by parallel forward and backward branches to obtain a forward feature and a backward feature, respectively; the other search features, after passing through a linear layer, are likewise processed by parallel forward and backward branches to obtain a forward feature and a backward feature, respectively; the self-search features and the other search features, after passing through the linear layers, are concatenated to obtain a concatenated feature, which is fed to a global-feature-guided activation layer to generate an adjustment coefficient; and the adjustment coefficient is multiplied with the forward and backward features corresponding to the self-search features and with the forward and backward features corresponding to the other search features to realize feature weighting; the weighted forward and backward results of the self-search features are added, the weighted forward and backward results of the other search features are added, the two sums are then added together, passed through a linear layer, and added to the initial self-search features to obtain the second enhanced feature.
- 6. The target tracking method based on multi-unmanned aerial vehicle feature interaction according to claim 5, wherein, in the temporal consistency enhancement branch and the spatial complementary fusion branch: the forward branch is formed by sequentially connecting a forward one-dimensional convolution and a forward state space model; and the backward branch is formed by sequentially connecting a backward one-dimensional convolution and a backward state space model.
- 7. The target tracking method based on multi-unmanned aerial vehicle feature interaction according to claim 6, wherein, in the localization-aware response map association unit: the first target response map and the second target response map, after passing through a convolution layer and a Sigmoid linear block, are passed through an X-axis average pooling layer and a Y-axis average pooling layer to obtain an X-axis compressed feature and a Y-axis compressed feature for each of the two maps; the X-axis compressed features of the first and second target response maps are concatenated to obtain an X-axis compressed concatenated feature, and the Y-axis compressed features of the first and second target response maps are concatenated to obtain a Y-axis compressed concatenated feature; the X-axis and Y-axis compressed concatenated features are each subjected to linear splitting to obtain a first and a second X-axis weighting coefficient and a first and a second Y-axis weighting coefficient; and the first target response map and the second target response map are weighted and fused with the first and second X-axis weighting coefficients and the first and second Y-axis weighting coefficients to obtain the target response map.
- 8. A target tracking device based on multi-unmanned aerial vehicle feature interaction, the device comprising: a real-time image acquisition module for acquiring a real-time image sequence, wherein the real-time image sequence comprises multiple frames ordered in time, marking a target to be tracked in the first frame image to obtain a target template image, and acquiring the real-time image sequences and target template images of the other unmanned aerial vehicles in the cluster; a feature extraction module for extracting features, by means of the feature extraction unit, from the next frame image of the self real-time image sequence and of the other real-time image sequences, the self target template image and the other target template images, to obtain self-search features, other search features, self target template features and other target template features; a feature interaction enhancement module for inputting the self-search features, the other search features and the other target template features into the unmanned aerial vehicle feature propagation unit, and enhancing the self-search features with the other search features and the other target template features to obtain a first enhanced feature and a second enhanced feature; a target response map obtaining module for performing similarity estimation with the first enhanced feature, the second enhanced feature and the self target template features to obtain a first target response map and a second target response map over the search area, and then using a localization-aware response map association unit to adaptively coordinate the key information in the first and second target response maps to obtain a target response map; and a target tracking module for inputting the target response map into the detection head, obtaining the target template image in the frame following the first frame of the self real-time image sequence, and sequentially obtaining the target template image in each frame of the self real-time image sequence to realize target tracking.
- 9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
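The per-frame flow of claim 1 can be sketched as follows. This is a minimal, hypothetical Python skeleton whose functions are toy stand-ins for the patent's learned modules (feature extractor, propagation unit, association unit, detection head); it is intended only to show the data flow, not the actual computation.

```python
# Toy skeleton of the per-frame tracking flow of claim 1.
# Every function below is a hypothetical stand-in, not the patented network.

def extract_features(image):
    # Stand-in feature extractor: identity on a 1-D "image".
    return list(image)

def propagate(self_search, other_searches, other_templates):
    # Stand-in feature propagation unit: returns two "enhanced" copies,
    # one per branch (temporal consistency / spatial complementarity).
    first = [x + 0.1 for x in self_search]
    second = [x + 0.2 for x in self_search]
    return first, second

def similarity(feat, template):
    # Stand-in similarity estimation: elementwise product as a response map.
    return [f * t for f, t in zip(feat, template)]

def associate(resp1, resp2):
    # Stand-in localization-aware association: average the two maps.
    return [(a + b) / 2 for a, b in zip(resp1, resp2)]

def detect(response):
    # Stand-in detection head: index of the peak response.
    return max(range(len(response)), key=lambda i: response[i])

def track_frame(self_frame, self_template, other_frames, other_templates):
    s = extract_features(self_frame)
    t = extract_features(self_template)
    os_ = [extract_features(f) for f in other_frames]
    ot = [extract_features(tp) for tp in other_templates]
    e1, e2 = propagate(s, os_, ot)
    response = associate(similarity(e1, t), similarity(e2, t))
    return detect(response)
```

For example, `track_frame([0.1, 0.9, 0.2], [1.0, 1.0, 1.0], [[0.1, 0.8, 0.3]], [[1.0, 1.0, 1.0]])` localizes the peak at index 1.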
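The gating of the temporal consistency enhancement branch (claim 3) can be illustrated on toy 1-D feature vectors. In this hedged sketch the linear layers are identities, the forward/backward branches are placeholders, and the local-feature-guided activation layer is replaced by a plain sigmoid; only the structure survives: a gate computed from template plus search, multiplicative weighting of both directions, and a residual connection.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def temporal_consistency_enhance(self_search, other_template):
    # Placeholder forward/backward branches (claim 6 uses a 1-D conv + SSM
    # per direction; here both are trivially the input itself).
    forward = list(self_search)
    backward = list(self_search)
    # Adjustment coefficient: other template added to self search, then a
    # stand-in for the local-feature-guided activation layer (sigmoid).
    gate = [sigmoid(t + s) for t, s in zip(other_template, self_search)]
    # Weight both directions with the gate, sum, and add the residual.
    weighted = [g * f + g * b for g, f, b in zip(gate, forward, backward)]
    return [w + s for w, s in zip(weighted, self_search)]
```

Because the gate is in (0, 1), each output element stays between the residual value and three times it, mirroring how the branch modulates rather than replaces the self-search features.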
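Claim 6's forward and backward branches (a one-dimensional convolution followed by a state space model scanned in each direction) can be illustrated with a minimal diagonal SSM recurrence, h_t = a*h_{t-1} + b*x_t, y_t = c*h_t, in plain Python. The coefficients here are fixed toy values, whereas the patent's branches are learned; the backward branch simply runs the same computation on the time-reversed sequence.

```python
def conv1d(x, kernel):
    # Same-length causal 1-D convolution with zero padding on the left.
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(x))]

def ssm_scan(x, a=0.5, b=1.0, c=1.0):
    # Minimal diagonal state space model:
    #   h_t = a*h_{t-1} + b*x_t,  y_t = c*h_t
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return ys

def forward_branch(x, kernel=(0.5, 0.5)):
    # Forward 1-D convolution, then forward state space model.
    return ssm_scan(conv1d(x, kernel))

def backward_branch(x, kernel=(0.5, 0.5)):
    # Same computation on the reversed sequence, flipped back afterwards.
    return list(reversed(ssm_scan(conv1d(list(reversed(x)), kernel))))
```

An impulse `[1.0, 0.0, 0.0]` through the forward branch decays as `[0.5, 0.75, 0.375]`, while the backward branch produces the mirror-image response for an impulse at the other end, which is why combining both directions gives the search features bidirectional context.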
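The localization-aware association of claim 7 can be sketched on small 2-D response maps. This is a hedged illustration: the convolution layer plus Sigmoid linear block is replaced by a plain sigmoid, and the "linear splitting" that produces the per-map weighting coefficients is replaced by a two-way softmax; only the axis-wise pooling, concatenation, splitting and weighted fusion structure is kept.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def axis_pool(resp):
    # Average pooling along each axis of a 2-D map (list of rows):
    # x_pool[j] averages column j, y_pool[i] averages row i.
    h, w = len(resp), len(resp[0])
    x_pool = [sum(resp[i][j] for i in range(h)) / h for j in range(w)]
    y_pool = [sum(row) / w for row in resp]
    return x_pool, y_pool

def associate_maps(resp1, resp2):
    # Stand-in for convolution + Sigmoid linear block: plain sigmoid.
    x1, y1 = axis_pool([[sigmoid(v) for v in row] for row in resp1])
    x2, y2 = axis_pool([[sigmoid(v) for v in row] for row in resp2])

    def split(a, b):
        # Stand-in "linear splitting": softmax over the two maps per
        # axis position, yielding a first and second weighting coefficient.
        pairs = [(math.exp(u), math.exp(v)) for u, v in zip(a, b)]
        return ([p / (p + q) for p, q in pairs],
                [q / (p + q) for p, q in pairs])

    wx1, wx2 = split(x1, x2)
    wy1, wy2 = split(y1, y2)
    # Weighted fusion of the two maps with the axis coefficients.
    h, w = len(resp1), len(resp1[0])
    return [[wy1[i] * wx1[j] * resp1[i][j] + wy2[i] * wx2[j] * resp2[i][j]
             for j in range(w)] for i in range(h)]
```

With two identical maps every coefficient is 0.5, so each fused value is half the input value; when one map has a sharper peak, its axis coefficients grow and its response dominates the fusion.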
Description
Target tracking method, device and equipment based on multi-unmanned aerial vehicle feature interaction

Technical Field

The application relates to the technical field of unmanned aerial vehicle image recognition, in particular to a target tracking method, device and equipment based on multi-unmanned aerial vehicle feature interaction.

Background

Unmanned aerial vehicles (UAVs) play an important role in fields such as intelligent security, military reconnaissance and urban management, thanks to their low cost, high maneuverability and wide viewing angle. However, as task complexity increases, the limitations of single-UAV aerial video in coverage and perception capability have gradually become apparent, making multi-UAV cooperative systems a research hotspot. Multi-UAV target tracking aims to combine the multi-view information of different UAVs and continuously predict the subsequent state of a target from its initial state; it is receiving growing research attention because it allows continuous observation of the target under occlusion, emergencies and complex backgrounds. However, the differing viewing angles of the UAVs cause differences in target appearance and illumination, which poses a serious challenge for multi-UAV target tracking. The twin (Siamese) network has become the mainstream form of target tracking thanks to its accuracy and real-time performance: it generates a target response map by computing the similarity between a target template and a search area, indicating the likely position of the target in the current frame.
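The twin-network similarity computation described above can be sketched as a sliding-window cross-correlation between a template and a search area. This is a generic plain-Python illustration of the principle, not the patent's learned implementation, which correlates deep features rather than raw pixels.

```python
def cross_correlate(search, template):
    # Response map: correlation score of the template at every valid
    # position of the 2-D search area (both given as lists of rows).
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    resp = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            score = sum(search[i + di][j + dj] * template[di][dj]
                        for di in range(th) for dj in range(tw))
            row.append(score)
        resp.append(row)
    return resp
```

With a 1x1 template of value 1.0 the response map reproduces the search area itself; with a larger template, the peak of the response map marks the position where the template matches best, which is exactly the cue the detection head localizes.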
Under this paradigm, UAV feature interaction has become a research focus of multi-UAV target tracking because it can fuse and coordinate information from multiple aerial videos; with efficient feature interaction, the perception information of different viewing angles can be modeled uniformly, capturing a more comprehensive feature representation. At present, twin networks based on a template sharing mechanism are the most mainstream feature interaction framework. ASNet, as a precursor method, introduces the target templates of multiple UAVs into a single UAV's search area for separate similarity estimation and designs a response map fusion strategy to combine the different results for prediction; TranMDOT uses a self-attention module to combine information from different viewing angles into a more robust target template for similarity estimation with the search area; and CRM uses a sparse self-attention module to enhance the key features of the on-board search area before cross-UAV similarity estimation. Despite this progress, multi-UAV feature interaction networks based on the template sharing mechanism still face challenges in complex scenarios. First, the features to be interacted are mined only from cross-UAV templates and lack the target characterization and surrounding background information at the current moment; under occlusion and background interference, the blurred search-area features of some UAVs make similarity calculation with the target template difficult and cause tracking errors.
Second, the interaction mode based on similarity estimation is highly sensitive to viewing-angle differences: such differences make the target templates of different UAVs differ markedly (in appearance, size, visible components and so on), and the target information in a single search area can hardly be highly similar to all target templates, so the interaction result degrades and tracking performance drops. Finally, predefined feature fine-tuning strategies struggle to adapt to complex and changeable dynamic environments, because the information richness of different UAVs can change drastically across working scenarios, posing a great challenge to such strategies.

Disclosure of Invention

Based on the above, it is necessary to provide a target tracking method, device and equipment based on multi-UAV feature interaction that can track a target accurately. A target tracking method based on multi-unmanned aerial vehicle feature interaction, the method being implemented in each unmanned aerial vehicle in a cluster of unmanned aerial vehicles, the method comprising: acquiring a real-time image sequence, wherein the real-time image sequence comprises multiple frames ordered in time, marking a target to be tracked in the first frame image to obtain a target template im