CN-121999244-A - Traffic scene-oriented hyperspectral hybrid expert adapter target tracking method
Abstract
The invention discloses a traffic scene-oriented hyperspectral hybrid expert adapter target tracking method and system, belonging to the technical field of hyperspectral visual target tracking. The method comprises: preprocessing a hyperspectral vehicle tracking video sequence by dividing the hyperspectral channels into multiple stream groups and performing dynamic cropping and patch embedding to construct multi-stream image token sequences; introducing a hyperspectral interactive mixture-of-experts tracking network comprising a multi-stream Transformer extraction module, a mixture-of-experts adapter, and a spatio-temporal hidden state token evolution mechanism, which jointly models the hyperspectral features of the multi-stream template images and search area images; adaptively enhancing the multi-dimensional features and suppressing background interference through a mixture-of-experts module consisting of a spectral expert, a spatial expert, a temporal expert, and a visual expert; and finally performing target localization prediction on the enhanced features to output a target tracking result. The method effectively mines the interaction information between hyperspectral bands and achieves robust target tracking by incorporating spatio-temporal context, making it suitable for vehicle target tracking in complex traffic scenes.
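The preprocessing stage described in the abstract (band grouping, cropping, patch tokenization) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the function names, the contiguous near-equal band split, the zero padding at frame borders, and the nested-list cube layout are not taken from the patent, and the learned linear projection and positional encoding that normally complete a patch embedding are omitted.

```python
from typing import List

Cube = List[List[List[float]]]  # hypothetical layout: cube[row][col][band]


def split_bands(num_bands: int, num_groups: int) -> List[range]:
    """Partition band indices into contiguous, near-equal groups (assumed scheme)."""
    if not 1 < num_groups <= num_bands:
        raise ValueError("need 1 < num_groups <= num_bands")
    base, extra = divmod(num_bands, num_groups)
    groups, start = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)  # first `extra` groups get one more band
        groups.append(range(start, start + size))
        start += size
    return groups


def crop(cube: Cube, cy: int, cx: int, size: int) -> Cube:
    """Crop a size x size window centred on (cy, cx), zero-padding outside the frame."""
    h, w, c = len(cube), len(cube[0]), len(cube[0][0])
    top, left = cy - size // 2, cx - size // 2
    zero = [0.0] * c
    return [
        [cube[y][x] if 0 <= y < h and 0 <= x < w else zero
         for x in range(left, left + size)]
        for y in range(top, top + size)
    ]


def patch_tokens(cube: Cube, bands: range, patch: int) -> List[List[float]]:
    """Flatten each non-overlapping patch x patch block of one band group into a token."""
    h, w = len(cube), len(cube[0])
    tokens = []
    for py in range(0, h - patch + 1, patch):
        for px in range(0, w - patch + 1, patch):
            tokens.append([
                cube[y][x][b]
                for y in range(py, py + patch)
                for x in range(px, px + patch)
                for b in bands
            ])
    return tokens


def multi_stream_tokens(cube: Cube, cy: int, cx: int, size: int,
                        patch: int, num_groups: int) -> List[List[List[float]]]:
    """Return one token sequence per spectral stream for a window around the target."""
    window = crop(cube, cy, cx, size)
    groups = split_bands(len(cube[0][0]), num_groups)
    return [patch_tokens(window, g, patch) for g in groups]
```

Calling `multi_stream_tokens` once with the template window size and once with the search window size would give the two sets of parallel token streams; a real tracker would then project each flattened patch to the model dimension.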
Inventors
- YUAN DI
- WEI PENGFEI
- CHEN RUI
- HU XIAOPENG
Assignees
- Guangzhou Institute of Technology, Xidian University (西安电子科技大学广州研究院)
Dates
- Publication Date: 20260508
- Application Date: 20260202
Claims (8)
- 1. A traffic scene-oriented hyperspectral hybrid expert adapter target tracking method, characterized by comprising the following steps: Step 1, preprocessing a traffic scene hyperspectral video sequence, dividing the hyperspectral image channels into a plurality of spectral stream groups, and extracting the corresponding template image streams and search area image streams; Step 2, constructing a visual Transformer tracking network model based on a multi-stream interactive mixture-of-experts adapter, wherein the model introduces a multi-stream Transformer extraction module and a mixture-of-experts adapter integrating heterogeneous experts; Step 3, using the visual Transformer tracking network model to extract the multi-stream hyperspectral features in parallel and enhance them through interaction, realizing spatio-temporal information fusion through hidden state tokens passed between layers, and outputting a target tracking prediction result.
- 2. The traffic scene-oriented hyperspectral hybrid expert adapter target tracking method according to claim 1, wherein preprocessing the traffic scene hyperspectral video sequence specifically comprises: acquiring hyperspectral video frames, and dividing the original hyperspectral channels into N spectral groups according to spectral band characteristics, wherein N is an integer greater than 1; cropping around the target center point to obtain N groups of template image streams and N groups of search area image streams; and applying patch embedding and adding positional encoding to each group of image streams, thereby converting the multi-stream images into parallel high-dimensional feature token sequences.
- 3. The traffic scene-oriented hyperspectral hybrid expert adapter target tracking method according to claim 1, wherein the visual Transformer tracking network model comprises a plurality of cascaded multi-stream Transformer extraction modules, and each multi-stream Transformer extraction module, which has a mixture-of-experts adapter connected in parallel, comprises a parallel self-attention layer, a normalization layer and a multi-layer perceptron; the parallel self-attention layer is used for extracting the response features inside each spectral stream separately, and the mixture-of-experts adapter is used for deep feature exchange between the N spectral streams and the hidden state token.
- 4. The traffic scene-oriented hyperspectral hybrid expert adapter target tracking method according to claim 1, wherein the interaction flow of the mixture-of-experts adapter specifically comprises: concatenating the feature tokens of the N spectral streams with the hidden state token passed from the previous layer to construct a global perception sequence; compressing the global perception sequence with a linear dimension-reduction layer and feeding it into a mixture-of-experts model layer for adaptive feature enhancement; and restoring the features processed by the mixture-of-experts module into N updated spectral stream feature tokens through parallel upsampling layers, while simultaneously generating the hidden state token required by the next layer.
- 5. The traffic scene-oriented hyperspectral hybrid expert adapter target tracking method according to claim 1, wherein four heterogeneous expert networks are integrated in the mixture-of-experts model layer to extract traffic target features from different dimensions, specifically: a spectral expert, which adopts a one-dimensional convolution structure to extract band correlation features across spectral streams; a spatial expert, which adopts a depthwise separable convolution structure and extracts local spatial features by restoring the sequence into its spatial topology; a visual expert, which adopts a residual cascaded fully connected network to extract target scale features under different receptive fields; and a temporal expert, which adopts a gated recurrent unit (GRU) recurrent neural network structure to extract the temporal evolution features of the target along the sequence dimension.
- 6. The traffic scene-oriented hyperspectral hybrid expert adapter target tracking method according to claim 1, wherein the mixture-of-experts adapter further incorporates a gating switch network which, according to the contribution of the input features, uses a capacity factor to sparsely dispatch tokens to the optimal combination of the heterogeneous experts, and optimizes the load distribution across experts in combination with an auxiliary balance loss.
- 7. The traffic scene-oriented hyperspectral hybrid expert adapter target tracking method, characterized by comprising: fusing the N spectral feature tokens processed by the multi-stream Transformer extraction module by element-wise summation to obtain a comprehensive target feature with a high signal-to-noise ratio, feeding the comprehensive target feature into a localization head detection module, and recovering the geometric bounding box of the target in the traffic scene.
- 8. The traffic scene-oriented hyperspectral hybrid expert adapter target tracking method, characterized by comprising: a first module, namely a preprocessing module, used for band grouping and patch embedding of the hyperspectral images; a second module, namely a multi-stream Transformer extraction module, used for extracting multi-stream spectral features while exchanging features between different spectra through a parallel mixture-of-experts adapter and updating a spatio-temporal hidden state token; a third module, used for aggregating and normalizing the spectral stream features to generate an enhanced target representation; and a tracking prediction module, used for decoding the target position from the enhanced features and outputting a target tracking trajectory.
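The gating described in claim 6 (sparse dispatch under a capacity factor, plus an auxiliary balance loss) can be sketched as follows. The claims do not give the routing rule or the loss formula, so this sketch swaps in the top-1 routing and load-balancing loss popularized by the Switch Transformer as an assumption. The function names, the `None` marker for tokens dropped by a full expert, and the default capacity factor of 1.25 are all hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Expert names follow claim 5; the index order is an arbitrary choice for this sketch.
EXPERTS = ("spectral", "spatial", "temporal", "visual")


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1(router_logits: List[List[float]],
               capacity_factor: float = 1.25
               ) -> Tuple[List[Optional[int]], float]:
    """Assumed Switch-style top-1 routing with a per-expert capacity limit.

    Returns (assignment, aux_loss). assignment[t] is the expert index chosen
    for token t, or None when that expert was already at capacity.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    # Each expert accepts at most this many tokens per batch.
    capacity = math.ceil(capacity_factor * num_tokens / num_experts)
    probs = [softmax(row) for row in router_logits]

    load = [0] * num_experts    # tokens actually accepted by each expert
    wanted = [0] * num_experts  # tokens whose top choice is each expert
    assignment: List[Optional[int]] = []
    for p in probs:
        best = max(range(num_experts), key=p.__getitem__)
        wanted[best] += 1
        if load[best] < capacity:
            load[best] += 1
            assignment.append(best)
        else:
            assignment.append(None)  # overflow: token skips the expert layer

    # Balance loss: num_experts * sum_i (fraction routed to i) * (mean prob of i).
    frac = [w / num_tokens for w in wanted]
    mean_prob = [sum(p[e] for p in probs) / num_tokens for e in range(num_experts)]
    aux_loss = num_experts * sum(f * m for f, m in zip(frac, mean_prob))
    return assignment, aux_loss
```

With this formulation the loss is about 1.0 when tokens spread evenly over the four experts and grows as routing collapses onto one expert, which is the pressure that keeps the load distributed. Mapping an index back to a name is `EXPERTS[i]`.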
Description
Traffic scene-oriented hyperspectral hybrid expert adapter target tracking method
Technical Field
The invention relates to the technical field of hyperspectral visual target tracking, and in particular to a traffic scene-oriented hyperspectral hybrid expert adapter target tracking method, which is suitable for high-performance vehicle tracking in traffic scenes with heavy interference from similar objects and complex illumination conditions.
Background
With the rapid development of intelligent transportation systems and autonomous driving technologies, visual target tracking, as a basic technology for vehicle behavior analysis, traffic flow monitoring and autonomous driving perception, is receiving extensive attention from academia and industry. Current visual tracking methods are mostly based on conventional red, green, blue (RGB) three-channel color images. However, in complex traffic scenes RGB images face three problems. First, illumination: RGB images are extremely susceptible to environmental factors such as dim light or intense light changes, and their contrast and color fidelity drop sharply at tunnel entrances, during night driving, or under direct glare. Second, background camouflage and same-color interference: on busy roads, target vehicles are often highly similar in visual color to surrounding vehicles or the background environment (e.g., dark asphalt pavement, green vegetation), which easily causes tracker drift. Third, occlusion: vehicles are frequently occluded by traffic signs, trees or other large vehicles while driving, resulting in loss of target features. The introduction of hyperspectral imaging offers a new way to address these problems. Hyperspectral images have extremely high spectral resolution and can capture subtle material differences (spectral characteristics) of the target object, so that the target vehicle can still be accurately identified through its spectral fingerprint when RGB images fail.
However, applying hyperspectral images to traffic target tracking still faces a number of technical bottlenecks:
1) High-dimensional data redundancy and computational load: hyperspectral images contain tens or hundreds of bands, so the data volume is huge and the bands are highly redundant. Existing methods generally adopt dimension reduction or simple feature stacking, which not only easily loses key spectral information but also brings heavy computational pressure, making it difficult to meet the real-time requirements of traffic scenes.
2) Limitations of static feature extraction: traffic scenes are highly dynamic, and the motion features and pose changes of vehicles, as well as environmental interference, behave differently in different spectral bands. Existing fixed-parameter models struggle to apply adaptive, expert-style processing to spectral information with different characteristics, so feature extraction is not flexible enough.
3) Underutilized spatio-temporal context: video tracking is a time-series task. Existing hyperspectral trackers focus on fusing single-frame spatial and spectral features and lack deep modeling and continuous propagation of the target's historical state (motion trend and semantic features at the previous moment), so the model often cannot recover tracking effectively after long-term occlusion.
4) Inefficient interaction mechanisms: how to exchange information efficiently among different band groups and suppress background noise during multi-stream feature processing remains an open difficulty in the hyperspectral tracking field.
Therefore, how to design a robust tracking method that fully mines the deep interaction information between hyperspectral bands, dynamically adjusts its feature processing strategy according to the traffic environment, and takes spatio-temporal evolution into account has become an important research direction in traffic visual perception.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide a traffic scene-oriented hyperspectral hybrid expert adapter target tracking method which, through a multi-stream Transformer extraction module and a mixture-of-experts adapter, improves tracking performance and training efficiency when handling challenges such as similar-object interference and illumination interference in complex traffic scenes. The first technical scheme adopted by the invention is a traffic scene-oriented hyperspectral hybrid expert adapter target tracking method comprising the following steps: preprocessing the hyperspectral vehicle tracking video sequence to obtain multi-stream grouped image data streams; extracting hyperspectral features of the multi-stream template images and search area images, and constructing a hyperspectral interactive mixture-of-experts tracking network