CN-121861613-B - All-weather perception method and system for an unmanned vehicle with adaptive fusion of multi-modal information
Abstract
The invention discloses an all-weather perception method and system for an unmanned vehicle based on adaptive fusion of multi-modal information. The method comprises: acquiring multi-source time-series sensing data of the environment surrounding the unmanned vehicle, and performing temporal alignment and preprocessing on the data; based on the physical characteristics of the multi-source time-series sensing data, performing initial feature encoding on each modality with a preset deep neural network backbone, extracting independent features, applying a feature transformation to the independent features to obtain multi-modal features, and mapping the multi-modal features into a shared embedding space of uniform dimension; and performing temporal attention enhancement and spatial attention enhancement on the multi-modal features, respectively, adaptively adjusting the weight of each modal feature based on the current environmental conditions, and fusing the spatio-temporally enhanced multi-modal features into a unified spatio-temporal joint feature. The invention has good universality and scalability, and is applicable to different unmanned vehicle platforms and a variety of environment perception tasks.
Inventors
- Jiang Chen
- Tao Hongkang
- Zhong Teng
- Qiu Haobo
- Meng Lei
Assignees
- Huazhong University of Science and Technology (华中科技大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-03-18
Claims (8)
- 1. An all-weather perception method for an unmanned vehicle with adaptive fusion of multi-modal information, characterized by comprising the following steps: acquiring multi-source time-series sensing data of the environment surrounding the unmanned vehicle and performing temporal alignment and preprocessing on the data, wherein the multi-source time-series sensing data comprises a visual modality, a radar modality and a laser modality; based on the physical characteristics of the multi-source time-series sensing data, performing initial feature encoding on each modality with a preset deep neural network backbone, extracting independent features, applying a feature transformation to the independent features to obtain multi-modal features, and mapping the multi-modal features into a shared embedding space of uniform dimension; performing temporal attention enhancement on the multi-modal features to obtain temporally enhanced features $\hat{F}_m$; then applying average pooling and max pooling to $\hat{F}_m$ along the channel dimension, concatenating the two pooling results, extracting the spatial correlation through a convolutional layer, and generating the spatial attention map $M_s = \sigma\big(f^{k\times k}([\mathrm{AvgPool}(\hat{F}_m);\,\mathrm{MaxPool}(\hat{F}_m)])\big)$, wherein $\sigma$ denotes the activation function, $[\cdot\,;\,\cdot]$ denotes concatenation along the channel dimension, $f^{k\times k}$ denotes a convolution with kernel size $k\times k$, and $M_s$ is the generated two-dimensional spatial weight matrix in which a value approaching 1 marks the position as a key target and a value approaching 0 marks it as noise or background, AvgPool being average pooling and MaxPool being max pooling; performing element-wise multiplication of the temporally enhanced features with the generated weight matrix under the broadcasting mechanism, $F'_m = M_s \odot \hat{F}_m$, wherein $F'_m$ is the feature tensor after spatial attention enhancement; after the feature tensor enhanced by temporal attention and spatial attention is obtained for the data of each modality, extracting the feature response intensity of each modality with a global descriptor generating function and computing the adaptive confidence weight of each modality through an exponential normalization: defining a global feature scoring function $s_m = \mathrm{MLP}(\mathrm{GAP}(F'_m))$, which consists of global average pooling and a multi-layer perceptron, and computing the normalized weights over the modality dimension with a Softmax function, $\alpha_m = \exp(s_m)/\sum_{m'}\exp(s_{m'})$, wherein $m \in \{\mathrm{vis}, \mathrm{rad}, \mathrm{las}\}$ indexes the sensor modalities and $\alpha_m$ represents the importance share of modality $m$ in the final decision; and, based on the computed dynamic weights, weighting and summing the modal features to output the final spatio-temporal joint feature $F_{\mathrm{fused}} = \sum_m \alpha_m F'_m$ (a fusion sketch appears after the claims).
- 2. The method according to claim 1, wherein the step of acquiring multi-source time-series sensing data of the environment surrounding the unmanned vehicle and performing temporal alignment and preprocessing on it comprises: acquiring the multi-source time-series sensing data with a visible-light camera, a millimeter-wave radar and a lidar, respectively: $X_t = \{I_t, R_t, P_t\}$, wherein $X_t$ is the multi-modal joint input vector; $I_t$ is the visual modality, representing the RGB image data captured by the visible-light camera, defined as $I_t \in \mathbb{R}^{H \times W \times 3}$, where $H$ and $W$ are the height and width of the image, respectively; $R_t$ is the radar modality, namely the target plots or range-Doppler map returned by the millimeter-wave radar, containing the range, azimuth and relative radial velocity of targets; $P_t$ is the laser modality, namely the three-dimensional point cloud acquired by the lidar or its voxelized feature tensor, defined as $P_t \in \mathbb{R}^{N \times C}$, where $N$ is the number of points and $C$ is the number of feature channels; $\mathcal{X} = \{X_{t-T+1}, \ldots, X_t\}$ is the time-series input set, a spatio-temporal observation window of length $T$; after the time-series input set is acquired, all modal data are time-stamped against a unified clock source, interpolation alignment is performed at the feature level, and the raw data from the different sensors are uniformly transformed into a vehicle-body coordinate system whose origin is the center of mass of the unmanned vehicle, obtaining their coordinates in the vehicle-body frame.
- 3. The method according to claim 2, wherein the step of performing initial feature encoding on the multi-source time-series sensing data with preset deep neural network backbones based on its physical characteristics comprises: based on the physical characteristics of each modality's data, performing initial feature encoding on the preprocessed multi-source time-series sensing data with a preset deep neural network backbone per modality, so as to fit the data structures of the different sensors: $F_v = E_v(I_t; \theta_v)$, $F_r = E_r(R_t; \theta_r)$, $F_l = E_l(P_t; \theta_l)$, wherein $E_v$, $E_r$ and $E_l$ denote the feature extraction networks for vision, radar and lidar, respectively; the visual modality extracts features through a convolutional neural network encoder, the radar modality through a radar feature extraction network, and the laser modality through a point cloud feature encoder; $\theta_v$, $\theta_r$ and $\theta_l$ are the learnable weight parameters of the corresponding networks; $F_v$, $F_r$ and $F_l$ are the intermediate feature tensors output by the encoders, whose dimensions are not yet unified across modalities; and mapping the intermediate feature tensors into a shared embedding space of uniform dimension with a feature projection layer: $\tilde{F}_m = W_m F_m + b_m$, wherein $W_m$ is the linear projection matrix corresponding to modality $m$, $b_m$ is a bias vector, and $\tilde{F}_m$ are the aligned features sharing the same channel dimension $d$ after the mapping (see the projection sketch after the claims).
- 4. The method according to claim 1, wherein the step of performing temporal attention enhancement on the multi-modal features is: building a window of length $T$, stacking the aligned features of the same modality over $T$ consecutive time steps along the time dimension, and extracting the feature sequence of the same spatial position within the past $T$ frames, $\{\tilde{F}_{t-T+1}, \ldots, \tilde{F}_t\}$, wherein $\tilde{F}_\tau$ denotes the feature tensor after temporal and spatial alignment at time $\tau$; mapping the features into different subspaces with linear transformation matrices $W_Q$, $W_K$, $W_V$: $Q_t = W_Q \tilde{F}_t$, $K_\tau = W_K \tilde{F}_\tau$, $V_\tau = W_V \tilde{F}_\tau$; computing the attention weight between the current time $t$ and a historical time $\tau$ as $\alpha_{t,\tau} = \mathrm{Softmax}\big(Q_t K_\tau^{\top} / \sqrt{d_k}\big)$, where the Softmax normalization is taken over the historical times $\tau$, $\sqrt{d_k}$ is the scaling factor, $Q_t$ is the query feature vector of the current time $t$, $K_\tau$ is the key feature vector of the historical time $\tau$, and $V_\tau$ is the value feature vector of the historical time $\tau$; and performing weighted aggregation over the historical values to obtain the temporally enhanced feature containing context information: $\hat{F}_t = \sum_{\tau} \alpha_{t,\tau} V_\tau$ (see the temporal attention sketch after the claims).
- 5. The method according to claim 1, wherein after the spatio-temporal joint feature is acquired, the spatial resolution of the spatio-temporal joint feature is recovered, the number of feature channels is compressed to the number of classes through a convolutional layer, and the output is mapped to the interval [0,1] through a Sigmoid activation function, yielding a pixel-level semantic probability map; thresholding is then applied to the semantic probability map to generate a binary mask of the passable region, a safe region is defined from the values of the binary mask, and the vehicle is guided to travel within the safe region (see the output-head sketch after the claims).
- 6. An all-weather perception system for an unmanned vehicle with adaptive fusion of multi-modal information, characterized in that the system is used to implement the all-weather perception method for an unmanned vehicle with adaptive fusion of multi-modal information according to any one of claims 1 to 5, and comprises: an acquisition and processing module, configured to acquire multi-source time-series sensing data of the environment surrounding the unmanned vehicle and perform temporal alignment and preprocessing on it, wherein the multi-source time-series sensing data comprises a visual modality, a radar modality and a laser modality; a feature encoding module, configured to perform initial feature encoding on each modality with a preset deep neural network backbone based on the physical characteristics of the multi-source time-series sensing data, extract independent features, apply a feature transformation to obtain multi-modal features, and map the multi-modal features into a shared embedding space of uniform dimension; and a feature fusion module, configured to perform temporal attention enhancement on the multi-modal features to obtain temporally enhanced features $\hat{F}_m$, then apply average pooling and max pooling to $\hat{F}_m$ along the channel dimension, concatenate the two pooling results, extract the spatial correlation through a convolutional layer, and generate the spatial attention map $M_s = \sigma\big(f^{k\times k}([\mathrm{AvgPool}(\hat{F}_m);\,\mathrm{MaxPool}(\hat{F}_m)])\big)$, wherein $\sigma$ denotes the activation function, $[\cdot\,;\,\cdot]$ denotes concatenation along the channel dimension, $f^{k\times k}$ denotes a convolution with kernel size $k\times k$, and $M_s$ is the generated two-dimensional spatial weight matrix in which a value approaching 1 marks the position as a key target and a value approaching 0 marks it as noise or background, AvgPool being average pooling and MaxPool being max pooling; to perform element-wise multiplication of the temporally enhanced features with the generated weight matrix under the broadcasting mechanism, $F'_m = M_s \odot \hat{F}_m$, wherein $F'_m$ is the feature tensor after spatial attention enhancement; after the feature tensor enhanced by temporal attention and spatial attention is obtained for the data of each modality, to extract the feature response intensity of each modality with a global descriptor generating function and compute the adaptive confidence weight of each modality through an exponential normalization: defining a global feature scoring function $s_m = \mathrm{MLP}(\mathrm{GAP}(F'_m))$, which consists of global average pooling and a multi-layer perceptron, and computing the normalized weights over the modality dimension with a Softmax function, $\alpha_m = \exp(s_m)/\sum_{m'}\exp(s_{m'})$, wherein $m \in \{\mathrm{vis}, \mathrm{rad}, \mathrm{las}\}$ indexes the sensor modalities and $\alpha_m$ represents the importance share of modality $m$ in the final decision; and, based on the computed dynamic weights, to weight and sum the modal features and output the final spatio-temporal joint feature $F_{\mathrm{fused}} = \sum_m \alpha_m F'_m$.
- 7. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the all-weather perception method for an unmanned vehicle with adaptive fusion of multi-modal information according to any one of claims 1 to 5.
- 8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the all-weather perception method for an unmanned vehicle with adaptive fusion of multi-modal information according to any one of claims 1 to 5.
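The shared-embedding projection in claim 3 reduces, in code, to one learnable linear map per modality. Below is a minimal PyTorch sketch; the encoder output widths (512/128/256), the shared dimension of 256, and the class and variable names are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

D_SHARED = 256  # assumed shared channel dimension d; the patent leaves it unspecified

class ModalityProjector(nn.Module):
    """Maps each modality's encoder output F_m to W_m F_m + b_m in a shared space."""
    def __init__(self, in_dims: dict):
        super().__init__()
        # one learnable linear projection (W_m, b_m) per modality
        self.proj = nn.ModuleDict({m: nn.Linear(d, D_SHARED) for m, d in in_dims.items()})

    def forward(self, feats: dict) -> dict:
        return {m: self.proj[m](f) for m, f in feats.items()}

# Usage: stand-in encoder widths for camera / radar / lidar features
proj = ModalityProjector({"camera": 512, "radar": 128, "lidar": 256})
aligned = proj({"camera": torch.randn(4, 512),
                "radar": torch.randn(4, 128),
                "lidar": torch.randn(4, 256)})
assert all(f.shape == (4, D_SHARED) for f in aligned.values())
```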
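For claim 4's temporal attention, a minimal PyTorch sketch under assumed shapes (batch B, window length T, channel dimension D) follows; the patent's drawings define the exact tensor layout, so treat this as one plausible reading of the scaled dot-product formulation, not the patent's implementation.

```python
import math
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Scaled dot-product attention of the current step over a window of T steps."""
    def __init__(self, dim: int):
        super().__init__()
        self.w_q = nn.Linear(dim, dim, bias=False)  # W_Q
        self.w_k = nn.Linear(dim, dim, bias=False)  # W_K
        self.w_v = nn.Linear(dim, dim, bias=False)  # W_V
        self.scale = math.sqrt(dim)                 # scaling factor sqrt(d_k)

    def forward(self, window: torch.Tensor) -> torch.Tensor:
        # window: (B, T, D); window[:, -1] is the current time step t
        q = self.w_q(window[:, -1:])                # (B, 1, D) query Q_t
        k = self.w_k(window)                        # (B, T, D) keys K_tau
        v = self.w_v(window)                        # (B, T, D) values V_tau
        attn = torch.softmax(q @ k.transpose(1, 2) / self.scale, dim=-1)  # (B, 1, T)
        return (attn @ v).squeeze(1)                # (B, D) temporally enhanced feature

# Usage: a window of T=8 aligned features with D=256 channels
ta = TemporalAttention(dim=256)
enhanced = ta(torch.randn(2, 8, 256))  # (2, 256)
```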
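As a minimal PyTorch sketch of the fusion step in claims 1 and 6, assuming feature maps of shape (B, C, H, W), a 7x7 convolution kernel and an MLP hidden width of 64 (none of which the claims fix), the spatial attention map and adaptive modality weighting could look like this:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-wise avg/max pooling -> concat -> conv -> sigmoid weight map M_s."""
    def __init__(self, kernel_size: int = 7):  # kernel size k is an assumption
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)            # AvgPool along channels: (B,1,H,W)
        mx, _ = x.max(dim=1, keepdim=True)           # MaxPool along channels: (B,1,H,W)
        m_s = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # (B,1,H,W) in (0,1)
        return x * m_s                               # broadcast element-wise product

class AdaptiveModalityFusion(nn.Module):
    """Scores each modality via GAP + MLP, Softmax over modalities, weighted sum."""
    def __init__(self, channels: int, hidden: int = 64):  # hidden width is an assumption
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, feats: list) -> torch.Tensor:
        # feats: list of M aligned tensors, each (B, C, H, W)
        scores = torch.stack([self.mlp(f.mean(dim=(2, 3))) for f in feats], dim=1)  # (B,M,1)
        weights = torch.softmax(scores, dim=1)       # alpha_m, summing to 1 over modalities
        fused = torch.zeros_like(feats[0])
        for i, f in enumerate(feats):
            fused = fused + weights[:, i].view(-1, 1, 1, 1) * f
        return fused

# Usage: fuse three spatially attended modality features of equal shape
sa = SpatialAttention()
fuse = AdaptiveModalityFusion(channels=256)
feats = [sa(torch.randn(2, 256, 32, 32)) for _ in range(3)]
fused = fuse(feats)  # (2, 256, 32, 32)
```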
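Claim 5's output head is, in essence, upsampling plus a 1x1 convolution, a Sigmoid and a threshold. The sketch below assumes a single passable/non-passable class, an output resolution of 480x640 and a threshold of 0.5, none of which the claim specifies.

```python
import torch
import torch.nn as nn

class PassableRegionHead(nn.Module):
    """Upsample -> 1x1 conv to class count -> Sigmoid -> threshold to binary mask."""
    def __init__(self, in_channels: int, num_classes: int = 1, out_size=(480, 640)):
        super().__init__()
        self.up = nn.Upsample(size=out_size, mode="bilinear", align_corners=False)
        self.classifier = nn.Conv2d(in_channels, num_classes, kernel_size=1)

    def forward(self, fused: torch.Tensor, threshold: float = 0.5):
        prob = torch.sigmoid(self.classifier(self.up(fused)))  # probabilities in [0,1]
        mask = (prob > threshold).to(torch.uint8)              # passable-region binary mask
        return prob, mask

# Usage: recover resolution from a (2, 256, 32, 32) fused feature map
head = PassableRegionHead(in_channels=256)
prob, mask = head(torch.randn(2, 256, 32, 32))  # both (2, 1, 480, 640)
```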
Description
All-weather perception method and system for an unmanned vehicle with adaptive fusion of multi-modal information
Technical Field
The invention relates to the technical field of intelligent unmanned systems and automatic driving, and in particular to an all-weather perception method and system for an unmanned vehicle based on adaptive fusion of multi-modal information.
Background
With the rapid development of automatic driving and unmanned system technology, unmanned vehicles are increasingly applied in fields such as logistics distribution, urban inspection, park security and mining transportation. Environment perception is the foundation of an unmanned vehicle's autonomous decision-making and safety control, and its performance directly determines the safety and reliability of the whole system. Existing unmanned vehicle perception systems mostly rely on a single sensor or a small number of sensors, such as visible-light cameras or lidar. However, under complex environmental conditions such as rain, fog, snow, night or glare, single-modality sensors are susceptible to severe interference, resulting in significant degradation or even failure of perception accuracy. Although multi-modal sensor fusion has been proposed, existing methods mostly adopt simple feature concatenation or fixed-weight fusion strategies; they can hardly adapt dynamically to environmental changes and cannot fully exploit the complementary information of multi-modal data in the temporal and spatial dimensions, so system robustness is insufficient in all-weather complex scenes. A highly robust perception system that adaptively fuses multi-modal information in the temporal and spatial dimensions is therefore needed to meet the practical requirements of unmanned vehicles in complex environments.
Disclosure of Invention
In view of the above, the invention aims to provide an all-weather perception method and system for an unmanned vehicle based on adaptive fusion of multi-modal information, so as to solve the following problems: existing unmanned vehicle perception systems rely on a single sensor or a small number of sensors, so under complex environmental conditions the perception result is easily disturbed by rain, fog, snow, night, glare and other factors; and existing multi-modal fusion methods mostly adopt static fusion strategies and can hardly adjust the fusion weights adaptively in the temporal and spatial dimensions, leaving system robustness insufficient.
In order to achieve the above purpose, the invention provides an all-weather perception method for an unmanned vehicle with adaptive fusion of multi-modal information, which comprises the following steps: acquiring multi-source time-series sensing data of the environment surrounding the unmanned vehicle, and performing temporal alignment and preprocessing on the multi-source time-series sensing data; based on the physical characteristics of the multi-source time-series sensing data, performing initial feature encoding on each modality with a preset deep neural network backbone, extracting independent features, applying a feature transformation to the independent features to obtain multi-modal features, and mapping the multi-modal features into a shared embedding space of uniform dimension; and performing temporal attention enhancement and spatial attention enhancement on the multi-modal features, respectively, adaptively adjusting the weight of each modal feature based on the current environmental conditions, and fusing the spatio-temporally enhanced multi-modal features into a unified spatio-temporal joint feature. According to an aspect of the foregoing technical solution, the step of acquiring multi-source time-series sensing data of the environment surrounding the unmanned vehicle and performing temporal alignment and preprocessing on it comprises: acquiring the multi-source time-series sensing data with a visible-light camera, a millimeter-wave radar and a lidar, respectively: $X_t = \{I_t, R_t, P_t\}$, wherein $X_t$ is the multi-modal joint input vector; $I_t$ is the visual modality, representing the RGB image data captured by the visible-light camera, defined as $I_t \in \mathbb{R}^{H \times W \times 3}$, where $H$ and $W$ are the height and width of the image, respectively; $R_t$ is the radar modality, namely the target plots or range-Doppler map returned by the millimeter-wave radar, containing the range, azimuth and relative radial velocity of targets; $P_t$ is the laser modality, namely the three-dimensional point cloud acquired by the lidar or its voxelized feature tensor, defined as $P_t \in \mathbb{R}^{N \times C}$, where $N$ is the number of points and $C$ is the number of feature channels; $\mathcal{X} = \{X_{t-T+1}, \ldots, X_t\}$ is the time-series input set, a spatio-temporal observation window of length $T$; after the time-series input set is acquired, all modal data are time-stamped against a unified clock source, interpolation alignment is performed at the feature level, and the raw data from the different sensors are uniformly transformed into a vehicle-body coordinate system whose origin is the center of mass of the unmanned vehicle (a minimal alignment sketch follows).
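The timestamp interpolation and vehicle-body re-framing described above can be sketched with NumPy as follows; `align_stream` and `to_body_frame` are hypothetical helper names, and the 4x4 calibration matrix is an assumed input rather than something the patent specifies.

```python
import numpy as np

def align_stream(src_times: np.ndarray, src_feats: np.ndarray,
                 ref_times: np.ndarray) -> np.ndarray:
    """Linearly interpolate a (N, C) feature stream from src_times onto ref_times."""
    return np.stack([np.interp(ref_times, src_times, src_feats[:, c])
                     for c in range(src_feats.shape[1])], axis=1)

def to_body_frame(points_xyz: np.ndarray, t_body: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform from a sensor frame to the vehicle-body frame."""
    homo = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # (N, 4)
    return (homo @ t_body.T)[:, :3]

# Usage: resample radar features onto camera timestamps, then re-frame lidar points
radar_aligned = align_stream(np.linspace(0.0, 1.0, 20), np.random.rand(20, 4),
                             np.linspace(0.0, 1.0, 10))        # (10, 4)
lidar_body = to_body_frame(np.random.rand(100, 3), np.eye(4))  # (100, 3)
```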