CN-122024195-A - Multitasking sensing method and device for off-road pavement
Abstract
The application provides a multi-task perception method and device for off-road pavement. The method comprises: processing an RGB image with a detail extraction model to obtain low-dimensional detail features; processing the RGB image with a semantic extraction model to obtain high-dimensional semantic features; processing point cloud data with a PointPillars network to obtain elevation features; fusing the low-dimensional detail features, high-dimensional semantic features, elevation features and IMU features with a fusion model to obtain fused features; encoding the fused features with a shared encoder to obtain general features; and processing the general features with a pavement classification decoder, a pavement segmentation decoder and a pavement roughness quantization decoder to obtain a pavement classification result, a pavement segmentation result and a pavement roughness quantization result, respectively. The application enables accurate classification of typical off-road surfaces such as dirt roads, grass roads and gravel roads.
Inventors
- LI ZHIWEI
- ZHANG WEIZHENG
- WU ZIHAO
- WANG LI
- ZHOU YANG
- ZHANG YUQIAN
- SHEN TIANYU
- TAN QIFAN
- WANG YADONG
Assignees
- Beijing University of Chemical Technology (北京化工大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-05
Claims (9)
- 1. A method of multi-task perception of an off-road pavement, comprising: acquiring RGB images, point cloud data and IMU data collected by the ego vehicle on an off-road surface at the current moment; processing the RGB image with the detail extraction model to obtain low-dimensional detail features; processing the RGB image with the semantic extraction model to obtain high-dimensional semantic features; processing the point cloud data with a PointPillars network to obtain elevation features; fusing the low-dimensional detail features, the high-dimensional semantic features, the elevation features and the IMU features with a fusion model to obtain fused features; encoding the fused features with a shared encoder to obtain general features; and processing the general features with a road surface classification decoder to obtain a road surface classification result, with a road surface segmentation decoder to obtain a road surface segmentation result, and with a road surface roughness quantization decoder to obtain a road surface roughness quantization result.
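As an illustrative sketch only (not part of the claims), the final step of the pipeline, in which one shared general feature feeds three task-specific decoders, can be modeled with simple linear heads; all dimensions below (64-channel general feature, 5 surface classes, a flattened 10-dim segmentation map, one roughness scalar) are hypothetical, since the claim does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_head(x, w, b):
    # A single fully connected layer standing in for one decoder head.
    return x @ w + b

d = 64
general_feature = rng.standard_normal(d)        # output of the shared encoder

w_cls = rng.standard_normal((d, 5));  b_cls = np.zeros(5)
w_seg = rng.standard_normal((d, 10)); b_seg = np.zeros(10)
w_rgh = rng.standard_normal((d, 1));  b_rgh = np.zeros(1)

classification = linear_head(general_feature, w_cls, b_cls)  # class scores
segmentation = linear_head(general_feature, w_seg, b_seg)    # region logits
roughness = linear_head(general_feature, w_rgh, b_rgh)       # scalar estimate
```

The point of the structure is that the expensive feature extraction, fusion and encoding are computed once, while each additional task adds only a lightweight decoder.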
- 2. The method of claim 1, wherein the detail extraction model comprises a first downsampling unit, a second downsampling unit, a third downsampling unit and a first global average pooling layer connected in sequence, the first downsampling unit comprising two 3×3 convolutional layers and the second and third downsampling units each comprising 3×3 convolutional layers; processing the RGB image with the detail extraction model to obtain the low-dimensional detail features comprises: processing the RGB image with the first downsampling unit to obtain a first feature map whose spatial size is reduced relative to the height and width of the RGB image; processing the first feature map with the second downsampling unit to obtain a second feature map; processing the second feature map with the third downsampling unit to obtain a third feature map; and processing the third feature map with the first global average pooling layer to obtain the low-dimensional detail features.
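A minimal sketch of the shape arithmetic in claim 2, under the assumption (not stated explicitly in the claim) that each downsampling unit halves the spatial resolution via stride-2 convolution; the 480×640 input size and 32 channels are hypothetical:

```python
import numpy as np

def downsampled_size(h, w, num_units, stride=2):
    # Each downsampling unit is assumed to halve the spatial resolution
    # (stride-2 convolution); floor division models integer feature sizes.
    for _ in range(num_units):
        h, w = h // stride, w // stride
    return h, w

def global_average_pool(feature_map):
    # Collapse the spatial axes of a C x H x W feature map into a
    # C-dimensional vector, as the first global average pooling layer does.
    return feature_map.mean(axis=(1, 2))

h3, w3 = downsampled_size(480, 640, 3)   # spatial size after the three units
detail_feature = global_average_pool(np.ones((32, h3, w3)))
```

After three halvings a 480×640 image yields a 60×80 map, and the pooling step turns it into a fixed-length vector regardless of input resolution.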
- 3. The method of claim 1, wherein the semantic extraction model comprises a Stem module, a first aggregation-expansion layer, a second aggregation-expansion layer, a third aggregation-expansion layer, a global semantic embedding module, an upsampling unit, and a second global average pooling layer, wherein the Stem module comprises parallel first and second branches and a stitching unit, the first branch comprises a 3×3 convolutional layer and a maximum pooling layer, the second branch comprises a 1×1 convolutional layer and a 3×3 convolutional layer, each aggregation-expansion layer comprises an aggregation convolutional layer, a depth convolutional layer and a projection convolutional layer, and the global semantic embedding module comprises a third global average pooling layer and a residual unit; processing the RGB image with the semantic extraction model to obtain high-dimensional semantic features comprises: processing the RGB image with the Stem module to obtain a fourth feature map; processing the fourth feature map with the first aggregation-expansion layer to obtain a fifth feature map; processing the fifth feature map with the second aggregation-expansion layer to obtain a sixth feature map; processing the sixth feature map with the third aggregation-expansion layer to obtain a seventh feature map; processing the seventh feature map with the global semantic embedding module to obtain an eighth feature map; processing the eighth feature map with the upsampling unit to obtain a ninth feature map; and processing the ninth feature map with the second global average pooling layer to obtain the high-dimensional semantic features.
- 4. The method of claim 1, wherein the fusion model comprises a multi-modal attention unit and a gating fusion unit, the gating fusion unit comprising a fully connected layer and a Sigmoid activation function; fusing the low-dimensional detail features, the high-dimensional semantic features, the elevation features and the IMU features with the fusion model to obtain fused features comprises: processing the low-dimensional detail features, the high-dimensional semantic features, the elevation features and the IMU features with the multi-modal attention unit to obtain an intermediate feature map; and processing the intermediate feature map with the gating fusion unit to obtain the fused features.
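The gating fusion unit of claim 4 (fully connected layer followed by a Sigmoid) can be sketched as follows; for brevity this sketch applies the gate directly to the concatenated modality features rather than to the attention unit's intermediate map, and the per-modality dimensions (8 + 8 + 8 + 4 channels) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_fusion(detail, semantic, elevation, imu, w, b):
    # Concatenate the four modality features, pass them through a fully
    # connected layer, and squash with a sigmoid so each channel gets a
    # gate in (0, 1); the gates then reweight the concatenated features.
    x = np.concatenate([detail, semantic, elevation, imu])
    gate = sigmoid(w @ x + b)   # fully connected layer + Sigmoid activation
    return gate * x             # gated fused feature

d_total = 28                    # 8 + 8 + 8 + 4 channels
w = rng.standard_normal((d_total, d_total)) * 0.1
b = np.zeros(d_total)
fused = gated_fusion(rng.standard_normal(8), rng.standard_normal(8),
                     rng.standard_normal(8), rng.standard_normal(4), w, b)
```

Because each gate lies strictly between 0 and 1, a modality whose channels are uninformative (e.g. lidar under dust occlusion) can be softly suppressed rather than contributing at a fixed weight.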
- 5. The method of claim 1, wherein the shared encoder employs a Transformer network.
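As a rough illustration of the Transformer encoder named in claim 5, the sketch below implements a single scaled dot-product self-attention step over a short sequence of fused-feature tokens; layer normalization, multiple heads and the feed-forward sublayer are omitted, and the token count and dimension are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(x, wq, wk, wv):
    # One scaled dot-product self-attention step, the core operation of a
    # Transformer encoder layer: every token attends to every other token,
    # which lets the encoder mix information across the fused features.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

n, d = 4, 16                                # 4 tokens of dimension 16
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
encoded = self_attention(x, wq, wk, wv)
```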
- 6. The method of claim 1, further comprising: establishing a training set comprising multiple groups of spatio-temporally synchronized RGB image samples, point cloud data samples and IMU data samples; processing the RGB image samples with the detail extraction model to obtain low-dimensional detail feature samples; processing the RGB image samples with the semantic extraction model to obtain high-dimensional semantic feature samples; processing the point cloud data samples with the PointPillars network to obtain elevation feature samples; fusing the low-dimensional detail feature samples, the high-dimensional semantic feature samples, the elevation feature samples and the IMU feature samples with the fusion model to obtain fused feature samples; encoding the fused feature samples with the shared encoder to obtain general feature samples; processing the general feature samples with the road surface classification decoder to obtain a road surface classification prediction, with the road surface segmentation decoder to obtain a road surface segmentation prediction, and with the road surface roughness quantization decoder to obtain a road surface roughness quantization prediction; calculating a cross-entropy loss L_CE from the road surface classification prediction and the road surface classification ground truth; calculating an intersection-over-union loss L_IoU from the road surface segmentation prediction and the road surface segmentation ground truth; calculating a mean squared error loss L_MSE from the road surface roughness quantization prediction and the road surface roughness quantization ground truth; calculating the total loss L = λ1·L_CE + λ2·L_IoU + λ3·L_MSE, where λ1, λ2 and λ3 are task weight coefficients satisfying λ1 + λ2 + λ3 = 1; and updating the parameters of the detail extraction model, the semantic extraction model, the PointPillars network, the fusion model, the shared encoder, the road surface classification decoder, the road surface segmentation decoder and the road surface roughness quantization decoder using the total loss.
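The multi-task training objective of claim 6 reduces to a convex combination of the three per-task losses; the sketch below shows the weighting arithmetic, with the weight values chosen purely for illustration (the claim only requires that they sum to 1):

```python
def total_loss(l_ce, l_iou, l_mse, weights=(0.4, 0.3, 0.3)):
    # Weighted sum of the three task losses: cross-entropy (classification),
    # intersection-over-union (segmentation) and mean squared error
    # (roughness quantization). The task weight coefficients here are
    # hypothetical; the claim requires only that they sum to 1.
    w1, w2, w3 = weights
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9
    return w1 * l_ce + w2 * l_iou + w3 * l_mse

print(round(total_loss(1.0, 0.5, 0.2), 2))  # -> 0.61
```

Because a single scalar loss drives backpropagation, one gradient step updates every module in the pipeline at once, from the feature extractors through the shared encoder to all three decoders.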
- 7. A multi-task perception device for off-road pavement, comprising: an acquisition unit for acquiring RGB images, point cloud data and IMU data collected by the ego vehicle on the off-road pavement at the current moment; a first processing unit for processing the RGB image with the detail extraction model to obtain low-dimensional detail features; a second processing unit for processing the point cloud data with a PointPillars network to obtain elevation features; a fusion unit for fusing the low-dimensional detail features, the high-dimensional semantic features, the elevation features and the IMU features with the fusion model to obtain fused features; an encoding unit for encoding the fused features with a shared encoder to obtain general features; and a multi-task perception unit for processing the general features with the road surface classification decoder to obtain a road surface classification result, with the road surface segmentation decoder to obtain a road surface segmentation result, and with the road surface roughness quantization decoder to obtain a road surface roughness quantization result.
- 8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any of claims 1-6 when executing the computer program.
- 9. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1-6.
Description
Multitasking sensing method and device for off-road pavement

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a multi-task perception method and device for off-road pavement.

Background

With the increasingly urgent demand for off-road intelligent equipment in fields such as geological exploration, emergency rescue and field operations, road surface type perception in complex off-road environments has become a core bottleneck restricting improvements in equipment performance. Off-road pavement is highly unstructured, spans diverse surface types (such as dry dirt roads, weedy rough roads and gravel roads of mixed particle sizes), and is strongly affected by interference such as illumination changes, dust occlusion and vehicle body bumping, so a traditional single-sensor perception scheme struggles to maintain continuously stable recognition accuracy. Existing off-road pavement perception methods suffer from weak single-modality robustness, inefficient multi-modal fusion and missing multi-task capability; more specifically, three problems arise. First, visual perception relies on a single feature extraction network: it either loses low-dimensional details such as road boundaries and textures by over-pursuing high-dimensional semantics, or, by focusing on details, lacks the semantic capability to distinguish categories and easily confuses similar surfaces (such as grass roads and gravel roads). Second, multi-modal fusion adopts a simple feature concatenation or fixed-weight weighting strategy that does not fully exploit the complementarity of modalities such as vision, lidar and IMU, and cannot cope with extreme conditions in which a single modality fails (such as dust occluding the lidar). Third, existing methods are mostly single-task designs that can only output the pavement type, and they are not adapted to the engineering constraints of limited computing power and high real-time requirements on vehicle-mounted platforms.

Disclosure of Invention

In view of the above, the present application provides a multi-task perception method and device for off-road pavement to solve the above technical problems.

In a first aspect, an embodiment of the present application provides a multi-task perception method for an off-road surface, including: acquiring RGB images, point cloud data and IMU data collected by the ego vehicle on an off-road surface at the current moment; processing the RGB image with the detail extraction model to obtain low-dimensional detail features; processing the RGB image with the semantic extraction model to obtain high-dimensional semantic features; processing the point cloud data with a PointPillars network to obtain elevation features; fusing the low-dimensional detail features, the high-dimensional semantic features, the elevation features and the IMU features with a fusion model to obtain fused features; encoding the fused features with a shared encoder to obtain general features; and processing the general features with a road surface classification decoder to obtain a road surface classification result, with a road surface segmentation decoder to obtain a road surface segmentation result, and with a road surface roughness quantization decoder to obtain a road surface roughness quantization result.
In a second aspect, an embodiment of the present application provides a multi-task perception device for off-road pavement, including: an acquisition unit for acquiring RGB images, point cloud data and IMU data collected by the ego vehicle on the off-road pavement at the current moment; a first processing unit for processing the RGB image with the detail extraction model to obtain low-dimensional detail features; a second processing unit for processing the point cloud data with a PointPillars network to obtain elevation features; a fusion unit for fusing the low-dimensional detail features, the high-dimensional semantic features, the elevation features and the IMU features with the fusion model to obtain fused features; an encoding unit for encoding the fused features with a shared encoder to obtain general features; and a multi-task perception unit for processing the general features with the road surface classification decoder to obtain a road surface classification result, with the road surface segmentation decoder to obtain a road surface segmentation result, and with the road surface roughness quantization decoder to obtain a road surface roughness quantization result. In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and