CN-121982239-A - Method and device for generating map, electronic equipment and storage medium

Abstract

The application relates to a method, a device, an electronic device, and a storage medium for generating a map. The method comprises: obtaining perception data, wherein the perception data comprises depth image data; generating a target multi-channel feature map based on point cloud data corresponding to the depth image data; determining a target single-channel feature map and a global affine plane based on the target multi-channel feature map; and generating an elevation map based on the target single-channel feature map and the global affine plane.

Inventors

GUO DEJUN
LIU ZHIYU
WANG SIQIANG
SU YUZHU
HUANG JINGYANG
LI CHUANZHENG

Assignees

Guangzhou Xiaopeng Motors Technology Co., Ltd. (广州小鹏汽车科技有限公司)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-26

Claims (10)

1. A method of generating a map, comprising: obtaining perception data, wherein the perception data comprises depth image data; generating a target multi-channel feature map based on point cloud data corresponding to the depth image data; determining a target single-channel feature map and a global affine plane based on the target multi-channel feature map; and generating an elevation map based on the target single-channel feature map and the global affine plane.
2. The method of claim 1, wherein the target multi-channel feature map comprises a plurality of grids, and wherein generating the target multi-channel feature map based on point cloud data corresponding to the depth image data comprises: acquiring a predefined empty grid map; assigning the 3D points in the point cloud data to target grids of the empty grid map according to coordinate information of the 3D points; determining, for all 3D points in each grid, one or more of a height mean value, a height standard deviation value, a number of points, an average distance from the 3D points to a target coordinate system, and an occupancy marker; and generating the target multi-channel feature map based on the one or more of the height mean value, the height standard deviation value, the number of points, the average distance from the 3D points to the target coordinate system, and the occupancy marker determined for each grid.
3. The method of claim 2, wherein the determining a target single-channel feature map and a global affine plane based on the target multi-channel feature map comprises: determining a residual height value of each grid, a fore-aft direction gradient term, a left-right direction gradient term, and an overall height offset term based on the target multi-channel feature map; generating the target single-channel feature map according to the residual height value of each grid; and generating the global affine plane according to the fore-aft direction gradient term, the left-right direction gradient term, and the overall height offset term.
4. The method of claim 1, wherein the determining a target single-channel feature map and a global affine plane based on the target multi-channel feature map comprises: inputting the target multi-channel feature map to a trained elevation map prediction model to obtain the target single-channel feature map and the global affine plane output by the elevation map prediction model; wherein the elevation map prediction model comprises a lightweight feature fusion module for an object detection task and a dual-task head, the lightweight feature fusion module for the object detection task comprises a dilated convolution layer, and the dual-task head comprises a grid-by-grid height regression head and a global affine plane correction head.
5. The method of claim 1, wherein the perception data further comprises torso pose data corresponding to the depth image data, and wherein the determining a target single-channel feature map and a global affine plane based on the target multi-channel feature map comprises: inputting the target multi-channel feature map and the torso pose data into a trained elevation map prediction model to obtain the target single-channel feature map and the global affine plane output by the elevation map prediction model.
6. The method of claim 1 or 5, further comprising: masking the depth image data to obtain a processed binary map.
7. The method of claim 6, wherein the determining a target single-channel feature map and a global affine plane based on the target multi-channel feature map comprises: inputting the target multi-channel feature map, the torso pose data, and the binary map into a trained elevation map prediction model to obtain the target single-channel feature map and the global affine plane output by the elevation map prediction model.
8. An apparatus for generating a map, comprising: an acquisition module for acquiring perception data, wherein the perception data comprises depth image data; a first generation module for generating a target multi-channel feature map based on point cloud data corresponding to the depth image data; a determining module for determining a target single-channel feature map and a global affine plane based on the target multi-channel feature map; and a second generation module for generating an elevation map based on the target single-channel feature map and the global affine plane.
9. An electronic device, comprising: a processor; and a memory having executable code stored thereon which, when executed by the processor, causes the processor to perform the method of any of claims 1-7.
10. A computer-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-7.
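
The claims state that the elevation map is generated from the per-grid residual heights and the global affine plane, but they do not spell out how the two are combined. The following is a minimal sketch that assumes the simplest reading: the plane, parameterized by the fore-aft gradient term, the left-right gradient term, and the overall height offset term, is evaluated at each grid center and added to that grid's residual. The function name `compose_elevation`, the parameter `cell_size`, and the mapping of rows to the fore-aft axis are hypothetical choices made for illustration only.

```python
def compose_elevation(residual, slope_x, slope_y, offset, cell_size=0.05):
    """Add a global affine plane to a per-grid residual height map.

    Assumptions (not stated in the source): rows run along the fore-aft
    axis (x), columns along the left-right axis (y), the grid origin is
    at cell (0, 0), and the plane is evaluated at cell centers.
    """
    elevation = []
    for i, row in enumerate(residual):
        x = (i + 0.5) * cell_size  # fore-aft coordinate of the cell center
        out_row = []
        for j, r in enumerate(row):
            y = (j + 0.5) * cell_size  # left-right coordinate of the cell center
            # low-frequency base plane plus high-frequency local residual
            out_row.append(r + slope_x * x + slope_y * y + offset)
        elevation.append(out_row)
    return elevation
```

Under this reading, a uniform ramp is carried entirely by the two gradient terms and the offset, so the residual map only has to encode local deviations such as step edges or potholes.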

Description

Method and device for generating map, electronic equipment and storage medium

Technical Field

The present application relates to the field of robotics, and in particular to a method and apparatus for generating a map, an electronic device, and a storage medium.

Background

As robotics develops, the ability of a robot to move autonomously through human living and working environments has become a critical requirement. In these environments the ground is often uneven, and there may be steps, slopes, potholes, or temporarily placed obstacles. Real-time, accurate perception of the terrain elevation beneath and in front of the robot's feet is therefore essential for stable gait planning and safe path planning.

In the related art, schemes that generate a height map indirectly through 2D segmentation require complex calibration, while schemes based on grid statistics and rule-based filtering are prone to overall offset or loss of detail. Neither kind of scheme can generate an accurate elevation map.

Disclosure of Invention

To solve or partially solve the problems in the related art, the application provides a method, a device, an electronic device, and a storage medium for generating a map. An accurate elevation map is generated from a target single-channel feature map, which can represent high-frequency local details, and a global affine plane, which can represent the low-frequency base plane.
A first aspect of the application provides a method of generating a map, comprising: obtaining perception data, wherein the perception data comprises depth image data; generating a target multi-channel feature map based on point cloud data corresponding to the depth image data; determining a target single-channel feature map and a global affine plane based on the target multi-channel feature map; and generating an elevation map based on the target single-channel feature map and the global affine plane.

In some embodiments, the target multi-channel feature map includes a plurality of grids, and generating the target multi-channel feature map based on point cloud data corresponding to the depth image data includes: obtaining a predefined empty grid map; assigning the 3D points in the point cloud data to target grids of the empty grid map according to coordinate information of the 3D points; determining, for all 3D points within each grid, one or more of a height mean value, a height standard deviation value, a number of points, an average distance from the 3D points to a target coordinate system, and an occupancy marker; and generating the target multi-channel feature map based on the one or more of the height mean value, the height standard deviation value, the number of points, the average distance from the 3D points to the target coordinate system, and the occupancy marker determined for each grid.
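
The grid statistics described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `build_feature_map`, its parameters, the channel order, and the zero fill for empty grids are all hypothetical. It also assumes that "average distance to the target coordinate system" means the mean Euclidean distance from the points to the origin of that coordinate system, which the source does not confirm.

```python
import math


def build_feature_map(points, x_range, y_range, cell_size):
    """Bin 3D points into a 2D grid and compute per-grid statistics.

    Returns grid[i][j] = [mean_z, std_z, count, mean_dist, occupied].
    Assumption: points are already expressed in the target coordinate
    system, and empty grids are filled with zeros.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    rows = int(round((x_max - x_min) / cell_size))
    cols = int(round((y_max - y_min) / cell_size))

    # Start from an empty grid map and assign each point to its target grid.
    cells = [[[] for _ in range(cols)] for _ in range(rows)]
    for x, y, z in points:
        i = math.floor((x - x_min) / cell_size)
        j = math.floor((y - y_min) / cell_size)
        if 0 <= i < rows and 0 <= j < cols:  # drop points outside the map
            cells[i][j].append((x, y, z))

    grid = []
    for i in range(rows):
        row = []
        for j in range(cols):
            pts = cells[i][j]
            n = len(pts)
            if n == 0:
                row.append([0.0, 0.0, 0, 0.0, 0])
                continue
            mean_z = sum(p[2] for p in pts) / n
            var_z = sum((p[2] - mean_z) ** 2 for p in pts) / n  # population variance
            mean_dist = sum(math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) for p in pts) / n
            row.append([mean_z, math.sqrt(var_z), n, mean_dist, 1])
        grid.append(row)
    return grid
```

Each grid then carries five channels, so a map with `rows` by `cols` grids yields a `rows x cols x 5` feature map. Because the source allows "one or more" of these statistics, a real system might use only a subset of the channels.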
In some embodiments, determining the target single-channel feature map and the global affine plane based on the target multi-channel feature map includes: determining a residual height value of each grid, a fore-aft direction gradient term, a left-right direction gradient term, and an overall height offset term based on the target multi-channel feature map; generating the target single-channel feature map based on the residual height value of each grid; and generating the global affine plane based on the fore-aft direction gradient term, the left-right direction gradient term, and the overall height offset term.

In some embodiments, determining the target single-channel feature map and the global affine plane based on the target multi-channel feature map includes inputting the target multi-channel feature map to a trained elevation map prediction model to obtain the target single-channel feature map and the global affine plane output by the elevation map prediction model. The elevation map prediction model comprises a lightweight feature fusion module for an object detection task and a dual-task head; the lightweight feature fusion module for the object detection task comprises a dilated convolution layer, and the dual-task head comprises a grid-by-grid height regression head and a global affine plane correction head.

In some embodiments, the perception data further comprises torso pose data corresponding to the depth image data, and determining the target single-channel feature map and the global affine plane based on the target multi-channel feature map includes inputting the target multi-channel feature map and the torso pose data into a trained elevation map prediction model to obtain the target single-channel feature map and the global affine plane output by the elevation map prediction model.

In some embodiments, the method further comprises masking the depth image data to obtain a processed binary map.

In some embodiments, de