CN-122023104-A - Space occupancy prediction method and apparatus, electronic device, and vehicle
Abstract
The application provides a space occupancy prediction method and apparatus, an electronic device, and a vehicle, and relates to the technical field of autonomous driving. The method comprises: obtaining multi-view image data around a vehicle; extracting multi-scale image features; constructing a plurality of mutually orthogonal feature planes; constructing a three-dimensional feature volume based on the feature planes; determining, from the three-dimensional feature volume, a first feature part corresponding to the space within a preset range of the vehicle center and a second feature part corresponding to the space outside the preset range; performing occupancy prediction on the first feature part at a first spatial resolution and on the second feature part at a second spatial resolution, wherein the first spatial resolution is higher than the second; and finally outputting the space occupancy distribution of the vehicle's surroundings. By replacing complex three-dimensional convolution with the orthogonal feature planes and adopting a dual-resolution strategy, the application significantly reduces the computational cost of long-range perception.
Inventors
- LUO XIANGHONG
Assignees
- Guangzhou Automobile Group Co., Ltd. (广州汽车集团股份有限公司)
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-25
Claims (10)
- 1. A space occupancy prediction method, comprising: acquiring multi-view image data around a vehicle; performing feature extraction on the multi-view image data to obtain multi-scale image features; constructing a plurality of mutually orthogonal feature planes based on the multi-scale image features; constructing a three-dimensional feature volume based on the plurality of mutually orthogonal feature planes; determining, from the three-dimensional feature volume, a first feature part corresponding to a space within a preset range of the vehicle center and a second feature part corresponding to a space outside the preset range; performing occupancy prediction on the first feature part at a first spatial resolution and on the second feature part at a second spatial resolution, wherein the first spatial resolution is higher than the second spatial resolution; and outputting a space occupancy distribution of the vehicle's surroundings based on the prediction results at the different resolutions.
- 2. The method of claim 1, wherein constructing the plurality of mutually orthogonal feature planes based on the multi-scale image features comprises: initializing reference points on each of the plurality of mutually orthogonal feature planes; establishing a coordinate mapping between the reference points and the multi-scale image features using the intrinsic and extrinsic parameters of the vehicle cameras; and sampling and fusing the multi-scale image features based on the coordinate mapping to generate the plurality of mutually orthogonal feature planes.
- 3. The method of claim 1, wherein the plurality of mutually orthogonal feature planes comprises a bird's-eye-view plane corresponding to a bird's-eye view angle, a front-view plane corresponding to a front view angle, and a side-view plane corresponding to a side view angle, and wherein constructing the three-dimensional feature volume based on the plurality of mutually orthogonal feature planes comprises: expanding the features of the bird's-eye-view plane, the front-view plane, and the side-view plane into the three-dimensional space along their respective normal directions; and aggregating the features of the three expanded planes to generate the three-dimensional feature volume.
- 4. The method of claim 1, further comprising, after constructing the plurality of mutually orthogonal feature planes: obtaining history feature planes generated at a previous frame time, and interacting the feature planes of the current frame with the history feature planes using a temporal attention mechanism to obtain temporally updated feature planes; wherein constructing the three-dimensional feature volume based on the plurality of mutually orthogonal feature planes comprises constructing the three-dimensional feature volume based on the temporally updated feature planes.
- 5. The method of claim 2, further comprising, after generating the plurality of mutually orthogonal feature planes: performing inter-plane feature interaction on the plurality of mutually orthogonal feature planes using a self-attention mechanism to obtain spatially interacted feature planes; wherein constructing the three-dimensional feature volume based on the plurality of mutually orthogonal feature planes comprises constructing the three-dimensional feature volume based on the spatially interacted feature planes.
- 6. The method of claim 1, wherein performing occupancy prediction on the first feature part at the first spatial resolution comprises: performing an upsampling operation on the first feature part; and inputting the upsampled features into a decoder for prediction to obtain an occupancy prediction result at the first spatial resolution.
- 7. The method of any one of claims 1 to 6, wherein determining the second feature part corresponding to the space outside the preset range from the three-dimensional feature volume comprises: determining the features of the three-dimensional feature volume other than the first feature part as the second feature part, or determining the unclipped three-dimensional feature volume as the second feature part; and wherein performing occupancy prediction on the second feature part at the second spatial resolution comprises: inputting the features, kept as-is or after downsampling, into a decoder for prediction to obtain an occupancy prediction result at the second spatial resolution.
- 8. A space occupancy prediction apparatus, comprising: a data acquisition module configured to acquire multi-view image data around a vehicle; a feature construction module configured to extract multi-scale image features, construct a plurality of mutually orthogonal feature planes based on the features, and further construct a three-dimensional feature volume; a dual-resolution prediction module configured to determine, from the three-dimensional feature volume, a first feature part corresponding to a space within a preset range of the vehicle center and a second feature part corresponding to a space outside the preset range, and to perform occupancy prediction on the first feature part at a first spatial resolution and on the second feature part at a second spatial resolution, wherein the first spatial resolution is higher than the second; and a result output module configured to output a space occupancy distribution based on the prediction results at the different resolutions.
- 9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 7 when executing the program.
- 10. A vehicle, characterized in that it comprises the electronic device of claim 9.
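Claims 1 and 3 describe building the three-dimensional feature volume by expanding each orthogonal plane along its normal direction and aggregating the results. A minimal NumPy sketch of that aggregation step follows; the grid sizes, channel count, and sum-based aggregation are illustrative assumptions, not the claimed implementation:

```python
import numpy as np

def tpv_to_volume(bev, front, side):
    """Aggregate three mutually orthogonal feature planes into a 3D volume.

    Each plane is broadcast (expanded) along its normal direction and the
    three expanded tensors are summed:
      bev:   (X, Y, C) bird's-eye-view plane, normal = Z axis
      front: (Y, Z, C) front-view plane,      normal = X axis
      side:  (X, Z, C) side-view plane,       normal = Y axis
    Returns a (X, Y, Z, C) feature volume.
    """
    return (bev[:, :, None, :]      # expand BEV along Z
            + front[None, :, :, :]  # expand front view along X
            + side[:, None, :, :])  # expand side view along Y

# Tiny demo with hypothetical grid sizes X=4, Y=5, Z=3 and C=2 channels.
rng = np.random.default_rng(0)
bev = rng.normal(size=(4, 5, 2))
front = rng.normal(size=(5, 3, 2))
side = rng.normal(size=(4, 3, 2))
vol = tpv_to_volume(bev, front, side)
```

Each voxel feature is thus the sum of its three plane projections, so no dense 3D convolution is needed to populate the volume.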
Description
Space occupancy prediction method and apparatus, electronic device, and vehicle
Technical Field
The application belongs to the technical field of autonomous-driving perception, and particularly relates to a space occupancy prediction method and apparatus, an electronic device, and a vehicle.
Background
In the environment-perception task of an autonomous driving system, a visual space occupancy (Occupancy) algorithm estimates the occupancy state and semantic label of each voxel in a scene from a sequence of multi-view images, providing critical three-dimensional environment information for downstream decision-making and control. Existing models, such as VoxFormer, typically employ three-dimensionally initialized feature queries (Query) or dense three-dimensional convolutional networks to construct voxel features. However, this approach consumes a very significant amount of GPU compute and memory. In particular, applying uniform high-resolution processing to both near and far regions while the vehicle is driving wastes a large amount of computation on distant background areas that have little influence on safety, making it difficult to meet the strict real-time and resource-efficiency requirements of vehicle-mounted chips.
Disclosure of Invention
Embodiments of the application provide a space occupancy prediction method, an apparatus, an electronic device, and a vehicle, aiming to solve the technical problems that existing space occupancy prediction algorithms consume large computational resources and are difficult to deploy in real time on the vehicle side.
In a first aspect, an embodiment of the present application provides a space occupancy prediction method, comprising: acquiring multi-view image data around a vehicle; performing feature extraction on the multi-view image data to obtain multi-scale image features; constructing a plurality of mutually orthogonal feature planes based on the multi-scale image features; constructing a three-dimensional feature volume based on the plurality of mutually orthogonal feature planes; determining, from the three-dimensional feature volume, a first feature part corresponding to a space within a preset range of the vehicle center and a second feature part corresponding to a space outside the preset range; performing occupancy prediction on the first feature part at a first spatial resolution and on the second feature part at a second spatial resolution, wherein the first spatial resolution is higher than the second; and outputting the space occupancy distribution of the vehicle's surroundings based on the prediction results at the different resolutions. In this technical scheme, decoupling the three-dimensional spatial features into a plurality of mutually orthogonal feature planes (for example, a tri-perspective-view (TPV) representation) avoids the enormous computation incurred by full-space three-dimensional convolution. Further, by distinguishing the first feature part near the vehicle center from the more distant second feature part and predicting each at a differentiated spatial resolution, the computational load in peripheral regions is greatly reduced while the perception accuracy of the core region (for example, nearby obstacles) is preserved. This strategy markedly improves the model's inference speed and resource utilization, making it better suited for deployment on resource-constrained vehicle-mounted embedded platforms.
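The dual-resolution strategy of the first aspect can be sketched as follows. This is an illustrative NumPy toy, not the claimed implementation: the 2x nearest-neighbor upsampling factor, the linear stand-in decoders, and the use of the unclipped volume for the far branch (one variant the claims allow) are all assumptions.

```python
import numpy as np

def dual_resolution_predict(volume, near_half, num_classes=3, seed=0):
    """Predict occupancy at two resolutions from one feature volume.

    volume:    (X, Y, Z, C) feature volume with the vehicle at the grid center.
    near_half: half-width, in voxels, of the high-resolution (near) region.
    """
    rng = np.random.default_rng(seed)
    X, Y, Z, C = volume.shape
    cx, cy = X // 2, Y // 2

    # First feature part: voxels within the preset range of the vehicle center.
    near = volume[cx - near_half:cx + near_half, cy - near_half:cy + near_half]

    # High-resolution branch: 2x nearest-neighbor upsampling on all spatial axes.
    near_up = near.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)

    # Stand-in linear "decoders"; a real model would use learned conv heads.
    W_hi = rng.normal(size=(C, num_classes))
    W_lo = rng.normal(size=(C, num_classes))
    return near_up @ W_hi, volume @ W_lo  # (high-res logits, low-res logits)

# Hypothetical 8x8x4 grid with 16 feature channels.
feat = np.random.default_rng(1).normal(size=(8, 8, 4, 16))
near_logits, far_logits = dual_resolution_predict(feat, near_half=2)
```

The near region (here 4x4 voxels around the center) is decoded on a grid twice as fine, while the full volume is decoded once at base resolution, so the expensive high-resolution work is confined to the safety-critical core area.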
In an embodiment, constructing the plurality of mutually orthogonal feature planes based on the multi-scale image features includes: initializing reference points on each of the plurality of mutually orthogonal feature planes; establishing a coordinate mapping between the reference points and the multi-scale image features using the intrinsic and extrinsic parameters of the vehicle cameras; and sampling and fusing the multi-scale image features based on the coordinate mapping to generate the plurality of mutually orthogonal feature planes. In this embodiment, through this scheme of reference-point initialization, geometric projection, and fusion, the relevant features need not be searched blindly over the whole image; instead, they are sampled directly at the image positions given by the geometric prior. This not only reduces the amount of computation significantly but also ensures that the generated feature planes have a well-defined physical geometry. In an embodiment, the plurality of mutually orthogonal feature planes includes a bird's-eye-view plane corresponding to a bird's-eye view angle, a front-view plane corresponding to a front view angle, and a side-view plane corresponding to a side view angle, and constructing the three-dimensional feature volume based on the plurality of mutually orthogonal feature planes includes: expanding the features of the bird's-eye-view plane, the front-view plane and t