CN-120894490-B - Automatic driving decision method based on three-dimensional scene reconstruction and multi-modal large model
Abstract
The application relates to the technical field of automatic driving, and in particular to an automatic driving decision method based on three-dimensional scene reconstruction and a multi-modal large model. The method comprises: performing incremental static 3D Gaussian modeling on the static background of the current time step in a driving scene to obtain a background Gaussian model; constructing a composite dynamic Gaussian graph for the dynamic objects of the current time step in the driving scene to obtain a dynamic object Gaussian graph; integrating the dynamic object Gaussian graph into the background Gaussian model to obtain a road condition Gaussian field of the current time step; rendering the road condition Gaussian field according to multi-view images to obtain a road condition plane image of the current time step; and obtaining a driving path for the next time step based on the real-time road condition plane image and a trained automatic driving network system. When the driving path is predicted from the road condition plane image, intention understanding and planning alignment can be achieved by incorporating the multi-modal large model (LLM).
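The per-time-step data flow described in the abstract can be sketched as follows. This is a minimal illustration only, not the patented implementation: the function `decision_step`, the stage names in the `stages` dictionary, and the toy stage bodies are all hypothetical placeholders introduced here to show how the background model of one time step is carried forward as the prior for the next.

```python
from typing import Any, Callable, Dict, Optional

Stage = Callable[..., Any]


def decision_step(frame: Dict[str, Any],
                  prev_background: Optional[Any],
                  stages: Dict[str, Stage]) -> Dict[str, Any]:
    """Run one time step of the pipeline; every stage is an injected placeholder."""
    # 1. Incremental static modeling: the previous background acts as the prior.
    background = stages["update_background"](prev_background, frame)
    # 2. Composite dynamic Gaussian graph for the moving objects.
    dynamic_graph = stages["build_dynamic_graph"](frame)
    # 3. Merge the dynamic graph into the background: road condition Gaussian field.
    gaussian_field = stages["fuse"](background, dynamic_graph)
    # 4. Render the field to a road condition plane image.
    plane_image = stages["render"](gaussian_field, frame)
    # 5. Predict the driving path for the next time step.
    path = stages["plan"](plane_image)
    return {"background": background, "field": gaussian_field,
            "image": plane_image, "path": path}


# Toy stages (hypothetical stand-ins) so the data flow can be run end to end.
toy_stages: Dict[str, Stage] = {
    "update_background": lambda prev, frame: (prev or []) + [frame["t"]],
    "build_dynamic_graph": lambda frame: {"objects": frame["objects"]},
    "fuse": lambda bg, dyn: {"background": bg, "dynamic": dyn},
    "render": lambda gaussian_field, frame: ("image", frame["t"]),
    "plan": lambda image: [("waypoint", image[1] + 1)],
}

background = None
for t in range(3):
    out = decision_step({"t": t, "objects": []}, background, toy_stages)
    background = out["background"]  # carried forward as the next prior
```

Passing the stages in as callables keeps the sketch independent of any particular reconstruction or planning library; a real system would replace each toy stage with the corresponding module.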
Inventors
- LIAN FENG
- CHEN HAO
- GUO ZHIQING
- XIE YANZHAO
- LIAO DAIZHI
- ZHENG WEIDONG
- XU JINMING
- ZHU QINGQING
- LI RUIYING
- LUO HUAN
- TONG ZHENJIANG
Assignees
- Guangzhou University (广州大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20250710
Claims (6)
- 1. An automatic driving decision method based on three-dimensional scene reconstruction and a multi-modal large model, characterized by comprising the following steps: performing incremental static 3D Gaussian modeling on the static background of the current time step in a driving scene to obtain a background Gaussian model, wherein the parameters of the background Gaussian model of the current time step are updated using the background Gaussian model of the previous time step as a prior; constructing a composite dynamic Gaussian graph for the dynamic objects of the current time step in the driving scene to obtain a dynamic object Gaussian graph; integrating the dynamic object Gaussian graph into the background Gaussian model to obtain a road condition Gaussian field of the current time step; rendering the road condition Gaussian field according to multi-view images to obtain a road condition plane image of the current time step; wherein an automatic driving network system is trained through an agent-centric learning paradigm module and an autonomous-vehicle-centric learning paradigm module; and obtaining a driving path for the next time step based on the real-time road condition plane image and the trained automatic driving network system, which specifically comprises: performing region annotation and segmentation on the road condition plane image to obtain a bird's-eye-view (BEV) feature tensor; determining an expected feature tensor from the BEV feature tensor; and determining the driving path of the next time step according to the expected feature tensor; wherein performing region annotation and segmentation on the road condition plane image to obtain the BEV feature tensor specifically comprises: cropping the ego vehicle and foreground object regions using three-dimensional bounding boxes; segmenting the lane regions using a general visual scene mask; and pooling the cropped regions to generate a feature representation of each agent, and concatenating the feature representations along the batch dimension to form an agent BEV feature tensor; wherein determining the expected feature tensor from the BEV feature tensor specifically comprises: adapting to the BEV space through a multi-layer perceptron and connecting to the expected feature tensor of the agent, wherein a combination of a language model and contrastive learning is used to ensure that the BEV features accurately reflect the expected information; and wherein determining the driving path of the next time step according to the expected feature tensor specifically comprises: extracting ego query features according to the expected feature tensor; and determining the driving path of the next time step according to the ego query features.
- 2. The automatic driving decision method based on three-dimensional scene reconstruction and a multi-modal large model according to claim 1, wherein performing incremental static 3D Gaussian modeling on the static background of the current time step in the driving scene to obtain the background Gaussian model specifically comprises: acquiring background point cloud data, a background image, and the background Gaussian model of the current time step; using the background Gaussian model of the previous time step as a prior to determine the parameters of the background Gaussian model of the current time step, wherein the parameters of the background Gaussian model comprise a position, a covariance matrix, spherical harmonic coefficients, and an opacity; and updating the parameters of the background Gaussian model with the background point cloud data and the background image of the current time step to obtain the background Gaussian model of the current time step.
- 3. The automatic driving decision method based on three-dimensional scene reconstruction and a multi-modal large model according to claim 2, wherein using the background Gaussian model of the previous time step as a prior to determine the parameters of the background Gaussian model of the current time step specifically comprises: aligning the overlapping region of the background Gaussian model of the previous time step and the background Gaussian model of the current time step, merging the Gaussian coordinates of the overlapping region into the background Gaussian model of the previous time step, and initializing the background Gaussian model of the current time step.
- 4. The automatic driving decision method based on three-dimensional scene reconstruction and a multi-modal large model according to claim 3, wherein updating the parameters of the background Gaussian model with the background point cloud data and the background image of the current time step comprises: updating the position and covariance matrix of the non-overlapping region in the background Gaussian model of the current time step using the background point cloud data of the current time step; and updating the spherical harmonic coefficients and the opacity in the background Gaussian model of the current time step using the background image.
- 5. The automatic driving decision method based on three-dimensional scene reconstruction and a multi-modal large model according to claim 3, wherein integrating the dynamic object Gaussian graph into the background Gaussian model to obtain the road condition Gaussian field of the current time step specifically comprises: determining the opacity of each dynamic object Gaussian according to the distance between the dynamic object Gaussian and the camera center coordinates; taking the adjusted opacity as the Gaussian distribution center of the dynamic object, and calculating a transformation matrix from the Gaussian distribution of the dynamic object to the background Gaussian model; and splicing the dynamic object Gaussian graph into the background Gaussian model in time order through the transformation matrix to obtain the road condition Gaussian field.
- 6. The automatic driving decision method based on three-dimensional scene reconstruction and a multi-modal large model according to claim 1, wherein rendering the road condition Gaussian field according to the multi-view images to obtain the dynamic driving scene of the current time step specifically comprises: mapping the road condition Gaussian field onto a two-dimensional plane through a 3D Gaussian splatting renderer to obtain the road condition plane image.
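As an illustration of the incremental background update in claims 2 to 4, the following is a minimal Python sketch under stated assumptions, not the claimed implementation. The `Gaussian` class, the `incremental_update` function, the `overlap_radius` and `init_scale` parameters, and the distance-based overlap test are all hypothetical. The covariance matrix is reduced to a single isotropic scale, only the zeroth-order spherical harmonic term is stored, and the image-based refinement of spherical harmonic coefficients and opacity is left out.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    position: Vec3   # Gaussian centre
    scale: float     # isotropic stand-in for the full covariance matrix (assumption)
    sh_dc: Vec3      # zeroth-order spherical harmonic term only (assumption)
    opacity: float


def incremental_update(prev_model: List[Gaussian],
                       points: List[Vec3],
                       overlap_radius: float = 0.5,
                       init_scale: float = 0.1) -> List[Gaussian]:
    """Build the current background model using the previous model as the prior."""
    # Initialise the current model from a copy of the previous one (the prior).
    model = [Gaussian(g.position, g.scale, g.sh_dc, g.opacity) for g in prev_model]
    for p in points:
        # A point counts as overlapping if it lies near any previous Gaussian.
        overlaps = any(math.dist(p, g.position) <= overlap_radius for g in prev_model)
        if not overlaps:
            # Non-overlapping region: geometry comes from the point cloud.
            # Colour and opacity are placeholders that an image-based
            # optimisation step (not shown) would refine.
            model.append(Gaussian(p, init_scale, (0.5, 0.5, 0.5), 0.1))
    return model


prev = [Gaussian((0.0, 0.0, 0.0), 0.2, (1.0, 0.0, 0.0), 0.9)]
lidar_points = [(0.1, 0.0, 0.0), (5.0, 0.0, 0.0)]
current = incremental_update(prev, lidar_points)
print(len(current))  # 2: one Gaussian kept from the prior, one added for the new region
```

The brute-force overlap test is quadratic in the number of Gaussians and points; a spatial index would be the natural replacement at scene scale.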
Description
Automatic driving decision method based on three-dimensional scene reconstruction and multi-modal large model

Technical Field

The application relates to the technical field of automatic driving, and in particular to an automatic driving decision method based on three-dimensional scene reconstruction and a multi-modal large model.

Background

The intelligent-traffic multi-modal perception and decision system overcomes the limitations of traditional SLAM and NeRF in dynamic scenes and provides highly robust environment modeling for L4 automatic driving. To address the frequent changes of the static background and the variability of dynamic objects in complex dynamic scenes, an incremental three-dimensional Gaussian modeling and dynamic composite reconstruction method is proposed. Combining incremental learning with a Gaussian representation balances efficiency and accuracy, fits within on-board computing limits, and reduces resource consumption and cost. It also supports multi-scale scene reconstruction from open roads to closed sites (such as mining areas and warehouses), thereby covering L2 to L4 automatic driving requirements.

While driving, a vehicle faces a static background that is large in scale and persists over long periods, yet changes frequently as the vehicle moves, for example when urban road construction or temporary obstacles suddenly come into view. The incremental static three-dimensional Gaussian model proposed here improves reconstruction accuracy by updating the background geometry in real time and reduces continuity errors in lane lines, for example by accurately modeling lane lines and buildings in urban NOA (Navigate on Autopilot), providing a reliable basis for high-precision positioning and path planning.

In urban automatic driving and in unstructured scenarios, real-time reconstruction of suddenly intruding vehicles is particularly important. However, existing 3D-GS (3D Gaussian Splatting) has difficulty representing the long-term motion of multiple targets; for example, pedestrian trajectories are hard to predict in dense traffic. The composite dynamic Gaussian graph proposed by the intelligent-traffic multi-modal perception and decision system achieves accurate geometric recovery and trajectory prediction of dynamic objects through joint spatio-temporal modeling, keeping tracking error within 0.5 m and addressing missed and false detections.

To reduce the cost of three-dimensional scene reconstruction in an automatic driving system, and to introduce a large language model to assist the system in decision making, the application provides an automatic driving decision method based on three-dimensional scene reconstruction and a multi-modal large model.
Disclosure of Invention

To overcome the problems in the related art, the application provides an automatic driving decision method based on three-dimensional scene reconstruction and a multi-modal large model, comprising the following steps: performing incremental static 3D Gaussian modeling on the static background of the current time step in a driving scene to obtain a background Gaussian model, wherein the parameters of the background Gaussian model of the current time step are updated using the background Gaussian model of the previous time step as a prior; constructing a composite dynamic Gaussian graph for the dynamic objects of the current time step in the driving scene to obtain a dynamic object Gaussian graph; integrating the dynamic object Gaussian graph into the background Gaussian model to obtain a road condition Gaussian field of the current time step; rendering the road condition Gaussian field according to multi-view images to obtain a road condition plane image of the current time step; and training the automatic driving network system through an agent-centric learning paradigm module and an autonomous-vehicle-centric learning paradigm module.

In one embodiment, performing incremental static 3D Gaussian modeling on the static background of the current time step in the driving scene to obtain the background Gaussian model specifically comprises: acquiring background point cloud data, a background image, and the background Gaussian model of the current time step; using the background Gaussian model of the previous time step as a prior to determine the parameters of the background Gaussian model of the current time step, wherein the parameters of the background Gaussian model comprise a position, a covariance matrix, spherical harmonic coefficients, and an opacity; and updating the parameters of the background Gaussian model with the background point cloud data and the background image of the current ti