CN-122023670-A - Automatic driving scene reconstruction method, system, device and storage medium based on space-time consistency constraint
Abstract
The invention relates to the technical field of computer vision and automatic driving scene reconstruction, and in particular to an automatic driving scene reconstruction method, system, device and storage medium based on a space-time consistency constraint. The method comprises the steps of: acquiring multi-source data of a dynamic urban scene, establishing a plurality of Gaussian ellipsoids from the multi-source data, and constructing a Gaussian splatting model based on a space-time high-frequency consistency constraint; constructing a joint feature vector comprising a spatial embedding vector, a high-frequency temporal embedding vector and an instance embedding vector; constructing a temporal high-frequency consistency constraint from the joint feature vector and optimizing the Gaussian ellipsoid attributes accordingly; constructing a spatial high-frequency consistency constraint and optimizing the static Gaussian ellipsoid attributes; training the Gaussian splatting model with the optimized Gaussian ellipsoids; and rendering the multi-source data of the dynamic urban scene to be processed through the trained Gaussian splatting model. The method has the beneficial effect that the automatic driving three-dimensional scene reconstruction result exhibits good high-frequency temporal consistency.
Inventors
- LIU JI
- WANG JIAJU
- WU YINGBO
- GAN LINHAO
Assignees
- Chongqing University (重庆大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-03
Claims (10)
- 1. An automatic driving scene reconstruction method based on a space-time consistency constraint, characterized by comprising the following steps: acquiring multi-source data of a dynamic urban scene, establishing a plurality of Gaussian ellipsoids from the multi-source data, and constructing a Gaussian splatting model based on a space-time high-frequency consistency constraint; forming four-dimensional space-time coordinates from the three-dimensional spatial coordinates and the time dimension of each Gaussian ellipsoid, constructing a spatial embedding vector and a high-frequency temporal embedding from the four-dimensional space-time coordinates, and constructing an instance embedding vector from the multi-source data of the dynamic urban scene; obtaining a joint feature vector from the spatial embedding vector, the high-frequency temporal embedding and the instance embedding vector; constructing a temporal high-frequency consistency constraint from the joint feature vector, and optimizing the Gaussian ellipsoid attributes based on the temporal high-frequency consistency constraint; constructing a spatial high-frequency consistency constraint and optimizing the static Gaussian ellipsoid attributes; training the Gaussian splatting model based on the space-time high-frequency consistency constraint with the optimized Gaussian ellipsoids; and rendering the multi-source data of the dynamic urban scene to be processed through the trained Gaussian splatting model based on the space-time high-frequency consistency constraint.
- 2. The automatic driving scene reconstruction method according to claim 1, wherein the step of constructing the spatial embedding vector from the four-dimensional space-time coordinates comprises: constructing the four-dimensional space-time coordinates from the position of each Gaussian ellipsoid in space and time; computing a four-dimensional resolution vector of the four-dimensional space-time coordinates at each resolution level; computing the fractional offset of each resolution coordinate in the four-dimensional resolution vector with respect to the grid points of the hash grid; generating the space-time features of each resolution level from those fractional offsets; and concatenating the space-time features across all resolution levels to obtain the spatial embedding vector of the Gaussian ellipsoid.
- 3. The automatic driving scene reconstruction method according to claim 1, wherein constructing the high-frequency temporal embedding from the four-dimensional space-time coordinates comprises: generating more than one time-frequency component from the four-dimensional space-time coordinates, and accumulating all the time-frequency components to obtain the high-frequency temporal embedding.
- 4. The automatic driving scene reconstruction method according to claim 1, wherein the steps of obtaining the joint feature vector from the spatial embedding vector, the high-frequency temporal embedding and the instance embedding vector, constructing the temporal high-frequency consistency constraint from the joint feature vector, and optimizing the Gaussian ellipsoid attributes based on the temporal high-frequency consistency constraint comprise: concatenating the spatial embedding vector, the high-frequency temporal embedding vector and the instance embedding vector to obtain the joint feature vector; inputting the joint feature vector into a Gaussian deformation network to predict the attribute residuals of the Gaussian ellipsoid in the time dimension; and recursively updating the center position, scale and rotation parameters of the Gaussian ellipsoid in time according to the attribute residuals.
- 5. The automatic driving scene reconstruction method according to claim 1, wherein the step of constructing the spatial high-frequency consistency constraint and optimizing the static Gaussian ellipsoid attributes comprises: acquiring the Gaussian ellipsoids of the static region and projecting them onto the image planes at different viewing angles to obtain static-region rendered images; applying bilateral filtering to a real image acquired by the vehicle to obtain its low-frequency component, computing image gradient information from the real image, and combining the gradient information to construct a weight map that emphasizes the spatial high-frequency structure; constructing a structural similarity loss function from the real image acquired by the vehicle and the static-region rendered image; constructing a multi-view spatial high-frequency consistency loss function; and optimizing the attributes of the static Gaussian ellipsoids based on the multi-view spatial high-frequency consistency loss function and the structural similarity loss function.
- 6. The automatic driving scene reconstruction method according to claim 1, wherein the step of training the Gaussian splatting model based on the space-time high-frequency consistency constraint with the optimized Gaussian ellipsoids comprises: rendering an automatic driving scene image from the optimized Gaussian ellipsoids; constructing a joint objective function from the rendered image; and optimizing the Gaussian ellipsoid parameters and the model parameters through the joint objective function.
- 7. An automatic driving scene reconstruction system based on a space-time consistency constraint, adapted to the automatic driving scene reconstruction method based on a space-time consistency constraint of any one of claims 1 to 6, characterized by comprising: an input unit for inputting the multi-source data of the dynamic urban scene; a Gaussian initialization and scene representation construction unit for acquiring an initial geometric representation of the dynamic urban scene from the multi-source data and constructing the Gaussian ellipsoids; a space-time embedding construction unit for constructing the high-frequency temporal embedding, the spatial embedding vector and the instance embedding vector from the three-dimensional spatial coordinates and time coordinates of the Gaussian ellipsoids; a space-time high-frequency consistency constraint unit for generating the temporal high-frequency consistency constraint and the spatial high-frequency consistency constraint; a Gaussian ellipsoid optimization unit for optimizing the attributes of the Gaussian ellipsoids according to the temporal and spatial high-frequency consistency constraints; and a projection and rasterization rendering unit for rendering the optimized Gaussian ellipsoids to obtain the automatic driving scene rendered image at the corresponding viewing angle.
- 8. The automatic driving scene reconstruction system according to claim 7, wherein the space-time high-frequency consistency constraint unit comprises a temporal high-frequency consistency constraint module and a spatial high-frequency consistency constraint module connected in parallel; the temporal high-frequency consistency constraint module models the displacement, rotation and scale changes of the Gaussian ellipsoids in the time dimension by fusing the high-frequency temporal embedding vector, the spatial embedding vector and the instance embedding vector, to generate the temporal high-frequency consistency constraint; the spatial high-frequency consistency constraint module generates the spatial high-frequency consistency constraint by constructing an edge-aware weighted loss function that applies supervision to high-frequency structural regions.
- 9. An automatic driving scene reconstruction device based on space-time consistency constraints, comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the automatic driving scene reconstruction method based on space-time consistency constraints of any one of claims 1-6.
- 10. A computer-readable storage medium on which a computer program is stored, characterized in that the steps of the automatic driving scene reconstruction method based on a space-time consistency constraint of any one of claims 1 to 6 are implemented when said computer program is executed by one or more processors.
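The claimed steps can be illustrated with a minimal sketch. Note the assumptions: claim 3 only requires "more than one time-frequency component", so the sinusoidal components and the geometric frequency ladder below are illustrative choices; the linear `deform_step` stands in for the Gaussian deformation network of claim 4, and `edge_weight_map` sketches only the gradient part of the claim-5 weight map (the bilateral low-pass step is omitted). All function names are hypothetical.

```python
import numpy as np

def high_freq_time_embedding(t, num_freqs=4):
    """Claim 3 sketch: generate several time-frequency components from the
    time coordinate t and accumulate them into one embedding.
    Sinusoidal components at frequencies 2^k * pi are an assumption."""
    emb = np.zeros(2)
    for k in range(num_freqs):
        w = 2.0 ** k * np.pi
        emb = emb + np.array([np.sin(w * t), np.cos(w * t)])
    return emb

def joint_feature(spatial_emb, time_emb, instance_emb):
    """Claim 4: concatenate the three embeddings into one joint feature vector."""
    return np.concatenate([spatial_emb, time_emb, instance_emb])

def deform_step(params, feat, W, b):
    """One recursive temporal update (claim 4): a hypothetical linear head
    predicts residuals for center (3), scale (3) and rotation (4), which
    are added to the current Gaussian ellipsoid parameters."""
    residual = W @ feat + b  # stands in for the Gaussian deformation network
    return params + residual

def edge_weight_map(img):
    """Claim 5 sketch: gradient magnitude normalized to [0, 1], used to
    up-weight high-frequency structures (lane lines, curbs) in the loss."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8)
```

As a usage sketch: with an 8-dimensional spatial embedding, the 2-dimensional temporal embedding above and a 4-dimensional instance embedding, the joint feature has 14 dimensions, so `W` would be a `(10, 14)` matrix mapping it to the 10 Gaussian parameter residuals.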
Description
Automatic driving scene reconstruction method, system, device and storage medium based on space-time consistency constraint
Technical Field
The invention relates to the technical field of computer vision and automatic driving scene reconstruction, and in particular to an automatic driving scene reconstruction method, system, device and storage medium based on a space-time consistency constraint.
Background
Three-dimensional scene reconstruction and novel view synthesis are core technologies for data generation and scene modeling in the automatic driving field. They aim to construct a three-dimensional representation with geometric consistency and visual realism from limited multi-view observations, and to support the generation of high-quality rendered images at arbitrary camera poses, thereby serving downstream tasks of an automatic driving system such as high-precision map maintenance, bird's-eye-view (BEV) perception, holistic scene understanding and three-dimensional object detection. In recent years, the neural radiance field (Neural Radiance Fields, NeRF) and its derivative methods have achieved good results in novel view synthesis for static scenes and can generate realistic rendered images at different viewing angles. However, NeRF-type methods generally have high rendering overhead and struggle to meet the real-time or near-real-time rendering requirements of automatic driving scenes, with particularly high deployment costs in city-scale complex environments.
To improve rendering efficiency, 3D Gaussian Splatting (3DGS) was proposed as an explicit three-dimensional representation method. It achieves real-time rendering by rasterizing and projecting anisotropic Gaussians, significantly reducing rendering cost, and has shown application potential in real-time novel view synthesis and reconstruction of automatic driving scenes. Although 3DGS-based methods have made progress in rendering speed and scene representation, the prior art still exhibits the following key problems when facing real dynamic urban scenes, particularly scenes with sudden motion such as lane changes and flickering vehicle lights. The high-frequency temporal consistency problem: in a dynamic driving scene, existing three-dimensional scene reconstruction and novel view synthesis methods generally lack the ability to explicitly model high-frequency temporal signals, and have difficulty stably describing rapidly changing behaviors such as the periodic flickering of turn signals, the sudden activation or deactivation of brake lights, and vehicle acceleration, deceleration or merging, so that frame-by-frame errors accumulate continuously in the time dimension. As a result, the reconstruction of dynamic regions is inconsistent between adjacent frames, light states are difficult to reproduce continuously, and motion trajectories jitter or break, destroying overall temporal continuity; the reconstruction results then fail to meet the needs of automatic driving simulation and evaluation applications that require temporal consistency.
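For concreteness, the projection of an anisotropic Gaussian that 3DGS rasterization relies on can be sketched as follows. This is the standard EWA splatting approximation from the 3DGS literature, not the claimed method itself; the function name and the use of NumPy are illustrative, and the full rasterizer would additionally apply the world-to-camera rotation and a low-pass dilation.

```python
import numpy as np

def project_gaussian_cov(cov3d, mean_cam, fx, fy):
    """Project a 3D Gaussian covariance (given in camera coordinates) to a
    2D image-plane covariance: Sigma2D = J @ Sigma3D @ J.T, where J is the
    Jacobian of the perspective projection evaluated at the Gaussian center."""
    x, y, z = mean_cam
    J = np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])
    return J @ cov3d @ J.T
```

For an isotropic unit-covariance Gaussian centered on the optical axis at depth 2 with focal lengths 100, the projected covariance is diagonal with variance (100/2)² = 2500, matching the intuition that splat size scales with focal length over depth.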
The spatial high-frequency structure degradation problem: existing methods generally rely on pixel-level losses to supervise the scene under sparse viewing angles or long-range observation conditions, and lack effective constraints on the spatial high-frequency structure, making it difficult to accurately preserve fine-grained geometric structures such as lane lines, curb boundaries, distant lane boundaries and building facade textures. As a result, thin-line structures in the rendered images are blurred, edge contours are dulled and texture details are missing, weakening the geometric expressiveness and structural recognizability of the scene and in turn affecting the reliability and practicality of downstream automatic driving perception tasks.
Disclosure of Invention
The invention aims to provide an automatic driving scene reconstruction method, system, device and storage medium based on a space-time consistency constraint, so as to solve the technical problems of insufficient high-frequency temporal consistency and spatial high-frequency structure degradation in existing automatic driving three-dimensional scene reconstruction results. In a first aspect, the invention provides an automatic driving scene reconstruction method based on a space-time consistency constraint, comprising the steps of: acquiring dynamic city scene multisource data, establishing a plurality of Gaussian ellipsoids according to the dynamic cit