CN-120070757-B - Three-dimensional Gaussian city scene reconstruction method and system based on point cloud depth diffusion
Abstract
The invention discloses a three-dimensional Gaussian city scene reconstruction method and system based on point cloud depth diffusion. The method comprises: obtaining a labeled data set of a target scene for reconstructing a scene model; processing the LiDAR point cloud data in the data set with a depth diffusion method to obtain dense point clouds and dense depth maps; selecting an appropriate number of dense points with a voxel selection method as the initialization positions of the three-dimensional Gaussians; applying depth supervision with the dense depth maps when training the three-dimensional Gaussian model, additionally adding a dense depth loss function and a loss coefficient acting on the overall loss function; and training to obtain a trained explicit three-dimensional Gaussian model of the city scene. The invention fully accounts for the characteristics of the three-dimensional Gaussian city scene reconstruction task, provides a better-suited initialization method and depth supervision scheme, and thereby achieves a better scene modeling effect.
Inventors
- Li Xi
- Xiao Zekang
- Lu Yehao
- Li Ruixiang
Assignees
- Zhejiang University (浙江大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-02-11
Claims (8)
- 1. A three-dimensional Gaussian city scene reconstruction method based on point cloud depth diffusion, characterized by comprising the following steps:
  S1, acquiring a training data set of a target scene for reconstructing a scene model, the training data set containing a spatio-temporally continuous image sequence of the target scene, the corresponding camera parameters, and LiDAR point cloud and SfM point cloud data;
  S2, projecting the sparse LiDAR point cloud into the image space of each image in the image sequence, traversing the image space taking each pixel in turn as a target pixel, selecting the depths of the several points nearest to each target pixel, computing weights from the distance between each point and the target pixel, summing the selected depths weighted by the computed weights to obtain the depth value of the target pixel, traversing all pixels of the image space to obtain the dense depth map corresponding to the image, and projecting the dense depth map into the world coordinate system through the camera pose parameters to obtain the depth point cloud;
  S3, uniformly sampling the depth point cloud with a voxel selection method, forming a dense point cloud from the sampled points together with the SfM point cloud and the LiDAR point cloud in the data set, and adding the dense point cloud to the training data set as the initialization positions of the three-dimensional Gaussians, so as to balance modeling speed against modeling quality;
  S4, training the three-dimensional Gaussian model with the training data set, additionally applying a dense depth loss function to the depth rendering result during training, multiplying the weighted sum of the image reconstruction loss, the SSIM loss, and the dense depth loss by a loss coefficient to obtain the final loss, taking the dense depth map as input, and supervising and optimizing the three-dimensional Gaussian model with the final loss, thereby improving the modeling quality of the scene;
  S5, visualizing the trained three-dimensional Gaussian model to obtain an explicit modeling result.
  In step S2, the dense depth map and the depth point cloud are obtained for each image in the image sequence as follows (see the depth diffusion and back-projection sketches after the claims):
  S21, the sparse LiDAR point cloud $P_{LiDAR}$ containing $N$ points is projected from the world coordinate system to the image coordinate system to obtain a point set $D = \{(u_i, v_i, d_i)\}_{i=1}^{N}$, where $(u_i, v_i)$ are the pixel coordinates obtained by projecting the $i$-th point and $d_i$ is the depth of the $i$-th point after projection;
  S22, for each pixel $p = (u, v)$ of the current image in the image sequence, the $k$ LiDAR depth points $D_k = \{q_1, \dots, q_k\}$ nearest to the pixel in the image coordinate system are selected from the point set $D$ by a top-$k$ algorithm, while guaranteeing that, for any selected LiDAR depth point $q \in D_k$ and any unselected LiDAR depth point $q' \in D \setminus D_k$, $\|p - q\|_2 \le \|p - q'\|_2$, where $\|\cdot\|_2$ denotes the Euclidean distance and $D \setminus D_k$ denotes the complement of $D_k$ with respect to $D$;
  S23, for each LiDAR depth point $q_j$ at distance $r_j = \|p - q_j\|_2$ from the pixel $p$, the corresponding weight $w_j$ is calculated as $w_j = 1/(r_j + \epsilon)$, where $\epsilon$ is a constant; the weights $w_j$ of all points $q_j$ are normalized to obtain the final weight of each point, $\tilde{w}_j = w_j / \sum_{m=1}^{k} w_m$; the depths $d_j$ of the $k$ LiDAR depth points are then weighted and summed to give the diffusion depth at each pixel $p$ of the dense depth map $D_{dense}$: $D_{dense}(p) = \sum_{j=1}^{k} \tilde{w}_j\, d_j$;
  S24, the dense depth map $D_{dense}$ is projected back to the world coordinate system through the camera intrinsic matrix $K$ and the camera extrinsics $[R \mid t]$ to obtain the depth point cloud $P_{depth}$ corresponding to the current image.
  The final loss in step S4 is calculated as follows (see the loss sketch after the claims):
  S41, in each training iteration, the three-dimensional Gaussian model is projected into the camera coordinate system through the camera extrinsics $[R \mid t]$ to obtain the rendered depth of the current iteration, $\hat{D} \in \mathbb{R}^{H \times W}$, where $H$ is the pixel height of the image and $W$ is the pixel width of the image;
  S42, the dense depth loss of the image is calculated as $\mathcal{L}_{depth} = \frac{1}{HW} \sum_{(u,v)} \|\hat{D}(u,v) - D_{dense}(u,v)\|_1$, where $(u, v)$ are the image pixel coordinates and $\|\cdot\|_1$ denotes the L1 norm;
  S43, the dense depth loss of each image in the training data set is recorded to obtain a dense depth loss vector $\boldsymbol{\ell} = (\mathcal{L}_{depth}^{1}, \dots, \mathcal{L}_{depth}^{M})$, where $M$ is the number of images in the training set; the dense depth loss vector is normalized to obtain the loss coefficient vector $\boldsymbol{\lambda} = \mathrm{Sigmoid}\big(a(\boldsymbol{\ell} - b)\big)$, where $a$ and $b$ are two control constants and $\mathrm{Sigmoid}(\cdot)$ denotes the sigmoid normalization operation;
  S44, for the $i$-th image in the training data set, the image reconstruction loss $\mathcal{L}_{color}^{i}$ and the SSIM loss $\mathcal{L}_{SSIM}^{i}$ are calculated and, together with the dense depth loss $\mathcal{L}_{depth}^{i}$ of step S42, form the overall loss, while the $i$-th element $\lambda_i$ of the loss coefficient vector acts on the overall loss to give the final loss of the $i$-th image: $\mathcal{L}_{final}^{i} = \lambda_i \big(\mathcal{L}_{color}^{i} + w_1\, \mathcal{L}_{SSIM}^{i} + w_2\, \mathcal{L}_{depth}^{i}\big)$, where $w_1$ and $w_2$ are adjustable weight hyperparameters.
- 2. The three-dimensional Gaussian urban scene reconstruction method based on point cloud depth diffusion according to claim 1, wherein the training data set in step S1 comprises a plurality of temporally and spatially continuous image samples collected from the target scene, each image sample comprising a corresponding camera intrinsic matrix $K$, camera extrinsics $[R \mid t]$, an SfM point cloud $P_{SfM}$, and a LiDAR point cloud $P_{LiDAR}$, where $R$ is a rotation matrix and $t$ is a translation vector.
- 3. The three-dimensional Gaussian urban scene reconstruction method based on point cloud depth diffusion according to claim 1, wherein, when the depth point cloud is uniformly sampled with the voxel selection method in step S3, the three-dimensional space of the depth point cloud is divided into a voxel grid according to a preset voxel size, and all depth points within each voxel are replaced by one depth point at the center of that voxel, yielding a uniformly sampled point cloud (see the voxel selection sketch after the claims).
- 4. The three-dimensional Gaussian urban scene reconstruction method based on point cloud depth diffusion according to claim 1, wherein, when the depth point cloud is uniformly sampled with the voxel selection method, the density of the sampled point cloud is controlled by adjusting the voxel size.
- 5. The three-dimensional Gaussian urban scene reconstruction method based on point cloud depth diffusion according to claim 1, wherein the control constant in step S43 is $a = 2$, and the weight hyperparameters in step S44 are $w_1 = 0.2$ and $w_2 = 0.01$.
- 6. The three-dimensional Gaussian urban scene reconstruction method based on point cloud depth diffusion according to claim 1, wherein in step S5 the trained three-dimensional Gaussian model is visualized using a rasterization technique.
- 7. A three-dimensional Gaussian city scene reconstruction system based on point cloud depth diffusion, characterized by comprising the following modules: a data acquisition module, configured to acquire a training data set of a target scene for reconstructing a scene model, the training data set containing a spatio-temporally continuous image sequence of the target scene, the corresponding camera parameters, and LiDAR point cloud and SfM point cloud data; a depth point cloud generation module, configured to project the sparse LiDAR point cloud into the image space of each image in the image sequence, traverse the image space taking each pixel in turn as a target pixel, select the depths of the several points nearest to each target pixel, compute weights from the distance between each point and the target pixel, sum the selected depths weighted by the computed weights to obtain the depth value of the target pixel, traverse all pixels of the image space to obtain the dense depth map corresponding to the image, and project the dense depth map into the world coordinate system through the camera pose parameters to obtain the depth point cloud; an initialization position generation module, configured to uniformly sample the depth point cloud with a voxel selection method, form a dense point cloud from the sampled points together with the SfM point cloud and the LiDAR point cloud in the data set, and add the dense point cloud to the training data set as the initialization positions of the three-dimensional Gaussians, so as to balance modeling speed against modeling quality; a model training module, configured to train the three-dimensional Gaussian model with the training data set, additionally apply a dense depth loss function to the depth rendering result during training, multiply the weighted sum of the image reconstruction loss, the SSIM loss, and the dense depth loss by a loss coefficient to obtain the final loss, take the dense depth map as input, and supervise and optimize the three-dimensional Gaussian model with the final loss, improving the modeling quality of the scene; and a visualization module, configured to visualize the trained three-dimensional Gaussian model to obtain an explicit modeling result.
- 8. An electronic device comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to implement the three-dimensional Gaussian urban scene reconstruction method based on point cloud depth diffusion according to any one of claims 1 to 6 when executing the computer program.
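The depth diffusion of steps S21–S23 amounts to an inverse-distance interpolation of the projected LiDAR depths over the full pixel grid. The following is a minimal Python sketch of that reading; the function name `diffuse_depth`, the use of `scipy.spatial.cKDTree` for the top-$k$ search, and the default values of `k` and `eps` are illustrative assumptions rather than details fixed by the patent.

```python
# Hedged sketch of S21-S23: project LiDAR points into the image, then fill
# every pixel with an inverse-distance weighted sum over its k nearest
# projected points. diffuse_depth, k and eps are hypothetical choices.
import numpy as np
from scipy.spatial import cKDTree

def diffuse_depth(points_world, K, R, t, H, W, k=4, eps=1e-6):
    """points_world: (N, 3) LiDAR points; K: (3, 3) intrinsics;
    R: (3, 3), t: (3,) world-to-camera extrinsics. Returns an (H, W) depth map."""
    # S21: world -> camera -> image projection.
    cam = points_world @ R.T + t                 # camera-space coordinates
    cam = cam[cam[:, 2] > eps]                   # keep points in front of the camera
    uvw = cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]                # pixel coordinates (u_i, v_i)
    d = cam[:, 2]                                # depths d_i

    # S22: top-k nearest projected points for every pixel of the image grid.
    tree = cKDTree(uv)
    grid = np.stack(np.meshgrid(np.arange(W), np.arange(H)), -1).reshape(-1, 2)
    r, idx = tree.query(grid, k=k)               # Euclidean pixel distances r_j

    # S23: inverse-distance weights, normalized, then a weighted depth sum.
    w = 1.0 / (r + eps)                          # eps is the constant of S23
    w /= w.sum(axis=1, keepdims=True)            # normalized weights
    return (w * d[idx]).sum(axis=1).reshape(H, W)
```

A KD-tree keeps the per-pixel top-$k$ query tractable, since the traversal of S2 implies roughly $H \times W$ nearest-neighbor queries per image.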
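Step S24 then lifts the dense depth map back into world space. Below is a minimal sketch assuming a pinhole model with intrinsics $K$ and world-to-camera extrinsics $x_{cam} = R\, x_{world} + t$; `depth_to_world` is a hypothetical name.

```python
# Hedged sketch of S24: back-project a dense depth map to a world-space
# point cloud through the intrinsics K and extrinsics (R, t).
import numpy as np

def depth_to_world(depth, K, R, t):
    """depth: (H, W) dense depth map. Returns an (H*W, 3) world-space point cloud."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))               # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    cam = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)      # camera-space points
    return (cam - t) @ R                                         # x_world = R^T (x_cam - t)
```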
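Claim 3 pins down the voxel selection of step S3: all depth points inside a voxel are replaced by one point at the voxel center. A minimal sketch under that reading follows; `voxel_select` and the default `voxel_size` are illustrative, and claim 4 notes that the voxel size is the knob controlling sample density.

```python
# Hedged sketch of claim 3 / S3: one representative point per occupied voxel,
# placed at the voxel center. voxel_select and voxel_size are hypothetical.
import numpy as np

def voxel_select(points, voxel_size=0.5):
    """points: (N, 3) depth point cloud. Returns one center point per occupied voxel."""
    ijk = np.unique(np.floor(points / voxel_size).astype(np.int64), axis=0)
    return (ijk + 0.5) * voxel_size

# S3: the Gaussian initialization positions are the union of the voxel-sampled
# depth points, the SfM points, and the raw LiDAR points, e.g.:
# init_pts = np.concatenate([voxel_select(depth_pts), sfm_pts, lidar_pts], axis=0)
```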
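Finally, a PyTorch sketch of the loss assembly of S42–S44. The weights $w_1 = 0.2$ and $w_2 = 0.01$ and the control constant $a = 2$ follow claim 5; the value of the second control constant $b$ is not given in the source, so `b=1.0` below is purely an assumption, as are all function names.

```python
# Hedged sketch of S42-S44: per-image dense depth loss, sigmoid-normalized
# loss coefficients, and the final weighted loss. b = 1.0 is an assumption.
import torch

def dense_depth_loss(rendered, dense_depth):
    """S42: mean L1 error between the rendered depth and the diffused depth map."""
    return (rendered - dense_depth).abs().mean()

def loss_coefficients(depth_losses, a=2.0, b=1.0):
    """S43: sigmoid normalization of the per-image dense depth loss vector."""
    return torch.sigmoid(a * (depth_losses - b))

def final_loss(l_color, l_ssim, l_depth, lam_i, w1=0.2, w2=0.01):
    """S44: loss coefficient times the weighted sum of the three loss terms."""
    return lam_i * (l_color + w1 * l_ssim + w2 * l_depth)
```

Here `lam_i` would be the $i$-th element of the vector returned by `loss_coefficients`, recomputed as training records fresh per-image dense depth losses.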
Description
Three-dimensional Gaussian city scene reconstruction method and system based on point cloud depth diffusion

Technical Field

The invention relates to the field of three-dimensional reconstruction, and in particular to a three-dimensional Gaussian scene reconstruction method based on point cloud depth diffusion.

Background

Recent advances in end-to-end autonomous driving highlight the importance of closed-loop evaluation. However, existing simulators inevitably exhibit a domain gap with the real world, which underlines the need for real-world closed-loop simulators and motivates the development of high-quality urban scene modeling methods. Methods based on neural radiance fields (NeRF) and on 3D Gaussian Splatting (3DGS) are the leading techniques in this field, providing photorealistic rendering from novel viewpoints. Among them, 3DGS-based methods are now widely adopted because their explicit modeling and efficient differentiable splatting push rendering speed to real-time levels. LiDAR point clouds, by virtue of their accurate depth priors, are widely used in 3DGS-based autonomous driving methods, mainly serving two purposes. First, Gaussian initialization: 3DGS depends heavily on the initialization point cloud, and LiDAR points, being denser than the points extracted by structure from motion (SfM), help cover texture-less regions and thus facilitate the densification process. Second, depth supervision: depth is critical for 3DGS scene modeling since it directly determines the three-dimensional spatial position of each Gaussian, and LiDAR depth provides effective constraints during training, reducing artifacts and improving rendering quality. However, the LiDAR point cloud is still too sparse, lacking sufficient surface coverage and geometry at initialization. Meanwhile, statistics show that on average only 0.68% of image pixels receive depth supervision. Especially in distorted images, such as under over-exposure or low-light conditions, relying solely on color and sparse depth supervision leads to severe performance degradation.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a three-dimensional Gaussian urban scene reconstruction method based on point cloud depth diffusion.
The aim of the invention is achieved by the following technical scheme. In a first aspect, the invention provides a three-dimensional Gaussian urban scene reconstruction method based on point cloud depth diffusion, comprising the following steps: S1, acquiring a training data set of a target scene for reconstructing a scene model, the training data set containing a spatio-temporally continuous image sequence of the target scene, the corresponding camera parameters, and LiDAR point cloud and SfM point cloud data; S2, projecting the sparse LiDAR point cloud into the image space of each image in the image sequence, traversing the image space taking each pixel in turn as a target pixel, selecting the depths of the several points nearest to each target pixel, computing weights from the distance between each point and the target pixel, summing the selected depths weighted by the computed weights to obtain the depth value of the target pixel, traversing all pixels of the image space to obtain the dense depth map corresponding to the image, and projecting the dense depth map into the world coordinate system through the camera pose parameters to obtain the depth point cloud; S3, uniformly sampling the depth point cloud with a voxel selection method, forming a dense point cloud from the sampled points together with the SfM point cloud and the LiDAR point cloud in the data set, and adding the dense point cloud to the training data set as the initialization positions of the three-dimensional Gaussians, so as to balance modeling speed against modeling quality; S4, training the three-dimensional Gaussian model with the training data set, additionally applying a dense depth loss function to the depth rendering result during training, multiplying the weighted sum of the image reconstruction loss, the SSIM loss, and the dense depth loss by a loss coefficient to obtain the final loss, taking the dense depth map as input, and supervising and optimizing the three-dimensional Gaussian model with the final loss, thereby improving the modeling quality of the scene; and S5, visualizing the trained three-dimensional Gaussian model to obtain an explicit modeling result. Preferably, the training data set in step S1 includes a plurality of temporally and spatially continuous image samples acquired from the target scene, each image sample comprising a corresponding camera intrinsic matrix $K$, camera extrinsics $[R \mid t]$, an SfM point cloud $P_{SfM}$, and a LiDAR point cloud $P_{LiDAR}$, where $R$ is a rotation matrix and $t$ is a translation vector. Preferably, the specific s