CN-121392157-B - Multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method
Abstract
The invention discloses a multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method. The method comprises the steps of: obtaining multi-view RGB images; outputting camera parameters and a sparse point cloud through a VGGT model; generating depth maps and mask maps through a monocular depth estimation network and an image segmentation network; densifying the sparse point cloud through a visual hull algorithm; initializing two-dimensional Gaussian primitives from the dense point cloud and training them; re-rendering the depth maps of all view angles; generating a three-dimensional mesh through a truncated signed distance function and an isosurface extraction algorithm; performing mesh optimization based on local bilateral filtering on the three-dimensional mesh to obtain a smooth mesh; and performing scale correction and applying it to the smooth mesh to obtain a three-dimensional model with real scale. Through a multi-model collaborative initialization and optimization strategy, the method generates a complete three-dimensional model with high-fidelity geometry, real scale and novel-view rendering capability.
Inventors
- LI QIAOFENG
- ZHANG YI
- YANG CHENGYUN
- LAI ZIHAO
Assignees
- Zhejiang University (浙江大学)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2025-12-24
Claims (8)
- 1. A multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method, characterized by comprising the following steps: S1, acquiring multi-view RGB images, outputting camera parameters and a sparse point cloud corresponding to each image by using a VGGT model, and generating a depth map and a mask map corresponding to each image by a monocular depth estimation network and an image segmentation network, respectively; S2, based on the camera parameters and the mask maps, performing densification processing on the sparse point cloud by using a visual hull algorithm to obtain a dense point cloud, comprising the following steps: S21, dividing the three-dimensional space into a low-resolution voxel grid, wherein each voxel unit corresponds to a fixed volume in the three-dimensional space; S22, for each pixel point (u, v) in each mask map, performing a back-projection operation based on the pinhole camera model to obtain an imaging ray; S23, for pixels with a mask value of 1, marking the voxels through which the ray passes as possibly occupied and updating the voxel confidence; for pixels with a mask value of 0, marking the voxels through which the ray passes as unoccupied and attenuating the confidence; S24, determining occupied voxels according to a confidence threshold τ to form a coarse visual hull; S25, dividing a high-resolution voxel grid within the bounding box of the coarse visual hull, and assigning color information based on the RGB images to obtain a fine-resolution visual hull; S26, fusing the point cloud of the fine-resolution visual hull with the sparse point cloud to obtain the dense point cloud; S3, initializing two-dimensional Gaussian primitives by using the dense point cloud, wherein each two-dimensional Gaussian primitive comprises a center point position, tangent vectors, a scaling vector, an opacity and spherical harmonic coefficients; S4,
training the initialized two-dimensional Gaussian primitives based on the camera parameters, the depth maps and the mask maps, generating rendered images through differentiable rendering, and optimizing the parameters of the two-dimensional Gaussian primitives in combination with associated loss functions; S5, after training is completed, re-rendering depth maps of all view angles by using the camera parameters, and generating a three-dimensional mesh through a truncated signed distance function (TSDF) and an isosurface extraction algorithm, comprising the following steps: S51, dividing a voxel grid according to the bounding box, wherein the voxel size is Δ; S52, for each voxel center v, initializing a TSDF value F(v) ← 1; S53, for each camera view angle, rendering a depth map with the trained two-dimensional Gaussian primitives, and projecting the voxel center v to the image plane according to the camera intrinsics K and extrinsics (R, t), i.e. p = K(Rv + t), to obtain pixel coordinates (u, v); S54, extracting the depth value D_i(u, v) from the depth map, wherein D_i represents the depth map of the i-th view, and calculating the signed distance sd(v) = D_i(u, v) − p_z, wherein p_z is the z component of the projected point p; S55, truncating and normalizing, ψ(v) = clamp(sd(v)/δ, −1, 1), wherein δ is the truncation bandwidth; S56, updating the TSDF value F(v) with ψ(v), and finally extracting the isosurface F(v) = 0 to generate a triangle mesh; S6, performing mesh optimization based on local bilateral filtering on the three-dimensional mesh to obtain a smooth mesh; and S7, performing scale correction on the camera parameters based on the physical distance between cameras, and applying the corrected scale parameter to the smooth mesh to obtain a three-dimensional model with real scale.
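Steps S51–S56 above can be sketched as a simple depth-map-to-TSDF fusion. The NumPy sketch below is illustrative, not the patent's exact formulation: the function and variable names are assumptions, a plain running-average update is used for S56, voxels far behind the surface are skipped (a common TSDF convention), and the final isosurface extraction (e.g. marching cubes via `skimage.measure.marching_cubes`) is omitted.

```python
import numpy as np

def fuse_tsdf(depth_maps, Ks, Rs, ts, bbox_min, bbox_max, voxel_size, trunc_delta):
    """Fuse per-view depth maps into a TSDF volume (sketch of S51-S56)."""
    # S51: regular voxel grid over the bounding box
    dims = np.round((bbox_max - bbox_min) / voxel_size).astype(int)
    n = int(np.prod(dims))
    F = np.ones(n)              # S52: TSDF initialised to 1
    W = np.zeros(n)             # per-voxel fusion weights (running average)
    idx = np.stack(np.meshgrid(*[np.arange(d) for d in dims], indexing="ij"), -1)
    centers = bbox_min + (idx.reshape(-1, 3) + 0.5) * voxel_size

    for depth, K, R, t in zip(depth_maps, Ks, Rs, ts):
        p_cam = centers @ R.T + t                  # S53: world -> camera, p = Rv + t
        z = p_cam[:, 2]
        uvw = p_cam @ K.T                          # project with intrinsics K
        u = np.full(n, -1)
        v = np.full(n, -1)
        front = z > 1e-6                           # only voxels in front of the camera
        u[front] = np.round(uvw[front, 0] / z[front]).astype(int)
        v[front] = np.round(uvw[front, 1] / z[front]).astype(int)
        h, w = depth.shape
        valid = front & (u >= 0) & (u < w) & (v >= 0) & (v < h)

        sd = np.zeros(n)
        sd[valid] = depth[v[valid], u[valid]] - z[valid]   # S54: signed distance
        psi = np.clip(sd / trunc_delta, -1.0, 1.0)         # S55: truncate/normalise

        upd = valid & (sd > -trunc_delta)                  # skip far-behind voxels
        F[upd] = (F[upd] * W[upd] + psi[upd]) / (W[upd] + 1.0)  # S56: update
        W[upd] += 1.0
    return F.reshape(dims)
```

With depth maps from several views, the zero level set of the returned volume approximates the object surface; the triangle mesh of S56 would then be extracted from `F` by any marching-cubes implementation.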
- 2. The multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method according to claim 1, wherein in S1, the VGGT model is a Visual Geometry Grounded Transformer whose output comprises camera extrinsics π, camera intrinsics K and a sparse point cloud; the monocular depth estimation network is Depth Anything V2, outputting a set of depth maps d; and the image segmentation network is Segment Anything 2 (SAM 2), outputting a set of mask maps m.
- 3. The multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method according to claim 1, wherein in S22, the back-projection operation comprises: for a pixel point (u, v), the direction vector d of the imaging ray is calculated as d = R^T K^{-1} [u, v, 1]^T, wherein R and K respectively represent the camera extrinsic rotation matrix and intrinsic matrix of the current view angle; the ray origin o is calculated as o = −R^T t, wherein t represents the camera extrinsic translation vector of the current view angle; and the ray is parameterized as r(s) = o + s·d, wherein s represents the ray length parameter.
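The back-projection in claim 3 can be written out directly. This sketch assumes the standard world-to-camera convention p_cam = R·p_world + t (so the camera centre is o = −R^T t); the function name is illustrative.

```python
import numpy as np

def backproject_ray(u, v, K, R, t):
    """Back-project pixel (u, v) to a world-space ray (claim 3 sketch)."""
    # direction d = R^T K^{-1} [u, v, 1]^T, normalised for convenience
    d = R.T @ np.linalg.inv(K) @ np.array([u, v, 1.0])
    d /= np.linalg.norm(d)
    # ray origin: camera centre in world coordinates, o = -R^T t
    o = -R.T @ t
    return o, d          # ray: r(s) = o + s * d, s >= 0
```

Marching along r(s) through the voxel grid (e.g. with a 3D DDA traversal) yields the voxels whose confidence is updated in S23.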
- 4. The multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method according to claim 1, wherein in S23, the update rule of the voxel confidence C(v) is: when a ray at the i-th view angle marks voxel v as occupied, C(v) ← C(v) + w_i; when the ray marks v as unoccupied, C(v) ← C(v) − β·w_i; wherein w_i represents the weight of the i-th view and β represents the penalty factor.
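A minimal sketch of the per-view confidence update in claim 4. The additive gain/penalty form is an assumption consistent with the claim's wording (weight w_i for occupied evidence, penalty factor β for free-space evidence); the original formulas are not reproduced verbatim.

```python
import numpy as np

def update_confidence(C, hit_label, w_i, beta):
    """Update voxel confidences for one view (claim 4 sketch).

    hit_label: 1 for voxels hit by rays from mask-1 pixels,
               0 for voxels hit by rays from mask-0 pixels,
              -1 for voxels not traversed by any ray of this view.
    """
    C = np.asarray(C, dtype=float).copy()
    C[hit_label == 1] += w_i            # evidence the voxel is occupied
    C[hit_label == 0] -= beta * w_i     # penalised: evidence of free space
    return C
```

After all views are accumulated, S24 keeps voxels with C(v) ≥ τ to form the coarse visual hull.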
- 5. The multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method according to claim 1, wherein in S3, initializing the two-dimensional Gaussian primitives comprises: setting the center point position p to the three-dimensional coordinates of the dense point cloud; setting the scaling vector s = (s_u, s_v) according to the distance between adjacent points, wherein s_u and s_v respectively represent the scale values in the two principal directions; setting the tangent vectors t_u and t_v as unit vectors; setting the opacity α to a constant; and using the spherical harmonic coefficients to characterize the appearance color.
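The initialization in claim 5 can be sketched as below. The nearest-neighbour distance heuristic for the scales, the axis-aligned initial tangent frame, and the field names are all illustrative assumptions (the patent only fixes which parameters exist, not their exact initial values).

```python
import numpy as np

def init_2d_gaussians(points, colors, alpha0=0.5):
    """Initialise 2D Gaussian primitive parameters from a dense point cloud."""
    n = len(points)
    # scale heuristic: distance to each point's nearest other point
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.min(axis=1)
    return {
        "position": np.asarray(points, float),      # centre point p
        "scale": np.stack([nn, nn], axis=1),        # (s_u, s_v)
        "tangent_u": np.tile([1.0, 0.0, 0.0], (n, 1)),  # unit tangent vectors
        "tangent_v": np.tile([0.0, 1.0, 0.0], (n, 1)),
        "opacity": np.full(n, alpha0),              # constant initial opacity
        "sh_dc": np.asarray(colors, float),         # degree-0 SH term from RGB
    }
```

All of these fields become trainable parameters in S4; the O(n²) distance computation here would be replaced by a k-d tree query for real point clouds.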
- 6. The multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method according to claim 1, wherein in S4, the differentiable rendering comprises: rendering an RGB map C(x) = Σ_i c_i α_i G_i(x) Π_{j&lt;i} (1 − α_j G_j(x)); rendering a depth map D(x) = Σ_i d_i α_i G_i(x) Π_{j&lt;i} (1 − α_j G_j(x)); and rendering a cumulative opacity map A(x) = Σ_i α_i G_i(x) Π_{j&lt;i} (1 − α_j G_j(x)); wherein C(x), D(x) and A(x) respectively represent the rendered color value, rendered depth value and cumulative opacity value at pixel x; c_i represents the color value of the i-th two-dimensional Gaussian primitive; d_i represents the depth value of the i-th two-dimensional Gaussian primitive; α_i and α_j respectively represent the opacities of the i-th and j-th two-dimensional Gaussian primitives; and G_i(x) and G_j(x) respectively represent the Gaussian values of the i-th and j-th two-dimensional Gaussian primitives at pixel x.
- 7. The multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method according to claim 1, wherein in S6, the mesh optimization based on local bilateral filtering comprises: for a mesh vertex v_i, the smoothed vertex position v_i′ is expressed as v_i′ = (Σ_{j∈N(i)} w_s(‖v_j − v_i‖) · w_n(n_i, n_j) · v_j) / (Σ_{j∈N(i)} w_s(‖v_j − v_i‖) · w_n(n_i, n_j)); wherein v_j represents the position of a neighborhood vertex; N(i) represents the set of neighborhood vertices; n_i and n_j respectively represent the normal vectors of vertices v_i and v_j; w_s(·) represents the spatial weight term; and w_n(·) represents the normal weight term.
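The per-vertex filter of claim 7 is sketched below. The Gaussian forms of the spatial and normal-similarity kernels (and the σ values) are assumptions; the claim only states that a spatial weight term and a normal weight term exist.

```python
import numpy as np

def bilateral_smooth_vertex(v_i, n_i, nbr_pos, nbr_normals,
                            sigma_s=0.1, sigma_n=0.5):
    """Bilateral smoothing of one mesh vertex (claim 7 sketch)."""
    # spatial weight w_s: falls off with distance to the neighbour
    d = np.linalg.norm(nbr_pos - v_i, axis=1)
    w_s = np.exp(-(d ** 2) / (2.0 * sigma_s ** 2))
    # normal weight w_n: falls off as neighbour normals diverge from n_i
    cosang = np.clip(nbr_normals @ n_i, -1.0, 1.0)
    w_n = np.exp(-((1.0 - cosang) ** 2) / (2.0 * sigma_n ** 2))
    w = w_s * w_n
    # normalised weighted average of neighbour positions
    return (w[:, None] * nbr_pos).sum(axis=0) / w.sum()
```

Because w_n suppresses neighbours whose normals disagree with n_i, the filter removes isolated burrs while preserving sharp creases better than plain Laplacian smoothing would.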
- 8. The multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method according to claim 1, wherein in S7, the scale correction calculates a scale coefficient s = d_real / ‖t_i − t_j‖ based on the inter-camera physical distance d_real measured by a TOF sensor, wherein t_i and t_j respectively represent the translation vectors of the two cameras; and the corrected scale parameter is applied to the smooth mesh by updating the mesh vertex positions P′ = s·P, wherein P′ represents the vertex positions of the resulting smooth mesh with real scale.
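The scale correction of claim 8 reduces to one ratio and one multiply. This sketch uses a single camera pair as an illustrative simplification; in practice the scale coefficients from several measured pairs could be averaged for robustness.

```python
import numpy as np

def apply_scale_correction(vertices, t_i, t_j, measured_dist):
    """Recover metric scale from one measured camera baseline (claim 8 sketch)."""
    # s = d_real / ||t_i - t_j||: ratio of physical to reconstructed baseline
    s = measured_dist / np.linalg.norm(np.asarray(t_i, float) - np.asarray(t_j, float))
    # P' = s * P: rescale all mesh vertices to metric units
    return s * np.asarray(vertices, float), s
```

After this step, distances measured on the mesh correspond to real-world units, which is what enables the industrial measurement use case described in the background.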
Description
Multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method

Technical Field

The invention relates to the field of three-dimensional vision and digitization, and in particular to a multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method.

Background

In the fields of three-dimensional vision and digitization, efficiently and automatically reconstructing a high-precision three-dimensional model of an object from multi-view RGB images is a core requirement of applications such as automated industrial inspection, digital product archiving and virtual display. In the system initialization and camera parameter estimation stage, traditional methods depend heavily on the structure-from-motion (SfM) pipeline, which computes camera poses and a sparse point cloud through feature matching. However, under the common acquisition condition of sparse input images (for example, fewer than 10 images), feature matching easily fails, so the entire three-dimensional reconstruction pipeline cannot even be started. In addition, high-precision active scanning schemes can provide reliable initialization, but they require expensive digital projectors and complex multi-device joint calibration, which increases system cost and the barrier to operation.
Secondly, regarding reconstruction quality and the training process of the three-dimensional model: even if initialization succeeds, reconstruction methods that start from a sparse point cloud struggle to recover the complete, dense geometric structure of an object, and geometric holes or distortions are common, especially in weakly textured regions that are critical to reconstruction quality. Meanwhile, if the training process lacks effective supervision signals, the model is easily disturbed by complex backgrounds, converges unstably or loses detail. Moreover, the final output of many traditional pipelines is only a single geometric mesh, which cannot generate realistic novel-view renderings. In addition, regarding post-processing and the usability of the reconstruction result, the three-dimensional mesh surface obtained by direct reconstruction or conversion often contains noise and irregular burrs, and its subsequent optimization must rely on an additional, parameter-sensitive filtering algorithm, which increases pipeline complexity. More importantly, reconstruction results based on monocular cues or SfM generally suffer from scale ambiguity: the generated model is a normalized model whose actual physical dimensions cannot be determined, so it cannot be directly applied to industrial inspection and measurement tasks with strict requirements on dimensional accuracy. Therefore, how to design a multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method that generates a three-dimensional model with real physical scale, smooth mesh quality and novel-view rendering capability is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention

In view of the above, the invention provides a multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method. It aims to overcome the technical defects of existing three-dimensional reconstruction techniques under sparse viewpoints, namely difficult initialization, poor geometric reconstruction quality and single-function results, and integrates a complete scheme of automatic calibration, high-quality initialization, multi-supervision-signal training and post-processing optimization, so as to automatically generate, from ordinary RGB image input alone, a three-dimensional model suitable for accurate measurement and high-quality visualization. To achieve the above purpose, the present invention adopts the following technical scheme. A multi-model collaborative two-dimensional Gaussian splatting three-dimensional reconstruction method comprises the following steps: S1, acquiring multi-view RGB images, outputting camera parameters and a sparse point cloud corresponding to each image by using a VGGT model, and generating a depth map and a mask map corresponding to each image by a monocular depth estimation network and an image segmentation network, respectively; S2, based on the camera parameters and the mask maps, performing densification processing on the sparse point cloud by using a visual hull algorithm to obtain a dense point cloud; S3, initializing two-dimensional Gaussian primitives by using the dense point cloud, wherein each two-dimensional Gaussian primitive comprises a center point position, tangent vectors, a scaling vector, an opacity and spherical harmonic coefficients; S4, training the initialized two-dimensional Gaussian primitives based on