CN-122023688-A - Sparse multi-view image three-dimensional Gaussian reconstruction method and system combined with geographic information data


Abstract

The invention discloses a three-dimensional Gaussian reconstruction method and system for sparse multi-view images combined with geographic information data. The method comprises: acquiring sparse multi-view images and their corresponding camera pose parameters, and acquiring a digital surface model (DSM) of the target area; projecting the three-dimensional points of the DSM onto all multi-view image planes, eliminating non-visible points, and constructing an initial three-dimensional Gaussian model from the retained DSM point set; performing iterative optimization within the 3DGS framework, starting from the initial model, while jointly minimizing three loss terms, namely a photometric loss on the original sparse multi-view images, a photometric loss on derivative views, and a global-local depth consistency loss based on a monocular depth prior; and, during the iterative optimization, identifying and eliminating floating Gaussian primitives using a semantic feature difference metric. The invention mitigates the missing detail and artifacts caused by the limited information in sparse multi-view images, and improves both the stability of the Gaussian optimization and the accuracy of the model.

Inventors

SONG JIAXUAN
KONG YAFEI
YANG JIAQI
GUO YUYANG
DONG YANG
FAN DAZHAO
JI SONG
GU LINYU
LIU XIN
WANG AOSHENG
LI MING
GAO DING

Assignees

Information Engineering University, PLA Cyberspace Force (中国人民解放军网络空间部队信息工程大学)

Dates

Publication Date: 20260512
Application Date: 20251225

Claims (10)

1. A three-dimensional Gaussian reconstruction method for sparse multi-view images combined with geographic information data, characterized by comprising the following steps: Step 1, acquiring sparse multi-view images and their corresponding camera pose parameters, and acquiring a digital surface model (DSM) of a target area; Step 2, projecting the three-dimensional points of the DSM onto all multi-view image planes, eliminating non-visible points, and constructing an initial three-dimensional Gaussian model from the retained DSM three-dimensional point set, wherein the position parameter of each Gaussian primitive is set from the geographic coordinates of the corresponding DSM point; Step 3, performing iterative optimization within a 3D Gaussian Splatting (3DGS) framework starting from the initial three-dimensional Gaussian model, and jointly minimizing the following loss terms during the iterative optimization: a photometric loss on the original sparse multi-view images, a photometric loss on derivative views, which are generated by re-projecting the original sparse multi-view images about a global rotation axis, and a global-local depth consistency loss based on a monocular depth prior; and Step 4, during the iterative optimization, identifying and eliminating floating Gaussian primitives using a semantic feature difference metric, and outputting the final three-dimensional Gaussian model.
2. The three-dimensional Gaussian reconstruction method for sparse multi-view images combined with geographic information data according to claim 1, wherein constructing the initial three-dimensional Gaussian model from the retained DSM three-dimensional point set comprises: creating a Gaussian primitive for each point in the retained DSM three-dimensional point set; setting the position parameter of the corresponding Gaussian primitive from the three-dimensional geographic coordinates of the DSM point; and setting, for each Gaussian primitive, an initial opacity higher than a preset threshold.
3. The three-dimensional Gaussian reconstruction method for sparse multi-view images combined with geographic information data according to claim 1, wherein the derivative views are generated as follows: the rotation matrices of the camera poses of all sparse multi-view images are aggregated to calculate a global rotation axis direction vector $v$ as [...], where $N$ is the number of sparse multi-view images, $R_i$ is the rotation matrix of the camera pose of the $i$-th sparse multi-view image, an extraction operator applied to $R_i$ denotes taking [...], and $u$ is an intermediate vector; for each sparse multi-view image, with the camera center as the center of rotation, the rotation matrices for rotating about this axis by a fixed positive and negative angle $\pm\theta$ are calculated from $v$ with Rodrigues' rotation formula:
$$R_{\pm\theta} = I + \sin(\pm\theta)\,[\hat{v}]_\times + (1-\cos\theta)\,[\hat{v}]_\times^2,$$
where $I$ is the $3\times 3$ identity matrix, $\hat{v} = v/\lVert v\rVert$ is the normalized unit vector of $v$, and $[\hat{v}]_\times$ is the cross-product (skew-symmetric) matrix of $\hat{v}$; the calculated rotation matrix $R_{\pm\theta}$ and the original camera translation vector $t_i$ together form the pose parameters of a derivative view; and the original sparse multi-view image is re-projected according to the pose parameters of the derivative view to obtain the corresponding derivative view.
4. The three-dimensional Gaussian reconstruction method for sparse multi-view images combined with geographic information data according to claim 1, wherein Step 3 comprises: obtaining a prior depth map of each sparse multi-view image with a pre-trained monocular depth estimation network; synchronously rendering a depth map in 3DGS; calculating a global Pearson correlation coefficient loss between the prior depth map and the rendered depth map; dividing the prior depth map and the rendered depth map each into a plurality of local image blocks, and calculating and averaging the Pearson correlation coefficient losses between corresponding local blocks to obtain a local Pearson correlation coefficient loss; and combining the global and local Pearson correlation coefficient losses by weighting to form the global-local depth consistency loss.
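5. The three-dimensional Gaussian reconstruction method for sparse multi-view images according to claim 1, wherein the global Pearson correlation coefficient loss is calculated as follows:
$$\mathcal{L}_{\mathrm{global}} = 1 - \rho\left(D_{\mathrm{prior}}, D_{\mathrm{render}}\right), \qquad \rho\left(D_{\mathrm{prior}}, D_{\mathrm{render}}\right) = \frac{\mathrm{Cov}\left(D_{\mathrm{prior}}, D_{\mathrm{render}}\right)}{\sigma\left(D_{\mathrm{prior}}\right)\,\sigma\left(D_{\mathrm{render}}\right)},$$
where $\mathcal{L}_{\mathrm{global}}$ is the global Pearson correlation coefficient loss, $D_{\mathrm{prior}}$ is the prior depth map, $D_{\mathrm{render}}$ is the rendered depth map, $\rho$ is the Pearson correlation coefficient, $\mathrm{Cov}$ denotes covariance, and $\sigma$ denotes standard deviation.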
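6. The three-dimensional Gaussian reconstruction method for sparse multi-view images according to claim 5, wherein the local Pearson correlation coefficient loss is calculated as follows:
$$\mathcal{L}_{\mathrm{local}} = \frac{1}{|\mathcal{P}|}\sum_{(x,y)\in\mathcal{P}} \Big[1 - \rho\big(D_{\mathrm{prior}}[x:x+w,\ y:y+h],\ D_{\mathrm{render}}[x:x+w,\ y:y+h]\big)\Big],$$
where $\mathcal{L}_{\mathrm{local}}$ is the local Pearson correlation coefficient loss, $(x, y)$ is the start coordinate of a local block, $w$ and $h$ are the width and height of the local blocks, $D[x:x+w,\ y:y+h]$ denotes the $w \times h$ block of $D$ starting at $(x, y)$, and $\mathcal{P}$ is the set of blocks.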
7. The three-dimensional Gaussian reconstruction method for sparse multi-view images combined with geographic information data according to claim 1, wherein identifying and eliminating floating Gaussian primitives using the semantic feature difference metric comprises: first, removing abnormal Gaussian primitives with a view-frustum-guided visibility constraint, by counting the number of effective views onto which each Gaussian primitive projects and removing Gaussian primitives whose number of effective views is smaller than a preset number; and then, using a pre-trained semantic feature extraction network to obtain the semantic feature vectors of each Gaussian primitive under its different effective views, calculating the cosine similarity between the semantic feature vectors of the same Gaussian primitive under different effective views, and removing Gaussian primitives whose mean cosine similarity is below a preset threshold.
8. The three-dimensional Gaussian reconstruction method for sparse multi-view images combined with geographic information data according to claim 1 or 7, wherein Step 4 is performed when the iterative optimization of the three-dimensional Gaussian model reaches its intermediate stage.
9. A three-dimensional Gaussian reconstruction system for sparse multi-view images combined with geographic information data, comprising: a data acquisition module, configured to acquire sparse multi-view images and their corresponding camera pose parameters, and to acquire a digital surface model (DSM) of a target area; an initialization module, configured to project the three-dimensional points of the DSM onto all multi-view image planes, eliminate non-visible points, and construct an initial three-dimensional Gaussian model from the retained DSM three-dimensional point set, wherein the position parameter of each Gaussian primitive is set from the geographic coordinates of the corresponding DSM point; an iterative optimization module, configured to perform iterative optimization within a 3D Gaussian Splatting (3DGS) framework starting from the initial three-dimensional Gaussian model while jointly minimizing the following loss terms: a photometric loss on the original sparse multi-view images, a photometric loss on derivative views, and a global-local depth consistency loss based on a monocular depth prior, wherein the derivative views are generated by re-projecting the original sparse multi-view images about a global rotation axis; and a Gaussian pruning module, configured to identify and remove floating Gaussian primitives using a semantic feature difference metric during the iterative optimization, and to output the final three-dimensional Gaussian model.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the three-dimensional Gaussian reconstruction method for sparse multi-view images combined with geographic information data according to any one of claims 1 to 8.
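The derivative-view rotation of claim 3 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: the global axis vector `v` is passed in directly rather than computed, because its aggregation formula is not given here; the angle is an arbitrary example value; and left-multiplying the axis rotation onto the original rotation `R_i` is one hypothetical reading of how the calculated rotation enters the derivative pose. All function names are illustrative.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested row lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rodrigues(v, theta):
    """Rotation by angle theta about axis v: I + sin(theta) K + (1 - cos(theta)) K^2."""
    n = math.sqrt(sum(c * c for c in v))
    kx, ky, kz = (c / n for c in v)                        # unit axis v_hat = v / |v|
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]   # cross-product matrix [v_hat]_x
    K2 = matmul(K, K)
    s, c = math.sin(theta), math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + (1.0 - c) * K2[i][j]
             for j in range(3)] for i in range(3)]

def derivative_poses(R_i, t_i, v, theta):
    """Return the (+theta, -theta) derivative poses of one image as (rotation, translation).

    ASSUMPTION (hypothetical): the axis rotation is left-multiplied onto R_i.
    The original translation t_i is reused unchanged, as the claim states.
    """
    return [(matmul(rodrigues(v, sign * theta), R_i), list(t_i)) for sign in (1.0, -1.0)]
```

For example, `rodrigues((0, 0, 1), math.pi / 2)` returns, up to floating-point rounding, the 90° rotation about the z axis `[[0, -1, 0], [1, 0, 0], [0, 0, 1]]`, and the +θ and −θ matrices multiply to the identity.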
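The global-local depth consistency loss of claims 4 to 6 can be sketched as follows. The sketch assumes the loss form 1 − ρ for both terms, a partition into non-overlapping w × h blocks, and equal weights; the block size and the weights `lam_g` and `lam_l` are hypothetical parameters, and depth maps are plain nested lists rather than framework tensors. Because ρ is invariant to a positive scale and an additive shift, this kind of loss tolerates the scale ambiguity of monocular depth priors.

```python
import math

def pearson(a, b, eps=1e-8):
    """Pearson correlation rho = Cov(a, b) / (sigma_a * sigma_b); eps guards against zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    sa = math.sqrt(sum((x - ma) ** 2 for x in a) / n)
    sb = math.sqrt(sum((y - mb) ** 2 for y in b) / n)
    return cov / (sa * sb + eps)

def crop(depth, x0, y0, w, h):
    """Flatten the w x h block of a row-major depth map starting at column x0, row y0."""
    return [depth[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]

def global_local_depth_loss(d_prior, d_render, w=2, h=2, lam_g=1.0, lam_l=1.0):
    """Weighted sum of a global and a block-averaged local (1 - rho) loss (assumed form)."""
    H, W = len(d_prior), len(d_prior[0])
    # Global term over the whole image
    l_global = 1.0 - pearson(crop(d_prior, 0, 0, W, H), crop(d_render, 0, 0, W, H))
    # Local term: mean over non-overlapping w x h blocks (hypothetical partition)
    blocks = [(x0, y0) for y0 in range(0, H - h + 1, h) for x0 in range(0, W - w + 1, w)]
    l_local = sum(1.0 - pearson(crop(d_prior, x, y, w, h), crop(d_render, x, y, w, h))
                  for x, y in blocks) / len(blocks)
    return lam_g * l_global + lam_l * l_local
```

A rendered depth map that equals `2 * prior + 3` therefore yields a loss close to 0, while a sign-flipped map yields about 2 per term.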
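The two-stage floater test of claim 7 could look like the sketch below. It assumes the semantic feature vectors of each Gaussian primitive have already been extracted, one per effective view (the feature network itself is not specified here), interprets the "average cosine similarity" as the mean over all pairs of views, and uses hypothetical thresholds `min_views` and `sim_threshold`.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either has zero norm)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def prune_floaters(view_features, min_views=2, sim_threshold=0.8):
    """Return indices of Gaussians that pass both tests.

    view_features[g] holds one semantic feature vector per effective view of Gaussian g.
    """
    keep = []
    for g, feats in enumerate(view_features):
        # Stage 1: visibility constraint; too few effective views marks a floater
        if len(feats) < min_views:
            continue
        # Stage 2: mean pairwise cosine similarity across views (assumed averaging scheme)
        sims = [cosine(a, b) for a, b in combinations(feats, 2)]
        mean_sim = sum(sims) / len(sims) if sims else 1.0  # a single view cannot be compared
        if mean_sim >= sim_threshold:
            keep.append(g)
    return keep
```

Per claim 8, such a pruning pass would be run once the optimization reaches its intermediate stage rather than at every iteration.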

Description

Sparse multi-view image three-dimensional Gaussian reconstruction method and system combined with geographic information data

Technical Field

The invention relates to the technical fields of photogrammetry and image data processing, and in particular to a three-dimensional Gaussian reconstruction method and system for sparse multi-view images combined with geographic information data.

Background

Three-dimensional reconstruction is in increasingly broad demand in fields such as real-scene 3D modeling, key-target monitoring, and strike applications. Because of limited imaging view angles, large-scale 3D models reconstructed from satellite remote sensing images often lack side information of ground objects, and refining them usually requires large numbers of overlapping close-range image sequences or unmanned aerial vehicle (UAV) images. However, for key targets in harsh operating environments or under strict government protection, only a few sparse multi-view images can be obtained, which makes it difficult to meet the requirements of high-precision 3D reconstruction of such targets. How to maintain the geometric consistency of the reconstructed model when only sparse images are available therefore remains a difficult research problem. 3D Gaussian Splatting (3DGS) represents a three-dimensional model with Gaussian primitives, each containing position, covariance, spherical harmonic coefficient, and opacity parameters. This highly flexible parametric representation allows 3DGS to fit the details and variations of a scene accurately across different viewpoints, which makes it well suited to three-dimensional reconstruction from sparse multi-view images.
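To make the per-primitive parameters concrete, the sketch below stores one Gaussian and builds its covariance as Σ = R S Sᵀ Rᵀ from a unit quaternion and per-axis scales, the factorization used in the original 3DGS formulation. The field names and the untyped spherical harmonic list are illustrative assumptions, not part of the source.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Gaussian:
    position: tuple   # (x, y, z) center
    scale: tuple      # per-axis extents (s_x, s_y, s_z)
    rotation: tuple   # quaternion (w, x, y, z)
    opacity: float    # alpha in [0, 1]
    sh: list = field(default_factory=list)  # spherical harmonic color coefficients (illustrative)

def quat_to_rotmat(w, x, y, z):
    """Convert a quaternion to a 3x3 rotation matrix, normalizing it first."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(g):
    """Sigma = R S S^T R^T with S = diag(scale)."""
    R = quat_to_rotmat(*g.rotation)
    # M = R S: column j of R scaled by scale[j]
    M = [[R[i][j] * g.scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T, symmetric positive semi-definite by construction
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)] for i in range(3)]
```

Optimizing scale and rotation instead of Σ directly keeps every covariance valid during gradient descent.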
The traditional 3DGS framework relies on camera poses and a point cloud precomputed by Structure from Motion (SfM) to initialize the Gaussians, and requires abundant multi-view-consistent features to infer scene information. Sparse multi-view images, however, lack corresponding (conjugate) matching primitives and multi-view geometric constraints; as a result, the SfM point cloud is unreliable and the model overfits the sparse images, causing initialization errors, loss of high-frequency detail, and floating artifacts. Moreover, because a unified geographic reference is lacking, existing methods recover a relative geometric scene rather than the real geospatial structure.

Disclosure of Invention

To address the initialization errors, image overfitting, and missing geographic reference that arise when the 3DGS model is used for three-dimensional Gaussian reconstruction of sparse multi-view images, the invention provides a three-dimensional Gaussian reconstruction method and system for sparse multi-view images combined with geographic information data.
To achieve the above purpose, the invention adopts the following technical scheme. The invention provides a three-dimensional Gaussian reconstruction method for sparse multi-view images combined with geographic information data, comprising the following steps: Step 1, acquiring sparse multi-view images and their corresponding camera pose parameters, and acquiring a digital surface model (DSM) of a target area; Step 2, projecting the three-dimensional points of the DSM onto all multi-view image planes, eliminating non-visible points, and constructing an initial three-dimensional Gaussian model from the retained DSM three-dimensional point set, wherein the position parameter of each Gaussian primitive is set from the geographic coordinates of the corresponding DSM point; Step 3, performing iterative optimization within a 3D Gaussian Splatting (3DGS) framework starting from the initial three-dimensional Gaussian model, and jointly minimizing the following loss terms during the iterative optimization: a photometric loss on the original sparse multi-view images, a photometric loss on derivative views, which are generated by re-projecting the original sparse multi-view images about a global rotation axis, and a global-local depth consistency loss based on a monocular depth prior; and Step 4, during the iterative optimization, identifying and eliminating floating Gaussian primitives using a semantic feature difference metric, and outputting the final three-dimensional Gaussian model.
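Step 2 (DSM visibility filtering followed by Gaussian initialization) might be sketched as follows. The sketch assumes a pinhole camera with the world-to-camera convention x_cam = R·X + t, DSM points and camera poses expressed in one shared coordinate frame, a per-pixel z-buffer occlusion test with a hypothetical depth tolerance, and a hypothetical rule that keeps a point if it is visible in at least `min_views` images. The initial opacity of 0.9 and the constant scale are placeholder values standing in for "an initial opacity higher than a preset threshold".

```python
from dataclasses import dataclass

@dataclass
class Camera:
    R: list        # 3x3 world-to-camera rotation, nested row lists
    t: list        # translation, so that x_cam = R @ X + t
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int

def project(cam, X):
    """Project world point X; return (u, v, depth) or None if it lies behind the camera."""
    xc = [sum(cam.R[i][k] * X[k] for k in range(3)) + cam.t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    return cam.fx * xc[0] / xc[2] + cam.cx, cam.fy * xc[1] / xc[2] + cam.cy, xc[2]

def visible_mask(points, cameras, depth_tol=0.5, min_views=1):
    """Mark DSM points that are in front of, inside, and unoccluded in >= min_views images."""
    counts = [0] * len(points)
    for cam in cameras:
        zbuf, hits = {}, []                  # zbuf: pixel -> nearest depth seen so far
        for idx, X in enumerate(points):
            p = project(cam, X)
            if p is None or not (0 <= p[0] < cam.width and 0 <= p[1] < cam.height):
                continue                     # behind the camera or outside the image
            pix = (int(p[0]), int(p[1]))
            hits.append((idx, pix, p[2]))
            zbuf[pix] = min(p[2], zbuf.get(pix, float("inf")))
        for idx, pix, depth in hits:        # occlusion test against the z-buffer
            if depth - zbuf[pix] <= depth_tol:
                counts[idx] += 1
    return [c >= min_views for c in counts]

def init_gaussians(points, mask, init_opacity=0.9, init_scale=0.5):
    """Create one Gaussian per retained DSM point, positioned at the point's coordinates."""
    return [
        {"position": tuple(X), "scale": (init_scale,) * 3,
         "rotation": (1.0, 0.0, 0.0, 0.0), "opacity": init_opacity}
        for X, keep in zip(points, mask) if keep
    ]
```

In this sketch a DSM point hidden behind a nearer surface along the same pixel ray is dropped, which is one simple way to realize "eliminating non-visible points".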
Further, constructing the initial three-dimensional Gaussian model from the retained DSM three-dimensional point set comprises: creating a Gaussian primitive for each point in the retained DSM three-dimensional point set; setting the position parameter of the corresponding Gaussian primitive from the three-dimensional geographic coordinates of the DSM point; An opacity initial value higher than a preset threshold value is set for eac