WO-2026095342-A1 - THREE-DIMENSIONAL MODELING SYSTEM AND METHOD USING ONLINE MULTI-VIEW STEREO


Abstract

The present invention relates to a three-dimensional modeling technique, and more specifically to a three-dimensional modeling system and method using online multi-view stereo, in which a high-precision depth map is estimated at a high resolution, and accurate mapping utilizing strong geometric priors is implemented so as to reconstruct and render a high-quality real-time three-dimensional model. According to one embodiment of the present invention, a high-precision depth map and confidence map are generated through multi-view stereo and a depth fusion network, thereby enhancing the accuracy and quality of a 3D model.

Inventors

SONG, SOO HWAN
LEE, BYEONG GWON
PARK, JUN KYU

Assignees

Dongguk University Industry-Academic Cooperation Foundation (동국대학교 산학협력단)

Dates

Publication Date: 2026-05-07
Application Date: 2025-09-17
Priority Date: 2024-10-29

Claims (7)

1. A 3D modeling system using online multi-view stereo, comprising: a first execution module that estimates a depth map and a confidence map; and a second execution module that generates a three-dimensional model, wherein the first execution module comprises: a camera estimation unit that tracks a camera pose of an input keyframe; a depth estimation unit that performs multi-view stereo (MVS) based depth estimation; and a filtering unit that refines depth values and filters out noise.
2. The 3D modeling system using online multi-view stereo of claim 1, wherein the first execution module and the second execution module are configured to run in parallel as independent threads for a frontend and a backend, respectively, so as to perform real-time 3D rendering.
3. The 3D modeling system using online multi-view stereo of claim 1, wherein the second execution module sequentially integrates the depth map and the confidence map to progressively generate a 3D Gaussian splatting model.
4. The 3D modeling system using online multi-view stereo of claim 1, wherein the second execution module iteratively optimizes Gaussian parameters in parallel with the first execution module.
5. A 3D modeling method using online multi-view stereo, performed by a 3D modeling system using online multi-view stereo, the method comprising: in a frontend, estimating a camera pose of an input keyframe; calculating a depth map and a confidence map based on the keyframe; storing the depth map and the confidence map in a keyframe buffer; and refining the depth map and filtering out noise; and, in a backend, performing adaptive density control to dynamically adjust Gaussian point density; generating a 3D model by sequentially integrating the depth map and the confidence map; and optimizing the 3D model by iteratively optimizing Gaussian parameters.
6. The 3D modeling method using online multi-view stereo of claim 5, wherein the frontend and the backend are configured to run in parallel as independent threads so as to perform real-time 3D rendering.
7. A computer program recorded on a computer-readable recording medium for executing the 3D modeling method using online multi-view stereo according to claim 5.
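
The frontend/backend split recited in the claims can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the names `Keyframe`, `filter_depth`, `frontend`, `backend` and `run_pipeline`, the confidence threshold of 0.5, and the placeholder pose are all hypothetical. Depth estimation, adaptive density control and Gaussian optimization are not modelled; the backend simply accumulates the depth samples that survive filtering. For brevity the sketch filters each depth map before placing it in the keyframe buffer.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Keyframe:
    index: int
    pose: tuple        # placeholder camera pose (assumption, not a real tracker)
    depth: list        # 2D list of depth values; None marks a rejected pixel
    confidence: list   # 2D list of confidences in [0, 1]


def filter_depth(depth, confidence, threshold=0.5):
    """Replace depths whose confidence is below `threshold` with None."""
    return [
        [d if c >= threshold else None for d, c in zip(d_row, c_row)]
        for d_row, c_row in zip(depth, confidence)
    ]


def frontend(frames, buffer):
    """Frontend thread: pose stand-in, depth filtering, keyframe buffering."""
    for i, (depth, conf) in enumerate(frames):
        pose = (float(i), 0.0, 0.0)  # stand-in for camera pose tracking
        buffer.put(Keyframe(i, pose, filter_depth(depth, conf), conf))
    buffer.put(None)  # sentinel: no more keyframes


def backend(buffer, model):
    """Backend thread: integrate keyframes into the model one at a time."""
    while True:
        kf = buffer.get()
        if kf is None:
            break
        for row in kf.depth:
            for d in row:
                if d is not None:
                    model.append((kf.index, d))


def run_pipeline(frames):
    """Run frontend and backend as independent threads sharing one buffer."""
    buffer = queue.Queue()
    model = []
    t_front = threading.Thread(target=frontend, args=(frames, buffer))
    t_back = threading.Thread(target=backend, args=(buffer, model))
    t_front.start()
    t_back.start()
    t_front.join()
    t_back.join()
    return model


if __name__ == "__main__":
    frames = [
        ([[1.0, 2.0]], [[0.9, 0.1]]),
        ([[3.0, 4.0]], [[0.6, 0.7]]),
    ]
    print(run_pipeline(frames))
```

The thread-safe `queue.Queue` plays the role of the keyframe buffer here, so the backend can begin integrating early keyframes while the frontend is still processing later ones.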

Description

3D modeling system and method using online multi-view stereo

The present invention relates to 3D modeling technology, and more specifically, to a 3D modeling system and method using online multi-view stereo that estimates a high-precision depth map at high resolution, realizes accurate mapping using strong geometric prior information, and reconstructs and renders a high-quality 3D model in real time.

Precise 3D models are essential in various fields, including digital twins, augmented and virtual reality, industrial design, architectural visualization, and robot navigation. Generating these 3D models requires 3D reconstruction technology, and one of the most widely used methods is Multi-View Stereo (MVS). MVS generates highly precise 3D models by identifying detailed correspondences between images captured from multiple viewpoints. Recently, MVS has been combined with neural rendering techniques and is also being used for novel view synthesis. Neural rendering can realistically reproduce complex scenes through deep learning, and the 3D Gaussian Splatting (3DGS) technique in particular enables real-time, high-quality rendering. 3DGS uses a Gaussian-based particle system to provide excellent detail and real-time rendering performance, even in complex scenes.

However, existing multi-view stereo algorithms are designed to operate offline and typically require a significant amount of time because they process 3D models in batches. This is a major factor limiting the use of multi-view stereo in fields that require real-time processing, such as robotics or real-time graphics applications. For example, autonomous vehicles and real-time AR/VR systems must capture scenes in real time and generate 3D models immediately, and existing multi-view stereo methods cannot meet these requirements. Dense SLAM (Simultaneous Localization and Mapping) methods are used to address this problem.
Dense SLAM aims at 3D reconstruction in online environments and estimates depth maps by applying multi-view stereo to a continuous sequence of images within a local time window. This allows camera positions to be estimated in real time while a 3D model of the surrounding environment is generated. Recent research has adopted techniques such as Neural Radiance Fields (NeRF) or 3DGS as the map representation of Dense SLAM, enabling more effective real-time 3D modeling, rendering, and view synthesis.

However, existing methods focus primarily on estimating coarse 3D scenes for faster computation, which limits their ability to achieve precise 3D reconstruction. To ensure real-time computational performance, most methods use downsampled images or lightweight networks, which significantly degrades the quality of the generated 3D models. Furthermore, depth information estimated from images can be inaccurate due to factors such as motion blur, occlusion, and textureless regions; this reduces the reliability of the depth data and causes noisy reconstructions. Therefore, a new approach is required that estimates depth maps that are both high-resolution and accurate and that realizes high-quality mapping by utilizing strong geometric prior information.

FIGS. 1 to 3 are drawings illustrating a 3D modeling system using online multi-view stereo according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a 3D modeling method using online multi-view stereo according to an embodiment of the present invention.
FIGS. 5 to 7 are examples of experiments performed by a 3D modeling system using online multi-view stereo according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating a computing device implementing a 3D modeling system using online multi-view stereo according to an embodiment of the present invention.
The present invention is susceptible to various modifications and may have various embodiments. Specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and it should be understood that it includes all modifications, equivalents, and substitutions that fall within the spirit and scope of the invention. In describing the present invention, detailed descriptions of related prior art are omitted if it is determined that such detailed descriptions would unnecessarily obscure the essence of the invention. Furthermore, singular expressions used in this specification and claims should generally be interpreted as meaning "one or more" unless otherwise stated. In this specification, the term "module" includes a unit composed of hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, component, or circuit. A module may be a component formed integrally, or a minimum unit or part thereof that performs one or more functions. For