KR-20260064523-A - METHOD AND APPARATUS FOR GENERATING VIRTUAL VIEW IMAGE FROM MULTI-VIEW IMAGES

Abstract

A method for generating a virtual viewpoint image from multi-view images according to the present disclosure comprises: estimating depth information of a 3D Gaussian for a target viewpoint from input multi-view images; estimating one or more parameters of the 3D Gaussian based on the depth information; and generating a virtual viewpoint image for the target viewpoint based on the estimated depth information and the estimated one or more parameters, wherein a 3D model for estimating the one or more parameters of the 3D Gaussian may be pre-trained based on a ground truth normal map.

Inventors

  • 임한신
  • 김현철

Assignees

  • 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)

Dates

Publication Date
2026-05-07
Application Date
2025-10-13
Priority Date
2024-10-30

Claims (20)

  1. A method for generating a virtual viewpoint image from multi-view images, the method comprising: estimating depth information of a 3D Gaussian for a target viewpoint from input multi-view images; estimating one or more parameters of the 3D Gaussian based on the depth information; and generating a virtual viewpoint image for the target viewpoint based on the estimated depth information and the estimated one or more parameters, wherein a 3D model for estimating the one or more parameters of the 3D Gaussian is pre-trained based on a ground truth normal map.
  2. The method of claim 1, wherein the ground truth normal map is generated based on a ground truth depth map.
  3. The method of claim 2, wherein the ground truth normal map is generated based on a rate of change of depth values of the ground truth depth map, and the rate of change is calculated based on per-pixel depth values of the ground truth depth map and camera parameters.
  4. The method of claim 1, wherein the pre-training converts the normal values of the ground truth normal map and the one or more parameters of the estimated 3D Gaussian into the same unit, and is performed by comparing the similarity between the normal values of the ground truth normal map and the one or more parameters of the 3D Gaussian expressed in that same unit.
  5. The method of claim 4, wherein the conversion is performed by estimating a rotation value from a quaternion value among the one or more parameters of the estimated 3D Gaussian and using the estimated rotation value as a normal value.
  6. The method of claim 4, wherein the conversion is performed by estimating a normal value of the estimated 3D Gaussian and calculating a quaternion value from the estimated normal value.
  7. The method of claim 4, wherein the conversion is performed by calculating a quaternion value from the normal values of the ground truth normal map.
  8. The method of claim 4, wherein the similarity is calculated based on at least one of an L1-norm, an L2-norm, or an entropy loss method.
  9. The method of claim 4, wherein the pre-training is performed by comparing the similarity between the normal values of the ground truth normal map and normal values estimated from the one or more parameters of the 3D Gaussian.
  10. The method of claim 4, wherein the pre-training is performed by comparing the similarity between rotation values calculated from the normal values of the ground truth normal map and rotation values estimated from the one or more parameters of the 3D Gaussian.
  11. An apparatus for generating a virtual viewpoint image from multi-view images, the apparatus comprising: one or more transceivers; one or more memories; and one or more processors, wherein the one or more processors are configured to: estimate depth information of a 3D Gaussian for a target viewpoint from input multi-view images; estimate one or more parameters of the 3D Gaussian based on the depth information; and generate a virtual viewpoint image for the target viewpoint based on the estimated depth information and the estimated one or more parameters, wherein a 3D model for estimating the one or more parameters of the 3D Gaussian is pre-trained based on a ground truth normal map.
  12. The apparatus of claim 11, wherein the ground truth normal map is generated based on a ground truth depth map.
  13. The apparatus of claim 12, wherein the ground truth normal map is generated based on a rate of change of depth values of the ground truth depth map, and the rate of change is calculated based on per-pixel depth values of the ground truth depth map and camera parameters.
  14. The apparatus of claim 11, wherein the pre-training converts the normal values of the ground truth normal map and the one or more parameters of the estimated 3D Gaussian into the same unit, and is performed by comparing the similarity between the normal values of the ground truth normal map and the one or more parameters of the estimated 3D Gaussian expressed in that same unit.
  15. The apparatus of claim 14, wherein the conversion is performed by estimating a rotation value from a quaternion value among the one or more parameters of the estimated 3D Gaussian and using the estimated rotation value as a normal value.
  16. The apparatus of claim 14, wherein the conversion is performed by estimating a normal value of the estimated 3D Gaussian and calculating a quaternion value from the estimated normal value.
  17. The apparatus of claim 14, wherein the conversion is performed by calculating a quaternion value from the normal values of the ground truth normal map.
  18. The apparatus of claim 14, wherein the similarity is calculated based on at least one of an L1-norm, an L2-norm, or an entropy loss method.
  19. The apparatus of claim 14, wherein the pre-training is performed by comparing the similarity between the normal values of the ground truth normal map and normal values estimated from the one or more parameters of the estimated 3D Gaussian, or the similarity between rotation values calculated from the normal values of the ground truth normal map and rotation values estimated from the one or more parameters of the estimated 3D Gaussian.
  20. One or more non-transitory computer-readable media storing one or more instructions that, when executed by one or more processors, control an apparatus for generating a virtual viewpoint image from multi-view images to perform operations comprising: estimating depth information of a 3D Gaussian for a target viewpoint from input multi-view images; estimating one or more parameters of the 3D Gaussian based on the depth information; and generating a virtual viewpoint image for the target viewpoint based on the estimated depth information and the estimated one or more parameters, wherein a 3D model for estimating the one or more parameters of the 3D Gaussian is pre-trained based on a ground truth normal map.
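The same-unit conversion and similarity comparison described in claims 4 through 8 can be sketched as follows. This is only an illustrative reading of the claims, not the patent's implementation: the function names, the choice of the z-axis as the Gaussian's normal axis, and the (w, x, y, z) quaternion convention are all assumptions introduced here.

```python
import numpy as np

def quat_to_normal(q):
    """Rotate the canonical z-axis by unit quaternion q = (w, x, y, z).

    Hypothetical reading of claims 4-5: the estimated 3D Gaussian's
    rotation quaternion is converted into a normal direction so that it
    can be compared with the ground truth normal map in the same unit.
    """
    w, x, y, z = q / np.linalg.norm(q)
    # Third column of the quaternion's rotation matrix = image of (0, 0, 1).
    return np.array([
        2.0 * (x * z + w * y),
        2.0 * (y * z - w * x),
        1.0 - 2.0 * (x * x + y * y),
    ])

def l1_similarity_loss(n_pred, n_gt):
    """L1-norm comparison, one of the options named in claim 8."""
    return np.abs(n_pred - n_gt).sum()

# The identity quaternion leaves the z-axis unchanged.
n = quat_to_normal(np.array([1.0, 0.0, 0.0, 0.0]))
loss = l1_similarity_loss(n, np.array([0.0, 0.0, 1.0]))
```

An L2-norm variant would simply replace the absolute sum with a squared-difference sum; the claims leave the choice of loss open.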

Description

Method and Apparatus for Generating a Virtual View Image from Multi-View Images

The present disclosure relates to a method for generating a virtual viewpoint image from multi-view images and an apparatus for performing the same. More specifically, it relates to a method for generating a virtual viewpoint image from multi-view images by performing pre-training based on ground truth normal information, and an apparatus for performing the same.

Because building accurate 3D models from images is difficult and complex, there has long been interest in technologies that generate virtual viewpoint images from multi-view images without constructing 3D models. Interest has grown significantly in recent years as techniques that generate high-quality virtual viewpoint images from multi-view images, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have been developed.

Among these techniques, 3DGS can explicitly represent a scene using 3D Gaussians as primitives. Each 3D Gaussian can generally have at least one of the following parameters: position, rotation, scale, color, or opacity. Although 3DGS can render virtual viewpoint images at high speed through efficient parallel processing of 3D Gaussians, it requires optimizing the 3D Gaussian parameters for each scene, which makes it unsuitable for high-speed use cases such as real-time generation from input images. Various methods have therefore been proposed that skip scene-specific optimization, making them suitable for real-time generation, while still applying 3D Gaussians, which have strengths in scene representation, to virtual viewpoint image generation.
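The per-primitive parameters listed above can be pictured as a simple record. The class and field names below are illustrative assumptions for exposition, not structures from the patent; practical 3DGS systems also store color as spherical-harmonic coefficients rather than plain RGB.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    """Hypothetical sketch of one 3DGS primitive's parameters."""
    position: np.ndarray  # (3,) center in world coordinates
    rotation: np.ndarray  # (4,) unit quaternion (w, x, y, z)
    scale: np.ndarray     # (3,) per-axis extent
    color: np.ndarray     # (3,) RGB (SH coefficients in practice)
    opacity: float        # scalar in [0, 1]

# One fully opaque, axis-aligned gray Gaussian at the origin.
g = Gaussian3D(
    position=np.zeros(3),
    rotation=np.array([1.0, 0.0, 0.0, 0.0]),  # identity rotation
    scale=np.ones(3),
    color=np.full(3, 0.5),
    opacity=1.0,
)
```

Scene-specific 3DGS optimizes all of these fields per Gaussian by gradient descent; the optimization-free methods discussed next predict them directly from images instead.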
As one such optimization-free approach, a depth-information-based 3D Gaussian estimation method has been proposed. In this method, depth information can be estimated from images even in a limited environment and range, and the remaining 3D Gaussian parameters can then be estimated directly, without optimization, based on that depth information. GPS-Gaussian is one example of such a method.

In a depth-information-based 3D Gaussian estimation method such as GPS-Gaussian, pre-training can be performed before the 3D Gaussian parameters are estimated. Specifically, a depth information estimation unit and a 3D Gaussian parameter prediction unit can be pre-trained using training data.

FIG. 1 illustrates a depth-information-based 3D Gaussian estimation method and apparatus according to one embodiment of the present disclosure. FIG. 2 illustrates a method and apparatus for generating a virtual viewpoint image by performing pre-training based on ground truth normal information according to the present disclosure. FIG. 3 illustrates an example of generating a ground truth depth map and a ground truth normal map according to the present disclosure. FIG. 4 is a flowchart of a method for generating a virtual viewpoint image by performing pre-training based on ground truth normal information according to one embodiment of the present disclosure. FIG. 5 is a block diagram of an apparatus according to an embodiment of the present disclosure.

The present disclosure is subject to various modifications and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description.
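Claim 3 describes deriving the ground truth normal map from the rate of change of the ground truth depth map's depth values together with camera parameters. One common way to realize this, sketched below under assumed pinhole intrinsics (the function name and finite-difference scheme are illustrative, not taken from the patent), is to back-project each pixel to a 3D point and take the cross product of the point cloud's horizontal and vertical derivatives:

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate per-pixel normals from a depth map (illustrative sketch).

    Back-projects each pixel through the pinhole intrinsics, then takes
    the cross product of the horizontal and vertical rates of change of
    the resulting point cloud to obtain a surface normal direction.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Back-project pixels to camera-space 3D points.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts = np.stack([x, y, depth], axis=-1)  # (h, w, 3)
    # Finite-difference rates of change along the image axes.
    du = np.gradient(pts, axis=1)
    dv = np.gradient(pts, axis=0)
    n = np.cross(du, dv)  # normal direction per pixel
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return n

# A constant-depth (fronto-parallel) plane yields normals along the z-axis.
depth = np.full((8, 8), 2.0)
normals = normals_from_depth(depth, fx=100.0, fy=100.0, cx=4.0, cy=4.0)
```

The resulting normal map can then serve as the ground truth target for the pre-training described in claims 4 through 10.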
However, this is not intended to limit the present disclosure to specific embodiments, and the disclosure should be understood to include all modifications, equivalents, and substitutions that fall within its spirit and scope. Similar reference numerals in the drawings denote the same or similar functions across various aspects. The shapes and sizes of elements in the drawings may be exaggerated for clarity.

The detailed description of exemplary embodiments below refers to the accompanying drawings, which illustrate specific embodiments by way of example. These embodiments are described in sufficient detail to enable those skilled in the art to practice them. The various embodiments are different from one another but need not be mutually exclusive. For example, specific shapes, structures, and characteristics described herein in relation to one embodiment may be implemented in another embodiment without departing from the spirit and scope of the present disclosure. It should also be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of that embodiment. Accordingly, the following detailed description is not intended to be taken in