US-12627816-B2 - Method for decoding immersive video and method for encoding immersive video

Abstract

A video encoding method includes classifying a plurality of view images into a basic image and additional images, performing pruning on at least one of the plurality of view images on the basis of the classification result, generating an atlas on the basis of the pruning result, and encoding the atlas and metadata for the atlas. Here, the metadata includes a first flag indicating whether depth estimation needs to be performed on a decoder side.

Inventors

  • Jun Young JEONG
  • Gwang Soon Lee
  • Dawid Mieloch
  • Adrian Dziembowski
  • Marek Domański

Assignees

  • ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE

Dates

Publication Date
2026-05-12
Application Date
2023-10-20
Priority Date
2022-10-24

Claims (15)

  1. A method of encoding a video, the method comprising: classifying a plurality of view images into a basic image and additional images; performing pruning on at least one of the plurality of view images on the basis of the classification result; generating an atlas on the basis of the pruning result; and encoding the atlas and metadata for the atlas, wherein the metadata includes a first flag indicating whether depth estimation needs to be performed on a decoder side, wherein, in response to the first flag being encoded with a value indicating that the depth estimation needs to be performed on the decoder side, a second flag indicating whether depth values are reprojected before the depth estimation is further included in the metadata, wherein, in response to the second flag being encoded to indicate that the depth values are to be reprojected, the depth values for the depth estimation are further encoded, wherein depth values for each view image where the depth estimation is performed are obtained by reprojecting the depth values to be encoded, and wherein the depth estimation for each view image is performed in units of blocks.
  2. The method of claim 1, wherein the metadata further includes identification information for identifying a profile of the video, and wherein, in response to the profile of the video indicating MIV (MPEG Immersive Video) GA (Geometry Absent), the atlas is constituted with only a texture component without a depth component.
  3. The method of claim 2, wherein, in response to the profile of the video being the MIV GA, the first flag is encoded with the value indicating that the depth estimation needs to be performed on the decoder side.
  4. The method of claim 3, wherein, in response to the profile of the video being other than the MIV GA, the atlas is constituted with the texture component and the depth component.
  5. The method of claim 4, wherein the metadata further includes a third flag indicating whether depth estimation is performed on all view images.
  6. The method of claim 5, wherein the third flag is encoded with a value indicating that the depth estimation is performed on all view images when the profile of the video is the MIV GA.
  7. A method of decoding a video, the method comprising: decoding an atlas and metadata for the atlas; and generating a viewport image using the atlas and the metadata, wherein the metadata includes a first flag indicating whether depth estimation needs to be performed on a decoder side, wherein, in response to the first flag indicating that the depth estimation needs to be performed on the decoder side, a second flag indicating whether transmitted depth values are reprojected before the depth estimation is further included in the metadata, wherein, in response to the second flag indicating that the transmitted depth values are reprojected, the transmitted depth values are reprojected for each view image where the depth estimation is performed, and wherein the depth estimation for each view image is performed in units of blocks.
  8. The method of claim 7, wherein the metadata further includes identification information for identifying a profile of the video, and wherein, in response to the profile of the video indicating MIV GA, the atlas is constituted with only a texture component without a depth component.
  9. The method of claim 8, wherein, in response to the profile of the video being the MIV GA, the value of the first flag is forced to indicate that the depth estimation needs to be performed on the decoder side.
  10. The method of claim 8, wherein, in response to the profile of the video being other than the MIV GA, the atlas is constituted with the texture component and the depth component.
  11. The method of claim 8, wherein the second flag is forced to indicate that the depth estimation is performed on all view images when the profile of the video is the MIV GA.
  12. The method of claim 7, wherein a depth map of a first view reconstructed from a depth component of the atlas is refined by performing the depth estimation at the decoder side.
  13. The method of claim 12, wherein a depth map of a second view is obtained by refining a reprojected depth map generated by reprojecting the depth map or a refined depth map of the first view.
  14. The method of claim 7, wherein the metadata further includes a third flag indicating whether depth estimation is performed on all view images.
  15. A non-transitory computer-readable recording medium storing instructions for carrying out encoding of a video, the instructions comprising: classifying a plurality of view images into a basic image and additional images; performing pruning on at least one of the plurality of view images on the basis of the classification result; generating an atlas on the basis of the pruning result; and encoding the atlas and metadata for the atlas, wherein the metadata includes a first flag indicating whether depth estimation needs to be performed on a decoder side, wherein, in response to the first flag being encoded with a value indicating that the depth estimation needs to be performed on the decoder side, a second flag indicating whether depth values are reprojected before the depth estimation is further included in the metadata, wherein, in response to the second flag being encoded to indicate that the depth values are to be reprojected, the depth values for the depth estimation are further encoded, wherein depth values for each view image where the depth estimation is performed are obtained by reprojecting the depth values to be encoded, and wherein the depth estimation for each view image is performed in units of blocks.
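The encoding flow recited in claims 1 and 15 can be sketched as a small pipeline. This is a hypothetical illustration only: the names `Metadata`, `encode_views`, and the stand-in classification/pruning logic are not part of the specification, and real pruning and atlas packing operate on pixel data rather than view labels.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Metadata:
    # First flag: depth estimation needs to be performed on the decoder side.
    decoder_side_depth_estimation_flag: bool
    # Second flag, present only when the first flag is set: the transmitted
    # depth values are reprojected before the depth estimation.
    depth_reprojection_flag: Optional[bool] = None
    # Depth values further encoded when the second flag indicates reprojection.
    depth_values: List[float] = field(default_factory=list)

def encode_views(view_images, decoder_side_estimation, reproject, depth_values):
    """Hypothetical sketch of the encoding steps of claim 1."""
    # Classification: treat the first view as the basic image.
    basic, additional = view_images[0], view_images[1:]
    # Pruning stub: a real pruner removes pixels redundant with the basic view.
    pruned = list(additional)
    # Atlas generation: a texture-only atlas, as in the MIV GA profile.
    atlas = {"texture": [basic] + pruned}
    meta = Metadata(decoder_side_depth_estimation_flag=decoder_side_estimation)
    if decoder_side_estimation:
        meta.depth_reprojection_flag = reproject
        if reproject:
            meta.depth_values = depth_values  # further encoded per claim 1
    return atlas, meta
```

Note that the second flag is only written when the first flag is set, mirroring the conditional presence of syntax elements in the claimed metadata.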

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit under 35 U.S.C. § 119(a) of Korean Patent Application Ser. No. 10-2022-0137387 filed on Oct. 24, 2022, and Korean Patent Application No. 10-2023-0139033 filed on Oct. 17, 2023, in the Korean Intellectual Property Office, the entire disclosures of which are incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

Field of the Invention

The present disclosure relates to immersive video encoding/decoding methods for supporting motion parallax for rotational and translational movements.

Description of the Related Art

Virtual reality services are evolving into services that maximize immersion and realism by generating omnidirectional images in the form of live-action video or computer graphics (CG) and playing them on HMDs, smartphones, etc. It is currently known that playing natural and immersive omnidirectional videos through HMDs requires support for six degrees of freedom (6 DoF). 6 DoF images need to be provided as free images in six directions, such as (1) left and right rotation, (2) up and down rotation, (3) left and right movement, and (4) up and down movement through an HMD screen. However, most omnidirectional images based on live-action video only support rotational movement. Accordingly, research in areas such as acquisition and reproduction technology for 6 DoF omnidirectional images is being actively conducted.

SUMMARY OF THE INVENTION

Therefore, the present disclosure has been made in view of the above problems, and it is an object of the present disclosure to provide a method for performing depth estimation on a decoder side. It is another object of the present disclosure to provide a profile that allows depth estimation to be performed on the decoder side and a syntax structure according thereto.
The technical objects to be achieved by the present disclosure are not limited to the technical objects mentioned above, and other technical objects not mentioned can be clearly understood by those skilled in the art from the description below.

A video encoding method according to the present disclosure includes classifying a plurality of view images into a basic image and additional images, performing pruning on at least one of the plurality of view images on the basis of the classification result, generating an atlas on the basis of the pruning result, and encoding the atlas and metadata for the atlas. Here, the metadata may include a first flag indicating whether depth estimation needs to be performed on a decoder side.

In the video encoding method according to the present disclosure, the metadata may further include identification information for identifying a profile of a current image, and when the profile of the current image supports depth estimation on the decoder side, the first flag may be encoded into a true value.

In the video encoding method according to the present disclosure, the profile supporting depth estimation on the decoder side may include at least one of MIV GA (Geometry Absent) or MIV DSDE (Decoder-Side Depth Estimation).

In the video encoding method according to the present disclosure, encoding of depth information for the view images may not be allowed when the profile of the current image is the MIV GA, and encoding of depth information for at least some of the view images may be allowed when the profile of the current image is the MIV DSDE.

In the video encoding method according to the present disclosure, the depth information may include a depth atlas.

In the video encoding method according to the present disclosure, the depth atlas may include only information on the basic image among the plurality of view images.
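The profile-dependent constraints described above can be illustrated with a small helper. This is a sketch under assumptions: the profile names are spelled out informally, and `apply_profile_constraints` is a hypothetical function, not part of the MIV syntax.

```python
MIV_GA = "MIV Geometry Absent"
MIV_DSDE = "MIV Decoder-Side Depth Estimation"

def apply_profile_constraints(profile, first_flag, encode_depth):
    """Return (first_flag, encode_depth) after applying profile rules."""
    if profile == MIV_GA:
        # No depth component may be encoded, so decoder-side depth
        # estimation is mandatory: the first flag is forced true.
        return True, False
    if profile == MIV_DSDE:
        # The profile supports decoder-side estimation (first flag true),
        # but depth for at least some views, e.g. a depth atlas holding
        # only the basic image, may still be encoded.
        return True, encode_depth
    # Other profiles: the atlas carries both texture and depth components,
    # and the first flag keeps its signaled value.
    return first_flag, encode_depth
```

The asymmetry between the two profiles matches the text: GA forbids depth encoding outright, while DSDE merely permits omitting it for some views.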
In the video encoding method according to the present disclosure, the metadata may further include a second flag indicating whether depth estimation is performed on all view images.

In the video encoding method according to the present disclosure, the second flag may be encoded into a true value when the profile of the current image is the MIV GA.

In the video encoding method according to the present disclosure, the second flag may be encoded into true or false when the profile of the current image is the MIV DSDE.

A video decoding method according to the present disclosure includes decoding an atlas and metadata for the atlas, and generating a viewport image using the atlas and the metadata. Here, the metadata may include a first flag indicating whether depth estimation needs to be performed on a decoder side.

In the video decoding method according to the present disclosure, the metadata may further include identification information for identifying a profile of a current image, and the profile supporting depth estimation on the decoder side may include at least one of MIV GA (Geometry Absent) or MIV DSDE (Decoder-Side Depth Estimation).

In the video decoding method according to the present disclosure, the value of the first flag may be forced to be true when the profile of the current image is the MIV GA.
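The decoder-side refinement and reprojection described in claims 12 and 13 can be sketched numerically. All functions here are stand-ins chosen for illustration: `refine_block` substitutes a fixed offset for actual block-wise depth estimation, and `reproject` substitutes a constant disparity shift for true geometric reprojection between camera views.

```python
def refine_block(block, offset=0.25):
    # Stand-in for block-wise depth estimation refinement (claim 12/claims'
    # "in units of blocks"); a real refiner matches texture across views.
    return [d + offset for d in block]

def reproject(depth_map, disparity=0.5):
    # Stand-in for geometric reprojection of a depth map to a second view.
    return [d - disparity for d in depth_map]

def decoder_side_depth(first_view_depth, block_size=2):
    """Sketch of claims 12-13: refine the first view's reconstructed depth
    map block by block, then reproject the refined map to a second view
    and refine that result block by block as well."""
    refined = []
    for i in range(0, len(first_view_depth), block_size):
        refined += refine_block(first_view_depth[i:i + block_size])
    # Claim 13: the second view's depth map starts from a reprojection
    # of the (refined) first-view depth map.
    reprojected = reproject(refined)
    refined_second = []
    for i in range(0, len(reprojected), block_size):
        refined_second += refine_block(reprojected[i:i + block_size])
    return refined, refined_second
```

The point of the sketch is the order of operations: refinement of the reconstructed first-view map, reprojection, then block-wise refinement again for the second view, so each additional view starts from an already-refined estimate rather than from scratch.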