US-12627785-B2 - Method and apparatus for delivering a volumetric video content
Abstract
Methods, devices and data streams are provided for signaling and decoding information representative of restrictions of navigation in a volumetric video. The data stream comprises metadata associated with video data representative of the volumetric video. The metadata comprise data representative of a viewing bounding box, data representative of a curvilinear path in the 3D space of the volumetric video, and data representative of at least one viewing direction range associated with a point on the curvilinear path.
Inventors
- Bertrand Chupeau
- Gérard Briand
- Renaud Dore
Assignees
- INTERDIGITAL VC HOLDINGS, INC.
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2024-10-23
- Priority Date
- 2019-07-22
Claims (10)
- 1. A method for rendering a volumetric video according to restrictions of navigation, the method comprising decoding metadata from a data stream comprising video data representative of the volumetric video, the metadata comprising: data representative of a curvilinear path in a three-dimensional (3D) space of the volumetric video; data representative of at least two 3D points of view on the curvilinear path associated with a viewing direction range; and data representative of a viewing bounding box centered around one of the at least two 3D points of view; and rendering the volumetric video from a point of view within the viewing box associated with a first point of view of the at least two points of view in a limit of the associated viewing direction range.
- 2. The method of claim 1, wherein an action from a user displaces the first point of view to a second point of view of the at least two points of view, the rendering being performed according to the data associated with the second point of view.
- 3. The method of claim 1, wherein an action from a user displaces the first point of view to a third point of view located on the curvilinear path and between two of the at least two points of view, the rendering being performed according to data computed according to the data associated with the two of the at least two points of view.
- 4. The method of claim 1, wherein the data representative of a curvilinear path comprises parameters representative of parametric 3D curves, and wherein data representative of at least one 3D point is associated with one origin point of the curvilinear path.
- 5. The method of claim 1, wherein the metadata comprises data representative of at least two curvilinear paths in the 3D space of the volumetric video, and wherein an action from a user displaces the first point of view of a first curvilinear path to a first point of view of a second curvilinear path.
- 6. A device for rendering a volumetric video according to restrictions of navigation, the device comprising a processor configured for decoding metadata from a data stream comprising video data representative of the volumetric video, the metadata comprising: data representative of a curvilinear path in a three-dimensional (3D) space of the volumetric video; data representative of at least two 3D points of view on the curvilinear path associated with a viewing direction range; and data representative of a viewing bounding box centered around one of the at least two 3D points of view; and rendering the volumetric video from a point of view within the viewing box associated with a first point of view of the at least two points of view in a limit of the associated viewing direction range.
- 7. The device of claim 6, wherein an action from a user displaces the first point of view to a second point of view of the at least two points of view, the rendering being performed according to the data associated with the second point of view.
- 8. The device of claim 6, wherein an action from a user displaces the first point of view to a third point of view located on the curvilinear path and between two of the at least two points of view, the rendering being performed according to data computed according to the data associated with the two of the at least two points of view.
- 9. The device of claim 6, wherein the data representative of a curvilinear path comprises parameters representative of parametric 3D curves, and wherein data representative of at least one 3D point is associated with one origin point of the curvilinear path.
- 10. The device of claim 6, wherein the metadata comprises data representative of at least two curvilinear paths in the 3D space of the volumetric video, and wherein an action from a user displaces the first point of view of a first curvilinear path to a first point of view of a second curvilinear path.
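For illustration only, the metadata recited in the claims above can be pictured as a small data model. The following Python sketch is not taken from the patent; every type and field name is a hypothetical stand-in for the signaled elements: the curvilinear path, the points of view with their viewing direction ranges, and the viewing bounding box.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]  # a 3D point or extent

@dataclass
class ViewingDirectionRange:
    """Allowed viewing directions at a point of view (angles in radians)."""
    yaw_min: float
    yaw_max: float
    pitch_min: float
    pitch_max: float

@dataclass
class PointOfView:
    """A 3D point of view on the curvilinear path and its direction range."""
    position: Vec3
    direction_range: ViewingDirectionRange

@dataclass
class ViewingBoundingBox:
    """Axis-aligned viewing volume; half_size is its extent from the center."""
    half_size: Vec3

@dataclass
class NavigationMetadata:
    """Navigation-restriction metadata carried alongside the video data."""
    path_control_points: List[Vec3]    # control points of the parametric 3D curve(s)
    points_of_view: List[PointOfView]  # at least two, per claims 1 and 6
    bounding_box: ViewingBoundingBox   # centered around one of the points of view
```

A decoder would populate such a structure from the metadata before applying the rendering restrictions; a sketch of that step follows at the end of the description.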
Description
CROSS-REFERENCE TO RELATED APPLICATION
This application is a continuation of U.S. Non-Provisional application Ser. No. 17/629,242, filed Jan. 21, 2022, which is a National Stage Application under 35 U.S.C. § 371 of International Application No. PCT/US2020/041878, filed Jul. 14, 2020, and claims priority to European patent application No. 19305968.0, filed Jul. 22, 2019, and European patent application No. 19306721.2, filed Dec. 20, 2019, the contents of all of which are hereby incorporated herein by reference as if fully set forth.
TECHNICAL FIELD
The present principles generally relate to the domain of three-dimensional (3D) scenes and volumetric video content. The present document is also understood in the context of the encoding, formatting and decoding of data representative of the texture and the geometry of a 3D scene for the rendering of volumetric content on end-user devices such as mobile devices or Head-Mounted Displays (HMD). In particular, the present principles relate to signaling and decoding information representative of restrictions of navigation in a volumetric video.
BACKGROUND
The present section is intended to introduce the reader to various aspects of art, which may be related to various aspects of the present principles that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present principles. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
Recently there has been a growth of available large field-of-view content (up to 360°). Such content is potentially not fully visible by a user watching it on an immersive display device such as a Head-Mounted Display, smart glasses, a PC screen, a tablet or a smartphone. That means that at a given moment, a user may only be viewing a part of the content. However, a user can typically navigate within the content by various means such as head movement, mouse movement, touch screen or voice. It is typically desirable to encode and decode this content.
Immersive video, also called 360° flat video, allows the user to watch all around himself through rotations of his head around a still point of view. Rotations only allow a 3 Degrees of Freedom (3DoF) experience. Even if 3DoF video is sufficient for a first omnidirectional video experience, for example using a Head-Mounted Display device (HMD), 3DoF video may quickly become frustrating for a viewer who would expect more freedom, for example by experiencing parallax. In addition, 3DoF may also induce dizziness because a user never only rotates his head but also translates it in three directions, translations which are not reproduced in 3DoF video experiences.
A large field-of-view content may be, among others, a three-dimensional computer graphics imagery scene (3D CGI scene), a point cloud or an immersive video. Many terms may be used to designate such immersive videos: Virtual Reality (VR), 360, panoramic, 4π steradians, immersive, omnidirectional or large field of view, for example. Volumetric video (also known as 6 Degrees of Freedom (6DoF) video) is an alternative to 3DoF video. When watching a 6DoF video, in addition to rotations, the user can also translate his head, and even his body, within the watched content and experience parallax and even volumes.
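As an aside not found in the patent text, the difference in degrees of freedom can be made concrete with a minimal sketch: a 3DoF pose carries only an orientation, while a 6DoF pose adds a translation, which is what enables parallax. The field names here are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Pose3DoF:
    """Rotation-only pose: the user can look around but not move."""
    yaw: float    # radians, rotation about the vertical axis
    pitch: float  # radians, rotation about the lateral axis
    roll: float   # radians, rotation about the viewing axis

@dataclass
class Pose6DoF(Pose3DoF):
    """Adds head/body translation, which is what produces parallax."""
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0
```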
Such videos considerably increase the feeling of immersion and the perception of scene depth, and prevent dizziness by providing consistent visual feedback during head translations. The content is created by means of dedicated sensors allowing the simultaneous recording of color and depth of the scene of interest. The use of a rig of color cameras combined with photogrammetry techniques is one way to perform such a recording, even if technical difficulties remain.
While 3DoF videos comprise a sequence of images resulting from the un-mapping of texture images (e.g. spherical images encoded according to latitude/longitude projection mapping or equirectangular projection mapping), 6DoF video frames embed information from several points of view. They can be viewed as a temporal series of point clouds resulting from a three-dimensional capture.
Two kinds of volumetric videos may be considered depending on the viewing conditions. A first one (i.e. complete 6DoF) allows a complete free navigation within the video content, whereas a second one (aka 3DoF+) restricts the user viewing space to a limited volume called the viewing bounding box, allowing limited translation of the head and a parallax experience. This second context is a valuable trade-off between free navigation and the passive viewing conditions of a seated audience member. Between the 3DoF+ and 6DoF experiences, it is possible to define a 4DoF+ case as an intermediate where the user's displacement is constrained along a curvilinear path.
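To make the 3DoF+/4DoF+ restriction concrete, here is a minimal, hypothetical sketch of how a renderer could enforce it, reusing the illustrative NavigationMetadata model from the claims section: the requested viewpoint is snapped to the nearest signaled point of view on the curvilinear path, the position is clamped to the viewing bounding box centered on it, and the viewing direction is clamped to the associated range. The patent does not prescribe this particular logic.

```python
import math

# NavigationMetadata and PointOfView are the illustrative dataclasses
# sketched after the claims above; they are assumptions, not patent text.

def clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))

def constrained_view(metadata, request_pos, request_yaw, request_pitch):
    """Constrain a requested viewpoint to the signaled navigation restrictions."""
    # 1. Snap to the nearest signaled point of view on the curvilinear path.
    pov = min(metadata.points_of_view,
              key=lambda p: math.dist(p.position, request_pos))
    # 2. Clamp the position to the viewing bounding box centered on that point.
    half = metadata.bounding_box.half_size
    pos = tuple(clamp(r, c - h, c + h)
                for r, c, h in zip(request_pos, pov.position, half))
    # 3. Clamp the viewing direction to the associated viewing direction range.
    rng = pov.direction_range
    yaw = clamp(request_yaw, rng.yaw_min, rng.yaw_max)
    pitch = clamp(request_pitch, rng.pitch_min, rng.pitch_max)
    return pos, yaw, pitch
```

Instead of snapping to the nearest point of view, a renderer could interpolate positions and direction ranges between two consecutive points of view on the path, in the spirit of claims 3 and 8.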