US-12626409-B2 - Encoding and decoding views on volumetric image data

US12626409B2

Abstract

An encoding method comprises obtaining (101) an input set of volumetric image data, selecting (103) data from the image data for multiple views based on a visibility of the data from a respective viewpoint at a respective viewing direction and/or within a respective field of view, such that a plurality of the views comprises only a part of the image data, encoding (105) each of the views as a separate output set (31), and generating (107) metadata which indicates the viewpoints. A decoding method comprises determining (121) a desired user viewpoint, obtaining (123) the metadata, selecting (125) one or more of the available viewpoints based on the desired user viewpoint, obtaining (127) one or more sets of image data in which one or more available views corresponding to the selected one or more available viewpoints have been encoded, and decoding (129) at least one of the one or more available views.
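The encode/decode flow summarized in the abstract can be sketched in a few lines. The sketch below is illustrative only: the function names, the simple view-cone visibility test, and the nearest-viewpoint selection are assumptions, not the patent's actual implementation (a real encoder would also compress each output set).

```python
import math

def is_visible(point, viewpoint, direction, fov_deg):
    """Crude visibility test: point lies within a view cone (no occlusion)."""
    v = [p - c for p, c in zip(point, viewpoint)]
    norm = math.sqrt(sum(x * x for x in v)) or 1e-9
    dnorm = math.sqrt(sum(x * x for x in direction))
    cos_angle = sum(a * b for a, b in zip(v, direction)) / (norm * dnorm)
    return cos_angle >= math.cos(math.radians(fov_deg / 2))

def encode(point_cloud, view_configs):
    """One output set per view, plus metadata listing the viewpoints."""
    outputs, metadata = [], []
    for viewpoint, direction, fov in view_configs:
        visible = [p for p in point_cloud
                   if is_visible(p, viewpoint, direction, fov)]
        outputs.append(visible)  # a real encoder would compress this set
        metadata.append({"viewpoint": viewpoint,
                         "direction": direction, "fov": fov})
    return outputs, metadata

def decode(user_viewpoint, outputs, metadata):
    """Select the available viewpoint closest to the desired user viewpoint
    and decode only that view."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    i = min(range(len(metadata)),
            key=lambda i: dist(user_viewpoint, metadata[i]["viewpoint"]))
    return outputs[i]
```

Because each view is a separate output set, a client only needs to fetch and decode the sets matching its current viewpoint, rather than the full volumetric scene.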

Inventors

  • Sylvie Dijkstra-Soudarissanane
  • Hendrikus Nathaniël Hindriks
  • Emmanuel Thomas

Assignees

  • KONINKLIJKE KPN N.V.
  • NEDERLANDSE ORGANISATIE VOOR TOEGEPAST-NATUURWETENSCHAPPELIJK ONDERZOEK TNO

Dates

Publication Date
2026-05-12
Application Date
2020-12-04
Priority Date
2019-12-06

Claims (15)

  1. An encoder system, comprising at least one processor configured to: obtain an input set of volumetric image data, select data from said volumetric image data for each of a plurality of views on said volumetric image data based on a visibility of said data from a respective viewpoint either at a respective viewing direction, or within a respective field of view, or both, such that a plurality of said views comprises only a part of said volumetric image data, wherein selecting data for each of said views comprises view-frustum culling an initial data selection for each of the plurality of views, encode each of said culled views as a separate output set of volumetric image data, and generate metadata, said metadata indicating said plurality of viewpoints and comprising 3D position information.
  2. An encoder system as claimed in claim 1, wherein said at least one processor is configured to: select further data for said plurality of views based on a visibility of said further data from one or more respective further viewpoints, said one or more respective further viewpoints being related to said respective viewpoint.
  3. An encoder system as claimed in claim 1, wherein said at least one processor is configured to specify in said metadata where to obtain said output sets of volumetric image data or parts of said output sets of volumetric image data.
  4. An encoder system as claimed in claim 1, wherein said metadata further indicates either said plurality of viewing directions, said plurality of fields of view, or both, or further viewpoint configurations.
  5. An encoder system as claimed in claim 1, wherein said input set of volumetric image data comprises one or more point clouds.
  6. An encoder system as claimed in claim 1, wherein said at least one processor is configured to select said data from said volumetric image data for each of said plurality of views by selecting, for each respective view, all of said volumetric image data which is visible from said corresponding viewpoint either at said corresponding viewing direction, within said corresponding field of view, or both, from said volumetric image data.
  7. An encoder system as claimed in claim 1, wherein said plurality of views collectively comprises all of said volumetric image data.
  8. A decoder system, comprising at least one processor configured to: determine a desired user viewpoint, obtain metadata associated with encoded volumetric image data, said metadata indicating a plurality of available viewpoints, and comprising 3D position information, each of said plurality of available viewpoints corresponding to one or more available view-frustum culled views, select one or more of said plurality of available viewpoints based on said desired user viewpoint, obtain, based on said selected one or more viewpoints, one or more sets of volumetric image data in which one or more available culled views corresponding to said selected one or more viewpoints have been encoded, and decode at least one of said one or more available culled views from said one or more sets of volumetric image data.
  9. A decoder system as claimed in claim 8, wherein said at least one processor is configured to: determine a further desired user viewpoint, select a further available viewpoint from said available viewpoints based on said further desired user viewpoint, obtain a further set of volumetric image data in which a further available view corresponding to said further available viewpoint has been encoded, decode said further available view from said further set of volumetric image data, and fuse said decoded further available view with said at least one decoded available view.
  10. A decoder system as claimed in claim 8, wherein said at least one processor is configured to: obtain a further set of volumetric image data in which data from one or more related views has been encoded, said one or more related views being related to said one or more available views, decode at least one of said one or more related views from said further set of volumetric image data, and fuse said decoded at least one related view with said decoded at least one available view.
  11. A decoder system as claimed in claim 8, wherein said at least one processor is configured to obtain metadata indicating said available viewpoints and specifying where to obtain sets of volumetric image data in which available views corresponding to said available viewpoints have been encoded, or parts of said sets.
  12. A decoder system as claimed in claim 11, wherein said metadata further indicates either a viewing direction, a field of view, or both, or a further viewpoint configuration for each of said available viewpoints.
  13. A method of encoding volumetric image data, comprising: obtaining an input set of volumetric image data; selecting data from said volumetric image data for each of a plurality of views on said volumetric image data based on a visibility of said data from a respective viewpoint either at a respective viewing direction, or within a respective field of view, or both, such that a plurality of said views comprises only a part of said volumetric image data, wherein selecting data for each of said views comprises view-frustum culling an initial data selection for each of the plurality of views; encoding each of said culled views as a separate output set of volumetric image data; and generating metadata, said metadata indicating said plurality of viewpoints and comprising 3D position information.
  14. A method of decoding encoded volumetric image data, comprising: determining a desired user viewpoint; obtaining metadata associated with said encoded volumetric image data, said metadata indicating a plurality of available viewpoints, and comprising 3D position information, each of said plurality of available viewpoints corresponding to one or more available view-frustum culled views; selecting one or more of said plurality of available viewpoints based on said desired user viewpoint; obtaining, based on said selected one or more viewpoints, one or more sets of volumetric image data in which one or more available culled views corresponding to said selected one or more available viewpoints have been encoded; and decoding at least one of said one or more available culled views from said one or more sets of volumetric image data.
  15. A non-transitory computer-readable medium comprising non-transitory data representing a computer program or suite of computer programs comprising at least one software code portion, or a computer program product storing at least one software code portion, the software code portion, when run on a computer system, being configured for performing the method of claim 13.
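The view-frustum culling step recited in claims 1 and 13 can be illustrated with a minimal sketch. This is not the patent's implementation: a practical culler tests points against the six frustum planes, whereas the version below approximates the frustum as a symmetric view cone clipped by near and far planes (the function names and parameters are assumptions for illustration).

```python
import math

def in_frustum(point, eye, forward, fov_deg, near, far):
    """True if `point` lies inside a view cone clipped by near/far planes;
    `forward` is assumed to be a unit vector."""
    v = [p - e for p, e in zip(point, eye)]
    depth = sum(a * b for a, b in zip(v, forward))   # distance along view axis
    if not (near <= depth <= far):
        return False                                 # behind near or past far plane
    dist = math.sqrt(sum(x * x for x in v))
    cos_half = math.cos(math.radians(fov_deg / 2))
    return dist > 0 and depth / dist >= cos_half     # within the cone angle

def cull_view(points, eye, forward, fov_deg, near=0.1, far=100.0):
    """Keep only the points visible from this viewpoint; the surviving subset
    is what would be encoded as one separate output set."""
    return [p for p in points if in_frustum(p, eye, forward, fov_deg, near, far)]
```

Each culled view is then encoded independently, so every output set carries only the part of the volumetric data visible from its viewpoint.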

Description

This application is the U.S. National Stage of International Application No. PCT/EP2020/084685, filed on Dec. 4, 2020, which designates the U.S., published in English, and claims priority under 35 U.S.C. § 119 or 365(c) to European Application No. EP 19214048.1, filed on Dec. 6, 2019. The entire teachings of the above applications are incorporated herein by reference.

FIELD OF THE INVENTION

The invention relates to an encoder system for encoding volumetric image data, e.g. a point cloud, and a decoder system for decoding encoded volumetric image data. The invention further relates to a method of encoding volumetric image data and a method of decoding encoded volumetric image data. The invention also relates to a computer program product enabling a computer system to perform such methods.

BACKGROUND OF THE INVENTION

Augmented Reality (AR) and Virtual Reality (VR) offer a compelling set of use cases, such as remotely attending live sports, shared and social VR, (serious) gaming, and training and education. Such experiences allow viewers to connect over large distances. For truly immersive experiences in both AR and VR, a viewer requires six degrees of freedom (6DoF). That is, when wearing head-mounted AR goggles or a VR display, the viewer should experience changes in the environment when moving his/her head in all directions, i.e. when changing head position forward/backward (surge), up/down (heave) and left/right (sway), combined with changes in orientation through rotation along the lateral (yaw), transverse (pitch) and longitudinal (roll) axes, and more generally when moving his/her head while it may stay still with respect to his/her body. Volumetric formats are required to describe, and thereby allow rendering of, environments in which viewers can have 6DoF experiences. One aspect of such volumetric formats is volumetric video formats, which have been created to describe volumetric environments which change dynamically over a given time.
The AR and VR industry is moving towards such formats. For example, in the aforementioned use cases, the image of users could be made more realistic by using volumetric capture models. Volumetric image data may comprise Point Clouds (PCs), voxels or volumetric (polygon) meshes, for example. Meshes are used to describe 3D models in games, for example. Point clouds can be used to describe volumetric objects as a set of points, which can then be used in virtual scenes.

A point cloud is a method for representing 3D data using a (usually very large) set of three-dimensional points (x,y,z)∈ℝ³, where x, y, z usually refer to Cartesian coordinates, but other formats also exist (e.g. a 3D reference point (e.g. [0,0,0]) with angles x, y on a sphere with radius z). Depending on the type of data which is being represented, each point can have additional attributes (for example colour, reflectance, surface orientation, timestamp, movement) assigned to it.

Points within a point cloud are normally considered to have a volume of zero (in other words: are normally considered to have no defined size/dimension). In order to meaningfully render such points, multiple techniques have been described in the literature. In one of the more trivial methods, a thickness value is assigned to each point before or during rendering. Using this thickness, it is possible to represent each point with a 3D object (e.g. tiny spheres, voxels, hexagons or other shapes) such that it becomes visible and can hide other points which are behind it.

Point clouds are well suited as a storage format for outputs from a range of measurement and capture devices. In particular, an RGB camera combined and synchronized with an infrared time-of-flight (ToF) sensor (e.g. the Microsoft Kinect) is commonly used to sense depth and colour information, which can be combined and represented as a point cloud.
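The two point representations described above (Cartesian coordinates with optional attributes, and a reference point with two angles and a radius) can be sketched as follows; the `Point` class and function names are illustrative, not part of any standard point-cloud format.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Point:
    """A point cloud element: Cartesian position plus optional attributes
    such as colour, reflectance, surface orientation or timestamp."""
    x: float
    y: float
    z: float
    attributes: dict = field(default_factory=dict)

def spherical_to_cartesian(ref, azimuth, elevation, radius):
    """Map the alternative (reference point, two angles on a sphere, radius)
    representation onto Cartesian (x, y, z); angles in radians."""
    x = ref[0] + radius * math.cos(elevation) * math.cos(azimuth)
    y = ref[1] + radius * math.cos(elevation) * math.sin(azimuth)
    z = ref[2] + radius * math.sin(elevation)
    return (x, y, z)
```

For rendering, each such zero-volume point would additionally be given a thickness and drawn as a small 3D shape, as the text describes.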
Another technology which has resulted in the use of point clouds is LiDAR, a technology mainly known from self-driving cars and remote sensing.

A mesh is a 3D structure which is composed of multiple connected points or vertices. Vertices can be connected and closed to form (planar) faces. Graphics cards (GPUs) are typically optimized for rendering large sets of meshes consisting of three- or four-sided faces. Objects can be better approximated by increasing the number of vertices. Meshes can be constructed programmatically and/or be defined using 3D modelling software. There are also many methods for unambiguously storing mesh data, and as such there are many public and proprietary formats for this purpose, like the 3DS, OBJ, GLTF and PLY formats, for example.

Voxels or ‘volumetric pixels’ are a data structure used for representing volumetric data. Commonly, voxels are defined on a 3D ‘voxel grid’ consisting of similarly sized cells. In practice, voxels are used for representing various volumetric measurements and samplings, with applications in medical and geospatial fields as well as more generally in computer graphics
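The mesh structure described above (vertices connected into planar faces) is conventionally stored as a shared vertex list plus faces given as vertex indices, which is the layout formats such as OBJ and PLY use; the concrete data below is a made-up example for illustration.

```python
# A unit quad split into two triangles: a shared vertex list plus faces
# expressed as index triples into that list.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
            (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]

def face_vertices(face):
    """Resolve a face's vertex indices to its actual 3D coordinates."""
    return [vertices[i] for i in face]
```

Sharing vertices between faces is what lets a mesh approximate an object more closely simply by adding vertices, without duplicating coordinate data per face.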