US-12621477-B2 - Optimized position and connectivity coding for dual degree mesh compression


Abstract

A method and apparatus comprising computer code configured to cause a processor or processors to: obtain, from a bitstream, a mesh representing an encoded volumetric data of at least one three-dimensional (3D) visual content; partition a plurality of vertices of the mesh into a plurality of groups; and decode the encoded volumetric data by predicting the vertices in each group of the plurality of groups based on a plurality of traversal orders depending on an adaptive reference vertex of the vertices, wherein the plurality of traversal orders includes a first order from the adaptive reference vertex to a first alternative reference vertex of the vertices, and wherein the plurality of traversal orders includes a second order from the adaptive reference vertices to a second alternative reference vertex of the vertices.

Inventors

Thuong NGUYEN CANH
Chao Huang
Xiaozhong Xu
Shan Liu

Assignees

Tencent America LLC

Dates

Publication Date: 20260505
Application Date: 20240711

Claims (14)

1. A method for video decoding, the method performed by at least one processor and comprising: obtaining, from a bitstream, a mesh representing an encoded volumetric data of at least one three-dimensional (3D) visual content; partitioning a plurality of vertices of the mesh into a plurality of groups; and decoding the encoded volumetric data by predicting the vertices in each group of the plurality of groups based on a plurality of traversal orders depending on an adaptive reference vertex of the vertices, the plurality of traversal orders comprises a first order from the adaptive reference vertex to a first alternative reference vertex of the vertices, the plurality of traversal orders comprises a second order from the adaptive reference vertices to a second alternative reference vertex of the vertices, content of the bitstream is based on a rate estimated from a sum of absolute differences (SAD) of a residual vector, the content of the bitstream represents a replacement vertex of the adaptive vertex and is determined based on determining a plurality of fractions of a line segment of a face of the mesh, the SAD of the residual vector is scaled according to R = α × |p − v|_1, where R represent a rate, α > 0 represents a scaling factor adjusting a rate approximation, p represents ones of positions along the line segment, and v represents ones of the vertices, the ones of positions along the line segment are determined according to p_w = (1 − w)*v + p_v, where p_w represents a position at fraction w of the line segment and p_v represents a predicted vertex, and the replacement vertex is determined according to ΔJ = ΔD + λΔR, where ΔJ is a rate-distortion function, ΔD represents a change in geometric distortion, and ΔR represents a change in bit rate.
2. The method according to claim 1, wherein a binary flag is included in the bitstream and indicates either the first order or the second order.
3. The method according to claim 1, wherein decoding the encoded volumetric data is based on determining at least one of the plurality of traversal orders based on a number of faces of the mesh and without any binary flag included in the bitstream and indicating any of the first order and the second order.
4. The method according to claim 1, wherein decoding the encoded volumetric data is based on interleave coding indicating at least one of face traversal and pivot traversal of the mesh.
5. The method according to claim 1, wherein the mesh comprises a parallelogram.
6. An apparatus for video decoding, the apparatus comprising: at least one memory configured to store computer program code; at least one processor configured to access the computer program code and operate as instructed by the computer program code, the computer program code including: obtaining code configured to cause the at least one processor to obtain, from a bitstream, a mesh representing an encoded volumetric data of at least one three-dimensional (3D) visual content; partitioning code configured to cause the at least one processor to partition a plurality of vertices of the mesh into a plurality of groups; and decoding code configured to cause the at least one processor to decode the encoded volumetric data by predicting the vertices in each group of the plurality of groups based on a plurality of traversal orders depending on an adaptive reference vertex of the vertices, the plurality of traversal orders comprises a first order from the adaptive reference vertex to a first alternative reference vertex of the vertices, and the plurality of traversal orders comprises a second order from the adaptive reference vertices to a second alternative reference vertex of the vertices, content of the bitstream is based on a rate estimated from a sum of absolute differences (SAD) of a residual vector, the content of the bitstream represents a replacement vertex of the adaptive vertex and is determined based on determining a plurality of fractions of a line segment of a face of the mesh, the SAD of the residual vector is scaled according to R = α × |p − v|_1, where R represent a rate, α > 0 represents a scaling factor adjusting a rate approximation, p represents ones of positions along the line segment, and v represents ones of the vertices, the ones of positions along the line segment are determined according to p_w = (1 − w)*v + p_v, where p_w represents a position at fraction w of the line segment and p_v represents a predicted vertex, and the replacement vertex is determined according to ΔJ = ΔD + λΔR, where ΔJ is a rate-distortion function, ΔD represents a change in geometric distortion, and ΔR represents a change in bit rate.
7. The apparatus according to claim 6, wherein a binary flag is included in the bitstream and indicates either the first order or the second order.
8. The apparatus according to claim 6, wherein decoding the encoded volumetric data is based on determining at least one of the plurality of traversal orders based on a number of faces of the mesh and without any binary flag included in the bitstream and indicating any of the first order and the second order.
9. The apparatus according to claim 6, wherein decoding the encoded volumetric data is based on interleave coding indicating at least one of face traversal and pivot traversal of the mesh.
10. The apparatus according to claim 6, wherein the mesh comprises a parallelogram.
11. A non-transitory computer readable medium storing a program causing a computer to: obtaining, from a bitstream, a mesh representing an encoded volumetric data of at least one three-dimensional (3D) visual content; partitioning a plurality of vertices of the mesh into a plurality of groups; and decoding the encoded volumetric data by predicting the vertices in each group of the plurality of groups based on a plurality of traversal orders depending on an adaptive reference vertex of the vertices, the plurality of traversal orders comprises a first order from the adaptive reference vertex to a first alternative reference vertex of the vertices, and the plurality of traversal orders comprises a second order from the adaptive reference vertices to a second alternative reference vertex of the vertices, content of the bitstream is based on a rate estimated from a sum of absolute differences (SAD) of a residual vector, the content of the bitstream represents a replacement vertex of the adaptive vertex and is determined based on determining a plurality of fractions of a line segment of a face of the mesh, the SAD of the residual vector is scaled according to R = α × |p − v|_1, where R represent a rate, α > 0 represents a scaling factor adjusting a rate approximation, p represents ones of positions along the line segment, and v represents ones of the vertices, the ones of positions along the line segment are determined according to p_w = (1 − w)*v + p_v, where p_w represents a position at fraction w of the line segment and p_v represents a predicted vertex, and the replacement vertex is determined according to ΔJ = ΔD + λΔR, where ΔJ is a rate-distortion function, ΔD represents a change in geometric distortion, and ΔR represents a change in bit rate.
12. The non-transitory computer readable medium according to claim 11, wherein a binary flag is included in the bitstream and indicates either the first order or the second order.
13. The non-transitory computer readable medium according to claim 11, wherein decoding the encoded volumetric data is based on determining at least one of the plurality of traversal orders based on a number of faces of the mesh and without any binary flag included in the bitstream and indicating any of the first order and the second order.
14. The non-transitory computer readable medium according to claim 11, wherein decoding the encoded volumetric data is based on interleave coding indicating at least one of face traversal and pivot traversal of the mesh.
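
Illustrative sketch (not part of the claims). The three relations recited in the independent claims, namely the SAD-based rate proxy R = α × |p − v|_1, the position p_w at fraction w of a line segment, and the rate-distortion change ΔJ = ΔD + λΔR, can be written out as small Python functions. All function names are hypothetical. The position formula is transcribed literally from the claim text. How ΔD and ΔR are measured is not stated in this section, so the sketch simply accepts them as inputs, and the rule "replace only when ΔJ is negative" is an assumption made for illustration.

```python
from typing import Iterable, Optional, Sequence, Tuple

Vec = Sequence[float]


def sad(p: Vec, v: Vec) -> float:
    """Sum of absolute differences (L1 norm) of the residual p - v."""
    return sum(abs(pi - vi) for pi, vi in zip(p, v))


def estimate_rate(p: Vec, v: Vec, alpha: float = 1.0) -> float:
    """Rate proxy R = alpha * |p - v|_1, with alpha > 0 as recited."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return alpha * sad(p, v)


def position_at_fraction(v: Vec, p_v: Vec, w: float) -> Tuple[float, ...]:
    """p_w = (1 - w) * v + p_v, copied literally from the claim wording."""
    return tuple((1.0 - w) * vi + pi for vi, pi in zip(v, p_v))


def delta_j(delta_d: float, delta_r: float, lam: float) -> float:
    """Rate-distortion change: delta_J = delta_D + lambda * delta_R."""
    return delta_d + lam * delta_r


def pick_replacement(
    candidates: Iterable[Tuple[str, float, float]], lam: float
) -> Optional[str]:
    """Return the label of the candidate with the most negative delta_J.

    Each candidate is (label, delta_D, delta_R). Returning None when no
    candidate has delta_J < 0 is an assumption of this sketch.
    """
    best_label: Optional[str] = None
    best_cost = 0.0
    for label, delta_d, delta_r in candidates:
        cost = delta_j(delta_d, delta_r, lam)
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


if __name__ == "__main__":
    v, p_v = (2.0, 4.0, 6.0), (1.0, 1.0, 1.0)
    for w in (0.25, 0.5, 0.75):
        p_w = position_at_fraction(v, p_v, w)
        print(w, p_w, estimate_rate(p_w, v, alpha=0.5))
```

In this sketch a larger λ weights the bit-rate change more heavily, so a candidate that saves rate at the cost of some geometric distortion becomes more likely to be selected.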

Description

CROSS REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. provisional application 63/526,441, filed on Jul. 12, 2023, U.S. provisional application 63/526,440, filed on Jul. 12, 2023, and U.S. 63/598,097, filed on Nov. 11, 2023, the disclosures of which are incorporated herein by reference in their entireties.

BACKGROUND

1. Field

This disclosure relates to novel methods for efficiently compressing the connectivity and attribute information of triangular and polygonal meshes by employing an optimized dual degree connectivity coding scheme, for efficiently compressing the position attribute by refining the input mesh, for interleave coding of connectivity and position attribute in dual degree mesh coding, and for dual degree mesh coding based on adaptive multiple parallelogram prediction.

2. Description of Related Art

The advances in 3D capture, modeling, and rendering have promoted the ubiquitous presence of 3D content across several platforms and devices. Nowadays, it is possible to capture a baby's first step in one continent and allow the grandparents to see (and maybe interact) and enjoy a full immersive experience with the child in another continent. Nevertheless, in order to achieve such realism, models are becoming ever more sophisticated, and a significant amount of data is linked to the creation and consumption of those models. 3D meshes are widely used to represent such immersive content.

A mesh is composed of several polygons that describe the surface of a volumetric object. Each polygon is defined by its vertices in 3D space and the information of how the vertices are connected, referred to as connectivity information. Optionally, vertex attributes, such as colors, normals, etc., could be associated with the mesh vertices. Attributes could also be associated with the surface of the mesh by exploiting mapping information that parameterizes the mesh with 2D attribute maps. Such mapping is usually described by a set of parametric coordinates, referred to as UV coordinates or texture coordinates, associated with the mesh vertices. 2D attribute maps are used to store high resolution attribute information such as texture, normals, displacements, etc. Such information could be used for various purposes such as texture mapping and shading.

A dynamic mesh sequence may require a large amount of data since it may consist of a significant amount of information changing over time. Therefore, efficient compression technologies are required to store and transmit such content. Mesh compression standards IC, MESHGRID, and FAMC were previously developed by MPEG to address dynamic meshes with constant connectivity and time varying geometry and vertex attributes. However, these standards do not take into account time varying attribute maps and connectivity information. DCC (Digital Content Creation) tools usually generate such dynamic meshes. By contrast, it is challenging for volumetric acquisition techniques to generate a constant connectivity dynamic mesh, especially under real time constraints. This type of content is not supported by the existing standards.

MPEG is planning to develop a new mesh compression standard to directly handle dynamic meshes with time varying connectivity information and optionally time varying attribute maps. This standard targets lossy and lossless compression for various applications, such as real-time communications, storage, free viewpoint video, AR and VR. Functionalities such as random access and scalable/progressive coding are also considered.

For any of those reasons, there is a desire for technical solutions to such problems that arose in video coding technology.
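
Illustrative sketch (not part of the disclosure). The mesh components described in the background, namely vertex positions, connectivity information, optional per-vertex attributes, and UV coordinates, can be pictured as a small data structure. The class and field names below are hypothetical, and the two derived quantities (vertices per face and faces per vertex) are included only as examples of information that follows from connectivity alone.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


@dataclass
class Mesh:
    positions: List[Vec3]                      # vertex positions in 3D space
    faces: List[Tuple[int, ...]]               # connectivity: vertex indices per polygon
    uvs: List[Vec2] = field(default_factory=list)  # optional per-vertex UV coordinates
    attributes: Dict[str, List[Vec3]] = field(default_factory=dict)  # e.g. colors, normals

    def validate(self) -> None:
        """Check that connectivity and attributes are consistent with the vertex list."""
        n = len(self.positions)
        for face in self.faces:
            if len(face) < 3:
                raise ValueError("a polygon needs at least three vertices")
            if any(i < 0 or i >= n for i in face):
                raise ValueError("face refers to a vertex index out of range")
        if self.uvs and len(self.uvs) != n:
            raise ValueError("expected one UV coordinate per vertex")
        for name, values in self.attributes.items():
            if len(values) != n:
                raise ValueError(f"attribute {name!r} needs one value per vertex")

    def face_degrees(self) -> List[int]:
        """Number of vertices of each polygon."""
        return [len(face) for face in self.faces]

    def vertex_face_counts(self) -> List[int]:
        """Number of polygons incident to each vertex."""
        counts = [0] * len(self.positions)
        for face in self.faces:
            for i in face:
                counts[i] += 1
        return counts


# A unit square split into two triangles that share the edge (0, 2).
quad = Mesh(
    positions=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
    faces=[(0, 1, 2), (0, 2, 3)],
    uvs=[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
)
quad.validate()
```

A dynamic mesh sequence with time varying connectivity would, under this picture, carry a different `faces` list from frame to frame, whereas the constant connectivity case changes only `positions` and attributes.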
SUMMARY

There is included a method and apparatus comprising memory configured to store computer program code and a processor or processors configured to access the computer program code and operate as instructed by the computer program code. The computer program code is configured to cause the processor to implement obtaining code configured to cause the at least one processor to obtain, from a bitstream, a mesh representing an encoded volumetric data of at least one three-dimensional (3D) visual content; partitioning code configured to cause the at least one processor to partition a plurality of vertices of the mesh into a plurality of groups; and decoding code configured to cause the at least one processor to decode the encoded volumetric data by predicting the vertices in each group of the plurality of groups based on a plurality of traversal orders depending on an adaptive reference vertex of the vertices, the plurality of traversal orders includes a first order from the adaptive reference vertex to a first alternative reference vertex of the vertices, and the plurality of traversal orders includes a second order from the adaptive reference vertices to a second alternative reference vertex of the vertices.

A binary flag may be included in the bitstream and indicates either the first order or the