CN-121986360-A - Method, device and medium for point cloud encoding and decoding

CN 121986360 A

Abstract

Embodiments of the present disclosure provide a solution for point cloud codec. A method for point cloud codec is presented. In the method, for a conversion between a current frame of a point cloud sequence and a bitstream of the point cloud sequence, a prediction mode for a region of the current frame is determined. The prediction mode includes at least a first mode based on both intra prediction and inter prediction. A prediction of the region is determined based on the prediction mode, and the conversion is performed based on the prediction.

Inventors

  • XU YINGZHAN
  • WANG WENYI
  • B. Vishwanat
  • ZHANG KAI
  • ZHANG LI

Assignees

  • Douyin Vision Co., Ltd.
  • ByteDance Ltd.

Dates

Publication Date
2026-05-05
Application Date
2024-10-07
Priority Date
2023-10-07

Claims (20)

  1. A method for point cloud encoding and decoding, comprising: determining, for a conversion between a current frame of a point cloud sequence and a bitstream of the point cloud sequence, a prediction mode for a region of the current frame, the prediction mode including at least a first mode based on both intra prediction and inter prediction; determining a prediction of the region based on the prediction mode; and performing the conversion based on the prediction.
  2. The method of claim 1, wherein in the first mode, the prediction of the region comprises a weighted average of the intra prediction and the inter prediction of the region.
  3. The method of claim 1 or 2, wherein the prediction mode is indicated in the bitstream.
  4. The method of any one of claims 1 to 3, wherein the prediction mode further comprises at least one of: a no-prediction mode, an intra prediction mode, or an inter prediction mode.
  5. The method of any one of claims 1 to 4, wherein the region comprises at least one of a node of the current frame or a layer of the current frame.
  6. The method of any one of claims 1 to 5, wherein the region comprises a node of the current frame, and the prediction mode is determined based on at least one qualifying condition, wherein the at least one qualifying condition is based on at least one of: a layer depth of the node, a geometric position of the node, attribute information of the node, or neighbor information of the node.
  7. The method of any one of claims 1 to 6, wherein the region comprises a node of the current frame, and the prediction mode is determined based on rate-distortion optimization.
  8. The method of claim 7, wherein at least one of a bit rate or a distortion for the rate-distortion optimization is estimated.
  9. The method of claim 7, wherein at least one of a bit rate or a distortion for the rate-distortion optimization is determined based on at least one reconstruction value.
  10. The method of any one of claims 1 to 9, wherein the region comprises a node of the current frame, and an indication associated with the node indicates the prediction mode to be applied to the node.
  11. The method of claim 10, wherein the indication is included in the bitstream.
  12. The method of claim 10 or 11, wherein the indication is coded with one of a fixed-length code, a unary code, or a truncated unary code, or the indication is predictively coded.
  13. The method of any one of claims 1 to 12, wherein the prediction mode comprises a combined mode of a first prediction mode and a second prediction mode, wherein if a first prediction value of the region based on the first prediction mode is not zero, the first prediction value is determined as the prediction value of the region, or wherein if the first prediction value of the region based on the first prediction mode is zero, a second prediction value of the region based on the second prediction mode is determined as the prediction value of the region.
  14. The method of any one of claims 1 to 13, wherein the region comprises a plurality of nodes, and the determined prediction mode is applied to the plurality of nodes.
  15. The method of any one of claims 1 to 14, wherein the prediction mode is determined based on at least one qualifying condition, wherein the at least one qualifying condition is based on at least one of a layer depth of the region, a geometric position of the region, attribute information of the region, or neighbor information of a node in the region.
  16. The method of claim 15, wherein at least one indicator indicates at least one allowed layer depth or at least one allowed region in the at least one qualifying condition.
  17. The method of claim 16, wherein the at least one indicator is indicated to a decoder.
  18. The method of claim 16 or 17, wherein the at least one indicator is coded using one of a fixed-length code, a unary code, or a truncated unary code.
  19. The method of claim 16 or 17, wherein the at least one indicator is predictively coded.
  20. The method of any one of claims 1 to 19, wherein the prediction mode is determined based on rate-distortion optimization.
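Claims 2 and 13 describe two ways of merging an intra prediction and an inter prediction for a region: a weighted average, and a zero-fallback combination. The sketch below is a minimal illustration of both ideas, not the patent's implementation; the function names, the default weights, and the per-point array representation are all assumptions for illustration.

```python
import numpy as np

def combined_prediction(intra_pred, inter_pred, w_intra=0.5, w_inter=0.5):
    """Weighted average of intra and inter predictions (cf. claim 2).

    The weights are illustrative; the patent does not fix their values.
    """
    intra = np.asarray(intra_pred, dtype=float)
    inter = np.asarray(inter_pred, dtype=float)
    return w_intra * intra + w_inter * inter

def fallback_prediction(first_pred, second_pred):
    """Zero-fallback combined mode (cf. claim 13).

    Where the first prediction value is nonzero it is used as-is;
    where it is zero, the second prediction value is used instead.
    """
    first = np.asarray(first_pred, dtype=float)
    second = np.asarray(second_pred, dtype=float)
    return np.where(first != 0, first, second)
```

For example, blending an intra prediction of 2.0 with an inter prediction of 4.0 at equal weights yields 3.0, while the fallback mode only substitutes the second prediction at positions where the first is exactly zero.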
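Claims 12 and 18 allow the mode indication or indicator to be coded with a truncated unary code. As a reminder of how that code works (this is the standard definition, not text from the patent): a value v is written as v one-bits followed by a terminating zero-bit, and the zero is omitted when v equals the known maximum, saving one bit for the largest value.

```python
def truncated_unary(value, max_value):
    """Truncated unary binarization of `value` given a known `max_value`.

    Standard definition: `value` one-bits, then a terminating zero-bit
    unless `value == max_value` (the decoder can infer the end).
    """
    if not 0 <= value <= max_value:
        raise ValueError("value out of range")
    bits = [1] * value
    if value < max_value:
        bits.append(0)
    return bits
```

For instance, with a maximum of 5, the value 2 is coded as 110, while the value 5 is coded as 11111 with no terminator.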

Description

Method, device and medium for point cloud encoding and decoding

Technical Field

Embodiments of the present disclosure relate generally to video codec technology and, more particularly, to prediction mode determination.

Background

A point cloud is a collection of individual data points in three-dimensional (3D) space, where each point has a set of coordinates on the X, Y, and Z axes. A point cloud may thus be used to represent the physical content of a three-dimensional space. For a variety of immersive applications, from augmented reality to autonomous driving, point clouds have proven to be a promising way to represent 3D visual data. Point cloud codec standards have evolved mainly through the work of the well-known MPEG organization. MPEG, short for the Moving Picture Experts Group, is one of the main standardization groups dealing with multimedia. In 2017, the MPEG 3D Graphics Coding group (3DG) published a call for proposals (CFP) to begin developing point cloud codec standards. The final standard will encompass two categories of solutions: video-based point cloud compression (V-PCC or VPCC), which is applicable to point sets where the distribution of points is relatively uniform, and geometry-based point cloud compression (G-PCC or GPCC), which is suitable for sparser distributions. However, it is generally desirable to further improve the codec efficiency of conventional point cloud codec techniques.

Disclosure of Invention

Embodiments of the present disclosure provide a solution for point cloud codec. In a first aspect, a method for point cloud codec is presented.
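Claims 7, 8, and 20 determine the prediction mode by rate-distortion optimization, i.e., by choosing the mode minimizing a Lagrangian cost D + λ·R, where R may be an estimated bit rate and D a distortion computed from reconstruction values. The following is a generic sketch of such a selection loop under assumed inputs (the candidate dictionary, the λ value, and the cost shape are illustrative, not specified by the patent):

```python
def select_prediction_mode(candidates, lam=0.1):
    """Pick the prediction mode minimizing the RD cost D + lam * R.

    `candidates` maps a mode name to a (rate_bits, distortion) pair; both
    quantities may be estimates or may be computed from reconstructed
    values, as in claims 8 and 9. `lam` is the Lagrange multiplier.
    """
    best_mode, best_cost = None, float("inf")
    for mode, (rate, distortion) in candidates.items():
        cost = distortion + lam * rate
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode
```

With illustrative costs such as intra (10 bits, distortion 5.0), inter (4 bits, 6.0), and the combined mode (12 bits, 2.0), the combined mode wins at λ = 0.1 because its lower distortion outweighs its higher rate.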
The method includes determining, for a conversion between a current frame of the point cloud sequence and a bitstream of the point cloud sequence, a prediction mode for a region of the current frame, the prediction mode including at least a first mode based on both intra prediction and inter prediction; determining a prediction of the region based on the prediction mode; and performing the conversion based on the prediction.

In a second aspect, another method for point cloud codec is presented. The method includes determining, for a conversion between a current frame of the point cloud sequence and a bitstream of the point cloud sequence, that Region Adaptive Hierarchical Transform (RAHT) attribute coding is enabled for the current frame, and performing the conversion based on the RAHT attribute coding, wherein at least one parameter of at least one quantization matrix for RAHT coefficients is indicated in the bitstream if RAHT attribute coding is enabled for the point cloud sequence, and wherein the at least one parameter of the at least one quantization matrix for RAHT coefficients is not included in the bitstream if RAHT attribute coding is disabled for the point cloud sequence.

In a third aspect, an apparatus for processing a point cloud sequence is presented. The apparatus includes a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to perform a method according to the first or second aspect of the present disclosure.

In a fourth aspect, a non-transitory computer-readable storage medium is presented. The storage medium stores instructions that cause a processor to perform a method according to the first or second aspect of the present disclosure.

In a fifth aspect, a non-transitory computer-readable recording medium is presented.
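The second aspect conditions the signaling of quantization-matrix parameters on whether RAHT attribute coding is enabled: the parameters appear in the bitstream only when the enabling flag is set. A minimal sketch of that conditional write, assuming a simple list-of-symbols bitstream model and a one-bit enable flag (both assumptions for illustration, not the actual syntax):

```python
def write_raht_attribute_params(bitstream, raht_enabled, quant_matrix_params):
    """Conditionally signal RAHT quantization-matrix parameters.

    Hypothetical syntax sketch: a 1-bit enable flag, followed by the
    quantization-matrix parameters only when RAHT coding is enabled.
    """
    bitstream.append(1 if raht_enabled else 0)  # enable flag
    if raht_enabled:
        bitstream.extend(quant_matrix_params)   # parameters only when enabled
    return bitstream
```

A conforming decoder would mirror this: read the flag first, and parse the parameters only when the flag indicates RAHT coding is enabled, so no bits are wasted when it is disabled.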
The non-transitory computer-readable recording medium stores a bitstream of a point cloud sequence generated by a method performed by a point cloud processing apparatus. The method includes determining a prediction mode for a region of a current frame of the point cloud sequence, the prediction mode including at least a first mode based on both intra prediction and inter prediction; determining a prediction of the region based on the prediction mode; and generating the bitstream based on the prediction.

In a sixth aspect, a method for storing a bitstream of a point cloud sequence is presented. The method includes determining a prediction mode for a region of a current frame of the point cloud sequence, the prediction mode including at least a first mode based on both intra prediction and inter prediction; determining a prediction of the region based on the prediction mode; generating a bitstream based on the prediction; and storing the bitstream in a non-transitory computer-readable recording medium.

In a seventh aspect, another non-transitory computer-readable recording medium is presented. The non-transitory computer-readable recording medium stores a bitstream of a point cloud sequence generated by a method performed by a point cloud processing apparatus. The method includes determining that a Region Adaptive Hierarchical Transform (RAHT) attribute codec is enabled for a current frame of a point clo