US-20260129227-A1 - METHOD, APPARATUS, AND MEDIUM FOR VIDEO PROCESSING
Abstract
Embodiments of the present disclosure provide a solution for video processing. A method for video processing is proposed. The method comprises: performing a conversion between a current point cloud (PC) sample of a point cloud sequence and a bitstream of the point cloud sequence, wherein the bitstream comprises a first indication indicating at least one of the following: a first set of attribute prediction schemes used for one or more nodes in the current PC sample that are at a first level lower than a sequence level, or a second set of attribute prediction schemes available for one or more nodes in the current PC sample that are at the first level.
Inventors
- Yingzhan XU
- Kai Zhang
- Li Zhang
Assignees
- Douyin Vision Co., Ltd.
- BYTEDANCE INC.
Dates
- Publication Date: 2026-05-07
- Application Date: 2026-01-05
- Priority Date: 2023-07-06
Claims (20)
- 1 . A method for point cloud coding, comprising: performing a conversion between a current point cloud (PC) sample of a point cloud sequence and a bitstream of the point cloud sequence, wherein the bitstream comprises a first indication indicating at least one of the following: a first set of attribute prediction schemes used for one or more nodes in the current PC sample that are at a first level lower than a sequence level, or a second set of attribute prediction schemes available for one or more nodes in the current PC sample that are at the first level.
- 2 . The method of claim 1 , wherein an attribute prediction scheme is configured for determining a prediction associated with attribute information of a node, wherein the prediction associated with the attribute information comprises one of the following: a prediction of a first coefficient for the attribute information, the first coefficient being a result of performing a region-adaptive hierarchical transform (RAHT) on the attribute information, or a prediction of the attribute information, wherein the first coefficient is an alternating current (AC) coefficient or a direct current (DC) coefficient.
- 3 . The method of claim 1 , wherein the first level is one of the following: a frame level, a depth layer level, a region level, or a node level.
- 4 . The method of claim 1 , wherein the bitstream further comprises a second indication indicating whether a first attribute prediction scheme is available for nodes at the sequence level, and wherein in the first attribute prediction scheme, a prediction associated with attribute information of a node is determined based on a weighted average of a first prediction associated with the attribute information that is determined based on an inter prediction scheme and a second prediction associated with the attribute information that is determined based on an intra prediction scheme.
- 5 . The method of claim 1 , wherein the second set of attribute prediction schemes comprises at least one of the following: an inter prediction scheme associated with attribute information, an intra prediction scheme associated with attribute information, or a first attribute prediction scheme configured for determining a prediction associated with attribute information of a node based on a weighted average of a first prediction associated with the attribute information that is determined based on the inter prediction scheme and a second prediction associated with the attribute information that is determined based on the intra prediction scheme, wherein a target prediction associated with attribute information of a node is determined based on predictions that are determined based on the second set of attribute prediction schemes.
- 6 . The method of claim 5 , wherein if a first prediction associated with the attribute information that is determined based on the inter prediction scheme is not zero, the target prediction is equal to the first prediction, or if the first prediction is zero, the target prediction is equal to a second prediction associated with the attribute information that is determined based on the intra prediction scheme.
- 7 . The method of claim 5 , wherein if a first prediction associated with the attribute information is determined based on the inter prediction scheme, the target prediction is equal to the first prediction, or if the first prediction is not determined, the target prediction is equal to a second prediction associated with the attribute information that is determined based on the intra prediction scheme.
- 8 . The method of claim 5 , wherein if a first prediction associated with the attribute information that is determined based on the inter prediction scheme is zero, the target prediction is equal to a second prediction associated with the attribute information that is determined based on the intra prediction scheme, or if the first prediction is not zero, the target prediction is equal to a weighted average of the first prediction and the second prediction.
- 9 . The method of claim 5 , wherein if a first prediction associated with the attribute information that is determined based on the inter prediction scheme is not zero and a second prediction associated with the attribute information that is determined based on the intra prediction scheme is not zero, the target prediction is equal to a weighted average of the first prediction and the second prediction, or if the first prediction is zero and the second prediction is not zero, the target prediction is equal to the second prediction, or if the first prediction is not zero and the second prediction is zero, the target prediction is equal to the first prediction.
- 10 . The method of claim 1 , wherein the bitstream further comprises an indication indicating a set of attribute prediction schemes available for all nodes in the point cloud sequence, and/or wherein the bitstream further comprises an indication indicating a set of attribute prediction schemes available for all nodes in a region of a PC sample in the point cloud sequence, and/or wherein the bitstream further comprises an indication indicating a set of attribute prediction schemes available for all nodes at a depth layer of a PC sample in the point cloud sequence.
- 11 . The method of claim 1 , wherein the indication is fixed at an encoder and a decoder, or wherein the indication is determined at an encoder and a decoder, or wherein the indication is signaled to a decoder, or wherein the indication is determined at an encoder, wherein the indication is determined based on motion information for a PC sample, and wherein if the motion information is smaller than a threshold, a first attribute prediction scheme is not used for coding the PC sample, and the first attribute prediction scheme is configured for determining a prediction associated with attribute information of a node in the PC sample based on a weighted average of a first prediction associated with the attribute information that is determined based on an inter prediction scheme and a second prediction associated with the attribute information that is determined based on an intra prediction scheme.
- 12 . The method of claim 1 , wherein information regarding whether to disable an intra prediction scheme for an AC coefficient and/or a DC coefficient associated with attribute information of a current node in the current PC sample is determined on the fly, wherein if an inter prediction scheme is applied and a result of the intra prediction scheme is not used to determine a target prediction of the AC coefficient and/or DC coefficient, the intra prediction scheme is disabled, or wherein if a result of the intra prediction scheme is not used to determine a target prediction of the AC coefficient and/or DC coefficient, the intra prediction scheme is disabled, and/or wherein if the intra prediction scheme is disabled, the number of neighbor nodes of the current node is set to be a specific value.
- 13 . The method of claim 1 , wherein if a spatial location of a first node in a reference PC sample of the current PC sample is the same as that of a current node in the current PC sample and the first node is not empty, the first node is a reference node of the current node, and if at least one of the following conditions is met, the first node is not a reference node of the current node: the spatial location of the first node is different from that of the current node, or the first node is empty, or wherein if a spatial location of a first node in a reference PC sample of the current PC sample is the same as that of a current node in the current PC sample, the first node is a reference node of the current node, and if the spatial location of the first node is different from that of the current node, the first node is not a reference node of the current node, wherein a spatial location of a node is represented by a Morton code of the node or a shifted Morton code of the node.
- 14 . The method of claim 1 , wherein the second set of attribute prediction schemes comprises an attribute prediction scheme configured for determining a prediction associated with attribute information of a node based on a weighted sum or a non-linear function of a first prediction associated with the attribute information that is determined based on the inter prediction scheme and a second prediction associated with the attribute information that is determined based on the intra prediction scheme, wherein at least one weight for determining the weighted sum or at least one weight for determining the non-linear function is fixed at an encoder and a decoder, or wherein at least one weight for determining the weighted sum or at least one weight for determining the non-linear function is determined at an encoder and a decoder, or wherein at least one weight for determining the weighted sum or at least one weight for determining the non-linear function is indicated in the bitstream, and/or wherein at least one weight for determining the weighted sum or at least one weight for determining the non-linear function is different for different depth layers, or wherein at least one weight for determining the weighted sum or at least one weight for determining the non-linear function is different for different regions.
- 15 . The method of claim 1 , wherein a prediction of an AC coefficient for attribute information of a current node in the current PC sample is obtained by performing an RAHT transform on reference attribute information of a reference node of the current node, the reference attribute information of the reference node is determined based on reconstructed attribute information and reconstructed geometry information of a reference PC sample comprising the reference node, and the reference attribute information of the reference node is represented by one of the following: reference attribute information of each sub-node of the reference node, or an average of reference attribute information of sub-nodes of the reference node, or wherein a prediction of a DC coefficient for attribute information of a current node in the current PC sample is obtained by performing an RAHT transform on reference attribute information of a reference node of the current node, the reference attribute information of the reference node is determined based on reconstructed attribute information and reconstructed geometry information of a reference PC sample comprising the reference node, and the reference attribute information of the reference node is represented by one of the following: reference attribute information of each sub-node of the reference node, or an average of reference attribute information of sub-nodes of the reference node.
- 16 . The method of claim 1 , wherein a PC sample is one of the following: a frame, a slice, a tile, or a unit containing one or more nodes or points, and/or wherein a node in a PC sample is an element of a tree structure for spatial partition of the PC sample.
- 17 . The method of claim 1 , wherein the conversion includes encoding the current PC sample into the bitstream, or wherein the conversion includes decoding the current PC sample from the bitstream.
- 18 . An apparatus for video processing comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions, upon execution by the processor, cause the processor to: perform a conversion between a current point cloud (PC) sample of a point cloud sequence and a bitstream of the point cloud sequence, wherein the bitstream comprises a first indication indicating at least one of the following: a first set of attribute prediction schemes used for one or more nodes in the current PC sample that are at a first level lower than a sequence level, or a second set of attribute prediction schemes available for one or more nodes in the current PC sample that are at the first level.
- 19 . A non-transitory computer-readable storage medium storing instructions that cause a processor to perform operations comprising: performing a conversion between a current point cloud (PC) sample of a point cloud sequence and a bitstream of the point cloud sequence, wherein the bitstream comprises a first indication indicating at least one of the following: a first set of attribute prediction schemes used for one or more nodes in the current PC sample that are at a first level lower than a sequence level, or a second set of attribute prediction schemes available for one or more nodes in the current PC sample that are at the first level.
- 20 . A non-transitory computer-readable recording medium storing a bitstream of a point cloud sequence which is generated by a method performed by an apparatus for video processing, wherein the method comprises: generating the bitstream of the point cloud sequence from a current point cloud (PC) sample of the point cloud sequence, wherein the bitstream comprises a first indication indicating at least one of the following: a first set of attribute prediction schemes used for one or more nodes in the current PC sample that are at a first level lower than a sequence level, or a second set of attribute prediction schemes available for one or more nodes in the current PC sample that are at the first level.
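The selection rules recited in claims 6 to 9 can be read as small decision functions. The following Python sketch is illustrative only and is not part of the claims: the function names, the use of a single scalar for each prediction, and the default weight `w_inter = 0.5` are assumptions, since the claims leave the weights open (see claim 14). Claim 9 does not state a result when both predictions are zero; the sketch assumes zero in that case.

```python
from typing import Optional


def weighted_average(inter: float, intra: float, w_inter: float = 0.5) -> float:
    """Blend the inter and intra predictions; w_inter is a hypothetical weight."""
    return w_inter * inter + (1.0 - w_inter) * intra


def target_claim6(inter: float, intra: float) -> float:
    """Claim 6: use the inter prediction unless it is zero, else use intra."""
    return inter if inter != 0 else intra


def target_claim7(inter: Optional[float], intra: float) -> float:
    """Claim 7: use the inter prediction if one was determined, else use intra.

    None models "the first prediction is not determined" (an assumption).
    """
    return inter if inter is not None else intra


def target_claim8(inter: float, intra: float, w_inter: float = 0.5) -> float:
    """Claim 8: use intra if the inter prediction is zero, else blend both."""
    if inter == 0:
        return intra
    return weighted_average(inter, intra, w_inter)


def target_claim9(inter: float, intra: float, w_inter: float = 0.5) -> float:
    """Claim 9: blend when both are non-zero, else use the non-zero one."""
    if inter != 0 and intra != 0:
        return weighted_average(inter, intra, w_inter)
    if inter == 0 and intra != 0:
        return intra
    if inter != 0 and intra == 0:
        return inter
    # Both predictions are zero: not covered by claim 9; assumed to be zero.
    return 0.0
```

For example, with an inter prediction of 4.0 and an intra prediction of 2.0, the claim 6 rule returns the inter prediction unchanged, while the claim 8 rule with equal weights returns their average.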
Description
CROSS REFERENCE TO RELATED APPLICATIONS
This application is a continuation of International Application No. PCT/CN2024/104075, filed on Jul. 5, 2024, which claims the benefit of International Application No. PCT/CN2023/106204, filed on Jul. 6, 2023. The entire contents of these applications are hereby incorporated by reference in their entireties.
FIELDS
Embodiments of the present disclosure relate generally to video processing techniques, and more particularly, to attribute prediction based on region-adaptive hierarchical transform (RAHT).
BACKGROUND
A point cloud is a collection of individual data points in a three-dimensional (3D) space, with each point having a coordinate on each of the X, Y, and Z axes. Thus, a point cloud may be used to represent the physical content of the three-dimensional space. Point clouds have been shown to be a promising way to represent 3D visual data for a wide range of immersive applications, from augmented reality to autonomous cars.
Point cloud coding standards have evolved primarily through development by the well-known MPEG organization. MPEG, short for Moving Picture Experts Group, is one of the main standardization groups dealing with multimedia. In 2017, the MPEG 3D Graphics Coding group (3DG) published a call for proposals (CFP) document to start developing a point cloud coding standard. The final standard will consist of two classes of solutions. Video-based Point Cloud Compression (V-PCC or VPCC) is appropriate for point sets with a relatively uniform distribution of points. Geometry-based Point Cloud Compression (G-PCC or GPCC) is appropriate for more sparse distributions. However, the coding efficiency and coding quality of conventional point cloud coding techniques are generally expected to be further improved.
SUMMARY
Embodiments of the present disclosure provide a solution for video processing. In a first aspect, a method for video processing is proposed.
The method comprises: performing a conversion between a current point cloud (PC) sample of a point cloud sequence and a bitstream of the point cloud sequence, wherein the bitstream comprises a first indication indicating at least one of the following: a first set of attribute prediction schemes used for one or more nodes in the current PC sample that are at a first level lower than a sequence level, or a second set of attribute prediction schemes available for one or more nodes in the current PC sample that are at the first level.
Based on the method in accordance with the first aspect of the present disclosure, the bitstream comprises an indication indicating one or more attribute prediction schemes that are used and/or available for one or more nodes in the current PC sample that are at a first level lower than a sequence level. Compared with the conventional solution where information regarding the usage of attribute prediction scheme(s) is signaled at the sequence level, the proposed method can advantageously signal such information at a lower level and enable refined control of the usage of attribute prediction scheme(s). Thereby, the coding efficiency and coding quality can be improved.
In a second aspect, an apparatus for video processing is proposed. The apparatus comprises a processor and a non-transitory memory with instructions thereon. The instructions, upon execution by the processor, cause the processor to perform a method in accordance with the first aspect of the present disclosure.
In a third aspect, a non-transitory computer-readable storage medium is proposed. The non-transitory computer-readable storage medium stores instructions that cause a processor to perform a method in accordance with the first aspect of the present disclosure.
In a fourth aspect, another non-transitory computer-readable recording medium is proposed.
The non-transitory computer-readable recording medium stores a bitstream of a point cloud sequence which is generated by a method performed by an apparatus for video processing. The method comprises: performing a conversion between a current point cloud (PC) sample of the point cloud sequence and the bitstream, wherein the bitstream comprises a first indication indicating at least one of the following: a first set of attribute prediction schemes used for one or more nodes in the current PC sample that are at a first level lower than a sequence level, or a second set of attribute prediction schemes available for one or more nodes in the current PC sample that are at the first level.
In a fifth aspect, a method for storing a bitstream of a point cloud sequence is proposed. The method comprises: performing a conversion between a current point cloud (PC) sample of the point cloud sequence and the bitstream; and storing the bitstream in a non-transitory computer-readable recording medium, wherein the bitstream comprises a first indication indicating at least one of the following: a first set of attribute prediction schemes used for one or more nodes in the current PC sample that are at a first level