EP-4736423-A1 - PREDICTING CHROMA DATA USING CROSS-COMPONENT PREDICTION FOR VIDEO CODING
Abstract
An example device includes a memory configured to store video data; and a processing system comprising one or more processors implemented in circuitry and configured to: construct a merge candidate list for a current block of video data, wherein the processing system is configured to add a first merge candidate that was predicted using a first convolutional cross component model (CCCM) to the merge candidate list and add a second merge candidate that was predicted using a second CCCM to the merge candidate list, the first CCCM being different than the second CCCM; decode a merge index value for the current block of video data, the merge index value indicating the first merge candidate; in response to the merge index value indicating the first merge candidate, form a prediction block for the current block using the first CCCM; and decode the current block using the prediction block.
Inventors
- HUANG, Han
- SEREGIN, VADIM
- CHEN, CHUN-CHI
- KARCZEWICZ, MARTA
Assignees
- QUALCOMM Incorporated
Dates
- Publication Date: 2026-05-06
- Application Date: 2024-06-28
Claims (20)
Qualcomm Ref. No. 2307046WO / 1616-357WO01
WHAT IS CLAIMED IS:
- 1. A method of decoding video data, the method comprising: constructing a merge candidate list for a current block of video data, including adding a first merge candidate that was predicted using a first convolutional cross component model (CCCM) to the merge candidate list and adding a second merge candidate that was predicted using a second CCCM to the merge candidate list, the first CCCM being different than the second CCCM; decoding a merge index value for the current block of video data, the merge index value indicating the first merge candidate; in response to the merge index value indicating the first merge candidate, forming a prediction block for the current block using the first CCCM; and decoding the current block using the prediction block.
- 2. The method of claim 1, further comprising storing parameters for the first CCCM associated with the first merge candidate.
- 3. The method of claim 2, wherein forming the prediction block using the first CCCM comprises forming the prediction block using the parameters for the first CCCM.
- 4. The method of claim 1, wherein the first CCCM comprises one of cross-component linear model (CCLM), single model CCCM, multi-model CCCM, gradient and location based CCCM (GL-CCCM), block vector guided CCCM (BVG-CCCM), or enhanced BVG-CCCM (EBVG-CCCM).
- 5. The method of claim 4, wherein the current block comprises a current block of chrominance (chroma) data, wherein the first CCCM comprises EBVG-CCCM, and wherein forming the prediction block comprises: determining a block vector referring to a reference block; determining neighboring reference chrominance (chroma) data to the current block and neighboring reference luminance (luma) data to a collocated luma block; calculating filter coefficients to be used to filter the current block from the reference block, the neighboring reference chroma data, and the neighboring reference luma data; and filtering one or more samples of the prediction block using the filter coefficients.
- 6. The method of claim 5, wherein the block vector comprises a chroma block vector referring to a reference chroma block, the method further comprising determining a luma block vector referring to a reference luma block, wherein calculating the filter coefficients comprises calculating the filter coefficients using the reference luma block.
- 7. The method of claim 5, wherein determining the block vector comprises determining the block vector according to direct block vector (DBV) mode.
- 8. The method of claim 1, further comprising encoding the current block prior to decoding the current block.
- 9. A device for decoding video data, the device comprising: a memory configured to store video data; and a processing system comprising one or more processors implemented in circuitry, the processing system being configured to: construct a merge candidate list for a current block of video data, wherein the processing system is configured to add a first merge candidate that was predicted using a first convolutional cross component model (CCCM) to the merge candidate list and add a second merge candidate that was predicted using a second CCCM to the merge candidate list, the first CCCM being different than the second CCCM; decode a merge index value for the current block of video data, the merge index value indicating the first merge candidate; in response to the merge index value indicating the first merge candidate, form a prediction block for the current block using the first CCCM; and decode the current block using the prediction block.
- 10. The device of claim 9, wherein the processing system is further configured to store parameters for the first CCCM associated with the first merge candidate.
- 11. The device of claim 10, wherein to form the prediction block using the first CCCM, the processing system is configured to form the prediction block using the parameters for the first CCCM.
- 12. The device of claim 9, wherein the first CCCM comprises one of cross-component linear model (CCLM), single model CCCM, multi-model CCCM, gradient and location based CCCM (GL-CCCM), block vector guided CCCM (BVG-CCCM), or enhanced BVG-CCCM (EBVG-CCCM).
- 13. The device of claim 12, wherein the current block comprises a current block of chrominance (chroma) data, and wherein when the first CCCM comprises EBVG-CCCM, to form the prediction block, the processing system is configured to: determine a block vector referring to a reference block, neighboring reference chroma data to the current block, and neighboring reference luminance (luma) data to a collocated luma block; calculate filter coefficients to be used to filter the current block from the reference block, the neighboring reference chroma data, and the neighboring reference luma data; and filter one or more samples of the prediction block using the filter coefficients.
- 14. The device of claim 13, wherein the block vector comprises a chroma block vector referring to a reference chroma block, and wherein the processing system is further configured to determine a luma block vector referring to a reference luma block, wherein to calculate the filter coefficients, the processing system is configured to calculate the filter coefficients using the reference luma block.
- 15. The device of claim 13, wherein to determine the block vector, the processing system is configured to determine the block vector according to direct block vector (DBV) mode.
- 16. The device of claim 9, wherein the processing system is further configured to encode the current block prior to decoding the current block.
- 17. A device for decoding video data, the device comprising: means for constructing a merge candidate list for a current block of video data, including means for adding a first merge candidate that was predicted using a first convolutional cross component model (CCCM) to the merge candidate list and means for adding a second merge candidate that was predicted using a second CCCM to the merge candidate list, the first CCCM being different than the second CCCM; means for decoding a merge index value for the current block of video data, the merge index value indicating the first merge candidate; means for forming a prediction block for the current block using the first CCCM in response to the merge index value indicating the first merge candidate; and means for decoding the current block using the prediction block.
- 18. The device of claim 17, further comprising means for storing parameters for the first CCCM associated with the first merge candidate.
- 19. The device of claim 18, wherein the means for forming the prediction block using the first CCCM comprises means for forming the prediction block using the parameters for the first CCCM.
- 20. The device of claim 17, wherein the first CCCM comprises one of cross-component linear model (CCLM), single model CCCM, multi-model CCCM, gradient and location based CCCM (GL-CCCM), block vector guided CCCM (BVG-CCCM), or enhanced BVG-CCCM (EBVG-CCCM).
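The claim set above describes two interlocking ideas: a merge candidate list whose candidates each carry the CCCM parameters with which they were predicted, and inheritance of the indicated candidate's model so the decoder need not re-derive filter coefficients. The following Python sketch is an illustration only, not the claimed implementation: all names, the duplicate-pruning rule, and the list size are assumptions, and the 7-tap filter (center luma sample, four neighbors, a nonlinear term, and a bias) follows only the general shape of convolutional cross-component filters. Coefficient derivation (e.g., least-squares fitting over reference samples, as in claim 5) is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

import numpy as np


@dataclass
class CCCMParams:
    """Coefficients of a hypothetical 7-tap convolutional cross-component filter."""
    coeffs: np.ndarray  # shape (7,): center, north, south, east, west, nonlinear, bias

    def predict(self, luma: np.ndarray, x: int, y: int, bit_depth: int = 10) -> float:
        """Predict one chroma sample from co-located (downsampled) luma samples."""
        c = float(luma[y, x])                                  # center
        n, s = float(luma[y - 1, x]), float(luma[y + 1, x])    # vertical neighbors
        w, e = float(luma[y, x - 1]), float(luma[y, x + 1])    # horizontal neighbors
        mid = 1 << (bit_depth - 1)
        p = (c * c + mid) / (1 << bit_depth)                   # nonlinear term
        taps = np.array([c, n, s, e, w, p, float(mid)])        # last tap: bias
        return float(self.coeffs @ taps)


@dataclass
class MergeCandidate:
    position: Tuple[int, int]      # location of the neighboring block (illustrative)
    cccm: Optional[CCCMParams]     # stored model parameters, if the neighbor was CCCM-coded


def build_merge_list(neighbors: List[MergeCandidate], max_size: int = 6) -> List[MergeCandidate]:
    """Add CCCM-coded neighbors to the list; keep only candidates with distinct models."""
    merge_list: List[MergeCandidate] = []
    for cand in neighbors:
        if cand.cccm is None:
            continue
        if any(np.array_equal(cand.cccm.coeffs, m.cccm.coeffs) for m in merge_list):
            continue  # assumed pruning rule: skip duplicate models
        merge_list.append(cand)
        if len(merge_list) == max_size:
            break
    return merge_list


def form_prediction_block(merge_list: List[MergeCandidate], merge_index: int,
                          luma_block: np.ndarray) -> np.ndarray:
    """Inherit the indicated candidate's CCCM (no re-derivation) and filter the block."""
    inherited = merge_list[merge_index].cccm
    h, w = luma_block.shape
    pred = np.zeros((h - 2, w - 2))     # interior only, so all filter taps exist
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            pred[y - 1, x - 1] = inherited.predict(luma_block, x, y)
    return pred
```

As a design note, storing the parameters alongside each candidate (claims 2 and 10) is what makes inheritance cheap: selecting `merge_index` amounts to a table lookup rather than a least-squares solve.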
Description
PREDICTING CHROMA DATA USING CROSS-COMPONENT PREDICTION FOR VIDEO CODING

[0001] This application claims priority to U.S. Patent Application No. 18/757,018, filed June 27, 2024 and U.S. Provisional Application No. 63/511,386, filed June 30, 2023, the entire contents of each of which are hereby incorporated by reference herein. U.S. Patent Application No. 18/757,018, filed June 27, 2024 claims the benefit of U.S. Provisional Application No. 63/511,386, filed June 30, 2023.

TECHNICAL FIELD

[0002] This disclosure relates to video coding, including video encoding and video decoding.

BACKGROUND

[0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smart phones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), ITU-T H.266/Versatile Video Coding (VVC), and extensions of such standards, as well as proprietary video codecs/formats such as AOMedia Video 1 (AV1) developed by the Alliance for Open Media. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video coding techniques.

[0004] Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences.
For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

SUMMARY

[0005] In general, this disclosure describes techniques related to prediction in video coding. In particular, video data may include luminance (luma) data and chrominance (chroma) data. Luma data generally represents brightness values for a block or other region of a picture of video data. Chroma data generally represents color values for a corresponding luma block. In some cases, chroma data may be coded using corresponding luma data. Per the techniques of this disclosure, a chrominance block may be predicted using a convolutional cross-component model (CCCM). Prediction information for the chrominance block may be coded using a merge candidate index. That is, a merge candidate list may be constructed, including merge candidates, e.g., spatial neighboring blocks to the chrominance block, which may be immediately adjacent to the chrominance block or separated by one or more other blocks. Per these techniques, the neighboring blocks may be predicted according to different models for CCCM, e.g., having different model parameters.
Rather than explicitly coding the parameters in the bitstream, the model or model parameters may be inherited from the merge candidate for the current chrominance block. In this manner, the bitrate for the bitstream may be reduced, because the parameters for the model do not need to be explicitly signaled, but instead, may be inherited. Additionally or alternatively, the video coder may avoid the necessity of recalculating the parameters, which may improve processing efficiency and reduce processing cycles.

[0006] In one example, a method of decoding video data includes: constructing a merge candidate list for a current block of video data, including adding a first merge candidate that was predicted using a first convolutional cross component model (CCCM) to the merge candidate list and adding a second merge candidate that was predicted using a second CCCM to the merge candidate list, the first CCCM being different than the second CCCM; decoding a merge index value for the current block of video data, the merge index value indicating the first merge candidate; in response to the merge index value indicating the first merge candidate, forming a prediction block for the