EP-4736447-A1 - CROSS COMPONENT PREDICTION


Abstract

Methods for encoding a video sequence into a bitstream and decoding a bitstream to output one or more pictures for a video stream. An exemplary method includes: receiving a video sequence; encoding one or more pictures of the video sequence; and generating a bitstream associated with the encoded pictures, wherein the encoding comprises: predicting chroma samples within a current block based on luma samples corresponding to the chroma samples by a plurality of cross-component residual models (CCRMs).

Inventors

LI, XINWEI
CHEN, JIE
YE, YAN
LIAO, RU-LING

Assignees

Alibaba (China) Co., Ltd.

Dates

Publication Date: 2026-05-06
Application Date: 2024-06-28

Claims (20)

1. A method for encoding a video sequence into a bitstream, comprising: receiving a video sequence; encoding one or more pictures of the video sequence; and generating a bitstream associated with the encoded pictures, wherein the encoding the one or more pictures of the video sequence comprises: predicting chroma samples within a current block based on luma samples corresponding to the chroma samples by a plurality of cross-component residual models (CCRMs).
2. The method according to claim 1, wherein the chroma samples are predicted based on the luma samples corresponding to the chroma samples by the plurality of CCRMs in response to a determination that the chroma samples are to be predicted with more than one CCRM.
3. The method according to claim 1 or 2, wherein predicting the chroma samples based on the luma samples corresponding to the chroma samples comprises: classifying the chroma samples into a plurality of classes, wherein the plurality of CCRMs corresponding to the plurality of classes are trained based on the chroma samples and corresponding luma samples, respectively; and generating, by the plurality of CCRMs, a predicted chroma value of a target chroma sample of the chroma samples based on a luma sample corresponding to the target chroma sample.
4. The method according to claim 3, wherein classifying the chroma samples into the plurality of classes comprises: classifying an objective chroma sample into one of the plurality of classes based on a predicted luma value of a luma sample corresponding to the objective chroma sample.
5. The method according to claim 4, wherein the objective chroma sample is classified into one of the plurality of classes based on a comparison between the predicted luma value of the luma sample and a threshold, the threshold being associated with predicted luma values or reconstructed luma values of at least a part of the luma samples within the current block.
6. The method according to any one of claims 3 to 5, wherein the encoding the one or more pictures of the video sequence further comprises: fusing the predicted chroma value of the target chroma sample with an original predicted chroma value of the target chroma sample to obtain a finalized predicted chroma value, the original predicted chroma value being inter-predicted with respect to a reference picture or generated by intra block copy (IBC).
7. The method according to claim 6, wherein the encoding the one or more pictures of the video sequence further comprises: generating a residual chroma value of the target chroma sample based on the finalized predicted chroma value.
8. The method according to claim 6, wherein the encoding the one or more pictures of the video sequence further comprises: filtering the finalized predicted chroma value by a low pass filter to obtain a filtered predicted chroma value of the target chroma sample.
9. The method according to any one of claims 3 to 5, wherein the encoding the one or more pictures of the video sequence further comprises: filtering the predicted chroma value by a low pass filter to obtain a filtered predicted chroma value of the target chroma sample; and fusing the filtered predicted chroma value of the target chroma sample with an original predicted chroma value of the target chroma sample to obtain a finalized predicted chroma value, the original predicted chroma value being inter-predicted with respect to a reference picture or generated by intra block copy (IBC).
10. A method for decoding a bitstream to output one or more pictures for a video stream, comprising: receiving a bitstream; and decoding, using coded information of the bitstream, one or more pictures, wherein the decoding, using the coded information of the bitstream, the one or more pictures comprises: predicting chroma samples within a current block based on luma samples corresponding to the chroma samples by a plurality of cross-component residual models (CCRMs).
11. The method according to claim 10, wherein the chroma samples are predicted based on the luma samples corresponding to the chroma samples by the plurality of CCRMs in response to a determination that the chroma samples are to be predicted with more than one CCRM.
12. The method according to claim 10 or 11, wherein predicting the chroma samples based on the luma samples corresponding to the chroma samples comprises: classifying the chroma samples into a plurality of classes, wherein the plurality of CCRMs corresponding to the plurality of classes are trained based on the chroma samples and corresponding luma samples, respectively; and generating, by the plurality of CCRMs, a predicted chroma value of a target chroma sample of the chroma samples based on a luma sample corresponding to the target chroma sample.
13. The method according to claim 12, wherein classifying the chroma samples into the plurality of classes comprises: classifying an objective chroma sample into one of the plurality of classes based on a predicted luma value of a luma sample corresponding to the objective chroma sample.
14. The method according to claim 13, wherein the objective chroma sample is classified into one of the plurality of classes based on a comparison between the predicted luma value of the luma sample and a threshold, the threshold being associated with predicted luma values or reconstructed luma values of at least a part of the luma samples within the current block.
15. The method according to any one of claims 12 to 14, wherein the decoding, using the coded information of the bitstream, the one or more pictures further comprises: fusing the predicted chroma value of the target chroma sample with an original predicted chroma value of the target chroma sample to obtain a finalized predicted chroma value, the original predicted chroma value being inter-predicted with respect to a reference picture or generated by intra block copy (IBC).
16. The method according to claim 15, wherein the decoding, using the coded information of the bitstream, the one or more pictures further comprises: receiving a residual chroma value of the target chroma sample; and generating a chroma value of the target chroma sample based on the residual chroma value and the finalized predicted chroma value.
17. The method according to claim 15, wherein the decoding, using the coded information of the bitstream, the one or more pictures further comprises: filtering the finalized predicted chroma value by a low pass filter to obtain a filtered predicted chroma value of the target chroma sample.
18. The method according to any one of claims 12 to 14, wherein the decoding, using the coded information of the bitstream, the one or more pictures further comprises: filtering the predicted chroma value by a low pass filter to obtain a filtered predicted chroma value of the target chroma sample; and fusing the filtered predicted chroma value of the target chroma sample with an original predicted chroma value of the target chroma sample to obtain a finalized predicted chroma value, the original predicted chroma value being inter-predicted with respect to a reference picture or generated by intra block copy (IBC).
19. An apparatus for encoding a video sequence into a bitstream, comprising: a receiving module, configured to receive a video sequence; an encoding module, configured to encode one or more pictures of the video sequence; and a generating module, configured to generate a bitstream associated with the encoded pictures, wherein the encoding module is configured to: predict chroma samples within a current block based on luma samples corresponding to the chroma samples by a plurality of cross-component residual models (CCRMs).
20. The apparatus according to claim 19, wherein the chroma samples are predicted based on the luma samples corresponding to the chroma samples by the plurality of CCRMs in response to a determination that the chroma samples are to be predicted with more than one CCRM.
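
Illustrative sketch (not part of the claims). The claims above recite classifying samples against a luma threshold, training one model per class, predicting chroma from luma, fusing the result with an original prediction, and low-pass filtering. The Python sketch below shows one way these steps could fit together under stated assumptions. The claims do not fix the model form, the threshold rule, the fusion weights, or the filter taps, so everything concrete here is hypothetical: each model is assumed to be a linear fit chroma ≈ a · luma + b, the threshold is assumed to be the mean luma value, fusion is assumed to be a weighted average, and the filter is assumed to be a [1, 2, 1] / 4 kernel. All function names are invented for illustration.

```python
def classify(luma_value, threshold):
    """Two-class split by comparing a luma value with a threshold."""
    return 0 if luma_value <= threshold else 1


def fit_linear(luma, chroma):
    """Least-squares fit chroma ~= a * luma + b (assumed model form)."""
    if not luma:
        return 0.0, 0.0
    mean_l = sum(luma) / len(luma)
    mean_c = sum(chroma) / len(chroma)
    var = sum((l - mean_l) ** 2 for l in luma)
    if var == 0:
        return 0.0, mean_c  # flat luma: fall back to a constant model
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(luma, chroma))
    a = cov / var
    return a, mean_c - a * mean_l


def train_models(luma, chroma, threshold):
    """Train one model per class from (luma, chroma) training pairs."""
    groups = {0: ([], []), 1: ([], [])}
    for l, c in zip(luma, chroma):
        ls, cs = groups[classify(l, threshold)]
        ls.append(l)
        cs.append(c)
    return {k: fit_linear(ls, cs) for k, (ls, cs) in groups.items()}


def predict_chroma(luma_value, models, threshold):
    """Pick the model for the sample's class and apply it."""
    a, b = models[classify(luma_value, threshold)]
    return a * luma_value + b


def fuse(model_pred, original_pred, weight=0.5):
    """Weighted average with the original (e.g. inter or IBC) prediction."""
    return weight * model_pred + (1.0 - weight) * original_pred


def low_pass(row):
    """[1, 2, 1] / 4 smoothing with edge replication (assumed kernel)."""
    n = len(row)
    return [(row[max(i - 1, 0)] + 2 * row[i] + row[min(i + 1, n - 1)]) / 4
            for i in range(n)]


# Hypothetical training pairs: two luma ranges with different slopes.
luma = [10, 20, 30, 110, 120, 130]
chroma = [15, 25, 35, 60, 65, 70]
threshold = sum(luma) / len(luma)           # assumed rule: mean luma (70.0)
models = train_models(luma, chroma, threshold)

dark = predict_chroma(20, models, threshold)     # uses the class-0 model
bright = predict_chroma(120, models, threshold)  # uses the class-1 model
final = fuse(dark, original_pred=29.0)           # equal-weight fusion
smoothed = low_pass([0, 4, 8])
```

With these made-up numbers, the class-0 model has slope 1 and the class-1 model has slope 0.5, so a single shared model would fit neither group well. That gap is the intuition behind using more than one model per block. On the decoder side, the reconstructed chroma value would then be the received residual added to the finalized prediction.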

Description

CROSS COMPONENT PREDICTION

CROSS-REFERENCE TO RELATED APPLICATIONS

The disclosure claims the benefit of priority to U.S. Provisional Application No. 63/511,659, filed on July 2, 2023, and claims the benefit of U.S. Patent Application No. 18/750,267, entitled “CROSS COMPONENT PREDICTION” and filed on June 21, 2024. Both applications are incorporated herein by reference in their entireties.

TECHNICAL FIELD

The present disclosure generally relates to video processing, and more particularly, to cross component prediction techniques used for predicting chroma samples based on collocated luma samples.

BACKGROUND

A video is a set of static pictures (or “frames”) capturing the visual information. To reduce the storage memory and the transmission bandwidth, a video can be compressed before storage or transmission and decompressed before display. The compression process is usually referred to as encoding and the decompression process is usually referred to as decoding. There are various video coding formats which use standardized video coding technologies, most commonly based on prediction, transform, quantization, entropy coding and in-loop filtering. The video coding standards, such as the High Efficiency Video Coding (HEVC/H.265) standard, the Versatile Video Coding (VVC/H.266) standard, and the AVS standards, which specify the specific video coding formats, are developed by standardization organizations. With more and more advanced video coding technologies being adopted in the video standards, the coding efficiency of the new video coding standards gets higher and higher.

SUMMARY OF THE DISCLOSURE

Embodiments of the present disclosure provide methods and apparatuses for predicting chroma samples based on collocated luma samples.
In a first aspect of the present disclosure, there is provided a method for encoding a video sequence into a bitstream, including: receiving a video sequence; encoding one or more pictures of the video sequence; and generating a bitstream associated with the encoded pictures, wherein the encoding includes: predicting chroma samples within a current block based on luma samples corresponding to the chroma samples by a plurality of cross-component residual models (CCRMs).
In a second aspect of the present disclosure, there is provided a method for decoding a bitstream to output one or more pictures for a video stream, including: receiving a bitstream; and decoding, using coded information of the bitstream, one or more pictures, wherein the decoding includes: predicting chroma samples within a current block based on luma samples corresponding to the chroma samples by a plurality of cross-component residual models (CCRMs).
In a third aspect of the present disclosure, there is provided an apparatus for encoding a video sequence into a bitstream, including: a receiving module, configured to receive a video sequence; an encoding module, configured to encode one or more pictures of the video sequence; and a generating module, configured to generate a bitstream associated with the encoded pictures, wherein the encoding module is configured to: predict chroma samples within a current block based on luma samples corresponding to the chroma samples by a plurality of cross-component residual models (CCRMs).
In a fourth aspect of the present disclosure, there is provided an apparatus for decoding a bitstream to output one or more pictures for a video stream, including: a receiving module, configured to receive a bitstream; and a decoding module, configured to decode, using coded information of the bitstream, one or more pictures, wherein the decoding module is configured to: predict chroma samples within a current block based on luma samples corresponding to the chroma samples by a plurality of cross-component residual models (CCRMs).
In a fifth aspect of the present disclosure, there is provided an electronic device, including: one or more processors, and a computer-readable storage medium communicatively coupled to the one or more processors, where the computer-readable storage medium stores computer-readable instructions executable by the one or more processors that, when executed by the one or more processors, execute the method according to the first aspect or the second aspect.
In a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing a bitstream of a video. The bitstream, when encoded by an encoder, causes the encoder to perform the method according to the first aspect.
In a seventh aspect of the present disclosure, there is provided a non-transitory computer readable storage medium that stores a bitstream of a video. The bitstream, when decoded by a decoder, causes the decoder to perform the method according to the second aspect.
In an eighth aspect of the present disclosure, there is provided a computer program product, including: computer program instructions, and the computer program instructions enab