US-20260129228-A1 - METHOD, APPARATUS, AND MEDIUM FOR VIDEO PROCESSING
Abstract
Embodiments of the disclosure provide a solution for video processing. A method for video processing is proposed. The method includes: applying, for a conversion between a video unit of a video and a bitstream of the video unit, at least one of the following in a cross-component prediction (CCP) model: a set of luma samples, a set of additional luma samples, a set of reconstructed chroma samples, or a set of predicted chroma samples; determining a prediction or reconstruction of the video unit by applying the CCP model to the video unit; and performing the conversion based on the prediction or reconstruction.
Inventors
- Kai Zhang
- Li Zhang
Assignees
- BYTEDANCE INC.
Dates
- Publication Date
- 20260507
- Application Date
- 20251231
Claims (20)
- 1 . A method of video processing, comprising: applying, for a conversion between a video unit of a video and a bitstream of the video unit, at least one of the following in a cross-component prediction (CCP) model: a set of luma samples or a set of additional luma samples; determining a prediction or reconstruction of the video unit by applying the CCP model to the video unit; and performing the conversion based on the prediction or reconstruction.
- 2 . The method of claim 1 , wherein luma positions used in the CCP model are after down-sampling for a colour format.
- 3 . The method of claim 1 , wherein the set of additional luma samples are at positions which are beyond a center position corresponding to a chroma sample to be predicted, a north position relative to the center position, a west position relative to the center position, an east position relative to the center position and a south position relative to the center position.
- 4 . The method of claim 3 , wherein the set of additional luma samples are at least one of the following positions relative to the center position: a north west position, a north east position, a south west position, or a south east position, or wherein the set of additional luma samples are at a position non-adjacent to the center position.
- 5 . The method of claim 4 , wherein predChromaVal=c0·C+c1·N+c2·S+c3·E+c4·W+c5·P+c6·NW+c7·NE+c8·SW+c9·SE+c10·B, wherein predChromaVal represents a chroma sample to be predicted, C represents a luma sample at the center position, N represents a luma sample at the north position, S represents a luma sample at the south position, E represents a luma sample at the east position, W represents a luma sample at the west position, NW represents a luma sample at the north west position, NE represents a luma sample at the north east position, SW represents a luma sample at the south west position, SE represents a luma sample at the south east position, P and B represent a nonlinear term and a bias term, respectively, and c0, c1, c2, c3, c4, c5, c6, c7, c8, c9 and c10 represent parameters; or wherein predChromaVal=c0·C+c1·N+c2·S+c3·E+c4·W+c5·P+c6·NW+c7·NE+c8·SW+c9·SE+c10·N2+c11·S2+c12·E2+c13·W2+c14·B, wherein predChromaVal represents a chroma sample to be predicted, C represents a luma sample at the center position, N represents a luma sample at the north position, S represents a luma sample at the south position, E represents a luma sample at the east position, W represents a luma sample at the west position, NW represents a luma sample at the north west position, NE represents a luma sample at the north east position, SW represents a luma sample at the south west position, SE represents a luma sample at the south east position, N2, S2, E2 and W2 represent luma samples at non-adjacent positions to the center position, respectively, P and B represent a nonlinear term and a bias term, respectively, and c0, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12, c13 and c14 represent parameters.
- 6 . The method of claim 5 , wherein P=C² and/or B=1<<(bitdepth−1).
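The 11-tap predictor of claims 5 and 6 can be sketched as follows. This is an illustrative, non-normative Python model: the function name, the floating-point coefficient arithmetic, and the final clipping to the sample range are assumptions, and the fixed-point normalization used in practical CCCM-style implementations is omitted.

```python
def ext_cccm_predict(luma, x, y, coeffs, bit_depth=10):
    """Sketch of the 11-tap extended predictor of claims 5-6.

    `luma` is a 2-D array of down-sampled luma samples; (x, y) is the
    center position co-located with the chroma sample to be predicted;
    `coeffs` holds c0..c10 for taps (C, N, S, E, W, P, NW, NE, SW, SE, B).
    """
    C = luma[y][x]
    N, S = luma[y - 1][x], luma[y + 1][x]
    W, E = luma[y][x - 1], luma[y][x + 1]
    NW, NE = luma[y - 1][x - 1], luma[y - 1][x + 1]
    SW, SE = luma[y + 1][x - 1], luma[y + 1][x + 1]
    P = C * C                  # nonlinear term, per claim 6 (P = C^2)
    B = 1 << (bit_depth - 1)   # bias term, per claim 6
    taps = [C, N, S, E, W, P, NW, NE, SW, SE, B]
    pred = sum(c * t for c, t in zip(coeffs, taps))
    # Clip to the valid sample range (an assumption; the claims do not
    # specify the output rounding/clipping behaviour).
    return max(0, min((1 << bit_depth) - 1, int(pred)))
```

For example, with c0 = 1 and all other coefficients zero, the predictor reduces to copying the co-located down-sampled luma sample.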
- 7 . The method of claim 1 , wherein if one or more luma samples are unavailable, padding is used to obtain the set of additional luma samples.
- 8 . The method of claim 1 , wherein a function with at least one input as an additional luma sample is involved in the CCP model.
- 9 . The method of claim 8 , wherein the function is a derivation of a gradient.
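Claims 8 and 9 allow a function of additional luma samples, such as a gradient, to enter the CCP model. A minimal sketch, assuming a Sobel-like 3×3 operator over the down-sampled luma neighbourhood (the exact gradient filters used by GLM or GL-CCCM may differ from this choice):

```python
def luma_gradient(luma, x, y):
    """Horizontal and vertical gradients at center position (x, y),
    derived from the 3x3 luma neighbourhood with Sobel-like weights.
    Illustrative only; the actual GLM/GL-CCCM filter taps may differ."""
    gx = (luma[y - 1][x + 1] + 2 * luma[y][x + 1] + luma[y + 1][x + 1]
          - luma[y - 1][x - 1] - 2 * luma[y][x - 1] - luma[y + 1][x - 1])
    gy = (luma[y + 1][x - 1] + 2 * luma[y + 1][x] + luma[y + 1][x + 1]
          - luma[y - 1][x - 1] - 2 * luma[y - 1][x] - luma[y - 1][x + 1])
    return gx, gy
```

A flat region yields zero gradients, while a horizontal ramp yields a nonzero gx and zero gy.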
- 10 . The method of claim 1 , wherein the set of luma samples and the set of additional luma samples are obtained with down-sampling approaches that are the same as those used in one of: cross-component linear model (CCLM), convolutional cross-component model (CCCM), gradient linear model (GLM), or CCCM using multiple downsampling filters (MF-CCCM).
- 11 . The method of claim 1 , wherein the set of luma samples and the set of additional luma samples are obtained with down-sampling approaches different from those used in one of: CCLM, CCCM, GLM, or MF-CCCM.
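As context for the down-sampling of claims 10 and 11: for 4:2:0 content, CCLM in VVC derives one co-located luma value per chroma sample with a 6-tap filter. The sketch below assumes the type-0 chroma sample location and ignores picture-boundary handling; CCCM, GLM, and MF-CCCM variants may apply different filters, which is exactly the distinction claims 10 and 11 draw.

```python
def downsample_luma_420(luma, cx, cy):
    """One down-sampled luma value at chroma position (cx, cy) for
    4:2:0 video, using the 6-tap averaging filter of CCLM in VVC
    (type-0 sample siting assumed; boundary clipping omitted)."""
    x, y = 2 * cx, 2 * cy
    return (luma[y][x - 1] + 2 * luma[y][x] + luma[y][x + 1] +
            luma[y + 1][x - 1] + 2 * luma[y + 1][x] + luma[y + 1][x + 1]
            + 4) >> 3  # +4 rounds before the >>3 division by 8
```

On a flat luma region the filter returns the same value, confirming unity DC gain.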
- 12 . The method of claim 1 , wherein in response to the video unit being coded with a first target coding mode, the set of luma samples and the set of additional luma samples are applied in the CCP model.
- 13 . The method of claim 12 , wherein the first target coding mode is an extended CCCM (extCCCM) mode, and/or wherein the first target coding mode has variants in a same way as CCCM, and/or wherein the first target coding mode and variants of the first target coding mode replace an existing mode, and/or wherein the first target coding mode is used as an additional mode, and/or wherein whether to and/or how to use the first target coding mode depends on coding information of a current block, and/or wherein a flag is signaled to indicate whether the first target coding mode and/or a variant of the first target coding mode is applied, and/or wherein the first target coding mode is used with merge CCP mode, and/or wherein a training process which is the same as that of CCCM is applied to derive parameters of the first target coding mode, but with a different number of parameters.
- 14 . The method of claim 13 , wherein the variants of the first target coding mode comprise at least one of: multiple model-extCCCM (MM-extCCCM), extCCCM-left (extCCCM-L), extCCCM-top (extCCCM-T), MM-extCCCM-L, or MM-extCCCM-T, and/or wherein the existing mode comprises CCCM and/or variants of the CCCM, and/or wherein the coding information comprises at least one of: a mode of the current block, a mode of a neighbouring block, a mode of a luma block in a collocated region of the current block, a mode of a luma block in a collocated region of a neighbouring block, quantization parameter (QP), slice type, picture type, block width, block height, position of the current block, or reconstructed samples, and/or wherein the flag is signalled based on a condition, and/or wherein the flag is signalled, if another flag indicating CCCM mode is true, and/or wherein the flag is not signalled, if CCP merge mode is applied, and/or wherein the flag is signaled, if the first target coding mode is applicable, and/or wherein whether the first target coding mode or the variant of the first target coding mode is applied depends on signaled mode, and/or wherein the first target coding mode is regarded as a new type in merge CCP mode, and/or wherein information and/or parameters of the first target coding mode are stored in blocks, and/or wherein information and/or parameters of the first target coding mode are stored in a history table, and/or wherein stored information and/or parameters of the first target coding mode are used to generate a candidate in a merge CCP candidate list, and/or wherein a candidate with a type of the first target coding mode is involved in a candidate list reordering process, and/or wherein a candidate with a type of the first target coding mode is involved in a candidate list pruning process, and/or wherein a candidate with a type of the first target coding mode is involved in a candidate list offset updating process, and/or wherein if a candidate with a 
type of the first target coding mode is selected, the first target coding mode is used for the current block as instructed by parameters of the candidate, and/or wherein a region of chroma reconstructed samples and corresponding luma reconstructed samples of the chroma reconstructed samples used by the first target coding mode or a variant of the first target coding mode is the same as that of CCCM or a variant of CCCM, and/or wherein a region of chroma reconstructed samples and corresponding luma reconstructed samples of the chroma reconstructed samples used by the first target coding mode or a variant of the first target coding mode is different from that of CCCM or a variant of CCCM.
- 15 . The method of claim 14 , wherein the first target coding mode is not applicable, if a gradient and location based convolutional cross-component model (GL-CCCM) is applied, and/or wherein the first target coding mode is not applicable, if unsampling CCCM is applied, and/or wherein the first target coding mode is not applicable, if MF-CCCM is applied, and/or wherein the first target coding mode is not applicable, if inside filtering is applied, and/or wherein the first target coding mode is applicable only for the first target coding mode, not variants of the first target coding mode, and/or wherein the first target coding mode is applicable only for the first target coding mode and MM-extCCCM but not other variants of the first target coding mode, and/or wherein the first target coding mode is not applicable if W<T1 and/or H<T2, or wherein the first target coding mode is not applicable if W+H<T, or wherein the first target coding mode is not applicable if W*H<T, or wherein the first target coding mode is not applicable if W>T1 and/or H>T2, or wherein the first target coding mode is not applicable if W+H>T, or wherein the first target coding mode is not applicable if W*H>T, and wherein W represents the block width, H represents the block height, and T1, T2, and T represent thresholds, respectively, and/or wherein if the signaled mode is CCLM, a flag indicating CCCM mode and the flag are both true, a current block uses the first target coding mode, and/or wherein if the signaled mode is MM-CCLM, a flag indicating CCCM mode and the flag are both true, a current block uses MM-extCCCM mode, and/or wherein if the signaled mode is CCLM-T, a flag indicating CCCM mode and the flag are both true, a current block uses extCCCM-T mode, and/or wherein if the signaled mode is MM-CCLM-T, a flag indicating CCCM mode and the flag are both true, a current block uses MM-extCCCM-T mode, and/or wherein if the signaled mode is CCLM-L, a flag indicating CCCM mode and the flag are both true, a current block uses extCCCM-L mode, and/or wherein if the signaled mode is MM-CCLM-L, a flag indicating CCCM mode and the flag are both true, a current block uses MM-extCCCM-L mode.
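The CCCM-style training process referenced in claims 13–15 derives the filter coefficients by least-squares regression over the reference area. The sketch below solves the normal equations AᵀA·c = Aᵀb with Gaussian elimination in floating point; this is a simplification, as reference CCCM implementations use fixed-point autocorrelation accumulation and LDL decomposition. Function and variable names are illustrative assumptions.

```python
def derive_ccp_params(luma_vectors, chroma_targets):
    """Least-squares derivation of CCP filter coefficients (sketch).

    Each row of `luma_vectors` holds the filter-tap inputs (e.g. C, N,
    S, E, W, P, ..., B) collected at one reference-area position, and
    `chroma_targets` holds the co-located reconstructed chroma samples.
    """
    n = len(luma_vectors[0])
    # Build the normal equations: ata = A^T A, atb = A^T b.
    ata = [[sum(r[i] * r[j] for r in luma_vectors) for j in range(n)]
           for i in range(n)]
    atb = [sum(r[i] * t for r, t in zip(luma_vectors, chroma_targets))
           for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atb[col], atb[piv] = atb[piv], atb[col]
        for row in range(col + 1, n):
            f = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= f * ata[col][k]
            atb[row] -= f * atb[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = atb[row] - sum(ata[row][k] * coeffs[k]
                           for k in range(row + 1, n))
        coeffs[row] = s / ata[row][row]
    return coeffs
```

Claim 13's "same training process ... with a different number of parameters" corresponds to simply widening each row of `luma_vectors` with the additional taps of claims 4 and 5.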
- 16 . The method of claim 1 , wherein the conversion includes encoding the video unit into the bitstream.
- 17 . The method of claim 1 , wherein the conversion includes decoding the video unit from the bitstream.
- 18 . An apparatus for video processing comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to perform acts comprising: applying, for a conversion between a video unit of a video and a bitstream of the video unit, at least one of the following in a cross-component prediction (CCP) model: a set of luma samples or a set of additional luma samples; determining a prediction or reconstruction of the video unit by applying the CCP model to the video unit; and performing the conversion based on the prediction or reconstruction.
- 19 . A non-transitory computer-readable storage medium storing instructions that cause a processor to perform acts comprising: applying, for a conversion between a video unit of a video and a bitstream of the video unit, at least one of the following in a cross-component prediction (CCP) model: a set of luma samples or a set of additional luma samples; determining a prediction or reconstruction of the video unit by applying the CCP model to the video unit; and performing the conversion based on the prediction or reconstruction.
- 20 . A non-transitory computer-readable recording medium storing a bitstream of a video which is generated by a method performed by an apparatus for video processing, wherein the method comprises: applying at least one of the following in a cross-component prediction (CCP) model: a set of luma samples or a set of additional luma samples; determining a prediction or reconstruction of a video unit of the video by applying the CCP model to the video unit; and generating the bitstream based on the prediction or reconstruction.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS This application is a continuation of International Application No. PCT/US2024/036573, filed on Jul. 2, 2024, which claims the benefit of U.S. provisional application No. 63/524,686, filed on Jul. 2, 2023. The entire contents of these applications are hereby incorporated by reference. FIELD Embodiments of the present disclosure relate generally to video processing techniques, and more particularly, to extended cross-component prediction. BACKGROUND Nowadays, digital video capabilities are being applied in various aspects of people's lives. Multiple types of video compression technologies, such as MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the ITU-T H.265 high efficiency video coding (HEVC) standard, and the versatile video coding (VVC) standard, have been proposed for video encoding/decoding. However, the coding efficiency of video coding techniques is generally expected to be further improved. SUMMARY Embodiments of the present disclosure provide a solution for video processing. In a first aspect, a method for video processing is proposed. The method comprises: applying, for a conversion between a video unit of a video and a bitstream of the video unit, at least one of the following in a cross-component prediction (CCP) model: a set of luma samples, a set of additional luma samples, a set of reconstructed chroma samples, or a set of predicted chroma samples; determining a prediction or reconstruction of the video unit by applying the CCP model to the video unit; and performing the conversion based on the prediction or reconstruction. In this way, coding efficiency and coding performance can be improved. In a second aspect, an apparatus for video processing is proposed. The apparatus comprises a processor and a non-transitory memory with instructions thereon.
The instructions, upon execution by the processor, cause the processor to perform a method in accordance with the first aspect of the present disclosure. In a third aspect, a non-transitory computer-readable storage medium is proposed. The non-transitory computer-readable storage medium stores instructions that cause a processor to perform a method in accordance with the first aspect of the present disclosure. In a fourth aspect, another non-transitory computer-readable recording medium is proposed. The non-transitory computer-readable recording medium stores a bitstream of a video which is generated by a method performed by an apparatus for video processing. The method comprises: applying at least one of the following in a cross-component prediction (CCP) model: a set of luma samples, a set of additional luma samples, a set of reconstructed chroma samples, or a set of predicted chroma samples; determining a prediction or reconstruction of a video unit of the video by applying the CCP model to the video unit; and generating the bitstream based on the prediction or reconstruction. In a fifth aspect, a method for storing a bitstream of a video is proposed. The method comprises: applying at least one of the following in a cross-component prediction (CCP) model: a set of luma samples, a set of additional luma samples, a set of reconstructed chroma samples, or a set of predicted chroma samples; determining a prediction or reconstruction of a video unit of the video by applying the CCP model to the video unit; generating the bitstream based on the prediction or reconstruction; and storing the bitstream in a non-transitory computer-readable recording medium. This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description.
This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. BRIEF DESCRIPTION OF THE DRAWINGS Through the following detailed description with reference to the accompanying drawings, the above and other objectives, features, and advantages of example embodiments of the present disclosure will become more apparent. In the example embodiments of the present disclosure, the same reference numerals usually refer to the same components. FIG. 1 illustrates a block diagram of an example video coding system, in accordance with some embodiments of the present disclosure; FIG. 2 illustrates a block diagram of a first example video encoder, in accordance with some embodiments of the present disclosure; FIG. 3 illustrates a block diagram of an example video decoder, in accordance with some embodiments of the present disclosure; FIG. 4 illustrates nominal vertical and horizontal locations of 4:2:2 luma and chroma samples in a picture; FIG. 5 illustrates an example of an encoder block diagram; FIG. 6 illustrates 67 intra prediction modes; FIG. 7 illustrates reference samples for wide-angular intra prediction; FIG. 8 illustrates