US-12621490-B2 - Method, device, and medium for video processing


Abstract

Embodiments of the present disclosure provide a solution for video processing. A method for video processing is proposed. The method comprises: determining, during a conversion between a video unit of a video and a bitstream of the target block, information related to a combined inter-intra prediction (CIIP) enhancement mode, the video unit being applied with the CIIP enhancement mode; and performing the conversion based on the information related to the CIIP enhancement mode.

Inventors

Zhipin Deng
Kai Zhang
Li Zhang

Assignees

BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.
BYTEDANCE INC.

Dates

Publication Date: 20260505
Application Date: 20240329
Priority Date: 20210929

Claims (16)

1 . A method of video processing, comprising: determining, during a conversion between a video unit of a video and a bitstream of a target block, whether to apply a transform mode to the video unit, the video unit being applied with an inter coding mode or an intra coding mode; and performing the conversion based on the determining, and wherein whether to apply the transform mode to the video unit is dependent on a prediction method applied to the video unit, and wherein if the video unit is a combined inter and intra prediction (CIIP) coded block, the transform mode is applied to the video unit, or if the video unit is an advanced motion vector prediction (AMVP) coded block, the transform mode is not applied to the video unit, or if the video unit is a merge coded block, the transform mode is not applied to the video unit, or if the video unit is coded with a predetermined merge mode, the transform mode is not applied to the video unit, or if the video unit is a true-bi-prediction coded block, the transform mode is not applied to the video unit, wherein the true-bi-prediction coded block means a block coded with a future/succeeding reference picture and a previous/preceding reference picture in display order, or if the video unit is a uni-directional-prediction coded block, the transform mode is applied to the video unit, or if the video unit is a GEO coded block, the transform mode is not applied to the video unit, or if the video unit is an inter coded block or an intra coded block, the transform mode is applied to the video unit, or if the video unit is a block coded with a specific coding tool, the transform mode is not applied to the video unit, or if the video unit is a block coded with a specific coding tool, the transform mode is applied to the video unit, and wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a block size of the video unit, wherein the transform mode is applied for the inter coded block or the intra coded block if one of the followings is satisfied: a block width of the video unit less than a first value, a block height of the video unit less than a second value, the block width less than the first value and the block height less than the second value, the block width not larger than the first value, the block height not larger than the second value, or the block width not larger than the first value and the block height not larger than the second value, or wherein the transform mode is applied for the inter coded block or the intra coded block if one of the followings is satisfied: a block width of the video unit multiplying a block height of the video unit less than a first value multiplying a second value, or the block width multiplying the block height not larger than a first value multiplying a second value, or wherein a restriction of the block size is applied to all blocks, or wherein a restriction of the block size is applied to a certain type of blocks.
2 . The method of claim 1 , wherein the transform mode represents at least one of: a transform kernel or core, a variance of the transform kernel or core, multiple transform kernel set, a variance of the multiple transform kernel set, a subblock based transform, a non-separable transform, a variance of the non-separable transform, a separable transform, a variance of the separable transform, a secondary transform, or a variance of the secondary transform.
3 . The method of claim 1 , wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on residual information, or wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a temporal layer where the video unit is.
4 . The method of claim 1 , wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a quantization parameter of the video unit.
5 . The method of claim 4 , wherein the transform mode is applied to the inter coded block with the quantization parameter less than a threshold, or wherein the transform mode is applied to the inter coded block with the quantization parameter greater than the threshold, or wherein the transform mode is applied to the intra coded block with the quantization parameter less than a threshold, or wherein the transform mode is applied to the intra coded block with the quantization parameter greater than the threshold.
6 . The method of claim 1 , wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on coding information of at least one block neighboring to the video unit.
7 . The method of claim 6 , wherein whether to apply the transform mode to the video unit depends on residual information of the at least one block neighboring to the video unit.
8 . The method of claim 1 , wherein whether to apply the transform mode to the video unit is dependent on at least one of: a prediction method applied to the video unit, residual information of the video unit, a temporal layer where the video unit is, a block size of the video unit, quantization parameter of the video unit, or coding information of at least one block neighboring to the video unit.
9 . The method of claim 8 , wherein MTS is applied to all blocks coded with a target prediction mode, and other inter coded blocks or other intra coded blocks which are not coded with the target prediction mode and have the temporal layer less than a first threshold, or wherein MTS is applied to a set of blocks which are coded with a target prediction mode and have the temporal layer less than a first threshold, or wherein MTS is applied to all blocks coded with a target prediction mode, and other inter coded blocks or other intra coded blocks which are not coded with the target prediction mode and have a block dimension than a second threshold, or wherein MTS is applied to all blocks coded with a target prediction mode, and other inter coded blocks or other intra coded blocks which are not coded with the target prediction mode and have a block dimension than a second threshold and the temporal layer less than a third threshold.
10 . The method of claim 1 , wherein if the transform mode is applied to the video unit, a syntax element related to the transform mode is indicated, or wherein an indication of whether to and/or how to determine whether to apply the transform mode to the video unit is indicated at one of the followings: sequence level, group of pictures level, picture level, slice level, or tile group level, or wherein an indication of whether to and/or how to determine whether to apply the transform mode to the video unit is indicated in one of the following: a sequence header, a picture header, a sequence parameter set (SPS), a video parameter set (VPS), a dependency parameter set (DPS), a decoding capability information (DCI), a picture parameter set (PPS), an adaptation parameter sets (APS), a slice header, or a tile group header, or wherein an indication of whether to and/or how to determine whether to apply the transform mode to the video unit is included in one of the following: a prediction block (PB), a transform block (TB), a coding block (CB), a prediction unit (PU), a transform unit (TU), a coding unit (CU), a virtual pipeline data unit (VPDU), a coding tree unit (CTU), a CTU row, a slice, a tile, a sub-picture, or a region containing more than one sample or pixel, or wherein the method further comprises: determining, based on coded information of the target block, whether to and/or how to determine whether to apply the transform mode to the video unit, the coded information including at least one of: a block size, a colour format, a single and/or dual tree partitioning, a colour component, a slice type, or a picture type.
11 . The method of claim 1 , wherein the conversion includes encoding the video unit into the bitstream, or wherein the conversion includes decoding the video unit from the bitstream.
12 . An apparatus for processing video data comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to perform acts comprising: determining, during a conversion between a video unit of a video and a bitstream of a target block, whether to apply a transform mode to the video unit, the video unit being applied with an inter coding mode or an intra coding mode; and performing the conversion based on the determining, and wherein whether to apply the transform mode to the video unit is dependent on a prediction method applied to the video unit, and if the video unit is a combined inter and intra prediction (CIIP) coded block, the transform mode is applied to the video unit, or if the video unit is an advanced motion vector prediction (AMVP) coded block, the transform mode is not applied to the video unit, or if the video unit is a merge coded block, the transform mode is not applied to the video unit, or if the video unit is coded with a predetermined merge mode, the transform mode is not applied to the video unit, or if the video unit is a true-bi-prediction coded block, the transform mode is not applied to the video unit, wherein the true-bi-prediction coded block means a block coded with a future/succeeding reference picture and a previous/preceding reference picture in display order, or if the video unit is a uni-directional-prediction coded block, the transform mode is applied to the video unit, or if the video unit is a GEO coded block, the transform mode is not applied to the video unit, or if the video unit is an inter coded block or an intra coded block, the transform mode is applied to the video unit, or if the video unit is a block coded with a specific coding tool, the transform mode is not applied to the video unit, or if the video unit is a block coded with a specific coding tool, the transform mode is applied to the video unit, and wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a block size of the video unit, and wherein the transform mode is applied for the inter coded block or the intra coded block if one of the followings is satisfied: a block width of the video unit less than a first value, a block height of the video unit less than a second value, the block width less than the first value and the block height less than the second value, the block width not larger than the first value, the block height not larger than the second value, or the block width not larger than the first value and the block height not larger than the second value, or wherein the transform mode is applied for the inter coded block or the intra coded block if one of the followings is satisfied: a block width of the video unit multiplying a block height of the video unit less than a first value multiplying a second value, or the block width multiplying the block height not larger than a first value multiplying a second value, or wherein a restriction of the block size is applied to all blocks, or wherein a restriction of the block size is applied to a certain type of blocks.
13 . The apparatus of claim 12 , wherein the transform mode represents at least one of: a transform kernel or core, a variance of the transform kernel or core, multiple transform kernel set, a variance of the multiple transform kernel set, a subblock based transform, a non-separable transform, a variance of the non-separable transform, a separable transform, a variance of the separable transform, a secondary transform, or a variance of the secondary transform, or wherein whether to apply the transform mode to the video unit is dependent on a prediction method applied to the video unit, or wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on residual information, or wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a temporal layer where the video unit is, or wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a block size of the video unit, or wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a quantization parameter of the video unit, or wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on coding information of at least one block neighboring to the video unit, or wherein whether to apply the transform mode to the video unit is dependent on at least one of: a prediction method applied to the video unit, residual information of the video unit, a temporal layer where the video unit is, a block size of the video unit, quantization parameter of the video unit, or coding information of at least one block neighboring to the video unit, or wherein if the transform mode is applied to the video unit, a syntax element related to the transform mode is indicated.
14 . A non-transitory computer-readable storage medium storing instructions that cause a processor to perform acts comprising: determining, during a conversion between a video unit of a video and a bitstream of a target block, whether to apply a transform mode to the video unit, the video unit being applied with an inter coding mode or an intra coding mode; and performing the conversion based on the determining, and wherein whether to apply the transform mode to the video unit is dependent on a prediction method applied to the video unit, and if the video unit is a combined inter and intra prediction (CIIP) coded block, the transform mode is applied to the video unit, or if the video unit is an advanced motion vector prediction (AMVP) coded block, the transform mode is not applied to the video unit, or if the video unit is a merge coded block, the transform mode is not applied to the video unit, or if the video unit is coded with a predetermined merge mode, the transform mode is not applied to the video unit, or if the video unit is a true-bi-prediction coded block, the transform mode is not applied to the video unit, wherein the true-bi-prediction coded block means a block coded with a future/succeeding reference picture and a previous/preceding reference picture in display order, or if the video unit is a uni-directional-prediction coded block, the transform mode is applied to the video unit, or if the video unit is a GEO coded block, the transform mode is not applied to the video unit, or if the video unit is an inter coded block or an intra coded block, the transform mode is applied to the video unit, or if the video unit is a block coded with a specific coding tool, the transform mode is not applied to the video unit, or if the video unit is a block coded with a specific coding tool, the transform mode is applied to the video unit, and wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a block size of the video unit, and wherein the transform mode is applied for the inter coded block or the intra coded block if one of the followings is satisfied: a block width of the video unit less than a first value, a block height of the video unit less than a second value, the block width less than the first value and the block height less than the second value, the block width not larger than the first value, the block height not larger than the second value, or the block width not larger than the first value and the block height not larger than the second value, or wherein the transform mode is applied for the inter coded block or the intra coded block if one of the followings is satisfied: a block width of the video unit multiplying a block height of the video unit less than a first value multiplying a second value, or the block width multiplying the block height not larger than a first value multiplying a second value, or wherein a restriction of the block size is applied to all blocks, or wherein a restriction of the block size is applied to a certain type of blocks.
15 . The storage medium of claim 14 , wherein the transform mode represents at least one of: a transform kernel or core, a variance of the transform kernel or core, multiple transform kernel set, a variance of the multiple transform kernel set, a subblock based transform, a non-separable transform, a variance of the non-separable transform, a separable transform, a variance of the separable transform, a secondary transform, or a variance of the secondary transform, or wherein whether to apply the transform mode to the video unit is dependent on a prediction method applied to the video unit, or wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on residual information, or wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a temporal layer where the video unit is, or wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a block size of the video unit, or wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a quantization parameter of the video unit, or wherein if the video unit is an inter coded block or an intra coded block, whether to apply the transform mode to the video unit is dependent on coding information of at least one block neighboring to the video unit, or wherein whether to apply the transform mode to the video unit is dependent on at least one of: a prediction method applied to the video unit, residual information of the video unit, a temporal layer where the video unit is, a block size of the video unit, quantization parameter of the video unit, or coding information of at least one block neighboring to the video unit, or wherein if the transform mode is applied to the video unit, a syntax element related to the transform mode is indicated.
16 . A non-transitory computer-readable recording medium storing a bitstream of a video, with stored instructions to implement a method, wherein the method comprises: which is generated by a method performed by a video processing apparatus, wherein the method comprises: determining whether to apply a transform mode to a video unit of the video, the video unit being applied with an inter coding mode or an intra coding mode; and generating a bitstream of the video unit based on the determining, and wherein whether to apply the transform mode to the video unit is dependent on a prediction method applied to the video unit, and if the video unit is a combined inter and intra prediction (CIIP) coded block, the transform mode is applied to the video unit, or if the video unit is an advanced motion vector prediction (AMVP) coded block, the transform mode is not applied to the video unit, or if the video unit is a merge coded block, the transform mode is not applied to the video unit, or if the video unit is coded with a predetermined merge mode, the transform mode is not applied to the video unit, or if the video unit is a true-bi-prediction coded block, the transform mode is not applied to the video unit, wherein the true-bi-prediction coded block means a block coded with a future/succeeding reference picture and a previous/preceding reference picture in display order, or if the video unit is a uni-directional-prediction coded block, the transform mode is applied to the video unit, or if the video unit is a GEO coded block, the transform mode is not applied to the video unit, or if the video unit is an inter coded block or an intra coded block, the transform mode is applied to the video unit, or if the video unit is a block coded with a specific coding tool, the transform mode is not applied to the video unit, or if the video unit is a block coded with a specific coding tool, the transform mode is applied to the video unit, and wherein if the video unit is an inter coded 
block or an intra coded block, whether to apply the transform mode to the video unit is dependent on a block size of the video unit, and wherein the transform mode is applied for the inter coded block or the intra coded block if one of the followings is satisfied: a block width of the video unit less than a first value, a block height of the video unit less than a second value, the block width less than the first value and the block height less than the second value, the block width not larger than the first value, the block height not larger than the second value, or the block width not larger than the first value and the block height not larger than the second value, or wherein the transform mode is applied for the inter coded block or the intra coded block if one of the followings is satisfied: a block width of the video unit multiplying a block height of the video unit less than a first value multiplying a second value, or the block width multiplying the block height not larger than a first value multiplying a second value, or wherein a restriction of the block size is applied to all blocks, or wherein a restriction of the block size is applied to a certain type of blocks.
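For illustration only, the alternative block-size conditions recited in claim 1 can be sketched in Python as below. The names SizeRule, Block and size_allows_transform, and the value 32 used in the usage note, are assumptions introduced for this sketch; the claims give no concrete values for the "first value" and "second value" and do not prescribe any implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SizeRule(Enum):
    """Hypothetical labels for the alternative size conditions in claim 1."""
    WIDTH_LT = auto()    # width < first value
    HEIGHT_LT = auto()   # height < second value
    BOTH_LT = auto()     # width < first value and height < second value
    WIDTH_LE = auto()    # width not larger than first value
    HEIGHT_LE = auto()   # height not larger than second value
    BOTH_LE = auto()     # both dimensions not larger than their values
    AREA_LT = auto()     # width * height < first value * second value
    AREA_LE = auto()     # width * height not larger than first * second


@dataclass(frozen=True)
class Block:
    """Minimal stand-in for an inter or intra coded video unit."""
    width: int
    height: int


def size_allows_transform(block: Block, rule: SizeRule,
                          first: int, second: int) -> bool:
    """Return True if the chosen size condition permits the transform mode."""
    w, h = block.width, block.height
    checks = {
        SizeRule.WIDTH_LT: w < first,
        SizeRule.HEIGHT_LT: h < second,
        SizeRule.BOTH_LT: w < first and h < second,
        SizeRule.WIDTH_LE: w <= first,
        SizeRule.HEIGHT_LE: h <= second,
        SizeRule.BOTH_LE: w <= first and h <= second,
        SizeRule.AREA_LT: w * h < first * second,
        SizeRule.AREA_LE: w * h <= first * second,
    }
    return checks[rule]
```

With an assumed first and second value of 32, a 16x64 block satisfies the width-only condition and the "not larger than" area condition (1024 against 1024), but fails the strict area condition and the condition requiring both dimensions to be smaller. Which alternative applies is a choice the claims leave open.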

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/CN2022/122248, filed on Sep. 28, 2022, which claims the benefit of International Application No. PCT/CN2021/121974 filed on Sep. 29, 2021. The entire contents of these applications are hereby incorporated by reference in their entireties.

FIELD

Embodiments of the present disclosure relate generally to video coding techniques, and more particularly, to signaling of information related to a combined inter-intra prediction (CIIP) enhancement mode.

BACKGROUND

Nowadays, digital video capabilities are being applied in various aspects of people's lives. Multiple types of video compression technologies, such as MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), ITU-T H.265 high efficiency video coding (HEVC) standard, versatile video coding (VVC) standard, have been proposed for video encoding/decoding. However, coding efficiency of conventional video coding techniques is generally low, which is undesirable.

SUMMARY

Embodiments of the present disclosure provide a solution for video processing.

In a first aspect, a method for video processing is proposed. The method comprises: determining, during a conversion between a video unit of a video and a bitstream of the target block, information related to a combined inter-intra prediction (CIIP) enhancement mode, the video unit being applied with the CIIP enhancement mode; and performing the conversion based on the information related to the CIIP enhancement mode. Compared with the conventional solution, the proposed method can advantageously improve the coding efficiency and performance.

In a second aspect, another method for video processing is proposed.
The method comprises: applying, during a conversion between a video unit of a video and a bitstream of the target block, a reordering procedure and a refinement procedure to a number of merge candidates for the video unit, the video unit being applied with an inter coding mode; and performing the conversion based on the reordered and refined merge candidates. Compared with the conventional solution, the proposed method can advantageously improve the coding efficiency and performance.

In a third aspect, another method for video processing is proposed. The method comprises: determining, during a conversion between a video unit of a video and a bitstream of the target block, first coding information of a first inter coding mode; determining second coding information of a second inter coding mode, the video unit being applied with the first inter coding mode and the second inter coding mode, the first coding information being associated with the second coding information; and performing the conversion based on the first and second coding information. Compared with the conventional solution, the proposed method can advantageously improve the coding efficiency and performance.

In a fourth aspect, another method for video processing is proposed. The method comprises: determining, during a conversion between a video unit of a video and a bitstream of the target block, whether to apply a regular prediction mode or a template matching (TM) prediction mode to the video unit dynamically; and performing the conversion based on the determining. Compared with the conventional solution, the proposed method can advantageously improve the coding efficiency and performance.

In a fifth aspect, another method for video processing is proposed.
The method comprises: determining, during a conversion between a video unit of a video and a bitstream of the target block, whether to apply a transform mode to the video unit, the video unit being applied with an inter coding mode or an intra coding mode; and performing the conversion based on the determining. Compared with the conventional solution, the proposed method can advantageously improve the coding efficiency and performance.

In a sixth aspect, another method for video processing is proposed. The method comprises: determining, during a conversion between a video unit of a video and a bitstream of the target block, information related to a transform mode, the video unit being applied with the transform mode; and performing the conversion based on the information related to the transform mode. Compared with the conventional solution, the proposed method can advantageously improve the coding efficiency and performance.

In a seventh aspect, an apparatus for processing video data is proposed. The apparatus for processing video data comprises a processor and a non-transitory memory with instructions thereon. The instructions, upon execution by the processor, cause the processor to perform a method in accordance with any of the first, second, third, fourth, fifth or sixth aspect.

In an eighth aspect, a non-transitory computer-readable storage medium is proposed. The non-transitory computer-readable storage medium stores instructions that cause a processo