CN-122003864-A - Method, apparatus and medium for video processing


Abstract

Embodiments of the present disclosure provide a solution for video processing. A method for video processing is presented. In the method, for a conversion between a current video block of a video and a bitstream of the video, a first prediction of the current video block is determined based on a first codec tool. The first codec tool includes a motion vector difference (MVD) related codec tool. A third prediction of the current video block is determined based on the first prediction and a second prediction of the current video block. The second prediction is determined based on a second codec tool that is different from the first codec tool. The conversion is performed based on the third prediction.

Inventors

ZHAO LEI
ZHANG KAI
ZHANG LI

Assignees

Douyin Vision Co., Ltd. (抖音视界有限公司)
ByteDance Inc. (字节跳动有限公司)

Dates

Publication Date: 2026-05-08
Application Date: 2024-10-01
Priority Date: 2023-10-04

Claims (20)

1. A method for video processing, comprising: for a conversion between a current video block of a video and a bitstream of the video, determining a first prediction of the current video block based on a first codec tool, the first codec tool comprising a motion vector difference (MVD) related codec tool; determining a third prediction of the current video block based on the first prediction and a second prediction of the current video block, the second prediction being determined based on a second codec tool different from the first codec tool; and performing the conversion based on the third prediction.
2. The method of claim 1, wherein the first codec tool comprises at least one of a conventional advanced motion vector prediction (AMVP) codec tool or a merge mode with motion vector difference (MMVD) codec tool, and wherein the second codec tool comprises at least one of a conventional intra codec tool, a conventional intra planar codec tool, a conventional intra DC codec tool, a conventional intra angular codec tool, a template-based intra mode derivation (TIMD) codec tool, a decoder-side intra mode derivation (DIMD) codec tool, an intra sub-partition (ISP) codec tool, a position dependent (intra) prediction combination (PDPC) codec tool, a matrix-based intra prediction (MIP) codec tool, an intra block copy (IBC) codec tool, or a conventional inter codec tool.
3. The method of claim 1, wherein the third prediction is determined based on a weighted sum of the first prediction and the second prediction.
4. The method of any of claims 1 to 3, wherein an inter component of combined inter and intra prediction (CIIP) is determined by the first codec tool.
5. The method of claim 4, wherein the first prediction is generated using conventional advanced motion vector prediction (AMVP) or MMVD, the second prediction is generated by an intra mode, and the third prediction is determined by blending the first and second predictions using a weighted average.
6. The method of claim 5, wherein the intra mode comprises at least one of a planar mode, a DC mode, an angular mode, a matrix-based intra prediction (MIP) mode, an intra sub-partition (ISP) mode, an intra block copy (IBC) mode, or an intra template matching prediction (intra TMP) mode.
7. The method of claim 5, wherein the intra mode is derived based on at least one of template-based intra mode derivation (TIMD), decoder-side intra mode derivation (DIMD), or intra template matching prediction (intra TMP).
8. The method of claim 5, wherein the intra component of CIIP is processed by position dependent (intra) prediction combination (PDPC).
9. The method of any of claims 1 to 8, wherein if a CIIP merge mode with motion vector difference (CIIP-MMVD) mode is used, the first prediction is generated with MMVD, the second prediction is generated by an intra mode, and the third prediction is determined by blending the first and second predictions.
10. The method of claim 9, further comprising: determining an MMVD candidate list, wherein the MMVD candidate list is different from or the same as the MMVD candidate list of conventional MMVD.
11. The method of claim 10, wherein a base motion vector (MV) candidate derivation scheme is used to construct the MMVD candidate list, the base MV being different from or the same as a base MV of conventional MMVD.
12. The method of claim 10, wherein a number of base MV candidates is used to construct the MMVD candidate list, the number of base MV candidates being different from or the same as the number of base MV candidates of conventional MMVD.
13. The method of claim 10, wherein a number of MV offsets is used to construct the MMVD candidate list, the number of MV offsets being different from or the same as the number of MV offsets of conventional MMVD.
14. The method of claim 10, wherein MMVD candidate indications are used to construct the MMVD candidate list, the MMVD candidate indications being different from or the same as the MMVD candidate indications of conventional MMVD.
15. The method of claim 9 or 10, wherein in the CIIP-MMVD mode, only uni-directional prediction is applied to generate the first prediction.
16. The method of any of claims 10-15, wherein the MMVD candidate list for the CIIP-MMVD mode is reordered based on at least one metric after being constructed.
17. The method of claim 16, wherein the at least one metric comprises at least one of a template matching cost or a bilateral matching cost.
18. The method of claim 17, wherein the template matching cost is used to reorder the MMVD candidate list if a reconstructed template region of the current video block exists.
19. The method of claim 17, wherein the template matching cost is not used to reorder the MMVD candidate list if a reconstructed template region of the current video block does not exist.
20. The method of any of claims 10-15, wherein the MMVD candidate list for the CIIP-MMVD mode is not reordered.
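As an illustration of claims 10 to 19, the following Python sketch builds an MMVD-style candidate list from base MVs and MV offsets, then reorders it by a matching cost only when such a cost is available. It is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the offset sets, the MV unit, and the toy cost function are all hypothetical and chosen for illustration. A real template matching cost would compare reconstructed template samples of the current block with the corresponding reference samples.

```python
from itertools import product


def build_mmvd_candidates(base_mvs, distances, directions):
    """Return one candidate MV per (base MV, distance, direction) triple.

    Each candidate is a base MV plus an offset of `distance` along `direction`.
    The sets passed in are assumptions; they need not match conventional MMVD.
    """
    candidates = []
    for (bx, by), dist, (sx, sy) in product(base_mvs, distances, directions):
        candidates.append((bx + sx * dist, by + sy * dist))
    return candidates


def reorder_candidates(candidates, matching_cost=None):
    """Reorder candidates by ascending cost when a cost function is given.

    Passing `matching_cost=None` models the case where no reconstructed
    template region exists, so the constructed order is kept unchanged.
    """
    if matching_cost is None:
        return list(candidates)
    # sorted() is stable, so candidates with equal cost keep their order.
    return sorted(candidates, key=matching_cost)


# Hypothetical inputs: two base MVs, two offset distances, four directions.
base_mvs = [(0, 0), (4, -2)]
distances = [1, 2]
directions = [(1, 0), (-1, 0), (0, 1), (0, -1)]

candidates = build_mmvd_candidates(base_mvs, distances, directions)


def toy_cost(mv):
    """Toy stand-in for a template matching cost (distance to MV (3, 0))."""
    return abs(mv[0] - 3) + abs(mv[1])


reordered = reorder_candidates(candidates, toy_cost)
unchanged = reorder_candidates(candidates)  # no template available
```

Keeping the reordering step separate from list construction mirrors the claims, where the same list may be either reordered (claims 16 to 18) or left in its constructed order (claims 19 and 20).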

Description

Method, apparatus and medium for video processing
Technical Field
Embodiments of the present disclosure relate generally to video processing techniques, and more particularly, to combined inter and intra prediction based on motion vector difference (MVD).
Background
Nowadays, digital video capabilities are being applied in various aspects of people's lives. Various video compression techniques have been proposed for video encoding and decoding, such as MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the ITU-T H.265 High Efficiency Video Coding (HEVC) standard, and the Versatile Video Coding (VVC) standard. However, the coding efficiency of video coding techniques is generally expected to be further improved.
Disclosure of Invention
Embodiments of the present disclosure provide a solution for video processing.
In a first aspect, a method for video processing is presented. The method includes: determining, for a conversion between a current video block of a video and a bitstream of the video, a first prediction of the current video block based on a first codec tool including a motion vector difference (MVD) related codec tool; determining a third prediction of the current video block based on the first prediction and a second prediction of the current video block, the second prediction being determined based on a second codec tool different from the first codec tool; and performing the conversion based on the third prediction. The method according to the first aspect of the present disclosure is capable of blending a prediction from an MVD-related codec tool with another prediction.
In a second aspect, an apparatus for video processing is presented. The apparatus includes a processor and a non-transitory memory having instructions thereon. The instructions, when executed by the processor, cause the processor to perform a method according to the first aspect of the present disclosure.
In a third aspect, a non-transitory computer readable storage medium is presented. The non-transitory computer readable storage medium stores instructions that cause a processor to perform a method according to the first aspect of the present disclosure.
In a fourth aspect, another non-transitory computer readable recording medium is presented. The non-transitory computer readable recording medium stores a bitstream of a video generated by a method performed by an apparatus for video processing. The method includes: determining a first prediction of a current video block of the video based on a first codec tool, the first codec tool including a motion vector difference (MVD) related codec tool; determining a third prediction of the current video block based on the first prediction and a second prediction of the current video block, the second prediction being determined based on a second codec tool different from the first codec tool; and generating the bitstream based on the third prediction.
In a fifth aspect, a method for storing a bitstream of a video is presented. The method includes: determining a first prediction of a current video block of the video based on a first codec tool, the first codec tool including a motion vector difference (MVD) related codec tool; determining a third prediction of the current video block based on the first prediction and a second prediction of the current video block, the second prediction being determined based on a second codec tool different from the first codec tool; generating the bitstream based on the third prediction; and storing the bitstream in a non-transitory computer readable recording medium.
This summary is intended to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
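To make the relationship between the three predictions concrete, the following Python sketch derives a third prediction as a weighted sum of a first (MVD-related inter) prediction and a second (for example, intra) prediction. It is a minimal sketch under stated assumptions: the function name, the integer weights, and the rounding shift are hypothetical and follow the general form of the CIIP weighted average in VVC; the disclosure as summarized above only requires that the third prediction be based on the first and second predictions.

```python
def blend_predictions(first_pred, second_pred, second_weight=2, shift=2):
    """Blend two equally sized sample blocks with integer weights.

    The weights sum to (1 << shift); `second_weight` applies to the second
    prediction and the remainder to the first prediction. Both the weight
    values and the shift are assumptions made for illustration.
    """
    total = 1 << shift
    if not 0 <= second_weight <= total:
        raise ValueError("second_weight must lie in [0, 1 << shift]")
    if len(first_pred) != len(second_pred) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(first_pred, second_pred)
    ):
        raise ValueError("prediction blocks must have the same dimensions")

    first_weight = total - second_weight
    rounding = total >> 1  # add half of the divisor before the right shift
    return [
        [
            (first_weight * a + second_weight * b + rounding) >> shift
            for a, b in zip(row_a, row_b)
        ]
        for row_a, row_b in zip(first_pred, second_pred)
    ]


# Hypothetical 1x2 sample blocks standing in for the two predictions.
first_prediction = [[100, 200]]
second_prediction = [[60, 100]]

equal_blend = blend_predictions(first_prediction, second_prediction)
inter_heavy = blend_predictions(first_prediction, second_prediction, second_weight=1)
```

Integer weights with a rounding shift are used here because they avoid floating-point arithmetic; other weight derivations, including position-dependent ones, would fit the same interface.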
Drawings
The above and other objects, features and advantages of the example embodiments of the present disclosure will become more apparent from the following detailed description with reference to the accompanying drawings. In the example embodiments of the present disclosure, like reference numerals generally refer to like components.
FIG. 1 illustrates a block diagram of an example video codec system according to some embodiments of the present disclosure;
FIG. 2 illustrates a block diagram of a first example video encoder according to some embodiments of the present disclosure;
FIG. 3 illustrates a block diagram of an example video decoder according to some embodiments of the present disclosure;
FIG. 4 shows the locations of spatial and temporal neighboring blocks used in the construction of the AMVP/Merge candidate list;
FIG. 5 shows the locations of non-adjacent candidates in the ECM;
FIG. 6 shows an affine motion model based on control points;
FIG. 7 shows an example affine MVF for each sub-block;
FIG. 8 shows the positions of inherited affine motion predictors;
FIG. 9 illustrates control p