US-20260129225-A1 - Method and Apparatus for Signaling for Compound Inter Prediction Modes

Abstract

This disclosure relates generally to video coding and particularly to methods and systems for signaling of compound inter prediction modes. For example, a correlation between a use of optical flow refinement and various compound inter prediction modes is explored to design example schemes for signaling an optical flow refinement flag and a compound inter prediction mode for a video block. Specifically, the signaling of the inter prediction modes for a block may depend on whether the optical flow refinement is applied to the block or not.

Inventors

  • Liang Zhao
  • Xin Zhao
  • Han Gao
  • Shan Liu

Assignees

  • Tencent America LLC

Dates

Publication Date
2026-05-07
Application Date
2025-12-30

Claims (20)

  1. A method for decoding a video block in a video bitstream, the method comprising: receiving a first syntax element signaled in the video bitstream indicative of whether an optical flow refinement is applied for the video block; determining whether the optical flow refinement is applied for the video block based on a value of the first syntax element; subsequent to receiving the first syntax element, receiving a second syntax element indicating a compound inter prediction mode for the video block from the video bitstream dependent on whether the optical flow refinement is applied; determining the compound inter prediction mode for the video block based on a value of the second syntax element; and predicting the video block based on the determined compound inter prediction mode.
  2. The method of claim 1, wherein the value of the second syntax element is determined by decoding the second syntax element using a coding context dependent on whether the optical flow refinement is applied.
  3. The method of claim 2, wherein a first context is used for decoding the second syntax element indicating the compound inter prediction mode for the video block when the optical flow refinement is applied, whereas a second context different from the first context is used for decoding the second syntax element indicating the compound inter prediction mode for the video block when the optical flow refinement is not applied.
  4. The method of claim 1, wherein when the optical flow refinement is applied to the video block, only compound inter prediction modes that require signaling of one or no motion vector difference are allowed.
  5. The method of claim 4, wherein when the optical flow refinement is enabled for the video block, a syntax value space for the compound inter prediction mode for the video block contains a subset of NEAR_NEARMV, NEAR_NEWMV, NEW_NEARMV, JOINT_NEWMV, and JOINT_AMVDNEWMV modes.
  6. The method of claim 1, wherein when the optical flow refinement is applied to the video block, only compound inter prediction modes that involve one or no motion vector difference are allowed.
  7. The method of claim 6, wherein when the optical flow refinement is applied for the video block, a syntax value space for the compound inter prediction mode for the video block contains a subset of NEAR_NEARMV, NEAR_NEWMV, and NEW_NEARMV modes.
  8. The method of claim 1, wherein, with respect to joint motion vector difference compound inter prediction modes: when the optical flow refinement is not applied for the video block, two joint motion vector difference compound inter prediction modes are allowed; and when the optical flow refinement is applied for the video block, only one of the two joint motion vector difference compound inter prediction modes is allowed.
  9. The method of claim 8, wherein: the two joint motion vector difference compound inter prediction modes comprise a JOINT_NEWMV mode and a JOINT_AMVDNEWMV mode; and when the optical flow refinement is applied to the video block, the only one of the two joint motion vector difference compound inter prediction modes being allowed is the JOINT_AMVDNEWMV mode.
  10. The method of claim 1, wherein determining the compound inter prediction mode for the video block comprises: determining a mapping between possible values for the second syntax element to a plurality of compound inter prediction modes based on whether the optical flow refinement for the video block is applied; and determining the compound inter prediction mode for the video block based on a value of the second syntax element and the mapping.
  11. The method of claim 10, wherein the mapping between the possible values for the second syntax element to the plurality of compound inter prediction modes is different between when the optical flow refinement is applied and when the optical flow refinement is not applied.
  12. A method for decoding a video block in a video bitstream, the method comprising: receiving a first syntax element signaled in the video bitstream indicative of a compound inter prediction mode for the video block among a plurality of compound inter prediction modes; determining the compound inter prediction mode for the video block based on a value of the first syntax element; and determining whether a second syntax element for the video block is included in the video bitstream or receiving the second syntax element from the video bitstream in a manner dependent on the compound inter prediction mode indicated by the first syntax element, the second syntax element being indicative of whether an optical flow refinement is applied to the video block.
  13. The method of claim 12, wherein receiving the second syntax element comprises decoding the second syntax element using a coding context dependent on the compound inter prediction mode indicated by the first syntax element.
  14. The method of claim 12, wherein the second syntax element for the video block is included in the video bitstream only for a subset of the plurality of compound inter prediction modes.
  15. The method of claim 14, wherein the second syntax element for the video block is included in the video bitstream only when the compound inter prediction mode for the video block requires at most one signaled motion vector difference.
  16. The method of claim 15, wherein the subset of the plurality of compound inter prediction modes comprises NEAR_NEARMV, NEAR_NEWMV, NEW_NEARMV, JOINT_NEWMV, and JOINT_AMVDNEWMV modes.
  17. The method of claim 14, wherein the second syntax element for the video block is included in the video bitstream only when the compound inter prediction mode for the video block involves at most one motion vector difference.
  18. The method of claim 14, wherein the second syntax element for the video block is included in the video bitstream only when the compound inter prediction mode of the video block involves no motion vector difference, or uses adaptive motion vector resolution, or has motion vector precision coarser than a predetermined precision threshold.
  19. An electronic device, comprising a memory for storing instructions, and a processor for executing the stored instructions to: receive a first syntax element signaled in a video bitstream indicative of whether an optical flow refinement is applied for a video block in the video bitstream; determine whether the optical flow refinement is applied for the video block based on a value of the first syntax element; subsequent to receiving the first syntax element, receive a second syntax element indicating a compound inter prediction mode for the video block from the video bitstream dependent on whether the optical flow refinement is applied; determine the compound inter prediction mode for the video block based on a value of the second syntax element; and predict the video block based on the determined compound inter prediction mode.
  20. An electronic device, comprising a memory for storing instructions, and a processor for executing the stored instructions to implement the method of claim 12.
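A minimal sketch of the flag-then-mode parsing order of claims 1, 10, and 11, assuming a hypothetical bit-reader interface and mode lists (the function and list names below are illustrative, not from any normative codec specification; the restricted list follows the claim 4/5 alternative):

```python
# Hypothetical sketch of the claim-1 parsing order: the optical flow
# refinement (OFR) flag is read first, and the value space of the mode
# syntax element is narrowed when OFR is applied. All names (read_bit,
# read_index, the mode lists) are illustrative assumptions.

ALL_COMPOUND_MODES = [
    "NEAR_NEARMV", "NEAR_NEWMV", "NEW_NEARMV",
    "NEW_NEWMV", "JOINT_NEWMV", "JOINT_AMVDNEWMV",
]

# Per claims 4-5: with OFR, only modes that signal at most one
# motion vector difference remain in the syntax value space.
OFR_ALLOWED_MODES = [
    "NEAR_NEARMV", "NEAR_NEWMV", "NEW_NEARMV",
    "JOINT_NEWMV", "JOINT_AMVDNEWMV",
]

def parse_block(read_bit, read_index):
    """Parse the OFR flag, then a mode index interpreted against the
    mode list selected by the flag (claims 1, 10, 11)."""
    ofr_applied = bool(read_bit())           # first syntax element
    modes = OFR_ALLOWED_MODES if ofr_applied else ALL_COMPOUND_MODES
    mode = modes[read_index(len(modes))]     # second syntax element
    return ofr_applied, mode
```

Because the two mode lists differ, the same coded index maps to different modes depending on the flag, which is the value-space remapping of claims 10 and 11.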

Description

INCORPORATION BY REFERENCE

This application is a continuation of and claims the benefit of priority to U.S. patent application Ser. No. 18/463,508, filed on Sep. 8, 2023, which is based on and claims the benefit of priority to U.S. Provisional Patent Application No. 63/442,721, filed on Feb. 1, 2023 and entitled "Method and Apparatus for Improved Signaling for Compound Inter Prediction Modes," both of which are herein incorporated by reference in their entireties.

TECHNICAL FIELD

This disclosure relates generally to video coding and particularly to methods and systems for signaling of compound inter prediction modes.

BACKGROUND

Uncompressed digital video can include a series of pictures, and may have specific bitrate requirements for storage, data processing, and transmission bandwidth in streaming applications. One purpose of video coding and decoding can be the reduction of redundancy in the uncompressed input video signal through various compression techniques.

SUMMARY

This disclosure relates generally to video coding and particularly to methods and systems for signaling of compound inter prediction modes. For example, a correlation between a use of optical flow refinement and various compound inter prediction modes is explored to design example schemes for signaling an optical flow refinement flag and a compound inter prediction mode for a video block. Specifically, the signaling of the inter prediction modes for a block may depend on whether the optical flow refinement is applied to the block or not. In an example implementation, a method for decoding a video block in a video bitstream is disclosed.
The method may include receiving a first syntax element signaled in the video bitstream indicative of whether an optical flow refinement is applied for the video block; determining whether the optical flow refinement is applied for the video block based on a value of the first syntax element; subsequent to receiving the first syntax element, receiving a second syntax element indicating a compound inter prediction mode for the video block from the video bitstream dependent on whether the optical flow refinement is applied; determining the compound inter prediction mode for the video block based on a value of the second syntax element; and predicting the video block based on the determined compound inter prediction mode.

In the example implementation above, the value of the second syntax element is determined by decoding the second syntax element using a coding context dependent on whether the optical flow refinement is applied.

In any one of the implementations above, a first context is used for decoding the second syntax element indicating the compound inter prediction mode for the video block when the optical flow refinement is applied, whereas a second context different from the first context is used for decoding the second syntax element indicating the compound inter prediction mode for the video block when the optical flow refinement is not applied.

In any one of the implementations above, when the optical flow refinement is applied for the video block, only compound inter prediction modes that require signaling of one or no motion vector difference are allowed.

In any one of the implementations above, when the optical flow refinement is applied for the video block, a syntax value space for the compound inter prediction mode for the video block contains a subset of NEAR_NEARMV, NEAR_NEWMV, NEW_NEARMV, JOINT_NEWMV, and JOINT_AMVDNEWMV modes.
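The two-context entropy decoding described above can be sketched as a simple context-selection helper; the context indices and the helper name are assumptions for illustration, not values from any codec specification:

```python
# Illustrative sketch (not normative): choose the entropy-coding context
# for the compound-mode syntax element based on the optical flow
# refinement flag, as in the two-context scheme described above.
# The concrete index values are assumed.

MODE_CTX_OFR_APPLIED = 0      # first context: refinement applied
MODE_CTX_OFR_NOT_APPLIED = 1  # second, different context: not applied

def mode_syntax_context(ofr_applied: bool) -> int:
    """Return the context index used to decode the mode syntax element."""
    return MODE_CTX_OFR_APPLIED if ofr_applied else MODE_CTX_OFR_NOT_APPLIED
```

The point of the scheme is only that the two branches select distinct contexts, so the arithmetic coder can learn separate probability models for the two cases.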
In any one of the implementations above, when the optical flow refinement is applied for the video block, only compound inter prediction modes that involve one or no motion vector difference are allowed.

In any one of the implementations above, when the optical flow refinement is applied for the video block, a syntax value space for the compound inter prediction mode for the video block contains a subset of NEAR_NEARMV, NEAR_NEWMV, and NEW_NEARMV modes.

In any one of the implementations above, with respect to joint motion vector difference compound inter prediction modes: when the optical flow refinement is not applied for the video block, two joint motion vector difference compound inter prediction modes are allowed; and when the optical flow refinement is applied for the video block, only one of the two joint motion vector difference compound inter prediction modes is allowed.

In any one of the implementations above, the two joint motion vector difference compound inter prediction modes comprise a JOINT_NEWMV mode and a JOINT_AMVDNEWMV mode; and when the optical flow refinement is applied for the video block, the only one of the two joint motion vector difference compound inter prediction modes being allowed is the JOINT_AMVDNEWMV mode.

In any one of the implementations above, determining the compound inter prediction mode for the video block may include determining a mapping between possible values for the second syntax element to a plurality of compound inter pred
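The reversed signaling order of claims 12 through 15, in which the mode is parsed first and the optical flow refinement flag is only conditionally present, can be sketched as follows (the mode set and the inferred default when the flag is absent are illustrative assumptions):

```python
# Hypothetical sketch of the mode-first scheme (claims 12-15): the
# optical flow refinement (OFR) flag is coded only for compound modes
# that signal at most one motion vector difference; otherwise the flag
# is absent and OFR is inferred off (one possible default). Names and
# the inference rule are assumptions, not normative.

# Per claims 15-16: modes for which the OFR flag is present.
OFR_FLAG_CODED_MODES = {
    "NEAR_NEARMV", "NEAR_NEWMV", "NEW_NEARMV",
    "JOINT_NEWMV", "JOINT_AMVDNEWMV",
}

def parse_ofr_flag(mode: str, read_bit) -> bool:
    """Given an already-parsed compound mode, return whether OFR
    is applied: read the flag only when the mode allows it."""
    if mode in OFR_FLAG_CODED_MODES:
        return bool(read_bit())   # second syntax element present
    return False                  # flag absent; inferred not applied
```

Skipping the flag for modes that can never use the refinement saves bits, which is the motivation for conditioning the flag's presence on the mode.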