US-12627831-B2 - Merge mode with motion vector differences
Abstract
An electronic apparatus performs a method of decoding video data. The method comprises: receiving, from a bitstream, a first control flag that indicates merge mode with motion vector difference (MMVD) is enabled for one or more coding units in a video sequence; receiving a first syntax from the video data that identifies a set of motion vector difference (MVD) offsets from a plurality of sets of MVD offsets; receiving a second control flag corresponding to a respective coding unit of the one or more coding units, which indicates the MMVD is applied to the coding unit; receiving a second syntax that selects an MVD offset from the identified set of MVD offsets, and a third syntax that selects an MVD direction; forming an MVD based on the selected MVD offset and MVD direction; and reconstructing the coding unit by applying the formed MVD to generate motion vectors for the coding unit.
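The signaling described in the abstract can be sketched as follows. The offset sets, direction table, and scaling below are illustrative assumptions loosely modeled on the MMVD design in VVC, not the patent's normative values.

```python
# Hypothetical MMVD signaling sketch: the first syntax selects one of
# several candidate sets of MVD offsets, the second syntax selects an
# offset within that set, and the third syntax selects a direction.
# All table values here are illustrative assumptions.

# Two candidate sets of MVD offsets (in quarter-pel units).
MVD_OFFSET_SETS = [
    [1, 2, 4, 8, 16, 32, 64, 128],      # assumed default distances
    [4, 8, 16, 32, 64, 128, 256, 512],  # assumed coarser distances
]

# Four signalable MVD directions: +x, -x, +y, -y.
MVD_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def form_mmvd(set_idx, offset_idx, direction_idx):
    """Combine the selected offset and direction into an MVD (x, y)."""
    offset = MVD_OFFSET_SETS[set_idx][offset_idx]
    dx, dy = MVD_DIRECTIONS[direction_idx]
    return (dx * offset, dy * offset)

def apply_mmvd(base_mv, mvd):
    """Apply the formed MVD to a merge candidate's motion vector."""
    return (base_mv[0] + mvd[0], base_mv[1] + mvd[1])
```

In this sketch the decoder only ever adds a signaled refinement to an already-derived merge motion vector, which is why MMVD needs just three small indices rather than a full MVD.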
Inventors
- Xiaoyu Xiu
- Wei Chen
- Yi-Wen Chen
- Tsung-Chuan Ma
- Hong-Jheng Jhu
- Xianglin Wang
- Bing Yu
Assignees
- Beijing Dajia Internet Information Technology Co., Ltd.
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2024-06-07
Claims (12)
- 1 . A method of encoding video data, comprising: in response to a determination that a merge mode with motion vector difference (MMVD) is enabled for one or more coding units and the MMVD is applied to a respective coding unit of the one or more coding units: determining whether an inter prediction filter (InterPF) mode is enabled for the respective coding unit; and in response to determining that the InterPF mode is enabled for the respective coding unit: generating a syntax element, wherein the syntax element identifies an InterPF mode from a plurality of InterPF modes for the respective coding unit; and reconstructing the respective coding unit based on the identified InterPF mode.
- 2 . The method according to claim 1 , wherein the plurality of InterPF modes include at least two InterPF modes.
- 3 . The method according to claim 1 , wherein the reconstructing the respective coding unit comprises: in accordance with a determination that the InterPF mode identified by the syntax element is a first InterPF mode, for a respective sample of the respective coding unit, deriving a reconstructed sample of the respective sample according to a weighted average of an inter prediction sample of the respective sample and neighboring reconstructed samples from left, right, top and bottom of the respective sample.
- 4 . The method according to claim 1 , wherein the reconstructing the respective coding unit comprises: in accordance with a determination that the InterPF mode identified by the syntax element is a second InterPF mode, for a respective sample of the respective coding unit, deriving a reconstructed sample of the respective sample according to a weighted average of an inter prediction sample of the respective sample and neighboring reconstructed samples from left and top of the respective sample.
- 5 . An electronic apparatus comprising: one or more processing units; memory coupled to the one or more processing units; and a plurality of programs stored in the memory that, when executed by the one or more processing units, cause the electronic apparatus to perform a method of encoding video data comprising: in response to a determination that a merge mode with motion vector difference (MMVD) is enabled for one or more coding units and the MMVD is applied to a respective coding unit of the one or more coding units: determining whether an inter prediction filter (InterPF) mode is enabled for the respective coding unit; and in response to determining that the InterPF mode is enabled for the respective coding unit: generating a syntax element, wherein the syntax element identifies an InterPF mode from a plurality of InterPF modes for the respective coding unit; and reconstructing the respective coding unit based on the identified InterPF mode.
- 6 . The electronic apparatus according to claim 5 , wherein the plurality of InterPF modes include at least two InterPF modes.
- 7 . The electronic apparatus according to claim 5 , wherein the reconstructing the respective coding unit comprises: in accordance with a determination that the InterPF mode identified by the syntax element is a first InterPF mode, for a respective sample of the respective coding unit, deriving a reconstructed sample of the respective sample according to a weighted average of an inter prediction sample of the respective sample and neighboring reconstructed samples from left, right, top and bottom of the respective sample.
- 8 . The electronic apparatus according to claim 5 , wherein the reconstructing the respective coding unit comprises: in accordance with a determination that the InterPF mode identified by the syntax element is a second InterPF mode, for a respective sample of the respective coding unit, deriving a reconstructed sample of the respective sample according to a weighted average of an inter prediction sample of the respective sample and neighboring reconstructed samples from left and top of the respective sample.
- 9 . A method for storing a bitstream, comprising: generating a bitstream by performing an encoding method; and storing the bitstream, wherein the encoding method comprises: in response to a determination that a merge mode with motion vector difference (MMVD) is enabled for one or more coding units and the MMVD is applied to a respective coding unit of the one or more coding units: determining whether an inter prediction filter (InterPF) mode is enabled for the respective coding unit; and in response to determining that the InterPF mode is enabled for the respective coding unit: generating a syntax element, wherein the syntax element identifies an InterPF mode from a plurality of InterPF modes for the respective coding unit; and reconstructing the respective coding unit based on the identified InterPF mode.
- 10 . The method according to claim 9 , wherein the plurality of InterPF modes include at least two InterPF modes.
- 11 . The method according to claim 9 , wherein the reconstructing the respective coding unit comprises: in accordance with a determination that the InterPF mode identified by the syntax element is a first InterPF mode, for a respective sample of the respective coding unit, deriving a reconstructed sample of the respective sample according to a weighted average of an inter prediction sample of the respective sample and neighboring reconstructed samples from left, right, top and bottom of the respective sample.
- 12 . The method according to claim 9 , wherein the reconstructing the respective coding unit comprises: in accordance with a determination that the InterPF mode identified by the syntax element is a second InterPF mode, for a respective sample of the respective coding unit, deriving a reconstructed sample of the respective sample according to a weighted average of an inter prediction sample of the respective sample and neighboring reconstructed samples from left and top of the respective sample.
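The two InterPF modes recited in the claims each blend an inter prediction sample with neighboring reconstructed samples. A minimal per-sample sketch, assuming illustrative integer weights (the actual weights would be defined by the codec specification, not by this example):

```python
# Illustrative sketch of the two InterPF modes in the claims. Each
# reconstructed sample is a weighted average of the sample's inter
# prediction value and neighboring reconstructed samples; the weights
# and rounding below are assumptions for illustration only.

def interpf_mode1(pred, left, right, top, bottom):
    """First InterPF mode: blend with left, right, top and bottom
    neighbors (assumed weights 4/8 prediction, 1/8 each neighbor)."""
    return (4 * pred + left + right + top + bottom + 4) >> 3

def interpf_mode2(pred, left, top):
    """Second InterPF mode: blend with left and top neighbors only
    (assumed weights 2/4 prediction, 1/4 each neighbor)."""
    return (2 * pred + left + top + 2) >> 2
```

Note the structural difference the claims draw: the first mode reads reconstructed neighbors on all four sides of the sample, while the second mode uses only the left and top neighbors, which are the ones already reconstructed in a raster-order decode.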
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. application Ser. No. 17/566,139, entitled “IMPROVEMENTS ON MERGE MODE WITH MOTION VECTOR DIFFERENCES” filed Dec. 30, 2021, which is a continuation of International Application No. PCT/US2021/022606, entitled “IMPROVEMENTS ON MERGE MODE WITH MOTION VECTOR DIFFERENCES” filed Mar. 16, 2021, which claims priority to U.S. Provisional Patent Application No. 62/989,900, entitled “IMPROVEMENTS ON MERGE MODE WITH MOTION VECTOR DIFFERENCES” filed Mar. 16, 2020, all of which are incorporated herein by reference in their entirety.

BACKGROUND

Digital video is supported by a variety of electronic devices, such as digital televisions, laptop or desktop computers, tablet computers, digital cameras, digital recording devices, digital media players, video gaming consoles, smart phones, video teleconferencing devices, video streaming devices, etc. The electronic devices transmit, receive, encode, decode, and/or store digital video data by implementing video compression/decompression standards. Some well-known video coding standards include Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC, also known as H.265 or MPEG-H Part 2) and Advanced Video Coding (AVC, also known as H.264 or MPEG-4 Part 10), which were jointly developed by ISO/IEC MPEG and ITU-T VCEG. AOMedia Video 1 (AV1) was developed by the Alliance for Open Media (AOM) as a successor to its preceding standard VP9. Audio Video coding Standard (AVS), which refers to a series of digital audio and digital video compression standards, was developed by the Audio and Video Coding Standard Workgroup of China. Video compression typically includes performing spatial (intra frame) prediction and/or temporal (inter frame) prediction to reduce or remove redundancy inherent in the video data.
For block-based video coding, a video frame is partitioned into one or more slices, each slice having multiple video blocks, which may also be referred to as coding tree units (CTUs). Each CTU may contain one coding unit (CU) or be recursively split into smaller CUs until a predefined minimum CU size is reached. Each CU (also named a leaf CU) contains one or multiple transform units (TUs) and one or multiple prediction units (PUs). Each CU can be coded in intra, inter or IBC mode. Video blocks in an intra coded (I) slice of a video frame are encoded using spatial prediction with respect to reference samples in neighboring blocks within the same video frame. Video blocks in an inter coded (P or B) slice of a video frame may use spatial prediction with respect to reference samples in neighboring blocks within the same video frame or temporal prediction with respect to reference samples in other previous and/or future reference video frames. Spatial or temporal prediction based on a reference block that has been previously encoded, e.g., a neighboring block, results in a predictive block for a current video block to be coded. The process of finding the reference block may be accomplished by a block matching algorithm. Residual data representing pixel differences between the current block to be coded and the predictive block is referred to as a residual block or prediction errors. An inter-coded block is encoded according to a motion vector that points to a reference block in a reference frame forming the predictive block, and the residual block. The process of determining the motion vector is typically referred to as motion estimation. An intra coded block is encoded according to an intra prediction mode and the residual block. For further compression, the residual block is transformed from the pixel domain to a transform domain, e.g., the frequency domain, resulting in residual transform coefficients, which may then be quantized.
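The block matching process mentioned above can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) between the current block and candidate reference blocks. Real encoders use fast search strategies and hardware-friendly layouts; the list-of-rows frame representation here is purely illustrative.

```python
# Sketch of full-search block matching for motion estimation: scan a
# window in the reference frame for the candidate block that minimizes
# the sum of absolute differences (SAD) with the current block.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def block_match(cur_block, ref_frame, x, y, search_range, block_size):
    """Return the (dx, dy) motion vector with minimum SAD and its cost.

    (x, y) is the current block's top-left position; candidates outside
    the reference frame are skipped.
    """
    best_mv, best_cost = (0, 0), float("inf")
    h, w = len(ref_frame), len(ref_frame[0])
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x + dx, y + dy
            if not (0 <= ry and ry + block_size <= h and
                    0 <= rx and rx + block_size <= w):
                continue
            cand = [row[rx:rx + block_size]
                    for row in ref_frame[ry:ry + block_size]]
            cost = sad(cur_block, cand)
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

The returned (dx, dy) plays the role of the motion vector described above: it points from the current block's position to the best-matching reference block, and the remaining SAD corresponds to the energy left in the residual block.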
The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned to produce a one-dimensional vector of transform coefficients, and then entropy encoded into a video bitstream to achieve even more compression. The encoded video bitstream is then saved in a computer-readable storage medium (e.g., flash memory) to be accessed by another electronic device with digital video capability, or directly transmitted to the electronic device in a wired or wireless manner. The electronic device then performs video decompression (which is a process opposite to the video compression described above) by, e.g., parsing the encoded video bitstream to obtain syntax elements from the bitstream, reconstructing the digital video data to its original format based at least in part on those syntax elements, and rendering the reconstructed digital video data on a display of the electronic device. With digital video quality going from high definition to 4K×2K or even 8K×4K, the amount of video data to be encoded/decoded grows exponentially. It
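The two-dimensional-to-one-dimensional coefficient scan described above can be illustrated with the classic zig-zag order; modern codecs define their own scan orders, so this is only a sketch of the general idea.

```python
# Sketch of serializing a 2-D array of quantized transform coefficients
# into a 1-D vector using a zig-zag scan: traverse anti-diagonals,
# alternating direction, so low-frequency coefficients come first.

def zigzag_scan(block):
    """Return the coefficients of an n-by-n block in zig-zag order."""
    n = len(block)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   # Primary key: anti-diagonal index i+j.
                   # Within a diagonal, odd diagonals run down-left
                   # (increasing i), even diagonals run up-right
                   # (increasing j).
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[i][j] for i, j in order]
```

Because quantization tends to zero out high-frequency coefficients, this ordering groups the surviving nonzero values at the front of the vector and the zeros at the end, which is what makes the subsequent entropy coding effective.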