EP-4736441-A1 - CHROMA BDOF AND ADAPTIVE MULTI-PASS DMVR IN VIDEO CODING


Abstract

A device for decoding video data determines a luma prediction block for a luma block of a current block using an initial motion vector; applies one or more decoder-side motion vector refinement processes to the luma prediction block to determine a refined luma prediction block and refined motion vectors, the one or more motion vector refinement processes comprising a bi-directional optical flow (BDOF) process; for a chroma sample of a chroma block, determines a motion vector for the chroma sample based on refined motion vectors of one or more co-located luma samples of the luma block; determines a chroma prediction sample for a chroma prediction block based on the motion vector for the chroma sample; and determines a decoded version of the current block based on the refined luma prediction block and the chroma prediction block.

Inventors

CHEN, CHUN-CHI
HUANG, Han
ZHANG, ZHI
SEREGIN, VADIM
KARCZEWICZ, MARTA

Assignees

QUALCOMM INCORPORATED

Dates

Publication Date: 20260506
Application Date: 20240627

Claims (20)

1. A method of decoding video data, the method comprising: determining an initial motion vector for a current block of the video data; determining a luma prediction block for a luma block of the current block using the initial motion vector; applying one or more decoder-side motion vector refinement processes to the luma prediction block to determine a refined luma prediction block and refined motion vectors for each luma sample of the luma block, wherein the one or more motion vector refinement processes comprise a bi-directional optical flow (BDOF) process; for a chroma sample of a chroma block, determining a motion vector for the chroma sample based on refined motion vectors of one or more co-located luma samples of the luma block; determining a chroma prediction sample for a chroma prediction block based on the motion vector for the chroma sample; determining a decoded version of the current block based on the refined luma prediction block and the chroma prediction block; and outputting a decoded picture of the video data, wherein the decoded picture comprises the decoded version of the current block.
2. The method of claim 1, wherein determining the motion vector for the chroma sample based on the refined motion vectors of the one or more co-located luma samples comprises selecting one of the refined motion vectors of the one or more co-located luma samples to be the motion vector for the chroma block.
3. The method of claim 2, wherein determining the motion vector for the chroma sample based on the refined motion vectors of the one or more co-located luma samples of the luma block comprises scaling the one of the refined motion vectors of the one or more co-located luma samples.
4. The method of claim 1, wherein determining the motion vector for the chroma sample based on the refined motion vectors of the one or more co-located luma samples comprises averaging at least two of the refined motion vectors of the one or more co-located luma samples to determine the motion vector for the chroma block.
5. The method of claim 1, wherein applying the one or more decoder-side motion vector refinement processes to the luma prediction block to determine the refined luma prediction block and the refined motion vectors for each luma sample of the luma block comprises applying a block-based bilateral matching motion vector refinement process before the BDOF process.
6. The method of claim 5, wherein applying the one or more decoder-side motion vector refinement processes to the luma prediction block to determine the refined luma prediction block and the refined motion vectors for each luma sample of the luma block comprises applying a sub-block-based bilateral matching motion vector refinement process after the block-based bilateral matching motion vector refinement process and before the BDOF process.
7. The method of claim 1, further comprising: performing a sample-based BDOF process on the chroma prediction block.
8. A device for decoding encoded video data, the device comprising: a memory configured to store the encoded video data; one or more processors implemented in circuitry and configured to: determine an initial motion vector for a current block of the video data; determine a luma prediction block for a luma block of the current block using the initial motion vector; apply one or more decoder-side motion vector refinement processes to the luma prediction block to determine a refined luma prediction block and refined motion vectors for each luma sample of the luma block, wherein the one or more motion vector refinement processes comprise a bi-directional optical flow (BDOF) process; for a chroma sample of a chroma block, determine a motion vector for the chroma sample based on refined motion vectors of one or more co-located luma samples of the luma block; determine a chroma prediction sample for a chroma prediction block based on the motion vector for the chroma sample; determine a decoded version of the current block based on the refined luma prediction block and the chroma prediction block; and output a decoded picture of the video data, wherein the decoded picture comprises the decoded version of the current block.
9. The device of claim 8, wherein to determine the motion vector for the chroma sample based on the refined motion vectors of the one or more co-located luma samples, the one or more processors are further configured to select one of the refined motion vectors of the one or more co-located luma samples to be the motion vector for the chroma block.
10. The device of claim 8, wherein to determine the motion vector for the chroma sample based on the refined motion vectors of the one or more co-located luma samples, the one or more processors are further configured to average at least two of the refined motion vectors of the one or more co-located luma samples to determine the motion vector for the chroma block.
11. The device of claim 8, wherein to apply the one or more decoder-side motion vector refinement processes to the luma prediction block to determine the refined luma prediction block and the refined motion vectors for each luma sample of the luma block, the one or more processors are further configured to apply a block-based bilateral matching motion vector refinement process before the BDOF process.
12. The device of claim 11, wherein to apply the one or more decoder-side motion vector refinement processes to the luma prediction block to determine the refined luma prediction block and the refined motion vectors for each luma sample of the luma block, the one or more processors are further configured to apply a sub-block-based bilateral matching motion vector refinement process after the block-based bilateral matching motion vector refinement process and before the BDOF process.
13. The device of claim 8, wherein the current block comprises a bi-predicted block.
14. The device of claim 8, further comprising a display configured to display the decoded picture.
15. The device of claim 8, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.
16. The device of claim 8, wherein the device comprises a wireless communication device, further comprising a receiver configured to receive the encoded video data.
17. The device of claim 16, wherein the wireless communication device comprises a telephone handset and wherein the receiver is configured to demodulate, according to a wireless communication standard, a signal comprising the encoded video data.
18. A device for encoding video data, the device comprising: a memory configured to store the video data; one or more processors implemented in circuitry and configured to: determine an initial motion vector for a current block of the video data; determine a luma prediction block for a luma block of the current block using the initial motion vector; apply one or more decoder-side motion vector refinement processes to the luma prediction block to determine a refined luma prediction block and refined motion vectors for each luma sample of the luma block, wherein the one or more motion vector refinement processes comprise a bi-directional optical flow (BDOF) process; for a chroma sample of a chroma block, determine a motion vector for the chroma sample based on refined motion vectors of one or more co-located luma samples; determine a chroma prediction sample for a chroma prediction block based on the motion vector for the chroma sample; determine a decoded version of the current block based on the refined luma prediction block and the chroma prediction block; store a decoded picture of the video data, wherein the decoded picture comprises the decoded version of the current block; and use the decoded picture of the video data to encode a subsequent picture of video data.
19. The device of claim 18, wherein to determine the motion vector for the chroma sample based on the refined motion vectors of the one or more co-located luma samples, the one or more processors are further configured to select one of the refined motion vectors of the one or more co-located luma samples to be the motion vector for the chroma block.
20. The device of claim 18, wherein to determine the motion vector for the chroma sample based on the refined motion vectors of the one or more co-located luma samples, the one or more processors are further configured to average at least two of the refined motion vectors of the one or more co-located luma samples to determine the motion vector for the chroma block.

Description

CHROMA BDOF AND ADAPTIVE MULTI-PASS DMVR IN VIDEO CODING

[0001] This application claims priority to U.S. Patent Application No. 18/755,388, filed 26 June 2024 and U.S. Provisional Patent Application No. 63/511,514, filed 30 June 2023, the entire contents of each of which are incorporated herein by reference. U.S. Patent Application No. 18/755,388, filed 26 June 2024 claims the benefit of U.S. Provisional Patent Application No. 63/511,514, filed 30 June 2023.

TECHNICAL FIELD

[0002] This disclosure relates to video encoding and video decoding.

BACKGROUND

[0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), ITU-T H.266/Versatile Video Coding (VVC), and extensions of such standards, as well as proprietary video codecs/formats such as AOMedia Video 1 (AV1) that was developed by the Alliance for Open Media. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video coding techniques.

[0004] Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences.
For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

SUMMARY

[0005] This disclosure describes techniques related to inter prediction and, more specifically, techniques related to enhancement of bi-directional optical flow (BDOF) and multi-pass decoder-side motion vector refinement (DMVR). For multi-pass DMVR, a video decoder may perform one or more of block-based bilateral matching motion vector refinement, subblock-based bilateral matching motion vector refinement, subblock-based BDOF motion vector refinement, adaptive decoder-side motion vector refinement, and sample-based BDOF on a luma prediction block to determine a refined motion vector for predicting the luma block, and hence a refined luma prediction block. Typically, sample-based BDOF is performed after the DMVR processes are applied to the luma prediction block. These DMVR processes are not performed separately, however, for chroma prediction blocks.

[0006] This disclosure describes techniques for predicting a chroma block sample by sample using refined motion vectors determined from a sample-based BDOF applied to a co-located luma block.
By determining, for a chroma sample of a chroma block that has one or more co-located luma samples in a luma block, a motion vector for the chroma sample based on refined motion vectors of the one or more co-located luma samples using sample-based BDOF, the techniques of this disclosure may improve the accuracy of chroma prediction blocks, which can result in improved rate-distortion tradeoffs in video coding.

[0007] According to an example of this disclosure, a method of decoding video data includes: determining an initial motion vector for a current block of the video data; determining a luma prediction block for a luma block of the current block using the initial motion vector; applying one or more decoder-side motion vector refinement processes to the luma prediction block to determine a refined luma prediction block and refined motion vectors for each luma sample of the luma block, wherein the one or more motion vector refinement processes comprise a bi-directional optical flow (BDOF) process; for a chroma sample of a chroma block, determining a motion vector for the chroma sample based on refined motion vectors of one or more co-located luma samples of the luma block; determining a chroma prediction sample for a chroma prediction block based on the motion vector for the chroma sample; dete