EP-4736451-A1 - BI-DIRECTION OPTICAL FLOW SUBBLOCK REFINEMENT FOR AN AFFINE MODELED BLOCK


Abstract

A video decoder determines that a current block of size width (WCB) x height (HCB) is coded in an affine prediction mode; predicts each subblock of a first plurality of subblocks using an affine motion model to determine an initial prediction block, each subblock having a size of width (WSB) x height (HSB) and WSB being less than WCB and HSB being less than HCB; applies a bi-directional optical flow process to first and second subblocks of a second plurality of subblocks to determine first and second refined prediction subblocks, each subblock having a size of width (WSBIPB) x height (HSBIPB), WSBIPB being less than or equal to WCB and less than or equal to WSB and HSBIPB being less than or equal to HCB and less than or equal to HSB; and determines a refined prediction block based on the first refined subblock and the second refined subblock.
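The size relations and the two levels of partitioning described in the abstract can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the method of the disclosure: the function names (`sizes_are_valid`, `tile_origins`, `affine_mv`, `subblock_mvs`) are hypothetical, the four-parameter affine model with two control point motion vectors is one common form of affine motion model and is assumed here, and the motion vector is evaluated at the centre of each subblock as a further assumption.

```python
from typing import Dict, List, Tuple

MV = Tuple[float, float]  # (horizontal, vertical) motion vector components


def sizes_are_valid(w_cb: int, h_cb: int, w_sb: int, h_sb: int,
                    w_sbipb: int, h_sbipb: int) -> bool:
    """Check the size relations stated in the abstract."""
    return (w_sb < w_cb and h_sb < h_cb
            and w_sbipb <= w_cb and w_sbipb <= w_sb
            and h_sbipb <= h_cb and h_sbipb <= h_sb)


def tile_origins(width: int, height: int, w: int, h: int) -> List[Tuple[int, int]]:
    """Top-left corners of the w x h tiles covering a width x height area."""
    return [(x, y) for y in range(0, height, h) for x in range(0, width, w)]


def affine_mv(cpmv0: MV, cpmv1: MV, w_cb: int, x: float, y: float) -> MV:
    """Assumed four-parameter affine model.

    cpmv0 is the control point MV at the top-left corner of the block and
    cpmv1 the control point MV at the top-right corner.
    """
    a = (cpmv1[0] - cpmv0[0]) / w_cb
    b = (cpmv1[1] - cpmv0[1]) / w_cb
    return (a * x - b * y + cpmv0[0], b * x + a * y + cpmv0[1])


def subblock_mvs(cpmv0: MV, cpmv1: MV, w_cb: int, h_cb: int,
                 w_sb: int, h_sb: int) -> Dict[Tuple[int, int], MV]:
    """One initial MV per first-level subblock, evaluated at its centre."""
    return {(x, y): affine_mv(cpmv0, cpmv1, w_cb, x + w_sb / 2, y + h_sb / 2)
            for (x, y) in tile_origins(w_cb, h_cb, w_sb, h_sb)}
```

For a 16 x 16 block with 4 x 4 affine subblocks, `tile_origins(16, 16, 4, 4)` yields the 16 first-level subblocks, and calling `tile_origins` again with WSBIPB x HSBIPB yields the second-level subblocks to which the bi-directional optical flow process would be applied.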

Inventors

ZHANG, ZHI
HUANG, Han
LIN, JIAN-LIANG
SEREGIN, VADIM
KARCZEWICZ, MARTA

Assignees

QUALCOMM INCORPORATED

Dates

Publication Date: 20260506
Application Date: 20240627

Claims (1)

Qualcomm Ref. No. 2306925WO; 1616-353WO01

WHAT IS CLAIMED IS:

1. A method of decoding video data, the method comprising: determining that a current block of the video data is coded in an affine prediction mode, wherein the current block has a size of width (WCB) x height (HCB); predicting each subblock of a first plurality of subblocks using an affine motion model associated with the affine prediction mode to determine an initial prediction block for the current block, wherein each subblock of the first plurality of subblocks has a size of width (WSB) x height (HSB), wherein WSB is less than WCB and HSB is less than HCB; applying a bi-directional optical flow process to a first subblock of a second plurality of subblocks to determine a first refined prediction subblock, wherein each subblock of the second plurality of subblocks has a size of width (WSBIPB) x height (HSBIPB), wherein WSBIPB is less than or equal to WCB and less than or equal to WSB and HSBIPB is less than or equal to HCB and less than or equal to HSB; applying the bi-directional optical flow process to a second subblock of the second plurality of subblocks to determine a second refined prediction subblock; determining a refined prediction block based on the first refined subblock and the second refined subblock; and determining a decoded version of the current block based on the refined prediction block.
2. The method of claim 1, wherein predicting each subblock of the first plurality of subblocks using the affine motion model associated with the affine prediction mode to determine the initial prediction block for the current block comprises: receiving two or more control point motion vectors; deriving an initial motion vector for the subblock of the first plurality of subblocks; and locating an initial prediction block for the subblock using the initial motion vector for the subblock.
3. The method of claim 2, wherein applying the bi-directional optical flow process to the first subblock of the second plurality of subblocks to determine the first refined prediction subblock comprises: determining an updated motion vector for the first subblock of the second subblock of the second plurality of subblocks.
4. The method of claim 3, further comprising: storing the updated motion vector for the first subblock of the second plurality of subblocks; and using the updated motion vector to predict a subsequent block of video data.
5. The method of claim 4, wherein determining the refined prediction block based on the first refined subblock and the second refined subblock comprises: applying a per-pixel bi-directional optical flow process to the first refined prediction subblock.
6. The method of claim 1, wherein determining the refined prediction block based on the first refined subblock and the second refined subblock comprises: applying a second bi-directional optical flow process to the first refined prediction subblock; and applying the second bi-directional optical flow process to the second refined prediction subblock.
7. The method of claim 1, further comprising: receiving a syntax element, wherein a value of the syntax element indicates that the bi-directional optical flow process is enabled for the current block.
8. The method of claim 1, wherein WSBIPB equals 1 and HSBIPB equals 1.
9. The method of claim 1, wherein WSB is greater than or equal to 4 and HSB is greater than or equal to 4.
10. The method of claim 1, wherein the current block comprises a bi-predicted block.
11. The method of claim 1, wherein the method of decoding is performed as part of a video encoding process.
12. A device for decoding encoded video data, the device comprising: a memory configured to store video data; one or more processors implemented in circuitry and configured to: determine that a current block of the video data is coded in an affine prediction mode, wherein the current block has a size of width (WCB) x height (HCB); predict each subblock of a first plurality of subblocks using an affine motion model associated with the affine prediction mode to determine an initial prediction block for the current block, wherein each subblock of the first plurality of subblocks has a size of width (WSB) x height (HSB), wherein WSB is less than WCB and HSB is less than HCB; apply a bi-directional optical flow process to a first subblock of a second plurality of subblocks to determine a first refined prediction subblock, wherein each subblock of the second plurality of subblocks has a size of width (WSBIPB) x height (HSBIPB), wherein WSBIPB is less than or equal to WCB and less than or equal to WSB and HSBIPB is less than or equal to HCB and less than or equal to HSB; apply the bi-directional optical flow process to a second subblock of the second plurality of subblocks to determine a second refined prediction subblock; determine a refined prediction block based on the first refined subblock and the second refined subblock; and determine a decoded version of the current block based on the refined prediction block.
13. The device of claim 12, wherein to predict each subblock of the first plurality of subblocks using the affine motion model associated with the affine prediction mode to determine the initial prediction block for the current block, the one or more processors are further configured to: receive two or more control point motion vectors; derive an initial motion vector for the subblock of the first plurality of subblocks; and locate an initial prediction block for the subblock using the initial motion vector for the subblock.
14. The device of claim 13, wherein to apply the bi-directional optical flow process to the first subblock of the second plurality of subblocks to determine the first refined prediction subblock, the one or more processors are further configured to: determine an updated motion vector for the first subblock of the second subblock of the second plurality of subblocks.
15. The device of claim 14, wherein the one or more processors are further configured to: store the updated motion vector for the first subblock of the second plurality of subblocks; and use the updated motion vector to predict a subsequent block of video data.
16. The device of claim 15, wherein to determine the refined prediction block based on the first refined subblock and the second refined subblock, the one or more processors are further configured to: apply a per-pixel bi-directional optical flow process to the first refined prediction subblock.
17. The device of claim 12, wherein to determine the refined prediction block based on the first refined subblock and the second refined subblock, the one or more processors are further configured to: apply a second bi-directional optical flow process to the first refined prediction subblock; and apply the second bi-directional optical flow process to the second refined prediction subblock.
18. The device of claim 12, wherein the one or more processors are further configured to: receive a syntax element, wherein a value of the syntax element indicates that the bi-directional optical flow process is enabled for the current block.
19. The device of claim 12, wherein WSBIPB_1 equals 1 and HSBIPB_1 equals 1.
20. The device of claim 12, wherein WSB is greater than or equal to 4, HSB is greater than or equal to 4.
21. The device of claim 12, wherein the current block comprises a bi-predicted block.
22. The device of claim 12, further comprising a display configured to display a picture of decoded video data that includes the decoded version of the current block.
23. The device of claim 12, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.
24. The device of claim 12, wherein the device comprises a wireless communication device, further comprising a receiver configured to receive the encoded video data.
25. The device of claim 24, wherein the wireless communication device comprises a telephone handset and wherein the receiver is configured to demodulate, according to a wireless communication standard, a signal comprising the encoded video data.
26. The device of claim 12, wherein the device comprises a video decoder.
27. The device of claim 12, wherein the device comprises a video encoder.
28. A computer-readable storage medium storing instructions that when executed by one or more processors cause the one or more processors to: determine that a current block of video data is coded in an affine prediction mode, wherein the current block has a size of width (WCB) x height (HCB); predict each subblock of a first plurality of subblocks using an affine motion model associated with the affine prediction mode to determine an initial prediction block for the current block, wherein each subblock of the first plurality of subblocks has a size of width (WSB) x height (HSB), wherein WSB is less than WCB and HSB is less than HCB; apply a bi-directional optical flow process to a first subblock of a second plurality of subblocks to determine a first refined prediction subblock, wherein each subblock of the second plurality of subblocks has a size of width (WSBIPB) x height (HSBIPB), wherein WSBIPB is less than or equal to WCB and less than or equal to WSB and HSBIPB is less than or equal to HCB and less than or equal to HSB; apply the bi-directional optical flow process to a second subblock of the second plurality of subblocks to determine a second refined prediction subblock; determine a refined prediction block based on the first refined subblock and the second refined subblock; and determine a decoded version of the current block based on the refined prediction block.
29. The computer-readable storage medium of claim 28, wherein to predict each subblock of the first plurality of subblocks using the affine motion model associated with the affine prediction mode to determine the initial prediction block for the current block, the instructions cause the one or more processors to: receive two or more control point motion vectors; derive an initial motion vector for the first subblock of the plurality of subblocks; and locate an initial prediction block for the first subblock using the initial motion vector for the subblock.
30. The computer-readable storage medium of claim 29, wherein to apply the bi-directional optical flow process to the first subblock of the second plurality of subblocks to determine the first refined prediction subblock, the one or more processors are further configured to: determine an updated motion vector for the first subblock of the second subblock of the second plurality of subblocks.
31. The computer-readable storage medium of claim 30, wherein the instructions cause the one or more processors to: store the updated motion vector for the first subblock of the second plurality of subblocks; and use the updated motion vector to predict a subsequent block of video data.
32. The computer-readable storage medium of claim 31, wherein to determine the refined prediction block based on the first refined subblock and the second refined subblock, the instructions cause the one or more processors to: apply a per-pixel bi-directional optical flow process to the first refined prediction subblock.
33. The computer-readable storage medium of claim 28, wherein to determine the refined prediction block based on the first refined subblock and the second refined subblock, the instructions cause the one or more processors to: apply a second bi-directional optical flow process to the first refined prediction subblock; and apply the second bi-directional optical flow process to the second refined prediction subblock.
34. The computer-readable storage medium of claim 28, wherein the instructions cause the one or more processors to: receive a syntax element, wherein a value of the syntax element indicates that the bi-directional optical flow process is enabled for the current block.
35. The computer-readable storage medium of claim 28, wherein WSBIPB_1 equals 1 and HSBIPB_1 equals 1.
36. The computer-readable storage medium of claim 28, wherein WSB is greater than or equal to 4, HSB is greater than or equal to 4.
37. The computer-readable storage medium of claim 28, wherein the current block comprises a bi-predicted block.
38. A method of decoding video data, the method comprising: determining that a current block of the video data is coded in an affine prediction mode, wherein the current block has a size of width (WCB) x height (HCB); determining a motion vector for the current block based on a temporal motion vector predictor candidate; determining an initial prediction block for the current block using the motion vector; applying a bi-directional optical flow process to a first subblock of a plurality of subblocks of the initial prediction block to determine a first refined prediction subblock, wherein each subblock of the plurality of subblocks has a size of width (WSBIPB) x height (HSBIPB), wherein WSBIPB is less than or equal to WCB and HSBIPB is less than or equal to HCB; applying the bi-directional optical flow process to a second subblock of the plurality of subblocks to determine a second refined prediction subblock; determining a refined prediction block based on the first refined subblock and the second refined subblock; and determining a decoded version of the current block based on the refined prediction block.
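The refinement step recited in claims 1 and 3 (applying a bi-directional optical flow process to a subblock to obtain a refined prediction subblock and an updated motion vector) can be illustrated with a simplified floating-point sketch. This is not the normative BDOF process of any video coding standard and is not taken from the disclosure. It assumes symmetric motion between the two reference pictures, central-difference gradients, and a sequential least-squares solve for one motion offset per subblock. The function name `bdof_refine` and the list-of-lists sample layout are hypothetical.

```python
from typing import List, Tuple

Block = List[List[float]]  # samples indexed as block[y][x]


def bdof_refine(p0: Block, p1: Block, x0: int, y0: int, w: int, h: int
                ) -> Tuple[Tuple[float, float], Block]:
    """Refine one w x h subblock whose top-left sample is at (x0, y0).

    p0 and p1 are the two initial predictions (one per reference list).
    Both must extend at least one sample beyond the subblock on every
    side so that central-difference gradients can be formed.
    Returns the estimated motion offset (vx, vy) and the refined samples.
    """
    s_xx = s_xy = s_yy = s_xd = s_yd = 0.0
    rows = []
    for y in range(y0, y0 + h):
        row = []
        for x in range(x0, x0 + w):
            # Central-difference gradients of each prediction.
            gx0 = (p0[y][x + 1] - p0[y][x - 1]) / 2.0
            gy0 = (p0[y + 1][x] - p0[y - 1][x]) / 2.0
            gx1 = (p1[y][x + 1] - p1[y][x - 1]) / 2.0
            gy1 = (p1[y + 1][x] - p1[y - 1][x]) / 2.0
            gx, gy = gx0 + gx1, gy0 + gy1
            d = p0[y][x] - p1[y][x]
            # Accumulate the terms of min sum (d + vx*gx + vy*gy)^2.
            s_xx += gx * gx
            s_xy += gx * gy
            s_yy += gy * gy
            s_xd += gx * d
            s_yd += gy * d
            row.append((p0[y][x] + p1[y][x], gx0 - gx1, gy0 - gy1))
        rows.append(row)
    # Sequential solve: vx first, then vy given vx.
    vx = -s_xd / s_xx if s_xx > 0 else 0.0
    vy = -(s_yd + vx * s_xy) / s_yy if s_yy > 0 else 0.0
    # Average of the two motion-compensated first-order extrapolations.
    refined = [[0.5 * (s + vx * dgx + vy * dgy) for (s, dgx, dgy) in row]
               for row in rows]
    return (vx, vy), refined
```

In this sketch the offset (vx, vy) plays the role of the update that could be added to the initial motion vector of the subblock. A decoder would call the function once per WSBIPB x HSBIPB subblock and assemble the refined subblocks into the refined prediction block.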

Description

BI-DIRECTION OPTICAL FLOW SUBBLOCK REFINEMENT FOR AN AFFINE MODELED BLOCK

[0001] This application claims priority to U.S. Patent Application No. 18/754,788, filed 26 June 2024, and U.S. Provisional Patent Application No. 63/511,118, filed 29 June 2023, the entire content of each of which is incorporated herein by reference. U.S. Patent Application No. 18/754,788, filed 26 June 2024, claims the benefit of U.S. Provisional Patent Application No. 63/511,118, filed 29 June 2023.

TECHNICAL FIELD

[0002] This disclosure relates to video encoding and video decoding.

BACKGROUND

[0003] Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), ITU-T H.266/Versatile Video Coding (VVC), and extensions of such standards, as well as proprietary video codecs/formats such as AOMedia Video 1 (AV1) that was developed by the Alliance for Open Media. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video coding techniques.

[0004] Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

SUMMARY

[0005] The techniques of this disclosure are related to decoder-side motion vector derivation techniques (e.g., template matching, bilateral matching, decoder-side motion vector refinement, bi-directional optical flow (BDOF)) and affine model prediction mode. Specifically, this disclosure describes techniques that may enable BDOF to be used on blocks that are predicted using affine motion prediction and blocks that are predicted using a temporal motion vector prediction (TMVP)-based motion vector. This may improve the accuracy of prediction blocks, which in turn may result in improved rate-distortion tradeoffs when encoding video data.

[0006] According to an example of this disclosure, a method of decoding video data includes determining that a current block of the video data is coded in an affine prediction mode, wherein the current block has a size of width (WCB) x height (HCB); predicting each subblock of a first plurality of subblocks using an affine motion model associated with the affine prediction mode to determine an initial prediction block for the current block, wherein each subblock of the first plurality of subblocks has a size of width (WSB) x height (HSB), wherein WSB is less than WCB and HSB is less than HCB; applying a bi-directional optical flow process to a first subblock of a second plurality of subblocks to determine a first refined prediction subblock, wherein each subblock of the second plurality of subblocks has a size of width (WSBIPB) x height (HSBIPB), wherein WSBIPB is less than or equal to WCB and less than or equal to WSB and HSBIPB is less than or equal to HCB and less than or equal to HSB; applying the bi-directional optical flow process to a second subblock of the second plurality of subblocks to determine a second refined prediction subblock; determining a refined prediction block based on the first refined subblock and the second refined subblock; and determining a decoded version of the current block based on the refined prediction block.

[0007] According to an example of this disclosure, a device for decoding encoded video data includes a memory configured to store video data; one or more processors implemented in circuitry and configured to determine that a current block of the video data is coded in an affin