EP-3854089-B1 - COMPLEXITY REDUCTION AND BIT-WIDTH CONTROL FOR BI-DIRECTIONAL OPTICAL FLOW
Inventors
- XIU, Xiaoyu
- HE, Yuwen
- YE, Yan
- LUO, Jiancong
Dates
- Publication Date
- 20260513
- Application Date
- 20190917
Claims (8)
- A video decoding method comprising: calculating a first horizontal gradient array ∂I^(0)/∂x(i, j) and a first vertical gradient array ∂I^(0)/∂y(i, j) from a first prediction signal array I^(0)(i, j) obtained from a first reference picture; calculating a second horizontal gradient array ∂I^(1)/∂x(i, j) and a second vertical gradient array ∂I^(1)/∂y(i, j) from a second prediction signal array I^(1)(i, j) obtained from a second reference picture; calculating a horizontal intermediate parameter array ψ_x(i, j) by performing a first number of right bit shifts on a sum of (i) the first horizontal gradient array and (ii) the second horizontal gradient array; calculating a vertical intermediate parameter array ψ_y(i, j) by performing the first number of right bit shifts on a sum of (i) the first vertical gradient array and (ii) the second vertical gradient array; performing a second number of right bit shifts on the first prediction signal array I^(0)(i, j) and on the second prediction signal array I^(1)(i, j); calculating a signal-difference parameter array θ(i, j) by calculating a difference between the right-bit-shifted version of the first prediction signal array I^(0)(i, j) and the right-bit-shifted version of the second prediction signal array I^(1)(i, j); calculating a signal-horizontal-gradient correlation parameter S_3 by summing components of an elementwise multiplication of the signal-difference parameter array θ(i, j) with the horizontal intermediate parameter array ψ_x(i, j); calculating a cross-gradient correlation parameter S_2 by summing components of an elementwise multiplication of (i) the horizontal intermediate parameter array ψ_x(i, j) with (ii) the vertical intermediate parameter array ψ_y(i, j); calculating a horizontal motion refinement v_x by a method comprising bit-shifting the signal-horizontal-gradient correlation parameter S_3 to obtain the horizontal motion refinement v_x; calculating a vertical motion refinement v_y by a method comprising determining a product of (i) the horizontal motion refinement v_x and (ii) the cross-gradient correlation parameter S_2; and generating a prediction of a current block in a video with bi-directional optical flow using at least the horizontal motion refinement v_x and the vertical motion refinement v_y.
- The method of claim 1, further comprising: calculating a signal-vertical-gradient correlation parameter S_6 by summing components of an elementwise multiplication of (i) the signal-difference parameter array θ(i, j) with (ii) the vertical intermediate parameter array ψ_y(i, j); and performing a third number of left bit shifts on the signal-vertical-gradient correlation parameter S_6, wherein the third number is the second number of right bit shifts minus the first number of right bit shifts, and wherein calculating the vertical motion refinement v_y further comprises subtracting, from the left-bit-shifted version of the signal-vertical-gradient correlation parameter S_6, half the product of (i) the horizontal motion refinement v_x and (ii) the cross-gradient correlation parameter S_2.
- A video decoding apparatus comprising a processor configured to perform at least: calculating a first horizontal gradient array ∂I^(0)/∂x(i, j) and a first vertical gradient array ∂I^(0)/∂y(i, j) from a first prediction signal array I^(0)(i, j) obtained from a first reference picture; calculating a second horizontal gradient array ∂I^(1)/∂x(i, j) and a second vertical gradient array ∂I^(1)/∂y(i, j) from a second prediction signal array I^(1)(i, j) obtained from a second reference picture; calculating a horizontal intermediate parameter array ψ_x(i, j) by performing a first number of right bit shifts on a sum of (i) the first horizontal gradient array and (ii) the second horizontal gradient array; calculating a vertical intermediate parameter array ψ_y(i, j) by performing the first number of right bit shifts on a sum of (i) the first vertical gradient array and (ii) the second vertical gradient array; performing a second number of right bit shifts on the first prediction signal array I^(0)(i, j) and on the second prediction signal array I^(1)(i, j); calculating a signal-difference parameter array θ(i, j) by calculating a difference between the right-bit-shifted version of the first prediction signal array I^(0)(i, j) and the right-bit-shifted version of the second prediction signal array I^(1)(i, j); calculating a signal-horizontal-gradient correlation parameter S_3 by summing components of an elementwise multiplication of the signal-difference parameter array θ(i, j) with the horizontal intermediate parameter array ψ_x(i, j); calculating a cross-gradient correlation parameter S_2 by summing components of an elementwise multiplication of (i) the horizontal intermediate parameter array ψ_x(i, j) with (ii) the vertical intermediate parameter array ψ_y(i, j); calculating a horizontal motion refinement v_x by a method comprising bit-shifting the signal-horizontal-gradient correlation parameter S_3 to obtain the horizontal motion refinement v_x; calculating a vertical motion refinement v_y by a method comprising determining a product of (i) the horizontal motion refinement v_x and (ii) the cross-gradient correlation parameter S_2; and generating a prediction of a current block in a video with bi-directional optical flow using at least the horizontal motion refinement v_x and the vertical motion refinement v_y.
- The apparatus of claim 3, further configured to perform: calculating a signal-vertical-gradient correlation parameter S_6 by summing components of an elementwise multiplication of (i) the signal-difference parameter array θ(i, j) with (ii) the vertical intermediate parameter array ψ_y(i, j); and performing a third number of left bit shifts on the signal-vertical-gradient correlation parameter S_6, wherein the third number is the second number of right bit shifts minus the first number of right bit shifts, and wherein calculating the vertical motion refinement v_y further comprises subtracting, from the left-bit-shifted version of the signal-vertical-gradient correlation parameter S_6, half the product of (i) the horizontal motion refinement v_x and (ii) the cross-gradient correlation parameter S_2.
- A video encoding method comprising: calculating a first horizontal gradient array ∂I^(0)/∂x(i, j) and a first vertical gradient array ∂I^(0)/∂y(i, j) from a first prediction signal array I^(0)(i, j) obtained from a first reference picture; calculating a second horizontal gradient array ∂I^(1)/∂x(i, j) and a second vertical gradient array ∂I^(1)/∂y(i, j) from a second prediction signal array I^(1)(i, j) obtained from a second reference picture; calculating a horizontal intermediate parameter array ψ_x(i, j) by performing a first number of right bit shifts on a sum of (i) the first horizontal gradient array and (ii) the second horizontal gradient array; calculating a vertical intermediate parameter array ψ_y(i, j) by performing the first number of right bit shifts on a sum of (i) the first vertical gradient array and (ii) the second vertical gradient array; performing a second number of right bit shifts on the first prediction signal array I^(0)(i, j) and on the second prediction signal array I^(1)(i, j); calculating a signal-difference parameter array θ(i, j) by calculating a difference between the right-bit-shifted version of the first prediction signal array I^(0)(i, j) and the right-bit-shifted version of the second prediction signal array I^(1)(i, j); calculating a signal-horizontal-gradient correlation parameter S_3 by summing components of an elementwise multiplication of the signal-difference parameter array θ(i, j) with the horizontal intermediate parameter array ψ_x(i, j); calculating a cross-gradient correlation parameter S_2 by summing components of an elementwise multiplication of (i) the horizontal intermediate parameter array ψ_x(i, j) with (ii) the vertical intermediate parameter array ψ_y(i, j); calculating a horizontal motion refinement v_x by a method comprising bit-shifting the signal-horizontal-gradient correlation parameter S_3 to obtain the horizontal motion refinement v_x; calculating a vertical motion refinement v_y by a method comprising determining a product of (i) the horizontal motion refinement v_x and (ii) the cross-gradient correlation parameter S_2; and generating a prediction of a current block in a video with bi-directional optical flow using at least the horizontal motion refinement v_x and the vertical motion refinement v_y.
- The method of claim 5, further comprising: calculating a signal-vertical-gradient correlation parameter S_6 by summing components of an elementwise multiplication of (i) the signal-difference parameter array θ(i, j) with (ii) the vertical intermediate parameter array ψ_y(i, j); and performing a third number of left bit shifts on the signal-vertical-gradient correlation parameter S_6, wherein the third number is the second number of right bit shifts minus the first number of right bit shifts, and wherein calculating the vertical motion refinement v_y further comprises subtracting, from the left-bit-shifted version of the signal-vertical-gradient correlation parameter S_6, half the product of (i) the horizontal motion refinement v_x and (ii) the cross-gradient correlation parameter S_2.
- A video encoding apparatus comprising a processor configured to perform at least: calculating a first horizontal gradient array ∂I^(0)/∂x(i, j) and a first vertical gradient array ∂I^(0)/∂y(i, j) from a first prediction signal array I^(0)(i, j) obtained from a first reference picture; calculating a second horizontal gradient array ∂I^(1)/∂x(i, j) and a second vertical gradient array ∂I^(1)/∂y(i, j) from a second prediction signal array I^(1)(i, j) obtained from a second reference picture; calculating a horizontal intermediate parameter array ψ_x(i, j) by performing a first number of right bit shifts on a sum of (i) the first horizontal gradient array and (ii) the second horizontal gradient array; calculating a vertical intermediate parameter array ψ_y(i, j) by performing the first number of right bit shifts on a sum of (i) the first vertical gradient array and (ii) the second vertical gradient array; performing a second number of right bit shifts on the first prediction signal array I^(0)(i, j) and on the second prediction signal array I^(1)(i, j); calculating a signal-difference parameter array θ(i, j) by calculating a difference between the right-bit-shifted version of the first prediction signal array I^(0)(i, j) and the right-bit-shifted version of the second prediction signal array I^(1)(i, j); calculating a signal-horizontal-gradient correlation parameter S_3 by summing components of an elementwise multiplication of the signal-difference parameter array θ(i, j) with the horizontal intermediate parameter array ψ_x(i, j); calculating a cross-gradient correlation parameter S_2 by summing components of an elementwise multiplication of (i) the horizontal intermediate parameter array ψ_x(i, j) with (ii) the vertical intermediate parameter array ψ_y(i, j); calculating a horizontal motion refinement v_x by a method comprising bit-shifting the signal-horizontal-gradient correlation parameter S_3 to obtain the horizontal motion refinement v_x; calculating a vertical motion refinement v_y by a method comprising determining a product of (i) the horizontal motion refinement v_x and (ii) the cross-gradient correlation parameter S_2; and generating a prediction of a current block in a video with bi-directional optical flow using at least the horizontal motion refinement v_x and the vertical motion refinement v_y.
- The apparatus of claim 7, further configured to perform: calculating a signal-vertical-gradient correlation parameter S_6 by summing components of an elementwise multiplication of (i) the signal-difference parameter array θ(i, j) with (ii) the vertical intermediate parameter array ψ_y(i, j); and performing a third number of left bit shifts on the signal-vertical-gradient correlation parameter S_6, wherein the third number is the second number of right bit shifts minus the first number of right bit shifts, and wherein calculating the vertical motion refinement v_y further comprises subtracting, from the left-bit-shifted version of the signal-vertical-gradient correlation parameter S_6, half the product of (i) the horizontal motion refinement v_x and (ii) the cross-gradient correlation parameter S_2.
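The decoding steps recited in claims 1 and 2 can be sketched in code. The sketch below is illustrative only: the gradient filter (plain central differences), the concrete shift amounts `shift1`/`shift2`, the sign convention, and the use of the autocorrelation parameters `S1` and `S5` to size the final normalizing right shifts are assumptions made for illustration; the claims fix only the structure of the computation (right-shifted gradient sums ψ_x/ψ_y, right-shifted signal difference θ, correlations S_2/S_3/S_6, and the v_x/v_y derivation).

```python
import numpy as np

def gradients(I):
    """Horizontal and vertical gradient arrays of a prediction signal.
    Central differences with edge replication stand in for the codec's
    actual gradient filter (an assumption for illustration)."""
    Ip = np.pad(I.astype(np.int64), 1, mode="edge")
    gx = (Ip[1:-1, 2:] - Ip[1:-1, :-2]) >> 1   # d/dx
    gy = (Ip[2:, 1:-1] - Ip[:-2, 1:-1]) >> 1   # d/dy
    return gx, gy

def bdof_refinement(I0, I1, shift1=1, shift2=4):
    """Sketch of the claimed motion-refinement derivation.
    shift1/shift2 are the claimed 'first' and 'second' numbers of
    right bit shifts (values here are assumptions)."""
    gx0, gy0 = gradients(I0)
    gx1, gy1 = gradients(I1)
    # Intermediate parameter arrays: right-shifted gradient sums.
    psi_x = (gx0 + gx1) >> shift1
    psi_y = (gy0 + gy1) >> shift1
    # Signal-difference array from the right-shifted prediction signals.
    theta = (I0.astype(np.int64) >> shift2) - (I1.astype(np.int64) >> shift2)
    # Correlation parameters: elementwise products summed over the block.
    S1 = int(np.sum(psi_x * psi_x))
    S2 = int(np.sum(psi_x * psi_y))
    S3 = int(np.sum(theta * psi_x))
    S5 = int(np.sum(psi_y * psi_y))
    S6 = int(np.sum(theta * psi_y))
    shift3 = shift2 - shift1  # the claimed 'third number' of left shifts
    # v_x: bit-shift S3; replacing division by S1 with a right shift by
    # floor(log2(S1)) reflects the complexity-reduction idea (the sign
    # convention and shift sizing are assumptions, not claim language).
    vx = -(S3 << shift3) >> (S1.bit_length() - 1) if S1 > 0 else 0
    # v_y: left-shift S6, subtract half of v_x * S2, then normalize by S5.
    vy = -((S6 << shift3) - (vx * S2) // 2) >> (S5.bit_length() - 1) if S5 > 0 else 0
    return vx, vy
```

When the two prediction signals are identical, θ is zero everywhere, so S_3 and S_6 vanish and both refinements come out as zero, matching the intuition that no motion refinement is needed.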
Description
BACKGROUND
Video coding systems are widely used to compress digital video signals to reduce the storage and/or transmission bandwidth needed for such signals. Among the various types of video coding systems, such as block-based, wavelet-based, and object-based systems, block-based hybrid video coding systems are today the most widely used and deployed. Examples of block-based video coding systems include international video coding standards such as MPEG-1/2/4 part 2, H.264/MPEG-4 part 10 AVC, VC-1, and High Efficiency Video Coding (HEVC), which was developed by the JCT-VC (Joint Collaborative Team on Video Coding) of ITU-T/SG16/Q.6/VCEG and ISO/IEC MPEG. The first version of the HEVC standard, finalized in October 2013, offers approximately 50% bit-rate savings at equivalent perceptual quality compared to the prior-generation standard H.264/MPEG-4 AVC. Although the HEVC standard provides significant coding improvements over its predecessor, there is evidence that superior coding efficiency can be achieved with additional coding tools beyond HEVC. On that basis, both VCEG and MPEG started exploring new coding technologies for future video coding standardization. The Joint Video Exploration Team (JVET) was formed in October 2015 by ITU-T VCEG and ISO/IEC MPEG to study advanced technologies that could enable substantial enhancements in coding efficiency. Reference software called the Joint Exploration Model (JEM) was maintained by the JVET, integrating several additional coding tools on top of the HEVC test model (HM). In October 2017, a joint call for proposals (CfP) on video compression with capability beyond HEVC was issued by ITU-T and ISO/IEC. In April 2018, 23 CfP responses were received and evaluated at the 10th JVET meeting, demonstrating compression efficiency gains of around 40% over HEVC.
Based on such evaluation results, the JVET launched a new project to develop the next-generation video coding standard, named Versatile Video Coding (VVC). In the same month, a reference software codebase, called the VVC test model (VTM), was established to demonstrate a reference implementation of the VVC standard. Meanwhile, to facilitate the assessment of new coding tools, another reference software codebase, called the benchmark set (BMS), was also created. In the BMS codebase, additional coding tools that provide higher coding efficiency at moderate implementation complexity are included on top of the VTM and used as a benchmark when evaluating similar coding technologies during the VVC standardization process. Specifically, five JEM coding tools are integrated in BMS-2.0: 4×4 non-separable secondary transform (NSST), generalized bi-prediction (GBi), bi-directional optical flow (BIO), decoder-side motion vector refinement (DMVR), and current picture referencing (CPR).
SUMMARY
The invention is defined in the independent claims. Preferred embodiments are defined in the dependent claims. Embodiments described herein may be performed to generate a prediction of a video block by an encoder or by a decoder. An encoder or decoder system may include a processor and a non-transitory computer-readable medium storing instructions for performing the methods described herein. Additional embodiments include a non-transitory computer-readable storage medium storing a video encoded using the methods described herein.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a system diagram illustrating an example communications system in which one or more disclosed embodiments may be implemented.
FIG. 1B is a system diagram illustrating an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
FIG. 2A is a functional block diagram of a block-based video encoder, such as an encoder used for VVC.
FIG. 2B is a functional block diagram of a block-based video decoder, such as a decoder used for VVC.
FIGs. 3A-3E illustrate block partitions in a multi-type tree structure: quaternary partition (FIG. 3A); vertical binary partition (FIG. 3B); horizontal binary partition (FIG. 3C); vertical ternary partition (FIG. 3D); horizontal ternary partition (FIG. 3E).
FIG. 4 is a schematic illustration of prediction using bi-directional optical flow (BIO).
FIG. 5 illustrates a method of using simplified filters to generate the extended samples for BIO according to some embodiments.
FIG. 6 illustrates a method of using simplified filters to generate the extended samples for BIO according to some embodiments.
FIG. 7 illustrates sample and gradient padding to reduce the number of interpolated samples in the extended region of one BIO coding unit (CU) according to some embodiments.
FIG. 8 is a diagram illustrating an example of a coded bitstream structure.
FIG. 9 is a diagram illustrating an example communication system.
FIG. 10 illustrates using integer samples as the extended samples for the BIO deriva