EP-3788785-B1 - TECHNIQUES FOR SIMPLIFIED AFFINE MOTION MODEL CODING WITH PREDICTION OFFSETS


Inventors

  • LI, GUICHUN
  • XU, XIAOZHONG
  • LI, XIANG
  • LIU, SHAN

Dates

Publication Date
2026-05-06
Application Date
2019-09-10

Claims (7)

  1. A method for video decoding in a decoder, comprising:
     decoding (S1610) prediction information of a block in a current picture from a coded video bitstream;
     determining (S1620) parameters of an affine model, the parameters of the affine model being used to transform between the block and a reference block in a reference picture that has been reconstructed; and
     reconstructing (S1630) at least a sample of the block according to the affine model,
     wherein the prediction information includes a plurality of offset indices for offset values associated with the affine model in an inter prediction mode,
     wherein the parameters of the affine model are determined based on the plurality of offset indices, each of the plurality of the offset indices identifying a corresponding offset value in a respective pre-defined mapping table that maps indexes to corresponding offset values,
     wherein the plurality of offset indices comprises at least a distance offset index, an offset direction index, a delta scaling index, and a delta rotation index,
     the method further comprising one of:
     determining a base predictor of the block from a predictor candidate list based on a base predictor index that is signaled, the predictor candidate list including more than one predictor candidates; and
     determining a base predictor of the block based on a predefined base predictor index when the base predictor index is not signaled,
     wherein the block includes two or more control points, and wherein the base predictor comprises at least a scaling parameter, a rotational parameter and a translation motion vector, and
     decoding (S1610) prediction information comprises:
     decoding the distance offset index to determine a distance offset value based on the respective pre-defined mapping table of the distance offset index;
     decoding the offset direction index to determine an offset direction based on the respective pre-defined mapping table of the offset direction index;
     decoding the delta scaling index to determine a delta scaling parameter based on the respective pre-defined mapping table of the delta scaling index; and
     decoding the delta rotation index to determine a delta rotation parameter based on the respective pre-defined mapping table of the delta rotation index,
     wherein the method further comprises deriving a motion vector for one of the two or more control points of the block in the current picture based on a combination of the rotational parameter of the base predictor and the delta rotational parameter, on a combination of the scaling parameter of the base predictor and the delta scaling parameter, and on an application of the distance offset value and the offset direction to the translational motion vector of the base predictor.
  2. The method of claim 1, wherein the pre-defined mapping table is adjustable and received at one of a sequence level, a slice level, a tile level, a tile group level, and a block level.
  3. The method of claim 1, wherein the deriving further comprises one of: setting a scaling parameter of the base predictor as a scaling parameter of the block in the current picture based on a determination that a zero delta flag is true; and applying the delta scaling parameter to the scaling parameter of the base predictor to generate the scaling parameter of the block based on a determination that the zero delta flag is false.
  4. The method of claim 1, wherein the deriving further comprises one of: setting a rotation parameter of the base predictor as a rotation parameter of the block based on a determination that a zero delta flag is true; and applying the delta rotation parameter to the rotation parameter of the base predictor to generate the rotation parameter of the block in the current picture based on a determination that the zero delta flag is false.
  5. The method of claim 1, wherein the deriving further comprises one of: setting a translational motion vector of the base predictor as a translational motion vector of the block based on a determination that a zero motion vector difference flag is true; and applying the distance offset value and the offset direction onto the translational motion vector of the base predictor to generate the translational motion vector of the block based on a determination that the zero motion vector difference flag is false.
  6. An apparatus for video decoding, comprising: processing circuitry configured to perform the method of any one of claims 1-5.
  7. A non-transitory computer-readable medium storing instructions which when executed by a computer for video decoding cause the computer to perform the method of any one of claims 1-5.
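The derivation in claim 1 can be sketched in code. The following is a minimal illustration only: the table values, the dictionary layout of the base predictor, and the function name are hypothetical (the actual mapping tables and fixed-point arithmetic are defined by the codec specification), and a 4-parameter affine model (scaling, rotation, translation) is assumed.

```python
import math

# Illustrative mapping tables -- values here are placeholders, not the
# tables defined by the specification.
DISTANCE_TABLE = [1, 2, 4, 8, 16, 32, 64, 128]        # offset magnitude
DIRECTION_TABLE = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # (x-sign, y-sign)
DELTA_SCALE_TABLE = [0.0, 0.125, 0.25, 0.5]           # delta scaling values
DELTA_ROT_TABLE = [0.0, math.pi / 32, math.pi / 16, math.pi / 8]

def derive_control_point_mv(base, dist_idx, dir_idx, dscale_idx, drot_idx,
                            cp_offset):
    """Derive one control-point MV from a base predictor and decoded offsets.

    base: dict with 'scale', 'rot' (radians) and 'mv' = (mvx, mvy).
    cp_offset: (x, y) position of the control point relative to the
    block origin.
    """
    # Combine base parameters with the decoded deltas (claim 1).
    scale = base['scale'] + DELTA_SCALE_TABLE[dscale_idx]
    rot = base['rot'] + DELTA_ROT_TABLE[drot_idx]
    # Apply distance offset and offset direction to the translational MV.
    dist = DISTANCE_TABLE[dist_idx]
    sx, sy = DIRECTION_TABLE[dir_idx]
    mvx = base['mv'][0] + dist * sx
    mvy = base['mv'][1] + dist * sy
    # MV at the control point = affine-transformed position minus position.
    x, y = cp_offset
    cp_mvx = mvx + scale * (x * math.cos(rot) - y * math.sin(rot)) - x
    cp_mvy = mvy + scale * (x * math.sin(rot) + y * math.cos(rot)) - y
    return cp_mvx, cp_mvy
```

With zero scaling and rotation deltas the control-point MVs reduce to the base translational MV shifted by the signaled distance offset in the signaled direction, which is the degenerate (purely translational) case of the model.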

Description

This present disclosure claims the benefit of priority to U.S. Patent Application No. 16/398,308, "TECHNIQUES FOR SIMPLIFIED AFFINE MOTION MODEL CODING WITH PREDICTION OFFSETS" filed on April 30, 2019, which claims the benefit of priority to U.S. Provisional Application No. 62/734,998, "TECHNIQUES FOR SIMPLIFIED AFFINE MOTION MODEL CODING WITH PREDICTION OFFSETS" filed on September 21, 2018.

TECHNICAL FIELD

The present disclosure describes embodiments generally related to video coding.

BACKGROUND

The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent the work is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure. Video coding and decoding can be performed using inter-picture prediction with motion compensation. Uncompressed digital video can include a series of pictures, each picture having a spatial dimension of, for example, 1920 x 1080 luminance samples and associated chrominance samples. The series of pictures can have a fixed or variable picture rate (informally also known as frame rate) of, for example, 60 pictures per second or 60 Hz. Uncompressed video has significant bitrate requirements. For example, 1080p60 4:2:0 video at 8 bit per sample (1920x1080 luminance sample resolution at 60 Hz frame rate) requires close to 1.5 Gbit/s bandwidth. An hour of such video requires more than 600 GBytes of storage space. One purpose of video coding and decoding can be the reduction of redundancy in the input video signal, through compression. Compression can help reduce the aforementioned bandwidth or storage space requirements, in some cases by two orders of magnitude or more. Both lossless and lossy compression, as well as a combination thereof, can be employed.
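The bandwidth and storage figures quoted above can be checked with simple arithmetic; 4:2:0 subsampling means the two chroma planes each have a quarter of the luma resolution, i.e. 1.5 samples per pixel in total:

```python
# Back-of-envelope check of the uncompressed 1080p60 4:2:0 figures.
width, height, fps, bit_depth = 1920, 1080, 60, 8

# 4:2:0: luma plane plus two quarter-resolution chroma planes.
samples_per_frame = width * height * 1.5
bits_per_second = samples_per_frame * bit_depth * fps
bytes_per_hour = bits_per_second * 3600 / 8

print(f"{bits_per_second / 1e9:.2f} Gbit/s")  # ~1.49 Gbit/s
print(f"{bytes_per_hour / 1e9:.0f} GB/hour")  # ~672 GB
```

This reproduces the "close to 1.5 Gbit/s" bandwidth and "more than 600 GBytes" per hour stated in the text.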
Lossless compression refers to techniques where an exact copy of the original signal can be reconstructed from the compressed original signal. When using lossy compression, the reconstructed signal may not be identical to the original signal, but the distortion between original and reconstructed signals is small enough to make the reconstructed signal useful for the intended application. In the case of video, lossy compression is widely employed. The amount of distortion tolerated depends on the application; for example, users of certain consumer streaming applications may tolerate higher distortion than users of television distribution applications. The compression ratio achievable can reflect that: higher allowable/tolerable distortion can yield higher compression ratios. Motion compensation can be a lossy compression technique and can relate to techniques where a block of sample data from a previously reconstructed picture or part thereof (reference picture), after being spatially shifted in a direction indicated by a motion vector (MV henceforth), is used for the prediction of a newly reconstructed picture or picture part. In some cases, the reference picture can be the same as the picture currently under reconstruction. MVs can have two dimensions X and Y, or three dimensions, the third being an indication of the reference picture in use (the latter, indirectly, can be a time dimension). In some video compression techniques, an MV applicable to a certain area of sample data can be predicted from other MVs, for example from those related to another area of sample data spatially adjacent to the area under reconstruction, and preceding that MV in decoding order. Doing so can substantially reduce the amount of data required for coding the MV, thereby removing redundancy and increasing compression. 
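The idea of predicting an MV from spatially adjacent MVs can be illustrated with a component-wise median predictor. This is a simplified sketch for intuition only; actual codecs such as H.265 build more elaborate candidate lists, and the function below is hypothetical:

```python
def median_mv_predictor(neighbor_mvs):
    """Predict an MV as the component-wise median of neighboring MVs.

    A simple illustration of spatial MV prediction, not the mechanism
    of any particular codec.
    """
    xs = sorted(mv[0] for mv in neighbor_mvs)
    ys = sorted(mv[1] for mv in neighbor_mvs)
    mid = len(neighbor_mvs) // 2
    return xs[mid], ys[mid]

# Only the (usually small) difference to the predictor is coded, which
# entropy-codes into fewer bits than the MV itself.
pred = median_mv_predictor([(4, 2), (5, 2), (4, 3)])
actual = (5, 2)
mvd = (actual[0] - pred[0], actual[1] - pred[1])
```

Because natural-video motion is locally coherent, the difference `mvd` is usually near zero, which is exactly the redundancy the text describes removing.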
MV prediction can work effectively, for example, because when coding an input video signal derived from a camera (known as natural video) there is a statistical likelihood that areas larger than the area to which a single MV is applicable move in a similar direction and, therefore, can in some cases be predicted using a similar motion vector derived from the MVs of neighboring areas. That results in the MV found for a given area being similar or the same as the MV predicted from the surrounding MVs, which in turn can be represented, after entropy coding, in a smaller number of bits than would be used if coding the MV directly. In some cases, MV prediction can be an example of lossless compression of a signal (namely: the MVs) derived from the original signal (namely: the sample stream). In other cases, MV prediction itself can be lossy, for example because of rounding errors when calculating a predictor from several surrounding MVs. Various MV prediction mechanisms are described in H.265/HEVC (ITU-T Rec. H.265, "High Efficiency Video Coding", December 2016). Out of the many MV prediction mechanisms that H.265 offers, described here is a technique henceforth referred