US-12627809-B2 - History-based motion vector prediction


Abstract

In a method of video decoding performed by a decoder, a current picture is obtained from a coded video bitstream. The current picture is divided into a plurality of units and divided into a plurality of tiles. Each tile includes at least one unit of the plurality of units. A first current unit in a first tile of the plurality of tiles is decoded. A first HMVP buffer is updated with a motion vector of the first current unit that has been decoded. A position of the first current unit in the first tile of the plurality of tiles is determined. The first HMVP buffer is reset when the first current unit is located in a first column of the first tile.
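The sketch below shows how the position test works in one tile's decoding loop. It is a minimal Python sketch under several assumptions:

- Units are CTU-like squares addressed by picture coordinates.
- Tiles are rectangles of units scanned in raster order.
- A motion vector is an `(mvx, mvy)` tuple.

`Tile`, `decode_tile`, `decode_unit`, and `HMVP_SIZE` are hypothetical names. The capacity of 6 is an illustrative value that does not come from this text. The abstract also does not say whether the reset happens before or after the first-column unit's own motion vector is added. The sketch assumes the buffer is cleared just before that unit is decoded, so that unit's motion vector seeds the fresh buffer.

```python
from collections import deque
from dataclasses import dataclass

HMVP_SIZE = 6  # hypothetical buffer capacity, not taken from the text


@dataclass
class Tile:
    """A tile as a rectangle of units (e.g. CTUs), in picture unit coordinates."""
    col0: int
    row0: int
    width: int
    height: int


def decode_tile(tile, decode_unit):
    """Walk a tile's units in raster order while maintaining one HMVP buffer.

    decode_unit(x, y, hmvp) is a caller-supplied (hypothetical) function that
    decodes the unit at picture position (x, y) using the current HMVP
    candidates and returns its motion vector, or None for a non-inter unit.
    Returns the HMVP contents available to each unit when it was decoded.
    """
    hmvp = deque(maxlen=HMVP_SIZE)
    seen = {}
    for y in range(tile.row0, tile.row0 + tile.height):
        for x in range(tile.col0, tile.col0 + tile.width):
            # The test uses the column inside the tile, not inside the picture.
            if x - tile.col0 == 0:
                hmvp.clear()  # reset at the first column of the tile
            seen[(x, y)] = list(hmvp)
            mv = decode_unit(x, y, list(hmvp))
            if mv is not None:
                hmvp.append(mv)  # deque(maxlen=...) drops the oldest entry when full
    return seen
```

For example, in a tile that starts at picture column 4, the reset fires at picture column 4 rather than column 0. Each row of the tile therefore begins with an empty history, wherever the tile sits in the picture.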

Inventors

Xiaozhong Xu
Xiang Li
Shan Liu

Assignees

Tencent America LLC

Dates

Publication Date: 20260512
Application Date: 20240919

Claims (20)

1. A method of video decoding performed by a decoder, the method comprising: obtaining a current picture from a coded video bitstream, the current picture being divided into a plurality of units and divided into a plurality of tiles, each tile including at least one unit of the plurality of units; decoding a first current unit in a first tile of the plurality of tiles; updating a first HMVP buffer with a motion vector of the first current unit that has been decoded; determining a position of the first current unit in the first tile of the plurality of tiles; and resetting the first HMVP buffer when the first current unit is located in a first column of the first tile.
2. The method according to claim 1, further comprising: when the first current unit is located in a top column of the first tile, copying contents of a first row buffer to the first HMVP buffer.
3. The method according to claim 2, further comprising: when the first current unit is located in a last column of the first tile, copying contents of the first HMVP buffer to the first row buffer.
4. The method of claim 1, further comprising: decoding a second current unit in a second tile of the plurality of tiles; updating a second HMVP buffer with a motion vector of the second current unit that has been decoded; determining a position of the second current unit in the second tile of the plurality of tiles; and resetting the second HMVP buffer when the second current unit is located in a first column of the second tile.
5. The method according to claim 4, further comprising: when the second current unit is located in a first column of the second tile, copying contents of a second row buffer to the second HMVP buffer.
6. The method according to claim 5, further comprising: when the second current unit is located in a last column of the second tile, copying contents of the second HMVP buffer to the second row buffer.
7. The method according to claim 4, wherein the decoding of the first current unit is performed in parallel with the decoding of the second current unit.
8. The method according to claim 4, wherein the first HMVP buffer and the second HMVP buffer are first-in-first-out (FIFO) buffers; the updating the first HMVP buffer with the motion vector of the decoded first current unit includes storing the motion vector in a last entry of the first HMVP buffer and deleting a first entry of the first HMVP buffer; and the updating the second HMVP buffer with the motion vector of the decoded second current unit includes storing the motion vector in a last entry of the second HMVP buffer and deleting a first entry of the second HMVP buffer.
9. A method of video encoding performed by an encoder, the method comprising: obtaining a current picture, the current picture being divided into a plurality of units and divided into a plurality of tiles, each tile including at least one unit of the plurality of units; encoding a first current unit in a first tile of the plurality of tiles; updating a first HMVP buffer with a motion vector of the first current unit that has been encoded; determining a position of the first current unit in the first tile of the plurality of tiles; and resetting the first HMVP buffer when the first current unit is located in a first column of the first tile.
10. The method according to claim 9, further comprising: when the first current unit is located in a top column of the first tile, copying contents of a first row buffer to the first HMVP buffer.
11. The method according to claim 10, further comprising: when the first current unit is located in a last column of the first tile, copying contents of the first HMVP buffer to the first row buffer.
12. The method of claim 9, further comprising: encoding a second current unit in a second tile of the plurality of tiles; updating a second HMVP buffer with a motion vector of the second current unit that has been encoded; determining a position of the second current unit in the second tile of the plurality of tiles; and resetting the second HMVP buffer when the second current unit is located in a first column of the second tile.
13. The method according to claim 12, further comprising: when the second current unit is located in a first column of the second tile, copying contents of a second row buffer to the second HMVP buffer.
14. The method according to claim 13, further comprising: when the second current unit is located in a last column of the second tile, copying contents of the second HMVP buffer to the second row buffer.
15. The method according to claim 12, wherein the encoding of the first current unit is performed in parallel with the encoding of the second current unit.
16. The method according to claim 12, wherein the first HMVP buffer and the second HMVP buffer are first-in-first-out (FIFO) buffers; the updating the first HMVP buffer with the motion vector of the encoded first current unit includes storing the motion vector in a last entry of the first HMVP buffer and deleting a first entry of the first HMVP buffer; and the updating the second HMVP buffer with the motion vector of the encoded second current unit includes storing the motion vector in a last entry of the second HMVP buffer and deleting a first entry of the second HMVP buffer.
17. A non-transitory computer-readable storage medium storing instructions which, when executed by a processor, cause the processor to perform a method of encoding a bitstream, comprising: obtaining a current picture, the current picture being divided into a plurality of units and divided into a plurality of tiles, each tile including at least one unit of the plurality of units; encoding, in the bitstream, a first current unit in a first tile of the plurality of tiles; updating a first HMVP buffer with a motion vector of the first current unit that has been encoded; determining a position of the first current unit in the first tile of the plurality of tiles; resetting the first HMVP buffer when the first current unit is located in a first column of the first tile; and transmitting the encoded bitstream.
18. The non-transitory computer-readable storage medium according to claim 17, wherein the method further comprises: when the first current unit is located in a top column of the first tile, copying contents of a first row buffer to the first HMVP buffer.
19. The non-transitory computer-readable storage medium according to claim 18, wherein the method further comprises: when the first current unit is located in a last column of the first tile, copying contents of the first HMVP buffer to the first row buffer.
20. The non-transitory computer-readable storage medium according to claim 17, wherein the method further comprises: encoding, in the bitstream, a second current unit in a second tile of the plurality of tiles; updating a second HMVP buffer with a motion vector of the second current unit that has been encoded; determining a position of the second current unit in the second tile of the plurality of tiles; and resetting the second HMVP buffer when the second current unit is located in a first column of the second tile.
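The dependent claims add three details:

- A fixed-capacity FIFO update (claims 8 and 16).
- Per-tile row buffers that are read at the first column and written at the last column (claims 5 and 6, and claims 13 and 14).
- Independent per-tile buffers that allow two tiles to be processed in parallel (claims 7 and 15).

The Python sketch below is a literal reading of those claims under stated assumptions, not the patented implementation. The capacity, the tile-local `(x, y)` coordinates, and the names `HmvpFifo`, `code_tile`, and `code_tiles_in_parallel` are hypothetical. The claims do not fix the order of the reset and the row-buffer copy at the first column. The sketch makes three assumptions about that order:

- The reset happens first, then the copy.
- The row buffer starts empty.
- The row buffer is written after the last-column unit has updated the HMVP buffer.

```python
from concurrent.futures import ThreadPoolExecutor


class HmvpFifo:
    """Fixed-capacity FIFO of motion vectors (the capacity is illustrative)."""

    def __init__(self, capacity=6):
        self.capacity = capacity
        self.entries = []

    def update(self, mv):
        # Store the new MV as the last entry; when over capacity, delete the first.
        self.entries.append(mv)
        if len(self.entries) > self.capacity:
            del self.entries[0]

    def reset(self):
        self.entries.clear()

    def load(self, entries):
        # Copy in at most `capacity` of the most recent entries.
        self.entries = list(entries)[-self.capacity:]


def code_tile(tile_id, width, height, mvs, use_row_buffer=True, capacity=6):
    """Process one tile with its own HMVP buffer and its own row buffer.

    mvs maps tile-local (x, y) to that unit's motion vector. Returns the tile
    id and the HMVP contents available to each unit before it is coded.
    """
    hmvp = HmvpFifo(capacity)
    row_buffer = []  # assumption: the row buffer starts empty
    available = {}
    for y in range(height):
        for x in range(width):
            if x == 0:  # first column of the tile
                hmvp.reset()
                if use_row_buffer:
                    hmvp.load(row_buffer)  # row buffer -> HMVP buffer
            available[(x, y)] = list(hmvp.entries)
            hmvp.update(mvs[(x, y)])
            if x == width - 1 and use_row_buffer:  # last column of the tile
                row_buffer = list(hmvp.entries)  # HMVP buffer -> row buffer
    return tile_id, available


def code_tiles_in_parallel(tiles):
    """tiles is a list of (tile_id, width, height, mvs); tiles share no HMVP state."""
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(lambda t: code_tile(*t), tiles))
```

Read this way, enabling the row buffer means each tile row starts from the previous row's final history instead of an empty buffer. With `use_row_buffer=False`, only the reset applies.

The threads are there only to show that no HMVP state crosses a tile boundary. Under CPython's global interpreter lock they will not speed up CPU-bound work.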

Description

INCORPORATION BY REFERENCE

The present application is a continuation of U.S. application Ser. No. 18/153,489, “HISTORY-BASED MOTION VECTOR PREDICTION” filed on Jan. 12, 2023, which is a continuation of U.S. application Ser. No. 17/135,441, “METHOD AND APPARATUS FOR HISTORY-BASED MOTION VECTOR PREDICTION” filed on Dec. 28, 2020, now U.S. Pat. No. 11,589,054, which is a continuation of U.S. application Ser. No. 16/653,448, “METHOD AND APPARATUS FOR HISTORY-BASED MOTION VECTOR PREDICTION” filed on Oct. 15, 2019, now U.S. Pat. No. 10,911,760, which is a continuation of U.S. application Ser. No. 16/203,364, “METHOD AND APPARATUS FOR HISTORY-BASED MOTION VECTOR PREDICTION” filed on Nov. 28, 2018, now U.S. Pat. No. 10,491,902, which claims the benefit of priority to U.S. Provisional Application No. 62/698,559, “METHOD AND APPARATUS FOR HISTORY-BASED MOTION VECTOR PREDICTION” filed on Jul. 16, 2018. All of these applications are hereby incorporated by reference in their entirety.

TECHNICAL FIELD

The present disclosure describes embodiments generally related to video coding.

BACKGROUND

The background description provided herein is for the purpose of generally presenting the context of the disclosure. This section describes work of the presently named inventors, to the extent the work is described here. It also describes aspects of the description that may not otherwise qualify as prior art at the time of filing. None of this material is expressly or impliedly admitted as prior art against the present disclosure.

Video coding and decoding using inter-picture prediction with motion compensation has been known for decades. Uncompressed digital video can include a series of pictures, each picture having a spatial dimension of, for example, 1920×1080 luminance samples and associated chrominance samples. The series of pictures can have a fixed or variable picture rate (informally also known as frame rate) of, for example, 60 pictures per second or 60 Hz. Uncompressed video has significant bitrate requirements.
For example, 1080p60 4:2:0 video at 8 bits per sample (1920×1080 luminance sample resolution at 60 Hz frame rate) requires close to 1.5 Gbit/s bandwidth. An hour of such video requires more than 600 GByte of storage space.

One purpose of video coding and decoding can be the reduction of redundancy in the input video signal, through compression. Compression can help reduce the aforementioned bandwidth or storage space requirements, in some cases by two orders of magnitude or more. Both lossless and lossy compression, as well as a combination thereof, can be employed. Lossless compression refers to techniques where an exact copy of the original signal can be reconstructed from the compressed signal. When using lossy compression, the reconstructed signal may not be identical to the original signal. However, the distortion between the original and reconstructed signals is small enough to make the reconstructed signal useful for the intended application.

In the case of video, lossy compression is widely employed. The amount of distortion tolerated depends on the application. For example, users of certain consumer streaming applications may tolerate higher distortion than users of television contribution applications. The achievable compression ratio can reflect that: higher allowable/tolerable distortion can yield higher compression ratios.

Motion compensation can be a lossy compression technique. It can relate to techniques where a block of sample data from a previously reconstructed picture or part thereof (reference picture) is spatially shifted in a direction indicated by a motion vector (MV henceforth). The shifted block is then used for the prediction of a newly reconstructed picture or picture part. In some cases, the reference picture can be the same as the picture currently under reconstruction. MVs can have two dimensions X and Y, or three dimensions, the third being an indication of the reference picture in use (the latter, indirectly, can be a time dimension).
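The bandwidth and storage figures above can be checked with a little arithmetic, and the block-shift idea behind motion compensation fits in a few lines. This Python sketch makes three assumptions:

- The video uses 4:2:0 sampling, so each of the two chroma planes has half the luma width and half the luma height.
- The motion vector moves the block by whole samples.
- The displaced block stays inside the reference picture.

Real codecs also use fractional motion vectors with interpolation filters and handle blocks near picture edges; this sketch omits both. `raw_bitrate_bps`, `bytes_per_hour`, and `predict_block` are hypothetical names.

```python
def raw_bitrate_bps(width, height, fps, bits_per_sample):
    """Bit rate of uncompressed 4:2:0 video, in bits per second."""
    luma = width * height
    # 4:2:0: two chroma planes, each subsampled by 2 horizontally and vertically.
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits_per_sample * fps


def bytes_per_hour(bps):
    """Storage needed for one hour at the given bit rate, in bytes."""
    return bps * 3600 // 8


def predict_block(ref, x, y, w, h, mv):
    """Motion-compensated prediction of the w x h block at (x, y).

    ref is the reference picture as a list of sample rows and mv = (dx, dy)
    is an integer displacement: the prediction is the block of ref whose
    top-left corner is (x + dx, y + dy). No bounds checking is done.
    """
    dx, dy = mv
    return [row[x + dx:x + dx + w] for row in ref[y + dy:y + dy + h]]


if __name__ == "__main__":
    bps = raw_bitrate_bps(1920, 1080, 60, 8)
    print(bps, bytes_per_hour(bps))  # 1492992000 671846400000
    ref = [[10 * r + c for c in range(4)] for r in range(4)]
    print(predict_block(ref, 0, 0, 2, 2, (1, 2)))  # [[21, 22], [31, 32]]
```

Under these assumptions, 1080p60 at 8 bits per sample comes to 1,492,992,000 bit/s, about 1.49 Gbit/s. One hour then takes 671,846,400,000 bytes, about 672 GB. Both results match the "close to 1.5 Gbit/s" and "more than 600 GByte" figures.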
In some video compression techniques, an MV applicable to a certain area of sample data can be predicted from other MVs. For example, it can be predicted from MVs that relate to another area of sample data spatially adjacent to the area under reconstruction and that precede that MV in decoding order. Doing so can substantially reduce the amount of data required for coding the MV, thereby removing redundancy and increasing compression. MV prediction can work effectively, for example, when coding an input video signal derived from a camera (known as natural video). In such video there is a statistical likelihood that areas larger than the area to which a single MV is applicable move in a similar direction. Such areas can therefore, in some cases, be predicted using a similar motion vector derived from neighboring areas' MVs. As a result, the MV found for a given area is similar to or the same as the MV predicted from the surrounding MVs, and that in turn can be represented, after entropy coding, in a smaller number of bits than what would be use