CN-118749195-B - Method and apparatus for video encoding and decoding, and storage medium

Abstract

Aspects of the present application provide a method and apparatus for video encoding and decoding, and a storage medium. The video decoding method includes receiving an encoded video bitstream that includes a current block in a current picture. The current block includes a plurality of sub-blocks and is to be predicted in a sub-block-based temporal motion vector prediction (SbTMVP) mode. A respective collocated reference sub-block is determined for each sub-block based on a combination of a displacement vector (DV) and a motion vector offset (MVO) associated with the respective sub-block. A motion vector (MV) field in the collocated reference sub-block corresponding to each sub-block of the current block is determined. A respective reference template for each sub-block is derived based on the determined MV field of the collocated reference sub-block. The plurality of sub-blocks of the current block are reconstructed by predicting each sub-block in the SbTMVP mode using the respective reference template.
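The per-sub-block addressing described in the abstract can be illustrated with a minimal Python sketch. It assumes integer-sample vectors, square sub-blocks, one block-level DV shared by all sub-blocks plus an optional per-sub-block MVO, and a motion field stored on an 8x8 grid as a dictionary keyed by grid-aligned top-left positions. These assumptions and the names `collocated_positions` and `fetch_mv` are hypothetical illustrations, not the method defined in the claims.

```python
from typing import Dict, Optional, Tuple

Vec = Tuple[int, int]  # (x, y) in integer luma samples; real codecs use fractional MV precision


def collocated_positions(block_pos: Vec, block_size: Vec, sub: int,
                         dv: Vec, mvos: Dict[Vec, Vec]) -> Dict[Vec, Vec]:
    """Top-left position of each sub-block's collocated reference sub-block.

    Sub-block (i, j) is displaced by the shared DV plus its own MVO
    (treated as zero when no offset is given for that sub-block).
    """
    (bx, by), (bw, bh) = block_pos, block_size
    positions = {}
    for j in range(bh // sub):
        for i in range(bw // sub):
            ox, oy = mvos.get((i, j), (0, 0))
            positions[(i, j)] = (bx + i * sub + dv[0] + ox,
                                 by + j * sub + dv[1] + oy)
    return positions


def fetch_mv(motion_field: Dict[Vec, Vec], pos: Vec, grid: int = 8) -> Optional[Vec]:
    """MV stored for the grid cell covering `pos`, or None if that cell has no MV."""
    key = ((pos[0] // grid) * grid, (pos[1] // grid) * grid)
    return motion_field.get(key)


# A 16x16 block at (16, 32) with 8x8 sub-blocks, a DV of (4, -8),
# and an extra offset of (2, 2) for sub-block (1, 0) only.
pos = collocated_positions((16, 32), (16, 16), 8, (4, -8), {(1, 0): (2, 2)})
print(pos)  # {(0, 0): (20, 24), (1, 0): (30, 26), (0, 1): (20, 32), (1, 1): (28, 32)}
field = {(16, 24): (3, -1)}
print(fetch_mv(field, pos[(0, 0)]))  # (3, -1)
```

The MV fetched at each collocated position would then locate that sub-block's reference template and prediction. A practical decoder may also need to scale or convert the fetched MV to the current picture's reference, a step this sketch omits.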

Inventors

ZHAO XIN
CHEN LIANFEI
GAO HAN
LI GUICHUN
LIU SHAN

Assignees

Tencent America LLC

Dates

Publication Date: 20260512
Application Date: 20221111
Priority Date: 20221109

Claims (17)

1. A decoding method performed in a decoder, comprising: receiving an encoded video bitstream comprising a current block in a current picture, wherein the current block comprises a plurality of sub-blocks and is predicted in a sub-block-based temporal motion vector prediction (SbTMVP) mode; determining a respective collocated reference sub-block for each sub-block based on a combination of a displacement vector (DV) and a motion vector offset (MVO) associated with the respective sub-block; determining a motion vector (MV) field in the collocated reference sub-block corresponding to each sub-block in the current block; deriving a respective reference template for each sub-block based on the determined MV field of the collocated reference sub-block; and reconstructing the plurality of sub-blocks of the current block by predicting each sub-block in the SbTMVP mode using the respective reference template.
2. The method of claim 1, wherein determining the respective collocated reference sub-block further comprises: determining a search area, wherein the search area is located in one of the current picture and a reference picture of the current picture; determining one or more reference blocks of the current block based on template matching between a template of the current block and a template of each of one or more reference blocks in the search area, the template of the current block including samples adjacent to the current block, and the template of each of the one or more reference blocks including samples adjacent to the corresponding reference block; and determining the collocated reference sub-block corresponding to each sub-block as the sub-block, in one of the one or more reference blocks, that is collocated with that sub-block.
3. The method of claim 2, wherein the template matching between the template of the current block and the template of each of the one or more reference blocks is based on one of a sum of absolute differences (SAD), a sum of absolute transformed differences (SATD), a sum of squared errors (SSE), a sub-sampled SAD, and a mean-removed SAD.
4. The method of claim 3, wherein determining the one or more reference blocks further comprises: determining a plurality of candidate reference blocks in the search area; determining a plurality of cost values based on the template matching between the template of the current block and the templates of the plurality of candidate reference blocks; and determining the one or more reference blocks to be the one or more candidate reference blocks, of the plurality of candidate reference blocks, that correspond to the one or more lowest cost values of the plurality of cost values.
5. The method of claim 2, wherein the search area comprises one of (i) an area centered at a location collocated with the current block in the reference picture, and (ii) an area centered at the current block in the current picture.
6. The method of claim 2, wherein determining the search area further comprises: determining the search area based on a displacement vector (DV) that is derived from one of (i) a motion vector of a spatial neighboring block of the current block and (ii) a motion vector in a merge candidate list of the current block.
7. The method of claim 6, wherein determining the search area further comprises: determining the search area as an area centered on the sample indicated by the DV, the area being one of rectangular and diamond-shaped.
8. The method of claim 6, wherein determining the search area further comprises: determining the search area as a set of samples centered on the sample indicated by the DV, the set of samples being located along at least one of the 0-degree, 45-degree, 90-degree, and 135-degree directions relative to the sample indicated by the DV.
9. The method of claim 2, wherein determining the one or more reference blocks further comprises: determining a first reference block of the one or more reference blocks, the first reference block being indicated by a first displacement vector (DV) from the template of the current block to a template of the first reference block, wherein the first DV is one of (i) derived based on the template matching such that the first DV corresponds to a cost value associated with a difference between the template of the first reference block and the template of the current block, and (ii) signaled.
10. The method of claim 2, wherein: determining the search area further comprises determining the search area based on a first displacement vector (DV) derived prior to the template matching; and determining the one or more reference blocks further comprises determining that a first reference block of the one or more reference blocks is indicated by a second DV from the template of the current block to a template of the first reference block, the second DV being derived based on the template matching such that the second DV corresponds to a cost value associated with a difference between the template of the first reference block and the template of the current block.
11. The method of claim 2, wherein reconstructing the plurality of sub-blocks of the current block further comprises: determining one or more MVs for a first sub-block of the plurality of sub-blocks in the current block based on one or more motion vectors (MVs) of sub-blocks, in the one or more reference blocks, that are collocated with the first sub-block; determining one or more prediction sub-blocks of the first sub-block based on the one or more MVs of the first sub-block; and determining a prediction sample of the first sub-block based on a combination or a weighted combination of the one or more prediction sub-blocks.
12. The method of claim 2, further comprising: determining a plurality of candidate reference blocks of the current block based on a plurality of displacement vectors (DVs), each candidate reference block of the plurality of candidate reference blocks being indicated by a respective DV of the plurality of DVs; and determining the one or more reference blocks of the current block from the plurality of candidate reference blocks based on one or more cost values of the template matching.
13. An apparatus for performing decoding, comprising: processing circuitry configured to perform the method of any one of claims 1 to 12.
14. An encoding method, comprising: determining a search area, wherein the search area is located in one of a current picture and a reference picture of the current picture, and the current picture comprises a current block; determining one or more reference blocks of the current block based on template matching between a template of the current block and a template of each of the one or more reference blocks in the search area, wherein the template of the current block includes samples adjacent to the current block, and the template of each of the one or more reference blocks includes samples adjacent to the corresponding reference block; and generating sub-block prediction samples of the current block based on sub-blocks of the determined one or more reference blocks.
15. An encoding apparatus, comprising processing circuitry configured to: determine a search area, wherein the search area is located in one of a current picture and a reference picture of the current picture, and the current picture comprises a current block; determine one or more reference blocks of the current block based on template matching between a template of the current block and a template of each of the one or more reference blocks in the search area, wherein the template of the current block includes samples adjacent to the current block, and the template of each of the one or more reference blocks includes samples adjacent to the corresponding reference block; and generate sub-block prediction samples of the current block based on sub-blocks of the determined one or more reference blocks.
16. A computer-readable storage medium storing a program that, when run by a processor, performs the method of any one of claims 1 to 12 or 14.
17. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 12 or 14.
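
The cost measures listed in claim 3 can be sketched in Python as follows. The sketch assumes templates are given as equally sized 2-D lists of integer samples. Row sub-sampling for the sub-sampled SAD, an unnormalized 4x4 Hadamard transform for the SATD, and all function names are illustrative assumptions, since the claims do not fix these details.

```python
from typing import List

Template = List[List[int]]  # equally sized rows of integer samples

# 4x4 Hadamard matrix (symmetric, entries +/-1).
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]


def _diffs(a: Template, b: Template) -> List[int]:
    return [x - y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]


def sad(a: Template, b: Template) -> int:
    return sum(abs(d) for d in _diffs(a, b))


def sse(a: Template, b: Template) -> int:
    return sum(d * d for d in _diffs(a, b))


def subsampled_sad(a: Template, b: Template, step: int = 2) -> int:
    # Only every `step`-th row contributes (row sub-sampling is an assumption).
    return sad(a[::step], b[::step])


def mean_removed_sad(a: Template, b: Template) -> float:
    # Subtracting the mean difference cancels a uniform brightness offset.
    d = _diffs(a, b)
    mean = sum(d) / len(d)
    return sum(abs(v - mean) for v in d)


def satd(a: Template, b: Template) -> int:
    """Unnormalized SATD over 4x4 tiles; both dimensions must be multiples of 4."""
    total = 0
    for y0 in range(0, len(a), 4):
        for x0 in range(0, len(a[0]), 4):
            d = [[a[y0 + r][x0 + c] - b[y0 + r][x0 + c] for c in range(4)]
                 for r in range(4)]
            # t = H4 * d * H4^T, computed as two matrix products.
            hd = [[sum(H4[r][k] * d[k][c] for k in range(4)) for c in range(4)]
                  for r in range(4)]
            t = [[sum(hd[r][k] * H4[c][k] for k in range(4)) for c in range(4)]
                 for r in range(4)]
            total += sum(abs(v) for row in t for v in row)
    return total


cur = [[1, 2], [3, 4]]
ref = [[0, 0], [0, 0]]
print(sad(cur, ref), sse(cur, ref), subsampled_sad(cur, ref), mean_removed_sad(cur, ref))
# 10 30 3 4.0
print(mean_removed_sad([[5, 5], [5, 5]], [[3, 3], [3, 3]]))  # 0.0
print(satd([[1] * 4 for _ in range(4)], [[0] * 4 for _ in range(4)]))  # 16
```

The second print shows why a mean-removed SAD can be useful: two templates that differ only by a constant brightness offset score zero, while a plain SAD would penalize them.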

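Claims 4 to 8 describe building a search area around the sample indicated by a DV and keeping the lowest-cost candidates. The Python sketch below illustrates that step under several assumptions: integer-sample positions, image coordinates with y growing downward, the 0/45/90/135-degree pattern interpreted as lines through the center in both directions, and a caller-supplied cost function (for example, one of the template-matching costs of claim 3). The function names, the toy cost, and the search radius are hypothetical.

```python
import heapq
from typing import Callable, Iterable, List, Tuple

Pos = Tuple[int, int]

# Unit steps for the claimed directions, with y growing downward (image coordinates).
DIRECTIONS = {0: (1, 0), 45: (1, -1), 90: (0, -1), 135: (-1, -1)}


def rect_area(center: Pos, radius: int) -> List[Pos]:
    """All positions within a (2*radius+1) x (2*radius+1) square around center."""
    cx, cy = center
    return [(cx + dx, cy + dy)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]


def diamond_area(center: Pos, radius: int) -> List[Pos]:
    """Positions whose L1 distance from center is at most radius."""
    return [(x, y) for (x, y) in rect_area(center, radius)
            if abs(x - center[0]) + abs(y - center[1]) <= radius]


def directional_area(center: Pos, radius: int,
                     angles: Iterable[int] = (0, 45, 90, 135)) -> List[Pos]:
    """Center plus positions on lines through it at the given angles."""
    cx, cy = center
    points = {center}
    for angle in angles:
        ux, uy = DIRECTIONS[angle]
        for s in range(1, radius + 1):
            points.add((cx + s * ux, cy + s * uy))
            points.add((cx - s * ux, cy - s * uy))
    return sorted(points)


def best_candidates(positions: Iterable[Pos], cost: Callable[[Pos], float],
                    k: int = 1) -> List[Pos]:
    """The k candidate positions with the lowest template-matching cost."""
    return heapq.nsmallest(k, positions, key=cost)


def toy_cost(p: Pos) -> int:
    # Stand-in for a template-matching cost; pretends the true match is at (11, 4).
    return (p[0] - 11) ** 2 + (p[1] - 4) ** 2


center = (10, 5)  # sample indicated by the DV
print(len(rect_area(center, 2)), len(diamond_area(center, 2)), len(directional_area(center, 2)))
# 25 13 17
print(best_candidates(diamond_area(center, 2), toy_cost, k=1))  # [(11, 4)]
```

In this sketch, the offset from the current block's position to the winning candidate plays the role of a template-matched DV of the kind described in claims 9 and 10. Because the templates consist of already-reconstructed samples, an encoder and a decoder running the same search would reach the same candidate without that DV being signaled.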
Description

Priority

The present application claims the benefit of priority to U.S. Patent Application No. 17/983,866, "SUBBLOCK-BASED MOTION VECTOR PREDICTOR WITH MV OFFSET DERIVED BY TEMPLATE MATCHING," filed on November 9, 2022, which claims the benefit of priority to U.S. Provisional Application No. 63/344,840, "Subblock Based Motion Vector Predictor With MV Offset Derived By Template Matching," filed on May 23, 2022.

Technical Field

This disclosure describes embodiments generally related to video coding.

Background

The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent the work is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Uncompressed digital images and/or video can include a series of pictures, each picture having a spatial dimension of, for example, 1920 x 1080 luma samples and associated chroma samples. The series of pictures can have a fixed or variable picture rate (informally also known as frame rate) of, for example, 60 pictures per second or 60 Hz. Uncompressed images and/or video have significant bitrate requirements. For example, 1080p60 4:2:0 video (1920 x 1080 luma sample resolution at a 60 Hz frame rate) at 8 bits per sample requires close to 1.5 Gbit/s of bandwidth. An hour of such video requires more than 600 GB of storage space. One purpose of image and/or video coding is to reduce redundancy in the input video signal through compression.
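The bandwidth and storage figures above follow from simple arithmetic. The Python sketch below assumes 4:2:0 sampling, in which each of the two chroma planes has a quarter as many samples as the luma plane, so each picture carries 1.5 samples per luma position. The function name `raw_bitrate` is illustrative.

```python
def raw_bitrate(width: int, height: int, fps: int, bits_per_sample: int,
                chroma_factor: float = 1.5) -> float:
    """Uncompressed bitrate in bit/s; 4:2:0 gives 1 + 2 * 0.25 = 1.5 samples per luma position."""
    samples_per_picture = width * height * chroma_factor
    return samples_per_picture * bits_per_sample * fps


bps = raw_bitrate(1920, 1080, 60, 8)
print(f"{bps / 1e9:.2f} Gbit/s")                   # 1.49 Gbit/s
print(f"{bps * 3600 / 8 / 1e9:.0f} GB per hour")   # 672 GB per hour
```

The result, about 1.49 Gbit/s and roughly 672 GB (decimal gigabytes) per hour, matches the "close to 1.5 Gbit/s" and "more than 600 GB" figures in the text.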
Compression can help reduce the bandwidth and/or storage space requirements mentioned above, in some cases by two orders of magnitude or more. Although the description herein uses video encoding/decoding as an illustrative example, the same techniques can be applied to image encoding/decoding in a similar manner without departing from the spirit of the present disclosure. Both lossless and lossy compression, as well as combinations thereof, can be employed. Lossless compression refers to techniques where an exact copy of the original signal can be reconstructed from the compressed signal. When lossy compression is used, the reconstructed signal may not be identical to the original signal, but the distortion between the original and reconstructed signals is small enough to make the reconstructed signal useful for the intended application. In the case of video, lossy compression is widely employed. The amount of distortion tolerated depends on the application; for example, users of certain consumer streaming applications may tolerate higher distortion than users of television applications. The achievable compression ratio reflects this: higher allowable/tolerable distortion can yield higher compression ratios.
Video encoders and decoders can utilize techniques from several broad categories, including, for example, motion compensation, transform processing, quantization, and entropy coding.
Video codec technologies can include techniques known as intra coding. In intra coding, sample values are represented without reference to samples or other data from previously reconstructed reference pictures. In some video codecs, a picture is spatially subdivided into blocks of samples. When all blocks of samples are coded in intra mode, that picture can be an intra picture.
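The trade-off between distortion and compression ratio can be illustrated with a uniform scalar quantizer, a simplified stand-in for the quantization stage named above (practical codecs add dead zones, rate-distortion optimized rounding, and transform-domain scaling). In this hypothetical Python sketch, a larger step size maps samples onto smaller levels, which an entropy coder can generally represent with fewer bits, at the cost of a reconstruction error bounded by half the step. A step of 1 on integer samples reconstructs them exactly, mimicking the lossless case.

```python
from typing import List


def quantize(samples: List[int], step: int) -> List[int]:
    # Round to the nearest level (Python's round() sends exact .5 ties to the even level).
    return [round(s / step) for s in samples]


def dequantize(levels: List[int], step: int) -> List[int]:
    return [level * step for level in levels]


def max_error(samples: List[int], step: int) -> int:
    recon = dequantize(quantize(samples, step), step)
    return max(abs(s - r) for s, r in zip(samples, recon))


samples = [12, 13, 15, 40, 41, 90]
for step in (1, 4, 16):
    print(step, quantize(samples, step), max_error(samples, step))
# 1 [12, 13, 15, 40, 41, 90] 0
# 4 [3, 3, 4, 10, 10, 22] 2
# 16 [1, 1, 1, 2, 3, 6] 8
```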
Intra pictures and their derivatives (e.g., independent decoder refresh pictures) can be used to reset the decoder state and can, therefore, be used as the first picture in a coded video bitstream and a video session, or as a still image. Samples of an intra block can be transformed, and the transform coefficients can be quantized before entropy coding. Intra prediction can be a technique that minimizes sample values in the pre-transform domain. In some cases, the smaller the DC value after a transform and the smaller the AC coefficients, the fewer bits are required at a given quantization step size to represent the block after entropy coding.
Traditional intra coding, such as that known from MPEG-2 generation coding technologies, does not use intra prediction. However, some newer video compression technologies include techniques that attempt to perform prediction based on, for example, surrounding sample data and/or metadata obtained during the encoding and/or decoding of blocks of data. Such techniques are hereinafter referred to as "intra prediction" techniques. Note that, in at least some cases, intra prediction uses only reference data from the current picture under reconstruction and not reference data from reference pictures. There may be many di