US-12621462-B2 - Methods and devices for candidate derivation for affine merge mode in video coding

Abstract

Methods for video decoding and encoding, apparatuses, and non-transitory storage media are provided. In one decoding method, the decoder obtains a first candidate position and a second candidate position. The decoder obtains a third candidate position based on the first and second candidate positions and obtains a virtual block based on the first, second, and third candidate positions. The decoder may obtain a plurality of control point motion vectors (CPMVs) for the virtual block based on translational motion vectors (MVs) at the first, second, and third candidate positions, and project the plurality of CPMVs for the virtual block to a current block to obtain either a translational MV based on a specific position within the current block or a second plurality of CPMVs for the current block.

Inventors

Wei Chen
Xiaoyu Xiu
Yi-Wen Chen
HONG-JHENG JHU
Che-Wei KUO
Ning Yan
Xianglin Wang
Bing Yu

Assignees

Beijing Dajia Internet Information Technology Co., Ltd.

Dates

Publication Date: 2026-05-05
Application Date: 2024-07-01

Claims (18)

1. A method for video decoding, comprising: obtaining, by a decoder and based on a first candidate position and a second candidate position, a third candidate position, wherein the first candidate position and the second candidate position are from a plurality of non-adjacent neighbor positions that are a number of blocks away from one side of a current block; obtaining, by the decoder, a virtual block based on the first candidate position, the second candidate position, and the third candidate position; obtaining, by the decoder, a first plurality of control point motion vectors (CPMVs) for the virtual block based on translational motion vectors (MVs) at the first candidate position, the second candidate position, and the third candidate position; and projecting, by the decoder, the first plurality of CPMVs for the virtual block to the current block, to obtain a translational MV based on a specific position within the current block in response to determining that the current block is coded as a regular inter mode, or to obtain a second plurality of CPMVs for the current block in response to determining that the current block is coded as an affine mode.
2. The method of claim 1, further comprising: inserting, by the decoder, the translational MV into a regular merge list in response to determining that the current block is coded as the regular inter mode; and inserting, by the decoder, the second plurality of CPMVs into an affine merge candidate list or an affine advanced motion vector prediction (AMVP) candidate list in response to determining that the current block is coded as the affine mode.
3. The method of claim 2, further comprising: obtaining and inserting, by the decoder, an additional translational MV into the regular merge list until the regular merge list is full; and obtaining and inserting, by the decoder, additional second plurality of CPMVs into the affine merge candidate list until the affine merge candidate list is full, or into the affine AMVP candidate list until the affine AMVP candidate list is full.
4. The method of claim 1, wherein the specific position comprises a center position within the current block.
5. The method of claim 1, wherein the virtual block is a rectangular coding block and the third candidate position is determined based on a vertical position of the first candidate position and a horizontal position of the second candidate position.
6. The method of claim 1, further comprising: obtaining, by the decoder, the first candidate position according to a first scanning distance and a first scanning area, wherein the first scanning distance indicates a first distance between the first candidate position and a left side of the current block, and the first scanning area is located on the left of the current block; and obtaining, by the decoder, the second candidate position according to a second scanning distance and a second scanning area, wherein the second scanning distance indicates a second distance between the second candidate position and a top side of the current block, and the second scanning area is located above the current block.
7. An apparatus for video decoding, comprising: one or more processors; and a memory coupled to the one or more processors and configured to store instructions executable by the one or more processors, wherein the one or more processors, upon execution of the instructions, are configured to perform operations comprising: obtaining, based on a first candidate position and a second candidate position, a third candidate position, wherein the first candidate position and the second candidate position are from a plurality of non-adjacent neighbor positions that are a number of blocks away from one side of a current block; obtaining a virtual block based on the first candidate position, the second candidate position, and the third candidate position; obtaining a first plurality of control point motion vectors (CPMVs) for the virtual block based on translational motion vectors (MVs) at the first candidate position, the second candidate position, and the third candidate position; and projecting the first plurality of CPMVs for the virtual block to the current block, to obtain a translational MV based on a specific position within the current block in response to determining that the current block is coded as a regular inter mode, or to obtain a second plurality of CPMVs for the current block in response to determining that the current block is coded as an affine mode.
8. The apparatus of claim 7, wherein the operations further comprise: inserting the translational MV into a regular merge list in response to determining that the current block is coded as the regular inter mode; and inserting the second plurality of CPMVs into an affine merge candidate list or an affine advanced motion vector prediction (AMVP) candidate list in response to determining that the current block is coded as the affine mode.
9. The apparatus of claim 8, wherein the operations further comprise: obtaining and inserting an additional translational MV into the regular merge list until the regular merge list is full; and obtaining and inserting additional second plurality of CPMVs into the affine merge candidate list until the affine merge candidate list is full, or into the affine AMVP candidate list until the affine AMVP candidate list is full.
10. The apparatus of claim 7, wherein the specific position comprises a center position within the current block.
11. The apparatus of claim 7, wherein the virtual block is a rectangular coding block and the third candidate position is determined based on a vertical position of the first candidate position and a horizontal position of the second candidate position.
12. The apparatus of claim 7, wherein the operations further comprise: obtaining the first candidate position according to a first scanning distance and a first scanning area, wherein the first scanning distance indicates a first distance between the first candidate position and a left side of the current block, and the first scanning area is located on the left of the current block; and obtaining the second candidate position according to a second scanning distance and a second scanning area, wherein the second scanning distance indicates a second distance between the second candidate position and a top side of the current block, and the second scanning area is located above the current block.
13. A non-transitory computer-readable storage medium for video decoding storing a bitstream to be decoded by operations comprising: obtaining, by a decoder and based on a first candidate position and a second candidate position, a third candidate position, wherein the first candidate position and the second candidate position are from a plurality of non-adjacent neighbor positions that are a number of blocks away from one side of a current block; obtaining, by the decoder, a virtual block based on the first candidate position, the second candidate position, and the third candidate position; obtaining, by the decoder, a first plurality of control point motion vectors (CPMVs) for the virtual block based on translational motion vectors (MVs) at the first candidate position, the second candidate position, and the third candidate position; and projecting, by the decoder, the first plurality of CPMVs for the virtual block to the current block, to obtain a translational MV based on a specific position within the current block in response to determining that the current block is coded as a regular inter mode, or to obtain a second plurality of CPMVs for the current block in response to determining that the current block is coded as an affine mode.
14. The medium of claim 13, wherein the operations further comprise: inserting, by the decoder, the translational MV into a regular merge list in response to determining that the current block is coded as the regular inter mode; and inserting, by the decoder, the second plurality of CPMVs into an affine merge candidate list or an affine advanced motion vector prediction (AMVP) candidate list in response to determining that the current block is coded as the affine mode.
15. The medium of claim 14, wherein the operations further comprise: obtaining and inserting, by the decoder, an additional translational MV into the regular merge list until the regular merge list is full; and obtaining and inserting, by the decoder, additional second plurality of CPMVs into the affine merge candidate list until the affine merge candidate list is full, or into the affine AMVP candidate list until the affine AMVP candidate list is full.
16. The medium of claim 13, wherein the specific position comprises a center position within the current block.
17. The medium of claim 13, wherein the virtual block is a rectangular coding block and the third candidate position is determined based on a vertical position of the first candidate position and a horizontal position of the second candidate position.
18. The medium of claim 13, wherein the operations further comprise: obtaining, by the decoder, the first candidate position according to a first scanning distance and a first scanning area, wherein the first scanning distance indicates a first distance between the first candidate position and a left side of the current block, and the first scanning area is located on the left of the current block; and obtaining, by the decoder, the second candidate position according to a second scanning distance and a second scanning area, wherein the second scanning distance indicates a second distance between the second candidate position and a top side of the current block, and the second scanning area is located above the current block.
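
As an illustrative, non-limiting sketch, the steps recited in claims 1, 4, and 5 may be expressed as follows. Several points are assumptions made only for this example and are not taken from the claims: the names Candidate, third_position, affine_field, and project_to_block are hypothetical; the third position is assumed to take its horizontal coordinate from the second position and its vertical coordinate from the first position, which is one possible reading of claim 5; the motion field of the virtual block is assumed to be linear in x and y; the second plurality of CPMVs is assumed to be sampled at the top-left, top-right, and bottom-left corners of the current block; and floating-point arithmetic is used where a practical codec would likely use fixed-point arithmetic.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MV = Tuple[float, float]


@dataclass(frozen=True)
class Candidate:
    """A candidate position (in samples) and the translational MV stored there."""
    x: float
    y: float
    mv: MV


def third_position(first: Candidate, second: Candidate) -> Tuple[float, float]:
    # Assumed reading of claim 5: vertical coordinate from the first position,
    # horizontal coordinate from the second position.
    return (second.x, first.y)


def affine_field(first: Candidate, second: Candidate,
                 third: Candidate) -> Callable[[float, float], MV]:
    """Build the linear motion field spanned by the three corners of the virtual block."""
    if third.y != first.y or third.x != second.x:
        raise ValueError("third position must share a row with first and a column with second")
    dx = first.x - third.x    # signed width of the virtual block
    dy = second.y - third.y   # signed height of the virtual block
    if dx == 0 or dy == 0:
        raise ValueError("the three positions must span a non-degenerate rectangle")
    # Per-sample change of the MV along x and along y.
    grad_x = ((first.mv[0] - third.mv[0]) / dx, (first.mv[1] - third.mv[1]) / dx)
    grad_y = ((second.mv[0] - third.mv[0]) / dy, (second.mv[1] - third.mv[1]) / dy)

    def mv_at(x: float, y: float) -> MV:
        return (
            third.mv[0] + grad_x[0] * (x - third.x) + grad_y[0] * (y - third.y),
            third.mv[1] + grad_x[1] * (x - third.x) + grad_y[1] * (y - third.y),
        )

    return mv_at


def project_to_block(mv_at: Callable[[float, float], MV], x: float, y: float,
                     w: float, h: float, affine_mode: bool) -> List[MV]:
    """Project the virtual block's motion field onto the current block at (x, y), size w x h."""
    if affine_mode:
        # Affine mode: CPMVs at the top-left, top-right, and bottom-left corners (assumed layout).
        return [mv_at(x, y), mv_at(x + w, y), mv_at(x, y + h)]
    # Regular inter mode: one translational MV at the center position (claim 4).
    return [mv_at(x + w / 2, y + h / 2)]


if __name__ == "__main__":
    # Hypothetical positions and MVs, chosen so all three lie outside a 16x16 block at (0, 0).
    first = Candidate(16, -16, (5.0, 3.0))
    second = Candidate(-16, 16, (0.0, 6.0))
    tx, ty = third_position(first, second)
    third = Candidate(tx, ty, (1.0, 2.0))  # stands in for the MV read at the derived position
    field = affine_field(first, second, third)
    print(project_to_block(field, 0, 0, 16, 16, affine_mode=True))
    print(project_to_block(field, 0, 0, 16, 16, affine_mode=False))
```

In this sketch the same motion field serves both branches of claim 1: it is sampled once at the center for a block coded in the regular inter mode, and at three corners for a block coded in the affine mode.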

Description

CROSS-REFERENCE TO RELATED APPLICATION

The present application is a continuation application of International Application No. PCT/US2023/010143, filed on Jan. 4, 2023, which is based upon and claims priority to U.S. Provisional Application No. 63/296,467, entitled “Candidate Derivation for Affine Merge Mode in Video Coding,” filed on Jan. 4, 2022, all of which are incorporated by reference for all purposes.

FIELD

The present disclosure relates to video coding and compression, and in particular but not limited to, methods and apparatus for improving the affine merge candidate derivation for affine motion prediction mode in a video encoding or decoding process.

BACKGROUND

Various video coding techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, some well-known video coding standards include Versatile Video Coding (VVC), High Efficiency Video Coding (HEVC, also known as H.265 or MPEG-H Part 2) and Advanced Video Coding (AVC, also known as H.264 or MPEG-4 Part 10), which are jointly developed by ISO/IEC MPEG and ITU-T VCEG. AOMedia Video 1 (AV1) was developed by the Alliance for Open Media (AOM) as a successor to its preceding standard VP9. Audio Video Coding (AVS), which refers to a digital audio and digital video compression standard, is another video compression standard series, developed by the Audio and Video Coding Standard Workgroup of China. Most of the existing video coding standards are built upon the well-known hybrid video coding framework, i.e., using block-based prediction methods (e.g., inter-prediction, intra-prediction) to reduce redundancy present in video images or sequences and using transform coding to compact the energy of the prediction errors. An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradations to video quality.
The first generation AVS standard includes the Chinese national standard “Information Technology, Advanced Audio Video Coding, Part 2: Video” (known as AVS1) and “Information Technology, Advanced Audio Video Coding Part 16: Radio Television Video” (known as AVS+). It can offer around 50% bit-rate savings at the same perceptual quality compared to the MPEG-2 standard. The AVS1 standard video part was promulgated as the Chinese national standard in February 2006. The second generation AVS standard includes the series of Chinese national standards “Information Technology, Efficient Multimedia Coding” (known as AVS2), which is mainly targeted at the transmission of extra HD TV programs. The coding efficiency of AVS2 is double that of AVS+. In May 2016, AVS2 was issued as the Chinese national standard. Meanwhile, the AVS2 standard video part was submitted by the Institute of Electrical and Electronics Engineers (IEEE) as an international standard for applications. The AVS3 standard is a new generation video coding standard for UHD video applications, aiming at surpassing the coding efficiency of the latest international standard HEVC. In March 2019, at the 68th AVS meeting, the AVS3-P2 baseline was finished, which provides approximately 30% bit-rate savings over the HEVC standard. Currently, one reference software, called the high performance model (HPM), is maintained by the AVS group to demonstrate a reference implementation of the AVS3 standard.

SUMMARY

The present disclosure provides examples of techniques relating to improving the motion vector candidate derivation for motion prediction mode in a video encoding or decoding process. According to a first aspect of the present disclosure, there is provided a method of video decoding.
In the method, a decoder may obtain a third candidate position for a third affine candidate based on a first candidate position for a first affine candidate and a second candidate position for a second affine candidate, where the first affine candidate and the second affine candidate are from a plurality of non-adjacent neighbor blocks that are not adjacent to a current block; obtain a virtual block based on the first candidate position, the second candidate position, and the third candidate position; obtain a plurality of control point motion vectors (CPMVs) for the virtual block based on translational motion vectors (MVs) at the first candidate position, the second candidate position, and the third candidate position; and project the plurality of CPMVs for the virtual block to the current block, to obtain a translational MV based on a specific position within the current block in response to determining that the current block is coded as a regular inter mode, or to obtain a second plurality of CPMVs for the current block in response to determining that the current block is coded as an affine mode. Furthermore, the decoder may insert the translational MV into a regular merge list in response to determining that the current block is coded as the regular inter mode or i