KR-102963505-B1 - Signaling for merge mode based on motion vector differences in video coding
Abstract
A video decoder is configured to: generate a first merge candidate list for a first block; determine that the first block is coded in a merge mode based on motion vector differences; in response to determining that a maximum number of entries for the first merge candidate list is equal to 1, infer that a value of a first instance of a flag is equal to a first value, wherein the first value for the flag indicates that the first block is to be decoded using a first entry in the first merge candidate list; receive first motion vector difference information; determine first motion information for predicting the first block based on candidate motion information included in the first entry of the first merge candidate list and the first motion vector difference information; and decode the first block using the first motion information.
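The conditional parsing described in the abstract can be sketched as follows. This is an illustrative sketch only: the names (`Bitreader`, `read_flag`, `parse_candidate_flag`, `max_num_merge_cand`) and the convention that the inferred first value is 0 are assumptions for illustration, not the disclosure's actual syntax element names.

```python
# Sketch of flag inference vs. explicit signaling (names are illustrative).

class Bitreader:
    """Toy bit source standing in for an entropy decoder."""
    def __init__(self, bits):
        self.bits = list(bits)

    def read_flag(self):
        return self.bits.pop(0)

def parse_candidate_flag(reader, max_num_merge_cand):
    # When the merge candidate list can hold only one entry, the flag is
    # not present in the bitstream: its value is inferred to select the
    # first entry. Otherwise it is explicitly signaled.
    if max_num_merge_cand == 1:
        return 0                  # inferred; no bits consumed
    return reader.read_flag()     # read from the bitstream

r = Bitreader([1])
print(parse_candidate_flag(r, 1))  # 0 (inferred; the bit is left unread)
print(parse_candidate_flag(r, 2))  # 1 (read from the stream)
```

Skipping the flag when the list has a single entry saves one bit per MMVD-coded block in that configuration, since the decoder can only ever select the first entry.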
Inventors
- Yao-Jen Chang
- Wei-Jung Chien
- Marta Karczewicz
Assignees
- Qualcomm Incorporated
Dates
- Publication Date
- 20260511
- Application Date
- 20200219
- Priority Date
- 20200218
Claims (20)
- A method of decoding video data, the method comprising: generating a first merge candidate list for a first block; determining that the first block is coded in a merge mode based on motion vector differences; in response to determining that a maximum number of entries for the first merge candidate list is equal to 1, inferring that a value of a first instance of a flag is equal to a first value, wherein the first value for the flag indicates that the first block is to be decoded using a first entry in the first merge candidate list; receiving first motion vector difference information; determining first motion information for predicting the first block based on candidate motion information included in the first entry of the first merge candidate list and the first motion vector difference information; decoding the first block using the first motion information; generating a second merge candidate list for a second block; determining that the second block is coded in the merge mode based on motion vector differences; in response to determining that a maximum number of entries for the second merge candidate list is greater than 1, receiving, in the video data, a second instance of the flag, wherein the second instance of the flag is equal to the first value, and the first value for the second instance of the flag indicates that the second block is to be decoded using a first entry in the second merge candidate list; receiving second motion vector difference information; determining second motion information for predicting the second block based on candidate motion information included in the first entry of the second merge candidate list and the second motion vector difference information; and decoding the second block using the second motion information.
- (canceled)
- The method of claim 1, further comprising: generating a third merge candidate list for a third block; determining that the third block is coded in the merge mode based on motion vector differences; in response to determining that a maximum number of entries for the third merge candidate list is greater than 1, receiving, in the video data, a third instance of the flag, wherein the third instance of the flag is equal to a second value, and the second value for the flag indicates that the third block is to be decoded using a second entry in the third merge candidate list; receiving third motion vector difference information; determining third motion information for predicting the third block based on candidate motion information included in the second entry of the third merge candidate list and the third motion vector difference information; and decoding the third block using the third motion information.
- The method of claim 1, wherein determining that the first block is coded in the merge mode based on motion vector differences comprises: receiving a first syntax element indicating that the first block is coded in a merge mode; and in response to determining that the first syntax element indicates that the first block is coded in the merge mode, receiving a second syntax element indicating that the first block is coded in the merge mode based on motion vector differences.
- The method of claim 1, wherein determining that the maximum number of entries for the first merge candidate list is equal to 1 comprises receiving, in a sequence parameter set, an indication of the maximum number of entries for the first merge candidate list.
- The method of claim 1, wherein receiving the first motion vector difference information comprises receiving a distance index that identifies an offset, and wherein determining the first motion information for predicting the first block based on the candidate motion information included in the first entry of the first merge candidate list and the first motion vector difference information comprises modifying a motion vector of the candidate motion information included in the first entry of the first merge candidate list based on the offset.
- The method of claim 6, wherein receiving the first motion vector difference information further comprises receiving a direction index that identifies a direction for the offset, and wherein determining the first motion information for predicting the first block based on the candidate motion information included in the first entry of the first merge candidate list and the first motion vector difference information further comprises modifying the motion vector of the candidate motion information included in the first entry of the first merge candidate list based on the direction for the offset.
- A device for decoding video data, the device comprising: a memory configured to store video data; and one or more processors implemented in circuitry, the one or more processors configured to: generate a first merge candidate list for a first block; determine that the first block is coded in a merge mode based on motion vector differences; in response to determining that a maximum number of entries for the first merge candidate list is equal to 1, infer that a value of a first instance of a flag is equal to a first value, wherein the first value for the flag indicates that the first block is to be decoded using a first entry in the first merge candidate list; receive first motion vector difference information; determine first motion information for predicting the first block based on candidate motion information included in the first entry of the first merge candidate list and the first motion vector difference information; decode the first block using the first motion information; generate a second merge candidate list for a second block; determine that the second block is coded in the merge mode based on motion vector differences; in response to determining that a maximum number of entries for the second merge candidate list is greater than 1, receive, in the video data, a second instance of the flag, wherein the second instance of the flag is equal to the first value, and the first value for the second instance of the flag indicates that the second block is to be decoded using a first entry in the second merge candidate list; receive second motion vector difference information; determine second motion information for predicting the second block based on candidate motion information included in the first entry of the second merge candidate list and the second motion vector difference information; and decode the second block using the second motion information.
- (canceled)
- The device of claim 8, wherein the one or more processors are further configured to: generate a third merge candidate list for a third block; determine that the third block is coded in the merge mode based on motion vector differences; in response to determining that a maximum number of entries for the third merge candidate list is greater than 1, receive, in the video data, a third instance of the flag, wherein the third instance of the flag is equal to a second value, and the second value for the third instance of the flag indicates that the third block is to be decoded using a second entry in the third merge candidate list; receive third motion vector difference information; determine third motion information for predicting the third block based on candidate motion information included in the second entry of the third merge candidate list and the third motion vector difference information; and decode the third block using the third motion information.
- The device of claim 8, wherein, to determine that the first block is coded in the merge mode based on motion vector differences, the one or more processors are configured to: receive a first syntax element indicating that the first block is coded in a merge mode; and in response to determining that the first syntax element indicates that the first block is coded in the merge mode, receive a second syntax element indicating that the first block is coded in the merge mode based on motion vector differences.
- The device of claim 8, wherein, to determine that the maximum number of entries for the first merge candidate list is equal to 1, the one or more processors are configured to receive, in a sequence parameter set, an indication of the maximum number of entries for the first merge candidate list.
- The device of claim 8, wherein, to receive the first motion vector difference information, the one or more processors are configured to receive a distance index that identifies an offset, and wherein, to determine the first motion information for predicting the first block based on the candidate motion information included in the first entry of the first merge candidate list and the first motion vector difference information, the one or more processors are configured to modify a motion vector of the candidate motion information included in the first entry of the first merge candidate list based on the offset.
- The device of claim 13, wherein, to receive the first motion vector difference information, the one or more processors are further configured to receive a direction index that identifies a direction for the offset, and wherein, to determine the first motion information for predicting the first block based on the candidate motion information included in the first entry of the first merge candidate list and the first motion vector difference information, the one or more processors are further configured to modify the motion vector of the candidate motion information included in the first entry of the first merge candidate list based on the direction for the offset.
- The device of claim 14, wherein the device comprises a wireless communication device, the device further comprising a receiver configured to receive encoded video data.
- The device of claim 15, wherein the wireless communication device comprises a telephone handset, and wherein the receiver is configured to demodulate, according to a wireless communication standard, a signal comprising the encoded video data.
- The device of claim 14, further comprising a display configured to display decoded video data.
- The device of claim 14, wherein the device comprises one or more of a camera, a computer, a mobile device, a broadcast receiver device, or a set-top box.
- A device for decoding video data, the device comprising: means for generating a first merge candidate list for a first block; means for determining that the first block is coded in a merge mode based on motion vector differences; means for inferring, in response to determining that a maximum number of entries for the first merge candidate list is equal to 1, that a value of a first instance of a flag is equal to a first value, wherein the first value for the flag indicates that the first block is to be decoded using a first entry in the first merge candidate list; means for receiving first motion vector difference information; means for determining first motion information for predicting the first block based on candidate motion information included in the first entry of the first merge candidate list and the first motion vector difference information; means for decoding the first block using the first motion information; means for generating a second merge candidate list for a second block; means for determining that the second block is coded in the merge mode based on motion vector differences; means for receiving, in the video data, a second instance of the flag in response to determining that a maximum number of entries for the second merge candidate list is greater than 1, wherein the second instance of the flag is equal to the first value, and the first value for the second instance of the flag indicates that the second block is to be decoded using a first entry in the second merge candidate list; means for receiving second motion vector difference information; means for determining second motion information for predicting the second block based on candidate motion information included in the first entry of the second merge candidate list and the second motion vector difference information; and means for decoding the second block using the second motion information.
- (canceled)
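Claims 6, 7, 13, and 14 above describe deriving the motion vector difference from a signaled distance index and direction index. A minimal sketch of that derivation follows, using the default MMVD tables from the VVC design on which this disclosure builds: eight distances in quarter-luma-sample units and four axis-aligned directions. The table values and function names are illustrative assumptions, not the claims' exact derivation.

```python
# Sketch: turning a distance index and a direction index into an MVD
# offset, and applying it to a merge candidate's motion vector.
# Table values follow the VVC MMVD defaults and are illustrative.

MMVD_DISTANCES = [1, 2, 4, 8, 16, 32, 64, 128]        # quarter-sample units
MMVD_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # +x, -x, +y, -y

def apply_mmvd_offset(mv, distance_idx, direction_idx):
    """Modify a candidate motion vector (x, y) by the signaled offset."""
    dist = MMVD_DISTANCES[distance_idx]
    sign_x, sign_y = MMVD_DIRECTIONS[direction_idx]
    return (mv[0] + sign_x * dist, mv[1] + sign_y * dist)

# Example: candidate MV (10, -3), distance index 2 (offset 4), direction +x.
print(apply_mmvd_offset((10, -3), 2, 0))  # (14, -3)
```

Signaling an index into small fixed tables, rather than an arbitrary difference, is what lets MMVD refine a merge candidate at low bit cost.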
Description
Signaling for merge mode based on motion vector differences in video coding

This application claims the benefit of U.S. Provisional Application No. 62/808,215, filed February 20, 2019, and priority to U.S. Application No. 16/793,807, filed February 18, 2020, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

This disclosure relates to video encoding and video decoding.

BACKGROUND

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smartphones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards. By implementing such video coding techniques, video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may perform the techniques of this disclosure.
FIGS. 2A and 2B are conceptual diagrams illustrating an example quadtree binary tree (QTBT) structure and a corresponding coding tree unit (CTU).
FIG. 3 shows an example of merge mode with motion vector differences (MMVD) search points.
FIG. 4 shows an example of positions of inherited affine motion predictors.
FIG. 5 shows an example of control point motion vector inheritance.
FIG. 6 shows an example of locations of candidate positions for constructed affine merge mode.
FIG. 7 shows an example of triangle partition-based inter-prediction.
FIG. 8 shows examples of spatial and temporal neighboring blocks used to construct a uni-prediction candidate list.
FIG. 9 is a block diagram illustrating an example video encoder that may perform the techniques of this disclosure.
FIG. 10 is a block diagram illustrating an example video decoder that may perform the techniques of this disclosure.
FIG. 11 is a flowchart illustrating a process for encoding video data in accordance with the techniques of this disclosure.
FIG. 12 is a flowchart illustrating a process for decoding video data in accordance with the techniques of this disclosure.
FIG. 13 is a flowchart illustrating a process for encoding video data in accordance with the techniques of this disclosure.
FIG. 14 is a flowchart illustrating a process for decoding video data in accordance with the techniques of this disclosure.
Video coding (e.g., video encoding and/or video decoding) typically involves predicting a block of video data from a block of video data already coded in the same picture (e.g., intra-prediction) or from a block of video data already coded in a different picture (e.g., inter-prediction). In some cases, the video encoder also calculates residual data by comparing the prediction block with the original block, such that the residual data represents the difference between the prediction block and the original block. To reduce the number of bits needed to signal the residual data, the video encoder may transform and quantize the residual data and signal the transformed and quantized residual data in the encoded bitstream. The compression achieved by the transform and quantization processes may be lossy, meaning that these processes may introduce distortion into the decoded video data. A video decoder decodes the residual data and adds it to the prediction block to produce a reconstructed video block that matches the original video block more closely than the prediction block alone.
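The reconstruction step just described can be sketched in a few lines: the decoder adds the decoded residual to the prediction block sample by sample and clips each result to the valid sample range. The 8-bit depth and function name are assumptions for illustration.

```python
# Minimal sketch of decoder-side reconstruction: reconstructed sample =
# clip(prediction sample + residual sample), assuming 8-bit video.

def reconstruct_block(pred, residual, bit_depth=8):
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(pred, residual)
    ]

pred = [[100, 101], [102, 103]]
residual = [[-5, 0], [3, 200]]
print(reconstruct_block(pred, residual))  # [[95, 101], [105, 255]]
```

The clip matters because lossy quantization of the residual can push the sum outside the representable sample range, as in the 103 + 200 case above.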