JP-2026076203-A - Methods, apparatus, and computer programs for reducing context models for entropy coding of transform coefficient significance flags
Abstract
[Problem] To provide a method and apparatus for reducing the number of context models used for entropy coding of transform coefficient significance flags. [Solution] A video decoding method performed in a video decoder includes the step of receiving an encoded video bitstream that includes a current picture and at least one syntax element corresponding to a transform coefficient of a transform block in the current picture. The method further includes the step of determining an offset value based on the output of a monotonically non-decreasing function f(x) applied to the sum (x) of a group of partially reconstructed transform coefficients. The method further includes the step of determining a context model index based on the sum of the determined offset value and a base value. The method further includes the step of selecting a context model from among a plurality of context models, based on the determined context model index, for the at least one syntax element of the current transform coefficient. [Selection Diagram] Figure 16
Inventors
- チュン・オーヤン
- シン・ジャオ
- シアン・リ
- シャン・リュウ
Assignees
- テンセント・アメリカ・エルエルシー
Dates
- Publication Date
- 2026-05-11
- Application Date
- 2026-01-14
- Priority Date
- 2020-06-17
Claims (20)
- A video decoding method performed in a video decoder, the method comprising: receiving an encoded video bitstream that includes a current picture and at least one syntax element corresponding to a transform coefficient of a transform block in the current picture; determining an offset value based on the output of a monotonically non-decreasing function f(x) applied to the sum (x) of a group of partially reconstructed transform coefficients; determining a context model index based on the sum of the determined offset value and a base value; and selecting a context model from a plurality of context models, based on the determined context model index, for the at least one syntax element of the current transform coefficient.
- The method according to claim 1, wherein one of the base value and the offset value is determined based on the number of context models included in the plurality of context models.
- The method according to claim 2, further comprising determining whether dependent quantization is enabled for the current transform coefficient, wherein, in response to determining that dependent quantization is enabled for the current transform coefficient, the base value is based on a state of a quantizer.
- The method according to claim 3, wherein the current transform coefficient is located in a luma region, and the base value is based on a comparison of the distance of the current transform coefficient from the top-left corner of the transform block with a first diagonal position threshold.
- The method according to claim 4, wherein the base value is further based on a comparison of the distance with a second diagonal position threshold.
- The method according to claim 3, wherein the current transform coefficient is located in a chroma region, and the base value is based on a comparison of the distance of the current transform coefficient from the top-left corner of the transform block with a first diagonal position threshold.
- The method according to claim 1, wherein the monotonically non-decreasing function is defined as x - (x >> 2).
- The method according to claim 1, wherein the monotonically non-decreasing function is defined as (x+1) >> 1.
- The method according to claim 1, wherein the current transform coefficient and the group of partially reconstructed transform coefficients form a template constituting a contiguous set of transform coefficients.
- The method according to claim 1, wherein the at least one syntax element is a transform coefficient significance flag (sig_coeff_flag).
- The method according to claim 1, wherein the bitstream comprises a plurality of syntax elements including the at least one syntax element, and the sum (x) of the group of partially reconstructed transform coefficients is based on one or more syntax elements from the plurality of syntax elements.
- A video decoding method performed in a video decoder, the method comprising: receiving an encoded video bitstream that includes a current picture and at least one syntax element corresponding to a transform coefficient of a transform block in the current picture; for each context model region from a plurality of context model regions, determining the output of a monotonically non-decreasing function applied to the sum (x) of a group of partially reconstructed transform coefficients and the number of context models associated with each context model region; determining a context model index based on the output of the monotonically non-decreasing function for each context model region; and selecting a context model from a plurality of context models, based on the determined context model index, for the at least one syntax element of the current transform coefficient.
- The method according to claim 12, wherein the step of determining the context model index further depends on comparing the distance of the current transform coefficient from the top-left corner of the transform block with a first diagonal position threshold and a second diagonal position threshold.
- The method according to claim 12, wherein the step of determining the context model index further depends on comparing the distance of the current transform coefficient from the top-left corner of the transform block with a first diagonal position threshold.
- A video decoder for video decoding, comprising processing circuitry configured to: receive an encoded video bitstream that includes a current picture and at least one syntax element corresponding to a transform coefficient of a transform block in the current picture; determine an offset value based on the output of a monotonically non-decreasing function f(x) applied to the sum (x) of a group of partially reconstructed transform coefficients; determine a context model index based on the sum of the determined offset value and a base value; and select a context model from a plurality of context models, based on the determined context model index, for the at least one syntax element of the current transform coefficient.
- The video decoder according to claim 15, wherein one of the base value and the offset value is determined based on the number of context models included in the plurality of context models.
- The video decoder according to claim 16, wherein the processing circuitry is further configured to determine whether dependent quantization is enabled for the current transform coefficient, and wherein, in response to determining that dependent quantization is enabled for the current transform coefficient, the base value is based on a state of a quantizer.
- The video decoder according to claim 17, wherein the current transform coefficient is located in a luma region, and the base value is based on a comparison of the distance of the current transform coefficient from the top-left corner of the transform block with a first diagonal position threshold.
- The video decoder according to claim 18, wherein the base value is further based on a comparison of the distance with a second diagonal position threshold.
- A video decoder apparatus for video decoding, comprising processing circuitry configured to: receive an encoded video bitstream that includes a current picture and at least one syntax element corresponding to a transform coefficient of a transform block in the current picture; for each context model region from a plurality of context model regions, determine the output of a monotonically non-decreasing function applied to the sum (x) of a group of partially reconstructed transform coefficients and the number of context models associated with each context model region; determine a context model index based on the output of the monotonically non-decreasing function for each context model region; and select a context model from a plurality of context models, based on the determined context model index, for the at least one syntax element of the current transform coefficient.
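The base-plus-offset context derivation recited in the claims can be sketched as follows. This is a minimal illustration only: the two f(x) mappings are taken from claims 7 and 8, while the specific thresholds, base values, and the cap on the offset are assumptions for the sketch, not values fixed by the claims or by any standard.

```python
def f_shift(x: int) -> int:
    """Monotonically non-decreasing mapping f(x) = x - (x >> 2) (claim 7)."""
    return x - (x >> 2)

def f_round(x: int) -> int:
    """Alternative mapping f(x) = (x + 1) >> 1 (claim 8)."""
    return (x + 1) >> 1

def context_model_index(template_sum: int, diag: int, is_luma: bool,
                        qstate: int = 0, dep_quant: bool = False) -> int:
    """Derive a context model index as base value + offset value.

    template_sum -- the sum (x) of the partially reconstructed transform
                    coefficients in the template around the current coefficient
    diag         -- distance of the current coefficient from the top-left
                    corner of the transform block
    qstate       -- quantizer state, which drives the base value when
                    dependent quantization is enabled (claim 3)
    The thresholds (2, 5), base values, and offset cap are illustrative.
    """
    offset = min(f_shift(template_sum), 5)  # cap keeps the index in range
    if dep_quant:
        base = 18 if qstate > 1 else 12     # base from the quantizer state
    elif is_luma:
        # luma: two diagonal position thresholds (claims 4 and 5)
        base = 12 if diag < 2 else (6 if diag < 5 else 0)
    else:
        # chroma: a single diagonal position threshold (claim 6)
        base = 6 if diag < 2 else 0
    return base + offset
```

For example, a luma coefficient at the top-left corner (diag = 0) with a template sum of 4 selects index 12 + f(4) = 15, while a far-from-corner coefficient with an empty template selects index 0.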
Description
Cross-Reference to Related Applications
This application claims the benefit of priority of U.S. Patent Application No. 16/904,000, "METHOD AND APPARATUS FOR REDUCING CONTEXT MODELS FOR ENTROPY CODING OF TRANSFORM COEFFICIENT SIGNIFICANT FLAG," filed June 17, 2020, which claims the benefit of priority of U.S. Provisional Application No. 62/863,742, "METHOD OF REDUCING CONTEXT MODELS FOR ENTROPY CODING OF TRANSFORM COEFFICIENT SIGNIFICANT FLAG," filed June 19, 2019. The prior applications are incorporated herein by reference in their entirety.
This disclosure generally describes embodiments related to video coding.
The background description provided herein is intended to present the context of this disclosure generally. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against this disclosure.
Video encoding and decoding can be performed using inter-picture prediction with motion compensation. Uncompressed digital video can consist of a series of pictures, each picture having spatial dimensions of, for example, 1920 x 1080 luma samples and associated chroma samples. The series of pictures can have a fixed or variable picture rate (informally also known as frame rate) of, for example, 60 pictures per second or 60 Hz. Uncompressed video has significant bitrate requirements. For example, 1080p60 4:2:0 video at 8 bits per sample (1920 x 1080 luma sample resolution at a 60 Hz frame rate) requires close to 1.5 Gbit/s of bandwidth. An hour of such video requires more than 600 GBytes of storage space.
One purpose of video encoding and decoding can be the reduction of redundancy in the input video signal through compression. Compression can help reduce the aforementioned bandwidth or storage space requirements, in some cases by two orders of magnitude or more.
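The bitrate and storage figures quoted for uncompressed 1080p60 4:2:0 video can be checked with a few lines of arithmetic (4:2:0 subsampling contributes one luma sample plus half a pair of chroma samples, i.e. 1.5 samples, per pixel):

```python
# Worked check of the figures in the text: 1080p60 4:2:0 video, 8-bit samples.
width, height, fps, bit_depth = 1920, 1080, 60, 8
samples_per_pixel = 1.5  # 4:2:0: 1 luma + 0.25 Cb + 0.25 Cr per pixel

bits_per_second = width * height * samples_per_pixel * bit_depth * fps
bytes_per_hour = bits_per_second * 3600 / 8

print(f"{bits_per_second / 1e9:.2f} Gbit/s")      # ~1.49 Gbit/s
print(f"{bytes_per_hour / 1e9:.0f} GBytes/hour")  # ~672 GBytes/hour
```

This reproduces the "close to 1.5 Gbit/s" and "more than 600 GBytes per hour" figures stated above.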
Both lossless and lossy compression, as well as combinations thereof, can be used. Lossless compression refers to techniques by which an exact copy of the original signal can be reconstructed from the compressed original signal. When using lossy compression, the reconstructed signal may not be identical to the original signal, but the distortion between the original and reconstructed signals is small enough that the reconstructed signal is useful for its intended purpose. In the case of video, lossy compression is widely adopted. The amount of tolerable distortion depends on the application; for example, users of certain consumer streaming applications may tolerate higher distortion than users of television distribution applications. The achievable compression ratio can reflect that a higher allowable/tolerable distortion can yield a higher compression ratio.
Video encoders and decoders can utilize techniques from several broad categories including, for example, motion compensation, transform, quantization, and entropy coding.
Video codec techniques can include a technique known as intra-coding. In intra-coding, sample values are represented without reference to samples or other data from previously reconstructed reference pictures. In some video codecs, the picture is spatially subdivided into blocks of samples. When all blocks of samples are coded in intra-mode, that picture can be an intra-picture. Intra-pictures and their derivatives, such as independent decoder refresh pictures, can be used to reset the decoder state and can therefore be used as the first picture in a coded video bitstream and video session, or as a still image. The samples of an intra-block can be subjected to a transform, and the transform coefficients can be quantized before entropy coding. Intra-prediction can be a technique that minimizes sample values in the pre-transform domain.
In some cases, the smaller the DC value after transform and the smaller the AC coefficients, the fewer bits are required at a given quantization step size to represent the block after entropy coding. Traditional intra-coding, such as that known from, for example, MPEG-2 generation coding technologies, does not use intra-prediction. However, some newer video compression technologies include techniques that attempt to predict from, for example, surrounding sample data and/or metadata obtained during the encoding/decoding of spatially neighboring blocks that precede the current block in decoding order. Such techniques will hereafter be referred to as "intra-prediction" techniques. Note that, in at least some cases, intra-prediction uses only reference data from the current picture being reconstructed, not from reference pictures. Intra-prediction can take many different forms. When more than one such technique can be used in a given video coding technology, the technique in use can be coded as an intra-prediction mode. In certain cases, a mode may have submodes and/or parameters, which can be encoded individually or