US-12621489-B2 - Systems and methods for applying non-separable transforms on inter prediction residuals

Abstract

The various implementations described herein include methods and systems for coding video. In one aspect, a method includes receiving a video bitstream that includes a set of inter mode encoded blocks and a corresponding set of transform coefficients. The method includes deriving a set of inter mode residual blocks from the set of transform coefficients. The method includes determining, according to a value of a first indicator in the video bitstream, whether one or more non-separable transform kernels are to be applied to the set of inter mode residual blocks. The method includes applying a first non-separable transform kernel when the indicator has a first value, and forgoing applying the first non-separable transform kernel when the indicator has a second value. The method also includes reconstructing a set of video blocks using the set of inter mode residual blocks and a corresponding set of prediction blocks.

Inventors

Madhu PERINGASSERY KRISHNAN
Yushin Cho
Xin Zhao
Liang Zhao
Jing Ye
Han Gao

Assignees

Tencent America LLC

Dates

Publication Date: 20260505
Application Date: 20240529

Claims (20)

1. A method of video decoding performed at a computing system having memory and one or more processors, the method comprising: receiving a video bitstream that includes a set of inter mode encoded blocks and a corresponding first set of transform coefficients and includes a set of intra mode encoded blocks and a corresponding second set of transform coefficients; deriving a set of inter mode residual blocks from the first set of transform coefficients; deriving a set of intra mode residual blocks from the second set of transform coefficients; determining, according to a value of a first indicator in the video bitstream, a first set of intra secondary transform (IST) kernels that are allowed to be applied to the set of intra mode residual blocks; determining, according to a value of a second indicator in the video bitstream, a second set of IST kernels that are allowed to be applied to the set of inter mode residual blocks, wherein the second set of IST kernels includes one or more IST kernels not in the first set of IST kernels; identifying, according to a third indicator, an IST kernel from the second set of IST kernels; applying the IST kernel to the set of inter mode residual blocks; and reconstructing a set of video blocks using the set of inter mode residual blocks and a corresponding set of prediction blocks.
2. The method of claim 1, wherein the first indicator is encoded in the video bitstream using a binary encoding; and the method further comprises identifying a value of the first indicator via a binarized decoding of the first indicator.
3. The method of claim 2, wherein the first indicator comprises a binarized codeword, and a binary symbol of the binarized codeword indicates whether any IST kernels are to be applied to the set of inter mode residual blocks.
4. The method of claim 1, wherein the third indicator has a value selected from a set of N values, N being a positive integer; and the method further comprises identifying the IST kernel from the second set of IST kernels using the value of the third indicator.
5. The method of claim 4, further comprising identifying the second set of IST kernels from a plurality of sets of non-separable transform kernels based on one or more predicted samples for the set of inter mode encoded blocks.
6. The method of claim 5, wherein the one or more predicted samples comprise corner samples from the corresponding set of prediction blocks.
7. The method of claim 5, wherein the one or more predicted samples comprise boundary samples from the corresponding set of prediction blocks.
8. The method of claim 5, wherein identifying the second set of IST kernels based on the one or more predicted samples comprises identifying the second set of IST kernels using a statistical analysis of spatial directional patterns in the one or more predicted samples.
9. The method of claim 8, wherein the statistical analysis includes applying an edge detection technique.
10. The method of claim 5, wherein the second set of IST kernels is identified based on prediction mode information from neighboring blocks for the set of video blocks.
11. The method of claim 4, further comprising identifying, according to a value of the first indicator in the video bitstream, the first set of IST kernels from a plurality of sets of IST kernels, wherein the first indicator has a value selected from a set of M values, M being a positive integer.
12. The method of claim 11, wherein M is based on a number of intra prediction modes for the video bitstream.
13. The method of claim 11, wherein M is based on a number of IST sets applicable to intra mode coded blocks.
14. The method of claim 13, wherein M is less than the number of IST sets applicable to intra mode coded blocks.
15. The method of claim 4, further comprising identifying the set of IST kernels from a plurality of sets of IST kernels in accordance with a default set index.
16. The method of claim 4, wherein the set of IST kernels includes one or more IST kernels used for both inter mode residual blocks and intra mode residual blocks.
17. The method of claim 1, wherein the second indicator is a binary indicator indicating whether any kernels are allowed to be applied to the set of inter mode residual blocks.
18. The method of claim 1, wherein the second indicator is signaled in a high-level syntax of the video bitstream.
19. A computing system, comprising: control circuitry; memory; and one or more sets of instructions stored in the memory and configured for execution by the control circuitry, the one or more sets of instructions comprising instructions for: receiving video data that includes a set of video blocks; deriving a set of intra mode residual blocks from the set of video blocks; deriving a set of inter mode residual blocks from the set of video blocks; identifying a first set of intra secondary transform (IST) kernels applicable to the set of intra mode residual blocks; identifying a second set of IST kernels applicable to the set of inter mode residual blocks, wherein the second set of IST kernels includes one or more IST kernels not in the first set of IST kernels; generating a first set of transform coefficients from the set of inter mode residual blocks using a first IST kernel from the second set of IST kernels; generating a second set of transform coefficients from the set of intra mode residual blocks using a second IST kernel from the first set of IST kernels; determining a value for a first indicator indicating that the second set of IST kernels are applicable to the set of inter mode residual blocks; signaling the first indicator and a second indicator indicating the first IST kernel in a video bitstream; and signaling the first set of transform coefficients and the second set of transform coefficients in the video bitstream.
20. A non-transitory computer-readable storage medium storing a video bitstream that is generated by a video encoding method, the video bitstream comprising: coded information for a plurality of blocks of video data, including a first set of transform coefficients corresponding to a set of intra mode blocks and a second set of transform coefficients corresponding to a set of inter mode blocks; a first indicator indicating that a first set of intra secondary transform (IST) kernels are applicable to the set of inter mode residual blocks; and a second indicator indicating that a second set of IST kernels are applicable to the set of intra mode residual blocks, wherein the second set of IST kernels includes one or more IST kernels not in the first set of IST kernels; wherein the video encoding method comprises: deriving the set of intra mode residual blocks; deriving the set of inter mode residual blocks; identifying the first set of IST kernels applicable to the set of inter mode residual blocks; identifying the second set of IST kernels applicable to the set of intra mode residual blocks; generating the first set of transform coefficients from the set of inter mode residual blocks using a first IST kernel from the first set of IST kernels; generating the second set of transform coefficients from the set of intra mode residual blocks using a second IST kernel from the second set of IST kernels; signaling the first indicator and the second indicator in the video bitstream; and signaling the first set of transform coefficients and the second set of transform coefficients in the video bitstream.
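
Illustrative note (not part of the claims): claims 4 through 9 describe choosing a set of IST kernels from predicted samples, for example via a statistical analysis of spatial directional patterns, and then picking one kernel in that set with an indicator. The Python sketch below shows one way such a selection might look. Everything in it is an assumption made for illustration: the function names `select_kernel_set` and `pick_kernel`, the use of summed absolute differences as the directional statistic, the three-way set mapping, and the placeholder kernel names. The source does not specify any of these.

```python
from typing import List, Sequence


def select_kernel_set(pred: Sequence[Sequence[int]]) -> int:
    """Pick a kernel-set index from gradient statistics of a prediction block.

    Hypothetical mapping (not from the source):
      0 = default set (no dominant direction)
      1 = set tuned for vertical edges
      2 = set tuned for horizontal edges
    """
    h, w = len(pred), len(pred[0])
    # Summed absolute horizontal differences: large when vertical edges dominate.
    gx = sum(abs(pred[y][x + 1] - pred[y][x]) for y in range(h) for x in range(w - 1))
    # Summed absolute vertical differences: large when horizontal edges dominate.
    gy = sum(abs(pred[y + 1][x] - pred[y][x]) for y in range(h - 1) for x in range(w))
    if gx == gy:
        return 0  # fall back to a default set index
    return 1 if gx > gy else 2


def pick_kernel(kernel_sets: List[List[str]], set_index: int, kernel_index: int) -> str:
    """Identify one kernel within the chosen set, as an indicator with N values would."""
    return kernel_sets[set_index][kernel_index]


# Placeholder kernel names; a real codec would store transform matrices here.
KERNEL_SETS = [["default_a", "default_b"], ["vert_a", "vert_b"], ["horz_a", "horz_b"]]

vertical_edge = [[0, 0, 9, 9] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [9] * 4, [9] * 4]
flat = [[5] * 4 for _ in range(4)]

chosen = pick_kernel(KERNEL_SETS, select_kernel_set(vertical_edge), 1)
```

Because the set index is derived from the prediction block, which the decoder already has, this style of selection needs no extra bits for the set itself; only the kernel index within the set would be signaled.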

Description

RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 63/603,544, entitled “Methods and Apparatus for Applying Non-Separable Transforms on Inter Prediction Residuals,” filed Nov. 28, 2023, which is hereby incorporated by reference in its entirety.

TECHNICAL FIELD

The disclosed embodiments relate generally to video coding, including but not limited to transform coding applied to prediction residuals.

BACKGROUND

Digital video is supported by a variety of electronic devices, such as digital televisions, laptop or desktop computers, tablet computers, digital cameras, digital recording devices, digital media players, video gaming consoles, smart phones, video teleconferencing devices, video streaming devices, etc. The electronic devices transmit and receive or otherwise communicate digital video data across a communication network, and/or store the digital video data on a storage device. Due to a limited bandwidth capacity of the communication network and limited memory resources of the storage device, video coding may be used to compress the video data according to one or more video coding standards before it is communicated or stored. The video coding can be performed by hardware and/or software on an electronic/client device or a server providing a cloud service.

Video coding generally utilizes prediction methods (e.g., inter-prediction, intra-prediction, or the like) that take advantage of redundancy inherent in the video data. Video coding aims to compress video data into a form that uses a lower bit rate, while avoiding or minimizing degradations to video quality.

Multiple video codec standards have been developed. For example, High-Efficiency Video Coding (HEVC/H.265) is a video compression standard designed as part of the MPEG-H project. ITU-T and ISO/IEC published the HEVC/H.265 standard in 2013 (version 1), 2014 (version 2), 2015 (version 3), and 2016 (version 4).
Versatile Video Coding (VVC/H.266) is a video compression standard intended as a successor to HEVC. ITU-T and ISO/IEC published the VVC/H.266 standard in 2020 (version 1) and 2022 (version 2). AOMedia Video 1 (AV1) is an open video coding format designed as an alternative to HEVC. On Jan. 8, 2019, a validated version 1.0.0 with Errata 1 of the specification was released.

SUMMARY

The present disclosure describes a set of methods for video (image) compression, including methods for applying non-separable transforms on inter mode block residuals.

In accordance with some embodiments, a video bitstream includes a set of inter mode encoded blocks and a corresponding set of transform coefficients. A set of inter mode residual blocks may be derived from the set of transform coefficients. A value of a flag signaled in the video bitstream may indicate whether one or more non-separable transform kernels are to be applied to the set of inter mode residual blocks. Depending on the value of the flag, a first non-separable transform kernel may, or may not, be applied to the set of inter mode residual blocks. A set of video blocks can be reconstructed from the set of inter mode residual blocks and a corresponding set of prediction blocks. An advantage of using non-separable transform coding techniques in this manner is that the quality (e.g., the accuracy and/or precision) of the corresponding reconstructed (decoded) video may be improved. Further, signaling whether or not to use the non-separable transform kernels can reduce computational costs (e.g., through more efficient encoding and/or decoding of the residuals for a video).
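
As an illustration only, the flag-gated behavior summarized above can be sketched in Python. A non-separable transform treats an N x N block as a single vector of N*N samples and multiplies it by an (N*N) x (N*N) matrix, rather than running separate row and column passes. The names `apply_non_separable`, `decode_block`, and `TOY_KERNEL`, the flag semantics (1 means apply, 0 means skip), and the toy kernel values are all assumptions for this sketch; the sketch also omits the primary inverse transform, dequantization, and clipping that a real decoder would perform.

```python
from typing import List, Sequence

Block = List[List[int]]


def apply_non_separable(block: Sequence[Sequence[int]],
                        kernel: Sequence[Sequence[int]]) -> Block:
    """Multiply the row-major flattened N x N block by an (N*N) x (N*N) kernel."""
    n = len(block)
    vec = [v for row in block for v in row]  # flatten in row-major order
    out = [sum(k * v for k, v in zip(krow, vec)) for krow in kernel]
    return [out[r * n:(r + 1) * n] for r in range(n)]  # reshape back to N x N


def decode_block(coeffs: Sequence[Sequence[int]],
                 prediction: Sequence[Sequence[int]],
                 nst_flag: int,
                 kernel: Sequence[Sequence[int]]) -> Block:
    """Reconstruct one block: optionally apply the kernel, then add the prediction.

    Assumed flag semantics: 1 = apply the kernel, any other value = skip it.
    """
    if nst_flag == 1:
        residual = apply_non_separable(coeffs, kernel)
    else:
        residual = [list(row) for row in coeffs]
    return [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(prediction, residual)]


# Toy 4x4 kernel for a 2x2 block: swaps the first and last samples only.
# It is not a Kronecker product of two 2x2 matrices, so it cannot be
# expressed as a row pass followed by a column pass.
TOY_KERNEL = [
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]

coeffs = [[1, 2], [3, 4]]
prediction = [[10, 10], [10, 10]]

with_kernel = decode_block(coeffs, prediction, 1, TOY_KERNEL)     # [[14, 12], [13, 11]]
without_kernel = decode_block(coeffs, prediction, 0, TOY_KERNEL)  # [[11, 12], [13, 14]]
```

When the flag indicates that no kernel is applied, the matrix multiplication is skipped entirely, which is where the computational saving mentioned above would come from in this simplified model.
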
In accordance with some embodiments, a method of video decoding includes (i) receiving a video bitstream that includes a set of inter mode encoded blocks and a corresponding set of transform coefficients; (ii) deriving a set (one or more) of inter mode residual blocks from the set of transform coefficients; (iii) determining, according to a value of a first indicator in the video bitstream, whether one or more non-separable transform kernels are to be applied to the set of inter mode residual blocks; (iv) when the first indicator has a first value, applying a first non-separable transform kernel to the set of inter mode residual blocks; (v) when the first indicator has a second value, forgoing applying the first non-separable transform kernel to the set of inter mode residual blocks; and (vi) reconstructing a set of video blocks using the set of inter mode residual blocks and a corresponding set of prediction blocks.

In accordance with some embodiments, a method of video encoding includes (i) receiving video data that includes a set of video blocks; (ii) deriving a set of inter mode residual blocks from the set of video blocks; (iii) determining whether one or more non-separable transform kernels are to be applied to the set of inter mode residual blocks; (iv) generating a set of transform coefficients from the set of inter mode residual blocks in accordance with the determination as to whether the one or