US-12627798-B2 - Method, apparatus, and medium for video processing
Abstract
Embodiments of the present disclosure provide a solution for video processing. A method for video processing comprises: determining, from a plurality of filter shapes during a conversion between a current video block of a video and a bitstream of the video, a first filter shape for coding a first sample of the current video block; and performing the conversion based on the first filter shape. Compared with the conventional solution, the proposed method can advantageously improve the performance of the filtering tool.
Inventors
- Wenbin YIN
- Kai Zhang
- Li Zhang
Assignees
- BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.
- BYTEDANCE INC.
Dates
- Publication Date
- 20260512
- Application Date
- 20240328
- Priority Date
- 20210928
Claims (19)
- 1 . A method for video processing, comprising: determining, from a plurality of filter shapes during a conversion between a current video block of a video and a bitstream of the video, a first filter shape for coding a first sample of the current video block; and performing the conversion based on the first filter shape, wherein performing the conversion comprises: determining, from a plurality of sets of coefficients, a set of coefficients for a first offline-trained filter with the first filter shape, the first offline-trained filter being used for coding the first sample, each set of the plurality of sets of coefficients corresponding to one or more filter shapes; and performing the conversion based on the set of coefficients, and wherein the first sample is a luma sample, and a further offline-trained filter for coding a further luma sample of the video has a filter shape different from the first filter shape, or wherein the first sample is a chroma sample, and a further offline-trained filter for coding a further chroma sample of the video has a filter shape different from the first filter shape.
- 2 . The method of claim 1 , wherein the first sample is in a first set of samples, and the first set of samples corresponds to a first ALF processing unit.
- 3 . The method of claim 2 , wherein an index of the first filter shape is indicated in the bitstream for the first ALF processing unit, or the index of the first filter shape is derived for the first ALF processing unit, or the index of the first filter shape is determined on-the-fly for the first ALF processing unit, or a flag indicating whether the first sample is to be filtered is indicated in the bitstream for the first ALF processing unit, or the flag is derived for the first ALF processing unit, or the flag is determined on-the-fly for the first ALF processing unit.
- 4 . The method of claim 1 , wherein a shape of the first filter shape is different from a shape of a fifth filter shape of the plurality of filter shapes, or a size of the first filter shape is different from a size of the fifth filter shape.
- 5 . The method of claim 4 , wherein the shape of the first filter shape is one of the following: a diamond, a square, a cross, or a symmetrical shape, or wherein the shape of the first filter shape is an asymmetrical shape.
- 6 . The method of claim 4 , wherein the first filter shape is one of: a diamond shape with a size of 5×5, a diamond shape with a size of 7×7, a diamond shape with a size of 9×9, a diamond shape with a size of 11×11, a diamond shape with a size of 13×13, a square shape with a size of 5×5, a square shape with a size of 7×7, a square shape with a size of 9×9, a square shape with a size of 11×11, a square shape with a size of 13×13, a cross shape with a size of 5×5, a cross shape with a size of 7×7, a cross shape with a size of 9×9, a cross shape with a size of 11×11, a cross shape with a size of 13×13, a symmetrical shape with a size of 11×11, a symmetrical shape with a size of 13×13, a symmetrical shape with a size of 15×15, a symmetrical shape with a size of 17×17, or a symmetrical shape with a size of 19×19.
- 7 . The method of claim 1 , wherein the first filter shape is used for an online-trained filter, or the first filter shape is used for an offline-trained filter.
- 8 . The method of claim 1 , wherein performing the conversion comprises: determining, from a plurality of sets of coefficients, a set of coefficients for a first online-trained filter with the first filter shape, the first online-trained filter being used for coding the first sample, each set of the plurality of sets of coefficients corresponding to one or more filter shapes; and performing the conversion based on the set of coefficients.
- 9 . The method of claim 8 , wherein the first sample is a luma sample, and a further online-trained filter for coding a further luma sample of the video has a filter shape different from the first filter shape, or wherein the first sample is a chroma sample, and a further online-trained filter for coding a further chroma sample of the video has a filter shape different from the first filter shape, or wherein a class merging is performed individually on each of the plurality of filter shapes, or wherein a training alternative number is determined for each of the plurality of filter shapes individually, or a training alternative number is determined for each of the plurality of filter shapes jointly, or wherein the first online-trained filter comprises an additional tap, an input of the additional tap comprises an intermediate filtering result of an offline-trained filter, and wherein a filter shape of the offline-trained filter is the same as an offline-trained filter used for generating an input of an additional tap for a further filter, or a filter shape of the offline-trained filter is different from an offline-trained filter used for generating an input of an additional tap for a further filter.
- 10 . The method of claim 1 , wherein the plurality of filter shapes comprise a set of filter shapes for different color components of the video, a syntax element structure is indicated in the bitstream, the syntax element structure comprises parameters of a plurality of filters with the set of filter shapes.
- 11 . The method of claim 10 , wherein the plurality of filters comprise a set of filters with an identical filter shape for a luma component of the video, and wherein an index indicating the identical filter shape is indicated in the syntax element structure, or the number of enabled sets of parameters for the set of filters is indicated in the syntax element structure, or class merging results of the set of filters are indicated in the syntax element structure, or wherein coefficients of the set of filters are indicated in the syntax element structure.
- 12 . The method of claim 10 , wherein the plurality of filters comprise a set of filters with an identical filter shape for a chroma component of the video, and wherein an index indicating the identical filter shape is indicated in the syntax element structure, or the number of enabled sets of parameters for the set of filters is indicated in the syntax element structure, or class merging results of the set of filters are indicated in the syntax element structure, or wherein coefficients of the set of filters are indicated in the syntax element structure.
- 13 . The method of claim 10 , wherein the plurality of filters comprise a set of filters with different filter shapes for a luma component of the video, and wherein an index indicating one of the different filter shapes is indicated in the syntax element structure, or the number of enabled sets of parameters for the set of filters is indicated in the syntax element structure, or class merging results of the set of filters are indicated in the syntax element structure, or wherein coefficients of the set of filters are indicated in the syntax element structure.
- 14 . The method of claim 10 , wherein the plurality of filters comprise a set of filters with different filter shapes for a chroma component of the video, and wherein an index indicating one of the different filter shapes is indicated in the syntax element structure, or the number of enabled sets of parameters for the set of filters is indicated in the syntax element structure, or class merging results of the set of filters are indicated in the syntax element structure, or coefficients of the set of filters are indicated in the syntax element structure.
- 15 . The method of claim 1 , wherein the method is applied to one of: an adaptive loop filter (ALF), a cross-component adaptive loop filter (CCALF), or an in-loop filtering tool different from the ALF and CCALF.
- 16 . The method of claim 1 , wherein the conversion includes encoding the current video block into the bitstream, or wherein the conversion includes decoding the current video block from the bitstream.
- 17 . An apparatus for processing video data comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to perform acts comprising: determining, from a plurality of filter shapes during a conversion between a current video block of a video and a bitstream of the video, a first filter shape for coding a first sample of the current video block; and performing the conversion based on the first filter shape, wherein performing the conversion comprises: determining, from a plurality of sets of coefficients, a set of coefficients for a first offline-trained filter with the first filter shape, the first offline-trained filter being used for coding the first sample, each set of the plurality of sets of coefficients corresponding to one or more filter shapes; and performing the conversion based on the set of coefficients, and wherein the first sample is a luma sample, and a further offline-trained filter for coding a further luma sample of the video has a filter shape different from the first filter shape, or wherein the first sample is a chroma sample, and a further offline-trained filter for coding a further chroma sample of the video has a filter shape different from the first filter shape.
- 18 . A non-transitory computer-readable storage medium storing instructions that cause a processor to perform acts comprising: determining, from a plurality of filter shapes during a conversion between a current video block of a video and a bitstream of the video, a first filter shape for coding a first sample of the current video block; and performing the conversion based on the first filter shape, wherein performing the conversion comprises: determining, from a plurality of sets of coefficients, a set of coefficients for a first offline-trained filter with the first filter shape, the first offline-trained filter being used for coding the first sample, each set of the plurality of sets of coefficients corresponding to one or more filter shapes; and performing the conversion based on the set of coefficients, and wherein the first sample is a luma sample, and a further offline-trained filter for coding a further luma sample of the video has a filter shape different from the first filter shape, or wherein the first sample is a chroma sample, and a further offline-trained filter for coding a further chroma sample of the video has a filter shape different from the first filter shape.
- 19 . A non-transitory computer-readable recording medium storing a bitstream of a video which is generated by a method performed by a video processing apparatus, wherein the method comprises: determining, from a plurality of filter shapes, a first filter shape for coding a first sample of a current video block of the video; and generating the bitstream based on the first filter shape, wherein generating the bitstream comprises: determining, from a plurality of sets of coefficients, a set of coefficients for a first offline-trained filter with the first filter shape, the first offline-trained filter being used for coding the first sample, each set of the plurality of sets of coefficients corresponding to one or more filter shapes; and generating the bitstream based on the set of coefficients, and wherein the first sample is a luma sample, and a further offline-trained filter for coding a further luma sample of the video has a filter shape different from the first filter shape, or wherein the first sample is a chroma sample, and a further offline-trained filter for coding a further chroma sample of the video has a filter shape different from the first filter shape.
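The diamond, square, and cross filter shapes enumerated in claim 6 can be illustrated with a small sketch. This is a non-normative illustration only: the function names and the boolean-mask convention are assumptions for exposition, not definitions from this disclosure.

```python
import numpy as np

def diamond_mask(size):
    # Diamond filter shape: a tap (i, j) is used when its Manhattan
    # distance from the center is at most size // 2 (e.g. the 5x5 and
    # 7x7 diamonds used by conventional ALF).
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return np.abs(y) + np.abs(x) <= r

def square_mask(size):
    # Square filter shape: every tap in the size x size window is used.
    return np.ones((size, size), dtype=bool)

def cross_mask(size):
    # Cross filter shape: only taps on the center row and center column.
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return (y == 0) | (x == 0)

# Tap counts for the 5x5 variants: diamond 13, square 25, cross 9.
print(diamond_mask(5).sum(), square_mask(5).sum(), cross_mask(5).sum())
```

The tap count is what distinguishes the shapes at a given size: a 5×5 diamond has 13 taps, the 5×5 square has all 25, and the 5×5 cross has 9, which is the trade-off between filtering strength and coefficient-signalling cost that shape selection exploits.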
Description
CROSS REFERENCE
This application is a continuation of International Application No. PCT/CN2022/121925, filed on Sep. 27, 2022, which claims the benefit of International Application No. PCT/CN2021/121494, filed on Sep. 28, 2021. The entire contents of these applications are hereby incorporated by reference in their entireties.
FIELD
Embodiments of the present disclosure relate generally to video coding techniques, and more particularly, to filter shape selection for the adaptive loop filter in video coding.
BACKGROUND
Nowadays, digital video capabilities are being applied in various aspects of people's lives. Multiple types of video compression technologies, such as MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the ITU-T H.265 high efficiency video coding (HEVC) standard, and the versatile video coding (VVC) standard, have been proposed for video encoding/decoding. However, the performance of conventional video coding techniques is generally expected to be further improved.
SUMMARY
Embodiments of the present disclosure provide a solution for video processing. In a first aspect, a method for video processing is proposed. The method comprises: determining, from a plurality of filter shapes during a conversion between a current video block of a video and a bitstream of the video, a first filter shape for coding a first sample of the current video block; and performing the conversion based on the first filter shape. According to the proposed method, a sample of a video block is coded with a filter shape selected from a plurality of filter shapes. Compared with the conventional solution where a fixed filter shape is used, the proposed method can advantageously improve the performance of the filtering tool, and thus the coding performance can be improved. In a second aspect, an apparatus for processing video data is proposed. The apparatus for processing video data comprises a processor and a non-transitory memory with instructions thereon.
The instructions, upon execution by the processor, cause the processor to perform a method in accordance with the first aspect of the present disclosure. In a third aspect, a non-transitory computer-readable storage medium is proposed. The non-transitory computer-readable storage medium stores instructions that cause a processor to perform a method in accordance with the first aspect of the present disclosure. In a fourth aspect, another non-transitory computer-readable recording medium is proposed. The non-transitory computer-readable recording medium stores a bitstream of a video which is generated by a method performed by a video processing apparatus. The method comprises: determining, from a plurality of filter shapes, a first filter shape for coding a first sample of a current video block of the video; and generating the bitstream based on the first filter shape. In a fifth aspect, a method for storing a bitstream of a video is proposed. The method comprises: determining, from a plurality of filter shapes, a first filter shape for coding a first sample of a current video block of the video; generating the bitstream based on the first filter shape; and storing the bitstream in a non-transitory computer-readable recording medium. This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
Through the following detailed description with reference to the accompanying drawings, the above and other objectives, features, and advantages of example embodiments of the present disclosure will become more apparent. In the example embodiments of the present disclosure, the same reference numerals usually refer to the same components.
FIG. 1 illustrates a block diagram of an example video coding system in accordance with some embodiments of the present disclosure;
FIG. 2 illustrates a block diagram of an example video encoder in accordance with some embodiments of the present disclosure;
FIG. 3 illustrates a block diagram of an example video decoder in accordance with some embodiments of the present disclosure;
FIG. 4 illustrates nominal vertical and horizontal locations of 4:2:2 luma and chroma samples in a picture;
FIG. 5 illustrates a schematic diagram of an example encoder block diagram;
FIG. 6 illustrates a schematic diagram of 67 intra prediction modes;
FIG. 7 illustrates a diamond shape with a size of 5×5;
FIG. 8 illustrates a diamond shape with a size of 7×7;
FIG. 9 illustrates a diamond shape with a size of 9×9;
FIG. 10 illustrates a diamond shape with a size of 11×11;
FIG. 11 illustrates a diamond shape with a size of 13×13;
FIG. 12 illustrates a square shape with a size of 5×5;
FIG. 13 illustrates a square shape with a size of 7×7;
FIG. 14 illustrates a squa
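The core flow of the first aspect, selecting a filter shape for a sample and then filtering with the coefficient set that corresponds to that shape, can be sketched as follows. This is a hypothetical, non-normative sketch: the shape index, the dictionaries keyed by it, and the uniform coefficient values are illustrative assumptions, not syntax or semantics defined by this disclosure.

```python
import numpy as np

def diamond_mask(size):
    # Diamond filter shape: taps within Manhattan distance size // 2.
    r = size // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return np.abs(y) + np.abs(x) <= r

def filter_sample(frame, y, x, shape_idx, coeff_sets, masks):
    # Per claim 1: pick the coefficient set matching the chosen filter
    # shape, then compute a weighted sum over the taps that the shape's
    # mask selects around sample (y, x).
    mask = masks[shape_idx]
    coeffs = coeff_sets[shape_idx]
    r = mask.shape[0] // 2
    patch = frame[y - r:y + r + 1, x - r:x + r + 1]
    return float(patch[mask] @ coeffs)

# One shape in the catalog: a 5x5 diamond (13 taps). Uniform weights
# make the filter a plain average, purely for illustration.
masks = {0: diamond_mask(5)}
coeff_sets = {0: np.full(13, 1.0 / 13)}

frame = np.full((9, 9), 100.0)
frame[4, 4] = 113.0  # one outlier for the loop filter to smooth
print(filter_sample(frame, 4, 4, 0, coeff_sets, masks))  # ~101.0
```

In a real codec the shape index would be signalled or derived per ALF processing unit (claims 2 and 3), and each shape would map to its own trained coefficient set rather than a uniform average; the sketch only shows how the shape mask and coefficient lookup fit together.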