US-12621499-B2 - Unified neural network in-loop filter signaling

US 12621499 B2

Abstract

A method implemented by a video coding apparatus includes applying a neural network (NN) filter to an unfiltered sample of a video unit to generate a filtered sample. The NN filter is applied based on a syntax element of the video unit. The method also includes converting between a video media file and a bitstream based on the filtered sample that was generated.

Inventors

  • Yue Li
  • Li Zhang
  • Kai Zhang
  • Junru Li
  • Meng Wang
  • Siwei Ma
  • Shiqi Wang

Assignees

  • LEMON INC.
  • BEIJING BYTEDANCE NETWORK TECHNOLOGY CO., LTD.
  • BYTEDANCE INC.
  • Bytedance (hk) Limited

Dates

Publication Date
2026-05-05
Application Date
2024-04-02

Claims (19)

  1. A method implemented by a video coding apparatus, comprising: applying a neural network (NN) filter to an unfiltered sample of a video unit to generate a filtered sample, wherein the NN filter is applied based on a syntax element of the video unit; and converting between a video media file and a bitstream based on the filtered sample that was generated, wherein the syntax element indicates at least one selected from the group consisting of: whether to enable the NN filter, a number of NN filters to be applied, and a type of NN filter to be applied, wherein a first level comprises a sequence level, and a second level comprises a picture level, and wherein the syntax element is a first syntax element at the first level that indicates whether a NN filter can be adaptively selected at the second level to be applied to a picture or a slice of the video unit.
  2. The method of claim 1, wherein: a syntax element indicated in the first level is indicated in a sequence parameter set (SPS) and/or a sequence header of the video unit; a syntax element indicated in the second level is indicated in a picture header, a picture parameter set (PPS), and/or a slice header of the video unit; and a third level comprises a subpicture level, and a syntax element indicated in the third level is indicated for a patch of the video unit, a coding tree unit (CTU) of the video unit, a coding tree block (CTB) of the video unit, a block of the video unit, a subpicture of the video unit, a tile of the video unit, a slice of the video unit, or a region of the video unit.
  3. The method of claim 2, wherein the syntax element is the first syntax element further at the second level that is conditionally applied based on a second syntax element at the first level, wherein the NN filter is applied at the second level based on the first syntax element based on the second syntax element being a flag that is true, and wherein the NN filter is not applied based on the second syntax element being false.
  4. The method of claim 2, wherein the syntax element is the first syntax element further at the second level that indicates whether a NN filter can be adaptively selected at the third level to be applied to a subpicture of the video unit, or that indicates whether usage of the NN filter can be controlled at the third level.
  5. The method of claim 2, wherein the syntax element is the first syntax element further at the second level that indicates whether a NN filter can be adaptively selected at the second level, used at the second level, or applied on the second level; and wherein the first syntax element is signaled based on an indication that the NN filter can be adaptively selected at the second level or an indication that a number of NN filters is greater than one.
  6. The method of claim 2, wherein the syntax element is the first syntax element further at the third level that is conditionally applied based on a second syntax element at the first level and/or a third syntax element at the second level, wherein the first syntax element is coded using context, wherein the NN filter is applied at the third level based on the first syntax element based on one of the second syntax element and the third syntax element being a flag that is true, and wherein the NN filter is not applied based on one of the second syntax element and the third syntax element being false.
  7. The method of claim 2, wherein the syntax element is the first syntax element further at the third level that is signaled based on an indication that the NN filter can be adaptively selected at the third level or an indication that a number of NN filters is greater than one.
  8. The method of claim 2, wherein the syntax element is signaled responsive to the NN filter being enabled for a picture or a slice of the video unit, and wherein the NN filter is one of a plurality (T) of NN filters, and wherein the syntax element includes an index (k).
  9. The method of claim 8, further comprising applying the k-th NN filter at the second level of the video unit based on the index k>=0 and k<T.
  10. The method of claim 8, further comprising adaptively selecting a NN filter at the third level based on the index k>=T.
  11. The method of claim 8, wherein the index k is restricted to be in a range from 0 to (T−1).
  12. The method of claim 2, wherein the syntax element is coded based on a context model that is selected based on a number of allowed NN filters, wherein a filter model index for a color component of the video unit is configured to specify one of K context models, and wherein the one of the K context models is specified as a minimum of K−1 and binIdx, wherein binIdx is an index of a bin to be coded.
  13. The method of claim 12, wherein a filter model index for first and second color components of the video unit is coded with a same set of contexts.
  14. The method of claim 12, wherein the filter model index for a first color component of the video unit is coded with a different set of contexts than the filter model index for a second color component of the video unit.
  15. The method of claim 1, wherein the syntax element is signaled using context coding or bypass coding, or is binarized using fixed-length coding, unary coding, truncated unary coding, signaled unary coding, signed truncated unary coding, truncated binary coding, or exponential Golomb coding.
  16. The method of claim 1, wherein the conversion comprises generating the bitstream according to the video media file.
  17. The method of claim 1, wherein the conversion comprises parsing the bitstream to obtain the video media file.
  18. An apparatus for coding video data comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor cause the processor to: apply a neural network (NN) filter to an unfiltered sample of a video unit to generate a filtered sample, wherein the NN filter is applied based on a syntax element of the video unit; and convert between a video media file and a bitstream based on the filtered sample that was generated, wherein the syntax element indicates at least one selected from the group consisting of: whether to enable the NN filter, a number of NN filters to be applied, and a type of NN filter to be applied, wherein a first level comprises a sequence level, and a second level comprises a picture level, and wherein the syntax element is a first syntax element at the first level that indicates whether a NN filter can be adaptively selected at the second level to be applied to a picture or a slice of the video unit.
  19. A non-transitory computer readable medium storing a bitstream of a video that is generated by a method performed by a video processing apparatus, wherein the method comprises: applying a neural network (NN) filter to an unfiltered sample of a video unit to generate a filtered sample, wherein the NN filter is applied based on a syntax element of the video unit; and generating the bitstream based on the filtered sample that was generated, wherein the syntax element indicates at least one selected from the group consisting of: whether to enable the NN filter, a number of NN filters to be applied, and a type of NN filter to be applied, wherein a first level comprises a sequence level, and a second level comprises a picture level, and wherein the syntax element is a first syntax element at the first level that indicates whether a NN filter can be adaptively selected at the second level to be applied to a picture or a slice of the video unit.
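The conditional signaling of claims 3 and 8–12, and the truncated unary binarization named in claim 15, can be illustrated with a short sketch. All identifiers and the example values of T and K below are hypothetical choices for illustration only; they are not taken from the patent or from any codec specification.

```python
# Illustrative sketch of the signaling logic in claims 3, 8-12, and 15.
# T, K, and all names here are hypothetical example choices.

T = 3  # number of available NN filters (the plurality T in claim 8)
K = 4  # number of context models for the filter index (claim 12)

def context_index(bin_idx: int) -> int:
    """Claim 12: the context model for bin binIdx is min(K - 1, binIdx)."""
    return min(K - 1, bin_idx)

def truncated_unary(value: int, max_value: int) -> list:
    """One binarization named in claim 15: 'value' ones followed by a
    terminating zero, with the terminator omitted when value == max_value."""
    bins = [1] * value
    if value < max_value:
        bins.append(0)
    return bins

def select_nn_filter(seq_flag: bool, pic_flag: bool, k: int):
    """Claims 3 and 8-11: the picture-level flag only takes effect when the
    sequence-level flag is true, and index k is restricted to [0, T - 1]."""
    if not seq_flag:        # claim 3: filter not applied when the first-level flag is false
        return None
    if not pic_flag:        # second-level (picture) opt-out
        return None
    if not 0 <= k < T:      # claim 11: k must lie in the range 0..T-1
        raise ValueError("NN filter index out of range")
    return "nn_filter_%d" % k  # claim 9: apply the k-th NN filter
```

For example, `context_index` yields 0, 1, 2, 3, 3, … as binIdx grows, and `truncated_unary(2, 3)` produces the bin string [1, 1, 0].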

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application is a continuation of U.S. application Ser. No. 17/720,125, filed on Apr. 13, 2022, which claims the benefit of International Application No. PCT/CN2021/087615, filed Apr. 15, 2021 by Beijing Bytedance Network Technology Co., Ltd., International Application No. PCT/CN2021/087915, filed Apr. 16, 2021 by Beijing Bytedance Network Technology Co., Ltd., U.S. Provisional Patent Application No. 63/176,871, filed Apr. 19, 2021 by Lemon, Inc., and International Application No. PCT/CN2021/088480, filed Apr. 20, 2021 by Beijing Bytedance Network Technology Co., Ltd., all of which are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

The present disclosure is generally related to image and video coding and decoding.

BACKGROUND

Digital video accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, it is expected that the bandwidth demand for digital video usage will continue to grow.

SUMMARY

The disclosed aspects/embodiments provide one or more neural network (NN) filter models trained as part of an in-loop filtering technology, or a filtering technology used in a post-processing stage, for reducing the distortion incurred during compression. In addition, samples with different characteristics are processed by different NN filter models. Further, the presence (e.g., application) of NN filter models may be controlled through syntax elements at various levels. For example, the syntax element(s) that indicate whether to apply a NN filter may be at a first level (e.g., in a sequence parameter set (SPS) and/or a sequence header of a video unit). Syntax element(s) that indicate whether to apply a NN filter may also be at a second level (e.g., a picture header, a picture parameter set (PPS), and/or a slice header of the video unit).
Still further, syntax element(s) that indicate whether to apply a NN filter may be at a third level (e.g., the syntax element is indicated for a patch of the video unit, a CTU of the video unit, a CTB of the video unit, a block of the video unit, a subpicture of the video unit, a tile of the video unit, or a region of the video unit).

A first aspect relates to a method implemented by a coding apparatus. The method includes applying a neural network (NN) filter to an unfiltered sample of a video unit to generate a filtered sample, wherein the NN filter is applied based on a syntax element of the video unit. The method also includes converting between a video media file and a bitstream based on the filtered sample that was generated.

Optionally, in any of the preceding aspects, another implementation of the aspect provides that the syntax element indicates at least one selected from the group consisting of: whether to enable the NN filter, a number of NN filters to be applied, and a type of NN filter to be applied.

Optionally, in any of the preceding aspects, another implementation of the aspect provides that a first level comprises a sequence level and a syntax element indicated in the first level is indicated in a sequence parameter set (SPS) and/or a sequence header of the video unit; a second level comprises a picture level and a syntax element indicated in the second level is indicated in a picture header, a picture parameter set (PPS), and/or a slice header of the video unit; and a third level comprises a subpicture level, and a syntax element indicated in the third level is indicated for a patch of the video unit, a coding tree unit (CTU) of the video unit, a coding tree block (CTB) of the video unit, a block of the video unit, a subpicture of the video unit, a tile of the video unit, a slice of the video unit, or a region of the video unit.
Optionally, in any of the preceding aspects, another implementation of the aspect provides that the syntax element is a first syntax element at the first level that indicates whether a NN filter can be adaptively selected at the second level to be applied to a picture or a slice of the video unit.

Optionally, in any of the preceding aspects, another implementation of the aspect provides that the syntax element is a first syntax element at the second level that is conditionally applied based on a second syntax element at the first level, wherein the NN filter is applied at the second level based on the first syntax element based on the second syntax element being a flag that is true, and wherein the NN filter is not applied based on the second syntax element being false.

Optionally, in any of the preceding aspects, another implementation of the aspect provides that the syntax element is a first syntax element at the second level that indicates whether a NN filter can be adaptively selected at the third level to be applied to a subpicture of the video unit, or that indicates whether usage of the NN filter can be controlled at the third level.
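The three-level gating described above can be summarized with a minimal sketch: a lower-level flag takes effect only when every higher level enables the filter. The flag names and the function below are hypothetical illustrations, not syntax from the patent or any published specification.

```python
# Hypothetical sketch of the hierarchical enable/disable gating described
# in the Summary. A level that signals nothing is represented by None;
# a lower-level flag matters only when all higher-level flags are true.

def nn_filter_enabled(seq_flag, pic_flag=None, subpic_flag=None):
    """seq_flag: first level (SPS / sequence header);
    pic_flag: second level (picture header / PPS / slice header);
    subpic_flag: third level (patch / CTU / CTB / tile / region)."""
    if not seq_flag:                # sequence level must explicitly enable
        return False
    if pic_flag is False:           # signaled second-level opt-out
        return False
    if subpic_flag is False:        # signaled third-level opt-out
        return False
    return True
```

This mirrors the conditional signaling above: when the first-level flag is false, no lower-level flag is consulted, so a conforming encoder need not signal them at all.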