US-20260129195-A1 - METHOD, APPARATUS, AND MEDIUM FOR VIDEO PROCESSING


Abstract

Embodiments of the present disclosure provide a solution for video processing. A method for video processing is proposed. The method comprises: performing a conversion between a video and a bitstream of the video, wherein a neural-network post-processing filter (NNPF) is applied on a current picture associated with the video based on at least one input picture for the NNPF, and all of pictures outputted by the NNPF do not precede the first picture in the bitstream in an output order.
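The output-order constraint stated above can be illustrated with a minimal sketch. It assumes each picture carries a hypothetical integer output-order index (`output_index`) and that the candidate NNPF outputs are available as a list; the `Picture` type and the `conforming_outputs` function are illustrative names and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    """Hypothetical stand-in for a picture; only its output-order position is modeled."""
    output_index: int  # assumed integer position in output order (smaller = earlier)


def conforming_outputs(candidates, first_picture):
    """Keep only NNPF outputs that do not precede the first picture in output order."""
    return [p for p in candidates if p.output_index >= first_picture.output_index]


first = Picture(output_index=0)
# A candidate at index -1 would be a picture extrapolated before the first picture.
candidates = [Picture(-1), Picture(0), Picture(1)]
kept = conforming_outputs(candidates, first)
print([p.output_index for p in kept])  # prints [0, 1]
```

In this sketch the extrapolated candidate is simply dropped; an actual implementation might instead never generate it.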

Inventors

Jizheng Xu
Ye-Kui Wang

Assignees

BYTEDANCE INC.

Dates

Publication Date: 20260507
Application Date: 20251231

Claims (20)

1. A method for video processing, comprising: performing a conversion between a video and a bitstream of the video, wherein a neural-network post-processing filter (NNPF) is applied on a current picture associated with the video based on at least one input picture for the NNPF, and all of pictures outputted by the NNPF do not precede the first picture in the bitstream in an output order.
2. The method of claim 1, wherein a picture generated by the NNPF that corresponds to an input picture obtained through a padding process is not outputted by the NNPF.
3. The method of claim 2, wherein the padding process comprises setting the input picture to be the current picture.
4. The method of claim 1, wherein if all of the following conditions are met, an NNPF inference is repeated until a picture corresponding to the last picture is generated by the NNPF: a purpose of the NNPF does not comprise picture rate upsampling, the number of pictures used as input for the NNPF is greater than 1, the NNPF is allowed to be used for post-processing filtering for the current picture and all subsequent pictures of a current layer in the output order until a termination condition is met, the current layer being a layer comprising the current picture, a picture generated by the NNPF that corresponds to a single input picture in the at least one input picture is output by the NNPF, the current picture is the last picture of the bitstream in output order that has an identifier of a layer equal to an identifier of the current layer.
5. The method of claim 1, wherein if all of the following conditions are met, the number of NNPF inferences is set equal to or larger than one: a purpose of the NNPF does not comprise picture rate upsampling, the number of pictures used as input for the NNPF is greater than 1, the NNPF is allowed to be used for post-processing filtering for the current picture and all subsequent pictures of a current layer in the output order until a termination condition is met, the current layer is a layer comprising the current picture, a picture generated by the NNPF that corresponds to a single input picture in the at least one input picture is output by the NNPF, the current picture is the last picture of the bitstream in output order that has an identifier of a layer equal to an identifier of the current layer.
6. The method of claim 4, wherein an indication pictureRateUpsamplingFlag indicates whether the purpose of the NNPF comprises picture rate upsampling or not, or an indication numInputPics indicates the number of pictures used as input for the NNPF, or an indication nnpfa_persistence_flag indicates whether the NNPF is allowed to be used for post-processing filtering for the current picture and all subsequent pictures of the current layer in the output order until the termination condition is met, or an indication nuh_layer_id indicates the identifier of the layer, or an indication currLayerId indicates the identifier of the current layer.
7. The method of claim 1, wherein a purpose of the NNPF does not comprise picture rate upsampling, the number of pictures used as input for the NNPF is greater than 1, the NNPF is allowed to be used for post-processing filtering for the current picture and all subsequent pictures of a current layer in the output order until a termination condition is met, the current layer is a layer comprising the current picture, a picture generated by the NNPF that corresponds to a single input picture in the at least one input picture is output by the NNPF, the current picture is the last picture in the output order in a coded layer video sequence (CLVS) in the bitstream, and an NNPF inference is repeated until a picture corresponding to the last picture is generated by the NNPF.
8. The method of claim 1, wherein a purpose of the NNPF does not comprise picture rate upsampling, the number of pictures used as input for the NNPF is greater than 1, the NNPF is allowed to be used for post-processing filtering for the current picture and all subsequent pictures of a current layer in the output order until a termination condition is met, the current layer is a layer comprising the current picture, a picture generated by the NNPF that corresponds to a single input picture in the at least one input picture is output by the NNPF, the current picture is the last picture in the output order in a coded layer video sequence (CLVS) in the bitstream, and the number of NNPF inferences is equal to or larger than one.
9. The method of claim 1, wherein the current picture comprises a decoded picture or a cropped decoded picture of the video.
10. The method of claim 1, wherein the conversion includes encoding the video into the bitstream.
11. The method of claim 1, wherein the conversion includes decoding the video from the bitstream.
12. An apparatus for video processing comprising a processor and a non-transitory memory with instructions thereon, wherein the instructions upon execution by the processor, cause the processor to perform operations comprising: performing a conversion between a video and a bitstream of the video, wherein a neural-network post-processing filter (NNPF) is applied on a current picture associated with the video based on at least one input picture for the NNPF, and all of pictures outputted by the NNPF do not precede the first picture in the bitstream in an output order.
13. The apparatus of claim 12, wherein if all of the following conditions are met, an NNPF inference is repeated until a picture corresponding to the last picture is generated by the NNPF: a purpose of the NNPF does not comprise picture rate upsampling, the number of pictures used as input for the NNPF is greater than 1, the NNPF is allowed to be used for post-processing filtering for the current picture and all subsequent pictures of a current layer in the output order until a termination condition is met, the current layer being a layer comprising the current picture, a picture generated by the NNPF that corresponds to a single input picture in the at least one input picture is output by the NNPF, the current picture is the last picture of the bitstream in output order that has an identifier of a layer equal to an identifier of the current layer.
14. The apparatus of claim 12, wherein if all of the following conditions are met, the number of NNPF inferences is set equal to or larger than one: a purpose of the NNPF does not comprise picture rate upsampling, the number of pictures used as input for the NNPF is greater than 1, the NNPF is allowed to be used for post-processing filtering for the current picture and all subsequent pictures of a current layer in the output order until a termination condition is met, the current layer is a layer comprising the current picture, a picture generated by the NNPF that corresponds to a single input picture in the at least one input picture is output by the NNPF, the current picture is the last picture of the bitstream in output order that has an identifier of a layer equal to an identifier of the current layer.
15. A non-transitory computer-readable storage medium storing instructions that cause a processor to perform operations comprising: performing a conversion between a video and a bitstream of the video, wherein a neural-network post-processing filter (NNPF) is applied on a current picture associated with the video based on at least one input picture for the NNPF, and all of pictures outputted by the NNPF do not precede the first picture in the bitstream in an output order.
16. The non-transitory computer-readable storage medium of claim 15, wherein if all of the following conditions are met, an NNPF inference is repeated until a picture corresponding to the last picture is generated by the NNPF: a purpose of the NNPF does not comprise picture rate upsampling, the number of pictures used as input for the NNPF is greater than 1, the NNPF is allowed to be used for post-processing filtering for the current picture and all subsequent pictures of a current layer in the output order until a termination condition is met, the current layer being a layer comprising the current picture, a picture generated by the NNPF that corresponds to a single input picture in the at least one input picture is output by the NNPF, the current picture is the last picture of the bitstream in output order that has an identifier of a layer equal to an identifier of the current layer.
17. The non-transitory computer-readable storage medium of claim 15, wherein if all of the following conditions are met, the number of NNPF inferences is set equal to or larger than one: a purpose of the NNPF does not comprise picture rate upsampling, the number of pictures used as input for the NNPF is greater than 1, the NNPF is allowed to be used for post-processing filtering for the current picture and all subsequent pictures of a current layer in the output order until a termination condition is met, the current layer is a layer comprising the current picture, a picture generated by the NNPF that corresponds to a single input picture in the at least one input picture is output by the NNPF, the current picture is the last picture of the bitstream in output order that has an identifier of a layer equal to an identifier of the current layer.
18. A non-transitory computer-readable recording medium storing a bitstream of a video which is generated by a method performed by an apparatus for video processing, wherein the method comprises: performing a conversion between the video and the bitstream, wherein a neural-network post-processing filter (NNPF) is applied on a current picture associated with the video based on at least one input picture for the NNPF, and all of pictures outputted by the NNPF do not precede the first picture in the bitstream in an output order.
19. The non-transitory computer-readable recording medium of claim 18, wherein if all of the following conditions are met, an NNPF inference is repeated until a picture corresponding to the last picture is generated by the NNPF: a purpose of the NNPF does not comprise picture rate upsampling, the number of pictures used as input for the NNPF is greater than 1, the NNPF is allowed to be used for post-processing filtering for the current picture and all subsequent pictures of a current layer in the output order until a termination condition is met, the current layer being a layer comprising the current picture, a picture generated by the NNPF that corresponds to a single input picture in the at least one input picture is output by the NNPF, the current picture is the last picture of the bitstream in output order that has an identifier of a layer equal to an identifier of the current layer.
20. The non-transitory computer-readable recording medium of claim 18, wherein if all of the following conditions are met, the number of NNPF inferences is set equal to or larger than one: a purpose of the NNPF does not comprise picture rate upsampling, the number of pictures used as input for the NNPF is greater than 1, the NNPF is allowed to be used for post-processing filtering for the current picture and all subsequent pictures of a current layer in the output order until a termination condition is met, the current layer is a layer comprising the current picture, a picture generated by the NNPF that corresponds to a single input picture in the at least one input picture is output by the NNPF, the current picture is the last picture of the bitstream in output order that has an identifier of a layer equal to an identifier of the current layer.
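
The padding behavior of claims 2 and 3 and the condition check of claim 4 can be sketched as follows. This is an illustrative sketch only, not the claimed implementation. The field names `pictureRateUpsamplingFlag`, `numInputPics`, and `nnpfa_persistence_flag` and the variables `nuh_layer_id` and `currLayerId` follow the indications named in claim 6. Everything else is an assumption made for illustration: the `NnpfState` grouping, the `single_input_pic_output` field, the helper functions, the ordering of inputs (current picture first), and the use of strings as stand-ins for pictures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NnpfState:
    """Fields named after the indications in claim 6; the grouping is hypothetical."""
    pictureRateUpsamplingFlag: bool  # purpose of the NNPF comprises picture rate upsampling
    numInputPics: int                # number of pictures used as input for the NNPF
    nnpfa_persistence_flag: bool     # NNPF may be used for subsequent pictures of the layer
    single_input_pic_output: bool    # hypothetical: output corresponds to a single input picture


def is_last_of_layer(layer_ids, current_pos, currLayerId):
    """True if no later picture in output order has nuh_layer_id equal to currLayerId."""
    return all(nuh_layer_id != currLayerId for nuh_layer_id in layer_ids[current_pos + 1:])


def repeat_inference_needed(state, layer_ids, current_pos, currLayerId):
    """True only when every condition listed in claim 4 holds."""
    return (
        not state.pictureRateUpsamplingFlag
        and state.numInputPics > 1
        and state.nnpfa_persistence_flag
        and state.single_input_pic_output
        and is_last_of_layer(layer_ids, current_pos, currLayerId)
    )


def build_inputs(current, others, num_input_pics):
    """Return (picture, is_padded) pairs; missing inputs are padded with the current picture."""
    inputs = [(current, False)] + [(p, False) for p in others[: num_input_pics - 1]]
    while len(inputs) < num_input_pics:
        inputs.append((current, True))  # padding: set the input picture to the current picture
    return inputs


def select_outputs(generated, inputs):
    """Drop generated pictures whose corresponding input came from padding."""
    return [g for g, (_, padded) in zip(generated, inputs) if not padded]


state = NnpfState(False, 3, True, True)
layer_ids = [0, 1, 0, 1]  # nuh_layer_id of each picture in output order (made-up example)
inputs = build_inputs("pic2", ["pic0"], state.numInputPics)
generated = [f"filtered({pic})" for pic, _ in inputs]  # stand-in for one NNPF inference
outputs = select_outputs(generated, inputs)
repeat = repeat_inference_needed(state, layer_ids, current_pos=2, currLayerId=0)
print(repeat, outputs)  # prints True ['filtered(pic2)', 'filtered(pic0)']
```

Tracking the padded flag alongside each input keeps the suppression rule of claim 2 independent of how the filter itself is invoked.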

Description

CROSS REFERENCE

This application is a continuation of International Application No. PCT/US2024/036575, filed on Jul. 2, 2024, which claims the benefit of U.S. Provisional Application No. 63/511,818, filed on Jul. 3, 2023. These applications are hereby incorporated by reference in their entireties.

FIELDS

Embodiments of the present disclosure relate generally to video processing techniques, and more particularly, to a neural-network post-processing filter (NNPF).

BACKGROUND

Nowadays, digital video capabilities are applied in many aspects of people's lives. Multiple types of video compression technologies, such as MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the ITU-T H.265 High Efficiency Video Coding (HEVC) standard, and the Versatile Video Coding (VVC) standard, have been proposed for video encoding and decoding. However, the functionality of video coding techniques is generally expected to be further improved.

SUMMARY

Embodiments of the present disclosure provide a solution for video processing.

In a first aspect, a method for video processing is proposed. The method comprises: performing a conversion between a video and a bitstream of the video, wherein a neural-network post-processing filter (NNPF) is applied on a current picture associated with the video based on at least one input picture for the NNPF, and all of pictures outputted by the NNPF do not precede the first picture in the bitstream in an output order.

The method in accordance with the first aspect of the present disclosure specifies that all pictures outputted by the NNPF do not precede the first picture in the bitstream in output order. Compared with a conventional solution lacking such a constraint, the proposed method can advantageously avoid outputting a picture before the first picture in the bitstream, regardless of the purpose of the NNPF. Extrapolation of pictures can thereby be avoided, and proper functionality of the NNPF can be ensured.

In a second aspect, an apparatus for video processing is proposed. The apparatus comprises a processor and a non-transitory memory with instructions thereon. The instructions, upon execution by the processor, cause the processor to perform a method in accordance with the first aspect of the present disclosure.

In a third aspect, a non-transitory computer-readable storage medium is proposed. The non-transitory computer-readable storage medium stores instructions that cause a processor to perform a method in accordance with the first aspect of the present disclosure.

In a fourth aspect, another non-transitory computer-readable recording medium is proposed. The non-transitory computer-readable recording medium stores a bitstream of a video which is generated by a method performed by an apparatus for video processing. The method comprises: performing a conversion between the video and the bitstream, wherein a neural-network post-processing filter (NNPF) is applied on a current picture associated with the video based on at least one input picture for the NNPF, and all of pictures outputted by the NNPF do not precede the first picture in the bitstream in an output order.

In a fifth aspect, a method for storing a bitstream of a video is proposed. The method comprises: generating the bitstream from the video; and storing the bitstream in a non-transitory computer-readable recording medium, wherein a neural-network post-processing filter (NNPF) is applied on a current picture associated with the video based on at least one input picture for the NNPF, and all of pictures outputted by the NNPF do not precede the first picture in the bitstream in an output order.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

Through the following detailed description with reference to the accompanying drawings, the above and other objectives, features, and advantages of example embodiments of the present disclosure will become more apparent. In the example embodiments of the present disclosure, the same reference numerals usually refer to the same components.

FIG. 1 is a block diagram illustrating an example video coding system, in accordance with some embodiments of the present disclosure;

FIG. 2 is a block diagram illustrating a first example video encoder, in accordance with some embodiments of the present disclosure;

FIG. 3 is a block diagram illustrating an example video decoder, in accordance with some embodiments of the present disclosure;

FIG. 4 illustrates luma data channels;

FIG. 5 illustrates a flowchart of a method for video processing in accordance wit