CN-122029807-A - Video processing for different color formats for loop filtering in video codec
Abstract
A mechanism for processing video data is disclosed. The mechanism determines to pad at least one unavailable sample of the video prior to feeding the video to a process. A conversion between the visual media data and the bitstream is performed based on the padded samples.
Inventors
- LI YUE
- ZHANG KAI
- ZHANG LI
Assignees
- ByteDance Ltd.
Dates
- Publication Date
- 20260512
- Application Date
- 20241016
- Priority Date
- 20231016
Claims (20)
- 1. A method of processing media data using a neural network, comprising: determining to pad at least one unavailable sample of the video prior to feeding the video to a process; and performing a conversion between visual media data and a bitstream based on the padded samples.
- 2. The method of claim 1, wherein the process is a Neural Network (NN) filter.
- 3. The method of any of claims 1-2, wherein the process is super-resolution, inter-prediction, or virtual reference frame generation, and wherein the process is based on an NN model.
- 4. The method of any of claims 1-2, wherein the process is super-resolution, inter-prediction, or virtual reference frame generation, and wherein the process is not based on an NN model.
- 5. The method of any of claims 1-3, wherein an NN model is used for super-resolution processing of blocks in inter slices, and wherein the NN model receives padded chroma samples as input.
- 6. The method of any of claims 1-5, wherein the process comprises a non-NN loop filter, wherein the loop filter comprises an Adaptive Loop Filter (ALF) or a cross-component ALF (CC-ALF).
- 7. The method of any of claims 1-6, wherein the unavailable samples correspond to unavailable components.
- 8. The method of any of claims 1-7, wherein Cb and/or Cr components are deemed unavailable when the color format of the video is YCbCr 4:0:0 or YUV 4:0:0.
- 9. The method of any of claims 1-8, wherein the NN model is trained on a YUV 4:2:0 dataset, a YUV 4:4:4 dataset, or a YUV 4:2:2 dataset, wherein the video is in YUV 4:0:0 format, and wherein chroma samples of the video are padded before the video is fed to the process.
- 10. The method of any of claims 1-9, wherein the NN filter receives Cb, Cr, or Cb and Cr samples as inputs, and wherein, when the video is in YCbCr 4:0:0 or YUV 4:0:0 format, the Cb, Cr, or Cb and Cr samples, respectively, are padded before being input to the NN filter.
- 11. The method of any of claims 1-10, wherein the padding value depends on the bit depth.
- 12. The method of any of claims 1-11, wherein the padding value is set to 2^(b-1), where b represents the bit depth (an illustrative sketch of such padding follows the claims).
- 13. The method of any of claims 1-12, wherein the padding values are signaled in the bitstream.
- 14. The method of any of claims 1-13, wherein the padding value is determined on-the-fly.
- 15. The method of any of claims 1-14, wherein the padding values are determined based on the decoded information.
- 16. The method of any of claims 1-15, wherein the padding is applied in a High Operating Point (HOP) filter, a Low Operating Point (LOP) filter, or a combination thereof.
- 17. The method of any of claims 1-16, wherein whether and/or how to pad the at least one unavailable sample is signaled in a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), an Adaptation Parameter Set (APS), a slice header, a Coding Tree Unit (CTU), a Coding Unit (CU), or a combination thereof.
- 18. The method of any of claims 1-17, wherein whether and/or how to pad the at least one unavailable sample is determined based on coded information of the sample, wherein the coded information includes color components, quantization parameters, temporal layers, or a combination thereof.
- 19. The method of any of claims 1-18, wherein the converting comprises encoding the visual media data into the bitstream.
- 20. The method of any of claims 1-18, wherein the converting comprises decoding the visual media data from the bitstream.
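The following is a minimal illustrative sketch of the padding described in claims 9-12, not the claimed implementation: when a luma-only (YUV 4:0:0) video is fed to an NN filter trained on data that includes chroma, the missing Cb/Cr planes are filled with a bit-depth-dependent neutral value, assumed here to be 2^(b-1). The function name, the 4:2:0-style chroma geometry, and the array layout are hypothetical.

```python
import numpy as np

def pad_unavailable_chroma(y_plane: np.ndarray, bit_depth: int):
    """Illustrative sketch: build Cb/Cr planes for a 4:0:0 (luma-only)
    frame before feeding it to an NN filter that expects chroma inputs.

    Assumptions (not from the patent text): half-resolution chroma
    planes as in 4:2:0, and a neutral padding value of 2^(b-1).
    """
    pad_value = 1 << (bit_depth - 1)   # e.g. 512 for 10-bit video
    h, w = y_plane.shape
    chroma_shape = (h // 2, w // 2)    # assumed 4:2:0 chroma geometry
    cb = np.full(chroma_shape, pad_value, dtype=y_plane.dtype)
    cr = np.full(chroma_shape, pad_value, dtype=y_plane.dtype)
    return cb, cr

# Usage: a 10-bit luma-only frame padded before NN filtering.
luma = np.zeros((64, 64), dtype=np.uint16)
cb, cr = pad_unavailable_chroma(luma, bit_depth=10)
assert cb[0, 0] == 512 and cr.shape == (32, 32)
```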
Description
Video processing for different color formats for loop filtering in video codec

Cross Reference to Related Applications
The present application claims the priority of and benefit from U.S. Provisional Patent Application No. 63/590,699, filed on October 16, 2023, the entire contents of which are incorporated herein by reference.

Technical Field
The present disclosure relates to the processing of digital images and video.

Background
Digital video accounts for the largest bandwidth usage on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, the bandwidth demand for digital video usage is expected to continue to grow.

Disclosure of Invention
A first aspect relates to a method of processing video data using a neural network, comprising: determining to pad at least one unavailable sample before feeding video to a process; and performing a conversion between visual media data and a bitstream based on the padded samples.
A second aspect relates to an apparatus for processing video data, comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to perform the method of any of the above aspects.
A third aspect relates to a non-transitory computer readable medium comprising a computer program product for use by a video codec device, the computer program product comprising computer executable instructions stored on the non-transitory computer readable medium that, when executed by a processor, cause the video codec device to perform the method of any of the above aspects.
A fourth aspect relates to a non-transitory computer readable medium storing a bitstream of a video generated by a method performed by a video processing apparatus, wherein the method comprises determining to pad at least one unavailable sample before feeding the video to a process, and generating the bitstream based on the determination.
A fifth aspect relates to a method for storing a bitstream of a video, comprising: determining to pad at least one unavailable sample before feeding the video to a process; generating the bitstream based on the determination; and storing the bitstream in a non-transitory computer readable recording medium.
A sixth aspect relates to a method, apparatus, or system described in the present disclosure. For clarity, any of the above-described embodiments may be combined with any one or more of the other previously-described embodiments to create new embodiments within the scope of the present disclosure.
These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.

Drawings
For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
Fig. 1 shows an example of raster-scan slice partitioning of a picture.
Fig. 2 shows an example of rectangular slice partitioning of a picture.
Fig. 3 shows an example of a picture partitioned into tiles, bricks, and rectangular slices.
Fig. 4A shows an example of a Coding Tree Block (CTB) crossing a bottom picture boundary.
Fig. 4B shows an example of CTBs crossing a right picture boundary.
Fig. 4C shows an example of CTBs crossing a bottom-right picture boundary.
Fig. 5 shows an example of an encoder block diagram.
Fig. 6 shows an example of a pre-processing and post-processing unit.
Fig. 7 shows an example architecture of a Convolutional Neural Network (CNN) in filter set 0.
Fig. 8 shows an example implementation of CNNs in filter set 0.
Fig. 9 illustrates an example encoder optimization.
Fig. 10A shows an example head of a luma network.
Fig. 10B illustrates an example subnetwork.
Fig. 10C illustrates another example subnetwork.
Fig. 11 shows an example temporal loop filter.
Fig. 12A shows an example parameter selection at the encoder side.
Fig. 12B shows an example parameter selection at the decoder side.
Fig. 13 shows a schematic diagram of predicting a current block from the context of reference samples around the current block via a neural network based intra prediction mode.
Fig. 14 shows a schematic diagram of decomposing the context of reference samples around the current block into available reference samples and unavailable reference samples.
Fig. 15 illustrates intra prediction mode signaling of a current luma Coding Block (CB) within a dotted-line box.
Fig. 16 illustrates an example architecture of a High Operating Point (HOP) model.
Fig. 17 shows an example architecture of a fused low-complexity CNN filter set including canonical polyadic (CP) decomposition and 1x1 convolutional layers.
Fig. 18 shows an example parallel fusion of the outputs of a neural network based loop filter (NNLF) and a deblocking filter.
Fig. 19 is a block diagram i
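As a reading aid for the parallel fusion referenced by Fig. 18, the following is a minimal sketch, assuming a simple per-sample weighted combination of the NNLF output and the deblocking filter output; the fusion rule, weight, and function name are hypothetical and not taken from this text.

```python
import numpy as np

def fuse_parallel(nnlf_out: np.ndarray, deblock_out: np.ndarray,
                  weight: float = 0.5) -> np.ndarray:
    """Illustrative sketch of parallel fusion (cf. Fig. 18).

    Assumption: the two filter outputs are blended with a single
    scalar weight; the actual fusion rule is not specified here.
    """
    fused = weight * nnlf_out.astype(np.float64) \
        + (1.0 - weight) * deblock_out.astype(np.float64)
    return np.rint(fused).astype(nnlf_out.dtype)

# Usage: blend two reconstructed 10-bit luma blocks.
nnlf = np.full((4, 4), 514, dtype=np.uint16)
dbf = np.full((4, 4), 510, dtype=np.uint16)
print(fuse_parallel(nnlf, dbf))  # every sample is 512
```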