KR-20260066717-A - Adaptive quantization of neural network weights for convolutional neural network filters in video coding
Abstract
A device for decoding video data is configured to determine first neural network (NN) weights for layers of a convolutional neural network (CNN) filter; derive quantization values for layers of the CNN filter based on the values of the first NN weights; convert the first NN weights into second NN weights based on the quantization values; and filter blocks of video data using the second NN weights.
Inventors
- Li, Yun
- Rusanovskyy, Dmytro
- Karczewicz, Marta
Assignees
- Qualcomm Incorporated
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-09-13
- Priority Date: 2024-09-12
Claims (20)
- A method of decoding video data, the method comprising: determining first neural network (NN) weights for layers of a convolutional neural network (CNN) filter; deriving a quantization value for a layer of the CNN filter based on values of the first NN weights; converting the first NN weights into second NN weights based on the quantization value; filtering a block of the video data using the second NN weights to generate a filtered block of the video data; and outputting a picture of decoded video data that includes the filtered block.
- The method of claim 1, wherein deriving the quantization value for the layer of the CNN filter based on the values of the first NN weights comprises testing candidate quantization values to determine a maximum difference between an original parameter value and a quantized-then-inverse-quantized version of the original parameter value produced by the candidate quantization values.
- The method of claim 1, wherein deriving the quantization value for the layer of the CNN filter based on the values of the first NN weights comprises testing candidate quantization values to determine an amount of clipping produced by the candidate quantization values.
- The method of claim 1, wherein deriving the quantization value for the layer of the CNN filter based on the values of the first NN weights comprises testing candidate quantization values to determine a number of values clipped due to overflow and a magnitude of clipping errors produced by the candidate quantization values.
- The method of claim 1, wherein deriving the quantization value for the layer of the CNN filter based on the values of the first NN weights comprises testing candidate quantization values to determine a percentage of overflowing parameters produced by the candidate quantization values.
- The method of claim 1, wherein deriving the quantization value for the layer of the CNN filter based on the values of the first NN weights comprises testing candidate quantization values to determine a percentage of parameters that overflow as a result of the candidate quantization values.
- The method of claim 1, wherein deriving the quantization value for the layer of the CNN filter based on the values of the first NN weights comprises testing candidate quantization values to determine a quantization error produced by the candidate quantization values and a clipping error produced by the candidate quantization values, wherein the quantization error corresponds to a maximum difference between an original parameter value and a quantized-then-inverse-quantized version of the original parameter value produced by the candidate quantization values, and wherein the clipping error corresponds to one or more of an amount of clipping produced by the candidate quantization values or a percentage of parameters that overflow as a result of the candidate quantization values.
- The method of claim 1, wherein deriving the quantization value for the layer of the CNN filter based on the values of the first NN weights comprises testing a plurality of candidate quantization values to determine the quantization value.
- The method of claim 1, wherein the first NN weights comprise floating-point precision values and the second NN weights comprise integer precision values.
- The method of claim 1, wherein the method of decoding is performed as part of a video encoding process.
- A device for decoding video data, the device comprising: a memory configured to store the video data; and one or more processors implemented in circuitry, the one or more processors configured to: determine first neural network (NN) weights for layers of a convolutional neural network (CNN) filter; derive a quantization value for a layer of the CNN filter based on values of the first NN weights; convert the first NN weights into second NN weights based on the quantization value; filter a block of the video data using the second NN weights to generate a filtered block of the video data; and output a picture of decoded video data that includes the filtered block.
- The device of claim 11, wherein, to derive the quantization value for the layer of the CNN filter based on the values of the first NN weights, the one or more processors are configured to test candidate quantization values to determine a maximum difference between an original parameter value and a quantized-then-inverse-quantized version of the original parameter value produced by the candidate quantization values.
- The device of claim 11, wherein, to derive the quantization value for the layer of the CNN filter based on the values of the first NN weights, the one or more processors are configured to test candidate quantization values to determine an amount of clipping produced by the candidate quantization values.
- The device of claim 11, wherein, to derive the quantization value for the layer of the CNN filter based on the values of the first NN weights, the one or more processors are configured to test candidate quantization values to determine a number of values clipped due to overflow and a magnitude of clipping errors produced by the candidate quantization values.
- The device of claim 11, wherein, to derive the quantization value for the layer of the CNN filter based on the values of the first NN weights, the one or more processors are configured to test candidate quantization values to determine a percentage of overflowing parameters produced by the candidate quantization values.
- The device of claim 11, wherein, to derive the quantization value for the layer of the CNN filter based on the values of the first NN weights, the one or more processors are configured to test candidate quantization values to determine a percentage of parameters that overflow as a result of the candidate quantization values.
- The device of claim 11, wherein, to derive the quantization value for the layer of the CNN filter based on the values of the first NN weights, the one or more processors are configured to test candidate quantization values to determine a quantization error produced by the candidate quantization values and a clipping error produced by the candidate quantization values, wherein the quantization error corresponds to a maximum difference between an original parameter value and a quantized-then-inverse-quantized version of the original parameter value produced by the candidate quantization values, and wherein the clipping error corresponds to one or more of an amount of clipping produced by the candidate quantization values or a percentage of parameters that overflow as a result of the candidate quantization values.
- The device of claim 11, wherein, to derive the quantization value for the layer of the CNN filter based on the values of the first NN weights, the one or more processors are configured to test a plurality of candidate quantization values to determine the quantization value.
- The device of claim 11, wherein the first NN weights comprise floating-point precision values and the second NN weights comprise integer precision values.
- A method of encoding video data, the method comprising: determining first neural network (NN) weights for layers of a convolutional neural network (CNN) filter; deriving a quantization value for a layer of the CNN filter based on values of the first NN weights; converting the first NN weights into second NN weights based on the quantization value; filtering a block of first video data using the second NN weights to generate a filtered block of the video data; storing a picture of decoded video data that includes the filtered block; and predicting a block of second video data based on the stored picture.
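Purely as an illustrative aid, and not part of the claims, the following minimal sketch walks through the decoding flow recited in claim 1 for a single CNN layer: a per-layer quantization value is derived from the floating-point weights, the first (float) weights are converted to second (integer) weights, and the integer weights are used to filter a block. The shift-derivation rule, the toy 3x3 convolution, and the use of numpy are all assumptions made for the example, not the patent's implementation.

```python
import numpy as np

def simple_shift(weights, candidates=range(6, 15), int_max=2 ** 15 - 1):
    """Derive a per-layer quantization value: here simply the largest candidate
    bit shift at which no weight overflows the assumed 16-bit dynamic range."""
    ok = [s for s in candidates
          if np.abs(np.round(weights * (1 << s))).max() <= int_max]
    return max(ok) if ok else min(candidates)

def decode_filter_block(block, float_kernel):
    """Claim-1 flow for one layer: derive a quantization value, convert the
    first (float) NN weights into second (integer) NN weights, filter the
    block with integer-only arithmetic, and rescale the result."""
    s = simple_shift(float_kernel)
    q_kernel = np.round(float_kernel * (1 << s)).astype(np.int32)
    padded = np.pad(block.astype(np.int64), 1, mode="edge")
    out = np.zeros_like(block, dtype=np.int64)
    for dy in range(3):            # integer-only 3x3 convolution
        for dx in range(3):
            out += q_kernel[dy, dx] * padded[dy:dy + block.shape[0],
                                             dx:dx + block.shape[1]]
    return (out + (1 << (s - 1))) >> s   # round and undo the weight scaling

block = (np.arange(16).reshape(4, 4) * 10).astype(np.int64)  # toy reconstructed block
kernel = np.full((3, 3), 1.0 / 9.0)                          # float smoothing kernel
print(decode_filter_block(block, kernel))                    # filtered block of video data
```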
Description
Adaptive quantization of neural network weights for convolutional neural network filters in video coding

This application claims priority to U.S. Patent Application No. 18/883,696, filed September 12, 2024, and U.S. Provisional Patent Application No. 63/582,724, filed September 14, 2023, the entire contents of both of which are incorporated herein by reference. U.S. Patent Application No. 18/883,696, filed September 12, 2024, claims the benefit of U.S. Provisional Patent Application No. 63/582,724, filed September 14, 2023.

Technical Field

The present disclosure relates to video encoding and video decoding.

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called "smartphones," video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those defined by the MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), and Versatile Video Coding (VVC) standards and extensions of such standards, as well as proprietary video codecs/formats such as AOMedia Video 1 (AV1), developed by the Alliance for Open Media. By implementing such video coding techniques, video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks of the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks of the same picture, or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Studies have shown that neural network (NN)-based filtering techniques can potentially provide significant improvements to video data being decoded and/or played back. NN-based filtering techniques are, however, highly complex and may require significant processing power to execute effectively. Additionally, the results of filtering performed with floating-point operations are not reproducible across different hardware platforms and can therefore introduce drift during decoding. This disclosure describes simplifications and adaptive quantization techniques that can be applied to NN-based filtering while maintaining coding quality. As described in more detail below, the parameters of an NN model, such as the NN weights, are typically defined in floating-point precision. To apply an NN filter, a video coder converts the floating-point precision values into integer precision values using a function that involves quantization with rounding.
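The conversion just described can be pictured with a short sketch. This is a minimal example assuming a per-layer bit-shift quantizer and a 16-bit integer dynamic range; the function names and the use of numpy are illustrative choices, not taken from the patent.

```python
import numpy as np

# Assumed 16-bit dynamic range for the integer-precision weights.
INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1

def quantize_weights(weights: np.ndarray, shift: int) -> np.ndarray:
    """Scale by 2**shift, round to nearest, and clip to the 16-bit range;
    values that fall outside the range overflow and are clipped."""
    scaled = np.round(weights * (1 << shift))
    return np.clip(scaled, INT16_MIN, INT16_MAX).astype(np.int16)

def dequantize_weights(q_weights: np.ndarray, shift: int) -> np.ndarray:
    """Recover an approximation of the original floating-point weights."""
    return q_weights.astype(np.float64) / (1 << shift)

w = np.array([0.731, -1.254, 0.002, 40.0])   # toy floating-point NN weights
q = quantize_weights(w, shift=10)            # integer-precision weights
print(q)                                     # [  749 -1284     2 32767]; 40.0 overflows
print(dequantize_weights(q, shift=10))       # close to w, except the clipped value
```

The last weight illustrates the clipping behavior discussed next: at this shift, 40.0 exceeds the 16-bit range and is clipped, so its dequantized value deviates substantially from the original.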
Quantization processes that use rounding reduce the computational complexity of implementing CNN filters, reduce or eliminate decoding drift, and reduce the number of operations that produce values exceeding a specified dynamic range, such as 16 bits. In some coding scenarios, reducing the quantization precision increases the deviation between the integer-precision representation of the NN model and the floating-point representation of the NN model, which ultimately reduces the quality of the filtering obtained from the NN model. At the same time, however, increasing the quantization precision increases the number of operations that exceed the dynamic range and require clipping, which also reduces the quality of the filtering obtained from the NN model.

To perform the conversion from floating-point to integer precision, video coders are typically configured to use fixed quantization values. For some coding scenarios, however, this uniform fixed quantization precision introduces more deviation than necessary between the floating-point representation of the NN model and the integer representation of the NN model. According to the techniques of this disclosure, rather than using a fixed quantization precision, the video decoder may be configured to derive quantization values for the layers of convolutional neural network (CNN) filters based on the values of the NN weights and/or attributes of the inputs. By deriving quantization values based on the values of the NN weights and/or the inputs, the video coder can achieve better filtering.
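As an illustration of this adaptive derivation, the sketch below tests candidate quantization values (bit shifts) for one layer and selects the one that minimizes a combined cost built from the quantization error and the clipping/overflow statistics, in the spirit of the criteria recited in claims 2 through 8. The candidate range, the cost weighting, and the helper name `derive_layer_shift` are assumptions made for the example, not values or interfaces from the patent.

```python
import numpy as np

INT16_MIN, INT16_MAX = -(2 ** 15), 2 ** 15 - 1   # assumed 16-bit dynamic range

def derive_layer_shift(weights, candidate_shifts=range(4, 16), clip_weight=1.0):
    """Test candidate quantization values (bit shifts) for one CNN layer and
    return the candidate with the lowest combined quantization + clipping cost."""
    best_shift, best_cost = None, float("inf")
    for s in candidate_shifts:
        scaled = np.round(weights * (1 << s))
        clipped = np.clip(scaled, INT16_MIN, INT16_MAX)
        recon = clipped / (1 << s)
        # Quantization error: the maximum difference between an original
        # parameter value and its quantized-then-inverse-quantized version.
        quant_err = np.max(np.abs(weights - recon))
        # Clipping statistics: the fraction of parameters that overflow the
        # dynamic range, and the magnitude of the resulting clipping error.
        overflow = scaled != clipped
        overflow_pct = float(overflow.mean())
        clip_err = float(np.abs(scaled - clipped).max()) / (1 << s)
        cost = quant_err + clip_weight * (clip_err + overflow_pct)
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift

rng = np.random.default_rng(0)
layer_weights = rng.normal(scale=0.5, size=1000)   # toy per-layer float weights
print(derive_layer_shift(layer_weights))           # derived per-layer quantization value
```

The example exhibits the trade-off described above: larger shifts shrink the rounding error but push more weights past the dynamic range, so the selected shift sits where the two error sources balance.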