
EP-4449309-B1 - OPERATION OF A NEURAL NETWORK WITH CONDITIONED WEIGHTS

EP 4449309 B1

Inventors

  • IKONIN, Sergey Yurievich
  • ALSHINA, Elena Alexandrovna
  • KOYUNCU, Esin
  • SYCHEV, Maxim Borisovitch
  • KARABUTOV, Alexander Alexandrovich
  • SOSULNIKOV, Mikhail Vyacheslavovich
  • SOLOVYEV, Timofey Mikhailovich

Dates

Publication Date
2026-05-06
Application Date
2022-03-14

Claims (17)

  1. A computer-implemented method (1200) of operating a neural network comprising a neural network layer comprising or connected with an accumulator register for buffering summation results and having a pre-defined accumulator register size of n bits, wherein n is a positive integer value, the method (1200) comprising the steps of defining an integer lower threshold value, A, and an integer upper threshold value, B, for values of integer numbers comprised in data entities of input data for the neural network layer; if a value of an integer number comprised in a data entity of the input data is smaller than the defined integer lower threshold value, clipping (S1210) the value of the integer number comprised in the data entity of the input data to the defined integer lower threshold value, and if a value of an integer number comprised in a data entity of the input data is larger than the defined integer upper threshold value, clipping (S1210) the value of the integer number comprised in the data entity of the input data to the defined integer upper threshold value; and determining (S1220) integer valued weights of the neural network layer based on the defined integer lower threshold value, the defined integer upper threshold value and the pre-defined accumulator register size, such that integer overflow of the accumulator register can be avoided; and wherein the neural network layer is configured to perform a summation

     D + Σ_{x_i ∈ X, w_i ∈ W} w_i·x_i

     wherein D denotes an integer value, W denotes a subset of trainable layer weights, and X denotes one of a set and a subset of the input data of the neural network layer; and wherein

     a) the integer valued weights {w_i} are determined to fulfill the conditions

        max(B, 0)·Σ_{w_i ∈ W, w_i > 0} w_i + max(−A, 0)·|Σ_{w_i ∈ W, w_i < 0} w_i| + max(D, 0) ≤ 2^(n−1) − 1
        max(−A, 0)·Σ_{w_i ∈ W, w_i > 0} w_i + max(B, 0)·|Σ_{w_i ∈ W, w_i < 0} w_i| + max(−D, 0) ≤ 2^(n−1)

     or b) the integer valued weights {w_i} are determined to fulfill the conditions

        Σ_{w_i ∈ W} |w_i| ≤ 2^(n−k) − max(D, 0) / (2^(k−1) − 1)
        Σ_{w_i ∈ W} |w_i| ≤ 2^(n−k) − max(−D, 0) / 2^(k−1)

     or the condition

        Σ_{w_i ∈ W} |w_i| ≤ 2^(n−k) − |D| / (2^(k−1) − 1)

     wherein in both cases the integer lower threshold value A is less than or equal to 0 and given by −2^(k−1), and the integer upper threshold value B is greater than or equal to 0 and given by 2^(k−1) − 1, wherein k denotes a pre-defined bitdepth of the layer input data.
  2. The method (1200) according to claim 1, wherein the neural network layer is a fully connected layer or a convolutional neural network layer.
  3. The method (1200) according to claim 1 or 2, wherein the accumulator register size is equal to one of 32 bits and 16 bits.
  4. The method according to any of the preceding claims, wherein D is equal to 0.
  5. The method (1200) according to claim 2, wherein the neural network layer is a two-dimensional convolutional neural network layer and the summation Σ_{w_i ∈ W} w_i is obtained by the following equation:

        Σ_{i=0}^{C_in − 1} Σ_{k1=0}^{K1 − 1} Σ_{k2=0}^{K2 − 1} w_{i,j,k1,k2}

     wherein C_in denotes the number of input channels of the neural network layer, K1 and K2 denote convolution kernel sizes and j denotes an index of an output channel of the neural network layer, or wherein the neural network layer is an N-dimensional convolutional neural network layer and the summation Σ_{w_i ∈ W} w_i is obtained by the following equation:

        Σ_{i=0}^{C_in − 1} Σ_{k1=0}^{K1 − 1} Σ_{k2=0}^{K2 − 1} ⋯ Σ_{kN=0}^{K_N − 1} w_{i,j,k1,k2,…,kN}

     wherein C_in denotes the number of input channels of the neural network layer, K1, K2, ..., K_N denote convolution kernel sizes and j denotes an index of an output channel of the neural network layer.
  6. The method (1200) according to any of the preceding claims, wherein the neural network layer comprises an attention mechanism.
  7. The method (1200) according to any of the preceding claims, further comprising providing weights and scaling the weights by first scaling factors to obtain scaled weights and rounding the scaled weights to the respective closest integer values to obtain the integer valued weights, wherein the rounding is performed by the floor function or the ceil function, and wherein the weights are real valued weights and the first scaling factors are given by 2^(s_j), wherein s_j denotes the number of bits representing the fractional parts of the real valued weights.
  8. The method (1200) according to any of the preceding claims, further comprising scaling data entities of the input data by second scaling factors to obtain scaled values of the data entities and further comprising rounding the scaled values of the data entities to the respective closest integer values to obtain the integer values of the data entities.
  9. A method of encoding data, comprising the steps of the method (1200) of operating a neural network according to any of the preceding claims.
  10. A method of decoding encoded data, comprising the steps of the method (1200) of operating a neural network according to one of the claims 1 to 8.
  11. A method of encoding at least a portion of an image, comprising transforming a tensor representing a component of the image into a latent tensor; providing an entropy model; and processing the latent tensor by means of a neural network based on the provided entropy model to generate a bitstream; wherein the providing of the entropy model comprises performing the steps of the method (1200) according to any of the claims 1 to 8.
  12. A method of reconstructing at least a portion of an image, comprising providing an entropy model; processing a bitstream by means of a neural network based on the provided entropy model to obtain a latent tensor representing a component of the image; and processing the latent tensor to obtain a tensor representing the component of the image; wherein the providing of the entropy model comprises performing the steps of the method (1200) according to any of the claims 1 to 8.
  13. A computer program product comprising a code stored on a non-transitory medium which, when the code is executed by one or more processors, causes the one or more processors to perform the method (1200) according to any one of the claims 1 to 12.
  14. An apparatus (1500) for encoding data, wherein the apparatus comprises processing circuitry (1510) configured for performing the steps of the method according to any of the claims 1 to 9, and 11.
  15. An apparatus (1500) for encoding at least a portion of an image, comprising processing circuitry (1510) configured for transforming a tensor representing a component of the image into a latent tensor, providing an entropy model comprising performing the steps of the method (1200) according to any of the claims 1 to 8 and processing the latent tensor by means of a neural network based on the provided entropy model to generate a bitstream.
  16. An apparatus (1500) for decoding data, comprising processing circuitry (1510) configured for performing the steps of the method according to any of the claims 1 to 8, and 10.
  17. An apparatus (1500) for decoding at least a portion of an encoded image, comprising processing circuitry (1510) configured for providing an entropy model comprising performing the steps of the method (1200) according to any of the claims 1 to 8, processing a bitstream by means of a neural network based on the provided entropy model to obtain a latent tensor representing a component of the image, and processing the latent tensor to obtain a tensor representing the component of the image.
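
Illustrative example (not part of the patent text): the following is a minimal Python/NumPy sketch of the kind of checks recited in claims 1 and 7 above - clipping k-bit integer layer inputs to [A, B] = [−2^(k−1), 2^(k−1) − 1], deriving integer valued weights from real valued weights with a scaling factor 2^(s_j), and testing condition a) of claim 1 for an n-bit signed accumulator with bias D. All function and variable names (quantize_weights, clip_inputs, accumulator_safe, n_bits, k_bits, s_j) are assumptions introduced here for illustration only; this is a sketch of the stated conditions, not the patented implementation.

import numpy as np

def quantize_weights(w_real, s_j):
    # Scale real-valued weights by 2**s_j and round to integers
    # (claim 7 mentions rounding by the floor or the ceil function).
    return np.floor(w_real * (2 ** s_j)).astype(np.int64)

def clip_inputs(x, k_bits):
    # Clip integer layer inputs to [A, B] = [-2**(k-1), 2**(k-1) - 1] (claim 1).
    a = -(2 ** (k_bits - 1))
    b = 2 ** (k_bits - 1) - 1
    return np.clip(x, a, b)

def accumulator_safe(w_int, d, n_bits, k_bits):
    # Condition a) of claim 1 for one output channel: the largest and smallest
    # values reachable by D + sum_i(w_i * x_i), with x_i clipped to [A, B],
    # must fit into a signed n-bit accumulator.
    a = -(2 ** (k_bits - 1))
    b = 2 ** (k_bits - 1) - 1
    pos = int(w_int[w_int > 0].sum())        # sum of positive weights
    neg = int(abs(w_int[w_int < 0].sum()))   # |sum of negative weights|
    upper_ok = max(b, 0) * pos + max(-a, 0) * neg + max(d, 0) <= 2 ** (n_bits - 1) - 1
    lower_ok = max(-a, 0) * pos + max(b, 0) * neg + max(-d, 0) <= 2 ** (n_bits - 1)
    return upper_ok and lower_ok

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 2-D convolution kernel for one output channel j:
    # C_in = 8 input channels, 3x3 kernel (cf. claim 5).
    w_real = rng.normal(scale=0.05, size=(8, 3, 3))
    w_int = quantize_weights(w_real, s_j=8)                        # 8 fractional bits
    x = clip_inputs(rng.integers(-300, 300, size=(8, 3, 3)), k_bits=8)
    acc = int((w_int * x).sum())                                   # D + sum(w_i * x_i) with D = 0
    print("example accumulated value:", acc)
    print("32-bit accumulator safe:", accumulator_safe(w_int.ravel(), d=0, n_bits=32, k_bits=8))
    print("16-bit accumulator safe:", accumulator_safe(w_int.ravel(), d=0, n_bits=16, k_bits=8))

The usage example exercises the accumulator sizes mentioned in claim 3 (32 and 16 bits) and the D = 0 case of claim 4; condition b) of claim 1 could be checked analogously from the sum of absolute weight values.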

Description

TECHNICAL FIELD

Embodiments of the present disclosure generally relate to the field of encoding and decoding data based on a neural network architecture. In particular, some embodiments relate to methods and apparatuses for such encoding and decoding of images and/or videos from a bitstream using a plurality of processing layers.

BACKGROUND

Hybrid image and video codecs have been used for decades to compress image and video data. In such codecs, a signal is typically encoded block-wise by predicting a block and by further coding only the difference between the original block and its prediction. In particular, such coding may include transformation, quantization and generating the bitstream, usually including some entropy coding. Typically, the three components of hybrid coding methods - transformation, quantization, and entropy coding - are optimized separately. Modern video compression standards like High-Efficiency Video Coding (HEVC), Versatile Video Coding (VVC) and Essential Video Coding (EVC) also use transformed representations to code a residual signal after prediction.

Recently, neural network architectures have been applied to image and/or video coding. In general, these neural network (NN) based approaches can be applied in various ways to image and video coding. For example, some end-to-end optimized image or video coding frameworks have been discussed. Moreover, deep learning has been used to determine or optimize some parts of the end-to-end coding framework, such as selection or compression of prediction parameters or the like. Besides, some neural network based approaches have also been discussed for usage in hybrid image and video coding frameworks, e.g. for implementation as a trained deep learning model for intra or inter prediction in image or video coding. The end-to-end optimized image or video coding applications discussed above have in common that they produce some feature map data, which is to be conveyed between encoder and decoder.

Neural networks are machine learning models that employ one or more layers of nonlinear units based on which they can predict an output for a received input. Some neural networks include one or more hidden layers in addition to an output layer. A corresponding feature map may be provided as an output of each hidden layer. Such a feature map of each hidden layer may be used as an input to a subsequent layer in the network, i.e., a subsequent hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters. In a neural network that is split between different devices, e.g. between encoder and decoder, or between a device and a cloud, a feature map at the output of the site of splitting (e.g. a first device) is compressed and transmitted to the remaining layers of the neural network (e.g. to a second device). Further improvement of encoding and decoding using trained network architectures may be desirable.

MARKUS NAGEL ET AL: "A White Paper on Neural Network Quantization", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 15 June 2021, discloses neural network quantization comprising quantization of input vectors, weights and activations. BRUIN BARRY DE ET AL: "Quantization of deep neural networks for accumulator-constrained processors", MICROPROCESSORS AND MICROSYSTEMS, IPC BUSINESS PRESS LTD. LONDON, GB, vol. 72, 14 August 2019, discloses quantization of deep neural networks.
US 2021/358180 A1 discloses a method performed by one or more data processing apparatus for entropy encoding data which defines a sequence comprising a plurality of components, wherein each component specifies a respective code symbol from a predetermined discrete set of possible code symbols, the method comprising: for each component of the plurality of components: processing an input comprising: (i) a respective integer representation of each of one or more components of the data which precede the component in the sequence, (ii) an integer representation of one or more respective latent variables characterizing the data, or (iii) both, using an integer neural network to generate data defining a probability distribution over the predetermined set of possible code symbols for the component of the data, wherein: the integer neural network has a plurality of integer neural network parameter values, and each of the plurality of integer neural network parameter values are integers; the integer neural network comprises a plurality of integer neural network layers, each integer neural network layer is configured to process a respective integer neural network layer input to generate a respective integer neural network layer output, and processing an integer neural network layer input to generate an integer neural network layer output comprises: generating an intermediate result by processing the integer neural network layer input i