US-12626416-B2 - Data encoding method and related device for lossless compression using trained volume preserving flow model
Abstract
A data encoding method includes obtaining to-be-encoded data; processing the to-be-encoded data by using a volume preserving flow model to obtain a hidden variable output, where the volume preserving flow model includes a target volume preserving flow layer, an operation corresponding to the target volume preserving flow layer is an invertible operation that meets a volume preserving flow constraint, the target volume preserving flow layer is used to perform a multiplication operation on a preset coefficient and first data input to the target volume preserving flow layer, and the preset coefficient is not 1; and encoding the hidden variable output to obtain encoded data.
Inventors
- Shifeng Zhang
- Chen Zhang
- Ning Kang
- Zhenguo Li
Assignees
- HUAWEI TECHNOLOGIES CO., LTD.
Dates
- Publication Date: 2026-05-12
- Application Date: 2023-08-25
- Priority Date: 2021-02-27
Claims (17)
- 1 . A method comprising: obtaining to-be-encoded data; processing the to-be-encoded data using a trained volume preserving flow model to obtain a latent variable output representing a bijective transformation of the to-be-encoded data, wherein the trained volume preserving flow model comprises a plurality of invertible layers comprising a target volume preserving flow layer, wherein the target volume preserving flow layer performs an invertible operation that meets a volume preserving flow constraint, wherein the volume preserving flow constraint comprises that input space and output space of an operation corresponding to a volume preserving operation layer have a same volume size, wherein the invertible operation comprises an element-wise multiplication operation on a preset coefficient vector and an input vector to the target volume preserving flow layer, and wherein the preset coefficient vector comprises N coefficients whose product is equal to 1; and encoding the latent variable output using entropy encoding based on a learned probability distribution associated with the latent variable output to obtain encoded data.
- 2 . The method of claim 1 , wherein the input vector comprises N elements that one-to-one correspond to the N coefficients, and wherein performing the element-wise multiplication operation comprises performing the element-wise multiplication operation on each of the N elements and a corresponding coefficient of the N coefficients to obtain a product result.
- 3 . The method of claim 1 , further comprising: processing, using a first neural network, input data to the target volume preserving flow layer to obtain a first network output; and performing a preset operation on the first network output to obtain the preset coefficient vector.
- 4 . The method of claim 3 , wherein the first network output is a vector comprising N elements, and wherein performing the preset operation comprises: obtaining an average of the N elements and subtracting the average from each of the N elements to obtain N processed elements; and performing an exponentiation operation on each of the N processed elements using a natural constant e as a base to obtain the preset coefficient vector.
- 5 . The method of claim 3 , wherein an output at the target volume preserving flow layer comprises the input data.
- 6 . The method of claim 1 , further comprising performing, using the target volume preserving flow layer, an addition operation on a constant term and a product result of the input vector and the preset coefficient vector, wherein the constant term is not 0.
- 7 . The method of claim 6 , further comprising processing, using a second neural network, input data to the target volume preserving flow layer to obtain the constant term.
- 8 . The method of claim 1 , wherein the input vector comprises N elements that one-to-one correspond to the N coefficients, wherein the N elements comprise a first target element corresponding to a first target coefficient and a second target element corresponding to a second target coefficient, and wherein performing the element-wise multiplication operation comprises: obtaining a first fixed-point number corresponding to the first target element and a second fixed-point number corresponding to the second target element; obtaining a first fraction corresponding to the first target coefficient and a second fraction corresponding to the second target coefficient, wherein the first fraction comprises a first numerator and a first denominator, wherein the second fraction comprises a second numerator and a second denominator, wherein the first numerator, the first denominator, the second numerator, and the second denominator are integers, and wherein the first denominator is the same as the second numerator; performing a second multiplication operation on the first fixed-point number and the first numerator to obtain a first result; performing a first division operation on the first result and the first denominator to obtain a second result, wherein the second result comprises a first quotient result and a first remainder result, and wherein the first quotient result is a first multiplication result of the first target element and the first target coefficient; performing a third multiplication operation on the second fixed-point number and the second numerator to obtain a third result; performing an addition operation on the third result and the first remainder result to obtain a fourth result; and performing a second division operation on the fourth result and the second denominator to obtain a fifth result, wherein the fifth result comprises a second quotient result and a second remainder result, and wherein the second quotient result is a second multiplication result of the second target element and the second target coefficient.
- 9 . The method of claim 8 , wherein the second target element is a last element in the N elements on which a fourth multiplication operation is performed with a corresponding coefficient in a process of performing the element-wise multiplication operation, and wherein the target volume preserving flow layer further outputs the second remainder result.
- 10 . The method of claim 8 , wherein the volume preserving flow model further comprises a first volume preserving flow layer that is an adjacent volume preserving flow layer before the target volume preserving flow layer, and wherein performing the second multiplication operation comprises: obtaining a remainder result output by the first volume preserving flow layer; and performing the second multiplication operation and performing an addition operation on a multiplication operation result and the remainder result output by the first volume preserving flow layer to obtain the first result.
- 11 . The method of claim 1 , wherein the volume preserving flow model comprises M volume preserving flow layers that are connected in series, wherein the M volume preserving flow layers comprise the target volume preserving flow layer, wherein a first output at an (i−1) th volume preserving flow layer is a first input at an i th volume preserving flow layer, wherein i is a positive integer not greater than M, wherein a second input at a 1 st volume preserving flow layer is the to-be-encoded data, and wherein a second output at an M th volume preserving flow layer is the latent variable output.
- 12 . The method of claim 1 , wherein the volume preserving flow model further comprises a target convolutional layer connected to the target volume preserving flow layer, wherein an output at the target volume preserving flow layer is an input at the target convolutional layer, and wherein the method further comprises performing, using the target convolutional layer, a second multiplication operation on the output and a weight matrix.
- 13 . The method of claim 12 , wherein performing the second multiplication operation comprises: obtaining the weight matrix; performing lower-upper (LU) decomposition on the weight matrix to obtain a first matrix, a second matrix, a third matrix, and a fourth matrix, wherein the first matrix is a permutation matrix, wherein the second matrix is a lower triangular matrix, wherein the third matrix is a unit matrix whose product of diagonal elements is 1, and wherein the fourth matrix is an upper triangular matrix; performing a third multiplication operation on the output and the fourth matrix to obtain a sixth result; performing a fourth multiplication operation on the sixth result and the third matrix to obtain a seventh result; performing a fifth multiplication operation on the seventh result and the second matrix to obtain an eighth result; and performing a sixth multiplication operation on the eighth result and the first matrix to obtain a ninth result, wherein the ninth result is a result of the second multiplication operation.
- 14 . The method of claim 12 , wherein the volume preserving flow model comprises M volume preserving flow layers and M convolutional layers that are connected in series, wherein the M volume preserving flow layers comprise the target volume preserving flow layer, wherein the M convolutional layers comprise the target convolutional layer, wherein a first output at an i th volume preserving flow layer is a first input at an i th convolutional layer, wherein a second output at the i th convolutional layer is a second input at an (i+1) th volume preserving flow layer, wherein i is a positive integer not greater than M, wherein a third input at a 1 st volume preserving flow layer is the to-be-encoded data, and wherein a third output at an M th convolutional layer is the latent variable output.
- 15 . A method comprising: obtaining encoded data; decoding the encoded data to obtain a latent variable output representing a bijective transformation of the encoded data; and processing the latent variable output using a trained volume preserving flow model to obtain a decoding output, wherein the trained volume preserving flow model comprises a plurality of invertible layers comprising a target volume preserving flow layer, wherein the target volume preserving flow layer performs an invertible operation that meets a volume preserving flow constraint, wherein the volume preserving flow constraint comprises that input space and output space of an operation corresponding to a volume preserving operation layer have a same volume size, wherein the invertible operation comprises an element-wise division operation on a preset coefficient vector and an input vector to the target volume preserving flow layer, and wherein the preset coefficient vector comprises N coefficients whose product is equal to 1.
- 16 . The method of claim 15 , wherein the input vector comprises N elements that one-to-one correspond to the N coefficients, and wherein performing the element-wise division operation comprises performing the element-wise division operation on each of the N elements and a corresponding coefficient of the N coefficients to obtain a division result.
- 17 . An apparatus comprising: a memory configured to store instructions; and one or more processors coupled to the memory and configured to: obtain to-be-encoded data; process the to-be-encoded data using a trained volume preserving flow model to obtain a latent variable output representing a bijective transformation of the to-be-encoded data, wherein the trained volume preserving flow model comprises a plurality of invertible layers comprising a target volume preserving flow layer, wherein the target volume preserving flow layer performs an invertible operation that meets a volume preserving flow constraint, wherein the volume preserving flow constraint comprises that input space and output space of an operation corresponding to a volume preserving operation layer have a same volume size, wherein the invertible operation comprises an element-wise multiplication operation on a preset coefficient vector and an input vector to the target volume preserving flow layer, and wherein the preset coefficient vector comprises N coefficients whose product is equal to 1; and encode the latent variable output using entropy encoding based on a learned probability distribution associated with the latent variable output to obtain encoded data.
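As an overview of the encoding and decoding paths recited in claims 1, 11, 14 and 15, the following is a minimal sketch of the pipeline: the to-be-encoded data passes through M volume preserving flow layers (optionally interleaved with convolutional layers), the last output is the latent variable, and the latent variable is entropy-encoded with a learned probability model. The `layer.forward`/`layer.inverse` and `entropy_coder` interfaces are illustrative assumptions, not part of the claims.

```python
def encode(x, layers, entropy_coder):
    """Claims 1, 11 and 14: run the data through M invertible layers in
    series (the i-th output is the (i+1)-th input), then entropy-encode
    the latent variable output of the last layer."""
    z = x
    for layer in layers:
        z = layer.forward(z)
    return entropy_coder.encode(z)      # e.g. arithmetic or rANS coding

def decode(bits, layers, entropy_coder):
    """Claim 15: entropy-decode the latent variable, then undo the same
    layers in reverse order using their inverse operations."""
    z = entropy_coder.decode(bits)
    for layer in reversed(layers):
        z = layer.inverse(z)
    return z
```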
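Claims 8 to 10 describe how the element-wise multiplication can be carried out exactly on fixed-point numbers by writing each coefficient as an integer fraction and carrying the division remainder from one element (and from one layer) to the next. Below is a minimal sketch of that bookkeeping; the function name, the use of Python's standard fractions module, and the carry_in naming are assumptions for illustration only.

```python
from fractions import Fraction

def fixed_point_scale(x_fixed, coeffs, carry_in=0):
    """Multiply fixed-point elements by fractional coefficients exactly.

    x_fixed  : list of integers (fixed-point representations of the elements)
    coeffs   : list of Fraction(n_i, d_i); per claim 8, d_i equals n_(i+1)
               and the product of all coefficients is 1
    carry_in : remainder handed over from the preceding volume preserving
               flow layer (claim 10), 0 for the first layer
    """
    outputs = []
    remainder = carry_in
    for x, c in zip(x_fixed, coeffs):
        # integer multiplication by the numerator, plus the carried remainder
        total = x * c.numerator + remainder
        # integer division by the denominator: the quotient is the scaled
        # element, the remainder is carried to the next element (claim 8)
        quotient, remainder = divmod(total, c.denominator)
        outputs.append(quotient)
    # the final remainder is emitted by the layer (claim 9) so that the
    # mapping stays invertible: decoding can restore each element exactly
    return outputs, remainder
```

Because no information is discarded (every remainder is either folded into the next element or output by the layer), the integer mapping is bijective, which is what allows the decoding side of claims 15 and 16 to recover the inputs by element-wise division.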
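Claim 13 applies the convolution weight as four successive matrix products obtained from an LU-style decomposition. The sketch below illustrates one such factorisation and the application order; the use of scipy.linalg.lu and the reading of the third matrix as the diagonal factor (whose diagonal elements multiply to 1 when the weight is volume preserving) are my assumptions, not the claim's wording.

```python
import numpy as np
from scipy.linalg import lu

def decompose_weight(weight):
    """Factor a convolution weight matrix as weight = P @ L @ D @ U."""
    p, l, u_full = lu(weight)          # scipy returns weight = p @ l @ u_full
    diag = np.diag(u_full)
    d = np.diag(diag)                  # diagonal factor (third matrix)
    u = u_full / diag[:, None]         # upper triangular with unit diagonal
    return p, l, d, u

def apply_weight(x, p, l, d, u):
    """Multiply x by the factors in the order recited in claim 13."""
    y = u @ x      # third multiplication: upper triangular matrix
    y = d @ y      # fourth multiplication: diagonal factor
    y = l @ y      # fifth multiplication: lower triangular matrix
    y = p @ y      # sixth multiplication: permutation matrix
    return y
```

Since the permutation has determinant ±1 and the triangular factors have unit diagonals, constraining the diagonal factor's elements to multiply to 1 keeps the overall convolution compatible with the volume preserving constraint.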
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This is a continuation of International Patent Application No. PCT/CN2022/077808 filed on Feb. 25, 2022, which claims priority to Chinese Patent Application No. 202110221722.X filed on Feb. 27, 2021. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
TECHNICAL FIELD
This application relates to the field of artificial intelligence, and in particular, to a data encoding method and a related device.
BACKGROUND
A core of lossless compression is searching for a distribution rule inside data. For example, in an English document, the letter e appears more frequently than the letter z. If e is stored by using shorter bits, the storage length of the document can be reduced, implementing document compression. Artificial intelligence (AI) lossless compression is a new technology field that uses artificial intelligence to perform lossless compression. Its core is using AI to find a better distribution rule inside the data and compressing the data based on that distribution to obtain a better lossless compression rate.
During lossless compression, the input data and the hidden variable output used for encoding need to be discrete and completely invertible. This directly limits the selection and use of a data encoding/decoding method, because most data encoding/decoding methods perform floating-point operations, which introduce numerical errors and prevent an invertible operation from being implemented.
In an existing implementation, lossless compression is performed by using an integer discrete flow (IDF) model. The IDF model uses integer addition and subtraction operations to avoid floating-point errors and to ensure that the flow model is numerically invertible. During calculation, the model applies integer addition and subtraction operations to the input data to avoid numerical errors. In addition, both the input data x and the hidden variable output z = f(x) are integers, and f⁻¹(f(x)) = x can be ensured. However, because only integer addition and subtraction operations are used in the encoding/decoding process, the representation capability of the IDF is poor, and data distribution cannot be accurately estimated, which causes a low compression rate.
SUMMARY
According to a first aspect, this application provides a data encoding method. The method includes obtaining to-be-encoded data. The to-be-encoded data may be data of an image, a video, or a text. For example, the to-be-encoded data is data of an image. The image may be an image photographed by a terminal device by using a camera, or an image obtained from a terminal device (for example, an image stored in an album of the terminal device, or a picture obtained by the terminal device from a cloud). It should be understood that the image may be an image with an image compression requirement. A source of the to-be-processed image is not limited in this application.
The to-be-encoded data is processed by using a volume preserving flow model to obtain a hidden variable output. The volume preserving flow model includes a target volume preserving flow layer, an operation corresponding to the target volume preserving flow layer is an invertible operation that meets a volume preserving flow constraint, the target volume preserving flow layer is used to perform a multiplication operation on a preset coefficient and first data input to the target volume preserving flow layer, and the preset coefficient is not 1.
The target volume preserving flow layer may also be referred to as a target volume preserving coupling layer. The volume preserving flow constraint may be that the input space and the output space of an operation corresponding to a volume preserving operation layer have a same volume size. Having the same volume size means that the data of the input space one-to-one corresponds to the data of the output space, and different output data corresponds to different input data. To ensure that the operation corresponding to the target volume preserving flow layer meets the volume preserving flow constraint, a product of the linear coefficients in the operation corresponding to the target volume preserving flow layer needs to be 1. Further, the first data and the preset coefficient are vectors, the first data includes N elements, the preset coefficient includes N coefficients, the N elements included in the first data one-to-one correspond to the N coefficients, the N coefficients included in the preset coefficient are the linear coefficients in the operation corresponding to the target volume preserving flow layer, and a product of the N coefficients is 1.
The invertible operation means that the operation can obtain output data based on input data, and can also inversely derive the input data based on the output data. For example, it is assumed that the input data is x, and the output data is z = f(x). The input data x may also be restored from the output data z by using an inverse operation f⁻¹, that is, x = f⁻¹(z).
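As a concrete illustration of this constraint, the sketch below shows one way the preset coefficient can be produced from the output of the first neural network (the preset operation of claim 4: subtract the mean, then exponentiate with base e) and how the resulting element-wise multiplication is undone by an element-wise division on the decoding side. The function names and the NumPy framing are assumptions for illustration.

```python
import numpy as np

def preset_coefficients(first_network_output):
    """Preset operation of claim 4: subtract the mean of the N elements,
    then exponentiate with base e. The exponents sum to 0, so the product
    of the N coefficients is exactly 1 (the volume preserving constraint)."""
    a = np.asarray(first_network_output, dtype=np.float64)
    return np.exp(a - a.mean())

def forward(x, s, t=0.0):
    """Encoding direction: multiply the input element-wise by the preset
    coefficient and, per claim 6, add a constant term that is not 0."""
    return s * x + t

def inverse(z, s, t=0.0):
    """Decoding direction (claim 15): subtract the constant term and
    divide element-wise by the same preset coefficient."""
    return (z - t) / s
```

In the patent's exact arithmetic, these products are evaluated with the fixed-point, remainder-carrying scheme of claims 8 to 10 rather than in floating point, so that the forward and inverse operations are bit-exact inverses of each other.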