EP-4097581-B1 - METHOD AND APPARATUS FOR ADAPTIVE IMAGE COMPRESSION WITH FLEXIBLE HYPERPRIOR MODEL BY META LEARNING

Inventors

JIANG, WEI
WANG, WEI
LIU, SHAN
XU, XIAOZHONG

Dates

Publication Date: 20260513
Application Date: 20210806

Claims (13)

1. A method of adaptive neural image compression with a hyperprior model by meta-learning, the method being performed by at least one processor, and the method comprising: generating (1000) a statistic feature, based on an input image and a hyperparameter; generating (1001) a first shared feature and an estimated adaptive encoding parameter; encoding (1002) the input image to obtain a signal encoded image, based on the generated first shared feature and the generated estimated adaptive encoding parameter; generating (1003) a second shared feature and an estimated adaptive hyper encoding parameter; generating (1004) a hyper feature, based on the signal encoded image, the generated second shared feature, and the generated estimated adaptive hyper encoding parameter; compressing (1005) the obtained signal encoded image, the generated statistic feature, and the generated hyper feature; decoding (1006) the compressed signal encoded image to obtain a recovered image, the compressed statistic feature to obtain a recovered statistic feature, and the compressed hyper feature to obtain a recovered hyper feature; generating (1007) a third shared feature and an estimated adaptive hyper decoding parameter; generating (1008) a hyper prior feature, based on the recovered statistic feature, the generated third shared feature, and the generated estimated adaptive hyper decoding parameter; and generating (1009) a reconstructed image, based on the generated hyper prior feature and the obtained recovered image.
2. The method of claim 1, wherein the generating the first shared feature and the estimated adaptive encoding parameter comprises: generating the first shared feature of a layer in a plurality of layers in a first neural network, based on the generated statistic feature and a shared signal encoding parameter; and performing convolution to compute the estimated adaptive encoding parameter, based on the generated first shared feature, the statistic feature, and an adaptive signal encoding parameter.
3. The method of claim 2, wherein the estimated adaptive encoding parameter is updated in the plurality of layers of the first neural network.
4. The method of claim 2, further comprising computing an encoded output of the layer in the plurality of layers of the first neural network, based on the generated first shared feature and the estimated adaptive encoding parameter, wherein the encoded output of a last layer of the plurality of layers of the first neural network is the signal encoded image.
5. The method of claim 1, wherein the generating the second shared feature and the estimated adaptive hyper encoding parameter comprises: generating the second shared feature of a layer in a plurality of layers in a second neural network, based on the generated statistic feature, the obtained signal encoded image, and a hyper encoding parameter; and performing convolution to compute the estimated adaptive hyper encoding parameter, based on the generated second shared feature, the statistic feature, and an adaptive signal encoding parameter.
6. The method of claim 5, wherein the estimated adaptive hyper encoding parameter is updated in the plurality of layers of the second neural network.
7. The method of claim 5, further comprising generating a hyper output of the layer in the plurality of layers of the second neural network, based on the generated second shared feature and the estimated adaptive hyper encoding parameter, wherein the hyper output of a last layer of the plurality of layers of the second neural network is the hyper feature.
8. The method of claim 1, wherein the generating the third shared feature and the estimated adaptive hyper decoding parameter comprises: computing the third shared feature of a layer in a plurality of layers in a third neural network, based on the compressed hyper feature and a shared hyper decoding parameter; and performing convolution to compute the estimated adaptive hyper decoding parameter, based on the generated third shared feature, the recovered statistic feature, and an adaptive hyper decoding parameter.
9. The method of claim 8, wherein the estimated adaptive hyper decoding parameter is updated in the plurality of layers of the third neural network.
10. The method of claim 8, further comprising computing a hyper prior output of the layer in the plurality of layers of the third neural network, based on the generated third shared feature and the estimated adaptive hyper decoding parameter, wherein the hyper prior output of a last layer of the plurality of layers of the third neural network is the hyper prior feature.
11. The method of claim 1, further comprising: generating a fourth shared feature of a layer in a plurality of layers in a fourth neural network, based on the hyper prior feature, the recovered signal encoded image, and a shared signal decoding parameter; performing convolution to compute an estimated adaptive signal decoding parameter, based on the generated fourth shared feature, the recovered statistic feature, and an adaptive hyper decoding parameter; and generating a decoded output of the layer in the plurality of layers of the fourth neural network, based on the generated fourth shared feature and the estimated adaptive signal decoding parameter, wherein the decoded output of a last layer of the plurality of layers of the fourth neural network is the reconstructed image.
12. An apparatus for adaptive neural image compression with a hyperprior model by meta-learning, the apparatus comprising: at least one memory configured to store program code; and at least one processor configured to read the program code and operate as instructed by the program code, to perform the method according to any one of claims 1 to 11.
13. A non-transitory computer-readable medium storing instructions that, when executed by at least one processor for adaptive neural image compression with a hyperprior model by meta-learning, cause the at least one processor to perform the method according to any one of claims 1 to 11.
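
Illustrative sketch (not part of the claims). The per-layer structure recited in claims 2 to 4 can be pictured with a toy one-dimensional example: a shared parameter produces a shared feature, a convolution over the shared feature and the statistic feature yields an estimated adaptive parameter, and the layer output is computed from both. Everything below is an assumption made for illustration: the names `conv1d`, `meta_layer`, and `encode_signal`, the use of 1-D lists in place of image tensors, and in particular the scalar "gain" used to derive the estimated parameter, which is a simple stand-in and not the estimation network of the patent.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> Vector:
    """Same-length 1-D cross-correlation with zero padding (odd-length kernel)."""
    half = len(kernel) // 2
    out: Vector = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out


def meta_layer(
    layer_input: Sequence[float],
    statistic_feature: Sequence[float],
    shared_param: Sequence[float],
    adaptive_param: Sequence[float],
) -> Vector:
    """One hypothetical layer: shared feature, estimated parameter, output."""
    # Shared feature: computed with the fixed, shared parameter.
    shared_feature = conv1d(layer_input, shared_param)
    # Estimated adaptive parameter: a convolution over the shared feature and
    # the statistic feature, using the adaptive parameter as the kernel.
    # ASSUMPTION: the response is averaged into one gain that rescales the
    # adaptive parameter; this is a toy choice, not the patented estimator.
    conditioning = list(shared_feature) + list(statistic_feature)
    response = conv1d(conditioning, adaptive_param)
    gain = sum(response) / len(response)
    estimated_param = [w * gain for w in adaptive_param]
    # Layer output: based on the shared feature and the estimated parameter.
    return conv1d(shared_feature, estimated_param)


def encode_signal(
    image: Sequence[float],
    statistic_feature: Sequence[float],
    layers: Sequence[Tuple[Sequence[float], Sequence[float]]],
) -> Vector:
    """Stack layers; the last layer's output plays the signal encoded image."""
    feature = list(image)
    for shared_param, adaptive_param in layers:
        feature = meta_layer(feature, statistic_feature, shared_param, adaptive_param)
    return feature


if __name__ == "__main__":
    identity = [0.0, 1.0, 0.0]
    image = [1.0, 2.0, 3.0, 4.0]
    # Same image and parameters, different statistic features.
    print(encode_signal(image, [0.5, 0.5], [(identity, identity)]))
    print(encode_signal(image, [0.0, 0.0], [(identity, identity)]))
```

The point of the toy is only that the stored parameters stay fixed while the effective layer weights change with the statistic feature, so the same stack encodes the same input differently under different statistics.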

Description

BACKGROUND

Standard groups and companies have been actively searching for potential needs for standardization of future video coding technology. These standard groups and companies have focused on artificial intelligence (AI)-based end-to-end neural image compression (NIC) using deep neural networks (DNNs). The success of this approach has brought more and more industrial interest in advanced neural image and video compression methodologies.

Typically, a pre-trained NIC model instance is computed by using a set of training data, assuming that the training data covers the entire data distribution of all natural images and that a universal model instance with pre-trained fixed model parameters can be obtained to work on all natural images. This assumption is not true in practice. Real natural images have various data distributions, and a pre-trained NIC model normally works well only on a subset of images. It is highly desirable for an NIC model to adaptively select its model parameters to accommodate different input images.

SUMMARY

The present invention is defined by the appended claims.

According to embodiments, a method of adaptive neural image compression with a hyperprior model by meta-learning is performed by at least one processor and includes generating a statistic feature, based on an input image and a hyperparameter; generating a first shared feature and an estimated adaptive encoding parameter; encoding the input image to obtain a signal encoded image, based on the generated first shared feature and the generated estimated adaptive encoding parameter; generating a second shared feature and an estimated adaptive hyper encoding parameter; generating a hyper feature, based on the obtained signal encoded image, the generated second shared feature, and the generated estimated adaptive hyper encoding parameter; and compressing the obtained signal encoded image, the generated statistic feature, and the generated hyper feature.
The method further includes decoding the compressed signal encoded image to obtain a recovered image, the compressed statistic feature to obtain a recovered statistic feature, and the compressed hyper feature to obtain a recovered hyper feature; generating a third shared feature and an estimated adaptive hyper decoding parameter; generating a hyper prior feature, based on the recovered statistic feature, the generated third shared feature, and the estimated adaptive hyper decoding parameter; and generating a reconstructed image, based on the generated hyper prior feature and the obtained recovered image.

According to embodiments, an apparatus for adaptive neural image compression with a hyperprior model by meta-learning includes at least one memory configured to store program code, and at least one processor configured to read the program code and operate as instructed by the program code, the program code including statistic feature generating code configured to cause the at least one processor to generate a statistic feature, based on an input image and a hyperparameter; first shared feature generating code configured to cause the at least one processor to generate a first shared feature; adaptive encoding code configured to cause the at least one processor to generate an estimated adaptive encoding parameter; encoding code configured to cause the at least one processor to encode the input image to obtain a signal encoded image, based on the first shared feature and the estimated adaptive encoding parameter; second shared feature generating code configured to cause the at least one processor to generate a second shared feature; adaptive hyper encoding code configured to cause the at least one processor to generate an estimated adaptive hyper encoding parameter; hyper feature generating code configured to cause the at least one processor to generate a hyper feature, based on the obtained signal encoded image, the second shared feature, and the estimated adaptive hyper encoding parameter; and compression code configured to cause the at least one processor to compress the obtained signal encoded image, the generated statistic feature, and the generated hyper feature.

The program code further includes decoding code configured to cause the at least one processor to decode the compressed image to obtain a recovered image, the compressed statistic feature to obtain a recovered statistic feature, and the compressed hyper feature to obtain a recovered hyper feature; third shared feature generating code configured to cause the at least one processor to generate a third shared feature; adaptive hyper decoding code configured to cause the at least one processor to generate an estimated adaptive hyper decoding parameter; hyper prior feature generating code configured to cause the at least one processor to generate a hyper prior feature, based on the recovered statistic feature, the third shared feature, and the estimated adaptive hyper decoding parameter; and reconstruction cod