CN-121985138-A - Learned image compression method based on flow matching residual reconstruction


Abstract

The invention provides a learned image compression method based on flow matching residual reconstruction, and relates to the technical field of image processing and computer vision. The method comprises: constructing an image compression network model based on a flow matching latent variable refinement module, wherein the module performs residual reconstruction on a quantized latent variable by means of a time-continuous differentiable flow field to compensate for fine details lost during quantization; constructing a joint optimization objective function based on a rate-distortion loss function and a flow matching loss function, and training the whole image compression network model end to end on input images; and compressing an image to be compressed with the trained image compression network model. The invention compensates for the fine details lost during quantization without significantly increasing inference latency, and significantly improves the structural fidelity and perceptual quality of the reconstructed image.

Inventors

ZENG HUI
Yi Fangzhou
LIU BO
ZHAO YUNSHENG
LIU TAO

Assignees

University of Science and Technology Beijing (北京科技大学)

Dates

Publication Date: 20260505
Application Date: 20260123

Claims (10)

1. A learned image compression method based on flow matching residual reconstruction, the method comprising: constructing an image compression network model based on a flow matching latent variable refinement module, wherein the flow matching latent variable refinement module uses a time-continuous differentiable flow field to perform residual reconstruction on a quantized latent variable and to compensate for fine details lost during quantization; constructing a joint optimization objective function based on a rate-distortion loss function and a flow matching loss function, and training the whole image compression network model end to end using input images; and compressing an image to be compressed using the trained image compression network model.
2. The learned image compression method based on flow matching residual reconstruction of claim 1, wherein the image compression network model comprises an analysis transform network, a hyper-analysis transform network, a hyper-synthesis transform network, the flow matching latent variable refinement module, and a synthesis transform network; and the processing flow of the image compression network model comprises: encoding an input image into a first latent representation by the analysis transform network; encoding the first latent representation into a second latent representation by the hyper-analysis transform network, and quantizing the first latent representation and the second latent representation to obtain the corresponding quantized latent variables; generating entropy model parameters from the quantized latent variable of the second latent representation by the hyper-synthesis transform network, wherein the entropy model parameters comprise a mean μ and a standard deviation, the mean μ being used in the quantization of the first latent representation; modeling the quantized latent variable of the first latent representation with the time-continuous differentiable flow field by the flow matching latent variable refinement module to predict a quantization residual, and fusing the predicted quantization residual with that quantized latent variable to obtain a corrected latent variable; and decoding the corrected latent variable by the synthesis transform network to generate a reconstructed image.
3. The learned image compression method based on flow matching residual reconstruction of claim 2, wherein quantizing the first latent representation to obtain its quantized latent variable is expressed by a formula in which a quantization operation is applied and μ is the mean parameter output by the hyper-synthesis transform network.
4. The learned image compression method based on flow matching residual reconstruction of claim 1, wherein the flow matching latent variable refinement module takes a time variable as a continuous path and constructs an ordinary differential equation through a velocity field function, in which: the mapping function varies with time; the initial condition of the mapping function is that, at the initial time, the mapping function is the identity function and its value is the quantized latent variable; and the ordinary differential equation gradually transforms a base distribution into a target residual distribution so as to predict the quantization residual.
5. The learned image compression method based on flow matching residual reconstruction of claim 4, wherein the corrected latent variable is expressed by a formula (not reproduced in this text).
6. The learned image compression method based on flow matching residual reconstruction of claim 1, wherein the flow matching latent variable refinement module comprises a time embedding module and N stacked residual blocks, wherein the time embedding module receives and processes time-step information, and the stacked residual blocks process the quantized latent variable together with the time parameter representation output by the time embedding module to obtain the corrected latent variable.
7. The learned image compression method based on flow matching residual reconstruction of claim 6, wherein each residual block comprises: a main path comprising a first Conv-LN module and a second Conv-LN module, wherein each Conv-LN module comprises, in sequence, a 3×3 convolution layer and a layer normalization layer, the second Conv-LN module being connected to the first Conv-LN module via, in sequence, the scale-and-shift operation and the residual connection; a time adjustment path that receives the time parameter representation output by the time embedding module and accordingly generates a scaling parameter α and a shift parameter β; a scale-and-shift operation that applies an affine transformation to the output of the first Conv-LN module in the main path using α and β; and a residual connection that adds the input of the residual block to the output of the scale-and-shift operation.
8. The learned image compression method based on flow matching residual reconstruction of claim 7, wherein the time adjustment path, which generates α and β, comprises in sequence a linear layer, a layer normalization layer, and a 1×1 convolution layer.
9. The learned image compression method based on flow matching residual reconstruction of claim 1, wherein the constructed joint optimization objective function is expressed by a formula whose symbols denote, respectively, the bit-rate loss, the reconstruction distortion, the flow matching loss, a Lagrange multiplier, a hyperparameter for balancing the two losses, the input image, and the reconstructed image.
10. The learned image compression method based on flow matching residual reconstruction of claim 9, wherein the flow matching loss minimizes the difference between a predicted velocity field and a reference velocity field, and is expressed by a formula involving: a mathematical expectation over time; the two-norm; the velocity field learned by the flow matching latent variable refinement module; a known reference vector field; and the intermediate probability density on the path from the base distribution to the data distribution.
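
The claims above refer to formulas whose symbols are not reproduced in this text. As a hedged reconstruction only, the block below writes them in the notation commonly used for learned image compression and flow matching. Every symbol except μ is an assumption: $y$ and $\hat{y}$ for the first latent representation and its quantized version, $Q$ for the quantization operation, $\phi_t$ for the time-dependent mapping, $v_t$ and $u_t$ for the learned and reference velocity fields, $p_t$ for the intermediate density, $\hat{r}$ for the predicted residual, $\tilde{y}$ for the corrected latent variable, $x$ and $\hat{x}$ for the input and reconstructed images, and $\gamma$ for the balancing hyperparameter. The patent's own forms may differ, in particular how the residual is obtained from the flow and how the three loss terms are weighted.

```latex
\begin{align}
  % Claim 3 (assumed form): mean-shifted quantization
  \hat{y} &= Q(y - \mu) + \mu \\
  % Claim 4 (assumed form): flow ODE started at the quantized latent
  \frac{d}{dt}\,\phi_t(\hat{y}) &= v_t\bigl(\phi_t(\hat{y})\bigr),
    \qquad \phi_0(\hat{y}) = \hat{y}, \quad t \in [0, 1] \\
  % Claim 5 (assumed form): fuse the predicted residual with the quantized latent
  \tilde{y} &= \hat{y} + \hat{r},
    \qquad \hat{r} = \int_0^1 v_t\bigl(\phi_t(\hat{y})\bigr)\,dt \\
  % Claim 10 (assumed form): standard flow matching regression loss
  \mathcal{L}_{FM} &= \mathbb{E}_{t,\; y_t \sim p_t}
    \bigl\| v_t(y_t) - u_t(y_t) \bigr\|_2^2 \\
  % Claim 9 (assumed form): rate + weighted distortion + weighted flow matching loss
  \mathcal{L} &= R + \lambda\, D(x, \hat{x}) + \gamma\, \mathcal{L}_{FM}
\end{align}
```

Under these assumptions, $\tilde{y} = \phi_1(\hat{y})$, so integrating the learned velocity field from $t = 0$ to $t = 1$ yields the corrected latent variable directly.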

Description

Learned image compression method based on flow matching residual reconstruction

Technical Field

The invention relates to the technical field of image processing and computer vision, in particular to a learned image compression method based on flow matching residual reconstruction.

Background

Image compression is one of the important basic technologies in the field of information processing. Its aim is to keep reconstruction quality high while reducing storage and transmission overhead as much as possible. Traditional compression algorithms (e.g., JPEG, H.265, H.266) are based primarily on hand-crafted discrete transform and quantization schemes. Although such methods are simple to implement, their fixed block structure readily introduces visible blocking artifacts and texture distortion at low bit rates, and struggles to meet the compression requirements of modern high-resolution, multi-scene images.

In recent years, the development of deep learning has given rise to learned image compression (LIC) methods. By replacing traditional hand-designed modules with neural networks trained end to end, such methods have surpassed classical codecs on multiple performance metrics. Compression models based on variational autoencoders (VAEs) can effectively optimize the rate-distortion trade-off, and compression methods based on generative models (e.g., GAN, diffusion, flow matching) have advantages in perceptual quality.

However, current learned compression frameworks still face the following technical bottlenecks:

1. Quantization residuals are difficult to recover accurately: the quantization operation introduces information loss and irreversible noise, which limits the expressive capacity of the latent variables.
2. Perceptual quality trades off against structural distortion: VAE-based models favor structural fidelity, whereas generative models favor visual realism.

Therefore, there is a need for a new, efficient generative compression framework that can effectively reconstruct quantization residual information while keeping the computational cost low, and that achieves a better balance between rate-distortion and perceptual metrics.

Disclosure of Invention

In order to solve the technical problems in the prior art that quantization residuals are difficult to recover accurately and that perceptual quality trades off against structural distortion, the embodiment of the invention provides a learned image compression method based on flow matching residual reconstruction. The technical scheme is as follows:

In one aspect, there is provided a learned image compression method based on flow matching residual reconstruction, the method being implemented by a learned image compression device, the method comprising: constructing an image compression network model based on a flow matching latent variable refinement module, wherein the flow matching latent variable refinement module uses a time-continuous differentiable flow field to perform residual reconstruction on a quantized latent variable and to compensate for fine details lost during quantization; constructing a joint optimization objective function based on a rate-distortion loss function and a flow matching loss function, and training the whole image compression network model end to end using input images; and compressing an image to be compressed using the trained image compression network model.
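
As a minimal sketch of the steps just listed, the Python below quantizes a toy latent vector, integrates a velocity field to predict the quantization residual, and evaluates a joint objective. Nothing in it comes from the patent beyond the overall idea: the function names (`quantize`, `refine`, `flow_matching_loss`, `joint_loss`), the forward-Euler solver, the straight-line interpolation path, the weights `lam` and `gamma`, the assumed weighting of the loss terms, and all numeric values are illustrative assumptions. The two lambda velocity fields are stand-ins for a trained refinement network.

```python
import random

def quantize(y, mu):
    # Mean-shifted rounding, round(y - mu) + mu: an assumed form of the
    # quantization step that uses the entropy-model mean mu.
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return [round(yi - mi) + mi for yi, mi in zip(y, mu)]

def refine(y_hat, velocity, steps=4):
    # Forward-Euler integration of d(phi)/dt = v(phi, t) for t in [0, 1],
    # starting from phi_0 = y_hat. Returns (predicted residual, corrected latent).
    phi = list(y_hat)
    dt = 1.0 / steps
    for k in range(steps):
        v = velocity(phi, k * dt)
        phi = [p + dt * vi for p, vi in zip(phi, v)]
    residual = [p - q for p, q in zip(phi, y_hat)]
    return residual, phi

def flow_matching_loss(velocity, y_hat, y, t):
    # Assumed straight-line path x_t = (1 - t) * y_hat + t * y, whose
    # reference velocity is the constant y - y_hat.
    x_t = [(1 - t) * a + t * b for a, b in zip(y_hat, y)]
    u = [b - a for a, b in zip(y_hat, y)]
    v = velocity(x_t, t)
    return sum((vi - ui) ** 2 for vi, ui in zip(v, u)) / len(u)

def joint_loss(rate, distortion, fm, lam=0.01, gamma=1.0):
    # Assumed weighting: rate + lam * distortion + gamma * flow matching loss.
    return rate + lam * distortion + gamma * fm

# Toy latent vector and entropy-model mean (hypothetical values).
y = [0.3, 1.7, -2.4, 0.9]
mu = [0.0, 0.5, -1.0, 0.2]
y_hat = quantize(y, mu)

# Stand-ins for a trained network: an "oracle" field that already knows the
# true residual, and an untrained field that always predicts zero velocity.
true_residual = [a - b for a, b in zip(y, y_hat)]
oracle = lambda x, t: true_residual
untrained = lambda x, t: [0.0] * len(x)

residual, y_refined = refine(y_hat, oracle)
t = random.random()  # one random time sample, as in stochastic training
fm_oracle = flow_matching_loss(oracle, y_hat, y, t)
fm_untrained = flow_matching_loss(untrained, y_hat, y, t)

# Rate and distortion are placeholder numbers here; a real model would obtain
# them from the entropy model and from the decoded image.
total = joint_loss(rate=2.0, distortion=0.5, fm=fm_untrained)
```

With the oracle field, the refined latent equals the unquantized latent and the flow matching loss is zero; with the zero field, the loss equals the mean squared quantization residual. A real decoder has no access to the unquantized latent, so the velocity field would have to be a network conditioned only on the quantized latent and the time step.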
Further, the image compression network model comprises an analysis transform network, a hyper-analysis transform network, a hyper-synthesis transform network, the flow matching latent variable refinement module, and a synthesis transform network. The processing flow of the image compression network model comprises: encoding an input image into a first latent representation by the analysis transform network; encoding the first latent representation into a second latent representation by the hyper-analysis transform network, and quantizing the first latent representation and the second latent representation to obtain the corresponding quantized latent variables; generating entropy model parameters from the quantized latent variable of the second latent representation by the hyper-synthesis transform network, wherein the entropy model parameters comprise a mean μ and a standard deviation, the mean μ being used in the quantization of the first latent representation; modeling the quantized latent variable of the first latent representation with the time-continuous differentiable flow field by the flow matching latent variable refinement module to predict a quantization residual, and fusing the predicted quantization residual with that quantized latent variable to obtain a corrected latent variable; and decoding the corrected latent variable by the synthesis transform network to generate a reconstructed image. Further, the first latent representation is quantized to obtain the quantized latent variable: ; In