CN-121985139-A - Very low bit rate image compression method based on mixed attention and multi-scale entropy modeling


Abstract

The invention discloses an extremely low bit rate image compression method based on mixed attention and multi-scale entropy modeling. The method comprises: obtaining a standard image data set comprising a training set and a test set; constructing an extremely low bit rate image compression network model based on dual-branch mixed attention and multi-scale entropy modeling; constructing a loss function for the extremely low bit rate image compression network; and, in a deployment stage, inputting an original image from the test set into the network loaded with the optimal weights to obtain the compressed binary code stream and the decoded reconstructed image. The method introduces a multi-scale attention fusion mechanism into an autoregressive entropy model, addresses the dimensional fragmentation of feature representations through three parallel branches, and jointly enhances context features by modeling cross-dimension interaction, local spatial correlation and grouped channel correlation, thereby reducing redundancy in the latent space during compression.

Inventors

YUE SHUANG
CHEN ZHE
YIN FULIANG

Assignees

Dalian University of Technology (大连理工大学)

Dates

Publication Date: 20260505
Application Date: 20260209

Claims (7)

1. A very low bit rate image compression method based on mixed attention and multi-scale entropy modeling, comprising: acquiring a standard image data set comprising a training set and a test set; constructing an extremely low bit rate image compression network model based on mixed attention and multi-scale entropy modeling, wherein the model comprises an encoder, a decoder, a quantizer, a hyper-encoder, a hyper-quantizer and an entropy coding module, the encoder and the decoder each comprise a dual-branch mixed attention module, the entropy coding module comprises a multi-scale attention fusion mechanism, the dual-branch mixed attention module comprises a Swin-window-based self-attention mechanism and a global group coordinate attention mechanism, and the multi-scale attention fusion mechanism comprises cross-dimension attention, improved window attention and channel group attention; constructing a loss function for the extremely low bit rate image compression network, training the network on the training set, propagating gradients through the network with the back-propagation algorithm during training, and repeatedly updating the network parameters so that the bit rate reaches a set value while the reconstructed image approaches the original high-quality image, thereby obtaining the optimal weights and completing the training of the network; and, in the deployment stage, inputting the original images in the test set into the extremely low bit rate image compression network loaded with the optimal weights to obtain the encoded binary code stream and the decoded reconstructed image.
2. The very low bit rate image compression method based on mixed attention and multi-scale entropy modeling of claim 1, wherein the extremely low bit rate image compression network model is trained under a Lagrangian-multiplier-based rate-distortion optimization framework, the loss function minimizing a weighted sum of bit rate and reconstruction distortion, as follows: let the original input image be $x$; the original image $x$ passes through the encoder to obtain a main latent representation $y$, which the quantizer converts into a discrete main latent representation $\hat{y}$; the main latent representation $y$ passes through the hyper-encoder to obtain a hyper-latent representation $z$, which the hyper-quantizer converts into a discrete hyper-latent representation $\hat{z}$; the discrete hyper-latent representation $\hat{z}$ is used to predict the discrete main latent representation $\hat{y}$ so as to construct a conditional probability model $p(\hat{y} \mid \hat{z})$, and a probability model $p(\hat{z})$ of the hyper-latent representation is also built; the decoding end reconstructs a decoded image $\hat{x}$ from the discrete main latent representation $\hat{y}$ through the decoder; the loss function $\mathcal{L}$ of the very low bit rate image compression network is expressed as $\mathcal{L} = R(\hat{y} \mid \hat{z}) + R(\hat{z}) + \lambda \, D(x, \hat{x})$, where $R(\hat{y} \mid \hat{z})$ is the bit rate for encoding the discrete main latent representation $\hat{y}$ given the discrete hyper-latent representation $\hat{z}$, and $R(\hat{z})$ is the bit rate for encoding the discrete hyper-latent representation $\hat{z}$, the bit rates being estimated by the negative log-likelihood of the probability models, i.e. $R(\hat{y} \mid \hat{z}) = \mathbb{E}[-\log_2 p(\hat{y} \mid \hat{z})]$ and $R(\hat{z}) = \mathbb{E}[-\log_2 p(\hat{z})]$; $D(x, \hat{x})$ is the difference between the original image $x$ and the decoded image $\hat{x}$ computed as the mean square error (MSE); and the Lagrange multiplier $\lambda$ controls the balance between bit rate and distortion, directly controlling the target output bit rate and reconstruction quality of the very low bit rate image compression model.
3. The method of claim 1, wherein the encoder of the very low bit rate image compression network comprises three residual downsampling blocks and the decoder comprises three residual upsampling blocks.
4. The very low bit rate image compression method based on mixed attention and multi-scale entropy modeling of claim 1, wherein the dual-branch mixed attention module comprises two weight-sharing branches, namely a Swin-window-based self-attention mechanism and a global group coordinate attention mechanism, and the two attention branches each receive the original image at the same size so as to extract features at two different scales, local and global.
5. The very low bit rate image compression method based on mixed attention and multi-scale entropy modeling of claim 4, wherein the Swin-window-based self-attention mechanism divides the image into windows of a fixed size to reduce complexity and performs multi-head self-attention within each window to extract local detail features.
6. The very low bit rate image compression method based on mixed attention and multi-scale entropy modeling of claim 4, wherein the global group coordinate attention mechanism divides the input feature $X$ into $K$ sub-features along the channel dimension by a grouping operation and feeds the reshaped feature map $Y$ into four parallel paths, two of which perform global average pooling and global max pooling along the height dimension and the other two of which perform global average pooling and global max pooling along the width dimension, the operations along the height and width dimensions being: $A^{H}_{\mathrm{avg}} = \sigma(\mathrm{Conv}(\mathrm{AvgPool}_{H}(Y_k)))$, $A^{H}_{\mathrm{max}} = \sigma(\mathrm{Conv}(\mathrm{MaxPool}_{H}(Y_k)))$, $A^{W}_{\mathrm{avg}} = \sigma(\mathrm{Conv}(\mathrm{AvgPool}_{W}(Y_k)))$, $A^{W}_{\mathrm{max}} = \sigma(\mathrm{Conv}(\mathrm{MaxPool}_{W}(Y_k)))$, where $\sigma$ denotes the sigmoid function, $\mathrm{Conv}$ denotes convolution, $H$ denotes the height dimension, $W$ denotes the width dimension, $\mathrm{AvgPool}$ denotes average pooling, $\mathrm{MaxPool}$ denotes max pooling, and $Y_k$ denotes the $k$-th sub-feature; the two spatial attention features obtained along the height dimension are added, the two spatial attention features obtained along the width dimension are added, and the summed spatial attention features are multiplied by $Y_k$ to obtain the weighted output feature $O_k$: $O_k = Y_k \otimes (A^{H}_{\mathrm{avg}} + A^{H}_{\mathrm{max}}) \otimes (A^{W}_{\mathrm{avg}} + A^{W}_{\mathrm{max}})$; an adaptive fusion gate then assigns each spatial location an independent weight to dynamically balance local and global information, the gate being defined in terms of a dimension mapping, a feature expansion matrix and a feature compression matrix; the weight of the Swin-window-based attention mechanism and the weight of the global group coordinate attention mechanism are obtained from the gate, and the adaptively fused feature is formed from the two branch outputs using these weights.
7. The extremely low bit rate image compression method based on dual-branch hybrid attention and multi-scale entropy modeling of claim 3, wherein the multi-scale attention fusion mechanism comprises cross-dimension attention, spatial window attention and channel attention mechanisms, and the latent feature $y$ produced by the encoder is input to the multi-scale attention fusion mechanism for three-scale feature fusion to obtain the feature $F$: $F = \mathrm{Conv}\left(\sum_{i} w_i \odot A_i(y)\right)$, where $\mathrm{Conv}$ denotes the convolution operation, $i$ ranges over the cross-dimension, spatial and channel branches, $w_i$ denotes the dynamically learned weights, $A_i$ denotes the attention branches in the three dimensions, and $\odot$ denotes element-by-element multiplication.
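
The rate-distortion objective recited in claim 2 can be illustrated with a minimal Python sketch. It assumes the entropy models have already produced a likelihood for every quantized symbol, and it normalizes the rate to bits per pixel, which is a common convention but is not stated in the claims. The function names `rate_bits`, `mse` and `rd_loss` and the parameter names are hypothetical and chosen for illustration only.

```python
import math
from typing import Sequence


def rate_bits(likelihoods: Sequence[float]) -> float:
    """Estimated code length in bits: negative log2-likelihood summed over symbols."""
    return -sum(math.log2(p) for p in likelihoods)


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean square error between the original and decoded pixel values."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rd_loss(x, x_hat, p_main, p_hyper, lam, num_pixels):
    """Rate (main + hyper latents, in bits per pixel) plus lam times distortion.

    Assumption: the per-pixel normalization is illustrative, not from the claims.
    """
    rate = (rate_bits(p_main) + rate_bits(p_hyper)) / num_pixels
    return rate + lam * mse(x, x_hat)


# Toy example: 16 pixels, 8 main-latent symbols at 1 bit each,
# 4 hyper-latent symbols at 2 bits each.
x = [0.0] * 16
x_hat = [0.5] * 16
loss = rd_loss(x, x_hat, [0.5] * 8, [0.25] * 4, lam=4.0, num_pixels=16)
# rate = (8 + 8) / 16 = 1.0 bpp, distortion = 0.25, loss = 1.0 + 4.0 * 0.25 = 2.0
```

A larger multiplier weights distortion more heavily and so pushes training toward higher quality at a higher bit rate; a smaller multiplier favours a lower bit rate.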
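
The window division recited in claim 5 can be sketched as follows. This is a plain-Python illustration for a single-channel feature map, assuming non-overlapping square windows and a map whose height and width are multiples of the window size; the name `window_partition` and the parameter `window` are hypothetical. Restricting self-attention to each window is what reduces complexity, since every position attends only to the positions in its own window rather than to the whole map.

```python
from typing import List

Matrix = List[List[float]]


def window_partition(feature: Matrix, window: int) -> List[Matrix]:
    """Split a 2-D feature map into non-overlapping window x window tiles.

    Tiles are returned in row-major order (left to right, then top to bottom).
    Assumption: height and width are exact multiples of `window`.
    """
    height, width = len(feature), len(feature[0])
    if height % window or width % window:
        raise ValueError("height and width must be multiples of the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            tiles.append([row[left:left + window]
                          for row in feature[top:top + window]])
    return tiles


# A 4x4 map with values 0..15 splits into four 2x2 windows.
feature = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
tiles = window_partition(feature, window=2)
# tiles[0] == [[0.0, 1.0], [4.0, 5.0]]
```

Multi-head self-attention would then be applied to each tile independently before the tiles are merged back into a full map.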
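
The adaptive fusion gating recited in claim 6, which gives each spatial location its own weights for the local and global branches, can be sketched as below. The claims do not reproduce the gate formula, so this sketch assumes that the two weights at each location come from a softmax over two gate logits and that the fused value is their weighted sum. The names `softmax2` and `gated_fusion` and the `gate_logits` input are hypothetical.

```python
import math
from typing import List, Tuple


def softmax2(a: float, b: float) -> Tuple[float, float]:
    """Numerically stable two-way softmax; the two weights sum to 1."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)


def gated_fusion(local_feat: List[List[float]],
                 global_feat: List[List[float]],
                 gate_logits: List[List[Tuple[float, float]]]) -> List[List[float]]:
    """Fuse two same-sized maps with an independent weight pair per location.

    Assumption: softmax-normalized weights; the source does not give the formula.
    """
    fused = []
    for l_row, g_row, z_row in zip(local_feat, global_feat, gate_logits):
        out_row = []
        for l_val, g_val, (z_l, z_g) in zip(l_row, g_row, z_row):
            w_l, w_g = softmax2(z_l, z_g)
            out_row.append(w_l * l_val + w_g * g_val)
        fused.append(out_row)
    return fused


# Equal logits average the two branches; a dominant logit selects one branch.
fused = gated_fusion([[1.0, 5.0]], [[3.0, 4.0]],
                     [[(0.0, 0.0), (100.0, -100.0)]])
# fused[0][0] == 2.0 (equal weights); fused[0][1] is approximately 5.0
```

In a trained network the gate logits would be predicted from the features themselves; here they are passed in directly to keep the example self-contained.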

Description

Very low bit rate image compression method based on mixed attention and multi-scale entropy modeling

Technical Field

The invention relates to the technical field of image compression, and in particular to an extremely low bit rate image compression method based on mixed attention and multi-scale entropy modeling.

Background

Very low bit rate image compression is an important research direction in image processing and communication. Its core goal is to compress image data to very low bit rates while keeping the visual quality of the image within an acceptable range. The technology is valuable in scenarios with limited bandwidth or limited storage, such as large-scale image storage and retrieval, satellite communication, underwater communication and deep space exploration.

Traditional image compression has formed a mature system of standards, such as JPEG, JPEG2000, BPG, and the HEVC-based and VVC-based image compression standards, which are widely used in Internet image transmission, digital photography, medical imaging and other fields. However, these conventional codecs typically rely on hand-designed transform coding, quantization and entropy coding strategies whose core is a linear transform or a piecewise linear model, so their representation capability is limited. When the bit rate is reduced to an extremely low level, conventional methods struggle to balance compression efficiency against the visual quality of the reconstructed image, and obvious distortion and loss of detail often result.

In recent years, with the development of deep learning, learning-based image compression has become a research hotspot. Such methods generally construct an end-to-end neural network compression framework, jointly optimize the encoder, the quantization process and the decoder so that the model automatically learns an efficient latent representation of the image content, and take minimization of a rate-distortion loss function as the training objective, thereby achieving higher-quality image reconstruction at low bit rates. Compared with conventional methods, deep-learning-based image compression has significant advantages in representation capability and modeling flexibility.

However, at very low bit rates, existing learned image compression methods still face a number of challenges. Although methods represented by the variational autoencoder (VAE) and the generative adversarial network (GAN) have achieved very low bit rate compression to some extent, most of them are inadequate at retaining high-frequency detail, edge structure and complex texture. Reconstructed images are prone to blurring, artifacts or structural inconsistency, which severely degrades subjective visual quality. How to achieve high-quality, realistic image reconstruction under an extremely low bit rate constraint therefore remains a key technical problem in the field.

In the prior art, for example in the paper "Variational Image Compression with a Scale Hyperprior", Ballé et al. propose an image compression method based on nonlinear transform coding and introduce an end-to-end rate-distortion optimization framework. Based on this rate-distortion optimization theory and its derivation from the variational lower bound, Ballé et al. further propose an image compression framework based on a variational autoencoder and characterize the spatial dependencies in the latent representation by introducing a hyperprior model, thereby improving compression performance. However, this type of method is mainly oriented to medium and low bit rate scenarios, and it remains difficult to achieve satisfactory reconstruction quality at extremely low bit rates.

In the paper "Generative Adversarial Networks for Extreme Learned Image Compression", Agustsson et al. propose an extremely low bit rate image compression framework based on generative adversarial networks. It comprises two modes, generative compression and selective generative compression, which preserve the overall structure and semantic content of an image at an extremely low bit rate and reconstruct a full-resolution image. However, the method relies on semantic map information, which usually requires additional storage bits and is costly to acquire. In addition, the training of a generative adversarial network is unstable and prone to introducing artifacts, which affects the realism and consistency of the reconstructed image.

In addition, in the paper "Frequency-Aware Transformer for Learned Image Compression", Li et al. propose a frequency-aware image compression method, and the modeling capability of a model