CN-122023198-A - Image restoration method based on self-adaptive downsampling and annular scanning state space model
Abstract
The invention discloses an image restoration method based on an adaptive downsampling and annular scanning state space model. A state space model (SSM) is adopted to build the backbone network. First, wavelet downsampling is introduced and combined with conventional stride convolution into an adaptive downsampling module, so that the sampling characteristics are adjusted dynamically and the leakage of useful information is minimized. Second, the SSM scanning strategy is improved so that inference pays more attention to high-weight regions and the reasoning for missing pixels can borrow more pixel information. A global-local perception module is provided: dilated convolution captures global information, contextual attention assists pixel-level reconstruction, and the filling of semantics and textures is balanced. Experiments on the CelebA-HQ and Paris StreetView datasets show that the invention achieves state-of-the-art performance, preserving the integrity of semantic structures while better avoiding blurring and over-smoothing of textures.
Inventors
- ZENG ZHILING
- ZHU YE
Assignees
- Hebei University of Technology (河北工业大学)
Dates
- Publication Date: 20260512
- Application Date: 20260203
Claims (10)
- 1. An image restoration method based on an adaptive downsampling and annular scanning state space model, characterized by comprising the following steps: acquiring an input damaged image and the mask corresponding to it; constructing an image restoration network, wherein the image restoration network comprises an encoder, a decoder, a global-local perception module connecting the encoder and the decoder, and a reconstruction module for reconstructing an image from the output of the decoder, the encoder comprising a plurality of cascaded encoding layers and the decoder comprising a plurality of cascaded decoding layers; each encoding layer comprises an adaptive downsampling module and a state space model (SSM) or an annular state space model (C-SSM), and each decoding layer comprises an SSM or C-SSM and an upsampling module; the adaptive downsampling module comprises two branches, a stride-convolution downsampling branch and a Haar wavelet downsampling branch; in the Haar wavelet downsampling branch, a two-dimensional discrete wavelet transform applies a one-dimensional discrete wavelet transform to the input feature map along its rows and columns, obtaining four components: low frequency in both the horizontal and vertical directions, low frequency in the horizontal direction and high frequency in the vertical direction, high frequency in the horizontal direction and low frequency in the vertical direction, and high frequency in both directions; the resolution of the four components is halved relative to the input feature map while the number of channels is unchanged; the four components are spliced along the channel dimension to obtain the wavelet-downsampled feature map, which is then convolved to obtain the final wavelet downsampling result; the wavelet downsampling result is respectively input into a spatial attention module and a convolution layer to obtain a scaling parameter and an offset parameter; the downsampling result of the stride-convolution branch is normalized and then supplemented with the scaling and offset parameters to obtain the output of the adaptive downsampling module, which serves as the input of the SSM or C-SSM; the global-local perception module comprises two branches, a texture-aware branch with contextual attention at its core and a semantic-aware branch with dilated convolution at its core, wherein the texture-aware branch is composed of convolution, contextual attention and two further convolutions, and the semantic-aware branch includes convolution and dilated convolutions with 4 different dilation rates; the feature maps obtained by the texture-aware and semantic-aware branches are fused to obtain the input of the decoder; the annular state space model C-SSM replaces two of the four tiling scans (horizontal left, horizontal right, vertical downward, vertical upward) with annular scans in the horizontal and vertical directions; and training the image restoration network with damaged images and their corresponding masks, and performing image restoration with the trained image restoration network.
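The Haar wavelet downsampling branch of claim 1 can be sketched as follows: a minimal single-channel NumPy illustration in which one level of the 2D Haar DWT splits the input into four half-resolution subbands that are stacked channel-wise. The orthonormal 1/2 normalization is an assumption; the claim does not specify the filter scaling, and the subsequent convolution is a learned layer omitted here.

```python
import numpy as np

def haar_downsample(x):
    """One level of the 2D Haar DWT for a single-channel feature map:
    the four half-resolution subbands (LL, LH, HL, HH) are stacked along
    a new channel axis, as in the wavelet branch of claim 1."""
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low frequency in both directions
    lh = (a - b + c - d) / 2.0  # horizontal detail
    hl = (a + b - c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return np.stack([ll, lh, hl, hh])  # channel-wise concatenation
```

On a constant input only the LL subband is non-zero, which matches the intuition that the detail subbands carry high-frequency information.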
- 2. The repair method according to claim 1, characterized in that the circular scanning starts from the upper-left corner pixel and proceeds as a serpentine spiral from outside to inside (from the edge pixels toward the center pixel), either in a clockwise direction or in a counterclockwise direction.
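One plausible realization of the outside-in clockwise scan of claim 2 is the classic spiral traversal below; the claim's exact "serpentine" turning rule is not fully specified, so this is an illustrative ordering rather than the patented one.

```python
def clockwise_spiral_indices(h, w):
    """One plausible outside-in clockwise spiral scan order over an h x w
    grid, starting at the upper-left pixel, as described in claim 2."""
    top, bottom, left, right = 0, h - 1, 0, w - 1
    order = []
    while top <= bottom and left <= right:
        for c in range(left, right + 1):              # top edge, left -> right
            order.append((top, c))
        for r in range(top + 1, bottom + 1):          # right edge, downward
            order.append((r, right))
        if top < bottom and left < right:
            for c in range(right - 1, left - 1, -1):  # bottom edge, leftward
                order.append((bottom, c))
            for r in range(bottom - 1, top, -1):      # left edge, upward
                order.append((r, left))
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order
```

The counterclockwise variant is obtained by reversing the edge-visiting order within each ring.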
- 3. The restoration method according to claim 1, wherein the contextual attention operates at the pixel level, i.e. with a patch size of 1; cosine similarity is used to compute the similarity between foreground patches and background patches, the attention scores are then updated by a Softmax operation and by shifting in the left-to-right and top-to-bottom directions to obtain the final attention scores, and background patches are selected according to the attention scores to reconstruct the foreground patches.
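The core of claim 3 (cosine similarity, softmax, weighted reconstruction) can be sketched in NumPy as below; the left-right/top-down score-shifting step is omitted from this sketch, and `fg`/`bg` are assumed to be flattened per-pixel feature vectors.

```python
import numpy as np

def pixel_contextual_attention(fg, bg):
    """Pixel-level contextual attention (patch size 1): cosine similarity
    between foreground (missing) and background pixel features, softmax to
    attention scores, then foreground reconstruction as a weighted sum of
    background features."""
    fgn = fg / (np.linalg.norm(fg, axis=1, keepdims=True) + 1e-8)
    bgn = bg / (np.linalg.norm(bg, axis=1, keepdims=True) + 1e-8)
    sim = fgn @ bgn.T                                  # (Nf, Nb) cosine sims
    e = np.exp(sim - sim.max(axis=1, keepdims=True))   # numerically stable
    score = e / e.sum(axis=1, keepdims=True)           # softmax attention
    return score @ bg                                  # reconstructed fg
```

A foreground pixel that closely matches one background pixel receives most of that pixel's features in the reconstruction.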
- 4. The repair method of claim 1, wherein the normalization and supplementation process is: α = SA(F_w), β = Conv(F_w) (2); F_out = IN(SConv(F)) · α + β (3); wherein IN(·) denotes instance normalization, Conv(·) denotes convolution, SConv(·) denotes stride convolution, SA(·) denotes spatial attention, F is the input feature map of the adaptive downsampling module, F_w is the final wavelet downsampling result, α and β are the scaling parameter and the offset parameter, and F_out is the normalized and supplemented downsampling result.
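The normalization-and-supplementation step of claim 4 amounts to an instance normalization followed by an affine modulation. A minimal NumPy sketch over a (C, H, W) feature map, with the stride-convolution and wavelet branches assumed to have been computed already:

```python
import numpy as np

def adaptive_modulate(stride_feat, alpha, beta, eps=1e-5):
    """Normalize the stride-convolution branch per channel (instance
    normalization) and supplement it with the scaling parameter alpha and
    offset parameter beta derived from the wavelet branch (claim 4)."""
    mu = stride_feat.mean(axis=(1, 2), keepdims=True)   # per-channel mean
    var = stride_feat.var(axis=(1, 2), keepdims=True)   # per-channel variance
    normed = (stride_feat - mu) / np.sqrt(var + eps)    # instance norm
    return normed * alpha + beta                        # scale, then offset
```

With α = 1 and β = 0 the output is simply the instance-normalized branch, so the wavelet-derived parameters act as a learned correction on top of the normalized features.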
- 5. The repair method of claim 1, wherein the C-SSM employs an SSM based on a circular scanning strategy, and wherein, before the feature map enters the scan, the input is pre-processed with a channel attention mechanism to enhance the representational capacity of the features, as shown in equation (4): f_out = CircSSM(CA(LN(f))) (4), wherein CircSSM(·) denotes the SSM based on the circular scanning strategy, LN(·) denotes layer normalization, CA(·) denotes channel attention, f_out is the output of the annular state space model C-SSM, and f is the input of the annular state space model C-SSM.
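The patent does not specify the exact form of the channel attention in claim 5; a common choice is a squeeze-and-excitation-style gate, sketched below with the learned excitation network replaced by a direct sigmoid on the pooled channel statistics (an assumption made purely for illustration).

```python
import numpy as np

def channel_attention(f):
    """Squeeze-and-excitation-style channel attention over a (C, H, W)
    feature map: global average pooling per channel, a gating nonlinearity,
    and channel-wise reweighting of the input."""
    squeeze = f.mean(axis=(1, 2))              # global average pool -> (C,)
    gate = 1.0 / (1.0 + np.exp(-squeeze))      # per-channel gate in (0, 1)
    return f * gate[:, None, None]             # reweight each channel
```

In the full module this gated feature map would then be fed into the circular-scan SSM.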
- 6. The restoration method according to claim 1, wherein the image restoration network is provided with a low-level feature extraction module before the encoder input, the low-level feature extraction module comprising a convolution layer and a state space model SSM layer, the convolution layer consisting of convolution, instance normalization and a GELU activation function; the convolution layer changes the number of channels of the input feature map from 4 to 32 and sends the result to the SSM layer for further processing, and the resolution of the feature map obtained by the low-level feature extraction module is unchanged from the input; in the encoding layers, a circular scanning strategy is used when processing features at certain pixel sizes, while a tiling scanning strategy is still used when processing features at the remaining pixel sizes.
- 7. The restoration method according to claim 1, wherein, in the decoding layer, the input feature map is first sent to a C-SSM or SSM for processing to establish long-distance dependencies, and then upsampled; the upsampling module is composed of convolution, instance normalization and a GELU activation function; nearest-neighbor interpolation changes the resolution of the feature map to 2 times that of the input feature map, and a subsequent convolution halves the number of channels.
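The interpolation step of the upsampling module in claim 7 can be sketched in one line of NumPy; the channel-halving convolution, normalization and activation are learned layers and are omitted from this sketch.

```python
import numpy as np

def nearest_upsample2x(x):
    """Nearest-neighbour 2x upsampling, as used by the decoding layer's
    upsampling module: every pixel becomes a 2x2 block of the same value."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)
```

The two `repeat` calls duplicate rows and then columns, doubling both spatial dimensions while leaving any leading channel axes untouched.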
- 8. The repairing method according to claim 1, wherein the reconstruction module comprises an SSM layer, a convolution and a Tanh function; the input feature map passes through the SSM layer, which changes neither the resolution nor the number of channels; the convolution changes the number of channels to the 3 channels of an RGB image, and the final repair result is then obtained through the Tanh activation function; reconstruction is performed at the full pixel size handled by the module, and the feature map input to the module has batch size B (BatchSize) and 48 channels.
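The reconstruction head of claim 8 (48-channel features, a channel-reducing convolution, Tanh) can be sketched as follows; the random projection stands in for the learned convolution and the SSM layer is omitted, so this shows only the shape and range behavior.

```python
import numpy as np

def reconstruct_rgb(feat):
    """Sketch of the reconstruction head: project a (48, H, W) feature map
    to 3 RGB channels and squash with Tanh, giving values in (-1, 1)."""
    rng = np.random.default_rng(0)
    w = rng.standard_normal((3, feat.shape[0])) * 0.1  # hypothetical weights
    rgb = np.einsum('oc,chw->ohw', w, feat)            # 1x1-conv-like mixing
    return np.tanh(rgb)                                # bounded output
```

The Tanh guarantees the repaired image lies in the (-1, 1) range commonly used before rescaling to display values.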
- 9. The repair method of claim 1, wherein the training process uses a reconstruction loss L_rec, an adversarial loss L_adv, a style loss L_style and a perceptual loss L_perc to construct the total loss function L_total: L_total = λ_1·L_rec + λ_2·L_adv + λ_3·L_style + λ_4·L_perc (11), wherein λ_1, λ_2, λ_3 and λ_4 are the corresponding loss weights, and L_style and L_perc learn semantic and style information by minimizing the distance between high-level feature representations.
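Equation (11) of claim 9 is a plain weighted sum; the sketch below uses hypothetical placeholder weights, since the claim leaves the actual λ values unspecified.

```python
def total_loss(l_rec, l_adv, l_style, l_perc,
               weights=(1.0, 0.1, 250.0, 0.1)):
    """Weighted sum of the four training losses in equation (11).
    The default weight values are hypothetical placeholders."""
    w1, w2, w3, w4 = weights
    return w1 * l_rec + w2 * l_adv + w3 * l_style + w4 * l_perc
```

In practice the weights would be tuned so that no single term (typically the style loss, which operates on Gram-matrix statistics of much smaller magnitude) dominates training.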
- 10. A computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, is adapted to carry out the steps of the method according to any of claims 1-9.
Description
Image restoration method based on self-adaptive downsampling and annular scanning state space model

Technical Field

The invention belongs to the technical field of image restoration, and particularly relates to an image restoration method based on an adaptive downsampling and annular scanning state space model.

Background

Image restoration (image inpainting) infers the content of missing regions from the known regions of an image. With the development of information technology, digital images have become an important way to record life and acquire information. However, image quality often degrades, or images are even damaged, due to factors such as shooting conditions, device quality, and noise during transmission. Originally, image restoration referred mainly to the manual restoration of ancient works of art, which was inefficient and fragile. Currently, image restoration methods mainly comprise conventional methods and deep-learning-based methods. Conventional image restoration techniques can be divided into two categories: patch-based methods and diffusion-based methods. Both infer the unknown information of the damaged region from the similarity between pixels, and then propagate the generated pixels to complete the repair. Although conventional methods achieved a breakthrough in neighborhood search, they rely on existing information to generate patches, so the pixels are easily discontinuous, the restoration results lack diversity, and complex restoration tasks are difficult to handle. In recent years, with continuous progress in computing hardware and the rapid development of deep learning, image restoration technology has made significant progress.
Deep learning can capture image semantic information and predict semantic content on top of texture restoration, overcoming the shortcomings of traditional image restoration algorithms and markedly improving restoration at both the pixel level and in overall integrity. LGnet, proposed by Quan et al., is a global-local-global progressively refined network that fully considers the influence of the receptive field on different types of missing regions and shows good robustness in repairing both semantics and textures. E2I adopts a deep-network-based edge detector to acquire the edge map of an incomplete image, fills the missing region in the edge map, and finally generates the missing pixels with the aid of the completed edge map. The contextual attention of Yu et al., which focuses on reconstructing missing pixels from distant spatial locations, was an important breakthrough in deep image restoration and remains widely used. LaMa is based on fast Fourier convolution, giving the model a receptive field spanning the whole image from the early layers, and MFMAM obtains deep semantic information while retaining as much detail as possible by fusing deep features with multi-scale shallow features obtained by dilated convolution. Methods based on GANs, Transformers and VAEs generally perform well in terms of diversity; for example, InvertFill, a new GAN inversion model proposed by Yu et al., eliminates obvious color differences and semantic inconsistencies through a StyleGAN generator with an F&W+ latent space while preserving the diversity of repair results. However, the currently mainstream methods still need improvement in balancing the restoration of semantics and textures, and have the following disadvantages. First, many mainstream methods often lose important features during encoding, limited by the sampling characteristics of the downsampling layers themselves and by the number of network parameters.
Second, due to the lack of features in the damaged image, the pixel values generated during the decoding stage are inaccurate, and the repair result is often too smooth, especially in high-frequency regions with large gradient changes. Third, when key objects in the image are missing at large scale, existing methods are somewhat insufficient at filling in complete, reasonable and correct semantic structures.

Disclosure of Invention

Aiming at the defects of existing methods, the invention addresses the technical problem of providing an image restoration method based on an adaptive downsampling and annular scanning state space model, abbreviated ADCSNet, in which a backbone network is built with a state space model (SSM); the structure of the backbone network is shown in figure 1. In the encoding stage, focusing on the information-leakage problem caused by the traditional stride-convolution downsampling layer, wavelet downsampling is introduced and combined with traditional stride convolution, so that the leakage of useful