CN-121998849-A - General image restoration method based on deformable attention mechanism

CN121998849ACN 121998849 ACN121998849 ACN 121998849ACN-121998849-A

Abstract

The invention relates to a general image restoration method based on a deformable attention mechanism, which comprises the following steps of obtaining a degraded image, inputting the degraded image into a pre-trained image restoration model for processing, outputting the restored image, wherein the step of outputting the restored image comprises the steps of inputting the degraded image into a multi-scale image pyramid for multi-scale feature extraction to obtain initial features of different scales, adopting a basic scale stage module for carrying out mixed processing of a spatial domain and a frequency domain and long-distance dependent modeling based on the deformable attention mechanism for the initial features of each scale to obtain deep features of each scale, adopting a deformable multi-scale feature aggregation mechanism module for carrying out feature alignment and fusion to obtain fusion features, and adopting an output reconstruction module for carrying out reconstruction processing based on the fusion features to obtain the restored image. Compared with the prior art, the invention has the advantages of improving the visual quality and the recovery precision of the image, and the like.

Inventors

LIN XIAO
CHEN YILIN
HUANG WEI
LI YAN
WU KUN
AN KANG

Assignees

上海师范大学

Dates

Publication Date: 20260508
Application Date: 20260130

Claims (10)

1. A method for generic image restoration based on a deformable attention mechanism, comprising the steps of: Obtaining a degraded image, inputting the degraded image into a pre-trained image restoration model for processing, and outputting a restoration image, wherein the image restoration model comprises a multi-scale image pyramid, a basic scale stage module, a deformable multi-scale feature aggregation mechanism module and an output reconstruction module, and the step of outputting the restoration image comprises the following steps: inputting the degraded image into a multi-scale image pyramid to perform multi-scale feature extraction to obtain initial features with different scales; For the initial characteristics of each scale, adopting a basic scale stage module to perform mixed processing of a space domain and a frequency domain and long-distance dependent modeling based on a deformable attention mechanism to obtain deep characteristics of each scale; Carrying out feature alignment and fusion on deep features of all scales by adopting the deformable multi-scale feature aggregation mechanism module to obtain fusion features; and based on the fusion characteristics, adopting an output reconstruction module to perform reconstruction processing to obtain a restored image.
2. The method of claim 1, wherein the base scale stage module comprises a plurality of cascaded coupling refinement blocks, each coupling refinement block comprising a cascaded spatial-frequency domain fusion module and a deformable recovery fransformer module, the step of obtaining fusion features comprising: for the initial characteristics of each scale, extracting and fusing the space domain and frequency domain characteristics by adopting the space-frequency domain fusion module to obtain the space-frequency domain fusion characteristics of each scale; And for the space-frequency domain fusion characteristics of each scale, the deformable recovery transducer module introduces a dynamic deformation mechanism in the characteristic extraction process to focus important information, so as to obtain deep characteristics of each scale.
3. The method for restoring a general image based on a deformable attention mechanism according to claim 2, wherein the spatial-frequency domain fusion module includes two branches, a spatial branch and a frequency domain branch, respectively, and the step of extracting the spatial-frequency domain fusion feature of each scale includes: For the initial feature of each scale, the spatial branches extract spatial local features by using depth separable convolution, which is expressed as follows: in the formula, For each coupled input of the refinement block, Is that The depth may be a function of a split convolution operation, In order to activate the function, Is a spatial local feature; For the initial feature of each scale, mapping the frequency domain branch to the frequency domain by adopting a Fourier transform method, extracting a real part and an imaginary part respectively, splicing the real part and the imaginary part in the channel dimension, and obtaining a frequency domain global feature complementary with the space local feature, wherein the frequency domain global feature is expressed as: in the formula, As a global feature of the frequency domain, The channel splice is indicated as such, And Representing the real and imaginary parts of the feature, Is a two-dimensional real number fast fourier transform; the space local features and the frequency domain global features are interacted and fused, so that the space local features are supplemented by the frequency domain global features, and the space frequency domain fusion features of each scale are obtained and expressed as follows: in the formula, For the final output of the space-frequency domain fusion module, i.e. the space-frequency domain fusion characteristics of each scale, Is a two-dimensional real inverse fast fourier transform, For the transformation of the frequency domain global features and the non-linear activation of the processed features, Is that And (5) convolution operation.
4. The method of claim 2, wherein the deformable recovery transform module comprises a cascade of deformable self-attention units and a gated deep convolutional feed forward network, and the step of obtaining deep features for each scale comprises: Setting the first Input features based on each dimension of the deformable recovery transducer module are ; The input features for each scale are The deformable self-attention unit is processed by adopting dynamic deformable self-attention operation to obtain initial depth characteristics of each scale, which are expressed as: in the formula, For the initial depth features of each scale, For the deformable self-attention unit operation, Representation layer normalization operations; And processing the initial depth characteristic of each scale by adopting a gating depth convolution feedforward network to obtain a final depth characteristic of each scale, wherein the final depth characteristic of each scale is expressed as: in the formula, For the initial depth features of each scale, Operating for a gated deep convolutional feed forward network.
5. The method for generic image restoration based on a deformable attention mechanism according to claim 4, wherein the step of executing the deformable self-attention unit comprises: For each scale of input features, convolutions of different expansion rates are used to generate queries Key and key Sum value ; For each scale of input features, the offset is predicted by convolution And is superimposed on the base grid Obtaining a dynamic sampling grid Wherein the offset is Expressed as: Sampling grid Expressed as: in the formula, Is that The convolution operation is performed in such a way that, Input features for each scale; based on the key Sum value Executing based on the dynamic sampling grid Bilinear interpolation sampling process of (1) to obtain deformed keys Sum value Expressed as: in the formula, Resampling operations, i.e., bilinear interpolation sampling operations; For the query Deformed key The shape is adjusted and an attention map is recalculated, expressed as: in the formula, In order to take care of the force of the drawing, Each position of (1) represents With all positions The degree of correlation between the two, To adjust the shape , To adjust the shape , As a result of the scale factor being a learnable, Is that A function; based on the attention map and the deformed values And carrying out weighted summation to obtain a final feature map as the initial depth feature of each scale, wherein the final feature map is expressed as: in the formula, As a result of the characteristic diagram which is to be obtained, Each position in (a) is all positions Adaptive aggregation of information, with aggregate weights being mapped by attention It is decided that the method comprises the steps of, To adjust the shape 。
6. The method for generic image restoration based on a deformable attention mechanism of claim 4, the method is characterized in that the step of executing the gating deep convolution feedforward network comprises the following steps: Initial depth features for each scale Performing layer normalization to obtain initial depth characteristics after layer normalization Expressed as: in the formula, Representation layer normalization operations; In the first path, the layer normalized initial depth features Channel expansion is performed through standard convolution, then local depth features are extracted through depth separable convolution, nonlinear activation is performed through GELU activation functions, and first path features are obtained and expressed as: in the formula, As a feature of the first path, The function is activated for the purpose of GELU, Is that The depth may be a function of the split convolution operation, For expanding channels in a first path Convolving; in the second path, the layer normalized initial depth features Channel expansion is performed through standard convolution, and then local depth features are extracted through depth separable convolution, so that second path features are obtained, and the second path features are expressed as follows: in the formula, As a feature of the second path it is, For expanding channels in the second path Convolving; characterizing the first path And a second path feature Multiplying by element, projecting the channel number back to original dimension by convolution, and combining with the initial depth feature of each dimension Residual connection is carried out, and the final deep features of each scale are obtained and expressed as: in the formula, For each scale of deep features, For dimension reduction of channels The operation of the convolution is performed, For element-by-element multiplication.
7. A method of generic image restoration based on a deformable attention mechanism as claimed in claim 1, wherein said step of deriving a fusion feature comprises: deep features with different scales are respectively 、 And For a pair of And Performing primary channel attention fusion to obtain initial fusion characteristics; for the initial fusion feature Performing a second channel attention fusion to obtain a final fusion feature, expressed as: in the formula, For the final fusion feature to be a feature, Is that The convolution operation is performed in such a way that, Is a channel attention fusion function.
8. The method for generic image restoration based on a deformable attention mechanism as recited in claim 7, wherein said channel attention fusion function The expression of (2) is: in the formula, Respectively represent And Or initial fusion features , As a function of the hyperbolic tangent, 、 Respectively the core sizes are 、 Deformable convolution operation at the time of And Set up at fusion At the initial fusion of features and Set up at fusion 。
9. A method of generic image restoration based on a deformable attention mechanism as recited in claim 1, wherein said step of obtaining a restored image comprises: And reconstructing a restored image for the fusion characteristic through hierarchical up-sampling and jump connection.
10. The general image restoration method based on a deformable attention mechanism according to claim 1, wherein a composite multi-scale loss function is adopted for optimization in the training process of the image restoration model, and the expression of the composite multi-scale loss function is as follows: in the formula, In order to compound the multi-scale loss, For the loss of the multi-scale Charbonnier, For the multi-scale edge loss, For the multi-scale frequency domain reconstruction loss, Is a weight coefficient; In the subscript Corresponding to different image resolution scales when In the time-course of which the first and second contact surfaces, And (3) with Respectively representing the original full-scale real image and the predicted image when When the value of the other value is taken, And (3) with Respectively represent the first A true image of the scale and a predicted image, In order to stabilize the constant of the device, In order for the laplace operator to be useful, Is a fourier transform operator.

Description

General image restoration method based on deformable attention mechanism Technical Field The invention relates to the field of computer vision and image processing, in particular to a general image restoration method based on a deformable attention mechanism. Background In real world complex scenes, the image acquisition process is often inevitably disturbed by bad weather or environmental factors, such as rain shielding, snowflake noise, haze scattering, and motion blur caused by camera shake. The degradation factors not only seriously reduce the visual quality and information integrity of the image, but also greatly influence the subsequent high-level visual tasks such as target detection, semantic segmentation, automatic driving perception, security monitoring and the like, so that the algorithm performance is obviously reduced. Therefore, how to recover a high-quality clear image from a degraded image has been a research hotspot and difficulty in the field of computer vision. Early image restoration methods relied primarily on a priori knowledge of the manual design, often constraining solution space based on a simplified physical model or statistical model. However, these methods often build on idealized assumptions, which make it difficult to accurately characterize complex and diverse image degradation patterns in the real world. When facing complex mixed degradation or not conforming to the scene of a priori assumption, the generalization capability of the traditional method is weak, and the restored image often has the problems of artifact or detail loss. With the rapid development of deep learning technology, a convolutional neural network-based method is gradually rising. The convolutional neural network obtains the performance exceeding the traditional method on the image recovery task due to the strong hierarchical feature extraction capability of convolutional operation. However, standard convolution operations have a fixed geometry and localized receptive fields, which makes the network less effective in dealing with degradations with spatial heterogeneity. Furthermore, due to the locality of the convolution kernel, convolutional neural networks have difficulty effectively capturing long-range dependencies in images, resulting in insufficient performance in recovering large damaged areas or requiring global context information. To overcome the limitations of convolutional neural networks in long-range modeling, researchers began to introduce a Transformer architecture into the image restoration task. The transducer can calculate the correlation between any two positions in the image by means of a Self-Attention mechanism (Self-Attention), so that the transducer has excellent global modeling capability. This advantage is also accompanied by the obvious problem that, on the one hand, standard transformers usually involve intensive attention calculations over the full map, whose computational complexity increases quadratically with the image resolution, resulting in high computational costs and difficulty in processing high resolution images, and, on the other hand, image degradation usually manifests itself as an irregular distribution of local areas around the pixels, transformers being too focused on global correlations, rather possibly neglecting the fine capture and processing of these local irregular degradation patterns. In summary, the main problem faced by the prior art is that the conventional convolutional neural network method is limited by the fixed geometry convolution and cannot flexibly adapt to the irregular degradation shape, while the transform method often loses the focusing capability on the local degradation region and has huge calculation cost although the transform method utilizes the full graph correlation. Therefore, there is a need for a feature encoding method that captures efficient context, adaptively deforms according to different degradation modes while focusing on local critical areas, and a mechanism that can effectively aggregate multi-scale features to preserve fine textures, thereby enabling efficient and versatile image restoration. Disclosure of Invention The invention aims to provide a general image restoration method based on a deformable attention mechanism, which realizes efficient image reconstruction with better restoration effect. The aim of the invention can be achieved by the following technical scheme: a general image restoration method based on a deformable attention mechanism, comprising the steps of: Obtaining a degraded image, inputting the degraded image into a pre-trained image restoration model for processing, and outputting a restoration image, wherein the image restoration model comprises a multi-scale image pyramid, a basic scale stage module, a deformable multi-scale feature aggregation mechanism module and an output reconstruction module, and the step of outputting the restoration image comprises the following steps: inputting the degr