
CN-121998874-A - Image recovery method and system

CN 121998874 A

Abstract

The invention relates to the field of image processing, and in particular to an image restoration method and system. The method comprises: inputting a degraded image into an image restoration network; extracting channel information, spatial information, and frequency-domain information of the degraded image along three dimensions (image channel, space, and frequency) through the image restoration network; restoring the degraded image using the extracted channel, spatial, and frequency-domain information; and outputting the restored image. Based on this divide-and-conquer idea across three domains, the method and system can restore texture details in scenes with multiple complex degradations while removing the degradation, and can reduce the computational complexity of the model.

Inventors

  • Cheng Peng
  • Gui Youqiang

Assignees

  • Sichuan University (四川大学)

Dates

Publication Date
2026-05-08
Application Date
2026-04-10

Claims (9)

  1. An image restoration method, comprising: inputting a degraded image into an image restoration network; restoring the degraded image along three dimensions (image channel, space, and frequency) through the image restoration network, and outputting the restored image; wherein the image restoration network learns hierarchical representations through an encoder-decoder architecture; in the encoding stage, original features from the input are supplemented via a skip connection comprising average pooling, point-wise convolution, and depth-wise convolution; in the decoding stage, encoder features are connected to the decoder via skip connections to assist reconstruction; the first two layers of the encoder-decoder framework are tri-domain blocks and the last two layers are dual-domain blocks, the tri-domain blocks being used to extract channel information, spatial information, and frequency-domain information and to capture textures and details, and the dual-domain blocks being used to extract semantic information and recover the low-frequency information of the image.
  2. The image restoration method according to claim 1, wherein the process of extracting the channel information comprises: generating query, key, and value projections of the input image by a 1×1 point-wise convolution and a 3×3 depth-wise convolution; and reshaping the query and the key, and generating an attention map, as the channel information, through the dot product of the query and the key.
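The channel attention of claim 2 can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the 1×1 point-wise and 3×3 depth-wise projection convolutions are assumed to have already produced `q`, `k`, `v`, and all shapes are illustrative. The key point the sketch shows is that reshaping the query and key to C×(HW) makes the dot-product attention map C×C rather than (HW)×(HW):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transposed_channel_attention(q, k, v):
    """Channel-wise ("transposed") attention sketch.

    q, k, v: (C, H, W) feature maps, assumed to come from the 1x1
    point-wise and 3x3 depth-wise convolutions of the claim (omitted).
    The attention map is C x C, so its size is independent of H*W.
    """
    C, H, W = q.shape
    q2 = q.reshape(C, H * W)             # reshaped query
    k2 = k.reshape(C, H * W)             # reshaped key
    v2 = v.reshape(C, H * W)
    attn = softmax(q2 @ k2.T, axis=-1)   # C x C map from the Q.K dot product
    out = attn @ v2                      # re-weight the value channel-wise
    return out.reshape(C, H, W), attn

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
out, attn = transposed_channel_attention(feat, feat, feat)
```

Because the attention matrix is C×C, the quadratic cost is in the channel count rather than the number of pixels, which is what makes this style of attention tractable for restoration at full resolution.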
  3. The image restoration method according to claim 1, wherein the process of extracting the spatial information comprises: feeding the input image into an upper branch and extracting upper-branch features by a 1×1 convolution followed by a 3×3 depth-wise convolution; feeding the input image into a lower branch and extracting lower-branch features by a 1×1 convolution, a 3×3 depth-wise convolution, and a GELU activation function in sequence; and multiplying the upper and lower branches element-wise and restoring the channel number through a 1×1 convolution to obtain the spatial information.
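The two-branch gating of claim 3 can be sketched as follows. This is a hedged illustration, not the patented code: the weights are random placeholders, the GELU uses the common tanh approximation, and the expansion to 8 intermediate channels is an arbitrary choice for the demo:

```python
import numpy as np

def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def conv1x1(x, w):
    # point-wise (1x1) convolution: mixes channels at each pixel
    # x: (Cin, H, W), w: (Cout, Cin)
    return np.einsum('oc,chw->ohw', w, x)

def dwconv3x3(x, w):
    # depth-wise 3x3 convolution with zero padding; x: (C, H, W), w: (C, 3, 3)
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += w[:, i, j][:, None, None] * xp[:, i:i + H, j:j + W]
    return out

def spatial_gate(x, w_up1, w_up_dw, w_lo1, w_lo_dw, w_out):
    upper = dwconv3x3(conv1x1(x, w_up1), w_up_dw)        # upper branch: 1x1 -> 3x3 dw
    lower = gelu(dwconv3x3(conv1x1(x, w_lo1), w_lo_dw))  # lower: 1x1 -> 3x3 dw -> GELU
    return conv1x1(upper * lower, w_out)                 # element-wise gate, restore channels

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 6, 6))
y = spatial_gate(x,
                 rng.standard_normal((8, 4)), rng.standard_normal((8, 3, 3)),
                 rng.standard_normal((8, 4)), rng.standard_normal((8, 3, 3)),
                 rng.standard_normal((4, 8)))
```

The element-wise product lets the GELU-activated lower branch act as a spatial gate on the upper branch, emphasizing or suppressing positions rather than channels.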
  4. The image restoration method according to claim 1, wherein the process of extracting the frequency-domain information comprises: frequency separation, namely aligning the input image with intermediate features, computing two-dimensional wavelet coefficients using a forward discrete wavelet transform, zero-filling the obtained wavelet coefficients, and applying the inverse discrete wavelet transform to separate low-frequency and high-frequency information, the intermediate features being features comprising the channel information and the spatial information; frequency mining, namely obtaining low-frequency mining features and high-frequency mining features from the intermediate features by a transposed channel attention mechanism, based on the low-frequency and high-frequency information; and interactive integration, namely interactively refining the low-frequency and high-frequency mining features to obtain low-frequency refined features and high-frequency refined features, aggregating the two refined features with a 1×1 convolution to generate a global frequency feature, and fusing the global frequency feature into the intermediate features using a transposed channel attention mechanism to obtain the frequency-domain information.
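The frequency-separation step of claim 4 can be sketched with a single-level Haar transform (the patent does not name the wavelet; Haar is assumed here for simplicity). Forward DWT yields four sub-bands; zero-filling all but the LL band before the inverse transform gives the low-frequency component, and zero-filling LL gives the high-frequency component:

```python
import numpy as np

def haar_dwt2(x):
    # single-level 2-D Haar DWT; x: (H, W) with even H and W
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # row averages
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0  # low-low (coarse structure)
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    # inverse of haar_dwt2 (perfect reconstruction)
    a = np.zeros((ll.shape[0], ll.shape[1] * 2))
    a[:, 0::2], a[:, 1::2] = ll + lh, ll - lh
    d = np.zeros_like(a)
    d[:, 0::2], d[:, 1::2] = hl + hh, hl - hh
    x = np.zeros((a.shape[0] * 2, a.shape[1]))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x

def frequency_separate(x):
    ll, lh, hl, hh = haar_dwt2(x)
    z = np.zeros_like(ll)
    low = haar_idwt2(ll, z, z, z)     # zero-fill detail bands -> low-frequency part
    high = haar_idwt2(z, lh, hl, hh)  # zero-fill LL band -> high-frequency part
    return low, high

rng = np.random.default_rng(2)
img = rng.standard_normal((8, 8))
low, high = frequency_separate(img)
```

Because the transform pair is linear and exactly invertible, the two separated components sum back to the input, so no information is lost by the split; the subsequent "mining" and "interactive integration" stages of the claim then process each band on its own terms.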
  5. The image restoration method according to claim 4, wherein the process of refining the low-frequency mining features comprises: applying global max pooling and global average pooling to the high-frequency mining features to generate two single-channel spatial feature maps, and concatenating the two maps along the channel dimension; adjusting the channel number and refining the features through a 7×7 convolution, and generating a spatial attention map through a sigmoid function; and performing element-wise multiplication of the spatial attention map and the low-frequency mining features to generate the low-frequency refined features.
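The cross-frequency refinement of claim 5 (a spatial attention map derived from the high-frequency features and applied to the low-frequency features) can be sketched as follows; the 7×7 kernel weights are random stand-ins, and the naive convolution loop is for clarity only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, w):
    # 'same'-padded 2-D convolution collapsing Cin channels to one map;
    # x: (Cin, H, W), w: (Cin, k, k)
    Cin, H, W = x.shape
    k = w.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((H, W))
    for c in range(Cin):
        for i in range(k):
            for j in range(k):
                out += w[c, i, j] * xp[c, i:i + H, j:j + W]
    return out

def spatial_attention_refine(high, low, w7):
    # high, low: (C, H, W); w7: (2, 7, 7) hypothetical kernel weights
    mx = high.max(axis=0, keepdims=True)    # channel-wise global max pool -> 1 map
    av = high.mean(axis=0, keepdims=True)   # channel-wise global average pool -> 1 map
    cat = np.concatenate([mx, av], axis=0)  # concatenate along channel dim -> (2, H, W)
    attn = sigmoid(conv2d_same(cat, w7))    # 7x7 conv + sigmoid -> spatial attention map
    return low * attn[None]                 # element-wise refinement of low-freq features

rng = np.random.default_rng(3)
high = rng.standard_normal((4, 5, 5))
low = rng.standard_normal((4, 5, 5))
refined = spatial_attention_refine(high, low, 0.1 * rng.standard_normal((2, 7, 7)))
```

Since the sigmoid output lies in (0, 1), the refinement can only attenuate, never amplify, each spatial position of the low-frequency features; the high-frequency band decides *where* the low-frequency band is emphasized.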
  6. The image restoration method according to claim 4, wherein the process of refining the high-frequency mining features comprises: applying adaptive average pooling and adaptive max pooling to the low-frequency mining features along the spatial dimension; applying two 1×1 convolutions in sequence to the adaptively average-pooled features to obtain upper-branch features; applying two 1×1 convolutions in sequence to the adaptively max-pooled features to obtain lower-branch features; adding the upper-branch and lower-branch features and generating a channel attention map through a sigmoid function; and performing element-wise multiplication of the channel attention map and the high-frequency mining features to generate the high-frequency refined features.
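The mirror-image refinement of claim 6 (a channel attention map derived from the low-frequency features and applied to the high-frequency features) can be sketched as follows. The weights are random placeholders, and a ReLU between the two 1×1 convolutions is an assumption (common in CBAM-style channel attention; the claim does not specify the intermediate activation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention_refine(low, high, w1, w2):
    # low, high: (C, H, W); w1: (Cr, C) and w2: (C, Cr) are hypothetical
    # weights of the two 1x1 convolutions (on pooled vectors, a 1x1 conv
    # is just a matrix multiply).
    avg = low.mean(axis=(1, 2))               # adaptive average pool over space -> (C,)
    mx = low.max(axis=(1, 2))                 # adaptive max pool over space -> (C,)
    upper = w2 @ np.maximum(w1 @ avg, 0.0)    # upper branch: two 1x1 convolutions
    lower = w2 @ np.maximum(w1 @ mx, 0.0)     # lower branch: two 1x1 convolutions
    attn = sigmoid(upper + lower)             # one attention weight per channel
    return high * attn[:, None, None]         # element-wise refinement of high-freq features

rng = np.random.default_rng(4)
low = rng.standard_normal((6, 4, 4))
high = rng.standard_normal((6, 4, 4))
refined = channel_attention_refine(low, high,
                                   rng.standard_normal((3, 6)),
                                   rng.standard_normal((6, 3)))
```

Claims 5 and 6 together form the "interactive" part of the integration: each frequency band produces the attention that refines the other, so neither band is processed in isolation.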
  7. The image restoration method according to claim 1, wherein the image restoration network is optimized using an L1 loss and an SSIM loss as the pixel-fidelity loss.
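A minimal sketch of the pixel-fidelity loss of claim 7, under two stated assumptions: the SSIM here is a simplified global variant (practical implementations use a sliding Gaussian window), and the weighting factor `lam` is hypothetical, since the patent does not publish the loss weights:

```python
import numpy as np

def l1_loss(pred, target):
    # mean absolute error between restored and ground-truth images
    return np.abs(pred - target).mean()

def global_ssim(pred, target, c1=0.01**2, c2=0.03**2):
    # simplified global SSIM over the whole image (no sliding window);
    # c1, c2 are the usual stabilizing constants for images in [0, 1]
    mu_x, mu_y = pred.mean(), target.mean()
    var_x, var_y = pred.var(), target.var()
    cov = ((pred - mu_x) * (target - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))

def fidelity_loss(pred, target, lam=0.2):
    # combined pixel-fidelity loss: L1 plus (1 - SSIM); lam is a
    # hypothetical weight, not taken from the patent
    return l1_loss(pred, target) + lam * (1.0 - global_ssim(pred, target))

img = np.linspace(0.0, 1.0, 16).reshape(4, 4)
noisy = img + 0.1
```

Using `1 - SSIM` as a loss term is standard practice, since SSIM equals 1 for identical images and decreases as structure diverges; the L1 term alone would penalize only per-pixel intensity error.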
  8. An image restoration system for performing the method of any one of claims 1 to 7, comprising an image restoration unit storing an image restoration network, the unit comprising a channel attention module, a spatial perception module, and a frequency learning module; the channel attention module is configured to extract channel information of the input image along the channel dimension; the spatial perception module is configured to extract spatial information of the input image along the spatial dimension; and the frequency learning module is configured to extract the high-frequency and low-frequency information of the input image and integrate them into frequency-domain information.
  9. The image restoration system according to claim 8, wherein, in the encoder-decoder architecture of the image restoration network, the first two layers of the encoder and decoder are tri-domain blocks and the last two layers are dual-domain blocks; each tri-domain block comprises a transposed channel attention module, a spatial perception module, and a frequency learning module, and is used to extract features and capture textures and details; each dual-domain block comprises a transposed channel attention module and a spatial perception module, and is used to extract semantic information.

Description

Image recovery method and system

Technical Field

The present invention relates to the field of image processing, and in particular to an image restoration method and system.

Background

In recent years, data-driven deep neural networks have developed rapidly in the field of image restoration, and many functional modules based on convolutional neural networks (CNNs) and Transformers have been designed to obtain high-quality output predictions. For example: 1. channel attention blocks and pixel attention blocks are built from convolutions to adaptively learn feature information at different layers, and, to capture long-range dependencies, many studies customize Transformer-based models for image restoration tasks; 2. to improve the computational efficiency of Transformer models, it has been proposed to apply self-attention along the channel dimension. The above methods operate mainly in the spatial and channel domains. Although they improve the visual quality of image restoration to some extent, degradation in real scenes has complex distribution characteristics, and edge information plays a key role in restoring the texture details of sharp images. However, the aforementioned Transformer operations are inherently insensitive to local texture details, while layer-by-layer convolution operations tend to ignore edge points, making it difficult to guarantee restoration quality. Image restoration requires not only the removal of degradation but also the recovery of fine details and textures. The low-frequency components of an image retain more structural information such as color and brightness, while the high-frequency components represent finer details such as edges and textures. Frequency information is therefore critical for restoring the structure and texture of an image. Recently, some studies have explored image restoration in the frequency domain.
However, the lack of frequency separation in these studies prevents targeted processing of features in different frequency bands, which increases the complexity of feature learning and reduces the interpretability of the method. Even when high- and low-frequency information is processed separately, no interactive complementary enhancement is achieved between the frequency components. Different types of degradation depend on high- and low-frequency information to different degrees: a dehazing task focuses more on recovering low-frequency information, while a denoising task focuses more on enhancing high-frequency details. The relative importance of the high- and low-frequency features therefore needs to be adjusted through interactive regulation in order to adapt to various complex degradation scenes. It has also been observed that many methods operate in the frequency domain within the deep layers of a U-Net, where frequency information is almost lost because the feature maps are already quite small, and excessive processing there can degrade model performance while increasing computational complexity.

Disclosure of the Invention

The invention aims to overcome the problems in the prior art of insufficient image restoration quality, insufficient sensitivity to local texture details, and severe loss of frequency information, and provides an image restoration method and system.
In a first aspect, the present invention provides an image restoration method, comprising: inputting a degraded image into an image restoration network; restoring the degraded image along three dimensions (image channel, space, and frequency) through the image restoration network, and outputting the restored image. The image restoration network learns hierarchical representations through an encoder-decoder architecture. In the encoding stage, original features from the input are supplemented via a skip connection comprising average pooling, point-wise convolution, and depth-wise convolution; in the decoding stage, encoder features are connected to the decoder via skip connections to assist reconstruction. The first two layers of the encoder-decoder framework are tri-domain blocks and the last two layers are dual-domain blocks; the tri-domain blocks are used to extract channel information, spatial information, and frequency-domain information and to capture textures and details, and the dual-domain blocks are used to extract semantic information and recover the low-frequency information of the image. Based on this divide-and-conquer idea, the spatial, channel, and frequency-domain information of the input image is comprehensively extracted along the three dimensions of channel, space, and frequency, so that texture details in scenes with multiple complex degradations can be recovered while degradation is removed, and the computational complexity of the model can be reduced.