US-12620056-B2 - Methods and systems for wavelet domain-based normalizing flow super-resolution image reconstruction

Abstract

The present disclosure discloses a method and a system for wavelet domain-based normalizing flow super-resolution image reconstruction. The method includes constructing a training set and a normalizing flow model, wherein the normalizing flow model includes a plurality of levels, each of the plurality of levels including a squeeze layer, two types of conditional mapping layers, a split layer, an activation standard layer, and a quick response (QR) layer; determining a stable normalizing flow model through a wavelet transform, a reconstructed QR layer, and a T-distribution based on the normalizing flow model; determining a wavelet domain-based normalizing flow super-resolution model by adding a refinement layer based on the stable normalizing flow model; training the wavelet domain-based normalizing flow super-resolution model based on the training set; and reconstructing a super-resolution image based on a trained normalizing flow super-resolution model.

Inventors

Bailin YANG
Shaobang LI
Chao SONG

Assignees

ZHEJIANG GONGSHANG UNIVERSITY

Dates

Publication Date: 2026-05-05
Application Date: 2023-06-15
Priority Date: 2022-08-12

Claims (19)

1. A method for wavelet domain-based normalizing flow super-resolution image reconstruction implemented by a processor, the method comprising: constructing a training set and a normalizing flow model, wherein the normalizing flow model includes a plurality of levels, each of the plurality of levels including a squeeze layer, two types of conditional mapping layers, a split layer, an activation standard layer, and a quick response (QR) layer; determining a stable normalizing flow model through a wavelet transform, the QR layer constructed through a principle of orthogonal triangular QR decomposition, and a T-distribution based on the normalizing flow model; determining a wavelet domain-based normalizing flow super-resolution model by adding a refinement layer based on the stable normalizing flow model, wherein during a training process of the wavelet domain-based normalizing flow super-resolution model, a standard deviation of the T-distribution is dynamically determined based on at least one of a current count of iteration rounds or at least one training sample; training the wavelet domain-based normalizing flow super-resolution model based on the training set; and reconstructing a super-resolution image based on a trained normalizing flow super-resolution model.
2. The method according to claim 1, further comprising: S1: collecting a first image data set and a second image data set based on a deep learning super-resolution task, the first image data set being separated into a first component of the training set, a validation set, and a test set; and obtaining a third image data set by merging the first component of the training set and the second image data set, randomly cutting each image pair of the third data set into a same size, and constructing the training set; S2: constructing the normalizing flow model; S3: adding the wavelet transform to the normalizing flow model, transforming a distribution of information to be learned into the wavelet domain, and obtaining low-frequency information, diagonal detail information, horizontal detail information, and vertical detail information of the information to be learned; S4: replacing a normal distribution with the T-distribution, constructing the QR layer through a principle of orthogonal triangular QR decomposition, and obtaining the stable normalizing flow model by adding the T-distribution and the QR layer to the normalizing flow model; S5: adding the refinement layer before the stable normalizing flow model, further refining conditional features provided by an encoder, and obtaining the wavelet domain-based normalizing flow super-resolution model; and S6: using the training set to train the wavelet domain-based normalizing flow super-resolution model, and inputting low-resolution images in the test set into the trained normalizing flow super-resolution model, including: inputting the low-resolution images in the test set into the encoder to obtain the conditional features; obtaining refined features by refining the conditional features through the refinement layer, sampling latent feature variables from a simple distribution, and inputting the latent feature variables to the trained normalizing flow super-resolution model; and injecting the refined features into a corresponding conditional mapping layer to obtain a high-quality super-resolution image under a conditional feature distribution.
3. The method according to claim 2, wherein in S2, the squeeze layer enlarges a channel dimension of a feature to four times of an original size and compresses a length dimension and a width dimension to one-half of the original size, the squeeze layer being reversible; the two types of conditional mapping layers include a self-conditional mapping layer and an other feature-conditional mapping layer, which are used to enhance mapping learning abilities of the normalizing flow model based on a conditional feature distribution; the split layer is reversible for processing the channel dimension of the feature, allowing half of the channel dimension of the feature to continue to let the normalizing flow model learn a mapping relationship and another half of the channel dimension of the feature to obey the T-distribution; the QR layer is a network layer for improving mapping abilities of the normalizing flow model and configured to exchange information on the channel dimension of the feature; and the activation standard layer is configured to implement an activated mapping transformation using a scale and a deviation parameter of each channel and initializing the scale and the deviation parameter.
4. The method according to claim 2, wherein the normalizing flow model in S2 is a reversible model, and each level is designed based on a Jacobian matrix.
5. The method according to claim 2, wherein a first level of the normalizing flow model described in S2 does not have the squeeze layer.
6. The method according to claim 2, wherein the refinement layer in S5 includes a plurality of attention modules, and every two conditional mapping layers correspond to an independent attention module in the refinement layer.
7. The method according to claim 6, wherein the attention modules include a channel attention module and a spatial attention module.
8. The method according to claim 1, wherein any one of the split layers in the normalizing flow model is configured to: through a current split layer, split data into a first part of data with a first dimension value and a second part of data with a second dimension value, wherein the first part of data is learned through a data mapping relationship, the second part of data is regularized as the T-distribution, and the first dimension value and the second dimension value are determined based on a depth of the current split layer.
9. The method according to claim 8, wherein the first dimension value and the second dimension value are related to an image complexity of the training set, and the image complexity includes at least one of a color complexity, a texture complexity, and a shape complexity.
10. The method according to claim 1, wherein in each iteration round of the training process, determining the standard deviation of the T-distribution includes: determining a standard deviation of a T-distribution of a current iteration round based on a depth texture feature of at least one training sample of the current iteration round.
11. The method according to claim 10, wherein determining the depth texture feature includes: determining sample information of the at least one training sample by performing the wavelet transform on the at least one training sample, wherein the sample information includes at least one of sample low-frequency information, sample diagonal detail information, sample horizontal detail information, and sample vertical detail information; and determining the depth texture feature through a texture evaluation model based on the sample information, respectively, wherein the texture evaluation model is a machine learning model.
12. A system for wavelet domain-based normalizing flow super-resolution image reconstruction, wherein the system comprises: a building module configured for constructing a training set and a normalizing flow model, wherein the normalizing flow model includes a plurality of levels, each of the plurality of levels including a squeeze layer, two types of conditional mapping layers, a split layer, an activation standard layer, and a quick response (QR) layer; a first determination module configured to determine a stable normalizing flow model through a wavelet transform, the QR layer constructed through a principle of orthogonal triangular QR decomposition, and a T-distribution based on the normalizing flow model; a second determination module configured to determine a wavelet domain-based normalizing flow super-resolution model by adding a refinement layer based on the stable normalizing flow model, wherein during a training process of the wavelet domain-based normalizing flow super-resolution model, a standard deviation of the T-distribution is dynamically determined based on at least one of a current count of iteration rounds or at least one training sample; a training module configured to train the wavelet domain-based normalizing flow super-resolution model based on the training set; and a reconstruction module configured to reconstruct a super-resolution image based on a trained normalizing flow super-resolution model.
13. The system according to claim 12, wherein the building module is further configured to: collect a first image data set and a second image data set based on a deep learning super-resolution task, the first image data set being separated into a first component of the training set, a validation set, and a test set; and obtain a third image data set by merging the first component of the training set and the second image data set, randomly cutting each image pair of the third data set into a same size, and construct the training set; and construct the normalizing flow model; the first determination module is further configured to: add the wavelet transform to the normalizing flow model, transform a distribution of information to be learned into the wavelet domain, and obtain low-frequency information, diagonal detail information, horizontal detail information, and vertical detail information of the information to be learned; and replace a normal distribution with the T-distribution, construct the QR layer through a principle of orthogonal triangular QR decomposition, and obtain the stable normalizing flow model by adding the T-distribution and the QR layer to the normalizing flow model; the second determination module is further configured to: add the refinement layer before the stable normalizing flow model, further refine conditional features provided by an encoder, and obtain the wavelet domain-based normalizing flow super-resolution model; and the training module is further configured to: use the training set to train the wavelet domain-based normalizing flow super-resolution model, and input low-resolution images in the test set into the trained normalizing flow super-resolution model, including: inputting the low-resolution images in the test set into the encoder to obtain the conditional features; obtaining refined features by refining through the refinement layer, sampling latent feature variables from a simple distribution, and inputting the latent feature variables to the trained normalizing flow super-resolution model; and injecting the refined features into a corresponding conditional mapping layer to obtain a high-quality super-resolution image under a conditional feature distribution.
14. The system according to claim 13, wherein the squeeze layer enlarges a channel dimension of a feature to four times of an original size and compresses a length dimension and a width dimension to one-half of the original size, the squeeze layer being reversible; the two types of conditional mapping layers include a self-conditional mapping layer and an other feature-conditional mapping layer, which are used to enhance mapping learning abilities of the normalizing flow model based on a conditional feature distribution; the split layer is reversible for processing the channel dimension of the feature, allowing half of the channel dimension of the feature to continue to let the normalizing flow model learn a mapping relationship and another half of the channel dimension of the feature to obey the T-distribution; the QR layer is a network layer for improving mapping abilities of the normalizing flow model and configured to exchange information on the channel dimension of the feature; and the activation standard layer is configured to implement an activated mapping transformation using a scale and a deviation parameter of each channel and initializing the scale and the deviation parameter.
15. The system according to claim 13, wherein the normalizing flow model is a reversible model, and each level is designed based on a Jacobian matrix.
16. The system according to claim 13, wherein a first level of the normalizing flow model does not have the squeeze layer.
17. The system according to claim 12, wherein the refinement layer includes a plurality of attention modules, and every two conditional mapping layers correspond to an independent attention module in the refinement layer.
18. The system according to claim 17, wherein the attention modules include a channel attention module and a spatial attention module.
19. A non-transitory computer-readable storage medium, wherein the storage medium stores computer instructions, and when reading the computer instructions in the storage medium, a computer implements the method according to claim 1.
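
For illustration only, and not as part of the claims, three of the operations recited above may be sketched in Python with NumPy: the reversible squeeze layer (channel dimension enlarged four times, length and width halved), a single-level wavelet transform yielding low-frequency, horizontal, vertical, and diagonal information, and a channel-mixing layer parameterized by an orthogonal-triangular (QR) factorization. The sketch rests on several assumptions that the claims do not state: the function names (`squeeze`, `unsqueeze`, `haar_dwt2`, `haar_idwt2`, `qr_channel_mix`) are hypothetical, the Haar wavelet is chosen only because it is the simplest wavelet, the labeling of the horizontal and vertical sub-bands follows one common convention, and the QR layer is assumed to act as a 1x1 channel-mixing matrix W = QR.

```python
import numpy as np


def squeeze(x):
    """Reversible squeeze: (C, H, W) -> (4C, H/2, W/2). H and W must be even."""
    c, h, w = x.shape
    x = x.reshape(c, h // 2, 2, w // 2, 2)
    x = x.transpose(0, 2, 4, 1, 3)  # move each 2x2 block into the channel axis
    return x.reshape(4 * c, h // 2, w // 2)


def unsqueeze(y):
    """Exact inverse of squeeze: (4C, H/2, W/2) -> (C, H, W)."""
    c4, h2, w2 = y.shape
    y = y.reshape(c4 // 4, 2, 2, h2, w2)
    y = y.transpose(0, 3, 1, 4, 2)
    return y.reshape(c4 // 4, 2 * h2, 2 * w2)


def haar_dwt2(x):
    """Single-level orthonormal 2-D Haar transform (assumed wavelet)."""
    a, b = x[:, 0::2, 0::2], x[:, 0::2, 1::2]  # top-left, top-right pixels
    c, d = x[:, 1::2, 0::2], x[:, 1::2, 1::2]  # bottom-left, bottom-right
    low = (a + b + c + d) / 2
    horizontal = (a + b - c - d) / 2  # difference between rows
    vertical = (a - b + c - d) / 2    # difference between columns
    diagonal = (a - b - c + d) / 2
    return low, horizontal, vertical, diagonal


def haar_idwt2(low, horizontal, vertical, diagonal):
    """Inverse of haar_dwt2; reconstructs the (C, H, W) input exactly."""
    c, h2, w2 = low.shape
    x = np.empty((c, 2 * h2, 2 * w2), dtype=low.dtype)
    x[:, 0::2, 0::2] = (low + horizontal + vertical + diagonal) / 2
    x[:, 0::2, 1::2] = (low + horizontal - vertical - diagonal) / 2
    x[:, 1::2, 0::2] = (low - horizontal + vertical - diagonal) / 2
    x[:, 1::2, 1::2] = (low - horizontal - vertical + diagonal) / 2
    return x


def qr_channel_mix(x, q, r):
    """Mix channels with W = Q @ R (Q orthogonal, R upper triangular).

    Because |det Q| = 1, log|det W| is the sum of log|r_ii|. The value
    returned is per pixel; a full log-likelihood term scales it by H * W.
    """
    w = q @ r
    y = np.einsum("ij,jhw->ihw", w, x)
    log_det = np.sum(np.log(np.abs(np.diag(r))))
    return y, log_det


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))      # toy 3-channel 8x8 feature
z = squeeze(x)                          # shape (12, 4, 4)
bands = haar_dwt2(x)                    # four sub-bands of shape (3, 4, 4)

# Illustrative Q and R only; how the patent constructs them is not shown here.
q, _ = np.linalg.qr(rng.standard_normal((12, 12)))
r = np.triu(rng.standard_normal((12, 12)), k=1) + np.diag(np.exp(rng.standard_normal(12)))
y, log_det = qr_channel_mix(z, q, r)
```

Both the squeeze and the Haar transform map a feature of shape (C, H, W) to 4C values at half the spatial resolution and are exactly invertible, which is the property a normalizing flow requires. The triangular factor makes the log-determinant of the mixing matrix a sum over its diagonal rather than a full determinant computation.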

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 202210969698.2, filed on Aug. 12, 2022, the entire contents of which are incorporated herein by reference.

TECHNICAL FIELD

The present disclosure belongs to the technical field of computer image processing, and in particular, to a method and a system for wavelet domain-based normalizing flow super-resolution image reconstruction.

BACKGROUND

Due to the influence of the external environment or the acquisition equipment, images often suffer from problems such as low resolution and a loss of details. As users' expectations for visual experience and application requirements increase, it is critical to process low-resolution images. Image super-resolution reconstruction algorithms may be broadly classified into three approaches based on different principles: an interpolation-based approach, a degradation model-based approach, and a learning-based approach.

Representative algorithms of the interpolation-based approach mainly include nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation. The degradation model-based approach starts from a degradation model of an image, assuming that a low-resolution image is obtained after a super-resolution image has undergone an appropriate motion transformation, a blur, and noise. This approach constrains the degradation of the super-resolution image by extracting key information from the low-resolution image and combining it with prior knowledge of an unknown super-resolution image. Common techniques of the degradation model-based approach include an iterative back-projection technique, a projection onto convex sets technique, a maximum a posteriori probability technique, etc. With the continuous development of deep learning and its wide application in the field of computer vision, researchers have designed many deep learning-based models to solve the problem.
However, image super-resolution reconstruction is an ill-posed problem: because a real low-resolution image may correspond to a plurality of high-resolution images, it cannot be guaranteed that a generated super-resolution image matches the real super-resolution image. Moreover, most current deep learning models are deterministic mappings: the fixed parameters in the network model make a low-resolution image correspond to only one high-resolution image, and some of the super-resolution images generated by these models are not satisfactory. Current models are of two types: models based on a peak signal-to-noise ratio (PSNR) and models based on visual perception.

In recent years, the normalizing flow model has attracted extensive attention due to its strong generative ability, and thus, it is also used in the field of super-resolution. The normalizing flow model can learn an accurate mapping from complex distributions to simple distributions. Because of this property, generating an image requires sampling from a simple distribution (e.g., a Gaussian distribution), which makes it possible to generate, from a same low-resolution image, a plurality of super-resolution images with a similar subject but different details in some parts, alleviating the ill-posed problem of super-resolution reconstruction to a certain extent. However, the super-resolution images generated by this normalizing flow model may also be unsatisfactory, and the normalizing flow model is also not particularly stable during training.

Therefore, there is an urgent need to provide a method and a system for wavelet domain-based normalizing flow super-resolution image reconstruction, which can reconstruct super-resolution images stably and efficiently.

SUMMARY

The present disclosure provides a method for wavelet domain-based normalizing flow super-resolution image reconstruction.
The method combines information obtained in the wavelet domain with a powerful generative model, namely a normalizing flow model, to reconstruct high-quality super-resolution images. The method also proposes a solution that may mitigate, to a certain extent, the instability of training the normalizing flow model.

One or more embodiments of the present disclosure provide a method for wavelet domain-based normalizing flow super-resolution image reconstruction implemented by a processor. The method includes: constructing a training set and a normalizing flow model, wherein the normalizing flow model includes a plurality of levels, each of the plurality of levels including a squeeze layer, two types of conditional mapping layers, a split layer, an activation standard layer, and a quick response (QR) layer; determining a stable normalizing flow model through a wavelet transform, a reconstructed QR layer, and a T-distribution based on the normalizing flow model; determining a wavelet domain-based normalizing flow super-resolution model by a