US-20260127704-A1 - IMAGE ENHANCEMENT METHOD, ELECTRONIC DEVICE AND STORAGE MEDIUM
Abstract
An image enhancement method, an electronic device, and a storage medium are provided in the present disclosure. The method includes performing down-sampling at a plurality of levels on a first image to obtain a plurality of encoding features with different sampling parameters; performing up-sampling at a plurality of levels on an encoding feature, obtained from down-sampling at a last level, to obtain a plurality of decoding features with different sampling parameters; and obtaining a second image based on a decoding feature obtained from up-sampling at a last level, where a resolution of the second image is higher than a resolution of the first image. Up-sampling at each non-first level includes fusing a decoding feature, obtained from up-sampling at a previous level, with an encoding feature of a same sampling parameter to obtain a fused feature, and performing up-sampling on the fused feature.
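As a non-authoritative illustration, the U-shaped pipeline described in the abstract can be sketched in pure Python on 1-D "images" (lists of floats). All helper names, the pair-averaging down-sampler, the sample-repeating up-sampler, and the plain mean fusion are assumptions for illustration only; the disclosure's actual modules are learned networks, and the output module that raises the final resolution above the input's is omitted here.

```python
def downsample(feat):
    """Halve resolution by averaging adjacent pairs (one encoding level)."""
    return [(feat[i] + feat[i + 1]) / 2 for i in range(0, len(feat) - 1, 2)]

def upsample(feat):
    """Double resolution by repeating each sample (one decoding level)."""
    return [x for v in feat for x in (v, v)]

def fuse(decoding, encoding):
    """Fuse a decoding feature with the encoding feature of the same
    sampling parameter (same length); here a plain element-wise mean
    stands in for the disclosure's learned fusion."""
    return [(d + e) / 2 for d, e in zip(decoding, encoding)]

def enhance(first_image, levels=3):
    # Encoder: down-sampling at a plurality of levels, keeping every
    # encoding feature for the skip connections.
    encodings = []
    feat = first_image
    for _ in range(levels):
        feat = downsample(feat)
        encodings.append(feat)
    # Decoder: up-sampling at a plurality of levels. The first level
    # up-samples the deepest encoding feature directly; every non-first
    # level first fuses with the encoding feature of matching resolution.
    feat = upsample(encodings[-1])
    for level in range(levels - 2, -1, -1):
        feat = fuse(feat, encodings[level])
        feat = upsample(feat)
    return feat

img = [float(i) for i in range(8)]
out = enhance(img, levels=3)
```

With three levels, an 8-sample input is encoded to lengths 4, 2, and 1, then decoded back through the two skip fusions to length 8, mirroring the symmetric level structure the abstract describes.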
Inventors
- Yilan WANG
Assignees
- SMARTER SILICON (SHANGHAI) TECHNOLOGIES CO., LTD.
Dates
- Publication Date
- 20260507
- Application Date
- 20251027
- Priority Date
- 20241104
Claims (20)
- 1 . An image enhancement method, comprising: performing down-sampling at a plurality of levels on a first image to obtain a plurality of encoding features with different sampling parameters; performing up-sampling at a plurality of levels on an encoding feature, obtained from down-sampling at a last level, to obtain a plurality of decoding features with different sampling parameters, wherein up-sampling at each non-first level includes fusing a decoding feature, obtained from up-sampling at a previous level, with an encoding feature of a same sampling parameter to obtain a fused feature, and performing up-sampling on the fused feature; and a process of obtaining a fused feature of at least one non-first level includes fusing a decoding feature, obtained from up-sampling at a previous level, with an encoding feature of a same sampling parameter according to a pre-learned fusion parameter; and obtaining a second image based on a decoding feature obtained from up-sampling at a last level, wherein a resolution of the second image is higher than a resolution of the first image.
- 2 . The method according to claim 1 , wherein fusing the decoding feature, obtained from up-sampling at the previous level, with the encoding feature of the same sampling parameter according to the pre-learned fusion parameter includes: based on the fusion parameter, the decoding feature obtained from up-sampling at the previous level, and the encoding feature of the same sampling parameter, calculating a first weight of the encoding feature having the same sampling parameter as the decoding feature obtained from up-sampling at the previous level; and performing weighted fusion on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter based on the first weight.
- 3 . The method according to claim 2 , wherein performing the weighted fusion on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter based on the first weight includes: obtaining a second weight of the decoding feature obtained from up-sampling at the previous level based on the first weight; and based on the first weight and the second weight, performing weighted calculation on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter.
- 4 . The method according to claim 2 , wherein performing the weighted fusion on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter based on the first weight includes: summing the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter to obtain an initial fused feature; obtaining a second weight of the decoding feature obtained from up-sampling at the previous level based on the first weight; and based on the first weight and the second weight, performing weighted calculation on the initial fused feature and the decoding feature obtained from up-sampling at the previous level.
- 5 . The method according to claim 1 , wherein the process of performing down-sampling at the plurality of levels on the first image, performing up-sampling at the plurality of levels on the encoding feature obtained from down-sampling at the last level, and obtaining the second image based on the decoding feature obtained from up-sampling at the last level includes: performing down-sampling at the plurality of levels on the first image using a first encoding module of an enhancement network; performing up-sampling at the plurality of levels on the encoding feature obtained from down-sampling at the last level using a decoding module of the enhancement network, wherein up-sampling at each non-first level includes obtaining the fused feature by fusing the decoding feature, obtained from up-sampling at the previous level, with the encoding feature of the same sampling parameter using a fusion module of the enhancement network, and performing up-sampling on the fused feature; and the process of obtaining the fused feature of at least one non-first level includes, by the fusion module, fusing the decoding feature, obtained from up-sampling at the previous level, with the encoding feature of the same sampling parameter according to the pre-learned fusion parameter; and processing the decoding feature obtained from up-sampling at the last level to obtain the second image using an output module of the enhancement network.
- 6 . The method according to claim 5 , wherein the enhancement network is obtained by a training manner including: based on high-quality images in a first image set, performing unsupervised training on an initial network formed by a second encoding module, the decoding module, and the output module to obtain a pre-trained network, wherein the first image set includes a plurality of low-quality images and high-quality images corresponding to all low-quality images; and the second encoding module is configured to perform down-sampling at a plurality of levels on an inputted high-quality image to obtain a plurality of encoding features with different sampling parameters; and based on the pre-trained network, performing supervised training on the first encoding module and the fusion module using the first image set to obtain a trained first encoding module and a trained fusion module, wherein the trained first encoding module, the trained fusion module, and the decoding module and the output module in the pre-trained network form the enhancement network.
- 7 . The method according to claim 6 , wherein, based on the pre-trained network, performing supervised training on the first encoding module using the first image set includes: for any low-quality image in the first image set, inputting the low-quality image into the first encoding module to obtain a plurality of encoding features with different sampling parameters of the low-quality image; inputting a high-quality image corresponding to the low-quality image into the second encoding module in the pre-trained network to obtain a plurality of encoding features with different sampling parameters of the corresponding high-quality image; and updating a parameter of the first encoding module with a goal of minimizing a first difference between an encoding feature of the low-quality image and an encoding feature which is of the corresponding high-quality image and has a same sampling parameter as the encoding feature of the low-quality image.
- 8 . An electronic device, comprising: a memory, configured to store a computer program; and one or more processors, configured to, when the computer program is executed, perform: performing down-sampling at a plurality of levels on a first image to obtain a plurality of encoding features with different sampling parameters; performing up-sampling at a plurality of levels on an encoding feature, obtained from down-sampling at a last level, to obtain a plurality of decoding features with different sampling parameters, wherein up-sampling at each non-first level includes fusing a decoding feature, obtained from up-sampling at a previous level, with an encoding feature of a same sampling parameter to obtain a fused feature, and performing up-sampling on the fused feature; and a process of obtaining a fused feature of at least one non-first level includes fusing a decoding feature, obtained from up-sampling at a previous level, with an encoding feature of a same sampling parameter according to a pre-learned fusion parameter; and obtaining a second image based on a decoding feature obtained from up-sampling at a last level, wherein a resolution of the second image is higher than a resolution of the first image.
- 9 . The electronic device according to claim 8 , wherein for fusing the decoding feature, obtained from up-sampling at the previous level, with the encoding feature of the same sampling parameter according to the pre-learned fusion parameter, the one or more processors are further configured to perform: based on the fusion parameter, the decoding feature obtained from up-sampling at the previous level, and the encoding feature of the same sampling parameter, calculating a first weight of the encoding feature having the same sampling parameter as the decoding feature obtained from up-sampling at the previous level; and performing weighted fusion on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter based on the first weight.
- 10 . The electronic device according to claim 9 , wherein for performing the weighted fusion on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter based on the first weight, the one or more processors are further configured to perform: obtaining a second weight of the decoding feature obtained from up-sampling at the previous level based on the first weight; and based on the first weight and the second weight, performing weighted calculation on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter.
- 11 . The electronic device according to claim 9 , wherein for performing the weighted fusion on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter based on the first weight, the one or more processors are further configured to perform: summing the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter to obtain an initial fused feature; obtaining a second weight of the decoding feature obtained from up-sampling at the previous level based on the first weight; and based on the first weight and the second weight, performing weighted calculation on the initial fused feature and the decoding feature obtained from up-sampling at the previous level.
- 12 . The electronic device according to claim 8 , wherein for the process of performing down-sampling at the plurality of levels on the first image, performing up-sampling at the plurality of levels on the encoding feature obtained from down-sampling at the last level, and obtaining the second image based on the decoding feature obtained from up-sampling at the last level, the one or more processors are further configured to perform: performing down-sampling at the plurality of levels on the first image using a first encoding module of an enhancement network; performing up-sampling at the plurality of levels on the encoding feature obtained from down-sampling at the last level using a decoding module of the enhancement network, wherein up-sampling at each non-first level includes obtaining the fused feature by fusing the decoding feature, obtained from up-sampling at the previous level, with the encoding feature of the same sampling parameter using a fusion module of the enhancement network, and performing up-sampling on the fused feature; and the process of obtaining the fused feature of at least one non-first level includes, by the fusion module, fusing the decoding feature, obtained from up-sampling at the previous level, with the encoding feature of the same sampling parameter according to the pre-learned fusion parameter; and processing the decoding feature obtained from up-sampling at the last level to obtain the second image using an output module of the enhancement network.
- 13 . The electronic device according to claim 12 , wherein the enhancement network is obtained by a training manner including: based on high-quality images in a first image set, performing unsupervised training on an initial network formed by a second encoding module, the decoding module, and the output module to obtain a pre-trained network, wherein the first image set includes a plurality of low-quality images and high-quality images corresponding to all low-quality images; and the second encoding module is configured to perform down-sampling at a plurality of levels on an inputted high-quality image to obtain a plurality of encoding features with different sampling parameters; and based on the pre-trained network, performing supervised training on the first encoding module and the fusion module using the first image set to obtain a trained first encoding module and a trained fusion module, wherein the trained first encoding module, the trained fusion module, and the decoding module and the output module in the pre-trained network form the enhancement network.
- 14 . The electronic device according to claim 13 , wherein, for performing supervised training on the first encoding module using the first image set based on the pre-trained network, the one or more processors are further configured to perform: for any low-quality image in the first image set, inputting the low-quality image into the first encoding module to obtain a plurality of encoding features with different sampling parameters of the low-quality image; inputting a high-quality image corresponding to the low-quality image into the second encoding module in the pre-trained network to obtain a plurality of encoding features with different sampling parameters of the corresponding high-quality image; and updating a parameter of the first encoding module with a goal of minimizing a first difference between an encoding feature of the low-quality image and an encoding feature which is of the corresponding high-quality image and has a same sampling parameter as the encoding feature of the low-quality image.
- 15 . A non-transitory computer-readable storage medium containing a computer program that, when being executed, causes one or more processors to perform: performing down-sampling at a plurality of levels on a first image to obtain a plurality of encoding features with different sampling parameters; performing up-sampling at a plurality of levels on an encoding feature, obtained from down-sampling at a last level, to obtain a plurality of decoding features with different sampling parameters, wherein up-sampling at each non-first level includes fusing a decoding feature, obtained from up-sampling at a previous level, with an encoding feature of a same sampling parameter to obtain a fused feature, and performing up-sampling on the fused feature; and a process of obtaining a fused feature of at least one non-first level includes fusing a decoding feature, obtained from up-sampling at a previous level, with an encoding feature of a same sampling parameter according to a pre-learned fusion parameter; and obtaining a second image based on a decoding feature obtained from up-sampling at a last level, wherein a resolution of the second image is higher than a resolution of the first image.
- 16 . The storage medium according to claim 15 , wherein for fusing the decoding feature, obtained from up-sampling at the previous level, with the encoding feature of the same sampling parameter according to the pre-learned fusion parameter, the one or more processors are further configured to perform: based on the fusion parameter, the decoding feature obtained from up-sampling at the previous level, and the encoding feature of the same sampling parameter, calculating a first weight of the encoding feature having the same sampling parameter as the decoding feature obtained from up-sampling at the previous level; and performing weighted fusion on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter based on the first weight.
- 17 . The storage medium according to claim 16 , wherein for performing the weighted fusion on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter based on the first weight, the one or more processors are further configured to perform: obtaining a second weight of the decoding feature obtained from up-sampling at the previous level based on the first weight; and based on the first weight and the second weight, performing weighted calculation on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter.
- 18 . The storage medium according to claim 16 , wherein for performing the weighted fusion on the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter based on the first weight, the one or more processors are further configured to perform: summing the decoding feature obtained from up-sampling at the previous level and the encoding feature of the same sampling parameter to obtain an initial fused feature; obtaining a second weight of the decoding feature obtained from up-sampling at the previous level based on the first weight; and based on the first weight and the second weight, performing weighted calculation on the initial fused feature and the decoding feature obtained from up-sampling at the previous level.
- 19 . The storage medium according to claim 15 , wherein for the process of performing down-sampling at the plurality of levels on the first image, performing up-sampling at the plurality of levels on the encoding feature obtained from down-sampling at the last level, and obtaining the second image based on the decoding feature obtained from up-sampling at the last level, the one or more processors are further configured to perform: performing down-sampling at the plurality of levels on the first image using a first encoding module of an enhancement network; performing up-sampling at the plurality of levels on the encoding feature obtained from down-sampling at the last level using a decoding module of the enhancement network, wherein up-sampling at each non-first level includes obtaining the fused feature by fusing the decoding feature, obtained from up-sampling at the previous level, with the encoding feature of the same sampling parameter using a fusion module of the enhancement network, and performing up-sampling on the fused feature; and the process of obtaining the fused feature of at least one non-first level includes, by the fusion module, fusing the decoding feature, obtained from up-sampling at the previous level, with the encoding feature of the same sampling parameter according to the pre-learned fusion parameter; and processing the decoding feature obtained from up-sampling at the last level to obtain the second image using an output module of the enhancement network.
- 20 . The storage medium according to claim 19 , wherein the enhancement network is obtained by a training manner including: based on high-quality images in a first image set, performing unsupervised training on an initial network formed by a second encoding module, the decoding module, and the output module to obtain a pre-trained network, wherein the first image set includes a plurality of low-quality images and high-quality images corresponding to all low-quality images; and the second encoding module is configured to perform down-sampling at a plurality of levels on an inputted high-quality image to obtain a plurality of encoding features with different sampling parameters; and based on the pre-trained network, performing supervised training on the first encoding module and the fusion module using the first image set to obtain a trained first encoding module and a trained fusion module, wherein the trained first encoding module, the trained fusion module, and the decoding module and the output module in the pre-trained network form the enhancement network.
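The weighted fusion recited in claims 2 and 3 can be sketched as follows. The claims only state that a first weight of the encoding feature is computed from the pre-learned fusion parameter and both features, that a second weight of the decoding feature is obtained from the first, and that a weighted calculation combines the two features; the sigmoid gate and the w2 = 1 − w1 relationship below are illustrative assumptions, not the claimed functional form.

```python
import math

def learned_fusion(decoding, encoding, fusion_param):
    """Hedged sketch of the fusion in claims 2-3.

    `fusion_param` stands in for the pre-learned fusion parameter; the
    element-wise sigmoid gate is an assumption chosen so each weight
    lies in (0, 1).
    """
    fused = []
    for d, e in zip(decoding, encoding):
        # First weight of the encoding feature (claim 2): a function of
        # the fusion parameter and both features.
        w1 = 1.0 / (1.0 + math.exp(-fusion_param * (d + e)))
        # Second weight of the decoding feature, obtained from the first
        # weight (claim 3).
        w2 = 1.0 - w1
        # Weighted calculation on both features (claim 3). The claim-4
        # variant would instead first sum d and e into an initial fused
        # feature and weight that against d.
        fused.append(w1 * e + w2 * d)
    return fused

dec = [0.2, 0.8]
enc = [0.4, 0.6]
out = learned_fusion(dec, enc, fusion_param=1.0)
```

Because the two weights sum to one, each fused value is a convex combination of the decoding and encoding features at that position, so the fusion interpolates between the two branches rather than amplifying either.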
Description
CROSS-REFERENCE TO RELATED APPLICATION
The present disclosure claims the priority of Chinese Patent Application No. 202411563375.9, filed on Nov. 4, 2024, the content of which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present disclosure generally relates to the field of image processing technology and, more particularly, to an image enhancement method, an image enhancement device, an image enhancement model and a training method thereof, and an electronic device.
BACKGROUND
Image enhancement is a method for improving image visual quality. Its primary purpose is to improve the visual quality and image resolution (i.e., clarity). Current image enhancement solutions may use a classic U-shaped network to enhance low-quality original images into high-quality images. However, the image enhancement performance of the classic U-shaped network may be poor. To improve it, a low-resolution feature search module may be added between the encoding module and the decoding module of the U-shaped network to extract the highly discriminative features needed for enhancement. The decoding module may then decode these highly discriminative features to produce high-quality images. Adding the low-resolution feature search module may improve image enhancement performance. However, the low-resolution feature search module may be relatively large in scale, which may significantly increase the computational complexity of the entire image enhancement network and result in slow inference and difficult network deployment.
SUMMARY
One aspect of the present disclosure provides an image enhancement method.
The image enhancement method includes performing down-sampling at a plurality of levels on a first image to obtain a plurality of encoding features with different sampling parameters; performing up-sampling at a plurality of levels on an encoding feature, obtained from down-sampling at a last level, to obtain a plurality of decoding features with different sampling parameters; and obtaining a second image based on a decoding feature obtained from up-sampling at a last level, where a resolution of the second image is higher than a resolution of the first image. Up-sampling at each non-first level includes fusing a decoding feature, obtained from up-sampling at a previous level, with an encoding feature of a same sampling parameter to obtain a fused feature, and performing up-sampling on the fused feature; and a process of obtaining a fused feature of at least one non-first level includes fusing a decoding feature, obtained from up-sampling at a previous level, with an encoding feature of a same sampling parameter according to a pre-learned fusion parameter. Another aspect of the present disclosure provides an electronic device. The electronic device includes a memory, configured to store a computer program; and one or more processors, configured to, when the computer program is executed, perform an image enhancement method. The image enhancement method includes performing down-sampling at a plurality of levels on a first image to obtain a plurality of encoding features with different sampling parameters; performing up-sampling at a plurality of levels on an encoding feature, obtained from down-sampling at a last level, to obtain a plurality of decoding features with different sampling parameters; and obtaining a second image based on a decoding feature obtained from up-sampling at a last level, where a resolution of the second image is higher than a resolution of the first image. 
Up-sampling at each non-first level includes fusing a decoding feature, obtained from up-sampling at a previous level, with an encoding feature of a same sampling parameter to obtain a fused feature, and performing up-sampling on the fused feature; and a process of obtaining a fused feature of at least one non-first level includes fusing a decoding feature, obtained from up-sampling at a previous level, with an encoding feature of a same sampling parameter according to a pre-learned fusion parameter. Another aspect of the present disclosure provides a non-transitory computer-readable storage medium containing a computer program that, when being executed, causes one or more processors to perform an image enhancement method. The image enhancement method includes performing down-sampling at a plurality of levels on a first image to obtain a plurality of encoding features with different sampling parameters; performing up-sampling at a plurality of levels on an encoding feature, obtained from down-sampling at a last level, to obtain a plurality of decoding features with different sampling parameters; and obtaining a second image based on a decoding feature obtained from up-sampling at a last level, where a resolution of the second image is higher than a resolution of the first image. Up-sampling at each non-first level includes fusing a decoding