US-12620068-B2 - Device and method for texture-aware self-supervised blind denoising using self-residual learning

Abstract

The present invention relates to a blind denoising device including: a receiving unit for receiving an original noisy image; a Pixel-shuffle Downsampling (PD) unit for performing PD for the original noisy image to produce at least one or more downsampled images; a predicted image producing unit for eliminating the noise from the original noisy image and the downsampled images to produce at least one or more predicted images restored to the shape of the original noisy image; and a learning unit for performing optimized learning for the predicted image producing unit, based on at least one or more self-supervised losses of the predicted images and the original noisy image.

Inventors

WonKi Jeong
Kanggeun LEE

Assignees

KOREA UNIVERSITY RESEARCH AND BUSINESS FOUNDATION

Dates

Publication Date: 2026-05-05
Application Date: 2023-09-25
Priority Date: 2022-12-14

Claims (14)

1. A blind denoising device comprising: a receiver for receiving an original noisy image; a Pixel-shuffle Downsampler (PD) for performing PD for the original noisy image to produce at least one or more downsampled images; a predicted image producer for eliminating the noise from the original noisy image and the downsampled images to produce at least one or more predicted images restored to the shape of the original noisy image; and a learner for performing optimized learning for the predicted image producer, based on at least one or more self-supervised losses of the predicted images and the original noisy image, wherein the predicted image producer comprises: a first network for eliminating the noise from the original noisy image and the downsampled images; a second network for eliminating the noise from the downsampled images according to the at least one or more stride factors; and a restorer for restoring the shape of the original noisy image from the images from which the noise is eliminated through the first network and the second network to thus produce the predicted images.
2. The blind denoising device according to claim 1, wherein the PD produces the at least one or more downsampled images from the original noisy image according to at least one or more stride factors to thus augment data of the original noisy image.
3. The blind denoising device according to claim 2, wherein the PD performs order-variant PD using transformation matrices selected randomly, and the transformation matrices are shuffled in the sampling order.
4. The blind denoising device according to claim 1, wherein the learner performs learning for the first network and the second network, based on a first self-supervised loss defining a loss of the second network, a second self-supervised loss for supporting residual noise learning of the first network, a third self-supervised loss for enhancing the similarity between the predicted images produced through the first network and the second network, and a fourth self-supervised loss for limiting the distribution of the predicted images produced through the first network.
5. The blind denoising device according to claim 4, wherein the first self-supervised loss defines the loss of the second network from a difference between the predicted image produced through the second network and the original noisy image.
6. The blind denoising device according to claim 4, wherein the second self-supervised loss is defined as a difference between a pseudo-noise map and the predicted image produced through the first network, the pseudo-noise map is produced from a difference between the original noisy image and the predicted image produced through the second network, and the predicted image produced through the first network is the predicted image from the original noisy image.
7. The blind denoising device according to claim 4, wherein the third self-supervised loss is defined, based on a difference image between the original noisy image and the predicted image produced through the first network and a difference value between the predicted images produced through the second network, to enhance the similarity of low-frequency characteristics between the predicted image produced through the first network and the predicted image produced through the second network, and the predicted images produced through the first network and the second network are the predicted images from the downsampled images according to the same stride factor.
8. The blind denoising device according to claim 4, wherein the fourth self-supervised loss is noise prior loss that penalizes noise having higher size than a threshold value to allow the noise distribution of the predicted image produced through the first network to be close to the noise distribution of the original noisy image, and the predicted image produced through the first network is the predicted image from the original noisy image.
9. A blind denoising method comprising the steps of: receiving an original noisy image; performing PD for the original noisy image to produce at least one or more downsampled images; eliminating the noise from the original noisy image and the downsampled images to produce at least one or more predicted images restored to the shape of the original noisy image; and performing optimized learning for a predicted image producer, based on at least one or more self-supervised losses of the predicted images and the original noisy image, wherein the step of producing at least one or more predicted images comprises the steps of: eliminating the noise from the original noisy image and the downsampled images through a first network; eliminating the noise from the downsampled images according to different stride factors through a second network; and restoring the shape of the original noisy image from the images from which the noise is eliminated through the first network and the second network to thus produce the predicted images through a restorer.
10. The blind denoising method according to claim 9, wherein the step of performing optimized learning for a predicted image producer comprises the step of performing learning for the first network and the second network, based on a first self-supervised loss defining a loss of the second network, a second self-supervised loss for supporting residual noise learning of the first network, a third self-supervised loss for enhancing the similarity between the predicted images produced through the first network and the second network, and a fourth self-supervised loss for limiting the distribution of the predicted images produced through the first network.
11. The blind denoising method according to claim 10, wherein the first self-supervised loss defines the loss of the second network from a difference between the predicted image produced through the second network and the original noisy image.
12. The blind denoising method according to claim 10, wherein the second self-supervised loss is defined as a difference between a pseudo-noise map and the predicted image produced through the first network, the pseudo-noise map is produced from a difference between the original noisy image and the predicted image produced through the second network, and the predicted image produced through the first network is the predicted image from the original noisy image.
13. The blind denoising method according to claim 10, wherein the third self-supervised loss is defined, based on a difference image between the original noisy image and the predicted image produced through the first network and a difference value between the predicted images produced through the second network, to enhance the similarity of low-frequency characteristics between the predicted image produced through the first network and the predicted images produced through the second network, and the predicted images produced through the first network and the second network are the predicted images from the downsampled images according to the same stride factor.
14. The blind denoising method according to claim 10, wherein the fourth self-supervised loss is noise prior loss that penalizes noise having higher size than a threshold value to allow the noise distribution of the predicted image produced through the first network to be close to the noise distribution of the original noisy image, and the predicted image produced through the first network is the predicted image from the original noisy image.
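
As an informal illustration of the first, second, and fourth self-supervised losses recited above, the following Python sketch operates on flattened pixel lists. The claims do not fix a norm or a penalty shape, so the mean absolute difference, the reading of the first network's output as a noise estimate, the hinge-style threshold penalty, and every function name here are assumptions made for illustration only.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally long pixel lists (assumed norm)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def first_loss(pred_second, noisy):
    """First loss: difference between the second network's prediction and the noisy image."""
    return mean_abs_diff(pred_second, noisy)


def pseudo_noise_map(noisy, pred_second):
    """Pseudo-noise map: noisy image minus the second network's prediction."""
    return [x - y for x, y in zip(noisy, pred_second)]


def second_loss(noise_first, noisy, pred_second):
    """Second loss: difference between the pseudo-noise map and the first network's output.

    Assumption: the first network's output is treated as a noise estimate.
    """
    return mean_abs_diff(noise_first, pseudo_noise_map(noisy, pred_second))


def noise_prior_loss(noise_first, threshold):
    """Fourth loss (assumed hinge form): penalize only noise magnitudes above the threshold."""
    return sum(max(abs(n) - threshold, 0.0) for n in noise_first) / len(noise_first)


noisy = [1.0, 2.0, 3.0, 4.0]
pred_second = [1.0, 1.5, 3.5, 4.0]
noise_first = pseudo_noise_map(noisy, pred_second)  # a perfect residual estimate
print(first_loss(pred_second, noisy))
print(second_loss(noise_first, noisy, pred_second))
print(noise_prior_loss([0.0, 0.5, -2.0, 1.0], threshold=1.0))
```

The third self-supervised loss is omitted from this sketch because its exact form depends on details not stated in the claims.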

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to Korean Patent Application Nos. 10-2022-0174325, filed on Dec. 14, 2022, and 10-2023-0033553, filed on Mar. 14, 2023, and all the benefits accruing therefrom under 35 U.S.C. § 119, the contents of which in their entireties are herein incorporated by reference.

TECHNICAL FIELD

The present invention relates to a device and method for texture-aware self-supervised blind denoising using self-residual learning, more specifically to a blind denoising device and method that make use of loss function-based denoising model learning and inference for eliminating noise generated in a process of acquiring an image.

BACKGROUND ART

Conventional self-supervised blind denoising shows poor quality on real-world images due to spatially correlated noise corruption. Recently, Pixel-shuffle Downsampling (PD) has been proposed to eliminate the spatial correlation of the noise. A study combining asymmetric PD (AP) and a blind-spot network (BSN) successfully demonstrated that self-supervised blind denoising is applicable to real-world noisy images. However, PD-based inference of the BSN may degrade texture details in the testing phase (denoising step) because high-frequency details (e.g., edges) are destroyed in the downsampled images. To solve such a problem, a model capable of eliminating noise without the PD process is needed so that texture details are kept. Further, a new inference system is required to boost overall performance, while avoiding the use of an order-variant PD constraint, noise prior knowledge-based loss function, and the PD.

PRIOR ART LITERATURE

(Patent Literature 0001) Korean Patent Application Laid-open No. 10-2022-0167824 (Dec. 22, 2022)

DISCLOSURE

Technical Problem

Accordingly, the present invention has been made in view of the above-mentioned problems occurring in the related art, and it is an object of the present invention to provide a blind denoising device and method that are capable of learning a denoising model using only a noisy image and of eliminating noise in a testing step without a PD process, so that texture details of an original image are kept without degradation.

Technical Solution

To accomplish the above-mentioned objects, according to one aspect of the present invention, a blind denoising device may include: a receiving unit for receiving an original noisy image; a Pixel-shuffle Downsampling (PD) unit for performing PD for the original noisy image to produce at least one or more downsampled images; a predicted image producing unit for eliminating the noise from the original noisy image and the downsampled images to produce at least one or more predicted images restored to the shape of the original noisy image; and a learning unit for performing optimized learning for the predicted image producing unit, based on at least one or more self-supervised losses of the predicted images and the original noisy image.

Further, the PD unit may produce the at least one or more downsampled images from the original noisy image according to at least one or more stride factors to thus augment data of the original noisy image. Furthermore, the PD unit may perform order-variant PD using transformation matrices selected randomly, and the transformation matrices may be shuffled in the sampling order.
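
The PD operation with a stride factor can be sketched in Python as follows, using plain nested lists so the example is self-contained. The sketch assumes a single-channel image whose height and width are divisible by the stride factor. The function names `pd_down`, `pd_up`, and `random_order` are hypothetical, and modeling order-variant PD as a random permutation of the in-block sampling offsets is one possible reading of the randomly selected, shuffled transformation matrices, not a confirmed implementation.

```python
import random


def _offsets(s, order=None):
    """In-block sampling offsets (a, b), optionally permuted by `order`."""
    offsets = [(a, b) for a in range(s) for b in range(s)]
    if order is not None:
        offsets = [offsets[k] for k in order]
    return offsets


def pd_down(img, s, order=None):
    """Split an H x W image into s*s sub-images of size (H/s) x (W/s)."""
    h, w = len(img), len(img[0])
    if h % s or w % s:
        raise ValueError("height and width must be divisible by the stride factor")
    return [
        [[img[i + a][j + b] for j in range(0, w, s)] for i in range(0, h, s)]
        for a, b in _offsets(s, order)
    ]


def pd_up(subs, s, order=None):
    """Inverse of pd_down: restore the shape of the original image."""
    hs, ws = len(subs[0]), len(subs[0][0])
    out = [[None] * (ws * s) for _ in range(hs * s)]
    for sub, (a, b) in zip(subs, _offsets(s, order)):
        for i in range(hs):
            for j in range(ws):
                out[i * s + a][j * s + b] = sub[i][j]
    return out


def random_order(s, seed=None):
    """Random sampling order for order-variant PD (assumed interpretation)."""
    order = list(range(s * s))
    random.Random(seed).shuffle(order)
    return order


img = [[r * 4 + c for c in range(4)] for r in range(4)]
subs = pd_down(img, 2)               # four 2 x 2 sub-images
order = random_order(2, seed=0)      # shuffled sampling order
shuffled = pd_down(img, 2, order)
print(pd_up(subs, 2) == img, pd_up(shuffled, 2, order) == img)
```

Because each sub-image takes only every `s`-th pixel in each direction, neighboring pixels of the original image land in different sub-images, which is how PD weakens the spatial correlation of the noise.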
Moreover, the predicted image producing unit may include a first network for eliminating the noise from the original noisy image and the downsampled images, a second network for eliminating the noise from the downsampled images according to the stride factors, and a restoring part for restoring the shape of the original noisy image from the images from which the noise is eliminated through the first network and the second network to thus produce the predicted images.

Further, the learning unit may perform learning for the first network and the second network, based on a first self-supervised loss defining a loss of the second network, a second self-supervised loss for supporting residual noise learning of the first network, a third self-supervised loss for enhancing the similarity between the predicted images produced through the first network and the second network, and a fourth self-supervised loss for limiting the distribution of the predicted images produced through the first network. Further, the first self-supervised loss may define the loss of the second network from a difference between the predicted image produced through the second network and the original noisy image. Furthermore, the second self-supervised loss may be defined as a difference between a pseudo-noise map and the predicted image produced through the first network, the pseudo-noise map may be produced from a difference between the original noisy image and the predicted image produced through the second network, and the predicted image produced through the first network may be the predicted image from the original noisy image. Moreover, the third self-supervised los