CN-122023124-A - Image super-resolution reconstruction method, device, equipment and storage medium
Abstract
The invention discloses an image super-resolution reconstruction method, device, equipment and storage medium. The method comprises: obtaining a low-resolution image, inputting the low-resolution image into an encoder of a deep learning network, performing multi-scale image feature extraction and downsampling through the encoder, and outputting multi-scale downsampled features; inputting the multi-scale downsampled features into a decoder of the deep learning network as input features; at each scale node of the decoder, mapping the input features of the current scale node to obtain an original image of the current scale node; extracting features from the original image through a convolution layer and an activation function to obtain residual correction features; fusing the residual correction features with the input features of the current scale node to obtain the input features of the next scale node; and, once the resolution of the original image reaches a preset resolution, outputting the original image as the high-resolution image. The method thereby achieves progressive, explicit restoration of image content together with feature calibration.
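
The per-scale decoder loop summarized above can be illustrated with a toy sketch. It is not the patented network: `to_image`, `residual_correction`, `fuse`, `decode` and the weight `alpha` are hypothetical stand-ins, the learned layers are replaced by fixed arithmetic on nested lists, and the encoder's multi-scale features are reduced to a single starting feature map. The fusion follows the upsample-then-weighted-sum form described in claim 5.

```python
# Toy sketch of the decoder loop; every function is a hypothetical stand-in
# for a learned layer, operating on 2-D lists of floats.

def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2-D list."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out

def to_image(features):
    """Stand-in for the mapping head: clamp features into [0, 1]."""
    return [[min(1.0, max(0.0, v)) for v in row] for row in features]

def residual_correction(image):
    """Stand-in for conv + activation: 3-tap horizontal mean, then ReLU."""
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            left = row[max(x - 1, 0)]
            right = row[min(x + 1, n - 1)]
            smoothed = (left + row[x] + right) / 3.0  # fixed-weight "convolution"
            new_row.append(max(0.0, smoothed))        # ReLU activation
        out.append(new_row)
    return out

def fuse(correction, features, alpha=0.5):
    """Upsample both inputs, then take a weighted sum (alpha is assumed)."""
    up_c = upsample2x(correction)
    up_f = upsample2x(features)
    return [[alpha * c + (1.0 - alpha) * f for c, f in zip(rc, rf)]
            for rc, rf in zip(up_c, up_f)]

def decode(features, target_size, alpha=0.5):
    """Iterate scale nodes until the image reaches target_size (height, width)."""
    target_h, target_w = target_size
    images = []
    while True:
        image = to_image(features)                 # explicit image at this scale
        images.append(image)
        if len(image) >= target_h and len(image[0]) >= target_w:
            return image, images                   # preset resolution reached
        correction = residual_correction(image)    # features derived from the image
        features = fuse(correction, features, alpha)  # input to the next node

final, images = decode([[0.2, 0.8], [0.4, 0.6]], (8, 8))
```

Each pass through the loop produces an explicit image before any further upsampling, which is what lets the next scale node's input depend on a check of the current image rather than on features alone.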
Inventors
- LIN ZIXIN
- Zheng Sengui
Assignees
- 深圳软牛科技集团股份有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260127
Claims (10)
- 1. An image super-resolution reconstruction method, characterized by comprising the following steps: acquiring a low-resolution image, inputting the low-resolution image into an encoder of a deep learning network, performing multi-scale image feature extraction and downsampling operations through the encoder, and outputting multi-scale downsampled features; inputting the multi-scale downsampled features as input features to a decoder of the deep learning network; at each scale node of the decoder, mapping the input features of the current scale node to obtain an original image of the current scale node; performing feature extraction on the original image through a convolution layer and an activation function to obtain residual correction features; fusing the residual correction features with the input features of the current scale node to obtain the input features of the next scale node; and outputting the original image as a high-resolution image after the resolution of the original image reaches a preset resolution.
- 2. The image super-resolution reconstruction method according to claim 1, wherein performing multi-scale image feature extraction and downsampling operations through the encoder and outputting multi-scale downsampled features comprises: processing the low-resolution image through a first-stage feature extraction layer of the encoder to obtain a first scale feature; performing a first downsampling operation on the first scale feature to obtain a second scale feature; inputting the second scale feature into a second-stage feature extraction layer of the encoder to extract deep semantic information, so as to obtain the second scale feature enriched with deep semantic information; performing a second downsampling operation on the second scale feature to obtain a third scale feature; and collecting the first scale feature, the second scale feature and the third scale feature to obtain the multi-scale downsampled features.
- 3. The image super-resolution reconstruction method according to claim 1, wherein performing feature extraction on the original image through a convolution layer and an activation function to obtain residual correction features comprises: performing a first convolution operation on the original image to generate a first convolution feature; performing a first nonlinear activation on the first convolution feature to generate a preliminary feature; performing a second convolution operation on the preliminary feature to generate a second convolution feature; performing a second nonlinear activation on the second convolution feature to generate a deep semantic feature; performing an upsampling operation on the deep semantic feature to generate a high-resolution feature map; obtaining a reference image and extracting the difference between the original image and the reference image to obtain residual information; fusing the high-resolution feature map with the residual information to generate a residual guidance feature; and performing convolution and feature transformation operations on the residual guidance feature to obtain the residual correction features.
- 4. The image super-resolution reconstruction method according to claim 3, wherein obtaining a reference image and extracting the difference between the original image and the reference image to obtain residual information comprises: acquiring a high-resolution real image corresponding to the low-resolution image as the reference image; performing a spatial size transformation on the reference image to generate a size-aligned image; computing the difference between the original image and the size-aligned image by numerical subtraction to generate an initial difference image; and performing multi-layer convolution processing on the initial difference image to obtain the residual information.
- 5. The image super-resolution reconstruction method according to claim 1, wherein fusing the residual correction features with the input features of the current scale node to obtain the input features of the next scale node comprises: performing an upsampling operation on the residual correction features and on the input features of the current scale node to obtain upsampled correction features and upsampled input features, respectively; and performing a weighted summation of the upsampled correction features and the upsampled input features to obtain the input features of the next scale node.
- 6. The image super-resolution reconstruction method according to claim 1, wherein outputting the original image as a high-resolution image after the resolution of the original image reaches a preset resolution comprises: acquiring a high-resolution original training image; when each scale node of the decoder generates its corresponding original image, downsampling the high-resolution original training image to the same spatial resolution as that original image by nearest-neighbor interpolation to obtain a supervision image of the corresponding scale; calculating a first loss value between the original image and the supervision image of the corresponding scale; calculating a second loss value between the high-resolution image and the high-resolution original training image; performing a weighted summation of all the first loss values and the second loss value to obtain a total loss value; and updating and optimizing the parameters of the deep learning network using the total loss value.
- 7. The image super-resolution reconstruction method according to claim 6, wherein downsampling the high-resolution original training image to the same spatial resolution as the original image by nearest-neighbor interpolation to obtain a supervision image of the corresponding scale comprises: determining the target spatial resolution of the original image output by the current scale node in the decoder; calculating a width-direction scaling factor and a height-direction scaling factor from the original spatial resolution of the high-resolution original training image and the target spatial resolution; spatially transforming the high-resolution original training image with a nearest-neighbor interpolation algorithm, based on the width-direction and height-direction scaling factors, to generate a downsampled image; and taking the downsampled image as the supervision image corresponding to the current scale node.
- 8. An image super-resolution reconstruction apparatus, comprising: an acquisition unit for acquiring a low-resolution image, inputting the low-resolution image into an encoder of a deep learning network, performing multi-scale image feature extraction and downsampling operations through the encoder, and outputting multi-scale downsampled features; an input unit for inputting the multi-scale downsampled features as input features to a decoder of the deep learning network; a mapping unit for mapping, at each scale node of the decoder, the input features of the current scale node to obtain an original image of the current scale node; an extraction unit for performing feature extraction on the original image through a convolution layer and an activation function to obtain residual correction features; a fusion unit for fusing the residual correction features with the input features of the current scale node to obtain the input features of the next scale node; and an output unit for outputting the original image as a high-resolution image after the resolution of the original image reaches a preset resolution.
- 9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the image super-resolution reconstruction method according to any one of claims 1 to 7 when executing the computer program.
- 10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the image super-resolution reconstruction method according to any one of claims 1 to 7.
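
The multi-scale supervision in claims 6 and 7 can be sketched as follows: derive height and width scaling factors, build a nearest-neighbor supervision image for each scale node, and form a weighted sum of the per-scale losses and the final loss. The claims do not name a loss function or weight values, so the L1 loss and the `aux_weight` and `final_weight` parameters below are assumptions, and all function names are hypothetical.

```python
# Sketch of nearest-neighbour supervision images and the weighted total loss.
# Images are 2-D lists of floats; the L1 loss and both weights are assumptions.

def nearest_downsample(image, target_h, target_w):
    """Resize to (target_h, target_w) by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    scale_h = src_h / target_h   # height-direction scaling factor
    scale_w = src_w / target_w   # width-direction scaling factor
    out = []
    for y in range(target_h):
        src_y = min(int(y * scale_h), src_h - 1)
        row = [image[src_y][min(int(x * scale_w), src_w - 1)]
               for x in range(target_w)]
        out.append(row)
    return out

def l1_loss(a, b):
    """Mean absolute difference between two same-sized images."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count

def total_loss(scale_images, final_image, hr_image,
               aux_weight=0.5, final_weight=1.0):
    """Weighted sum of per-scale (first) losses and the final (second) loss."""
    first_losses = []
    for img in scale_images:
        # Supervision image at the resolution of this scale node's output.
        supervision = nearest_downsample(hr_image, len(img), len(img[0]))
        first_losses.append(l1_loss(img, supervision))
    second_loss = l1_loss(final_image, hr_image)
    return aux_weight * sum(first_losses) + final_weight * second_loss

# A 4x4 ramp image stands in for the high-resolution training image.
hr = [[float(4 * y + x) for x in range(4)] for y in range(4)]
small = nearest_downsample(hr, 2, 2)
perturbed = [[1.0, 2.0], [8.0, 10.0]]   # one pixel off by 1.0
loss = total_loss([perturbed], hr, hr)
```

Because every scale node contributes its own loss term, intermediate layers receive a training signal directly instead of only through gradients propagated back from the final output.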
Description
Image super-resolution reconstruction method, device, equipment and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular to an image super-resolution reconstruction method, device, equipment and storage medium.
Background
In image processing and computer vision, image super-resolution (SR) reconstruction aims to recover a high-resolution version of a low-resolution image, and has important application value in security monitoring, medical imaging, satellite remote sensing, multimedia entertainment and similar areas. Deep-learning-based image super-resolution has become the mainstream approach, in particular end-to-end mapping models built on convolutional neural networks (CNNs) or Transformer architectures, for example classical structures such as UNet, ResNet or RCAN. Such methods typically feed the low-resolution image directly into the network, pass it through a series of hierarchical feature extraction and nonlinear transformation steps, generate the high-resolution image in a single step at the network output, and compute a single reconstruction loss on that final output to guide model training. However, strategies that rely on supervision at the output alone exhibit several inherent drawbacks in practice:
First, features extracted by deep networks tend to be highly abstract, semantic-level representations that lack a direct association with physical properties of the image such as structure and texture. As features are passed through the network, critical spatial information and fine texture are easily attenuated or distorted with increasing depth, so that artifacts or loss of detail appear in the final reconstruction.
In addition, because the loss signal is generated only at the end of the network, the error gradient must be propagated back layer by layer through many layers, so vanishing or attenuated gradients readily occur. The parameters of the middle and shallow layers of the network are then difficult to update sufficiently and effectively, which slows model convergence and affects training stability.
Furthermore, structures represented by UNet introduce skip connections to fuse features of different scales, but the decoder is essentially an open-loop forward pass during upsampling reconstruction. If an erroneous estimate of texture or structure is produced at some intermediate level, the error is amplified and accumulated during subsequent upsampling and directly affects the final output quality. Existing architectures lack a feedback mechanism that can check, evaluate and correct image content at the intermediate generation levels, which limits further improvement of reconstruction accuracy and robustness.
Disclosure of Invention
The invention aims to provide an image super-resolution reconstruction method, device, equipment and storage medium, so as to solve problems in existing image super-resolution technology such as inaccurate detail recovery and insufficient improvement in sharpness.
In a first aspect, an embodiment of the present invention provides an image super-resolution reconstruction method, including: acquiring a low-resolution image, inputting the low-resolution image into an encoder of a deep learning network, performing multi-scale image feature extraction and downsampling operations through the encoder, and outputting multi-scale downsampled features; inputting the multi-scale downsampled features as input features to a decoder of the deep learning network; at each scale node of the decoder, mapping the input features of the current scale node to obtain an original image of the current scale node; performing feature extraction on the original image through a convolution layer and an activation function to obtain residual correction features; fusing the residual correction features with the input features of the current scale node to obtain the input features of the next scale node; and outputting the original image as a high-resolution image after the resolution of the original image reaches a preset resolution.
In a second aspect, an embodiment of the present invention provides an image super-resolution reconstruction apparatus, including: an acquisition unit for acquiring a low-resolution image, inputting the low-resolution image into an encoder of a deep learning network, performing multi-scale image feature extraction and downsampling operations through the encoder, and outputting multi-scale downsampled features; an input unit for inputting the multi-scale downsampled features as input features to a decoder of the deep le