CN-121837032-B - High resolution image restoration method, apparatus and storage medium
Abstract
The application discloses a high-resolution image restoration method, device and storage medium in the technical field of computer vision. The method comprises: extracting structural features of a damaged image to obtain a corresponding edge map and line block diagram; determining a multi-channel input tensor based on the damaged image, the edge map, the line block diagram and a binary mask map; inputting the multi-channel input tensor into a high-resolution structure restoration network to generate a high-resolution edge map and a high-resolution line block diagram; performing fusion processing on the high-resolution edge map, the high-resolution line block diagram, the mask image and the binary mask map to obtain a structural vision fusion feature map; inputting the structural vision fusion feature map into a structure-enhanced texture restoration network to generate a high-resolution enhanced feature map; and generating a high-resolution restored image according to the high-resolution enhanced feature map. The method improves the structural reconstruction capability of high-resolution restoration, enhances texture restoration quality, and generates restoration results with reasonable structure and consistent texture semantics.
Inventors
- LI FULIN
- LIU YANING
Assignees
- 深圳市石犀科技有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260316
Claims (10)
- 1. A high-resolution image restoration method, characterized in that the high-resolution image restoration method comprises: obtaining a low-resolution damaged image and a binary mask map corresponding to the damaged image, and extracting structural features of the damaged image to obtain a corresponding edge map and line block diagram; masking the damaged image, the edge map and the line block diagram respectively through the binary mask map to obtain a corresponding mask image, mask edge map and mask line block diagram; splicing the mask image, the binary mask map, the mask edge map and the mask line block diagram in the channel dimension to obtain a multi-channel input tensor; inputting the multi-channel input tensor into a high-resolution structure restoration network, and performing structure restoration on the edge map and the line block diagram through the high-resolution structure restoration network to generate a high-resolution edge map and a high-resolution line block diagram; performing fusion processing on the high-resolution edge map, the high-resolution line block diagram, the mask image and the binary mask map to obtain a corresponding structural vision fusion feature map; inputting the structural vision fusion feature map into a structure-enhanced texture restoration network, and performing feature enhancement processing on the structural vision fusion feature map through the structure-enhanced texture restoration network to generate a high-resolution enhanced feature map; and generating a high-resolution restored image corresponding to the damaged image according to the high-resolution enhanced feature map.
- 2. The high-resolution image restoration method according to claim 1, wherein the step of inputting the multi-channel input tensor into the high-resolution structure restoration network, performing structure restoration on the edge map and the line block diagram through the high-resolution structure restoration network, and generating the high-resolution edge map and the high-resolution line block diagram comprises: performing convolution downsampling on the multi-channel input tensor to obtain a corresponding first downsampled feature map; inputting the first downsampled feature map into an efficient Transformer block in the high-resolution structure restoration network, and performing high-order spatial interaction and feature enhancement on the first downsampled feature map through the efficient Transformer block to obtain a corresponding structure-enhanced feature map, wherein the efficient Transformer block is constructed based on recursive gated convolution and a multi-head interactive attention mechanism; and performing transposed-convolution upsampling on the structure-enhanced feature map to generate a corresponding high-resolution structure sketch space, wherein the high-resolution structure sketch space comprises the high-resolution edge map and the high-resolution line block diagram.
- 3. The high-resolution image restoration method according to claim 2, wherein the step of inputting the first downsampled feature map into the efficient Transformer block in the high-resolution structure restoration network, and performing high-order spatial interaction and feature enhancement on the first downsampled feature map through the efficient Transformer block to obtain the corresponding structure-enhanced feature map, comprises: normalizing the first downsampled feature map to obtain a corresponding first normalized feature map; performing a recursive gated convolution operation on the first normalized feature map to obtain a corresponding spatial mixing feature map; performing feature transformation on the spatial mixing feature map through a first feedforward neural network to obtain a corresponding initial structure-enhanced feature map, and adding the initial structure-enhanced feature map to the first downsampled feature map to obtain a corresponding first structure fusion feature map; inputting the first structure fusion feature map into a mixed attention module in the efficient Transformer block, and normalizing the first structure fusion feature map through a normalization layer in the mixed attention module to obtain a corresponding normalized structure fusion feature map; performing structural feature enhancement on the normalized structure fusion feature map through a multi-head interactive attention branch in the mixed attention module to obtain a corresponding multi-head interactive attention result, and performing global semantic extraction on the normalized structure fusion feature map through a complete attention branch in the mixed attention module to obtain a corresponding complete attention result; adding the complete attention result, the multi-head interactive attention result and the first structure fusion feature map to obtain a corresponding second structure fusion feature map; and performing feature transformation on the second structure fusion feature map through a second feedforward neural network to obtain a corresponding intermediate structure-enhanced feature map, and adding the intermediate structure-enhanced feature map to the second structure fusion feature map to obtain the structure-enhanced feature map.
- 4. The high-resolution image restoration method according to claim 3, wherein the step of performing structural feature enhancement on the normalized structure fusion feature map through the multi-head interactive attention branch in the mixed attention module to obtain the corresponding multi-head interactive attention result comprises: dividing the normalized structure fusion feature map along the channel dimension to obtain sub-features corresponding to each attention head, and performing a linear transformation on each sub-feature to generate a corresponding query tensor, key tensor and value tensor; dividing the query tensor, the key tensor and the value tensor into known-region features and mask-region features according to the binary mask map; respectively calculating a first global feature vector corresponding to the known-region features and a second global feature vector corresponding to the mask-region features, and multiplying the first global feature vector and the second global feature vector element by element to obtain known-region enhanced features and mask-region enhanced features; performing dynamic weight prediction through an SENet network based on the known-region enhanced features and the mask-region enhanced features to obtain corresponding weight coefficients; performing weighted fusion of the known-region enhanced features and the mask-region enhanced features based on the weight coefficients to obtain a corresponding weighted fusion result; combining the known-region features and the weighted fusion result to obtain the output features of each attention head; and splicing the output features of the attention heads along the channel dimension to obtain the multi-head interactive attention result.
- 5. The high-resolution image restoration method according to claim 1, wherein the step of inputting the structural vision fusion feature map into the structure-enhanced texture restoration network, performing feature enhancement processing on the structural vision fusion feature map through the structure-enhanced texture restoration network, and generating the high-resolution enhanced feature map comprises: performing convolution downsampling on the structural vision fusion feature map to obtain a corresponding second downsampled feature map; inputting the second downsampled feature map into a global-local Transformer block of the structure-enhanced texture restoration network, and performing global-local feature enhancement on the second downsampled feature map through the global-local Transformer block to obtain a corresponding texture-enhanced feature map; and performing convolution upsampling on the texture-enhanced feature map to generate the high-resolution enhanced feature map.
- 6. The high-resolution image restoration method according to claim 5, wherein the step of inputting the second downsampled feature map into the global-local Transformer block of the structure-enhanced texture restoration network, and performing global-local feature enhancement on the second downsampled feature map through the global-local Transformer block to obtain the corresponding texture-enhanced feature map, comprises: normalizing the second downsampled feature map to obtain a corresponding second normalized feature map; performing a recursive gated convolution operation on the second normalized feature map to obtain a corresponding initial texture-enhanced feature map, and adding the initial texture-enhanced feature map to the second downsampled feature map to obtain a corresponding first texture fusion feature map; normalizing the first texture fusion feature map to obtain a corresponding normalized texture fusion feature map; inputting the normalized texture fusion feature map into a global-local fusion module of the global-local Transformer block, extracting global features through the global branch of the global-local fusion module and screening them through a gating unit, extracting local detail features through the local branch of the global-local fusion module, and combining the screened global features with the local features to obtain an intermediate texture-enhanced feature map; adding the intermediate texture-enhanced feature map to the first texture fusion feature map to obtain a corresponding second texture fusion feature map; and performing feature transformation on the second texture fusion feature map through a third feedforward neural network, and adding the corresponding feature transformation result to the second texture fusion feature map to obtain the texture-enhanced feature map.
- 7. The high-resolution image restoration method according to claim 1, further comprising, before the step of extracting structural features of the damaged image to obtain the corresponding edge map and line block diagram: obtaining a training set, wherein the training set comprises a plurality of training samples and labels of the training samples, the labels of each training sample comprise a corresponding high-resolution target image, high-resolution target edge map and high-resolution target line block diagram, and each training sample comprises a low-resolution training image and a training binary mask map corresponding to the training image; inputting the training image into a structure extraction network in a high-resolution image restoration model to be trained, and extracting structural features of the training image through the structure extraction network to obtain a corresponding training edge map and training line block diagram; masking the training image, the training edge map and the training line block diagram through the training binary mask map to obtain a corresponding training mask image, training mask edge map and training mask line block diagram; performing channel cascading on the training mask image, the training binary mask map, the training mask edge map and the training mask line block diagram to form a training multi-channel input tensor; inputting the training multi-channel input tensor into the high-resolution structure restoration network in the high-resolution image restoration model, and performing structure restoration on the training edge map and the training line block diagram through the high-resolution structure restoration network to generate a high-resolution edge prediction map and a high-resolution line block prediction map; calculating a first loss value of a cross-entropy loss function based on the high-resolution edge prediction map and the high-resolution target edge map, and calculating a second loss value of a cross-entropy loss function based on the high-resolution line block prediction map and the high-resolution target line block diagram; calculating a first-stage loss value of the high-resolution image restoration model according to the first loss value and the second loss value, and updating parameters of the high-resolution structure restoration network based on the first-stage loss value; inputting the training image into the high-resolution image restoration model, and performing image restoration on the low-resolution training image through the high-resolution image restoration model to be trained, so as to obtain a high-resolution predicted image corresponding to the training image; determining a second-stage loss value of the high-resolution image restoration model based on the high-resolution predicted image and the high-resolution target image, and updating parameters of the structure-enhanced texture restoration network based on the second-stage loss value; and when the second-stage loss value is smaller than or equal to a preset value, obtaining the trained high-resolution image restoration model.
- 8. The high-resolution image restoration method according to claim 7, wherein the step of determining the second-stage loss value of the high-resolution image restoration model based on the high-resolution predicted image and the high-resolution target image comprises: calculating the L1-norm loss between the high-resolution predicted image and the high-resolution target image to obtain a pixel reconstruction loss value; respectively inputting the high-resolution predicted image and the high-resolution target image into a discriminator, outputting corresponding real/fake discrimination results through the discriminator, calculating the discriminator loss and the generator loss based on the discrimination results, and adding a gradient penalty term to obtain an adversarial loss value; obtaining the output feature maps of the discriminator at a plurality of intermediate activation layers for the high-resolution predicted image and the high-resolution target image, and calculating the L1-norm losses between the output feature maps of corresponding layers as a feature matching loss value; respectively inputting the high-resolution predicted image and the high-resolution target image into a pre-trained residual network with a dilated convolution structure to obtain a corresponding first high-level semantic feature map and second high-level semantic feature map; performing feature similarity calculation between the first high-level semantic feature map and the second high-level semantic feature map as a high-receptive-field perceptual loss value; and performing a weighted summation of the pixel reconstruction loss value, the adversarial loss value, the feature matching loss value and the high-receptive-field perceptual loss value to obtain the second-stage loss value.
- 9. A high-resolution image restoration device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being configured to implement the steps of the high-resolution image restoration method according to any one of claims 1 to 8.
- 10. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the high-resolution image restoration method according to any one of claims 1 to 8.
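The second-stage training loss of claim 8 is a weighted sum of four terms. A minimal numerical sketch follows; the weight values, the L2 form of the high-receptive-field similarity, and treating the adversarial term as a precomputed scalar are all assumptions for illustration, since the claims do not fix them:

```python
import numpy as np

def second_stage_loss(pred, target, disc_feats_pred, disc_feats_target,
                      hrf_pred, hrf_target, adv_loss,
                      w=(1.0, 0.1, 10.0, 5.0)):
    """Weighted combination of the four second-stage loss terms of claim 8.

    pred, target       : predicted / target high-resolution images (arrays)
    disc_feats_*       : lists of discriminator intermediate-layer feature maps
    hrf_*              : high-level semantic feature maps from the pre-trained
                         dilated residual network
    adv_loss           : adversarial (generator) loss, assumed computed
                         separately with its gradient penalty
    w                  : hypothetical weights (pixel, adversarial, feature
                         matching, high-receptive-field perceptual)
    """
    l_pix = np.mean(np.abs(pred - target))            # L1 pixel reconstruction
    l_fm = np.mean([np.mean(np.abs(a - b))            # per-layer L1 feature matching
                    for a, b in zip(disc_feats_pred, disc_feats_target)])
    l_hrf = np.mean((hrf_pred - hrf_target) ** 2)     # similarity metric assumed L2
    w_pix, w_adv, w_fm, w_hrf = w
    return w_pix * l_pix + w_adv * adv_loss + w_fm * l_fm + w_hrf * l_hrf
```

When prediction and target coincide and the adversarial term is zero, the total loss is zero, which is a quick sanity check on the weighting.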
Description
High resolution image restoration method, apparatus and storage medium
Technical Field
The present application relates to the field of computer vision, and in particular to a high-resolution image restoration method, device and storage medium.
Background
Image restoration is the reconstruction of image structure and content, with some level of confidence, from the information that remains. In recent years, image restoration has become one of the important research topics in image processing, with wide applications such as damaged-image restoration, object removal, image retouching and text removal. To produce realistic results, the restored image should remain consistent with the neighborhood of the missing region and with the image texture. Despite great progress in image restoration in recent years, advances in photographic equipment and display technology mean that user demand for high-quality, high-definition images keeps increasing, and high-resolution image restoration remains challenging. High-resolution images typically contain rich detail and fine geometric structure (e.g., continuous edges, continuous lines and complex textures), whereas conventional convolutional neural networks have limited local receptive fields, which easily leads to structural breaks, misalignments or discontinuities between the generated content and the surrounding known regions when dealing with extensive missing areas or complex structural repairs.
Although the Transformer architecture has been introduced in subsequent research to capture long-range dependencies, its self-attention mechanism is still biased toward modeling low-order spatial relations in deep feature interaction and cannot fully capture high-order nonlinear associations among pixels, so macroscopic structures such as edge contours and object shapes are often distorted or broken after restoration, seriously harming the visual realism and spatial continuity of the image. The foregoing is provided merely to facilitate understanding of the technical solutions of the present application and is not an admission that it constitutes prior art.
Disclosure of Invention
The application mainly aims to provide a high-resolution image restoration method, device and storage medium, so as to solve the technical problem in the prior art that it is difficult to maintain the overall structural integrity of an image during high-resolution image restoration.
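The high-order spatial interaction targeted here is what recursive gated convolution (named in claim 2 as the basis of the efficient Transformer block) provides: repeated elementwise gating between spatially mixed features and the input raises the interaction order with each recursion. The sketch below only illustrates that recursive gating structure; a fixed 3x3 box filter stands in for the depthwise convolutions and learned channel projections of a real implementation, which the patent does not detail:

```python
import numpy as np

def recursive_gated_mixing(x, order=3):
    """Simplified sketch of order-`order` recursive gated spatial interaction.

    x : (C, H, W) feature map. Each recursion multiplies a spatially mixed
    version of the running feature by the input, so the output contains
    products of up to `order` feature values (high-order interaction).
    """
    def box_filter(f):
        # crude stand-in for a depthwise 3x3 convolution (edge padding)
        p = np.pad(f, ((0, 0), (1, 1), (1, 1)), mode="edge")
        return sum(p[:, i:i + f.shape[1], j:j + f.shape[2]]
                   for i in range(3) for j in range(3)) / 9.0

    out = x
    for _ in range(order - 1):
        out = box_filter(out) * x   # gate: spatial mixing modulated by input
    return out
```

A constant input passes through unchanged (the box filter of a constant field is the same constant), which distinguishes the gating from a plain convolution stack.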
In order to achieve the above object, the present application provides a high-resolution image restoration method, comprising: obtaining a low-resolution damaged image and a binary mask map corresponding to the damaged image, and extracting structural features of the damaged image to obtain a corresponding edge map and line block diagram; masking the damaged image, the edge map and the line block diagram respectively through the binary mask map, and performing channel cascading on the corresponding masking results and the binary mask map to form a multi-channel input tensor; inputting the multi-channel input tensor into a high-resolution structure restoration network, and performing structure restoration on the edge map and the line block diagram through the high-resolution structure restoration network to generate a high-resolution edge map and a high-resolution line block diagram; performing fusion processing on the high-resolution edge map, the high-resolution line block diagram, the mask image and the binary mask map to obtain a corresponding structural vision fusion feature map; inputting the structural vision fusion feature map into a structure-enhanced texture restoration network, and performing feature enhancement processing on the structural vision fusion feature map through the structure-enhanced texture restoration network to generate a high-resolution enhanced feature map; and generating a high-resolution restored image corresponding to the damaged image according to the high-resolution enhanced feature map.
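The masking and channel-cascading steps above can be sketched as follows; the channel ordering and the 1-for-known mask convention are assumptions for illustration, not fixed by the patent:

```python
import numpy as np

def build_input_tensor(damaged, edge, lines, mask):
    """Assemble the multi-channel input tensor from the masking step.

    damaged : (3, H, W) low-resolution damaged image
    edge    : (1, H, W) edge map extracted from the damaged image
    lines   : (1, H, W) line block diagram
    mask    : (1, H, W) binary mask, assumed 1 = known pixel, 0 = missing
    """
    masked_img = damaged * mask      # mask image
    masked_edge = edge * mask        # mask edge map
    masked_lines = lines * mask      # mask line block diagram
    # channel cascading: 3 + 1 + 1 + 1 = 6 channels
    return np.concatenate([masked_img, mask, masked_edge, masked_lines], axis=0)
```

The structure restoration network thus receives both the visible content and an explicit indication of which pixels are missing, so it can confine structure completion to the masked region.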
In an embodiment, the step of inputting the multi-channel input tensor into the high-resolution structure restoration network, and performing structure restoration on the edge map and the line block diagram through the high-resolution structure restoration network, to generate the high-resolution edge map and the high-resolution line block diagram includes: performing convolution downsampling on the multi-channel input tensor to obtain a corresponding first downsampled feature map; inputting the first downsampled feature map into an efficient Transformer block in the high-resolution structure restoration network, and performing high-order spatial interaction and feature enhancement on the first downsampled feature map through the efficient Transformer block to obtain a corresponding structure-enhanced feature map, wherein the efficient