CN-122023196-A - Large-area defect image restoration method based on multi-scale feature fusion
Abstract
The invention relates to the technical field of computer vision and deep learning, and in particular to a large-area defect image restoration method based on multi-scale feature fusion, comprising the following steps: obtaining a damaged image and its corresponding binary mask; constructing an encoder that extracts multi-level spatial features of the damaged image to obtain a latent semantic vector; decoupling a random noise vector into a semantic embedding vector through a mapping network; introducing into the generator a multi-scale feature fusion module based on weight distribution, which aggregates context information from different receptive fields, reconstructs the feature map and generates a repaired image; introducing a contrastive learning mechanism into the discriminator, which enhances feature discrimination capability and stabilizes training by constructing positive and negative sample pairs; and jointly optimizing the network with multiple loss functions to produce the restoration result. By introducing the multi-scale feature fusion module and the contrastive learning mechanism, the method achieves global semantic consistency and local texture detail in large-area defect image restoration, with good stability and generalization capability.
Inventors
- WANG XILONG
- WU WEILI
- HUO TIANLONG
Assignees
- Xi'an University of Posts and Telecommunications (西安邮电大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-02
Claims (8)
- 1. A large-area defect image restoration method based on multi-scale feature fusion, characterized by comprising the following steps: obtaining a damaged image and its corresponding binary mask; constructing an encoder, extracting multi-level spatial features of the damaged image, and obtaining a latent semantic vector; decoupling a random noise vector into a semantic embedding vector through a mapping network; introducing into the generator a multi-scale feature fusion module based on weight distribution, aggregating context information from different receptive fields, reconstructing the feature map and generating a repaired image; introducing a contrastive learning mechanism into the discriminator, which, by constructing positive and negative sample pairs, pulls the repaired image toward the real image and pushes it away from the damaged image in feature space, so as to enhance feature discrimination capability and stabilize training; and jointly optimizing the network with the adversarial, reconstruction, style and contrast losses to generate the restoration result.
- 2. The large-area defect image restoration method based on multi-scale feature fusion of claim 1, wherein constructing the encoder, extracting multi-level spatial features of the damaged image and obtaining the latent semantic vector comprises the following specific steps: the encoder is formed by sequentially connecting a plurality of convolution blocks, each convolution block comprising two 3×3 convolution layers, each followed by a LeakyReLU activation function; splicing the damaged image and the mask into a four-channel input tensor fed into the encoder network; the output of the two convolution layers then enters the next downsampling stage; gradually reducing the spatial resolution of the feature map and increasing the number of channels through a plurality of downsampling stages; during the forward propagation of the encoder, saving the feature map output by each layer to form a multi-scale skip-connection feature set; and mapping the high-dimensional feature map output by the encoder into the latent semantic vector through a fully connected layer.
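The encoder of claim 2 can be sketched as follows. This is a minimal, illustrative NumPy stand-in, not the patent's implementation: a random 1×1 channel projection plus LeakyReLU replaces each pair of 3×3 convolutions, 2×2 average pooling replaces the strided downsampling, and global pooling replaces the final fully connected layer; channel counts, stage count and the skip-feature bookkeeping follow the claim.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_block_stub(x, out_ch):
    """Stand-in for the two 3x3 convolutions of one encoder block:
    a random 1x1 channel projection followed by LeakyReLU, so that
    channel counts evolve as in the real encoder."""
    w = rng.standard_normal((out_ch, x.shape[0])) * 0.1
    y = np.einsum('oc,chw->ohw', w, x)
    return np.where(y > 0, y, 0.2 * y)          # LeakyReLU(0.2)

def downsample(x):
    """Stride-2 downsampling stand-in via 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def encode(img, mask, base_ch=64, stages=4):
    """Concatenate image (3 ch) and mask (1 ch) into a four-channel
    tensor, run several downsampling stages, and save every stage's
    output as a skip feature; global pooling stands in for the FC layer."""
    x = np.concatenate([img, mask], axis=0)      # four-channel input
    skips, ch = [], base_ch
    for _ in range(stages):
        x = conv_block_stub(x, ch)
        skips.append(x)                          # multi-scale skip set
        x = downsample(x)
        ch *= 2
    latent = x.mean(axis=(1, 2))                 # latent semantic vector
    return latent, skips
```

For a 64×64 input with 4 stages, the skip set has shapes (64, 64, 64), (128, 32, 32), (256, 16, 16) and (512, 8, 8), matching the "halve resolution, double channels" progression in the claim.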
- 3. The method for repairing a large-area defect image based on multi-scale feature fusion according to claim 1, wherein the mapping network is a multi-layer perceptron comprising N fully connected layers, which maps the random noise vector z into a semantic embedding vector w; the mapping process is expressed as w = M(z; θ), where θ denotes the mapping network parameters; w is injected into each up-sampling stage of the generator through adaptive instance normalization layers to modulate the style and semantic properties of the feature map.
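A minimal sketch of the mapping network and the adaptive instance normalization (AdaIN) injection described in claim 3, under stated assumptions: the layer width, the LeakyReLU nonlinearity, and the way scale/shift parameters are sliced from w are illustrative choices, not details from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def mapping_network(z, n_layers=4, dim=512):
    """MLP of N fully connected layers with LeakyReLU: decouples the
    random noise vector z into a semantic embedding w = M(z; theta)."""
    w = z
    for _ in range(n_layers):
        W = rng.standard_normal((dim, w.shape[0])) / np.sqrt(w.shape[0])
        w = W @ w
        w = np.where(w > 0, w, 0.2 * w)
    return w

def adain(feat, w, eps=1e-5):
    """Adaptive instance normalization: normalize each channel of `feat`
    (c, h, w), then scale/shift it with style parameters taken from the
    embedding w (here by a simple slice, an illustrative choice)."""
    c = feat.shape[0]
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True)
    gamma = w[:c, None, None]
    beta = w[c:2 * c, None, None]
    return gamma * (feat - mu) / (sigma + eps) + beta
```

After AdaIN, each channel's mean equals its shift parameter beta, which is how w modulates the style statistics of the generator's feature maps.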
- 4. The method for repairing a large-area defect image based on multi-scale feature fusion according to claim 1, wherein the multi-scale feature fusion module based on weight distribution comprises the following specific steps: dividing the feature map into a sequence of 3×3 feature blocks {p_i}, and computing the cosine similarity between feature blocks at different positions to measure their similarity, the similarity formula being sim(p_i, p_j) = (p_i · p_j) / (‖p_i‖ ‖p_j‖), where sim(p_i, p_j) denotes the cosine similarity between feature blocks p_i and p_j, p_i denotes the i-th feature block and p_j denotes the j-th feature block; applying Softmax normalization to the similarities to obtain the attention score of each feature block; weighting and reconstructing the original features with the attention scores to obtain enhanced features; feeding the enhanced features into four parallel dilated convolution layers with dilation rates r = 1, 2, 4 and 8, whose receptive fields expand to 3, 7, 15 and 31 respectively, so as to capture long-range context semantic information at different scales, from which consistent structural textures are synthesized during the generator's up-sampling; generating weight maps corresponding to the multi-scale features with a lightweight weight allocator, which consists of two convolution layers and applies Softmax so that each spatial position has a normalized weight distribution along the channel dimension; slicing the result into 4 weight maps and performing weighted fusion of the dilated-convolution feature maps with these weight maps; and feeding the weighted, fused multi-scale features into the hybrid attention module CBAM, which applies the channel attention sub-module and the spatial attention sub-module in sequence, finally outputting the fused features.
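The two numerical cores of claim 4, cosine-similarity attention over feature blocks and weight-map fusion of the dilated-convolution branches, can be sketched as below. This is an illustrative NumPy version: the dilated convolutions themselves and the final CBAM step are omitted, and the branch outputs are taken as given.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_attention(patches):
    """patches: (n, d) flattened 3x3 feature blocks. Cosine similarity
    between all block pairs -> Softmax attention scores -> attention-
    weighted reconstruction yields the enhanced features."""
    unit = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T                  # sim(p_i, p_j), cosine similarity
    attn = softmax(sim, axis=1)          # attention score per block pair
    return attn @ patches                # enhanced features

def fuse_branches(branches, weight_logits):
    """branches: (4, c, h, w) outputs of the dilated-conv layers
    (r = 1, 2, 4, 8); weight_logits: (4, h, w) from the weight allocator.
    Softmax over the 4 maps gives a normalized weight distribution at
    every spatial position, then the branches are fused by weighted sum."""
    weights = softmax(weight_logits, axis=0)     # normalized weight maps
    return (weights[:, None] * branches).sum(axis=0)
```

With all-zero logits the Softmax yields uniform weights of 0.25, so the fusion degenerates to a plain average of the four branches; learned logits instead emphasize whichever receptive field is most informative at each position.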
- 5. The method for large-area defect image restoration based on multi-scale feature fusion as set forth in claim 4, wherein said multi-scale feature fusion module (MSFF) is packaged in a residual unit to form an MSFF-Res block, and said MSFF-Res block is combined with the same-resolution skip features from the encoder and fused through feature addition to form the MSFF-Syn module.
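The wrapping described in claim 5 amounts to two element-wise additions; a minimal sketch, with `msff` standing in for the module of claim 4:

```python
import numpy as np

def msff_res(x, msff):
    """MSFF packaged in a residual unit: output = x + MSFF(x)."""
    return x + msff(x)

def msff_syn(x, skip, msff):
    """MSFF-Syn: the MSFF-Res output fused with the same-resolution
    encoder skip features by element-wise (feature) addition."""
    return msff_res(x, msff) + skip
```

The residual connection lets the block fall back to an identity mapping when the fusion branch contributes little, which generally stabilizes training of deep generators.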
- 6. The method for repairing a large-area defect image based on multi-scale feature fusion according to claim 1, wherein the contrastive learning mechanism introduced into the discriminator pulls the repaired image toward the real image and pushes it away from the damaged image in feature space by constructing positive and negative sample pairs, so as to enhance feature discrimination capability and stabilize training, with the following specific steps: the discriminator comprises a plurality of convolution layers and extracts a multi-layer feature representation of the input image; constructing sample pairs with the real image as anchor, the repaired image as positive sample and the damaged image as negative sample; calculating a texture contrast loss on the discriminator's shallow features to constrain low-level texture consistency; and calculating a semantic contrast loss on the discriminator's deep features to constrain high-level semantic consistency.
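One common way to realize the pull/push behavior of claim 6 is an InfoNCE-style loss over discriminator features; the patent does not give the exact formula, so the temperature and the specific form below are assumptions.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def contrastive_loss(f_real, f_repair, f_damaged, tau=0.07):
    """InfoNCE-style loss with the real image's features as anchor, the
    repaired image as positive and the damaged image as negative: the
    loss is small when the repair is close to the real image and far
    from the damaged one in feature space. Applied to shallow
    discriminator features this gives the texture contrast loss; applied
    to deep features, the semantic contrast loss."""
    pos = np.exp(cosine(f_real, f_repair) / tau)
    neg = np.exp(cosine(f_real, f_damaged) / tau)
    return float(-np.log(pos / (pos + neg)))
```

When the repaired features align with the real ones and oppose the damaged ones, the loss approaches zero; reversing the roles makes it large, which is exactly the gradient signal that guides the generator.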
- 7. The method for repairing a large-area defect image based on multi-scale feature fusion according to claim 1, wherein the loss function comprises an adversarial loss based on non-saturating cross entropy, an L1 reconstruction loss, a style loss based on the Gram matrix, and a contrast loss formed by weighting the texture contrast loss and the semantic contrast loss; the total loss function is a weighted sum of the individual loss terms.
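The Gram-matrix style term and the weighted total of claim 7 can be sketched as follows; all weight values are illustrative hyperparameters, not values from the patent.

```python
import numpy as np

def gram(feat):
    """Gram matrix of a (c, h, w) feature map, used in the style loss:
    channel-by-channel inner products of the flattened features."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(f_repair, f_real):
    """Mean squared difference between the two Gram matrices."""
    return float(np.mean((gram(f_repair) - gram(f_real)) ** 2))

def total_loss(l_adv, l_rec, l_style, l_tex, l_sem,
               w_adv=1.0, w_rec=10.0, w_style=250.0, w_ctr=0.5, alpha=0.5):
    """Weighted sum of the individual losses. The contrast loss is
    itself a weighted mix of the texture and semantic contrast terms."""
    l_contrast = alpha * l_tex + (1.0 - alpha) * l_sem
    return w_adv * l_adv + w_rec * l_rec + w_style * l_style + w_ctr * l_contrast
```

In practice the reconstruction and style weights are set much larger than the adversarial one because their raw magnitudes are far smaller; the balance is tuned per dataset.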
- 8. The large-area defect image restoration method based on multi-scale feature fusion according to claim 1, wherein the training process adopts an alternating optimization strategy with the following specific steps: first fixing the generator parameters and updating the discriminator to minimize the adversarial loss; then fixing the discriminator parameters and updating the generator to minimize the total loss; and iterating in this way until the network converges.
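The alternating schedule of claim 8 can be sketched as a plain loop; `d_step` and `g_step` stand for one discriminator and one generator update respectively, and the change-in-generator-loss convergence test is an illustrative choice.

```python
def alternating_train(d_step, g_step, max_steps, tol=1e-3):
    """Alternating optimization: each iteration first updates the
    discriminator with generator parameters fixed (d_step), then the
    generator with discriminator parameters fixed (g_step), iterating
    until the generator loss stops improving by more than `tol`."""
    history, prev_g = [], float('inf')
    for _ in range(max_steps):
        d_loss = d_step()        # minimize adversarial loss w.r.t. D
        g_loss = g_step()        # minimize total loss w.r.t. G
        history.append((d_loss, g_loss))
        if prev_g - g_loss < tol:
            break                # converged (or stopped improving)
        prev_g = g_loss
    return history
```

Keeping one network frozen while the other updates is the standard GAN recipe; it prevents the two optimization problems from chasing moving targets within a single step.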
Description
Large-area defect image restoration method based on multi-scale feature fusion
Technical Field
The invention relates to the technical field of computer vision and deep learning, in particular to a large-area defect image restoration method based on multi-scale feature fusion.
Background
Digital images play a vital role in today's society as one of the most important carriers for recording and disseminating information. However, due to technical limitations or interference from external factors, images inevitably suffer varying degrees of damage or regional loss during storage and transmission. Image restoration aims to reconstruct the missing region from the surviving image information and to generate content that is consistent with the existing visual structure and has realistic texture. Image restoration is widely used in practice, for example in image editing, object removal and the restoration of old photographs. Traditional image restoration methods mostly adopt machine learning algorithms based on statistical probability; these methods perform poorly on complex semantics and large damaged areas, and the generated restorations often lack semantic and texture-structure consistency. With the rapid development of deep learning in computer vision, deep-learning-based algorithms can better capture high-level semantics in image restoration and achieve significantly improved results. However, when a large area of the image is missing, the restoration effect is still unsatisfactory: because long-distance semantic associations are difficult to capture, the results suffer from texture loss, structural distortion, boundary blurring and semantic inconsistency. Most current methods rely on local context information and lack efficient modeling of the image's global semantic structure.
Although some studies attempt to introduce attention mechanisms or dilated convolution to expand the receptive field, the importance differences between features of different scales are often ignored during feature fusion, so feature utilization is low and the fusion is coarse. In addition, in adversarial training the discriminator is usually only used to judge whether an image is real, and the discriminative information in its feature space is not exploited to guide the generator toward more consistent restorations. Therefore, how to effectively fuse multi-scale features, enhance global semantic understanding and obtain more consistent restoration results remains a key problem to be solved urgently.
Disclosure of Invention
The invention aims to provide a large-area defect image restoration method based on multi-scale feature fusion, so as to solve the problems of semantic inconsistency, texture distortion and the like in large-area defect image restoration. The method adopts a generative adversarial network model, combines an encoder and a mapping network to extract rich semantic features of the image, and achieves high-quality, globally semantically consistent restoration through multi-scale feature fusion, dynamic attention weighting, contrastive learning and other techniques.
The invention provides a large-area defect image restoration method based on multi-scale feature fusion, comprising the following steps: obtaining a damaged image and its corresponding binary mask; constructing an encoder, extracting multi-level spatial features of the damaged image, and obtaining a latent semantic vector; decoupling a random noise vector into a semantic embedding vector through a mapping network; introducing into the generator a multi-scale feature fusion module based on weight distribution, aggregating context information from different receptive fields, reconstructing the feature map and generating a repaired image; introducing a contrastive learning mechanism into the discriminator, which, by constructing positive and negative sample pairs, pulls the repaired image toward the real image and pushes it away from the damaged image in feature space, so as to enhance feature discrimination capability and stabilize training; and jointly optimizing the network with the adversarial, reconstruction, style and contrast losses to generate the restoration result. Preferably, constructing the encoder, extracting multi-level spatial features of the damaged image and obtaining the latent semantic vector comprises the following specific steps: the encoder is formed by sequentially connecting a plurality of convolution blocks, each convolution block comprising two 3×3 convolution layers, each followed by a LeakyReLU activation function; splicing the damaged image and the mask into a four-channel input tensor fed into the encoder network; outputting