CN-121998848-A - Image self-adaptive degradation recovery method based on dual-network cooperation and region perception
Abstract
The invention relates to the technical fields of computer vision and digital image processing, and discloses an image adaptive degradation-recovery method based on dual-network collaboration and region awareness. The method comprises: constructing a blind-neighborhood network and acquiring a smoothness prior for flat regions via a self-supervised strategy; constructing a local-perception network that preserves high-frequency texture detail through gradient constraints; and finally retaining only the main-body network, realizing end-to-end inference through operator-fusion and quantization techniques. A dual-network collaborative-supervision and feature-decoupling mechanism effectively resolves the difficulty of reconciling smooth denoising with detail preservation, markedly improving image-restoration quality while greatly reducing model computational complexity, thereby meeting the requirements of efficient real-time deployment on edge devices.
Inventors
- LIN WEILONG
- HE YUHAO
- YAN SHUAI
- SONG YUNHAI
- CHEN XIANBIAO
- WANG LIWEI
- YU DIANRUI
- XIAO YAOHUI
- DING WEIFENG
- YU JUNSONG
- SHAO CHENGLIN
Assignees
- Electric Power Research Institute of the EHV Power Transmission Company, China Southern Power Grid Co., Ltd. (中国南方电网有限责任公司超高压输电公司电力科研院)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-23
Claims (10)
- 1. An image adaptive degradation-recovery method based on dual-network collaboration and region awareness, characterized by comprising the following steps: S1, constructing a blind-neighborhood network, forming a physical blind neighborhood by introducing dilated (atrous) convolution or feature-map shift operations into the feature-extraction branch, and training the network with a self-supervised strategy to acquire smoothness prior information for flat image regions; S2, constructing a local-perception network, and performing joint training using the output of the blind-neighborhood network as the supervision signal for flat regions, combined with gradient constraints on image texture regions, so as to obtain local features that retain high-frequency detail; S3, constructing a main-body denoising network that integrates region-aware memory and contrastive learning, generating region-adaptive supervision signals from the outputs of the blind-neighborhood network and the local-perception network, and performing joint optimization training of the main-body denoising network; S4, performing single-model end-to-end inference and engineering deployment: after training, instantiating only the main-body denoising network, inputting the image to be processed into the network for forward inference, and outputting the restored image.
- 2. The image adaptive degradation-recovery method based on dual-network collaboration and region awareness according to claim 1, wherein training the network in step S1 with a self-supervised strategy specifically comprises: defining a self-supervised L1 loss function that, exploiting the zero-mean spatial distribution of the noise, constrains the network output to approach the statistical expectation of the input noisy image in flat regions; defining a flat-region smoothness-constraint loss function that, based on the total-variation principle, constrains the spatial variation rate by computing the absolute differences of horizontally and vertically adjacent pixels of the output image; and performing a weighted sum of the self-supervised L1 loss and the flat-region smoothness loss, so that joint optimization enables the wide-receptive-field blind-neighborhood network to block the propagation of spatially correlated noise within the local neighborhood.
- 3. The image adaptive degradation-recovery method based on dual-network collaboration and region awareness according to claim 1, wherein the limited-receptive-field local-perception network in step S2 is configured to perform feature inference only from pixel-level local neighborhood information: the main body of the network consists of several stacked convolutional layers, the dilation rate of all convolutional layers is set to 1, no pooling or downsampling operation is placed between layers, and the effective receptive field of the network for any output pixel is limited to a preset pixel range, so that the transmission of long-range smoothing information is blocked on the physical path.
- 4. The method of claim 3, further comprising computing a region-adaptive fusion coefficient and a texture mask when training the limited-receptive-field local-perception network: quantifying content complexity from the pixel statistics of local image areas by computing the pixel-intensity standard deviation within a neighborhood window at every pixel position; mapping the standard deviation onto a normalized interval to obtain an adaptive coefficient, where smaller values indicate flat regions and larger values indicate textured regions; and generating a binary texture mask from the adaptive coefficient, the mask being used to distinguish high-frequency from low-frequency areas of the image.
- 5. The image adaptive degradation-recovery method based on dual-network collaboration and region awareness according to claim 4, wherein in step S2 using the output of the blind-neighborhood network as the flat-region supervision signal and performing joint training with gradient constraints on image texture regions specifically comprises: using the adaptive coefficient as a spatial weight to force the limited-receptive-field local-perception network to approach the output of the wide-receptive-field blind-neighborhood network in flat regions, while applying a stop-gradient operation to the output of the wide-receptive-field blind-neighborhood network; using the binary texture mask to constrain the gradient field of the local-perception network's output image to be consistent with that of the blind-neighborhood network's output image in texture regions; and performing a weighted sum of the loss functions computed from these constraints to update the network parameters.
- 6. The image adaptive degradation-recovery method based on dual-network collaboration and region awareness according to claim 1, wherein the main-body denoising network in step S3 adopts a two-stage nested Transformer backbone: the architecture uses a nested U-shaped design, in which the outer structure is responsible for overall feature coordination and image reconstruction, while the inner structure serves as the core processing unit, modelling degradation patterns through a multi-scale encoder-decoder structure; the basic building block of the main-body denoising network uses a window-based self-attention mechanism, in which the input feature map is partitioned spatially into a grid of rectangular windows, self-attention is computed independently within each window, and relative spatial-structure information between pixels is encoded through relative position bias terms.
- 7. The image adaptive degradation-recovery method based on dual-network collaboration and region awareness according to claim 6, wherein a region-aware non-local memory module embedded at each layer of the inner structure comprises: a multi-mode memory bank storing memory vectors that encode region-specific degradation patterns; a region-aware gating mechanism that, guided by the adaptive coefficient generated in step S2, dynamically computes correlation weights between the feature at the current image position and each memory vector, and generates a fused memory vector from those correlation weights; and a long short-term memory recurrent update unit that uses the fused memory vector to modulate the input and forget gates when cyclically updating the feature sequence, so as to keep degradation modelling consistent across stages.
- 8. The image adaptive degradation-recovery method based on dual-network collaboration and region awareness according to claim 1, wherein training the main-body denoising network in step S3 further comprises constructing a feature-level region-aware contrastive learning mechanism: for anchor samples drawn from flat regions of the degraded image, positive samples are restricted to other image patches from the same flat regions, while negative samples include patches from texture regions of the degraded image and patches from the ground-truth image; for anchor samples drawn from texture regions of the degraded image, positive samples are defined as patches from the same texture regions, while negative samples include flat-region samples and ground-truth samples; a region-aware contrastive learning loss is computed to force the model to decouple image content, degradation type, and region characteristics in the latent feature space.
- 9. The image adaptive degradation-recovery method based on dual-network collaboration and region awareness according to claim 4, wherein generating the region-adaptive supervision signal in step S3 specifically comprises: according to the computed region-adaptive fusion coefficient, performing pixel-wise weighted fusion of the outputs of the wide-receptive-field blind-neighborhood network and the limited-receptive-field local-perception network, and taking the fused result as the supervision target of the basic denoising loss; and constructing a total loss function comprising the basic denoising loss, a multi-scale pixel loss, a multi-scale perceptual loss, a memory-consistency loss, and a gating-alignment loss, and performing end-to-end joint optimization of the main-body denoising network.
- 10. The image adaptive degradation-recovery method based on dual-network collaboration and region awareness according to claim 1, wherein performing single-model end-to-end inference and engineering deployment in step S4 specifically comprises: extracting and storing all trained parameters of the main-body denoising network, including the frozen (solidified) memory matrix and the gating parameters, while loading neither the wide-receptive-field blind-neighborhood network nor the limited-receptive-field local-perception network; applying model quantization and acceleration optimization, executing an operator-fusion strategy that merges consecutive convolution computations, bias additions, and activation-function operations into a single compute kernel, and quantizing the model parameters from floating point to low-precision values; and normalizing the image to be processed, feeding it to the inference model, in which the network uses a normalized exponential function (softmax) to perform adaptive weighted reading of the memory prototypes based on the degree of feature-content matching, and generating the restored image in a single feed-forward pass.
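The two training terms described in claim 2 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation; the function name and the weighting factor `lambda_tv` are assumptions.

```python
import numpy as np

def blind_network_losses(output, noisy_input, lambda_tv=0.1):
    """Claim-2 training objective, sketched (illustrative only).

    L1 term: since the noise is assumed zero-mean, the network output
    is pulled toward the noisy input, whose statistical expectation
    in flat regions equals the clean signal.
    TV term: absolute differences of horizontally and vertically
    adjacent output pixels constrain the spatial variation rate.
    """
    l1 = np.abs(output - noisy_input).mean()
    tv = (np.abs(output[:, 1:] - output[:, :-1]).mean()
          + np.abs(output[1:, :] - output[:-1, :]).mean())
    # weighted sum of both losses for joint optimization
    return l1 + lambda_tv * tv
```

A perfectly constant output has zero total-variation cost, so only the L1 term drives it toward the mean of the noisy observation, which is the smoothness prior the claim aims for.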
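Claim 3 limits the local-perception network to stride-1, dilation-1 convolutions with no pooling, so its effective receptive field grows only linearly with depth. A sketch of that bound (standard receptive-field arithmetic, not from the patent):

```python
def effective_receptive_field(kernel_sizes):
    """Receptive field of stacked convolutional layers with stride 1
    and dilation 1 and no pooling/downsampling (the claim-3 setting):
    RF = 1 + sum(k - 1) over all layer kernel sizes."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf
```

For example, four 3x3 layers give a 9x9 field, so each output pixel only ever sees a small local neighborhood and long-range smoothing information is physically blocked.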
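The region-adaptive coefficient and texture mask of claim 4 can be computed as below. This is a naive sketch for clarity; the window size, threshold, and normalization details are illustrative assumptions.

```python
import numpy as np

def adaptive_coefficient(img, win=7, thresh=0.5):
    """Claim-4 sketch: local pixel-intensity standard deviation,
    mapped to [0, 1], plus a binary texture mask (assumed params)."""
    pad = win // 2
    padded = np.pad(img.astype(float), pad, mode='reflect')
    H, W = img.shape
    std = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            std[i, j] = padded[i:i + win, j:j + win].std()
    # small alpha -> flat region, large alpha -> textured region
    alpha = (std - std.min()) / (std.max() - std.min() + 1e-8)
    mask = (alpha > thresh).astype(np.uint8)  # 1 = high-frequency area
    return alpha, mask
```

In practice the sliding-window standard deviation would be vectorized (e.g. via integral images), but the per-pixel loop keeps the statistic explicit.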
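One plausible reading of the memory read-out in claims 7 and 10 is a softmax ("normalized exponential") match of the current position feature against the memory vectors, scaled by a region gate. This is a speculative sketch; the shapes, the dot-product similarity, and the scalar gate are all assumptions.

```python
import numpy as np

def memory_read(feature, memory_bank, gate):
    """Region-aware memory read, sketched.
    feature: (d,) position feature; memory_bank: (m, d) memory
    vectors; gate: scalar in [0, 1] derived from the adaptive
    coefficient (all assumptions, not the patent's exact design)."""
    logits = memory_bank @ feature          # correlation per slot
    w = np.exp(logits - logits.max())
    w /= w.sum()                            # softmax over m slots
    fused = w @ memory_bank                 # fused memory vector
    return gate * fused                     # gated contribution
```

The fused vector would then modulate the input/forget gates of the recurrent update unit described in claim 7.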
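The region-aware contrastive mechanism of claim 8 resembles an InfoNCE-style loss over patch features. A hedged sketch follows; the cosine similarity, temperature `tau`, and function name are assumptions rather than the patent's formulation.

```python
import numpy as np

def region_contrastive_loss(anchor, positives, negatives, tau=0.1):
    """Claim-8 sketch: for a flat-region anchor, positives are other
    flat-region patch features and negatives include textured-region
    and ground-truth patch features (and symmetrically for texture
    anchors). tau is an assumed temperature."""
    def sim(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    pos = sum(np.exp(sim(anchor, p) / tau) for p in positives)
    neg = sum(np.exp(sim(anchor, n) / tau) for n in negatives)
    # lower when the anchor is closer to positives than negatives
    return -np.log(pos / (pos + neg))
```

Minimizing this pulls same-region features together and pushes cross-region and ground-truth features apart, which is what lets the model decouple content, degradation type, and region in the latent space.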
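The pixel-wise fusion that claim 9 uses as the supervision target is a simple convex combination of the two teacher outputs weighted by the adaptive coefficient. A minimal sketch (names are assumptions):

```python
import numpy as np

def fused_supervision_target(blind_out, local_out, alpha):
    """Claim-9 sketch: alpha is the region-adaptive coefficient
    (0 = flat, 1 = textured), so flat pixels follow the
    wide-receptive-field blind-neighborhood network and textured
    pixels follow the limited-receptive-field local network."""
    return (1.0 - alpha) * blind_out + alpha * local_out
```

This fused image serves as the target of the basic denoising loss in the total objective.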
Description
Image adaptive degradation recovery method based on dual-network collaboration and region awareness

Technical Field
The invention relates to the technical fields of computer vision and digital image processing, and in particular to an image adaptive degradation-recovery method based on dual-network collaboration and region awareness.

Background
Image restoration aims to reconstruct a high-quality image from a degraded observation. It is a fundamental research direction in computer vision and is widely applied in scenarios such as security monitoring, industrial inspection, and medical imaging. With the development of deep learning, data-driven methods based on convolutional neural networks and vision Transformers have gradually replaced traditional algorithms based on hand-crafted features. Mainstream existing methods generally train an end-to-end mapping network on large-scale paired datasets, extracting image features by stacking deep convolutional modules or self-attention mechanisms so as to fit the complex degradation process and remove interference factors such as noise and blur, thereby improving the visual quality of images. Most existing deep-learning image-restoration methods adopt a globally uniform processing strategy, applying the same loss constraints and feature-extraction mode to the whole image, and therefore often ignore the differing content characteristics of different image regions. Flat areas typically require strong smoothing to remove noise, while high-frequency areas rich in texture require fine detail preservation.
Lacking an adaptive sensing and differentiated processing mechanism for these region characteristics, existing models easily fall into an optimization dilemma during training: if noise removal is emphasized, texture regions are often over-smoothed and high-frequency details are lost; if detail reconstruction is emphasized, low-frequency noise tends to remain in flat regions or artifacts are introduced, making it difficult to strike an ideal balance between thorough denoising and structural fidelity. To improve restoration performance, the prior art tends to construct deep network models with enormous parameter counts and high computational complexity, trading increased network depth and width for performance gains. Although such designs improve restoration accuracy, they significantly increase inference latency and memory footprint, leading to excessive consumption of computing resources. In practical industrial applications, especially on edge devices with limited compute or in real-time inspection scenarios, such huge models struggle to meet real-time and low-power deployment requirements.

Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention provides an image adaptive degradation-recovery method based on dual-network collaboration and region awareness, which solves the problem that existing image-restoration techniques cannot balance noise removal against texture-detail preservation, leading either to incomplete denoising of flat regions or to loss of detail in texture regions.
The image adaptive degradation-recovery method based on dual-network collaboration and region awareness comprises the following steps: constructing a blind-neighborhood network, forming a physical blind neighborhood by introducing dilated (atrous) convolution or feature-map shift operations into the feature-extraction branch, and training the network with a self-supervised strategy to acquire smoothness prior information for flat image regions; constructing a local-perception network, using the output of the blind-neighborhood network as the supervision signal for flat regions, and performing joint training with gradient constraints on image texture regions so as to obtain local features that retain high-frequency detail; constructing a main-body denoising network that integrates region-aware memory and contrastive learning, generating region-adaptive supervision signals from the outputs of the blind-neighborhood network and the local-perception network, and performing joint optimization training of the main-body denoising network; and performing single-model end-to-end inference and engineering deployment, instantiating only the main-body denoising network after training, inputting the image to be processed into the network for forward inference, and outputting the restored image. Further, when the blind-neighborhood network is trained with the self-supervised strategy, it constructs an effective receptive field with a central physical blind spot by introducing a central mask in the convolution operation or by adopting a feature-map shifting strategy.
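The central-mask variant of the blind spot described above can be illustrated by zeroing the center tap of a convolution kernel, so that each output pixel is predicted only from its neighbors and never from itself. A minimal sketch under that assumption (the uniform averaging weights are illustrative, not the patent's learned kernel):

```python
import numpy as np

def center_masked_kernel(k=3):
    """Blind-spot sketch: an averaging kernel whose centre tap is
    zeroed, making the pixel itself a physical blind spot while the
    remaining k*k - 1 neighbour taps sum to 1."""
    kernel = np.full((k, k), 1.0 / (k * k - 1))
    kernel[k // 2, k // 2] = 0.0
    return kernel
```

Convolving a noisy image with such a kernel predicts each pixel purely from its surroundings, which is what lets the self-supervised L1 loss against the noisy input converge to the clean signal in flat regions.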