CN-122023125-A - Image super-resolution method based on self-supervised learning
Abstract
The invention discloses an image super-resolution method based on self-supervised learning. The method uses MoCo to obtain a discriminative degradation representation, clusters degradation types in the latent space with an InfoNCE loss to provide a reliable prior for dynamic routing, and exploits the learning capacity of a gating network together with cross-scale similarity prediction to generate soft weights through joint modeling of degradation strength, noise level, and texture complexity, adaptively selecting the cooperation mode of the CS-NL and STL modules. The mechanism combines self-supervised degradation modeling with dynamic routing: a contrastive loss learns the latent manifold of diverse degradations, differentiable gating realizes dynamic activation and gradient propagation of the modules, and a bidirectional check blocks the propagation of mismatch artifacts. By conditioning routing on the degradation representation as a prior, the invention achieves regularized routing without fixed fusion weights or per-dataset manual parameter tuning, effectively avoiding redundant computation and negative interference between modules.
Inventors
- CAO JIE
- YU JING
- TONG LEI
- XIAO CHUANGBAI
Assignees
- 北京工业大学
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-28
Claims (6)
- 1. An image super-resolution method based on self-supervised learning, characterized by comprising the following steps: Step 1, degradation representation learning and shallow feature extraction. Degradation representation learning adopts the MoCo framework, extracting a degradation prior without supervision via a momentum encoder and a dynamic queue mechanism. Specifically, two patches are randomly cropped from the input LR image to form a positive pair, and 65536 negative samples are stored in the queue; after the encoder maps each patch to a 256-dimensional embedding vector, the InfoNCE loss is used for optimization: L_con = -log( exp(q·k⁺/τ) / (exp(q·k⁺/τ) + Σᵢ exp(q·kᵢ⁻/τ)) ), where q is the query embedding, k⁺ the positive key, kᵢ⁻ the queued negative keys, and τ a temperature parameter. Shallow feature extraction uses a single 3×3 convolutional layer to map the input image to an initial feature map of 180 channels whose spatial size is consistent with the LR image, namely H×W. Step 2, dynamic routing decision and gating-coefficient generation. A lightweight gating network (MLP) takes the degradation representation vector d and the intermediate feature map F as inputs and generates a shared gating coefficient for each DRSTB module. The MLP comprises two shared hidden layers (256-128) followed by three independent prediction heads, namely a blur estimation head, a noise estimation head, and a texture-complexity head, each outputting a normalized scalar; the texture head additionally introduces a local-variance computation. The final CS-NL gating coefficient is obtained by combining the three head outputs. Step 3, deep feature extraction and cross-scale local enhancement. The deep feature extraction module is formed by stacking 6 degradation-aware residual Swin Transformer blocks (DRSTB); each DRSTB internally contains 6 CMT sub-modules, and each CMT follows a cascaded DCL-STL structure. The DCL extracts local detail through degradation-modulated depthwise convolution kernels and channel weights; the STL performs self-attention within an 8×8 window, where the Value component is modulated by the degradation representation. At the entry of each DRSTB, the CS-NL module is selectively activated according to the gating coefficient: it first downsamples the feature map bilinearly by a factor of 2, computes cross-scale similarity through 3×3 patch matching, and then performs weighted aggregation of the matched HR patches to obtain upsampled features, whose output weight is dynamically controlled by the gating coefficient. Step 4, gated weighted fusion and image reconstruction. After the 6 DRSTB stages and cross-scale enhancement are completed, a soft routing strategy fuses the multi-branch features, and the reconstruction module maps the fused features to the SR image through a 3×3 convolution. Step 5, joint multi-task loss optimization. Training adopts a three-stage curriculum strategy: the first 300 epochs optimize only the MoCo encoder; epochs 301-500 jointly optimize the SR network and the gating network; epochs 501-1000 remove the gating supervision loss and fine-tune with the total loss function alone.
- 2. The method according to claim 1, wherein 6 degradation-aware residual blocks are used, each block contains 6 CNN-Transformer (CMT) sub-modules, the window size is set to 8×8, the number of feature channels is 180, and the number of attention heads is 6.
- 3. The method of claim 1, wherein the batch size during training is set to 16.
- 4. The method for image super-resolution based on self-supervised learning as recited in claim 1, wherein training is performed for a total of 32000 iterations.
- 5. The method for image super-resolution based on self-supervised learning as set forth in claim 1, characterized in that an Adam optimizer is used with β₁ = 0.9; the initial learning rate during training is set to 0.0002, and a poly strategy linearly decays it from epoch 300 to 0.
- 6. The method of claim 1, wherein during the testing phase the gating network dynamically calculates the weights of the CS-NL and STL branches according to the degradation representation, without a fixed threshold; for extremely degraded regions, the CS-NL module is forced off to ensure robustness.
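The contrastive objective in step 1 of claim 1 is the standard InfoNCE loss used in MoCo-style training. Below is a minimal, dependency-free sketch of how it is computed for a single query embedding; the function name and toy embeddings are illustrative assumptions, not part of the patented implementation (which uses 256-dimensional embeddings and a 65536-entry negative queue):

```python
import math

def info_nce_loss(q, k_pos, negatives, tau=0.07):
    """InfoNCE loss for one query embedding q against its positive key
    k_pos and a queue of negative keys. Embeddings are plain lists of
    floats (assumed L2-normalized); tau is the temperature."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    pos = math.exp(dot(q, k_pos) / tau)
    neg = sum(math.exp(dot(q, k) / tau) for k in negatives)
    return -math.log(pos / (pos + neg))

# Toy check: a query aligned with its positive key and orthogonal to all
# negatives yields a near-zero loss; a misaligned positive yields a larger one.
q     = [1.0, 0.0]
k_pos = [1.0, 0.0]
queue = [[0.0, 1.0], [0.0, -1.0]]
good = info_nce_loss(q, k_pos, queue)
bad  = info_nce_loss(q, [0.0, 1.0], queue)
assert good < bad
```

Minimizing this loss pulls the two patches cropped from the same LR image together in the embedding space while pushing them away from the queued negatives, which is what clusters degradation types in the latent space.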
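The gating of step 2 can be illustrated with a toy version of the routing: three normalized degradation scalars (blur, noise, texture complexity) are combined into a CS-NL activation coefficient, which then soft-blends the CS-NL and STL branch features. The weights below are hypothetical stand-ins for the learned MLP heads; the negative noise weight reflects claim 6, where heavy degradation suppresses the CS-NL branch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def csnl_gate(blur, noise, texture, w=(1.0, -1.0, 1.0), b=0.0):
    """Hypothetical gating head: maps the three normalized degradation
    scalars to a CS-NL activation coefficient g in (0, 1). The weights w
    are illustrative; the patent learns them via a small MLP."""
    z = w[0] * blur + w[1] * noise + w[2] * texture + b
    return sigmoid(z)

def soft_route(f_stl, f_csnl, g):
    """Soft routing: blend the STL and CS-NL branch features element-wise
    with the gating coefficient g instead of a hard on/off switch."""
    return [g * c + (1.0 - g) * s for s, c in zip(f_stl, f_csnl)]

g = csnl_gate(blur=0.8, noise=0.1, texture=0.9)  # strong blur, rich texture
fused = soft_route([0.2, 0.4], [0.6, 0.0], g)
assert 0.0 < g < 1.0
```

Because the blend is differentiable in g, gradients flow through the gate to both branches, which is what the claim's "dynamic activation and gradient propagation" refers to.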
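The bidirectional check mentioned in the abstract (blocking artifact propagation from false matches) can be sketched as mutual-nearest-neighbor patch matching: an LR patch keeps its HR match only if that HR patch also matches back to it. This simplified sketch scores flattened patch vectors with a sum-of-squared-differences criterion; the actual CS-NL module operates on 3×3 feature patches after 2× bilinear downsampling, which this example does not reproduce:

```python
def best_match(query, patches):
    """Index of the patch most similar to `query` (negative SSD score)."""
    score = lambda p: -sum((a - b) ** 2 for a, b in zip(query, p))
    return max(range(len(patches)), key=lambda i: score(patches[i]))

def bidirectional_match(lr_patches, hr_patches):
    """Cross-scale matching with a bidirectional consistency check:
    LR patch i is matched to HR patch j only if j's best match among the
    LR patches is i again; otherwise the match is rejected, blocking
    mismatch artifacts from being aggregated."""
    matches = {}
    for i, q in enumerate(lr_patches):
        j = best_match(q, hr_patches)
        if best_match(hr_patches[j], lr_patches) == i:
            matches[i] = j
    return matches

# Two dark/bright LR patches pair cleanly with their HR counterparts.
matches = bidirectional_match([[0.0, 0.0], [1.0, 1.0]],
                              [[0.1, 0.1], [0.9, 0.9]])
assert matches == {0: 0, 1: 1}
```

Only mutually consistent matches contribute to the weighted aggregation of HR patches, so a spurious one-way similarity is simply dropped rather than propagated into the upsampled features.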
Description
Image super-resolution method based on self-supervised learning Technical Field The invention relates to the field of image super-resolution, and in particular to an image super-resolution method based on self-supervised learning. Background Image super-resolution (SR) aims to recover a high-resolution (HR) image from a low-resolution (LR) image. Image acquisition by an imaging device can be regarded as a degradation process from a high-resolution image to a low-resolution image, and super-resolution methods solve the inverse of this degradation process. In recent years, breakthroughs in artificial intelligence have driven the rapid development of computer vision. Intelligent analysis systems based on visual information processing have been deployed at scale in fields such as autonomous driving, medical imaging, remote sensing monitoring, security surveillance, and digital entertainment, becoming a core technical foundation for real-world intelligent applications. However, obtaining high-quality visual information under complex imaging conditions remains a common challenge and fundamental research topic in these fields. As a fundamental task of computer vision, the core goal of image super-resolution is to reconstruct high-resolution detail from low-resolution observations, establishing a precise mapping from the degraded space to the clear space. Restoring the fine structure of an image is a prerequisite for numerous high-level visual tasks, including target detection and recognition, medical lesion analysis, remote sensing image interpretation, face recognition, and video enhancement.
Image super-resolution is therefore not only an effective means of overcoming the limitations of hardware imaging, but also a key lever for improving the perception and decision accuracy of a vision system, playing an important bridging role between low-level vision and high-level understanding. Many intelligent vision applications place extremely high demands on image resolution, mainly because a lack of detail in the early stages may be gradually amplified in subsequent processing, ultimately reducing the reliability of the overall vision system significantly. For example, automated driving and medical diagnostic tasks rely on sharp image features to derive key decision parameters, so high-resolution images are a core requirement; when image degradation causes blurred detail or artifacts, subsequent target localization or lesion recognition can deviate greatly from the true situation. This sensitivity manifests as missed detection of small targets in remote sensing monitoring and as false face recognition in security surveillance. In essence, complex factors such as the physical limitations of imaging devices, environmental disturbances, and transmission losses cause irreversible loss of the original scene's high-frequency information during imaging, which adversely affects super-resolution reconstruction. Moreover, to achieve high-quality reconstruction in the absence of external training data, the non-uniqueness and instability challenges inherent to this ill-posed inverse problem must be overcome. How to improve the robustness and adaptability of super-resolution methods therefore remains an important open issue, and research on self-supervised super-resolution carries both important theoretical value and significant practical application value.
The invention provides a self-supervised image super-resolution method based on contrastive learning and cross-scale non-local attention. It uses MoCo to obtain a discriminative degradation representation, and exploits the learning capacity of a gating network, combined with cross-scale similarity prediction, to adaptively select the cooperation mode of the CS-NL and STL modules. The mechanism combines unsupervised degradation modeling with dynamic routing: a contrastive loss learns the latent manifold of diverse degradations, differentiable gating realizes dynamic activation and gradient propagation of the modules, and a bidirectional check blocks the propagation of mismatch artifacts. By conditioning routing on the degradation representation as a prior, the invention achieves regularized routing without fixed fusion weights or per-dataset manual parameter tuning, effectively avoiding redundant computation and negative interference between modules. Disclosure of Invention In view of this, the present invention provides an image super-resolution method based on self-supervised learning to achieve super-resolution of low-resolution images. In order to achieve the above object, the present invention provides the following embodiments: the image super-re