
CN-121998829-A - Image super-resolution reconstruction method based on geometric transformation enhanced state space model

CN 121998829 A

Abstract

An image super-resolution reconstruction method based on a state space model enhanced by geometric transformation belongs to the technical field of image processing and computer vision. The method addresses the poor reconstruction quality of existing methods, which is caused by insufficient long-range dependency modeling capability, weak robustness to geometric transformations, and inadequate recovery of local texture details. The method comprises: constructing a data set of paired high- and low-resolution images, constructing a reconstruction network, training the network with the training set, and inputting the low-resolution image to be processed into the trained network to obtain a high-resolution image. The network comprises a shallow feature extraction module, a plurality of serially connected residual state space groups, and a reconstruction module. Each residual state space group comprises a plurality of visual state space layers, and each visual state space layer sequentially applies geometric transformation enhancement, two-branch parallel processing, feature fusion, double residual learning, and the inverse geometric transformation to its input. The invention is used for converting low-resolution images into high-resolution images.
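The "pixel rearrangement and upsampling" step of the reconstruction module can be sketched as a depth-to-space operation. The following is a minimal NumPy sketch under the usual definition of pixel shuffle; the function name and toy shapes are illustrative and not taken from the patent.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r).

    Illustrative depth-to-space sketch of the pixel-rearrangement
    upsampling step described in the abstract.
    """
    c_rr, h, w = x.shape
    c = c_rr // (r * r)
    # split the channel axis into (C, r, r), then interleave the two
    # r-sized axes into the spatial dimensions
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# toy feature map: 4 channels, 2x2 spatial, upscale factor 2 -> 1 x 4 x 4
feat = np.arange(16, dtype=float).reshape(4, 2, 2)
hr = pixel_shuffle(feat, 2)
assert hr.shape == (1, 4, 4)
```

Each output pixel draws its value from one of the r*r channel groups of the corresponding low-resolution location, so spatial resolution grows by r in each dimension while channel count shrinks by r*r.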

Inventors

  • YANG HONG
  • YANG XIANQIANG

Assignees

  • Harbin Institute of Technology (哈尔滨工业大学)

Dates

Publication Date
2026-05-08
Application Date
2026-03-24

Claims (10)

  1. An image super-resolution reconstruction method based on a state space model enhanced by geometric transformation, characterized by comprising the following steps: Step 1, constructing a data set containing high-resolution images and corresponding low-resolution images in different scenes as a training set; Step 2, constructing an image super-resolution reconstruction network according to the network mapping relation and training it with the training set to obtain a trained image super-resolution reconstruction network; Step 3, inputting the low-resolution image to be processed into the trained image super-resolution reconstruction network to obtain the corresponding high-resolution image. The image super-resolution reconstruction network comprises a shallow feature extraction module, a plurality of serially connected residual state space groups, and a residual connection and reconstruction module. The shallow feature extraction module extracts shallow features from the input low-resolution image; the shallow features then pass sequentially through the serially connected residual state space groups to extract deep features. Within each residual state space group, the input features are processed by a plurality of visual state space layers followed by a convolution operation layer; within each visual state space layer, the input features undergo, in sequence, geometric transformation enhancement, two-branch parallel processing, feature fusion, nonlinear mapping by a multi-layer perceptron, and the inverse geometric transformation, yielding the output features of the layer. The convolution operation layer performs convolutional feature aggregation on the features processed by the visual state space layers. The residual connection and reconstruction module residually connects the aggregated features with the shallow features to obtain deep reconstruction features, then performs pixel rearrangement and upsampling on the deep reconstruction features to output a high-resolution image.
  2. The image super-resolution reconstruction method based on the state space model enhanced by geometric transformation according to claim 1, wherein the visual state space layer comprises a geometric transformation enhancement module, a two-branch parallel processing module, a feature fusion module, a multi-layer perceptron module and an inverse geometric transformation module. The geometric transformation enhancement module applies a geometric transformation to the input features in the spatial dimension to obtain transformation-enhanced features; the transformation applied is selected from a preset geometric transformation set comprising an identity transformation, a horizontal flip, a vertical flip, a transposition, a combined horizontal-and-vertical flip, and a 180-degree rotation; the geometric transformation enhancement modules of all visual state space layers cycle through the transformation set in order of layer index. The two-branch parallel processing module is connected to the geometric transformation enhancement module and splits the transformed features equally along the channel dimension into a left branch and a right branch for parallel processing; the left branch extracts local features with a lightweight convolutional neural network, and the right branch scans and models the feature sequence along a preset direction with a direction-selective state space model. The feature fusion module is connected to the two-branch parallel processing module; it concatenates the left-branch and right-branch outputs in the channel dimension, performs a channel shuffling operation on the concatenated features, and carries out inter-branch interaction and fusion of the features output by the different branches to obtain fusion features; the fusion features are residually connected with the input features to obtain first residual connection features, which are passed to the multi-layer perceptron module. The multi-layer perceptron module applies a nonlinear mapping in the channel domain to the fused, residually connected features to enhance feature expression, yielding enhanced features; the enhanced features are residually connected with the first residual connection features again to obtain second residual connection features, which are passed to the inverse geometric transformation module. The inverse geometric transformation module applies the inverse of the transformation used by the geometric transformation enhancement module to the second residual connection features, restoring them to the original spatial coordinate system and producing the final output features of the visual state space layer.
  3. The image super-resolution reconstruction method based on the geometric transformation enhanced state space model according to claim 1, wherein the shallow feature extraction module is constructed by adopting a channel expansion and shuffling strategy; specifically, the channel dimension of the input low-resolution image is doubled to obtain an expanded image; channel shuffling is performed on the expanded image in the channel dimension according to a preset number of groups to obtain shuffled features; and a 3×3 convolution is applied to the shuffled features to obtain the initial shallow features: $F_0 = \mathrm{Conv}_{3\times 3}(\mathrm{CS}(\mathrm{DE}(I_{LR})))$; wherein $I_{LR}$ represents the input low-resolution image, $\mathrm{DE}(\cdot)$ represents the channel-doubling expansion, $\mathrm{CS}(\cdot)$ represents the channel shuffling operation, and $\mathrm{Conv}_{3\times 3}(\cdot)$ represents a 3×3 convolution operation.
  4. The image super-resolution reconstruction method based on a state space model enhanced by geometric transformation according to claim 1, wherein the formula for deep feature extraction by passing the shallow features sequentially through the serially connected residual state space groups is: $F_D = H_{RSSG}^{(K)}(H_{RSSG}^{(K-1)}(\cdots H_{RSSG}^{(1)}(F_0)\cdots))$; wherein $K$ represents the number of serially connected residual state space groups, $F_D$ represents the deep features, the nested composition represents the cascade of operations, and $H_{RSSG}^{(i)}$ represents the $i$-th residual state space group, $i = 1, 2, \ldots, K$.
  5. The image super-resolution reconstruction method based on a state space model enhanced by geometric transformation according to claim 1, wherein in the third step the formula for obtaining the corresponding high-resolution image is: $I_{SR} = H_{REC}(F_D + F_0)$; wherein $I_{SR}$ represents the high-resolution image output by the trained image super-resolution reconstruction network and $H_{REC}(\cdot)$ represents the residual connection and reconstruction module.
  6. The image super-resolution reconstruction method based on the geometric transformation enhanced state space model according to claim 2, wherein the local feature extraction process of the left branch using a lightweight convolutional neural network is as follows: the input features pass, in sequence, through a first normalization layer, a first 3×3 convolution layer, a second normalization layer, a first ReLU activation function, a second 3×3 convolution layer, a third normalization layer, a second ReLU activation function, a 1×1 convolution layer and a third ReLU activation function, extracting local spatial texture and detail information; the number of channels of all convolution layers matches the number of channels of the left branch, and the output of the left branch is: $F_L = \sigma(\mathrm{Conv}_{1\times 1}(\sigma(N(\mathrm{Conv}_{3\times 3}(\sigma(N(\mathrm{Conv}_{3\times 3}(N(F_{in})))))))))$; wherein $F_L$ represents the output of the left branch, $F_{in}$ represents the branch input features in the two-branch parallel processing module, $\sigma(\cdot)$ represents the activation function, $N(\cdot)$ represents the normalization operation, and $\mathrm{Conv}_{1\times 1}(\cdot)$ represents a 1×1 convolution operation.
  7. The image super-resolution reconstruction method based on a state space model enhanced by geometric transformation according to claim 2, wherein the method by which the right branch scans and models the feature sequence along the predetermined direction using the direction-selective state space model is as follows: in odd-numbered visual state space layers, the direction-selective state space model scans and models the feature sequence along the horizontal direction of the image; in even-numbered visual state space layers, it scans and models the feature sequence along the vertical direction of the image; long-range dependencies within the sequence are modeled through the state space equation, and the output of the model is: $F_R = \mathrm{Linear}(S(\sigma(\mathrm{DWConv}_{3\times 3}(\mathrm{Linear}(\mathrm{LN}(F_{in}))))))$; wherein $F_R$ represents the output of the right branch, $F_{in}$ represents the branch input features in the two-branch parallel processing module, $\mathrm{LN}(\cdot)$ represents a layer normalization operation, $\mathrm{Linear}(\cdot)$ represents a linear layer, $\mathrm{DWConv}_{3\times 3}(\cdot)$ represents a 3×3 depthwise separable convolution, $\sigma(\cdot)$ represents an activation function, and $S(\cdot)$ represents the unidirectional state space scan modulated by the local spatial correlation prior.
  8. The image super-resolution reconstruction method based on a state space model enhanced by geometric transformation according to claim 7, wherein before the direction-selective state space model performs scan modeling on the feature sequence along the predetermined direction, the method further comprises a local spatial correlation prior calculation, which specifically comprises: applying a 5×5 depthwise separable convolution to the features input to the direction-selective state space model to form a local correlation enhancement branch and extract local correlation information; the local correlation information is fused channel-by-channel with the output projection features of the direction-selective state space model. The local correlation information is: $P = \mathrm{Sig}(\mathrm{DWConv}_{5\times 5}(F_{in}))$; wherein $\mathrm{Sig}(\cdot)$ represents the Sigmoid activation function, $F_{in}$ represents the features input to the direction-selective state space model, $\mathrm{DWConv}_{5\times 5}(\cdot)$ represents the 5×5 depthwise separable convolution operation, and $P$ represents the local correlation information extracted by the convolution.
  9. The image super-resolution reconstruction method based on a state space model enhanced by geometric transformation according to claim 2, wherein the process by which the feature fusion module obtains the fusion features is as follows: the features $F_L$ output by the left branch and the features $F_R$ output by the right branch are concatenated in the channel dimension to obtain concatenated features, and a channel shuffling operation is performed on the concatenated features: $F_{fuse} = \mathrm{CS}(\mathrm{Concat}(F_L, F_R))$; the number of channel-shuffle groups corresponds to the number of branches, so that the features output by different branches are rearranged across the groups, realizing inter-branch interaction and fusion; wherein $F_{fuse}$ represents the fusion features output within the residual state space group.
  10. The image super-resolution reconstruction method based on a state space model enhanced by geometric transformation according to claim 9, wherein the enhanced features are: $F_{enh} = s_2 \cdot \mathrm{MLP}(F_1)$; wherein $F_1$ is the first residual connection feature, $F_{enh}$ represents the enhanced features, $\mathrm{MLP}(\cdot)$ represents the multi-layer perceptron, and $s_2$ represents the second scaling factor.
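The geometric transformation set of claim 2 and its per-layer cyclic selection can be sketched as follows. This is an illustrative NumPy sketch, not the patent's implementation; it also checks that each transform composed with its inverse restores the input, which is what the inverse geometric transformation module relies on.

```python
import numpy as np

# The six transforms named in claim 2 and their inverses, applied to an
# (H, W) feature map. Note each of these transforms is its own inverse,
# and the combined h+v flip coincides with the 180-degree rotation.
TRANSFORMS = [
    (lambda x: x,             lambda x: x),              # identity
    (lambda x: x[:, ::-1],    lambda x: x[:, ::-1]),     # horizontal flip
    (lambda x: x[::-1, :],    lambda x: x[::-1, :]),     # vertical flip
    (lambda x: x.T,           lambda x: x.T),            # transposition
    (lambda x: x[::-1, ::-1], lambda x: x[::-1, ::-1]),  # h + v flip
    (lambda x: np.rot90(x, 2), lambda x: np.rot90(x, 2)),  # 180-deg rotation
]

def transform_for_layer(layer_idx):
    """Claim 2: layers cycle through the transform set by layer index."""
    return TRANSFORMS[layer_idx % len(TRANSFORMS)]

# every transform composed with its inverse restores the input
x = np.arange(12.0).reshape(3, 4)
for fwd, inv in TRANSFORMS:
    assert np.array_equal(inv(fwd(x)), x)
```

Because each layer's inverse transformation undoes its forward transformation exactly, stacking layers with different transforms exposes the state space scan to differently oriented views of the features without changing the coordinate system seen by the next layer.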

Description

Image super-resolution reconstruction method based on geometric transformation enhanced state space model

Technical Field

The invention belongs to the technical field of image processing and computer vision.

Background

Image super-resolution reconstruction generally refers to recovering the detailed information of a high-resolution image from a low-resolution image, and is widely applied in scenarios such as security monitoring, remote sensing imaging, medical imaging and mobile-terminal imaging. In recent years, deep-learning-based image super-resolution methods have made remarkable progress in both objective metrics and visual quality. A convolutional neural network can mine rich local texture features by stacking convolution layers and nonlinear activations, but it is limited by the receptive field of the convolution kernel: its ability to model long-range dependencies is limited, and it struggles to exploit structural associations between distant pixels. To enhance long-range modeling capability, some studies have introduced self-attention or Transformer structures, improving reconstruction quality through global correlation modeling. However, self-attention mechanisms typically require computing a correlation matrix over the entire feature map, so their computational complexity and memory overhead grow rapidly with resolution, making them difficult to deploy on edge devices or in real-time applications. To balance performance and efficiency, lightweight network structures have been proposed that compress the model by reducing channel counts, pruning, low-rank decomposition and the like, but such methods usually sacrifice the ability to recover complex textures and edge details, and tend to produce over-smoothed textures and lost detail.
The state space model proposed in recent years offers a new approach to sequence modeling, and some work has attempted to introduce a one-dimensional selective-scan state space structure into image restoration tasks, achieving long-range dependency modeling at low computational complexity. Existing state-space-model-based image super-resolution methods share several limitations. First, most methods scan the feature sequence along only a fixed direction, making it difficult to simultaneously preserve structural consistency in the horizontal direction, the vertical direction and under different geometric transformations, so robustness to geometric changes such as rotation and flipping is insufficient. Second, the collaborative modeling capability between the state space backbone and the convolution branch is limited and lacks effective fusion with a local spatial correlation prior, so texture detail reconstruction is often inadequate. Third, some lightweight designs, while reducing parameter count and computation, provide insufficient cross-branch information interaction, which limits the representation capability of the whole network. Therefore, under the premise of keeping model parameters and computational complexity controllable, how to simultaneously exploit geometric transformation enhancement, direction-selective state space modeling, and the local texture extraction capability of convolution branches to achieve high-quality and efficient image super-resolution reconstruction remains a technical problem to be solved in this field.

Disclosure of Invention

The invention aims to solve the problem of poor image reconstruction quality in conventional super-resolution methods caused by insufficient long-range modeling capability, weak robustness to geometric transformations, and insufficient recovery of local texture details.
To this end, an image super-resolution reconstruction method based on a geometric transformation enhanced state space model is provided. The method comprises the following steps: Step 1, constructing a data set containing high-resolution images and corresponding low-resolution images in different scenes as a training set; Step 2, constructing an image super-resolution reconstruction network according to the network mapping relation and training it with the training set to obtain a trained image super-resolution reconstruction network; Step 3, inputting the low-resolution image to be processed into the trained image super-resolution reconstruction network to obtain the corresponding high-resolution image. The image super-resolution reconstruction network comprises a shallow feature extraction module, a plurality of serially connected residual state space groups, and a residual connection and reconstruction module; the shallow feature extraction module is used for extracting shallow features of an input low-resolution image
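The channel shuffling operation used by both the shallow feature extraction module (claim 3) and the feature fusion module (claim 9) can be sketched as a group-and-transpose rearrangement. The following is a minimal NumPy sketch; the function name and toy shapes are illustrative, not taken from the patent.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Shuffle the channels of a (C, H, W) feature map across groups.

    Splitting the channel axis into groups and transposing the group
    axis with the per-group channel axis interleaves channels from
    different groups, so information from different groups (or, in
    claim 9, different branches) mixes in subsequent layers.
    """
    c, h, w = x.shape
    x = x.reshape(groups, c // groups, h, w)
    x = x.transpose(1, 0, 2, 3)  # swap group and per-group channel axes
    return x.reshape(c, h, w)

# with 2 groups (the two branches of claim 9), channels [0,1,2,3]
# rearrange to [0,2,1,3]: one channel from each branch alternates
feat = np.stack([np.full((2, 2), i, dtype=float) for i in range(4)])
shuffled = channel_shuffle(feat, 2)
assert [int(shuffled[i, 0, 0]) for i in range(4)] == [0, 2, 1, 3]
```

With 4 channels and 2 groups the operation is its own inverse, so applying it twice restores the original channel order; this rearrangement is parameter-free, which is why it suits the lightweight design the patent describes.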