CN-122023129-A - Light field image super-resolution reconstruction method combining self-attention and state space

CN122023129ACN 122023129 ACN122023129 ACN 122023129ACN-122023129-A

Abstract

The invention discloses a light field image super-resolution reconstruction method of mixed self-attention and state space, which relates to the technical field of image super-resolution and comprises the steps of extracting shallow layer characteristics by adopting convolution operation; the method comprises the steps of constructing a first basic feature extraction unit by combining a multi-branch structure and a state space model, constructing a second basic feature extraction unit by combining a multi-head sub-attention mechanism and a channel multi-layer perceptron, extracting spatial features and angle features based on the first basic features, extracting polar plane features based on the second basic features, calculating deep features by combining the spatial features, the angle features and the polar plane features, and calculating a high-resolution light field image based on shallow features, the deep features and an input low-resolution light field image. According to the invention, self-attention and state space are introduced to model the coupling space, angle and parallax relation of complex light field data, the characterization capability of global space-angle-parallax information in the super-resolution process of the light field image is obviously enhanced, and the reconstruction quality of high-frequency details is effectively improved.

Inventors

ZENG HUANQIANG
ZHENG HUIJIE
Dong Houhui
ZHU JIANQING
SHI YIFAN
GONG XINRONG
CHEN JING
CAI LEI
XIANG WENJIE
LIN QI

Assignees

华侨大学
厦门理工学院

Dates

Publication Date: 20260512
Application Date: 20260414

Claims (10)

1. The light field image super-resolution reconstruction method of the mixed self-attention and state space is characterized by comprising the following steps of: A shallow feature extraction module is constructed by adopting convolution operation, and shallow features are extracted from an input low-resolution light field image; constructing a first basic feature extraction unit by combining a multi-branch structure and a state space model, and constructing a second basic feature extraction unit by combining a multi-head sub-attention mechanism and a channel multi-layer perceptron; constructing a spatial feature Mamba extraction sub-module and an angular feature Mamba extraction sub-module based on the first basic feature extraction unit, which are respectively used for extracting spatial features and angular features from the shallow features; constructing a deep feature extraction module based on a spatial feature Mamba extraction sub-module, an angle feature Mamba extraction sub-module and a polar plane feature Transformer extraction sub-module, wherein the deep feature extraction module extracts deep features based on the shallow features; An image reconstruction module is constructed, and a high-resolution light field image is obtained based on the shallow layer features, the deep layer features and the input low-resolution light field image; And combining the shallow layer feature extraction module, the deep layer feature extraction module and the image reconstruction module to form a super-resolution network for realizing super-resolution reconstruction of the light field image.
2. The method for super-resolution reconstruction of a light field image mixed with self-attention and state space according to claim 1, wherein the operation of said first basic feature extraction unit is represented as Comprising the following operations: ; ; Wherein, the 、 And Respectively representing an input feature, an intermediate feature and an output feature of the first basic feature extraction unit; A function representing the layer normalization is performed, Representation of A convolution layer; representing depth-convolved feedforward layers, including sequentially connected Convolution and depth separable convolution product Is a convolution of (1); 、 Representing a residual connection that can be learned, Representing the space angle feature perception state space mechanism.
3. The method for super-resolution reconstruction of a light field image with mixed self-attention and state space according to claim 2, wherein said spatial-angular feature aware state space mechanism The expression of (2) is as follows: ; Wherein, the A linear mapping layer is represented and is shown, Representing a depth-separable convolution, The activation function is represented as a function of the activation, Representing a multi-branch state space mechanism, Representing a hadamard dot product; Multi-branch state space mechanism The calculation process of (1) comprises the following steps: Channel division, expressed as: ; Local feature extraction branches, expressed as: ; global feature extraction branches, expressed as: ; ; ; ; feature fusion, expressed as: ; Wherein, the Representing the input of the multi-branch state space mechanism, A channel-dividing operation is represented and, Representing the intermediate features after channel division; local features extracted by a multi-branch state space mechanism; representing a global average pooling operation, Is an output characteristic representing the global average pooling operation; An up-sampling operation is indicated and, A state space model is represented and is used to represent a state space model, Is to represent the output characteristics after up-sampling operation and state space model processing; Representing the differentially extracted features; global features extracted by a multi-branch state space mechanism; representing the feature by channel fusion operation, Representing the output features after fusing the global features and the local features.
4. A method of super-resolution reconstruction of a light field image in a hybrid self-attention and state space as claimed in claim 3, wherein the operation of the spatial feature Mamba extraction sub-module is represented as The calculation process is as follows: ; ; Wherein, the 、 And Representing the spatial features Mamba to extract the input features, intermediate features and output features of the sub-module respectively, Represents a matrix dimension transformation operation that converts an input feature into a sub-aperture image feature, Representation of A convolution layer.
5. The method of claim 4, wherein the operations of the angular feature Mamba extraction submodule are expressed as The calculation process is as follows: ; ; Wherein, the 、 And Respectively representing the input features, the intermediate features and the output features of the angle feature Mamba extraction submodule; represents a matrix dimension conversion operation that converts sub-aperture image features into macro-pixel image features, A matrix dimension conversion operation for converting the macro-pixel image features into sub-aperture image features; Representation of A convolution layer.
6. The method for super-resolution reconstruction of a light field image mixed with self-attention and state space as recited in claim 5, wherein an operation of said second basic feature extraction unit is represented as Comprising the following operations: ; ; Wherein, the 、 And Respectively representing an input feature, an intermediate feature and an output feature of the second basic feature extraction unit; Representation of The layer of the convolution is formed from a layer of, The activation function is represented as a function of the activation, Representing a parallax aware self-attention mechanism.
7. The method for super-resolution reconstruction of a light field image in a mixed self-attention and state space of claim 6, wherein said parallax aware self-attention mechanism The calculation process is as follows: ; ; Wherein, the 、 And Respectively representing input features, intermediate features and output features of the parallax aware self-attention mechanism; representing a multi-headed self-attention mechanism, A function representing the layer normalization is performed, Representing a channel multi-layer perceptron.
8. The method of mixed self-attention and state space light field image super-resolution reconstruction of claim 7, wherein the polar plane feature transform extraction submodule is represented as The calculation process is as follows: ; ; Wherein, the 、 And Respectively representing input features, intermediate features and output features of the polar plane feature transducer extraction submodule; A matrix dimension transformation operation representing transforming sub-aperture image features into horizontal epipolar plane image features, A matrix dimension conversion operation for converting the horizontal polar line planar image characteristics into sub-aperture image characteristics is represented; representing a matrix dimension conversion operation that converts sub-aperture image features to vertical epipolar plane image features, A matrix dimension transformation operation that converts the vertical epipolar planar image features into sub-aperture image features is represented.
9. The method for super-resolution reconstruction of a light field image in a mixed self-attention and state space according to claim 8, wherein the deep feature extraction module is constructed based on a spatial feature Mamba extraction sub-module, an angular feature Mamba extraction sub-module and a polar plane feature transducer extraction sub-module, comprising the steps of: Cascading the spatial feature Mamba extraction submodule, the angular feature Mamba extraction submodule and the polar plane feature transducer extraction submodule, and connecting residual errors to obtain a light field feature Tra-Mba extraction submodule, wherein the light field feature Tra-Mba extraction submodule is expressed as The calculation process is as follows: ; Wherein, the And Respectively representing input features and output features of the sub-module extracted by the light field features Tra-Mba; The light field deep layer characteristic extraction module is constructed by cascading and residual connecting a plurality of light field characteristic Tra-Mba extraction submodules and is expressed as The calculation process is as follows: ; ; Wherein, the The input is represented by a representation of the input, Representing the function of the light field deep feature extraction module, Representing deep features; The features of the shallow layer are represented by, Representing the operation of the extraction sub-module through six light field features Tra-Mba.
10. The method of claim 1, wherein the image reconstruction module obtains a high-resolution light field image based on the shallow features, the deep features, and the input low-resolution light field image, comprising the steps of: converting the shallow features and the deep features into light field fusion features The calculation process is as follows: ; ; Wherein, the And Respectively representing a shallow feature and a deep feature, Representing channel dimension splicing operation; Representing a fusion operation; Fusion of features to light fields Upsampling to obtain upsampled features The expression is as follows: ; Wherein, the Representing scale factors as Or (b) Is a sub-pixel convolution layer of (a); Representation of A convolution layer; combining upsampling And high resolution images Reconstructing a high resolution image, expressed as: ; Wherein the high resolution image Is generated by direct bicubic upsampling of the low resolution light field image to be reconstructed.

Description

Light field image super-resolution reconstruction method combining self-attention and state space Technical Field The invention relates to the technical field of image super-resolution, in particular to a light field image super-resolution reconstruction method mixed with self-attention and a state space. Background Due to the inherent space-angle tradeoff characteristic of light field imaging, when a camera acquires multi-view information, the camera has to reduce the spatial resolution of each Sub-Aperture Image (SAI), which greatly restricts the further expansion of the light field Image in high-precision vision application. Therefore, the Super-Resolution (LFSR) technology of the light field image becomes a key solution, and the key purpose of the LFSR technology is to improve the spatial Resolution of the light field image with low Resolution, fully mine the complementary information of the space angle in the light field data, and provide high-quality data support for subsequent visual analysis and advanced processing tasks. Deep mining of information in complex four-dimensional light fields (LIGHT FIELD, LF) is critical to the development of the LIGHT FIELD Super-Resolution (LFSR) technology for driving light field images. Compared with the common image super-resolution technology, the light field image super-resolution technology has the difficulty of fully utilizing the inherent space and angle information in the light field image and improving the reconstruction precision. In recent years, in the LFSR field, both convolutional neural network (Convolutional Neural Network, CNN) based techniques and transform based techniques have achieved a certain result, and these techniques generally decompose a four-dimensional light field into two-dimensional subspaces for processing, which limits feature learning to specific aspects such as parallax cues or spatial angle correlation, and significantly improves the spatial resolution of the light field image compared with the conventional method. However, the LFSR method based on CNN can only process local null angle features of the light field image, and the LFSR method based on the transducer can explore non-local null angle features of the light field image, but the intrinsic structural characteristics of the light field image cannot be explored efficiently and comprehensively by the transducer alone. The state space model is used as an emerging sequence modeling architecture, can realize a real global receptive field under the linear computation complexity, and simultaneously has the effects of effectively reserving local information and achieving characteristic expression efficiency, so that the dilemma of explosion of the global dependence weakness of CNN and the computation complexity of a transducer and local information splitting is perfectly overcome. Therefore, a hybrid architecture of a transducer and a state space model is tried to efficiently capture the long-range dependency relationships inherent in the light field image 4D structure, and simultaneously realize the deep interaction of the spatial information and the angular information. The Chinese patent application with the publication number of CN119809940A discloses a super-resolution reconstruction method of a light field image based on a state model, by utilizing the dynamic characteristics of a state space model, the global empty angle information characterization capability and the detail reconstruction capability of the super-resolution method of the light field image are obviously improved, but the method has the following defects that firstly, the method only relies on a single state space model to uniformly model the space, the angle and the polar plane characteristics of the light field image, the special modeling capability aiming at the polar plane parallax characteristics is lacked, the state space model is used as a sequence modeling framework, the global parallax correlation in the polar plane image is difficult to fully capture, secondly, a deep layer characteristic extraction unit of the method directly sends input characteristics into a selective state space model to process, the characteristics are not subjected to multi-branch refined decomposition, the extraction granularity of the global characteristics and the local characteristics is not fine enough, and the effective difference information among the characteristics cannot be fully excavated. Disclosure of Invention The object of the present invention is to solve the problems of the prior art. The technical scheme adopted by the invention for solving the technical problems is that a light field image super-resolution reconstruction method mixed with self-attention and a state space is provided, and comprises the following steps: A shallow feature extraction module is constructed by adopting convolution operation, and shallow features are extracted from an input low-resolution light field image; constructing a first bas