CN-121981888-A - Infrared image super-resolution reconstruction method, system and medium based on cascade perception and mixed attention


Abstract

Embodiments of the invention disclose an infrared image super-resolution reconstruction method, system and medium based on cascade perception and mixed attention. The method first acquires a low-resolution infrared image and extracts shallow features from it, which serve as the input features. Feature extraction is then performed on the input features by a multi-head self-attention mechanism and by a channel attention mechanism, yielding output features and weighted features, respectively. The output features and the weighted features are spliced (concatenated) along the channel dimension to obtain spliced features. Local neighborhood features are extracted from the spliced features and combined with them to obtain deep features, and the deep features are spliced with the input features along the channel dimension to obtain fusion features. Finally, the channel number of the fusion features is expanded by a convolution operation, the result is up-sampled by sub-pixel convolution, and a smoothing convolution is applied to obtain the reconstructed high-resolution infrared image. Through the synergy of mixed attention and cascade perception, the method accounts for both global information and local details, which effectively improves the accuracy of infrared image reconstruction.
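The mixed-attention step summarized above can be sketched in plain Python as follows. This is a minimal, non-authoritative illustration rather than the patented implementation. It assumes the feature map is flattened into N tokens of C channels, uses random placeholder weights instead of trained parameters, and introduces hypothetical helper names (`matmul`, `multi_head_self_attention`, `channel_attention`, `mixed_attention`) and a hypothetical reduction ratio `r`. The head split, scaled dot-product attention and squeeze-and-excitation style channel gating follow the steps listed in claim 6.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (both stored as lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_self_attention(x, wq, wk, wv, wo, heads):
    """x: N tokens x C channels; wq, wk, wv, wo: C x C weight matrices (placeholders)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(x[0]) // heads  # channels per subspace (head)
    out = [[] for _ in x]
    for h in range(heads):
        sl = slice(h * d, (h + 1) * d)  # this head's channel subspace
        qh, kh, vh = [r[sl] for r in q], [r[sl] for r in k], [r[sl] for r in v]
        scores = matmul(qh, [list(c) for c in zip(*kh)])  # N x N, i.e. Q_h K_h^T
        attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
        for i, row in enumerate(matmul(attn, vh)):
            out[i].extend(row)  # concatenate head outputs along the channel axis
    return matmul(out, wo)  # final linear transformation


def channel_attention(x, w_down, w_up):
    """SE-style channel attention on N x C tokens.

    w_down: (C/r) x C, w_up: C x (C/r). On the 1 x 1 pooled map a 1x1
    convolution reduces to this matrix-vector product.
    """
    n = len(x)
    desc = [sum(col) / n for col in zip(*x)]  # global average pooling
    hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w_down]  # reduce + ReLU
    gate = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w_up]  # expand + Sigmoid
    return [[v * g for v, g in zip(tok, gate)] for tok in x]  # per-channel reweighting


def mixed_attention(x, attn_weights, ca_weights, heads):
    """Splice MHSA output and channel-weighted features per token (C -> 2C channels)."""
    out = multi_head_self_attention(x, *attn_weights, heads=heads)
    weighted = channel_attention(x, *ca_weights)
    return [a + b for a, b in zip(out, weighted)]


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    n, c, heads, r = 16, 8, 2, 2  # a 4x4 map flattened to 16 tokens (illustrative sizes)
    x = rand_matrix(n, c)
    attn_w = [rand_matrix(c, c) for _ in range(4)]  # wq, wk, wv, wo
    ca_w = (rand_matrix(c // r, c), rand_matrix(c, c // r))
    spliced = mixed_attention(x, attn_w, ca_w, heads)
    print(len(spliced), len(spliced[0]))  # 16 16
```

A real model would use a tensor library with automatic differentiation; plain lists are used here only to keep the sketch dependency-free and easy to trace.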

Inventors

ZHOU HUIXIN
YAN XUN
LI ZHIBIN
LI QIANYOU
WANG BINGJIAN
LAI RUI
QIN HANLIN
SUN RUIYANG
XIANG PEI
YANG TIANFANG
SONG JIANGLUQI
QI SHUXIA
LI HUAN
SHI JIN
TENG XIANG

Assignees

Xidian University (西安电子科技大学)
China University of Petroleum (Beijing), Karamay Campus (中国石油大学(北京)克拉玛依校区)

Dates

Publication Date: 20260505
Application Date: 20251230

Claims (10)

1. An infrared image super-resolution reconstruction method based on cascade perception and mixed attention, characterized by comprising the following steps: acquiring a low-resolution infrared image; extracting features of the low-resolution infrared image to obtain shallow features, the shallow features being used as input features; performing feature extraction on the input features based on a multi-head self-attention mechanism and a channel attention mechanism, respectively, to obtain output features and weighted features, splicing the output features and the weighted features along the channel dimension to obtain spliced features, extracting local neighborhood features of the spliced features, acquiring deep features based on the local neighborhood features and the spliced features, and splicing the deep features and the input features along the channel dimension to obtain fusion features; and expanding the channel number of the fusion features through a convolution operation, up-sampling the channel-expanded fusion features by sub-pixel convolution, and performing smoothing convolution on the up-sampled fusion features to obtain a reconstructed high-resolution infrared image.
2. The infrared image super-resolution reconstruction method based on cascade perception and mixed attention according to claim 1, further comprising: constructing a composite loss function containing a pixel loss, a structural loss and a total variation loss based on the reconstructed high-resolution infrared image and a preset high-resolution reference image, and updating the parameters of the shallow feature extraction module, the plurality of multi-branch cascade perception modules and the up-sampling reconstruction module through the composite loss function.
3. The infrared image super-resolution reconstruction method based on cascade perception and mixed attention according to claim 2, wherein constructing the composite loss function comprising the pixel loss function, the structural loss function and the total variation loss function based on the reconstructed high-resolution infrared image and the preset high-resolution reference image specifically comprises: determining a pixel loss function according to the L1 distance between the reconstructed high-resolution infrared image and the preset high-resolution reference image; extracting horizontal edge information and vertical edge information from the reconstructed high-resolution infrared image and from the preset high-resolution reference image, respectively; determining a reconstructed edge magnitude map and a true edge magnitude map according to the horizontal edge information and the vertical edge information; determining a structural loss function according to the L1 distance between the reconstructed edge magnitude map and the true edge magnitude map; determining a total variation loss function according to the pixel values of the reconstructed high-resolution infrared image; and constructing the composite loss function from the pixel loss function, the structural loss function and the total variation loss function.
4. The infrared image super-resolution reconstruction method based on cascade perception and mixed attention according to claim 3, wherein determining the structural loss function according to the L1 distance between the reconstructed edge magnitude map and the true edge magnitude map specifically comprises: determining the structural loss function according to
$$\mathcal{L}_{\mathrm{struct}} = \frac{1}{WH}\sum_{i=1}^{H}\sum_{j=1}^{W}\left|E(i,j) - E'(i,j)\right|,$$
wherein $\mathcal{L}_{\mathrm{struct}}$ is the structural loss, E' is the true edge magnitude map of the preset high-resolution reference image, E is the reconstructed edge magnitude map of the reconstructed high-resolution infrared image, (i, j) indexes row i and column j, W is the width of the reconstructed high-resolution infrared image, and H is its height.
5. The infrared image super-resolution reconstruction method based on cascade perception and mixed attention according to claim 3, wherein determining the total variation loss function according to the pixel values of the reconstructed high-resolution infrared image specifically comprises: determining the total variation loss function according to
$$\mathcal{L}_{\mathrm{TV}} = \frac{1}{WH}\sum_{i=1}^{H-1}\sum_{j=1}^{W-1}\left(\left|I_{i+1,j} - I_{i,j}\right| + \left|I_{i,j+1} - I_{i,j}\right|\right),$$
wherein $\mathcal{L}_{\mathrm{TV}}$ is the total variation loss, W is the width of the reconstructed high-resolution infrared image, H is its height, $I_{i,j}$ is the pixel value of the pixel at row i and column j of the reconstructed high-resolution infrared image, $I_{i+1,j}$ is the pixel value of the pixel adjacent to it in the vertical direction, and $I_{i,j+1}$ is the pixel value of the pixel adjacent to it in the horizontal direction.
6. The infrared image super-resolution reconstruction method based on cascade perception and mixed attention according to claim 1, wherein performing feature extraction on the input features based on the multi-head self-attention mechanism and the channel attention mechanism, respectively, to obtain the output features and the weighted features, splicing the output features and the weighted features along the channel dimension to obtain the spliced features, extracting the local neighborhood features of the spliced features, acquiring the deep features based on the local neighborhood features and the spliced features, and splicing the deep features and the input features along the channel dimension to obtain the fusion features specifically comprises: multiplying the input features by preset learnable weight matrices, respectively, to obtain a query matrix, a key matrix and a value matrix, dividing the query, key and value matrices into a plurality of subspaces along the channel dimension to obtain subspace features, computing scaled dot-product attention from the subspace features within each subspace, concatenating the scaled dot-product attention outputs of all subspaces into one matrix along the channel dimension, and obtaining the output features of the multi-head attention branch through a linear transformation; performing global average pooling on the input features to obtain a channel descriptor vector, reducing the channel number of the channel descriptor vector through a 1×1 convolution, applying a ReLU activation function and then restoring the channel number through a 1×1 convolution, applying a Sigmoid activation function to the restored channel descriptor vector to generate a channel weight vector, and multiplying the channel weight vector element-wise with the input features along the channel dimension to obtain the weighted features;
splicing the output features and the weighted features along the channel dimension to form the spliced features, extracting the local neighborhood features of the spliced features, acquiring multi-scale integrated features based on the local neighborhood features and the spliced features, and splicing the multi-scale integrated features and the input features along the channel dimension to obtain the deep features; and splicing the deep features and the input features along the channel dimension to obtain the fusion features.
7. The infrared image super-resolution reconstruction method based on cascade perception and mixed attention according to claim 6, wherein splicing the output features and the weighted features along the channel dimension into the spliced features, extracting the local neighborhood features of the spliced features, acquiring the multi-scale integrated features based on the local neighborhood features and the spliced features, and splicing the multi-scale integrated features and the input features along the channel dimension to obtain the deep features comprises the following steps: extracting the local neighborhood features of the spliced features through a 3×3 convolution; reducing the dimension of the local neighborhood features through a 1×1 convolution and then applying a GELU activation function to obtain intermediate features; raising the dimension of the intermediate features through a 1×1 convolution, adding the result to the spliced features as a residual connection, and then applying a 3×3 convolution and a GELU activation function again to output the multi-scale integrated features; and splicing the multi-scale integrated features and the input features along the channel dimension to obtain the deep features.
8. An infrared image super-resolution reconstruction system based on cascade perception and mixed attention, the system comprising: a shallow feature extraction module, configured to extract features of an input low-resolution infrared image to obtain shallow features, the shallow features being used as input features; a multi-branch cascade perception module, configured to perform feature extraction on the input features based on a multi-head self-attention mechanism and a channel attention mechanism, respectively, to obtain output features and weighted features, splice the output features and the weighted features along the channel dimension to obtain spliced features, extract local neighborhood features of the spliced features, acquire deep features based on the local neighborhood features and the spliced features, and splice the deep features and the input features along the channel dimension to obtain fusion features; and an up-sampling reconstruction module, configured to expand the channel number of the fusion features through a convolution operation, up-sample the channel-expanded fusion features by sub-pixel convolution, and perform smoothing convolution on the up-sampled fusion features to obtain a reconstructed high-resolution infrared image.
9. The infrared image super-resolution reconstruction system based on cascade perception and mixed attention according to claim 8, the system further comprising: a training module, configured to construct a composite loss function comprising a pixel loss function, a structural loss function and a total variation loss function based on the reconstructed high-resolution infrared image and a preset high-resolution reference image, and to update the parameters of the shallow feature extraction module, the plurality of multi-branch cascade perception modules and the up-sampling reconstruction module through the composite loss function.
10. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method of any one of claims 1 to 7.
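The loss terms of claims 3 to 5 can be sketched as follows for single-channel images stored as lists of rows. Several details are assumptions not stated in the source: the Sobel operator is used to extract the horizontal and vertical edge information, the edge magnitude is taken as sqrt(Gx² + Gy²) with zero padding, the total variation term uses absolute neighbor differences normalized by W×H (one plausible reading of claim 5), and the weights `lam_pix`, `lam_struct` and `lam_tv` are hypothetical placeholders.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient (responds to vertical edges)
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient (responds to horizontal edges)


def pixel_loss(sr, hr):
    """Mean absolute (L1) difference between two H x W images, divided by W*H."""
    h, w = len(sr), len(sr[0])
    return sum(abs(a - b) for ra, rb in zip(sr, hr) for a, b in zip(ra, rb)) / (w * h)


def edge_magnitude(img):
    """Sobel gradient magnitude with zero padding; output has the same size as img."""
    h, w = len(img), len(img[0])

    def px(i, j):
        return img[i][j] if 0 <= i < h and 0 <= j < w else 0.0

    mag = []
    for i in range(h):
        row = []
        for j in range(w):
            gx = sum(SOBEL_X[a][b] * px(i + a - 1, j + b - 1) for a in range(3) for b in range(3))
            gy = sum(SOBEL_Y[a][b] * px(i + a - 1, j + b - 1) for a in range(3) for b in range(3))
            row.append(math.hypot(gx, gy))
        mag.append(row)
    return mag


def structure_loss(sr, hr):
    """L1 distance between reconstructed and reference edge magnitude maps, divided by W*H."""
    return pixel_loss(edge_magnitude(sr), edge_magnitude(hr))


def tv_loss(sr):
    """Anisotropic total variation: absolute vertical + horizontal neighbor differences."""
    h, w = len(sr), len(sr[0])
    total = 0.0
    for i in range(h - 1):
        for j in range(w - 1):
            total += abs(sr[i + 1][j] - sr[i][j]) + abs(sr[i][j + 1] - sr[i][j])
    return total / (w * h)


def composite_loss(sr, hr, lam_pix=1.0, lam_struct=0.1, lam_tv=0.01):
    """Weighted sum of the three terms; the default weights are placeholders, not from the source."""
    return lam_pix * pixel_loss(sr, hr) + lam_struct * structure_loss(sr, hr) + lam_tv * tv_loss(sr)


if __name__ == "__main__":
    sr = [[0.0, 0.0, 1.0] for _ in range(3)]  # reconstructed image (toy values)
    hr = [[0.0, 1.0, 1.0] for _ in range(3)]  # reference image (toy values)
    print(pixel_loss(sr, hr), structure_loss(sr, hr), tv_loss(sr), composite_loss(sr, hr))
```

In training, these values would be computed on tensors so that gradients can flow back to the network parameters; the list version only shows what each term measures.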

Description

Technical Field

The invention relates to the technical field of computer vision and image processing, and in particular to an infrared image super-resolution reconstruction method, system and medium based on cascade perception and mixed attention.

Background

Infrared imaging technology has important application value in fields such as night surveillance, security inspection, medical imaging and aerospace, and enables target detection and recognition in low-light or no-light environments. However, because of the limited hardware resolution of infrared sensors, together with the limited imaging capability of the optical system and environmental interference, acquired infrared images generally have low resolution and blurred details, and noise is particularly pronounced in weakly radiating regions. As a result, the edge contours of the image are not sharp enough, which degrades the accuracy of subsequent target recognition, detection and measurement.

Existing infrared image super-resolution methods mainly include interpolation methods, sparse representation methods and deep-learning-based methods. Interpolation methods such as bicubic and Lanczos interpolation are simple to implement, but they tend to introduce blur and artifacts after magnification and have difficulty recovering high-frequency details. Sparse representation methods rely on dictionary learning and can reconstruct local texture to some extent under small-sample conditions, but they perform poorly in terms of global consistency. In recent years, deep learning has made remarkable progress in image super-resolution, and end-to-end feature learning can markedly improve the reconstruction quality.
However, deep-learning super-resolution methods for infrared images often lack targeted optimization; in particular, they are weak at noise suppression and structural detail enhancement, so the edges of the reconstructed results are not sharp enough and texture information is easily lost.

Disclosure of Invention

In view of the above, it is necessary to provide an infrared image super-resolution reconstruction method, system and medium based on cascade perception and mixed attention.

An infrared image super-resolution reconstruction method based on cascade perception and mixed attention comprises the following steps.

A low-resolution infrared image is acquired.

Features are extracted from the low-resolution infrared image to obtain shallow features, which are used as input features.

Feature extraction is performed on the input features based on a multi-head self-attention mechanism and a channel attention mechanism, respectively, to obtain output features and weighted features; the output features and the weighted features are spliced along the channel dimension to obtain spliced features; local neighborhood features of the spliced features are extracted; deep features are acquired based on the local neighborhood features and the spliced features; and the deep features and the input features are spliced along the channel dimension to obtain fusion features.

The channel number of the fusion features is expanded through a convolution operation, the channel-expanded fusion features are up-sampled by sub-pixel convolution, and smoothing convolution is performed on the up-sampled fusion features to obtain a reconstructed high-resolution infrared image.
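The local neighborhood branch detailed in claim 7 and the up-sampling reconstruction step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the channel widths, the 3×3 kernels chosen for the channel-expanding and smoothing convolutions, the single-channel output, and all function names are hypothetical, and the weights are random rather than trained. `pixel_shuffle` follows the common sub-pixel convolution layout, in which input channel c·r² + a·r + b fills position (a, b) of each r×r output block.

```python
import math
import random


def conv2d(x, w):
    """Zero-padded 'same' convolution without bias (cross-correlation, as in DL frameworks).

    x: Cin x H x W feature map; w: Cout x Cin x k x k kernel with odd k.
    """
    cin, h, wd = len(x), len(x[0]), len(x[0][0])
    k = len(w[0][0])
    p = k // 2
    out = []
    for kernel in w:
        plane = [[0.0] * wd for _ in range(h)]
        for i in range(h):
            for j in range(wd):
                s = 0.0
                for c in range(cin):
                    for a in range(k):
                        for b in range(k):
                            ii, jj = i + a - p, j + b - p
                            if 0 <= ii < h and 0 <= jj < wd:
                                s += kernel[c][a][b] * x[c][ii][jj]
                plane[i][j] = s
        out.append(plane)
    return out


def gelu(x):
    return [[[0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in row] for row in ch] for ch in x]


def add(x, y):
    return [[[a + b for a, b in zip(ra, rb)] for ra, rb in zip(ca, cb)] for ca, cb in zip(x, y)]


def cascade_branch(spliced, inp, w3a, w1_down, w1_up, w3b):
    """Claim-7-style branch; list '+' splices feature maps along the channel axis."""
    local = conv2d(spliced, w3a)            # 3x3 conv: local neighborhood features
    mid = gelu(conv2d(local, w1_down))      # 1x1 reduce + GELU: intermediate features
    res = add(conv2d(mid, w1_up), spliced)  # 1x1 expand + residual addition
    multi_scale = gelu(conv2d(res, w3b))    # 3x3 conv + GELU: multi-scale integrated features
    return multi_scale + inp                # deep features


def pixel_shuffle(x, r):
    """Rearrange (C*r*r) x H x W into C x (r*H) x (r*W) (sub-pixel up-sampling)."""
    c, h, w = len(x) // (r * r), len(x[0]), len(x[0][0])
    return [[[x[ch * r * r + (i % r) * r + (j % r)][i // r][j // r]
              for j in range(w * r)] for i in range(h * r)] for ch in range(c)]


def reconstruct(fusion, w_expand, w_smooth, r):
    """Expand channels, sub-pixel up-sample, then smooth."""
    return conv2d(pixel_shuffle(conv2d(fusion, w_expand), r), w_smooth)


def rand_kernel(cout, cin, k):
    return [[[[random.gauss(0.0, 0.1) for _ in range(k)] for _ in range(k)]
             for _ in range(cin)] for _ in range(cout)]


if __name__ == "__main__":
    random.seed(0)
    c, h, w, r = 2, 3, 3, 2  # illustrative sizes
    inp = [[[random.random() for _ in range(w)] for _ in range(h)] for _ in range(c)]
    spliced = inp + inp  # stand-in for the 2C-channel attention output
    deep = cascade_branch(spliced, inp, rand_kernel(2 * c, 2 * c, 3), rand_kernel(c, 2 * c, 1),
                          rand_kernel(2 * c, c, 1), rand_kernel(c, 2 * c, 3))
    fusion = deep + inp  # deep features spliced with the input features
    sr = reconstruct(fusion, rand_kernel(r * r, len(fusion), 3), rand_kernel(1, 1, 3), r)
    print(len(sr), len(sr[0]), len(sr[0][0]))  # 1 6 6
```

Writing the convolution as explicit loops makes the zero padding and the 1×1 special case easy to check by hand, at the cost of speed.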
The method further comprises: constructing a composite loss function containing a pixel loss, a structural loss and a total variation loss based on the reconstructed high-resolution infrared image and a preset high-resolution reference image, and updating the parameters of the shallow feature extraction module, the plurality of multi-branch cascade perception modules and the up-sampling reconstruction module through the composite loss function. Constructing the composite loss function comprising the pixel loss function, the structural loss function and the total variation loss function based on the reconstructed high-resolution infrared image and the preset high-resolution reference image specifically comprises the following steps. The pixel loss function is determined according to the L1 distance between the reconstructed high-resolution infrared image and the preset high-resolution reference image. Horizontal edge information and vertical edge information are extracted from the reconstructed high-resolution infrared image and from the preset high-resolution reference image, respectively. A reconstructed edge magnitude map and a true edge magnitude map are determined according to the horizontal edge information and the vertical edge information. The structural loss function is determined according to the L1 distance b