
CN-119399045-B - Global and local multiscale fused infrared-guided low-light image enhancement method

CN 119399045 B

Abstract

The invention discloses an infrared-guided low-light image enhancement method with global and local multiscale fusion, belonging to the technical field of computer vision. The method comprises the following steps: constructing an image enhancement network; constructing a data set and training the image enhancement network; and acquiring a low-light image and a near-infrared image of the same scene and inputting them into the trained image enhancement network to obtain an enhanced image. The method solves the problems of the prior art, namely complex computation, a single enhancement effect, failure to account for illumination enhancement, and difficulty in denoising.

Inventors

  • ZHAO MINGHUA
  • BAI XUEFEI
  • DU SHUANGLI
  • SHI CHENG
  • HU JING
  • WANG LIN
  • LV ZHIYONG

Assignees

  • Xi'an University of Technology (西安理工大学)

Dates

Publication Date
2026-05-08
Application Date
2024-10-21

Claims (4)

  1. The infrared-guided low-light image enhancement method based on global and local multiscale fusion, characterized by comprising the following steps (code sketches of the modules named here follow the claims):
     S1, constructing an image enhancement network. The image enhancement network comprises two 3×3 convolution layers, each connected to an encoder. Each encoder comprises 4 sequentially connected modules containing 2, 2, 4 and 8 multi-stage residual attention feature extraction modules, respectively. A global texture attention fusion module and a local texture attention fusion module are connected between each pair of corresponding modules of the two encoders. One encoder is further connected to an intermediate layer comprising 6 multi-stage residual attention feature extraction modules, and the intermediate layer is connected to a decoder. The decoder comprises 4 sequentially connected modules containing 2, 2, 2 and 2 multi-stage residual attention feature extraction modules, respectively. The global and local texture attention fusion modules between the first corresponding modules of the two encoders are connected to the fourth module of the decoder, those between the second corresponding modules to the third module of the decoder, those between the third corresponding modules to the second module of the decoder, and those between the fourth corresponding modules to the first module of the decoder.
     The multi-stage residual attention feature extraction module comprises 2 sequentially connected sub-modules with skip connections. The first sub-module comprises, in sequence, a convolution module (formed by a normalization layer, a 1×1 convolution layer and a 3×3 depthwise convolution layer), a gating module, a residual channel attention block RCAB and a 1×1 convolution layer. The second sub-module comprises, in sequence, a normalization layer, a 1×1 convolution layer, a gating module and a 1×1 convolution layer. The RCAB comprises, in sequence, a 1×1 convolution layer, a PReLU activation function, a 1×1 convolution layer and a channel attention CA.
     The global texture attention fusion module comprises two depthwise separable convolution layers, a local feature attention module, a global feature attention module and a Sigmoid activation function. Each depthwise separable convolution layer consists of a 3×3 grouped convolution whose number of groups equals the channel count C, followed by a 1×1 convolution. The local feature attention module comprises 2 convolution blocks with a ReLU activation function between them, each convolution block consisting of a 1×1 convolution layer and a BN layer. The global feature attention module consists of two 1×1 convolution layers and three 3×3 depthwise separable convolution layers.
     S2, constructing a data set and training the image enhancement network.
     S3, acquiring a low-light image and a near-infrared image of the same scene and inputting them into the trained image enhancement network to obtain an enhanced image, specifically:
     S3.1, acquiring a low-light image and a near-infrared image of the same scene, and inputting the low-light image into the 3×3 convolution layer on the path whose encoder is connected to the intermediate layer for preliminary feature extraction, obtaining low-light shallow features (the near-infrared image likewise passes through the other 3×3 convolution layer, giving near-infrared shallow features);
     S3.2, inputting the low-light shallow features and the near-infrared shallow features into their respective encoders and passing them sequentially through the 4 encoder modules; 3 downsampling operations yield low-light deep structural features and near-infrared deep structural features at 4 different scales;
     S3.3, downsampling the low-light deep structural features obtained after the last downsampling once more and inputting them into the intermediate layer, which retains the original visible-light information;
     S3.4, inputting the corresponding low-light and near-infrared deep structural features at each of the 4 scales into the corresponding global and local texture attention fusion modules to obtain bimodal fusion features at 4 different scales; the specific fusion process at each scale is as follows:
     S3.4.1, inputting the low-light and near-infrared deep structural features F_vis and F_nir at the current scale into the depthwise separable convolution layers to reduce the modal difference between the two images, and adding the outputs to obtain the texture fusion feature X_a = DSC(F_vis) + DSC(F_nir), where DSC(·) denotes the depthwise separable convolution operation, i.e. a 3×3 grouped convolution with the number of groups equal to the channel count C followed by a 1×1 convolution;
     S3.4.2, inputting X_a into the local feature attention module, where it passes through one convolution block ConvBlock, one ReLU activation function and a second convolution block, outputting the local attention texture fusion feature X_L = ConvBlock(ReLU(ConvBlock(X_a))), where each ConvBlock is a 1×1 convolution followed by a BN layer;
     S3.4.3, inputting X_a into the global feature attention module, where it generates the three tensors Q, K and V through 3×3 depthwise convolutions dconv, passes through a global self-attention mechanism over Q, K and V, and yields the global attention texture fusion feature X_g through a 1×1 convolution layer. In the global self-attention mechanism, Q of size H×W×C is reshaped to HW×C and K of size H×W×C to C×HW; the self-attention map A of size C×C is computed from the interaction of K and Q as A = Softmax(α·(K·Q)), where α is a learnable scaling parameter controlling the magnitude of the product of K and Q; A interacts with V of size HW×C to give out = V·A of size HW×C, which is reshaped back to H×W×C;
     S3.4.4, adding the global and local texture attention branch features to obtain the fusion feature X_lg = X_L + X_g, generating the global and local texture attention weight W = Sigmoid(X_lg), and using W together with learnable parameters to redistribute the weights of the texture attention features F_vis and F_nir of the low-light and near-infrared modalities, obtaining the feature based on global and local texture attention fusion;
     S3.5, inputting the bimodal fusion features at the 4 scales into the corresponding decoder modules, adding them to the original visible-light information retained by the intermediate layer, and obtaining enhanced features through upsampling;
     S3.6, inputting the enhanced features into a 3×3 convolution layer to obtain an enhancement feature map, and adding it to the original low-light image through a residual connection to obtain the enhanced image.
  2. The infrared-guided low-light image enhancement method with global and local multiscale fusion according to claim 1, characterized in that the data set in S2 is constructed as follows (see the synthesis sketch after the claims): selecting image pairs of normal-illumination images and daytime near-infrared images from the daytime scenes of the FMSVD data set and setting them to a uniform size; generating pseudo-night near-infrared images from the daytime near-infrared images by a reconstruction method to serve as the input near-infrared images; adding Gaussian noise to the normal-illumination images and reducing their pixel brightness to generate pseudo-night low-light images to serve as the input low-light images; dividing the image pairs into a training set, a validation set and a test set in the ratio 8:1:1; and additionally using the images of the THIRDPARTY data set scenes and the reference-free real night-scene images of the FMSVD data set for testing.
  3. The infrared-guided low-light image enhancement method with global and local multiscale fusion according to claim 1, characterized in that the loss functions used when training the image enhancement network comprise a reconstruction loss L_rec, a multiscale structural similarity loss L_ssim and a color loss L_color (a loss sketch follows the claims). The reconstruction loss is L_rec = (1/N) Σ_i |Î_i − I_i|, where N is the number of samples, Î_i is the enhanced image, I_i is the normal-illumination image, and |·| denotes the absolute value. The multiscale structural similarity loss is L_ssim = 1 − Π_{j=1..M} (l_j · c_j · s_j)^{w_j}, where M is the scale parameter, w_j is the weight at the j-th scale, and l_j, c_j and s_j are the luminance similarity, contrast similarity and structural similarity between the enhanced image and the normal-illumination image at the j-th scale. The color loss L_color is computed from the cosine similarity of the enhanced image and the normal-illumination image over the R, G and B channels, with cos(a, b) = Σ_{k=1..K} a_k·b_k / (sqrt(Σ_{k=1..K} a_k²) · sqrt(Σ_{k=1..K} b_k²)), where a and b are the elements subjected to the cosine similarity calculation, K is the number of elements of a and b, and k indexes the k-th element.
  4. The infrared-guided low-light image enhancement method with global and local multiscale fusion according to claim 1, characterized in that the low-light shallow features and the near-infrared shallow features sequentially pass through the 4 modules of the corresponding encoder, the 4 sequentially connected modules respectively comprising 2, 2, 4 and 8 multi-stage residual attention feature extraction modules, and in that a multi-stage residual attention feature extraction module processes a shallow feature as follows (mirrored by the MRAB sketch after the claims):
     S3.2.1, the input shallow feature x passes through the first normalization layer, the first 1×1 convolution layer and the 3×3 depthwise convolution layer to obtain the local feature x_1 = dconv_3×3(conv_1×1(LN(x))), where conv_1×1 denotes a 1×1 convolution operation, dconv_3×3 a 3×3 depthwise convolution operation, and LN a linear normalization operation;
     S3.2.2, the first gating module decomposes the H×W×C feature into two features X and Y of size H×W×C/2 and multiplies them pixel by pixel, Gate(x_1) = X ⊙ Y, where ⊙ denotes pixel-wise multiplication, yielding a feature of size H×W×C/2; this feature x_2 = Gate(x_1) is input into the residual channel attention block RCAB to focus on the salient features, giving x_3 = RCAB(x_2), and after the second 1×1 convolution layer it is added to the original input shallow feature to obtain the structural feature y = conv_1×1(x_3) + x;
     S3.2.3, the structural feature y passes through the second normalization layer, the third 1×1 convolution layer, the second gating module and the fourth 1×1 convolution layer, and is added to y itself to obtain the deep structural feature z = conv_1×1(Gate(conv_1×1(LN(y)))) + y.
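The fusion of steps S3.4.1 through S3.4.4 can be made concrete in code. Below is a minimal PyTorch sketch of the global and local texture attention fusion module of claim 1: depthwise separable convolutions to align the two modalities, a 1×1 conv + BN local branch, and channel-wise self-attention (A of size C×C) for the global branch. The complementary weighting W·F_vis + (1−W)·F_nir and the learnable per-modality scales `lam` are assumptions; the claim only says that learnable parameters redistribute the modal weights.

```python
import torch
import torch.nn as nn

class DSConv(nn.Module):
    """Depthwise separable conv from claim 1: a 3x3 grouped conv with
    groups equal to the channel count C, followed by a 1x1 conv."""
    def __init__(self, c):
        super().__init__()
        self.dw = nn.Conv2d(c, c, 3, padding=1, groups=c)
        self.pw = nn.Conv2d(c, c, 1)
    def forward(self, x):
        return self.pw(self.dw(x))

class GlobalLocalTextureFusion(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.ds_vis, self.ds_nir = DSConv(c), DSConv(c)
        # local branch (S3.4.2): (1x1 conv + BN) -> ReLU -> (1x1 conv + BN)
        self.local = nn.Sequential(
            nn.Conv2d(c, c, 1), nn.BatchNorm2d(c), nn.ReLU(),
            nn.Conv2d(c, c, 1), nn.BatchNorm2d(c))
        # global branch (S3.4.3): Q, K, V from 3x3 depthwise convs (dconv)
        self.to_q = nn.Conv2d(c, c, 3, padding=1, groups=c)
        self.to_k = nn.Conv2d(c, c, 3, padding=1, groups=c)
        self.to_v = nn.Conv2d(c, c, 3, padding=1, groups=c)
        self.proj = nn.Conv2d(c, c, 1)
        self.alpha = nn.Parameter(torch.ones(1))   # learnable scale for K.Q
        self.lam = nn.Parameter(torch.ones(2))     # learnable modal weights (assumed)

    def forward(self, f_vis, f_nir):
        xa = self.ds_vis(f_vis) + self.ds_nir(f_nir)          # X_a (S3.4.1)
        xl = self.local(xa)                                   # X_L (S3.4.2)
        b, c, h, w = xa.shape
        q = self.to_q(xa).flatten(2).transpose(1, 2)          # B x HW x C
        k = self.to_k(xa).flatten(2)                          # B x C x HW
        v = self.to_v(xa).flatten(2).transpose(1, 2)          # B x HW x C
        attn = torch.softmax(self.alpha * (k @ q), dim=-1)    # A, B x C x C
        xg = (v @ attn).transpose(1, 2).reshape(b, c, h, w)   # out, back to B x C x H x W
        xg = self.proj(xg)                                    # X_g via 1x1 conv
        wmap = torch.sigmoid(xl + xg)                         # W = Sigmoid(X_lg) (S3.4.4)
        # complementary redistribution of the two modalities (assumed form)
        return self.lam[0] * wmap * f_vis + self.lam[1] * (1.0 - wmap) * f_nir
```

Attending over channels rather than pixels keeps the attention map at C×C, so the cost stays linear in the spatial resolution, which fits the claim's emphasis on avoiding complex computation.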
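Claim 2's pseudo-night degradation can be sketched in a few lines: darken the normal-illumination image and add Gaussian noise. The brightness gain and noise level below are assumed values (the claim does not state them), and the reconstruction method that converts daytime near-infrared images into pseudo-night ones is not specified, so the NIR frame is passed through unchanged here.

```python
import numpy as np

def make_pseudo_night_pair(rgb, nir, gain=0.2, sigma=0.05, rng=None):
    """Synthesize one training pair as in claim 2: reduce the pixel
    brightness of the normal-illumination image and add Gaussian noise
    to produce the pseudo-night low-light input. `gain` and `sigma`
    are assumptions; the pseudo-night NIR reconstruction is omitted."""
    rng = rng or np.random.default_rng(0)
    low = rgb.astype(np.float32) / 255.0 * gain          # reduce pixel brightness
    low = low + rng.normal(0.0, sigma, size=rgb.shape)   # add Gaussian noise
    low = np.clip(low, 0.0, 1.0)
    return (low * 255.0).astype(np.uint8), nir           # (input low-light, input NIR)
```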
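The three losses of claim 3 translate directly to PyTorch. This sketch uses the third-party `pytorch_msssim` package for MS-SSIM (an assumed dependency, standing in for the l/c/s product in the claim); turning the per-channel cosine similarity into a loss via 1 − mean(cos), and the equal weighting in `total_loss`, are assumptions the claim does not pin down.

```python
import torch
from pytorch_msssim import ms_ssim  # third-party MS-SSIM implementation (assumed dependency)

def rec_loss(enh, ref):
    """Reconstruction loss L_rec: mean absolute difference between the
    enhanced image and the normal-illumination reference."""
    return (enh - ref).abs().mean()

def msssim_loss(enh, ref):
    """Multiscale structural similarity loss L_ssim, taken here as 1 - MS-SSIM."""
    return 1.0 - ms_ssim(enh, ref, data_range=1.0)

def color_loss(enh, ref, eps=1e-8):
    """Color loss L_color from the per-channel cosine similarity of claim 3."""
    sims = []
    for ch in range(3):                       # R, G, B channels
        a = enh[:, ch].flatten(1)             # B x HW
        b = ref[:, ch].flatten(1)
        cos = (a * b).sum(1) / (a.norm(dim=1) * b.norm(dim=1) + eps)
        sims.append(cos.mean())
    return 1.0 - torch.stack(sims).mean()

def total_loss(enh, ref, w_ssim=1.0, w_color=1.0):
    """Combined objective; the claim does not state the weights,
    so equal weighting is an assumption."""
    return rec_loss(enh, ref) + w_ssim * msssim_loss(enh, ref) + w_color * color_loss(enh, ref)
```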
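Claim 4's block structure (normalize, 1×1 conv, 3×3 depthwise conv, gate, RCAB, 1×1 conv with a skip, then a second gated stage) can be sketched as follows. The claim's "3×3 deconvolution" is read here as the 3×3 depthwise convolution (dconv) named in step S3.4.3; the C→2C expansion before each gate (so the gate's halving returns to C channels), the GroupNorm stand-in for the normalization layer, and the channel-attention reduction ratio are assumptions.

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    """Gating module: split channels in half and multiply pixel by pixel."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b

class RCAB(nn.Module):
    """Residual channel attention block: 1x1 conv -> PReLU -> 1x1 conv -> CA."""
    def __init__(self, c, r=4):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(c, c, 1), nn.PReLU(), nn.Conv2d(c, c, 1))
        self.ca = nn.Sequential(               # channel attention CA (squeeze-excite style)
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c, c // r, 1), nn.ReLU(),
            nn.Conv2d(c // r, c, 1), nn.Sigmoid())
    def forward(self, x):
        y = self.body(x)
        return x + y * self.ca(y)

class MRAB(nn.Module):
    """One multi-stage residual attention feature extraction block (claim 4)."""
    def __init__(self, c):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, c)                 # first normalization layer
        self.conv1 = nn.Conv2d(c, 2 * c, 1)             # first 1x1 conv (expansion assumed)
        self.dconv = nn.Conv2d(2 * c, 2 * c, 3, padding=1, groups=2 * c)  # 3x3 depthwise
        self.gate1, self.rcab = SimpleGate(), RCAB(c)
        self.conv2 = nn.Conv2d(c, c, 1)                 # second 1x1 conv
        self.norm2 = nn.GroupNorm(1, c)                 # second normalization layer
        self.conv3 = nn.Conv2d(c, 2 * c, 1)             # third 1x1 conv (expansion assumed)
        self.gate2 = SimpleGate()
        self.conv4 = nn.Conv2d(c, c, 1)                 # fourth 1x1 conv
    def forward(self, x):
        # S3.2.1 + S3.2.2: x1 = dconv(conv(LN(x))), gate, RCAB, 1x1 conv, skip to input
        y = x + self.conv2(self.rcab(self.gate1(self.dconv(self.conv1(self.norm1(x))))))
        # S3.2.3: z = conv(gate(conv(LN(y)))) + y
        return y + self.conv4(self.gate2(self.conv3(self.norm2(y))))
```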

Description

Global and local multiscale fused infrared-guided low-light image enhancement method

Technical Field

The invention belongs to the technical field of computer vision, and particularly relates to an infrared-guided low-light image enhancement method with global and local multiscale fusion.

Background

Low-light scenes are mainly scenes with insufficient external light, such as night scenes, backlit scenes and extremely dark scenes without a light source. Low-light images arise chiefly from insufficient brightness of the environment of the target object during shooting; factors such as the light-source position, color changes and the exposure settings of the acquisition device lead to images with low brightness, high noise and a lack of detail. The purpose of image enhancement is to improve the visual quality of low-light images and the perception of the viewer, so that the image content can be analyzed better; it has important research significance and application value in fields such as security monitoring, military applications and medical imaging.

Low-light image enhancement in real scenes must raise image brightness, denoise, and restore image texture details. Most existing low-light enhancement methods operate on a single low-light image: they amplify hidden noise while raising brightness, and during denoising textures are removed together with the noise, so critical information is lost. In real night low-light scenes, near-infrared light has a strong anti-interference capability, and the acquired night near-infrared images show high contrast and low noise; methods based on fusing and enhancing infrared and visible-light images, such as the DVN method, have therefore been proposed in recent years. Infrared-guided low-light image enhancement exploits the high contrast and low noise of the near-infrared image to solve the denoising problem, but the existing methods have the following issues: fusion is performed under normal illumination where the illumination change is not obvious; computation is complex and the effect is single; the texture structure is lost during feature fusion; and the weights of infrared and visible light are distributed unreasonably, so the enhanced image looks grayish.

Disclosure of the Invention

The invention aims to provide an infrared-guided low-light image enhancement method with global and local multiscale fusion, which solves the problems of complex computation, a single effect, failure to consider illumination enhancement and difficulty in denoising in existing image enhancement techniques.

The technical scheme adopted by the invention is a global and local multiscale fused infrared-guided low-light image enhancement method comprising the following steps: S1, constructing an image enhancement network; S2, constructing a data set and training the image enhancement network; and S3, acquiring a low-light image and a near-infrared image of the same scene and inputting them into the trained image enhancement network to obtain an enhanced image.
The invention is also characterized in that the image enhancement network comprises two 3×3 convolution layers, each connected to an encoder. Each encoder comprises 4 sequentially connected modules containing 2, 2, 4 and 8 multi-stage residual attention feature extraction modules, respectively. A global texture attention fusion module and a local texture attention fusion module are connected between each pair of corresponding modules of the two encoders. One encoder is further connected to an intermediate layer comprising 6 multi-stage residual attention feature extraction modules, and the intermediate layer is connected to a decoder. The decoder comprises 4 sequentially connected modules containing 2, 2, 2 and 2 multi-stage residual attention feature extraction modules, respectively. The global texture attention fusion module and the local texture attention fusion module between the first corresponding modules of the two encoders are connected to the fourth module of the decoder, those between the second corresponding modules are connected to the third module of the decoder, those between the third corresponding modules are connected to the second module of the decoder, and those between the fourth corresponding modules are connected to the first module of the decoder. A skeleton of this wiring is sketched below.
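The following PyTorch sketch wires the pieces together as a dual-encoder U-Net, reusing the MRAB and GlobalLocalTextureFusion sketches given after the claims. It is a skeleton under stated assumptions, not the patented network: the base width, the doubling of channels per scale, and the strided-conv / transposed-conv resampling operators are all choices the text does not specify.

```python
import torch
import torch.nn as nn

# Assumes the MRAB and GlobalLocalTextureFusion sketches after the claims
# are in scope. Widths and resampling operators are assumptions.

def stage(c, n):
    """A chain of n multi-stage residual attention blocks at width c."""
    return nn.Sequential(*[MRAB(c) for _ in range(n)])

class DualEncoderEnhancer(nn.Module):
    def __init__(self, c=32, enc=(2, 2, 4, 8), mid=6, dec=(2, 2, 2, 2)):
        super().__init__()
        cs = [c, 2 * c, 4 * c, 8 * c]                   # width per scale (assumed doubling)
        self.stem_vis = nn.Conv2d(3, c, 3, padding=1)   # 3x3 stem, low-light path
        self.stem_nir = nn.Conv2d(1, c, 3, padding=1)   # 3x3 stem, near-infrared path
        self.enc_vis = nn.ModuleList(stage(cs[i], enc[i]) for i in range(4))
        self.enc_nir = nn.ModuleList(stage(cs[i], enc[i]) for i in range(4))
        self.down_vis = nn.ModuleList(nn.Conv2d(cs[i], cs[i + 1], 2, 2) for i in range(3))
        self.down_nir = nn.ModuleList(nn.Conv2d(cs[i], cs[i + 1], 2, 2) for i in range(3))
        self.fuse = nn.ModuleList(GlobalLocalTextureFusion(cs[i]) for i in range(4))
        self.down_mid = nn.Conv2d(cs[3], cs[3], 2, 2)   # extra downsample into the middle
        self.mid = stage(cs[3], mid)                    # 6 blocks keep visible-light info
        self.up_mid = nn.ConvTranspose2d(cs[3], cs[3], 2, 2)
        self.dec = nn.ModuleList(stage(cs[3 - i], dec[i]) for i in range(4))
        self.up = nn.ModuleList(nn.ConvTranspose2d(cs[3 - i], cs[2 - i], 2, 2) for i in range(3))
        self.out = nn.Conv2d(c, 3, 3, padding=1)        # final 3x3 conv (S3.6)

    def forward(self, vis, nir):
        fv, fn = self.stem_vis(vis), self.stem_nir(nir) # shallow features (S3.1)
        fused = []
        for i in range(4):                              # encoders, 3 downsamplings (S3.2)
            fv, fn = self.enc_vis[i](fv), self.enc_nir[i](fn)
            fused.append(self.fuse[i](fv, fn))          # bimodal feature at scale i (S3.4)
            if i < 3:
                fv, fn = self.down_vis[i](fv), self.down_nir[i](fn)
        x = self.up_mid(self.mid(self.down_mid(fv)))    # middle layer (S3.3)
        for i in range(4):                              # deepest fused scale enters first
            x = self.dec[i](x + fused[3 - i])           # add fusion, then upsample (S3.5)
            if i < 3:
                x = self.up[i](x)
        return vis + self.out(x)                        # residual to the input (S3.6)
```

For example, `DualEncoderEnhancer()(torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256))` returns a 1×3×256×256 enhanced image; spatial sizes must be divisible by 16 because of the four downsampling steps.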