CN-115861754-B - Fusion method of infrared and visible light images under low-illumination condition

CN115861754B

Abstract

The invention discloses a method for fusing infrared and visible light images under low-illumination conditions. The method convolves the infrared and visible light heterogeneous source images separately, then uses a cascade of multi-layer convolutions and max-pooling layers to obtain a multi-level heterogeneous two-path feature map. Multi-head self-attention (MHSA) is applied to the deepest-level feature maps and multi-head cross attention (MHCA) to the remaining shallow feature maps; after a normalization operation, the results are fed into a further MHSA block to obtain an attention matrix, which is summed pixel by pixel, and the two feature paths are fused by concatenation. Several decoding convolution blocks and up-sampling operations then complete the fusion of the feature maps of the different levels. Through this dual-attention mechanism, the invention captures long-range global context in the infrared and visible light images, effectively fuses the heterogeneous image information, and provides strong support for target detection, scene reconstruction, and similar tasks in low-illumination scenes.

Inventors

  • ZHENG TONG
  • LI SHUNAN
  • YU ZHONGZHONG
  • FENG WENBIN

Assignees

  • Beijing Technology and Business University (北京工商大学)
  • CCTEG Shenyang Research Institute Co., Ltd. (中煤科工集团沈阳研究院有限公司)

Dates

Publication Date
2026-05-12
Application Date
2022-12-08

Claims (5)

  1. A method for fusing infrared and visible light images under a low-illumination condition, comprising the following steps: 1) performing convolution processing separately on the infrared and visible light heterogeneous images captured under low illumination, then obtaining a multi-level heterogeneous two-path feature map using a cascade of multi-layer convolutions and a max-pooling layer; 2) performing multi-head self-attention (MHSA) processing simultaneously on the deepest-level heterogeneous two-path feature maps to extract long-range feature information from them; 3) performing a normalization operation simultaneously on the heterogeneous two-path feature maps of the different levels processed in step 2), performing MHSA processing to obtain an attention matrix, summing pixel by pixel, and fusing the two-path feature maps by concatenation, specifically comprising: 3-1) applying a 1×1 convolution to the current input feature map to integrate the multi-channel features, and constraining the value range of each element of the feature map through normalization; 3-2) sequentially performing MHSA processing, a 1×1 convolution, and normalization, with the integrated attention matrix as the output, to extract the long-range and spatial dependencies in the two-path feature maps; 3-3) summing the input feature map and the processed feature map pixel by pixel and fusing the two-path feature maps by concatenation, so that the fusion preserves the dependency and spatial features contained in both input paths; 4) for the fused two-path feature map obtained in step 3), completing the fusion of the feature maps of the different levels using several decoding convolution blocks and up-sampling operations, finally obtaining a low-illumination infrared and visible light fused image.
  2. The method for fusing infrared and visible light images under low illumination as claimed in claim 1, wherein in step 1) one layer of 1×1 convolution and two layers of 3×3 convolution are applied separately to the infrared and visible light heterogeneous images; then, taking one max-pooling layer followed by two layers of 3×3 convolution as a unit, several such units are cascaded, and the feature maps extracted by each unit form the multi-level heterogeneous two-path feature map.
  3. The method for fusing infrared and visible light images under low illumination as claimed in claim 1, wherein in step 2) the multi-head self-attention MHSA processing performed simultaneously on the deepest-level heterogeneous two-path feature maps specifically comprises: a) position-encoding the input feature maps, expressing the tensors in two-dimensional matrix form, and capturing the absolute and relative information between objects in the feature maps; b) operating on the current feature matrix with three embedding matrices to obtain a query matrix Q, a key matrix K, and a value matrix V, respectively; c) computing the similarity of all elements of K and Q, feeding it into a softmax function, and taking the weighted average with the value matrix V to obtain an attention matrix that accounts for the interaction information between all of K and Q; d) restoring the attention matrix to tensor form to obtain the multi-head self-attention MHSA output.
  4. The method for fusing infrared and visible light images under low illumination as claimed in claim 1, wherein in step 2) the multi-head cross attention MHCA processing performed simultaneously on the remaining heterogeneous two-path shallow feature maps specifically comprises the steps of: e) computing the key matrix K and the query matrix Q of the current layer from the feature map Y of the upper layer, computing the value matrix V from the feature map S input to the current layer, and completing the computation of the attention matrix A; f) obtaining a tensor Z through 1×1 convolution, batch normalization, sigmoid activation, and up-sampling, so that Z matches the size of the feature map S input to the current layer; g) taking the dot product of Z and S, applying convolution and up-sampling to the position-encoded upper-layer feature map Y, and concatenating that result with the dot-product result to obtain the multi-head cross attention MHCA output.
  5. The method for fusing infrared and visible light images under low illumination as claimed in claim 1, wherein step 4) specifically comprises: k) applying two cascaded convolution-layer operations to the feature maps of each level to integrate the multi-channel feature information; l) progressively fusing the deep features into the shallow features through up-sampling operations, realizing the fusion of multi-level features; m) outputting the fused image, whose size matches the infrared and visible light images input to the whole network.
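The overall pipeline of claim 1 can be traced at the level of feature-map shapes. The sketch below is a shape-only walk-through under assumed sizes (a 256×256 input and three encoder units; the patent fixes neither), showing that each encoder unit halves the spatial size and the decoder's up-sampling restores the input size:

```python
# Shape-level sketch of the fusion pipeline in claim 1.
# Hypothetical sizes: the patent states neither the input resolution
# nor the number of encoder units.

def encoder_levels(h, w, num_units=3):
    """Step 1: initial convs keep the size; each unit halves H and W via max pooling."""
    levels = [(h, w)]
    for _ in range(num_units):
        h, w = h // 2, w // 2
        levels.append((h, w))
    return levels

def fuse(levels_ir, levels_vis):
    """Steps 2-3: attention and concatenation at each level.
    Only spatial sizes are tracked here; the fused map keeps each level's H x W."""
    assert levels_ir == levels_vis
    return levels_ir

def decode(levels):
    """Step 4: up-sample the deepest level stepwise back to the input size."""
    h, w = levels[-1]
    for target in reversed(levels[:-1]):
        h, w = h * 2, w * 2
        assert (h, w) == target
    return (h, w)

ir = encoder_levels(256, 256)
vis = encoder_levels(256, 256)
fused = fuse(ir, vis)
print(decode(fused))  # (256, 256): output matches the network input size
```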
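The feature-map sizes produced by the encoder of claim 2 follow from the standard convolution output-size formula. Assuming "same" padding for the 3×3 convolutions, a 2×2 stride-2 max pool, and three cascaded units (pool window, padding, and unit count are not stated in the claim), the units yield successively halved resolutions:

```python
def conv2d_out(size, kernel, stride=1, pad=0):
    """Standard output-size formula: floor((size + 2*pad - kernel) / stride) + 1."""
    return (size + 2 * pad - kernel) // stride + 1

# One 1x1 conv and two 3x3 convs per heterogeneous branch (size preserved)
s = 256
s = conv2d_out(s, 1)          # 1x1 conv: 256
s = conv2d_out(s, 3, pad=1)   # 3x3 conv, padding 1: 256
s = conv2d_out(s, 3, pad=1)   # 256

# Three cascaded units of (2x2 max pool, stride 2) + two 3x3 convs
sizes = []
for _ in range(3):
    s = conv2d_out(s, 2, stride=2)  # max pool halves the size
    s = conv2d_out(s, 3, pad=1)
    s = conv2d_out(s, 3, pad=1)
    sizes.append(s)
print(sizes)  # [128, 64, 32]
```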
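The Q/K/V computation in steps b)-c) of claim 3 is scaled dot-product attention. A minimal single-head sketch in pure Python, on tiny two-token matrices; the multi-head split, position encoding, and tensor reshaping of steps a) and d) are omitted:

```python
import math

def softmax(row):
    m = max(row)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def attention(Q, K, V):
    """Steps b)-c): A = softmax(Q K^T / sqrt(d)) V."""
    d = len(Q[0])
    Kt = [list(col) for col in zip(*K)]                    # transpose K
    scores = matmul(Q, Kt)                                 # similarity of Q and K
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scores]             # attention weights
    return matmul(weights, V)                              # weighted average of V

# Two tokens with dimension d = 2; identical Q and K give symmetric weights
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)
```

Each output row is a convex combination of the rows of V, so the constant column offset of V (column 1 exceeds column 0 by exactly 1) is preserved in the output.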
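Steps f)-g) of claim 4 gate the shallow feature map S with a sigmoid-activated, up-sampled response Z computed from the deeper level. A minimal sketch of that gating path, assuming nearest-neighbour up-sampling and omitting the 1×1 convolution and batch normalization of step f):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def upsample_nearest(grid, factor=2):
    """Step f): nearest-neighbour up-sampling so Z matches the size of S."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def gate(S, Z_small):
    """Steps f)-g): sigmoid-activate the deep response, up-sample it,
    then modulate S by element-wise (dot-product) multiplication."""
    Z = upsample_nearest([[sigmoid(v) for v in row] for row in Z_small])
    return [[s * z for s, z in zip(srow, zrow)] for srow, zrow in zip(S, Z)]

S = [[1.0, 2.0, 3.0, 4.0] for _ in range(4)]   # 4x4 shallow feature map
Z_small = [[10.0, -10.0], [0.0, 0.0]]          # 2x2 response from the upper level Y
out = gate(S, Z_small)
# Left half of the top rows passes nearly unchanged (sigmoid ~ 1),
# the right half is suppressed (sigmoid ~ 0), the bottom is halved (sigmoid(0) = 0.5).
```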
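The progressive decoder of claim 5 can be sketched as repeated up-sample-and-fuse passes from the deepest level to the shallowest. Element-wise summation stands in for the fusion operator here purely for illustration; the patent itself fuses through decoding convolution blocks:

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x up-sampling of a 2-D feature map."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def decode(levels):
    """Step l): starting from the deepest map, repeatedly up-sample and fuse
    (element-wise sum, an illustrative assumption) with the next shallower map."""
    deep = levels[-1]
    for shallow in reversed(levels[:-1]):
        up = upsample2x(deep)
        deep = [[u + s for u, s in zip(ur, sr)] for ur, sr in zip(up, shallow)]
    return deep

levels = [
    [[1.0] * 8 for _ in range(8)],   # shallowest level, 8x8
    [[1.0] * 4 for _ in range(4)],   # 4x4
    [[1.0] * 2 for _ in range(2)],   # deepest level, 2x2
]
fused = decode(levels)
print(len(fused), len(fused[0]))  # 8 8: output matches the shallowest level,
                                  # consistent with step m) of claim 5
```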

Description

Fusion method of infrared and visible light images under a low-illumination condition

Technical Field

The invention relates to infrared and visible light image fusion, in particular to a fusion method for low-illumination conditions based on a dual-attention mechanism, and belongs to the field of cross-modal image fusion.

Background

In the age of information explosion, sensor technology has developed rapidly. The variety of sensors keeps growing, and the information they capture is increasingly diverse. Because imaging mechanisms differ, images of the same scene in different modalities contain different information. An infrared sensor images according to the thermal radiation of objects; it is insensitive to brightness variation but lacks scene detail. A visible light camera describes object features based on reflected light; it has high resolution and can capture rich texture detail, but suffers obvious information loss under weak-light conditions. Image fusion technology arose to address the incomplete temporal, spatial, and spectral representation of single-sensor imaging systems. It can fully exploit the complementarity of different sensor imaging systems to generate a fused image that represents the scene more comprehensively, and it is therefore widely applied in military, remote sensing, medical, security monitoring, and other fields. Existing infrared and visible light fusion algorithms fall into two categories: traditional fusion algorithms and deep-learning-based fusion methods. Among traditional methods, multi-scale transformation is regarded as the classical approach because of its attention to the multi-scale features of cross-modal images.
However, such methods depend heavily on manually extracted features, and it is difficult to find general features that suit different fusion strategies. Accordingly, deep-learning-based infrared and visible light image fusion methods have attracted attention in recent years. In low-illumination environments such as underground coal mines, even when the features of infrared and visible light images are extracted automatically by a deep learning method and then fused, the difficulty increases markedly.

Disclosure of the Invention

The invention aims to realize an infrared and visible light image fusion method for low-illumination environments that uses a dual-attention mechanism to achieve global interaction and fusion of infrared and visible light image features, reduce the loss of local detail, and restore low-contrast regions. Specifically, the invention provides a fusion method of infrared and visible light images under a low-illumination condition, comprising the following steps: 1) performing convolution processing separately on the infrared and visible light heterogeneous images captured under low illumination, then obtaining a multi-level heterogeneous two-path feature map using a cascade of multi-layer convolutions and a max-pooling layer; 2) performing Multi-Head Self-Attention (MHSA) processing on the deepest-level feature maps to extract long-range feature information, and Multi-Head Cross Attention (MHCA) processing on the remaining shallow feature maps to enhance the relevant regions in the network connections; 3) performing a normalization operation on the feature maps of each level, applying MHSA processing, summing pixel by pixel after the attention matrix is obtained, and fusing the two-path feature maps by concatenation; 4)
completing the fusion of the feature maps of the different levels using several decoding convolution blocks and up-sampling operations, finally obtaining a low-illumination infrared and visible light fused image. Further, in step 1), to address the severe loss of detail and the marked drop in contrast of infrared and visible light images against low-illumination backgrounds, one layer of 1×1 convolution and two layers of 3×3 convolution are applied separately to the infrared and visible light heterogeneous images; taking one max-pooling layer and two layers of 3×3 convolution as a unit, several such units are cascaded, and the feature maps extracted by each unit form the multi-level heterogeneous two-path feature map. Further, in step 2), MHSA is performed on the deepest-level feature map to extract its long-range feature information: first, the input feature map is position-encoded and the tensors are expressed in two-dimensional matrix form