
CN-122023995-A - Missing infrared mode image fusion method based on joint shared dictionary

CN 122023995 A

Abstract

The invention relates to a missing-infrared-modality image fusion method based on a joint shared dictionary, and belongs to the technical field of infrared and visible-light image fusion. The method comprises: decomposing a visible-light image with a trained joint shared dictionary to obtain coding coefficients of the visible-light modality; obtaining semantic segmentation features; feeding the visible-light coefficients into a modality conversion network to obtain preliminarily mapped pseudo-infrared coefficients with the assistance of the semantic segmentation features; correcting the preliminary pseudo-infrared coefficients through a coding-coefficient consistency constraint and a physical-response prior to obtain corrected pseudo-infrared coefficients with salient infrared information; collaboratively fusing the corrected pseudo-infrared coefficients and the visible-light coefficients in the coding-coefficient space under structure-preservation and energy-distribution constraints; and finally reconstructing through the shared dictionary to obtain a fused image. The invention solves the problem of how to obtain, from a single visible-light image when the infrared modality is missing, a fused image comparable to an infrared-visible fusion result produced without modality loss.

Inventors

  • ZHANG YAFEI
  • MA MENG
  • LI HUAFENG
  • XIE MINGHONG
  • DONG NENG

Assignees

  • Kunming University of Science and Technology (昆明理工大学)

Dates

Publication Date
2026-05-12
Application Date
2026-04-03

Claims (8)

  1. A missing-infrared-modality image fusion method based on a joint shared dictionary, characterized by comprising the following steps: Step 1, acquiring a training data set of paired infrared and visible-light images; Step 2, inputting the paired infrared and visible-light images into a convolutional dictionary for decomposition and reconstruction, and training to obtain a joint shared dictionary; Step 3, decomposing the visible-light image with the joint shared dictionary to obtain coding coefficients of the visible-light modality, inputting the visible-light image into a frozen semantic-encoder branch to obtain semantic segmentation features, and inputting the visible-light coefficients into a modality conversion network to obtain preliminarily mapped pseudo-infrared coefficients with the assistance of the semantic segmentation features; Step 4, correcting the preliminarily mapped pseudo-infrared coefficients through a coding-coefficient consistency constraint and a physical-response prior to obtain corrected pseudo-infrared coefficients with salient infrared information; Step 5, collaboratively fusing the corrected pseudo-infrared coefficients and the visible-light coefficients in the coding-coefficient space under structure-preservation and energy-distribution constraints, and finally reconstructing through the shared dictionary to obtain a fused image.
  2. The missing-infrared-modality image fusion method based on a joint shared dictionary of claim 1, wherein each group in the training data set used for joint-shared-dictionary training comprises a corresponding pair of infrared and visible-light images, each with a resolution of 640×512; the data set is preprocessed by randomly cropping the images to 256×256 and normalizing the cropped data to the range 0 to 1.
  3. The missing-infrared-modality image fusion method based on a joint shared dictionary of claim 1, wherein Step 2 comprises: first initializing the joint shared dictionary; then inputting the paired infrared and visible-light images simultaneously into an encoder to initialize the coding coefficients, obtaining the corresponding coding coefficients of each modality; sending the initialized coding coefficients and the initialized joint shared dictionary into a depth subnetwork and solving the dictionary-learning target equation, under whose implicit constraint the joint shared dictionary and the coding-coefficient representations are learned automatically from data, the equation comprising a regularization term on the dictionary and a sparsity constraint on the shared dictionary; updating the coding coefficients of the two modalities and the joint shared dictionary respectively in the depth subnetwork by iterative updating, wherein the visible-light coefficient subnetwork first updates the visible-light coefficients and the dictionary according to its corresponding equation, and the infrared coefficient subnetwork then updates the infrared coefficients and the dictionary according to its corresponding equation; finally inputting the updated coefficients and the joint shared dictionary simultaneously into a unified decoder for reconstruction, obtaining a reconstructed infrared-visible image pair that satisfies the constraint of the dictionary-solving method; when training the joint shared dictionary, a reconstruction loss is imposed so that the outputs of each coefficient subnetwork satisfy the constraint of the dictionary-solving target equation, ensuring that the original infrared and visible-light images can still be reconstructed from the coefficients and the dictionary after each iterative update.
  4. The missing-infrared-modality image fusion method based on a joint shared dictionary of claim 1, wherein Step 3 comprises: inputting the visible-light image into the encoder and, with the help of the trained joint shared dictionary, decomposing it into visible-light coding coefficients; at this point the visible-light coding coefficients and the corresponding infrared coding coefficients can exist in a unified coefficient space, which establishes the possibility of modality conversion between coefficients; meanwhile, the visible-light image is sent into a frozen semantic-encoder branch to obtain semantic segmentation features; the visible-light coding coefficients and the semantic segmentation features are then fed together into a coefficient modality conversion network formed by multiple 1×1 and 3×3 convolutional layers, which outputs the preliminarily converted pseudo-infrared coding coefficients, using the semantic features of the visible-light image as guidance and learning nonlinear combinations among atoms through mixed modeling across channels.
  5. The missing-infrared-modality image fusion method based on a joint shared dictionary of claim 1, wherein Step 4 comprises: regarding the pseudo-infrared coding coefficients obtained in the previous step as a dynamic state variable, and modeling their spatial consistency and local aggregation structure through a multi-stage recursive updating network; the network comprises T inference blocks, each with several convolutional layers and ReLU activation layers, where F denotes an inference block, the coefficients obtained after t blocks are refined by the (t+1)-th block with its own parameter weights, and the corrected pseudo-infrared coding coefficients are finally obtained; a pseudo-infrared image is then obtained by passing the corrected pseudo-infrared coding coefficients through the decoder.
  6. The missing-infrared-modality image fusion method based on a joint shared dictionary of claim 5, wherein Step 4 further comprises: constraining, through a consistency loss, the agreement between the pseudo-infrared and real infrared data at both the image level and the coding-coefficient level; introducing a coding-coefficient consistency loss, defined with a gradient operator, to maintain structural consistency with the input visible-light coding; and introducing an infrared physical-response prior loss which, guided by the physics of infrared imaging, suppresses invalid atoms and strengthens responses related to thermal radiation, so that the pseudo-infrared coefficients gradually evolve to a stable and reliable infrared coefficient state.
  7. The missing-infrared-modality image fusion method based on a joint shared dictionary of claim 1, wherein Step 5 comprises: establishing a collaborative fusion network in the coding-coefficient space; concatenating the corrected pseudo-infrared coding coefficients and the visible-light coding coefficients and performing collaborative fusion on them; during fusion, a structure-preservation constraint limits the deviation of the coding coefficients in spatial distribution, guaranteeing the geometric structural consistency between the fusion result and the visible-light image; the collaborative fusion network outputs initial fusion coefficients, from which a consistency modulation term S is generated through a structure-preservation mapping and atom-level energy weights are generated through an energy-distribution mapping; the fused coefficients and the joint shared dictionary are then sent into a coefficient decoding network for consistency reconstruction, obtaining a fused image with both structural clarity and salient infrared information.
  8. The missing-infrared-modality image fusion method based on a joint shared dictionary of claim 7, wherein Step 5 further comprises: when training the coding-coefficient collaborative fusion network, adopting an intensity consistency loss and a gradient consistency loss to simultaneously highlight thermal-radiation saliency and preserve the visible-light structure, where a gradient operator is used and max takes the pointwise larger of the pixel/gradient magnitudes, so that the fused image simultaneously inherits the infrared thermal-intensity peaks and the clear edges of the visible light; the final fusion loss combines the two; in this training stage, the pseudo-infrared coding-coefficient inference network and the joint shared dictionary D are frozen, and only the coding-coefficient collaborative fusion network is optimized.
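The equations of claims 2 and 3 survive only as placeholders in this text, but the objective they describe — reconstructing both modalities through one shared dictionary under a sparsity/regularization constraint, with alternating coefficient and dictionary updates — follows the classical sparse-coding form. The following is a minimal NumPy sketch under that assumption; the ℓ1 penalty, the ISTA coefficient solver, the gradient dictionary update, and all names are illustrative, not the patent's exact formulation:

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the l1 norm (implements the sparsity constraint).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista_codes(X, D, lam=0.1, n_iter=50):
    # Solve min_A 0.5*||X - D A||_F^2 + lam*||A||_1 by ISTA.
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ A - X)
        A = soft_threshold(A - grad / L, lam / L)
    return A

def learn_joint_dictionary(X_ir, X_vis, n_atoms=64, lam=0.1,
                           n_outer=20, lr=0.1, seed=0):
    # Alternate between coding BOTH modalities against one shared dictionary
    # and a gradient step on that dictionary (atoms renormalized each round),
    # so that coefficients of both modalities live in a unified space.
    rng = np.random.default_rng(seed)
    d = X_ir.shape[0]
    D = rng.standard_normal((d, n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_outer):
        A_ir = ista_codes(X_ir, D, lam)    # infrared coefficient update
        A_vis = ista_codes(X_vis, D, lam)  # visible-light coefficient update
        # Shared reconstruction gradient accumulated over both modalities.
        grad_D = (D @ A_ir - X_ir) @ A_ir.T + (D @ A_vis - X_vis) @ A_vis.T
        D -= lr * grad_D / max(X_ir.shape[1], 1)
        D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    return D, A_ir, A_vis
```

Here `X_ir` and `X_vis` are matrices of flattened, paired image patches (one patch per column), standing in for the patent's 256×256 crops.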
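Claim 8's intensity and gradient consistency losses take a pointwise max over the two sources so the fused image inherits the brighter (thermally salient) pixel and the sharper edge at each location. A hedged NumPy sketch of such losses, assuming L1 distances and forward-difference gradients — both assumptions, since the patent's exact equations are not reproduced in the source text:

```python
import numpy as np

def grad_xy(img):
    # Forward-difference gradients, zero at the trailing borders.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy

def intensity_consistency_loss(fused, ir, vis):
    # L1 distance to the pointwise maximum of the two source intensities.
    target = np.maximum(ir, vis)
    return np.mean(np.abs(fused - target))

def gradient_consistency_loss(fused, ir, vis):
    # L1 distance to the pointwise-larger gradient magnitude of the sources.
    fgx, fgy = grad_xy(fused)
    igx, igy = grad_xy(ir)
    vgx, vgy = grad_xy(vis)
    target = np.maximum(np.hypot(igx, igy), np.hypot(vgx, vgy))
    return np.mean(np.abs(np.hypot(fgx, fgy) - target))

def fusion_loss(fused, ir, vis, w_int=1.0, w_grad=1.0):
    # Weighted combination; the weights are illustrative, not from the patent.
    return (w_int * intensity_consistency_loss(fused, ir, vis)
            + w_grad * gradient_consistency_loss(fused, ir, vis))
```

In the patent's training setup, `ir` would be the decoded pseudo-infrared image rather than a real infrared capture, since the infrared modality is missing at inference time.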

Description

Missing infrared mode image fusion method based on joint shared dictionary

Technical Field

The invention relates to a missing-infrared-modality image fusion method based on a joint shared dictionary, and belongs to the technical field of infrared and visible-light image fusion.

Background

Current infrared-visible image fusion mainly aims to retain the saliency of infrared thermal targets together with visible-light texture details. Such methods typically depend on the existence of an infrared image; in practical applications, however, the infrared modality is often missing or unavailable due to equipment cost, acquisition limitations, desynchronization, occlusion, sensor faults and the like, so that traditional fusion methods either cannot be executed directly or suffer a marked drop in fusion quality. Existing approaches include traditional multi-scale/filtering fusion, end-to-end deep-learning fusion with cross-modal completion, and sparse-representation and dictionary-learning fusion. Most of these depend on bimodal input; when the infrared modality is missing, pixel-level generation schemes have become the mainstream. The core defect of generation-based methods, however, is that the mapping from visible light to thermal distribution is one-to-many and not identifiable: a model easily produces false hot spots or misses real thermal targets, possibly with accompanying structural drift and misalignment. Such methods are also sensitive to changes in data distribution and device domain, generalize poorly across scenes, and their outputs often lack physical calibration and interpretable, controllable constraints, so that pixel-level metrics look good while key thermal semantics are unreliable, bringing higher risk to fusion and downstream tasks.
Therefore, a technical scheme is needed that can achieve stable compensation and fusion under infrared-missing conditions and effectively describe cross-modal shared information.

Disclosure of the Invention

The invention provides a missing-infrared-modality image fusion method based on a joint shared dictionary, to solve the problem of how to obtain higher perceptual quality and downstream-task performance from a single visible-light image when the infrared modality is missing. The technical scheme of the invention is a missing-infrared-modality image fusion method based on a joint shared dictionary, comprising the following steps: Step 1, acquiring a training data set of paired infrared and visible-light images; Step 2, inputting the paired infrared and visible-light images into a convolutional dictionary for decomposition and reconstruction, and training to obtain a joint shared dictionary; Step 3, decomposing the visible-light image with the joint shared dictionary to obtain coding coefficients of the visible-light modality, inputting the visible-light image into a frozen semantic-encoder branch to obtain semantic segmentation features, and inputting the visible-light coefficients into a modality conversion network to obtain preliminarily mapped pseudo-infrared coefficients with the assistance of the semantic segmentation features; Step 4, correcting the preliminarily mapped pseudo-infrared coefficients through a coding-coefficient consistency constraint and a physical-response prior to obtain corrected pseudo-infrared coefficients with salient infrared information; Step 5, collaboratively fusing the corrected pseudo-infrared coefficients and the visible-light coefficients in the coding-coefficient space under structure-preservation and energy-distribution constraints, and finally reconstructing through the shared dictionary to obtain a fused image.
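The five steps above can be sketched as a single inference pipeline in which each stage is an injected callable. Everything here is an illustrative skeleton (the stand-in callables are placeholders, not the patent's trained networks):

```python
import numpy as np

def run_missing_ir_fusion(vis_img, encoder, semantic_branch, converter,
                          corrector, fusion_net, decoder):
    # Step 3: decompose the visible image into shared-space coding coefficients
    # and extract semantic features from the frozen semantic branch.
    a_vis = encoder(vis_img)
    sem = semantic_branch(vis_img)
    a_pseudo = converter(a_vis, sem)     # preliminary pseudo-IR coefficients
    # Step 4: recursive correction toward a stable pseudo-IR coefficient state.
    a_ir = corrector(a_pseudo)
    # Step 5: coefficient-space collaborative fusion, then reconstruction
    # through the shared dictionary / decoder.
    a_fused = fusion_net(a_ir, a_vis)
    return decoder(a_fused)
```

A trivial smoke run with identity-like stand-ins shows the data flow; real use would plug in the trained encoder, conversion, correction, fusion, and decoding networks:

```python
out = run_missing_ir_fusion(
    np.ones((4, 4)),
    encoder=lambda x: x,
    semantic_branch=lambda x: x,
    converter=lambda a, s: a + 0.1 * s,
    corrector=lambda a: a,
    fusion_net=lambda a, b: 0.5 * (a + b),
    decoder=lambda a: a,
)
```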
Further, in Step 1, each group of infrared-visible image pairs in the training data set used for joint-shared-dictionary training comprises a corresponding pair of infrared and visible-light images, each with a resolution of 640×512; the data set is preprocessed by randomly cropping the images to 256×256 and normalizing the cropped data to the range 0 to 1. Further, Step 2 includes: first initializing the joint shared dictionary; then inputting the paired infrared and visible-light images simultaneously into an encoder to initialize the coding coefficients, obtaining the corresponding coding coefficients of each modality; sending the initialized coding coefficients and the initialized joint shared dictionary into a depth subnetwork and solving the dictionary-learning target equation, under whose implicit constraint the joint shared dictionary and the coding-coefficient representations are learned automatically from data, the equation comprising a regularization term on the dictionary and a sparsity constraint on the shared dictionary; the coding coefficients of the two modalities and the joint shared dictionary are updated respectively in the depth subnetwork by iterative updating; in updat