CN-121982465-A - Complex degradation robust image fusion method and system based on multi-mode diffusion flow mutual feedback synergy
Abstract
The invention discloses a complex-degradation robust image fusion method and system based on multi-modal diffusion-flow mutual-feedback synergy, belonging to the technical field of computer-vision image processing. A time-aware cross-modal information-fusion sub-module is designed and embedded into each time step of diffusion sampling to iteratively aggregate cross-modal information. At every time step, the aggregated variable is fed back into the two recovery branches, constructing a mutual-feedback collaborative mechanism in which information recovery and information fusion reinforce each other, so that cross-modal complementarity enhances the ability to remove degradation. Through this unified mutual-feedback framework, the invention breaks through the structural limitations of traditional paradigms, achieves highly robust image fusion, and offers clear advantages in stability, generalization, and fusion quality.
Inventors
- ZHANG HAO
- YANG SHUHAN
- LI ZIZHUO
- MA JIAYI
- MA YONG
Assignees
- 武汉大学 (Wuhan University)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-26
Claims (10)
- 1. A complex-degradation robust image fusion method based on multi-modal diffusion-flow mutual-feedback synergy, characterized by comprising the following steps: acquiring a visible-light degraded image and an infrared degraded image; and inputting the visible-light degraded image and the infrared degraded image into a trained multi-modal diffusion-flow mutual-feedback collaborative model, and outputting a restored image and a fused image; wherein training the multi-modal diffusion-flow mutual-feedback collaborative model comprises: constructing a multi-modal dataset based on the visible-light degraded image and the infrared degraded image; constructing the multi-modal diffusion-flow mutual-feedback collaborative model, which comprises an autoencoder for extracting features from the visible-light degraded image and the infrared degraded image, a mutual-feedback collaborative module for performing information recovery and information fusion on the extracted features, and an autodecoder for mapping the fused representation back to the image domain; and training the constructed model on the multi-modal dataset, applying alternating regularization that combines information-recovery regularization and information-fusion regularization, to output the trained multi-modal diffusion-flow mutual-feedback collaborative model.
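The encode–recover/fuse–decode pipeline described in claim 1 can be sketched as a toy NumPy example. All names (`encode`, `fuse`, `decode`) and the random "images" are illustrative assumptions, not the patent's implementation; in the actual method these would be learned networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(img):
    # Stand-in autoencoder: flatten the image into a latent vector.
    return img.reshape(-1)

def fuse(z_vis, z_ir):
    # Stand-in information-fusion step: average the two modality latents.
    return 0.5 * (z_vis + z_ir)

def decode(z, shape):
    # Stand-in autodecoder: map the latent back to image space.
    return z.reshape(shape)

# Degraded visible-light and infrared inputs (random placeholders).
vis = rng.random((8, 8))
ir = rng.random((8, 8))

z_vis, z_ir = encode(vis), encode(ir)
z_f = fuse(z_vis, z_ir)                  # aggregated latent
fused = decode(z_f, vis.shape)           # fused image
restored_vis = decode(z_vis, vis.shape)  # restored visible-light branch
print(fused.shape)
```

Because the stand-in encoder and decoder are simple reshapes, the fused output here is just the pixel-wise average of the two inputs; the claim's real model performs recovery and fusion in a learned latent space.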
- 2. The complex-degradation robust image fusion method based on multi-modal diffusion-flow mutual-feedback synergy according to claim 1, wherein performing information recovery and information fusion on the extracted visible-light and infrared features comprises: constructing, in the latent feature space of the autoencoder, an information-recovery sub-module for each of the visible-light and infrared modalities at every time step; and constructing, on top of these per-time-step recovery sub-modules, an information-fusion sub-module that fuses the latent variables of the visible-light and infrared modalities to generate an aggregated variable at each time step.
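A per-time-step, time-aware fusion of the two modality latents, as in claim 2, can be illustrated as follows. The linear time-dependent weighting is purely an assumption for illustration; the patent does not specify the fusion rule, only that an aggregated variable is produced at each time step.

```python
import numpy as np

def time_aware_fuse(z_vis, z_ir, t, T):
    """Toy time-aware fusion: the blend between modalities depends on the
    current time step. The linear schedule is an illustrative assumption."""
    w = t / T  # scalar in [0, 1]: relative position in the reverse process
    return w * z_vis + (1.0 - w) * z_ir

z_vis = np.ones(4)   # stand-in visible-light latent
z_ir = np.zeros(4)   # stand-in infrared latent
T = 10

# One aggregated variable per time step of the reverse diffusion (t = T .. 1).
aggregates = [time_aware_fuse(z_vis, z_ir, t, T) for t in range(T, 0, -1)]
print(len(aggregates))
```

Under this schedule the first aggregate (t = T) equals the visible-light latent and later aggregates shift toward the infrared latent, showing how the fusion rule can vary over the sampling trajectory.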
- 3. The complex-degradation robust image fusion method based on multi-modal diffusion-flow mutual-feedback synergy according to claim 2, wherein constructing the information-recovery sub-module for each of the visible-light and infrared modalities comprises: modeling the information-recovery process of the visible-light and infrared modalities with a latent diffusion framework based on a mean-reverting stochastic differential equation; and, based on this model, using a reparameterization method to convert each modality's conditional score function into a noise estimate, completing the construction of the respective information-recovery sub-modules.
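A standard mean-reverting SDE (an Ornstein–Uhlenbeck-type process) has the form dx = θ(μ − x)dt + σ dW, where the state drifts toward a mean state μ (here, plausibly the degraded latent). The Euler–Maruyama simulation below is a generic sketch of such a forward process; the coefficients and the interpretation of μ are assumptions, not the patent's exact formulation.

```python
import numpy as np

def forward_mean_reverting(x0, mu, theta=0.5, sigma=0.05, steps=100, seed=0):
    """Euler-Maruyama simulation of dx = theta*(mu - x)*dt + sigma*dW.
    x0: stand-in clean latent; mu: mean state the process drifts toward.
    Parameter values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / steps
    x = x0.copy()
    for _ in range(steps):
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)
        x = x + theta * (mu - x) * dt + sigma * dw
    return x

x0 = np.zeros(16)   # stand-in clean latent
mu = np.ones(16)    # stand-in degraded mean state
xT = forward_mean_reverting(x0, mu)
# After many steps, the state has drifted from x0 toward mu.
print(np.abs(xT - mu).mean() < np.abs(x0 - mu).mean())
```

The recovery sub-module in the claim would learn the reverse of such a process, with the score function reparameterized as a noise estimate in the usual diffusion-model fashion.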
- 4. The complex-degradation robust image fusion method based on multi-modal diffusion-flow mutual-feedback synergy according to claim 2, further comprising, after generating the aggregated variable at each time step: feeding the aggregated variable of each time step back into the information-recovery sub-modules of the visible-light and infrared modalities for collaborative denoising and reverse-trajectory correction; and dually coupling information recovery and information fusion through the information-fusion sub-module to generate a final fused representation, which the autodecoder maps back to obtain the fused image.
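The mutual-feedback loop of claim 4 — fuse, then feed the aggregate back into both recovery branches at every step — can be sketched with toy "denoising" updates. The update rule, step sizes, and targets below are illustrative assumptions standing in for the learned diffusion branches.

```python
import numpy as np

def denoise_step(z, z_fed_back, target, alpha=0.1, beta=0.05):
    """One toy recovery step: move toward the branch's target, nudged by the
    fed-back aggregated variable. alpha and beta are illustrative assumptions."""
    return z + alpha * (target - z) + beta * (z_fed_back - z)

z_vis, z_ir = np.zeros(4), np.full(4, 2.0)   # stand-in branch latents
target_vis, target_ir = np.ones(4), np.ones(4)

for t in range(20):
    z_f = 0.5 * (z_vis + z_ir)                    # information fusion
    z_vis = denoise_step(z_vis, z_f, target_vis)  # feedback into branch 1
    z_ir = denoise_step(z_ir, z_f, target_ir)     # feedback into branch 2

# The fed-back aggregate pulls both branches toward a shared, corrected state.
print(np.abs(z_vis - z_ir).max())
```

Each iteration contracts the gap between the two branches (here by a factor of 0.85), which is the toy analogue of the claim's collaborative denoising and reverse-trajectory correction.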
- 5. The complex-degradation robust image fusion method based on multi-modal diffusion-flow mutual-feedback synergy according to claim 1, wherein the alternating regularization combining information-recovery regularization and information-fusion regularization comprises: updating the parameters of both the information-fusion sub-module and the information-recovery sub-modules under the information-recovery regularization; updating only the parameters of the information-fusion sub-module under the information-fusion regularization; and alternating between the two regularizations.
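The update schedule of claim 5 can be made concrete with a minimal sketch: recovery-regularization steps update both parameter groups, fusion-regularization steps update only the fusion group, and the two alternate. The scalar parameters, fixed step size, and strict even/odd alternation are illustrative assumptions, not the patent's optimizer.

```python
# Toy alternating-regularization schedule (illustrative only): even steps
# apply the recovery loss and update both parameter groups; odd steps apply
# the fusion loss and update only the fusion group.
fusion_param, recovery_param = 0.0, 0.0
fusion_updates, recovery_updates = 0, 0

for step in range(10):
    if step % 2 == 0:
        # Information-recovery regularization: both modules are updated.
        fusion_param += 0.1
        recovery_param += 0.1
        fusion_updates += 1
        recovery_updates += 1
    else:
        # Information-fusion regularization: only the fusion module moves.
        fusion_param += 0.1
        fusion_updates += 1

print(fusion_updates, recovery_updates)  # -> 10 5
```

The fusion sub-module thus receives a gradient signal at every step, while the recovery sub-modules are updated only under the recovery regularization, matching the asymmetry stated in the claim.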
- 6. The complex-degradation robust image fusion method based on multi-modal diffusion-flow mutual-feedback synergy according to claim 5, wherein the information-recovery regularization is expressed by a formula (the formula itself is not reproduced in this text), in which: the first symbol denotes the information-recovery regularization term; the expectation is taken over the diffusion noise and the training-sample distribution; one symbol denotes the diffusion time step and another the total number of time steps in the reverse mutual-feedback synergy; a further symbol denotes the noise-prediction result of modality m at the current time step; an operator denotes the latent-variable reverse-diffusion drift term from the aggregated variable to modality m; and the remaining symbols denote the aggregated variable, the degradation type, the aggregation f, the time step, and the ideal target.
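Based only on the symbol glosses in this claim, a standard latent-diffusion noise-prediction loss of the described shape would read roughly as follows. This is a reconstruction sketch under stated assumptions (the norm, the exact conditioning arguments, and all symbol names are guesses), not the patent's actual formula:

```latex
\mathcal{L}_{\mathrm{rec}}
  = \mathbb{E}_{\epsilon,\; x \sim p_{\mathrm{data}}}
    \sum_{t=1}^{T} \sum_{m \in \{\mathrm{vis},\,\mathrm{ir}\}}
    \bigl\lVert
      \epsilon_{\theta}^{m}\!\bigl(\mathcal{D}_{m}(z_t^{f}),\, t,\, d\bigr)
      - \hat{\epsilon}
    \bigr\rVert
```

Here $\mathcal{D}_{m}$ stands for the reverse-diffusion drift operator mapping the aggregated variable $z_t^{f}$ to modality $m$, $d$ the degradation type, and $\hat{\epsilon}$ the ideal noise target, following the claim's verbal definitions.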
- 7. The complex-degradation robust image fusion method based on multi-modal diffusion-flow mutual-feedback synergy according to claim 5, wherein the information-fusion regularization is expressed by a formula (the formula itself is not reproduced in this text) combining a texture loss, a further loss term whose description is not preserved in this text, and a color loss.
- 8. The complex-degradation robust image fusion method based on multi-modal diffusion-flow mutual-feedback synergy according to claim 1, wherein the expressions for the output restored image and fused image (the formulas themselves are not reproduced in this text) involve: the modality type; the degradation type; the restored image; the fused image; the autoencoder; the latent variables of each modality; the aggregation f; the aggregated variable; the intermediate features of each modality; and the aggregated intermediate feature.
- 9. The complex-degradation robust image fusion method based on multi-modal diffusion-flow mutual-feedback synergy according to claim 1, wherein performing feature extraction on the visible-light degraded image and the infrared degraded image comprises: mapping the visible-light degraded image and the infrared degraded image into a latent feature space to obtain the latent features and intermediate features of each of the visible-light and infrared modalities.
- 10. A complex-degradation robust image fusion system based on multi-modal diffusion-flow mutual-feedback synergy, comprising: an image acquisition module for acquiring a visible-light degraded image and an infrared degraded image; and an image recovery and fusion module for inputting the visible-light degraded image and the infrared degraded image into a trained multi-modal diffusion-flow mutual-feedback collaborative model and outputting a restored image and a fused image; wherein training the multi-modal diffusion-flow mutual-feedback collaborative model comprises: constructing a multi-modal dataset based on the visible-light degraded image and the infrared degraded image; constructing the multi-modal diffusion-flow mutual-feedback collaborative model, which comprises an autoencoder for extracting features from the visible-light degraded image and the infrared degraded image, a mutual-feedback collaborative module for performing information recovery and information fusion on the extracted features, and an autodecoder for mapping the fused representation back to the image domain; and training the constructed model on the multi-modal dataset, applying alternating regularization that combines information-recovery regularization and information-fusion regularization, to output the trained model.
Description
Complex degradation robust image fusion method and system based on multi-modal diffusion-flow mutual-feedback synergy

Technical Field

The invention belongs to the technical field of computer-vision image processing, relates to multi-modal image recovery and information fusion, and in particular to a complex-degradation robust image fusion method and system based on multi-modal diffusion-flow mutual-feedback collaboration.

Background

Owing to limitations in imaging mechanisms and hardware conditions, a single-modality imaging system often captures only part of a scene's properties, making it difficult to form a complete representation of the real world. Multi-modal image fusion techniques have therefore been developed to fuse information from multiple source images, such as visible-light and infrared images, into a more comprehensive and reliable scene description. The technology is widely applied in key fields such as autonomous driving, intelligent security, and military reconnaissance, markedly improving visual perception in complex environments. In practice, however, image acquisition is inevitably affected by real physical conditions, and each modality's image is often accompanied by different types of degradation (such as low illumination, haze, noise, low contrast, and streaks). The challenges of information fusion are particularly acute when the multi-modal images degrade simultaneously. Two main robust image fusion paradigms have been formed to counter the impact of degradation on fusion quality, but both have inherent structural limitations in complex degradation scenes. The first is the integrated hard-regression paradigm, in which an end-to-end network simultaneously learns degradation removal and information fusion to recover the degraded multi-modal inputs into a high-quality result.
However, because different degradation types differ markedly and the modalities have inconsistent requirements for information retention, such a model struggles to handle multiple degradations and preserve cross-modal structure within a single mapping. The second is the decoupled optimization paradigm, which separates degradation recovery from information fusion: each modality's image is first enhanced independently and then fused. Although this can improve the quality of the degraded images to some extent, the recovery and fusion processes are mutually independent, so neither modality can exploit cross-modal complementary information during recovery. Moreover, the lack of compatibility between the recovery results and the fusion module easily leads to inconsistent objectives across the two stages, harming the final fusion performance. In summary, both paradigms have difficulty coping with the multiple types of spatially varying complex degradations found in real scenes, and their structural limitations have become the main bottleneck for improving robust image fusion capability. Whether integrated hard regression or decoupled optimization, the existing paradigms share these structural limitations: the former cannot combine multi-degradation handling with cross-modal structure preservation in a single mapping, while the latter's independent recovery and fusion stages deprive recovery of cross-modal complementary information and leave the recovery results ill-matched to the fusion module. Together, these problems constitute the principal performance bottleneck of existing approaches in complex composite degradation scenarios.
It is therefore of great significance to redefine the relationship between degradation recovery and information fusion and to construct a new fusion paradigm in which the two cooperate effectively, breaking through the performance bottleneck of existing methods in complex degradation scenes.

Disclosure of Invention

To address the problems that the existing robust image fusion paradigms can hardly handle multiple degradations and preserve cross-modal structure within a single mapping, that the recovery stage lacks cross-modal complementary information, and that the recovery results are ill-matched to the fusion module, the invention provides a complex-degradation robust image fusion method based on multi-modal diffusion-flow mutual-feedback synergy. Through a fusion paradigm based on latent diffusion, a time-aware cross-modal information-fusion sub-module and an information-recovery sub-module are constructed, ensuring that the two functional tasks (information fusion and information recovery) effectively reinforce each other.