CN-122023989-A - Visible light-SAR image fusion method and system based on dual-stream symmetric Transformers
Abstract
The invention discloses a visible light-SAR image fusion method and system based on dual-stream symmetric Transformers, belonging to the technical field of image processing. The method first constructs a dual-stream symmetric framework that performs multi-scale feature extraction and preliminary fusion on the visible light and SAR source images, ensuring synchronous characterization of the heterogeneous modal information. A dual-discriminator design then subjects the fused image to multi-modal distribution discrimination, and a composite multi-term loss function drives adversarial training between the generator and the discriminators, so that the visual fidelity and detail resolution of the fusion result are markedly improved while SAR image noise is suppressed.
Inventors
- Sui Chenhong
- Zhou Jingju
- Xu Shuo
Assignees
- Yantai University (烟台大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-02
Claims (10)
- 1. A visible light-SAR image fusion method based on a dual-stream symmetric Transformer, characterized by comprising the following steps: obtaining a visible light image and a SAR image of the same scene, and performing spatial registration on the two images; constructing a generative adversarial network framework comprising a generator and two mutually independent discriminators; extracting multi-scale shallow features of the visible light image and the SAR image respectively with the dual-stream symmetric branches of the generator, feeding the shallow features into a Transformer module, and performing global semantic modeling and long-range dependency capture through a self-attention mechanism; extracting microscopic texture detail features from the visible light image and the SAR image with a serial structure, extracting macroscopic spatial structure features from the two images with a parallel structure, and performing multi-scale aggregation fusion on the microscopic texture detail features and the macroscopic spatial structure features to generate a fusion feature map; scanning the fusion feature map in the horizontal, vertical and diagonal directions with a strip convolution block to extract and reinforce linear edge information, generating an enhanced fusion feature map; reconstructing the enhanced fusion feature map through a decoding network to obtain an initial fusion image; and performing adversarial discrimination between the initial fusion image and the source images with the two mutually independent discriminators, iteratively optimizing the network parameters with a composite multi-term loss function, and outputting the final fusion model.
- 2. The method of claim 1, wherein the dual-stream symmetric branches comprise two feature extraction branches corresponding to the visible light image and the SAR image respectively, each feature extraction branch comprising a cascaded convolutional layer, an activation function layer and a Transformer module; the convolutional layer captures local spatial features of the corresponding modal image through a spatial convolution operator; the activation function layer applies a nonlinear mapping to the convolution features; and the Transformer module establishes long-range dependency relations among feature points through a self-attention mechanism and performs global feature modeling.
- 3. The method of claim 1, wherein the serial structure extracts microscopic texture detail features with small receptive fields by: sequentially feeding a source image into a convolutional layer and an activation layer, capturing microscopic textures and local details of the image through cascaded convolution operations; fusing the pre-convolution features with the post-convolution features across layers through a residual connection to generate a shallow feature map; and feeding the shallow feature map into a Transformer module, performing global association modeling and feature refinement with a self-attention mechanism.
- 4. The method of claim 1, wherein the parallel structure extracts macroscopic spatial structure features with large receptive fields by: constructing a parallel architecture of two sub-networks, each sub-network synchronously executing convolution, nonlinear activation and Transformer self-attention operations to capture the global topological structure of a source image from multiple dimensions; extracting complementary macroscopic representations from different feature subspaces through the parallel mappings of the two sub-networks; and aggregating and fusing the complementary features output by the two sub-networks through a residual connection mechanism to generate a deep feature map.
- 5. The method of claim 1, wherein the multi-scale aggregation fusion comprises: channel-wise concatenation of the microscopic texture detail features output by the serial structure and the macroscopic spatial structure features output by the parallel structure, generating a fusion feature map containing both spatial structure information and texture detail information.
- 6. The method of claim 1, wherein the strip convolution block performs multi-directional anisotropic feature enhancement by: constructing directional convolution branches for the horizontal, vertical, left-diagonal and right-diagonal directions; performing a directional convolution operation in each branch with a strip convolution kernel, capturing edge and structural topology information distributed along the different geometric orientations of a source image; and concatenating and aggregating the feature maps output by the directional branches along the channel dimension to generate a fusion feature map with a multi-directional edge enhancement effect.
- 7. The method of claim 1, wherein reconstructing the enhanced fusion feature map through the decoding network comprises: establishing skip connection paths from the feature extraction layers to the reconstruction module, directly passing the multi-scale feature maps extracted by the serial and parallel structures to the reconstruction module; concatenating the aggregated fusion feature map with the reference features introduced through the skip connections, and performing feature dimensionality reduction and nonlinear mapping through cascaded convolutional layers; and introducing a Transformer mechanism to globally model the feature distribution during reconstruction, finally reconstructing a multi-modal fusion image.
- 8. The method of claim 1, wherein the two discriminators are mutually independent and structurally identical, each discriminator comprising a convolution module, a nonlinear activation module and a normalization module in a cascaded arrangement; the convolution module captures the local feature distribution of an image; the normalization module stabilizes the gradient flow during training; the activation module enhances the nonlinearity of the discrimination boundary; and each discriminator outputs a confidence score evaluating the consistency, in data distribution, of the fused image with the corresponding modality's source image.
- 9. A visible light-SAR image fusion system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-8 when executing the computer program.
- 10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1-8.
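To make the claimed architecture concrete, the following is a minimal PyTorch sketch of one possible realization of the dual-stream symmetric branches recited in claims 1 and 2; it is illustrative only, not the patented implementation. The class name `ModalityBranch`, the channel and head counts, and the use of `nn.MultiheadAttention` over flattened pixel tokens as the Transformer module are all assumptions of this sketch.

```python
import torch
import torch.nn as nn

class ModalityBranch(nn.Module):
    """One stream of the dual-stream symmetric generator: cascaded
    convolution + activation for local features, then self-attention
    over flattened pixel tokens for global modeling."""
    def __init__(self, in_ch=1, ch=32, heads=4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2),
        )
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, img):
        f = self.local(img)                        # local spatial features
        b, c, h, w = f.shape
        t = f.flatten(2).transpose(1, 2)           # (B, H*W, C) token sequence
        g, _ = self.attn(t, t, t)                  # long-range dependencies
        return (t + g).transpose(1, 2).reshape(b, c, h, w)

# The two branches are symmetric: same structure, separate weights.
vis_branch, sar_branch = ModalityBranch(), ModalityBranch()
vis = torch.randn(1, 1, 64, 64)                    # registered visible image
sar = torch.randn(1, 1, 64, 64)                    # registered SAR image
f_vis, f_sar = vis_branch(vis), sar_branch(sar)    # per-modality shallow features
```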
Description
Visible light-SAR image fusion method and system based on dual-stream symmetric Transformers

Technical Field

The invention belongs to the technical field of image processing, and particularly relates to a visible light-SAR image fusion method and system based on dual-stream symmetric Transformers.

Background

The fusion of visible light and Synthetic Aperture Radar (SAR) images aims to synthesize information from sensors with different imaging mechanisms and to generate a fused image with high representational power and high information richness. Traditional image fusion methods mostly depend on manually designed feature extraction operators and heuristic fusion rules; when processing complex dynamic scenes and heterogeneous high-dimensional data they suffer from weak feature generalization and insufficient capacity for modeling nonlinear mappings, and they cannot automatically adapt to diverse image distributions, so the fused images perform poorly in detail preservation and spatial consistency. In recent years, deep learning methods represented by the Convolutional Neural Network (CNN) have markedly improved fusion performance, realizing automatic extraction of deep features through a self-learning mechanism. However, because convolution operators are confined to local receptive fields, they remain significantly deficient at constructing global long-range dependencies and recovering fine textures. In addition, most existing deep learning fusion frameworks adopt a single-stream architecture and struggle to fully mine the complementary distribution characteristics of heterogeneous source images; the few dual-stream frameworks increase the information input dimension but, lacking a direction-aware mechanism for heterogeneous edges, still face technical bottlenecks such as loss of key textures and edge diffusion.

Aiming at the technical problems of the prior art, namely poor multi-modal image fusion quality, loss of detail, and the difficulty of reconciling global and local information, the invention provides a visible light-SAR image fusion method and system based on a dual-stream symmetric Transformer. The invention adopts a dual-stream symmetric architecture to extract multi-scale features from the heterogeneous source images, ensuring deep alignment of the information of the different modalities. In the generator, a composite mechanism of a serial structure and a parallel structure is introduced: the serial structure captures microscopic textures and local details through deep cascading, while the parallel structure captures macroscopic spatial topological features through multi-path mapping (see the sketch below), effectively overcoming the limitation of conventional convolution in multi-scale feature modeling.
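As a concrete illustration of the serial/parallel composite mechanism just described, here is a hedged PyTorch sketch. The serial branch uses a residual cascade of small convolutions followed by self-attention; the parallel branch runs two sub-networks side by side and aggregates them residually. Channel counts, kernel sizes, and the omission of the Transformer stage inside the parallel sub-networks are simplifying assumptions of this sketch, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class SerialBranch(nn.Module):
    """Cascaded convolutions with a residual connection, then
    self-attention: small receptive fields for micro-texture detail."""
    def __init__(self, ch=32, heads=4):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch, 3, padding=1), nn.LeakyReLU(0.2),
        )
        self.attn = nn.MultiheadAttention(ch, heads, batch_first=True)

    def forward(self, x):
        f = x + self.convs(x)                      # cross-layer residual fusion
        b, c, h, w = f.shape
        t = f.flatten(2).transpose(1, 2)           # tokens for global modeling
        g, _ = self.attn(t, t, t)
        return (t + g).transpose(1, 2).reshape(b, c, h, w)

class ParallelBranch(nn.Module):
    """Two sub-networks with large kernels run in parallel; their
    complementary outputs are aggregated through a residual connection."""
    def __init__(self, ch=32):
        super().__init__()
        self.sub1 = nn.Sequential(nn.Conv2d(ch, ch, 5, padding=2), nn.LeakyReLU(0.2))
        self.sub2 = nn.Sequential(nn.Conv2d(ch, ch, 7, padding=3), nn.LeakyReLU(0.2))

    def forward(self, x):
        return x + self.sub1(x) + self.sub2(x)     # residual aggregation

# Multi-scale aggregation fusion: channel-wise concatenation (cf. claim 5).
x = torch.randn(1, 32, 64, 64)                     # shallow features of one stream
fused = torch.cat([SerialBranch()(x), ParallelBranch()(x)], dim=1)  # (1, 64, 64, 64)
```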
Disclosure of Invention

In order to solve the above technical problems, the invention provides a visible light-SAR image fusion method based on dual-stream symmetric Transformers, comprising the following steps: obtaining a visible light image and a SAR image of the same scene, and performing spatial registration on the two images; constructing a generative adversarial network framework comprising a generator and two mutually independent discriminators; extracting multi-scale shallow features of the visible light image and the SAR image respectively with the dual-stream symmetric branches of the generator, feeding the shallow features into a Transformer module, and performing global semantic modeling and long-range dependency capture through a self-attention mechanism; extracting microscopic texture detail features from the two images with a serial structure and macroscopic spatial structure features with a parallel structure, and performing multi-scale aggregation fusion on them to generate a fusion feature map; scanning the fusion feature map in the horizontal, vertical and diagonal directions with a strip convolution block to extract and reinforce linear edge information, generating an enhanced fusion feature map; reconstructing the enhanced fusion feature map through a decoding network to obtain an initial fusion image; and performing adversarial discrimination between the initial fusion image and the source images with the two mutually independent discriminators, iteratively optimizing the network parameters with a composite multi-term loss function, and outputting the final fusion model. Optionally, the dual-stream symmetric branches comprise two feature extraction branches corresponding to the visible light image and the SAR image respectively, each feature extraction branch comprising a cascaded convolutional layer, an activation function layer and a Transformer module.
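The strip convolution block is the most distinctive operator in the pipeline. Below is a hedged sketch of one way to realize it: 1xk and kx1 kernels for the horizontal and vertical strips, and full kxk kernels masked to a single diagonal for the two diagonal strips. The final 1x1 merge convolution that aggregates the concatenated branches is an assumption of this sketch, since the patent only specifies channel-wise concatenation and aggregation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiagStrip(nn.Module):
    """k x k convolution whose weights are masked to one diagonal,
    i.e. a strip kernel oriented at +45 or -45 degrees."""
    def __init__(self, ch, k=9, anti=False):
        super().__init__()
        self.conv = nn.Conv2d(ch, ch, k, padding=k // 2)
        eye = torch.eye(k)
        if anti:
            eye = torch.flip(eye, dims=[1])        # anti-diagonal support
        self.register_buffer("mask", eye.view(1, 1, k, k))

    def forward(self, x):
        w = self.conv.weight * self.mask           # keep only diagonal taps
        return F.conv2d(x, w, self.conv.bias, padding=self.conv.padding)

class StripConvBlock(nn.Module):
    """Scans the fused feature map along four orientations and merges
    the direction-specific responses (multi-directional edge enhancement)."""
    def __init__(self, ch=64, k=9):
        super().__init__()
        self.h = nn.Conv2d(ch, ch, (1, k), padding=(0, k // 2))   # horizontal
        self.v = nn.Conv2d(ch, ch, (k, 1), padding=(k // 2, 0))   # vertical
        self.dl = DiagStrip(ch, k)                                # left diagonal
        self.dr = DiagStrip(ch, k, anti=True)                     # right diagonal
        self.merge = nn.Conv2d(4 * ch, ch, 1)                     # assumed 1x1 fusion

    def forward(self, x):
        feats = torch.cat([self.h(x), self.v(x), self.dl(x), self.dr(x)], dim=1)
        return self.merge(feats)                   # enhanced fusion feature map
```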
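For the reconstruction step, the following minimal sketch shows a decoding network that concatenates the enhanced fusion feature map with features carried over through a skip connection and reduces them to a single-channel image through cascaded convolutions. The Transformer stage that the patent places inside the reconstruction is omitted here for brevity, and all channel counts are illustrative.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, fused_ch=64, skip_ch=64):
        super().__init__()
        # cascaded convolutions: feature dimensionality reduction + nonlinearity
        self.reduce = nn.Sequential(
            nn.Conv2d(fused_ch + skip_ch, 64, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 32, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 3, padding=1), nn.Tanh(),   # initial fusion image
        )

    def forward(self, enhanced, skip):
        x = torch.cat([enhanced, skip], dim=1)     # skip-connection concatenation
        return self.reduce(x)

decoder = Decoder()
enhanced = torch.randn(1, 64, 64, 64)              # from the strip convolution block
skip = torch.randn(1, 64, 64, 64)                  # multi-scale extraction features
fused_image = decoder(enhanced, skip)              # (1, 1, 64, 64)
```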
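Finally, a hedged sketch of the dual-discriminator adversarial setup: two structurally identical discriminators score the fused image against the visible and SAR distributions respectively. The least-squares GAN objective used below is an assumption for illustration; the patent only requires adversarial discrimination driven by a composite multi-term loss, which would additionally include content terms not shown here.

```python
import torch
import torch.nn as nn

def make_discriminator(in_ch=1):
    # cascade of convolution, normalization and nonlinear activation modules
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.LeakyReLU(0.2),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.LeakyReLU(0.2),
        nn.Conv2d(64, 1, 3, padding=1),            # per-patch confidence scores
    )

d_vis, d_sar = make_discriminator(), make_discriminator()
mse = nn.MSELoss()

def generator_adv_loss(fused):
    """Generator tries to make both discriminators accept the fused image
    as drawn from the corresponding source distribution."""
    s_v, s_s = d_vis(fused), d_sar(fused)
    return mse(s_v, torch.ones_like(s_v)) + mse(s_s, torch.ones_like(s_s))

def discriminator_loss(fused, vis, sar):
    """Each discriminator separates its own source modality (real) from
    the fused image (fake); fused is detached so only D updates."""
    r_v, f_v = d_vis(vis), d_vis(fused.detach())
    r_s, f_s = d_sar(sar), d_sar(fused.detach())
    return (mse(r_v, torch.ones_like(r_v)) + mse(f_v, torch.zeros_like(f_v)) +
            mse(r_s, torch.ones_like(r_s)) + mse(f_s, torch.zeros_like(f_s)))
```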