CN-121982281-A - Method and system for cross-modal registration of synthetic aperture radar image and visible light image
Abstract
The invention belongs to the technical field of image filtering and discloses a method and a system for cross-modal registration of a synthetic aperture radar (SAR) image and a visible light (OPT) image. The method comprises: first, performing robust correction of the large-scale rotation and scale differences between the SAR and OPT images with a global registration module GRB, and generating global registration features of the SAR and the OPT respectively; second, matching local texture structures in the SAR and OPT images with a local registration module LRB, and generating local registration features of the SAR and the OPT respectively; and third, performing cross-modal semantic alignment on the global and local registration features of the SAR and the OPT with a cross-modal Mamba interaction module, completing the cross-modal registration of the SAR and the OPT. The invention connects the global registration module GRB and the local fine-matching module LRB through a closed-loop feedback mechanism, ensuring robust registration of heterologous images.
Inventors
- CHEN JIE
- Pazilaiti Nurmaiti
- GUO XIANFEI
- Wan Huiyao
- LI HUI
- XU YUNHENG
- LI YINGSONG
- HUANG ZHIXIANG
- YANG ZHIGAO
Assignees
- 安徽大学
- 中科卫星(安徽)数据科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251230
Claims (10)
- 1. A method for cross-modal registration of synthetic aperture radar images with visible light imagery, comprising: first, performing robust correction of the large-scale rotation and scale differences between a synthetic aperture radar image SAR and a visible light image OPT with a global registration module GRB, and generating global registration features of the SAR image and of the OPT image respectively; second, based on the global registration features of the SAR image and of the OPT image, matching local texture structures in the SAR image and the OPT image with a local registration module LRB, and generating local registration features of the SAR image and of the OPT image respectively; third, performing cross-modal semantic alignment on the global registration feature of the SAR image, the global registration feature of the OPT image, the local registration feature of the SAR image and the local registration feature of the OPT image with a cross-modal Mamba interaction module Cross-SS2Former, to complete the cross-modal registration of the SAR image and the OPT image.
- 2. The method of claim 1, wherein, in the first step, generating the global registration features of the synthetic aperture radar image SAR and of the visible light image OPT with the global registration module GRB comprises: applying, through the GRI sub-module, a learnable convolution kernel to the SAR image and the OPT image in parallel at a plurality of preset angles, extracting rotation-equivariant features and aggregating them into a rotation-invariant descriptor; dynamically generating, through the GSI sub-module, a multi-scale feature pyramid for the SAR image and the OPT image respectively, and performing channel fusion after restoring resolution by bilinear interpolation; and splicing the output features of the GRI sub-module and the GSI sub-module along the channel dimension, and jointly optimizing the spliced features with the global pooling layer through a feedforward neural network FFN, to generate the global registration features of the SAR image and of the OPT image respectively.
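The multi-angle convolution and scale-pyramid steps of claim 2 can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the four exact 90-degree rotations, the fixed kernel, the max aggregation and the nearest-neighbour upsampling are all simplifying assumptions (the claim uses arbitrary preset angles, learnable kernels and bilinear interpolation).

```python
import numpy as np

def conv2d(img, kern):
    """Naive 'valid' 2D correlation (sufficient for a sketch)."""
    kh, kw = kern.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

def gri_descriptor(img, kern, n_rot=4):
    """GRI idea: apply one kernel at several preset angles (here the four
    exact 90-degree rotations via np.rot90) and aggregate the responses by
    a max, giving a descriptor map that is equivariant to those rotations."""
    responses = [conv2d(img, np.rot90(kern, k)) for k in range(n_rot)]
    return np.max(np.stack(responses), axis=0)

def gsi_pyramid(img, n_levels=3):
    """GSI idea: dyadic average-pooling pyramid; every level is upsampled
    back to full size (nearest-neighbour here, bilinear in the claim) so
    the levels can be fused along the channel axis."""
    feats, cur = [], img
    for lvl in range(n_levels):
        up = np.repeat(np.repeat(cur, 2 ** lvl, axis=0), 2 ** lvl, axis=1)
        feats.append(up[:img.shape[0], :img.shape[1]])
        cur = 0.25 * (cur[0::2, 0::2] + cur[1::2, 0::2]
                      + cur[0::2, 1::2] + cur[1::2, 1::2])
    return np.stack(feats)  # (n_levels, H, W): channel-fused scales
```

With this toy version, rotating the input by 90 degrees simply rotates the descriptor map, so any pooled statistic of it is rotation-invariant, which is the property the GRI sub-module targets.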
- 3. The method of claim 2, wherein the expression of the global registration module GRB is: F_GRB = FFN(GCE(Concat(F_GRI, F_GSI))); wherein FFN is the feedforward neural network, GCE is the global context enhancement (global pooling) operation, Concat is the concatenation along the channel dimension, F_GRI ∈ R^(B×C_GRI×H×W) is the output feature of the GRI sub-module, F_GSI ∈ R^(B×C_GSI×H×W) is the output feature of the GSI sub-module, R is the feature real set, B represents the batch size, H and W are the spatial dimensions of the feature map, C_GRI is the number of output channels of the GRI sub-module, and C_GSI is the number of output channels of the GSI sub-module.
- 4. The method of claim 1, wherein, in the second step, generating the local registration features of the synthetic aperture radar image SAR and of the visible light image OPT with the local registration module LRB comprises: initializing a group of 2D Gabor filters with uniformly distributed angles according to a preset direction number D and kernel size K, and stacking them into a learnable parameter tensor; based on the global registration features of the SAR image and of the OPT image, applying depth-separable convolution to the globally registered feature maps of the SAR image and of the OPT image respectively, to obtain the response tensors in all directions; calculating the Euclidean norm of each pixel point over all directions, generating a rotation-invariant magnitude feature map; and constructing a spatial-direction attention map through channel averaging and Softmax normalization, and performing weighted fusion of the original direction responses according to the spatial-direction attention map, to generate the local registration features of the SAR image and of the OPT image respectively.
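The Gabor pipeline of claim 4 (directional responses, Euclidean-norm magnitude, Softmax attention, weighted fusion) can be sketched as follows. This is a single-channel NumPy toy with hypothetical parameter values (D=4, K=7, sigma, wavelength), not the patented implementation, and the naive padded convolution stands in for the depth-separable convolution of the claim.

```python
import numpy as np

def conv_same(img, kern):
    """Naive zero-padded 'same'-size 2D correlation."""
    K = kern.shape[0]
    padded = np.pad(img, K // 2)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + K, j:j + K] * kern)
    return out

def gabor_bank(D=4, K=7, sigma=2.0, lam=4.0):
    """D oriented 2D Gabor kernels (angles uniform over [0, pi)) stacked
    into one (D, K, K) tensor -- the learnable parameter tensor of the claim."""
    ax = np.arange(K) - K // 2
    y, x = np.meshgrid(ax, ax, indexing="ij")
    bank = []
    for d in range(D):
        theta = np.pi * d / D
        xr = x * np.cos(theta) + y * np.sin(theta)  # rotated coordinate
        bank.append(np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
                    * np.cos(2 * np.pi * xr / lam))
    return np.stack(bank)

def lrb(feat, bank):
    """Directional responses -> Euclidean-norm magnitude map ->
    softmax attention over directions -> weighted fusion of the responses."""
    resp = np.stack([conv_same(feat, k) for k in bank])   # (D, H, W)
    mag = np.sqrt((resp ** 2).sum(axis=0))                # rotation-invariant magnitude
    z = resp - resp.max(axis=0, keepdims=True)            # stabilised softmax over D
    attn = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    return (attn * resp).sum(axis=0), mag, attn
```

The attention here is computed per pixel over the D direction responses; in the claim it additionally involves channel averaging over a multi-channel feature map.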
- 5. The method of claim 4, wherein the expression of the local registration module LRB is: F_LRB(b,d,h,w) = A(b,d,h,w) · R(b,d,h,w), with R(b,d,h,w) = (G_d * F_GRB)(b,h,w) and A(b,d,h,w) = Softmax_{d'}((1/C) Σ_{c=1}^{C} (G_{d'} * F_GRB)(b,c,h,w)); wherein F_LRB represents the local registration feature output by the local registration module; b, d, h, w are dimension indices, D is the total number of directions, and C represents the channels; c is an auxiliary index over the channel dimension, and d' is an auxiliary index over the direction dimension; F_GRB is the global registration feature output by the global registration module and is also the input feature map of the local registration module; G_d represents the Gabor convolution kernel of the d-th direction, and G_{d'} belongs to the same class of operators, the direction dimensions being distinguished only by the index d'; A represents the spatial-direction attention weight; R is the direction response; and * is the convolution operation.
- 6. The method of claim 1, wherein, in the third step, the cross-modal Mamba interaction module Cross-SS2Former utilizes a parallel selective state space model to construct long-range dependence, and realizes cross-modal semantic alignment through weighted fusion; the expression of the cross-modal Mamba interaction module Cross-SS2Former is: F_t = λ · SSM_SAR(F_SAR^t) + (1 − λ) · SSM_OPT(F_OPT^t); wherein λ is the automatically learned fusion weight; SSM_SAR and SSM_OPT are the state space model update operators corresponding to the synthetic aperture radar image SAR and the visible light image OPT respectively; and F_SAR^t and F_OPT^t are the global and local fusion features of the synthetic aperture radar image SAR and of the visible light image OPT at time step t.
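The parallel-stream weighted fusion of claim 6 can be sketched with a toy scalar state space recurrence. The fixed coefficients a, b, c and the scalar fusion weight lam are stand-in assumptions: a real selective (Mamba-style) SSM has learned, input-dependent parameters, and the fusion weight is learned automatically.

```python
import numpy as np

def ssm_scan(x, a=0.9, b=0.5, c=1.0):
    """Toy 1-D linear state space recurrence h_t = a*h_{t-1} + b*x_t,
    y_t = c*h_t -- a scalar stand-in for a selective SSM block, which
    builds long-range dependence by carrying state across the sequence."""
    h, ys = 0.0, []
    for xt in x:
        h = a * h + b * xt
        ys.append(c * h)
    return np.array(ys)

def cross_fuse(f_sar, f_opt, lam=0.5):
    """Weighted fusion of the two parallel SSM streams; lam plays the role
    of the automatically learned fusion weight in claim 6."""
    return lam * ssm_scan(f_sar) + (1.0 - lam) * ssm_scan(f_opt)
```

Setting lam to 1 recovers the pure SAR stream and lam to 0 the pure OPT stream; intermediate values blend the two modalities, which is the weighted-fusion alignment step of the claim.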
- 7. A system for cross-modal registration of synthetic aperture radar images with visible light imagery, employing the method of any one of claims 1-6, comprising: The global registration module is used for carrying out robust correction of large-scale rotation and scale difference between the synthetic aperture radar image SAR and the visible light image OPT, and respectively generating global registration features of the synthetic aperture radar image SAR and global registration features of the visible light image OPT; The local registration module is used for matching local texture structures in the synthetic aperture radar image and the visible light image based on the global registration feature of the synthetic aperture radar image SAR and the global registration feature of the visible light image OPT, and respectively generating the local registration feature of the synthetic aperture radar image SAR and the local registration feature of the visible light image OPT; The cross-modal Mamba interaction module is used for performing cross-modal semantic alignment on the global registration feature of the SAR, the global registration feature of the OPT, the local registration feature of the SAR and the local registration feature of the OPT to complete the cross-modal registration of the SAR and the OPT.
- 8. The system according to claim 7, wherein: the global registration module comprises a GRI sub-module, a GSI sub-module and a splicing module; the GRI sub-module is used for applying a learnable convolution kernel to the synthetic aperture radar image SAR and the visible light image OPT in parallel at a plurality of preset angles, extracting rotation-equivariant features and aggregating them into a rotation-invariant descriptor; the GSI sub-module is used for dynamically generating a multi-scale feature pyramid for the SAR image and the OPT image respectively, and performing channel fusion after restoring resolution by bilinear interpolation; and the splicing module is used for splicing the output features of the GRI sub-module and the GSI sub-module along the channel dimension, and jointly optimizing the spliced features with the global pooling layer through the feedforward neural network FFN, to generate the global registration features of the SAR image and of the OPT image respectively.
- 9. The system according to claim 7, wherein: the local registration module comprises a filter unit, a convolution unit, a Euclidean norm unit and a weighted fusion unit; the filter unit is used for initializing a group of 2D Gabor filters with uniformly distributed angles according to a preset direction number D and kernel size K, and stacking them into a learnable parameter tensor; the convolution unit is used for applying depth-separable convolution to the globally registered feature maps of the synthetic aperture radar image SAR and of the visible light image OPT respectively, based on their global registration features, to obtain the response tensors in all directions; the Euclidean norm unit is used for calculating the Euclidean norm of each pixel point over all directions and generating a rotation-invariant magnitude feature map; and the weighted fusion unit is used for constructing a spatial-direction attention map through channel averaging and Softmax normalization, and performing weighted fusion of the original direction responses according to the spatial-direction attention map, to generate the local registration features of the SAR image and of the OPT image respectively.
- 10. The system according to claim 7, wherein: the cross-modal Mamba interaction module utilizes a parallel selective state space model to construct long-range dependence, and realizes cross-modal semantic alignment through weighted fusion.
Description
Method and system for cross-modal registration of synthetic aperture radar image and visible light image

Technical Field

The invention belongs to the technical field of image filtering, and particularly relates to a method and a system for cross-modal registration of a synthetic aperture radar image and a visible light image.

Background

Effective heterogeneous image registration is critical to subsequent interpretation tasks. In remote sensing applications, a single-modality image often cannot provide adequate and robust information support, and multi-source fusion is a necessary trend. SAR and OPT images are highly complementary: the optical image has rich texture and accords with human visual perception but is easily limited by cloud, fog, illumination and day-night conditions, whereas the SAR image has all-weather, all-day imaging capability and can penetrate some ground cover, but suffers from speckle noise and geometric distortion, so its structural expression is not intuitive. Accurate cross-modal registration of SAR and OPT images is therefore a key prerequisite for downstream tasks such as multi-source fusion, change detection and target recognition. However, the differences in imaging mechanism, observation geometry (such as SAR side view versus optical front view) and radiometric characteristics between the two are significant, resulting in large-angle rotation, scale change and local geometric distortion between the images. Traditional registration methods have limited feature expression capability in heterogeneous scenes; existing deep learning models can alleviate radiometric differences but struggle to effectively construct long-range dependence and multi-scale geometric invariance, and their registration accuracy is especially insufficient in complex ground-object areas, restricting the cooperative use of multi-modal remote sensing information.
Disclosure of Invention

To solve the technical problems in the background, the invention provides a robust, structure-aware method for cross-modal registration of a synthetic aperture radar image SAR and a visible light image OPT, which addresses the large-angle rotation, scale change and local geometric distortion of the prior art by constructing a twin network RSI-SMamba combining closed-loop global-local registration with a structure-aware, end-to-end state space model. The method achieves robust alignment under large rotation and scale change through the GRI and GSI sub-modules of the GRB in the state-space-model-based twin network RSI-SMamba, introduces a learnable multi-directional Gabor convolution through the LRB to finely capture direction-sensitive textures, and applies structure and angle constraints to the feature embedding during registration through a structure-aware angular margin loss function ArcSS, thereby enhancing feature discriminability and cross-modal correspondence.
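The description names the ArcSS loss but does not disclose its formula. As a hedged illustration of the general family it belongs to, the sketch below implements a standard ArcFace-style additive angular margin loss in NumPy; the structure-aware terms specific to ArcSS, and the parameter values m and s, are not taken from the patent.

```python
import numpy as np

def angular_margin_loss(emb, weight, labels, m=0.5, s=16.0):
    """ArcFace-style additive angular margin: L2-normalise embeddings and
    class weights, add margin m to the target-class angle, rescale by s,
    then take the cross-entropy of the resulting logits."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    w = weight / np.linalg.norm(weight, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1 + 1e-7, 1 - 1e-7)   # cosine similarities
    theta = np.arccos(cos)
    idx = np.arange(len(labels))
    theta[idx, labels] = np.minimum(theta[idx, labels] + m, np.pi)
    logits = s * np.cos(theta)
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-logp[idx, labels].mean())
```

Because the margin shrinks the target-class cosine before the softmax, the optimiser must push matching cross-modal embeddings closer in angle than a plain softmax would require, which is how an angular margin enhances feature discriminability.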
To achieve the above object, the present invention provides the following solution: a method for cross-modal registration of synthetic aperture radar images with visible light imagery, comprising: first, performing robust correction of the large-scale rotation and scale differences between a synthetic aperture radar image SAR and a visible light image OPT with a global registration module GRB, and generating global registration features of the SAR image and of the OPT image respectively; second, based on the global registration features of the SAR image and of the OPT image, matching local texture structures in the SAR image and the OPT image with a local registration module LRB, and generating local registration features of the SAR image and of the OPT image respectively; third, performing cross-modal semantic alignment on the global registration feature of the SAR image, the global registration feature of the OPT image, the local registration feature of the SAR image and the local registration feature of the OPT image with a cross-modal Mamba interaction module Cross-SS2Former, to complete the cross-modal registration of the SAR image and the OPT image.
Preferably, in the first step, the method for generating the global registration features of the synthetic aperture radar image SAR and the global registration features of the visible light image OPT by using the global registration module GRB comprises: through the GRI sub-module, a learnable convolution kernel is applied to the synthetic aperture radar image SAR and the visible light image OPT in parallel at a plurality of preset angles, rotation and other variable characteristics are extracted and aggregated into a rotation invariant descriptor; Through a GSI sub-module, respectively dynamically generating a multi-scale feature pyramid for the synthetic aperture radar image SAR and the vi