CN-122023848-A - Multi-mode image matching method based on phase consistency structural information distillation
Abstract
The invention discloses a multi-mode image matching method based on phase consistency structural information distillation. The method first obtains stable structural prior information by computing phase consistency on the multi-modal image pair to be matched, and combines it with the multi-scale features extracted by a convolutional neural network to form a structure-guided feature representation. The network is then trained by structural distillation using a structural consistency loss and a cross-modal consistency loss, so that the resulting coarse-scale and fine-scale features exhibit stronger structural consistency across modalities. On this basis, coarse-scale global matching is completed by a feature transformation module with alternately stacked self-attention and cross-attention to generate candidate matching pairs, and the candidate pairs are locally refined on the fine-scale features to obtain the final matching set. By deeply fusing the phase consistency structural prior with a deep-learning matching framework, the invention improves matching precision and computational efficiency while ensuring cross-modal matching generalization.
Inventors
- Liu Haibo
- Geng Zhiling
- Feng Chenguo
Assignees
- Hunan University (湖南大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-13
Claims (7)
- 1. A multi-mode image matching method based on phase consistency structural information distillation, characterized by comprising the following steps:
  S1, multi-scale feature coding and phase consistency structural information distillation:
  S1-1, for the multimode image pair to be matched I_A and I_B, respectively calculating phase consistency feature maps P_A and P_B as structure guide maps;
  S1-2, aligning P_A and P_B to a set of scales {c, f} by bilinear interpolation, average pooling, or other equivalent downsampling means, obtaining coarse-scale phase consistency maps P_A^c, P_B^c and fine-scale phase consistency maps P_A^f, P_B^f as teacher supervision information;
  S1-3, inputting the multimode images I_A and I_B into a convolutional neural network to extract multi-scale features, obtaining coarse-scale features F_A^c, F_B^c and fine-scale features F_A^f, F_B^f of the two images;
  S1-4, constructing a phase consistency prediction head on each scale that maps the features into phase consistency prediction maps, obtaining coarse-scale prediction maps P̂_A^c, P̂_B^c and fine-scale prediction maps P̂_A^f, P̂_B^f;
  S1-5, constructing a loss function for structural distillation training, the phase consistency structural distillation loss being
    L_PC = λ_s · L_struct + λ_c · L_cross,  (6)
  wherein λ_s and λ_c are weight coefficients, L_struct is the structural consistency loss, and L_cross is the cross-modal consistency loss;
  S1-6, jointly training the feature extraction network and the prediction heads according to the loss function L_PC, obtaining distilled coarse-scale features F̃_A^c, F̃_B^c and fine-scale features F̃_A^f, F̃_B^f containing structural prior information;
  S2, coarse-scale feature matching based on an attention mechanism: introducing position codes to the distilled coarse-scale features, performing context and cross-image interaction through a feature transformation module to obtain fused coarse-scale features, computing a similarity matrix from the fused coarse-scale features, and obtaining a coarse-scale matching set M_c from the similarity matrix;
  S3, fine-scale feature matching based on an attention mechanism: for each pair of coarse matches in the coarse-scale matching set, cropping a local window on the distilled fine-scale feature maps, performing feature enhancement through a lightweight feature transformation module to obtain locally enhanced features, and performing correlation computation and expectation solving on the locally enhanced features to obtain the final sub-pixel fine-scale matching set M_f;
  S4, joint optimization of phase consistency structural distillation and matching: constructing the coarse-scale matching loss L_c and the fine-scale matching loss L_f based on steps S2 and S3 respectively, and combining them with the phase consistency structural distillation loss to obtain the total loss
    L = L_PC + L_c + L_f,  (14)
  and training the network accordingly to obtain a multi-scale feature extraction and matching model for inference.
- 2. The multi-mode image matching method based on phase consistency structural information distillation according to claim 1, wherein step S1-1 specifically comprises the following steps:
  S1-1-1, performing inverse Fourier transform on Log-Gabor filters to obtain the even-symmetric filter M^e_{s,o} and the odd-symmetric filter M^o_{s,o} in the spatial domain, and convolving the image I at different scales s and orientations o to obtain the corresponding response components e_{s,o}(x) and o_{s,o}(x):
    [e_{s,o}(x), o_{s,o}(x)] = [I(x) * M^e_{s,o}, I(x) * M^o_{s,o}],  (1)
  wherein * represents the convolution operation;
  S1-1-2, calculating the amplitude A_{s,o}(x) and phase φ_{s,o}(x) of the image I at scale s and orientation o:
    A_{s,o}(x) = sqrt(e_{s,o}(x)² + o_{s,o}(x)²),  (2)
    φ_{s,o}(x) = atan2(o_{s,o}(x), e_{s,o}(x));  (3)
  S1-1-3, for the phases at all scales in the same orientation o, computing the weighted mean phase φ̄_o(x) and defining the phase deviation function
    ΔΦ_{s,o}(x) = cos(φ_{s,o}(x) − φ̄_o(x)) − |sin(φ_{s,o}(x) − φ̄_o(x))|,  (4)
  and integrating the phase values of the image over multiple orientations and scales to obtain its phase consistency, the calculation expression being
    PC(x) = Σ_o Σ_s W_o(x) ⌊A_{s,o}(x) ΔΦ_{s,o}(x) − T⌋ / (Σ_o Σ_s A_{s,o}(x) + ε),  (5)
  wherein PC(x) represents the phase consistency value at point x, W_o(x) is the frequency spread weight factor, T is the noise compensation level, ε is a constant preventing the denominator from being zero, and ⌊·⌋ represents the positive-part operator.
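The phase consistency computation of claim 2 (Eqs. (1)-(5)) can be sketched as follows. This is a minimal NumPy version: the filter parameters (f0, mult, sigma_f, sigma_theta), the noise level T, and the choice to set the frequency spread weight W_o to 1 are all assumptions made for brevity, not values taken from the patent.

```python
import numpy as np

def log_gabor_bank(h, w, scales=3, orients=4, f0=0.1, mult=2.0,
                   sigma_f=0.55, sigma_theta=0.6):
    """Frequency-domain Log-Gabor bank; all parameter values are assumptions."""
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    r = np.hypot(fx, fy)
    r[0, 0] = 1.0                        # avoid log(0) at the DC term
    theta = np.arctan2(fy, fx)
    bank = []
    for s in range(scales):
        fs = f0 * mult ** s              # centre frequency of scale s
        radial = np.exp(-np.log(r / fs) ** 2 / (2 * np.log(sigma_f) ** 2))
        radial[0, 0] = 0.0
        for o in range(orients):
            ang = o * np.pi / orients
            dt = np.arctan2(np.sin(theta - ang), np.cos(theta - ang))
            bank.append(radial * np.exp(-dt ** 2 / (2 * sigma_theta ** 2)))
    return bank, scales, orients

def phase_congruency(img, T=0.05, eps=1e-4):
    """Eqs. (1)-(5): even/odd responses -> amplitude A and phase deviation
    DeltaPhi, then PC = sum max(A*DeltaPhi - T, 0) / (sum A + eps).
    The frequency spread weight W_o is set to 1 here for brevity."""
    h, w = img.shape
    bank, S, O = log_gabor_bank(h, w)
    F = np.fft.fft2(img)
    num = np.zeros((h, w))
    den = np.full((h, w), eps)
    for o in range(O):
        resp = [np.fft.ifft2(F * bank[s * O + o]) for s in range(S)]
        E = np.stack([z.real for z in resp])    # even responses e_{s,o}, Eq. (1)
        Od = np.stack([z.imag for z in resp])   # odd responses o_{s,o}
        A = np.hypot(E, Od)                     # amplitude, Eq. (2)
        sE, sO = E.sum(0), Od.sum(0)
        mag = np.hypot(sE, sO) + eps
        ce, se = sE / mag, sO / mag             # mean phase direction phi_bar
        # A*DeltaPhi = A*cos(phi - phi_bar) - |A*sin(phi - phi_bar)|, Eq. (4)
        adp = (E * ce + Od * se) - np.abs(E * se - Od * ce)
        num += np.maximum(adp.sum(0) - T, 0)    # noise-compensated numerator
        den += A.sum(0)
    return num / den                            # Eq. (5)
```

On a step-edge image the map peaks along the edge and stays near zero in flat regions, which is exactly the structure-guide behaviour the teacher maps in step S1-2 rely on.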
- 3. The multi-mode image matching method based on phase consistency structural information distillation according to claim 1, wherein step S1-5 specifically comprises the following steps:
  S1-5-1, constructing the structural consistency loss L_struct:
    L_struct = Σ_{l∈{c,f}} [ L_ssim(P̂_A^l, P_A^l) + L_ssim(P̂_B^l, P_B^l) ];  (7)
  S1-5-2, constructing the cross-modal consistency loss L_cross:
    L_cross = Σ_{l∈{c,f}} L_ssim(P̂_A^l(x), P̂_B^l(T_l(x))),  (8)
  wherein l ∈ {c, f} indexes the coarse scale c and the fine scale f, P̂_A^l and P̂_B^l are the phase consistency prediction maps of images I_A and I_B at scale l, P_A^l and P_B^l are the phase consistency maps of I_A and I_B at scale l, T_l represents the corresponding-position mapping function at scale l, i.e. it maps a point x of image I_A at scale l to the coordinates of the corresponding point of image I_B at the same scale, and L_ssim represents a structural similarity loss.
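The two distillation terms of claim 3 can be sketched as below. This is a schematic, not the patent's implementation: the SSIM-style term L_ssim is replaced by mean absolute error to keep the sketch short, the weights lam_s and lam_c are assumed values, and warp_B_to_A stands in for the mapping function T_l.

```python
import numpy as np

def struct_term(pred, target):
    """Stand-in structural term; the claim's L_ssim (structural similarity
    loss) is replaced by mean absolute error for brevity."""
    return np.abs(pred - target).mean()

def distillation_loss(pred_A, pred_B, pc_A, pc_B, warp_B_to_A,
                      lam_s=1.0, lam_c=0.5):
    """Eq. (6): L_PC = lam_s*L_struct + lam_c*L_cross, summed over the
    scale set l in {c, f}; lam_s and lam_c are assumed values."""
    L_struct = L_cross = 0.0
    for l in pred_A:
        # Eq. (7): each prediction should match its own teacher PC map
        L_struct += struct_term(pred_A[l], pc_A[l]) + struct_term(pred_B[l], pc_B[l])
        # Eq. (8): predictions should agree across modalities at corresponding points
        L_cross += struct_term(pred_A[l], warp_B_to_A(pred_B[l], l))
    return lam_s * L_struct + lam_c * L_cross
```

When both predictions equal their teacher maps and the warp is the identity, the loss is exactly zero, which is the fixed point the distillation training in S1-6 pulls the features toward.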
- 4. The multi-mode image matching method based on phase consistency structural information distillation according to claim 1, wherein step S2 comprises the following steps:
  S2-1, introducing position codes for the distilled coarse-scale features obtained in step S1 to explicitly inject spatial position information;
  S2-1-1, flattening the coarse-scale features F̃_A^c, F̃_B^c by spatial location into sequence representations X_A, X_B ∈ R^{N×d}, wherein N = H_c × W_c, H_c and W_c are respectively the height and width of the feature map, d is the number of feature channels, and the sequence index n corresponds one-to-one with the two-dimensional coordinates (u, v), with n = v·W_c + u and 0 ≤ n < N;
  S2-1-2, for each position (u, v), constructing a two-dimensional sinusoidal position coding vector p_{(u,v)} ∈ R^d whose channels are:
    p_{(u,v)}[4i] = sin(u·ω_i), p_{(u,v)}[4i+1] = cos(u·ω_i), p_{(u,v)}[4i+2] = sin(v·ω_i), p_{(u,v)}[4i+3] = cos(v·ω_i), ω_i = 1/10000^{4i/d},  (9)
  forming a position coding matrix from the above, and adding the position code to the coarse-scale feature sequences to obtain X̃_A and X̃_B;
  S2-2, inputting the position-coded sequences X̃_A, X̃_B into a feature transformation module with multiple layers of alternately stacked self-attention and cross-attention, obtaining the features F̄_A, F̄_B that integrate context information and cross-image interaction information:
    (F̄_A, F̄_B) = Transformer(X̃_A, X̃_B),  (10)
  wherein the attention may use a standard multi-head attention or an efficient/linear attention implementation;
  S2-3, for an arbitrary coarse-scale position i in image I_A and an arbitrary coarse-scale position j in image I_B, calculating the similarity score matrix S:
    S(i, j) = ⟨F̄_A(i), F̄_B(j)⟩ / τ,  (11)
  wherein ⟨·,·⟩ represents the inner product operation and τ is the temperature coefficient of S;
  S2-4, obtaining a coarse matching confidence matrix P_c from the similarity score matrix S through a differentiable matching operator, filtering out matches whose confidence is lower than a threshold, and obtaining the coarse-scale matching set M_c under the mutual-nearest-neighbour consistency constraint.
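Steps S2-1 through S2-4 (minus the attention layers themselves) can be sketched as follows. Assumptions: the channel layout of the sinusoidal code (four C/4 groups for x-sin, x-cos, y-sin, y-cos), dual-softmax as the "differentiable matching operator", and the values of tau and the confidence threshold are all illustrative choices, not taken from the patent.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pos_encoding_2d(H, W, C):
    """Eq. (9)-style 2D sinusoidal code; the grouped channel layout is an
    assumption. C must be divisible by 4. Returns shape (H*W, C)."""
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    ys, xs = ys.ravel()[:, None], xs.ravel()[:, None]
    q = C // 4
    div = 10000.0 ** (np.arange(q) / q)      # 1/omega_i = 10000^(i/q)
    return np.concatenate([np.sin(xs / div), np.cos(xs / div),
                           np.sin(ys / div), np.cos(ys / div)], axis=1)

def coarse_match(feat_A, feat_B, tau=0.1, thr=0.2):
    """Eq. (11) similarity, dual-softmax as the differentiable matching
    operator, then mutual-nearest-neighbour filtering (step S2-4)."""
    S = feat_A @ feat_B.T / tau              # <F_A(i), F_B(j)> / tau
    P = softmax(S, axis=1) * softmax(S, axis=0)   # confidence matrix P_c
    rows, cols = P.argmax(1), P.argmax(0)
    matches = [(i, j) for i, j in enumerate(rows)
               if cols[j] == i and P[i, j] > thr]
    return matches, P
```

With orthonormal feature rows the dual-softmax confidence concentrates on the diagonal and the mutual-nearest-neighbour filter returns exactly the identity pairing.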
- 5. The multi-mode image matching method based on phase consistency structural information distillation according to claim 1, wherein step S3 comprises the following steps:
  S3-1, mapping each pair of coarse matches (i, j) to fine-scale feature map coordinates (î, ĵ), and cropping local windows of size w × w centred at î and ĵ on the fine-scale feature maps F̃_A^f, F̃_B^f respectively, obtaining local fine-scale feature blocks W_A, W_B;
  S3-2, inputting W_A, W_B into a lightweight feature transformation module with alternately stacked self-attention and cross-attention, obtaining locally enhanced features Ŵ_A, Ŵ_B;
  S3-3, taking the centre vector of Ŵ_A as the query, performing correlation calculation with all positions in the window Ŵ_B to obtain a matching probability map, performing expectation calculation on the probability map to obtain the sub-pixel location on image I_B, and outputting the final fine-scale matching set M_f.
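The correlation-and-expectation refinement of step S3-3 can be sketched as below (a soft-argmax over the local window). The temperature tau is an assumed value, and the attention-based enhancement of S3-2 is taken as already applied to the inputs.

```python
import numpy as np

def refine_match(center_vec, window_feat, tau=0.1):
    """Step S3-3 sketch: correlate the query's centre vector with every
    position of the (w, w, C) enhanced window, softmax into a matching
    probability map, and take the expected (sub-pixel) coordinates.
    tau is an assumed temperature."""
    corr = (window_feat * center_vec).sum(-1) / tau   # (w, w) correlation map
    p = np.exp(corr - corr.max())
    p /= p.sum()                                      # matching probability map
    w = window_feat.shape[0]
    ys, xs = np.meshgrid(np.arange(w), np.arange(w), indexing="ij")
    return (p * xs).sum(), (p * ys).sum()             # expectation -> (x, y)
```

Because the output is a probability-weighted average rather than an argmax, the returned coordinates are continuous, which is what makes the final matching set sub-pixel.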
- 6. The multi-mode image matching method based on phase consistency structural information distillation according to claim 1, wherein step S4 comprises the following steps:
  S4-1, constructing the coarse-scale matching loss L_c from the ground-truth transformation relation between the images and the coarse matching confidence matrix P_c obtained in S2-4:
    L_c = −(1/|M_c^{gt}|) Σ_{(i,j)∈M_c^{gt}} log P_c(i, j);  (12)
  S4-2, for each query point q, measuring the uncertainty by computing the total variance σ²(q) of the corresponding heat map, and constructing the fine-scale matching loss L_f:
    L_f = (1/|M_f|) Σ_{q∈M_f} (1/σ²(q)) ‖x̂(q) − x_gt(q)‖²,  (13)
  wherein x_gt(q) is the true corresponding position of the query point q on image I_B, and M_f is the final fine-scale matching set;
  S4-3, combining the phase consistency structural distillation loss of S1-5 with the matching losses to obtain the total loss:
    L = L_PC + L_c + L_f.  (14)
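The three loss terms of claim 6 can be sketched as below. Equations (12) and (13) as reconstructed here follow common coarse-to-fine matching practice (negative log-confidence and variance-weighted L2), and the unit weights in the total loss are assumptions, since the claim gives no coefficients.

```python
import numpy as np

def coarse_loss(P, gt_matches, eps=1e-8):
    """Eq. (12) sketch: mean negative log-confidence over ground-truth
    coarse pairs derived from the true inter-image transformation."""
    return -np.mean([np.log(P[i, j] + eps) for i, j in gt_matches])

def fine_loss(pred_xy, gt_xy, var, eps=1e-8):
    """Eq. (13) sketch: squared position error weighted by the inverse
    total variance of each query's heat map (its uncertainty)."""
    err = ((pred_xy - gt_xy) ** 2).sum(-1)
    return (err / (var + eps)).mean()

def total_loss(L_pc, L_c, L_f, w_pc=1.0, w_c=1.0, w_f=1.0):
    """Eq. (14): weighted sum of the distillation and matching losses
    (unit weights are an assumption)."""
    return w_pc * L_pc + w_c * L_c + w_f * L_f
```

The inverse-variance weight in fine_loss down-weights matches whose heat maps are diffuse, so confident refinements dominate the gradient.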
- 7. The multi-mode image matching method based on phase consistency structural information distillation according to claim 1, wherein the position codes in step S2 are sinusoidal position codes, learnable position codes, or relative position codes.
Description
Multi-mode image matching method based on phase consistency structural information distillation

Technical Field

The invention relates to the field of computer vision, and in particular to a multi-mode image matching method based on phase consistency structural information distillation.

Background

Multi-modal image matching aims at establishing homonymous-point or pixel-level correspondences between images acquired by different imaging mechanisms or different sensors. Multi-mode images such as visible light, infrared, and synthetic aperture radar images differ markedly in radiation characteristics and noise distribution because of their different imaging mechanisms, and the same object or structure often appears very differently across modalities, so matching easily suffers from unstable features and increased mismatches.

Existing image matching methods can be broadly divided into two categories: traditional methods based on handcrafted features, and methods based on deep learning. Traditional methods generally rely on features such as points and lines in an image, build local feature descriptors from structural information such as gradient and phase consistency, and establish correspondences from them; they offer a degree of interpretability and engineering maturity and better generalization, but are slower. Deep-learning-based methods learn feature representations and matching relations through a network and achieve better results in single-modal or weak-difference scenes, but their cross-modal generalization capability is insufficient.
Therefore, how to effectively introduce the structural information used by traditional methods into a deep-learning feature coding and matching framework, so as to provide an effective structural prior for deep-learning matching and improve cross-modal matching generalization while maintaining computational efficiency, still requires further study.

Disclosure of Invention

Aimed at the problems in the prior art that extracting phase consistency structural information in traditional multi-mode image matching methods is computationally slow, and that deep-learning-based matching methods generalize poorly across modalities, the invention provides a multi-mode image matching method based on phase consistency structural information distillation, which fuses the cross-modal generalization of phase consistency features with the computational efficiency of deep-learning matching methods to realize stable and fast matching of multi-mode images.
The technical scheme adopted to solve the technical problem is a multi-mode image matching method based on phase consistency structural information distillation, comprising the following steps:
S1, multi-scale feature coding and phase consistency structural information distillation:
S1-1, for the multimode image pair to be matched I_A and I_B, respectively calculating phase consistency feature maps P_A and P_B as structure guide maps;
S1-2, aligning P_A and P_B to a set of scales {c, f} by bilinear interpolation, average pooling, or other equivalent downsampling means, obtaining coarse-scale phase consistency maps P_A^c, P_B^c and fine-scale phase consistency maps P_A^f, P_B^f as teacher supervision information;
S1-3, inputting the multimode images I_A and I_B into a convolutional neural network to extract multi-scale features, obtaining coarse-scale features F_A^c, F_B^c and fine-scale features F_A^f, F_B^f of the two images;
S1-4, constructing a phase consistency prediction head on each scale that maps the features into phase consistency prediction maps, obtaining coarse-scale prediction maps P̂_A^c, P̂_B^c and fine-scale prediction maps P̂_A^f, P̂_B^f;
S1-5, constructing a loss function for structural distillation training, the phase consistency structural distillation loss being
  L_PC = λ_s · L_struct + λ_c · L_cross,  (6)
wherein λ_s and λ_c are weight coefficients, L_struct is the structural consistency loss, and L_cross is the cross-modal consistency loss;
S1-6, jointly training the feature extraction network and the prediction heads according to the loss function L_PC, obtaining distilled coarse-scale features F̃_A^c, F̃_B^c and fine-scale features F̃_A^f, F̃_B^f containing structural prior information;
S2, coarse-scale feature matching based on an attention mechanism: introducing position codes to the distilled coarse-scale features, performing context and cross-image interaction through a feature transformation module to obtain fused coarse-scale features, computing a similarity matrix from the fused coarse-scale features, and obtaining a coarse-scale matching set M_c from the similarity matrix;
S3, fine-scale feature matching based on an attention mechanism: for each pair of coarse matches in the coarse-scale matching set, cropping a local window on the distilled fine-scale feature maps and performing feature enhancement through