CN-121329962-B - Lightweight remote sensing image change detection method for space misalignment
Abstract
The invention discloses a lightweight remote sensing image change detection method for use under spatial misalignment. The method first extracts multi-scale features for registration and change detection using MobileNetV3-Large. Second, a spatial consistency module performs semi-dense feature point matching and establishes a cross-scale spatial transformation model so that feature maps of different scales are mutually aligned. Then, a spatio-temporal difference coordination module operates on the aligned feature maps of different scales to enhance the substantial spatio-temporal differences between the bi-temporal features. Finally, the multi-scale difference features are fused to generate the change detection result. The SVCD, SYSU-CD and SECOND datasets were selected for testing, and the method was compared with current mainstream change detection networks. The results show that the method effectively constructs the spatial transformation relation between the images to be registered, is clearly superior to the other methods in both quantitative and qualitative analysis, and has certain advantages in network complexity.
Inventors
- Gong Liangxiong
- Qiu Jiaping
- Wang Guangze
- Cao Xingxing
- Feng Jinfu
- Wang Honggen
- Cheng Yuanming
- Zhao Xingyou
- Gui Leifeng
- Liu Chuanrui
- Qian Xiaogang
- Luo Chao
- Chen Zhi
Assignees
- 南昌市测绘勘察研究院有限公司 (Nanchang Institute of Surveying, Mapping and Geotechnical Investigation Co., Ltd.)
Dates
- Publication Date
- 20260512
- Application Date
- 20251121
Claims (6)
- 1. A lightweight remote sensing image change detection method for use under spatial misalignment, characterized by comprising the following steps: for the pre- and post-temporal images, multi-scale feature maps of the two phases are extracted by a weight-sharing backbone network;
in the encoding stage, a spatial consistency module is used to establish a cross-scale spatial transformation model so that feature maps of different scales are mutually aligned, and a semi-dense correspondence between the pre- and post-temporal images at a given scale is constructed to supervise the training of the spatial consistency module; the spatial consistency module comprises a feature point detection branch and a local descriptor branch;
in the feature point detection branch, the local regions of the input grayscale image are first flattened into the channel dimension by a sliding-window unfolding operation, so that each position corresponds to a local, non-overlapping 8×8 grid region of the original image; the feature point heatmap is then computed with four 1×1 convolutions; finally, the lightweight feature point detection framework ALIKE is selected as the teacher network to supervise the feature point heatmap; the feature point heatmap of the post-temporal image T2 is obtained in the same way, the difference being that the pre-temporal image T1 is subjected to a random affine transformation to obtain T1′, which is then converted into a grayscale image, whereas the post-temporal image T2 is converted into a grayscale image directly;
in the descriptor branch, the extracted multi-scale feature maps are first processed by 1×1 convolutions and bilinear upsampling so that their spatial sizes are kept consistent, and the sum of all feature maps is computed; then, convolutions with different kernel sizes are applied in succession to compute the semi-dense local feature descriptors D_1; finally, the feature point confidences C_1 are obtained with three consecutive 1×1 convolutions and a Sigmoid function; the semi-dense local feature descriptors D_2 and feature point confidences C_2 of the post-temporal image T2 are obtained in the same way, where the subscripts 1 and 2 identify the pre-temporal image T1 and the post-temporal image T2;
the aligned feature maps of different scales are passed through a spatio-temporal difference coordination module to obtain the substantial spatial-spectral difference features between the bi-temporal images; the spatio-temporal difference coordination module comprises spatial collaborative perception and semantic collaborative perception; first, the absolute difference F_s ∈ ℝ^{C×H×W} between the aligned bi-temporal feature maps is computed at each scale i; second, the semantic difference features are constructed by alternately concatenating the channels of the aligned bi-temporal feature maps; then, an efficient channel attention mechanism and a 1×1 convolution are used to obtain the channel-reduced semantic difference feature F_e ∈ ℝ^{C×H×W}, where H and W are the height and width and C is the number of channels; next, in the semantic collaborative perception branch, the semantic collaborative difference feature F_sem ∈ ℝ^{C×H×W} is computed with a multi-head channel cross-attention mechanism; in the spatial collaborative perception branch, feature maps under different receptive fields are computed with four parallel dilated convolutions of different dilation rates, and the spatial collaborative difference feature F_spa ∈ ℝ^{C×H×W} is computed with a spatial cross-attention mechanism; finally, F_spa and F_sem are concatenated along the channel dimension and processed by a 1×1 convolution, batch normalization and ReLU activation to obtain the spatio-temporal difference collaborative feature F_st ∈ ℝ^{C×H×W};
in the spatial collaborative perception, a spatial cross-attention mechanism is used to construct the spatial dependency between the spatial difference feature and the semantic difference feature; for a given spatial difference feature map F_s ∈ ℝ^{C×H×W} and semantic difference feature map F_e ∈ ℝ^{C×H×W}, feature maps of different receptive fields are first generated with 3×3 depthwise separable convolutions of different dilation rates, their sum is computed, and the fused multi-receptive-field feature map F_r is obtained with a ReLU activation function:
F_r = ReLU( Σ_d DSConv_{3,d}(F_s) )  (1)
where DSConv_{3,d} denotes a depthwise separable convolution with kernel size 3 and dilation rate d; second, F_r is average-pooled and max-pooled along the channel dimension, the two results are concatenated, and the spatial attention map M_s is computed with a 7×7 convolution and a Sigmoid function:
M_s = σ( Conv_{7×7}( Concat( AvgPool(F_r), MaxPool(F_r) ) ) )  (2)
where AvgPool and MaxPool denote average pooling and max pooling respectively, Concat denotes channel concatenation, and σ denotes the Sigmoid function; then, in the spatial dimension of each channel, F_e is average-pooled, and the channel attention map M_c is computed with 1×1 convolutions, a ReLU activation function and a Sigmoid function:
M_c = σ( Conv_1( ReLU( Conv_1( AvgPool(F_e) ) ) ) )  (3)
where Conv_1 denotes a convolution with kernel size 1 whose input and output channels are C1 and C2 respectively; finally, the product of F_s with M_s and the product of F_e with M_c are added to obtain the spatial collaborative attention map F_spa ∈ ℝ^{C×H×W}:
F_spa = F_s ⊙ M_s + F_e ⊙ M_c  (4)
where ⊙ denotes element-wise multiplication;
in the semantic collaborative perception, a multi-head channel cross-attention mechanism is used to construct the channel dependency between the spatial difference feature and the semantic difference feature; for a given spatial difference feature map F_s ∈ ℝ^{C×H×W} and semantic difference feature map F_e ∈ ℝ^{C×H×W}, F_s and F_e are first processed by average pooling with a 5×5 window, followed by layer normalization and a 1×1 convolution, to obtain the spatially pooled feature map Q and the channel-pooled feature maps K and V:
Q = Conv_1( LN( AvgPool_5(F_s) ) )  (5)
K, V = Conv_1( LN( AvgPool_5(F_e) ) )  (6)
where AvgPool_5 denotes average pooling with window size 5, whose input and output spatial sizes are H×W and H′×W′ respectively, and LN denotes layer normalization; then, Q and K are matrix-multiplied and divided by a scaling factor √d, the result is normalized with a Softmax function to obtain the attention weight matrix, and the attention weight matrix is finally multiplied with V to obtain the globally context-associated feature F_g:
F_g = Softmax( Q Kᵀ / √d ) V  (7)
finally, global average pooling is applied to F_g, followed by a matrix multiplication, to obtain the semantic collaborative attention map F_sem ∈ ℝ^{C×H×W} (8);
in the decoding stage, the multi-scale spatial-spectral difference features are fused to generate the change detection result.
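The spatial collaborative perception of claim 1 (eqs. (1)-(4)) combines multi-dilation depthwise separable convolutions with spatial and channel attention. Below is a minimal PyTorch sketch of one plausible reading, assuming eq. (4) multiplies each feature by its attention map before adding; the class name, bottleneck ratio C1/C2 and dilation rates are assumptions, not the patent's exact design.

```python
# Hypothetical sketch of the spatial collaborative perception branch
# (claim 1, eqs. (1)-(4)); names and hyperparameters are assumed.
import torch
import torch.nn as nn

class SpatialCollaborativePerception(nn.Module):
    def __init__(self, channels: int, dilations=(1, 2, 3, 4)):
        super().__init__()
        # Eq. (1): 3x3 depthwise-separable convs with different dilation rates.
        self.dsconvs = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d,
                          groups=channels, bias=False),        # depthwise
                nn.Conv2d(channels, channels, 1, bias=False),  # pointwise
            )
            for d in dilations
        )
        # Eq. (2): 7x7 conv over concatenated [avg; max] channel statistics.
        self.spatial_conv = nn.Conv2d(2, 1, 7, padding=3)
        # Eq. (3): two 1x1 convs (C1 -> C2 bottleneck assumed) with ReLU.
        self.channel_mlp = nn.Sequential(
            nn.Conv2d(channels, channels // 4, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1),
        )

    def forward(self, f_s: torch.Tensor, f_e: torch.Tensor) -> torch.Tensor:
        # Eq. (1): sum the multi-receptive-field maps, then ReLU.
        f_r = torch.relu(sum(conv(f_s) for conv in self.dsconvs))
        # Eq. (2): spatial attention from avg/max pooling over channels.
        stats = torch.cat([f_r.mean(1, keepdim=True),
                           f_r.amax(1, keepdim=True)], dim=1)
        m_s = torch.sigmoid(self.spatial_conv(stats))
        # Eq. (3): channel attention from spatially pooled semantic feature.
        m_c = torch.sigmoid(self.channel_mlp(f_e.mean((2, 3), keepdim=True)))
        # Eq. (4): weight each feature by its attention map, then add.
        return f_s * m_s + f_e * m_c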
- 2. The method for lightweight remote sensing image change detection under spatial misalignment according to claim 1, wherein the backbone network is MobileNetV3-Large; prior to feature extraction, the pre-temporal image T1 is subjected to a random affine transformation to obtain T1′, and the semi-dense correspondence between T1′ and T2 at 1/8 scale is constructed according to the matching relation between the pre-temporal image T1 and the post-temporal image T2.
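Claim 2 supervises the spatial consistency module with semi-dense correspondences at 1/8 scale derived from a known random affine warp. The sketch below illustrates one way such ground-truth pairs can be generated from the affine matrix; the cell-center sampling and the function name are assumptions, not the patent's exact procedure.

```python
# Illustrative sketch: semi-dense 1/8-scale correspondences from a known
# random affine warp (claim 2); names and the inlier test are assumed.
import torch

def semi_dense_correspondences(A: torch.Tensor, h: int, w: int, stride: int = 8):
    """A: 2x3 affine matrix mapping T1 pixel coordinates to T1' coordinates."""
    ys, xs = torch.meshgrid(torch.arange(h // stride), torch.arange(w // stride),
                            indexing="ij")
    # Centers of each 8x8 cell in T1, in pixel coordinates.
    pts1 = torch.stack([xs, ys], -1).reshape(-1, 2).float() * stride + stride / 2
    ones = torch.ones(len(pts1), 1)
    pts2 = torch.cat([pts1, ones], 1) @ A.T                 # warp into T1'
    inside = (pts2[:, 0] >= 0) & (pts2[:, 0] < w) & \
             (pts2[:, 1] >= 0) & (pts2[:, 1] < h)           # keep in-bounds pairs
    return pts1[inside], pts2[inside]                       # matched point pairs
```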
- 3. The method for lightweight remote sensing image change detection under spatial misalignment according to claim 1, wherein the spatial consistency module is trained jointly with a local descriptor loss L_desc, a feature point confidence loss L_conf, a feature point detection loss L_det and a pixel-level offset loss L_off, and each loss is calculated as follows:
1) Local descriptor loss L_desc: first, two sets of descriptors D̂_1, D̂_2 ∈ ℝ^{N×64} are sampled from the semi-dense descriptors D_1, D_2 of the spatial transformation model, where each set contains N 64-dimensional descriptors and N is the number of semi-dense corresponding points; second, the bidirectional similarity matrix S ∈ ℝ^{N×N} between D̂_1 and D̂_2 is computed; then, the Softmax function is applied to each row of S and Sᵀ respectively to obtain the matching probability matrices P = Softmax(S) and Q = Softmax(Sᵀ), where the i-th row of P represents the matching probabilities between the i-th point of D̂_1 and all points of D̂_2, and likewise for Q; in the ideal case, the correct matching correspondences lie on the diagonals of P and Q, so the local descriptor loss is the sum of the negative log-likelihoods of the correct matching probabilities:
L_desc = − Σ_{i=1..N} ( log P_{ii} + log Q_{ii} )  (9)
2) Feature point confidence loss L_conf: first, two sets of confidences ĉ_1, ĉ_2 ∈ ℝ^N, consistent with the sampling positions of the local descriptors, are sampled from the confidence maps C_1, C_2 of the spatial transformation model; then, using the matching probability matrices P and Q, the maximum matching confidence P_i^max of the i-th point of D̂_1 in D̂_2 and its reverse maximum matching confidence Q_i^max are computed, and the bidirectional matching confidence of the i-th point is M_i = P_i^max · Q_i^max; finally, the feature point confidence loss is constructed through the L1 norm:
L_conf = (1/N) Σ_{i=1..N} ( |ĉ_{1,i} − M_i| + |ĉ_{2,i} − M_i| )  (10)
3) Feature point detection loss L_det: the feature point detection branch is trained under supervision in a knowledge distillation manner; the supervision signal provided by the teacher network ALIKE is the set of feature point positions A = {(x, y)} at the original image resolution, and the feature point detection loss L_det is defined as:
L_det = − (1/HW) Σ_{i,j} log p_{ij}( y*_{ij} )  (11)
where the feature point heatmaps predicted by the network have height H and width W respectively, y*_{ij} is the ALIKE-predicted label corresponding to position (i, j), and p_{ij}(y*_{ij}) is the probability that the model prediction at position (i, j) belongs to label class y*_{ij}; the label y*_{ij} is calculated as:
y*_{ij} = 8 · ( y_m − 8⌊y_m/8⌋ ) + ( x_m − 8⌊x_m/8⌋ )  (12)
where ⌊·⌋ denotes the floor operation; each position (i, j) of the heatmap corresponds to an 8×8 grid region of the grayscale image, and if a feature point (x_m, y_m) ∈ A falls into that grid after downsampling, its relative position in the grid is encoded as an integer between 0 and 63; otherwise the label y*_{ij} is set to 64, indicating that no feature point exists at this position;
4) Pixel-level offset loss L_off: the sampled descriptors D̂_1 and D̂_2 are concatenated along the channel dimension and fed into a multi-layer perceptron MLP to compute the predicted offset confidence matrix O ∈ ℝ^{N×64}:
O = MLP( Concat( D̂_1, D̂_2 ) )  (13)
then, an offset loss function is constructed based on the predicted offset confidence matrix O and the true offset labels l ∈ {0, 1, 2, ..., 63}, weighted by the bidirectional matching confidence M_i of each matching pair:
L_off = − (1/N) Σ_{i=1..N} M_i · log O_i( l_i )  (14)
where, letting the semi-dense point set of T1′ be P = {(x, y)}, the label l is calculated as:
l = 8 · ( y − 8⌊y/8⌋ ) + ( x − 8⌊x/8⌋ )  (15)
finally, the overall matching loss function is defined as a weighted sum of the individual losses.
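A compact sketch of the descriptor loss (eq. (9)) and confidence loss (eq. (10)) of claim 3, assuming the ground-truth matches lie on the diagonal as the claim states; the normalization constants and the product form of M_i are assumptions where the source is ambiguous.

```python
# Sketch of the bidirectional descriptor loss (eq. (9)) and the L1
# confidence loss (eq. (10)); exact weighting in the patent may differ.
import torch
import torch.nn.functional as F

def matching_losses(d1, d2, c1, c2):
    """d1, d2: (N, 64) sampled descriptors; c1, c2: (N,) confidences."""
    s = d1 @ d2.T                          # (N, N) similarity matrix S
    p = F.softmax(s, dim=1)                # T1' -> T2 matching probabilities
    q = F.softmax(s.T, dim=1)              # T2 -> T1' matching probabilities
    diag = torch.arange(len(d1))
    # Eq. (9): correct matches lie on the diagonals of P and Q.
    l_desc = -(p[diag, diag].log() + q[diag, diag].log()).sum()
    # Bidirectional matching confidence M_i from forward/backward maxima.
    m = p.max(dim=1).values * q.max(dim=1).values
    # Eq. (10): L1 regression of predicted confidences onto M (assumed form).
    l_conf = (c1 - m).abs().mean() + (c2 - m).abs().mean()
    return l_desc, l_conf
```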
- 4. The method for lightweight remote sensing image change detection under spatial misalignment according to claim 1, wherein in the decoding stage a scale-adaptive perception module fuses the difference feature maps at different scales in a top-down and bottom-up dynamic-weight fusion manner; finally, the feature map is upsampled by a factor of 4 with bilinear interpolation, and the number of channels is adjusted to the number of classes by a 1×1 convolution to output the change detection result.
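Claim 4's decoder fuses the multi-scale difference features with dynamic weights, then applies 4× bilinear upsampling and a 1×1 convolution to produce class logits. The sketch below stands in for the patent's scale-adaptive perception module with a simple softmax-weighted fusion for illustration; the class name, the two-class default and the weighting scheme are assumptions.

```python
# Simplified stand-in for the decoding stage of claim 4: weighted
# multi-scale fusion, 4x bilinear upsampling, 1x1 conv to class logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    def __init__(self, channels: int, num_classes: int = 2, num_scales: int = 4):
        super().__init__()
        self.scale_weights = nn.Parameter(torch.ones(num_scales))  # dynamic weights
        self.classifier = nn.Conv2d(channels, num_classes, 1)      # 1x1 conv

    def forward(self, feats):  # feats: list of (B, C, Hi, Wi), coarse to fine
        target = feats[-1].shape[2:]
        w = torch.softmax(self.scale_weights, dim=0)
        # Resize every scale to the finest resolution, then fuse by weight.
        fused = sum(w[i] * F.interpolate(f, size=target, mode="bilinear",
                                         align_corners=False)
                    for i, f in enumerate(feats))
        # 4x bilinear upsampling back to input resolution, then classify.
        out = F.interpolate(fused, scale_factor=4, mode="bilinear",
                            align_corners=False)
        return self.classifier(out)
```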
- 5. A lightweight remote sensing image change detection system for use under spatial misalignment, characterized by comprising a processor and a memory, wherein the memory is used to store program instructions, and the processor is used to call the program instructions in the memory to execute the lightweight remote sensing image change detection method under spatial misalignment according to any one of claims 1-4.
- 6. A computer-readable storage medium having a computer program stored thereon which, when executed, implements the lightweight remote sensing image change detection method for use under spatial misalignment according to any one of claims 1-4.
Description
Lightweight remote sensing image change detection method for space misalignment

Technical Field

The invention belongs to the field of remote sensing image processing, and in particular relates to a lightweight remote sensing image change detection method for use under spatial misalignment.

Background

Remote sensing image registration and change detection are key tasks in multi-temporal analysis. Remote sensing image registration aligns, pixel by pixel, images of the same area acquired with different imaging mechanisms, different sensors, different phases and different viewing angles, and is a key step for downstream tasks such as multi-temporal image change detection, image fusion and image analysis. Change detection analyzes remote sensing images of different phases over a specific area to identify and quantify surface changes in the same region, and is widely applied in natural resource monitoring, ecological environment monitoring, national defense and security, and other fields.

In the field of traditional remote sensing image registration, a large number of methods have been proposed, mainly divided into region-based, feature-based and learning-based methods. Region-based methods directly build an optimal geometric transformation model by comparing shallow image information according to similarity measures such as mutual information (Mutual Information) and cross-correlation coefficients (Cross Correlation), achieving pixel-level alignment between the images to be registered. However, region-based methods are sensitive to nonlinear radiometric (intensity) differences, and the serious feature redundancy in low-level image representations makes building robust feature detection and description models between images a significant challenge. Feature-based registration proceeds in four steps: feature extraction, feature description, feature matching and spatial geometric transformation. Geometric transformation parameters are estimated from the local correspondences between salient geometric features (points, lines and planes) of the images; representative methods include SIFT, ORB and SURF. However, due to the accumulation of computational errors and the heterogeneity of local image deformations (such as terrain relief and sensor distortion), feature-based methods have difficulty accurately representing nonlinear pixel-level deformations, leading to distorted modeling of pixel-wise spatial correspondences.

With the rapid development of artificial intelligence technology, end-to-end remote sensing image registration based on deep learning shows great potential, and many deep-learning-based change detection methods have also emerged, achieving notable breakthroughs over traditional methods in detection accuracy and robustness under complex change scenarios. However, with the rapid development of Earth observation technology, the heterogeneity of bi-temporal images acquired by different sensors under different imaging conditions leads to differences in the geometric form and spectral response of the same ground object in the two images. Therefore, how to effectively construct the substantial spatial-spectral differences between bi-temporal images has received continued attention from researchers.
Although existing methods have made significant progress in remote sensing image registration and change detection respectively, registration and change detection are generally treated as two independent tasks, and no unified framework for registration and change detection has been effectively constructed. In addition, existing change detection methods lack collaborative interaction between the spatial and semantic information of difference features, and it is difficult for them to effectively construct the substantial spatial-spectral differences between bi-temporal images.

Disclosure of Invention

Current deep learning methods generally treat registration and change detection as independent tasks and cannot process registration and change detection in a unified manner. In addition, existing change detection methods lack collaborative interaction between the spatial and semantic information of difference features, and it is difficult for them to effectively construct the substantial spatial-spectral differences between bi-temporal images. Therefore, the invention provides a lightweight network for joint remote sensing image registration and change detection. First, MobileNetV3-Large is used to extract multi-scale features for registration and change detection. Second, a spatial consistency module performs semi-dense feature point matching and establishes a cross-scale spatial transformation model so that feature maps of different scales are mutually aligned. Then, the spatio-temporal difference coordination module operates on the aligned feature maps of different scales to enhance the substantial spatio-temporal differences between the bi-temporal features.