CN-121982545-A - Mamba-based edge-refined remote sensing image semantic change detection method

CN121982545A

Abstract

The invention discloses a Mamba-based edge-refined remote sensing image semantic change detection method, belonging to the technical field of remote sensing image change detection. Aiming at the problems in the prior art that detail optimization of boundary areas is insufficient during feature extraction and fusion and that predicted edges are rough, the invention provides the following technical scheme: first, a Siamese Mamba encoder backbone extracts multi-level features from the bi-temporal remote sensing images; second, cross-temporal feature interaction and difference feature extraction are performed by a Mamba-based difference module; then, an edge-refined visual state space decoder fuses dilation and erosion operations with an attention mechanism to strengthen edge information, while a loss function strategy combining deep boundary supervision with change-region supervision improves the model's ability to learn edge details. Verified on the SECOND dataset, the method outperforms existing mainstream methods on metrics such as precision, intersection-over-union (IoU) and F1 score, markedly improving the boundary accuracy and semantic segmentation quality of change detection, and is suitable for high-precision application scenarios such as urban planning and disaster assessment.
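The metrics named in the abstract (precision, IoU, F1) can be computed from the overlap counts of a predicted and a reference change mask. A minimal NumPy sketch for the binary case follows; this is illustrative only, and the names and the small epsilon guard are our own — the patent's actual evaluation protocol on SECOND (e.g. multi-class or SeK-style variants) is not detailed in this record.

```python
import numpy as np

def binary_change_metrics(pred, gt, eps=1e-9):
    """Precision, recall, IoU and F1 for binary change masks (illustrative)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()      # changed in both
    fp = np.logical_and(pred, ~gt).sum()     # predicted change, no real change
    fn = np.logical_and(~pred, gt).sum()     # missed change
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    iou = tp / (tp + fp + fn + eps)          # intersection over union
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, iou, f1

# toy 2x3 masks: 2 true positives, 1 false positive, 1 false negative
pred = np.array([[1, 1, 0], [0, 1, 0]])
gt   = np.array([[1, 0, 0], [0, 1, 1]])
prec, rec, iou, f1 = binary_change_metrics(pred, gt)
```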

Inventors

  • ZHANG YUNZUO
  • SUN SHIBO
  • ZHEN JIAWEN
  • SUN YUCHUAN
  • HUO LEI

Assignees

  • Shijiazhuang Tiedao University (石家庄铁道大学)

Dates

Publication Date
2026-05-05
Application Date
2026-02-01
Priority Date
2025-05-19

Claims (5)

  1. A Mamba-based edge-refined remote sensing image semantic change detection method, characterized by comprising the following steps:
     S1, constructing a Siamese encoder based on a visual state space model and respectively extracting multi-scale features of the bi-temporal remote sensing images; the Siamese encoder based on the visual state space model comprises: performing 2D selective scanning on the input feature maps, namely applying a convolution with kernel size 4 and stride 4 to the input image of resolution 512×512, changing the resolution to 128×128 and the channel number from 3 to 128; iterating the feature extraction N times per stage, the backbone network having four stages in total with N set to 2, 2, 15 and 2 respectively; after each stage the extracted features are merged patch-wise, the channel number is doubled, and the result is passed to the next stage of the backbone network until the last stage;
     S2, constructing a Mamba-based difference module for extracting the difference features between the bi-temporal features output by the Siamese encoder; the Mamba-based difference module comprises: subtracting the feature maps F1 and F2 of the two phases extracted by the Siamese backbone of the visual state space model, taking the absolute value, and passing the result into a visual state space model to obtain the difference feature map Fd; passing F1 and F2 respectively through a linear layer and a depthwise convolution module to obtain the feature maps X1 and X2; using linear layers to project the two features X1 and X2 to generate the matrices B, C, D and the step parameter Δ, and exchanging the matrices C generated from the two feature maps to achieve cross-temporal information interaction, expressed as:
     Ā1 = exp(Δ1·A), Ā2 = exp(Δ2·A),
     h1(t) = Ā1·h1(t−1) + Δ1·B1·x1(t), y1(t) = C2·h1(t) + D1·x1(t),
     h2(t) = Ā2·h2(t−1) + Δ2·B2·x2(t), y2(t) = C1·h2(t) + D2·x2(t),
     where exp(·) is the matrix exponential, the matrix A has n rows and k columns, and B1 and B2, C1 and C2, D1 and D2, Δ1 and Δ2 are obtained from the feature maps X1 and X2 by linear projection; the matrix A is a diagonal matrix during training, so its matrix exponential simplifies to scalar exponentials of the values on the diagonal, namely:
     exp(Δ·A) = diag(exp(Δ·a1), …, exp(Δ·ak)),
     where a1, …, ak are the diagonal elements of the matrix A; layer-normalizing the output feature maps Y1 and Y2, passing them into a linear layer, performing element-wise addition with the original input feature maps F1 and F2, and performing element-wise multiplication with the difference feature map Fd to obtain the feature maps G1 and G2; concatenating G1 and G2 along the channel dimension, halving the channel dimension with a convolution of kernel size 1, and applying batch normalization and a ReLU activation to obtain the output feature map Fout, expressed as:
     G1 = (Linear(LN(Y1)) + F1) ⊙ Fd,
     G2 = (Linear(LN(Y2)) + F2) ⊙ Fd,
     Fout = ReLU(BN(Conv1×1(Concat(G1, G2)))),
     where LN is layer normalization, Linear denotes a linear layer, Concat denotes channel-dimension concatenation, BN denotes batch normalization, ReLU denotes the activation function, and ⊙ denotes element-wise multiplication;
     S3, inputting the difference features into an edge-refined visual state space decoder, which combines edge information enhancement by an edge enhancer to generate an edge-refined semantic change map; the edge-refined visual state space decoder comprises: processing the input feature map with a state space module, then applying layer normalization and convolution to obtain the feature map Fc; extracting an edge-enhanced feature Fe from the output feature map Fc using the edge extractor; convolving the extracted edge feature and performing element-wise multiplication with the feature map Fc to obtain the enhanced feature map Fm, expressed as: Fm = Conv(Fe) ⊙ Fc; convolving the edge feature to form the output result for edge supervision, fusing it with Fm, and computing channel-dimension weights of the fused feature to obtain the feature map Fw; performing element-wise multiplication of Fw and Fm, adding the result to the feature map Fc, and feeding it into a spatial attention module SA to output the feature map Fs; multiplying Fs with the enhanced feature map and performing element-wise addition to obtain the output feature map of the decoder;
     S4, calculating multi-level edge losses from edge labels using a deep edge supervision strategy to optimize the network training;
     S5, fusing the deep edge loss, the semantic segmentation loss and the change detection loss;
     S6, inputting the bi-temporal remote sensing images into the trained model and outputting the semantic change detection result.
  2. The Mamba-based edge-refined remote sensing image semantic change detection method according to claim 1, wherein the deep edge supervision strategy comprises: performing edge extraction on the change labels of the training dataset with a Canny operator and using the results as edge labels for training, guiding the decoder of the model to focus on feature learning of the edge regions during training; performing edge supervision by combining binary cross-entropy (BCE) and Dice loss, and performing deep edge supervision training by assigning different weights to the depth edge losses at different levels, expressed as:
     L_BCE = −(1/N)·Σ_i [g_i·log(p_i) + (1 − g_i)·log(1 − p_i)],
     L_Dice = 1 − (2·Σ_i g_i·p_i + ε) / (Σ_i g_i + Σ_i p_i + ε),
     L_edge = λ·L_BCE + (1 − λ)·L_Dice,
     L_deep = Σ_l w_l·L_edge^(l),
     where g_i denotes the value of the i-th pixel on the label, p_i denotes the value of the i-th pixel predicted by the model, ε is a very small number used to prevent division-by-zero errors, λ is the balancing factor between the BCE loss and the Dice loss, and w_l are the weights between the depth edge losses of the different levels in deep supervision.
  3. The Mamba-based edge-refined remote sensing image semantic change detection method according to claim 2, wherein fusing the deep edge loss, the semantic segmentation loss and the change detection loss comprises: the loss of the change detection part is obtained by combining a cross-entropy function with a Dice loss, expressed as:
     L_cd = L_CE + L_Dice;
     the loss function of the semantic segmentation part is obtained by combining cross-entropy with the Lovasz-softmax function, expressed as:
     L_seg = L_CE(T1) + L_Lovasz(T1) + L_CE(T2) + L_Lovasz(T2),
     where T1 and T2 denote the remote sensing images of the two imaging times; the overall loss function is defined as the fusion of the change detection loss, the semantic segmentation loss and the deep edge loss:
     L = L_cd + L_seg + L_deep.
  4. The Mamba-based edge-refined remote sensing image semantic change detection method according to claim 1, wherein the edge extractor comprises: performing a dilation (Dilate) operation and an erosion (Erode) operation respectively on the output feature map Fc, and obtaining the edge feature by subtracting the eroded feature map from the dilated feature map, expressed as:
     Fe = Dilate(Fc) − Erode(Fc).
  5. The Mamba-based edge-refined remote sensing image semantic change detection method according to claim 1, wherein the step of computing channel-dimension weights of the fused feature to obtain the feature map Fw comprises: applying two successive convolutions to the edge feature map Fe to form the output edge feature map, which is combined with the edge label for supervised training; performing element-wise addition with the enhanced feature map Fm, and feeding the processed feature map into a channel attention module CA to obtain the output feature map Fw, expressed as:
     Fw = CA(Conv(Conv(Fe)) + Fm).
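The edge extractor of claim 4 is a morphological gradient: the eroded map is subtracted from the dilated map, leaving responses only at region boundaries. A minimal grayscale sketch in plain NumPy with an assumed 3×3 structuring element follows; the patent's version operates on feature maps and its exact implementation (e.g. pooling-based morphology) is not specified here.

```python
import numpy as np

def dilate3x3(x):
    """Grayscale dilation: max over each pixel's 3x3 neighborhood (edge-replicated)."""
    p = np.pad(x, 1, mode="edge")
    views = [p[i:i + x.shape[0], j:j + x.shape[1]] for i in range(3) for j in range(3)]
    return np.max(np.stack(views), axis=0)

def erode3x3(x):
    """Grayscale erosion: min over each pixel's 3x3 neighborhood (edge-replicated)."""
    p = np.pad(x, 1, mode="edge")
    views = [p[i:i + x.shape[0], j:j + x.shape[1]] for i in range(3) for j in range(3)]
    return np.min(np.stack(views), axis=0)

def edge_extractor(x):
    """Morphological gradient Dilate(x) - Erode(x), as in claim 4."""
    return dilate3x3(x) - erode3x3(x)

# a 6x6 image containing a filled 3x3 square: the gradient fires around its boundary
img = np.zeros((6, 6))
img[2:5, 2:5] = 1.0
edges = edge_extractor(img)
```

On this toy input the square's center stays zero while the pixels on and around its boundary light up, which is exactly the edge band the decoder multiplies back into the features.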
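The edge-supervision loss of claim 2 mixes BCE and Dice terms with a balance factor and then sums per-level losses with level weights. A NumPy sketch is given below; the balance factor lam=0.5 and the level weights are illustrative assumptions, since the patent's concrete settings are not recoverable from this record.

```python
import numpy as np

def bce_dice_edge_loss(p, g, lam=0.5, eps=1e-6):
    """Edge loss = lam * BCE + (1 - lam) * Dice (claim 2, illustrative weights)."""
    p = np.clip(p, eps, 1 - eps)  # predicted edge probabilities, clipped for log
    bce = -np.mean(g * np.log(p) + (1 - g) * np.log(1 - p))
    dice = 1 - (2 * np.sum(g * p) + eps) / (np.sum(g) + np.sum(p) + eps)
    return lam * bce + (1 - lam) * dice

def deep_edge_loss(preds, gts, level_weights):
    """Deep edge supervision: weighted sum of per-level edge losses."""
    return sum(w * bce_dice_edge_loss(p, g)
               for w, p, g in zip(level_weights, preds, gts))

g = np.array([0.0, 1.0, 1.0, 0.0])       # edge label
p = np.array([0.1, 0.9, 0.8, 0.2])       # predicted edge probabilities
loss = bce_dice_edge_loss(p, g)
deep = deep_edge_loss([p, p], [g, g], [1.0, 0.5])  # two levels, assumed weights
```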

Description

Mamba-based edge-refined remote sensing image semantic change detection method

Technical Field

The invention relates to a Mamba-based edge-refined remote sensing image semantic change detection method, and belongs to the technical field of remote sensing image change detection.

Background

Semantic change detection is an important research direction in the fields of remote sensing and computer vision and plays an irreplaceable role in practical applications such as urban planning, disaster assessment, environmental monitoring and land use analysis. By analyzing bi-temporal or multi-temporal remote sensing images, semantic change detection can identify changes in land cover types and assign semantic labels to those changes, providing key information support for decision making. Daudt et al. proposed several supervised learning methods based on fully convolutional networks for remote sensing image semantic change detection, including a network architecture that performs change detection and land cover mapping simultaneously and uses the predicted land cover information to assist change detection. Ding et al. decoupled the two subtasks of semantic segmentation and change detection, achieved their deep integration through deep feature fusion, and designed Siam-SR and Cot-SR blocks to enhance the semantic representation of each temporal branch and to model the semantic correlation between the two temporal features. Chen et al. applied the Mamba architecture to remote sensing image change detection and designed three different architectures for three change detection tasks: binary change detection, semantic change detection and building damage assessment.
Existing semantic change detection methods pay insufficient attention to detail optimization of boundary areas in the feature extraction and fusion stages, or lack a dedicated edge refinement mechanism. Some are biased toward global semantics and ignore local boundary precision; others rely on simple feature fusion without resolving boundary discontinuities. As a result, the predicted change maps have rough edges, which limits performance in application scenarios requiring high-precision boundary delineation. Rough edges not only reduce the visual quality of the change regions but can also cause the extent of change to be misjudged, especially in tasks requiring accurate boundary information such as urban expansion monitoring or post-disaster reconstruction.

Disclosure of Invention

Aiming at the problem of rough edges in existing methods, the invention provides a Mamba-based edge-refined remote sensing image semantic change detection method, which comprises the following steps: S1, constructing a Siamese encoder based on a visual state space model and respectively extracting multi-scale features of the bi-temporal remote sensing images; S2, constructing a Mamba-based difference module for extracting the difference features between the bi-temporal features output by the Siamese encoder. The Mamba-based difference module comprises: the feature maps F1 and F2 of the two phases, respectively extracted by the Siamese backbone composed of visual state space models, are subtracted and the absolute value is passed into a visual state space model to obtain the difference feature map Fd; F1 and F2 are each passed through a linear layer and a depthwise convolution module to obtain the feature maps X1 and X2; a linear layer projects the two features to generate the matrices B, C, D and the step parameter Δ, and the matrices C generated from the two feature maps are exchanged to achieve cross-temporal information interaction. The whole calculation process is expressed as:

Ā1 = exp(Δ1·A), Ā2 = exp(Δ2·A),
h1(t) = Ā1·h1(t−1) + Δ1·B1·x1(t), y1(t) = C2·h1(t) + D1·x1(t),
h2(t) = Ā2·h2(t−1) + Δ2·B2·x2(t), y2(t) = C1·h2(t) + D2·x2(t),

where exp(·) is the matrix exponential, the matrix A has n rows and k columns, and B1 and B2, C1 and C2, D1 and D2, Δ1 and Δ2 are obtained from the feature maps X1 and X2 by linear-layer projection. The matrix Δ is a discretization step parameter learned from the input features; it controls the temporal resolution of the state space model over the input sequence, i.e. its ability to adjust the receptive field, and is a key part of the selective mechanism. The matrix A presented here is in fact a HIPPO-initialized A matrix, which is designed as a diagonal matrix during actual training in order to enable parallel training. HIPPO initialization is a mathematical initialization strategy for the state space matrix A that has been shown to help the model efficiently memorize historical information. The invention draws on this idea and sets the initial values of the diagonal matrix A to values that follow the eigenvalue decay law of the HIPPO matrix, providing a good training starting point. First, the eigenvalue decay law of the original HIPPO matrix is extracted, mapping the characte
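The description's point that a diagonal A reduces the matrix exponential exp(Δ·A) to element-wise scalar exponentials can be illustrated with a tiny single-channel selective-scan recurrence. The sketch below is our own minimal NumPy illustration under assumed shapes and a HIPPO-style decaying diagonal; it is not the patent's trained parameterization.

```python
import numpy as np

def selective_scan(x, a, B, C, delta, d=0.0):
    """Minimal diagonal-A selective SSM recurrence for one input channel.

    x:     (T,)   input sequence
    a:     (k,)   diagonal of the state matrix A (negative for stability)
    B, C:  (T, k) per-step input and output projections (from linear layers)
    delta: (T,)   learned discretization steps
    """
    k = a.shape[0]
    h = np.zeros(k)
    ys = []
    for t in range(x.shape[0]):
        # diagonal A: exp(delta * A) is just element-wise scalar exponentials
        a_bar = np.exp(delta[t] * a)
        h = a_bar * h + delta[t] * B[t] * x[t]
        ys.append(C[t] @ h + d * x[t])
    return np.array(ys)

rng = np.random.default_rng(0)
T, k = 5, 4
a = -np.arange(1, k + 1, dtype=float)   # decaying diagonal, HIPPO-style (illustrative)
x = rng.standard_normal(T)
B = rng.standard_normal((T, k))
C = rng.standard_normal((T, k))
delta = np.full(T, 0.1)
y = selective_scan(x, a, B, C, delta)
```

With negative diagonal entries and positive steps, each factor exp(Δ·a_i) lies in (0, 1), so the recurrence stays stable, which is the practical benefit of the diagonal simplification noted in the description.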