CN-121982320-A - Remote sensing image segmentation method based on double-order saliency guidance and space gating

CN121982320A

Abstract

The invention discloses a remote sensing image segmentation method based on double-order saliency guidance and spatial gating, belonging to the field of computer vision. The method first constructs a training data set, then extracts features through a dual-branch encoder comprising a global sequential scanning branch and a saliency spiral scanning branch: the former preserves global layout information through horizontal and vertical, forward and backward scanning, while the latter locates an initial anchor point from a saliency map and generates a bidirectional spiral path that focuses on the features of key regions. The encoder achieves deep fusion of a convolutional neural network and the Mamba model through a spatial gating state transition mechanism, using spatial features to guide the update of Mamba hidden states. The decoder aligns and calibrates the dual-branch features through a multi-stage cross-scan feature calibration module, reinforcing semantic consensus and refining detail differences, and the model is jointly trained with a three-head supervision and consistency constraint strategy. The invention significantly improves segmentation accuracy and efficiency for complex ground objects in high-resolution remote sensing images.

Inventors

  • SUN SHIBO
  • ZHANG YUNZUO
  • ZHAO HUI
  • CHEN DAN

Assignees

  • 石家庄铁道大学 (Shijiazhuang Tiedao University)

Dates

Publication Date
2026-05-05
Application Date
2026-01-30

Claims (5)

  1. A remote sensing image segmentation method based on double-order saliency guidance and spatial gating, characterized by at least comprising the following steps:
S1, constructing a training data set;
S2, constructing a dual-branch encoder that performs feature extraction on an input image, completes a downsampling operation, and feeds the result into the next encoding module. The encoding part of the model comprises 4 encoding modules, each consisting of the two branches' spatial prior perception modules and Mamba-fused feature extraction units, and the features output by the 1st, 2nd and 3rd layers of the dual-branch encoding modules undergo inter-branch additive fusion. The dual-branch encoder comprises a global sequential scanning path branch and a saliency spiral scanning path branch, with a spatial prior perception module and a Mamba-fused feature extraction unit integrated in both branches. The spatial prior perception module adopts a lightweight CNN codec that extracts features from the input feature map through convolution operations to obtain a spatial feature map, capturing the spatial structure, topological relations and nonlinear inter-pixel associations of the image. The feature extraction unit fusing the spatial prior perception module with Mamba realizes feature fusion through a spatial gating state transition (SGST) mechanism: the spatial feature map extracted by the spatial prior perception module of the corresponding branch is serialized along the scanning path into a spatial feature sequence, from which a gating signal is computed; the gating signal modulates the state transition process of Mamba, with the state update formula
h_t = g_t ⊙ (A h_{t-1}) + B x_t,
where ⊙ denotes element-wise multiplication, g_t is the gating signal, A is the historical state weight matrix, B is the input weight matrix, x_t is the image sequence, h_t is the current hidden state, and h_{t-1} is the hidden state at the previous moment. The global sequential scanning path branch adopts a horizontal and vertical, forward and backward scanning mechanism, flattening the remote sensing image and the spatial feature map output by the spatial prior perception module into a 1D sequence in scanning order. The saliency spiral scanning path branch first projects the multichannel spatial feature map into a single-channel saliency map S through a 1×1 convolution, and converts S into a spatial probability distribution map P with a spatial normalization function:
P(i, j) = exp(S(i, j)/τ) / Σ_{i=1}^{H} Σ_{j=1}^{W} exp(S(i, j)/τ),
where i and j are respectively the abscissa and ordinate on the feature map, H and W are respectively the height and width of the feature map, exp is the exponential function, and τ is a hyperparameter. The start anchor point p_0 = (x_0, y_0) is computed as
p_0 = argmax_{(i, j)} P(i, j),
where x_0 is the abscissa of p_0 and y_0 its ordinate. From the start anchor point, clockwise and counterclockwise inside-out bidirectional spiral scanning paths are generated, and the remote sensing image and the spatial feature map are flattened into a 1D sequence along the bidirectional spiral scanning path;
S3, constructing a progressive collaborative three-path decoder to decode the output features of the encoder. The decoder comprises a sequential decoding path, a spiral decoding path and a fusion decoding path, and realizes feature fusion and calibration through a multi-stage cross-scan feature calibration and collaboration (CSFC) module. At each scale stage of the decoder, the CSFC module performs cross-branch calibration: the features of the global sequential scanning path branch encoder generate a global mask that constrains the spiral decoding path features, while the fine features of the saliency spiral scanning path branch encoder refine the sequential path; the calibrated features are input into the next decoding module of each branch. In the fusion decoding path, the calibrated features of each level of the sequential and spiral decoding paths are added to the decoding features output by the previous fusion decoding module, a fused feature map is obtained through successive upsampling and nonlinear transformations of the decoding modules, and the features output by the three-path decoder finally produce the prediction result through a segmentation head. The CSFC module consists of a 1×1 convolution layer, a 3×3 depthwise separable convolution layer, a Sigmoid activation layer, a global average pooling layer and a hyperbolic tangent activation function; it performs cross-scan interaction between the input features of the two branches and adaptively adjusts the fusion weights according to their degree of consistency: the element-wise product (⊙) of the two branch features strengthens the feature response in semantic consensus regions, while their element-wise difference (⊖), passed through the 3×3 depthwise separable convolution with its numerical distribution balanced by tanh, performs detail calibration in complex edge regions where the branches generate cognitive conflict, yielding the calibrated collaborative feature map;
S4, training the model with a three-head supervision and consistency constraint strategy, setting a sequential branch prediction head, a spiral branch prediction head and a fusion branch prediction head. The loss function comprises a segmentation loss L_seg and an auxiliary consistency loss L_con; the total loss sums the segmentation losses of the fusion, sequential and spiral branch predictions against the real label, together with the auxiliary consistency loss, weighted by coefficients λ_1 and λ_2. The auxiliary consistency loss is computed with a symmetric KL divergence:
L_con = (1/2) [D_KL(P_s ∥ P_p) + D_KL(P_p ∥ P_s)],
where D_KL is the KL divergence function, computed as
D_KL(P ∥ Q) = (1/(H·W·C)) Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{c=1}^{C} P(i, j, c) · log((P(i, j, c) + ε)/(Q(i, j, c) + ε)),
where H and W are the height and width of the current prediction result, C is the number of ground object categories, and ε is a small constant that avoids abnormal values in the logarithmic operation;
S5, inputting the remote sensing image to be segmented into the trained model and outputting a pixel-level semantic segmentation result.
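The SGST state update of claim 1 can be sketched as a plain recurrence. This is a minimal illustration, not the patented implementation: the claim does not specify how the gating signal is derived from the spatial feature sequence, so the sigmoid gate and the names `sgst_scan`, `W_g` are assumptions.

```python
import numpy as np

def sgst_scan(x_seq, f_seq, A, B, W_g):
    """Sketch of the SGST recurrence h_t = g_t * (A h_{t-1}) + B x_t,
    with g_t derived from the spatial feature sequence f_seq.
    The sigmoid gate is an assumption; the claim only states that a
    gating signal is computed from the spatial features."""
    d_state = A.shape[0]
    h = np.zeros(d_state)
    states = []
    for x_t, f_t in zip(x_seq, f_seq):
        g_t = 1.0 / (1.0 + np.exp(-(W_g @ f_t)))  # assumed sigmoid gate
        h = g_t * (A @ h) + B @ x_t               # gated state update
        states.append(h.copy())
    return np.stack(states)
```

With the gate saturated near 1 this reduces to an ordinary state-space update; near 0 the spatial features effectively suppress the propagation of history, which is the modulation role the claim assigns to the gating signal.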
  2. The remote sensing image segmentation method based on double-order saliency guidance and spatial gating according to claim 1, wherein the lightweight CNN codec is an encoder-decoder architecture: the encoder part adopts a lightweight feature extraction unit combining depthwise separable convolution with large-kernel convolution, and the decoder part consists of a transposed convolution, a depthwise separable convolution, a transposed convolution, a batch normalization layer and a ReLU activation function. The lightweight feature extraction unit comprises, in order, a 7×7 large-kernel convolution layer, a depthwise separable convolution layer, a batch normalization layer and a ReLU activation function; through edge padding and a convolution stride of 1, the spatial structure information of the input image is fully preserved while the total number of network parameters is reduced.
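The shape-preserving property stated in claim 2 follows from the standard convolution output-size formula; a small sketch makes the padding arithmetic explicit (the function name is illustrative):

```python
def conv_out_size(n, k, p, s=1):
    """Output spatial size of a convolution over an n-pixel axis with
    kernel size k, padding p, stride s. Claim 2 keeps the resolution
    intact via edge padding and stride 1."""
    return (n + 2 * p - k) // s + 1

# A 7x7 large-kernel conv needs padding 3, and a 3x3 depthwise conv
# needs padding 1, for the H x W resolution to remain unchanged at stride 1.
```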
  3. The remote sensing image segmentation method based on double-order saliency guidance and spatial gating according to claim 1, wherein the saliency spiral scanning path is generated as follows: two reference direction vector sets are predefined over the four unit vectors representing movement to the right, upward, to the left and downward; the clockwise spiral scanning path cycles through one ordering of these vectors and the counterclockwise path through the reverse ordering, the current motion vector being selected cyclically by a modulo operation. The path starts from the start anchor point p_0 and achieves spiral growth by continuously changing the direction of movement, while the number of pixels d stepped in the current direction before turning increases stepwise with the number of turns m:
d = ⌊m/2⌋ + 1,
where m is the direction-switch counter with initial value 0 and ⌊·⌋ denotes rounding down. This means the scan trajectory first moves 1 pixel in the initial direction, turns and moves 1 pixel, turns again and moves 2 pixels, and so on. In each single-pixel stepping operation, the next potential pixel location (x′, y′) is computed from the current coordinates (x, y) and the currently selected direction vector (Δx, Δy):
(x′, y′) = (x + Δx, y + Δy),
where x′ is the abscissa and y′ the ordinate of the potential location. For each generated potential coordinate, a real-time boundary validity check is performed:
Mask(x′, y′) = 1 if 1 ≤ x′ ≤ W and 1 ≤ y′ ≤ H, and 0 otherwise.
Only when Mask is 1 are the coordinates written sequentially into the one-dimensional index mapping table; if the coordinates are out of range, the point is ignored and the subsequent stepping and steering logic continues until the total number of valid coordinates contained in the index table reaches H × W. The finally constructed index table arranges the discrete points of the two-dimensional space into a linear sequence with strong local relevance, and each entry of the index table corresponds to a unique pixel coordinate in the original image.
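The spiral index-table construction of claim 3 can be sketched directly from its stepping rule d = ⌊m/2⌋ + 1 and the boundary mask. The concrete direction orderings below (and zero-based coordinates) are assumptions, since the patent's exact vector sets are not recoverable from the translation:

```python
def spiral_index_table(H, W, x0, y0, clockwise=True):
    """Sketch of claim 3's saliency spiral path: from anchor (x0, y0),
    step d = m // 2 + 1 pixels in the current direction, then turn,
    cycling directions by modulo; out-of-range points are skipped
    until H * W valid coordinates have been collected."""
    # right, up, left, down in (dx, dy) with y growing downward;
    # the exact ordering per rotation sense is an assumption.
    dirs = [(1, 0), (0, -1), (-1, 0), (0, 1)]
    if not clockwise:
        dirs = [dirs[0], dirs[3], dirs[2], dirs[1]]  # reverse rotation sense
    table, seen = [], set()
    x, y, m = x0, y0, 0
    if 0 <= x < W and 0 <= y < H:
        table.append((x, y)); seen.add((x, y))
    while len(table) < H * W:
        d = m // 2 + 1                # step length grows: 1,1,2,2,3,3,...
        dx, dy = dirs[m % 4]          # cyclic direction selection
        for _ in range(d):
            x, y = x + dx, y + dy
            if 0 <= x < W and 0 <= y < H and (x, y) not in seen:
                table.append((x, y)); seen.add((x, y))
        m += 1
    return table
```

Because the square spiral eventually covers every lattice point, the boundary mask alone guarantees that all H × W pixels are enumerated exactly once, whatever the anchor position.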
  4. The remote sensing image segmentation method based on double-order saliency guidance and spatial gating according to claim 1, wherein the segmentation loss L_seg combines a cross-entropy loss L_CE with a Dice loss L_Dice:
L_seg = L_CE + β · L_Dice,
L_CE = -(1/(H·W)) Σ_{i=1}^{H} Σ_{j=1}^{W} Σ_{c=1}^{C} y_{i,j,c} · log(p_{i,j,c}),
L_Dice = 1 - (1/C) Σ_{c=1}^{C} (2 Σ_{i,j} p_{i,j,c} y_{i,j,c}) / (Σ_{i,j} p_{i,j,c} + Σ_{i,j} y_{i,j,c}),
where H and W are the height and width of the current prediction result, C is the number of ground object categories, p_{i,j,c} ∈ [0, 1] is the probability that the current branch predicts category c at pixel (i, j), y_{i,j,c} is the true label for category c at that pixel, and β is a balance coefficient. This formulation preserves the accuracy of the class probability distribution while effectively alleviating the class imbalance problem.
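The combined loss of claim 4 can be sketched on softmax probabilities and one-hot labels. The balance value `beta` and the smoothing constant `eps` are assumptions, since the patent's concrete settings are elided in the translation:

```python
import numpy as np

def segmentation_loss(p, y, beta=0.5, eps=1e-6):
    """Sketch of L_seg = L_CE + beta * L_Dice on predicted probabilities p
    and one-hot labels y, both shaped (H, W, C). beta and eps are
    illustrative choices, not the patent's values."""
    H, W, C = p.shape
    # pixel-averaged cross entropy
    l_ce = -np.sum(y * np.log(p + eps)) / (H * W)
    # class-averaged soft Dice loss
    inter = np.sum(p * y, axis=(0, 1))
    denom = np.sum(p, axis=(0, 1)) + np.sum(y, axis=(0, 1))
    l_dice = 1.0 - np.mean((2 * inter + eps) / (denom + eps))
    return l_ce + beta * l_dice
```

The Dice term is what addresses class imbalance: its per-class normalization keeps rare ground object categories from being swamped by the pixel-averaged cross entropy.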
  5. The remote sensing image segmentation method based on double-order saliency guidance and spatial gating according to claim 1, wherein in the dual-branch encoder the weights of the two branches' spatial prior perception modules and Mamba-fused feature extraction units are not shared, so that global features and local fine features are learned independently.
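The auxiliary consistency loss from step S4 of claim 1 is straightforward to sketch from its definition: a symmetric KL divergence between the sequential and spiral branch probability maps, with ε guarding the logarithm (the function name is illustrative):

```python
import numpy as np

def symmetric_kl_consistency(p, q, eps=1e-8):
    """Sketch of L_con = 0.5 * [KL(p || q) + KL(q || p)] between two
    branch probability maps of shape (H, W, C), averaged over all
    pixels and classes; eps avoids abnormal logarithm values."""
    H, W, C = p.shape
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps))) / (H * W * C)
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps))) / (H * W * C)
    return 0.5 * (kl_pq + kl_qp)
```

The symmetric form penalizes disagreement between the two scanning branches in both directions, which is what lets the constraint pull their predictions toward a consensus rather than toward either branch alone.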

Description

Remote sensing image segmentation method based on double-order saliency guidance and spatial gating

Technical Field

The invention relates to the technical field of computer vision, and in particular to a remote sensing image segmentation method based on double-order saliency guidance and spatial gating.

Background

Remote sensing image segmentation is a key technology that divides a remote sensing image into different semantic regions according to ground object category, and is widely applied in scenes such as land-use monitoring, environmental change assessment and urban planning. With the development of remote sensing technology, high-resolution remote sensing images contain richer ground object detail, but at the same time exhibit sparse target distribution, complex spatial structure and strong background interference, which place higher demands on the feature extraction capability of segmentation models. Existing segmentation models fall mainly into two types: models based on convolutional neural networks (CNN) and models based on sequence modeling. CNN models excel at capturing local spatial structure and topological relations, but are inefficient and limited in receptive field when modeling long-range dependencies; sequence modeling models based on a Transformer or a state space model (SSM), such as Mamba, can efficiently process long sequences and capture global dependencies, but lack prior perception of spatial structure and easily deviate when segmenting feature boundaries and fine structures in remote sensing images.
To achieve both spatial structure capture and long-range dependency modeling, some prior art attempts to fuse CNN with sequence models, but with three defects: first, the fusion is simplistic, mostly feature concatenation or weighted summation, without deep synergy between the two models; second, the scanning mode is single, using fixed raster scanning that cannot adapt to the differentiated feature extraction needs of salient regions in remote sensing images; third, cross-branch feature alignment is insufficient, and semantic deviation between features generated by different scanning modes in a dual-branch structure degrades the final segmentation accuracy. Therefore, a remote sensing image segmentation method that realizes spatial prior guidance, adaptive scanning and deep feature alignment is needed.

Disclosure of Invention

The invention aims to provide a remote sensing image segmentation method based on double-order saliency guidance and spatial gating that resolves the difficulty existing models have in jointly handling spatial structure modeling and long-range dependency capture, as well as the poor adaptability of the scanning mode and the deviation of cross-branch features in remote sensing image segmentation, thereby improving segmentation accuracy and efficiency.
To achieve the above object, an embodiment of the present invention provides a remote sensing image segmentation method based on double-order saliency guidance and spatial gating, comprising the following steps: S1, constructing a training data set; S2, constructing a dual-branch encoder that performs feature extraction on an input image, completes a downsampling operation, and feeds the result into the next encoding module. The encoding part of the model comprises 4 encoding modules, each consisting of the two branches' spatial prior perception modules and Mamba-fused feature extraction units, and the features output by the 1st, 2nd and 3rd layers of the dual-branch encoding modules undergo inter-branch additive fusion. The dual-branch encoder comprises a global sequential scanning path branch and a saliency spiral scanning path branch, with a spatial prior perception module and a Mamba-fused feature extraction unit integrated in both branches. The spatial prior perception module adopts a lightweight CNN codec and extracts features from the input feature map through convolution operations to obtain a spatial feature map, capturing the spatial structure, topological relations and nonlinear inter-pixel associations of the image. The feature extraction unit fusing the spatial prior perception module with Mamba realizes feature fusion through a spatial gating state transition (SGST) mechanism: the spatial feature maps extracted by the spatial prior perception module of the corresponding branch are serialized along the scanning path into a spatial feature sequence, from which a gating signal is computed; the gating signal modulates the state transition process of Mamba, with the state update formula
h_t = g_t ⊙ (A h_{t-1}) + B x_t,
where ⊙ denotes element-wise multiplication, g_t is the gating signal, A is the historical state weight matrix, B is the input weight matrix, and x_t is the image sequence,