
CN-121982453-A - Intelligent interpretation method for double-time-phase remote sensing image

CN121982453A

Abstract

The invention provides an intelligent interpretation method for bi-temporal remote sensing images. The method constructs BCnet, a boundary-constrained change detection model built on a vision foundation model, and fully exploits the general semantic representation capacity of that foundation model. To address the loss of high-frequency information when a vision foundation model is transferred directly to this task, a difference detail enhancement module and a multi-scale edge enhancement module are introduced, enabling fine characterization of subtle changes and complex textures. On this basis, an edge feature constraint strategy is designed: combined with an edge feature aggregation module, it injects boundary supervision signals into both the feature space and the prediction space, strengthening the spatial continuity and geometric integrity of the change detection results. Experimental results show that BCnet offers superior robustness and accuracy in challenging scenes such as dense building clusters and complex edges.

Inventors

  • TANG LIJUN
  • LIU SHENBO
  • ZHAO DONGXUE

Assignees

  • Changsha University of Science and Technology (长沙理工大学)

Dates

Publication Date
2026-05-05
Application Date
2026-01-30

Claims (8)

  1. An intelligent interpretation method for bi-temporal remote sensing images, characterized by comprising the following steps: S1, constructing a boundary-constrained remote sensing image change detection model based on a vision foundation model; S2, training the change detection model on a remote sensing image dataset to obtain a trained change detection model; S3, inputting remote sensing images of different time phases into the trained change detection model to obtain the interpretation result. The boundary-constrained change detection model BCnet, built on a vision foundation model, comprises a SAM encoder module, a difference detail enhancement module, a multi-scale edge enhancement module, an edge feature aggregation module, and a change detection head; the edge feature aggregation module contains an edge detection head, and the change detection head comprises a convolution with a 3×3 kernel. The bi-temporal remote sensing images are input into the SAM encoder module to extract shallow semantic features and four levels of semantic features; the four levels of semantic features are input into the difference detail enhancement module to obtain four difference features; the shallow semantic features and the four difference features are input into the multi-scale edge enhancement module to obtain multi-scale edge features; the multi-scale edge features are input into the edge detection head to obtain an edge mask, and into the edge feature aggregation module to obtain edge aggregation features; and the edge aggregation features are input into the change detection head to obtain a change mask. In the SAM encoder module, the single-input SAM image encoder is extended to a Siamese structure with two bi-temporal inputs: the remote sensing images T1 and T2 are input into a first and a second SAM image encoder respectively, and the two encoders share weights. The difference detail enhancement module comprises four difference detail enhancement units (DDEUs); the first, second, third, and fourth DDEU are connected to the 6th, 12th, 18th, and 24th Transformer layers (which have a global receptive field) of the SAM image encoders. Specifically, image T1 is input into the first SAM image encoder, whose i-th Transformer layer yields the semantic feature F_i^1; image T2 is input into the second SAM image encoder, whose i-th Transformer layer yields F_i^2, where i ∈ {6, 12, 18, 24}. Each feature pair (F_i^1, F_i^2) is input into the corresponding DDEU, which generates an output D_j, where i denotes the Transformer layer index and j the DDEU index. The specific correspondence is: (F_6^1, F_6^2) is the input of the first DDEU, with output D_1; (F_12^1, F_12^2) is the input of the second DDEU, with output D_2; (F_18^1, F_18^2) is the input of the third DDEU, with output D_3; and (F_24^1, F_24^2) is the input of the fourth DDEU, with output D_4.
  2. The method of claim 1, wherein, in describing the structure of the DDEU, X_a and X_b denote its two input features and Y its output feature. The DDEU comprises two parallel branches: a detail guidance branch and a difference modeling branch. In the detail guidance branch, the input features X_a and X_b are concatenated along the channel dimension to obtain the concatenated feature S = Concat(X_a, X_b), where Concat(·) denotes channel-wise concatenation. S is then enhanced and compressed by a two-layer convolution: the first layer strengthens the feature and the second reduces the channel count to C, yielding the detail guidance feature G. In the difference modeling branch, the element-wise absolute difference of the two inputs is first computed to generate the difference feature D = |X_a − X_b|. To capture the distribution of change information in both the spatial and channel dimensions, D is fed into two parallel sub-paths: on the channel sub-path, global max pooling compresses D along the spatial axes to obtain the channel statistic z_c; on the spatial sub-path, global max pooling compresses D along the channel axis to obtain the spatial response z_s. A convolution combined with a Sigmoid activation then generates the channel attention weight W_c and the spatial attention weight W_s respectively, where σ(·) denotes the Sigmoid function constraining the weights to the interval [0, 1]. Finally, the two attention weights are each applied to the detail guidance feature G, and the weighted results are fused to generate the enhanced difference feature Y.
  3. The method of claim 2, wherein the inputs of the multi-scale edge enhancement module MEEM are a shallow semantic fusion feature S from the 1st Transformer layer of the SAM image encoders and the four difference features D_1, D_2, D_3, D_4 output by the difference detail enhancement module. Image T1 is input into the first SAM image encoder, whose 1st Transformer layer (with a global receptive field) yields the shallow semantic feature S^1; image T2 is input into the second SAM image encoder, whose 1st Transformer layer yields S^2; S^1 and S^2 are added element-wise to obtain S. The processing flow of MEEM is as follows. (1) Shallow edge prior feature extraction: MEEM first applies multi-scale convolution to S, converting it through a group of parallel 3×3 convolutions and upsampling into shallow edge prior features E_1 through E_4 with different spatial resolutions and a uniform channel dimension. Specifically, four parallel sub-paths are used: the first obtains E_1 via 8× upsampling and a 3×3 convolution, the second obtains E_2 via 4× upsampling and a 3×3 convolution, the third obtains E_3 via 2× upsampling and a 3×3 convolution, and the fourth obtains E_4 via 1× upsampling and a 3×3 convolution. (2) Difference feature alignment and enhancement: each difference feature D_k output by the difference detail enhancement module is first processed by a transposed convolution to strengthen its scale information, then compressed by a 1×1 convolution to align the channel dimension, yielding a multi-scale difference representation D'_k with a uniform channel count. (3) Feature fusion: each shallow edge prior feature E_k is added to the corresponding multi-scale difference feature D'_k to obtain the fusion feature F_k. (4) Bidirectional interaction: a two-way interaction strategy combining top-down and bottom-up passes is adopted, in which higher-level semantic features are transferred across levels via downsampling to suppress background noise. In one pass, each fusion feature is downsampled and added to the feature of the adjacent coarser level; in the other pass, each result is upsampled and added to the feature of the adjacent finer level. (5) Edge enhancement and multi-scale feature integration: after the bidirectional interaction at each level, a multi-layer perceptron performs nonlinear mapping and channel recalibration on the fused features to further strengthen the boundary response, producing the edge enhancement feature of each scale. The edge enhancement features of all scales are uniformly resized to 128×128 resolution and concatenated along the channel dimension to form the final multi-scale edge feature representation F_edge.
  4. The method of claim 3, wherein the edge feature aggregation module EFAM takes the multi-scale edge feature F_edge output by MEEM as input. EFAM comprises an edge detection head and a double-layer convolution; the path through the edge detection head is called the edge detection head path, and the path directly through the double-layer convolution is called the original feature path. In the edge detection head, the input feature is max-pooled along the horizontal and the vertical direction respectively to obtain the direction-sensitive statistics p_h and p_v. This operation not only captures long-range context dependence in the horizontal direction but also preserves positional relationships in the vertical direction, helping the model localize the spatial extent of changed objects more accurately. Subsequently, p_h and p_v are each fed into a convolution layer, and a Sigmoid activation generates the normalized horizontal attention weight A_h and vertical attention weight A_v, where σ(·) denotes the Sigmoid activation and Conv(·) a convolution. After the attention weights are obtained, EFAM adaptively re-weights the input feature by element-wise multiplication, achieving fine-grained focus on change regions. To further enhance local feature consistency and discriminability, a convolution layer is applied to the re-weighted feature. Meanwhile, in the original feature path, a double-layer convolution is applied to F_edge. To prevent overly strong boundary guidance from damaging the original multi-scale edge features, the module fuses the edge detection head path and the original feature path through a residual connection, obtaining the edge aggregation feature F_agg.
  5. The method of claim 4, further comprising an edge feature constraint strategy EFCS consisting of three parts: edge label generation, edge supervision loss, and edge back-guidance. (1) Edge label generation: a morphology-based automatic edge label generation strategy is designed, in which an edge generator builds a reliable boundary supervision signal from the binary ground-truth label. Specifically, Gaussian filtering first smooths the ground-truth label of the change region to suppress noise and relieve the discreteness of annotated edges; a Canny edge detector then extracts the initial boundary of the change region from the smoothed label; and, to enhance boundary continuity and fault tolerance, the initial boundary is dilated to obtain the edge label E_gt. (2) Edge supervision loss: during BCnet training, the multi-scale edge feature F_edge is input into the edge detection head branch, which outputs a boundary prediction probability map P_edge, i.e., the edge mask. To balance pixel-level classification accuracy against the structural consistency of the overall shape, an edge loss combining binary cross-entropy loss (BCE Loss) and Dice Loss is used: L_edge = L_BCE(P_edge, E_gt) + L_Dice(P_edge, E_gt), where L_BCE denotes the binary cross-entropy loss and L_Dice the Dice loss. (3) Edge back-guidance: the edge aggregation feature and the boundary prediction probability map are input together into the change detection head branch to obtain the change prediction result, i.e., the change mask P_change, realizing guidance at the feature level. The change region is supervised with a binary cross-entropy loss, L_change = L_BCE(P_change, Y_gt), where Y_gt denotes the ground-truth change label. Combining the change detection task and the boundary constraint task, the model is trained with the joint loss L = L_change + λ·L_edge, where λ is a trade-off coefficient balancing change detection accuracy against boundary constraint strength.
  6. The method of claim 5, wherein the trade-off coefficient λ is set to 0.3.
  7. The method of any of claims 1-6, wherein a LoRA adapter is introduced to perform parameter fine-tuning of the SAM image encoder.
  8. The method of claim 7, wherein the remote sensing image dataset of step S2 is LEVIR-CD.
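The difference modeling branch of the DDEU in claim 2 can be sketched in NumPy. This is a minimal illustration, not the patented implementation: the two-layer convolution of the detail guidance branch is collapsed into a single channel-mixing matrix `w_reduce`, the convolutions before the Sigmoid activations are omitted, and the additive fusion of the two weighted features is an assumption.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ddeu(xa, xb, w_reduce):
    """Difference Detail Enhancement Unit sketch.
    xa, xb: (C, H, W) bi-temporal features from the same Transformer layer.
    w_reduce: (C, 2C) channel-mixing matrix standing in for the two-layer conv."""
    # Detail guidance branch: concatenate along channels, then reduce back to C.
    s = np.concatenate([xa, xb], axis=0)            # (2C, H, W)
    g = np.einsum('oc,chw->ohw', w_reduce, s)       # (C, H, W) detail guidance G
    # Difference modeling branch: element-wise absolute difference.
    d = np.abs(xa - xb)                             # (C, H, W)
    # Channel sub-path: global max pooling over the spatial axes.
    z_c = d.max(axis=(1, 2))                        # (C,)
    # Spatial sub-path: max pooling over the channel axis.
    z_s = d.max(axis=0)                             # (H, W)
    # Sigmoid-normalised attention weights in [0, 1].
    w_c = sigmoid(z_c)[:, None, None]
    w_s = sigmoid(z_s)[None, :, :]
    # Apply both weights to the detail guidance feature and fuse additively.
    return w_c * g + w_s * g
```

With identical bi-temporal inputs the difference branch is all zeros, both Sigmoid weights collapse to 0.5, and the unit simply passes the detail guidance feature through.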
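The directional attention at the core of claim 4's EFAM can be sketched as follows. The per-direction convolution layers are omitted (Sigmoid is applied directly to the pooled statistics), so this is an illustrative reduction rather than the claimed module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def efam_attention(f):
    """Directional-attention core of the edge feature aggregation module.
    f: (C, H, W) multi-scale edge feature."""
    # Max pooling along the horizontal direction keeps one value per row ...
    p_h = f.max(axis=2, keepdims=True)   # (C, H, 1)
    # ... and along the vertical direction, one value per column.
    p_v = f.max(axis=1, keepdims=True)   # (C, 1, W)
    a_h = sigmoid(p_h)                   # horizontal attention weights in (0, 1)
    a_v = sigmoid(p_v)                   # vertical attention weights in (0, 1)
    # Adaptive re-weighting by element-wise multiplication (broadcast over axes).
    return f * a_h * a_v
```

In the full module, the re-weighted feature passes through one more convolution and is then fused, via a residual connection, with a double-convolved copy of the input along the original feature path.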
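The joint objective described in claims 5 and 6 can be written out directly. Placing the trade-off coefficient on the edge term is an assumption consistent with claim 6's value of 0.3; the per-pixel BCE and Dice definitions are the standard ones.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy averaged over pixels; p clipped for stability."""
    p = np.clip(p, eps, 1 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

def dice_loss(p, y, eps=1e-7):
    """Soft Dice loss: 1 minus twice the overlap over the total mass."""
    inter = (p * y).sum()
    return float(1 - (2 * inter + eps) / (p.sum() + y.sum() + eps))

def joint_loss(p_change, y_change, p_edge, y_edge, lam=0.3):
    """BCnet-style joint objective: BCE on the change mask plus a
    (BCE + Dice) edge term weighted by lam (claim 6 fixes lam at 0.3)."""
    l_change = bce(p_change, y_change)
    l_edge = bce(p_edge, y_edge) + dice_loss(p_edge, y_edge)
    return l_change + lam * l_edge
```

A perfect prediction drives both terms to essentially zero, while an uninformative constant prediction is penalized by both the pixel-wise and the overlap-based components.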

Description

Intelligent interpretation method for bi-temporal remote sensing images

Technical Field

The invention relates to the technical field of intelligent remote sensing image interpretation, in particular to an intelligent interpretation method for bi-temporal remote sensing images.

Background

Intelligent remote sensing image interpretation aims to accurately identify and localize land-cover changes and changes caused by human activity by analyzing remote sensing images acquired over the same geographic area at different times. As a key technique in earth observation and environmental monitoring, it plays an irreplaceable role in practical applications such as urban expansion monitoring, land-use and land-cover change analysis, disaster assessment, ecological protection, and land resource management. As the spatial resolution of remote sensing images continues to improve, ground objects exhibit richer textures and geometric features; especially in scenes such as dense urban building areas, interleaved farmland, and complex natural landforms, interpretation targets often show variable scales, irregular shapes, and subtle semantic differences. How to accurately extract change information under complex background interference while preserving the geometric integrity of change-target boundaries has become a core problem of common concern in both academic research and engineering practice. With the rapid development of deep learning, intelligent remote sensing image interpretation methods based on convolutional neural networks (CNNs) and Transformers have become the mainstream research paradigm.
Thanks to their strong feature extraction and context modeling capabilities, deep learning methods have effectively improved change detection accuracy, but there remains room for improvement in perceiving fine-grained changes and finely characterizing change boundaries in complex scenes. Vision foundation models can effectively model complex ground-object structures and high-level semantic information, offering a brand-new research paradigm for intelligent remote sensing image interpretation. A vision foundation model is a general visual representation model pre-trained on large-scale general-purpose image or multi-modal datasets via self-supervised, weakly supervised, or contrastive learning; typical representatives include SAM, CLIP, and DINOv2. Thanks to their large-scale data-driven pre-training, vision foundation models learn visual representations with strong generalization ability and cross-task transfer potential. However, directly applying a vision foundation model to the change detection task of intelligent remote sensing image interpretation still has significant limitations. On the one hand, the high-level features output by such models usually focus on semantic consistency and global structure modeling, with limited capacity to characterize local details. On the other hand, direct transfer of a general vision foundation model often ignores the strict requirements of change detection on spatial localization accuracy and boundary geometric integrity, so detection results are prone to problems such as boundary blurring, missing details, or over-completeness.
Disclosure of the Invention

(I) Technical problem to be solved

Based on the above, the invention provides an intelligent interpretation method for bi-temporal remote sensing images, aiming to solve the problems of inaccurate change detection and incomplete change-target boundaries caused by large scale spans, weak semantic differences, and irregular boundaries of change targets in the complex scenes described in the background.

(II) Technical scheme

To achieve the above purpose, the invention provides an intelligent interpretation method for bi-temporal remote sensing images, comprising: S1, constructing a boundary-constrained remote sensing image change detection model based on a vision foundation model; S2, training the change detection model on a remote sensing image dataset to obtain a trained change detection model; S3, inputting remote sensing images of different time phases into the trained change detection model to obtain the interpretation result. The boundary-constrained change detection model BCnet, built on a vision foundation model, comprises a SAM encoder module, a difference detail enhancement module, a multi-scale edge enhancement module, an edge feature aggregation module, and a change detection head; the edge feature aggregation module contains an edge detection head, and the change detection head comprises a convolution with a 3×3 kernel.
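Assuming illustrative module names and shapes (the patent publishes no code), the S1-S3 data flow of BCnet can be sketched as:

```python
import numpy as np

def bcnet_forward(t1, t2, encoder, ddeu_fns, meem, efam, edge_head, change_head):
    """Illustrative BCnet data flow; every callable is a hypothetical stand-in.
    t1, t2: bi-temporal images. encoder returns shallow features plus the four
    Transformer-layer features consumed by the DDEUs (layers 6/12/18/24)."""
    shallow1, feats1 = encoder(t1)
    shallow2, feats2 = encoder(t2)                 # Siamese: shared weights
    diffs = [f(a, b) for f, a, b in zip(ddeu_fns, feats1, feats2)]
    edge_feat = meem(shallow1 + shallow2, diffs)   # multi-scale edge features
    edge_mask = edge_head(edge_feat)               # boundary supervision output
    change_mask = change_head(efam(edge_feat))     # final interpretation result
    return edge_mask, change_mask
```

Here `encoder`, `meem`, `efam`, and the two heads are stand-ins for the SAM encoder module, the multi-scale edge enhancement module, the edge feature aggregation module, and the detection heads; only the wiring between them follows the description.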