CN-121685576-B - SAM-GAN-based building image segmentation method and system
Abstract
The invention discloses a SAM-GAN-based building image segmentation method and system, and relates to the technical field of image data processing. The method comprises: obtaining an original remote sensing image and preprocessing it to obtain a preprocessed image; generating a preliminary building segmentation result from the preprocessed image using a preset SAM-GAN model, wherein the SAM-GAN model comprises a SAM generator and a dual-discriminator network, and the dual-discriminator network is used to optimize the SAM generator and is a dual-discriminator framework integrating semantic and topological constraints; and performing geometric-rule adaptive-optimization post-processing on the preliminary building segmentation result to obtain a final building segmentation result. The invention improves the accuracy, quality and generalization capability of building image segmentation results.
Inventors
- JIANG QINYI
- ZHANG YAN
- SHAN XIN
- Luo Zifei
- Zhao Lingyuan
Assignees
- 环天智慧科技股份有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-02-11
Claims (4)
- 1. A SAM-GAN-based building image segmentation method, comprising: acquiring an original remote sensing image, and preprocessing the original remote sensing image to obtain a preprocessed image; generating a preliminary building segmentation result from the preprocessed image using a preset SAM-GAN model, wherein the SAM-GAN model comprises a SAM generator and a dual-discriminator network, the dual-discriminator network is used to optimize the SAM generator, and the dual-discriminator network is a dual-discriminator framework integrating semantic and topological constraints; and performing geometric-rule adaptive-optimization post-processing on the preliminary building segmentation result to obtain a final building segmentation result; wherein the SAM generator comprises an encoder and a decoder, the encoder being used to perform preliminary feature extraction on the preprocessed image, and the decoder being used to reconstruct the preliminarily extracted features and to decode and output the preliminary building segmentation result, the preliminary building segmentation result comprising a preliminary mask, a low-order semantic tensor and a boundary distance feature map; the encoder is a ViT-H encoder that adopts the ViT-H Transformer structure published with the SAM model and is frozen during the first 10 training epochs; the decoder consists of a 3-layer Transformer decoder and 2 layers of 5×5 depthwise separable convolution, the 3-layer Transformer decoder being the original decoder of the SAM model; the dual-discriminator network comprises a semantic discriminator and a topology discriminator, which share an encoder; the semantic discriminator comprises a PatchGAN-style real/fake discrimination head and a multi-class semantic discrimination head, its input is the concatenation of the preliminary mask, the low-order semantic tensor and the preprocessed image, and its output is a real/fake map and semantic probabilities; the input of the topology discriminator is the preliminary mask and the boundary distance feature map, and its output is a topology score; the post-processing comprises raster vectorization, vertex simplification, short-edge deletion, sharp-angle deletion, obtuse-angle deletion, main-direction alignment and vector rasterization; raster vectorization converts the preliminary mask of the preliminary building segmentation result from raster format to vector format to obtain a first intermediate result; vertex simplification simplifies the vertices of the first intermediate result using the DP algorithm, with the simplification threshold dynamically adjusted according to the short-edge ratio threshold provided by the reinforcement learning action interface, to obtain a second intermediate result; short-edge deletion deletes short edges from the second intermediate result to obtain a third intermediate result; sharp-angle and obtuse-angle deletion uses the sharp-angle threshold and the obtuse-angle threshold provided by the reinforcement learning action interface to delete sharp angles and obtuse angles from the third intermediate result to obtain a fourth intermediate result; main-direction alignment determines the main direction of the building contour from the fourth intermediate result, evaluates the angle between each polygon edge and the main direction, and applies a rotational adjustment to obtain a fifth intermediate result; vector rasterization re-rasterizes the fifth intermediate result into a binary mask image, which is the final building segmentation result; the short-edge ratio threshold, the sharp-angle threshold and the obtuse-angle threshold are automatically optimized by a geometric-threshold policy network based on reinforcement learning and meta-learning, comprising: constructing an action space formed by the short-edge ratio threshold, the sharp-angle threshold and the obtuse-angle threshold; after an action in the action space is executed, having the environment call the black-box post-processing to perform, in sequence, raster vectorization, vertex simplification, short-edge deletion, sharp-angle deletion, obtuse-angle deletion, main-direction alignment and vector rasterization, yielding a regularized mask; calculating a reward R from the regularized mask, wherein the first term represents the intersection over union (IoU) of the polygon and the ground-truth mask, the second term is a weighted penalty on the difference between the number of vertices of the polygon and the number of vertices of the ground-truth annotated polygon, the third term is a weighted topology-destruction penalty, and the fourth term is a weighted edge-direction consistency penalty computed from the angle value of the i-th corner, the main-direction angle of the building contour and the number of corners; and optimizing with the PPO algorithm to maximize the reward R and obtain the optimal short-edge ratio threshold, sharp-angle threshold and obtuse-angle threshold.
- 2. The SAM-GAN-based building image segmentation method of claim 1, wherein the loss function of the dual-discriminator network is defined in terms of: a loss function of the SAM generator; a loss function of the semantic discriminator; a loss function of the topology discriminator; and a dynamic weight that increases exponentially with the training epoch and is used to balance the losses of the semantic discriminator and the topology discriminator, the dynamic weight being determined by a maximum weight value, a time-weighting factor and the training epoch.
- 3. The SAM-GAN-based building image segmentation method of claim 1, wherein the training of the SAM-GAN model comprises: a frozen initialization stage, in which the encoder and the topology discriminator are completely frozen and only the parameters of the decoder and the semantic discriminator are trained; a semantic adversarial training stage, in which all SAM generator parameters are unfrozen and the semantic discriminator is enabled while the topology discriminator remains frozen; and a topology enhancement training stage, in which the topology discriminator is unfrozen so that its loss imposes an explicit differentiable constraint on the Euler number difference and the connected component difference of the mask.
- 4. A SAM-GAN-based building image segmentation system, comprising: an acquisition and preprocessing unit, used to acquire an original remote sensing image and preprocess the original remote sensing image to obtain a preprocessed image; a model segmentation unit, used to generate a preliminary building segmentation result from the preprocessed image using a preset SAM-GAN model, wherein the SAM-GAN model comprises a SAM generator and a dual-discriminator network, the dual-discriminator network is used to optimize the SAM generator, and the dual-discriminator network is a dual-discriminator framework integrating semantic and topological constraints; and a post-processing unit, used to perform geometric-rule adaptive-optimization post-processing on the preliminary building segmentation result to obtain a final building segmentation result; wherein the SAM generator comprises an encoder and a decoder, the encoder being used to perform preliminary feature extraction on the preprocessed image, and the decoder being used to reconstruct the preliminarily extracted features and to decode and output the preliminary building segmentation result, the preliminary building segmentation result comprising a preliminary mask, a low-order semantic tensor and a boundary distance feature map; the encoder is a ViT-H encoder that adopts the ViT-H Transformer structure published with the SAM model and is frozen during the first 10 training epochs; the decoder consists of a 3-layer Transformer decoder and 2 layers of 5×5 depthwise separable convolution, the 3-layer Transformer decoder being the original decoder of the SAM model; the dual-discriminator network comprises a semantic discriminator and a topology discriminator, which share an encoder; the semantic discriminator comprises a PatchGAN-style real/fake discrimination head and a multi-class semantic discrimination head, its input is the concatenation of the preliminary mask, the low-order semantic tensor and the preprocessed image, and its output is a real/fake map and semantic probabilities; the input of the topology discriminator is the preliminary mask and the boundary distance feature map, and its output is a topology score; the post-processing comprises raster vectorization, vertex simplification, short-edge deletion, sharp-angle deletion, obtuse-angle deletion, main-direction alignment and vector rasterization; raster vectorization converts the preliminary mask of the preliminary building segmentation result from raster format to vector format to obtain a first intermediate result; vertex simplification simplifies the vertices of the first intermediate result using the DP algorithm, with the simplification threshold dynamically adjusted according to the short-edge ratio threshold provided by the reinforcement learning action interface, to obtain a second intermediate result; short-edge deletion deletes short edges from the second intermediate result to obtain a third intermediate result; sharp-angle and obtuse-angle deletion uses the sharp-angle threshold and the obtuse-angle threshold provided by the reinforcement learning action interface to delete sharp angles and obtuse angles from the third intermediate result to obtain a fourth intermediate result; main-direction alignment determines the main direction of the building contour from the fourth intermediate result, evaluates the angle between each polygon edge and the main direction, and applies a rotational adjustment to obtain a fifth intermediate result; vector rasterization re-rasterizes the fifth intermediate result into a binary mask image, which is the final building segmentation result; the short-edge ratio threshold, the sharp-angle threshold and the obtuse-angle threshold are automatically optimized by a geometric-threshold policy network based on reinforcement learning and meta-learning, comprising: constructing an action space formed by the short-edge ratio threshold, the sharp-angle threshold and the obtuse-angle threshold; after an action in the action space is executed, having the environment call the black-box post-processing to perform, in sequence, raster vectorization, vertex simplification, short-edge deletion, sharp-angle deletion, obtuse-angle deletion, main-direction alignment and vector rasterization, yielding a regularized mask; calculating a reward R from the regularized mask, wherein the first term represents the intersection over union (IoU) of the polygon and the ground-truth mask, the second term is a weighted penalty on the difference between the number of vertices of the polygon and the number of vertices of the ground-truth annotated polygon, the third term is a weighted topology-destruction penalty, and the fourth term is a weighted edge-direction consistency penalty computed from the angle value of the i-th corner, the main-direction angle of the building contour and the number of corners; and optimizing with the PPO algorithm to maximize the reward R and obtain the optimal short-edge ratio threshold, sharp-angle threshold and obtuse-angle threshold.
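As an illustration of two steps named in the claims, the following is a minimal Python sketch of Douglas-Peucker (DP) vertex simplification and of a four-term reward of the kind described. It is not the patented implementation. The exact expression of R is not given in the text above, so the weighted-sum form, the default weights `w_vertex`, `w_topo` and `w_dir`, and the sine-based direction term (zero when a corner angle differs from the main direction by a multiple of 90 degrees) are all assumptions. The function names are hypothetical, and the simplification treats an open polyline rather than a closed building polygon.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the foot point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, epsilon):
    """Simplify an open polyline with the Douglas-Peucker algorithm.

    Interior points closer than epsilon to the chord between the two
    endpoints are dropped; otherwise the polyline is split at the
    farthest point and both halves are simplified recursively.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    max_dist, max_idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            max_dist, max_idx = d, i
    if max_idx == 0 or max_dist <= epsilon:
        return [first, last]
    left = douglas_peucker(points[: max_idx + 1], epsilon)
    right = douglas_peucker(points[max_idx:], epsilon)
    return left[:-1] + right  # the split point appears in both halves


def reward(iou, n_vertices, n_gt_vertices, topo_penalty,
           corner_angles_deg, main_angle_deg,
           w_vertex=0.01, w_topo=0.5, w_dir=0.1):
    """Assumed reward: IoU minus three weighted penalties.

    The weights and the form of each penalty are illustrative only.
    """
    vertex_term = abs(n_vertices - n_gt_vertices)
    if corner_angles_deg:
        # Assumed direction penalty: mean |sin(2 * (angle - main direction))|.
        dir_term = sum(
            abs(math.sin(2.0 * math.radians(a - main_angle_deg)))
            for a in corner_angles_deg
        ) / len(corner_angles_deg)
    else:
        dir_term = 0.0
    return iou - w_vertex * vertex_term - w_topo * topo_penalty - w_dir * dir_term


# Example: a nearly collinear vertex is removed, then a reward is scored.
simplified = douglas_peucker([(0, 0), (1, 0.05), (2, 0), (2, 2)], epsilon=0.1)
r = reward(0.9, 6, 4, 0.0, [10.0, 100.0], 10.0)
```

In a reinforcement learning setting of the kind the claims describe, an agent would choose the three thresholds, run the post-processing, and receive a scalar like `r`; here the thresholds are reduced to a single `epsilon` purely to keep the sketch short.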
Description
SAM-GAN-based building image segmentation method and system
Technical Field
The invention relates to the technical field of image data processing, and in particular to a SAM-GAN-based building image segmentation method and system.
Background
With the continuous progress of earth-observation remote sensing, high-resolution remote sensing satellite technology keeps advancing, and the acquired imagery keeps improving in scale and quality. This has made the automatic identification and segmentation of specific ground features, such as buildings and roads, from high-resolution imagery a focus of attention in many research and engineering fields. Buildings are spaces vital to human life and a core feature element on maps, so they have considerable research value. Automatic building segmentation from imagery is valuable in key engineering applications such as topographic mapping and household surveys, and it also has theoretical significance: because buildings are highly structured and closely tied to human activity, they serve as a typical test case for evaluating statistical methods and machine learning algorithms and for developing fields such as image segmentation, object recognition and regularization. In addition, building segmentation from high-resolution imagery provides key reference data and technical support for fields such as disaster assessment, change monitoring and smart-city development. Over the past decades, traditional techniques have attempted to segment buildings using textures, lines, shadows and more complex, empirically designed features, but these methods have not achieved fully automatic building segmentation, and no commercial software has become available.
Recent progress in deep learning has prompted a new wave of research on automated building image segmentation. Although deep learning (for example UNet and foundation models such as SAM) and generative adversarial networks (GANs) are now widely used for building segmentation in high-resolution remote sensing images, the following technical bottlenecks and shortcomings remain:
1. The masks output by segmentation models lack authenticity and structural consistency
End-to-end segmentation networks, represented by patent publication CN114187520A, improve the apparent quality of the mask but commonly suffer from boundary burrs, false holes, breaks or incomplete closure. Because explicit constraints on the segmented contour structure are lacking, the mask output by the model is difficult to align structurally with the ground-truth label, so error metrics stay high over the long term: the structural IoU is significantly lower than the pixel IoU, and because the discriminator uses a single pixel-level consistency loss, it struggles to capture topological properties of the building contour (such as the number of connected components and the Euler number).
2. Post-processing of building contours relies on rigid thresholds, limiting generalization and automation
Conventional DP (Douglas-Peucker) polygon simplification, short-edge clipping, angle-threshold clipping, main-direction alignment and similar steps depend on hand-tuned or statically set hyperparameters, such as the short-edge ratio threshold, the sharp-angle threshold and the obtuse-angle threshold. Fixed-threshold methods degrade sharply when transferred across resolutions and scenes, adapt poorly to new cities and new sensors, and lack adaptive, intelligent optimization.
3. GAN-based adversarial segmentation lacks structural innovation and struggles to significantly improve fine-grained segmentation quality
In recent years, the idea of combining GANs with segmentation (such as pix2pix and SAM+GAN) has been widely published (see PPA-SAM, SAM-GAN for XCT, etc.), but most work stays at the single-discriminator, pixel-level adversarial stage, with no targeted improvement for the topology of remote sensing building segmentation (closure, absence of false holes, breaks in small targets).
In view of this, the present application is proposed.
Disclosure of Invention
The technical problem to be solved by the invention is that the masks obtained by existing building image segmentation methods lack authenticity and structural consistency, leading to inaccurate segmentation results and poor segmentation quality. The invention aims to provide a SAM-GAN-based building image segmentation method and system, which provides an integrated innovative scheme of a shared dual-discriminator SAM-GAN and self-adaptiv