CN-121982539-A - Remote sensing residential point extraction method and system based on multi-source data fusion semantic segmentation

Abstract

The invention discloses a remote sensing residential point extraction method and system based on multi-source data fusion semantic segmentation, relating to the field of remote sensing image processing and land-use mapping. Cloud removal and annual compositing are performed on medium-resolution multispectral imagery. A multi-source fusion automatic labeling strategy automatically generates pixel-level training labels for rural residential points by using an impervious surface product, open-source map points of interest, and an area-brightness threshold learned from nighttime light data. A SELPFormer lightweight Transformer segmentation model is constructed by introducing Lite-PPM, SCSE and ELA feature enhancement modules and a lightweight decoder on the basis of SegFormer, and it outputs a pixel-level rural residential point extraction result for regional-scale rural residential point mapping.

Inventors

GAO YONGNIAN
ZHOU QIAN
YU FEIHONG
GONG YUQING
XU WENJING
DENG QIQI
ZHANG WENBO

Assignees

Hohai University (河海大学)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-28

Claims (10)

1. A remote sensing residential point extraction method based on multi-source data fusion semantic segmentation, characterized by comprising the following steps:
S1, removing cloud cover from medium-resolution multispectral remote sensing images of the area to be analyzed, and compositing the multispectral images annually at the pixel scale to obtain an annual composite image with a uniform spatio-temporal reference;
S2, acquiring global impervious surface raster data, open-source map data and nighttime light image data covering the annual composite image; extracting urban points of interest and rural points of interest from the open-source map data; determining an initial urban or rural attribute for each impervious patch based on the spatial distance between the patch center and the different types of points of interest; calculating the impervious patch area A and the average nighttime light brightness L of the initial urban samples and the initial rural samples; constructing a joint criterion for distinguishing urban patches from rural residential point patches; and learning a decision threshold, thereby generating pixel-level rural residential point training labels;
S3, constructing a SELPFormer model based on the semantic segmentation network SegFormer, forming training samples from the pixel-level rural residential point training labels generated in step S2 and the corresponding annual composite images, and training the SELPFormer model to obtain a trained remote sensing rural residential point segmentation model;
S4, rural residential point extraction: inputting the annual composite image obtained in step S1 into the remote sensing rural residential point segmentation model, outputting a pixel-level segmentation result map of the same size as the input image, and extracting the pixels belonging to the rural residential point category in the segmentation result map as rural residential points.
2. The remote sensing residential point extraction method based on multi-source data fusion semantic segmentation according to claim 1, wherein step S2 comprises:
S21, binarizing and vectorizing the global impervious surface raster data according to a preset impervious proportion threshold to obtain a plurality of impervious patches, and recording the area A and the spatial position of each impervious patch;
S22, extracting urban points of interest and rural points of interest from the open-source map data, and making an initial determination of the urban or rural attribute of each impervious patch based on the spatial distances between the patch center and the urban and rural points of interest;
S23, obtaining nighttime light image data spatially registered with the impervious patches, aggregating the nighttime light brightness of the pixels within each impervious patch to obtain the average nighttime light brightness L of the patch, and using the average nighttime light brightness L together with the patch area A as a joint representation of the human activity intensity of the patch;
S24, randomly drawing a number of impervious patch samples from the samples initially judged to be urban patches and rural patches; applying a logarithmic transformation to the impervious patch area A and the average nighttime light brightness L and removing outliers; constructing an area-brightness cumulative distribution function for the urban samples and an area-brightness survival distribution function for the rural samples; calculating the absolute value of the difference between the two distribution functions at different brightness thresholds; taking the brightness threshold at which this absolute difference is smallest as a single decision threshold; and, after repeating this procedure multiple times, taking the median of the single decision thresholds as the final brightness threshold T;
S25, applying the brightness threshold T to the joint discrimination score S of all impervious patches to automatically divide them into urban patches and rural residential point patches, a patch being judged a rural residential point candidate patch when the brightness threshold T is less than S; and rasterizing the regions judged to be rural residential point candidate patches to obtain pixel-level labels serving as the rural residential point training labels.
3. The remote sensing residential point extraction method based on multi-source data fusion semantic segmentation according to claim 2, wherein, after the rural residential point training labels are generated, they are overlaid on high-resolution remote sensing images, the automatic labeling results are checked manually for quality, unreliable samples are removed or corrected, and only samples with clear boundaries, complete shapes and agreement with actual rural residential points are retained as training data for the SELPFormer model.
4. The remote sensing residential point extraction method based on multi-source data fusion semantic segmentation according to claim 1, wherein the SELPFormer model comprises a hierarchical Transformer encoder, a multi-scale feature enhancement module and a lightweight decoder, wherein:
S31, the hierarchical Transformer encoder performs overlapped patch embedding and multi-head self-attention calculation on an input remote sensing image $I \in \mathbb{R}^{H \times W \times B}$ and outputs multi-scale feature maps {C1, C2, C3, C4}, where H and W are the height and width of the remote sensing image and B is the number of optical channels of the remote sensing image;
S32, the multi-scale feature enhancement module comprises: the lightweight pyramid pooling module Lite-PPM, which is stacked on the deepest feature map C4 and introduces multi-scale global context; the spatial and channel squeeze-and-excitation module SCSE, which is positioned between the feature map C3 and the Lite-PPM-enhanced feature map C4 and recalibrates the features along the channel and spatial dimensions; and the edge-aware learning and aggregation module ELA, which is located at the lateral connections between C2 to C4 and the lightweight decoder and enhances local edge and texture information;
S33, the lightweight decoder linearly projects the multi-scale features processed by the multi-scale feature enhancement module to a unified channel dimension d, performs upsampling and fusion, and outputs a pixel-level rural residential point probability map.
5. The remote sensing residential point extraction method based on multi-source data fusion semantic segmentation according to claim 4, wherein the feature maps C2, C3 and C4 processed by the edge-aware learning and aggregation module ELA are each linearly projected to a unified number of channels, upsampled to the same spatial resolution, concatenated along the channel dimension, fused by a 1×1 convolution or a multi-layer perceptron, and passed through a classification head to output a rural residential point probability map of the same size as the input image.
6. The remote sensing residential point extraction method based on multi-source data fusion semantic segmentation according to claim 4, wherein the lightweight pyramid pooling module Lite-PPM processes the deepest encoder feature map C4 to aggregate multi-scale global context, and the calculation process is as follows:
let the set of adaptive pooling scales be $\mathcal{S}$; for each scale $s \in \mathcal{S}$, adaptive average pooling is applied to the input feature C4, followed by 1×1 convolution, batch normalization and ReLU nonlinear activation, to obtain the branch feature of each scale:
$$F_s = \mathrm{Up}_{H \times W}\Big(\mathrm{ReLU}\big(\mathrm{BN}_s\big(\mathrm{Conv}^{1\times1}_s(\mathrm{Pool}_s(C_4))\big)\big)\Big)$$
where $\mathrm{BN}_s$ is the batch normalization (BatchNorm) on the s-th branch, $\mathrm{Pool}_s$ is the adaptive average pooling that pools the features to size $s \times s$, $\mathrm{Up}_{H \times W}$ is the upsampling operator that upsamples to height H and width W, $F_s$ is the feature map output by the s-th scale branch, and $\mathrm{Conv}^{1\times1}_s$ is the 1×1 convolution layer of the s-th pooling-scale branch of the PPM;
the input feature C4 is concatenated with the branch features $F_s$ along the channel dimension to obtain $F_{\mathrm{cat}} = \mathrm{Concat}\big(C_4, \{F_s\}_{s \in \mathcal{S}}\big)$;
a 1×1 projection convolution, batch normalization and nonlinear activation $\delta$ are applied to the concatenated feature $F_{\mathrm{cat}}$ to obtain the Lite-PPM module output $F_{\mathrm{PPM}} = \delta\big(\mathrm{BN}_p(\mathrm{Conv}^{1\times1}_p(F_{\mathrm{cat}}))\big)$, where $\mathrm{Conv}^{1\times1}_p$ is the 1×1 convolution layer of the projection layer and $\mathrm{BN}_p$ is the BatchNorm after the projection convolution.
7. The remote sensing residential point extraction method based on multi-source data fusion semantic segmentation according to claim 4, wherein the spatial and channel squeeze-and-excitation module SCSE takes as input a feature map $X \in \mathbb{R}^{H \times W \times C}$, namely C3 and C4, and comprises a channel branch cSE and a spatial branch sSE, and the calculation process is as follows:
the channel branch cSE performs global average pooling on the input feature over the spatial dimensions to obtain $z = \mathrm{GAP}(X) \in \mathbb{R}^{C}$, and obtains the channel attention weight vector through two fully connected layers and nonlinear activation, $w_c = \sigma\big(W_2\,\delta(W_1 z)\big)$, where $\sigma$ is the Sigmoid activation function and $\delta$ is the nonlinear activation between the two layers;
the spatial branch sSE applies a 1×1 convolution to the input feature and activates it with Sigmoid to obtain the spatial attention weight map $w_s = \sigma\big(\mathrm{Conv}^{1\times1}(X)\big)$;
the channel weights are broadcast over the spatial dimensions, the spatial weights are broadcast over the channel dimension, each is multiplied element by element with the input feature $X$, and the two products are added to obtain the SCSE module output $Y = w_c \otimes X + w_s \otimes X$, where $w_c$ denotes the channel attention weights and $w_s$ denotes the spatial attention map;
wherein $W_1 \in \mathbb{R}^{(C/r) \times C}$ and $W_2 \in \mathbb{R}^{C \times (C/r)}$, $C$ is the number of channels of the feature map, $\mathcal{C} = \{0, 1\}$ is the category set in which 0 is the background category and 1 is the rural residential point category, $r$ is the channel compression rate, $\otimes$ denotes element-by-element multiplication, and GAP denotes global average pooling.
8. The remote sensing residential point extraction method based on multi-source data fusion semantic segmentation according to claim 4, wherein the edge-aware learning and aggregation module ELA takes as input a feature map $X$, namely C2, C3 and C4, generates an attention map by fusing one-dimensional directional pooling with local convolution, and applies residual gating, and the calculation process is as follows:
global average pooling is applied to the input feature along the horizontal direction and the vertical direction respectively to obtain one-dimensional directional features, which pass through one-dimensional convolution, normalization and Sigmoid activation to generate the directional attention weights $a_h$ and $a_w$;
$a_h$ and $a_w$ are broadcast element by element and fused by weighting to obtain the two-dimensional attention $A$;
a depthwise separable convolution is applied to the input feature $X$ to obtain the locally enhanced feature $X_{\mathrm{dw}}$;
the attention feature and the local convolution feature are fused by residual gating to obtain the ELA module output $Y$;
wherein the convolution kernel size k = {5, 3} corresponds to the C2, C3 and C4 layers in the SELPFormer model, and α is the residual gating coefficient.
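9. The remote sensing residential point extraction method based on multi-source data fusion semantic segmentation according to claim 1, wherein, in the training stage, the SELPFormer model uses an equal-weight combination of the Focal loss and the Dice loss as the total loss function to handle the imbalance in sample numbers between the rural residential point category and the background category, and the total loss function L is defined as:
$$L = L_{\mathrm{Focal}} + L_{\mathrm{Dice}}$$
the Focal loss is defined as:
$$L_{\mathrm{Focal}} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c \in \mathcal{C}} \alpha_c\, y_{n,c}\,(1 - p_{n,c})^{\gamma}\,\log p_{n,c}$$
the Dice loss is defined as:
$$L_{\mathrm{Dice}} = 1 - \frac{2\sum_{n=1}^{N}\sum_{c \in \mathcal{C}} y_{n,c}\,p_{n,c} + \epsilon}{\sum_{n=1}^{N}\sum_{c \in \mathcal{C}} y_{n,c} + \sum_{n=1}^{N}\sum_{c \in \mathcal{C}} p_{n,c} + \epsilon}$$
where $N$ is the total number of pixels, $y_{n,c} \in \{0, 1\}$ is the true label of the n-th pixel for category $c$ in the category set $\mathcal{C}$, $p_{n,c} \in [0, 1]$ is the corresponding predicted probability, $\gamma$ is the focusing parameter of the Focal loss, $\epsilon$ is a constant that prevents the denominator from being zero, and the class weight $\alpha_c$ takes values in the range [1.0, 3.0].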
10. A remote sensing residential point extraction system based on multi-source data fusion semantic segmentation, characterized in that it comprises:
an image preprocessing module for performing remote sensing image acquisition, cloud and cloud-shadow removal, and annual compositing to obtain an annual composite image;
a multi-source fusion automatic labeling module for performing multi-source data fusion and automatic generation of training labels;
a SELPFormer model training module for performing SELPFormer model construction, training and verification to obtain a remote sensing rural residential point segmentation model;
and a rural residential point extraction module for inputting the annual composite image into the remote sensing rural residential point segmentation model and outputting the rural residential point extraction result.
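The annual compositing of step S1 in claim 1 can be illustrated with a minimal sketch. The claims do not say which compositing statistic is used; the per-pixel median over cloud-free observations below is an assumption, as are the function name `annual_composite`, the single-band nested-list layout, and the `nodata` fallback.

```python
import statistics


def annual_composite(scenes, cloud_masks, nodata=None):
    """Per-pixel median of cloud-free observations (assumed compositing rule).

    scenes[t][i][j]      -- value of pixel (i, j) in scene t (one band)
    cloud_masks[t][i][j] -- True where scene t is cloudy at pixel (i, j)
    """
    height, width = len(scenes[0]), len(scenes[0][0])
    composite = []
    for i in range(height):
        row = []
        for j in range(width):
            # Keep only the observations that are not flagged as cloud.
            clear = [scene[i][j]
                     for scene, mask in zip(scenes, cloud_masks)
                     if not mask[i][j]]
            # Pixels that are cloudy in every scene fall back to nodata.
            row.append(statistics.median(clear) if clear else nodata)
        composite.append(row)
    return composite
```

A multispectral image would apply the same rule band by band; a production pipeline would more likely use an array library than nested lists.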
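The threshold learning of step S24 in claim 2 can be sketched as follows. This is only an illustration under stated assumptions: each patch is summarized by one scalar score (for example a log-transformed brightness value), outlier removal is omitted, candidate thresholds are taken from the observed scores, and the names `ecdf`, `single_threshold` and `learn_threshold` and the default `repeats` and `sample_size` values are hypothetical.

```python
import bisect
import random
import statistics


def ecdf(sorted_values, t):
    """Empirical CDF: fraction of values less than or equal to t."""
    return bisect.bisect_right(sorted_values, t) / len(sorted_values)


def single_threshold(urban_scores, rural_scores):
    """Threshold minimizing |CDF_urban(t) - survival_rural(t)|."""
    urban = sorted(urban_scores)
    rural = sorted(rural_scores)
    best_t, best_gap = None, float("inf")
    for t in sorted(set(urban) | set(rural)):  # candidate thresholds
        survival_rural = 1.0 - ecdf(rural, t)  # fraction of rural scores > t
        gap = abs(ecdf(urban, t) - survival_rural)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t


def learn_threshold(urban_scores, rural_scores,
                    repeats=25, sample_size=50, seed=0):
    """Median of single thresholds over repeated random subsamples."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(repeats):
        u = rng.sample(urban_scores, min(sample_size, len(urban_scores)))
        r = rng.sample(rural_scores, min(sample_size, len(rural_scores)))
        thresholds.append(single_threshold(u, r))
    return statistics.median(thresholds)
```

Taking the median over repeated subsamples makes the final threshold less sensitive to any single random draw, which matches the repetition described in S24.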
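The SCSE recalibration of claim 7 can be sketched in plain Python on a small channel-first feature map. This is a hedged illustration, not the patented implementation: biases are omitted, ReLU is assumed as the activation between the two fully connected layers, and the function name `scse` and the weight arguments `w1`, `w2` and `w_spatial` are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def scse(x, w1, w2, w_spatial):
    """Concurrent channel (cSE) and spatial (sSE) recalibration.

    x         -- feature map as nested lists with shape C x H x W
    w1        -- first fully connected layer, shape (C/r) x C
    w2        -- second fully connected layer, shape C x (C/r)
    w_spatial -- 1x1 convolution weights for sSE, length C
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    # cSE: global average pooling, two FC layers, Sigmoid.
    z = [sum(sum(row) for row in channel) / (H * W) for channel in x]
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    channel_w = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # sSE: 1x1 convolution across channels, Sigmoid.
    spatial_w = [[sigmoid(sum(w_spatial[c] * x[c][i][j] for c in range(C)))
                  for j in range(W)] for i in range(H)]
    # Broadcast both weights, multiply element-wise, and add the two results.
    return [[[x[c][i][j] * channel_w[c] + x[c][i][j] * spatial_w[i][j]
              for j in range(W)] for i in range(H)] for c in range(C)]
```

With all weights set to zero, both Sigmoid outputs equal 0.5 and the module returns its input unchanged, which is a convenient sanity check.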
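The equal-weight Focal plus Dice loss of claim 9 can be sketched for the two-class case. The sketch assumes a 1:1 sum of the two terms, the common smoothed Dice form with the constant in both numerator and denominator, and illustrative class weights of 1.0 for background and 2.0 for rural residential points (inside the stated range [1.0, 3.0]); the function name `focal_dice_loss` and its defaults are hypothetical.

```python
import math


def focal_dice_loss(probs, labels, alpha=(1.0, 2.0), gamma=2.0, eps=1e-6):
    """Focal loss plus Dice loss over a flat list of pixels.

    probs[n]  -- predicted probability that pixel n is class 1
    labels[n] -- true class of pixel n (0 = background, 1 = rural)
    alpha     -- per-class weights (assumed values)
    """
    n_pixels = len(labels)
    focal = 0.0
    intersection = label_sum = prob_sum = 0.0
    for p1, y in zip(probs, labels):
        for c in (0, 1):
            p_c = p1 if c == 1 else 1.0 - p1    # probability of class c
            y_c = 1.0 if y == c else 0.0        # one-hot label for class c
            p_safe = min(max(p_c, 1e-7), 1.0)   # guard against log(0)
            focal -= alpha[c] * y_c * (1.0 - p_safe) ** gamma * math.log(p_safe)
            intersection += y_c * p_c
            label_sum += y_c
            prob_sum += p_c
    focal /= n_pixels
    dice = 1.0 - (2.0 * intersection + eps) / (label_sum + prob_sum + eps)
    return focal + dice
```

The Focal term down-weights pixels that are already classified confidently, while the Dice term measures region overlap, so the sum addresses class imbalance from two directions.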

Description

Remote sensing residential point extraction method and system based on multi-source data fusion semantic segmentation

Technical Field

The invention relates to the field of remote sensing image processing and land-use mapping, and in particular to a remote sensing residential point extraction method and system based on multi-source data fusion semantic segmentation.

Background

Existing global land cover and impervious surface products generally classify rural residential points as "built-up areas" or "artificial surfaces" and do not distinguish urban built-up areas from rural residential points. At medium resolution, rural residential points often appear as small patches embedded in a farmland background and are spectrally similar to other impervious features such as roads, industrial and mining areas, and hardened bare land, so traditional methods struggle to identify them accurately. Visual interpretation of high-resolution imagery or object-oriented classification can extract rural residential points well, but the interpretation cost is extremely high and the approach is difficult to scale to the national level. Deep learning semantic segmentation models achieve high accuracy in building extraction, but they depend heavily on large numbers of manually labeled samples, and their segmentation of small targets and complex boundaries in medium-resolution imagery remains inadequate. In particular, the lack of large-scale, high-quality pixel-level labels for rural residential points severely limits the performance of deep learning models at medium resolution.
Therefore, it is necessary to design an integrated method that performs multi-source fusion automatic labeling at the data level and structural optimization for small patches and complex boundaries at the model level, so as to achieve automatic, high-precision extraction of rural residential points from large-scale medium-resolution remote sensing imagery.

Disclosure of Invention

To this end, the invention provides a remote sensing residential point extraction method and system based on multi-source data fusion semantic segmentation to solve the problems described in the background. To achieve this purpose, the invention provides the following technical scheme: a remote sensing residential point extraction method based on multi-source data fusion semantic segmentation, comprising the following steps:
S1, removing cloud cover from medium-resolution multispectral remote sensing images of the area to be analyzed, and compositing the multispectral images annually at the pixel scale to obtain an annual composite image with a uniform spatio-temporal reference;
S2, acquiring global impervious surface raster data, open-source map data and nighttime light image data covering the annual composite image; vectorizing the global impervious surface raster data to obtain impervious patches, and calculating the impervious patch area A and the average nighttime light brightness L; extracting urban points of interest and rural points of interest from the open-source map data; determining an initial urban or rural attribute for each impervious patch based on the spatial distance between the patch center and the different types of points of interest; calculating the impervious patch area A and the average nighttime light brightness L of the initial urban samples and the initial rural samples; constructing a joint criterion for distinguishing urban patches from rural residential point patches; and learning a decision threshold, thereby generating pixel-level rural residential point training labels;
S3, constructing a SELPFormer model (a lightweight Transformer semantic segmentation model) based on the semantic segmentation network SegFormer, forming training samples from the pixel-level rural residential point training labels generated in step S2 and the corresponding annual composite images, and training the SELPFormer model to obtain a trained remote sensing rural residential point segmentation model;
S4, rural residential point extraction: inputting the annual composite image obtained in step S1 into the remote sensing rural residential point segmentation model, outputting a pixel-level segmentation result map of the same size as the input image, and extracting the pixels belonging to the rural residential point category in the segmentation result map as rural residential points.
Preferably, step S2 includes:
S21, binarizing and vectorizing the global impervious surface raster data according to a preset impervious proportion threshold to obtain a plurality of impervious patches, and recording the area A and the spatial position of each impervious patch;
S22, extract