CN-116310628-B - Token mask mechanism-based large-scale village-in-city extraction method
Abstract
The invention discloses a large-scale urban village extraction method and system based on a token mask mechanism, and relates to the technical field of remote sensing geographic information systems. The method comprises: obtaining a high-resolution remote sensing image dataset; labeling the urban village areas to obtain an urban village annotation dataset; performing a data augmentation operation on the urban village annotation dataset to obtain an augmented urban village annotation dataset; iteratively training the constructed urban village extraction model with the augmented dataset, setting training parameters, and updating the model parameters to obtain a trained urban village extraction model; and acquiring a high-resolution remote sensing image of a region to be detected, inputting it into the trained urban village extraction model, and extracting the urban village extent in that region. The method identifies urban village areas in high-resolution remote sensing images quickly and accurately, offers high extraction accuracy and efficiency and a wide range of application, and enables high-precision, robust large-scale urban village mapping.
Inventors
- SHI QIAN
- LIU MENGXI
- CHEN ANQI
Assignees
- Sun Yat-sen University (中山大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20230206
Claims (9)
- 1. A large-scale urban village extraction method based on a token mask mechanism, comprising: S1, acquiring a high-resolution remote sensing image dataset; S2, labeling the urban village areas in the remote sensing images of the high-resolution remote sensing image dataset to obtain an urban village annotation dataset; S3, performing a data augmentation operation on the urban village annotation dataset to obtain an augmented urban village annotation dataset; S4, iteratively training the constructed urban village extraction model with the augmented urban village annotation dataset, setting training parameters, and updating the model parameters of the urban village extraction model to obtain a trained urban village extraction model; wherein the constructed urban village extraction model comprises a feature extractor, a decoder based on a token mask mechanism, and a classifier, connected in sequence; the feature extractor extracts multi-level deep features from the augmented urban village annotation dataset and inputs them into the decoder based on the token mask mechanism; the decoder based on the token mask mechanism extracts the corresponding high-level semantic information tokens from the multi-level deep features and randomly masks them according to a preset mask rate to obtain masked high-level semantic information tokens; the classifier obtains an urban village identification result from the final classification features; S5, acquiring a high-resolution remote sensing image of the region to be detected, inputting it into the trained urban village extraction model, and extracting the urban village extent in the region to be detected.
- 2. The large-scale urban village extraction method based on a token mask mechanism according to claim 1, wherein in step S1 the high-resolution remote sensing image dataset is obtained as follows: a plurality of high-resolution remote sensing images of the same image size are collected uniformly from Google Earth, and the cloud-free, clear images are selected to form the high-resolution remote sensing image dataset.
- 3. The large-scale urban village extraction method based on a token mask mechanism according to claim 1, wherein in step S2 the urban village annotation dataset is obtained as follows: for each remote sensing image in the high-resolution remote sensing image dataset, the vector boundaries of urban villages are interpreted and delineated with the aid of POI data, and a pixel-level urban village annotation image is obtained by rasterization, in which a value of 0 denotes a non-urban-village pixel and a value of 255 denotes an urban village pixel; each remote sensing image and its corresponding urban village annotation image are then cropped by non-overlapping sampling into sample pairs of a preset size to form the urban village annotation dataset.
- 4. The large-scale urban village extraction method based on a token mask mechanism according to claim 1, wherein in step S3 the data augmentation operation comprises random rotation and flipping, and expands the urban village annotation dataset to N times its original size to obtain the augmented urban village annotation dataset.
- 5. The large-scale urban village extraction method based on a token mask mechanism according to claim 1, wherein the feature extractor comprises a first convolution layer, a first residual module, a second residual module, a third residual module and a fourth residual module connected in sequence; the outputs of the first, second and fourth residual modules are also connected to the decoder based on the token mask mechanism; the residual modules have the same structure, each comprising a first basic block and a second basic block connected in sequence; the first basic blocks have the same structure, each comprising a second convolution layer, a third convolution layer, a batch normalization layer and an activation function layer connected in sequence, and the input of the second convolution layer is connected to the input of the activation function layer.
- 6. The large-scale urban village extraction method based on a token mask mechanism according to claim 5, wherein the decoder based on the token mask mechanism comprises a first mask Transformer module, a second mask Transformer module and a third mask Transformer module connected in sequence; the input of the first mask Transformer module is connected to the output of the fourth residual module, the input of the second mask Transformer module is also connected to the output of the second residual module, the input of the third mask Transformer module is also connected to the output of the first residual module, and the output of the third mask Transformer module is connected to the classifier; the mask Transformer modules have the same structure, each comprising a Transformer encoder, a token masker and a Transformer decoder connected in sequence, and the mask rate is preset in the token masker.
- 7. The large-scale urban village extraction method based on a token mask mechanism according to claim 1, wherein setting the training parameters comprises setting the number of training iterations, the batch size, the optimizer, the initial learning rate, the learning rate decay mechanism and the loss function.
- 8. The large-scale urban village extraction method based on a token mask mechanism according to claim 7, wherein the optimizer is the Adam optimizer and the loss function is the binary cross-entropy loss function.
- 9. A large-scale urban village extraction system based on a token mask mechanism, for implementing the method of any one of claims 1-8, comprising: a data acquisition module for acquiring a high-resolution remote sensing image dataset; a data labeling module for labeling the urban village areas in the remote sensing images of the high-resolution remote sensing image dataset to obtain an urban village annotation dataset; a data augmentation module for performing a data augmentation operation on the urban village annotation dataset to obtain an augmented urban village annotation dataset; a model training module for iteratively training the constructed urban village extraction model with the augmented urban village annotation dataset, setting training parameters, and updating the model parameters of the urban village extraction model to obtain a trained urban village extraction model; and an urban village extraction module for acquiring a high-resolution remote sensing image of the region to be detected, inputting it into the trained urban village extraction model, and extracting the urban village extent in the region to be detected.
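The random token masking recited in claims 1 and 6 can be illustrated with a minimal sketch. It is not the patented implementation: the function name `mask_tokens`, the choice to replace a masked token with an all-zero vector, the rounding of the masked count, and the example mask rate of 0.25 are all assumptions made for illustration, since the claims only state that tokens are randomly masked at a preset mask rate.

```python
import random


def mask_tokens(tokens, mask_rate, rng=None):
    """Randomly mask a fraction of semantic tokens (hypothetical helper).

    tokens: list of equal-length lists of floats, one list per token.
    mask_rate: preset fraction in [0, 1] of tokens to mask.
    Returns (masked_tokens, masked_indices); the input is left unmodified.
    """
    if not 0.0 <= mask_rate <= 1.0:
        raise ValueError("mask_rate must be in [0, 1]")
    if rng is None:
        rng = random.Random()
    # Assumption: the number of masked tokens is the rounded product.
    num_masked = int(round(mask_rate * len(tokens)))
    masked_indices = sorted(rng.sample(range(len(tokens)), num_masked))
    masked_set = set(masked_indices)
    # Assumption: a masked token is replaced by an all-zero vector.
    masked_tokens = [
        [0.0] * len(tok) if i in masked_set else list(tok)
        for i, tok in enumerate(tokens)
    ]
    return masked_tokens, masked_indices


# Usage: mask 8 two-dimensional tokens at a rate of 0.25.
example_tokens = [[float(i), float(i) + 0.5] for i in range(8)]
masked, indices = mask_tokens(example_tokens, 0.25, random.Random(0))
print(len(indices))  # prints 2
```

In a model following claim 6, a step of this kind would sit between the Transformer encoder and the Transformer decoder of each mask Transformer module. Hiding part of the tokens during training is a common way to reduce overfitting when labeled samples are scarce.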
Description
Token mask mechanism-based large-scale village-in-city extraction method
Technical Field
The invention relates to the technical field of remote sensing geographic information systems, and in particular to a large-scale urban village extraction method and system based on a token mask mechanism.
Background
China's rapid urbanization has produced a distinctive regional spatial phenomenon: the urban village. On the one hand, urban villages provide cheap living space for the large migrant labor force of cities and relieve urban housing pressure. On the other hand, they suffer from a low floor area ratio, poor sanitary conditions, inadequate infrastructure and similar problems, which run counter to the concept of sustainable urban development and hinder urban development. The renewal and redevelopment of urban villages has therefore become a breakthrough point for easing the conflict between urban development and land use. However, early methods for monitoring urban village land still relied mainly on field surveys, which consume a great deal of manpower and material resources. How to acquire the coverage of urban villages automatically and in real time has thus become a primary problem that urban management must solve.
Early research on urban villages relied mainly on social surveys and land-use status maps as data sources. This approach suffers from high labor cost, long time consumption and low data acquisition efficiency, and makes dynamic monitoring of urban villages difficult. In recent years, high-resolution remote sensing images have been widely applied to urban land-use information extraction, including urban impervious surface detection, building identification and urban green space monitoring, because the data are easy to acquire, rich in information and wide in coverage.
The convolutional neural network (CNN), which has risen in recent years, can automatically learn multi-level feature representations from large amounts of data and thereby make full use of the effective information hidden in the data. It has achieved breakthroughs in image classification, and many scholars have applied it to the extraction and identification of urban villages. For example, Li et al. combined an unsupervised deep convolutional neural network with an unsupervised deep fully connected neural network to extract urban villages at the scene-block scale under unsupervised learning, and Nicholus et al. used a convolutional neural network to extract discriminative spatial features for the automatic identification of urban villages. These studies show that deep learning methods based on convolutional neural networks can effectively extract multi-level semantic features from high-resolution remote sensing images, extract urban villages quickly and accurately, and provide urban structure data to support urban planning, management and decision-making. However, compared with common urban ground objects and areas, urban villages have complex structures and vary widely in shape, scale, spectrum, texture and other characteristics, so large-scale urban village extraction remains difficult. Specifically, urban village identification from high-resolution remote sensing images suffers from large intra-class differences in spatial scale and from inter-class confusion caused by similar appearance, both of which hinder accurate identification. In addition, conventional deep learning models are prone to overfitting because urban village sample data are insufficient.
Therefore, how to identify urban villages rapidly and accurately at large scale from high-resolution remote sensing images has long been a research problem in the remote sensing field. The prior art discloses a deep learning-based urban village identification and population estimation method and a system-level computer storage medium. In its urban village identification stage, an urban road network map is extracted; road network contours are extracted from the road network map using the OpenCV Python package; image blocks are cropped from remote sensing satellite images; the cropped image blocks are labeled for urban villages; samples are selected to form a training sample set; and a Mask-RCNN model is used for training and prediction to obtain the urban village distribution map on urban remote sensing satellite images. By adopting the Mask-RCNN model, this prior art can identify the urban village distribution to a certain extent, but the multi-layer convolutions in the Mask-RCNN model make memory consumption extremely high, and the training and prediction efficiency