CN-121996999-A - POI-based dual-branch autoencoder and two-stage clustering method for identifying urban functional areas
Abstract
The application relates to the technical field of urban development and services, and in particular to a method for identifying urban functional areas based on a POI dual-branch autoencoder and two-stage clustering. The method comprises the following steps:

- acquiring and preprocessing multi-source point of interest (POI) data and road network data to construct a POI data set organized by spatial units;
- extracting POI semantic and spatial features with a dual-branch autoencoder network and fusing them into a deep fusion feature vector for each unit;
- pre-clustering the feature vectors with the K-Means algorithm, then performing fine clustering with the DBSCAN algorithm on that result to divide the city into functional areas;
- calculating a functional mixing degree index from the division result and the POI data set, quantifying the functional mixing degree of each unit, and generating the urban functional area identification result.

The application effectively improves the accuracy of functional area division and the clarity of functional mixing degree quantification.
Inventors
- YANG HAIBO
- GAO MEIYAN
- DONG YITONG
- WANG PEIXI
- QIU SHIKE
- DU JUN
Assignees
- Zhengzhou University (郑州大学)
- Institute of Geography, Henan Academy of Sciences (河南省科学院地理研究所)
Dates
- Publication Date
- 20260508
- Application Date
- 20260119
Claims (10)
- 1. A POI-based dual-branch autoencoder and two-stage clustering method for identifying urban functional areas, characterized in that the method comprises the following steps: S1, acquiring and preprocessing multi-source point of interest (POI) data and road network data, and constructing a POI data set with spatial units; S2, processing the POI data set with spatial units using a dual-branch autoencoder network, extracting POI semantic features and POI spatial features, and fusing them into a deep fusion feature vector for each spatial unit; S3, pre-clustering the deep fusion feature vectors with the K-Means algorithm to obtain a pre-clustering result, and performing fine clustering with the DBSCAN algorithm based on the pre-clustering result to obtain an urban functional area division result; and S4, calculating a functional mixing degree index for each spatial unit based on the urban functional area division result and the POI data set with spatial units, and quantifying the functional mixing degree of each spatial unit based on that index to generate an urban functional area identification result.
- 2. The method according to claim 1, wherein S1 comprises: S11, acquiring multi-source POI data and road network data from different open platforms; S12, cleaning the multi-source POI data and correcting its coordinates to obtain standardized POI data; S13, classifying the text information of the standardized POI data using natural language processing to obtain a classified POI text set; S14, performing road buffer analysis on the road network data to generate a traffic analysis zone grid; and S15, associating the classified POI text set with the traffic analysis zone grid according to spatial coordinates to form the POI data set with spatial units.
- 3. The method according to claim 2, wherein the multi-source POI data includes at least data from the Gaode Map (AMap) platform and the OpenStreetMap (OSM) platform, and the road network data originates from the OpenStreetMap (OSM) platform.
- 4. The method according to claim 1, wherein S2 comprises: S21, constructing a dual-branch autoencoder network comprising a text encoder branch and a spatial encoder branch; S22, taking the POI data in each spatial unit of the POI data set with spatial units as input, extracting POI semantic features through the text encoder branch and POI spatial features through the spatial encoder branch; and S23, concatenating the POI semantic features and POI spatial features of the same spatial unit to generate the deep fusion feature vector of that spatial unit.
- 5. The method according to claim 4, wherein the text encoder branch takes as input the combined category and name text of the POI data in a spatial unit, processes it through an embedding layer and a long short-term memory (LSTM) network, and outputs a semantic feature vector of a first dimension; the spatial encoder branch takes the longitude and latitude coordinates of the POI data in the spatial unit as input, processes them through a fully connected network, and outputs a spatial feature vector of a second dimension; and the sum of the first dimension and the second dimension equals the total dimension of the deep fusion feature vector.
- 6. The method according to claim 1, wherein, before S3, a validity assessment is performed on the deep fusion feature vectors, the validity assessment comprising: performing principal component analysis on the deep fusion feature vectors to evaluate the explained variance ratio; adding Gaussian noise to the deep fusion feature vectors and calculating the coefficient of variation of feature distances before and after adding the noise to evaluate noise robustness; and judging whether the evaluation result meets a preset clustering condition, retaining the vectors if it does, and otherwise adjusting the parameters of the dual-branch autoencoder network according to the evaluation result and regenerating the deep fusion feature vectors until the evaluation result meets the preset clustering condition.
- 7. The method according to claim 1, wherein S3 comprises: S31, pre-clustering the deep fusion feature vectors with the K-Means algorithm, taking minimization of the sum of squared errors as the optimization objective, to determine the pre-clustering result; and S32, taking the pre-clustering result as input samples and performing fine clustering with the DBSCAN algorithm to obtain the urban functional area division result, wherein the DBSCAN algorithm uses the Haversine distance as its spatial distance metric.
- 8. The method according to claim 7, wherein in S32 a Ball-Tree index structure is used for neighbor search, the structural soundness of the clustering result is quantitatively evaluated with the silhouette coefficient after clustering, and the evaluation result is used to automatically determine the optimal clustering parameters.
- 9. The method according to claim 1, wherein calculating the functional mixing degree index in S4 comprises: dynamically constructing, based on the proportion of each POI type in each spatial unit, an effective subset that excludes POI types with a zero count; and calculating the functional mixing degree of the corresponding spatial unit with an information entropy model based on the proportions and the effective subset.
- 10. The method according to claim 9, wherein the functional mixing degree represents the diversity of the mix of urban functions within the corresponding spatial unit, a higher value indicating greater mixing diversity.
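Step S15 of claim 2 associates the classified POIs with the traffic analysis zone grid according to their spatial coordinates. The patent does not say how this association is computed.

The following is a minimal sketch under these assumptions:

- each zone is a simple polygon of (longitude, latitude) vertices;
- the association uses a ray-casting point-in-polygon test.

The names `point_in_polygon` and `assign_pois_to_units` and the tuple layouts are hypothetical.

```python
from collections import defaultdict

# Hypothetical layouts: a POI is (poi_id, lon, lat, category); a spatial unit is a
# polygon given as a list of (lon, lat) vertices, e.g. a zone produced by step S14.

def point_in_polygon(lon, lat, polygon):
    """Ray casting: toggle 'inside' each time a ray cast eastward crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # crossing lies east of the point
                inside = not inside
    return inside

def assign_pois_to_units(pois, units):
    """Return {unit_id: [poi, ...]}; POIs that fall in no unit are dropped."""
    assigned = defaultdict(list)
    for poi in pois:
        _, lon, lat, _ = poi
        for unit_id, polygon in units.items():
            if point_in_polygon(lon, lat, polygon):
                assigned[unit_id].append(poi)
                break  # units are assumed not to overlap
    return dict(assigned)

units = {"u1": [(0, 0), (1, 0), (1, 1), (0, 1)], "u2": [(1, 0), (2, 0), (2, 1), (1, 1)]}
pois = [("p1", 0.5, 0.5, "food"), ("p2", 1.5, 0.2, "school"), ("p3", 3.0, 3.0, "park")]
result = assign_pois_to_units(pois, units)
```

A few caveats on this sketch:

- It treats longitude and latitude as planar coordinates, which is usually adequate for zones of city-block size.
- A POI lying exactly on a shared boundary may land in either zone.
- A production pipeline would more likely use a spatial index than this linear scan over all zones.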
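Claim 5 describes the two encoder branches and step S23 describes their concatenation. The sketch below is a forward pass only, in NumPy, and rests on several assumptions:

- weights are randomly initialised;
- the sizes are hypothetical (`D_SEM = 16` for the first dimension, `D_SPA = 8` for the second);
- the category and name text has already been converted to integer token ids;
- coordinates have already been normalised to a small range;
- the spatial branch mean-pools over the POIs of a unit.

The decoders and the reconstruction-loss training of the autoencoder are omitted. The output therefore illustrates shapes and data flow rather than meaningful features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: the patent only names a "first" and a "second" dimension.
VOCAB, EMB, D_SEM, D_SPA, HIDDEN = 50, 8, 16, 8, 32

def init(shape):
    return rng.normal(0.0, 0.1, size=shape)

# Text branch: embedding table and one LSTM layer with gates stacked as [i, f, g, o].
E = init((VOCAB, EMB))
W_x, W_h, b = init((EMB, 4 * D_SEM)), init((D_SEM, 4 * D_SEM)), np.zeros(4 * D_SEM)

# Spatial branch: two-layer fully connected network on (lon, lat) pairs.
W1, b1 = init((2, HIDDEN)), np.zeros(HIDDEN)
W2, b2 = init((HIDDEN, D_SPA)), np.zeros(D_SPA)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def text_branch(token_ids):
    """Embed the unit's category/name tokens, run the LSTM, return the last hidden state."""
    h, c = np.zeros(D_SEM), np.zeros(D_SEM)
    for t in token_ids:
        i, f, g, o = np.split(E[t] @ W_x + h @ W_h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state update
        h = sigmoid(o) * np.tanh(c)                   # hidden state, entries in (-1, 1)
    return h

def spatial_branch(coords):
    """Encode each POI's normalised (lon, lat) with the MLP, then mean-pool over the unit."""
    hidden = np.maximum(0.0, coords @ W1 + b1)  # ReLU
    return (hidden @ W2 + b2).mean(axis=0)

def fuse(token_ids, coords):
    """S23: concatenate semantic and spatial vectors into one deep fusion feature vector."""
    return np.concatenate([text_branch(token_ids), spatial_branch(coords)])

vec = fuse([3, 17, 42, 5], np.array([[0.21, 0.48], [0.25, 0.44], [0.30, 0.51]]))
print(vec.shape)  # (24,)
```

The fused length is `D_SEM + D_SPA`, which mirrors the requirement in claim 5 that the two branch dimensions sum to the total dimension.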
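Claim 6 checks the fused vectors before clustering with a principal component analysis and a Gaussian-noise test. The patent gives neither formulas nor thresholds, so this NumPy sketch fills the gaps with assumptions:

- the explained variance ratio is taken over the first `k` components;
- noise robustness is measured as the relative shift in the coefficient of variation (standard deviation divided by mean) of pairwise Euclidean distances once noise is added;
- the thresholds `min_ratio`, `sigma` and `max_cv_shift` are hypothetical.

```python
import numpy as np

def explained_variance_ratio(X, k):
    """Share of total variance captured by the first k principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2
    return var[:k].sum() / var.sum()

def distance_cv(X):
    """Coefficient of variation of all pairwise Euclidean distances."""
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    upper = d[np.triu_indices(len(X), k=1)]  # each pair once, no self-distances
    return upper.std() / upper.mean()

def meets_clustering_condition(X, k=2, min_ratio=0.8, sigma=0.01,
                               max_cv_shift=0.05, seed=0):
    """Hypothetical validity check; returns (passed, variance_ratio, cv_shift)."""
    rng = np.random.default_rng(seed)
    ratio = explained_variance_ratio(X, k)
    cv_clean = distance_cv(X)
    cv_noisy = distance_cv(X + rng.normal(0.0, sigma, size=X.shape))
    shift = abs(cv_noisy - cv_clean) / cv_clean
    return (ratio >= min_ratio and shift <= max_cv_shift), ratio, shift
```

If the check fails, claim 6 calls for adjusting the autoencoder parameters and regenerating the vectors. In this sketch that corresponds to retraining and calling `meets_clustering_condition` again.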
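Claims 7 and 8 combine the following:

- K-Means pre-clustering;
- DBSCAN fine clustering under the Haversine distance;
- Ball-Tree neighbour search;
- silhouette-based parameter selection.

The claims do not specify how DBSCAN consumes the pre-clustering result. The scikit-learn sketch below therefore assumes:

- each spatial unit is represented by its centroid (latitude, longitude);
- DBSCAN runs separately inside every K-Means cluster, splitting it into spatially contiguous areas;
- `eps` is picked from a hypothetical candidate list by the highest silhouette coefficient;
- a pre-cluster is kept whole when no candidate yields at least two clusters.

`KMeans`, `DBSCAN` (with `metric="haversine"` and `algorithm="ball_tree"`) and `silhouette_score` are standard scikit-learn APIs. The helper names and the 6371 km mean Earth radius are illustrative choices.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import silhouette_score

EARTH_RADIUS_KM = 6371.0  # scikit-learn's haversine distances are in radians

def fine_cluster(latlon_rad, eps_km_candidates, min_samples):
    """Stage 2 inside one pre-cluster: DBSCAN (Haversine, Ball-Tree), eps chosen by silhouette."""
    best_labels = np.zeros(len(latlon_rad), dtype=int)  # fallback: keep the pre-cluster whole
    best_score = -1.0  # silhouette coefficient lies in [-1, 1]
    for eps_km in eps_km_candidates:
        labels = DBSCAN(eps=eps_km / EARTH_RADIUS_KM, min_samples=min_samples,
                        metric="haversine", algorithm="ball_tree").fit_predict(latlon_rad)
        kept = labels != -1  # ignore noise points when scoring
        n_clusters = len(set(labels[kept].tolist()))
        if n_clusters < 2 or n_clusters >= kept.sum():
            continue  # silhouette coefficient is undefined here
        score = silhouette_score(latlon_rad[kept], labels[kept], metric="haversine")
        if score > best_score:
            best_labels, best_score = labels, score
    return best_labels

def two_stage_clustering(features, latlon_deg, k, eps_km_candidates, min_samples=3, seed=0):
    """Stage 1: K-Means on fused features (minimises the sum of squared errors).
    Stage 2: spatial DBSCAN within each pre-cluster. Returns (pre_labels, area_labels)."""
    pre = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)
    latlon_rad = np.radians(latlon_deg)  # haversine expects [lat, lon] in radians
    areas = np.full(len(features), -1)  # -1 marks DBSCAN noise
    next_id = 0
    for c in range(k):
        idx = np.flatnonzero(pre == c)
        sub = fine_cluster(latlon_rad[idx], eps_km_candidates, min_samples)
        for s in sorted(set(sub.tolist()) - {-1}):
            areas[idx[sub == s]] = next_id
            next_id += 1
    return pre, areas
```

Each final area is nested inside one K-Means cluster. An area therefore inherits a single functional type from stage 1, while DBSCAN separates spatially disjoint patches of that type.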
Description
POI-based dual-branch autoencoder and two-stage clustering method for identifying urban functional areas
Technical Field
The application relates to the field of urban development and services, and in particular to a method for identifying urban functional areas based on a POI dual-branch autoencoder and two-stage clustering.
Background
With accelerating urbanization, the functional structure of cities is becoming increasingly complex. Accurately identifying and evaluating urban functional areas has become a key problem in supporting fine-grained urban planning and management. Accurate division of urban functional areas is an important foundation for urban management, resource allocation, infrastructure construction and related fields. Methods that identify urban functional areas from point of interest (POI) data are currently widely used in urban planning, traffic management, environmental protection and other areas. However, existing methods that process POI data generally face two limitations: incomplete feature extraction and poor adaptability of the clustering method. In existing POI-based research on urban functional area identification, most methods process the semantic features or the spatial features of POIs in isolation and lack a deep fusion mechanism, so their semantic representation of functional areas is insufficient. Semantic features typically rely on information such as POI category and description, which often fails to capture the complex relationships between different functional areas. Spatial features rely mainly on the geographic locations of POIs. In a high-dimensional feature space, however, the relationships among features are complex and difficult for traditional clustering methods to identify directly.
In addition, existing methods generally rely on a single clustering algorithm, which struggles to balance computational efficiency on high-dimensional data with recognition accuracy for complex spatial patterns. In urban functional area division in particular, functional area boundaries are often blurred and the division is coarse. As a result, clustering is inaccurate and some potential urban functional areas may be missed. How to extract features accurately from high-dimensional data and improve the adaptability of the clustering method has therefore become a key problem in current urban functional area identification research. A further key problem is that the conventional entropy model leaves the logarithm undefined when a POI type has a zero count, while conventional smoothing methods introduce systematic bias, so the evaluated mixing degree deviates from the real situation. This quantification bias directly affects the accuracy of urban spatial structure analysis and makes it difficult to provide a reliable quantitative basis for urban planning and management. In particular, current quantification methods tend to ignore null or missing values in POI data, which severely limits the accuracy of functional differentiation and assessment. How to improve the accuracy and stability of urban functional area identification through effective feature extraction, an optimized clustering method and mixing degree evaluation is therefore a problem to be solved. The invention provides an urban functional area identification method based on a POI dual-branch autoencoder and two-stage clustering. It aims to solve three problems of prior-art functional area identification: insufficient POI feature fusion, insufficient adaptability of the clustering method, and quantification bias.
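The zero-value problem described above can be illustrated with a short sketch of the approach in claim 9. Only POI types with a non-zero count enter the entropy, so no logarithm of zero is ever taken. For comparison, the sketch also implements additive (Laplace-style) smoothing, assumed here as one common conventional fix; it assigns probability to absent types and so inflates the index.

The function names, the smoothing constant `alpha`, the natural logarithm and the zero score for an empty unit are all assumptions, since the patent does not specify them.

```python
import math

def functional_mix_index(counts):
    """Shannon entropy over the effective subset of POI types (zero counts excluded)."""
    total = sum(counts)
    if total == 0:
        return 0.0  # no POIs in the unit: treated as no mixing (assumption)
    shares = [c / total for c in counts if c > 0]  # effective subset
    return sum(-p * math.log(p) for p in shares)

def smoothed_mix_index(counts, alpha=1.0):
    """For comparison: add alpha to every type, including absent ones, before taking entropy."""
    smoothed = [c + alpha for c in counts]
    total = sum(smoothed)
    return sum(-(c / total) * math.log(c / total) for c in smoothed)

unit = [10, 10, 0, 0]  # two POI types present, two absent
print(functional_mix_index(unit))  # ln 2 ≈ 0.6931
print(smoothed_mix_index(unit))    # about 0.98, inflated by the two absent types
```

Under the zero-exclusion rule, a unit containing a single POI type scores 0 and an even mix of n types scores ln n. Higher values therefore indicate greater diversity, consistent with claim 10.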
By combining deep learning with unsupervised learning, the method fully integrates the semantic and spatial characteristics of POIs. It also improves the geographic suitability and recognition accuracy of clustering and eliminates zero-value interference in the quantification process. Together these provide more accurate and reliable support for fine-grained urban management and planning.
Disclosure of Invention
The application provides a POI-based dual-branch autoencoder and two-stage clustering method for identifying urban functional areas, which improves the accuracy of functional area division and the clarity of functional mixing degree quantification. In a first aspect, the application provides a POI-based dual-branch autoencoder and two-stage clustering method for identifying urban functional areas, comprising the following steps: S1, acquiring and preprocessing multi-source point of interest (POI) data and road network data, and constructing a POI data set with spatial units; S2, processing the POI data set with spatial units using a dual-branch autoencoder network, extracting POI semantic