
CN-121980249-A - Urban space general representation learning method, device, terminal and storage medium based on multi-mode space-time data fusion

CN 121980249 A

Abstract

The application relates to the technical field of urban data representation. It discloses a method, device, terminal, and storage medium for learning a general urban space representation based on multi-modal spatio-temporal data fusion, which improve the applicability of the general urban representation so that it can serve diverse urban analysis tasks. The method comprises: acquiring multi-modal spatio-temporal data of each spatial unit in a target city, and setting a corresponding view for each modality of spatio-temporal data; generating a single-view intra-representation of each spatial unit under the view corresponding to each modality, based on that modality's data for the unit; generating a multi-view fused representation for each spatial unit based on its single-view intra-representations under the views corresponding to all modalities, and assembling the multi-view fused representations of all spatial units into a multi-view fusion representation matrix; and performing global aggregation on the multi-view fusion representation matrix to obtain the general urban representation.

Inventors

  • Tu Wei
  • Yu Junxian
  • Cai Zhaoyue
  • Cao Jinzhou
  • Li Qingquan

Assignees

  • Shenzhen University (深圳大学)

Dates

Publication Date
2026-05-05
Application Date
2026-04-08

Claims (10)

  1. A method for learning a general urban space representation based on multi-modal spatio-temporal data fusion, characterized by comprising the following steps: acquiring multi-modal spatio-temporal data of each spatial unit in a target city, and setting a corresponding view for each modality of spatio-temporal data; generating a single-view intra-representation of each spatial unit under the view corresponding to each modality, based on that modality's spatio-temporal data for the unit; generating a multi-view fused representation for each spatial unit based on its single-view intra-representations under the views corresponding to all modalities, and assembling the multi-view fused representations of all spatial units into a multi-view fusion representation matrix; and performing global aggregation on the multi-view fusion representation matrix to obtain the general urban representation of the target city.
  2. The method for learning a general urban space representation based on multi-modal spatio-temporal data fusion according to claim 1, wherein the multi-modal spatio-temporal data of each spatial unit comprise a target visual feature, a target semantic representation, and a target trajectory representation corresponding to that spatial unit in the target city.
  3. The method for learning a general urban space representation based on multi-modal spatio-temporal data fusion according to claim 1, wherein the step of generating a single-view intra-representation of each spatial unit under the view corresponding to each modality comprises: constructing a relation set over all spatial units in the target city; determining, within the relation set, the set of associated units of each spatial unit under the view corresponding to each modality; and fusing each spatial unit with its set of associated units under each view using a feature aggregation function, to obtain the single-view intra-representation of the unit under the view corresponding to each modality.
  4. The method for learning a general urban space representation based on multi-modal spatio-temporal data fusion according to claim 1, wherein the step of generating a multi-view fused representation for each spatial unit based on its single-view intra-representations under the views corresponding to all modalities comprises: applying an attention mechanism to the single-view intra-representations of each spatial unit to compute a target fusion weight for the unit under the view corresponding to each modality; and applying a first preset algorithm to perform weighted fusion of the target fusion weights and the single-view intra-representations of each spatial unit under all views, to obtain the multi-view fused representation of the unit.
  5. The method for learning a general urban space representation based on multi-modal spatio-temporal data fusion according to claim 1, characterized in that after the step of performing global aggregation on the multi-view fusion representation matrix to obtain the general urban representation of the target city, the method further comprises: converting the general urban representation into a task-specific representation of a first target task; and computing, with a second preset algorithm, over the task-specific representation and at least one second target task, to obtain a representation matrix of the first target task fused with all second target tasks; wherein the first target task and each second target task are each selected from: estimation of the economic level of the target city, prediction of population distribution, prediction of travel flow, estimation of urban health, or prediction of environmental quality, and the first target task differs from every second target task.
  6. The method for learning a general urban space representation based on multi-modal spatio-temporal data fusion according to claim 5, wherein the step of converting the general urban representation into a task-specific representation of a first target task comprises: acquiring a task prompt embedding of the first target task; and using the task prompt embedding to guide the task features of the first target task in focusing the general urban representation, to obtain the task-specific representation of the first target task.
  7. The method for learning a general urban space representation based on multi-modal spatio-temporal data fusion according to claim 6, wherein the step of acquiring the task prompt embedding of the first target task comprises: acquiring task requirement information of the first target task; encoding the task requirement information into a machine-parsable task description; and encoding the task description into the task prompt embedding of the first target task using a learnable task prompt encoder.
  8. A device for learning a general urban space representation based on multi-modal spatio-temporal data fusion, characterized by comprising: a multi-modal spatio-temporal data acquisition module, configured to acquire multi-modal spatio-temporal data of each spatial unit in a target city and to set a corresponding view for each modality of spatio-temporal data; a single-view intra-representation generation module, configured to generate a single-view intra-representation of each spatial unit under the view corresponding to each modality, based on that modality's data for the unit; a multi-view fusion representation matrix construction module, configured to generate a multi-view fused representation for each spatial unit from its single-view intra-representations under all views and to assemble the fused representations of all spatial units into the multi-view fusion representation matrix; and a general urban representation module, configured to perform global aggregation on the multi-view fusion representation matrix to obtain the general urban representation of the target city.
  9. A terminal device, comprising a processor and a memory for storing a computer program, the processor being adapted to invoke and run the computer program stored in the memory so as to perform the steps of the method for learning a general urban space representation based on multi-modal spatio-temporal data fusion according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program that causes a computer to execute the steps of the method for learning a general urban space representation based on multi-modal spatio-temporal data fusion according to any one of claims 1 to 7.
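The prompt-guided task adaptation of claims 6 and 7 might be sketched as follows. This is a minimal illustration only: the single-matrix prompt encoder `W_prompt`, the sigmoid focusing gate, and the vector task description are assumptions, not the claimed implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8  # representation dimension (illustrative)

# Hypothetical learnable prompt encoder: a single weight matrix mapping an
# encoded task description to a task prompt embedding.
W_prompt = rng.standard_normal((D, D))

def task_specific_repr(city_repr, task_desc):
    """Focus the general city representation on one target task."""
    prompt = np.tanh(W_prompt @ task_desc)  # task prompt embedding
    gate = 1.0 / (1.0 + np.exp(-prompt))    # sigmoid focusing gate in (0, 1)
    return gate * city_repr                 # task-specific representation

city = rng.standard_normal(D)   # general city representation
desc = rng.standard_normal(D)   # encoded task requirement information
task_repr = task_specific_repr(city, desc)
```

Because the gate lies in (0, 1), the task-specific representation is an element-wise attenuated view of the general representation, which is one simple way to realize "focusing" without discarding the shared embedding.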

Description

Urban space general representation learning method, device, terminal and storage medium based on multi-modal spatio-temporal data fusion

Technical Field

The application relates to the technical field of urban data representation, and more particularly to a method, device, terminal, and storage medium for learning a general urban space representation based on multi-modal spatio-temporal data fusion.

Background

Existing urban prediction tasks, such as urban population distribution prediction, urban travel flow prediction, and urban environment quality prediction, all require multi-modal spatio-temporal data of a city, which generally comprise remote sensing images, street view images, points of interest (i.e., positions of a city's geographic entities on a map), and vehicle movement trajectories. The multi-modal spatio-temporal data used by different prediction tasks often differ in distribution, and at present no method converts such data into a general urban representation applicable across multiple prediction tasks. Thus, there is a need for improvement and advancement in the art.

Disclosure of Invention

The embodiments of the application aim to provide a method, device, terminal, and storage medium for learning a general urban space representation based on multi-modal spatio-temporal data fusion, which improve the applicability of the general urban representation so that it suits diverse urban analysis tasks.
The embodiments of the application are mainly realized by the following technical scheme. In a first aspect, there is provided a method for learning a general urban space representation based on multi-modal spatio-temporal data fusion, comprising: acquiring multi-modal spatio-temporal data of each spatial unit in a target city, and setting a corresponding view for each modality of spatio-temporal data; generating a single-view intra-representation of each spatial unit under the view corresponding to each modality, based on that modality's data for the unit; generating a multi-view fused representation for each spatial unit based on its single-view intra-representations under all views, and assembling the fused representations of all spatial units into a multi-view fusion representation matrix; and performing global aggregation on the multi-view fusion representation matrix to obtain the general urban representation of the target city. According to one embodiment of the application, the multi-modal spatio-temporal data of each spatial unit include a target visual feature, a target semantic representation, and a target trajectory representation for that spatial unit in the target city.
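The pipeline above can be sketched end to end as follows. Everything concrete here is an illustrative assumption, not the patent's specification: random vectors stand in for the per-modality single-view intra-representations, mean-based scores stand in for the attention mechanism, and a mean over units stands in for the global aggregation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 8  # number of spatial units, embedding dimension (illustrative)

# One view per modality: visual, semantic, trajectory. Random vectors stand
# in for the single-view intra-representations produced per modality.
views = {m: rng.standard_normal((N, D))
         for m in ("visual", "semantic", "trajectory")}

def fuse_views(views):
    """Attention-weighted fusion of single-view representations per unit."""
    stacked = np.stack(list(views.values()))           # (V, N, D)
    scores = stacked.mean(axis=-1)                     # (V, N) toy attention scores
    weights = np.exp(scores) / np.exp(scores).sum(0)   # softmax over the V views
    return (weights[..., None] * stacked).sum(axis=0)  # (N, D) fused per unit

fused = fuse_views(views)        # multi-view fusion representation matrix
city_repr = fused.mean(axis=0)   # global aggregation -> city-level representation
```

The softmax ensures the per-view fusion weights of every spatial unit sum to one, so each fused row is a convex combination of that unit's single-view representations.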
According to one embodiment of the application, the step of generating a single-view intra-representation of each spatial unit under the view corresponding to each modality comprises: constructing a relation set over all spatial units in the target city; determining, within the relation set, the set of associated units of each spatial unit under the view corresponding to each modality; and fusing each spatial unit with its set of associated units under each view using a feature aggregation function, to obtain the single-view intra-representation of the unit under that view. According to one embodiment of the present application, the step of generating a multi-view fused representation for each spatial unit based on its single-view intra-representations under all views comprises: applying an attention mechanism to the single-view intra-representations of each spatial unit to compute a target fusion weight for the unit under the view corresponding to each modality; and applying a first preset algorithm to perform weighted fusion of the target fusion weights and the single-view intra-representations of each spatial unit under all views, to obtain the multi-view fused representation of the unit.
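The relation-set aggregation described above can be sketched as follows. The mean aggregator, the `assoc` adjacency dictionary, and the toy feature matrix are illustrative assumptions; the patent does not specify the feature aggregation function.

```python
import numpy as np

def single_view_repr(features, assoc):
    """Aggregate each unit with its associated units under one view.

    features: (N, D) per-unit features for one modality's view;
    assoc: dict mapping a unit index to its associated unit indices.
    A mean stands in for the feature aggregation function.
    """
    out = np.empty_like(features)
    for i in range(features.shape[0]):
        group = [i] + sorted(assoc.get(i, []))   # unit plus its associated units
        out[i] = features[group].mean(axis=0)
    return out

feats = np.arange(8, dtype=float).reshape(4, 2)  # 4 units, 2-dim features
assoc = {0: [1], 1: [0, 2], 2: [3], 3: []}       # toy relation set
print(single_view_repr(feats, assoc))
# [[1. 2.]
#  [2. 3.]
#  [5. 6.]
#  [6. 7.]]
```

Each row of the result smooths a unit's features toward those of its associated units, which is the usual effect of neighborhood aggregation over a relation set.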
According to an embodiment of the present application, after the step of performing global aggregation processing on the multi-view fusion characterization matrix to obtain the city universal characterization of the target city, the city space universal character