
CN-122019623-A - Data processing method, apparatus, device, storage medium, and program product


Abstract

The application discloses a data processing method, apparatus, device, storage medium, and program product, and relates to the field of computer technology. The method comprises: obtaining a GPS data set corresponding to a query statement and an administrative region geometric object set, the latter being a set of polygonal geometric objects representing the province-city-district three-level administrative regions; constructing, based on the two sets, a distributed spatial index comprising a global spatial partition index, which indexes geographic range partitions spanning both sets, and a local spatial index, which is a tree-structured index formed within a single partition by organizing minimum circumscribed rectangles according to their geospatial adjacency; screening out, based on the local spatial index of each single partition, the administrative regions corresponding to the GPS feature points in the GPS data set; and generating, according to those administrative regions, structured data comprising the longitude and latitude data of the GPS feature points and the administrative data of the province-city-district three-level administrative regions. The efficiency of associating massive GPS longitude and latitude data with the three-level administrative regions to form structured data that includes administrative region data can thereby be improved.

Inventors

  • WANG YAXIONG
  • WANG RONGHAI
  • WANG YINGZHUO
  • ZHAN SHUAI
  • CAI YU
  • WANG BO
  • GUO BAOLIN
  • ZHENG DAN
  • LIU AN

Assignees

  • China UnionPay Co., Ltd. (中国银联股份有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-01-16

Claims (16)

  1. A data processing method, comprising: acquiring a GPS data set and an administrative region geometric object set corresponding to a query statement, wherein the administrative region geometric object set is a set of polygonal geometric objects representing the province-city-district three-level administrative regions; constructing a distributed spatial index based on the GPS data set and the administrative region geometric object set, wherein the distributed spatial index comprises a global spatial partition index and a local spatial index, the global spatial partition index is an index of geographic range partitions spanning the GPS data set and the administrative region geometric object set, and the local spatial index is a tree-structured index formed within a single partition by organizing minimum circumscribed rectangles according to their geospatial adjacency; screening out, based on the local spatial index of each single partition, the administrative regions corresponding to the GPS feature points in the GPS data set, wherein the GPS feature points in the GPS data set do not overlap one another; and generating structured data according to the administrative regions corresponding to the GPS feature points, wherein the structured data comprises longitude and latitude data of the GPS feature points and administrative data of the province-city-district three-level administrative regions.
  2. The method of claim 1, wherein before the acquiring of the GPS data set and the administrative region geometric object set corresponding to the query statement, the method further comprises: acquiring user input data, wherein the user input data comprises N first query statements and an administrative region division data storage path, N being an integer greater than or equal to 2; parsing the N first query statements to obtain an initial GPS data source; constructing the GPS data set based on the initial GPS data source; acquiring administrative region division data according to the administrative region division data storage path, wherein the administrative region division data comprises data representing the division relations among the province-city-district three-level administrative regions; and determining the administrative region geometric object set according to the administrative region division data.
  3. The method of claim 2, wherein the parsing of the N first query statements to obtain an initial GPS data source comprises: identifying feature objects in each of the N first query statements by means of flag variables to obtain a semantic boundary of each first query statement, wherein the flag variables comprise at least one of an unterminated-quotation-mark flag variable and a preceding-escape-character flag variable; deleting, based on the semantic boundary of each first query statement, the comment content related to the feature objects in that first query statement to obtain a second query statement corresponding to each first query statement; and taking, in the order in which the query statements are identified, the table-type data structure of the last of the N second query statements as the initial GPS data source.
  4. The method of claim 2 or 3, wherein the user input data further comprises longitude and latitude field names, and the constructing of the GPS data set based on the initial GPS data source comprises: extracting M groups of longitude and latitude fields from the initial GPS data source according to the longitude and latitude field names, M being an integer greater than or equal to 2; converting each of the M groups of longitude and latitude fields into a point geometric object by means of a geospatial standard set conversion function; and performing de-duplication and invalid-data filtering on the converted point geometric objects by means of a redundant service field filtering function to obtain the GPS data set.
  5. The method of claim 2, wherein the determining of the administrative region geometric object set according to the administrative region division data comprises: extracting, from the administrative region division data, boundary coordinate data corresponding to each administrative region among the province-city-district three-level administrative regions, wherein the boundary coordinate data indicates the spatial boundary range of the administrative region; correcting the boundary coordinate data corresponding to each administrative region according to a validity correction rule to obtain corrected boundary coordinate data corresponding to each administrative region, wherein the validity correction rule comprises at least one of a rule for correcting coordinates that exceed the geographic range and a rule for correcting overlapping boundaries; converting the corrected boundary coordinate data corresponding to each administrative region into a polygonal geometric object of that administrative region; determining, by a geometric range statistics algorithm, the minimum circumscribed rectangle of each administrative region from its polygonal geometric object; and associating the minimum circumscribed rectangle of each administrative region with the administrative attribute data corresponding to that administrative region to obtain the administrative region geometric object set, wherein the administrative attribute data comprises a level identifier of each administrative region within the province-city-district three levels, administrative data of each administrative region, and superior-subordinate association data of each administrative region within the province-city-district three levels.
  6. The method of claim 1, wherein the constructing of a distributed spatial index based on the GPS data set and the administrative region geometric object set comprises: determining first-class index objects and second-class index objects based on the GPS data set and the administrative region geometric object set, wherein the first-class index objects are the single-point minimum circumscribed rectangles of the GPS feature points, and the second-class index objects are the polygon minimum circumscribed rectangles of the administrative regions in the administrative region geometric object set; partitioning, by a multidimensional balanced partition tree partitioner, the geographic ranges of the first-class index objects and the second-class index objects according to geographic partitioning rules to obtain at least one partition; and constructing the global spatial partition index based on the at least one partition and on the single-point minimum circumscribed rectangles and polygon minimum circumscribed rectangles in each partition.
  7. The method of claim 1 or 6, wherein the constructing of a distributed spatial index based on the GPS data set and the administrative region geometric object set comprises: determining, for a single partition, first-class sub-index objects and second-class sub-index objects based on the GPS data set and the administrative region geometric object set, wherein the first-class sub-index objects are the single-point minimum circumscribed rectangles of the GPS feature points in the single partition, and the second-class sub-index objects are the polygon minimum circumscribed rectangles of the administrative regions, among the province-city-district three-level administrative regions, in the single partition; and constructing indexes for the first-class sub-index objects and the second-class sub-index objects respectively by a rectangular tree index structure algorithm to obtain the local spatial index, wherein the local spatial index comprises a rectangular tree index corresponding to the first-class sub-index objects and a rectangular tree index corresponding to the second-class sub-index objects.
  8. The method of claim 7, wherein the constructing of indexes for the first-class sub-index objects and the second-class sub-index objects respectively by a rectangular tree index structure algorithm to obtain the local spatial index comprises: constructing indexes for the first-class sub-index objects and the second-class sub-index objects respectively by the rectangular tree index structure algorithm to obtain a first candidate rectangular tree index corresponding to the first-class sub-index objects and a second candidate rectangular tree index corresponding to the second-class sub-index objects; constructing, according to the geospatial adjacency, a first tree structure corresponding to the first candidate rectangular tree index and a second tree structure corresponding to the second candidate rectangular tree index; checking the first tree structure and the second tree structure according to an integrity check rule, wherein the integrity check rule comprises checking whether the minimum circumscribed rectangle of each parent node in the tree structure completely encloses the minimum circumscribed rectangles of its child nodes, and checking whether all minimum circumscribed rectangles in each partition are present in the tree structure; and, when the check passes, taking the first candidate rectangular tree index as the rectangular tree index corresponding to the first-class sub-index objects and the second candidate rectangular tree index as the rectangular tree index corresponding to the second-class sub-index objects.
  9. The method of claim 1, wherein the screening out of the administrative regions corresponding to the GPS feature points in the GPS data set based on the local spatial index of each single partition comprises: extracting, based on the local spatial index of a single partition, the polygon minimum circumscribed rectangles of the administrative regions that are geospatially adjacent to the GPS feature points in the single partition, to obtain a candidate polygon minimum circumscribed rectangle set of the single partition; screening a target administrative region from the candidate polygon minimum circumscribed rectangle set of the single partition, wherein the polygon minimum circumscribed rectangle of the target administrative region completely contains the single-point minimum circumscribed rectangle of the GPS feature point; and traversing all partitions in the global spatial partition index and aggregating the target administrative regions screened from all partitions to obtain the administrative regions corresponding to the GPS feature points in the GPS data set.
  10. The method of claim 1, wherein the generating of structured data according to the administrative regions corresponding to the GPS feature points comprises: sorting the administrative regions corresponding to the GPS feature points according to the priorities of the province-city-district three-level administrative regions; when the sorted administrative regions lack an administrative region of at least one of the province-city-district three levels, completing the administrative regions corresponding to the GPS feature points based on the superior-subordinate association data of the sorted administrative regions within the province-city-district three levels; and generating the structured data according to the administrative data of the completed administrative regions corresponding to the GPS feature points.
  11. The method of claim 10, wherein before the completing of the administrative regions corresponding to the GPS feature points based on the superior-subordinate association data of the sorted administrative regions within the province-city-district three levels, the method further comprises: when the sorted administrative regions include at least two first administrative regions of the same level, the first administrative regions being the administrative regions with the highest priority, calculating the straight-line distance between the GPS feature point and the center point of each first administrative region; and taking the first administrative region whose straight-line distance is less than or equal to a preset threshold as the highest-priority administrative region among the administrative regions corresponding to the GPS feature point.
  12. The method of claim 10 or 11, wherein the priorities of the province-city-district three-level administrative regions comprise a priority of the province-level administrative region, a priority of the city-level administrative region, and a priority of the district-level administrative region, wherein the priority of the district-level administrative region is higher than the priority of the city-level administrative region, which in turn is higher than the priority of the province-level administrative region.
  13. A data processing apparatus, comprising: an acquisition module configured to acquire a GPS data set and an administrative region geometric object set corresponding to a query statement, wherein the administrative region geometric object set is a set of polygonal geometric objects representing the province-city-district three-level administrative regions; a construction module configured to construct a distributed spatial index based on the GPS data set and the administrative region geometric object set, wherein the distributed spatial index comprises a global spatial partition index and a local spatial index, the global spatial partition index is an index of geographic range partitions spanning the GPS data set and the administrative region geometric object set, and the local spatial index is a tree-structured index formed within a single partition by organizing minimum circumscribed rectangles according to their geospatial adjacency; a screening module configured to screen out, based on the local spatial index of each single partition, the administrative regions corresponding to the GPS feature points in the GPS data set, wherein the GPS feature points in the GPS data set do not overlap one another; and a generation module configured to generate structured data according to the administrative regions corresponding to the GPS feature points, wherein the structured data comprises longitude and latitude data of the GPS feature points and administrative data of the province-city-district three-level administrative regions.
  14. A computer device, comprising a processor and a memory storing computer program instructions, wherein the processor, when executing the computer program instructions, implements the data processing method of any one of claims 1 to 12.
  15. A computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the data processing method of any one of claims 1 to 12.
  16. A computer program product comprising a computer program which, when executed by a processor, implements the data processing method of any one of claims 1 to 12.
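
For illustration only, the comment removal of claim 3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names `strip_sql_comments` and `last_statement` are hypothetical, the comment syntax is assumed to be `--` line comments and `/* */` block comments, and backslash escaping inside quoted strings is assumed (as in some SQL dialects). The two flags `open_quote` and `escaped` stand in for the unterminated-quote flag variable and the escape-character flag.

```python
def strip_sql_comments(sql: str) -> str:
    """Remove -- and /* */ comments that lie outside quoted strings."""
    out = []
    open_quote = None   # unterminated-quote flag: the quote char we are inside, or None
    escaped = False     # escape flag: previous char inside a quote was a backslash
    i, n = 0, len(sql)
    while i < n:
        ch = sql[i]
        if open_quote:
            # Inside a string literal: copy everything, only track quote/escape state.
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == open_quote:
                open_quote = None
            i += 1
        elif ch in ("'", '"'):
            open_quote = ch
            out.append(ch)
            i += 1
        elif sql.startswith("--", i):
            # Line comment: skip to the end of the line, keeping the newline itself.
            j = sql.find("\n", i)
            i = n if j == -1 else j
        elif sql.startswith("/*", i):
            # Block comment: skip past the closing marker (or to the end if unclosed).
            j = sql.find("*/", i + 2)
            i = n if j == -1 else j + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def last_statement(statements):
    """Clean every statement and return the last one, whose result table
    would serve as the initial GPS data source (assumption of this sketch)."""
    cleaned = [strip_sql_comments(s).strip() for s in statements]
    return cleaned[-1]
```

Tracking the quote state is what keeps a `--` that appears inside a string literal from being mistaken for the start of a comment.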
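
The sorting, tie-breaking, and completion steps of claims 10 to 12 can likewise be sketched in Python. This is a hedged illustration only: the function `resolve` and the lookup tables `level_of`, `parent_of`, and `center_of` are hypothetical names, the straight-line distance is taken as plain Euclidean distance in degrees, and picking the nearest region when several fall within the threshold (and returning an empty record when none does) is an assumption the claims do not specify.

```python
import math

# District outranks city, which outranks province (claim 12).
PRIORITY = {"district": 3, "city": 2, "province": 1}


def resolve(point, matched, level_of, parent_of, center_of, max_dist):
    """Return {level: region name} for one GPS point.

    matched   -- names of regions screened for the point
    level_of  -- region name -> "province" | "city" | "district"
    parent_of -- region name -> superior region name, or None for a province
    center_of -- region name -> (lon, lat) of the region's center point
    max_dist  -- preset distance threshold (hypothetical units: degrees)
    """
    if not matched:
        return {}
    # Sort so that the highest-priority (most specific) level comes first.
    ranked = sorted(matched, key=lambda name: PRIORITY[level_of[name]], reverse=True)
    top_level = level_of[ranked[0]]
    top = [name for name in ranked if level_of[name] == top_level]
    best = top[0]
    if len(top) > 1:
        # Several regions share the top level: keep those whose center lies
        # within the threshold, then take the nearest (assumption).
        near = [name for name in top if math.dist(point, center_of[name]) <= max_dist]
        if not near:
            return {}
        best = min(near, key=lambda name: math.dist(point, center_of[name]))
    # Completion: walk the superior-subordinate links upward to fill missing levels.
    record = {}
    node = best
    while node is not None:
        record[level_of[node]] = node
        node = parent_of.get(node)
    return record
```

Starting the upward walk from the highest-priority region means a point matched only at district level still yields a full province-city-district record.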

Description

Data processing method, apparatus, device, storage medium, and program product

Technical Field

The present application relates to the field of computer technologies, and in particular to a data processing method, apparatus, device, storage medium, and program product.

Background

Financial big data scenarios such as payment risk control, transaction region statistics, and merchant location matching generally require processing hundreds of millions of Global Positioning System (GPS) longitude and latitude records, such as positioning data from point-of-sale (POS) terminals and from user equipment. The core requirement is to map each GPS point to the province-city-district three-level administrative regions, forming structured data that includes the administrative regions and supports downstream services such as risk identification and regional business analysis. How to improve the efficiency of associating massive GPS longitude and latitude data with the three-level administrative regions to form such structured data in a financial big data scenario is therefore a matter of concern.

Disclosure of Invention

The embodiments of the application provide a data processing method, system, apparatus, device, storage medium, and program product, which can improve the efficiency of associating massive GPS longitude and latitude data with the three-level administrative regions to form structured data that includes administrative region data in a financial big data scenario.
In a first aspect, an embodiment of the present application provides a data processing method, including: acquiring a GPS data set and an administrative region geometric object set corresponding to a query statement, wherein the administrative region geometric object set is a set of polygonal geometric objects representing the province-city-district three-level administrative regions; constructing a distributed spatial index based on the GPS data set and the administrative region geometric object set, wherein the distributed spatial index comprises a global spatial partition index and a local spatial index, the global spatial partition index is an index of geographic range partitions spanning the GPS data set and the administrative region geometric object set, and the local spatial index is a tree-structured index formed within a single partition by organizing minimum circumscribed rectangles according to their geospatial adjacency; screening out, based on the local spatial index of each single partition, the administrative regions corresponding to the GPS feature points in the GPS data set, wherein the GPS feature points in the GPS data set do not overlap one another; and generating structured data according to the administrative regions corresponding to the GPS feature points, wherein the structured data comprises longitude and latitude data of the GPS feature points and administrative data of the province-city-district three-level administrative regions.
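
As a rough illustration of the two-level index in the first aspect, the following Python sketch partitions both data sets over a shared grid and then screens each point against the minimum circumscribed rectangles stored in its partition. It is a simplified sketch, not the claimed implementation: a uniform grid stands in for the global spatial partition index, a flat per-partition list stands in for the tree-structured local index, and the names `Region`, `mbr`, `cells_for_box`, and `match_regions` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    level: str      # "province", "city" or "district"
    polygon: tuple  # ((lon, lat), ...) boundary vertices


def mbr(polygon):
    """Minimum circumscribed rectangle as (min_lon, min_lat, max_lon, max_lat)."""
    lons = [p[0] for p in polygon]
    lats = [p[1] for p in polygon]
    return (min(lons), min(lats), max(lons), max(lats))


def cells_for_box(box, cell):
    """Grid cells (stand-ins for geographic partitions) that a rectangle overlaps."""
    x0, y0, x1, y1 = box
    for i in range(int(x0 // cell), int(x1 // cell) + 1):
        for j in range(int(y0 // cell), int(y1 // cell) + 1):
            yield (i, j)


def match_regions(points, regions, cell=1.0):
    """Map each distinct (lon, lat) point to the regions whose rectangle contains it."""
    # Global step: place every region rectangle into each partition it overlaps.
    partitions = defaultdict(list)
    for region in regions:
        box = mbr(region.polygon)
        for key in cells_for_box(box, cell):
            partitions[key].append((box, region))
    result = {}
    for lon, lat in set(points):  # duplicate points are processed once
        key = (int(lon // cell), int(lat // cell))
        # Local step: only rectangles stored in the point's own partition are checked.
        result[(lon, lat)] = [
            region
            for box, region in partitions.get(key, [])
            if box[0] <= lon <= box[2] and box[1] <= lat <= box[3]
        ]
    return result
```

Because a point is compared only with the rectangles registered in its own partition, the work per point depends on local region density rather than on the total number of regions.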
In a second aspect, an embodiment of the present application provides a data processing apparatus, including: an acquisition module configured to acquire a GPS data set and an administrative region geometric object set corresponding to a query statement, wherein the administrative region geometric object set is a set of polygonal geometric objects representing the province-city-district three-level administrative regions; a construction module configured to construct a distributed spatial index based on the GPS data set and the administrative region geometric object set, wherein the distributed spatial index comprises a global spatial partition index and a local spatial index, the global spatial partition index is an index of geographic range partitions spanning the GPS data set and the administrative region geometric object set, and the local spatial index is a tree-structured index formed within a single partition by organizing minimum circumscribed rectangles according to their geospatial adjacency; a screening module configured to screen out, based on the local spatial index of each single partition, the administrative regions corresponding to the GPS feature points in the GPS data set, wherein the GPS feature points in the GPS data set do not overlap one another; and a generation module configured to generate structured data according to the administrative regions corresponding to the GPS feature points, wherein the structured data comprises longitude and latitude data of the GPS feature points and administrative data of the province-city-district three-level administrative regions.
In a third aspect, embodiments of the present application provide a computer device comprising a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the data processing method of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer storage m