CN-121996650-A - Quality inspection and optimization method for global planning data of homeland space
Abstract
The invention provides a quality inspection and optimization method for global planning data of a homeland space. Each layer of the homeland space global planning data that requires quality inspection is split into a plurality of independent tasks to be inspected according to its data volume and a splitting strategy, and a task list is generated; the splitting strategy splits the data of each layer according to data business characteristics and data spatial distribution characteristics. Parallel quality inspection computation is carried out on the split tasks using a single-machine parallel computing mechanism and/or a local area network parallel computing mechanism, a quality inspection computation result is generated for each task, and a data quality inspection report is generated for the data packet. By splitting and combining the data packet according to business and spatial characteristics, computing spatial data in parallel on a single machine, and distributing computation across idle resources in the local area network, the invention solves the problem that quality inspection computation for some massive-data layers in frequently collected homeland space global planning data is extremely slow or cannot produce a result.
Inventors
- ZHANG QI
- TU YANG
- CUI BEI
- FU KANG
- ZHANG SIQI
- DENG XIAOHONG
Assignees
- 吉奥时空信息技术股份有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260409
Claims (9)
- 1. A quality inspection and optimization method for global planning data of a homeland space, characterized by comprising the following steps: acquiring a data packet of homeland space global planning data requiring quality inspection and examination, splitting each layer in the data packet into a plurality of independent tasks to be inspected according to the data volume and a splitting strategy, and generating a task list, wherein the splitting strategy splits each layer according to data service characteristics and data spatial distribution characteristics; performing parallel quality inspection and examination calculation on the split tasks to be inspected based on a single-machine parallel computing mechanism and/or a local area network parallel computing mechanism, and generating a quality inspection and examination calculation result for each task to be inspected; and collecting the quality inspection calculation results of all tasks to be inspected, and generating a quality inspection report for the homeland space global planning data.
- 2. The quality inspection and optimization method for global planning data of a homeland space according to claim 1, wherein splitting each layer in the data packet into a plurality of independent tasks to be inspected according to the data volume and the splitting strategy and generating the task list comprises: extracting a target vector layer set from the data packet, and performing in-depth analysis on each target vector layer in the set to obtain the data volume information of each target vector layer; judging whether any one target vector layer needs to be split according to its data volume information; if splitting is not needed, taking the target vector layer as one task to be inspected; if splitting is needed, splitting the target vector layer into a plurality of tasks to be inspected based on its data service characteristics and data spatial distribution characteristics; and merging all tasks to be inspected to generate a global task queue.
- 3. The quality inspection and optimization method for global planning data of a homeland space according to claim 2, wherein extracting the target vector layer set from the data packet and performing in-depth analysis on each target vector layer in the set to obtain the data volume information of each target vector layer comprises: extracting all vector layers from a file database according to the structure definition of the data packet; screening the target vector layers from all vector layers based on the standard database specification; and performing in-depth analysis on each target vector layer to generate meta-information containing the geometric type, the number of elements and the number of element nodes, wherein the number of elements and the number of element nodes in the meta-information constitute the data volume information of the target vector layer; and wherein judging whether any one target vector layer needs to be split according to its data volume information comprises: if the number of elements and the number of element nodes of the target vector layer are smaller than the corresponding minimum thresholds, the target vector layer does not need to be split; if the number of elements and the number of element nodes of the target vector layer are both larger than the corresponding maximum thresholds, the target vector layer needs to be split; and in all other cases, determining whether the target vector layer needs to be split based on its geometric type, number of elements and number of element nodes.
- 4. The quality inspection and optimization method for global planning data of a homeland space according to claim 3, wherein, in the other cases, determining whether any one target vector layer needs to be split based on its geometric type, number of elements and number of element nodes comprises: counting the number of elements N, the number of element nodes V and the geometric type complexity coefficient G of the target vector layer; configuring the maximum number of elements T_feature_max, the maximum number of element nodes T_vertex_max and the maximum geometric complexity G_max that a target vector layer can contain, and configuring an element number weight W_feature, an element node number weight W_vertex and a geometric type weight W_geometry, wherein W_feature + W_vertex + W_geometry = 1; normalizing the number of elements N, the number of element nodes V and the geometric type complexity coefficient G of the target vector layer respectively: Norm_Feature = min(1.0, N / T_feature_max); Norm_Vertex = min(1.0, V / T_vertex_max); Norm_Geometry = G; wherein Norm_Feature is the normalized number of elements, Norm_Vertex is the normalized number of element nodes, and Norm_Geometry is the normalized geometric complexity coefficient; calculating the segmentation score of the target vector layer: Score = W_feature × Norm_Feature + W_vertex × Norm_Vertex + W_geometry × Norm_Geometry; and if Score is greater than or equal to a preset segmentation threshold S_threshold, the target vector layer needs to be split; otherwise, the target vector layer does not need to be split.
- 5. The method of claim 2, wherein, if splitting is needed, splitting any one of the target vector layers into a plurality of tasks to be inspected based on its data service characteristics and data spatial distribution characteristics comprises: splitting the data of the target vector layer based on its data service characteristics; or splitting the data of the target vector layer based on its data spatial distribution characteristics; or splitting the data of the target vector layer based on its data service characteristics and also splitting the data of the target vector layer based on its data spatial distribution characteristics.
- 6. The method of claim 5, wherein splitting the data of any one of the target vector layers based on its data service characteristics comprises: if the target vector layer contains service attribute characteristics and spatial distribution characteristics, splitting the data according to the unique values in an attribute field of the target vector layer to generate a plurality of split tasks to be inspected; and if the target vector layer does not contain service attribute characteristics, splitting the target vector layer indirectly based on spatial matching: decomposing a standard layer into independent spatial graphic elements, one for each unique value of its attribute field; and using the independent spatial graphic elements as screening ranges, performing a spatial-relation topological operation on the target vector layer and screening out the target layer data for each unique value of the attribute field, wherein the target layer data for each unique value of the attribute field is one task to be inspected.
- 7. The method of claim 5, wherein splitting the data of any one of the target vector layers based on its data spatial distribution characteristics comprises: classifying the target vector layer as a point layer, a line layer or a polygon layer according to its geometric type; splitting the point layer based on an adaptive quadtree index method, comprising: starting with the undivided point layer as the root node, recursively checking the number of point elements in each node; if the number of point elements exceeds a set number threshold, dividing the node into 4 child nodes and repeating the splitting process for each child node until the number of point elements in every child node is less than or equal to the set number threshold, wherein the leaf nodes are the resulting sub-blocks and each sub-block corresponds to one task to be inspected; splitting the line layer based on an STR tree segmentation algorithm, comprising: a) sorting all line elements in the current node to be segmented in ascending order of the core coordinate of each line element in the x-axis dimension, wherein the core coordinate is the center coordinate of the line element; b) calculating the number of partitions m = ceil(N/K) according to a number threshold K, wherein N is the total number of line elements in the current node and ceil() is the round-up function; c) dividing all line elements into m subgroups in the sorted order of their x-axis core coordinates, and dividing the minimum bounding rectangle (MBR) of the current node to be segmented into m sub-MBRs along the x-axis dimension, wherein each subgroup corresponds to one sub-MBR, that is, one child node; d) for each child node, checking whether the number of line elements in the child node exceeds the number threshold K; if not, the child node is a final segmentation result; if so, switching to the y-axis dimension, sorting all line elements of the child node in ascending order of their y-axis core coordinates, and calculating the number of partitions m = ceil(N/K) according to the number threshold K, wherein N is the total number of line elements in the child node; e) dividing all line elements into m subgroups in the sorted order of their y-axis core coordinates, and dividing the MBR of the current node to be segmented into m sub-MBRs along the y-axis dimension, wherein each subgroup corresponds to one sub-MBR, that is, one child node; f) for each child node, checking whether the number of line elements in the child node exceeds the number threshold K; if not, the child node is a final segmentation result; if so, returning to step a) until the number of line elements in every child node does not exceed the number threshold K, wherein each leaf node corresponds to one task to be inspected; and splitting the polygon layer based on an R-tree using an insert-and-split strategy, comprising: calculating the MBR of each valid polygon element to form a set F of polygon elements to be processed; initializing an empty R-tree and configuring its core parameters, namely the maximum number M of elements per leaf node and the minimum fill factor h, to determine the basic tree structure; traversing the set F and inserting each polygon element into the leaf node whose MBR expands least after insertion; if the number of polygon elements in the leaf node after insertion exceeds M, reinserting some of the polygon elements of that leaf node into other nodes; if the number of polygon elements in the leaf node after reinsertion still exceeds M, calculating, for the x-axis dimension and the y-axis dimension respectively, the total area increment and the overlap area of the MBRs of the two nodes that would result from segmentation, and selecting the dimension with the minimum increment and the minimum overlap area as the segmentation axis; sorting all polygon elements in the leaf node by their coordinates on the segmentation axis, traversing the segmentation points that satisfy m ≤ |S1| ≤ M - m, m here being the minimum number of polygon elements in a leaf node, selecting the segmentation point with the smallest total area increment, and splitting the polygon elements of the leaf node into two new leaf nodes S1 and S2, wherein |S1| is the number of polygon elements in leaf node S1 after segmentation; and if the number of polygon elements in the segmented leaf node S2 also exceeds M, segmenting leaf node S2 again until the number of polygon elements in every leaf node does not exceed M, wherein each leaf node corresponds to one task to be inspected.
- 8. The quality inspection and optimization method for global planning data of a homeland space according to claim 1, wherein performing parallel quality inspection calculation on the split tasks based on the single-machine parallel computing mechanism and/or the local area network parallel computing mechanism and generating the quality inspection calculation result of each task comprises: for tasks to be inspected for which single-machine parallel computing is selected, performing parallel quality inspection calculation on the tasks based on the single-machine parallel computing mechanism: a') the main process detects the number of available CPU cores on the single machine and determines the number of working processes based on the number of available CPU cores; b') all tasks to be inspected are initially distributed among the working processes according to the number of tasks to be inspected, the number of working processes, the computing performance of each working process, and the data size and computational complexity of each task to be inspected; c') while the working processes execute the tasks to be inspected, the polling weight of each working process is calculated based on its load state; d') some of the tasks to be inspected on working processes whose polling weight is smaller than a preset threshold are transferred to working processes whose polling weight is larger than the preset threshold; e') steps c') and d') are repeated until all tasks to be inspected have been calculated; and for tasks to be inspected for which local area network parallel computing is selected, performing parallel quality inspection calculation on the tasks based on the local area network parallel computing mechanism: a'') the computing nodes in the local area network are scanned, and available computing nodes are screened according to the delay time and resource score of each computing node; b'') a corresponding number of tasks to be inspected is assigned to each available computing node according to its computing capacity and the data size and computational complexity of each task to be inspected in the global task queue; c'') if the global task queue still contains unassigned tasks to be inspected, the priority dynamic weight of each available computing node is calculated according to its resource score while the available computing nodes are executing tasks; d'') some or all of the unassigned tasks to be inspected are assigned to the available computing node with the largest priority dynamic weight; e'') steps c'') and d'') are repeated until all tasks to be inspected in the global task queue have been calculated.
- 9. The quality inspection and optimization method for global planning data of a homeland space according to claim 1, wherein collecting the quality inspection calculation results of all tasks to be inspected comprises: taking the target vector layer as the unit, gathering the quality inspection calculation results of the plurality of tasks to be inspected that belong to the same target vector layer, and generating a quality inspection report for each target vector layer.
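The split decision of claims 3 and 4 can be illustrated with a minimal Python sketch. The class names, function names and every numeric default below are hypothetical; the claims give no values for the thresholds or weights, and the sketch assumes that G is already scaled to the range 0 to 1, since claim 4 sets Norm_Geometry = G.

```python
from dataclasses import dataclass


@dataclass
class LayerStats:
    n_elements: int         # number of elements N
    n_nodes: int            # number of element nodes V
    geom_complexity: float  # coefficient G, assumed to lie in [0, 1]


@dataclass
class SplitConfig:
    # Every default here is a made-up illustration, not a value from the claims.
    t_feature_min: int = 10_000
    t_vertex_min: int = 100_000
    t_feature_max: int = 500_000
    t_vertex_max: int = 5_000_000
    w_feature: float = 0.4
    w_vertex: float = 0.4
    w_geometry: float = 0.2   # the three weights sum to 1
    s_threshold: float = 0.5


def split_score(stats: LayerStats, cfg: SplitConfig) -> float:
    """Weighted score from claim 4, built from the normalized quantities."""
    norm_feature = min(1.0, stats.n_elements / cfg.t_feature_max)
    norm_vertex = min(1.0, stats.n_nodes / cfg.t_vertex_max)
    norm_geometry = stats.geom_complexity
    return (cfg.w_feature * norm_feature
            + cfg.w_vertex * norm_vertex
            + cfg.w_geometry * norm_geometry)


def needs_split(stats: LayerStats, cfg: SplitConfig) -> bool:
    """Three-way decision from claim 3, falling back to the claim 4 score."""
    if stats.n_elements < cfg.t_feature_min and stats.n_nodes < cfg.t_vertex_min:
        return False  # small layer: keep it as a single task
    if stats.n_elements > cfg.t_feature_max and stats.n_nodes > cfg.t_vertex_max:
        return True   # clearly oversized layer: always split
    return split_score(stats, cfg) >= cfg.s_threshold
```

With these illustrative defaults, a layer with 250,000 elements, 200,000 element nodes and G = 0.5 scores 0.4 × 0.5 + 0.4 × 0.04 + 0.2 × 0.5 = 0.316 and is left unsplit.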
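The adaptive quadtree split of the point layer in claim 7 can be sketched as a short recursion. This is an illustration under stated assumptions, not the patented implementation: points are plain (x, y) tuples, the bounding box is supplied by the caller, points lying on a midline go to the upper or right child, empty quadrants are dropped, and the `max_depth` guard is an addition that stops the recursion when many points share the same coordinates.

```python
def quadtree_split(points, bbox, max_points, max_depth=32):
    """Split (x, y) points into blocks of at most max_points points each.

    bbox is (xmin, ymin, xmax, ymax). Each returned block stands for one
    task to be inspected. max_depth is a termination guard added for this
    sketch; when it is reached, the block is returned even if oversized.
    """
    if not points:
        return []
    if len(points) <= max_points or max_depth == 0:
        return [points]

    xmin, ymin, xmax, ymax = bbox
    xmid = (xmin + xmax) / 2.0
    ymid = (ymin + ymax) / 2.0

    # Four child nodes keyed by (is_right, is_upper), each with its own bbox.
    children = {
        (False, False): ([], (xmin, ymin, xmid, ymid)),
        (True, False): ([], (xmid, ymin, xmax, ymid)),
        (False, True): ([], (xmin, ymid, xmid, ymax)),
        (True, True): ([], (xmid, ymid, xmax, ymax)),
    }
    for x, y in points:
        children[(x >= xmid, y >= ymid)][0].append((x, y))

    blocks = []
    for child_points, child_bbox in children.values():
        blocks.extend(
            quadtree_split(child_points, child_bbox, max_points, max_depth - 1)
        )
    return blocks
```

For a regular 10 × 10 grid of points in the box (0, 0, 10, 10) with a threshold of 10 points, the recursion stops after two levels, because every second-level quadrant holds at most 9 points.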
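Steps a) to f) of the line-layer split in claim 7 alternate between the x and y axes until every group holds at most K line elements. The sketch below is a simplified reading, not the claimed algorithm itself. It assumes that each line element is reduced to its core (center) coordinate, that the m sub-MBRs are equal-width slabs of the extent of those centers (the claim does not say how the MBR is divided), and that groups which cannot be separated by position are simply cut into chunks of K. The function name and the `max_depth` guard are hypothetical additions.

```python
import math


def split_lines_by_slabs(centers, k, axis=0, depth=0, max_depth=32):
    """Group line-element centers so that no group holds more than k items.

    centers is a list of (cx, cy) tuples, one per line element.
    axis 0 is x and axis 1 is y; the axis alternates on each recursion.
    """
    n = len(centers)
    if n == 0:
        return []
    if n <= k:
        return [centers]

    ordered = sorted(centers, key=lambda c: c[axis])  # steps a) and d): ascending sort
    m = math.ceil(n / k)                              # step b): number of partitions
    lo, hi = ordered[0][axis], ordered[-1][axis]

    if hi == lo or depth >= max_depth:
        # Fallback added for this sketch: position cannot separate the
        # elements, so cut the sorted list into chunks of at most k.
        return [ordered[i:i + k] for i in range(0, n, k)]

    # Steps c) and e): m equal-width slabs along the current axis (assumption).
    width = (hi - lo) / m
    slabs = [[] for _ in range(m)]
    for c in ordered:
        index = min(m - 1, int((c[axis] - lo) / width))
        slabs[index].append(c)

    groups = []
    for slab in slabs:
        if not slab:
            continue
        if len(slab) <= k:
            groups.append(slab)  # final segmentation result
        else:
            # Steps d) and f): too many elements, switch to the other axis.
            groups.extend(
                split_lines_by_slabs(slab, k, 1 - axis, depth + 1, max_depth)
            )
    return groups
```

Each returned group would correspond to one task to be inspected. A production version would carry the full geometries and their true MBRs rather than center points alone.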
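Step b') of claim 8 distributes tasks according to worker performance and task cost but does not name an allocation rule. As one possible illustration, the sketch below swaps in a common greedy heuristic: tasks are taken in order of decreasing estimated cost and each goes to the worker that would currently finish earliest. The function name, the cost estimates and the relative-speed figures are all hypothetical, and the dynamic rebalancing of steps c') to e') is not covered.

```python
import heapq


def initial_allocation(task_costs, capacities):
    """Greedy initial assignment of tasks to working processes.

    task_costs maps a task id to an estimated cost (for example, derived
    from data size and computational complexity). capacities maps a worker
    name to a relative speed greater than zero. Returns a dict mapping each
    worker name to its list of task ids.
    """
    assignment = {worker: [] for worker in capacities}
    # Heap entries are (projected finish time, worker name).
    heap = [(0.0, worker) for worker in sorted(capacities)]
    heapq.heapify(heap)

    # Largest tasks first; ties are broken by task id for determinism.
    ordered = sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for task_id, cost in ordered:
        finish, worker = heapq.heappop(heap)
        assignment[worker].append(task_id)
        heapq.heappush(heap, (finish + cost / capacities[worker], worker))
    return assignment
```

For example, with costs `{"t1": 8, "t2": 4, "t3": 4, "t4": 2}` and speeds `{"a": 2.0, "b": 1.0}`, worker `a` receives `t1` and `t3` while worker `b` receives `t2` and `t4`, so both are projected to finish at time 6.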
Description
Quality inspection and optimization method for global planning data of homeland space
Technical Field
The invention relates to the technical field of data quality inspection and examination, and in particular to a quality inspection and optimization method for global planning data of a homeland space.
Background
In the current quality inspection and examination of collected homeland space global planning data, complex topological or attribute-logic relationship calculations must frequently be carried out on single or multiple layers of massive data. When the data volume is particularly large, some layers of an individual submitted result are calculated abnormally slowly or cannot be calculated at all. Current data quality inspection and examination relies mainly on client-side quality inspection tools and server-side quality inspection services. The client-side quality inspection tool (C/S end) is generally used to calculate, check and correct, automatically or semi-automatically, the topology, attribute and logic problems present in the data. Within the tool, the quality and logic requirements of homeland space planning data are defined as rules based on specific algorithms, a calculation rule engine, data classification and similar technologies, so the tool is automated, lightweight and highly extensible. The drawback of this scheme is that existing C/S-end quality inspection tools can perform quality inspection calculations on topological relations, attribute logic, spatial-attribute consistency and the like only for data of moderate volume.
As homeland space planning work becomes progressively more refined, the scale of the data, and therefore its volume, keeps growing. When some layers in the collected data are particularly large, the calculation may fail to produce a result or may take an abnormally long time, which is inconvenient in scenarios where the result is needed quickly so that subsequent work can proceed. The server-side data quality inspection program (B/S end) generally uses spatial big-data computing technology, performing fully automatic data quality inspection with distributed computing frameworks (such as Spark, Flink and Hadoop) together with customized rules and algorithms, and then provides data preview and interaction through WebGIS technology. Its disadvantages are a high technical threshold, complex deployment, possible spatial-computation performance bottlenecks, weak interactivity and visualization capability, and high cluster hardware (server, storage) and maintenance costs.
Disclosure of Invention
Aiming at the technical problems existing in the prior art, the invention provides a quality inspection and optimization method for global planning data of a homeland space, which overcomes the low efficiency of quality inspection and examination of such data in the prior art.
The invention provides a quality inspection and optimization method for global planning data of a homeland space, which comprises the following steps: acquiring a data packet of homeland space global planning data requiring quality inspection and examination, splitting each layer in the data packet into a plurality of independent tasks to be inspected according to the data volume and a splitting strategy, and generating a task list, wherein the splitting strategy splits each layer according to data service characteristics and data spatial distribution characteristics; performing parallel quality inspection and examination calculation on the split tasks to be inspected based on a single-machine parallel computing mechanism and/or a local area network parallel computing mechanism, and generating a quality inspection and examination calculation result for each task to be inspected; and collecting the quality inspection calculation results of all tasks to be inspected, and generating a quality inspection report for the homeland space global planning data. According to the quality inspection and optimization method for global planning data of a homeland space provided by the invention, the quality inspection and examination calculation of some massive-data layers in the frequently collected homeland space global planning data is slow or the result cannot be calculated