Search

CN-121979625-A - Index calculation dynamic scheduling method and system based on topological sorting

CN121979625ACN 121979625 ACN121979625 ACN 121979625ACN-121979625-A

Abstract

The invention discloses a dynamic scheduling method and a system for index calculation based on topological sorting, wherein the method firstly constructs an index blood-edge relation directed acyclic graph according to index metadata and a dependency relation, carries out topological sorting to determine a calculation sequence, and simultaneously initializes a calculable list containing all basic indexes; the system adopts a master-slave distributed architecture, wherein a master node maintains a computable list and the node state in a graph, a plurality of slave nodes circularly request tasks to the master node, the master node responds to the requests and distributes computable indexes to be distributed, the slave nodes notify the master node after executing the computation, the master node immediately marks the node as completed and checks whether the subsequent nodes meet the computable conditions, if yes, the computable list is added, the process is circularly carried out until all the nodes are completed in computation, and the invention realizes parallel execution of independent tasks, effectively reduces waiting and synchronization expenditure, and further improves the computation efficiency and the resource utilization rate.

Inventors

  • ZHUANG YULIN
  • ZHAO YUXIAO
  • LU ZIQI
  • WU ZHE
  • CHEN MINGLI
  • ZHANG NAN
  • LU HAOSHENG
  • Xie Jiacen
  • JIANG SHUSONG

Assignees

  • 南京南瑞瑞中数据股份有限公司
  • 南瑞集团有限公司

Dates

Publication Date
20260505
Application Date
20251208

Claims (10)

  1. 1. The index calculation dynamic scheduling method based on the topological sorting is characterized by comprising the following steps of: Receiving a target index calculation request, and recursively constructing a complete index blood-margin relation graph according to predefined index metadata and dependency relations, wherein the blood-margin relation graph is a directed acyclic graph; Performing topological sorting on the directed acyclic graph, determining the calculation sequence of indexes, and initializing a calculable index list, wherein the index list initially comprises all basic indexes without precursor dependence; a master node maintains the state of each node in the calculable index list and the directed acyclic graph by adopting a master-slave distributed architecture, and a plurality of slave nodes request calculation tasks from the master node in a cyclic request mode; The master node responds to the task request of the slave node, selects unassigned calculable indexes from the calculable index list to be assigned, and marks the indexes as assigned; The slave node executes the distributed index calculation task and sends a task completion notification to the master node after calculation is completed; The master node receives the task completion notification, marks the corresponding node in the directed acyclic graph as completed, traverses all the subsequent nodes of the node, and adds the subsequent nodes into the computable index list if all the predecessor nodes of a certain subsequent node are completed; And repeating the task allocation and task completion notification processing processes until the computable index list is empty and all nodes in the directed acyclic graph are marked as completed.
  2. 2. The method of claim 1, further comprising, after constructing the directed acyclic graph, detecting whether a cyclic dependency exists in the directed acyclic graph by a depth-first search algorithm, and if so, terminating a job and reporting an error.
  3. 3. The method of claim 1, wherein the round robin request scheme comprises: the slave node sends a task request to the master node; If the slave node receives the effective index, reading data from a central database and executing calculation, writing a calculation result into the central database, and then sending the task completion notification to the master node; And if the slave node receives the null response, the slave node waits for a preset time and then resends the task request.
  4. 4. The method of claim 1, wherein the master node employs a mutual exclusion lock mechanism to ensure exclusive access to shared scheduling information, the shared scheduling information including the list of computable metrics and node states of the directed acyclic graph.
  5. 5. The method of claim 1, further comprising a fault tolerant mechanism: the master node monitors the task execution time of the slave node, if the task execution time is overtime, the master node judges that the node is faulty, and the task is re-marked as unallocated and then added into the calculable index list for rescheduling; and the slave node adopts a stateless design, and is reconnected with the master node after the fault is restarted so as to continue to participate in calculation.
  6. 6. The method of claim 1, further comprising a performance optimization strategy: The master node performs load balancing based on historical execution time prediction of the index; the master node distributes tasks for the slave nodes according to a data localization principle; And a buffer mechanism is built in the slave node and is used for storing frequently used intermediate calculation results.
  7. 7. The method of claim 1, wherein the metrics include a base metric, a derived metric, and a composite metric; The basic index is original data directly collected from a data source; the derived index is generated by single index through mathematical operation or aggregation operation; The composite index is generated by combining a plurality of indexes according to a predefined logic rule or mathematical model.
  8. 8. An index calculation dynamic scheduling system based on topological sorting, comprising: at least one master node configured to perform the master node functions in the method of any one of claims 1 to 7 for maintaining an indicator blood-edge relationship directed acyclic graph, a computable indicator list, and task scheduling; A plurality of slave nodes configured to perform the slave node functions in the method of any one of claims 1 to 7 for requesting tasks from the master node, performing index calculations and informing of task completion; And the central database is used for storing index metadata, index dependency relations and index calculation results and is accessed by the master node and the slave node.
  9. 9. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the topology ordering based metric calculation dynamic scheduling method of any of claims 1 to 7.
  10. 10. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, characterized in that the processor implements the topology sequencing based index calculation dynamic scheduling method according to any one of claims 1 to 7 when executing the program.

Description

Index calculation dynamic scheduling method and system based on topological sorting Technical Field The invention belongs to the technical field of distributed computation and task scheduling, and particularly relates to a dynamic scheduling method and system for index computation based on topological sorting. Background With the acceleration of the enterprise digital transformation process, the data-driven fine operation and management become core competitive. In business scenes such as financial wind control, retail analysis and monitoring of the Internet of things, real-time and accurate index calculation demands are increasingly highlighted, and particularly when mass data and high concurrency inquiry are faced, the traditional serial processing mode has difficulty in supporting business demands of millisecond-level response. The real-time calculation of the multi-dimensional and compound indexes by enterprises is continuously deepened, the logic relationship among the indexes is increasingly complex, the characteristics of multi-layer nesting and dynamic association are often presented, and higher standards are provided for the parallel processing capacity, the resource elasticity and the scheduling efficiency of the computing system. Currently, static task scheduling schemes based on Directed Acyclic Graphs (DAGs), such as APACHE SPARK and other distributed computing frameworks, are commonly adopted in the industry, a fixed execution plan is built by pre-analyzing task dependency relationships, and batch scheduling is performed according to Stage (Stage) division. Such schemes typically rely on static topology ordering results to layer tasks, where tasks in the same phase are executed in parallel, and data transfer is achieved between phases through a Shuffle or synchronization mechanism. In addition, part of the systems adopt a scheduling strategy based on a priority queue or combine with a resource reservation mechanism to ensure the execution sequence of the key tasks. Although the scheme realizes task parallelism to a certain extent, the static scheduling mechanism lacks dynamic adjustment capability during operation, so that the problems of unbalanced resource utilization, overlong task waiting time and the like easily occur when complex and variable index dependence is processed. Especially at the phase boundary, strict synchronization operation generates larger communication overhead and idle delay, and elasticity of cluster resources cannot be fully utilized. In addition, the general distributed framework is not optimized for common characteristics such as nested dependence and dynamic association in index calculation, so that the scheduling process is stiff, and the method is difficult to adapt to business scenes with high real-time requirements and frequent changes of dependence. Disclosure of Invention The invention aims to provide a dynamic scheduling method and a system for index calculation based on topological ordering, which reduce task waiting time and synchronous expenditure by a topological ordering scheduling mechanism for dynamically sensing task dependency relationship, thereby improving the parallel efficiency and resource utilization rate of index calculation. The technical scheme is that the index calculation dynamic scheduling method based on topological sorting comprises the following steps: Receiving a target index calculation request, and recursively constructing a complete index blood-margin relation graph according to predefined index metadata and dependency relations, wherein the blood-margin relation graph is a directed acyclic graph; Performing topological sorting on the directed acyclic graph, determining the calculation sequence of indexes, and initializing a calculable index list, wherein the index list initially comprises all basic indexes without precursor dependence; a master node maintains the state of each node in the calculable index list and the directed acyclic graph by adopting a master-slave distributed architecture, and a plurality of slave nodes request calculation tasks from the master node in a cyclic request mode; The master node responds to the task request of the slave node, selects unassigned calculable indexes from the calculable index list to be assigned, and marks the indexes as assigned; The slave node executes the distributed index calculation task and sends a task completion notification to the master node after calculation is completed; The master node receives the task completion notification, marks the corresponding node in the directed acyclic graph as completed, traverses all the subsequent nodes of the node, and adds the subsequent nodes into the computable index list if all the predecessor nodes of a certain subsequent node are completed; And repeating the task allocation and task completion notification processing processes until the computable index list is empty and all nodes in the directed acyclic graph are marked as completed. P