CN-121542049-B - Distributed remote multi-availability-zone computing power scheduling method
Abstract
The invention discloses a distributed multi-availability-zone computing power scheduling method, relating to the technical field of distributed management. The method comprises the following steps: a management node collects state information of its availability zone and reports it to a main service node, which constructs a dynamic global resource view; the main service node receives a computing task, parses it and splits it into subtasks; the main service node calculates the cost of scheduling each subtask to the different availability zones by constructing an adaptive cost function, and selects the zone with the lowest cost as the target availability zone; the subtasks are distributed to the management node of the target availability zone, which allocates them to computing nodes for execution; the aggregated results are returned to the main service node; and the main service node monitors the state of the availability zones and reschedules the tasks of any failed zone. By combining centralized coordination with distributed execution, the invention achieves computing power scheduling with low latency, high availability and high resource utilization.
Inventors
- TONG TIANLE
- YANG JIAWEI
- GUO SHU
- DING WEIRONG
- LI HONGLI
Assignees
- 贵州算家计算服务有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-16
Claims (5)
- 1. A distributed remote multi-availability-zone computing power scheduling method, operated in a system consisting of a main service node, a central database, and N availability zones connected through the Internet, wherein each availability zone comprises a management node and M computing nodes, characterized by comprising the following steps:

  Step S1: the management node of each availability zone periodically collects state information of the availability zone and reports it to the main service node, and the main service node constructs and maintains a dynamically updated global resource view according to the state information;

  Step S2: the main service node receives a computing task submitted by a user, and parses and splits the task into subtasks;

  Step S3: for each subtask, the main service node calculates, based on the global resource view, the cost of scheduling the subtask to each availability zone by constructing an adaptive cost function, and selects the availability zone with the lowest cost as the target availability zone;

  Step S4: the main service node distributes the subtasks to the management nodes of the target availability zones, and each management node allocates the subtasks to computing nodes within its zone for execution;

  Step S5: the computing nodes return their results, which are aggregated by the management node and then reported to the main service node;

  Step S6: the main service node monitors the states of all availability zones and reschedules the subtasks of any failed availability zone;

  wherein, in step S1, the global resource view is composed of a state tuple for each availability zone, the state tuple comprising an available computing power vector, a comprehensive load index, a network delay, an effective network bandwidth, and a status flag; the status flag takes one of the values online, offline, congested, and error;

  the comprehensive load index is obtained by a weighted summation of the following four terms: the first term is the product of the average CPU utilization and a CPU weight coefficient, the second term is the product of the average memory utilization and a memory weight coefficient, the third term is the product of the average I/O utilization and an I/O weight coefficient, and the fourth term is the product of the relative saturation of the task queue and a queue weight coefficient, the relative saturation being obtained by dividing the real-time length of the management node's local task queue by the maximum length of the queue; the calculation formula is as follows:

  $$L_i(t) = w_{\mathrm{cpu}}\,U_i^{\mathrm{cpu}}(t) + w_{\mathrm{mem}}\,U_i^{\mathrm{mem}}(t) + w_{\mathrm{io}}\,U_i^{\mathrm{io}}(t) + w_{q}\,\frac{Q_i(t)}{Q_i^{\max}}$$

  wherein $i$ represents the index of the availability zone; $t$ represents the time; $L_i(t)$ represents the comprehensive load index of availability zone $i$ at time $t$; $U_i^{\mathrm{cpu}}(t)$ represents the average CPU utilization of all computing nodes of the availability zone; $U_i^{\mathrm{mem}}(t)$ represents the average memory utilization of all computing nodes of the availability zone; $U_i^{\mathrm{io}}(t)$ represents the average I/O utilization of all computing nodes of the availability zone; $Q_i(t)$ represents the length of the management node's local task queue; $Q_i^{\max}$ represents the maximum length of the queue; and $w_{\mathrm{cpu}}$, $w_{\mathrm{mem}}$, $w_{\mathrm{io}}$ and $w_{q}$ represent the weight coefficients of the CPU utilization, the memory utilization, the I/O utilization and the queue proportion, respectively;

  in step S3, the adaptive cost function is the sum of two parts, wherein the first part is a weighted sum obtained by multiplying the time estimated cost, the network estimated cost and the economic estimated cost by their corresponding weight coefficients, and the second part is the product of a load penalty factor and the comprehensive load index of the target availability zone; the calculation formula of the adaptive cost function is as follows:

  $$C(T_j, Z_i) = \alpha\,C_{\mathrm{time}}(T_j, Z_i) + \beta\,C_{\mathrm{net}}(T_j, Z_i) + \gamma\,C_{\mathrm{econ}}(T_j, Z_i) + \delta\,L_i(t)$$

  wherein $T_j$ represents the $j$-th subtask; $Z_i$ represents the $i$-th availability zone; $C(T_j, Z_i)$ represents the scheduling cost calculated by the main service node for each subtask and each availability zone; $C_{\mathrm{time}}(T_j, Z_i)$ represents the time estimated cost, i.e. the running time of subtask $T_j$ on a typical computing node in availability zone $Z_i$, estimated from historical performance data or a benchmark test model; $C_{\mathrm{net}}(T_j, Z_i)$ represents the network estimated cost; $C_{\mathrm{econ}}(T_j, Z_i)$ represents the economic estimated cost, i.e. the financial cost of the computing resources and data transmission resources required to execute subtask $T_j$; $\alpha$, $\beta$ and $\gamma$ represent the weight coefficients of the time estimated cost, the network estimated cost and the economic estimated cost, respectively; and $\delta$ represents the load penalty factor;

  in the adaptive cost function, the network estimated cost is calculated as follows: an input data transmission overhead is calculated by dividing the input data size of the subtask by the incoming effective bandwidth of the target availability zone and adding the incoming network delay value; an output data transmission overhead is calculated by dividing the estimated output data size of the subtask by the outgoing effective bandwidth of the target availability zone and adding the outgoing network delay value; and the sum of the input data transmission overhead and the output data transmission overhead is the network estimated cost; the calculation formula of the network estimated cost is as follows:

  $$C_{\mathrm{net}}(T_j, Z_i) = \left(\frac{D_{\mathrm{in}}(T_j)}{B_i^{\mathrm{in}}(t)} + \ell_i^{\mathrm{in}}(t)\right) + \left(\frac{D_{\mathrm{out}}(T_j)}{B_i^{\mathrm{out}}(t)} + \ell_i^{\mathrm{out}}(t)\right)$$

  wherein $D_{\mathrm{in}}(T_j)$ represents the input data size of subtask $T_j$; $D_{\mathrm{out}}(T_j)$ represents the estimated output data size of subtask $T_j$; $B_i^{\mathrm{in}}(t)$ represents the incoming effective bandwidth from the main service node to availability zone $Z_i$ at time $t$; $B_i^{\mathrm{out}}(t)$ represents the outgoing effective bandwidth from availability zone $Z_i$ to the main service node at time $t$; $\ell_i^{\mathrm{in}}(t)$ represents the incoming network delay from the main service node to availability zone $Z_i$ at time $t$; and $\ell_i^{\mathrm{out}}(t)$ represents the outgoing network delay from availability zone $Z_i$ to the main service node at time $t$;

  in step S4, the allocation of tasks by the management node to the computing nodes within its zone specifically comprises the following steps: the management node receives the subtasks from the main service node and adds them to a local task queue; the management node calculates a comprehensive mismatch score for the subtask and each computing node, the comprehensive mismatch score comprising a resource fit term and a cache affinity term; and the management node selects the computing node with the lowest comprehensive mismatch score as the target node and allocates the subtask to the selected target node for execution;

  the comprehensive mismatch score is calculated from the following quantities, wherein $N_m$ represents the $m$-th computing node within the availability zone; $S(T_j, N_m)$ represents the comprehensive mismatch score of the subtask $T_j$ to be scheduled and the candidate computing node $N_m$; $\mathbf{R}_{\mathrm{req}}(T_j)$ represents the resource demand vector of subtask $T_j$; $\mathbf{R}_{\mathrm{avail}}(N_m)$ represents the currently available resource vector of computing node $N_m$; $\mathrm{Dist}(\cdot,\cdot)$ represents a distance function between the two vectors; $\lambda$ represents the decay coefficient; $t_{\mathrm{now}}$ represents the current time; $t_{\mathrm{last}}(N_m)$ represents the timestamp at which computing node $N_m$ last processed a subtask from the same parent task; and $\exp(\cdot)$ represents the natural exponential function.
- 2. The method according to claim 1, wherein step S6 comprises the following steps: the main service node monitors the state of each management node through a heartbeat mechanism; if the number of consecutive heartbeat signals lost from a management node reaches a failure determination threshold, the main service node determines that the availability zone to which that management node belongs has failed, marks the state of the availability zone as offline, and removes the availability zone from the global resource view; the main service node identifies all tasks that were distributed to the failed availability zone but have not been confirmed as completed, puts them back into a global task queue, and reschedules them to other healthy availability zones; wherein the failure determination threshold is set dynamically according to the heartbeat interval and the maximum failure recovery time target tolerable by the system.
- 3. The method according to claim 2, wherein, within each availability zone, the management node communicates with the computing nodes through a remote management connection based on the Secure Shell (SSH) protocol to execute commands, and transfers file data through the Secure File Transfer Protocol (SFTP); and wherein the main service node, the management node of each availability zone and the central database are interconnected through wide area network communication links that are built on the public Internet or a private network and encrypted using the Transport Layer Security (TLS) protocol or the Secure Sockets Layer (SSL) protocol.
- 4. A storage medium having instructions stored therein which, when read by a computer, cause the computer to perform the distributed remote multi-availability-zone computing power scheduling method according to any one of claims 1-3.
- 5. An electronic device comprising a processor and the storage medium of claim 4, the processor executing instructions in the storage medium.
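
The zone selection of steps S1 and S3 in claim 1 (comprehensive load index, network estimated cost, adaptive cost function) can be sketched as follows. This is only an illustration under stated assumptions: the class and field names, the default weight values, the units, and the filtering of zones that are not online are all hypothetical and are not specified by the claims.

```python
from dataclasses import dataclass


@dataclass
class ZoneState:
    """One state tuple of the global resource view (field names are illustrative)."""
    name: str
    cpu_util: float   # average CPU utilization of the zone's computing nodes, 0..1
    mem_util: float   # average memory utilization, 0..1
    io_util: float    # average I/O utilization, 0..1
    queue_len: int    # current length of the management node's local task queue
    queue_max: int    # maximum length of that queue
    bw_in: float      # effective bandwidth, main service node -> zone (MB/s, assumed unit)
    bw_out: float     # effective bandwidth, zone -> main service node (MB/s, assumed unit)
    lat_in: float     # incoming network delay (s)
    lat_out: float    # outgoing network delay (s)
    status: str = "online"


@dataclass
class SubTask:
    input_mb: float   # input data size
    output_mb: float  # estimated output data size


def load_index(z, w_cpu=0.4, w_mem=0.2, w_io=0.2, w_q=0.2):
    """Weighted sum of the four load terms; the weight values are assumptions."""
    return (w_cpu * z.cpu_util + w_mem * z.mem_util + w_io * z.io_util
            + w_q * z.queue_len / z.queue_max)


def network_cost(task, z):
    """Input transfer overhead plus output transfer overhead."""
    inbound = task.input_mb / z.bw_in + z.lat_in
    outbound = task.output_mb / z.bw_out + z.lat_out
    return inbound + outbound


def scheduling_cost(task, z, time_cost, econ_cost,
                    alpha=1.0, beta=1.0, gamma=1.0, delta=10.0):
    """Weighted time, network and economic costs plus the load penalty term."""
    return (alpha * time_cost + beta * network_cost(task, z)
            + gamma * econ_cost + delta * load_index(z))


def pick_zone(task, zones, time_costs, econ_costs):
    """Return the online zone with the lowest cost (the online filter is an assumption)."""
    candidates = [z for z in zones if z.status == "online"]
    return min(candidates, key=lambda z: scheduling_cost(
        task, z, time_costs[z.name], econ_costs[z.name]))
```

Here `time_costs` and `econ_costs` are dictionaries keyed by zone name that stand in for the time and economic estimates, which the claims derive from historical data or benchmark models and which are not modeled in this sketch.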
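
The node selection of step S4 in claim 1 can be sketched in the same spirit. The claim names a resource fit term and a cache affinity term but the way they are combined is not reproduced in this text, so the sketch makes explicit assumptions: Euclidean distance is used as the distance function, the affinity term is subtracted from the distance, a node that never processed a sibling subtask gets zero affinity, and nodes without enough free resources are skipped. All function names are hypothetical.

```python
import math


def mismatch_score(req, avail, t_now, t_last, decay=0.1):
    """Lower is better. Subtracting the affinity term is an assumption of this sketch."""
    resource_fit = math.dist(req, avail)  # Euclidean distance, assumed choice of Dist
    if t_last is None:
        cache_affinity = 0.0              # node has no cached data for this parent task
    else:
        cache_affinity = math.exp(-decay * (t_now - t_last))
    return resource_fit - cache_affinity


def pick_node(req, nodes, t_now, decay=0.1):
    """nodes maps a node name to (available resource vector, last sibling timestamp or None)."""
    # Assumed feasibility filter: every available resource must cover the demand.
    feasible = {name: state for name, state in nodes.items()
                if all(a >= r for a, r in zip(state[0], req))}
    if not feasible:
        return None
    return min(feasible, key=lambda name: mismatch_score(
        req, feasible[name][0], t_now, feasible[name][1], decay))
```

With this form, two nodes that fit the demand equally well are separated by recency: the one that processed a subtask of the same parent task more recently scores lower and is chosen.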
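
The failure handling of claim 2 can be illustrated with a small heartbeat monitor. This is a sketch under assumptions: the class and method names are hypothetical, and the rule that derives the threshold (the number of heartbeat intervals that fit into the recovery time target, at least 1) is one possible reading of "set dynamically according to the heartbeat interval and the maximum failure recovery time target", not a rule stated in the claim.

```python
from collections import deque


class HeartbeatMonitor:
    def __init__(self, zones, heartbeat_interval, max_recovery_time):
        # Assumed rule: missed beats tolerated within the recovery time target.
        self.threshold = max(1, int(max_recovery_time // heartbeat_interval))
        self.missed = {z: 0 for z in zones}         # consecutive missed heartbeats
        self.status = {z: "online" for z in zones}  # status flag per zone
        self.view = set(zones)                      # stand-in for the global resource view
        self.assigned = {z: set() for z in zones}   # subtasks not yet confirmed complete
        self.global_queue = deque()                 # subtasks waiting to be rescheduled

    def assign(self, zone, task_id):
        self.assigned[zone].add(task_id)

    def confirm(self, zone, task_id):
        self.assigned[zone].discard(task_id)

    def tick(self, zones_heard):
        """Call once per heartbeat interval with the zones whose heartbeat arrived."""
        for zone in list(self.view):
            if zone in zones_heard:
                self.missed[zone] = 0
                continue
            self.missed[zone] += 1
            if self.missed[zone] >= self.threshold:
                self._fail(zone)

    def _fail(self, zone):
        # Mark offline, drop from the view, and requeue unconfirmed subtasks.
        self.status[zone] = "offline"
        self.view.discard(zone)
        self.global_queue.extend(sorted(self.assigned[zone]))
        self.assigned[zone].clear()
```

Subtasks placed in `global_queue` would then go through the cost-based zone selection again, restricted to the zones still present in `view`.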
Description
Distributed remote multi-availability-zone computing power scheduling method
Technical Field
The invention relates to the technical field of distributed management, in particular to a distributed multi-availability-zone computing power scheduling method.
Background
With the full arrival of the digital economy era, the total volume of data is growing explosively, and computing power has become a core productive force. However, computing power resources are distributed very unevenly across regions, levels and entities: the eastern region is rich in data but has high computing power costs, while the western region is rich in computing power resources but has insufficient demand. In recent years, distributed computing power scheduling technology has developed rapidly, shifting from early intra-cluster scheduling to cross-region, cross-entity global scheduling. Its core aim is to turn the traditional resource-based supply into a task-based service, and ultimately to make computing power resources ubiquitously available and efficiently utilized. Efficient and intelligent distributed computing power scheduling is of great strategic significance for improving digital competitiveness, promoting coordinated east-west development, and advancing the digital transformation of industry. As a digital foundation, it not only resolves the structural mismatch of computing power resources and lowers society's total computing cost, but also supports the development of frontier applications such as large AI model training, telemedicine and smart cities.
Although distributed computing power scheduling has developed remarkably, the prior art still faces a number of challenges and shortcomings in practical deployment, mainly the following. The computing power resources of different providers are highly heterogeneous, so computing power is difficult to evaluate and compare accurately during cross-domain scheduling, and optimal matching is hard to achieve. The scalability of distributed scheduling systems has a bottleneck: each added node places extra load on the main node, which constrains the construction of large-scale computing power pools. Existing scheduling algorithms still have room to improve in intelligence when handling ultra-large-scale, dynamically changing computing power environments. Meanwhile, there is still no mature business model or product solution for balancing economic factors, such as the bandwidth cost incurred by cross-region scheduling, against scheduling efficiency.
One prior invention discloses a distributed computing power scheduling system based on AIGC, which comprises a node acquisition module, a task management module, a computing power scheduling module, a monitoring module and a user service module, wherein the node acquisition module acquires the distributed computing power nodes available in the system; the task management module acquires computing power task information and decomposes each computing power task into a plurality of subtasks for execution; the computing power scheduling module allocates computing power tasks to the computing power nodes by combining their historical data and real-time data; the monitoring module monitors the task execution status of the computing power nodes; and the user service module provides a user interface for information interaction between the user and the system. As another example, the Chinese patent with grant publication number CN115051988B discloses a fusion scheduling system based on distributed computing power, which mainly relates to the technical field of computing power scheduling and addresses the technical problems that existing computing power scheduling schemes are time-consuming and labor-intensive, can cause data redundancy, and leave computing power idle in some regions, wasting computing power resources.
The system comprises a registration center module, a scheduling center module, a weight center module, a deployment center module and a scheduling center module, wherein the registration center module acquires the algorithm lists uploaded by the computing power nodes; the scheduling center module receives and distributes computing power requirements and algorithm deployment requests, the computing power requirements being issued in turn to all computing power nodes to be detected; the weight center module determines the computing power nodes to be detected and the priority corresponding to each of them; and the deployment center module acquires available computing power nodes or preset computing power nodes and deploys the algorithm on any node