CN-121996427-A - Computing power adaptive scheduling method, apparatus, device and medium for AI server cluster

Abstract

The method collects multidimensional resource state data and workload data for an AI server cluster in real time through a global monitoring module; dynamically predicts the short-term workload and long-term workload of each AI server to generate predicted load data for each AI server; adaptively adjusts the computing power of the AI server cluster according to the predicted load data, the multidimensional resource state data and the workload data to generate an initial scheduling strategy of the AI server cluster for the current computing power tasks; and fine-tunes the initial scheduling strategy with a resource allocation optimization algorithm based on a hypergraph model to generate a target resource allocation strategy of the AI server cluster. The computing power of the AI server cluster is thereby adjusted adaptively, and the task computing efficiency of the AI server cluster is improved.

Inventors

ZHU RONG
HU RUYUN

Assignees

深圳市宇晟实业有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2026-02-11

Claims (10)

1. An adaptive computing power scheduling method for an AI server cluster, the method comprising: acquiring, in real time, multidimensional resource state data and workload data corresponding to the AI server cluster, wherein the multidimensional resource state data indicates whether the multidimensional resources of each AI server are currently in an enabled state or a dormant state, and the workload data indicates the overall load of the multidimensional resources of each AI server; dynamically predicting, according to the workload data and historical load data of each AI server, the short-term workload and long-term workload of each AI server through a multi-scale spatiotemporal attention prediction model, to generate predicted load data corresponding to each AI server; adaptively adjusting, according to the predicted load data, the multidimensional resource state data and the workload data, the computing power of the AI server cluster through a multi-objective constrained reinforcement learning algorithm, to generate an initial scheduling strategy of the AI server cluster corresponding to the current computing power task, wherein the multi-objective constraint comprises at least one of a load balancing constraint, a communication cost constraint and a task resource demand constraint; and fine-tuning the initial scheduling strategy with a resource allocation optimization algorithm based on a hypergraph model to generate a target resource allocation strategy of the AI server cluster, and scheduling and allocating the computing power task through the target resource allocation strategy to generate a computing power task allocation scheme corresponding to each AI server, wherein in the hypergraph model, hypernodes represent AI server nodes, and hyperedges represent AI computing tasks and connect the set of all AI server nodes that meet the task resource requirements.
2. The adaptive computing power scheduling method for an AI server cluster according to claim 1, wherein dynamically predicting, according to the workload data and the historical load data of each AI server, the short-term workload and long-term workload of each AI server through the multi-scale spatiotemporal attention prediction model to generate the predicted load data corresponding to each AI server comprises: decomposing the historical load data to generate a trend term, a periodic term and a residual term of the historical load data, and modeling the three terms with polynomial regression, Fourier series decomposition and a long short-term memory (LSTM) network, respectively, to generate a cluster physical topology and network connection relation model, wherein the cluster physical topology and network connection relation model is a graph model; capturing associations among the nodes of the cluster physical topology and network connection relation model through a graph convolutional network to generate a load association relation and a propagation feature relation among the nodes; fusing, according to the load association relation and the propagation feature relation, the time-series features and the topological association features of the cluster physical topology and network connection relation model through a spatiotemporal attention mechanism to generate a multi-step predicted load result for the future load data of each AI server; and determining the predicted load data of each AI server from the multi-step predicted load result through an adaptive selector based on ensemble learning.
3. The adaptive computing power scheduling method for an AI server cluster according to claim 2, wherein capturing associations among the nodes of the cluster physical topology and network connection relation model through the graph convolutional network to generate the load association relation and the propagation feature relation among the nodes comprises: acquiring, in the cluster physical topology and network connection relation model, the adjacent nodes of each node and the connection relation between each node and its adjacent nodes; and determining the load association relation and the propagation feature relation between every two nodes according to the adjacent nodes and the connection relation.
4. The adaptive computing power scheduling method for an AI server cluster according to claim 1, wherein fine-tuning the initial scheduling strategy with the resource allocation optimization algorithm based on the hypergraph model to generate the target resource allocation strategy of the AI server cluster comprises: obtaining the hypergraph model, wherein the hypernodes in the hypergraph model represent the AI server nodes and carry multidimensional resource attribute vectors, and the hyperedges represent AI computing tasks and connect the set of all AI server nodes that can meet the task resource requirements; formulating the resource allocation problem as a constrained balanced hypergraph partitioning problem whose optimization objective is to minimize cross-node communication cost while meeting the task resource requirements and maintaining load balance among nodes, to generate a hypergraph partitioning algorithm to be solved; solving, according to the initial scheduling strategy, the hypergraph partitioning algorithm to be solved with an iteratively improved hierarchical method to generate an optimal partition of the hypergraph model; and generating the target resource allocation strategy of the AI server cluster according to the optimal partition and the corresponding set of AI server nodes.
5. The adaptive computing power scheduling method for an AI server cluster according to any one of claims 1-4, further comprising: optimizing the multi-scale spatiotemporal attention prediction model and the multi-objective constrained reinforcement learning algorithm according to the target resource allocation strategy to generate a target multi-scale spatiotemporal attention prediction model and a target multi-objective constrained reinforcement learning algorithm.
6. The adaptive computing power scheduling method for an AI server cluster according to claim 5, wherein scheduling and allocating the computing power task through the target resource allocation strategy to generate the computing power task allocation scheme corresponding to each AI server comprises: acquiring real-time data indicators corresponding to each AI server; performing, according to the target resource allocation strategy and the real-time data indicators, horizontal replica scaling or vertical resource quota adjustment on the computing power of each AI server; determining, according to the computing power task and the horizontal replica scaling or vertical resource quota adjustment, the AI server nodes to be added to or removed from the AI server cluster and the computing power tasks to be split; and generating the computing power task allocation scheme corresponding to each AI server according to the computing power tasks to be split and the AI server nodes to be added or removed.
7. The adaptive computing power scheduling method for an AI server cluster according to claim 5, wherein scheduling and allocating the computing power task through the target resource allocation strategy to generate the computing power task allocation scheme corresponding to each AI server comprises: acquiring a tiered quality of service (QoS) requirement corresponding to the computing power task; scheduling and allocating the computing power task through the target resource allocation strategy to generate an initial computing power task allocation scheme; and adjusting the initial computing power task allocation scheme according to the tiered QoS requirement to generate the computing power task allocation scheme corresponding to each AI server.
8. An adaptive computing power scheduling apparatus for an AI server cluster, the apparatus comprising: an acquisition module configured to acquire, in real time, multidimensional resource state data and workload data corresponding to the AI server cluster, wherein the multidimensional resource state data indicates whether the multidimensional resources of each AI server are currently in an enabled state or a dormant state, and the workload data indicates the overall load of the multidimensional resources of each AI server; a first generation module configured to dynamically predict, according to the workload data and historical load data of each AI server, the short-term workload and long-term workload of each AI server through a multi-scale spatiotemporal attention prediction model, to generate predicted load data corresponding to each AI server; a second generation module configured to adaptively adjust, according to the predicted load data, the multidimensional resource state data and the workload data, the computing power of the AI server cluster through a multi-objective constrained reinforcement learning algorithm, to generate an initial scheduling strategy of the AI server cluster corresponding to the current computing power task, wherein the multi-objective constraint comprises at least one of a load balancing constraint, a communication cost constraint and a task resource demand constraint; and an execution module configured to fine-tune the initial scheduling strategy with a resource allocation optimization algorithm based on a hypergraph model to generate a target resource allocation strategy of the AI server cluster, and to schedule and allocate the computing power task through the target resource allocation strategy to generate a computing power task allocation scheme corresponding to each AI server, wherein in the hypergraph model, hypernodes represent AI server nodes, and hyperedges represent AI computing tasks and connect the set of all AI server nodes that meet the task resource requirements.
9. An electronic device, comprising: a memory having a computer program stored thereon; and a processor for executing the computer program in the memory to implement the steps of the method of any one of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any one of claims 1-7.
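
As an illustration only, the hypergraph model recited in claims 1 and 4 can be sketched in Python: each hypernode is an AI server carrying a resource attribute vector, and each hyperedge is a task connected to every server whose resources meet the task's requirements. The sketch below is not the claimed optimization algorithm. The two resource dimensions (`gpus`, `mem_gb`), the names `Resource`, `build_hyperedges` and `greedy_assign`, and the least-loaded greedy placement are all assumptions introduced here as a simple stand-in for the constrained balanced hypergraph partitioning step, and communication cost is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Resource attribute vector; the two dimensions are hypothetical."""
    gpus: int
    mem_gb: int

    def covers(self, demand: "Resource") -> bool:
        # True when every dimension of the demand fits in this vector.
        return self.gpus >= demand.gpus and self.mem_gb >= demand.mem_gb


def build_hyperedges(servers, tasks):
    """Map each task to the set of servers (hypernodes) that can host it."""
    return {
        task: frozenset(name for name, cap in servers.items() if cap.covers(demand))
        for task, demand in tasks.items()
    }


def greedy_assign(servers, tasks):
    """Place each task on the eligible server with the most free GPUs.

    This is an assumed stand-in for the partitioning solver, not the
    method of the disclosure. Tasks that fit nowhere map to None.
    """
    edges = build_hyperedges(servers, tasks)
    free = dict(servers)  # remaining capacity per server
    assignment = {}
    # Largest GPU demand first; ties broken by task name for determinism.
    for task in sorted(tasks, key=lambda t: (-tasks[t].gpus, t)):
        demand = tasks[task]
        feasible = [n for n in edges[task] if free[n].covers(demand)]
        if not feasible:
            assignment[task] = None
            continue
        chosen = min(feasible, key=lambda n: (-free[n].gpus, n))
        free[chosen] = Resource(
            free[chosen].gpus - demand.gpus,
            free[chosen].mem_gb - demand.mem_gb,
        )
        assignment[task] = chosen
    return assignment


# Hypothetical example data: three servers and three tasks.
SERVERS = {"a": Resource(8, 512), "b": Resource(4, 256), "c": Resource(2, 64)}
TASKS = {"t1": Resource(4, 200), "t2": Resource(2, 32), "t3": Resource(8, 400)}
```

With this example data, `build_hyperedges(SERVERS, TASKS)` connects `t3` only to server `a`, `t1` to `a` and `b`, and `t2` to all three servers, and `greedy_assign(SERVERS, TASKS)` places `t3` on `a`, `t1` on `b` and `t2` on `c`. A real solver would also weigh cross-node communication cost, which this sketch leaves out.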

Description

Computing power adaptive scheduling method, apparatus, device and medium for AI server cluster

Technical Field

The disclosure relates to the technical field of communication, and in particular to a computing power adaptive scheduling method, apparatus, device and medium for an AI server cluster.

Background

With the wide application and popularization of artificial intelligence and large models, the demand for AI computing power resources keeps growing and shows an exponential trend. At present, server demand is dominated by the AI servers of intelligent computing centers. Each AI server carries multiple GPU cards that provide AI computing power, and the AI services of a data center are deployed on the GPU cards of the AI server resource pools. Current AI servers mainly use the GPU to execute AI computing tasks, while the CPU is responsible for data input and for running the AI business programs. In an intelligent computing center, the GPU is the main source of AI computing power and is scheduled and managed through a cloud platform.

To improve the management efficiency and utilization of computing power resources, current servers rely on virtualization technology so that the cloud platform can schedule and manage those resources. In the related art, a GPU card uses SR-IOV virtualization to virtualize one PF (Physical Function) into multiple VF (Virtual Function) devices for use by virtual machines, thereby improving the computing power utilization of the GPU. However, cold migration is currently used in most SR-IOV scenarios. Cold migration interrupts service for a long time, which severely affects training and inference workloads that need to run continuously, and leads to waste and imbalance of GPU computing power resources.

The computing power scheduling algorithms used by live migration in other scenarios do not fully consider the characteristics of the SR-IOV scenario and therefore cannot determine the optimal target computing node for live migration in the SR-IOV scenario.

Disclosure of Invention

The disclosure aims to provide a computing power adaptive scheduling method, apparatus, device and medium for an AI server cluster, so as to solve the technical problem that the computing power of servers cannot be adjusted adaptively.

To achieve the above object, a first aspect of the present disclosure provides a computing power adaptive scheduling method for an AI server cluster, the method comprising: acquiring, in real time, multidimensional resource state data and workload data corresponding to the AI server cluster, wherein the multidimensional resource state data indicates whether the multidimensional resources of each AI server are currently in an enabled state or a dormant state, and the workload data indicates the overall load of the multidimensional resources of each AI server; dynamically predicting, according to the workload data and historical load data of each AI server, the short-term workload and long-term workload of each AI server through a multi-scale spatiotemporal attention prediction model, to generate predicted load data corresponding to each AI server; adaptively adjusting, according to the predicted load data, the multidimensional resource state data and the workload data, the computing power of the AI server cluster through a multi-objective constrained reinforcement learning algorithm, to generate an initial scheduling strategy of the AI server cluster corresponding to the current computing power task, wherein the multi-objective constraint comprises at least one of a load balancing constraint, a communication cost constraint and a task resource demand constraint; and fine-tuning the initial scheduling strategy with a resource allocation optimization algorithm based on a hypergraph model to generate a target resource allocation strategy of the AI server cluster, and scheduling and allocating the computing power task through the target resource allocation strategy to generate a computing power task allocation scheme corresponding to each AI server, wherein in the hypergraph model, hypernodes represent AI server nodes, and hyperedges represent AI computing tasks and connect the set of all AI server nodes that meet the task resource requirements.

Optionally, in some embodiments, dynamically predicting, according to the workload data and the historical load data of each AI server, the short-term workload and long-term workload of each AI server through the multi-scale spatiotemporal attention prediction model to generate the predicted load data corresponding to each AI server comprises: decomposing the historical load data to generate a trend term, a periodic term and a residual term of the historical load data, and mo