
CN-122019177-A - Heterogeneous computing power unified scheduling method and system for multi-intelligent computing center

CN 122019177 A

Abstract

The invention discloses a unified scheduling method and system for heterogeneous computing power across multiple intelligent computing centers. The method abstracts the physical heterogeneous resources of a plurality of intelligent computing centers into Kubernetes virtual nodes through a four-tuple mapping model, forming a unified resource view; establishes a multi-level resource preference constraint expression and matching mechanism based on combinations of intelligent computing center identifier and computing power pool identifier; defines a unified processor type enumeration and a structured capability description framework; decouples the scheduling process into three stages of filtering, scoring, and pre-binding through a multi-stage plug-in scheduling framework, enabling flexible extension of scheduling strategies via a standardized plug-in interface; and, for interactive jobs, performs capacity gating and specification availability verification at the admission stage. The method addresses problems such as fragmented inter-center resource views, inconsistent descriptions of heterogeneous devices, rigid scheduling strategies, and poor interactive job experience, achieving unified, intelligent, and efficient scheduling of heterogeneous computing power resources across multiple intelligent computing centers.
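As a rough illustration of the four-tuple mapping model described in the abstract, the sketch below (in Python) renders one (center, pool, processor type, specification model) tuple as a labeled Kubernetes-style virtual node manifest. The label keys (`example.io/...`) and field names are invented for illustration; the patent does not specify concrete label names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FourTuple:
    center_id: str       # intelligent computing center identifier
    pool_id: str         # computing power pool identifier
    processor_type: str  # value from the unified processor type enumeration
    spec_model: str      # specification model of the device

def to_virtual_node(t: FourTuple) -> dict:
    """Render the four-tuple as a Kubernetes-style virtual node manifest
    whose metadata labels carry the state the controller synchronizes."""
    name = f"vn-{t.center_id}-{t.pool_id}-{t.processor_type}-{t.spec_model}".lower()
    return {
        "apiVersion": "v1",
        "kind": "Node",
        "metadata": {
            "name": name,
            "labels": {
                "example.io/center-id": t.center_id,
                "example.io/pool-id": t.pool_id,
                "example.io/processor-type": t.processor_type,
                "example.io/spec-model": t.spec_model,
            },
        },
    }

node = to_virtual_node(FourTuple("center-a", "pool-1", "GPU", "modelx"))
```

In the scheme of step S1, a virtual node controller would keep such labels, together with capacity fields, in sync with each center's real-time resource state.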

Inventors

  • WU JIAWEN
  • ZHAO YONGJIE
  • XIN BO

Assignees

  • 上海国创驭算人工智能科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-02-03

Claims (10)

  1. A heterogeneous computing power unified scheduling method for a multi-intelligent computing center, characterized by comprising the following steps: S1, virtual node abstraction: mapping the physical heterogeneous computing power resources of a plurality of intelligent computing centers into virtual nodes in a Kubernetes cluster according to a four-tuple mapping model, wherein each virtual node represents a resource combination uniquely determined by an intelligent computing center identifier, a computing power pool identifier, a processor type, and a specification model, and a virtual node controller synchronizes the real-time resource state of each center by carrying metadata in labels; S2, preference constraint expression and matching: receiving a job request submitted by a user, wherein the request comprises multi-level resource preference constraints defined on the binary combination of intelligent computing center identifier and computing power pool identifier, the multi-level resource preference constraints comprising at least hard constraints and soft preferences; S3, unified processor type management: performing compatibility verification between the resource requirements of the job request and the processor capability represented by the virtual nodes, based on a predefined unified processor type enumeration and processor capability description framework; S4, multi-stage plug-in scheduling: making the scheduling decision for the job through a multi-stage plug-in scheduling framework comprising at least a filtering stage, a scoring stage, and a pre-binding stage executed in sequence, wherein each stage performs the corresponding filtering, scoring, or pre-binding operations through plug-ins conforming to a unified interface specification so as to determine a target virtual node; and S5, interactive job gating: in response to the job being identified as an interactive job, performing capacity gating and specification availability checks at the job submission and admission stage, and feeding back the check results in real time.
  2. The heterogeneous computing power unified scheduling method for the multi-intelligent computing center according to claim 1, wherein the multi-level resource preference constraints comprise: hard constraints, specifying the intelligent computing center and/or computing power pool to which the job must be scheduled; soft preferences, specifying one or more weighted combinations of intelligent computing center and computing power pool preferences; and extension constraints, specifying at least one of a minimum number of available nodes, a maximum network delay, or a cost budget.
  3. The heterogeneous computing power unified scheduling method for the multi-intelligent computing center according to claim 2, wherein in the scoring stage, the rule for scoring virtual nodes based on the soft preferences comprises: if both the intelligent computing center identifier and the computing power pool identifier of a virtual node exactly match the identifiers specified by a soft preference item, the node obtains the full weight score of that preference item; if only the intelligent computing center identifier of the virtual node matches a soft preference item, the node obtains a first proportional weight score of that preference item, lower than the full weight; and if only the computing power pool identifier of the virtual node matches a soft preference item, the node obtains a second proportional weight score of that preference item, lower than the first proportional weight.
  4. The heterogeneous computing power unified scheduling method for the multi-intelligent computing center according to claim 1, wherein the unified processor type enumeration comprises at least a graphics processing unit (GPU), a neural network processing unit (NPU), a deep computing unit (DCU), a machine learning unit (MLU), a general computing unit, and a central processing unit (CPU); the processor capability description framework is defined in a protocol buffer message format, the message structure comprising fields describing basic attributes, computing power, memory specifications, interconnection topology, and software stack compatibility; and the unified processor type management of step S3 further comprises maintaining a processor specification registry that stores the capability description data of each processor in a key-value mapping structure, registering the detailed specification and capability information of heterogeneous processors of different vendors and models through the unified structured capability description framework.
  5. The heterogeneous computing power unified scheduling method for the multi-intelligent computing center according to claim 1, wherein, in the multi-stage plug-in scheduling framework: the filtering operations performed in the filtering stage comprise at least center/pool constraint filtering based on the hard constraints, compatibility filtering based on the unified processor types, filtering based on resource capacity, and filtering based on node taints and job tolerations; the scoring operations performed in the scoring stage comprise at least a center/pool preference score based on the soft preference levels, a score based on resource utilization, a score based on topological distribution, and a score based on resource cost; and the operations performed in the pre-binding stage comprise at least reserving resources at the target intelligent computing center, checking user or project quotas, and writing scheduling binding metadata, wherein these operations support rollback upon failure.
  6. The heterogeneous computing power unified scheduling method for the multi-intelligent computing center according to claim 1, wherein all plug-ins in the filtering stage, the scoring stage, and the pre-binding stage implement a predefined programming-language interface; the execution order and priority of the plug-ins are orchestrated through a configuration file, wherein the plug-ins of the filtering stage are executed serially and the plug-ins of the scoring stage support parallel execution.
  7. The heterogeneous computing power unified scheduling method for the multi-intelligent computing center according to claim 1, wherein step S5 comprises: S501, automatically identifying Notebook-type interactive jobs by checking one or more of a preset job type label, a container image name keyword, an exposed port number, and a resource request pattern; S502, performing capacity gating on the identified interactive jobs across verification dimensions, the verification dimensions comprising a specification model whitelist, a per-center concurrency limit, real-time resource availability, and user quota; and S503, returning one of four result types, namely immediately available, insufficient, temporarily unavailable, or permanently unavailable, according to the verification results, and providing alternative specification or waiting time suggestions when resources are insufficient.
  8. A heterogeneous computing power unified scheduling system for a multi-intelligent computing center, comprising: a virtual node abstraction module, used for abstracting the physical resources of each intelligent computing center into virtual nodes in a Kubernetes cluster according to the four-tuple mapping model, and configuring for the virtual nodes metadata labels comprising the intelligent computing center identifier, computing power pool identifier, processor type, and specification model; a resource synchronization controller, connected to the virtual node abstraction module and the plurality of intelligent computing centers, used for synchronizing the real-time resource state of each center and updating the resource information of the corresponding virtual nodes; a scheduling request interface, used for receiving job requests submitted by users; a constraint parsing module, connected to the scheduling request interface, used for parsing from the job request the multi-level resource preference constraints defined on the binary combination of intelligent computing center identifier and computing power pool identifier; a unified specification registry, used for storing and managing heterogeneous processor specification information based on the unified processor type enumeration and processor capability description framework; a multi-stage plug-in scheduling engine, connected respectively to the constraint parsing module, the unified specification registry, and the virtual node abstraction module, used for loading and sequentially executing a filtering plug-in set, a scoring plug-in set, and a pre-binding plug-in set so as to determine a target node from the virtual nodes according to the multi-level resource preference constraints, processor compatibility, and resource capacity; and a job pre-verification module, connected to the scheduling request interface, used for identifying interactive jobs and performing capacity gating and specification availability verification on them before scheduling.
  9. The system of claim 8, wherein the resource synchronization controller is configured to communicate with the resource management system of each intelligent computing center through Representational State Transfer (REST) application programming interfaces, to periodically synchronize resource information so as to update the capacity and availability of the virtual nodes, and to add scheduling-prohibited taints to the virtual nodes of intelligent computing centers that are unreachable over the network.
  10. The system of claim 8, wherein the job pre-verification module comprises: a job identification unit, used for identifying Notebook-type interactive jobs by checking one or more of a preset job type label, a container image name keyword, an exposed port number, and a resource request pattern; a gating execution unit, used for performing capacity gating on the identified jobs, including specification model whitelist verification, per-center concurrency limit checks, real-time resource availability queries, and user quota verification; and a feedback unit, used for generating and returning feedback information comprising the resource availability state and substitution suggestions according to the gating results.
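The three-tier soft-preference scoring rule of claim 3 can be sketched as follows. The specific ratios are assumed values for illustration; the claim only requires that the second proportional weight be lower than the first, and the first lower than the full weight.

```python
FIRST_RATIO = 0.6   # center-only match ratio (assumed value)
SECOND_RATIO = 0.3  # pool-only match ratio (assumed value)

def score_node(node_center, node_pool, preferences):
    """Score a virtual node against soft-preference items, each given as
    (center_id, pool_id, weight). Exact match earns the full weight;
    center-only and pool-only matches earn decreasing fractions of it."""
    total = 0.0
    for center, pool, weight in preferences:
        if node_center == center and node_pool == pool:
            total += weight                 # full weight: exact match
        elif node_center == center:
            total += weight * FIRST_RATIO   # only the center matches
        elif node_pool == pool:
            total += weight * SECOND_RATIO  # only the pool matches
    return total
```

For example, against a single preference `("bj", "a100", 10)`, an exact-match node scores 10, a node in the same center but a different pool scores 6, and a node in a different center whose pool identifier matches scores 3.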
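A minimal sketch of the filter/score/pre-bind pipeline of claims 1, 5, and 6: filter plug-ins run serially, score plug-ins are independent and may run in parallel, and pre-binding supports rollback on failure. All interface names here are illustrative assumptions, not the patent's actual plug-in API.

```python
from concurrent.futures import ThreadPoolExecutor

class SchedulingError(Exception):
    pass

def schedule(job, nodes, filters, scorers, pre_binders):
    # Filtering stage: executed serially; each plug-in narrows the candidates.
    for f in filters:
        nodes = [n for n in nodes if f(job, n)]
    if not nodes:
        raise SchedulingError("no feasible virtual node")
    # Scoring stage: plug-ins are independent, so they may run in parallel.
    with ThreadPoolExecutor() as pool:
        scores = {
            n["name"]: sum(pool.map(lambda s, n=n: s(job, n), scorers))
            for n in nodes
        }
    target = max(nodes, key=lambda n: scores[n["name"]])
    # Pre-binding stage: reserve resources, check quota, write binding
    # metadata; roll back already-applied steps if any step fails.
    done = []
    try:
        for pb in pre_binders:
            pb.apply(job, target)
            done.append(pb)
    except SchedulingError:
        for pb in reversed(done):
            pb.rollback(job, target)
        raise
    return target

class Reserve:
    def apply(self, job, node):
        node.setdefault("reserved", []).append(job)
    def rollback(self, job, node):
        node["reserved"].remove(job)

nodes = [{"name": "vn-a", "free": 2}, {"name": "vn-b", "free": 8}]
target = schedule(
    "job-1", nodes,
    filters=[lambda j, n: n["free"] > 0],
    scorers=[lambda j, n: n["free"]],
    pre_binders=[Reserve()],
)
```

In the example run, both nodes pass the capacity filter, the utilization scorer favors `vn-b`, and the reservation pre-binder records the job on the target node.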

Description

Heterogeneous computing power unified scheduling method and system for a multi-intelligent computing center

Technical Field

The invention relates to the technical field of cloud computing and computing power scheduling, and in particular to a heterogeneous computing power unified scheduling method and system for a multi-intelligent computing center.

Background

With the rapid development of artificial intelligence and high-performance computing applications, a centralized single computing power center can no longer meet ever-increasing large-scale, distributed computing demands. Against this background, intelligent computing center networks composed of multiple geographically distributed centers with heterogeneous devices have emerged, aiming to improve the utilization efficiency of overall computing resources and the reliability of services through pooling and collaborative scheduling. However, under this new multi-center, heterogeneous computing architecture, traditional resource scheduling techniques face serious challenges, and existing mainstream solutions have the following obvious drawbacks:

1) Heterogeneous computing power resource abstraction and description lack unified standards. The cloud-native orchestration system represented by Kubernetes has become the de facto standard for data center resource management, but its default scheduler is designed mainly for homogeneous general-purpose computing resources (e.g. CPU and memory). For heterogeneous accelerators such as GPUs, NPUs, and DPUs, although they can be declared through the extended resource mechanism, that mechanism provides only a simple quantity identifier and lacks standardized descriptions of key attributes such as device model, architecture generation, computing power, memory bandwidth, and interconnection topology. The device plugins provided by different vendors (such as NVIDIA, Cambricon, etc.) each employ proprietary resource names and label systems, so the upper-layer scheduling system cannot understand and match diverse computing power demands and supplies from a unified, canonical perspective, resulting in complex and fragmented scheduling logic and high system maintenance and cross-platform migration costs.

2) Existing scheduling frameworks (such as the Kubernetes native scheduler and enhancement projects such as Volcano) mainly optimize resource competition within a single cluster, and their scheduling primitives (such as node selectors and affinity/anti-affinity) were not designed with a multi-level, geographically dispersed resource pool model such as "intelligent computing center - computing pool" in mind. It is therefore difficult for users to express precisely complex cross-center preferences and constraints such as "prefer the A100 pool in the Beijing center, fall back to a pool of the same model in the Shanghai center, and always avoid any center under maintenance". Although multi-cluster management projects (such as Karmada) can distribute applications across multiple clusters, their policy abstraction level is high: awareness of and decision support for fine-grained scheduling factors such as computing power type, resource pool level, cross-center network quality, and cost are weak, so business-aware, intelligent cross-center computing power optimization and load balancing cannot be achieved.

3) Interactive job scheduling feedback is laggy and resource matching efficiency is low. For interactive jobs requiring a quick start, such as Notebooks, conventional scheduling systems generally adopt a passive "submit first, queue, then match" mode. When a user submits a job, the user cannot predict resource availability, often waits in a long queue, and may ultimately fail due to insufficient resources. On the one hand the user experience is poor; on the other hand the scheduling system performs a large number of invalid matching attempts, increasing system overhead and reducing overall resource utilization efficiency. The root cause is the lack of a mechanism for real-time capacity gating and intelligent immediate feedback at the job submission and admission stage.

4) The scheduling system architecture is tightly coupled, with poor extensibility and maintainability. Existing scheduling logic is often deeply embedded in the scheduler's core code, so adding a new scheduling policy (e.g. real-time cost-based scheduling or topology-affinity-based scheduling) usually requires modifying the scheduler source code directly. This tightly coupled architecture makes the system difficult to extend; the development, testing, and release cycle for a new policy is long and risky, and the system cannot adapt to the rapidly-changing business requirements and
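The real-time capacity gating whose absence drawback 3) identifies, and which claims 7 and 10 provide, can be sketched as a single admission check that returns one of the four result types of step S503. The dimension names, parameter names, and check order below are illustrative assumptions.

```python
def gate_interactive_job(spec_model, whitelist, running, concurrency_limit,
                         free_units, user_quota_left):
    """Admission-stage capacity gating for an interactive job.
    Returns one of the four S503 result types."""
    if spec_model not in whitelist:
        return "permanently_unavailable"  # spec model is not offered at all
    if user_quota_left <= 0 or running >= concurrency_limit:
        return "temporarily_unavailable"  # quota or concurrency blocks it now
    if free_units <= 0:
        return "insufficient"             # no capacity; suggest alternatives
    return "immediately_available"
```

On an "insufficient" result, the feedback unit of claim 10 would additionally attach alternative-specification or waiting-time suggestions before returning to the user.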