EP-4738115-A1 - SCHEDULING OF TASKS FOR ACCELERATORS
Abstract
A computer system is configured to schedule tasks to be performed by a plurality of accelerators including a first accelerator and a second accelerator, by performing the steps of: receiving a plurality of tasks; determining, for each of the tasks, a time period during which the first accelerator would be expected to consume maximum power while performing the task; determining, for each of the tasks, a time period during which the second accelerator would be expected to consume maximum power while performing the task; scheduling the tasks across the first and second accelerators according to the determined time periods of maximum power consumption, wherein based on the scheduling, there is no overlap between time periods for maximum power consumption in the first accelerator and time periods for maximum power consumption in the second accelerator; and instructing the first and second accelerators to perform the tasks according to the scheduling.
Inventors
- CHEN, XIAOQI
- WEI, MICHAEL
Assignees
- VMware LLC
Dates
- Publication Date
- 20260506
- Application Date
- 20251015
Claims (15)
- A computer system including a processor and memory, wherein the processor executes instructions stored in the memory to schedule tasks to be performed by a plurality of accelerators including a first accelerator and a second accelerator, based on information about power consumption, by performing the following steps: receiving a plurality of tasks to be performed on the plurality of accelerators; determining, for each of the tasks, a time period during which the first accelerator would be expected to consume maximum power for the task while performing the task; determining, for each of the tasks, a time period during which the second accelerator would be expected to consume maximum power for the task while performing the task; scheduling the tasks across the first and second accelerators according to the determined time periods of maximum power consumption for the first and second accelerators, wherein based on the scheduling, there is no overlap between time periods for maximum power consumption in the first accelerator and time periods for maximum power consumption in the second accelerator; and instructing the first and second accelerators to perform the tasks according to the scheduling.
- The computer system of claim 1, wherein the first accelerator and the second accelerator are different models of accelerators, and the steps further include: determining, for a first task of the tasks, that the first accelerator and the second accelerator have different time periods during which the respective accelerator would be expected to consume maximum power for the first task while performing the first task.
- The computer system of claim 1 or claim 2, wherein the steps further include: determining that the time period during which the first accelerator would be expected to consume maximum power for a first task of the tasks while performing the first task is different from the time period during which the first accelerator would be expected to consume maximum power for a second task of the tasks while performing the second task.
- The computer system of one of claims 1 to 3, wherein the tasks require inferences to be generated by artificial neural networks (ANNs) executed by the first and second accelerators, and the steps further include: translating each of the tasks into inputs corresponding to input nodes of the ANNs before instructing the first and second accelerators to perform the tasks.
- The computer system of one of claims 1 to 4, wherein the first and second accelerators execute on a workload computer of the computer system that is separate from a management computer of the computer system that schedules the tasks across the first and second accelerators, and the steps further include: transmitting, by the management computer over a network to the workload computer, instructions to queue the tasks for performance by the first and second accelerators according to the scheduling.
- The computer system of one of claims 1 to 5, wherein the first accelerator executes on a first workload computer of the computer system that is separate from a management computer of the computer system that schedules the tasks across the first and second accelerators, the second accelerator executes on a second workload computer of the computer system, and the steps further include: transmitting, by the management computer over a network to the first and second workload computers, instructions to queue the tasks for performance by the first and second accelerators according to the scheduling.
- The computer system of one of claims 1 to 6, wherein the steps further include: retrieving the tasks from a gateway of a data center, by one or more load balancers executing in the computer system.
- A method of scheduling tasks to be performed by a plurality of accelerators including a first accelerator and a second accelerator, based on information about power consumption, the method comprising: receiving a plurality of tasks to be performed on the plurality of accelerators; determining, for each of the tasks, a time period during which the first accelerator would be expected to consume maximum power for the task while performing the task; determining, for each of the tasks, a time period during which the second accelerator would be expected to consume maximum power for the task while performing the task; scheduling the tasks across the first and second accelerators according to the determined time periods of maximum power consumption for the first and second accelerators, wherein based on the scheduling, there is no overlap between time periods for maximum power consumption in the first accelerator and time periods for maximum power consumption in the second accelerator; and instructing the first and second accelerators to perform the tasks according to the scheduling.
- The method of claim 8, wherein the first accelerator and the second accelerator are different models of accelerators, the method further comprising: determining, for a first task of the tasks, that the first accelerator and the second accelerator have different time periods during which the respective accelerator would be expected to consume maximum power for the first task while performing the first task.
- The method of claim 8 or claim 9, further comprising: determining that the time period during which the first accelerator would be expected to consume maximum power for a first task of the tasks while performing the first task is different from the time period during which the first accelerator would be expected to consume maximum power for a second task of the tasks while performing the second task.
- The method of one of claims 8 to 10, wherein the tasks require inferences to be generated by artificial neural networks (ANNs) executed by the first and second accelerators, the method further comprising: translating each of the tasks into inputs corresponding to input nodes of the ANNs before instructing the first and second accelerators to perform the tasks.
- The method of one of claims 8 to 11, further comprising: transmitting, over a network to a workload computer, instructions to queue the tasks for performance by the first and second accelerators according to the scheduling.
- The method of one of claims 8 to 12, further comprising: transmitting, over a network to a first workload computer that includes the first accelerator and to a second workload computer that includes the second accelerator, instructions to queue the tasks for performance by the first and second accelerators according to the scheduling.
- A non-transitory computer-readable medium comprising instructions that are executable in a computer system, wherein the instructions when executed cause the computer system to carry out a method of scheduling tasks to be performed by a plurality of accelerators including a first accelerator and a second accelerator, based on information about power consumption according to one of claims 8 to 13.
- The non-transitory computer-readable medium of claim 14, wherein the method further comprises: retrieving the tasks from a gateway of a data center, by one or more load balancers executing in the computer system.
Description
Background

Data centers are facilities that house large numbers of computers and networking equipment for storing, managing, and distributing data. Data centers execute software on hardware platforms of the computers to provide cloud computing services remotely to users. A growing number of such services are increasingly heavyweight, including those that perform deep learning for artificial intelligence (AI) applications. Computers typically use accelerators for performing computationally intensive tasks for these applications. As used herein, accelerators are specialized hardware designed for performing tasks such as executing artificial neural networks (ANNs) and rendering images and video more efficiently than general-purpose central processing units (CPUs). Examples of accelerators include graphics processing units (GPUs), tensor processing units (TPUs), neural processing units (NPUs), and field-programmable gate arrays (FPGAs). The proliferation of computationally intensive applications has increased the demand for accelerators in data centers. Accelerators consume a significant amount of electricity, especially high-performance models thereof. However, while performing tasks, accelerators typically do not consume power at a constant level. As a rough example, a particular task that a particular GPU takes 10 milliseconds to execute may cause that GPU to consume 600 watts for 4 milliseconds, and then cause the GPU to consume 300 watts for the remaining 6 milliseconds. In other words, such a task may only cause the GPU to execute at maximum power for said task for the first 4 milliseconds of the task's execution. As used herein, the "maximum power" for a particular task executing on a particular accelerator is the most power (most energy in a given time period) that the task causes the accelerator to consume, e.g., 600 watts in the above example.
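The rough example above can be checked with a line of arithmetic. The Python below is purely illustrative and not part of the patent; the variable names and the millijoule bookkeeping are assumptions:

```python
# Energy drawn by the example task: 600 W for 4 ms, then 300 W for 6 ms.
# One watt-millisecond equals one millijoule.
energy_mj = 600 * 4 + 300 * 6  # 2400 mJ + 1800 mJ
avg_power_w = energy_mj / 10   # averaged over the 10 ms runtime
print(energy_mj, avg_power_w)  # 4200 mJ total, 420.0 W average
```

The average draw (420 W) is well below the 600 W peak, which is why staggering the peak windows of different accelerators lets a fixed power budget serve more of them.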
Such maximum power may be as great as the thermal design power (TDP) of an accelerator, which is the maximum heat that accelerator generates, or such maximum power may be less than the TDP. At varying granularities, data centers have power constraints that limit the execution of accelerators. For example, the computers may be organized into racks, and the wires in such racks have finite power capacities, e.g., 6 kilowatts per rack. Accordingly, the racks can only execute limited numbers of accelerators at any given time without exceeding such capacities and damaging equipment. Such number of accelerators is especially low during times when many accelerators are simultaneously consuming maximum power. Accordingly, during such times, the racks are inefficiently using their maximum power capacities, leading to latencies in executing tasks. Computer systems are desired that reduce such latencies when executing tasks on accelerators.

Summary

One or more embodiments provide a computer system including a processor and memory, wherein the processor executes instructions stored in the memory to schedule tasks to be performed by a plurality of accelerators including a first accelerator and a second accelerator, based on information about power consumption.
By executing such instructions, the computer system performs the steps of: receiving a plurality of tasks to be performed on the plurality of accelerators; determining, for each of the tasks, a time period during which the first accelerator would be expected to consume maximum power for the task while performing the task; determining, for each of the tasks, a time period during which the second accelerator would be expected to consume maximum power for the task while performing the task; scheduling the tasks across the first and second accelerators according to the determined time periods of maximum power consumption for the first and second accelerators, wherein based on the scheduling, there is no overlap between time periods for maximum power consumption in the first accelerator and time periods for maximum power consumption in the second accelerator; and instructing the first and second accelerators to perform the tasks according to the scheduling. Further embodiments include a method comprising the above steps and a non-transitory computer-readable storage medium comprising instructions that cause a computer system to carry out the above steps.

Brief Description of the Drawings

Figure 1 is a block diagram of a computer system in which a first embodiment may be implemented.
Figure 2 is a flow diagram of a method that may be performed by computers of the computer system to execute tasks using GPUs, according to the first embodiment.
Figure 3 is a block diagram of a computer system in which a second embodiment may be implemented.
Figure 4 is a flow diagram of a method that may be performed by computers of the computer system to execute a task using a GPU, according to the second embodiment.
Figure 5 is a block diagram of a computer system in which a third embodiment may be implemented.
Figure 6 is a flow diagram of a meth
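The scheduling step summarized above can be illustrated with a minimal sketch. The Python below is not from the patent; the names (`Task`, `schedule`), the 1 ms delay granularity, and the greedy strategy are all illustrative assumptions. It delays each task's start time until its expected maximum-power window is disjoint from every maximum-power window already booked, which satisfies the no-overlap condition of claim 1:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    """Illustrative task profile; field names are assumptions."""
    name: str
    total_ms: int        # total runtime on the assigned accelerator
    peak_offset_ms: int  # when the max-power window begins, relative to start
    peak_ms: int         # how long the accelerator draws maximum power

def overlaps(a, b):
    """True if half-open intervals [a0, a1) and [b0, b1) intersect."""
    return a[0] < b[1] and b[0] < a[1]

def schedule(assignment):
    """Greedy sketch: assignment is a list of (task, accelerator_id) pairs.
    Each task is delayed until its max-power window is disjoint from every
    max-power window already booked on any accelerator."""
    peak_windows = []  # absolute (start, end) of booked max-power windows
    next_free = {}     # accelerator_id -> earliest time it is idle
    plan = {}
    for task, acc in assignment:
        def peak(start):
            lo = start + task.peak_offset_ms
            return (lo, lo + task.peak_ms)
        start = next_free.get(acc, 0)
        while any(overlaps(peak(start), w) for w in peak_windows):
            start += 1  # push back in 1 ms steps (assumed granularity)
        plan[task.name] = (acc, start)
        peak_windows.append(peak(start))
        next_free[acc] = start + task.total_ms
    return plan

# The 10 ms task from the background: 600 W for the first 4 ms, then 300 W.
t1 = Task("t1", total_ms=10, peak_offset_ms=0, peak_ms=4)
t2 = Task("t2", total_ms=10, peak_offset_ms=0, peak_ms=4)
plan = schedule([(t1, "acc1"), (t2, "acc2")])
print(plan)  # t2 starts at 4 ms so the two peaks never coincide
```

With both tasks peaking at the start of execution, the sketch starts the second task 4 ms after the first, so the two 600 W windows never coincide even though the tasks otherwise run concurrently. A production scheduler would also honor per-rack power budgets and per-accelerator queue depths, which this sketch omits.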