KR-102964757-B1 - Power performance-based system management
Abstract
The method of the present invention comprises: receiving a workload for a computer system; sweeping at least one parameter of the computer system while executing the workload; monitoring one or more features of the computer system, including the total power consumption of the computer system, while sweeping the at least one parameter; generating a power profile for the workload indicating a respective selected value for the at least one parameter based on an analysis of the monitored total power consumption of the computer system while sweeping the at least one parameter; and executing the workload based on the respective selected value of the at least one parameter.
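The sweep, monitor, and profile steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `measure` hook, the toy power and throughput model in `simulated_measure`, and the choice of performance per watt as the selection criterion are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Sample:
    """One observation taken at a single parameter value during the sweep."""
    value: float          # swept parameter value, here CPU frequency in GHz
    total_power_w: float  # monitored total system power in watts
    throughput: float     # workload performance in arbitrary units


def sweep_parameter(
    values: Iterable[float],
    measure: Callable[[float], Tuple[float, float]],
) -> List[Sample]:
    """Run the workload at each candidate value and record what was monitored.

    `measure` is a hypothetical hook returning (total_power_w, throughput).
    """
    return [Sample(v, *measure(v)) for v in values]


def select_value(samples: List[Sample]) -> Sample:
    """Pick the sample with the best performance per watt (assumed criterion)."""
    return max(samples, key=lambda s: s.throughput / s.total_power_w)


def simulated_measure(freq_ghz: float) -> Tuple[float, float]:
    """Toy model (assumption): throughput saturates at 2.0 GHz, as with a
    non-CPU bottleneck, while total power keeps rising with frequency."""
    total_power_w = 100.0 + 20.0 * freq_ghz ** 2
    throughput = 50.0 * min(freq_ghz, 2.0)
    return total_power_w, throughput


samples = sweep_parameter([1.0, 1.5, 2.0, 2.5, 3.0], simulated_measure)
best = select_value(samples)
power_profile = {"cpu_freq_ghz": best.value}
print(power_profile)  # {'cpu_freq_ghz': 2.0}
```

In this toy model, raising the frequency past 2.0 GHz only adds power, so the profile records 2.0 GHz as the selected value. A real system would replace `simulated_measure` with actual power and performance readings.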
Inventors
- 류, 양
- 쉬, 위에
- 구, 펑 페이
- 리, 멍
- 자오, 싱
Assignees
- International Business Machines Corporation
Dates
- Publication Date: 20260513
- Application Date: 20210616
- Priority Date: 20200618
Claims (20)
- A method comprising: receiving a workload for a computer system; sweeping at least one parameter of the computer system while executing the workload; monitoring one or more features of the computer system while sweeping the at least one parameter, the one or more features including a total power consumption of the computer system; generating a power profile for the workload indicating a respective selected value for the at least one parameter based on an analysis of the monitored total power consumption of the computer system while sweeping the at least one parameter; and executing the workload based on the respective selected value of the at least one parameter.
- The method of claim 1, further comprising receiving one or more constraints relating to the at least one parameter of the computer system.
- The method of claim 1, further comprising dividing the workload into two or more stages, wherein sweeping the at least one parameter includes sweeping the at least one parameter for each of the two or more stages; wherein monitoring the one or more features includes monitoring the one or more features while sweeping the at least one parameter for each of the two or more stages; and wherein generating the power profile includes generating a respective power profile for each of the two or more stages.
- The method of claim 1, wherein sweeping the at least one parameter comprises sweeping at least one of a central processing unit (CPU) frequency, a graphics processing unit (GPU) frequency, a number of active cores in a multicore processor, a memory bandwidth, a network bandwidth, and a device state.
- The method of claim 1, wherein the workload is a first workload, and wherein executing the workload based on the respective selected value of the at least one parameter further comprises: comparing the power profile of the first workload with respective power profiles of one or more other workloads; identifying a compatible workload based on the comparison between the power profile of the first workload and the respective power profiles of the one or more other workloads; and scheduling the compatible workload to be executed concurrently with the first workload.
- The method of claim 1, wherein the one or more monitored features of the computer system include one or more of a central processing unit (CPU) power consumption, a graphics processing unit (GPU) power consumption, a fan power consumption, a memory power consumption, a disk power consumption, a memory bandwidth, a memory latency, a disk input/output bandwidth, and a network bandwidth.
- The method of claim 1, further comprising receiving an initial power profile for the workload, wherein generating the power profile includes updating the initial power profile based on the analysis of the monitored total power consumption of the computer system while sweeping the at least one parameter.
- A computer management system comprising: a storage device; and a processor communicatively coupled to the storage device, wherein the processor is configured to: receive a workload for a computer system; iteratively adjust at least one parameter of the computer system while the workload is executed; monitor one or more features of the computer system while adjusting the at least one parameter, the one or more features including a total power consumption of the computer system; generate a power profile for the workload indicating a respective selected value for the at least one parameter based on an analysis of the monitored total power consumption of the computer system while adjusting the at least one parameter; store the power profile on the storage device; and execute the workload based on the power profile.
- The computer management system of claim 8, wherein the processor is further configured to receive one or more constraints regarding the at least one parameter of the computer system.
- The computer management system of claim 8, wherein the processor is further configured to: divide the workload into two or more stages; iteratively adjust the at least one parameter for each of the two or more stages; monitor the one or more features while adjusting the at least one parameter for each of the two or more stages; and generate a respective power profile for each of the two or more stages.
- The computer management system of claim 8, wherein the processor is configured to iteratively adjust at least one of a central processing unit (CPU) frequency, a graphics processing unit (GPU) frequency, a number of active cores in a multicore processor, a memory bandwidth, a network bandwidth, and a device state.
- The computer management system of claim 8, wherein the workload is a first workload, and wherein the processor is further configured to: compare the power profile of the first workload with respective power profiles of one or more other workloads; identify a compatible workload based on the comparison between the power profile of the first workload and the respective power profiles of the one or more other workloads; and schedule the compatible workload to be executed concurrently with the first workload.
- The computer management system of claim 8, wherein the one or more monitored features of the computer system include one or more of a central processing unit (CPU) power usage, a graphics processing unit (GPU) power usage, a fan power usage, a memory power usage, a disk power usage, a memory bandwidth, a memory latency, a disk input/output bandwidth, and a network bandwidth.
- The computer management system of claim 8, wherein the processor is further configured to: receive an initial power profile for the workload; and update the initial power profile based on the analysis of the monitored total power consumption of the computer system while adjusting the at least one parameter.
- A computer management system comprising: a power-performance management engine configured to sweep at least one parameter of a computer system while a workload is executed, to monitor one or more features of the computer system while sweeping the at least one parameter, the one or more features including a total power consumption of the computer system, and to generate a power profile for the workload indicating a respective selected value for the at least one parameter based on an analysis of the monitored total power consumption of the computer system; and a power-performance workload scheduler configured to schedule the workload for execution based on the generated power profile.
- The computer management system of claim 15, wherein the workload is a first workload, and wherein the power-performance workload scheduler is further configured to schedule the first workload for execution by: comparing the power profile of the first workload with respective power profiles of one or more other workloads; identifying a compatible workload based on the comparison between the power profile of the first workload and the respective power profiles of the one or more other workloads; and scheduling the compatible workload to be executed concurrently with the first workload.
- A method comprising: comparing respective power performance tables for each of a plurality of workloads, wherein each power performance table indicates respective values of one or more parameters of a computer system for executing the respective workload, each value of the one or more parameters being selected based on monitoring one or more features of the computer system while iteratively adjusting the one or more parameters, the one or more features including a power consumption of the computer system; identifying at least two compatible workloads based on the comparison of the respective power performance tables; and scheduling the at least two compatible workloads to be executed simultaneously by the computer system.
- The method of claim 17, wherein the one or more parameters include at least one of a central processing unit (CPU) frequency, a graphics processing unit (GPU) frequency, a number of active cores in a multicore processor, a memory bandwidth, a network bandwidth, and a device state.
- The method of claim 17, wherein the one or more monitored features include one or more of a central processing unit (CPU) power consumption, a graphics processing unit (GPU) power consumption, a fan power consumption, a memory power consumption, a disk power consumption, a memory bandwidth, a memory latency, a disk input/output bandwidth, and a network bandwidth.
- A computer program stored on a non-transitory computer-readable storage medium, wherein the computer program, when executed by a processor, causes the processor to: iteratively adjust at least one parameter of a computer system while a workload is executed; monitor one or more features of the computer system while adjusting the at least one parameter, the one or more features including a total power consumption of the computer system; generate a power profile for the workload indicating a respective selected value for the at least one parameter based on an analysis of the monitored total power consumption of the computer system; and cause the workload to be executed based on the generated power profile.
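The comparison and scheduling steps recited in the claims above (comparing power profiles, identifying a compatible workload, scheduling it concurrently) can be sketched in Python as follows. The claims do not define what makes two workloads compatible, so the rule used here is an assumption for illustration only: the workloads must be limited by different resources and their combined expected power must fit under a power cap. The `PowerProfile` fields, the workload names, and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PowerProfile:
    """Hypothetical per-workload profile produced by an earlier sweep."""
    workload: str
    selected_cpu_freq_ghz: float  # selected parameter value
    expected_power_w: float       # expected power at the selected value
    bottleneck: str               # limiting resource, e.g. "cpu" or "disk"


def is_compatible(a: PowerProfile, b: PowerProfile, power_cap_w: float) -> bool:
    """Assumed rule: different bottlenecks and combined power within the cap."""
    fits = a.expected_power_w + b.expected_power_w <= power_cap_w
    return a.bottleneck != b.bottleneck and fits


def find_compatible(
    first: PowerProfile,
    others: List[PowerProfile],
    power_cap_w: float,
) -> Optional[PowerProfile]:
    """Compare the first profile with the others and return one compatible
    workload, preferring the one that uses most of the remaining budget."""
    candidates = [p for p in others if is_compatible(first, p, power_cap_w)]
    return max(candidates, key=lambda p: p.expected_power_w, default=None)


first = PowerProfile("train", 2.0, 180.0, "cpu")
others = [
    PowerProfile("encode", 2.5, 200.0, "cpu"),   # same bottleneck, over the cap
    PowerProfile("backup", 1.0, 90.0, "disk"),   # different bottleneck, fits
    PowerProfile("sync", 1.5, 60.0, "network"),  # different bottleneck, fits
]
match = find_compatible(first, others, power_cap_w=300.0)
print(match.workload if match else None)  # backup
```

Summing expected power is a deliberate simplification. A real scheduler would also have to account for shared baseline power and for interference between workloads that run at the same time.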
Description
Power performance-based system management
[0001] Many modern computer systems focus on balancing performance improvements against the total cost of ownership (TCO). This is especially true in large data centers (e.g., hyperscale data centers). TCO includes the total cost of acquisition (TCA), maintenance costs, and electricity costs resulting from power consumption. While TCA and maintenance costs are generally fixed investments, electricity costs vary with the workloads and configurations of the computer system.
[0028] It should be understood that the drawings depict only exemplary embodiments and are not intended to limit the scope of the invention; the exemplary embodiments will be described with additional specificity and detail through the use of the accompanying drawings.
[0029] FIG. 1 is a block diagram of an embodiment of a computer management system.
[0030] FIG. 2 is a flow diagram of an example of a method for managing a computer system.
[0031] FIG. 3 is a block diagram of another embodiment of a computer management system.
[0032] FIG. 4 is a block diagram of another embodiment of a computer management system.
[0033] FIG. 5 is a block diagram of another embodiment of a computer management system.
[0034] FIG. 6 illustrates an example of a cloud computing environment.
[0035] FIG. 7 illustrates an example of abstraction model layers.
[0036] In accordance with common practice, the various described features are not drawn to scale but are drawn to emphasize specific features relevant to the exemplary embodiments.
[0037] In the following detailed description, reference is made to the accompanying drawings, which form a part hereof and in which specific exemplary embodiments are shown by way of illustration. However, it is to be understood that other embodiments may be used and that logical, mechanical, and electrical changes may be made.
Furthermore, the methods presented in the drawings and the specification are not to be construed as limiting the order in which the individual steps may be performed. Accordingly, the following detailed description is not to be taken in a limiting sense.
[0038] As discussed above, some systems, particularly in large data centers (e.g., hyperscale data centers), focus on balancing increased performance against the total cost of ownership (TCO). TCO includes the total cost of acquisition (TCA), maintenance costs, and electricity costs resulting from power consumption. Like maintenance costs, TCA is generally a fixed investment. The embodiments described herein are configured to improve or optimize the performance per watt of a computer system to help reduce TCO.
[0039] Some modern central processing units (CPUs) can adjust their frequency for different workloads to make use of the CPU's power budget. For example, if the workload is very heavy, the frequency may not reach a high value. However, if the workload is light (e.g., only one core is active and only a small portion of the logic on the CPU is used), the CPU frequency can be raised to a relatively high value. While these techniques can improve power savings in some situations, they also suffer from various limitations. For example, if a given workload has a performance bottleneck on a non-CPU device such as a disk, network, memory, or graphics processing unit (GPU), the computer system will not achieve higher performance even with a higher CPU frequency and correspondingly higher CPU power usage. Furthermore, if multiple processes or threads of the workload contend for the CPU's internal computational resources, more power will be consumed with minimal performance improvement, even if the CPU frequency increases.
Moreover, the increase in frequency and the corresponding rise in temperature will often place greater demands on CPU cooling devices (e.g., CPU fans) to meet thermal requirements, which increases power consumption and in turn degrades the power-performance ratio.
[0040] The embodiments described herein help address these and other limitations. In particular, the embodiments described in more detail below provide a more comprehensive, dynamic, self-learning, power performance-based method of managing a computer system. The method can take into account multiple factors, such as workload variation, workload scheduling, total system power consumption, environmental changes, and CPU frequency and voltage, and can thereby provide a more efficient management scheme that can improve performance per unit of power and/or performance per TCO.
[0041] As used herein, the phrases “at least one,” “one or more,” and “and/or” are open-ended expressions that are both conjunctive and disjunctive in operation. For example, the expressions “at least one of A, B, and C,” “at least one of A, B, or C,” “one or more of A, B, and C,” “one or more of A, B, or C,” and “A, B, and/or C” each mean A alone, B alone, C alone, A and B