WO-2026093076-A1 - ADAPTIVE RESOURCE SCHEDULING OPTIMIZATION

Abstract

An embodiment generates a performance profile for each configuration of a set of possible configurations, each configuration representing a pairing between an application and a portion of computing resource units for deployment of the application. The embodiment computes a performance approximation formula based on one or more performance profiles generated. The embodiment computes a performance metric for each configuration of the set of possible configurations based on the performance approximation formula. The embodiment constructs an optimization problem representing each configuration of the set of possible configurations. The embodiment solves the optimization problem according to a defined optimization goal and the performance metric of each configuration. The embodiment produces an optimal number of computing resource units to deploy each application and deploys each application over a portion of the set of possible configurations corresponding to the optimal number of computing resource units produced by the optimization problem solution.
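The pipeline in the abstract (profile configurations, fit an approximation formula, evaluate a performance metric per configuration, then solve for the cheapest deployment meeting a goal) can be illustrated with a minimal sketch. All names, the linear cost model, the inverse-scaling latency fit, and the SLO threshold below are hypothetical, not from the patent:

```python
# Hypothetical sketch of the scheduling pipeline described in the abstract.
# The profiling function, cost model, and SLO values are illustrative only.

def profile(units):
    """Measured latency (ms) for a deployment on `units` resource units
    (stand-in for a real benchmark run)."""
    return 100.0 / units

# 1. Generate performance profiles for a few sampled configurations.
samples = {u: profile(u) for u in (1, 2, 4, 8)}

# 2. Fit a performance approximation formula: latency ≈ a / units.
a = sum(lat * u for u, lat in samples.items()) / len(samples)

def approx_latency(units):
    return a / units

# 3-6. Compute the metric for every candidate configuration and solve the
#      optimization: minimize deployment cost subject to a latency SLO.
COST_PER_UNIT = 3.0     # illustrative cost per resource unit
SLO_LATENCY_MS = 20.0   # illustrative service-level objective

feasible = [u for u in range(1, 17) if approx_latency(u) <= SLO_LATENCY_MS]
optimal_units = min(feasible, key=lambda u: u * COST_PER_UNIT)
print(optimal_units)  # smallest unit count meeting the SLO
```

With the illustrative numbers above, the smallest feasible configuration is five units; a real system would replace the brute-force scan with a solver when many applications share the pool.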

Inventors

  • YOUSSEF, Alaa
  • TANTAWI, Asser Nasreldin
  • MIMURA GONZALEZ, Nelson
  • UHLIG, Volkmar

Assignees

  • INTERNATIONAL BUSINESS MACHINES CORPORATION
  • IBM UNITED KINGDOM LIMITED

Dates

Publication Date
2026-05-07
Application Date
2025-10-21
Priority Date
2024-11-01

Claims (20)

  1. A computer-implemented method comprising: generating a performance profile for at least one configuration of a set of possible configurations, the at least one configuration representing a pairing between an application and a portion of computing resource units for deployment of the application; computing a performance approximation formula based on one or more performance profiles generated; computing a performance metric for the at least one configuration of the set of possible configurations based on the performance approximation formula; constructing an optimization problem representing the at least one configuration of the set of possible configurations; solving the optimization problem according to a defined optimization goal and the performance metric of the at least one configuration; producing, based on results of solving the optimization problem, an optimal number of computing resource units to deploy an application of a set of applications; and deploying the application over a portion of the set of possible configurations corresponding to the optimal number of computing resource units produced for that application.
  2. The computer-implemented method of claim 1, wherein the defined optimization goal comprises minimizing deployment cost of the application.
  3. The computer-implemented method of claim 1, wherein the defined optimization goal comprises minimizing power consumption corresponding to deployment of the application.
  4. The computer-implemented method of claim 1, wherein the generating the performance profile for the at least one configuration of the set of possible configurations is based in part on an estimated workload demand corresponding to the application.
  5. The computer-implemented method of claim 4, wherein the estimated workload demand is based in part on a predicted workload demand change.
  6. The computer-implemented method of claim 5, wherein the predicted workload demand change is obtained by iteratively analyzing a real-time workload demand corresponding to the application.
  7. The computer-implemented method of claim 1, wherein producing the optimal number of computing resource units to deploy the application is based on meeting a service-level objective.
  8. The computer-implemented method of claim 7, wherein the service-level objective comprises a threshold latency, and meeting the service-level objective comprises a determination that a latency meets the threshold latency.
  9. The computer-implemented method of claim 7, wherein the performance approximation formula is based on applying an estimated workload demand of at least one service class of a set of service classes of clients to a queuing model that represents the application on a configuration of the computing resource units to estimate a performance metric defined as the service-level objective.
  10. A computer program product comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by a processor to cause the processor to perform operations comprising: generating a performance profile for at least one configuration of a set of possible configurations, the at least one configuration representing a pairing between an application and a portion of computing resource units for deployment of the application; computing a performance approximation formula based on one or more performance profiles generated; computing a performance metric for the at least one configuration of the set of possible configurations based on the performance approximation formula; constructing an optimization problem representing the at least one configuration of the set of possible configurations; solving the optimization problem according to a defined optimization goal and the performance metric of the at least one configuration; producing, based on results of solving the optimization problem, an optimal number of computing resource units to deploy the application; and deploying the application over a portion of the set of possible configurations corresponding to the optimal number of computing resource units produced.
  11. The computer program product of claim 10, wherein the stored program instructions are stored in a computer readable storage device in a data processing system, and wherein the stored program instructions are transferred over a network from a remote data processing system.
  12. The computer program product of claim 10, wherein the stored program instructions are stored in a computer readable storage device in a server data processing system, and wherein the stored program instructions are downloaded in response to a request over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system, further comprising: program instructions to meter use of the program instructions associated with the request; and program instructions to generate an invoice based on the metered use.
  13. The computer program product of claim 10, wherein the defined optimization goal comprises minimizing deployment cost of the application.
  14. The computer program product of claim 10, wherein the defined optimization goal comprises minimizing power consumption corresponding to deployment of the application.
  15. The computer program product of claim 10, wherein the generating the performance profile for the at least one configuration of the set of possible configurations is based in part on an estimated workload demand corresponding to the application.
  16. The computer program product of claim 15, wherein the estimated workload demand is based in part on a predicted workload demand change.
  17. A computer system comprising a processor and one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by the processor to cause the processor to perform operations comprising: generating a performance profile for at least one configuration of a set of possible configurations, the at least one configuration representing a pairing between an application and a portion of computing resource units for deployment of the application; computing a performance approximation formula based on one or more performance profiles generated; computing a performance metric for the at least one configuration of the set of possible configurations based on the performance approximation formula; constructing an optimization problem representing the at least one configuration of the set of possible configurations; solving the optimization problem according to a defined optimization goal and the performance metric of the at least one configuration; producing, based on results of solving the optimization problem, an optimal number of computing resource units to deploy the application; and deploying the application over a portion of the set of possible configurations corresponding to the optimal number of computing resource units produced.
  18. The computer system of claim 17, wherein the defined optimization goal comprises minimizing deployment cost of the application.
  19. The computer system of claim 17, wherein the defined optimization goal comprises minimizing power consumption corresponding to deployment of the application.
  20. The computer system of claim 17, wherein the generating a performance profile for the at least one configuration of a set of possible configurations is based in part on an estimated workload demand corresponding to the application.
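Claim 9 describes applying the estimated demand of client service classes to a queuing model of the application to predict whether a latency objective holds on a given configuration. One way to realize this is with an M/M/1 queue whose service rate scales with the unit count; the patent does not name a queuing discipline, so this model and all numbers below are assumptions for illustration:

```python
# Illustrative M/M/1 queuing model in the spirit of claim 9; the queuing
# discipline, per-unit service rate, and demands are assumptions.

def mean_response_time(arrival_rate, units, rate_per_unit):
    """Mean response time (seconds) of an M/M/1 queue whose service rate
    scales linearly with the number of resource units."""
    service_rate = units * rate_per_unit
    if arrival_rate >= service_rate:
        return float("inf")  # unstable: demand exceeds capacity
    return 1.0 / (service_rate - arrival_rate)

# Aggregate estimated demand across client service classes (claim 9).
class_demands = {"gold": 30.0, "silver": 20.0, "bronze": 10.0}  # req/s
total_demand = sum(class_demands.values())                      # 60 req/s

# Smallest configuration whose predicted latency meets a 50 ms SLO
# (claims 7-8: the metric is checked against a threshold latency).
SLO_SECONDS = 0.050
RATE_PER_UNIT = 25.0  # req/s each unit can serve (illustrative)
units = 1
while mean_response_time(total_demand, units, RATE_PER_UNIT) > SLO_SECONDS:
    units += 1
print(units)
```

Under these assumptions, three units leave a predicted response time of about 67 ms, so four units is the smallest configuration meeting the 50 ms objective.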

Description

ADAPTIVE RESOURCE SCHEDULING OPTIMIZATION

BACKGROUND

[0001] The present invention relates generally to computer resource capacity planning and dynamic resource allocation. More particularly, the present invention relates to a method, system, and computer program for adaptive resource scheduling optimization.

[0002] In the context of cloud computing, resource allocation involves the distribution of computing resources (e.g., CPU, GPU, memory, storage, bandwidth, etc.) among multiple users or applications. In some instances, resource scheduling algorithms are developed to determine how and when resources are allocated among different tasks or applications. Some of these existing algorithms consider factors such as workload, priority, and resource availability to optimize resource utilization and performance.

[0003] Cloud computing systems may be designed to be elastic, allowing resources to be automatically scaled up or down based on demand. Dynamic resource allocation techniques aim to provide adequate resources whenever required while maximizing the efficiency of distributing available resources. Load balancing techniques may be utilized to distribute incoming network traffic or workload across multiple servers or resources to ensure optimal resource utilization and prevent overloading of any single resource. Further, continuous monitoring of resource usage and performance metrics may also help in identifying bottlenecks, optimizing resource utilization, and ensuring service level agreements (SLAs) are met.

[0004] Artificial intelligence (AI) technology has evolved significantly over the past few years. Modern AI systems are achieving human-level performance on cognitive tasks like converting speech to text, recognizing objects and images, or translating between different languages. This evolution holds promise for new and improved applications in many industries. Accordingly, AI systems may be designed for various tasks that traditional computer systems were previously incapable of performing.

[0005] An Artificial Neural Network (ANN), also referred to simply as a neural network, is a computing system made up of a number of simple, highly interconnected processing elements (nodes), which process information by their dynamic state response to external inputs. ANNs are processing devices (algorithms and/or hardware) that are loosely modeled after the neuronal structure of the mammalian cerebral cortex. An ANN today might have upwards of billions of interconnected "neuron" processor units, though it may be trained using far fewer dedicated hardware processor units (e.g., GPUs). Further, ANNs can be designed to uncover relationships between previously unknown factors.

[0006] Large Language Models (LLMs) necessitate significantly more computer resources than traditional computing systems due to their complex architecture and massive scale. LLMs are characterized by deep neural networks with millions or even billions of parameters, requiring substantial computational power for training and inference tasks. Traditional computing systems, on the other hand, typically operate on smaller datasets and simpler models, resulting in lower resource requirements. The sheer size and complexity of LLMs demand high-performance computing resources, including advanced CPUs, GPUs, or specialized hardware such as TPUs, to handle the intensive computational workload effectively. Moreover, LLMs consume large amounts of memory for processing vast datasets and model parameters, necessitating efficient memory management techniques to optimize resource allocation. The intricate nature of LLMs, coupled with their extensive data processing and model complexity, is responsible for a heightened demand for computer resources beyond what traditional computing systems typically entail.
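Claims 5 and 6 describe basing the estimated workload demand on a predicted demand change obtained by iteratively analyzing real-time demand, which aligns with the continuous monitoring described above. One simple realization (an assumption; the patent does not name a forecasting method) is exponential smoothing over a stream of observed arrival rates:

```python
# Hypothetical demand predictor per claims 5-6: iteratively smooth the
# observed real-time arrival rate to forecast the next interval's demand.
# The smoothing method and alpha value are illustrative assumptions.

def make_predictor(alpha=0.5):
    """Return an update function holding the smoothed demand estimate."""
    state = {"estimate": None}

    def update(observed_rate):
        if state["estimate"] is None:
            state["estimate"] = observed_rate  # first observation seeds it
        else:
            # New forecast blends the latest observation with history.
            state["estimate"] = alpha * observed_rate + (1 - alpha) * state["estimate"]
        return state["estimate"]

    return update

predict = make_predictor(alpha=0.5)
for rate in (40.0, 60.0, 80.0):  # observed req/s over successive intervals
    forecast = predict(rate)
print(forecast)  # 65.0: smoothed estimate trailing the rising demand
```

The forecast produced this way would feed the performance-profile generation step as the "estimated workload demand" of claim 4, rather than using raw instantaneous load.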
SUMMARY

[0007] The illustrative embodiments provide for dynamic computer resource allocation optimization. An embodiment includes generating a performance profile for each configuration of a set of possible configurations, such that each configuration represents a pairing between an application and a portion of computing resource units for deployment of the application. The embodiment also includes computing a performance approximation formula based on one or more performance profiles generated. The embodiment also includes computing a performance metric for each configuration of the set of possible configurations based on the performance approximation formula. The embodiment also includes constructing an optimization problem representing each configuration of the set of possible configurations. The embodiment also includes solving the optimization problem according to a defined optimization goal and the performance metric of each configuration. The embodiment also includes producing an optimal number of computing resource units to deploy the application and deploying the application over a portion of the set of possible configurations corresponding to the optimal