WO-2026096113-A1 - DYNAMIC GPU ROUTING SYSTEM FOR OPTIMIZING AI WORKLOADS

WO 2026096113 A1

Abstract

The present disclosure provides a system and method for optimizing artificial intelligence (AI) compute resources. The system includes a data aggregation module collecting specifications from GPU manufacturers and compute providers. A profiling module generates GPU profiles and use-case profiles based on collected specifications. An arbitration module matches AI workloads to optimal GPU resources using generated profiles. A routing module dynamically routes AI workloads to selected GPU resources across multiple compute providers. The system includes a machine learning model that continuously improves matching and routing based on telemetry data from executed workloads. The method enables efficient allocation of AI compute resources by automatically profiling GPUs and use-cases, matching workloads to ideal GPU configurations, and dynamically routing compute jobs across providers.

Inventors

  • IGNATIUS, James, Douglas

Assignees

  • Xydo LLC

Dates

Publication Date
2026-05-07
Application Date
2025-09-22
Priority Date
2024-10-31

Claims (20)

  1. A system for optimizing artificial intelligence (AI) compute resources, comprising: a profiling module generating GPU profiles and use-case profiles based on collected specifications; an arbitration module matching AI workloads to GPU resources using the generated profiles; and a routing module dynamically routing AI workloads to selected GPU resources across multiple compute providers.
  2. The system of claim 1, further comprising a data aggregation module collecting specifications from GPU manufacturers and compute providers.
  3. The system of claim 1, wherein the profiling module generates the GPU profiles based on hardware specifications and performance metrics of GPUs.
  4. The system of claim 1, wherein the profiling module generates the use-case profiles based on computational requirements and performance characteristics of AI workloads.
  5. The system of claim 1, wherein the arbitration module uses a machine learning model to match AI workloads to GPU resources.
  6. The system of claim 5, wherein the machine learning model is trained using historical data of AI workload performance on different GPU configurations.
  7. The system of claim 1, further comprising a telemetry module collecting performance data from executed AI workloads and providing feedback to improve future matching and routing decisions.
  8. A method for optimizing artificial intelligence (AI) compute resources, comprising: generating GPU profiles and use-case profiles based on collected specifications; matching AI workloads to GPU resources using the generated profiles; and dynamically routing AI workloads to selected GPU resources across multiple compute providers.
  9. The method of claim 8, wherein generating the GPU profiles comprises analyzing hardware specifications and performance metrics of GPUs.
  10. The method of claim 8, wherein generating the use-case profiles comprises analyzing computational requirements and performance characteristics of AI workloads.
  11. The method of claim 8, wherein matching AI workloads to GPU resources comprises using a machine learning model trained on historical data of AI workload performance on different GPU configurations.
  12. The method of claim 11, further comprising continuously updating the machine learning model based on telemetry data collected from executed AI workloads.
  13. The method of claim 8, further comprising collecting performance data from executed AI workloads and providing feedback to improve future matching and routing decisions.
  14. The method of claim 13, wherein the feedback is used to adjust weightings in the machine learning model used for matching AI workloads to GPU resources.
  15. A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to perform operations for optimizing artificial intelligence (AI) compute resources, the operations comprising: generating GPU profiles and use-case profiles based on collected specifications; matching AI workloads to GPU resources using the generated profiles; and dynamically routing AI workloads to selected GPU resources across multiple compute providers.
  16. The non-transitory computer-readable medium of claim 15, wherein generating the GPU profiles comprises analyzing hardware specifications and performance metrics of GPUs.
  17. The non-transitory computer-readable medium of claim 15, wherein generating the use-case profiles comprises analyzing computational requirements and performance characteristics of AI workloads.
  18. The non-transitory computer-readable medium of claim 15, wherein matching AI workloads to GPU resources comprises using a machine learning model trained on historical data of AI workload performance on different GPU configurations.
  19. The non-transitory computer-readable medium of claim 18, wherein the operations further comprise continuously updating the machine learning model based on telemetry data collected from executed AI workloads.
  20. The non-transitory computer-readable medium of claim 19, wherein the operations further comprise adjusting weightings in the machine learning model used for matching AI workloads to GPU resources based on the telemetry data.
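The pipeline recited in the method claims (generating GPU and use-case profiles, arbitrating a match, then routing) can be illustrated with a minimal sketch. All names below (`GPUProfile`, `UseCaseProfile`, `match_workload`) and the cheapest-qualifying-GPU selection rule are hypothetical illustrations, not part of the claims or the disclosed implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GPUProfile:
    """Profile built from collected hardware specifications."""
    name: str
    memory_gb: float
    tflops: float
    cost_per_hour: float

@dataclass
class UseCaseProfile:
    """Profile of a workload's computational requirements."""
    name: str
    min_memory_gb: float
    min_tflops: float

def match_workload(use_case: UseCaseProfile,
                   gpus: list[GPUProfile]) -> Optional[GPUProfile]:
    """Arbitration step: among GPUs that satisfy the use-case profile,
    return the cheapest; return None when no GPU qualifies."""
    candidates = [g for g in gpus
                  if g.memory_gb >= use_case.min_memory_gb
                  and g.tflops >= use_case.min_tflops]
    return min(candidates, key=lambda g: g.cost_per_hour) if candidates else None
```

A routing module would then dispatch the workload to whichever compute provider exposes the selected GPU profile.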

Description

DYNAMIC GPU ROUTING SYSTEM FOR OPTIMIZING AI WORKLOADS

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. Provisional Application No. 63/714666, filed October 31, 2024, which is hereby incorporated by reference in its entirety.

BACKGROUND

[0002] Artificial intelligence (AI) and machine learning (ML) workloads have become increasingly prevalent across various industries, driving demand for high-performance computing resources. Graphics Processing Units (GPUs) have emerged as a popular hardware choice for accelerating AI/ML tasks due to their parallel processing capabilities. However, the diverse nature of AI/ML workloads, coupled with the wide array of available GPU models and cloud computing options, presents challenges in optimizing resource allocation and utilization.

[0003] As the complexity and scale of AI/ML applications continue to grow, efficient management of GPU resources has become crucial for organizations seeking to balance performance and cost-effectiveness. Traditional approaches to GPU allocation often rely on static assignments or manual selection processes, which may lead to suboptimal resource utilization and increased operational costs. This highlights the importance of developing more sophisticated systems for dynamically matching AI/ML workloads with appropriate GPU resources across different providers and hardware configurations.

SUMMARY

[0004] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

[0005] In some aspects, the system and method for optimizing artificial intelligence (AI) compute resources may offer several advantages over traditional approaches. The dynamic GPU routing system may significantly improve resource utilization by intelligently matching AI workloads to the most suitable GPU resources across multiple providers. This may lead to reduced operational costs and improved performance of AI applications. The system's ability to continuously learn and adapt through telemetry data and machine learning models may enable it to optimize resource allocation over time, potentially resulting in increasingly efficient operations. Additionally, the provider-agnostic approach may allow organizations to leverage the best available resources across different platforms, avoiding vendor lock-in and maximizing flexibility. The automated profiling and matching processes may also reduce the manual effort required in resource allocation, potentially saving time and reducing human error. Furthermore, the system's ability to handle diverse AI workloads and GPU configurations may make it adaptable to a wide range of use cases and industries, from small-scale research projects to large-scale enterprise applications.

[0006] According to an aspect of the present disclosure, a system for optimizing artificial intelligence (AI) compute resources is provided. The system includes a profiling module generating GPU profiles and use-case profiles based on collected specifications. The system also includes an arbitration module matching AI workloads to GPU resources using the generated profiles. Additionally, the system includes a routing module dynamically routing AI workloads to selected GPU resources across multiple compute providers.

[0007] According to other aspects of the present disclosure, the system may include one or more of the following features. The system may further comprise a data aggregation module collecting specifications from GPU manufacturers and compute providers. The profiling module may generate the GPU profiles based on hardware specifications and performance metrics of GPUs. The profiling module may generate the use-case profiles based on computational requirements and performance characteristics of AI workloads. The arbitration module may use a machine learning model to match AI workloads to GPU resources. The machine learning model may be trained using historical data of AI workload performance on different GPU configurations. The system may further comprise a telemetry module collecting performance data from executed AI workloads and providing feedback to improve future matching and routing decisions.

[0008] According to another aspect of the present disclosure, a method for optimizing artificial intelligence (AI) compute resources is provided. The method includes generating GPU profiles and use-case profiles based on collected specifications. The method also includes matching AI workloads to GPU resources using the generated profiles. Additionally, the method includes dynamically routing AI workloads to selected GPU resources across multiple compute providers.

[0009] According to
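The telemetry feedback described above, in which performance data from executed workloads adjusts weightings in the matching model, can be sketched as a simple weighted-scoring loop. The class `FeedbackMatcher`, its feature set, and the error-driven weight update are hypothetical illustrations under the assumption of a linear scoring model; the disclosure does not specify the model's form.

```python
class FeedbackMatcher:
    """Hypothetical arbitration model: scores candidate GPUs by a weighted
    sum of features and nudges the weights from telemetry feedback."""

    def __init__(self, learning_rate: float = 0.1):
        self.lr = learning_rate
        # Feature weights; cost contributes negatively to the score.
        self.weights = {"tflops": 1.0, "cost_per_hour": -1.0}

    def score(self, gpu: dict) -> float:
        return sum(w * gpu[f] for f, w in self.weights.items())

    def choose(self, gpus: list[dict]) -> dict:
        return max(gpus, key=self.score)

    def update(self, observed_speedup: float) -> None:
        # Telemetry feedback: if the chosen GPU under- or over-performed
        # relative to expectation (speedup == 1.0), shift weight away from
        # or toward the raw-throughput feature.
        error = observed_speedup - 1.0
        self.weights["tflops"] += self.lr * error
```

In a full system, each executed workload would report its observed performance to the telemetry module, which would call `update` so that subsequent matching decisions reflect measured rather than nominal GPU behavior.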