US-12619465-B2 - Maintaining workload data coherency between local and remote accelerators
Abstract
Technologies for composing a managed node with multiple processors on multiple compute sleds to cooperatively execute a workload include a compute sled that includes a memory, one or more processors connected to the memory, and an accelerator. The accelerator further includes a coherence logic unit that is configured to receive a node configuration request to execute a workload. The node configuration request identifies the compute sled and a second compute sled to be included in a managed node. The coherence logic unit is further configured to modify a portion of local working data associated with the workload on the compute sled in the memory with the one or more processors of the compute sled, determine coherence data indicative of the modification made by the one or more processors of the compute sled to the local working data in the memory, and send the coherence data to the second compute sled of the managed node.
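The coherence flow the abstract describes (track local modifications, derive coherence data from them, push that data to the peer sled) can be sketched in a few lines. The Python below is a minimal illustrative model, not the patented hardware implementation; every name in it (NodeConfigRequest, CoherenceLogicUnit, and so on) is hypothetical:

```python
# Illustrative software model of the coherence flow in the abstract.
# The patent describes hardware (a coherence logic unit on an accelerator);
# all names here are invented for illustration.
from dataclasses import dataclass, field


@dataclass
class NodeConfigRequest:
    workload_id: str
    sled_ids: list          # compute sleds composing the managed node


@dataclass
class CoherenceLogicUnit:
    sled_id: str
    memory: dict = field(default_factory=dict)   # local working data
    dirty: set = field(default_factory=set)      # addresses modified locally
    peers: list = field(default_factory=list)    # other sleds in the node

    def configure(self, request: NodeConfigRequest) -> None:
        # Join the managed node; remember which other sleds to keep coherent.
        self.peers = [s for s in request.sled_ids if s != self.sled_id]

    def write(self, address: str, value: bytes) -> None:
        # Local processors modify a portion of the working data in memory...
        self.memory[address] = value
        self.dirty.add(address)                  # ...and the change is tracked.

    def flush_coherence_data(self, send) -> None:
        # Determine the coherence data (the locally modified entries) and
        # send it to each peer sled of the managed node via a transport
        # callback `send(peer_id, update)`.
        update = {addr: self.memory[addr] for addr in self.dirty}
        for peer in self.peers:
            send(peer, update)
        self.dirty.clear()

    def apply_coherence_data(self, update: dict) -> None:
        # Merge modifications received from a remote sled into local memory.
        self.memory.update(update)
```

In the claimed system this tracking and exchange is performed by the accelerator's coherence logic unit rather than by host software, so each sled of the managed node observes a coherent view of the workload's working data.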
Inventors
- Mohan J. Kumar
- Murugasamy K. Nachimuthu
- Krishna Bhuyan
Assignees
- INTEL CORPORATION
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-03-27
- Priority Date: 2017-08-30
Claims (18)
- 1. A cloud computing system for use in providing at least one service associated with executing at least one workload, execution of the at least one workload being configurable to be associated with machine-learning operations, the cloud computing system comprising: compute resources comprising at least one central processing unit and memory circuitry; accelerator resources comprising graphics accelerator circuitry and/or field programmable gate array circuitry; and storage resources for use in association with the compute resources and/or the accelerator resources; wherein: the cloud computing system is to dynamically allocate and/or reallocate, at least in part, based upon resource utilization prediction data, the compute resources, the accelerator resources, and/or the storage resources for use in the execution of the at least one workload; the accelerator resources are configurable for use in implementing the machine-learning operations; the execution of the at least one workload is to result in modified workload-related data; the compute resources are configurable to comprise local and remote compute resources; the accelerator resources are configurable to comprise local and remote accelerator resources; and the cloud computing system is configurable, based upon workload configuration request data, to coordinate providing of the modified workload-related data between the local and the remote compute resources and/or the local and the remote accelerator resources to maintain data coherency with regard to the workload-related data.
- 2. The cloud computing system of claim 1, wherein: the at least one workload comprises at least one virtual machine-based workload and/or at least one container-based workload.
- 3. The cloud computing system of claim 1, wherein: the cloud computing system is configurable to allocate network data communication bandwidth and/or quality of service associated with the execution of at least one workload.
- 4. The cloud computing system of claim 1, wherein: the cloud computing system is configurable to dynamically allocate and/or reallocate, based upon resource utilization data, the compute resources, the accelerator resources, and/or the storage resources for use in the execution of the at least one workload.
- 5. The cloud computing system of claim 1, wherein: the cloud computing system is comprised in multiple data centers.
- 6. The cloud computing system of claim 5, wherein: the multiple data centers comprise multiple rack servers.
- 7. One or more non-transitory machine-readable media storing instructions for being executed by one or more machines, the one or more machines to be associated with a cloud computing system, the cloud computing system being for use in providing at least one service associated with at least one workload, execution of the at least one workload being configurable to be associated with machine-learning operations, the instructions, when executed by the one or more machines, resulting in the cloud computing system being configurable to perform certain operations comprising: configuring compute resources of the cloud computing system to comprise at least one central processing unit and memory circuitry; configuring accelerator resources of the cloud computing system to comprise graphics accelerator circuitry and/or field programmable gate array circuitry; and configuring storage resources of the cloud computing system for use in association with the compute resources and/or the accelerator resources; wherein: the cloud computing system is to dynamically allocate and/or reallocate, at least in part, based upon resource utilization prediction data, the compute resources, the accelerator resources, and/or the storage resources for use in the execution of the at least one workload; the accelerator resources are configurable for use in implementing the machine-learning operations; the execution of the at least one workload is to result in modified workload-related data; and the compute resources are configurable to comprise local and remote compute resources; the accelerator resources are configurable to comprise local and remote accelerator resources; and the cloud computing system is configurable, based upon workload configuration request data, to coordinate providing of the modified workload-related data between the local and the remote compute resources and/or the local and the remote accelerator resources to maintain data coherency with regard to the workload-related data.
- 8. The one or more non-transitory machine-readable media of claim 7, wherein: the at least one workload comprises at least one virtual machine-based workload and/or at least one container-based workload.
- 9. The one or more non-transitory machine-readable media of claim 7, wherein: the cloud computing system is configurable to allocate network data communication bandwidth and/or quality of service associated with the execution of at least one workload.
- 10. The one or more non-transitory machine-readable media of claim 7, wherein: the cloud computing system is configurable to dynamically allocate and/or reallocate, based upon resource utilization data, the compute resources, the accelerator resources, and/or the storage resources for use in the execution of the at least one workload.
- 11. The one or more non-transitory machine-readable media of claim 7, wherein: the cloud computing system is comprised in multiple data centers.
- 12. The one or more non-transitory machine-readable media of claim 11, wherein: the multiple data centers comprise multiple rack servers.
- 13. A method implemented in association with a cloud computing system, the cloud computing system being for use in providing at least one service associated with at least one workload, execution of the at least one workload being configurable to be associated with machine-learning operations, the method comprising: configuring compute resources of the cloud computing system to comprise at least one central processing unit and memory circuitry; configuring accelerator resources of the cloud computing system to comprise graphics accelerator circuitry and/or field programmable gate array circuitry; and configuring storage resources of the cloud computing system for use in association with the compute resources and/or the accelerator resources; wherein: the cloud computing system is to dynamically allocate and/or reallocate, at least in part, based upon resource utilization prediction data, the compute resources, the accelerator resources, and/or the storage resources for use in the execution of the at least one workload; the accelerator resources are configurable for use in implementing the machine-learning operations; the execution of the at least one workload is to result in modified workload-related data; the compute resources are configurable to comprise local and remote compute resources; the accelerator resources are configurable to comprise local and remote accelerator resources; and the cloud computing system is configurable, based upon workload configuration request data, to coordinate providing of the modified workload-related data between the local and the remote compute resources and/or the local and the remote accelerator resources to maintain data coherency with regard to the workload-related data.
- 14. The method of claim 13, wherein: the at least one workload comprises at least one virtual machine-based workload and/or at least one container-based workload.
- 15. The method of claim 13, wherein: the cloud computing system is configurable to allocate network data communication bandwidth and/or quality of service associated with the execution of at least one workload.
- 16. The method of claim 13, wherein: the cloud computing system is configurable to dynamically allocate and/or reallocate, based upon resource utilization data, the compute resources, the accelerator resources, and/or the storage resources for use in the execution of the at least one workload.
- 17. The method of claim 13, wherein: the cloud computing system is comprised in multiple data centers.
- 18. The method of claim 17, wherein: the multiple data centers comprise multiple rack servers.
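Claims 1, 7, and 13 recite dynamic allocation and/or reallocation of pooled compute, accelerator, and storage resources driven by resource utilization prediction data, while dependent claims 4, 10, and 16 add allocation based upon measured utilization data. The claims do not prescribe any particular algorithm; the sketch below is one hypothetical orchestrator-side loop that folds both into a single decision, with all names and thresholds invented for illustration:

```python
# Hypothetical orchestrator loop: reallocate pooled resources based on
# predicted utilization, in the spirit of the dynamic-allocation limitations
# recited in the claims. Names and thresholds are illustrative only.
from dataclasses import dataclass


@dataclass
class ResourcePools:
    cpus: int           # compute resources (CPUs + memory circuitry)
    accelerators: int   # graphics accelerator / FPGA circuitry
    storage_tb: int


def predict_utilization(history: list) -> float:
    """Trivial predictor: exponential moving average of past utilization
    samples (each in [0, 1]); `history` must be non-empty."""
    alpha, estimate = 0.5, history[0]
    for sample in history[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def reallocate(allocated: ResourcePools, free: ResourcePools,
               history: list) -> ResourcePools:
    predicted = predict_utilization(history)
    if predicted > 0.8 and free.accelerators > 0:
        # Scale up: pull an accelerator from the free pool for this workload.
        free.accelerators -= 1
        allocated.accelerators += 1
    elif predicted < 0.2 and allocated.accelerators > 1:
        # Scale down: return an underused accelerator to the free pool.
        allocated.accelerators -= 1
        free.accelerators += 1
    return allocated
```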
Description
CLAIM OF PRIORITY

This patent application is a continuation of prior U.S. patent application Ser. No. 18/103,739, filed Jan. 31, 2023 and titled “CLOUD-BASED SCALE-UP SYSTEM COMPOSITION,” which is a continuation of prior U.S. patent application Ser. No. 17/246,388, filed Apr. 30, 2021 and titled “CLOUD-BASED SCALE-UP SYSTEM COMPOSITION,” now U.S. Pat. No. 11,630,702, issued on Apr. 18, 2023, which is a continuation of prior U.S. patent application Ser. No. 16/344,582, filed Apr. 24, 2019 and titled “CLOUD-BASED SCALE-UP SYSTEM COMPOSITION,” now U.S. Pat. No. 11,016,832, issued on May 25, 2021, which is a national stage entry under 35 USC § 371(b) of prior International Application No. PCT/US2017/063756, filed Nov. 29, 2017 and titled “CLOUD-BASED SCALE-UP SYSTEM COMPOSITION,” which claims the benefit of prior U.S. Provisional Patent Application No. 62/427,268, filed Nov. 29, 2016, prior Indian Provisional Patent Application Ser. No. 201741030632, filed Aug. 30, 2017, and prior U.S. Provisional Patent Application No. 62/584,401, filed Nov. 10, 2017. Each of the aforesaid prior patent applications is hereby incorporated herein by reference in its entirety.

BACKGROUND

In a system that utilizes multiple processors to cooperatively execute a workload, the multiple processors are typically physically located on the same compute device. In such systems, the processors communicate with one another through a shared memory and/or a local bus to cooperatively execute the workload. However, a given workload may utilize only a portion of the available processors on the compute device. As a result, the other processors may be underutilized, leading to wasted resources. Conversely, the workload may benefit from being executed on a greater number of processors than the set of processors available on a single compute device, such as when multiple tasks within the workload are amenable to concurrent execution. In a data center, such as a cloud data center in which customers agree to pay a predefined amount of money in return for a set of target quality of service metrics (e.g., a target latency, a target throughput, etc.), incorrectly matching the available resources of compute devices to the workloads may result in lost money and/or time, either in the form of purchasing and allocating too many resources (e.g., processors) to a workload or providing too few resources (e.g., processors) to a workload that could be executed more efficiently (e.g., at a higher quality of service) with more processors on the same compute device.

BRIEF DESCRIPTION OF THE DRAWINGS

The concepts described herein are illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. Where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

FIG. 1 is a diagram of a conceptual overview of a data center in which one or more techniques described herein may be implemented according to various embodiments;

FIG. 2 is a diagram of an example embodiment of a logical configuration of a rack of the data center of FIG. 1;

FIG. 3 is a diagram of an example embodiment of another data center in which one or more techniques described herein may be implemented according to various embodiments;
FIG. 4 is a diagram of another example embodiment of a data center in which one or more techniques described herein may be implemented according to various embodiments;

FIG. 5 is a diagram of a connectivity scheme representative of link-layer connectivity that may be established among various sleds of the data centers of FIGS. 1, 3, and 4;

FIG. 6 is a diagram of a rack architecture that may be representative of an architecture of any particular one of the racks depicted in FIGS. 1-4 according to some embodiments;

FIG. 7 is a diagram of an example embodiment of a sled that may be used with the rack architecture of FIG. 6;

FIG. 8 is a diagram of an example embodiment of a rack architecture to provide support for sleds featuring expansion capabilities;

FIG. 9 is a diagram of an example embodiment of a rack implemented according to the rack architecture of FIG. 8;

FIG. 10 is a diagram of an example embodiment of a sled designed for use in conjunction with the rack of FIG. 9;

FIG. 11 is a diagram of an example embodiment of a data center in which one or more techniques described herein may be implemented according to various embodiments;

FIG. 12 is a simplified block diagram of at least one embodiment of a system for composing a managed node with multiple processors on multiple compute sleds to cooperatively execute a workload;

FIG. 13 is a simplified block diagram of an orchestrator server of FIG. 12;

FIG. 14 is a simplified block diagram of at least one embodiment of an environment that may be established by a compute sled of FIGS. 12 and 13; and

FIGS. 15-16 are a simplified flow diagram of at least one embodiment of a method for cooperatively executing a workload that may be performed by a compute sled of FIGS. 12 and 14.