US-12619506-B2 - Datacenter hardware reallocation through workload distribution optimization
Abstract
An embodiment includes detecting, by a Hardware Management Component of a datacenter, a failure of hardware in the datacenter. The embodiment includes, responsive to the detecting, determining, by an Asset Manager Component, a lead time for a replacement of the failed hardware. The embodiment includes determining, by the Hardware Management Component, a replacement decision based on a priority of a workload deployed on the failed hardware and the lead time. The embodiment includes deciding, by the Hardware Management Component, whether to replace the failed hardware with a substitute hardware from a lower priority system of the datacenter based on the replacement decision. The embodiment also includes adjusting, by a Workload Controller Component, the workload deployed on the failed hardware based on the replacement decision.
Inventors
- John S. Werner
- Arkadiy O. Tsfasman
- Zachary A. Silverstein
- Logan Bailey
Assignees
- INTERNATIONAL BUSINESS MACHINES CORPORATION
Dates
- Publication Date
- 20260505
- Application Date
- 20240606
Claims (20)
- 1. A computer-implemented method enhancing a distributed system of a datacenter comprising: detecting, by a Hardware Management Component of the datacenter, a failure of a failed hardware in the datacenter; responsive to the detecting, determining by an Asset Manager Component, the Asset Manager Component extracting hardware data for the failed hardware by executing an application programming interface call on the failed hardware, a lead time for a replacement of the failed hardware; determining by injecting the hardware data into the Hardware Management Component to output a replacement decision based on a priority of a workload deployed on the failed hardware and the lead time; deciding by the Hardware Management Component whether to replace the failed hardware with a substitute hardware from a lower priority system of the datacenter based on the replacement decision; and adjusting by a Workload Controller Component the workload deployed on the failed hardware based on the replacement decision, the distributed system is enhanced wherein the workload of the failed hardware of the distributed system is dynamically reallocated from the failed hardware to the substitute hardware.
- 2. The computer-implemented method of claim 1, wherein the adjusting further comprises transferring by the Workload Controller Component the workload from the failed hardware.
- 3. The computer-implemented method of claim 2, wherein the replacement decision further comprises comparing the lead time with a threshold based on the priority of the workload deployed on the failed hardware.
- 4. The computer-implemented method of claim 1, wherein a Deployment Controller controls a robot to install the substitute hardware.
- 5. The computer-implemented method of claim 1, wherein the replacement decision is based on comparing the priority of the failed hardware against a priority metric of the datacenter.
- 6. The computer-implemented method of claim 1, further comprising replacing the substitute hardware with a replacement hardware.
- 7. The computer-implemented method of claim 1, further comprising transferring by the Workload Controller Component a workload from the substitute hardware from the lower priority system of the datacenter.
- 8. A computer program product comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by a processor to cause the processor to perform operations enhancing a distributed system of a datacenter comprising: detecting, by a Hardware Management Component of the datacenter, a failure of a failed hardware in the datacenter; responsive to the detecting, determining by an Asset Manager Component, the Asset Manager Component extracting hardware data for the failed hardware by executing an application programming interface call on the failed hardware, a lead time for a replacement of the failed hardware; determining by injecting the hardware data into the Hardware Management Component to output a replacement decision based on a priority of a workload deployed on the failed hardware and the lead time; deciding by the Hardware Management Component whether to replace the failed hardware with a substitute hardware from a lower priority system of the datacenter based on the replacement decision; and adjusting by a Workload Controller Component the workload deployed on the failed hardware based on the replacement decision, the distributed system is enhanced wherein the workload of the failed hardware of the distributed system is dynamically reallocated from the failed hardware to the substitute hardware.
- 9. The computer program product of claim 8, wherein the adjusting further comprises transferring by the Workload Controller Component the workload from the failed hardware.
- 10. The computer program product of claim 8, wherein the replacement decision further comprises comparing the lead time with a threshold based on the priority of the workload deployed on the failed hardware.
- 11. The computer program product of claim 8, wherein a Deployment Controller controls a robot to install the substitute hardware.
- 12. The computer program product of claim 8, wherein the replacement decision is based on comparing the priority of the failed hardware against a priority metric of the datacenter.
- 13. The computer program product of claim 8, further comprising replacing the substitute hardware with a replacement hardware.
- 14. The computer program product of claim 8, further comprising transferring by the Workload Controller Component a workload from the substitute hardware from the lower priority system of the datacenter.
- 15. A computer system comprising a processor and one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions executable by the processor to cause the processor to perform operations enhancing a distributed system of a datacenter comprising: detecting, by a Hardware Management Component of the datacenter, a failure of a failed hardware in the datacenter; responsive to the detecting, determining by an Asset Manager Component, the Asset Manager Component extracting hardware data for the failed hardware by executing an application programming interface call on the failed hardware, a lead time for a replacement of the failed hardware; determining by injecting the hardware data into the Hardware Management Component to output a replacement decision based on a priority of a workload deployed on the failed hardware and the lead time; deciding by the Hardware Management Component whether to replace the failed hardware with a substitute hardware from a lower priority system of the datacenter based on the replacement decision; and adjusting by a Workload Controller Component the workload deployed on the failed hardware based on the replacement decision, the distributed system is enhanced wherein the workload of the failed hardware of the distributed system is dynamically reallocated from the failed hardware to the substitute hardware.
- 16. The computer system of claim 15, wherein the adjusting further comprises transferring by the Workload Controller Component the workload from the failed hardware.
- 17. The computer system of claim 15, wherein the replacement decision further comprises comparing the lead time with a threshold based on the priority of the workload deployed on the failed hardware.
- 18. The computer system of claim 15, wherein the replacement decision is based on comparing the priority of the failed hardware against a priority metric of the datacenter.
- 19. The computer system of claim 15, further comprising replacing the substitute hardware with a replacement hardware.
- 20. The computer system of claim 15, further comprising transferring by the Workload Controller Component a workload from the substitute hardware from the lower priority system of the datacenter.
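Several dependent claims describe the replacement decision as a comparison of the lead time with a threshold that depends on workload priority. The following is a minimal sketch of one way such a comparison could look. The priority labels, the threshold values in days, and the names `Decision`, `MAX_LEAD_TIME_DAYS`, and `replacement_decision` are illustrative assumptions and do not come from the claims.

```python
from enum import Enum


class Decision(Enum):
    WAIT_FOR_REPLACEMENT = "wait_for_replacement"
    USE_SUBSTITUTE = "use_substitute"


# Hypothetical thresholds: the longest lead time (in days) that a workload of
# each priority is assumed to tolerate. The values are illustrative only.
MAX_LEAD_TIME_DAYS = {"high": 1, "medium": 7, "low": 30}


def replacement_decision(workload_priority: str, lead_time_days: float) -> Decision:
    """Compare the lead time with the threshold for the workload's priority."""
    threshold = MAX_LEAD_TIME_DAYS[workload_priority]
    if lead_time_days > threshold:
        # Waiting for the replacement part would take too long for this
        # priority, so a substitute from a lower priority system is preferred.
        return Decision.USE_SUBSTITUTE
    return Decision.WAIT_FOR_REPLACEMENT
```

Under these assumed thresholds, a five-day lead time would favor substitution for a high priority workload and favor waiting for a low priority one.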
Description
BACKGROUND

The present invention relates generally to datacenters. More particularly, the present invention relates to a method, system, and computer program for Datacenter Hardware Reallocation Through Workload Distribution Optimization.

Datacenters play a key role in today's technology infrastructure, ensuring secure storage and the necessary connectivity to supply a vast range of services, including website hosting, business data storage, financial transaction processing and much more. Datacenters largely consist of servers, cooling and ventilation, backup power, software virtualization and management, network connection, and monitoring and maintenance systems. The average datacenter has about 100,000 servers, over 10,000 switches, and thousands of related hardware components.

Achieving overall operational reliability and efficiency is difficult when system parts fail. Sudden failure can affect resource utilization across a distributed compute network of a datacenter, and may cause severe disruption to critical services. The ability to dynamically reallocate components provides a means of optimization when supply chain constraints would otherwise reduce system optimization and reliability. In the face of external constraints, the ability to dynamically share hardware resources and workloads across the distributed system in a datacenter allows organizations to leverage internal resources to “bridge the gap” and run at a “relative optimal level” when constrained by external suppliers.

SUMMARY

The illustrative embodiments provide for Datacenter Hardware Reallocation Through Workload Distribution Optimization. An embodiment includes detecting, by a Hardware Management Component of a datacenter, a failure of hardware in the datacenter. The embodiment includes, responsive to the detecting, determining, by an Asset Manager Component, a lead time for a replacement of the failed hardware. The embodiment includes determining, by the Hardware Management Component, a replacement decision based on a priority of a workload deployed on the failed hardware and the lead time. The embodiment includes deciding, by the Hardware Management Component, whether to replace the failed hardware with a substitute hardware from a lower priority system of the datacenter based on the replacement decision. The embodiment also includes adjusting, by a Workload Controller Component, the workload deployed on the failed hardware based on the replacement decision.

An embodiment includes a computer usable program product. The computer usable program product includes a computer-readable storage medium, and program instructions stored on the storage medium.

An embodiment includes a computer system. The computer system includes a processor, a computer-readable memory, and a computer-readable storage medium, and program instructions stored on the storage medium for execution by the processor via the memory.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following detailed description of the illustrative embodiments when read in conjunction with the accompanying drawings, wherein:

FIG. 1 depicts a block diagram of a computing environment in accordance with an illustrative embodiment;
FIG. 2 depicts a diagram in an environment in accordance with an illustrative embodiment;
FIG. 3 depicts a flowchart diagram in an environment in accordance with an illustrative embodiment; and
FIG. 4 depicts a system diagram in accordance with an illustrative embodiment.
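As a rough illustration of the substitution step described in the Summary, the sketch below picks a donor from the lowest priority healthy system and moves the failed system's workloads onto it. It assumes the replacement decision has already favored substitution. The `System` class, the integer priority scale (larger means more important), and the functions `pick_donor` and `reallocate` are hypothetical simplifications, not the components named in the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class System:
    name: str
    priority: int  # assumption: a larger number means a more important system
    workloads: List[str] = field(default_factory=list)
    healthy: bool = True


def pick_donor(systems: List[System], failed: System) -> Optional[System]:
    """Return the lowest priority healthy system ranked below the failed one."""
    candidates = [
        s for s in systems
        if s.healthy and s is not failed and s.priority < failed.priority
    ]
    return min(candidates, key=lambda s: s.priority, default=None)


def reallocate(systems: List[System], failed: System) -> Tuple[Optional[System], List[str]]:
    """Move the failed system's workloads onto donor hardware.

    Returns the donor (or None if no lower priority system exists) and the
    donor's displaced workloads, which a scheduler would need to place elsewhere.
    """
    failed.healthy = False
    donor = pick_donor(systems, failed)
    if donor is None:
        return None, []
    displaced = donor.workloads
    donor.workloads = failed.workloads
    failed.workloads = []
    return donor, displaced
```

For example, with systems named `billing` (priority 3), `dev` (priority 2), and `batch` (priority 1), a failure in `billing` would select `batch` as the donor in this sketch and return the workloads of `batch` as displaced.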
DETAILED DESCRIPTION

Datacenters play a key role in today's technology infrastructure, ensuring secure storage and the necessary connectivity to supply a vast range of services, including website hosting, business data storage, financial transaction processing and much more. Datacenters largely consist of servers and hardware, cooling and ventilation, backup power, software virtualization and management, network connection, and monitoring and maintenance systems. The average datacenter has about 100,000 servers, over 10,000 switches, and thousands of related hardware components.

Achieving overall operational reliability and efficiency is difficult when system parts fail. Sudden failure can affect resource utilization across a distributed compute network of a datacenter, and may cause severe disruption to critical services. The ability to dynamically reallocate components provides a means of optimization when supply chain constraints would otherwise reduce system optimization and reliability. In the face of external constraints, the ability to dynamically share hardware resources and workloads across the distributed system in a datacenter allows organizations to leverage internal resources to “bridge the gap” and run at a “relative optimal level” when constrained by external suppliers.