US-12625753-B2 - Hardware management based on failure prediction in a multi-tiered architecture


Abstract

Techniques are disclosed for hardware management in an information processing system. For example, a method obtains, at an edge platform, one or more failure prediction values for one or more device types of the edge platform, wherein a failure prediction value for a device type represents a likelihood of failure associated with the device type. The method, at the edge platform, computes one or more health indicator values for the one or more device types based on the one or more failure prediction values, and computes one or more behavior indicator values for the one or more device types based on the one or more health indicator values. The method causes, in response to the one or more behavior indicator values for the one or more device types, determination of one or more proactive actions to be initiated prior to a failure of one or more devices of the edge platform.

Inventors

Parminder Singh Sethi
Anay Kishore
Praveen Kumar

Assignees

DELL PRODUCTS L.P.

Dates

Publication Date: 2026-05-12
Application Date: 2024-06-27

Claims (20)

1. A method comprising:
  obtaining, at an edge platform, one or more failure prediction values for one or more device types of the edge platform, wherein a failure prediction value for a device type represents a likelihood of failure associated with the device type;
  computing, at the edge platform, one or more health indicator values for the one or more device types based on the one or more failure prediction values;
  computing, at the edge platform, one or more behavior indicator values for the one or more device types based on the one or more health indicator values;
  causing, in response to the one or more behavior indicator values for the one or more device types, one or more proactive actions to be initiated prior to a failure of one or more devices of the edge platform, wherein the one or more proactive actions comprise one of updating and upgrading one or more hardware resources of the one or more devices;
  recomputing, at the edge platform, the one or more behavior indicator values based on a set of monitored parameters of one or more devices of the edge platform; and
  sending, from the edge platform to a centralized backend device from which the one or more failure prediction values originate, at least a portion of the set of monitored parameters to enable the centralized backend device to update at least a portion of the one or more failure prediction values;
  wherein the set of monitored parameters are received at the edge platform as part of a monitoring policy from the centralized backend device; and
  wherein the obtaining, computing, causing, recomputing and sending steps are executed by a processing device operatively coupled to a memory.
2. The method of claim 1 wherein the one or more health indicator values are computed using one or more probability distribution functions.
3. The method of claim 2 wherein the one or more probability distribution functions comprise a probability mass function.
4. The method of claim 3 wherein the one or more probability distribution functions further comprise a cumulative distribution function.
5. The method of claim 4 wherein a health indicator value for a device type is computed via a logical addition of a computation result of the probability mass function and a computation result of the cumulative distribution function.
6. The method of claim 2 wherein at least one of the one or more probability distribution functions comprises a Poisson distribution function.
7. The method of claim 1 wherein the one or more failure prediction values comprise mean coefficient values representative of respective failure predictions for the one or more device types.
8. The method of claim 1 further comprises initiating a registration, by the edge platform, with a content delivery network device in a network of distributed content delivery network devices that are connected to a centralized backend device from which the one or more failure prediction values originate.
9. The method of claim 1 wherein the steps of the method are executed by an edge analyzer client installed at the edge platform.
10. An apparatus comprising:
  a processing device operatively coupled to a memory and configured:
  to obtain, at an edge platform, one or more failure prediction values for one or more device types of the edge platform, wherein a failure prediction value for a device type represents a likelihood of failure associated with the device type;
  to compute, at the edge platform, one or more health indicator values for the one or more device types based on the one or more failure prediction values;
  to compute, at the edge platform, one or more behavior indicator values for the one or more device types based on the one or more health indicator values;
  to cause, in response to the one or more behavior indicator values for the one or more device types, one or more proactive actions to be initiated prior to a failure of one or more devices of the edge platform, wherein the one or more proactive actions comprise one of updating and upgrading one or more hardware resources of the one or more devices;
  to recompute, at the edge platform, the one or more behavior indicator values based on a set of monitored parameters of one or more devices of the edge platform
  and to send, from the edge platform to a centralized backend device from which the one or more failure prediction values originate, at least a portion of the set of monitored parameters to enable the centralized backend device to update at least a portion of the one or more failure prediction values;
  wherein the set of monitored parameters are received at the edge platform as part of a monitoring policy from the centralized backend device.
11. The apparatus of claim 10 wherein the one or more health indicator values are computed using one or more probability distribution functions.
12. The apparatus of claim 11 wherein the one or more probability distribution functions comprise a probability mass function.
13. The apparatus of claim 12 wherein the one or more probability distribution functions further comprise a cumulative distribution function.
14. The apparatus of claim 13 wherein a health indicator value for a device type is computed via a logical addition of a computation result of the probability mass function and a computation result of the cumulative distribution function.
15. The apparatus of claim 11 wherein at least one of the one or more probability distribution functions comprises a Poisson distribution function.
16. An article of manufacture comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes said at least one processing device to perform steps of:
  obtaining, at an edge platform, one or more failure prediction values for one or more device types of the edge platform, wherein a failure prediction value for a device type represents a likelihood of failure associated with the device type;
  computing, at the edge platform, one or more health indicator values for the one or more device types based on the one or more failure prediction values;
  computing, at the edge platform, one or more behavior indicator values for the one or more device types based on the one or more health indicator values;
  causing, in response to the one or more behavior indicator values for the one or more device types, one or more proactive actions to be initiated prior to a failure of one or more devices of the edge platform, wherein the one or more proactive actions comprise one of updating and upgrading one or more hardware resources of the one or more devices;
  recomputing, at the edge platform, the one or more behavior indicator values based on a set of monitored parameters of one or more devices of the edge platform; and
  sending, from the edge platform to a centralized backend device from which the one or more failure prediction values originate, at least a portion of the set of monitored parameters to enable the centralized backend device to update at least a portion of the one or more failure prediction values;
  wherein the set of monitored parameters are received at the edge platform as part of a monitoring policy from the centralized backend device.
17. The article of manufacture of claim 16 wherein the one or more health indicator values are computed using one or more probability distribution functions.
18. The article of manufacture of claim 17 wherein the one or more probability distribution functions comprise a probability mass function.
19. The article of manufacture of claim 18 wherein the one or more probability distribution functions further comprise a cumulative distribution function.
20. The article of manufacture of claim 19 wherein a health indicator value for a device type is computed via a logical addition of a computation result of the probability mass function and a computation result of the cumulative distribution function.
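
The dependent claims above recite a health indicator built from a probability mass function and a cumulative distribution function, with a Poisson distribution as one option and mean coefficient values as the failure prediction values. The following is a minimal sketch of one possible reading, not the claimed implementation. It assumes that "logical addition" means ordinary arithmetic addition, that the mean coefficient acts as the Poisson rate, and that the count `k` is a number of observed failure events. All function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for X ~ Poisson(lam)."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) by summing the PMF over 0..k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


def health_indicator(k: int, lam: float) -> float:
    """Assumed reading of the claims: PMF result plus CDF result.

    The sum is a score, not a probability, and can exceed 1.
    """
    return poisson_pmf(k, lam) + poisson_cdf(k, lam)


if __name__ == "__main__":
    # Hypothetical mean coefficient of 2.0 for some device type.
    for k in range(4):
        print(k, round(health_indicator(k, 2.0), 4))
```

Because the PMF and CDF are added rather than combined into a single probability, any threshold applied to this score would have to be calibrated separately. The source excerpt does not say how that is done.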

Description

FIELD

The field relates generally to information processing systems, and more particularly to management of hardware resources in such information processing systems.

BACKGROUND

In modern information processing systems, e.g., customer-based datacenters, there is a need to update and/or upgrade hardware resources. Such updates and upgrades typically involve a significant amount of planning and careful calculations of the existing resources with respect to demands and forecasts to arrive at an informed decision about deploying replacement and/or additional resources. This is a technical challenge particularly with respect to edge platforms.

SUMMARY

Embodiments provide techniques for hardware management in an information processing system. For example, in one embodiment, a method obtains, at an edge platform, one or more failure prediction values for one or more device types of the edge platform, wherein a failure prediction value for a device type represents a likelihood of failure associated with the device type. The method computes, at the edge platform, one or more health indicator values for the one or more device types based on the one or more failure prediction values. The method computes, at the edge platform, one or more behavior indicator values for the one or more device types based on the one or more health indicator values. The method causes, in response to the one or more behavior indicator values for the one or more device types, determination of one or more proactive actions to be initiated prior to a failure of one or more devices of the edge platform.

Further illustrative embodiments are provided in the form of a non-transitory computer-readable storage medium having embodied therein executable program code that when executed by a processor causes the processor to perform the above steps. Still further illustrative embodiments comprise an apparatus with a processor and a memory configured to perform the above steps.
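
The steps in the summary above (obtain failure prediction values, compute health indicators, compute behavior indicators, determine proactive actions) can be pictured as a small per-device-type pipeline. The sketch below is illustrative only. This excerpt does not define how a behavior indicator is derived from a health indicator or what triggers an action, so the threshold `degraded_below`, the labels `"degrading"` and `"stable"`, the action names, and the caller-supplied `health_fn` are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Assessment:
    device_type: str
    health: float   # health indicator value
    behavior: str   # behavior indicator (hypothetical labels)
    action: str     # proactive action to initiate (hypothetical names)


def assess(
    failure_predictions: Dict[str, float],
    health_fn: Callable[[float], float],
    degraded_below: float = 0.5,  # assumed threshold, not from the source
) -> Dict[str, Assessment]:
    """Map each device type's failure prediction to an action."""
    results: Dict[str, Assessment] = {}
    for device_type, prediction in failure_predictions.items():
        # Step 2: health indicator from the failure prediction value.
        health = health_fn(prediction)
        # Step 3: behavior indicator from the health indicator.
        behavior = "degrading" if health < degraded_below else "stable"
        # Step 4: choose a proactive action before any device fails.
        action = "schedule_upgrade" if behavior == "degrading" else "none"
        results[device_type] = Assessment(device_type, health, behavior, action)
    return results


if __name__ == "__main__":
    # Step 1: failure prediction values as obtained at the edge platform
    # (made-up numbers; health_fn here is a placeholder).
    predictions = {"disk": 0.8, "fan": 0.1}
    for item in assess(predictions, health_fn=lambda p: 1.0 - p).values():
        print(item)
```

Passing `health_fn` as a parameter keeps the sketch neutral about how health is actually computed.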
These and other features and advantages of embodiments described herein will become more apparent from the accompanying drawings and the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts an information processing system with multiple edge datacenters and corresponding client modules connected to a backend server through a content delivery network server in an illustrative embodiment.
FIG. 2 depicts components of an edge datacenter and of a client module in an illustrative embodiment.
FIG. 3 depicts components of a content delivery network server in an illustrative embodiment.
FIG. 4 depicts components of a backend server in an illustrative embodiment.
FIG. 5 depicts an architecture including multiple edge client modules connected to a backend server through respective content delivery network servers in an illustrative embodiment.
FIG. 6 depicts a multi-tiered architecture including hardware management functionalities in an illustrative embodiment.
FIG. 7 depicts a probability mass function for use in hardware management in an illustrative embodiment.
FIG. 8 depicts a cumulative distribution function for use in hardware management in an illustrative embodiment.
FIG. 9 depicts health index calculation results for use in hardware management in an illustrative embodiment.
FIG. 10 depicts a process for hardware management in a multi-tiered architecture according to an illustrative embodiment.
FIGS. 11 and 12 depict examples of processing platforms that may be utilized to implement at least a portion of an information processing system according to illustrative embodiments.

DETAILED DESCRIPTION

Illustrative embodiments will be described herein with reference to exemplary information processing systems and associated computers, servers, storage devices and other processing devices. It is to be appreciated, however, that embodiments are not restricted to use with the particular illustrative system and device configurations shown.
Accordingly, the term “information processing system” as used herein is intended to be broadly construed, so as to encompass, for example, processing systems comprising cloud computing and storage systems, as well as other types of processing systems comprising various combinations of physical and virtual processing resources. An information processing system may therefore comprise, for example, at least one data center or other type of cloud-based system that includes one or more clouds hosting tenants that access cloud resources. Such systems are considered examples of what are more generally referred to herein as cloud-based computing environments. Some cloud infrastructures are within the exclusive control and management of a given enterprise, and therefore are considered “private clouds.” The term “enterprise” as used herein is intended to be broadly construed, and may comprise, for example, one or more businesses, one or more corporations or any other one or more entities, groups, or organizations. An “entity” as illustrative