CN-122002760-A - Overheat recognition and repair device

Abstract

The present disclosure relates to identifying and repairing overheating devices. The present disclosure describes techniques for intelligently detecting overheating devices in a network or data center and taking action to address such overheating devices. The present disclosure also describes evaluating heat dissipation information associated with components of devices in a network, making predictions regarding network outages based on the evaluation of heat dissipation information, and taking actions to address, mitigate, or prevent such network outages. In one example, the present disclosure describes a method comprising collecting, by a computing system, information regarding a thermal metric for a plurality of network devices, identifying, by the computing system and based on the information regarding the thermal metric, a particular network device that rapidly changes temperature, and taking, by the computing system, an action to address an effect of overheating associated with the particular network device.

Inventors

  • Ganesh Biagotti Matad Sankada
  • Timnawan Srida
  • Raja Komula
  • Rajendra Shivaram Yawatkar

Assignees

  • Juniper Networks, Inc. (瞻博网络公司)

Dates

Publication Date
2026-05-08
Application Date
2025-10-31
Priority Date
2024-11-08

Claims (20)

  1. A computing system comprising processing circuitry and a storage medium, wherein the processing circuitry has access to the storage medium and is configured to: collect information about thermal metrics of a plurality of network devices; identify, based on the information about the thermal metrics, a particular network device at risk of overheating; and take action to address the effects of overheating associated with the particular network device.
  2. The computing system of claim 1, wherein to collect the information about the thermal metrics, the processing circuitry is further configured to: collect information about heat dissipation associated with each of the plurality of network devices.
  3. The computing system of claim 2, wherein each network device of the plurality of network devices comprises a plurality of components, and wherein to collect the information about heat dissipation associated with each network device of the plurality of network devices, the processing circuitry is further configured to: for each network device, collect the information about heat dissipation across the plurality of components included within that network device.
  4. The computing system of claim 3, wherein to identify the particular network device, the processing circuitry is further configured to: evaluate, based on the information about heat dissipation, the cooling efficiency of at least some of the plurality of components within each of the plurality of network devices.
  5. The computing system of claim 4, wherein to identify the particular network device, the processing circuitry is further configured to: determine, based on the evaluation, that the particular network device has a component at risk of failure.
  6. The computing system of any one of claims 1 to 5, wherein to collect the information about the thermal metrics, the processing circuitry is further configured to: collect temperature data from sensors associated with each of the plurality of network devices.
  7. The computing system of claim 6, wherein each network device of the plurality of network devices has a chassis, and wherein to collect the temperature data from the sensors, the processing circuitry is further configured to: collect the temperature data from sensors placed at strategic locations on the chassis of each network device of the plurality of network devices.
  8. The computing system of any one of claims 1 to 5, wherein to collect the information about the thermal metrics, the processing circuitry is further configured to: store the information in a time-series data store; and enable, based on the stored information, periodic time-series analysis of the temperature metrics of each of the plurality of network devices.
  9. The computing system of any one of claims 1 to 5, wherein to identify the particular network device at risk of overheating, the processing circuitry is further configured to: identify a particular network device that is overheating rapidly.
  10. The computing system of any one of claims 1 to 5, wherein to take action to address the effects of overheating associated with the particular network device, the processing circuitry is further configured to: generate an alert providing information about overheating associated with the particular network device; and enable an administrator to take action.
  11. The computing system of claim 10, wherein to generate the alert providing information, the processing circuitry is further configured to: include information recommending a rearrangement in which the particular network device is relocated to a location with better air circulation.
  12. The computing system of any one of claims 1 to 5, wherein to take action to address the effects of overheating associated with the particular network device, the processing circuitry is further configured to: redistribute workload by removing workload from the particular network device.
  13. The computing system of claim 2, wherein to collect the information about heat dissipation associated with each of the plurality of network devices, the processing circuitry is further configured to: store time-series data associated with heat dissipation metrics.
  14. The computing system of claim 13, wherein to identify the particular network device showing signs of overheating, the processing circuitry is further configured to: train a machine learning model, based on at least some of the time-series data, to predict a heat dissipation pattern of a component within a network device; and apply the machine learning model to predict that the particular network device has a component at risk of failure.
  15. The computing system of any one of claims 1 to 5, wherein to take action to address the effects of overheating associated with the particular network device, the processing circuitry is further configured to: send, to another system, a control signal that instructs the other system to perform an operation to address the effects of overheating associated with the particular network device.
  16. A method of detecting an overheating device, comprising: collecting, by a computing system, information about thermal metrics for a plurality of network devices; identifying, by the computing system and based on the information about the thermal metrics, a particular network device at risk of overheating; and taking, by the computing system, action to address the effects of overheating associated with the particular network device.
  17. The method of claim 16, wherein collecting the information about the thermal metrics comprises: collecting information about heat dissipation associated with each of the plurality of network devices.
  18. The method of claim 17, wherein each of the plurality of network devices comprises a plurality of components, and wherein collecting the information about heat dissipation associated with each of the plurality of network devices comprises: for each network device, collecting the information about heat dissipation across the plurality of components included within that network device.
  19. The method of claim 18, wherein identifying the particular network device comprises: evaluating, based on the information about heat dissipation, the cooling efficiency of at least some components of the plurality of network devices.
  20. A non-transitory computer-readable medium comprising instructions that, when executed, cause processing circuitry of a computing system to: collect information about thermal metrics of a plurality of network devices; identify, based on the information about the thermal metrics, a particular network device at risk of overheating; and take action to address the effects of overheating associated with the particular network device.
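
For illustration only, the collect, identify, and act steps recited in claims 1, 8, 9, and 10 can be sketched in Python. This is a minimal sketch under stated assumptions, not the claimed implementation: the `ThermalStore` class, the device names, the use of a least-squares slope as the measure of rapid heating, and the 0.5 °C per minute limit are all hypothetical choices made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical limit: flag devices warming faster than this (deg C per minute).
RATE_LIMIT_C_PER_MIN = 0.5


@dataclass
class ThermalStore:
    """Toy in-memory stand-in for a time-series data store (assumption)."""
    samples: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, device: str, minute: float, temp_c: float) -> None:
        # Collect step: append one (time, temperature) sample for a device.
        self.samples[device].append((minute, temp_c))


def warming_rate(points):
    """Least-squares slope of temperature over time, in deg C per minute."""
    n = len(points)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in points) / n
    mean_c = sum(c for _, c in points) / n
    denom = sum((t - mean_t) ** 2 for t, _ in points)
    if denom == 0:
        return 0.0
    return sum((t - mean_t) * (c - mean_c) for t, c in points) / denom


def find_fast_heating(store, limit=RATE_LIMIT_C_PER_MIN):
    """Identify step: return {device: rate} for devices above the limit."""
    rates = {dev: warming_rate(pts) for dev, pts in store.samples.items()}
    return {dev: rate for dev, rate in rates.items() if rate > limit}


def build_alerts(flagged):
    """Act step: produce alert text an administrator could act on."""
    return [
        f"{dev}: warming at {rate:.2f} C/min, check airflow or move workload"
        for dev, rate in sorted(flagged.items())
    ]


store = ThermalStore()
for minute in range(5):
    store.record("switch-a", minute, 40.0 + 0.1 * minute)  # slow drift
    store.record("switch-b", minute, 40.0 + 2.0 * minute)  # rapid rise

alerts = build_alerts(find_fast_heating(store))
for line in alerts:
    print(line)
```

In this sketch only `switch-b` is flagged, because its fitted slope exceeds the assumed limit. A slope over a window is used rather than a single absolute threshold so that a device can be flagged while it is still below a critical temperature.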

Description

Overheat recognition and repair device

Related Applications

The present application claims the benefit of U.S. patent application Ser. No. 19/343,375, filed on September 29, 2025, which claims the benefit of Indian provisional patent application Ser. No. 202441086013, filed on November 7, 2024, each of which is incorporated herein by reference in its entirety.

Technical Field

The present disclosure relates to computer networks, and more particularly to managing heat generated in data centers.

Background

Overheating can have a significant adverse effect on a data center. Elevated temperatures may cause hardware failures, resulting in system outages and potential data loss. In addition, high temperatures can degrade the performance of servers, resulting in slowdowns that affect the overall efficiency of the data center. Prolonged exposure to heat may accelerate degradation of electronic components, resulting in increased maintenance costs and the need for more frequent replacement. Overall, insufficient thermal management poses serious risks to the reliability and operational continuity of the data center.

Disclosure of Invention

The present disclosure describes techniques for intelligently detecting overheating devices in a network or data center and taking action to address such overheating devices. The present disclosure also describes evaluating heat dissipation information associated with components of devices in a network, making predictions regarding network outages based on the evaluation of heat dissipation information, and taking actions to address, mitigate, or prevent such network outages. In some examples, the disclosure describes operations performed by a computing system in accordance with one or more aspects of the disclosure.
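
As a rough illustration of evaluating heat dissipation information per component, the Python sketch below scores each component by how far it runs above inlet air temperature per watt of power drawn. The disclosure text here does not define a cooling-efficiency formula, so this metric, the `ComponentReading` fields, the 0.8 °C per watt cutoff, and the sample values are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass

# Hypothetical cutoff: a component is treated as poorly cooled when it runs
# more than this many deg C above inlet air per watt it draws.
MAX_RISE_C_PER_WATT = 0.8


@dataclass(frozen=True)
class ComponentReading:
    device: str
    component: str
    temp_c: float   # component temperature
    inlet_c: float  # air temperature at the chassis inlet
    power_w: float  # power drawn, used as a proxy for heat to dissipate


def rise_per_watt(reading: ComponentReading) -> float:
    """Temperature rise over inlet per watt; lower means better cooling."""
    if reading.power_w <= 0:
        return 0.0
    return (reading.temp_c - reading.inlet_c) / reading.power_w


def devices_at_risk(readings, cutoff=MAX_RISE_C_PER_WATT):
    """Map each device to its components whose rise per watt exceeds cutoff."""
    at_risk = {}
    for reading in readings:
        if rise_per_watt(reading) > cutoff:
            at_risk.setdefault(reading.device, []).append(reading.component)
    return at_risk


# Made-up sample readings for two devices with two components each.
readings = [
    ComponentReading("router-1", "asic", 70.0, 25.0, 90.0),
    ComponentReading("router-1", "psu", 55.0, 25.0, 60.0),
    ComponentReading("router-2", "asic", 95.0, 25.0, 70.0),
    ComponentReading("router-2", "optic", 45.0, 25.0, 40.0),
]

risk = devices_at_risk(readings)
print(risk)
```

With these sample values, only the `asic` component of `router-2` exceeds the assumed cutoff. Normalizing by power is one possible design choice: it separates a component that is hot because it is busy from one that is hot because it is poorly cooled.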

In one particular example, the present disclosure describes a method that includes collecting, by a computing system, information regarding a thermal metric for a plurality of network devices; identifying, by the computing system and based on the information regarding the thermal metric, a particular network device that rapidly changes temperature; and taking, by the computing system, an action to address an effect of overheating associated with the particular network device.

In another example, the disclosure describes a method comprising collecting, by a computing system, information about heat dissipation associated with each of a plurality of network devices, wherein each network device includes a plurality of components, and wherein collecting information about heat dissipation includes, for each network device, collecting information about heat dissipation across the plurality of components included within that network device; evaluating, by the computing system, cooling efficiency of at least some of the components of the plurality of network devices based on the information about heat dissipation; and identifying, by the computing system and based on the evaluation, a particular network device having components with increased risk of failure.

In another example, the present disclosure describes a system comprising a storage system and processing circuitry capable of accessing the storage system, wherein the processing circuitry is configured to perform the operations described herein.

In yet another example, the present disclosure describes a computer-readable storage medium comprising instructions that, when executed, configure processing circuitry of a computing system to perform operations described herein.

This summary is intended to provide a brief overview of some of the subject matter described in this document.
Accordingly, the above features are merely examples and should not be construed to narrow the scope or spirit of the subject matter described herein. Other features, objects, and advantages of the disclosure will be apparent from the description and drawings, and from the claims.

Drawings

FIG. 1 is a block diagram illustrating an example system including a data center in which examples of the techniques described herein may be implemented.

FIGS. 2A, 2B, and 2C are conceptual diagrams of an arrangement of devices within racks in a data center according to one or more aspects of the present disclosure.

FIGS. 3A and 3B are conceptual diagrams of devices within racks in a data center, where heat dissipation information is collected from the devices, according to one or more aspects of the present disclosure.

FIG. 4 is a flowchart illustrating operations performed by an example controller in accordance with one or more aspects of the present disclosure.

FIG. 5 is a flowchart illustrating operations performed by an example computing system in accordance with one or more aspects of the present disclosure.

Detailed Description

FIG. 1 is a block diagram illustrating an example system 8 including a data center in which examples of the techniques described herein may be implemented. In general, data center 100 provides an operating environment