US-12619422-B1 - Firmware upgrade of computing device fleets

Abstract

Some aspects of the present disclosure involve a method including: receiving instructions specifying a particular site from among a plurality of sites, wherein each of the plurality of sites comprises one or more computing devices, each of the one or more computing devices comprising a plurality of integrated circuit (IC) chips that are configured to perform similar computations in parallel; and in response to receiving the instructions, performing a firmware upgrade for each of the one or more computing devices at the particular site.

Inventors

  • Marshall Long
  • Sridhar Chirravuri
  • Diana Pham
  • Matangi Vaidyanathan
  • Sairam Jalakam Devarajulu

Assignees

  • Auradine, Inc.

Dates

Publication Date
2026-05-05
Application Date
2025-05-19

Claims (20)

  1. A method comprising: displaying, on a user interface of an operator machine, information about a plurality of computing devices distributed in a plurality of sites, wherein each of the plurality of sites comprises one or more computing devices, each of the one or more computing devices comprising a plurality of integrated circuit (IC) chips that are configured to perform similar computations in parallel; receiving instructions specifying a particular site from among the plurality of sites, wherein the instructions comprise at least one of an input configuring a particular start time point of a firmware upgrade, or a time window of the firmware upgrade; and in response to receiving the instructions, performing a firmware upgrade for each of the one or more computing devices at the particular site, wherein performing the firmware upgrade comprises: performing a staged rollout of the firmware upgrade of the plurality of IC chips at the particular site within the time window from the particular start time point, wherein performing the staged rollout of the firmware upgrade of the plurality of IC chips comprises selecting subsets of the plurality of IC chips and performing the firmware upgrade of the selected subsets sequentially, different subsets being selected at different start time points within the time window.
  2. The method of claim 1, wherein receiving the instructions specifying the particular site comprises: receiving, through the user interface of the operator machine, an input selecting the particular site from among the plurality of sites represented on the user interface.
  3. The method of claim 1, wherein performing the firmware upgrade for each of the one or more computing devices at the particular site comprises at least one of: performing a firmware upgrade of a central controller unit included in each of the one or more computing devices at the particular site, or performing a firmware upgrade of an individual controller unit included in each of the plurality of IC chips in each of the one or more computing devices at the particular site.
  4. The method of claim 1, comprising: displaying, on the user interface, information about a plurality of groups in at least one site, wherein each of the plurality of groups comprises one or more computing devices; receiving second instructions specifying one or more groups from among the plurality of groups; and in response to receiving the second instructions, performing a firmware upgrade for one or more computing devices in each of the one or more specified groups.
  5. The method of claim 4, wherein performing the firmware upgrade for each of the one or more computing devices in each of the one or more specified groups comprises at least one of: performing a firmware upgrade of a central controller unit included in each of the one or more computing devices in each of the one or more specified groups, or performing a firmware upgrade of an individual controller unit included in each of the plurality of IC chips in each of the one or more computing devices in each of the one or more specified groups.
  6. The method of claim 3, wherein performing the firmware upgrade comprises downloading and installing a new firmware image to the central controller unit included in each of the one or more computing devices or the individual controller unit included in each of the plurality of IC chips in each of the one or more computing devices.
  7. The method of claim 6, wherein performing the firmware upgrade further comprises: computing a checksum or cryptographic hash of the new firmware image; comparing the checksum or the cryptographic hash to a predetermined value; and rejecting or accepting the new firmware image based on the comparison result.
  8. The method of claim 6, wherein performing the firmware upgrade further comprises: detecting a failure during downloading or installing of the new firmware image; and reverting each of the plurality of IC chips to a previous firmware image.
  9. The method of claim 6, wherein performing the firmware upgrade further comprises: verifying the new firmware image using a public key, wherein the new firmware image was signed with a private key paired with the public key; and installing the new firmware image in response to the verification being successful.
  10. The method of claim 1, further comprising: displaying, on the user interface, information about a plurality of computing devices at the particular site; receiving third instructions specifying a particular computing device from among the plurality of computing devices at the particular site; and in response to receiving the third instructions, performing a firmware upgrade for the particular computing device.
  11. The method of claim 10, wherein performing the firmware upgrade for the particular computing device at the particular site comprises at least one of: performing a firmware upgrade of a central controller unit included in the particular computing device, or performing a firmware upgrade of an individual controller unit included in each of a plurality of IC chips in the particular computing device.
  12. The method of claim 4, further comprising: displaying, on the user interface, information about a plurality of computing devices in the one or more specified groups; receiving third instructions specifying a particular computing device from among the plurality of computing devices in the one or more specified groups; and in response to receiving the third instructions, performing a firmware upgrade for the particular computing device.
  13. The method of claim 12, wherein performing the firmware upgrade for the particular computing device comprises at least one of: performing a firmware upgrade of a central controller unit included in the particular computing device, or performing a firmware upgrade of an individual controller unit included in each of a plurality of IC chips in the particular computing device.
  14. The method of claim 1, wherein the plurality of IC chips are configured to perform cryptographic hash computations or process large language model data.
  15. One or more non-transitory computer-readable media storing instructions that, when executed, cause one or more processors to perform operations comprising the method of claim 1.
  16. The method of claim 1, wherein performing the firmware upgrade of the selected subsets sequentially comprises one of: (i) performing a firmware upgrade of a second subset of IC chips after completion of a firmware upgrade of a first subset of IC chips, (ii) performing the firmware upgrade of the second subset of IC chips before the completion of the firmware upgrade of the first subset of IC chips, or (iii) performing firmware upgrades of IC chips in each subset in parallel.
  17. The method of claim 1, wherein performing the firmware upgrade of the selected subsets comprises determining a randomized start time point for each subset of IC chips, using the particular start time point as a seed for randomization.
  18. A system comprising: a plurality of computing devices distributed in a plurality of sites, each of the plurality of sites comprising one or more computing devices, each of the one or more computing devices comprising a plurality of integrated circuit (IC) chips that are configured to perform similar computations in parallel; an operator machine communicably coupled to the plurality of computing devices; and memory storing instructions that, when executed, cause one or more processors to perform operations comprising: displaying, on a user interface of the operator machine, information about the plurality of computing devices distributed in the plurality of sites; receiving instructions specifying a particular site from among the plurality of sites, wherein the instructions comprise at least one of an input configuring a particular start time point of a firmware upgrade, or a time window of the firmware upgrade; and in response to receiving the instructions, performing a firmware upgrade for each of the one or more computing devices at the particular site, wherein performing the firmware upgrade comprises: performing a staged rollout of the firmware upgrade of the plurality of IC chips at the particular site within the time window from the particular start time point, wherein performing the staged rollout of the firmware upgrade of the plurality of IC chips comprises selecting subsets of the plurality of IC chips and performing the firmware upgrade of the selected subsets sequentially, different subsets being selected at different start time points within the time window.
  19. The system of claim 18, wherein performing the firmware upgrade for each of the one or more computing devices at the particular site comprises at least one of: performing a firmware upgrade of a central controller unit included in each of the one or more computing devices at the particular site, or performing a firmware upgrade of an individual controller unit included in each of the plurality of IC chips in each of the one or more computing devices at the particular site.
  20. The system of claim 18, wherein performing the firmware upgrade of the selected subsets comprises determining a randomized start time point for each subset of IC chips, using the particular start time point as a seed for randomization.
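
Illustrative sketch (not part of the claims): the staged rollout recited in claims 1 and 17 might be planned roughly as follows. The function name `plan_staged_rollout`, the fixed `subset_size` parameter, and the choice to give each subset its own equal slice of the time window are assumptions made for illustration only; the claims do not specify how subsets are sized or how randomized start times are distributed. The one element taken from the claims is that the configured start time point seeds the randomization.

```python
import random
from datetime import datetime, timedelta, timezone


def plan_staged_rollout(chip_ids, start, window, subset_size):
    """Return a list of (start_time, subset) pairs for a staged rollout.

    Hypothetical helper: splits chip_ids into consecutive subsets and gives
    each subset a randomized start time inside the time window.
    """
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    # Seed the generator from the configured start time point (claim 17),
    # so the same inputs always yield the same plan.
    rng = random.Random(int(start.timestamp()))
    subsets = [chip_ids[i:i + subset_size]
               for i in range(0, len(chip_ids), subset_size)]
    if not subsets:
        return []
    # Assumption: each subset owns one equal slot of the window, and its
    # start time is a random offset inside that slot. This keeps subsets
    # in order and gives every subset a different start time.
    slot = window / len(subsets)
    plan = []
    for index, subset in enumerate(subsets):
        offset = slot * index + slot * rng.random()
        plan.append((start + offset, subset))
    return plan


if __name__ == "__main__":
    start = datetime(2025, 5, 19, 2, 0, tzinfo=timezone.utc)
    chips = [f"chip-{i}" for i in range(10)]
    for when, subset in plan_staged_rollout(chips, start, timedelta(hours=1), 4):
        print(when.isoformat(), subset)
```

Whether a subset waits for the previous one to finish, overlaps with it, or upgrades its chips in parallel (the options in claim 16) would be a separate decision made by whatever executes this plan; the sketch only produces the schedule.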

Description

TECHNICAL FIELD

The present disclosure generally relates to devices, systems, and methods to upgrade the firmware of one or more fleets of computing devices.

BACKGROUND

Managing firmware upgrades across computing operations distributed across a large number of computing devices is complex, time-consuming, and prone to failures that can disrupt computing efficiency. Traditional upgrade processes can require manual intervention, can lack failure recovery mechanisms, and often provide no structured approach to deploying updates at scale. This can result in inconsistent firmware versions across computing devices, increased downtime, and operational inefficiencies, particularly when managing computing devices across multiple sites or groups.

SUMMARY

One aspect of the present disclosure relates to a method including: receiving instructions specifying a particular site from among a plurality of sites, wherein each of the plurality of sites includes one or more computing devices, each of the one or more computing devices including a plurality of integrated circuit (IC) chips that are configured to perform similar computations in parallel; and in response to receiving the instructions, performing a firmware upgrade for each of the one or more computing devices at the particular site.
The method can include other optional features. For example, in some implementations, receiving the instructions specifying the particular site includes: receiving, through a user interface of an operator machine, an input selecting the particular site from among the plurality of sites represented on the user interface.
In some implementations, performing the firmware upgrade for each of the one or more computing devices at the particular site includes at least one of: performing a firmware upgrade of a central controller unit included in each of the one or more computing devices at the particular site, or performing a firmware upgrade of an individual controller unit included in each of the plurality of IC chips in each of the one or more computing devices at the particular site.
In some implementations, at least one site of the plurality of sites comprises a plurality of groups, each group of the plurality of groups comprising one or more computing devices, the method further including: receiving second instructions specifying one or more groups of the at least one site; and in response to receiving the second instructions, performing a firmware upgrade for each of the one or more computing devices in each of the one or more specified groups of the at least one site.
In some implementations, receiving the second instructions specifying the one or more groups of the at least one site includes: receiving, through a user interface of an operator machine, an input selecting the one or more groups from among the plurality of groups represented on the user interface.
In some implementations, performing the firmware upgrade for each of the one or more computing devices in each of the one or more specified groups includes at least one of: performing a firmware upgrade of a central controller unit included in each of the one or more computing devices in each of the one or more specified groups, or performing a firmware upgrade of an individual controller unit included in each of the plurality of IC chips in each of the one or more computing devices in each of the one or more specified groups.
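Illustrative sketch (not part of the disclosure): the site, group, and device hierarchy described above, together with the choice between upgrading the central controller unit and the per-chip controller units, could be modeled along these lines. The class names `Device` and `Site`, the functions `select_devices` and `upgrade_targets`, and the string label `"central"` are all hypothetical; the disclosure does not prescribe any data model.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    device_id: str
    chip_ids: list = field(default_factory=list)  # IC chips in this device


@dataclass
class Site:
    name: str
    groups: dict = field(default_factory=dict)  # group name -> list of Device


def select_devices(site, group_names=None):
    """Return every device at the site, or only those in the named groups."""
    names = list(site.groups) if group_names is None else list(group_names)
    unknown = [n for n in names if n not in site.groups]
    if unknown:
        raise KeyError(f"unknown groups: {unknown}")
    return [device for n in names for device in site.groups[n]]


def upgrade_targets(devices, central=True, per_chip=True):
    """List (device_id, unit) pairs to upgrade.

    "At least one of" the central controller unit or the individual
    per-chip controller units must be selected.
    """
    if not (central or per_chip):
        raise ValueError("select the central controller, the chips, or both")
    targets = []
    for device in devices:
        if central:
            targets.append((device.device_id, "central"))
        if per_chip:
            targets.extend((device.device_id, chip) for chip in device.chip_ids)
    return targets


if __name__ == "__main__":
    site = Site("site-a", {
        "group-1": [Device("dev-1", ["chip-1", "chip-2"])],
        "group-2": [Device("dev-2", ["chip-3"])],
    })
    print(upgrade_targets(select_devices(site, ["group-1"])))
```

Passing no group names corresponds to the site-level instruction (upgrade every device at the site); passing a list corresponds to the "second instructions" that narrow the upgrade to specified groups.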
In some implementations, performing the firmware upgrade further includes: receiving, through a user interface of an operator machine, an input configuring a start time point of the firmware upgrade; receiving, through the user interface, an input configuring a time window of the firmware upgrade; and performing a staged rollout of the firmware upgrade of the plurality of IC chips at the particular site within the time window from the start time point.
In some implementations, performing the firmware upgrade includes downloading and installing a new firmware image to the central controller unit included in each of the one or more computing devices or the individual controller unit included in each of the plurality of IC chips in each of the one or more computing devices.
In some implementations, performing the firmware upgrade further includes: computing a checksum or cryptographic hash of the new firmware image; comparing the checksum or the cryptographic hash to a predetermined value; and rejecting or accepting the new firmware image based on the comparison result.
In some implementations, performing the firmware upgrade further includes: detecting a failure during downloading or installing of the new firmware image; and reverting each of the plurality of IC chips to a previous firmware image.
In some implementations, performing the firmware upgrade further includes: verifying the new firmware image using a public key, wherein the new firmware image was signed with a private key paired with the public key; and installing the new firmware image in response t