
US-12625825-B2 - Systems and methods for on the fly routing in the presence of errors

US 12625825 B2

Abstract

Systems and methods are provided for “on the fly” routing of data transmissions in the presence of errors. Switches can establish flow channels corresponding to flows in the network. In response to encountering a critical error on a network link along a transmission path, a switch can generate an error acknowledgement. The switch can transmit the error acknowledgement, via the plurality of flow channels, to ingress ports upstream from the network link, thereby indicating to those ingress ports that the link on which the critical error was encountered is a failed link. Subsequently, each ingress port upstream from the failed link can dynamically update the path of the plurality of flows upstream from the failed link, such that those flows are routed in a manner that avoids the failed link.
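To make the abstract's mechanism concrete, the error-acknowledgement rerouting can be sketched in Python. This is a minimal illustration only; the class names, link names, and table layout are assumptions for exposition, not structures defined by the patent:

```python
from dataclasses import dataclass, field

@dataclass
class FlowChannel:
    """Per-switch flow channel: remembers the downstream link used by a flow."""
    flow_id: int
    next_link: str  # downstream link currently carrying this flow

@dataclass
class IngressPort:
    """Ingress port holding flow channels; reroutes flows when a link fails."""
    flows: dict = field(default_factory=dict)   # flow_id -> FlowChannel
    failed_links: set = field(default_factory=set)

    def on_error_ack(self, failed_link: str, alternate_link: str) -> list:
        """Handle an error ACK: mark the link failed and move affected flows.

        Returns the flow IDs that were rerouted onto the alternate link.
        """
        self.failed_links.add(failed_link)
        rerouted = []
        for fc in self.flows.values():
            if fc.next_link == failed_link:
                # Subsequent packets of this flow avoid the failed link.
                fc.next_link = alternate_link
                rerouted.append(fc.flow_id)
        return rerouted
```

Upon receiving an error acknowledgement naming a failed link, an ingress port would move every flow channel pointing at that link onto an alternate link, so that subsequent packets of those flows avoid the failure.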

Inventors

  • Jonathan P. Beecroft
  • Edwin L. Froese

Assignees

  • HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP

Dates

Publication Date
20260512
Application Date
20240528

Claims (20)

  1. A method, comprising: establishing a flow channel defining a data path along which transmission of packets of a flow occurs from an ingress switch to an egress switch; returning acknowledgements (ACKs) on the flow channel from the egress switch to the ingress switch in response to the transmission of the packets, the ACKs traversing the data path in reverse order back to the ingress switch; in response to encountering a failed link along the data path based on receipt of an ACK at a link upstream of the failed link, converting the ACK to an error ACK identifying the failed link; and at one or more ingress ports upstream from the failed link, updating the data path to avoid the failed link, wherein a subsequent packet of the flow traversing the updated data path avoids the failed link.
  2. The method of claim 1, wherein the data path comprises the ingress switch, the egress switch, and one or more intermediate switches between the ingress switch and the egress switch.
  3. The method of claim 2, wherein the ACKs are generated by the egress switch or the one or more intermediate switches.
  4. The method of claim 2, wherein the ACKs release the flow channel at the egress switch, the one or more intermediate switches, and the ingress switch in reverse order when no other packets are sent on the flow.
  5. The method of claim 2, wherein state information of the flow is updated at the ingress switch, the egress switch, and the one or more intermediate switches based on information carried by the ACKs, the information comprising downstream data path information.
  6. The method of claim 5, wherein the ACKs comprise a type field providing the information, the information including information regarding the failed link.
  7. The method of claim 5, further comprising updating the information carried by the ACKs prior to the ACKs being forwarded to a next upstream switch of the data path.
  8. The method of claim 7, wherein the state information and forwarding information are stored in flow channel tables associated with the switches of the data path.
  9. The method of claim 8, wherein the data path and an amount of data belonging to the path are described in a set of the flow channel tables that are dynamically connected.
  10. The method of claim 5, wherein the state information reflects an amount of outstanding, unacknowledged data.
  11. The method of claim 1, further comprising assigning a flow ID to the packets at each switch of the data path, establishing a chain of flow IDs.
  12. The method of claim 11, further comprising distinguishing between the flow and one or more other flows traveling on a same fabric link based on the chain of flow IDs.
  13. The method of claim 1, further comprising: establishing a plurality of flow channels defining data paths along which transmission of packets of a plurality of flows occur; and at one or more ingress ports upstream from the failed link along the plurality of flow channels, updating the data paths to avoid the failed link, wherein subsequent packets of the plurality of flows traversing the updated data paths avoid the failed link.
  14. A switch, comprising: an application-specific integrated circuit (ASIC) configured to: establish a flow channel defining a data path along which transmission of packets of a flow occurs, the switch being one switch of the defined data path; generate acknowledgements (ACKs) on the flow channel in response to the transmission of the packets, the ACKs traversing the defined data path in reverse order; and in response to encountering a failed link along the defined data path based on receipt of an ACK at a link upstream of the failed link, convert the ACK to an error ACK identifying the failed link, prompting updating of the defined data path to avoid the failed link at each ingress port upstream from the failed link.
  15. The switch of claim 14, wherein state information of the flow is updated at the switch based on information carried by the ACKs, the information comprising downstream data path information.
  16. The switch of claim 15, wherein the ACKs comprise a type field providing the information, the information including information regarding the failed link.
  17. The switch of claim 15, wherein the ASIC is further configured to update the information carried by the ACKs prior to being forwarded to a next upstream switch of the defined data path.
  18. The switch of claim 17, wherein the ASIC is further configured to store the state information and forwarding information in a flow channel table associated with the switch.
  19. The switch of claim 15, wherein the state information reflects an amount of outstanding, unacknowledged data.
  20. The switch of claim 14, wherein the ASIC is further configured to assign a flow ID to the packets, the flow ID being included as part of a chain of flow IDs assigned at each switch of the defined data path, wherein the chain of flow IDs allows for distinctions between the flow and one or more other flows traveling on a same fabric link.
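For illustration, the chain of per-switch flow IDs recited in claims 11, 12, and 20 can be sketched as follows. The function name and table layout are assumptions for exposition only, not the EFCT/IFCT/OFCT formats described in the patent:

```python
def assign_flow_ids(path, tables):
    """Walk a data path, assigning a per-switch flow ID at each hop.

    path:   ordered switch names from ingress to egress
    tables: dict mapping switch name -> its flow channel table (a list)
    Returns the chain of (switch, flow_id) pairs identifying this flow.
    """
    chain = []
    for switch in path:
        table = tables.setdefault(switch, [])
        flow_id = len(table)  # next free local flow ID on this switch
        # Each entry records its upstream (switch, flow_id), so an ACK
        # can follow the chain in reverse back toward the ingress switch.
        table.append({"upstream": chain[-1] if chain else None})
        chain.append((switch, flow_id))
    return chain
```

Two flows traversing the same fabric link receive distinct local flow IDs at each switch, so the resulting chains keep the flows distinguishable even where their paths overlap.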

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 17/594,712, filed on Oct. 27, 2021, which application is a national stage of International Application No. PCT/US2020/024342, filed on Mar. 23, 2020, which claims the benefit of U.S. Provisional Application No. 62/852,203 filed on May 23, 2019, U.S. Provisional Application No. 62/852,273 filed on May 23, 2019, and U.S. Provisional Application No. 62/852,889. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.

STATEMENT OF GOVERNMENT RIGHTS

The invention(s) described herein were made with U.S. Government support under one or more of the contracts set forth below. The U.S. Government has certain rights in these inventions.

| Contract Title | Customer/Agency | Contract Reference |
| --- | --- | --- |
| FastForward-2 | Lawrence Livermore National Security, LLC/Dept of Energy | Subcontract B609229 under prime contract DE-AC52-07NA27344 |
| BeePresent | Maryland Procurement Office | H98230-15-D-0020; Delivery Order 003 |
| SeaBiscuit | Maryland Procurement Office | H98230-14-C-0758 |
| PathForward | Lawrence Livermore National Security, LLC/Dept of Energy | Subcontract B620872 under prime contract DE-AC52-07NA27344 |
| DesignForward | The Regents of the University of California/Dept of Energy | Subcontract 7078453 under prime contract DE-AC02-05CH11231 |
| DesignForward-2 | The Regents of the University of California/Dept of Energy | Subcontract 7216357 under prime contract DE-AC02-05CH11231 |

DESCRIPTION OF RELATED ART

As network-enabled devices and applications become progressively more ubiquitous, various types of traffic as well as the ever-increasing network load continue to demand more performance from the underlying network architecture. For example, applications such as high-performance computing (HPC), media streaming, and the Internet of Things (IoT) can generate different types of traffic with distinctive characteristics.
As a result, in addition to conventional network performance metrics such as bandwidth and delay, network architects continue to face challenges such as scalability, versatility, and efficiency.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure, in accordance with one or more various embodiments, is described in detail with reference to the following figures. The figures are provided for purposes of illustration only and merely depict typical or example embodiments.

FIG. 1 illustrates an example network in which various embodiments may be implemented.
FIG. 2A illustrates an example switch that facilitates flow channels.
FIG. 2B illustrates an example of how switches along a data path can maintain flow state information.
FIG. 3A illustrates an example fabric header for a data packet.
FIG. 3B illustrates an example acknowledgement (ACK) packet format.
FIG. 3C illustrates an example relationship between different variables used to derive and maintain state information of a flow.
FIG. 4A illustrates an example of how flow channel tables can be used to deliver a flow.
FIG. 4B illustrates an example of an edge flow channel table (EFCT).
FIG. 4C illustrates an example of an input flow channel table (IFCT).
FIG. 4D illustrates an example of an output flow channel table (OFCT).
FIG. 5 illustrates an example of a network experiencing congestion, where adaptive routing and “on the fly” routing can be implemented.
FIG. 6 illustrates a flow chart of an exemplary process of “on the fly” routing in the presence of errors, in accordance with various embodiments.
FIG. 7 illustrates an example switch that facilitates flow channels for “on the fly” routing in the presence of errors.
FIG. 8 is an example computing component that may be used to implement various features of embodiments described in the present disclosure.

The figures are not exhaustive and do not limit the present disclosure to the precise form disclosed.
DETAILED DESCRIPTION

The present disclosure describes systems and methods that can accommodate exascale computing, e.g., perform data-intensive tasks such as simulations, data analytics, and artificial intelligence workloads at exascale speeds. In particular, a High-Performance Computing (HPC) network or interconnect fabric is provided that may be Ethernet-compatible, able to connect to third-party data storage, and can be built using a switch component with extremely high bandwidth, e.g., on the order of 12.8 Tb/s/dir per switch with, e.g., 64 200 Gbps ports that support large network creation with very low diameter (e.g., only three network hops). A switch chip is used to implement the aforementioned switch. The switch chip is a custom Application Specific Integrated Circuit (ASIC) designed for the network. As an example, it can provide 64 network ports that can operate at either 100 Gbps or 200 Gbps for an aggregate throughput of 12.8 Tbps. Each network edge port is able to support IEEE 802.3 Ethernet and Optimized-IP based protocols, as well as Portals, an enhanced frame format that provides support for higher rates of sma