US-20260128984-A1 - VIRTUAL ROUTING FIELDS

Abstract

A switch including a plurality of ports; a management processor; and a switch core configured to receive a packet for transmission including a destination local ID (‘DLID’) and a virtual routing field (‘VRF’), and to augment the routing of the packet on a route to the DLID according to a particular routing algorithm in dependence upon the VRF.

Inventors

  • Gary Muntz
  • Mark Atkins

Assignees

  • Cornelis Networks, Inc.

Dates

Publication Date: 2026-05-07
Application Date: 2025-12-29

Claims (20)

  1. A method of routing with virtual routing fields in a high-performance computing system, the method comprising: receiving, by a switch, a packet for transmission, the packet including a destination local ID (‘DLID’) and a virtual routing field (‘VRF’); augmenting, by the switch in dependence upon the VRF, the routing of the packet on a route to the DLID according to a particular routing algorithm.
  2. The method of claim 1 wherein the switch supports a plurality of routing algorithms and wherein augmenting the routing of the packet on a route to the DLID according to a particular routing algorithm includes augmenting the routing according to each of the plurality of routing algorithms.
  3. The method of claim 1 wherein the packet includes a plurality of VRFs and augmenting the routing of the packet on a route to the DLID according to a particular routing algorithm includes augmenting the routing of the packet in dependence upon the plurality of VRFs.
  4. The method of claim 1 wherein augmenting the routing of the packet on a route to the DLID according to a particular routing algorithm in dependence upon the VRF further comprises passing a LID definition through a mask.
  5. (canceled)
  6. The method of claim 1 wherein the LID definition includes a linear LID definition.
  7. The method of claim 1 wherein the VRF restricts traffic to a subset of global links.
  8. The method of claim 1 wherein the VRF defines a plane of switches to the destination.
  9. The method of claim 1 further comprising receiving, from a fabric manager, one or more VRFs.
  10. The method of claim 1 wherein the VRF is a subfield of a hierarchical LID definition and wherein the hierarchical LID definition includes a pipeline designation.
  11. The method of claim 1 wherein the VRF and the DLID reside in different fields of the packet.
  12. A switch comprising: a plurality of ports; a management processor; and a switch core configured to receive a packet for transmission including a destination local ID (‘DLID’) and a virtual routing field (‘VRF’) and configured to augment the routing of the packet on a route to a DLID according to a particular routing algorithm in dependence upon the VRF.
  13. The switch of claim 12 wherein the switch supports a plurality of routing algorithms and wherein the switch core is further configured to augment the routing according to each of the plurality of routing algorithms.
  14. The switch of claim 12 wherein the packet includes a plurality of VRFs, and the switch core is configured to augment the routing of the packet in dependence upon the plurality of VRFs.
  15. The switch of claim 12 wherein the switch core is configured to pass a LID definition through a mask.
  16. The switch of claim 12 wherein the LID definition includes a hierarchical LID (‘HLID’) definition and the VRF resides in the hierarchical LID definition and wherein the switch core is configured to pass the HLID including the VRF through a ternary mask whose value enables the VRF.
  17. The switch of claim 12 wherein the LID definition includes a linear LID definition.
  18. The switch of claim 12 wherein the VRF defines a plane of switches to the destination.
  19. The switch of claim 12 further comprising receiving, by the management processor from a fabric manager, one or more VRFs.
  20. The switch of claim 12 wherein the VRF is a subfield of a hierarchical LID definition and wherein the hierarchical LID definition includes a pipeline designation.
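
The masking recited in claims 15 and 16 can be illustrated with a minimal sketch. It is not the claimed implementation. The 24-bit HLID width, the 2-bit VRF position, and the names `make_hlid`, `TernaryRule`, `lookup`, and `RULES` are all assumptions made for illustration. The ternary mask is modeled in the common value-plus-care-bits form: a rule whose care bits cover the VRF subfield "enables" the VRF, while a rule that leaves those bits as don't-care routes on the destination alone.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical layout (not from the source): a 24-bit HLID whose top 2 bits
# carry the VRF and whose low 22 bits identify the destination.
HLID_BITS = 24
VRF_BITS = 2
VRF_SHIFT = HLID_BITS - VRF_BITS
DEST_MASK = (1 << VRF_SHIFT) - 1
VRF_MASK = ((1 << VRF_BITS) - 1) << VRF_SHIFT


def make_hlid(vrf: int, dest: int) -> int:
    """Pack a VRF and a destination into one hypothetical HLID value."""
    if not 0 <= vrf < (1 << VRF_BITS):
        raise ValueError("vrf out of range")
    if not 0 <= dest <= DEST_MASK:
        raise ValueError("dest out of range")
    return (vrf << VRF_SHIFT) | dest


@dataclass(frozen=True)
class TernaryRule:
    value: int     # bit pattern to compare against
    care: int      # 1 bits are compared, 0 bits are "don't care"
    out_port: int  # illustrative egress port number

    def matches(self, hlid: int) -> bool:
        # Only the bits selected by `care` take part in the comparison.
        return (hlid & self.care) == (self.value & self.care)


def lookup(rules: Sequence[TernaryRule], hlid: int) -> Optional[int]:
    """Return the egress port of the first matching rule, or None."""
    for rule in rules:
        if rule.matches(hlid):
            return rule.out_port
    return None


RULES = [
    # VRF enabled: only packets carrying VRF 1 for destination 0x10 match.
    TernaryRule(make_hlid(1, 0x10), VRF_MASK | DEST_MASK, out_port=7),
    # VRF ignored: any other VRF value for destination 0x10 falls through here.
    TernaryRule(make_hlid(0, 0x10), DEST_MASK, out_port=3),
]

print(lookup(RULES, make_hlid(1, 0x10)))  # matched by the VRF-enabled rule
print(lookup(RULES, make_hlid(2, 0x10)))  # matched by the VRF-agnostic rule
```

Ordering the more specific rule first mirrors how priority is usually resolved in ternary match tables. The source does not say how the switch core orders or stores its masks.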

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 18/544,407 filed Dec. 18, 2023 which is incorporated by reference in its entirety herein.

BACKGROUND

High-Performance Computing (‘HPC’) refers to the practice of aggregating computing in a way that delivers much higher computing power than traditional computers and servers. HPC, sometimes called supercomputing, is a way of processing huge volumes of data at very high speeds using multiple computers and storage devices linked by a cohesive high-bandwidth, low-latency fabric. HPC makes it possible to explore and find answers to some of the world's biggest problems in science, engineering, business, and other fields. Artificial Intelligence (‘AI’) is another field of technology embracing the use of high-bandwidth, low-latency fabrics.

HPC and AI systems often have many computing devices, switches, and resources arranged in fabrics with switches that support adaptive routing. It is well known that adaptive routing is highly beneficial in congested networks when the receiving nodes can keep up with the transmitters. But high-volume traffic with incast bandwidth that overloads receivers will spread congestion to far more links under adaptive routing than under deterministic routing. A classic case of this scenario is storage traffic. Typically, large numbers of compute nodes can generate high traffic to relatively few storage nodes with substantial risk of an incast problem.

“Incast problem” refers to a networking phenomenon that occurs in large-scale distributed computing systems where multiple nodes may simultaneously send data to fewer nodes. If the receiving nodes are not able to handle the incoming data or if the network infrastructure is not designed to handle such simultaneous events, a large amount of data traffic may converge on a single point in the network. The challenge in such scenarios is to manage and allocate network resources effectively to prevent congestion and maintain smooth communication between the nodes.

Explicit Congestion Notification (ECN) is a feature used in computer networking to manage network congestion. ECN handles congestion without relying solely on packet drops. Network devices mark packets instead of dropping them when congestion is detected. These marks indicate to the sender that the network is experiencing congestion, without actually discarding packets. The sender then adjusts its transmission rate accordingly, reducing the amount of data sent and helping to alleviate congestion before it becomes severe. ECN or related techniques can respond to high-incast scenarios, but they are not instantaneously effective, and performance can suffer significantly. The present invention complements ECN or related techniques by adding configurable isolation between incast-prone traffic and better-behaved, latency-sensitive traffic, as well as load balancing and pipeline identification.

BRIEF DESCRIPTION OF THE DRAWINGS

Many aspects of the present disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, with emphasis instead being placed upon illustrating the principles of the disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.

FIG. 1 sets forth a system diagram of an example high-performance computing environment useful for routing with virtual routing fields according to embodiments of the present invention.

FIG. 2 sets forth a line drawing illustrating the effect of virtual routing fields for routing with virtual routing fields according to embodiments of the present invention.

FIG. 3 sets forth a line drawing illustrating hierarchical LID including a virtual routing field for traffic type for routing according to embodiments of the present invention.

FIG. 4 sets forth a line drawing illustrating hierarchical LID including a virtual routing field for traffic type and another for pipeline designation for routing according to embodiments of the present invention.

FIG. 5 sets forth a switch for routing with virtual routing fields according to embodiments of the present invention.

FIG. 6 sets forth a block diagram of a compute node for routing with virtual routing fields according to embodiments of the present invention.

FIG. 7 sets forth a flow chart illustrating an example method of routing with virtual routing fields according to embodiments of the present invention.

DETAILED DESCRIPTION

Methods, systems, devices, and products for routing in a high-performance computing system using virtual routing fields according to embodiments of the present invention are described with reference to the attached drawings beginning with FIG. 1.

FIG. 1 sets forth a system diagram of an example high-performance computing environment (100) with a fabric (140) that supports r