
US-12627598-B2 - Fine-granularity admission and flow control for rack-level network connectivity

Abstract

A system for admission and flow control is disclosed. In some embodiments, the system includes a switch for routing network traffic, having multiple classes of service (CoSs), from multiple ingress ports to one or more of multiple egress ports. The system also includes multiple ingress-level class of service queues (InCoS-Qs) and one or more egress-level class of service queues (EgCoS-Qs), each InCoS-Q and EgCoS-Q corresponding to one of the CoSs. The switch is configured to detect congestion in a particular EgCoS-Q, corresponding to a particular CoS, the particular EgCoS-Q being associated with a particular host; identify an InCoS-Q corresponding to that particular CoS and associated with that particular host; and block that InCoS-Q, while allowing routing of the network traffic from one or more InCoS-Qs corresponding to that particular CoS, the one or more InCoS-Qs corresponding to one or more other hosts.
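The blocking behavior described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the class name `RackSwitch`, the `(host, CoS)` queue keys, and the fixed depth threshold used to declare congestion are all assumptions made for illustration, since the abstract does not say how congestion is measured.

```python
from collections import deque


class RackSwitch:
    """Toy model: one InCoS-Q and one EgCoS-Q per (host, CoS) pair (assumed layout)."""

    def __init__(self, hosts, classes, egress_threshold=4):
        # Hypothetical congestion criterion: EgCoS-Q depth >= egress_threshold.
        self.threshold = egress_threshold
        self.in_q = {(h, c): deque() for h in hosts for c in classes}
        self.eg_q = {(h, c): deque() for h in hosts for c in classes}
        self.blocked = set()  # keys of InCoS-Qs that are currently blocked

    def enqueue(self, host, cos, packet):
        """Admit a packet from a host into its ingress queue for the given CoS."""
        self.in_q[(host, cos)].append(packet)

    def update_blocking(self):
        """Detect congested EgCoS-Qs and block only the matching InCoS-Q."""
        for key, queue in self.eg_q.items():
            if len(queue) >= self.threshold:
                self.blocked.add(key)      # same host, same CoS
            else:
                self.blocked.discard(key)  # congestion cleared

    def forward_once(self):
        """Move at most one packet from each unblocked InCoS-Q to its EgCoS-Q."""
        self.update_blocking()
        moved = []
        for key, queue in self.in_q.items():
            if key in self.blocked or not queue:
                continue
            self.eg_q[key].append(queue.popleft())
            moved.append(key)
        return moved

    def drain(self, host, cos):
        """Transmit one packet out of an EgCoS-Q, relieving congestion."""
        queue = self.eg_q[(host, cos)]
        return queue.popleft() if queue else None
```

In this sketch, congestion on the EgCoS-Q for one host and one class stalls only that host's InCoS-Q for that class; the same class from other hosts, and other classes from the same host, keep flowing.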

Inventors

  • Gurjeet Singh
  • Ari ARAVINTHAN
  • Shimon Muller
  • Jay Peterson
  • Shrijeet Mukherjee

Assignees

  • Enfabrica Corporation

Dates

Publication Date
20260512
Application Date
20240604

Claims (20)

  1. A system for admission and flow control, comprising: a switch for routing network traffic, having a plurality of classes of service, from each of a plurality of ingress ports to one or more of a plurality of egress ports, the plurality of classes of service including a first class and a second class; and a plurality of interfaces including a first interface and a second interface, each interface corresponding to a respective ingress port from the plurality of ingress ports and comprising a plurality of class of service queues (CoS-Qs) corresponding to respective classes of service from the plurality of classes of service, the plurality of CoS-Qs including a first CoS-Q corresponding to the first class and a second CoS-Q corresponding to the second class, wherein the switch is configured to: detect congestion in the first CoS-Q in the first interface; and block network traffic from the first CoS-Q in the first interface to an egress port from the plurality of egress ports, while continuing to (1) route, to the egress port, network traffic from the second CoS-Q in the first interface, and (2) route, to the plurality of egress ports including the egress port, network traffic from the first and second CoS-Qs in the second interface.
  2. The system of claim 1, wherein the plurality of interfaces includes one or more network interface cards.
  3. The system of claim 1, wherein the switch is further configured to: sort a plurality of packets of network traffic based on an ingress class of service (CoS) associated with each packet of the plurality of packets; route the sorted packets to a plurality of output queues, each output queue of the plurality of output queues storing the packets for a respective destination; move the packets in each output queue to a separate input queue at the egress port according to a respective host sending the packets; and re-sort the packets at the egress port based on an egress CoS associated with each packet.
  4. The system of claim 3, further comprising a parser and classifier that are configured to: parse and classify packet headers of the plurality of packets of network traffic; and identify one or more packet properties from the packet headers, wherein the one or more packet properties include at least one of the ingress CoS, an egress port identifier, or an egress CoS.
  5. The system of claim 4, wherein the ingress CoS for a packet from the plurality of packets is a priority class determined by a software application installed on a host that sent the packet.
  6. The system of claim 1, wherein the switch is further configured to: determine that congestion is absent; and forward all network traffic from the plurality of ingress ports to the plurality of egress ports.
  7. The system of claim 1, wherein the switch is further configured to: in response to the detected congestion, generate a congestion notification to trigger stalling of packet forwarding from a host associated with the first interface.
  8. The system of claim 7, wherein the switch is further configured to: send the congestion notification to network interface controller (NIC) queues of the host; and stop scheduling subsequent packet transfers for the first class from the NIC queues of the host.
  9. The system of claim 1, wherein the switch is further configured to use a backpressure algorithm to detect the congestion.
  10. The system of claim 1, wherein the switch comprises a plurality of NICs integrated with a Top of Rack (TOR) switch.
  11. A method for admission and flow control, comprising: routing, by a switch, network traffic from each of a plurality of ingress ports to one or more of a plurality of egress ports through a plurality of interfaces including a first interface and a second interface, wherein the network traffic has a plurality of classes of service including a first class and a second class, wherein each interface of the plurality of interfaces corresponds to a respective ingress port from the plurality of ingress ports and comprises a plurality of class of service queues (CoS-Qs) corresponding to respective classes of service from the plurality of classes of service, wherein the plurality of CoS-Qs includes a first CoS-Q corresponding to the first class and a second CoS-Q corresponding to the second class, and wherein routing, by the switch, the network traffic comprises: detecting congestion in the first CoS-Q in the first interface; and blocking network traffic from the first CoS-Q in the first interface to an egress port from the plurality of egress ports, while continuing to (1) route, to the egress port, network traffic from the second CoS-Q in the first interface, and (2) route, to the plurality of egress ports including the egress port, network traffic from the first and second CoS-Qs in the second interface.
  12. The method of claim 11, wherein the plurality of interfaces includes one or more network interface cards.
  13. The method of claim 11, further comprising: sorting a plurality of packets of network traffic based on an ingress class of service (CoS) associated with each packet of the plurality of packets; routing the sorted packets to a plurality of output queues, each output queue of the plurality of output queues storing the packets for a respective destination; moving the packets in each output queue to a separate input queue at the egress port according to a respective host sending the packets; and re-sorting the packets at the egress port based on an egress CoS associated with each packet.
  14. The method of claim 13, further comprising: parsing and classifying packet headers of the plurality of packets of network traffic; and identifying one or more packet properties from the packet headers, wherein the one or more properties include at least one of the ingress CoS, an egress port identifier, or an egress CoS.
  15. The method of claim 14, wherein the ingress CoS for a packet from the plurality of packets is a priority class determined by a software application installed on a host that sent the packet.
  16. The method of claim 11, further comprising: determining that congestion is absent; and forwarding all network traffic from the plurality of ingress ports to the plurality of egress ports.
  17. The method of claim 11, further comprising: in response to the detected congestion, generating a congestion notification to trigger stalling of packet forwarding from a host associated with the first interface.
  18. The method of claim 17, further comprising: sending the congestion notification to network interface controller (NIC) queues of the host; and stopping to schedule subsequent packet transfers for the first class from the NIC queues of the host.
  19. The method of claim 11, wherein the congestion is detected by applying a backpressure algorithm.
  20. The method of claim 11, wherein the switch comprises a plurality of NICs integrated with a Top of Rack (TOR) switch.
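The four packet-handling steps recited in claims 3 and 13 (sort by ingress CoS, route to per-destination output queues, move to per-host input queues at the egress port, re-sort by egress CoS) can be sketched as follows. This is a rough illustration only: the `Packet` fields, the function name `stage_pipeline`, the convention that a lower CoS number is handled first, and the choice to key the final queues by `(sending host, egress CoS)` are assumptions, not details taken from the claims.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_host: str      # host that sent the packet
    dest_port: str     # egress port identifier
    ingress_cos: int   # class of service at ingress
    egress_cos: int    # class of service at egress
    payload: str = ""


def stage_pipeline(packets):
    """Return {egress port: {(src_host, egress_cos): [packets]}}."""
    # Step 1: sort by ingress CoS (stable sort keeps arrival order within a class).
    by_ingress_cos = sorted(packets, key=lambda p: p.ingress_cos)

    # Step 2: route sorted packets to output queues, one per destination.
    output_queues = defaultdict(list)
    for pkt in by_ingress_cos:
        output_queues[pkt.dest_port].append(pkt)

    # Step 3: at each egress port, move packets into a separate input queue
    # per sending host.
    egress_input = defaultdict(lambda: defaultdict(list))
    for dest, queue in output_queues.items():
        for pkt in queue:
            egress_input[dest][pkt.src_host].append(pkt)

    # Step 4: re-sort at the egress port by egress CoS, keeping the host
    # association (assumed here so per-host queues can be blocked individually).
    egress_cos_queues = defaultdict(lambda: defaultdict(list))
    for dest, per_host in egress_input.items():
        for host, queue in per_host.items():
            for pkt in queue:
                egress_cos_queues[dest][(host, pkt.egress_cos)].append(pkt)

    return {dest: dict(queues) for dest, queues in egress_cos_queues.items()}
```

Keeping the sending host in the final queue key is a design choice of this sketch; it mirrors the host-associated egress queues mentioned in the abstract, which let congestion in one queue be traced back to a single host and class.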

Description

CROSS REFERENCE TO RELATED APPLICATION

This application is a divisional of and claims priority to and benefit of U.S. patent application Ser. No. 18/096,371, filed Jan. 12, 2023, which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

This disclosure relates to a communication system that allows admission and flow control over offending host interfaces and/or offending queues within a host's host interfaces.

BACKGROUND

Admission and flow control is one of the most important data processing procedures in communication systems. It coordinates the amount of data that can be sent before a receiving limit is reached, based at least on the design and configuration of switches and Network Interface Controllers (NICs). While admission and flow control has been a focus of research and industry for years, there are still some limitations. For example, in parallel computing, the applications that run in parallel often share a NIC when connecting to a network, which may cause an application to be temporally restricted in resource use when another application uses more resources than were originally allocated. This noisy neighbor problem, i.e., the behavior of an application affecting the performance of another application sharing the same NIC, becomes even worse in a virtualization environment where the use of resources is mostly controlled on the application side.

When a switch gets congested because of a misbehaving application on a sending server, the congestion status on the switch may further propagate. That is, the congestion may be extended not only to multiple applications running on the same server but also to multiple applications running on different servers that temporally share the same egress or exiting port on the switch, resulting in overall performance degradation.

SUMMARY

To address the aforementioned shortcomings, a granularity-level admission and flow control system is provided.
In some embodiments, the system includes a switch for routing network traffic, having a plurality of classes of service, from each of a plurality of ingress ports to one or more of a plurality of egress ports. The system also includes a plurality of interfaces (NICs), each interface corresponding to a respective ingress port and comprising one or more class of service queues (CoS-Qs) respectively corresponding to one or more of the plurality of classes of service. In some embodiments, the switch is configured to detect congestion in a particular CoS-Q, corresponding to a particular class of service, the particular CoS-Q belonging to a particular interface associated with a particular ingress port; and block that particular interface, while allowing routing of the network traffic from one or more CoS-Qs corresponding to that particular class of service, the one or more CoS-Qs belonging to one or more other interfaces associated with one or more other ingress ports.

In other embodiments, the system includes a switch for routing network traffic, having a plurality of classes of service, from each of a plurality of ingress ports to one or more of a plurality of egress ports. The system includes a plurality of ingress-level class of service queues (InCoS-Qs), each InCoS-Q corresponding to one of the plurality of classes of service. The system also includes one or more egress-level class of service queues (EgCoS-Qs), each EgCoS-Q corresponding to one of the plurality of classes of service.
In some embodiments, the switch is configured to detect congestion in a particular EgCoS-Q, corresponding to a particular class of service, the particular EgCoS-Q being associated with a particular host; identify an InCoS-Q corresponding to that particular class of service and associated with that particular host; and block that InCoS-Q, while allowing routing of the network traffic from one or more InCoS-Qs corresponding to that particular class of service, the one or more InCoS-Qs corresponding to one or more other hosts.

The above and other preferred features, including various novel details of implementation and combination of elements, will now be more particularly described with reference to the accompanying drawings and pointed out in the claims. It will be understood that the particular methods and apparatuses are shown by way of illustration only and not as limitations. As will be understood by those skilled in the art, the principles and features explained herein may be employed in various and numerous embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosed embodiments have advantages and features which will be more readily apparent from the detailed description, the appended claims, and the accompanying figures (or drawings). A brief introduction of the figures is below.

FIG. 1 illustrates an exemplary prior art architecture for switch-based network connection, according to some embodiments.

FIG. 2 illustrates an exemplary system that enables granularity-level admission and flow control, according to some embodiment