US-12621245-B2 - Centralized aggregated elephant flow detection and management

Abstract

A semiconductor chip for implementing aggregated flow detection and management includes a number of pipes, where each pipe is coupled to a portion of ports on the semiconductor chip that are to receive data packets. A logic is coupled to the pipes and is used to detect and manage an elephant flow. The elephant flow-detection and management logic includes a flow table and a byte counter.

Inventors

  • Sachin Prabhakarrao KADU

Assignees

  • AVAGO TECHNOLOGIES INTERNATIONAL SALES PTE. LIMITED

Dates

Publication Date
2026-05-05
Application Date
2023-08-28

Claims (20)

  1. A semiconductor chip for implementing aggregated flow detection and management, the semiconductor chip comprising: a plurality of pipes, each pipe having one or more ports, the one or more ports being configured to receive data packets; and a circuit coupled to the plurality of pipes and configured to detect a large flow across the plurality of pipes, wherein the circuit comprises a flow table and a byte counter, wherein the circuit is configured to detect the large flow in response to an aggregated count of bytes, accumulated by the byte counter, that is aggregated across the plurality of pipes, wherein the aggregated count of bytes is stored in the flow table, and wherein the aggregated count of bytes is aggregated per flow of each pipe of the plurality of pipes such that the large flow is detected across the plurality of pipes.
  2. The semiconductor chip of claim 1, wherein the circuit is configured to receive, for each data packet of the data packets, a flow information, an ingress port, an egress port and a count of bytes in each data packet of the data packets, the flow information being stored in the flow table.
  3. The semiconductor chip of claim 2, wherein the flow information comprises a count of different values corresponding to a transmission control protocol, a source port address, a destination port address and an associated transmission protocol.
  4. The semiconductor chip of claim 2, wherein the count of the bytes in each data packet is derived from an end-of-packet (EOP) and is used to enable flow-detection support associated with a transmission protocol, irrespective of availability of data packet lengths in header bytes of the transmission protocol.
  5. The semiconductor chip of claim 1, wherein the circuit further comprises a flow-aging logic implemented on the semiconductor chip, wherein a single circuit for the flow-aging logic is provided for the plurality of pipes on the semiconductor chip.
  6. The semiconductor chip of claim 5, wherein the circuit is configured to track flows across the semiconductor chip to detect the large flow.
  7. The semiconductor chip of claim 6, wherein the circuit is configured to track the flows across the semiconductor chip by comparing a per-flow accumulated byte count against configurable byte-count and time-period thresholds to detect the large flow, the large flow being an elephant flow.
  8. The semiconductor chip of claim 7, wherein the large flow is a large continuous flow, wherein the circuit is further configured to report the large continuous flow and support aging of the large continuous flow within a configurable interval, wherein the flow-aging logic is configured to evict a flow if there is no update on the flow, the per-flow accumulated byte count is lower than a corresponding programmed threshold for a programmed time duration, or a traffic congestion on an egress port eases.
  9. The semiconductor chip of claim 7, wherein the circuit is agnostic to an elephant flow changing ingress ports of a given router.
  10. The semiconductor chip of claim 1, wherein each pipe of the plurality of pipes is configured to provide a flow hash calculated on flow information.
  11. The semiconductor chip of claim 10, wherein the circuit is configured to track the flow hash if a corresponding egress port is congested and/or loaded to a certain threshold for a predefined time interval.
  12. The semiconductor chip of claim 10, wherein the flow table comprises a load-aware equal-cost multipath (ECMP) group table including group base and size information.
  13. A method of detection and management of an aggregated flow, the method comprising: configuring a pipe of a plurality of pipes disposed on a semiconductor chip to receive a data packet from a port of a plurality of ports of the semiconductor chip; receiving, by a circuit on the semiconductor chip, for the data packet, live updates on flow information from the plurality of ports; and detecting, by the circuit, a large flow across the plurality of pipes based on at least the flow information, the flow information comprising an aggregated byte count provided by a byte counter, wherein the aggregated byte count is aggregated per flow of each pipe of the plurality of pipes such that the large flow is detected across the plurality of pipes.
  14. The method of claim 13, further comprising configuring the circuit to receive the live updates regarding an ingress port, an egress port and a count of bytes in the data packet.
  15. The method of claim 14, further comprising configuring the circuit to: derive the count of the bytes in the data packet from an EOP, and enable detection support for a transmission protocol irrespective of availability of data packet lengths.
  16. The method of claim 13, further comprising configuring the circuit to: examine each data packet from a given pipe to determine whether an associated flow entry exists, and create and update a corresponding counter when the associated flow entry does not exist.
  17. The method of claim 16, further comprising configuring the circuit to: update changes to ingress ports and/or egress ports and corresponding counters, when the associated flow entry exists, compare an accumulated byte count, per flow, against configurable byte-count and time-period thresholds, notify a corresponding ingress pipeline to take a programmed action or send the flow information to software, aggregate byte counts for given flows from the plurality of pipes and different filters; and evict a flow if there is no update on the flow or the aggregated byte count is lower than a corresponding programmed threshold for a programmed time duration or traffic congestion on an egress port eases, wherein the large flow comprises a large continuous flow.
  18. The method of claim 17, further comprising configuring the circuit to: collect additional information on an equal-cost multipath (ECMP) group size and corresponding next hops when the flow information and a corresponding flow hash exists; use the flow information and the corresponding flow hash to automatically derive a desired destination to segregate or distribute the large flow; and notify, in response to evicting the flow, a corresponding ingress port to perform at least one of taking the programmed action or sending an evicted flow into the software.
  19. A system comprising: memory; and one or more processors coupled to the memory and configured to execute instructions to perform operations comprising: receiving, by a pipe of a plurality of pipes disposed on a semiconductor chip, data packets from a plurality of ports of the semiconductor chip; receiving, by the one or more processors, live updates on flow information regarding the data packets from the plurality of ports; and detecting large flow across the plurality of ports based at least on the flow information, wherein the flow information is stored in a table coupled to a byte counter, wherein the table includes an aggregated count of bytes that is aggregated per flow of each pipe of the plurality of pipes such that the large flow is detected across the plurality of pipes.
  20. The system of claim 19, wherein the operations further comprise: tracking a flow hash when data traffic congestion occurs at a corresponding egress port; notifying a corresponding pipe to record an entire flow information corresponding to the flow hash when the large flow is detected; configuring a separate pipe to automatically increment or decrement a priority level of data packets in a detected elephant flow; causing the corresponding pipe to report the detected elephant flow to software using first-in-first-out (FIFO) direct memory access (DMA); and leveraging the flow hash to segregate or distribute the detected elephant flow across a desired destination port or a selected set of destination ports using a load-aware equal-cost multipath (ECMP) structure.
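
Illustrative note (not part of the claims): the per-flow aggregation, threshold comparison and aging recited in claims 1, 7 and 8 could be modeled in software roughly as follows. This is a minimal sketch under stated assumptions, not the claimed circuit. The names `FlowEntry`, `CentralFlowDetector`, `update` and `age`, the fixed measurement window, and the idle-only eviction rule are all hypothetical simplifications; the congestion-based and below-threshold eviction conditions of claim 8 are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class FlowEntry:
    """One flow-table row. Field names are illustrative, not from the patent."""
    byte_count: int = 0          # bytes aggregated across all pipes in the current window
    window_start: float = 0.0    # start time of the current measurement window
    last_update: float = 0.0     # time of the most recent packet seen for this flow
    pipes: set = field(default_factory=set)  # pipes that have carried this flow
    is_elephant: bool = False    # sticky flag, cleared only by eviction


class CentralFlowDetector:
    """Single table shared by every pipe (assumed software model of the central logic)."""

    def __init__(self, byte_threshold, window, idle_timeout):
        self.byte_threshold = byte_threshold  # configurable byte-count threshold
        self.window = window                  # configurable time-period threshold
        self.idle_timeout = idle_timeout      # evict flows with no update for this long
        self.table = {}                       # flow key -> FlowEntry

    def update(self, flow_key, pipe_id, eop_bytes, now):
        """Account one packet, using the byte count known at end-of-packet."""
        entry = self.table.get(flow_key)
        if entry is None:
            # No entry yet: create one and start its measurement window.
            entry = FlowEntry(window_start=now)
            self.table[flow_key] = entry
        if now - entry.window_start >= self.window:
            # Window expired: restart the count for the next period.
            entry.byte_count = 0
            entry.window_start = now
        # The same entry is updated regardless of which pipe saw the packet,
        # so a flow that moves between pipes keeps one aggregated count.
        entry.byte_count += eop_bytes
        entry.last_update = now
        entry.pipes.add(pipe_id)
        if entry.byte_count >= self.byte_threshold:
            entry.is_elephant = True
        return entry.is_elephant

    def age(self, now):
        """Evict flows that have had no update for idle_timeout; return their keys."""
        evicted = [key for key, entry in self.table.items()
                   if now - entry.last_update >= self.idle_timeout]
        for key in evicted:
            del self.table[key]
        return evicted
```

In this model, two 1500-byte packets of the same flow arriving on pipe 0 and pipe 1 within one window reach a 3000-byte threshold together, even though neither pipe alone has seen enough bytes. That is the cross-pipe behavior the claims describe; a per-pipe table would miss it.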

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 17/475,297, filed Sep. 14, 2021, the entire disclosure of which has been incorporated by reference herein.

TECHNICAL FIELD

The present description relates generally to Ethernet communications and, in particular, to centralized aggregated elephant flow detection and management.

BACKGROUND

A small number of flows carry the majority of internet traffic, and a large number of flows carry comparatively little. The flows that carry the large amount of internet traffic and consume routing resources at a much higher rate over a network link are called elephant flows. Elephant flows significantly influence the flow of internet traffic through networks by causing packet drops and increasing latencies. Effective detection, analysis and management of these flows can help reduce congestion, packet drops and tail latencies.

Existing implementations cannot analyze and detect flows across multiple pipes when a flow changes its ingress port (for reasons such as load balancing or a port being down for maintenance) and lands on a different processing pipe. The cost of implementation is replicated per pipe and can be significant, in terms of chip area and power consumption, for devices having 16 or 32 pipes. In other words, existing solutions do not support aggregation across multiple pipes, and their elephant flow detection is based only on the number of bytes in a given flow, not on the corresponding egress queue or port loading and/or congestion. Further, all accounting is done only on a start-of-packet (SOP), which limits support to flows that have data packet byte-count information available at the SOP.

BRIEF DESCRIPTION OF THE DRAWINGS

Certain features of the subject technology are set forth in the appended claims. However, for purposes of explanation, several embodiments of the subject technology are set forth in the following figures.

FIG. 1 is a block diagram illustrating an example of a switch and/or router chip within which the centralized aggregated elephant flow detection and management of the subject technology is implemented.

FIG. 2 is a block diagram illustrating limiting aspects of an example of an existing elephant-flow detection system.

FIG. 3 is a high-level block diagram illustrating an example of an aggregated elephant-flow detection system, according to various aspects of the subject technology.

FIG. 4 is a block diagram illustrating an example of an aggregated elephant-flow detection system, according to various aspects of the subject technology.

FIG. 5 is a diagram illustrating an example of a flow management using a load-aware equal-cost multipath (ECMP) mapping.

FIG. 6 is a flow diagram illustrating an example of an aggregate elephant flow detection process, in accordance with some aspects of the subject technology.

FIG. 7 is a flow diagram illustrating an example of an elephant-flow management process, in accordance with some aspects of the subject technology.

FIG. 8 is an electronic system within which some aspects of the subject technology may be implemented.

DETAILED DESCRIPTION

The detailed description set forth below is intended as a description of various configurations of the subject technology and is not intended to represent the only configurations in which the subject technology may be practiced. The appended drawings are incorporated herein and constitute part of the detailed description, which includes specific details for providing a thorough understanding of the subject technology. However, the subject technology is not limited to the specific details set forth herein and may be practiced without one or more of the specific details. In some instances, structures and components are shown in block-diagram form in order to avoid obscuring the concepts of the subject technology.

The subject technology is directed to methods and systems for centralized elephant flow detection and management. In communication networks, an elephant flow is an extremely large flow of data through a network link that can occupy a disproportionate share (e.g., more than 1%) of the total bandwidth over a period of time. Elephant flows can significantly influence internet traffic through the network links by causing packet drops and increasing latencies, and can consume routing resources at a much higher rate over a network link. The subject technology can detect, analyze and manage elephant flows in order to reduce traffic congestion, packet drops, and tail latencies. For every data packet going through, each pipe can provide information related to a number of subjects to a central elephant flow-detection module, as described in more detail herein. The detected elephant flows are reported to enable changes to their quality of service. The detected elephant flows can also be segregated or distributed acr