CN-122027584-A - Cache sharing management method and device for a distributed switching chip, and electronic device
Abstract
The invention provides a cache sharing management method and device for a distributed switching chip, and an electronic device. The switching chip comprises a plurality of switching units. The method comprises: dividing the switching units into at least one shared domain, each domain comprising a plurality of adjacent switching units, wherein the local input cache of each switching unit comprises a private cache and a shared cache, and the shared caches of all switching units in the same shared domain form the shared cache pool of that domain; for any switching unit, multiplexing, through the regional credit allocator (RCA) corresponding to the switching unit, the credit signals of the existing credit-based flow control mechanism on the switching chip to sense the dynamic demand of at least one upstream virtual channel for shared cache resources in the switching unit's shared cache pool; and, in response to the dynamic demand, dynamically allocating shared credits from the shared cache pool to each upstream virtual channel through the RCA, the shared credits being used to control the data flow with which the upstream virtual channel accesses the shared cache pool. The invention improves the overall cache utilization of the switching chip.
Inventors
- REN FENGYUAN
- WANG XU
- ZHANG YANG
Assignees
- Lanzhou University (兰州大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260122
Claims (10)
- 1. A cache sharing management method of a distributed switching chip, characterized in that the distributed switching chip comprises a plurality of switching units, and the method comprises the following steps: dividing the plurality of switching units into at least one shared domain, wherein each shared domain comprises a plurality of adjacent switching units and each switching unit is provided with a regional credit allocator, the local input cache of each switching unit comprises a private cache and a shared cache, and the shared caches of all switching units in the same shared domain form the shared cache pool of the shared domain; for any switching unit, multiplexing, through the regional credit allocator corresponding to the switching unit, credit signals in the existing credit-based flow control mechanism on the distributed switching chip, and sensing the dynamic demand of at least one upstream virtual channel for shared cache resources in the shared cache pool of the switching unit; and, in response to the dynamic demand of each upstream virtual channel for the shared cache resources, dynamically allocating shared credits from the shared cache pool to each upstream virtual channel through the regional credit allocator, wherein the shared credits are used to control the data flow of the upstream virtual channel to access the shared cache pool (see the first sketch after the claims).
- 2. The cache sharing management method of a distributed switching chip according to claim 1, wherein dividing the plurality of switching units into at least one shared domain comprises at least one of the following modes: dividing switching units located in the same physical row into the same shared domain; dividing switching units located in the same physical column into the same shared domain; dividing switching units arranged in rectangular clusters into the same shared domain; and dividing the plurality of switching units into shared domains based on logical network-topology distance (see the second sketch after the claims).
- 3. The cache sharing management method of a distributed switching chip according to claim 1, wherein the multiplexing, by the regional credit allocator corresponding to the switching unit, of credit signals in the existing credit-based flow control mechanism on the distributed switching chip, and the sensing of the dynamic demand of at least one upstream virtual channel for shared cache resources in the shared cache pool of the switching unit, comprise: pre-allocating, for each upstream virtual channel, a reserved space of preset capacity in the shared cache pool, wherein the preset capacity is dynamically configured based on the priority and traffic-bandwidth requirements of the upstream virtual channel; for any one of the upstream virtual channels, monitoring the state change of the pre-allocation credit value corresponding to the reserved space of that upstream virtual channel to sense the dynamic demand; and treating a consumption event of the pre-allocation credit value as a resource-application signal sent by the upstream virtual channel.
- 4. The method according to claim 1, wherein the dynamically allocating, by the regional credit allocator, of shared credits from the shared cache pool to each of the upstream virtual channels in response to the dynamic demand of each upstream virtual channel for the shared cache resources comprises: performing, by the regional credit allocator, a qualification check on the upstream virtual channel issuing the demand in response to that demand, the qualification check comprising a capacity check, which determines whether the total amount of allocated resources of the shared cache of the switching unit is less than the total capacity of that shared cache, and a threshold check, which determines whether the number of shared credits already acquired by the demanding upstream virtual channel is less than the allocation upper limit allowed for that channel, the allocation upper limit being adaptively adjusted based on the priority or traffic type of the upstream virtual channel; and, if the qualification check passes, dynamically allocating, by the regional credit allocator, the shared credit from the shared cache pool to the upstream virtual channel issuing the demand.
- 5. The method according to claim 4, wherein the regional credit allocator is further configured to maintain a global shared counter and port credit counters, the global shared counter being configured to record the total number of shared credits allocated in the shared cache of the switching unit; and the dynamically allocating, by the regional credit allocator, of the shared credits from the shared cache pool to the upstream virtual channel that issued the demand comprises: issuing, by the regional credit allocator, the shared credit to the upstream virtual channel that issued the demand; and synchronously updating the global shared counter and the port credit counter corresponding to the upstream virtual channel that issued the demand.
- 6. The cache sharing management method of a distributed switching chip according to any one of claims 1 to 5, further comprising, before dividing the plurality of switching units into at least one shared domain: dividing the local input cache of each switching unit into the private cache and the shared cache of that switching unit; dividing the upstream virtual channels of an input port into a first group of upstream virtual channels and a second group of upstream virtual channels; and connecting the first group of upstream virtual channels directly to the downstream switching network, and establishing a one-to-one mapping between the second group of upstream virtual channels and specific shared cache resources in the shared cache pool.
- 7. The method of claim 6, further comprising: binding a data flow to a specific upstream virtual channel in the second group of upstream virtual channels according to a preset rule when the data flow enters the network under a high-load condition; and transmitting the data packets of the data flow in order through the one-to-one mapping and the first-in-first-out queue of that specific upstream virtual channel.
- 8. A cache sharing management device of a distributed switching chip, characterized in that the distributed switching chip comprises a plurality of switching units, and the device comprises: a dividing module, configured to divide the plurality of switching units into at least one shared domain, wherein each shared domain comprises a plurality of adjacent switching units and each switching unit is provided with a regional credit allocator; a sensing module, configured to sense the dynamic demand of at least one upstream virtual channel for shared cache resources in the shared cache pool of the switching unit by multiplexing, through the regional credit allocator corresponding to the switching unit, credit signals in the existing credit-based flow control mechanism on the distributed switching chip; and a management module, configured to respond to the dynamic demand of each upstream virtual channel for the shared cache resources by dynamically allocating shared credits from the shared cache pool to each upstream virtual channel through the regional credit allocator, the shared credits being used to control the data flow of the upstream virtual channel to access the shared cache pool.
- 9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the cache sharing management method of the distributed switching chip according to any one of claims 1 to 7.
- 10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the cache sharing management method of a distributed switching chip according to any one of claims 1 to 7.
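To make the allocation logic of claims 1 and 3-5 concrete, the first sketch below models a regional credit allocator in Python. It is a minimal sketch under stated assumptions, not the patent's implementation: the class and method names (RegionalCreditAllocator, consume_reserved, request_shared_credit) are illustrative, credits are modeled as plain integer counters, and hardware concerns such as signal timing and arbitration are omitted.

```python
# Minimal sketch of the regional credit allocator (RCA) of claims 1 and 3-5.
# All identifiers are illustrative assumptions; the patent specifies the
# behavior (checks, counters, signals), not an implementation.

class RegionalCreditAllocator:
    """One RCA per switching unit; it grants shared credits drawn from
    the shared cache pool of the unit's shared domain."""

    def __init__(self, pool_capacity, vc_upper_limits, vc_reserved):
        self.pool_capacity = pool_capacity  # shared-cache capacity of the unit, in credits
        self.global_shared = 0              # global shared counter (claim 5)
        self.port_credits = {vc: 0 for vc in vc_upper_limits}  # per-VC port credit counters (claim 5)
        self.upper_limit = dict(vc_upper_limits)  # per-VC allocation cap, set by priority/traffic type
        self.reserved = dict(vc_reserved)         # pre-allocated reserved credits per VC (claim 3)

    def consume_reserved(self, vc):
        """Demand sensing (claim 3): a consumption event on a VC's
        pre-allocated credit doubles as its resource-application signal."""
        if self.reserved[vc] > 0:
            self.reserved[vc] -= 1
        return self.request_shared_credit(vc)

    def request_shared_credit(self, vc):
        """Qualification (claim 4): a capacity check plus a per-VC
        threshold check gate every grant of a shared credit."""
        capacity_ok = self.global_shared < self.pool_capacity
        threshold_ok = self.port_credits[vc] < self.upper_limit[vc]
        if capacity_ok and threshold_ok:
            self.global_shared += 1       # synchronous counter updates (claim 5)
            self.port_credits[vc] += 1
            return True                   # VC may push data into the shared pool
        return False                      # demand deferred; VC falls back on its private cache

    def release_shared_credit(self, vc):
        """Return a shared credit once the corresponding data leaves the pool."""
        self.global_shared -= 1
        self.port_credits[vc] -= 1
```

Under the same assumptions, an RCA serving two upstream VCs could be instantiated as `RegionalCreditAllocator(pool_capacity=64, vc_upper_limits={0: 16, 1: 32}, vc_reserved={0: 4, 1: 4})`; each call to `consume_reserved(0)` then both depletes VC 0's reserved space and applies for a shared credit.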
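The second sketch, under the same illustrative assumptions, covers the shared-domain division modes of claim 2 and the two-group virtual-channel arrangement of claims 6-7; the helper names (divide_by_row, divide_by_cluster, split_vcs) are hypothetical.

```python
# Illustrative sketch of claim 2's division modes and claims 6-7's
# virtual-channel (VC) grouping; names are assumptions, not patent terms.

def divide_by_row(tiles):
    """Claim 2: group switching units in the same physical row into one shared domain."""
    domains = {}
    for (row, col) in tiles:
        domains.setdefault(row, []).append((row, col))
    return list(domains.values())

def divide_by_cluster(tiles, cluster_rows, cluster_cols):
    """Claim 2: group switching units into rectangular clusters (e.g., 2x2 blocks)."""
    domains = {}
    for (row, col) in tiles:
        key = (row // cluster_rows, col // cluster_cols)
        domains.setdefault(key, []).append((row, col))
    return list(domains.values())

def split_vcs(vc_ids, second_group_size, pool_slices):
    """Claims 6-7: the first VC group connects directly to the downstream
    switching network; each VC of the second group is mapped one-to-one to
    a specific shared-cache-pool slice, and under high load a data flow is
    bound to one such VC, whose FIFO queue keeps its packets in order."""
    first_group = vc_ids[:-second_group_size]
    second_group = vc_ids[-second_group_size:]
    vc_to_slice = dict(zip(second_group, pool_slices))  # one-to-one mapping
    return first_group, second_group, vc_to_slice
```

For the 2 x 3 tile array of Fig. 2, `divide_by_row([(r, c) for r in range(2) for c in range(3)])` returns the two row-wise shared domains.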
Description
Cache sharing management method and device for a distributed switching chip, and electronic device

Technical Field

The present invention relates to the field of network-on-chip technologies, and in particular to a method and an apparatus for cache sharing management of a distributed switching chip, and an electronic device.

Background

As large language model (LLM) training and inference keep raising the demands on interconnect-network throughput, latency, and scalability, switching chips continue to grow in scale. To break through the area and yield limitations of a single die, chiplet-based distributed architectures composed of switching units are becoming the dominant choice for scaling chip performance. The tile-type structure is also widely adopted: by distributing the functions of the switching chip across multiple sub-modules, it markedly reduces the complexity and implementation cost of each switching unit and improves the scalability of the switching chip.

The buffer is one of the key components of a switching chip: it absorbs burst traffic in the network-on-chip and guarantees line-rate transmission on network links in the steady state. For a network port, the available buffer capacity is strongly correlated with its traffic-carrying capability. If a port's buffer capacity is small, its ability to absorb network bursts and congestion is poor, and the backpressure mechanism is easily and frequently triggered, causing link stalls, pipeline blocking, and similar problems that severely constrain the overall transmission performance of the on-chip network. Conversely, sufficient port buffer capacity effectively improves traffic-adaptation elasticity and provides the underlying hardware support for stable network transmission.

In a network-on-chip (NoC), credit-based flow control (CBFC) is an essential underlying mechanism for lossless transmission. Fig. 1 is a schematic diagram of the interaction between data and credits in a credit-based flow control mechanism. As shown in Fig. 1, the transmitting end (upstream port) decides whether to send data fragments according to the credit value (e.g., 2) fed back by the receiving end (downstream port), i.e., according to the available storage space at the receiving end. Owing to its zero handshake overhead and low latency, the mechanism has become the indispensable low-level means of preventing cache overflow and guaranteeing losslessness. The arrow pointing from the upstream port to the downstream port represents the transmission direction of data fragments, and the arrow pointing from the downstream port back to the upstream port represents the feedback direction of the credit signal. On the downstream port side, a credit generator and a credit counter together form the credit-management logic unit that tracks and generates credit signals. However, the conventional CBFC mechanism is essentially a static point-to-point design: its flow-control relationship is bound to a single, physically adjacent link, it inherently lacks the ability to schedule resources dynamically in one-to-many or many-to-many fashion, and it therefore cannot be used directly to manage a common cache pool shared by multiple switching units.
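As background for the mechanism the invention multiplexes, the following is a minimal Python sketch of the point-to-point credit loop of Fig. 1, assuming a plain integer credit counter at the upstream port; the class and method names are illustrative, not taken from the patent.

```python
# Minimal sketch of the point-to-point CBFC loop of Fig. 1: the upstream
# port may transmit only while its credit counter is positive, and each
# returned credit signals one freed buffer slot downstream.

class UpstreamPort:
    def __init__(self, initial_credits):
        self.credits = initial_credits  # downstream buffer slots known to be free

    def try_send(self, fragment, downstream):
        if self.credits == 0:
            return False                # no credit: sending would overflow the downstream buffer
        self.credits -= 1               # one credit consumed per fragment sent
        downstream.receive(fragment)
        return True

class DownstreamPort:
    def __init__(self):
        self.buffer = []                # input buffer tracked by the credit counter

    def receive(self, fragment):
        self.buffer.append(fragment)

    def drain_one(self, upstream):
        """When a fragment leaves the buffer, the credit generator returns
        one credit to the upstream port."""
        if self.buffer:
            self.buffer.pop(0)
            upstream.credits += 1
```

Because the credit relationship is bound to exactly one upstream/downstream pair, nothing in this loop can move spare buffer space between units, which is the gap the shared cache pool and regional credit allocator address.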
In a chiplet- or tile-based distributed architecture, the originally centralized shared cache is physically split into multiple independent units. This physical isolation, compounded by the point-to-point constraint of the CBFC mechanism, leads to markedly low resource utilization when the system faces the unbalanced traffic common in data centers, such as converged traffic (incast) or hot-spot traffic (hotspot): the local cache of a hot-spot tile is quickly exhausted and triggers backpressure, degrading local throughput, while the cache resources of other, non-hot-spot tiles sit idle and cannot be put to use. This also lowers the overall performance of the network.

Fig. 2 is a schematic diagram of a typical distributed cache architecture. Referring to Fig. 2, each coordinate-marked block (e.g., "switch unit (0, 0)", "switch unit (0, 1)", "switch unit (0, 2)", "switch unit (1, 0)", "switch unit (1, 1)", and "switch unit (1, 2)") represents an independent switching unit (tile). These units are arranged in a two-dimensional array of 2 rows by 3 columns, which forms the core of the distributed switching chip. Each switching unit has a local input buffer physically bound to it; these buffer resources are physically and logically assigned to the switching unit in which they reside and cannot be directly accessed or used by other units. The switching units are connected by links shown by a