
CN-121984925-A - Dynamic congestion perception hierarchical scheduling method and device based on reconfigurable calculation

CN121984925A

Abstract

The invention relates to a dynamic congestion-aware hierarchical scheduling method and device based on reconfigurable computing. The method comprises: constructing a hierarchical tree scheduling structure; mapping a node scheduling algorithm and a congestion algorithm onto a reconfigurable computing module; performing congestion identification on dequeue and enqueue operations based on the mapped congestion algorithm; classifying and processing a message according to the congestion detection result when an enqueue request is received; computing priority and transmission-time information for queue elements; pushing queue elements into a scheduling sub-list according to the computed priority; completing ordered insertion by priority and transmission time within the sub-list; extracting the element that satisfies the transmission time and has the highest priority; and updating the scheduling sub-list and the corresponding pointers on each push and extraction. The invention supports dynamic adjustment of the scheduling strategy, realizes intelligent differentiated congestion control and dynamic, efficient sharing of computing resources, and improves the scheduling efficiency of complex multi-service flows in high-speed network equipment.
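The hierarchical tree structure referred to in the abstract (flow queues as leaves, aggregated under user queues and a top-level group queue, with numbers assigned to leaf nodes) can be sketched in software as follows. All class, field, and queue names here are illustrative assumptions for exposition, not the patented hardware design.

```python
# Minimal sketch of a three-level scheduling tree (flow queues -> user
# queues -> group queue) with leaf-node numbering. Names are assumptions.

class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def leaves(self):
        # Leaf nodes correspond to the flow queues mapped at the bottom layer.
        if not self.children:
            return [self.name]
        out = []
        for child in self.children:
            out.extend(child.leaves())
        return out

# Group queue layer -> user queue layer -> flow queue layer
tree = Node("group", [
    Node("user0", [Node("user0/voice"), Node("user0/video")]),
    Node("user1", [Node("user1/best-effort")]),
])

# Assign a number to each leaf (flow queue), as the method's first step does.
leaf_ids = {name: i for i, name in enumerate(tree.leaves())}
print(leaf_ids)
```

A depth-first walk gives each flow queue a stable index, which a hardware scheduler could use to address per-queue state.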

Inventors

  • SHI JIANGYI
  • XU LIMING
  • PAN WEITAO
  • ZHU JUNCHAO
  • CHEN DILONG
  • WANG ZEKUN
  • ZHANG YUSEN

Assignees

  • Xidian University (西安电子科技大学)

Dates

Publication Date
2026-05-05
Application Date
2026-01-16

Claims (10)

  1. A dynamic congestion-aware hierarchical scheduling method based on reconfigurable computing, the method comprising: constructing a hierarchical tree scheduling structure, mapping queues to leaf nodes, assigning numbers to the leaf nodes, and mapping a node scheduling algorithm and a congestion algorithm onto a reconfigurable computing module; performing congestion identification on dequeue and enqueue queues based on the mapped congestion algorithm; when an enqueue request is received, classifying the message according to the congestion detection result into direct discarding, transfer to an off-chip cache, or normal enqueue; preprocessing the queue element to obtain metadata, sending the metadata to the reconfigurable computing module, and computing priority and transmission-time information based on the mapped node scheduling algorithm; and pushing queue elements into a scheduling sub-list according to the computed priority, completing ordered insertion by priority and transmission time within the sub-list, extracting the element that satisfies the transmission time and has the highest priority, and updating the scheduling sub-list and the corresponding pointers on each push and extraction.
  2. The method of claim 1, wherein the hierarchical tree scheduling structure comprises a three-level tree of a flow queue layer, a user queue layer and a group queue layer; the flow queue layer serves as the bottom leaf nodes for distinguishing different service types, the user queue layer serves as intermediate nodes each comprising a plurality of flow queues, the group queue layer serves as the top-level scheduler and the final convergence point of traffic, and each parent node at each layer comprises a plurality of child nodes.
  3. The method of claim 1, wherein each level of scheduling node supports a programmable scheduling algorithm through mapping instructions issued by the configuration to the reconfigurable computing module; the reconfigurable computing module supports a ring-shaped computing structure matched to the pipeline depth of the scheduler, the maximum supported rollback depth equals the pipeline depth of the node scheduler, and the computation objects cover the packaged metadata, the output data of the upper-level computation, and the rollback data of the current computation.
  4. The method of claim 1, wherein the reconfigurable computing module employs a computing-unit sharing mechanism that allocates a preset number of basic computing units to each node; the basic computing units allocated to each node are divided into three groups of computing sub-modules: the first group shared with the previous node, the second group exclusive to the current node, and the third group shared with the next node; and when the complexity of the scheduling algorithms mapped to adjacent nodes differs, the node running the higher-complexity algorithm can dynamically borrow idle shared computing sub-modules of its adjacent nodes according to a preset scheduling policy.
  5. The method of claim 1, wherein the push and extraction processes support concurrent processing, with dynamic switching between a single-flow processing branch and a dual-flow processing branch depending on load: the dual-flow branch is entered when push and extraction are performed simultaneously, and the single-flow branch is entered when only a push or only an extraction operation is performed.
  6. The method of claim 1, further comprising treating a batch of messages as one element: performing enqueue-element preprocessing before scheduling starts, dynamically calculating the number of messages contained in a pushed queue element according to a preset minimum push length of a single element and the maximum number of schedulable messages per queue, and, after a scheduled dequeue, parsing the message-count information contained in the element and pushing it into a scheduling-result cache for storage.
  7. The method of claim 1, wherein executing three differentiated schemes according to the congestion type identified by the congestion algorithm comprises: a dynamic discarding scheme, wherein if the congested queue length exceeds the maximum threshold, the enqueued message is dynamically discarded according to the adaptive random-early-detection parameters and the congestion factor; an off-chip caching scheme, wherein if the congested queue carries best-effort flows, the enqueued message is transferred to the off-chip cache, and is fetched and transmitted when the link is idle; and an asynchronous dequeuing scheme, wherein if the queue carries high-priority or delay-sensitive flows, the queue is dequeued asynchronously, its priority is dynamically raised according to the scheduling parameters and the congestion factor, and its transmission time is shortened.
  8. A dynamic congestion-aware hierarchical scheduling device based on reconfigurable computing, comprising: a queue information management module for updating key information of a queue upon logical enqueue and dequeue of messages; a cache management module for allocating and recycling cache descriptor blocks upon enqueue and dequeue requests to complete physical enqueue and dequeue, using a chained shared-cache structure; a queue scheduling module for mapping the scheduling algorithm onto the shared computing units, computing the priority and transmission time of pushed elements, maintaining sub-list ordering, and executing dequeues; a congestion processing module for performing congestion detection during dequeue and enqueue according to the mapped congestion algorithm, and triggering packet dropping, transfer to the off-chip cache, or asynchronous dequeuing according to the flow type; and a DMA and configuration module for interaction between the control plane and the data plane, parsing configuration information, reporting collected scheduling-state information to the control plane, and completing off-chip buffered transmission of best-effort flows.
  9. The apparatus of claim 8, wherein the queue scheduling module further comprises: a basic computing unit for supporting logic operations, whose operation mode is adjusted through configuration instructions and whose computation objects include the queue length and the frame length of the head-of-queue message; a sorting module for ordering the elements in the priority sub-list and the transmission-time sub-list according to the computation results, maintaining the highest priority and the minimum transmission time in the sub-lists through ordered pointers, and locating the push position of an element; a dequeue assertion module for finding, according to the system reference time, the pointer satisfying the minimum transmission time and dequeuing the element that satisfies the transmission time and has the highest priority; and an asynchronous dequeuing module for searching the sub-lists for elements by queue number and dequeuing them when congestion occurs.
  10. The apparatus of claim 8, wherein the congestion processing module further comprises: a congestion information identification module for detecting congestion-state information, including the round-trip delay of a dequeue, the explicit congestion notification of an enqueued message, and an adaptive random-early-detection factor obtained from the queue length and its threshold; an algorithm mapping module for deploying algorithms onto the reconfigurable computing units according to configuration instructions, packaging key congestion information into metadata, and decomposing the algorithm into basic logic computations to compute the congestion factor; and a congestion avoidance module for executing the off-chip caching or packet-dropping scheme for low-priority congested queues and executing the asynchronous dequeuing and priority-regeneration scheme for high-priority congested queues.
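The three differentiated congestion schemes of claim 7 (dynamic discard, off-chip caching, asynchronous dequeue) can be sketched as a classification function. The threshold value, flow-class labels, and the way the congestion factor gates each branch are illustrative assumptions, not the patent's actual parameters or formulas.

```python
# Hedged sketch of claim 7's three differentiated congestion schemes.
# MAX_THRESHOLD, the flow-class labels, and the congestion-factor gating
# are assumptions chosen for illustration only.

import random

MAX_THRESHOLD = 100  # assumed maximum queue-length threshold


def classify_on_enqueue(queue_len, flow_class, congestion_factor):
    """Return the action taken for a message arriving at a congested queue."""
    if queue_len > MAX_THRESHOLD:
        # Dynamic discard: drop probability scales with the congestion
        # factor, in the spirit of adaptive random early detection (RED).
        drop_prob = min(1.0, congestion_factor)
        return "drop" if random.random() < drop_prob else "enqueue"
    if flow_class == "best-effort" and congestion_factor > 0.5:
        # Off-chip caching: park best-effort traffic until the link is idle.
        return "off-chip-cache"
    if flow_class in ("high-priority", "delay-sensitive") and congestion_factor > 0.5:
        # Asynchronous dequeue: raise priority and shorten the send time.
        return "async-dequeue"
    return "enqueue"


print(classify_on_enqueue(queue_len=40, flow_class="best-effort",
                          congestion_factor=0.8))  # off-chip-cache
```

The key structural point, which the sketch preserves, is that the scheme chosen depends on both the detected congestion level and the flow class, rather than applying uniform tail drop.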

Description

Dynamic congestion perception hierarchical scheduling method and device based on reconfigurable calculation

Technical Field

The invention relates to the technical field of communication network scheduling, and in particular to a hierarchical quality-of-service scheduling method and device supporting hardware programmability, suitable for fine-grained scheduling and congestion control of multi-user, multi-service traffic in high-speed network equipment.

Background

With the development of software-defined networking and network function virtualization, network devices are evolving towards the separation of control and forwarding and the decoupling of software and hardware. The broadband network gateway, as the key device for user broadband access, must efficiently handle tasks such as user authentication, access control and flow scheduling, and in particular must support hierarchical quality-of-service capability under large-scale user sessions. HQoS (Hierarchical Quality of Service) solves these problems through a multi-level queue scheduling mechanism. Dual flow control is supported at the user level and the service level: the first level distinguishes users, and the second level distinguishes the different services of each user, thereby realizing finer bandwidth allocation and delay control. This makes HQoS particularly suitable for deployment on the user access side to cope with the scheduling needs of massive users and diversified services. However, existing HQoS implementations still suffer from a number of problems. First, scheduling flexibility is insufficient: scheduling and congestion algorithms are usually hard-wired in hardware, making it difficult to adapt to new services such as low-delay communication, to the dynamic demands of burst traffic, or to changing the algorithm of each node when the scheduling scenario changes.
Second, resource allocation efficiency is low: a fixed hardware architecture can hardly support a dynamically adjusted scheduling strategy, the allocation of computing units is rigid, and resources cannot be shared among nodes running algorithms of different complexity. Furthermore, current mainstream scheduling structures such as PIFO can only handle on the order of 1000 concurrent flows, support only simple policies such as first-in first-out or strict priority, and cannot support non-work-conserving algorithms such as WF2Q+ that introduce latency, so they cannot meet the requirements of modern data centers and high-speed networks. Some schemes propose monitoring the state of the forwarding-plane HQoS queues in real time through the control plane and dynamically adjusting the scheduling policy; however, these schemes still rely on predefined algorithms and do not enable real-time reconstruction of the scheduling logic at the hardware level. In addition, traditional devices mostly resort to simple packet dropping or shaping when handling congestion, lack dynamic mechanisms such as off-chip caching and priority remapping, and find it difficult to balance link utilization and fairness. Some schemes use hierarchical computation and admission-control mechanisms on a programmable pipeline architecture to achieve layered QoS effects in hardware using only a single queue, allowing users to customize message-processing logic. However, when dealing with complex and changing network environments, especially fine-grained, dynamically variable scheduling requirements, a significant disadvantage remains: hardware resources are generally statically allocated and fixedly bound to specific nodes, and flexibility is lacking. The art further proposes Push-In-Extract-Out (PIEO) as a generalization of the PIFO primitive.
PIEO not only supports in-place insertion of elements into the ordered list based on a programmable ranking function but, more importantly, introduces a predicate-based programmable filtering mechanism at dequeue: it first screens out the subset of currently eligible elements from the ordered list, then selects the smallest-ranked element from that subset for scheduling. This design enables PIEO to precisely control both the timing and the order of message scheduling, thereby supporting a broader class of scheduling algorithms, including non-work-conserving algorithms that require dynamic screening of qualifying elements. Compared with PIFO, PIEO achieves greater expressive power and more than 30 times the scalability in hardware, providing a stronger hardware abstraction for building a programmable scheduler that meets high-speed line-rate processing requirements while flexibly adapting to complex scheduling policies. However, PIEO cannot support concurrent push and extraction, supports only a monolithic design with packet-by-packet ordering, and cannot match the high-throughput requirements of modern networks.
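The PIEO-style behavior described above, which the invention's scheduling sub-list builds on, can be approximated in software as follows: elements carry a (priority, send_time) rank, extraction first filters by an eligibility predicate (send_time has arrived) and then takes the smallest-ranked survivor. The class and method names are assumptions for illustration; a real PIEO scheduler implements this in hardware, not with a Python heap.

```python
# Software approximation of a PIEO-style scheduling sub-list: push by
# (priority, send_time), extract the eligible element with best priority.
# Names and the heap-based structure are illustrative assumptions.

import heapq
import itertools


class SchedulingSublist:
    def __init__(self):
        self._heap = []                    # ordered by (priority, send_time)
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def push(self, element, priority, send_time):
        # Lower numeric value = higher scheduling priority.
        heapq.heappush(self._heap,
                       (priority, send_time, next(self._counter), element))

    def extract(self, now):
        # Predicate step: keep only elements whose send time has arrived.
        eligible = [entry for entry in self._heap if entry[1] <= now]
        if not eligible:
            return None
        best = min(eligible)          # smallest (priority, send_time) wins
        self._heap.remove(best)
        heapq.heapify(self._heap)     # restore heap order after removal
        return best[3]


sub = SchedulingSublist()
sub.push("flow-A", priority=2, send_time=5)
sub.push("flow-B", priority=1, send_time=20)  # best priority, not yet eligible
sub.push("flow-C", priority=3, send_time=0)
print(sub.extract(now=10))  # flow-A: best priority among eligible elements
```

Note how flow-B, despite having the best priority overall, is skipped at `now=10` because the eligibility predicate filters it out; this is exactly the property that lets PIEO express non-work-conserving algorithms that plain PIFO cannot.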