CN-121984937-A - DQN-based on-chip network arbitration dynamic optimization method


Abstract

A DQN-based intelligent optimization method for network-on-chip arbitration, belonging to the field of on-chip interconnection in computer architecture. During execution of an application task, packet arrival information at the input ports of a network-on-chip router is collected in real time, and time-density and fluctuation features are extracted to capture traffic intensity and temporal variability. These features are combined with router operating-state information, such as buffer utilization, port contention, and injection rate, to form a multidimensional description of the router state. Several candidate basic arbitration strategies are preset, and a deep Q-network (DQN) reinforcement learning model dynamically selects among them according to the real-time router state, so that a better arbitration strategy is chosen under communication-delay and power-consumption constraints. Through continuous interactive learning with the running network-on-chip, the method adaptively matches arbitration strategies to different load and traffic scenarios, reducing communication delay and tail latency and improving the communication efficiency and energy efficiency of the network-on-chip.
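For illustration only, the strategy-selection loop summarized in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the strategy names, the reward weights `alpha` and `beta`, and the helper names (`build_state`, `reward`, `LinearQ`, `ArbitrationAgent`) are hypothetical, and the deep Q network is replaced by a linear Q-function approximator so that the example runs on the Python standard library alone. It keeps the usual DQN ingredients of epsilon-greedy action selection, an experience replay buffer, and a periodically synchronized target network.

```python
import random
from collections import deque

# Hypothetical strategy names; claim 1 requires at least the first two kinds.
ACTIONS = ["global_wait_priority", "vc_type_priority", "round_robin"]


def build_state(time_density, fluctuation, buffer_util, contention, injection_rate):
    """Concatenate traffic timing features and router status into one state vector."""
    return [time_density, fluctuation, buffer_util, contention, injection_rate]


def reward(avg_delay, power, alpha=1.0, beta=0.5):
    """Assumed joint objective: a weighted penalty on delay and power (weights are guesses)."""
    return -(alpha * avg_delay + beta * power)


class LinearQ:
    """Q(s, a) = w_a . s + b_a, a linear stand-in for the deep Q network."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.b = [0.0] * n_actions

    def q_values(self, s):
        return [sum(wi * si for wi, si in zip(w, s)) + b for w, b in zip(self.w, self.b)]

    def copy_from(self, other):
        self.w = [row[:] for row in other.w]
        self.b = other.b[:]


class ArbitrationAgent:
    def __init__(self, n_features, gamma=0.9, lr=0.01, epsilon=0.1, capacity=1000, seed=0):
        self.online = LinearQ(n_features, len(ACTIONS))
        self.target = LinearQ(n_features, len(ACTIONS))
        self.target.copy_from(self.online)
        self.replay = deque(maxlen=capacity)  # experience replay buffer
        self.gamma, self.lr, self.epsilon = gamma, lr, epsilon
        self.rng = random.Random(seed)

    def select(self, state):
        """Epsilon-greedy choice of a strategy index into ACTIONS."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(ACTIONS))
        q = self.online.q_values(state)
        return max(range(len(q)), key=q.__getitem__)

    def remember(self, s, a, r, s_next):
        self.replay.append((s, a, r, s_next))

    def train_step(self, batch_size=32):
        """TD updates over a random minibatch, bootstrapping from the target network."""
        if len(self.replay) < batch_size:
            return
        for s, a, r, s_next in self.rng.sample(list(self.replay), batch_size):
            td_target = r + self.gamma * max(self.target.q_values(s_next))
            td_error = td_target - self.online.q_values(s)[a]
            self.online.w[a] = [w + self.lr * td_error * x for w, x in zip(self.online.w[a], s)]
            self.online.b[a] += self.lr * td_error

    def sync_target(self):
        """Copy online weights into the target network (done periodically)."""
        self.target.copy_from(self.online)
```

In a simulator, `build_state` would presumably be called once per decision window, the chosen strategy applied for that window, and the measured delay and power fed back through `reward` before calling `remember` and `train_step`; the window length and the target-synchronization interval are further assumptions.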

Inventors

FANG JUAN
JIN ZEKAI
YAN YIMING
LI YIDING
LI XIAOLIN
HE NAN

Assignees

Beijing University of Technology (北京工业大学)

Dates

Publication Date: 20260505
Application Date: 20260210

Claims (6)

1. A DQN-based dynamic optimization method for network-on-chip arbitration, applied to the arbitration control process of a network-on-chip router, characterized by comprising the following steps: Step 1, acquiring packet arrival information at the input ports of the network-on-chip router, and calculating timing features of the input-port traffic from the sequence of packet inter-arrival times, wherein the timing features comprise at least a time-density feature representing how closely packets arrive and a fluctuation feature representing the stability or irregularity of the traffic; Step 2, pre-configuring a plurality of candidate basic arbitration strategies for the network-on-chip router according to application load characteristics and traffic scenarios, wherein the candidate basic arbitration strategies comprise at least a packet-priority arbitration strategy based on global waiting time and a type-specific packet-priority arbitration strategy based on virtual-channel partitioning; Step 3, acquiring operating-state information of the network-on-chip router in real time during execution of an application task, and combining the traffic timing features and the operating-state information into the state input of a reinforcement learning model; and Step 4, using the reinforcement learning model to dynamically select, from the candidate basic arbitration strategies and according to the state input, the arbitration strategy for the current state of the network-on-chip router, so as to achieve an adaptive trade-off between communication delay and system power consumption.
2. The method of claim 1, wherein the time-density feature is determined from statistical properties of the inter-arrival times of adjacent packets within a preset clock-cycle window together with the packet arrival density, and jointly describes the packet arrival frequency and arrival rhythm.
3. The method of claim 1, wherein the fluctuation feature is calculated in different ways depending on the type of functional unit connected to the router: when the router is connected to a computing processing unit, the fluctuation feature is calculated from the degree of variation of the inter-arrival time sequence; when the router is connected to a storage-hierarchy unit, the fluctuation feature is calculated from the result of a multifractal statistical analysis.
4. The method of claim 1, wherein the router operating-state information comprises at least the input-port buffer utilization, the output-port contention degree, and the packet injection rate of the router.
5. The method of claim 1, wherein the reinforcement learning model is capable of handling a high-dimensional state space and learns the mapping between states and arbitration strategies through continuous interaction with the running network-on-chip.
6. The method of claim 1, wherein during training the reinforcement learning model uses communication delay and system power consumption as joint optimization objectives, and updates its strategy selection according to feedback obtained after an arbitration strategy is executed.
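The feature extraction in claims 2 and 3 can be sketched as follows, purely as an illustration under assumptions the claims leave open. The time-density feature is assumed to be the pair (arrivals per cycle, mean adjacent inter-arrival interval); the compute-unit fluctuation feature is assumed to be the coefficient of variation of the intervals; and the storage-unit multifractal statistic is assumed to be the width D(q_low) - D(q_high) of generalized dimensions estimated by box counting over dyadic partitions of the window. The function names, the q values, the number of partition levels, and the unit-type labels "compute" and "storage" are hypothetical.

```python
import math
import statistics


def intervals(timestamps):
    """Inter-arrival times (in cycles) between adjacent packets."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def time_density(timestamps, window_cycles):
    """Assumed time-density feature: (arrival density per cycle, mean inter-arrival time)."""
    gaps = intervals(timestamps)
    mean_gap = statistics.fmean(gaps) if gaps else float(window_cycles)
    return len(timestamps) / window_cycles, mean_gap


def cv_fluctuation(timestamps):
    """Coefficient of variation of the interval sequence (compute-unit ports)."""
    gaps = intervals(timestamps)
    if len(gaps) < 2:
        return 0.0
    mean_gap = statistics.fmean(gaps)
    return statistics.pstdev(gaps) / mean_gap if mean_gap > 0 else 0.0


def _slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def generalized_dimension(timestamps, window_cycles, q, levels=5):
    """Box-counting estimate of D_q (q != 1) for arrivals in [0, window_cycles)."""
    total = len(timestamps)
    log_eps, log_z = [], []
    for k in range(1, levels + 1):
        n_boxes = 2 ** k
        counts = [0] * n_boxes
        for t in timestamps:
            counts[min(int(t * n_boxes / window_cycles), n_boxes - 1)] += 1
        # Partition function over non-empty boxes only (required for negative q).
        z = sum((c / total) ** q for c in counts if c > 0)
        log_eps.append(math.log(1.0 / n_boxes))
        log_z.append(math.log(z))
    tau = _slope(log_eps, log_z)  # mass exponent tau(q)
    return tau / (q - 1)


def multifractal_fluctuation(timestamps, window_cycles, q_low=-2.0, q_high=2.0):
    """Spread of generalized dimensions; close to 0 for uniform arrivals."""
    return (generalized_dimension(timestamps, window_cycles, q_low)
            - generalized_dimension(timestamps, window_cycles, q_high))


def fluctuation(timestamps, window_cycles, unit_type):
    """Claim 3 dispatch: interval variation for compute units, multifractal for storage units."""
    if unit_type == "compute":
        return cv_fluctuation(timestamps)
    if unit_type == "storage":
        return multifractal_fluctuation(timestamps, window_cycles)
    raise ValueError(f"unknown unit type: {unit_type}")
```

Under these assumptions, evenly spaced arrivals give a coefficient of variation of 0 and a multifractal width near 0, while a dense burst followed by sparse arrivals yields clearly positive values for both measures.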

Description

DQN-based on-chip network arbitration dynamic optimization method

Technical Field

The invention belongs to the field of on-chip interconnection in computer architecture, and specifically relates to a dynamic optimization scheme for network-on-chip arbitration based on DQN reinforcement learning.

Background

With the development of applications such as artificial intelligence, big-data analytics, high-performance computing, and autonomous driving, computing systems are evolving from single architectures centered on general-purpose processors toward heterogeneous, and even super-heterogeneous, systems in which multiple kinds of computing units work together. A single type of computing device now struggles to meet simultaneous demands for high throughput, low latency, large capacity, and low power consumption. Heterogeneous systems-on-chip integrating CPUs, GPUs, NPUs, FPGAs, DSPs, dedicated accelerators, and other computing units are therefore widely used. Advances in chip manufacturing processes and advanced packaging allow multi-die architectures to integrate more types and larger numbers of computing units within a single package, and computing systems increasingly exhibit super-heterogeneous fusion. In a super-heterogeneous system, different computing units differ markedly in computing mode, memory-access behavior, and communication requirements: CPUs are generally sensitive to communication latency and fairness, GPUs and AI accelerators focus more on throughput and bandwidth utilization, and the storage and cache hierarchy exhibits significant data sharing and access concentration.
This high degree of behavioral heterogeneity makes data communication an increasingly important factor in system performance and energy efficiency, and architecture design is shifting from simply raising computing capability toward co-optimizing computation, storage, and communication, driving the development of integrated storage-computing-network fusion systems. As the core communication infrastructure connecting computing and storage units, the network-on-chip plays a key role in super-heterogeneous fusion systems. It provides efficient communication among multiple processing units through scalable topologies and distributed routers, and its operating efficiency directly determines data-transmission performance within the system. In super-heterogeneous scenarios, however, the traffic generated by different types of computing units differs significantly in bandwidth demand, timing characteristics, and real-time constraints, which aggravates resource contention in the network-on-chip. Within the network-on-chip, routers handle data buffering, forwarding, and scheduling; arbitration mechanisms such as virtual-channel allocation and crossbar allocation determine the order in which competing packets proceed and therefore directly affect communication delay. Most existing arbitration mechanisms use static-priority, round-robin, or other fixed-rule strategies. These are simple to implement but usually perform well only under a specific load or a single traffic pattern. When the system load or traffic pattern changes, static strategies cannot adapt in time, which readily leads to local congestion, reduced resource utilization, and increased tail latency.
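To make the contrast between fixed-rule arbiters concrete, the static-priority and round-robin arbiters mentioned above, plus an oldest-first arbiter as one plausible reading of claim 1's "packet priority based on global waiting time", can be sketched as behavioral models. This is not a hardware description, and the function and class names, as well as the oldest-first interpretation, are hypothetical. With all four inputs requesting every cycle, the fixed-priority arbiter grants input 0 every time and starves inputs 1 to 3, whereas the round-robin arbiter rotates its grants; none of these rules looks at traffic timing, which is the limitation the background identifies.

```python
from typing import List, Optional


def fixed_priority_arbiter(requests: List[bool]) -> Optional[int]:
    """Static priority: always grant the lowest-index active requester."""
    for i, active in enumerate(requests):
        if active:
            return i
    return None


class RoundRobinArbiter:
    """Rotating priority: the search starts just after the most recent grant."""

    def __init__(self, n: int):
        self.n = n
        self.last = n - 1  # makes input 0 the first one checked

    def grant(self, requests: List[bool]) -> Optional[int]:
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i  # the granted input becomes lowest priority next time
                return i
        return None


def oldest_first_arbiter(requests: List[bool], waiting_cycles: List[int]) -> Optional[int]:
    """Assumed model of global-waiting-time priority: grant the longest-waiting requester."""
    candidates = [i for i, active in enumerate(requests) if active]
    if not candidates:
        return None
    return max(candidates, key=lambda i: waiting_cycles[i])  # ties go to the lower index


if __name__ == "__main__":
    all_busy = [True] * 4
    rr = RoundRobinArbiter(4)
    print([fixed_priority_arbiter(all_busy) for _ in range(8)])  # [0, 0, 0, 0, 0, 0, 0, 0]
    print([rr.grant(all_busy) for _ in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```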
Some studies introduce dynamic arbitration methods that adjust the arbitration process according to queue length, injection rate, or traffic type, but their scheduling logic still relies on manually set rules. Such methods struggle to capture the timing characteristics of complex traffic accurately and to balance multiple objectives such as delay, throughput, and power consumption. As super-heterogeneous fusion systems grow in scale and complexity, the decision space for network-on-chip arbitration becomes increasingly high-dimensional and dynamic, and traditional experience- and rule-based methods increasingly show insufficient adaptability. In recent years, reinforcement learning, owing to its ability to learn decision strategies autonomously through interaction with the environment, has been introduced into research on the dynamic management and optimization of networks-on-chip, offering a new approach to complex, multi-objective scheduling problems. However, related prior work focuses mainly on routing or macro-level resource allocation, and the level of intelligence and refinement of the router's internal arbitration mechanism, which dir