CN-122019203-A - Multi-kernel system deadlock release structure and deadlock release method thereof
Abstract
A deadlock relief structure and deadlock relief method for a multi-chiplet system. A loop-back data path, implemented as a simple hard-wired connection, is introduced in the multi-chiplet system. It forms an internal path that starts at the output of the upward input port buffer, selected by multiplexer I, and ends at the input of the local input port buffer, selected by multiplexer II. Because the path reuses the virtual channel buffers that the local input port already has, stalled data packets can be physically redirected without any additional data buffers. The entire process is atomic and self-contained and requires no complex inter-router protocol.
Inventors
- ZHOU WU
- BAO YIMIN
- NI TIANMING
- XU DONGYU
- XU CHENG
- LUO LE
- CHEN FULONG
Assignees
- 安徽师范大学
Dates
- Publication Date
- 20260512
- Application Date
- 20260113
Claims (10)
- 1. A multi-chiplet system deadlock relief structure, wherein: each chiplet consists of a boundary router and a conventional router, the boundary router being a router connected to the substrate; the boundary router comprises an upward input port, a local input port, and an IPLR controller connected to the upward input port and the local input port; the input end of the upward input port is connected to the substrate, and the output end of the upward input port is connected to the input end of a multiplexer I; the two output ends of the multiplexer I are connected to a multiplexer II and to an adjacent router, respectively; the other input end of the multiplexer II is connected to a network interface NI; the output end of the multiplexer II is connected to the input end of the local input port, and the output end of the local output port is connected to the adjacent router; and the conventional router comprises a local input port whose input is connected to the network interface NI and whose output is connected to the adjacent router.
- 2. The multi-chiplet system deadlock relief structure of claim 1, wherein the upward input port includes a demultiplexer and a multiplexer, a plurality of virtual channels for storing data between the demultiplexer and the multiplexer, and a block counter connected to the IPLR controller, the block counter being used to detect a blocked virtual channel u in the upward input port.
- 3. The multi-chiplet system deadlock relief structure of claim 1, wherein the local input port includes a demultiplexer and a multiplexer, with a plurality of virtual channels for storing data between the demultiplexer and the multiplexer.
- 4. The multi-chiplet system deadlock relief structure according to claim 1, wherein the network interface NI comprises an injection arbiter, a register I, and an AND gate I connected to the register I and the injection arbiter, an output of the AND gate I being connected to a control end of an injection port, and the injection input port being connected to the multiplexer II through a register II in the boundary router.
- 5. The multi-chiplet system deadlock relief structure according to claim 1, wherein the control ports of the IPLR controller and of the upward input port are connected to the two inputs of an AND gate II, the output of which is connected to a switch allocator SA.
- 6. A deadlock relief method based on the multi-chiplet system deadlock relief structure according to any one of claims 1 to 5, wherein, when the head data and the body data of the blocked data are blocked at the same boundary router, the deadlock relief method is as follows: the main finite state machine FSM switches from the IDLE state to the LOCK state based on a blocking signal stall[u]=1; in the LOCK state, the IPLR controller issues the signals stop=1, mask=0 and sel=0, wherein when stop=1 the injection port stops injecting output data of the processing unit PE into the local input port, when mask=0 the switch allocator allocates data resources to the upward input port, and when sel=0 the multiplexer I outputs the output data of the upward output port to the adjacent router; when an idle virtual channel is detected in the local input port and the time spent in the LOCK state exceeds 2 clock cycles, the main finite state machine FSM switches from the LOCK state to the XFER state; in the XFER state, the IPLR controller issues the signals stop=1, mask=1 and sel=1, wherein when mask=1 the switch allocator stops allocating data resources to the upward input port, and when sel=1 the multiplexer I feeds the data output by the upward output port into the local input port through the multiplexer II; the idle virtual channel v in the local input port is acquired and the signals read=u and write=v are generated, so that the data flit currently read from the blocked virtual channel u in the upward input port based on read=u is written into the virtual channel v based on write=v; when the data flit currently read from the blocked virtual channel u is detected to be the tail flit, the main finite state machine FSM switches from the XFER state to the IDLE state, the signals stop, mask and sel are set to 0 at the next clock edge, and the read-write operation stops.
- 7. The multi-chiplet system deadlock relief method of claim 6, wherein the upward input port and the local input port complete the reading and the writing of one data flit, respectively, within one clock cycle.
- 8. A deadlock relief method based on the multi-chiplet system deadlock relief structure according to any one of claims 1 to 5, wherein, when the body data of the blocked data is located at a boundary router but the head data is located at another router, the deadlock relief method is as follows: the main finite state machine FSM switches from the IDLE state to the LOCK state based on a blocking signal stall[u]=1; in the LOCK state, the IPLR controller issues the signals stop=1, mask=0 and sel=0; when an idle virtual channel exists in the local input port and the time spent in the LOCK state exceeds 2 clock cycles, the main finite state machine FSM switches from the LOCK state to the SNOOP state; in the SNOOP state, the idle virtual channel v in the local input port is acquired, and the IPLR controller snoops the blocked virtual channel u and atomically latches its critical session context into a session state register SSR; the main finite state machine FSM then switches from the SNOOP state to the PROXY state; in the PROXY state, the IPLR controller sends out signal stop=1, signal sel=1, when the signal mask=1, the main finite state machine FSM stops distributing data resources to an upward input port, when the signal sel=1, the multiplexer sends out signal stop I=1, the multiplexer sends out signal stop input port to the current virtual channel, and reads out signal stop signal I=1, and reads out the current virtual channel, and requests the current signal stop signal flow from the current virtual channel based on the current virtual channel, the current buffer, and the current signal stop state, the signal control buffer is switched from the buffer, and the current buffer is switched to the current buffer, and the current buffer state is read, and the current state.
The IPLR controller then generates a read signal read_up=v, and the data flits in the virtual channel v are sent one by one, through the routing direction R, to the virtual channel that buffers the head data in the connected router, until the IPLR controller detects that the tail flit has been read; the main finite state machine FSM then returns from the PROXY state to the IDLE state, the signals stop, mask and sel are set to 0 at the next clock edge, and the read-write operation stops; the critical session context includes the routing direction of the head data at the router where the body data is located, the virtual channel that buffers the head data in the connected router, and the remaining available space of that virtual channel.
- 9. The multi-chiplet system deadlock relief method according to claim 8, wherein a multiplexer III is arranged between the local input port and the switch allocator; an output end of the switch allocator is connected to an input end of an AND gate III and to an input end of an AND gate IV; an output end of the AND gate III is connected to the IPLR controller, and an output end of the AND gate IV is connected to the local input port; the output signal of an AND gate V is input to the control signal end of the multiplexer III and to an input end of the AND gate III, and the inverted output signal of the AND gate V is input to an input end of the AND gate IV; when the signal proxy=1 and the remaining available space is greater than zero, the AND gate V outputs a signal 1 as the control signal of the multiplexer III and, at the same time, a signal 1 is applied to the AND gate III and a signal 0 to the AND gate IV; the multiplexer III outputs the request signal of the IPLR controller based on the control signal 1; and when the switch allocator SA grants the request signal, it outputs the signal grant_to_local=1 to the AND gate III and the AND gate IV, so that the AND gate III outputs a grant signal to the IPLR controller while the AND gate IV blocks the grant signal returned from the switch allocator to the local input port.
- 10. The multi-chiplet system deadlock relief method of claim 8, wherein the upward input port and the local input port complete the reading and the writing of one data flit, respectively, within one clock cycle.
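The state sequence recited in claim 6 (IDLE, LOCK, XFER, back to IDLE) can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the claimed hardware: the class name `IplrFsm`, the method names, and the way "more than 2 clock cycles in LOCK" is counted (clock edges observed after entering LOCK) are illustrative choices that do not appear in the source. The claim does not say what happens if the stall clears while in LOCK, so that case is not modeled.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    LOCK = 1
    XFER = 2


class IplrFsm:
    """Behavioral sketch of the main FSM of claim 6 (names are hypothetical)."""

    # (stop, mask, sel) driven by the IPLR controller in each state, per claim 6.
    OUTPUTS = {
        State.IDLE: (0, 0, 0),
        State.LOCK: (1, 0, 0),
        State.XFER: (1, 1, 1),
    }

    def __init__(self):
        self.state = State.IDLE
        self.lock_cycles = 0  # assumption: clock edges seen since entering LOCK

    def step(self, stall, free_vc, tail_flit):
        """Advance one clock edge and return (stop, mask, sel) after that edge.

        stall     -- stall[u] from the block counter of the upward input port
        free_vc   -- True if the local input port has an idle virtual channel
        tail_flit -- True if the flit read from blocked VC u is the tail flit
        """
        if self.state is State.IDLE:
            if stall:
                self.state = State.LOCK
                self.lock_cycles = 0
        elif self.state is State.LOCK:
            self.lock_cycles += 1
            # Leave LOCK only with an idle VC and after more than 2 cycles.
            if free_vc and self.lock_cycles > 2:
                self.state = State.XFER
        elif self.state is State.XFER:
            # The tail flit ends the transfer; the signals drop back to 0.
            if tail_flit:
                self.state = State.IDLE
        return self.OUTPUTS[self.state]
```

Driving the model with `stall=True` and `free_vc=True` keeps it in LOCK for three edges before it enters XFER, and a `tail_flit=True` input returns it to IDLE with all three signals at 0.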
Description
Multi-kernel system deadlock release structure and deadlock release method thereof
Technical Field
The invention belongs to the technical field of chiplet design, and particularly relates to a multi-chiplet system deadlock relief structure and a deadlock relief method thereof.
Background
The slowing of Moore's law has made chiplet-based design a key paradigm for building large-scale, heterogeneous systems on chip (SoC). By integrating multiple smaller, independently designed dies (chiplets) on a shared package, this approach effectively overcomes the yield and cost limitations of monolithic integration. The cornerstone of the chiplet ecosystem is design modularity: each chiplet is developed, validated, and reused as an independent black-box IP, regardless of the global system environment in which it will be deployed.
However, this modularity is fundamentally challenged by the appearance of integration-induced deadlocks in the interconnect network. While the network on chip (NoC) inside each chiplet and the interposer network may each be designed to be deadlock free, connecting them through a boundary router (BR) may create a global resource dependency loop across multiple modules, as shown in fig. 1. Such a deadlock is typically caused by an upward packet, that is, a packet attempting to enter a chiplet from the interposer, which stalls in the boundary router BR while waiting for a resource inside the chiplet that is in turn held by another packet waiting to leave the chiplet. Resolving these deadlocks without violating design modularity is a critical challenge.
Deadlock recovery schemes allow greater routing freedom and intervene only when a deadlock is predicted or detected. Upward Packet Popup (UPP), a well-known recovery technique, detects stalled upward packets and resolves the deadlock using a complex multiplexer bypass protocol.
This approach suffers from high implementation complexity and long recovery delay, and it typically requires an expensive active interposer.
Disclosure of Invention
The present invention provides a multi-chiplet system deadlock relief structure, which aims to solve at least one of the above problems.
The invention is realized as follows. A multi-chiplet system deadlock relief structure comprises:
Each chiplet consists of a boundary router and a conventional router, wherein the boundary router is a router connected to the substrate, and the conventional router is connected to other conventional routers or to the boundary router;
The boundary router comprises an upward input port, a local input port, and an IPLR controller connected to the upward input port and the local input port. The input end of the upward input port is connected to the substrate, and the output end of the upward input port is connected to the input end of a multiplexer I. The two output ends of the multiplexer I are connected to a multiplexer II and to an adjacent router, respectively. The other input end of the multiplexer II is connected to a network interface NI, the output end of the multiplexer II is connected to the input end of the local input port, and the output end of the local output port is connected to the adjacent router;
The conventional router comprises a local input port whose input is connected to the network interface NI and whose output is connected to the adjacent router.
Further, the upward input port comprises a demultiplexer and a multiplexer, a plurality of virtual channels for storing data between the demultiplexer and the multiplexer, and a block counter connected to the IPLR controller and used to detect the blocked virtual channel u in the upward input port.
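As a rough illustration of the structure just described, the following sketch models the block counter and the loop-back path through multiplexer I and multiplexer II. It is a behavioral sketch under assumptions, not the disclosed circuit: the stall `threshold`, the dictionary flit format with a `tail` flag, the function names, and the use of the same `sel` value to steer multiplexer II are all hypothetical. The source states only that multiplexer I has two outputs (adjacent router, multiplexer II) and that multiplexer II selects between the loop-back path and the network interface NI.

```python
from collections import deque


class BlockCounter:
    """Flags a virtual channel as blocked after `threshold` stalled cycles.

    The threshold value is an assumption; the source only says that the
    counter detects a blocked virtual channel u.
    """

    def __init__(self, num_vcs, threshold):
        self.counts = [0] * num_vcs
        self.threshold = threshold

    def tick(self, vc, occupied, advanced):
        """Update the counter of one VC for one cycle and return stall[vc]."""
        if occupied and not advanced:
            self.counts[vc] += 1
        else:
            self.counts[vc] = 0
        return self.counts[vc] >= self.threshold


def mux_i(flit, sel):
    """Multiplexer I: returns (to_adjacent_router, to_multiplexer_ii)."""
    return (flit, None) if sel == 0 else (None, flit)


def mux_ii(loopback_flit, ni_flit, sel):
    """Multiplexer II: loop-back flit when sel=1, otherwise the NI flit."""
    return loopback_flit if sel == 1 else ni_flit


def loop_back(upward_vc, local_vc):
    """Move one packet, flit by flit, from blocked VC u to idle VC v.

    Returns the number of flits moved; stops after the tail flit.
    """
    moved = 0
    while upward_vc:
        flit = upward_vc.popleft()                # read = u
        _, to_mux2 = mux_i(flit, sel=1)           # steer toward multiplexer II
        local_vc.append(mux_ii(to_mux2, None, sel=1))  # write = v
        moved += 1
        if flit["tail"]:
            break
    return moved
```

With a blocked upward virtual channel holding a three-flit packet followed by the head of another packet, `loop_back` moves exactly the three flits into the local virtual channel and leaves the following packet in place, which mirrors the point that the existing local virtual channel buffers are reused instead of adding new ones.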
Further, the local input port includes a demultiplexer and a multiplexer, with a plurality of virtual channels for storing data between the demultiplexer and the multiplexer.
Further, the network interface NI comprises an injection arbiter, a register I, and an AND gate I connected to the register I and the injection arbiter, wherein the output end of the AND gate I is connected to the control end of the injection port, and the injection input port is connected to the multiplexer II through a register II in the boundary router.
Further, the control ports of the IPLR controller and of the upward input port are connected to the two input ends of an AND gate II, and the output end of the AND gate II is connected to a switch allocator SA.
The invention is further realized as a deadlock relief method based on the multi-chiplet system deadlock relief structure. When the head data and the body data of the blocked data are blocked in the same boundary router, the deadlock relief method comprises the following steps:
The main finite state machine FSM switches from the IDLE state to the LOCK state based on a blocking signal stall[u]=1; in the LOCK state, the IPLR controller issues a signal stop=1, a