
CN-121996171-A - Multi-port non-blocking out-of-order cache controller and control method


Abstract

The invention provides a multi-port non-blocking out-of-order cache controller that adopts a decoupled front-end/back-end architecture. The front end comprises a switching network, a global order-preserving buffer and a distributed reorder buffer matrix; the back end comprises a plurality of processing nodes. The switching network receives access requests from multiple input channels and distributes the requests to the corresponding processing nodes according to an address mapping. The global order-preserving buffer records the processing node identifier to which each distributed request is routed. The processing nodes schedule the distributed requests out of order based on their state, perform read and write operations on the actual data of the scheduled requests, and generate responses. The distributed reorder buffer matrix performs distributed reordering according to the reorder buffer identifiers carried by the responses, and the global order-preserving buffer fetches the responses in order from the corresponding queues of the distributed reorder buffer matrix according to the recorded processing node identifiers.
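
To make the ordering mechanism concrete, the following minimal C++ sketch models the interaction between the global order-preserving buffer and the distributed reorder buffer matrix as described in the abstract. All class names, member names and the address-interleaving rule are illustrative assumptions, not taken from the patent: each input channel's FIFO records which processing node a request was routed to, responses arrive out of order into per-(channel, node) queues, and responses leave each channel strictly in request order.

    // Behavioural sketch only (assumptions: names, 64-byte line interleaving, container choices).
    #include <cstdint>
    #include <deque>
    #include <optional>
    #include <vector>

    struct Response { uint32_t data; };

    class FrontEnd {
    public:
        FrontEnd(size_t channels, size_t nodes)
            : orderFifo_(channels),
              reorder_(channels, std::vector<std::deque<Response>>(nodes)) {}

        // Switching network: map an address to a processing node and remember the
        // routing decision in the channel's order-preserving FIFO.
        size_t dispatch(size_t channel, uint64_t addr) {
            size_t node = (addr >> 6) % reorder_[channel].size();  // assumed interleaving
            orderFifo_[channel].push_back(node);
            return node;
        }

        // Back end delivers an out-of-order response for (channel, node).
        void deliverResponse(size_t channel, size_t node, Response r) {
            reorder_[channel][node].push_back(r);
        }

        // A response is released only when the node recorded at the FIFO head has data,
        // so per-channel responses are returned in request order.
        std::optional<Response> popInOrder(size_t channel) {
            if (orderFifo_[channel].empty()) return std::nullopt;
            size_t node = orderFifo_[channel].front();
            auto& q = reorder_[channel][node];
            if (q.empty()) return std::nullopt;    // oldest request not finished yet
            Response r = q.front();
            q.pop_front();
            orderFifo_[channel].pop_front();
            return r;
        }

    private:
        std::vector<std::deque<size_t>> orderFifo_;               // global order-preserving buffer
        std::vector<std::vector<std::deque<Response>>> reorder_;  // distributed reorder buffer matrix
    };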

Inventors

  • CUI XIAOXIN
  • ZHANG CHENGYI
  • WANG YUAN

Assignees

  • Peking University (北京大学)

Dates

Publication Date
2026-05-08
Application Date
2026-02-11

Claims (10)

  1. A multi-port non-blocking out-of-order cache controller, characterized in that the cache controller adopts a decoupled front-end/back-end architecture, the front end comprises a switching network, a global order-preserving buffer and a distributed reorder buffer matrix, and the back end comprises a plurality of processing nodes; the switching network receives access requests from multiple input channels and distributes the requests to corresponding processing nodes according to an address mapping; the global order-preserving buffer records the processing node identifier to which each distributed request is routed; the processing nodes schedule the distributed requests out of order based on their state, perform read and write operations on the actual data of the scheduled requests and generate responses; the distributed reorder buffer matrix performs distributed reordering according to the reorder buffer identifiers carried by the responses; and the global order-preserving buffer sequentially fetches the responses from the corresponding queues of the distributed reorder buffer matrix according to the recorded processing node identifiers.
  2. The multi-port non-blocking out-of-order cache controller of claim 1, wherein the processing node integrates a directory access pipeline, an out-of-order dispatch engine, a data access pipeline and a direct memory access controller, wherein the directory access pipeline performs a state check on the distributed requests to determine their data ready state, the direct memory access controller provides data-ready information to the out-of-order dispatch engine, the out-of-order dispatch engine dispatches and dequeues, out of order, the requests that simultaneously satisfy the data ready condition and the commit permission, and the data access pipeline performs read and write operations on the actual data of the dequeued requests and generates responses.
  3. The multi-port non-blocking out-of-order cache controller of claim 2, wherein the out-of-order dispatch engine comprises a reorder identifier allocation unit, a concurrent request merging unit, a quota management unit, a miss processing unit and a request scheduling unit, wherein the reorder identifier allocation unit is used for allocating the reorder buffer entry identifier to which a request belongs, the concurrent request merging unit is used for determining the initial data ready condition of a request and merging next-level memory requests, the quota management unit is used for determining whether a request has commit permission according to the entry identifier allocated to the request and by monitoring the idle state of the distributed reorder buffer matrix, the miss processing unit is used for temporarily storing requests that do not meet the execution conditions and waking up requests that simultaneously satisfy the data ready condition and the commit permission, and the request scheduling unit is used for dispatching the awakened requests out of order.
  4. The multi-port non-blocking out-of-order cache controller of claim 3, wherein the concurrent request merging unit is organized as a state flag matrix, each bit in the matrix identifying the miss state of a corresponding cache line.
  5. The multi-port non-blocking out-of-order cache controller of claim 4, wherein the state flag matrix is queried when a request arrives at the out-of-order dispatch engine; if the query result indicates that the target cache line is in a miss state, the request is determined to be a "hit after miss" and its data ready condition is not satisfied.
  6. The multi-port non-blocking out-of-order cache controller of claim 3, wherein the miss processing unit is organized as a state machine queue, wherein each entry independently maintains a state machine for tracking the lifecycle state of its corresponding request.
  7. The multi-port non-blocking out-of-order cache controller of claim 3, wherein the request scheduling unit is configured to dequeue the awakened requests out of order, comprising: if a plurality of entries are awakened in the same cycle, the request scheduling unit performs a cyclic shift operation on all entries to be dequeued based on a bottom pointer and selects the oldest request at the current time for dequeuing by using a fixed priority arbitration policy, wherein the bottom pointer, which points to the bottom entry of the queue, is incremented immediately after dequeuing is completed so as to update the bottom position of the queue.
  8. The multi-port non-blocking out-of-order cache controller of claim 1, wherein the global order-preserving buffer comprises a plurality of independent first-in-first-out queues, the number of which corresponds to the number of input channels, each queue maintaining independent read and write pointers for recording and directing the processing node routing information of requests.
  9. The multi-port non-blocking out-of-order cache controller of claim 8, wherein each queue inside the distributed reorder buffer matrix maintains an independent read pointer; executed responses are pushed into free entries of the reorder buffer queues of the distributed reorder buffer matrix corresponding to the channel and node identifiers to which the responses belong, and the responses are sequentially fetched from the read pointer of each queue and committed under the sequential guidance of the global order-preserving buffer.
  10. A multi-port non-blocking out-of-order cache control method, characterized by comprising the steps of: receiving access requests from multiple input channels through a switching network and distributing the requests to corresponding processing nodes according to an address mapping; tracking the global order of the requests through a global order-preserving buffer; performing out-of-order scheduling and dequeuing at the processing nodes based on the state of the distributed requests, and performing read and write operations on the actual data of the dequeued requests to generate responses; performing distributed reordering according to the reorder buffer identifiers carried by the responses through a distributed reorder buffer matrix; and sequentially fetching the responses from the corresponding queues of the distributed reorder buffer matrix through the global order-preserving buffer according to the recorded processing node identifiers.
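
Claim 7 describes an age-ordered arbitration scheme built from a bottom pointer, a cyclic shift and fixed-priority selection. The sketch below is one possible reading of that scheme, given only to make the mechanism concrete; the queue depth, the names and the exact pointer-update rule are assumptions. Rotating the ready vector by the bottom pointer lets a plain fixed-priority pick (lowest rotated index) return the oldest ready entry.

    // Illustrative sketch, not the patented logic (kEntries and the update rule are assumed).
    #include <bitset>
    #include <cstddef>
    #include <optional>

    constexpr size_t kEntries = 8;  // assumed queue depth

    class AgeArbiter {
    public:
        // `ready` marks entries whose requests are awakened (data ready + commit permission).
        std::optional<size_t> pickOldest(const std::bitset<kEntries>& ready) {
            for (size_t i = 0; i < kEntries; ++i) {
                size_t idx = (bottom_ + i) % kEntries;  // cyclic shift based on the bottom pointer
                if (ready.test(idx)) {
                    return idx;                         // fixed priority after rotation = oldest first
                }
            }
            return std::nullopt;
        }

        // One reading of the claim: the bottom pointer advances once a dequeue completes,
        // updating the bottom position of the queue.
        void onDequeued() { bottom_ = (bottom_ + 1) % kEntries; }

    private:
        size_t bottom_ = 0;  // points at the current bottom (oldest) position of the queue
    };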

Description

Multi-port non-blocking out-of-order cache controller and control method

Technical Field

The invention relates to the technical field of computer systems and on-chip storage systems, and in particular to a multi-port non-blocking out-of-order cache controller and a control method.

Background

As the gap between processor and memory performance continues to widen, the "memory wall" problem has become a major bottleneck limiting computing system performance. To alleviate this problem, modern processors commonly employ cache technology. The core indicator of cache performance is the average memory access latency, and the miss penalty is a key factor affecting this indicator. Non-blocking out-of-order scheduling effectively reduces the miss penalty and significantly improves system throughput by allowing the cache to continue serving subsequent access requests while handling miss requests, and has therefore become a core technique of high-performance cache design. However, existing non-blocking out-of-order caches still have limitations. Firstly, because responses are generated out of order, the host processor must integrate the reordering logic, so the cache and the processor architecture must adopt a tightly coupled design, which increases design complexity and reduces module reusability. Secondly, when handling "hit after miss" accesses, existing schemes generally adopt a blocking or request-retransmission mechanism, which increases processing latency and restricts further performance improvement.

Disclosure of Invention

The invention provides a multi-port non-blocking out-of-order cache controller that adopts a decoupled front-end/back-end architecture. The front end comprises a switching network, a global order-preserving buffer and a distributed reorder buffer matrix; the back end comprises a plurality of processing nodes. The switching network receives access requests from multiple input channels and distributes the requests to the corresponding processing nodes according to an address mapping. The global order-preserving buffer records the processing node identifier to which each distributed request is routed. The processing nodes schedule the distributed requests out of order based on their state, perform read and write operations on the actual data of the scheduled requests and generate responses. The distributed reorder buffer matrix performs distributed reordering according to the reorder buffer identifiers carried by the responses, and the global order-preserving buffer sequentially fetches the responses from the corresponding queues of the distributed reorder buffer matrix according to the recorded processing node identifiers.

Optionally, the processing node integrates a directory access pipeline, an out-of-order dispatch engine, a data access pipeline and a direct memory access controller, wherein the directory access pipeline performs a state check on the distributed requests to determine their data ready state, the direct memory access controller provides data-ready information to the out-of-order dispatch engine, the out-of-order dispatch engine dispatches and dequeues, out of order, the requests that simultaneously satisfy the data ready condition and the commit permission, and the data access pipeline performs read and write operations on the actual data of the dequeued requests and generates responses.
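
As an illustration of the dispatch condition just described, the following minimal C++ sketch models quota management and the "data ready AND commit permission" gating applied before a request is dequeued. The names, types and counter-based quota scheme are assumptions made for the sketch, not details taken from the patent.

    // Illustrative sketch of the dispatch gating (all names and the quota scheme are assumed).
    #include <cstdint>

    struct RequestState {
        bool dataReady    = false;  // reported by the directory access pipeline / DMA controller
        bool commitPermit = false;  // granted by quota management
        uint16_t robId    = 0;      // reorder buffer entry identifier carried in the response
    };

    class QuotaManager {
    public:
        explicit QuotaManager(uint32_t freeEntries) : free_(freeEntries) {}

        // Grant commit permission while the reorder buffer queue still has a free entry.
        bool tryGrant(RequestState& s) {
            if (free_ == 0) return false;
            --free_;
            s.commitPermit = true;
            return true;
        }

        // A fetched (committed) response frees its entry again.
        void release() { ++free_; }

    private:
        uint32_t free_;
    };

    // The dispatch engine wakes a pending request only when both conditions hold at once.
    inline bool canDispatch(const RequestState& s) { return s.dataReady && s.commitPermit; }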
The out-of-order dispatch engine comprises a reorder identifier allocation unit, a concurrent request merging unit, a quota management unit, a miss processing unit and a request scheduling unit, wherein the reorder identifier allocation unit is used for allocating the reorder buffer entry identifier to which each request belongs, the concurrent request merging unit is used for determining the initial data ready condition of a request and merging next-level memory requests, the quota management unit is used for determining whether a request has commit permission according to the entry identifier allocated to the request and by monitoring the idle state of the distributed reorder buffer matrix, the miss processing unit is used for temporarily storing requests that do not meet the execution conditions and waking up requests that simultaneously satisfy the data ready condition and the commit permission, and the request scheduling unit is used for dispatching the awakened requests out of order.

Optionally, the concurrent request merging unit is organized as a state flag matrix, each bit of which identifies the miss state of a corresponding cache line.

Optionally, when a request arrives at the out-of-order dispatch engine, the state flag matrix is queried; if the query result shows that the target cache line is in a miss state, the request is determined to be a "hit after miss" and its data ready condition is not satisfied.
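
The following minimal sketch models the state flag matrix idea described above; the number of tracked lines, the class and the method names are hypothetical. One bit per cache line records an outstanding miss, so a later request to the same line is classified as a "hit after miss" and merged with the in-flight fill instead of issuing a second next-level memory request.

    // Illustrative sketch of miss tracking and request merging (kLines and names are assumed).
    #include <bitset>
    #include <cstddef>

    constexpr size_t kLines = 256;  // assumed number of tracked cache lines

    class MissMergeUnit {
    public:
        enum class Outcome { IssueFill, MergedHitAfterMiss };

        // Called when a missing request reaches the out-of-order dispatch engine.
        Outcome onMiss(size_t line) {
            if (pending_.test(line)) {
                return Outcome::MergedHitAfterMiss;  // data not ready; wait on the in-flight fill
            }
            pending_.set(line);                      // first miss: mark the line and fetch it once
            return Outcome::IssueFill;
        }

        // Called when the fill for `line` returns; parked requests for it can now be woken.
        void onFill(size_t line) { pending_.reset(line); }

    private:
        std::bitset<kLines> pending_;  // the state flag matrix: one miss bit per cache line
    };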