CN-121996574-A - Cache device, operation method, electronic equipment and artificial intelligence processor
Abstract
Embodiments of the present disclosure provide a cache device, an operating method, an electronic device, and an artificial intelligence processor. The cache device comprises a receiving module, a scheduling module, and an executing module. The receiving module is configured to receive a plurality of requests. The scheduling module comprises a plurality of waiting queues and is configured to store the plurality of requests respectively and correspondingly into the plurality of waiting queues based on the request types of the plurality of requests, wherein the plurality of waiting queues comprise a read-write request queue and a calculation request queue. The executing module comprises a plurality of executing pipelines respectively coupled with the corresponding waiting queues in the plurality of waiting queues; the plurality of executing pipelines comprise a read-write pipeline and a calculation pipeline, wherein the read-write pipeline is configured to execute the read-write requests transmitted from the read-write request queue, and the calculation pipeline is configured to execute the calculation requests transmitted from the calculation request queue. The cache device can improve the calculation efficiency of a general-purpose graphics processor.
Inventors
- Request for anonymity
- Request for anonymity
Assignees
- 上海壁仞科技股份有限公司 (Shanghai Biren Technology Co., Ltd.)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-04-03
Claims (18)
- 1. A cache device, characterized in that the cache device comprises: a receiving module configured to receive a plurality of requests; a scheduling module comprising a plurality of waiting queues and configured to store the plurality of requests respectively and correspondingly into the plurality of waiting queues based on request types of the plurality of requests, wherein the plurality of waiting queues comprise a read-write request queue and a calculation request queue; and an execution module comprising a plurality of execution pipelines, wherein the plurality of execution pipelines are respectively coupled with the corresponding waiting queues in the plurality of waiting queues and comprise a read-write pipeline and a calculation pipeline, wherein the read-write pipeline is configured to execute read-write requests transmitted from the read-write request queue, and the calculation pipeline is configured to execute calculation requests transmitted from the calculation request queue.
- 2. The cache device of claim 1, wherein the read-write request queue comprises a read-write request main queue and a read-write request side queue, and the read-write pipeline comprises a read-write request main pipeline and a read-write request side pipeline, wherein the scheduling module is further configured to: in response to the read-write request main queue transmitting a first read-write request, allocate a second read-write request having the same access address as the first read-write request to the read-write request side queue; or, in response to the read-write request main queue transmitting a first read-write request while the read-write request side queue contains a third read-write request that is not transmitted in the current clock cycle, refrain from allocating the second read-write request to the read-write request side queue, wherein the access address corresponding to the third read-write request is different from that of the first read-write request.
- 3. The cache device of claim 2, wherein the scheduling module is further configured to transmit the second read-write request in response to the second read-write request passing a resource check after the read-write request main queue transmits the first read-write request.
- 4. The cache device of claim 3, wherein the resource check comprises checking whether the second read-write request has an access conflict over a storage unit with other requests transmitted in the current clock cycle.
- 5. The cache device of claim 2, wherein the read-write request side queue is configured to transmit the second read-write request at least one clock cycle after the read-write request main queue transmits the first read-write request.
- 6. The cache device of claim 2, wherein the cache device further comprises a storage module, and the execution module comprises a write data cache and a credit cache, wherein the read-write request main pipeline is configured to: access the storage module to read target data and provide the target data to the credit cache in response to the first read-write request being a read request; or fetch source data from the write data cache and write the source data into the storage module in response to the first read-write request being a write request.
- 7. The cache device of claim 6, wherein the storage module comprises a first storage unit and a second storage unit, and the read-write request main pipeline is further configured to access the first storage unit or the second storage unit according to a parity of a cache line number to execute the read request or the write request.
- 8. The cache device of claim 6, wherein the execution module further comprises a forwarding register; the read-write request main pipeline is further configured to write the target data or the source data into the forwarding register in response to the first read-write request being a read request or a write request, respectively; and the read-write request side pipeline is further configured to: read first data from the forwarding register in response to the second read-write request being a read request; or, in response to the second read-write request being a write request and the first read-write request being a write request, acquire the source data from the forwarding register, merge the source data with write data of the second read-write request, and write the merged data into the storage module.
- 9. The cache device of claim 2, wherein the scheduling module is further configured to transmit requests from the read-write request main queue with a higher priority than the calculation request queue, and to transmit requests from the calculation request queue with a higher priority than the read-write request side queue.
- 10. The cache device of claim 1, wherein the execution module further comprises a plurality of computing units and an arithmetic logic unit cache, and the calculation pipeline is further configured to fetch first source data from the arithmetic logic unit cache, send it to a corresponding computing unit among the plurality of computing units according to the operation type corresponding to the calculation request, execute the corresponding calculation operation, and write a corresponding calculation result into the storage module.
- 11. The cache device of claim 10, wherein the calculation pipeline is further configured to read second source data from the storage module, the second source data being operated on together with the first source data in the calculation operation.
- 12. The cache device of any one of claims 1-11, further comprising: a storage module comprising a plurality of cache lines; and a hit detection module configured to: in response to a target read-write request among the plurality of requests hitting a target cache line in the storage module, adjust state information of the target cache line and provide the target read-write request to the scheduling module according to its request type; or, in response to the target read-write request not hitting the target cache line in the storage module, provide the target read-write request to the scheduling module; wherein the scheduling module is further configured to, in response to the target read-write request not hitting the target cache line in the storage module, send a read request to a memory to acquire target data corresponding to the target read-write request and write the target data into the target cache line.
- 13. The cache device of claim 12, wherein the scheduling module further comprises a request cache configured to register at least one request that has undergone hit detection.
- 14. The cache device of claim 12, wherein the scheduling module is further configured to, in response to the data in the target cache line having been modified and not yet written to the memory, write the data in the target cache line to the memory before operating on the target cache line according to the target read-write request.
- 15. The cache device of any one of claims 1-11, further comprising a request return module configured to provide a corresponding request response to a target request source over an on-chip bus in response to completion of execution of a target request among the plurality of requests.
- 16. A cache operation method applied to a cache device, wherein the cache device comprises a plurality of execution pipelines and a plurality of waiting queues, the plurality of execution pipelines are respectively coupled with the corresponding waiting queues in the plurality of waiting queues, and the plurality of execution pipelines comprise a read-write pipeline and a calculation pipeline; the cache operation method comprises: receiving a plurality of requests; storing the plurality of requests respectively and correspondingly into the plurality of waiting queues based on the request types of the plurality of requests, wherein the plurality of waiting queues comprise a read-write request queue and a calculation request queue; and executing, by the read-write pipeline, read-write requests transmitted from the read-write request queue, and executing, by the calculation pipeline, calculation requests transmitted from the calculation request queue.
- 17. An electronic device comprising the cache device of any one of claims 1-15.
- 18. An artificial intelligence processor comprising the cache device of any one of claims 1-15.
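The per-type queue-and-pipeline structure of claims 1 and 16 can be illustrated with a minimal, purely hypothetical Python sketch; the class, field, and request names below are inventions of this sketch, not part of the claims. It shows the essential point: because each request type has its own waiting queue drained by its own pipeline, a read-write request and a calculation request can issue in the same cycle instead of serializing behind one another.

```python
from collections import deque

class CacheDevice:
    """Sketch of the claimed structure: requests are sorted by type into
    per-type waiting queues, each drained by its own execution pipeline."""

    def __init__(self):
        # one waiting queue per request type (read-write vs. calculation)
        self.queues = {"rw": deque(), "calc": deque()}
        self.log = []  # records (pipeline, request id) per issue

    def receive(self, request):
        # receiving + scheduling module: enqueue by request type
        self.queues[request["type"]].append(request)

    def step(self):
        # execution module: each pipeline independently pops its own queue,
        # so one "rw" and one "calc" request can issue in the same cycle
        for kind, queue in self.queues.items():
            if queue:
                req = queue.popleft()
                self.log.append((kind, req["id"]))

dev = CacheDevice()
dev.receive({"type": "rw", "id": 0})
dev.receive({"type": "calc", "id": 1})
dev.receive({"type": "rw", "id": 2})
dev.step()  # cycle 1: rw request 0 and calc request 1 issue together
dev.step()  # cycle 2: rw request 2 issues
```

With a single shared queue the three requests would need three cycles; the two-queue split completes them in two, which is the efficiency claim the abstract makes for the general-purpose graphics processor.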
Description
Cache device, operation method, electronic equipment and artificial intelligence processor
Technical Field
Embodiments of the present disclosure relate to the field of integrated circuits, and more particularly, to a cache device, an operating method, an electronic device, and an artificial intelligence processor.
Background
In a general-purpose graphics processor, the last level cache (LLC) is the lowest-level global cache on the chip. To alleviate shortages of general-purpose registers and shared memory, operations such as data reduction and atomic operations can be pushed down to the LLC for in-memory/near-memory computation, thereby reducing data-movement overhead. However, after the LLC integrates the computing function, the pipeline is deepened and the logic is complicated, which causes a number of problems.
Disclosure of Invention
At least one embodiment of the present disclosure provides a cache device including a receiving module, a scheduling module, and an executing module.
The receiving module is configured to receive a plurality of requests. The scheduling module comprises a plurality of waiting queues and is configured to store the plurality of requests respectively and correspondingly into the plurality of waiting queues based on the request types of the plurality of requests, wherein the plurality of waiting queues comprise a read-write request queue and a calculation request queue. The executing module comprises a plurality of executing pipelines respectively coupled with the corresponding waiting queues in the plurality of waiting queues; the plurality of executing pipelines comprise a read-write pipeline and a calculation pipeline, wherein the read-write pipeline is configured to execute the read-write requests transmitted from the read-write request queue, and the calculation pipeline is configured to execute the calculation requests transmitted from the calculation request queue. For example, in a cache device provided in at least one embodiment of the present disclosure, the read-write request queue includes a read-write request main queue and a read-write request side queue, and the read-write pipeline includes a read-write request main pipeline and a read-write request side pipeline.
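One way to read the main/side queue split described here is as an address-match rule applied when a read-write request is allocated: a request whose access address equals that of the request just transmitted from the main queue is parked in the side queue (where, per the later embodiments, it can reuse forwarded data), while other requests stay on the main path. The following Python sketch is purely illustrative; the function name, return values, and addresses are assumptions of this sketch, not taken from the disclosure.

```python
def allocate(incoming_addr, issued_main_addr):
    """Decide which queue a new read-write request is allocated to.

    incoming_addr    -- access address of the request being allocated
    issued_main_addr -- address of the request the main queue transmitted
                        this cycle (None if nothing was transmitted)
    Returns "side" when the new request targets the same address as the
    just-transmitted main-queue request, otherwise "main".
    """
    if issued_main_addr is not None and incoming_addr == issued_main_addr:
        # same access address: park it in the side queue so it can
        # piggyback on the main request's access (e.g. via forwarding)
        return "side"
    return "main"

# same address as the request issued from the main queue -> side queue
assert allocate(0x40, 0x40) == "side"
# different address -> main queue
assert allocate(0x80, 0x40) == "main"
```

The design intent suggested by the disclosure is that same-address requests need not re-arbitrate for the storage module at all, which is what makes the side path cheaper than a second main-pipeline pass.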
The scheduling module is further configured to: in response to the read-write request main queue transmitting a first read-write request, allocate a second read-write request having the same access address as the first read-write request to the read-write request side queue; or, in response to the read-write request main queue transmitting a first read-write request while the read-write request side queue contains a third read-write request that is not transmitted in the current clock cycle, refrain from allocating the second read-write request to the read-write request side queue, wherein the access address corresponding to the third read-write request is different from that of the first read-write request. For example, in a cache device provided in at least one embodiment of the present disclosure, the scheduling module is further configured to transmit the second read-write request in response to the second read-write request passing a resource check after the read-write request main queue transmits the first read-write request. For example, in a cache device provided in at least one embodiment of the present disclosure, the resource check includes checking whether the second read-write request has an access conflict over a storage unit with other requests transmitted in the current clock cycle. For example, in a cache device provided in at least one embodiment of the present disclosure, the read-write request side queue is configured to transmit the second read-write request at least one clock cycle after the read-write request main queue transmits the first read-write request. For example, in a cache device provided in at least one embodiment of the present disclosure, the cache device further includes a storage module, where the execution module includes a write data cache and a credit cache.
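The resource check described above can be modeled as a conflict test against the storage units already claimed by requests transmitted in the current clock cycle. The sketch below is an assumption-laden illustration: the mapping from address to storage unit (two units selected by cache-line parity, 64-byte lines) and all function names are invented here, though the parity-based selection mirrors the embodiment mentioned later in the description.

```python
def bank_of(addr, num_banks=2):
    # hypothetical address-to-storage-unit mapping; the disclosure selects
    # between two storage units by the parity of the cache line number,
    # here taken as (addr >> 6) assuming 64-byte cache lines
    return (addr >> 6) % num_banks

def passes_resource_check(req_addr, addrs_issued_this_cycle):
    """True when the side-queue request touches no storage unit already
    claimed by another request transmitted in the current clock cycle."""
    used_banks = {bank_of(a) for a in addrs_issued_this_cycle}
    return bank_of(req_addr) not in used_banks

# an even-numbered and an odd-numbered cache line map to different
# storage units, so the side-queue request may issue alongside them
assert passes_resource_check(0x40, [0x00])
# two even-numbered cache lines collide on the same unit: check fails
assert not passes_resource_check(0x80, [0x00])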
The read-write request main pipeline is configured to access the storage module to read target data and provide the target data to the credit cache in response to the first read-write request being a read request, or to fetch source data from the write data cache and write the source data into the storage module in response to the first read-write request being a write request. For example, in a cache device provided in at least one embodiment of the present disclosure, the storage module includes a first storage unit and a second storage unit, and the read-write request main pipeline is further configured to access the first storage unit or the second storage unit according to a parity of a cache line number to execute the read request or the write request. For example, in the cache device provided in at l