
CN-122019419-A - Reduction request execution method, cache, computing device and computing system

CN 122019419 A

Abstract

The disclosure relates to a reduction request execution method, a cache, a computing device, and a computing system. The method is applied to a cache and comprises: receiving a target reduction request sent by a target computing unit; when a first source operand of the target reduction request misses the cache, allocating a corresponding target cache line for the request and executing in parallel a first step of reading the first source operand from memory and a second step of reading a second source operand of the request from a corresponding target cache area in a store buffer and writing it into the target cache line; when the first source operand is received, performing the reduction operation matching the request on the first and second source operands to obtain the reduction result; and returning the reduction result to the target computing unit. In this way, the preparation of the two source operands of a reduction request is overlapped, improving the processing efficiency of reduction requests.

Inventors

  • Request for anonymity
  • Request for anonymity

Assignees

  • Shanghai Biren Technology Co., Ltd. (上海壁仞科技股份有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-04-08

Claims (11)

  1. A reduction request execution method, applied to a cache, the method comprising: receiving a target reduction request sent by a target computing unit, wherein the target reduction request comprises address information of a first source operand and a second source operand; when the first source operand misses the cache, allocating a corresponding target cache line for the target reduction request, and executing in parallel a first step of reading the first source operand from memory according to the address information and a second step of reading the second source operand from a target cache area corresponding to the target reduction request in a store buffer and writing the second source operand into the target cache line; when the first source operand is received, performing the reduction operation matching the target reduction request on the first source operand and the second source operand to obtain a reduction result of the target reduction request; and returning the reduction result to the target computing unit.
  2. The method according to claim 1, further comprising: identifying whether the target cache line is in an idle state; when the target cache line is not in an idle state, identifying whether data in the target cache line is consistent with the corresponding data in memory, and when the data in the target cache line is consistent with the corresponding data in memory, emptying the target cache line.
  3. The method according to claim 2, further comprising: when the data in the target cache line is inconsistent with the corresponding data in memory, writing the data stored in the target cache line into memory and releasing the space of the target cache line.
  4. The method according to claim 1, further comprising: when the first source operand hits the cache, reading the second source operand from the target cache area and the first source operand from the cache; and performing the reduction operation matching the target reduction request on the first source operand and the second source operand to obtain the reduction result.
  5. The method according to claim 1, further comprising: writing the reduction result into the target cache line.
  6. The method according to claim 1, further comprising: notifying the store buffer to release the target cache area once the second source operand has been written into the target cache line.
  7. The method according to any one of claims 1 to 6, further comprising: executing other reduction requests in parallel during the second step.
  8. A cache, comprising: a request receiving circuit configured to receive a target reduction request sent by a computing unit, wherein the target reduction request comprises address information of a first source operand and a second source operand; a cache control circuit configured to allocate a corresponding target cache line for the target reduction request when the first source operand misses the cache; a memory access circuit configured to perform a first step of reading from memory the first source operand matching the target reduction request according to the address information of the target reduction request; a request execution circuit configured to perform, while the memory access circuit performs the first step, a second step of reading the second source operand of the target reduction request from a target cache area corresponding to the request in a store buffer and writing the second source operand into the target cache line, and to perform, once the first source operand is received, the reduction operation matching the target reduction request on the two source operands to obtain a reduction result; and a result return circuit configured to return the reduction result to the computing unit.
  9. The cache of claim 8, wherein the cache is a last level cache.
  10. A computing device, comprising: a plurality of computing units; a cache according to claim 8 or 9, or a cache arranged to perform the method according to any one of claims 1 to 7; and an on-chip bus, wherein each computing unit is coupled to the on-chip bus and the on-chip bus is coupled to the cache.
  11. A computing system, comprising the computing device of claim 10 and a control device for controlling the computing device to perform a computing task.
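The flow recited in claim 1 can be illustrated with a small functional model. This is a minimal sketch, not the patented hardware: the class, the dictionary-based memory and store buffer, and the use of a thread pool to stand in for the parallel first and second steps are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# A few of the reduction operators named in the text.
REDUCE_OPS = {
    "sum": lambda a, b: a + b,
    "max": max,
    "min": min,
}

class CacheModel:
    """Toy functional model of the cache in claim 1 (all names assumed)."""

    def __init__(self, memory, store_buffer):
        self.memory = memory              # address -> value, simulated DRAM
        self.store_buffer = store_buffer  # request id -> second source operand
        self.lines = {}                   # address -> value, simulated cache lines

    def execute(self, req_id, addr, op):
        if addr in self.lines:
            # Hit: both operands are available locally (as in claim 4).
            a = self.lines[addr]
            b = self.store_buffer.pop(req_id)
        else:
            # Miss: allocate a line, then overlap reading the first operand
            # from memory (step 1) with reading the second operand from the
            # store buffer (step 2), as claim 1 describes.
            with ThreadPoolExecutor(max_workers=2) as pool:
                fut_a = pool.submit(self.memory.__getitem__, addr)
                fut_b = pool.submit(self.store_buffer.pop, req_id)
                a, b = fut_a.result(), fut_b.result()
        result = REDUCE_OPS[op](a, b)
        self.lines[addr] = result         # write the result back (claim 5)
        return result                     # return to the computing unit
```

On a second request to the same address the hit path is taken, so the previously written reduction result becomes the first operand of the next reduction.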

Description

Reduction request execution method, cache, computing device and computing system

Technical Field

The present disclosure relates to the technical field of caching and, more particularly, to a reduction request execution method, a cache, a computing device, and a computing system.

Background

Computing devices represented by graphics processing units (GPUs) are widely used in general-purpose computing and have become core hardware in application scenarios such as artificial intelligence (AI) and scientific computing. In such applications, the instructions and data executed by a program are typically stored in memory and are read and operated on by a computing unit (or computing core) within the computing device. Because the processing speed of the computing unit far exceeds the access speed of the memory, memory access has become a severe constraint on computing performance. To alleviate this speed gap, a computing device typically integrates at least two levels of cache for temporarily storing recently accessed instructions and/or data. Since the access latency of the cache is far lower than that of the memory, the computing unit preferentially fetches instructions and/or data from the cache, which improves the processing efficiency of the computing device. In a multi-level cache architecture, the last level cache (LLC), as the final level of the on-chip cache system, is the key hub connecting each core's private cache with off-chip high-bandwidth memory, and carries the core functions of multi-core data sharing, temporary buffering, and data interaction scheduling.
In the actual operation flow of an artificial intelligence chip, the reduction operation is a basic core operation that runs through the whole flow of model training and inference, and is common in AI operators such as feature aggregation, gradient accumulation, softmax normalization, and batch normalization. The core requirement of a reduction operation is to gather the source operands produced by multiple parallel computing cores into the last level cache to complete an aggregation; conventional reduction logic includes summation, averaging, taking the maximum, taking the minimum, and so on, finally outputting a single aggregated result for subsequent computing units to use. In the prior art, because of physical area constraints, implementations of last-level-cache reduction suffer from insufficient dedicated scratch resources and limited parallel processing capability. They cannot meet the multi-core, high-throughput, low-latency operating requirements of AI chips, which lowers the execution efficiency of reduction operators and triggers a chain reaction: the compute utilization of the whole chip drops, power consumption rises, and latency increases, making it difficult to satisfy the demanding performance and energy-efficiency requirements of current large-model, high-density AI computing scenarios.

Disclosure of Invention

An object of the embodiments of the present disclosure is to provide a new technical solution for performing reduction operations in a cache, so as to improve the execution efficiency of reduction operations.
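The conventional reduction logic named above (summation, averaging, maximum, minimum) amounts to aggregating per-core partial results into one value. A minimal sketch, with the function name and input format assumed for illustration:

```python
def reduce_partials(partials, op):
    """Aggregate per-core partial results with one of the common reductions.

    `partials` is a non-empty list of values, one per computing core;
    `op` selects the reduction logic named in the text.
    """
    if op == "sum":
        return sum(partials)
    if op == "avg":
        return sum(partials) / len(partials)
    if op == "max":
        return max(partials)
    if op == "min":
        return min(partials)
    raise ValueError(f"unsupported reduction: {op}")
```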
According to a first aspect of the present disclosure, there is provided a reduction request execution method applied to a cache, the method comprising: receiving a target reduction request sent by a target computing unit, wherein the target reduction request comprises address information of a first source operand and a second source operand; when the first source operand misses the cache, allocating a corresponding target cache line for the target reduction request, and executing in parallel a first step of reading the first source operand from memory according to the address information and a second step of reading the second source operand from a target cache area corresponding to the target reduction request in a store buffer and writing the second source operand into the target cache line; when the first source operand is received, performing the reduction operation matching the target reduction request on the first source operand and the second source operand to obtain a reduction result of the target reduction request; and returning the reduction result to the target computing unit. Optionally, the method further comprises: identifying whether the target cache line is in an idle state; when the target cache line is not in an idle state, identifying whether data in the target cache line is consistent with the corresponding data in memory, and when the data in the target cache line is consistent with the corre
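The optional cache-line handling in claims 2 and 3 is a writeback-or-clear decision: a non-idle line whose data matches memory can simply be emptied, while a line whose data differs must be written back first. A minimal sketch, with the class, field names, and dictionary-based memory all assumed rather than taken from the patent:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Line:
    """Minimal stand-in for a cache line (illustrative, not the patented design)."""
    addr: int
    data: Optional[int] = None

    @property
    def idle(self) -> bool:
        return self.data is None

    def clear(self) -> None:
        self.data = None

def free_target_line(line: Line, memory: dict) -> None:
    """Make the target cache line available before a new reduction request uses it."""
    if line.idle:
        return                              # already free (claim 2, first check)
    if line.data == memory.get(line.addr):  # consistent with memory: just empty it
        line.clear()
    else:                                   # inconsistent: write back, then free
        memory[line.addr] = line.data       # (claim 3)
        line.clear()
```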