CN-121704906-B - Artificial intelligence chip, method for executing host issuing request, computing device, medium and program product

CN 121704906 B

Abstract

The present invention relates to an artificial intelligence chip, a method for executing a request issued by a host, a computing device, a computer-readable storage medium, and a computer program product. The artificial intelligence chip comprises a computing core, a target storage unit, and a multi-stage bypass unit. The computing core is configured to execute requests issued by a host; the target storage unit is configured to store update data associated with a request; and the multi-stage bypass unit is disposed in an on-chip area adjacent to the computing core and is configured to back up the update data associated with a previous request into a bypass unit of a predetermined stage of the multi-stage bypass unit and, in response to determining that the current request to be executed has a data dependency on the previous request, to acquire the backed-up update data of that previous request from the bypass unit of the corresponding stage for executing the current request. Even when the current request has a data dependency on a previous request, the request-processing pipeline can work without interruption, which markedly improves the processing efficiency of the chip.
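As a rough illustration only (not part of the patent), the mechanism summarized above can be sketched in Python. All class and method names here are invented for the sketch, and the write latency and stage count are assumed values:

```python
# Hypothetical sketch of the multi-stage bypass described in the abstract.
# Names (TargetStore, MultiStageBypass) are illustrative, not from the patent.

class TargetStore:
    """Models the target storage unit (e.g., SRAM) whose writes take effect late."""
    def __init__(self, write_latency=2):
        self.data = {}
        self.write_latency = write_latency  # cycles until a write becomes visible
        self.pending = []                   # [cycles_left, addr, value]

    def issue_write(self, addr, value):
        self.pending.append([self.write_latency, addr, value])

    def tick(self):
        # Advance one clock cycle; commit writes whose latency has elapsed.
        for entry in self.pending:
            entry[0] -= 1
        for _, addr, value in (e for e in self.pending if e[0] <= 0):
            self.data[addr] = value
        self.pending = [e for e in self.pending if e[0] > 0]

    def read(self, addr):
        return self.data.get(addr)


class MultiStageBypass:
    """Backs up update data on-chip so a dependent request need not stall."""
    def __init__(self, stages):
        self.stages = [None] * stages  # each slot: (addr, value) or None

    def backup(self, addr, value):
        # Newest backup enters the first stage; older backups shift one stage down.
        self.stages = [(addr, value)] + self.stages[:-1]

    def lookup(self, addr):
        # Compare the request address against each stage's tag, newest first.
        for entry in self.stages:
            if entry is not None and entry[0] == addr:
                return entry[1]
        return None
```

In this toy model, a request that writes address `a` registers the value in both the store (delayed) and the bypass (immediate), so a dependent request in the next cycle reads the value from the bypass instead of stalling.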

Inventors

  • Request for anonymity
  • Request for anonymity
  • Request for anonymity

Assignees

  • 上海壁仞科技股份有限公司 (Shanghai Biren Technology Co., Ltd.)

Dates

Publication Date
2026-05-05
Application Date
2026-02-14

Claims (17)

  1. An artificial intelligence chip, comprising: a computing core configured to execute a request issued by a host; a target storage unit configured to store update data associated with the request; and a multi-stage bypass unit, disposed in an on-chip area adjacent to the computing core, configured to back up update data associated with a previous request into a bypass unit of a predetermined stage of the multi-stage bypass unit and, in response to determining that a current request to be executed has a data dependency on the previous request, to acquire the backed-up update data of the previous request from the bypass unit of the corresponding stage for executing the current request.
  2. The artificial intelligence chip of claim 1, wherein the number of stages of the multi-stage bypass unit is equal to the difference between the number of clock cycles the computing core takes to read data from the target storage unit and the number of clock cycles it takes to update data in the target storage unit.
  3. The artificial intelligence chip of claim 1, wherein each stage of the multi-stage bypass unit comprises: a bypass flag module configured to store flag information of the bypass unit of the present stage, the flag information indicating address information associated with the update data stored by the bypass unit of the present stage and validity state information indicating a valid state of the update data stored by the bypass data storage module; a bypass data storage module configured to store the update data of the bypass unit of the present stage; and a comparator configured to compare the address information of the current request with the address information in the bypass flag module.
  4. The artificial intelligence chip of claim 1, wherein the multi-stage bypass unit is further configured to, while backing up the update data associated with the previous request into the bypass unit of the predetermined stage, back up the update data already held in the bypass unit of a corresponding stage into the bypass unit of the stage next to that corresponding stage.
  5. The artificial intelligence chip of claim 1, wherein the multi-stage bypass unit is further configured to, in response to determining that the current request to be executed has no data dependency on a previous request, obtain the update data of the current request from the target storage unit for executing the current request.
  6. A method for executing a host-issued request, comprising: while storing update data associated with a previous request issued by a host to a target storage unit, backing up the update data associated with the previous request into a bypass unit of a predetermined stage of a multi-stage bypass unit, wherein the multi-stage bypass unit is disposed in an on-chip area adjacent to a computing core; determining whether there is a data dependency between a current request to be executed and the previous request; and, in response to determining that the current request to be executed has a data dependency on the previous request, obtaining the backed-up update data of the previous request from the bypass unit of the corresponding stage for executing the current request.
  7. The method of claim 6, wherein the bypass unit of each stage of the multi-stage bypass unit stores at least flag information and update data, the flag information including validity state information indicating a valid state of the update data stored in the bypass unit of the present stage and address information regarding the update data.
  8. The method of claim 6, wherein the bypass unit of the predetermined stage is the bypass unit of the first stage of the multi-stage bypass unit.
  9. The method of claim 6, further comprising: while backing up the update data associated with the previous request into the bypass unit of the predetermined stage, backing up the update data already held in the bypass unit of a corresponding stage of the multi-stage bypass unit into the bypass unit of the stage next to that corresponding stage.
  10. The method of claim 6, wherein the multi-stage bypass unit being disposed in an on-chip area adjacent to the computing core comprises: configuring the multi-stage bypass unit in an on-chip area adjacent to the computing core based on the difference between the number of clock cycles the computing core takes to read data from the target storage unit and the number of clock cycles it takes to update data in the target storage unit, the number of stages of the multi-stage bypass unit being equal to the difference.
  11. The method of claim 6, wherein determining whether a data dependency exists between the current request and the previous request comprises: determining, via the comparator included in the bypass unit of the corresponding stage, whether the address information of the current request is the same as the address information stored by the bypass flag module in the bypass unit of the stage corresponding to the previous request.
  12. The method of claim 6, wherein obtaining, in response to determining that the current request to be executed has a data dependency on the previous request, the backed-up update data of the previous request from the bypass unit of the corresponding stage comprises: in response to determining that the current request has a data dependency on the previous request, obtaining the backed-up update data of the previous request from the bypass unit of the first stage.
  13. The method of claim 6, wherein obtaining, in response to determining that the current request to be executed has a data dependency on a previous request, the backed-up update data of that previous request from the bypass unit of the corresponding stage comprises: in response to determining that the current request has no data dependency on the previous request, determining whether the current request has a data dependency on the penultimate request, the penultimate request being the request immediately preceding the previous request; and, in response to determining that the current request has a data dependency on the penultimate request, obtaining the backed-up update data of the penultimate request from the bypass unit of the second stage.
  14. The method of claim 6, further comprising: in response to determining that the current request to be executed has no data dependency on the previous request, obtaining the update data of the current request from the target storage unit for executing the current request.
  15. A computing device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 6-14.
  16. A computer-readable storage medium having stored thereon a computer program which, when executed by a machine, performs the method of any one of claims 6-14.
  17. A computer program product comprising a computer program which, when executed by a machine, performs the method of any one of claims 6-14.
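Claims 3 and 11 describe each bypass stage as a flag module (a validity bit plus address information) paired with a comparator against the current request's address. A minimal sketch of one such stage, with all field names assumed rather than taken from the patent:

```python
# Illustrative model of a single bypass stage per claims 3 and 11.
# Field names (valid, addr, data) are assumptions for this sketch.
from dataclasses import dataclass

@dataclass
class BypassStage:
    valid: bool = False   # validity state information (bypass flag module)
    addr: int = 0         # address information (bypass flag module)
    data: int = 0         # update data (bypass data storage module)

    def matches(self, request_addr: int) -> bool:
        # Comparator: a hit requires a valid entry AND an address match.
        return self.valid and self.addr == request_addr

stage = BypassStage(valid=True, addr=0x40, data=99)
print(stage.matches(0x40))  # dependency detected: forward stage.data
print(stage.matches(0x44))  # no dependency: read the target storage unit
```

A multi-stage unit would simply hold one such stage per cycle of read/write latency difference (claims 2 and 10) and check them in order, first stage before second (claims 12 and 13).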

Description

Artificial intelligence chip, method for executing host issuing request, computing device, medium and program product

Technical Field

Embodiments of the present invention relate generally to the field of artificial intelligence and, more particularly, to an artificial intelligence chip, a method for executing a request issued by a host, a computing device, a computer-readable storage medium, and a computer program product.

Background

The efficiency with which an artificial intelligence chip (for example, without limitation, a general-purpose graphics processor) processes requests determines chip performance. Artificial intelligence chips typically employ a pipelined architecture; ensuring that a processing unit can process requests issued by a host (e.g., a CPU) without interruption is one of the core means of improving chip performance. It should be appreciated that conventional artificial intelligence chips include multi-level cache systems. For example, the cache system within a computing core (core) typically stores data in static random access memory (SRAM). Taking the level-one cache (L1 cache) as an example, data in the computing core is mainly stored in the L1 cache; the computing core constantly reads data from the SRAM and writes new data back to the SRAM several clock cycles later. In the conventional method for executing a host-issued request, when a data dependency (hazard) exists between requests in adjacent pipeline slots, a subsequent request cannot obtain the latest data of a preceding request; the pipeline must be interrupted, and the subsequent request is processed only after the preceding request completes.
For example, suppose requests 0 and 1 have a data dependency: both access the data at address a. If request 1 uses the data at address a in the static random access memory (SRAM) during clock cycle N (N is a natural number), but request 0 updates the data at address a in the SRAM only during clock cycle N+2, then request 1 cannot obtain the latest data at address a before clock cycle N+2. Request 1 must wait for request 0's write to the SRAM to complete before it can continue, which interrupts the pipeline and makes uninterrupted operation difficult. The conventional method therefore greatly degrades the processing efficiency of the chip when data dependencies exist between requests. In summary, conventional artificial intelligence chips and methods for executing host-issued requests have the defect that, when data dependencies exist between requests, the request-processing pipeline can hardly work without interruption, which in turn degrades the processing efficiency of the chip.

Disclosure of Invention

The present invention provides an artificial intelligence chip, a method for executing a request issued by a host, a computing device, a computer-readable storage medium, and a computer program product, which enable the request-processing pipeline to work without interruption even when a data dependency exists between a current request and a previous request, markedly improving the processing efficiency of the chip. According to a first aspect of the present invention, an artificial intelligence chip is provided.
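The N / N+2 example above can be made concrete with a toy cycle count. The function and numbers below are illustrative assumptions, not from the patent; they model a chain of back-to-back requests, each dependent on the previous one:

```python
# Toy cycle accounting for the stall described in the text (assumed numbers).

def cycles_to_finish(n_requests: int, write_latency: int, use_bypass: bool) -> int:
    """Cycles to retire n back-to-back requests that each depend on the last.

    Without bypassing, every read must wait write_latency cycles for the
    previous write to land in SRAM; with bypassing, the update is forwarded
    from the on-chip bypass unit and each request advances in one cycle.
    """
    cycle = 0
    for _ in range(n_requests):
        cycle += write_latency if not use_bypass else 1
    return cycle

print(cycles_to_finish(4, write_latency=3, use_bypass=False))  # stalled pipeline
print(cycles_to_finish(4, write_latency=3, use_bypass=True))   # forwarded pipeline
```

Under these assumed numbers, the bypassed pipeline retires the same dependent chain in a third of the cycles, which is the efficiency gain the patent attributes to the multi-stage bypass unit.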
The artificial intelligence chip comprises a computing core, a target storage unit, and a multi-stage bypass unit. The computing core is configured to execute requests issued by a host; the target storage unit is configured to store update data associated with the requests; and the multi-stage bypass unit is disposed in an on-chip area adjacent to the computing core and is configured to back up the update data associated with a previous request into a bypass unit of a predetermined stage of the multi-stage bypass unit and, in response to determining that the current request to be executed has a data dependency on the previous request, to acquire the backed-up update data of the previous request from the bypass unit of the corresponding stage for executing the current request. According to a second aspect of the present invention, there is also provided a method for executing a request issued by a host, the method comprising: while storing update data associated with a previous request issued by the host to a target storage unit, backing up the update data associated with the previous request into a bypass unit of a predetermined stage of a multi-stage bypass unit, the multi-stage bypass unit being arranged in an on-chip area adjacent to a computing core; determining whether there is a data dependency between a current request to be executed