CN-116400926-B - Scalar engine processing method and device oriented to artificial intelligence chip
Abstract
The application relates to a scalar engine processing method and device for an artificial intelligence chip. In the method, an upper module in the chip obtains an artificial neural network model to be deployed in the chip; the upper module converts the model based on an instruction set built into a scalar engine in the chip, obtaining a plurality of target instructions corresponding to the model, and sends the target instructions to the scalar engine; and the scalar engine executes the target instructions to carry out the compilation processing corresponding to the model within the chip. The method improves the flexibility with which an artificial intelligence chip compiles an artificial neural network model on chip.
Inventors
- WANG ZHOU
- YIN SHOUYI
- WEI JINGCHUAN
- HU YANG
- HAN HUIMING
- WEI SHAOJUN
Assignees
- Tsinghua University (清华大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20230329
Claims (10)
- 1. An intra-chip compilation method applied to a chip, the method comprising: an upper module in the chip acquiring an artificial neural network model to be deployed in the chip; the upper module performing conversion processing on the artificial neural network model based on an instruction set built into a scalar engine in the chip to obtain a plurality of target instructions corresponding to the artificial neural network model, and sending the target instructions to the scalar engine, wherein the instruction set comprises a vector calculation instruction set for implementing vector calculations, an artificial neural network calculation instruction set for implementing operations corresponding to a plurality of artificial neural networks, and a cross-module scheduling instruction set dedicated to the scalar engine scheduling other modules in the chip to perform target operation processing; the scalar engine executing the target instructions to implement compilation processing corresponding to the artificial neural network model within the chip; determining a target type of operation processing corresponding to the target instruction currently executed by the scalar engine; if the target type differs from the type of operation processing currently corresponding to a reconfigurable array in the scalar engine, performing reconfiguration processing on the reconfigurable array; and performing operation processing on the currently executed target instruction using the reconfigurable array after the reconfiguration processing, wherein the types of operation processing corresponding to the reconfigurable array comprise at least an addition operation and a multiplication operation.
- 2. The method of claim 1, wherein the vector calculation instruction set includes a plurality of functional instructions and a plurality of arithmetic instructions.
- 3. The method according to claim 1 or 2, wherein the instruction sets built into the scalar engine are developed based on the RISC-V architecture, and each of the instruction sets is identified using a different field.
- 4. The method of claim 3, wherein a first bit interval of the target instruction is used to indicate the instruction set to which the target instruction belongs; a second bit interval is used to indicate a target register address; a third bit interval is used to indicate whether the target register needs to be written and whether a source register needs to be read; a fourth bit interval is used to indicate a first source register address; a fifth bit interval is used to indicate a second source register address; and a sixth bit interval is used to indicate the operation code of the target instruction.
- 5. The method according to claim 1, further comprising: if the operation processing corresponding to the target instruction currently executed by the scalar engine satisfies a preset adjustment condition, changing the number of pipeline stages with which the scalar engine executes the target instruction from a first stage count to a second stage count, wherein the second stage count is smaller than the first stage count.
- 6. The method of claim 1, wherein the reconfigurable array is reconfigured for the addition operation when the scalar engine processes an addition operation, so as to perform the computation of the addition operation; and the reconfigurable array is reconfigured for the multiplication operation when the scalar engine processes a multiplication operation, so as to perform the computation of the multiplication operation.
- 7. The method of claim 1, wherein the instructions in each instruction set use a 32-bit encoding format.
- 8. An intra-chip compilation apparatus for use in a chip, the apparatus comprising: an acquisition module, used by an upper module in the chip to acquire an artificial neural network model to be deployed in the chip; a conversion module, used by the upper module to convert the artificial neural network model based on an instruction set built into a scalar engine in the chip, obtain a plurality of target instructions corresponding to the artificial neural network model, and send the target instructions to the scalar engine, wherein the instruction set comprises a vector calculation instruction set for implementing vector calculations, an artificial neural network calculation instruction set for implementing operations corresponding to various artificial neural networks, and a cross-module scheduling instruction set dedicated to the scalar engine scheduling other modules in the chip to perform target operation processing; and an execution module, used by the scalar engine to execute the target instructions so as to implement compilation processing corresponding to the artificial neural network model within the chip, including: determining a target type of operation processing corresponding to the target instruction currently executed by the scalar engine; if the target type differs from the type of operation processing currently corresponding to a reconfigurable array in the scalar engine, performing reconfiguration processing on the reconfigurable array; and performing operation processing on the currently executed target instruction using the reconfigurable array after the reconfiguration processing, wherein the types of operation processing corresponding to the reconfigurable array comprise at least an addition operation and a multiplication operation.
- 9. A chip comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
- 10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
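Claims 4 and 7 describe a 32-bit target instruction divided into six bit intervals. One natural concretization, sketched below, maps the intervals onto the standard RISC-V R-type field layout; the exact bit positions are our assumption for illustration, since the claims do not fix them.

```python
def decode(word: int) -> dict:
    """Split a 32-bit target instruction into the six bit intervals of
    claim 4. The positions follow the RISC-V R-type layout as an assumed
    concretization; the patent does not pin down exact bits."""
    return {
        "instruction_set": word & 0x7F,         # 1st interval: which built-in set (opcode[6:0])
        "rd":             (word >> 7)  & 0x1F,  # 2nd: target register address
        "rw_flags":       (word >> 12) & 0x7,   # 3rd: write-dest / read-src flags
        "rs1":            (word >> 15) & 0x1F,  # 4th: first source register address
        "rs2":            (word >> 20) & 0x1F,  # 5th: second source register address
        "opcode":         (word >> 25) & 0x7F,  # 6th: operation code of the instruction
    }

# The standard RISC-V instruction `add x3, x1, x2` (0x002081B3),
# decoded with the same masks:
fields = decode(0x002081B3)
```

Under this layout an existing RISC-V decoder front end can be reused, with only the field interpretation changing per instruction set.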
Description
Scalar engine processing method and device oriented to artificial intelligence chip

Technical Field

The application relates to the technical field of chips, and in particular to a scalar engine processing method and device for an artificial intelligence chip.

Background

Currently, artificial intelligence chips are applied to computational processing in the field of artificial neural networks, and as the field continues to develop, the types of artificial neural networks keep increasing. A traditional artificial intelligence chip often supports computation acceleration for only a single neural network model; if each artificial neural network model is to be compiled and computed within the chip, the data flow, circuit arrangement, and so on of the chip must be designed accordingly. Current artificial intelligence chips therefore suffer from low flexibility in on-chip compilation and computation of artificial neural networks.

Disclosure of Invention

Based on this, it is necessary to provide a scalar engine processing method and device for an artificial intelligence chip that can improve the flexibility with which the chip compiles and processes an artificial neural network model on chip. In a first aspect, the present application provides an intra-chip compilation method for a model.
Applied to a chip, the method comprises the following steps: an upper module in the chip acquires an artificial neural network model to be deployed in the chip; the upper module performs conversion processing on the artificial neural network model based on an instruction set built into a scalar engine in the chip to obtain a plurality of target instructions corresponding to the artificial neural network model, and sends the target instructions to the scalar engine; and the scalar engine executes the target instructions to implement a compilation process corresponding to the artificial neural network model within the chip.

In one embodiment, the instruction set includes a vector calculation instruction set for implementing vector calculations, an artificial neural network calculation instruction set for implementing operations corresponding to a plurality of artificial neural networks, and a cross-module scheduling instruction set dedicated to the scalar engine scheduling other modules within the chip to perform target operation processing.

In one embodiment, the vector calculation instruction set includes a plurality of functional instructions and a plurality of arithmetic instructions.

In one embodiment, the instruction sets built into the scalar engine are developed based on the RISC-V architecture, and each of the instruction sets is identified using a different field.
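The last embodiment above says each built-in instruction set is identified by a different field value. One plausible reading, sketched below, reuses the reserved RISC-V "custom" opcode points for the three sets; the specific opcode assignments and set names are assumptions for illustration, not taken from the patent.

```python
# Hypothetical mapping of the three built-in instruction sets onto the
# RISC-V custom-0/1/2 opcode points. Which set gets which opcode is our
# assumption; the patent only says each set uses a different field value.
CUSTOM_0 = 0b0001011  # assumed: vector calculation set
CUSTOM_1 = 0b0101011  # assumed: neural-network calculation set
CUSTOM_2 = 0b1011011  # assumed: cross-module scheduling set

SETS = {
    CUSTOM_0: "vector",
    CUSTOM_1: "neural_network",
    CUSTOM_2: "cross_module_scheduling",
}

def classify(word: int) -> str:
    """Classify a 32-bit instruction word by its low-order opcode field."""
    opcode = word & 0x7F  # bits [6:0] in the base RISC-V encoding
    return SETS.get(opcode, "standard_riscv")
```

Using the custom opcode space has the practical benefit that the new instruction sets cannot collide with standard RISC-V extensions.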
In one embodiment, a first bit interval of the target instruction is used to indicate the instruction set to which the target instruction belongs, a second bit interval is used to indicate a target register address, a third bit interval is used to indicate whether the target register needs to be written and whether a source register needs to be read, a fourth bit interval is used to indicate a first source register address, a fifth bit interval is used to indicate a second source register address, and a sixth bit interval is used to indicate the operation code of the target instruction.

In one embodiment, the method further comprises: determining a target type of operation processing corresponding to the target instruction currently executed by the scalar engine; if the target type differs from the type of operation processing currently corresponding to a reconfigurable array in the scalar engine, performing reconfiguration processing on the reconfigurable array; and performing operation processing on the currently executed target instruction using the reconfigurable array after the reconfiguration processing, wherein the types of operation processing corresponding to the reconfigurable array comprise at least an addition operation and a multiplication operation.

In one embodiment, the method further includes changing the number of pipeline stages with which the scalar engine executes the target instruction from a first stage count to a second stage count if the operation processing corresponding to the currently executed target instruction satisfies a preset adjustment condition, wherein the second stage count is smaller than the first stage count.

In a second aspect, the application further provides an intra-chip compilation device for a model.
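The reconfiguration embodiment above amounts to a small dispatch loop: reconfigure the array only when the current instruction's operation type differs from the array's current configuration, then execute. The sketch below illustrates that control flow; all names and the two-operation repertoire are illustrative, since the patent does not give an API.

```python
# Minimal sketch of the reconfigurable-array dispatch described above.
# Reconfiguration is modeled as swapping the active operation; a real
# array would rewire processing elements, which costs time, hence the
# check that skips reconfiguration when the type already matches.
from dataclasses import dataclass

OPS = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}

@dataclass
class ReconfigurableArray:
    current_op: str = "add"  # type the array is currently configured for
    reconfig_count: int = 0  # how many reconfigurations were paid for

    def execute(self, op: str, a: int, b: int) -> int:
        if op != self.current_op:  # target type differs from array's type
            self.current_op = op   # reconfigure for add or mul
            self.reconfig_count += 1
        return OPS[self.current_op](a, b)

array = ReconfigurableArray()
r1 = array.execute("add", 2, 3)  # already configured for add
r2 = array.execute("mul", 2, 3)  # triggers one reconfiguration
r3 = array.execute("mul", 4, 5)  # still mul, no reconfiguration
```

Batching instructions of the same operation type, as in the last two calls, keeps the reconfiguration count low.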
Applied to a chip, the device comprises: the acquisition module is used for acquiring an artificial neural network model to be deployed in the chip by an up