CN-121979576-A - RISC-V tensor instruction set expansion method and device based on intelligent processor


Abstract

The invention provides a RISC-V tensor instruction set extension method based on an intelligent processor. The method comprises: introducing a dedicated tensor configuration register set to store tensor meta-information and operand information; encoding tensor extension instructions in a 64-bit fixed-length instruction encoding format; and providing uniformly addressed large-capacity on-chip tightly coupled storage in the processor core, which serves as the direct operation space of all tensor extension instructions. The invention also provides a RISC-V tensor instruction set extension device based on an intelligent processor, a storage medium, and an electronic device. The invention thereby realizes a RISC-V tensor instruction set extension that natively supports high-dimensional tensor operations, is computationally intensive, and is hardware-friendly.

Inventors

GUO QI
WEN YUANBO
WANG ZHE
XU GUANGLIN

Assignees

Institute of Computing Technology, Chinese Academy of Sciences (中国科学院计算技术研究所)

Dates

Publication Date: 2026-05-05
Application Date: 2026-02-28

Claims (10)

1. A RISC-V tensor instruction set extension method based on an intelligent processor, comprising: introducing a dedicated tensor configuration register set to store tensor meta-information and operand information; encoding tensor extension instructions in a 64-bit fixed-length instruction encoding format; and providing uniformly addressed large-capacity on-chip tightly coupled storage in a processor core, and using the large-capacity on-chip tightly coupled storage as the direct operation space of all tensor extension instructions.
2. The intelligent processor-based RISC-V tensor instruction set extension method of claim 1, wherein the tensor configuration register set is a dedicated state storage unit independent of the general-purpose registers and floating-point registers of the processor, and comprises a plurality of sets of tensor meta-information registers and a plurality of sets of operand information registers; each tensor meta-information register set is configured with a dimension size register and a dimension stride register, which respectively store the dimension sizes and/or dimension strides of a tensor; and each operand information register set is configured with a convolution padding register and a convolution stride register, which respectively store the convolution padding and/or convolution stride of an operation.
3. The intelligent processor-based RISC-V tensor instruction set extension method of claim 1, wherein the 64-bit fixed-length instruction encoding format comprises a low 32 bits and a high 32 bits, the low 32 bits being aligned and compatible with the standard RISC-V 32-bit instruction encoding specification, and the high 32 bits carrying the tensor operation extension encoding information.
4. The intelligent processor-based RISC-V tensor instruction set extension method of claim 3, wherein decoding of the 64-bit fixed-length instruction encoding format employs compatibility processing logic: the instruction fetch unit of the processor fetches instructions at 64-bit alignment; the decoder first checks the lowest two bits of the instruction; if a preset standard-instruction value is identified, the decoder decodes the instruction as a standard RISC-V 32-bit instruction and ignores the high 32 bits; and if a preset tensor-extension value is identified, the 64-bit instruction decoding process is started.
5. The intelligent processor-based RISC-V tensor instruction set extension method of claim 1, wherein the physical storage space of the uniformly addressed large-capacity on-chip tightly coupled storage is mapped to a predetermined region of the physical address space of the intelligent processor and supports read and write access by both standard RISC-V load/store instructions and the tensor extension instructions, and the uniformly addressed large-capacity on-chip tightly coupled storage is connected to the tensor address generation unit and the plurality of tensor calculation units of the intelligent processor through a high-speed crossbar network.
6. The method according to claim 5, wherein the tensor address generation unit is provided with a micro state machine that emulates a four-level nested loop; the tensor address generation unit receives the tensor extension instruction from the decode stage and the tensor descriptor read from the tensor configuration register set, iterates over each dimension according to the specified operation size, calculates an address offset in the innermost loop iteration from the current index of each dimension and the corresponding stride, adds the address offset to the base address to generate a linear address in the uniformly addressed large-capacity on-chip tightly coupled storage, and finally outputs the linear address sequence to the controller of the uniformly addressed large-capacity on-chip tightly coupled storage and to the tensor calculation unit, which perform the memory access and the calculation.
7. The method according to claim 6, wherein the tensor calculation unit is a dedicated parallel calculation array whose instruction interface is a tensor-level interface; when executing a tensor matrix multiply-add instruction, the tensor calculation unit, according to the tensor configuration register set descriptor associated with the instruction, continuously reads data blocks of the source operands from the uniformly addressed large-capacity on-chip tightly coupled storage through the tensor address generation unit, executes a tensor block multiply-accumulate operation C = A × B + C, and writes the operation result back in a streaming manner to the destination operand region of the uniformly addressed large-capacity on-chip tightly coupled storage, wherein A and B are source operands and C is the destination operand; the tensor matrix multiply-add operation is started by a single tensor extension instruction, and the subsequent data movement and iterative calculation are completed automatically by hardware until the calculation of the entire data block is finished.
8. An intelligent processor-based RISC-V tensor instruction set extension device constructed based on the method of any one of claims 1-7, the device comprising: a dedicated register module for introducing a dedicated tensor configuration register set to store tensor meta-information and operand information; an instruction encoding module for encoding tensor extension instructions in a 64-bit fixed-length instruction encoding format; and a unified addressing module for providing uniformly addressed large-capacity on-chip tightly coupled storage in the processor core and using the uniformly addressed large-capacity on-chip tightly coupled storage as the direct operation space of all tensor extension instructions.
9. A storage medium storing a computer program for performing the method of any one of claims 1 to 7.
10. An electronic device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, characterized in that the processor implements the method of any one of claims 1-7 when executing the computer program.
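
The decode check of claim 4, the four-level address generation of claim 6, and the block multiply-accumulate of claim 7 can be illustrated with a minimal software sketch. It is only a behavioral model under stated assumptions, not the claimed hardware. The claims say only that the tag values in the lowest two bits are "preset", so `STANDARD_TAG` and `TENSOR_TAG` below are placeholders. The function names, the use of a flat Python list as the tightly coupled storage, and the row-major matrix layout are likewise hypothetical.

```python
# Placeholder tag values: the claims do not state the actual encodings.
STANDARD_TAG = 0b11  # assumption: low bits marking a standard 32-bit instruction
TENSOR_TAG = 0b10    # assumption: purely illustrative tensor-extension marker


def classify(word64):
    """Split a 64-bit aligned fetch unit and pick a decode path (claim 4)."""
    low32 = word64 & 0xFFFFFFFF
    high32 = (word64 >> 32) & 0xFFFFFFFF
    tag = low32 & 0b11  # the decoder first checks the lowest two bits
    if tag == STANDARD_TAG:
        return ("standard", low32)  # high 32 bits are ignored
    if tag == TENSOR_TAG:
        return ("tensor", low32, high32)  # full 64-bit decode
    raise ValueError("tag not covered by this sketch")


def generate_addresses(base, sizes, strides):
    """Yield linear addresses from a four-level nested loop (claim 6)."""
    if len(sizes) != 4 or len(strides) != 4:
        raise ValueError("expected four dimension sizes and four strides")
    for i0 in range(sizes[0]):
        for i1 in range(sizes[1]):
            for i2 in range(sizes[2]):
                for i3 in range(sizes[3]):
                    # Offset = sum of (index * stride) over all dimensions.
                    offset = (i0 * strides[0] + i1 * strides[1]
                              + i2 * strides[2] + i3 * strides[3])
                    yield base + offset


def matmul_accumulate(mem, a_base, b_base, c_base, m, k, n):
    """Compute C = A x B + C in place on a flat memory list (claim 7).

    A is m x k, B is k x n, C is m x n; all are assumed row-major.
    """
    for i in range(m):
        for j in range(n):
            acc = mem[c_base + i * n + j]
            for p in range(k):
                acc += mem[a_base + i * k + p] * mem[b_base + p * n + j]
            mem[c_base + i * n + j] = acc
```

For example, with sizes `(1, 1, 2, 3)` and strides `(0, 0, 4, 1)`, `generate_addresses` walks two rows of three elements whose rows start four addresses apart. A hardware state machine would produce the same sequence one address per step rather than through software loops.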

Description

RISC-V tensor instruction set expansion method and device based on intelligent processor

Technical Field

The present invention relates to the field of computer technologies, and in particular to a RISC-V tensor instruction set extension method and device based on an intelligent processor, a storage medium, and an electronic device.

Background

With the rapid development of artificial intelligence technology, intelligent processors have become the core hardware for tensor-intensive computing tasks such as deep learning. The instruction sets of current intelligent processors fall mainly into two categories: closed-source dedicated instruction sets and the open-source RISC-V instruction set. Mainstream closed-source intelligent processors adopt private dedicated instruction sets, which can be tightly combined with customized hardware to achieve high performance. Their ecosystems are closed, however, so the software toolchain and operator library must be redeveloped for each processor generation, which brings high development costs and ecosystem fragmentation.

To build an open and unified ecosystem, the industry has explored intelligent processor extensions based on the open-source RISC-V instruction set, but the existing mainstream solutions have significant limitations. The RISC-V vector extension (RVV) is designed mainly to accelerate vector operations. It is based on vector registers of limited capacity, and when processing high-dimensional tensor operations such as matrix multiplication its instruction streams are complex, its computation density is low, and it natively lacks tensor processing capability. The RISC-V matrix extension (AME) targets matrix multiplication directly, but its operation size is restricted and its peak computing power is limited. Its instruction set mainly supports matrix multiplication and a small number of element-wise operations, so instruction streams become complex when processing more general and flexible tensor operations, and the comprehensive requirements of a high-performance intelligent processor cannot be met.

The prior art has the following main problems and disadvantages. First, the dedicated instruction sets of closed-source intelligent processors prevent their software ecosystems from being reused, causing repeated development and ecosystem barriers across the industry. Second, the existing open-source RISC-V instruction set extensions (such as RVV and AME) lack native abstraction of and support for high-dimensional tensors, forcing developers to emulate high-dimensional operations with low-dimensional instructions and making programming and compiler optimization more difficult. Third, the small register capacity of the RISC-V vector extension and the fixed operation size of the RISC-V matrix extension limit the data reuse rate, resulting in low computation density and restricting further increases in single-core peak computing power. Finally, describing complex tensor operations with existing instructions results in lengthy instruction streams, and the rich meta-information of high-dimensional tensors makes the standard 32-bit instruction encoding space severely inadequate.

In summary, the prior art has inconveniences and defects in practical use, and improvement is needed.
Disclosure of Invention

In view of the above drawbacks, an object of the present invention is to provide a RISC-V tensor instruction set extension method and device based on an intelligent processor, a storage medium, and an electronic device, which realize a RISC-V tensor instruction set extension that natively supports high-dimensional tensor operations, is computationally intensive, and is hardware-friendly.

To solve the above technical problems, the invention is realized as follows. In a first aspect, an embodiment of the present invention provides a RISC-V tensor instruction set extension method based on an intelligent processor, including: introducing a dedicated tensor configuration register set to store tensor meta-information and operand information; encoding tensor extension instructions in a 64-bit fixed-length instruction encoding format; and providing uniformly addressed large-capacity on-chip tightly coupled storage in a processor core, and using the large-capacity on-chip tightly coupled storage as the direct operation space of all tensor extension instructions.

According to the RISC-V tensor instruction set extension method based on the intelligent processor, the tensor configuration register set is a dedicated state storage unit independent of the general-purpose registers and floating-point registers of the processor, and comprises a plurality of sets of tensor meta-information registers and a plurality of sets of operation meta-information registers; The tensor meta information register is configured with a dimension size regi