
CN-122019948-A - Data processing system and chip


Abstract

The application provides a data processing system and a chip, relating to the technical field of data processing. The system comprises a main control unit, an instruction processing unit, and a data processing unit. The main control unit is configured to issue a data processing instruction to the instruction processing unit. The instruction processing unit is configured to parse the data processing instruction to determine the operator type it indicates and, when the operator type is a non-matrix-multiply-add operation type, to convert the data processing instruction into data processing microinstructions of the matrix multiply-add operation dimension according to the mapping rule corresponding to the operator type and write them into a microinstruction queue. The data processing unit is configured to execute data processing operations of the matrix multiply-add operation dimension according to the data processing microinstructions in the microinstruction queue. This achieves unified acceleration of various operators while effectively reducing chip hardware complexity and improving resource utilization and overall computing efficiency.

Inventors

  • LI JUNWEN
  • CUI YUNFEI
  • MA RUPING
  • OUYANG PENG

Assignees

  • Beijing Tsingmicro Intelligent Technology Co., Ltd. (北京清微智能科技股份有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-04-16

Claims (10)

  1. A data processing system, comprising a main control unit, an instruction processing unit, and a data processing unit; wherein the main control unit is configured to issue a data processing instruction to the instruction processing unit; the instruction processing unit is configured to parse the data processing instruction to determine an operator type indicated by the data processing instruction and, when the operator type is a non-matrix-multiply-add operation type, to convert the data processing instruction into data processing microinstructions of the matrix multiply-add operation dimension according to a mapping rule corresponding to the operator type and write the data processing microinstructions into a microinstruction queue; and the data processing unit is configured to execute data processing operations of the matrix multiply-add operation dimension according to the data processing microinstructions in the microinstruction queue.
  2. The system of claim 1, wherein converting the data processing instruction into data processing microinstructions of the matrix multiply-add operation dimension according to the mapping rule corresponding to the operator type and writing the data processing microinstructions into the microinstruction queue comprises: mapping data processing parameters in the data processing instruction to matrix dimension parameters of the matrix multiply-add operation according to the mapping rule corresponding to the operator type; performing block partitioning of the matrix multiply-add operation according to the matrix dimension parameters and a preset matrix calculation scale to determine block dimension information; and generating the data processing microinstructions according to the block dimension information and the queue state of the microinstruction queue, and writing the data processing microinstructions into the microinstruction queue.
  3. The system of claim 2, wherein mapping the data processing parameters in the data processing instruction to matrix dimension parameters of the matrix multiply-add operation according to the mapping rule corresponding to the operator type comprises: when the operator type is ordinary convolution, mapping the spatial size of the output feature map indicated in the data processing instruction to the number of rows of the left matrix, mapping the product of the number of input channels indicated in the data processing instruction and the convolution kernel size to the number of columns of the left matrix and the number of rows of the right matrix, and mapping the number of output channels indicated in the data processing instruction to the number of columns of the right matrix.
  4. The system of claim 2, wherein mapping the data processing parameters in the data processing instruction to matrix dimension parameters of the matrix multiply-add operation according to the mapping rule corresponding to the operator type comprises: when the operator type is transposed convolution, mapping the spatial size of the output feature map indicated in the data processing instruction to the number of rows of the left matrix, mapping the product of the number of output channels indicated in the data processing instruction and the convolution kernel size to the number of columns of the left matrix and the number of rows of the right matrix, and mapping the number of input channels indicated in the data processing instruction to the number of columns of the right matrix.
  5. The system of claim 2, wherein mapping the data processing parameters in the data processing instruction to matrix dimension parameters of the matrix multiply-add operation according to the mapping rule corresponding to the operator type comprises: when the operator type is convolution kernel gradient calculation, mapping the product of the convolution kernel size indicated in the data processing instruction and the number of input channels to the number of rows of the left matrix, mapping the product of the batch number indicated in the data processing instruction and the spatial size of the error map to the number of columns of the left matrix and the number of rows of the right matrix, and mapping the number of output channels indicated in the data processing instruction to the number of columns of the right matrix.
  6. The system of claim 2, wherein mapping the data processing parameters in the data processing instruction to matrix dimension parameters of the matrix multiply-add operation according to the mapping rule corresponding to the operator type comprises: when the operator type is left-matrix transposition, mapping the original number of rows of the left matrix indicated in the data processing instruction to the number of columns of the left matrix in the matrix multiply-add operation, and mapping the original number of columns of the left matrix indicated in the data processing instruction to the number of rows of the left matrix in the matrix multiply-add operation; and when the operator type is right-matrix transposition, mapping the original number of rows of the right matrix indicated in the data processing instruction to the number of columns of the right matrix in the matrix multiply-add operation, and mapping the original number of columns of the right matrix indicated in the data processing instruction to the number of rows of the right matrix in the matrix multiply-add operation.
  7. The system of claim 2, wherein the data processing microinstructions comprise load microinstructions, load flags, compute microinstructions, and store microinstructions, and the microinstruction queue comprises a load microinstruction queue, a load flag queue, a compute microinstruction queue, and a store microinstruction queue; and wherein generating the data processing microinstructions according to the block dimension information and the queue state of the microinstruction queue and writing the data processing microinstructions into the microinstruction queue comprises: generating a load microinstruction and a load flag according to the block dimension information and the queue state of the load microinstruction queue, and writing them into the load microinstruction queue and the load flag queue respectively; generating a compute microinstruction according to the block dimension information and the queue state of the compute microinstruction queue, and writing it into the compute microinstruction queue; and generating a store microinstruction according to the block dimension information and the queue state of the store microinstruction queue, and writing it into the store microinstruction queue.
  8. The system of claim 7, wherein the data processing unit comprises an on-chip storage unit, a load control unit, a local data cache unit, a matrix operation unit, and a store control unit; the load control unit is configured to read a load microinstruction from the load microinstruction queue and send the data to be processed, loaded from the on-chip storage unit according to the load microinstruction, to the local data cache unit; the local data cache unit is configured to read a load flag from the load flag queue, write the received data to be processed into a local cache according to the load flag, read a compute microinstruction from the compute microinstruction queue, and send the matrix operation data read from the local cache according to the compute microinstruction to the matrix operation unit; the matrix operation unit is configured to perform a matrix multiply-add operation on the received matrix operation data to obtain a calculation result, read a store microinstruction from the store microinstruction queue, and send the calculation result to the store control unit according to the store microinstruction; and the store control unit is configured to write the calculation result into the on-chip storage unit.
  9. The system of claim 8, wherein the data processing unit further comprises a data post-processing unit connected between the matrix operation unit and the store control unit and configured to perform a data processing operation on the calculation result of the matrix operation unit and send the resulting post-processed data to the store control unit.
  10. A chip comprising the data processing system according to any one of claims 1 to 9.
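
For illustration only, the minimal Python sketch below mimics the decoupled microinstruction queues of claims 7 and 8: one queue per microinstruction kind, a producer that emits one load/flag/compute/store set per block, and a consumer that drains each queue in its own stage. The block identifiers, the dictionary "memories", and the scalar stand-in for the matrix multiply-add are hypothetical simplifications, not the patent's hardware interfaces.

```python
from collections import deque

# One queue per microinstruction kind, mirroring claim 7.
load_q, load_flag_q, compute_q, store_q = deque(), deque(), deque(), deque()

def emit_uops_for_block(block_id):
    """Instruction processing unit: one load/flag/compute/store set per block."""
    load_q.append(block_id)        # fetch this block's operands from on-chip storage
    load_flag_q.append(block_id)   # tells the local cache where to place the data
    compute_q.append(block_id)     # one tile-sized matrix multiply-add
    store_q.append(block_id)       # write the tile's result back

def run_data_processing_unit(on_chip, local_cache):
    """Each stage drains its own queue, so in hardware the load control unit,
    matrix operation unit, and store control unit can run concurrently."""
    while load_q:                            # load control unit
        blk = load_q.popleft()
        data = on_chip[blk]
        flag = load_flag_q.popleft()         # local data cache unit placement
        local_cache[flag] = data
    while compute_q:                         # matrix operation unit
        blk = compute_q.popleft()
        a, b, c = local_cache[blk]
        local_cache[blk] = a * b + c         # scalar stand-in for the multiply-add
    while store_q:                           # store control unit
        blk = store_q.popleft()
        on_chip[blk] = local_cache[blk]

# Example: two blocks, each holding scalar (a, b, c) operands.
on_chip = {0: (2, 3, 1), 1: (4, 5, 2)}
for blk in list(on_chip):
    emit_uops_for_block(blk)
run_data_processing_unit(on_chip, {})
print(on_chip)  # {0: 7, 1: 22}
```

Keeping separate queues lets the producer run ahead of the compute and store stages, which is presumably how the claimed pipeline overlaps data movement with computation.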

Description

Data processing system and chip

Technical Field

The present application relates to the field of data processing technologies, and in particular to a data processing system and a chip.

Background

An artificial intelligence (AI) chip is core hardware dedicated to accelerating AI model computation. AI models such as deep neural networks involve massive matrix and tensor operations during training and inference, placing extremely high demands on the parallel computing capacity of the chip. Currently, AI chips typically employ dedicated hardware units to handle different types of operators separately. Specifically, for operator types such as convolution and fully connected layers, corresponding dedicated computing units (such as a convolution engine, a matrix multiply-add unit, and the like) are separately integrated in the chip. At run time, the system dispatches each computation task to the corresponding hardware unit according to its operator type. However, this heterogeneous design means that multiple computing units must be integrated on one chip, which significantly increases chip area; when a specific model runs, many hardware units sit idle, so resource utilization is low; and switching between operators introduces extra pipeline overhead, further limiting overall computing efficiency.

Disclosure of Invention

The application provides a data processing system and a chip to solve the problems of high hardware complexity, low resource utilization, and limited computing efficiency in the related art. In the data processing system of the application, the instruction processing unit uniformly converts the data processing instructions of different operators into microinstructions of the matrix multiply-add operation dimension according to mapping rules corresponding to the operator types, and the data processing unit performs matrix operations in parallel based on the microinstruction queue. This achieves unified acceleration of various operators while effectively reducing chip hardware complexity and improving resource utilization and overall computing efficiency.

An embodiment of a first aspect of the present application provides a data processing system including a main control unit, an instruction processing unit, and a data processing unit. The main control unit is configured to issue a data processing instruction to the instruction processing unit. The instruction processing unit is configured to parse the data processing instruction to determine the operator type it indicates and, when the operator type is a non-matrix-multiply-add operation type, to convert the data processing instruction into data processing microinstructions of the matrix multiply-add operation dimension according to the mapping rule corresponding to the operator type and write them into a microinstruction queue. The data processing unit is configured to execute data processing operations of the matrix multiply-add operation dimension according to the data processing microinstructions in the microinstruction queue.
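
As a rough sketch of this first aspect, the toy Python below models the two units around a single microinstruction queue. The operator names, parameter fields (out_h, c_in, and so on), and the convolution mapping rule borrowed from claim 3 are illustrative assumptions, not the patent's actual instruction format.

```python
from collections import deque

uop_queue = deque()   # microinstruction queue between the two units

def mapping_rule_conv(p):
    """Hypothetical mapping rule for ordinary convolution (claim 3):
    M = output spatial size, K = Cin * kernel size, N = Cout."""
    return p["out_h"] * p["out_w"], p["c_in"] * p["k_h"] * p["k_w"], p["c_out"]

MAPPING_RULES = {"conv": mapping_rule_conv}   # one rule per non-matmul operator type

def instruction_processing_unit(op_type, params):
    """Parse the operator type; a non-matmul operator is converted into a
    matmul-dimension microinstruction via its mapping rule."""
    if op_type == "matmul_add":
        m, k, n = params["M"], params["K"], params["N"]
    else:
        m, k, n = MAPPING_RULES[op_type](params)
    uop_queue.append(("matmul_add", m, k, n))

def data_processing_unit():
    """Executes every microinstruction as a matrix multiply-add of the given dimensions."""
    while uop_queue:
        op, m, k, n = uop_queue.popleft()
        print(f"{op}: ({m}x{k}) @ ({k}x{n}) -> ({m}x{n})")

# A convolution instruction rides the same matmul path as everything else:
instruction_processing_unit("conv", {"out_h": 8, "out_w": 8,
                                     "c_in": 16, "k_h": 3, "k_w": 3, "c_out": 32})
data_processing_unit()   # matmul_add: (64x144) @ (144x32) -> (64x32)
```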
In some embodiments of the present application, converting the data processing instruction into data processing microinstructions of the matrix multiply-add operation dimension according to the mapping rule corresponding to the operator type includes: mapping the data processing parameters in the data processing instruction to matrix dimension parameters of the matrix multiply-add operation according to the mapping rule corresponding to the operator type; performing block partitioning of the matrix multiply-add operation according to the matrix dimension parameters and a preset matrix calculation scale to determine block dimension information; and generating the data processing microinstructions according to the block dimension information and the queue state of the microinstruction queue. In some embodiments of the present application, when the operator type is ordinary convolution, mapping the data processing parameters in the data processing instruction to matrix dimension parameters of the matrix multiply-add operation according to the mapping rule corresponding to the operator type includes: mapping the spatial size of the output feature map indicated in the data processing instruction to the number of rows of the left matrix; mapping the product of the number of input channels indicated in the data processing instruction and the convolution kernel size to the number of columns of the left matrix and the number of rows of the right matrix; and mapping the number of output channels indicated in the data processing instruction to the number of columns of the right matrix.
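
A minimal sketch of the block-partitioning step follows, assuming a hypothetical tile size standing in for the "preset matrix calculation scale"; the function and parameter names are invented for illustration.

```python
import math

def block_dims(m, k, n, tile_m, tile_k, tile_n):
    """Split an (M x K) @ (K x N) multiply-add into tiles of the preset matrix
    calculation scale. Returns the block counts per dimension plus per-tile
    index triples, a stand-in for the patent's block dimension information."""
    bm = math.ceil(m / tile_m)   # row blocks of the left matrix
    bk = math.ceil(k / tile_k)   # blocks along the reduction dimension
    bn = math.ceil(n / tile_n)   # column blocks of the right matrix
    tiles = [(i, j, l) for i in range(bm) for j in range(bk) for l in range(bn)]
    return (bm, bk, bn), tiles

# Ordinary convolution mapped as in the paragraph above: a 16x16 output feature
# map, 8 input channels, 3x3 kernel, 4 output channels gives M=256, K=72, N=4.
m, k, n = 16 * 16, 8 * 3 * 3, 4
counts, tiles = block_dims(m, k, n, tile_m=64, tile_k=64, tile_n=16)
print(counts)      # (4, 2, 1): 4 row blocks, 2 reduction blocks, 1 column block
print(len(tiles))  # 8 tile-sized work items, one microinstruction set each
```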