KR-20260062617-A - The method for performing operations in a processing unit of a systolic array structure, and the processing unit that performs the operations
Abstract
The present invention relates to a method for performing operations on a systolic array. A method for performing operations in a systolic array structure comprising a plurality of operation units according to one embodiment of the present invention may include the steps of receiving a plurality of matrix data for operations to be performed sequentially, performing a first operation on the input plurality of matrix data, expanding the result of the first operation on the matrix data to a predetermined size, and transmitting the expanded result of the first operation as an input for a second operation. The present invention has the effect of reducing precision loss and significantly improving the accuracy of operations by arbitrarily expanding the size of the mantissa in floating-point operations.
Inventors
- 양성원
- 김지원
- 최영근
Assignees
- 수퍼게이트 주식회사
Dates
- Publication Date
- 20260507
- Application Date
- 20241029
Claims (12)
- An operation method performed in an arithmetic device having a systolic array structure including a plurality of operation units, the method comprising: receiving a plurality of matrix data for operations to be performed sequentially; performing a first operation on the received plurality of matrix data; expanding the result of the first operation on the matrix data to a predetermined size; and transmitting the expanded first operation result as an input to a second operation.
- The operation method of claim 1, wherein the expanding step expands the mantissa portion of the first operation result, which is in floating-point form, to the predetermined size.
- The operation method of claim 2, wherein the predetermined size is determined according to the form of the operation.
- The operation method of claim 2, wherein the predetermined size is determined according to the range of values within the matrix data.
- The operation method of claim 2, wherein the predetermined size is determined according to the size of the systolic array.
- The operation method of claim 1, further comprising rounding the first operation result, wherein the rounded first operation result is transmitted as the input to the second operation.
- A systolic array-based computer device comprising: a processor; and a memory that communicates with the processor, wherein the memory stores instructions that cause the processor to perform operations, the operations comprising: receiving a plurality of matrix data for operations to be performed sequentially; performing a first operation on the received plurality of matrix data; expanding the result of the first operation on the matrix data to a predetermined size; and transmitting the expanded first operation result as an input to a second operation.
- The systolic array-based computer device of claim 7, wherein the expanding operation expands the mantissa portion of the first operation result, which is in floating-point form, to the predetermined size.
- The systolic array-based computer device of claim 8, wherein the predetermined size is determined according to the form of the operation.
- The computer device of claim 8, wherein the predetermined size is determined according to the range of values within the matrix data.
- The systolic array-based computer device of claim 8, wherein the predetermined size is determined according to the size of the systolic array.
- The systolic array-based computer device of claim 7, wherein the operations further comprise rounding the first operation result, and the rounded first operation result is transmitted as the input to the second operation.
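The flow recited in the claims (first operation, mantissa expansion, rounding, transmission to the second operation) can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: "expanding" is interpreted here as keeping the partial sum with a wider mantissa while it travels between units, the exponent range is ignored, and the names round_to_mantissa, mac_chain, partial_bits and out_bits are hypothetical.

```python
import math

def round_to_mantissa(x, bits):
    """Round x to `bits` explicit mantissa bits (round-to-nearest-even).

    Assumption: only the mantissa width is modelled; exponent limits
    (overflow/underflow) are deliberately ignored in this sketch.
    """
    m, e = math.frexp(x)           # x == m * 2**e, with 0.5 <= |m| < 1
    p = bits + 1                   # explicit bits plus the implicit leading 1
    scaled = round(m * (1 << p))   # Python's round() rounds ties to even
    return math.ldexp(scaled, e - p)

def mac_chain(activations, weights, partial_bits, out_bits=10):
    """Chain of MAC steps; each partial sum is passed on to the next step.

    partial_bits is the (hypothetical) predetermined mantissa size of the
    value transmitted between steps; out_bits=10 matches FP16.
    """
    acc = 0.0
    for a, w in zip(activations, weights):
        # First operation of this step, stored with the chosen mantissa size,
        # then transmitted as the input of the next (second) operation.
        acc = round_to_mantissa(acc + a * w, partial_bits)
    # Final rounding back to the output mantissa size.
    return round_to_mantissa(acc, out_bits)

activations = [2048.0, 1.0, 1.0, 1.0, 1.0]
weights = [1.0] * 5
no_expansion = mac_chain(activations, weights, partial_bits=10)
with_expansion = mac_chain(activations, weights, partial_bits=16)
print(no_expansion, with_expansion)
```

With a 10-bit mantissa between steps, each added 1.0 is rounded away, whereas a 16-bit partial sum retains the additions until the final rounding. The width of 16 is an arbitrary choice for the example; the claims leave the predetermined size to depend on the form of the operation, the value range, or the array size.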
Description
The method for performing operations in a processing unit of a systolic array structure, and the processing unit that performs the operations
The present invention relates to a method for performing operations in a systolic array.
Systolic arrays are large-scale parallel computing hardware architectures primarily used in artificial intelligence and machine learning; they use Multiply-Accumulate (MAC) units to rapidly process complex operations such as matrix multiplication. Recently, FP16 (16-bit floating point) has often been used to increase computation speed and reduce memory usage, but using FP16 can result in a loss of precision. FP16 consists of a 1-bit sign, a 5-bit exponent, and a 10-bit mantissa, so both the range and the precision of the values it can represent are much narrower than those of FP32. While FP32 allows more precise calculations with an 8-bit exponent and a 23-bit mantissa, the reduced number of mantissa bits in FP16 means that small values arising during operations may not be reflected in the result. Such errors can accumulate repeatedly, particularly in parallel structures such as systolic arrays, significantly increasing the likelihood of inaccurate results. Furthermore, because the exponent field in FP16 is limited to 5 bits, overflow can occur for large values and underflow for small values, which also affects accuracy.
Systolic arrays perform fast parallel computations by using multiple MAC units; while using FP16 improves computation speed, it also increases the likelihood of precision loss. In particular, because systolic arrays perform computations step by step, minute errors generated in each MAC unit can accumulate and significantly affect the final output. This loss of precision can degrade performance during the training or prediction process of artificial intelligence models.
A prior art patent (KR 2021-0062739 A (2021.05.31)) proposes a method to increase the precision and efficiency of calculations by using various floating-point formats.
FIG. 1 is an exemplary diagram showing the configuration of a computing device according to one embodiment of the present invention.
FIG. 2 is a flowchart illustrating a calculation method according to one embodiment of the present invention.
FIG. 3 illustrates the operation and transmission process of a MAC unit according to one embodiment of the present invention.
FIG. 4 is an exemplary diagram showing the expansion of a calculation result according to one embodiment of the present invention.
FIG. 5 is an illustrative diagram showing a change in value according to one embodiment of the present invention.
FIG. 6 is a flowchart illustrating a rounding process according to one embodiment of the present invention.
FIG. 7 is a diagram showing a rounding example according to one embodiment of the present invention.
FIG. 8 is an exemplary diagram showing an implementation of a computing device including a computational device according to one embodiment of the present invention.
The following description merely illustrates the principles of the invention. Therefore, those skilled in the art may devise various devices that embody the principles of the invention and fall within its concept and scope, even if they are not explicitly described or illustrated in this specification. Furthermore, all conditional terms and embodiments listed in this specification are, in principle, intended only to aid understanding of the concept of the invention, and the invention should be understood as not being limited to the embodiments and conditions specifically listed. The aforementioned objectives, features, and advantages will become clearer through the following detailed description in conjunction with the attached drawings, and accordingly, a person skilled in the art to which the invention pertains will be able to easily implement the technical concept of the invention.
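The FP16 layout and the loss of small values described in the background can be checked with a short Python sketch. It relies only on the standard struct module, whose "e" format code encodes IEEE 754 half precision; the helper names to_fp16, fp16_fields and accumulate_fp16 are hypothetical and introduced only for this illustration.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest FP16 value (round-to-nearest-even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fp16_fields(x):
    """Return the (sign, exponent, mantissa) bit fields of x encoded as FP16."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    sign = bits >> 15               # 1 bit
    exponent = (bits >> 10) & 0x1F  # 5 bits, biased by 15
    mantissa = bits & 0x3FF         # 10 bits
    return sign, exponent, mantissa

def accumulate_fp16(values):
    """Sum values while rounding the running total to FP16 after every step."""
    acc = 0.0
    for v in values:
        # The sum of two FP16 values is exact in double precision, so a single
        # rounding here behaves like a correctly rounded FP16 addition.
        acc = to_fp16(acc + to_fp16(v))
    return acc

print(fp16_fields(1.0))
print(accumulate_fp16([2048.0] + [1.0] * 100))
```

Near 2048 the spacing between adjacent FP16 values is 2, so every added 1.0 is rounded away and the running total never moves, even though the exact sum is 2148. This is the kind of repeated loss that the background attributes to step-by-step accumulation.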
In addition, in describing the invention, if it is determined that a detailed description of known technology related to the invention may unnecessarily obscure the essence of the invention, such detailed description will be omitted.
Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the attached drawings.
FIG. 1 is an exemplary diagram showing the configuration of a computing device (300) according to one embodiment of the present invention. As an example, FIG. 1 shows the structure of a systolic array-based computing device (300) composed of a 9x9 grid of MAC (Multiply-Accumulate) units (300u). A systolic array is a two-dimensional array designed to process matrix multiplication efficiently: each MAC unit (300u) receives an activation input and a weight input, multiplies them, and accumulates the products to calculate a final sum value. Each MAC unit (300u) shifts its input data, which is transferred to adjacent MAC units (300u) within the array on each successive clock cycle. Each MAC u