US-12625674-B2 - Neural processing device and method for converting data thereof

Abstract

A neural processing device and a method for converting data thereof are provided. The neural processing device comprises a first compute unit configured to receive first input data in first precision and generate first output data in the first precision by performing calculations, a second compute unit configured to receive second input data in second precision which is different from the first precision and generate second output data in the second precision by performing calculation, and a first converting buffer configured to receive and store the first output data, generate the second input data by converting the first output data into the second precision, and transmit the second input data to the second compute unit.
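
The data path in the abstract can be sketched in a few lines of Python. This is only an illustrative model, not the patented hardware: the choice of IEEE 754 half precision as the "first precision" and single precision as the "second precision", the class name `ConvertingBuffer`, and the dot-product and ReLU stand-ins for the two compute units are all assumptions made for the example.

```python
import struct
from collections import deque


class ConvertingBuffer:
    """Hypothetical FIFO that stores words in one precision and emits another."""

    def __init__(self, in_fmt: str = "<e", out_fmt: str = "<f") -> None:
        self.in_fmt = in_fmt      # assumed first precision: 16-bit float
        self.out_fmt = out_fmt    # assumed second precision: 32-bit float
        self._storage = deque()

    def push(self, word: bytes) -> None:
        # Receive and store the first output data unchanged.
        self._storage.append(word)

    def pop(self) -> bytes:
        # Convert on the way out, then hand the word to the consumer.
        (value,) = struct.unpack(self.in_fmt, self._storage.popleft())
        return struct.pack(self.out_fmt, value)


def first_compute_unit(a, b) -> bytes:
    """Stand-in calculation: a dot product whose result is emitted as FP16."""
    return struct.pack("<e", sum(x * y for x, y in zip(a, b)))


def second_compute_unit(word: bytes) -> bytes:
    """Stand-in calculation: ReLU on an FP32 word, result emitted as FP32."""
    (value,) = struct.unpack("<f", word)
    return struct.pack("<f", max(value, 0.0))


buf = ConvertingBuffer()
buf.push(first_compute_unit([1.0, 2.0], [0.5, 0.25]))
buf.push(first_compute_unit([-1.0], [2.0]))
results = [struct.unpack("<f", second_compute_unit(buf.pop()))[0] for _ in range(2)]
```

Neither compute unit contains conversion logic in this sketch; the precision change happens only while data passes through the buffer, which is the point the abstract makes.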

Inventors

Jinwook Oh

Assignees

REBELLIONS INC.

Dates

Publication Date: 2026-05-12
Application Date: 2024-12-11
Priority Date: 2022-04-01

Claims (20)

1. A neural processing device comprising: a first processing element (PE) array configured to receive first input data in first precision and generate first output data in the first precision; a second PE array configured to receive second input data in second precision which is different from the first precision and generate second output data in the second precision; and a first converting buffer connected to the first PE array and the second PE array and configured to receive and store the first output data, generate the second input data by converting the first output data into the second precision, and transmit the second input data to the second PE array, wherein the first converting buffer comprises: an input register configured to receive the first output data in the first precision; a storage configured to receive the first output data from the input register and store the first output data; and an output converting register configured to receive the first output data from the storage, convert the first output data into the second input data in the second precision and transmit the second input data to the second PE array.
2. The neural processing device of claim 1, wherein the first PE array is further configured to perform two-dimensional matrix multiplications in the first precision.
3. The neural processing device of claim 1, wherein the second PE array is further configured to perform one-dimensional calculations in the second precision.
4. The neural processing device of claim 1, further comprising: a second converting buffer configured to convert the first input data in the second precision into the first precision and provide the converted first input data to the first PE array.
5. The neural processing device of claim 4, further comprising: a memory unit configured to store the second output data in the second precision and provide the first input data in the second precision to the second converting buffer.
6. The neural processing device of claim 1, further comprising: a third converting buffer configured to receive the second output data in the second precision from the second PE array and convert the second output data into the first precision.
7. The neural processing device of claim 6, further comprising: a memory unit configured to store the second output data in the first precision and transmit the second output data to the first processing element array without any conversion.
8. The neural processing device of claim 1, wherein the first converting buffer has a first in first out (FIFO) structure.
9. The neural processing device of claim 1, wherein the first PE array is further configured to receive i pieces of first input data in the first precision and generate j pieces of the first output data in the first precision.
10. The neural processing device of claim 9, wherein the first converting buffer is further configured to receive the j pieces of the first output data and convert the j pieces of the first output data into k pieces of the second input data in the second precision.
11. The neural processing device of claim 10, wherein the second PE array is further configured to receive the k pieces of the second input data in the second precision, and generate the second output data by performing one-dimensional calculations in the second precision.
12. The neural processing device of claim 1, wherein the first PE array has a coarse-grained reconfigurable array (CGRA) structure.
13. The neural processing device of claim 1, wherein a number of the first output data is different from a number of the second input data.
14. A method for converting data of a neural processing device, comprising: receiving first input data in first precision by a first processing element (PE) array; generating first output data in the first precision by the first PE array; and generating, by a first converting buffer connected to the PE array and a second PE array, second input data by receiving the first output data and converting the first output data into second precision different from the first precision, wherein generating the second input data comprises: receiving the first output data by an input register; storing the first output data in a storage; transmitting the first output data to an output converting register; generating the second input data by converting the first output data into the second precision; and outputting the second input data.
15. The method for converting data of the neural processing device of claim 14, wherein the first PE array receives i pieces of the first input data in the first precision and generates j pieces of the first output data in the first precision.
16. The method for converting data of the neural processing device of claim 15, wherein the first converting buffer has a FIFO structure, and the first converting buffer receives the j pieces of the first output data and converts the j pieces of the first output data into k pieces of the second input data in the second precision.
17. The method for converting data of the neural processing device of claim 16, wherein the second PE array performs one-dimensional calculations, receives the k pieces of the second output data, and generates the second output data.
18. The method for converting data of the neural processing device of claim 17, wherein generating the second output data comprises: outputting the second output data in the second precision by the second PE array; transmitting, by a first FIFO buffer, h pieces of the second output data to an L0 memory; and storing the h pieces of the second output data in the L0 memory.
19. The method for converting data of the neural processing device of claim 16, wherein converting into the k pieces of the second input data in the second precision comprises: converting the j pieces of the first output data in the first precision into j pieces of the second input data in the second precision; and generating k pieces of the second input data in the second precision by merging the j pieces of the second input data in the second precision.
20. The method for converting data of the neural processing device of claim 14, wherein receiving the first input data comprises: transmitting h pieces of the first input data in the first precision by an L0 memory; and receiving the first input data and outputting i pieces of the first input data in the first precision by a first FIFO buffer.
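
The three-stage buffer of claims 1 and 14 (input register, storage, output converting register) and the convert-then-merge step of claim 19 can be illustrated with a small software model. Everything specific here is an assumption for illustration: FP16 as the first precision, FP32 as the second, the class name `FirstConvertingBuffer`, and in particular the reading of "merging" as concatenating a fixed number of adjacent converted pieces into one wider piece, so that k = j / `group`. The claims do not fix any of these choices.

```python
import struct
from collections import deque
from typing import List, Optional


class FirstConvertingBuffer:
    """Hypothetical model: input register -> FIFO storage -> output converting register."""

    def __init__(self, group: int = 2) -> None:
        self.input_register: Optional[bytes] = None
        self.storage = deque()    # FIFO; holds data still in the first precision
        self.group = group        # assumed merge factor (pieces per merged piece)

    def receive(self, piece: bytes) -> None:
        # The input register latches the piece, then the storage keeps it unconverted.
        self.input_register = piece
        self.storage.append(self.input_register)

    def emit(self) -> List[bytes]:
        # Output converting register: convert each of the j stored pieces
        # (assumed FP16 -> FP32), in FIFO order.
        converted = []
        while self.storage:
            (value,) = struct.unpack("<e", self.storage.popleft())
            converted.append(struct.pack("<f", value))
        # Merge adjacent converted pieces so that j pieces become k pieces.
        return [
            b"".join(converted[i:i + self.group])
            for i in range(0, len(converted), self.group)
        ]


buffer = FirstConvertingBuffer(group=2)
for x in (1.0, 2.0, 3.0, 4.0):          # j = 4 pieces of first output data
    buffer.receive(struct.pack("<e", x))
merged = buffer.emit()                   # k = 2 pieces of second input data
first_pair = struct.unpack("<2f", merged[0])
```

In this model the data stays in the first precision while it sits in the storage, and conversion happens only at the output stage, matching the ordering of steps in claim 14. Because j and k differ, the sketch also shows one way the count relationship of claim 13 could arise.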

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation application of U.S. patent application Ser. No. 18/191,737, filed on Mar. 28, 2023, which claims the benefit of Korean Patent Application No. 10-2022-0041152 filed on Apr. 1, 2022, in the Korean Intellectual Property Office, the entire contents of which are hereby incorporated by reference.

TECHNICAL FIELD

The disclosure relates to a neural processing device and a method for converting data thereof. Specifically, the disclosure relates to a neural processing device that changes precision by using a buffer memory required for data flow, and a method for converting data thereof.

BACKGROUND

For the last few years, artificial intelligence technology has been the core technology of the Fourth Industrial Revolution and the subject of discussion as the most promising technology worldwide. The biggest problem with such artificial intelligence technology is computing performance. For artificial intelligence technology which realizes human learning ability, reasoning ability, perceptual ability, natural language implementation ability, etc., it is of utmost importance to process a large amount of data quickly.

The central processing units (CPUs) or graphics processing units (GPUs) of off-the-shelf computers were used for deep-learning training and inference in early artificial intelligence, but had limitations on deep-learning training and inference tasks with high workloads, and thus, neural processing units (NPUs) that are structurally specialized for deep learning tasks have received a lot of attention.

The neural processing device requires a buffer memory for data flow therein. The buffer memory may temporarily store data for a short number of clock cycles and transmit the data to a module, and may be used for synchronization and correct transmission of data. Meanwhile, precision of data is a form of representing data and has to be converted correctly before calculation is performed.
Several calculation modules in the neural network processing device may use different types of precision for various reasons, and converting between them consumes many resources. Accordingly, converting precision in a buffer memory rather than in a calculation module may be an efficient approach in terms of hardware resources.

The description set forth in the background section should not be assumed to be prior art merely because it is set forth in the background section. The background section may describe aspects or embodiments of the disclosure.

SUMMARY

Aspects of the disclosure provide a neural processing device that maximizes efficiency by converting precision during data transmission.

Aspects of the disclosure provide a method for converting data of a neural processing device that maximizes efficiency by converting precision during the data transmission.

According to some aspects of the disclosure, a neural processing device comprises: a first compute unit configured to receive first input data in first precision and generate first output data in the first precision by performing calculations; a second compute unit configured to receive second input data in second precision which is different from the first precision and generate second output data in the second precision by performing calculations; and a first converting buffer configured to receive and store the first output data, generate the second input data by converting the first output data into the second precision, and transmit the second input data to the second compute unit.

According to some aspects, the neural processing device further comprises: a second converting buffer configured to convert the first input data in the second precision into the first precision and provide the converted first input data to the first compute unit.
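
The role of the second converting buffer, which takes data held in the second precision and hands it to the first compute unit in the first precision, can be sketched as follows. The concrete formats are assumptions for illustration only (FP32 as the second precision, FP16 as the first), and `second_converting_buffer` is a hypothetical helper name; the disclosure does not specify them at this point.

```python
import struct
from typing import List


def fp32_word(x: float) -> bytes:
    """Encode a value as a 32-bit float word (assumed second precision)."""
    return struct.pack("<f", x)


def second_converting_buffer(words_fp32: List[bytes]) -> List[bytes]:
    """Convert FP32 words into FP16 words (assumed first precision)."""
    converted = []
    for word in words_fp32:
        (value,) = struct.unpack("<f", word)
        # Narrowing conversion: values not representable in FP16 are rounded.
        converted.append(struct.pack("<e", value))
    return converted


memory = [fp32_word(1.5), fp32_word(0.1)]       # data stored in second precision
first_input = second_converting_buffer(memory)  # data for the first compute unit
values = [struct.unpack("<e", w)[0] for w in first_input]
```

Under these assumptions, 1.5 survives the conversion exactly while 0.1 is rounded to the nearest half-precision value, which is why the direction of conversion matters when deciding where data is stored. In this sketch, values outside the half-precision range make `struct.pack` raise `OverflowError`; how real hardware would handle that case is not covered here.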
According to some aspects, the neural processing device further comprises: a memory unit configured to store the second output data in the second precision and provide the first input data in the second precision to the second converting buffer.

According to some aspects, the neural processing device further comprises: a third converting buffer configured to receive the second output data in the second precision from the second compute unit and convert the second output data into the first precision.

According to some aspects, the first converting buffer comprises: an input converting register configured to receive the first output data in the first precision and convert the first output data into the second input data in the second precision; a storage configured to receive the second input data from the input converting register and store the second input data; and an output register configured to receive the second input data from the storage and transmit the second input data to the second compute unit.

According to some aspects, the first converting buffer comprises: an input converting register configured to receive the first output data in