US-12626112-B2 - NPU, edge device and operation method thereof
Abstract
A neural processing unit (NPU) includes an internal memory storing information on combinations of a plurality of artificial neural network (ANN) models, the plurality of ANN models including first and second ANN models; a plurality of processing elements (PEs) to process first operations and second operations of the plurality of ANN models in sequence or in parallel, the plurality of PEs including first and second groups of PEs; and a scheduler to allocate to the first group of PEs a part of the first operations for the first ANN model and to allocate to the second group of PEs a part of the second operations for the second ANN model, based on an instruction related to information on an operation sequence of the plurality of ANN models or further based on ANN data locality information. The first and second operations may be performed in parallel or in a time division.
Inventors
- Lok Won Kim
Assignees
- DEEPX CO., LTD.
Dates
- Publication Date: 2026-05-12
- Application Date: 2021-10-14
- Priority Date: 2021-04-01
Claims (19)
- 1. A neural processing unit (NPU) comprising: at least one internal memory for storing information on combinations of a plurality of artificial neural network (ANN) models, the plurality of ANN models including a first ANN model and a second ANN model; a first circuitry provided for a plurality of processing elements (PEs), each of which includes a multiplier, an adder and an accumulator; and a second circuitry provided for a scheduler operably configurable to divide the plurality of PEs into a first group of PEs and a second group of PEs, based on an instruction related to information on an operation sequence of the plurality of ANN models, allocate to the first group of PEs a part of first operations for the first ANN model, and allocate to the second group of PEs a part of second operations for the second ANN model, wherein operations of the second ANN model are initiated before completing operations of the first ANN model, and wherein the scheduler is further configured to: reuse memory addresses in the at least one internal memory for intermediate computation results to improve memory reuse rate, and perform zero-skipping operations to reduce unnecessary computations.
- 2. The NPU of claim 1, wherein each of the allocations by the scheduler is further based on ANN data locality information.
- 3. The NPU of claim 1, wherein the first operations for the first ANN model and the second operations for the second ANN model are performed in parallel or in a time division.
- 4. The NPU of claim 1, wherein the first group of PEs includes at least one PE that is different from the second group of PEs.
- 5. The NPU of claim 4, wherein the first group of PEs includes at least one PE that coincides with the second group of PEs.
- 6. The NPU of claim 1, wherein the information on the operation sequence includes at least one of: information on a layer, information on a kernel, information on a processing time, information on a remaining time, and information on a clock.
- 7. The NPU of claim 6, wherein the information on the layer represents an ith layer among all layers of the first ANN model, and wherein the second ANN model is initiated after the ith layer of the first ANN model is initiated.
- 8. The NPU of claim 6, wherein the information on the kernel represents a kth kernel among all kernels of the first ANN model, and wherein the second ANN model is initiated after the kth kernel of the first ANN model is used.
- 9. The NPU of claim 6, wherein the information on the processing time represents a time elapsed after performing operations of the first ANN model, and wherein the second ANN model is initiated after the elapsed time.
- 10. The NPU of claim 6, wherein the information on the remaining time represents a time remaining until completing operations of the first ANN model, and wherein the second ANN model is initiated before reaching the remaining time.
- 11. The NPU of claim 1, wherein the information on the operation sequence of the plurality of ANN models is stored in the at least one internal memory.
- 12. The NPU of claim 1, wherein the scheduler generates the instruction based on the information on the operation sequence of the plurality of ANN models.
- 13. The NPU of claim 1, wherein the NPU is mounted in an edge device, and wherein the edge device comprises a memory and a central processing unit (CPU) configured to execute commands for an application.
- 14. The NPU of claim 13, wherein the memory of the edge device is configured to store the information on the operation sequence of the plurality of ANN models.
- 15. The NPU of claim 13, wherein the CPU of the edge device generates the instruction when the CPU executes the commands for the application.
- 16. An edge device comprising: a system bus; a memory electrically connected to the system bus; a central processing unit (CPU) electrically connected to the system bus, the CPU being configured to access the memory via the system bus and execute commands for an application; and a plurality of neural processing units (NPUs) electrically connected to the system bus, the plurality of NPUs including a first NPU and a second NPU, each NPU including: at least one internal memory for storing information on combinations of a plurality of artificial neural network (ANN) models, the plurality of ANN models including a first ANN model and a second ANN model, a plurality of processing elements (PEs), each of which includes a multiplier, an adder and an accumulator, and a scheduler operably configurable to divide the plurality of PEs into a first group of PEs and a second group of PEs, based on an instruction related to information on an operation sequence of the plurality of ANN models, wherein a part of first operations for the first ANN model is allocated to the first NPU or to the first group of PEs in the first NPU, wherein a part of second operations for the second ANN model is allocated to the second NPU or to the second group of PEs in the first NPU, wherein operations of the second ANN model are initiated before completing operations of the first ANN model, and wherein the scheduler is further configured to: reuse memory addresses in the at least one internal memory for intermediate computation results to improve memory reuse rate, and perform zero-skipping operations to reduce unnecessary computations.
- 17. The edge device of claim 16, wherein the first operations for the first ANN model and the second operations for the second ANN model are performed in parallel or in a time division.
- 18. The edge device of claim 16, wherein the first group of PEs includes at least one PE that is different from the second group of PEs.
- 19. The edge device of claim 18, wherein the first group of PEs includes at least one PE that coincides with the second group of PEs.
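Claims 1 and 16 both recite two scheduler optimizations: reusing internal-memory addresses for intermediate computation results, and zero-skipping to avoid unnecessary multiply-accumulate work. The following C++ sketch is a minimal, hypothetical illustration of these two ideas only; the function names, the two-region buffer scheme, and the use of float vectors are assumptions for readability, not the patent's actual hardware implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical zero-skipping multiply-accumulate: operands equal to zero are
// skipped, so the multiplier and adder do no unnecessary work (claims 1/16).
float mac_zero_skip(const std::vector<float>& inputs,
                    const std::vector<float>& weights) {
    float acc = 0.0f;  // accumulator, alongside the claimed multiplier and adder
    for (std::size_t i = 0; i < inputs.size() && i < weights.size(); ++i) {
        if (inputs[i] == 0.0f || weights[i] == 0.0f)
            continue;                    // zero-skipping: omit the multiply/add
        acc += inputs[i] * weights[i];   // multiply, then add into the accumulator
    }
    return acc;
}

// Hypothetical address reuse for intermediate results: once a layer's input is
// consumed, its internal-memory region is overwritten by the next layer's
// output instead of allocating fresh space, improving the memory reuse rate.
struct IntermediateBuffers {
    std::vector<float> region[2];  // two regions at fixed internal addresses
    int current = 0;
    std::vector<float>& input()  { return region[current]; }
    std::vector<float>& output() { return region[1 - current]; }
    void advance_layer() { current = 1 - current; }  // reuse the old input region
};
```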
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority of Korean Patent Application No. 10-2021-0042950 filed on Apr. 1, 2021, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference.

BACKGROUND OF THE DISCLOSURE

Technical Field

The present disclosure relates to an artificial neural network.

Background Art

Humans have the intelligence to perform recognition, classification, inference, prediction, control/decision making, and the like. Artificial intelligence (AI) refers to the artificial imitation of human intelligence.

The human brain is made up of a multitude of nerve cells called neurons. Each neuron is connected to hundreds or thousands of other neurons through connections called synapses. A model of the working principle of biological neurons and the connection relationships between them, built to mimic human intelligence, is called an artificial neural network (ANN) model. In other words, an artificial neural network is a system in which nodes imitating neurons are connected in a layer structure.

An ANN model is classified as a single-layer or a multilayer neural network according to its number of layers, and a typical multilayer neural network consists of an input layer, hidden layers, and an output layer. The input layer receives external data, and the number of its neurons equals the number of input variables. The hidden layers, located between the input layer and the output layer, receive signals from the input layer, extract features, and transmit the features to the output layer. The output layer receives signals from the hidden layers and outputs the result to the outside. Each input signal between neurons is multiplied by a connection strength (a weight with a value between zero (0) and one (1)) and the products are summed; if the sum exceeds the neuron's threshold, the neuron is activated and produces an output value through an activation function. An ANN whose number of hidden layers is increased in order to implement higher artificial intelligence is referred to as a deep neural network (DNN).

Meanwhile, ANN models may be used in various edge devices, and an edge device may use a plurality of ANN models depending on its type.
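To make the neuron model described above concrete, here is a minimal C++ sketch of a single artificial neuron. The sigmoid activation and the specific threshold handling are illustrative assumptions; the disclosure does not prescribe a particular activation function.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A single artificial neuron: each input signal is multiplied by a connection
// strength (a weight between 0 and 1) and the products are summed; if the sum
// exceeds the neuron's threshold, the neuron activates and emits an output
// value through an activation function (sigmoid here, as an example).
float neuron(const std::vector<float>& inputs,
             const std::vector<float>& weights,
             float threshold) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < inputs.size() && i < weights.size(); ++i)
        sum += inputs[i] * weights[i];       // weighted sum of input signals
    if (sum <= threshold)
        return 0.0f;                         // below threshold: not activated
    return 1.0f / (1.0f + std::exp(-sum));   // activated: apply activation
}
```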
SUMMARY OF THE DISCLOSURE

However, in the case of using a plurality of artificial neural network (ANN) models, the inventors of the present disclosure have recognized that no optimized method exists. When a neural processing unit (NPU) is provided separately for each ANN model, the inventors have recognized that the time each NPU spends in an idle state increases, which reduces efficiency. Further, when the computations of the plurality of ANN models are performed by a single NPU, the inventors have recognized that, absent an efficient operation sequence among the plurality of ANN models, the computation processing time increases.

To solve the aforementioned problems, a neural processing unit (NPU) is provided. The NPU may include at least one internal memory for storing information on combinations of a plurality of ANN models, the plurality of ANN models including first and second ANN models; a plurality of processing elements (PEs) operably configurable to process first operations and second operations of the plurality of ANN models in sequence or in parallel, the plurality of PEs including first and second groups of PEs; and a scheduler operably configurable to allocate to the first group of PEs a part of the first operations for the first ANN model and to allocate to the second group of PEs a part of the second operations for the second ANN model, based on an instruction related to information on an operation sequence of the plurality of ANN models.

Each of the allocations by the scheduler may be further based on ANN data locality information. The first operations for the first ANN model and the second operations for the second ANN model may be performed in parallel or in a time division. The first group of PEs and the second group of PEs may be partially the same or completely different from each other; in other words, the first group of PEs may include at least one PE that is different from the second group of PEs and may include at least one PE that coincides with the second group of PEs.

The information on the operation sequence may include at least one of information on a layer, information on a kernel, information on a processing time, information on a remaining time, and information on a clock. The information on the layer may represent an ith layer among all layers of the first ANN model, and the second ANN model may be initiated after the ith layer of the first ANN model is initiated. The information on the kernel may represent a kth kernel among all kernels of the first ANN model, and the second ANN model may be initiated after the kth kernel of the first ANN model is used.
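As a rough illustration of the scheduling behavior summarized above, the sketch below divides a PE array into two groups and initiates the second model once a configured trigger in the operation-sequence information is met. All structure and field names (SequenceInfo, Model1Progress, the disjoint split, the cycle-based time units) are hypothetical assumptions; the disclosure only requires that the second model's operations start before the first model's operations complete.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

struct PE { int id; };  // stand-in for a processing element

// Hypothetical operation-sequence information: any one of these conditions
// may trigger initiation of the second ANN model (layer, kernel, processing
// time, or remaining time, mirroring the categories listed above).
struct SequenceInfo {
    int           trigger_layer;     // start model 2 once layer i of model 1 starts
    int           trigger_kernel;    // ...or once kernel k of model 1 has been used
    std::uint64_t elapsed_cycles;    // ...or after this much processing time
    std::uint64_t remaining_cycles;  // ...or once this little time remains
};

struct Model1Progress {
    int           current_layer = 0;
    int           kernels_used = 0;
    std::uint64_t cycles_elapsed = 0;
    std::uint64_t cycles_remaining = 0;
};

struct Scheduler {
    std::vector<PE> first_group, second_group;

    // Divide the PE array into two groups (split_at must not exceed pes.size()).
    // The groups may differ in at least one PE yet also share PEs; a simple
    // disjoint split is shown here for brevity.
    void divide(const std::vector<PE>& pes, std::size_t split_at) {
        first_group.assign(pes.begin(), pes.begin() + split_at);
        second_group.assign(pes.begin() + split_at, pes.end());
    }

    // True when the second model may be initiated before the first completes.
    bool should_start_model2(const SequenceInfo& s, const Model1Progress& p) const {
        return p.current_layer    >= s.trigger_layer
            || p.kernels_used     >= s.trigger_kernel
            || p.cycles_elapsed   >= s.elapsed_cycles
            || p.cycles_remaining <= s.remaining_cycles;
    }
};
```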