EP-4738202-A1 - ELECTRONIC DEVICE COMPRISING NEURAL PROCESSING UNIT, AND OPERATING METHOD THEREOF
Abstract
An electronic device according to an embodiment may include a processing element (PE) array, a local memory which is configured with a plurality of local memory blocks and which stores data on a plurality of feature maps processed in the PE array, and a control core configured to control the PE array and the local memory. The control core may control the local memory such that at least one local memory block from among the plurality of local memory blocks is turned off based on a size of a feature map corresponding to a layer.
Inventors
- LEE, JUNHYUK
- PARK, HYUNBIN
- YANG, SEUNGJIN
- CHOI, JIN
- NA, BOYEON
Assignees
- Samsung Electronics Co., Ltd.
Dates
- Publication Date: 2026-05-06
- Application Date: 2024-07-03
Claims (15)
- An electronic device comprising: a processing element (PE) array; a local memory which is configured with a plurality of local memory blocks and which stores data on a plurality of feature maps processed in the PE array; and a control core configured to control the PE array and the local memory, wherein the control core controls the local memory such that at least one local memory block from among the plurality of local memory blocks is turned off based on a size of a feature map corresponding to a layer.
- The electronic device of claim 1, further comprising: a main memory which stores an artificial neural network model in a first language format so as to provide the artificial neural network model; and a processor which provides the local memory with the artificial neural network model stored in the main memory in the first language format, wherein, when the artificial neural network model in the first language format is compiled to a second language format, the processor tags a buffer capacity of a local memory corresponding to a size of the feature map to the compiled artificial neural network model.
- The electronic device of claim 2, wherein, while the local memory is controlled, the processor adjusts a bandwidth of the main memory, based on the number of local memory blocks in an on state, and the local memory acquires data from the main memory, based on the adjusted bandwidth.
- The electronic device of claim 2, wherein the control core determines the number of local memory blocks to be turned off based on a buffer capacity corresponding to a size of the per-layer feature map.
- The electronic device of claim 2, wherein the control core turns on all of the plurality of local memory blocks, when a size of the per-layer feature map is greater than a total buffer capacity of the plurality of local memory blocks.
- The electronic device of claim 1, wherein the local memory is a tightly-coupled memory (TCM) which provides the control core with the per-layer feature map in association with the control core.
- The electronic device of claim 2, wherein the processor is configured to: tag a first buffer capacity corresponding to a feature map size of a first layer to an artificial neural network model in the second language format; tag a second buffer capacity corresponding to a feature map size of a second layer, which is processed next to the first layer, to the artificial neural network model in the second language format; and tag a third buffer capacity corresponding to a feature map size of a third layer, which is processed next to the second layer, to the artificial neural network model in the second language format.
- The electronic device of claim 2, wherein the control core is configured to: classify a plurality of consecutive layers into a plurality of layer groups differentiated depending on buffer capacities; and determine the number of local memory blocks to be turned off based on a buffer capacity corresponding to the layer group.
- The electronic device of claim 8, wherein the control core is configured to: group a plurality of layers having a first buffer capacity and a second buffer capacity smaller than the first buffer capacity; and determine the number of local memory blocks to be turned off based on the first buffer capacity, when layers having the second buffer capacity are not consecutive within the layer group.
- The electronic device of claim 8, wherein the control core is configured to: group a plurality of layers having a first buffer capacity and a second buffer capacity smaller than the first buffer capacity; and determine the number of local memory blocks to be turned off from among the plurality of local memory blocks, based on the first buffer capacity and the second buffer capacity, when layers having the second buffer capacity are consecutive within the layer group.
- The electronic device of claim 1, wherein the control core is configured to control the PE array to perform a multiply-and-accumulate (MAC) computation in a state where some of the plurality of memory blocks are turned off.
- A method of operating an electronic device including a neural processing unit (NPU), the NPU comprising: a processing element (PE) array; a local memory which is configured with a plurality of local memory blocks and which stores data on a plurality of feature maps processed in the PE array; and a control core configured to control the PE array and the local memory, and the method comprising controlling the local memory such that at least one local memory block from among the plurality of local memory blocks is turned off based on a size of a feature map corresponding to a layer.
- The method of claim 12, the electronic device further comprising: a main memory which stores an artificial neural network model in a first language format so as to provide the NPU with the artificial neural network model; and a processor which provides the NPU with the artificial neural network model stored in the main memory in the first language format, and the method further comprising: compiling the artificial neural network model in the first language format to a second language format; and tagging a buffer capacity of a local memory corresponding to a size of the feature map to the compiled artificial neural network model.
- The method of claim 13, wherein the compiling comprises: tagging a first buffer capacity corresponding to a feature map size of a first layer to an artificial neural network model in the second language format; tagging a second buffer capacity corresponding to a feature map size of a second layer, which is processed next to the first layer, to the artificial neural network model in the second language format; and tagging a third buffer capacity corresponding to a feature map size of a third layer, which is processed next to the second layer, to the artificial neural network model in the second language format.
- The method of claim 13, further comprising: classifying a plurality of consecutive layers into a plurality of layer groups differentiated depending buffer capacities; and determining the number of local memory blocks to be turned off based on a buffer capacity corresponding to the layer group.
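As a minimal illustration of the compile-time tagging (claims 2 and 7) and the layer grouping (claims 8 and 15) recited above, the following sketch tags each layer with the number of local memory blocks its feature map requires and then groups consecutive layers sharing the same buffer capacity. The block size, block count, and function names are hypothetical assumptions for illustration; the patent describes the behavior, not this code.

```python
import math

BLOCK_SIZE = 256 * 1024   # hypothetical capacity of one local memory block, in bytes
NUM_BLOCKS = 8            # hypothetical number of local memory blocks

def tag_buffer_capacities(feature_map_sizes):
    """Tag each layer with the buffer capacity (in blocks) its feature map needs.

    `feature_map_sizes` is a list of per-layer feature map sizes in bytes.
    Returns a list of (layer_index, blocks_needed) tuples, mirroring the
    per-layer tagging of claims 2 and 7.
    """
    tags = []
    for i, size in enumerate(feature_map_sizes):
        blocks = min(NUM_BLOCKS, math.ceil(size / BLOCK_SIZE))
        tags.append((i, blocks))
    return tags

def group_consecutive_layers(tags):
    """Group consecutive layers sharing a buffer capacity (claims 8 and 15).

    Returns a list of [blocks_needed, [layer_indices]] groups.
    """
    groups = []
    for layer, blocks in tags:
        if groups and groups[-1][0] == blocks:
            groups[-1][1].append(layer)
        else:
            groups.append([blocks, [layer]])
    return groups
```

For example, with 256 KiB blocks, a 500,000-byte feature map is tagged with a two-block capacity, and two such consecutive layers fall into a single group.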
Description
[Technical Field]
The disclosure relates to an electronic device including a neural processing unit (NPU).

[Background Art]
With the advancement of deep learning models, which are a type of artificial neural network, the hardware specifications of neural processing units (NPUs), the chipsets on which neural networks run, have been significantly enhanced. The enhancement of NPU specifications has led to an increase in the capacity of the static random access memory (SRAM), which serves a role similar to that of an internal cache. When a deep learning model requires a large computational load, an NPU with enhanced specifications is suitable. However, when the deep learning model requires only a small internal memory, not all SRAM cells are utilized, resulting in leakage current, that is, current drawn by unused cells. If the specifications of the NPU in an electronic device exceed those required for the computational load of the deep learning model, unnecessary power consumption may occur from an overall perspective.

[Disclosure of Invention]
[Solution to Problem]
An electronic device according to an embodiment may include a neural processing unit (NPU). The NPU according to an embodiment may include a processing element (PE) array, a local memory which provides a feature map to the PE array and is configured with a plurality of local memory blocks, and a control core configured to control the PE array and the local memory. The control core according to an embodiment may control at least one local memory block such that some of the plurality of local memory blocks are turned off based on a size of a feature map. A method of operating an electronic device according to an embodiment may include an NPU.
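The power-gating decision summarized above can be sketched as follows. This is a hedged illustration only: the block size, block count, and function name are assumptions, not details disclosed by the patent. If a layer's feature map exceeds the total buffer capacity, every block stays on (as in claim 5); otherwise only the blocks the feature map needs remain powered.

```python
import math

def blocks_to_turn_off(feature_map_bytes, block_size, num_blocks):
    """Return how many local memory blocks can be powered off for a layer.

    If the feature map exceeds the total buffer capacity, all blocks remain
    on; otherwise blocks beyond those the feature map occupies are turned off.
    """
    total_capacity = block_size * num_blocks
    if feature_map_bytes > total_capacity:
        return 0  # feature map exceeds total capacity: keep every block on
    blocks_needed = math.ceil(feature_map_bytes / block_size)
    return num_blocks - blocks_needed
```

With the assumed 256 KiB blocks, `blocks_to_turn_off(500_000, 256 * 1024, 8)` keeps two blocks on and powers off the remaining six, so the unused cells draw no leakage current during that layer's computation.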
In the operation method according to an embodiment, the NPU may include a PE array, a local memory which provides a feature map to the PE array and is configured with a plurality of local memory blocks, and a control core configured to control the PE array and the local memory. The method of operating the electronic device according to an embodiment may include allowing the control core to control at least one local memory block such that some of the plurality of local memory blocks are turned off based on a size of a feature map.

[Brief Description of Drawings]
FIG. 1 is a block diagram of an electronic device in a network environment according to one or more embodiments;
FIG. 2 is a block diagram of a neural processing unit (NPU) in an electronic device according to one or more embodiments;
FIG. 3 illustrates a plurality of cells constituting a local memory according to an embodiment;
FIG. 4 illustrates a per-layer feature map size of a deep learning model having a relatively small computational load according to an embodiment;
FIG. 5 illustrates a per-layer feature map size of a deep learning model having a relatively large computational load according to an embodiment;
FIG. 6 is a control block diagram illustrating an operation of a per-layer local memory block in an electronic device according to an embodiment;
FIG. 7 is a flowchart illustrating a method of operating an electronic device according to an embodiment;
FIG. 8 is a flowchart illustrating a method by which an electronic device tags additional information for each layer during a compilation process according to an embodiment;
FIG. 9 illustrates a method of operating a local memory when layer groups having different buffer capacities are processed in an NPU according to an embodiment;
FIG. 10 illustrates a method of operating a local memory, different from that of FIG. 9, when layer groups having different buffer capacities are processed in an NPU according to an embodiment;
FIG. 11 is a flowchart illustrating a method of operating an electronic device according to an embodiment;
FIG. 12 is a drawing for explaining the operating method according to FIG. 11, which is applicable to a mixed precision model; and
FIG. 13 is a drawing for explaining the operating method according to FIG. 11, which is applicable to a mixed precision model.

[Mode for the Invention]
Embodiments of the disclosure will be described herein below with reference to the accompanying drawings. Advantages and features of the disclosure, and methods of accomplishing the same, may be understood more clearly by reference to the following detailed description of the embodiments and the accompanying drawings. However, the disclosure is not limited to the embodiments disclosed below and may be implemented in various forms. Rather, the embodiments are provided to complete the disclosure and to fully convey the concept of the disclosure to those of ordinary skill in the art, and the disclosure will only be defined by the scope of the claims. Throughout the specification, like reference numerals denote like components. Unless otherwise defined, all terms used in this specification (including technical and scientific terms) may be used with the meanings commonly understoo