US-20260126922-A1 - ELECTRONIC DEVICE COMPRISING NEURAL PROCESSING UNIT, AND OPERATING METHOD THEREOF
Abstract
An electronic device may include a processing element (PE) array, a local memory which is configured with a plurality of local memory blocks and which stores data on a plurality of feature maps processed in the PE array, and a control core configured to control the PE array and the local memory. The control core may control the local memory such that at least one local memory block from among the plurality of local memory blocks is turned off based on a size of a feature map corresponding to a layer.
Inventors
- Junhyuk Lee
- Hyunbin Park
- Seungjin YANG
- Jin Choi
- Boyeon NA
Assignees
- SAMSUNG ELECTRONICS CO., LTD.
Dates
- Publication Date
- 20260507
- Application Date
- 20260102
- Priority Date
- 20230703
Claims (20)
- 1 . An electronic device comprising: a processing element (PE) array comprising processing circuitry; a local memory which is configured with a plurality of local memory blocks and configured to store data on a plurality of feature maps processed in the PE array; and a control core, comprising circuitry, configured to control the PE array and the local memory, wherein the control core is configured to control the local memory so that at least one local memory block from among the plurality of local memory blocks is turned off based on a size of a feature map corresponding to a layer.
- 2 . The electronic device of claim 1 , further comprising: a main memory configured to store an artificial neural network model in a first language format so as to provide the artificial neural network model; and a processor, comprising processing circuitry, configured to provide the local memory with the artificial neural network model stored in the main memory in the first language format, wherein, when the artificial neural network model in the first language format is compiled to a second language format, the processor is configured to tag a buffer capacity of a local memory corresponding to a size of the feature map to the compiled artificial neural network model.
- 3 . The electronic device of claim 2 , wherein, while the local memory is controlled, the processor is configured to adjust a bandwidth of the main memory, based on the number of local memory blocks in an on state, and the local memory acquires data from the main memory, based on the adjusted bandwidth.
- 4 . The electronic device of claim 2 , wherein the control core is configured to determine the number of local memory blocks to be turned off based on a buffer capacity corresponding to a size of the per-layer feature map.
- 5 . The electronic device of claim 2 , wherein the control core is configured to turn on all of the plurality of local memory blocks, when a size of the per-layer feature map is greater than a total buffer capacity of the plurality of local memory blocks.
- 6 . The electronic device of claim 1 , wherein the local memory comprises a tightly-coupled memory (TCM) configured to provide the control core with the per-layer feature map in association with the control core.
- 7 . The electronic device of claim 2 , wherein the processor comprises one or more processors and is configured to: tag a first buffer capacity corresponding to a feature map size of a first layer to an artificial neural network model in the second language format; tag a second buffer capacity corresponding to a feature map size of a second layer, which is processed next to the first layer, to the artificial neural network model in the second language format; and tag a third buffer capacity corresponding to a feature map size of a third layer, which is processed next to the second layer, to the artificial neural network model in the second language format.
- 8 . The electronic device of claim 2 , wherein the control core is configured to: classify a plurality of consecutive layers into a plurality of layer groups differentiated depending on buffer capacities; and determine the number of local memory blocks to be turned off based on a buffer capacity corresponding to the layer group.
- 9 . The electronic device of claim 8 , wherein the control core is configured to: group a plurality of layers having a first buffer capacity and a second buffer capacity smaller than the first buffer capacity; and determine the number of local memory blocks to be turned off based on the first buffer capacity, when layers having the second buffer capacity are not consecutive within the layer group.
- 10 . The electronic device of claim 8 , wherein the control core is configured to: group a plurality of layers having a first buffer capacity and a second buffer capacity smaller than the first buffer capacity; and determine the number of local memory blocks to be turned off from among the plurality of local memory blocks, based on the first buffer capacity and the second buffer capacity, when layers having the second buffer capacity are consecutive within the layer group.
- 11 . The electronic device of claim 1 , wherein the control core is configured to control the PE array to perform a multiply-and-accumulate (MAC) computation in a state where some of the plurality of local memory blocks are turned off.
- 12 . A method of operating an electronic device including a neural processing unit (NPU), the NPU including a processing element (PE) array, a local memory which is configured with a plurality of local memory blocks and which stores data on a plurality of feature maps processed in the PE array, and a control core controlling the PE array and the local memory, the method comprising: controlling the local memory such that at least one local memory block from among the plurality of local memory blocks is turned off based on a size of a feature map corresponding to a layer.
- 13 . The method of claim 12 , further comprising: storing, by a main memory, an artificial neural network model in a first language format so as to provide the NPU with the artificial neural network model; providing the NPU, by a processor comprising processing circuitry, with the artificial neural network model stored in the main memory in the first language format; compiling the artificial neural network model in the first language format to a second language format; and tagging a buffer capacity of a local memory corresponding to a size of the feature map to the compiled artificial neural network model.
- 14 . The method of claim 13 , further comprising determining the number of local memory blocks to be turned off based on a buffer capacity corresponding to a size of the per-layer feature map.
- 15 . The method of claim 13 , further comprising turning on all of the plurality of local memory blocks, when a size of the per-layer feature map is greater than a total buffer capacity of the plurality of local memory blocks.
- 16 . The method of claim 12 , wherein the local memory comprises a tightly-coupled memory (TCM) which provides the control core with the per-layer feature map in association with the control core.
- 17 . The method of claim 13 , wherein the compiling comprises: tagging a first buffer capacity corresponding to a feature map size of a first layer to an artificial neural network model in the second language format; tagging a second buffer capacity corresponding to a feature map size of a second layer, which is processed next to the first layer, to the artificial neural network model in the second language format; and tagging a third buffer capacity corresponding to a feature map size of a third layer, which is processed next to the second layer, to the artificial neural network model in the second language format.
- 18 . The method of claim 13 , further comprising: classifying a plurality of consecutive layers into a plurality of layer groups differentiated depending on buffer capacities; and determining the number of local memory blocks to be turned off based on a buffer capacity corresponding to the layer group.
- 19 . The method of claim 18 , further comprising: grouping a plurality of layers having a first buffer capacity and a second buffer capacity smaller than the first buffer capacity; and determining the number of local memory blocks to be turned off based on the first buffer capacity, when layers having the second buffer capacity are not consecutive within the layer group.
- 20 . A neural processing unit (NPU) comprising: a processing element (PE) array comprising circuitry; a local memory configured with a plurality of local memory blocks and configured to store data regarding a plurality of feature maps processed in the PE array; and a control core, comprising circuitry, configured to control the PE array and the local memory, wherein the control core is configured to control the local memory so that at least one local memory block from among the plurality of local memory blocks is turned off based on a size of a feature map corresponding to a layer.
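The sizing rule recited in claims 1, 4, and 5 can be sketched as a small helper. This is a minimal illustration rather than the claimed implementation: the function name, the byte units, and the assumption of a uniform per-block capacity are not from the patent.

```python
import math

def blocks_to_turn_off(feature_map_bytes: int, num_blocks: int,
                       block_capacity_bytes: int) -> int:
    """Illustrative control-core policy (hypothetical helper, not from the patent).

    Keep only enough local memory blocks powered on to buffer one layer's
    feature map; the remainder may be turned off to cut leakage current.
    If the feature map exceeds the total buffer capacity, all blocks stay
    on, as in claim 5.
    """
    total_capacity = num_blocks * block_capacity_bytes
    if feature_map_bytes > total_capacity:
        return 0  # feature map too large: every block stays on
    blocks_needed = math.ceil(feature_map_bytes / block_capacity_bytes)
    return num_blocks - blocks_needed
```

For example, with eight 256 KiB blocks, a 600 KiB feature map needs three blocks, so five could be gated off, while a 3 MiB feature map exceeds the 2 MiB total and keeps all eight on.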
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of International Application No. PCT/KR2024/009372, filed on Jul. 3, 2024, in the Korean Intellectual Property Receiving Office, and claims priority to KR Application No. 10-2023-0085788, filed Jul. 3, 2023, and KR Application No. 10-2023-0149252, filed Nov. 1, 2023, the disclosures of which are all hereby incorporated by reference herein in their entireties.

TECHNICAL FIELD
Certain example embodiments may relate to an electronic device including a neural processing unit (NPU).

BACKGROUND
With the advancement of deep learning models, which are a type of artificial neural network, the hardware specifications of neural processing units (NPUs), the chipsets on which neural networks run, have been significantly enhanced. The enhancement of NPU specifications has led to an increase in the capacity of the static random access memory (SRAM), which serves a role similar to that of an internal cache. When a deep learning model requires a large computational load, an NPU with enhanced specifications is suitable. However, when the deep learning model requires only a small internal memory, not all SRAM cells are utilized, resulting in leakage current, that is, current applied to unused cells. If the specifications of the NPU in an electronic device exceed those required for the computational load of the deep learning model, unnecessary power consumption may occur from an overall perspective.

SUMMARY
An electronic device according to an example embodiment may include a neural processing unit (NPU) comprising circuitry. The NPU according to an embodiment may include a processing element (PE) array comprising processing circuitry, a local memory which provides a feature map to the PE array and is configured with a plurality of local memory blocks, and a control core configured to control the PE array and the local memory.
The control core according to an example embodiment may control at least one local memory block such that some of the plurality of local memory blocks are turned off based on a size of a feature map.

A method of operating an electronic device according to an example embodiment may include an NPU. In the operating method according to an example embodiment, the NPU may include a PE array, a local memory which provides a feature map to the PE array and is configured with a plurality of local memory blocks, and a control core configured to control the PE array and the local memory. The method of operating the electronic device according to an example embodiment may include allowing the control core to control at least one local memory block such that some of the plurality of local memory blocks are turned off based on a size of a feature map.

BRIEF DESCRIPTION OF DRAWINGS
- FIG. 1 is a block diagram of an electronic device in a network environment according to one or more example embodiments;
- FIG. 2 is a block diagram of a neural processing unit (NPU) in an electronic device according to one or more example embodiments;
- FIG. 3 illustrates a plurality of cells constituting a local memory according to an example embodiment;
- FIG. 4 illustrates a per-layer feature map size of a deep learning model having a relatively small computational load according to an example embodiment;
- FIG. 5 illustrates a per-layer feature map size of a deep learning model having a relatively large computational load according to an example embodiment;
- FIG. 6 is a control block diagram illustrating an operation of a per-layer local memory block in an electronic device according to an example embodiment;
- FIG. 7 is a flowchart illustrating a method of operating an electronic device according to an example embodiment;
- FIG. 8 is a flowchart illustrating a method by which an electronic device tags additional information for each layer during a compilation process according to an example embodiment;
- FIG. 9 illustrates a method of operating a local memory when layer groups having different buffer capacities are processed in an NPU according to an example embodiment;
- FIG. 10 illustrates a method of operating a local memory, different from that of FIG. 9, when layer groups having different buffer capacities are processed in an NPU according to an example embodiment;
- FIG. 11 is a flowchart illustrating a method of operating an electronic device according to an example embodiment;
- FIG. 12 is a drawing explaining the example operating method of FIG. 11 as applied to a mixed precision model; and
- FIG. 13 is a drawing explaining the example operating method of FIG. 11 as applied to a mixed precision model.

DETAILED DESCRIPTION
Embodiments of the disclosure will be described herein below with reference to the accompanying drawings. Advantages and features of the disclosure and methods of accomplishing the same may be understood more clearly by reference to the following detailed description.
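The layer-grouping refinement recited in claims 8 through 10 can be sketched as follows. This is one illustrative reading, not the patented method: the function name, the list representation of per-layer tagged capacities, and the run-length threshold of two for "consecutive" are assumptions.

```python
def governing_capacities(group):
    """Sketch of the claims 9-10 rule for one layer group (hypothetical code).

    `group` lists the tagged buffer capacity of each consecutive layer.
    An isolated layer with the smaller (second) capacity is not worth
    resizing the local memory for, so it runs at the larger (first)
    capacity; a consecutive run of smaller-capacity layers justifies
    turning off additional blocks for the duration of that run.
    """
    first, second = max(group), min(group)
    if first == second:
        return list(group)  # uniform group: nothing to decide
    out, i = [], 0
    while i < len(group):
        if group[i] == second:
            j = i
            while j < len(group) and group[j] == second:
                j += 1  # measure the run of smaller-capacity layers
            cap = second if j - i >= 2 else first  # consecutive vs. isolated
            out.extend([cap] * (j - i))
            i = j
        else:
            out.append(first)
            i += 1
    return out
```

With tagged capacities [4, 2, 4, 2, 2], the isolated small layer runs at capacity 4, while the final consecutive pair drops to 2, yielding [4, 4, 4, 2, 2].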