US-12626093-B2 - Computing apparatus

US 12626093 B2

Abstract

A computing apparatus includes at least one computing device including a computation area. The computing apparatus acquires input information, generates a plurality of input feature maps from the input information, and performs DNN computation in parallel on the generated plurality of input feature maps by at least one DNN partitioning method including channel partitioning. In the channel partitioning, the feature maps are grouped into sets without partitioning any individual feature map, each set is allocated to a computation area, and the computation for the layers is performed.
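The channel-partitioning step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `channel_partition`, the map count of 12, and the 8x8 map size are all assumptions for the example.

```python
import numpy as np

def channel_partition(feature_maps, num_areas):
    """Group whole feature maps into sets by channel index, without
    partitioning any individual map, one set per computation area."""
    groups = np.array_split(np.arange(len(feature_maps)), num_areas)
    return [feature_maps[idx] for idx in groups]

# 12 feature maps of 8x8, grouped across 4 computation areas:
# each area receives 3 whole (unpartitioned) maps.
maps = np.random.rand(12, 8, 8)
sets = channel_partition(maps, 4)
```

Each computation area then runs the layer's computation on its own set of whole maps, in contrast to spatial partitioning, where every map would be split into tiles.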

Inventors

  • Makoto Takahashi
  • Takashi Oshima

Assignees

  • HITACHI, LTD.

Dates

Publication Date
2026-05-12
Application Date
2023-07-19
Priority Date
2022-09-14

Claims (15)

  1. A computing apparatus including at least one computing device including at least one computation area for performing computation, the computing apparatus performing deep neural network (DNN) computation by using the computing device, the computing device including an intermediate memory area, the computing apparatus comprising: a plurality of the computation areas; and a DNN computation execution unit acquiring input information, generating a plurality of input feature maps from the input information, transmitting the generated plurality of input feature maps to the computing device to be used in the computation among the computing devices, and performing DNN computation on the plurality of input feature maps in parallel by using the plurality of computation areas, wherein the DNN computation execution unit performs the DNN computation in parallel on the plurality of input feature maps by at least one DNN partitioning method, wherein the DNN partitioning method used when performing the DNN computation in parallel includes at least channel partitioning, in which, in the process of the DNN computation, (1) the plurality of feature maps to be used in the computation for the layer are grouped into sets without partitioning any of the feature maps, and (2) the computation for the layer is performed on each set, and wherein, when performing at least the channel partitioning, the computation is performed by storing the feature map, or a result of performing the computation on the feature map, from at least one of the computation areas in the intermediate memory area included in at least another one of the computation areas.
  2. The computing apparatus according to claim 1, wherein the DNN computation execution unit stores the convolution maps generated by each of the plurality of computing devices to be used in the computation among the plurality of computing devices in the intermediate memory area of the computing device aggregating the convolution maps, and allows the computing device aggregating the convolution maps to generate an output feature map by summing the respective elements of all the convolution maps stored in the intermediate memory area.
  3. The computing apparatus according to claim 2, wherein the computing device includes intermediate memory areas whose number equals the number of the computation areas.
  4. The computing apparatus according to claim 1, wherein the DNN computation execution unit allows each of the plurality of computing devices to be used in the computation among the plurality of computing devices to generate a partial sum map obtained by summing the generated convolution maps and store the partial sum map in the intermediate memory area of the computing device aggregating the partial sum maps, and allows the computing device aggregating the partial sum maps to generate the output feature map by summing all the partial sum maps stored in the intermediate memory area.
  5. The computing apparatus according to claim 4, wherein the computing device includes intermediate memory areas whose number equals the number of the computation areas.
  6. A computing apparatus including at least one computing device including at least one computation area for performing computation, the computing apparatus performing deep neural network (DNN) computation by using the computing device, the computing apparatus comprising: a plurality of the computation areas; and a DNN computation execution unit acquiring input information, generating a plurality of input feature maps from the input information, transmitting the generated plurality of input feature maps to the computing device to be used in the computation among the computing devices, and performing DNN computation on the plurality of input feature maps in parallel by using the plurality of computation areas, wherein, in a process of the DNN computation, the DNN computation execution unit switches a partitioning method between: spatial partitioning, which partitions the feature map into a plurality of feature maps within the plane, allocates each of the partitioned feature maps to each of the plurality of computation areas, and performs the computation for the layer; and channel partitioning, which (1) groups the plurality of feature maps to be used in the computation for the layer into sets without partitioning any of the feature maps, and (2) allocates each set to each computation area to be used among the plurality of computation areas.
  7. The computing apparatus according to claim 6, wherein each of the plurality of computing devices includes a computation memory area to be used in the computation, and wherein the DNN computation execution unit switches the partitioning method from the spatial partitioning to the channel partitioning before a layer selected from the layers of the DNN computation based on at least one of a process speed of the plurality of computing devices, the number of computing devices to be used among the plurality of computing devices, the number of computation areas, and a capacity of the computation memory of the plurality of computing devices.
  8. The computing apparatus according to claim 6, wherein the computing device includes an intermediate memory area, and wherein the DNN computation execution unit stores the convolution maps generated by each of the plurality of computing devices to be used in the computation among the plurality of computing devices in the intermediate memory area of the computing device aggregating the convolution maps, and allows the computing device aggregating the convolution maps to generate an output feature map by summing the elements of all the convolution maps stored in the intermediate memory area.
  9. The computing apparatus according to claim 8, wherein the computing device includes intermediate memory areas whose number equals the number of the computation areas.
  10. The computing apparatus according to claim 6, wherein the DNN computation execution unit allows each of the plurality of computing devices to be used in the computation among the plurality of computing devices to generate a partial sum map obtained by summing the generated convolution maps, stores the partial sum map in the intermediate memory area of the computing device aggregating the partial sum maps, and allows the computing device aggregating the partial sum maps to generate the output feature map by summing all the partial sum maps stored in the intermediate memory area.
  11. The computing apparatus according to claim 10, wherein the computing device includes intermediate memory areas whose number equals the number of the computation areas.
  12. A computing apparatus including at least one computing device including at least one computation area for performing computation, the computing apparatus performing deep neural network (DNN) computation by using the computing device, the computing apparatus comprising: a plurality of the computation areas; a DNN computation execution unit acquiring input information, generating a plurality of input feature maps from the input information, transmitting the generated plurality of input feature maps to the computing device to be used in the computation among the computing devices, performing computation of a DNN model on the plurality of input feature maps in parallel by using the plurality of computation areas, and calculating output information including an accuracy value of the computation result of the DNN model; and a model adjustment unit determining applicability of the DNN model based on the accuracy value included in the output information and information on the computation scale of the DNN model, and calculating model level information specifying the DNN model and the number of computation areas to be used in the DNN computation for the plurality of input feature maps generated from the input information to be input next, wherein the DNN computation execution unit changes the DNN model and the number of computation areas to be used in the DNN computation for the plurality of input feature maps generated from the input information to be input next based on the model level information calculated by the model adjustment unit.
  13. The computing apparatus according to claim 12, wherein the model adjustment unit compares magnitudes among the accuracy value, a first threshold value, and a second threshold value; when the accuracy value is larger than the first threshold value, calculates model level information specifying a DNN model and a number of computation areas having smaller computation scales than the DNN model and the number of computation areas used for the plurality of input feature maps; when the accuracy value is the first threshold value or less and larger than the second threshold value, calculates model level information specifying the same DNN model and number of computation areas as used for the plurality of input feature maps; and when the accuracy value is the second threshold value or less, calculates model level information specifying a DNN model and a number of computation areas having larger computation scales than the DNN model and the number of computation areas used for the plurality of input feature maps.
  14. The computing apparatus according to claim 12, wherein a class probability value is used as the accuracy value.
  15. The computing apparatus according to claim 12, further comprising a peripheral environment recognition unit performing a spatial spectrum analysis on the input information to calculate a spatial spectrum analysis result when the input information is input, and selecting the model level specifying the DNN model and the number of computation areas to be used in the computation for the plurality of input feature maps based on the calculated spatial spectrum analysis result, wherein the DNN computation execution unit uses the DNN model and the computation areas set based on the model level selected by the peripheral environment recognition unit for the DNN computation for the plurality of input feature maps.
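The three-way threshold comparison of claim 13 can be sketched as follows. This is an illustrative reading only: the function name `adjust_model_level`, the integer "model level" encoding (larger level = larger computation scale), and the threshold values 0.9 and 0.6 are all assumptions, not values from the patent.

```python
def adjust_model_level(accuracy, level, t1=0.9, t2=0.6,
                       min_level=0, max_level=3):
    """Pick the model level for the next input, per claim 13:
    compare the accuracy value against two thresholds (t1 > t2)."""
    if accuracy > t1:
        # Accuracy exceeds the first threshold: a smaller-scale
        # DNN model and fewer computation areas suffice.
        return max(min_level, level - 1)
    if accuracy > t2:
        # Between the thresholds: keep the current model and areas.
        return level
    # At or below the second threshold: grow the computation scale.
    return min(max_level, level + 1)
```

Under this encoding, a high class probability shrinks the next model, a low one grows it, and a middling one leaves it unchanged, which matches the claimed behavior of the model adjustment unit.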

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a computing apparatus performing deep neural network (DNN) computation.

2. Description of Related Art

As the working population and the number of skilled workers decline, automation of various tasks using artificial intelligence (AI) in robots and automatic guided vehicles (AGVs) has been promoted in the fields of logistics and production. In such automation, it is important to operate highly accurate image recognition AI on an edge device that can be mounted on a robot or an AGV. Since robots and AGVs are required to move quickly, the AI mounted on them must also be computed at high speed.

A deep neural network (DNN) is known as a highly accurate image recognition AI. In a DNN, an image is analyzed by repeating convolution computation or the like on feature maps generated from the image. A unit of the computation is called a layer, and the size of the feature maps and the number (number of channels) of the feature maps used for computation change depending on the layer. In general, in the first half of the DNN computation network, the feature maps are large in size and few in number, like the feature maps 1-1, 1-2, and 1-3 illustrated in the example of FIG. 1A. In the latter half of the computation network, the feature maps are small in size and many in number, like the feature maps 2-1, 2-2, . . . , and 2-12 illustrated in FIG. 1B. A large-scale DNN is required for highly accurate image analysis, and in a large-scale DNN the feature maps are generally both large in size and large in number of channels.
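The shape progression described above might look like the following. Only the map counts (three early maps, twelve late maps) echo FIG. 1A and FIG. 1B; the spatial sizes 64x64 and 8x8 are illustrative assumptions.

```python
# Hypothetical feature-map shapes echoing FIG. 1A/1B (sizes assumed):
# the first half has few, large maps; the latter half has many, small maps.
first_half = [(f"1-{i}", (64, 64)) for i in range(1, 4)]    # maps 1-1..1-3
latter_half = [(f"2-{i}", (8, 8)) for i in range(1, 13)]    # maps 2-1..2-12
print(len(first_half), "large maps;", len(latter_half), "small maps")
```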
On the other hand, to perform highly accurate image analysis, especially on edge devices with limited processing power and memory capacity, partitioned DNN is known: a method of partitioning the feature map into a plurality of areas and allocating an individual device to each area. For example, Li Zhou, Hao Wen, Radu Teodorescu, and David H. C. Du, "Distributing Deep Neural Networks with Containerized Partitions at the Edge", 2nd USENIX Workshop on Hot Topics in Edge Computing, HotEdge (2019), https://www.usenix.org/system/files/hotedge19-paper-zhou.pdf, focuses mainly on the magnitude of the feature map size near the input layer of the partitioned DNN, as illustrated in FIG. 2A, and discloses a technique using spatial partitioning, which is a method of partitioning the inside of each feature map, a piece of two-dimensional data, into a plurality of areas. As an example of spatial partitioning, FIG. 2A illustrates a case where each of the feature maps 1-1, 1-2, and 1-3 is partitioned into four areas, and the areas are allocated to a total of four edge devices 1, 2, 3, and 4.

SUMMARY OF THE INVENTION

However, as illustrated in FIG. 1B, the size of the feature map is reduced in the latter half of a DNN computation network. Therefore, with spatial partitioning, computational efficiency decreases in the latter half of the DNN computation network. In addition, a large amount of memory is required for large-scale DNN computation, and since a partitioned DNN must operate a plurality of edge computing devices, power consumption may increase. Therefore, there is a demand for a computing apparatus capable of appropriately performing DNN computation. Accordingly, an object of the invention is to provide a computing apparatus capable of appropriately performing DNN computation.
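The four-quadrant spatial partitioning of FIG. 2A can be sketched as follows. The function name `spatial_partition` and the 8x8 map size are illustrative assumptions; only the four-areas-to-four-devices split comes from the figure.

```python
import numpy as np

def spatial_partition(fmap, rows=2, cols=2):
    """Split one 2-D feature map into rows*cols spatial tiles,
    one tile per edge device (the four-quadrant case of FIG. 2A)."""
    return [tile
            for band in np.array_split(fmap, rows, axis=0)
            for tile in np.array_split(band, cols, axis=1)]

fmap = np.arange(64).reshape(8, 8)   # one feature map, e.g. map 1-1
tiles = spatial_partition(fmap)      # four areas for edge devices 1-4
```

Note that as the feature map shrinks in the latter half of the network, each tile becomes very small, which is the efficiency problem the summary points out for spatial partitioning.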
According to an aspect of the computing apparatus of the invention, there is provided a computing apparatus including at least one computing device including at least one computation area for performing computation, the computing apparatus performing deep neural network (DNN) computation by using the computing device, the computing device including an intermediate memory area, the computing apparatus including: a plurality of the computation areas; and a DNN computation execution unit acquiring input information, generating a plurality of input feature maps from the input information, transmitting the generated plurality of input feature maps to the computing device to be used in the computation among the computing devices, and performing DNN computation on the plurality of input feature maps in parallel by using the plurality of computation areas, in which the DNN computation execution unit performs the DNN computation in parallel on the plurality of input feature maps by at least one DNN partitioning method, and the DNN partitioning method used when performing the DNN computation in parallel includes at least channel partitioning, in which, in the process of DNN computation, (1) without partitioning each of the plurality of feature maps to be used in the computation for the layer, the plurality of feature maps to be used in the computation for the layer are g