EP-4354387-B1 - IMAGE PROCESSING DEVICE, IMAGE PROCESSING METHOD, AND IMAGE PROCESSING PROGRAM
Inventors
- OMORI, YUYA
- NAKAMURA, KEN
- KOBAYASHI, DAISUKE
- YOSHIDA, SHUHEI
- HATTA, SAKI
- UZAWA, HIROYUKI
- NITTA, KOYO
Dates
- Publication Date
- 2026-05-06
- Application Date
- 2021-12-08
Claims (6)
- An image processing device including a neural network including convolution processing for an image, the image processing device comprising: an acquisition unit that acquires a target image to be processed; and a processing unit that processes the target image using the neural network including convolution processing, wherein: when an output feature map constituting an output of the convolution processing is output, the processing unit outputs, to a storage unit, respective small regions dividing the output feature map, and when each of the small regions is output to the storage unit, in a case in which a feature included in the small region is the same as a predetermined feature or a feature of a small region output in the past, the processing unit compresses and outputs the predetermined feature or the feature of the small region output in the past to the storage unit.
- The image processing device according to claim 1, wherein: when the convolution processing is performed using the neural network including continuous convolution processing, the processing unit reads an output feature map of a previous convolution processing from the storage unit, and performs the convolution processing for each of small regions obtained by dividing an input feature map constituting an input of the convolution processing, and when the convolution processing is performed for each of the small regions, in a case in which a feature included in the small region is the same as a predetermined feature or a feature of a small region processed in the past, the processing unit does not perform the convolution processing on the small region, and outputs a result of processing on the predetermined feature or a result of processing in the past as a result of processing the small region.
- The image processing device according to claim 1 or 2, wherein the processing unit sets a small region that has an overlapping region overlapping an adjacent small region and having a size corresponding to a kernel size of the convolution processing of a subsequent stage as a small region obtained by dividing the output feature map, and determines whether a feature included in the small region is the same as the predetermined feature or the feature of a small region output in the past.
- The image processing device according to any one of claims 1 to 3, wherein the predetermined feature includes features in the small region which are the same.
- An image processing method of an image processing device including a neural network including convolution processing for an image, the image processing method comprising: acquiring, by an acquisition unit, a target image to be processed; and processing, by a processing unit, the target image using the neural network including convolution processing, wherein: when an output feature map constituting an output of the convolution processing is output, the processing unit outputs, to a storage unit, respective small regions dividing the output feature map, and when each of the small regions is output to the storage unit, in a case in which a feature included in the small region is the same as a predetermined feature or a feature of a small region output in the past, the processing unit compresses and outputs the predetermined feature or the feature of the small region output in the past to the storage unit.
- An image processing program for causing a computer including a neural network including convolution processing for an image to execute: acquiring a target image to be processed; and processing the target image using the neural network including convolution processing, wherein: when an output feature map constituting an output of the convolution processing is output, respective small regions dividing the output feature map are output to a storage unit, and when each of the small regions is output to the storage unit, in a case in which a feature included in the small region is the same as a predetermined feature or a feature of a small region output in the past, the predetermined feature or the feature of the small region output in the past is compressed and output to the storage unit.
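The write-side behaviour recited in claims 1, 5, and 6 can be sketched in ordinary software terms. The following is a minimal NumPy sketch, not the claimed device: the function names (`split_regions`, `write_feature_map`), the region size, and the `"uniform"`/`"ref"`/`"raw"` entry tokens are all illustrative assumptions, and a Python list stands in for the storage unit.

```python
import numpy as np

def split_regions(fmap, rh, rw):
    """Yield (position, region) pairs tiling the output feature map."""
    H, W = fmap.shape
    for y in range(0, H, rh):
        for x in range(0, W, rw):
            yield (y, x), fmap[y:y + rh, x:x + rw]

def write_feature_map(fmap, rh=4, rw=8):
    """Write small regions to 'storage', compressing repeated features.

    A region whose values are all identical (a 'predetermined feature'
    case) or that matches a region output in the past is stored as a
    short reference entry instead of its raw values.
    """
    storage = []   # stands in for the storage unit (external memory)
    seen = {}      # raw bytes of a region -> index of its first copy
    for pos, region in split_regions(fmap, rh, rw):
        if np.all(region == region.flat[0]):
            # predetermined feature: uniform region, one value suffices
            storage.append(("uniform", pos, int(region.flat[0])))
            continue
        key = region.tobytes()
        if key in seen:
            # same feature as a region output in the past: store a reference
            storage.append(("ref", pos, seen[key]))
        else:
            seen[key] = len(storage)
            storage.append(("raw", pos, region.copy()))
    return storage
```

On a feature map with a large constant background, most entries collapse to `"uniform"` or `"ref"` records, which is the data-size suppression the claims are aimed at.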
Description
Technical Field
The technology of the present disclosure relates to an image processing device, an image processing method, and an image processing program.
Background Art
In a case where inference using a convolutional neural network (CNN) is performed, the network includes a plurality of layers, and convolution processing is performed in each convolutional layer. Convolution processing consists of a product-sum operation and activation processing. In inference using a CNN, the convolution operation described above occupies most of the entire processing amount. Even in a case where an inference engine using a CNN is implemented as hardware, the performance of the convolution operation is directly connected to the performance of the entire engine.
Figs. 23 and 24 illustrate examples of the convolution operation in a case where the kernel size is 3 × 3. Fig. 23 illustrates an example in which a convolution operation is performed on a 3 × 3 input feature map using a 3 × 3 kernel. In this example, nine product-sum operations are performed to output a 1 × 1 output feature map. Furthermore, Fig. 24 illustrates an example in which a convolution operation is performed on a (W + 2) × (H + 2) input feature map using a 3 × 3 kernel. In this example, nine product-sum operations are repeated while moving the kernel over the input feature map to output a W × H output feature map.
In hardware that performs a convolution operation of a CNN, in order to increase throughput, a circuit is often prepared so that the input feature map is divided into small regions of a certain fixed size and a product-sum operation for one small region can be performed at a time (see Fig. 25). Fig. 25 illustrates an example in which a 26 × 14 input feature map is divided into nine 10 × 6 small regions, and an arithmetic circuit performs convolution processing at 32 points (8 × 4 points) simultaneously using a 3 × 3 kernel and outputs an 8 × 4 output feature map.
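The tiled computation of Fig. 25 can be sketched in software as follows. This is a minimal NumPy sketch under assumed names (`conv3x3_region`, `tiled_conv3x3`), not the arithmetic circuit itself: it computes the 3 × 3 valid convolution per small region, with each 10 × 6 input region (including the 1-pixel overlap) producing an 8 × 4 output region, and it also applies the all-zero-region skip of Fig. 26 discussed below.

```python
import numpy as np

def conv3x3_region(region, kernel):
    """3x3 product-sum (valid convolution) over one input small region."""
    rh, rw = region.shape[0] - 2, region.shape[1] - 2
    out = np.zeros((rh, rw), dtype=region.dtype)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * region[dy:dy + rh, dx:dx + rw]
    return out

def tiled_conv3x3(fmap, kernel, out_rh=4, out_rw=8):
    """Convolve per small region; skip regions whose inputs are all zero."""
    H, W = fmap.shape
    out = np.zeros((H - 2, W - 2), dtype=fmap.dtype)
    skipped = 0
    for y in range(0, H - 2, out_rh):
        for x in range(0, W - 2, out_rw):
            # 6x10 input region including the 1-pixel overlap on each side
            region = fmap[y:y + out_rh + 2, x:x + out_rw + 2]
            if not region.any():
                skipped += 1   # all zeros: every product-sum would be 0
                continue
            out[y:y + out_rh, x:x + out_rw] = conv3x3_region(region, kernel)
    return out, skipped
```

With a 26 × 14 input (stored here as a 14 × 26 array), the double loop visits exactly the nine small regions of the example, and the result equals a single full-map valid convolution.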
In this example, a dot part of the input feature map is one small region, and the arithmetic circuit performs 32-point simultaneous convolution processing on each of the nine small regions, thereby outputting a 24 × 12 output feature map.
Furthermore, as one of the calculation speedup methods, as illustrated in Fig. 26, a method of skipping the calculation in a case where the values of a small region of the input feature map are all 0 is known (see Non Patent Literature 1, for example). Fig. 26 illustrates an example of a case where the size of the output small region is 4 × 2, the kernel size is 3 × 3, and 4-bit data representing 0 to 15 is used. In this example, the size of the small region of the input feature map is 6 × 4, and the values of the small region indicated by the dotted line are all 0. Since the result of a product-sum operation with 0 is 0, it is not necessary for the arithmetic circuit to perform the convolution processing, and the convolution processing on the small region can be skipped.
Citation List
CN110163370A describes a deep neural network compression method in which each layer's feature map is compressed using video-coding techniques (prediction, transform, quantization, entropy coding) and stored to reduce memory bandwidth.
Non Patent Literature 1: Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, William J. Dally, "SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks", arXiv:1708.04485, 23 May 2017
Summary of Invention
Technical Problem
Here, if an attempt is made to increase the size of the small region in order to increase throughput, there will be fewer cases where all the values of a small region of the input feature map are 0, and a sufficient calculation speedup cannot be expected. For example, as illustrated in Fig. 
27A, in a case where the size of the small region of the output feature map is 4 × 2 (that is, the size of the small region of the input feature map is 6 × 4), the values of the small region indicated by the dotted line of the input feature map are all 0. On the other hand, as illustrated in Fig. 27B, in a case where the size of the small region of the output feature map is 8 × 4 (that is, the size of the small region of the input feature map is 10 × 6), a non-zero value is included in the small region indicated by the dotted line of the input feature map. Furthermore, the size of the small region is directly connected to the calculation throughput, and is therefore difficult to change in many cases. Furthermore, in a case where the output feature map is output to a memory, the larger the data size, the longer the memory access takes, and the more severely the calculation speedup is hindered.
The technology disclosed herein has been made in view of the above points, and an object thereof is to provide an image processing device, an image processing method, and an image processing program capable of suppressing the data size when an output feature map is output.
Solution to