US-12625675-B2 - Convolutional computation device
Abstract
A convolutional computation device includes a two-dimensional circulation shift register unit and one or more multiplier-accumulators. The two-dimensional circulation shift register unit has storage elements, cyclically shifts the data among the storage elements, provides one or more input windows in a predetermined area, and selects the data stored in one of the storage elements disposed in the input window as input data. The one or more multiplier-accumulators generate output data by performing a multiply-accumulate operation on the input data input from the two-dimensional circulation shift register unit and weight data from a predetermined filter.
Inventors
- Teppei Hirotsu
Assignees
- DENSO CORPORATION
Dates
- Publication Date: 2026-05-12
- Application Date: 2021-09-21
- Priority Date: 2019-03-28
Claims (12)
- 1. A convolutional computation device comprising: a two-dimensional circulation shift register unit that has a plurality of storage elements arranged two-dimensionally and respectively storing data, cyclically shifts the data among the plurality of storage elements, provides at least one input window in a predetermined area, and selects the data stored in one of the plurality of storage elements disposed in the at least one input window as input data; and at least one multiplier-accumulator that generates output data by performing a multiply-accumulate operation on the input data input from the two-dimensional circulation shift register unit and weight data from a predetermined filter, wherein: each storage element of the plurality of storage elements includes a multiplexer and a flip-flop, the flip-flop in each storage element is configured to store data output by the multiplexer in each storage element, the multiplexer in each storage element is directly connected by signal lines to the flip-flops of directly adjacent storage elements of the plurality of storage elements, the directly adjacent storage elements being directly adjacent via signal lines in each of an up direction and a down direction from each storage element and directly adjacent via signal lines in each of a left direction and a right direction from each storage element, and the multiplexer in each storage element is configured to select one data element of the directly adjacent storage elements and output data from the selected one data element.
- 2. The convolutional computation device according to claim 1, wherein: the at least one multiplier-accumulator performs the multiply-accumulate operation based on a Winograd algorithm; the predetermined filter is a 3 rows and 3 columns matrix filter; the input data is a 5 rows and 5 columns matrix input data; and the output data is a 3 rows and 3 columns matrix output data.
- 3. The convolutional computation device according to claim 2, wherein the Winograd algorithm is defined by an equation: $Y = A^{T}\left[\left[G g G^{T}\right] \odot \left[B^{T} d B\right]\right] A$ (2) where Y is an output data; A, B and G are constant matrixes; g is a weight data; d is an input data; a weight term of $G g G^{T}$ is calculated in advance; and the constant matrixes B and A have one of the elements 0, ±1, ±2, ±3, and ±4.
- 4. The convolutional computation device according to claim 1, wherein: the two-dimensional circulation shift register unit has a plurality of input windows including the at least one input window to select a plurality of the input data, the at least one multiplier-accumulator further comprises a plurality of multiplier-accumulators into which the plurality of the input data are input from the two-dimensional circulation shift register unit, respectively.
- 5. The convolutional computation device according to claim 1, wherein: the two-dimensional circulation shift register unit changes an input window area of the at least one input window or a shift amount of the shift that cyclically shifts the data.
- 6. The convolutional computation device according to claim 1, wherein: each storage element is connected to only four storage elements arranged vertically and horizontally of the each storage element.
- 7. The convolutional computation device according to claim 1, further comprising a memory interface connected to the two-dimensional circulation shift register unit, wherein data elements are sequentially input from the memory interface to each storage element in a bottom row of the two-dimensional circulation shift register unit.
- 8. The convolutional computation device according to claim 1, wherein the at least one multiplier-accumulator includes an input register, a weight register, at least one multiplier, and an adder tree, the input register is configured to hold the input data in a plurality of input data element storage areas, the weight register is configured to hold the weight data in a plurality of weight data element storage areas, the at least one multiplier is configured to multiply each input data element of the plurality of input data element storage areas with respectively each weight data element of the plurality of weight data element storage areas to generate multiplier results, and the adder tree is configured to calculate a total multiplication result from the multiplier results generated by the at least one multiplier.
- 9. The convolutional computation device according to claim 1, wherein the at least one input window comprises a plurality of input window areas which are configured to be switched with each other in the two-dimensional circulation shift register unit.
- 10. The convolutional computation device according to claim 1, wherein the at least one input window comprises a plurality of input window areas which are configured to be sequentially switched to select the input data, and the output data is sequentially generated from the input data.
- 11. The convolutional computation device according to claim 1, wherein the multiply-accumulate operation is performed only by a bit shift operation and an addition operation.
- 12. The convolutional computation device according to claim 1, wherein the plurality of storage elements which are arranged two-dimensionally are arranged in rows and columns.
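The arrangement recited in claims 1 and 8 can be illustrated with a small software model. This is only a sketch under stated assumptions, not the claimed circuit: the class name `CirculationShiftRegister2D`, the helper functions, the fixed top-left window position, and the left-then-up scan order are all hypothetical choices made for illustration. The "convolution" here is the sliding-window sum of products without kernel flipping, as is usual for neural-network filters.

```python
class CirculationShiftRegister2D:
    """Toy software model of a two-dimensional circulation shift register."""

    # For each shift direction, the (row, column) offset of the neighbor whose
    # value an element loads, mimicking the per-element multiplexer selection.
    _SOURCE_OFFSET = {"up": (1, 0), "down": (-1, 0), "left": (0, 1), "right": (0, -1)}

    def __init__(self, data):
        self.grid = [list(row) for row in data]
        self.rows = len(self.grid)
        self.cols = len(self.grid[0])

    def shift(self, direction):
        """Shift all data one position; values leaving an edge wrap around."""
        dr, dc = self._SOURCE_OFFSET[direction]
        old = self.grid
        self.grid = [
            [old[(i + dr) % self.rows][(j + dc) % self.cols] for j in range(self.cols)]
            for i in range(self.rows)
        ]

    def window(self, size):
        """Read the fixed input window (assumed here: top-left size x size block)."""
        return [row[:size] for row in self.grid[:size]]


def multiply_accumulate(inputs, weights):
    """Multiply matching elements, then sum the products (the adder-tree role)."""
    products = [x * w for x_row, w_row in zip(inputs, weights) for x, w in zip(x_row, w_row)]
    return sum(products)


def convolve_by_shifting(image, weights):
    """Slide the data past a fixed window instead of moving the window."""
    reg = CirculationShiftRegister2D(image)
    k = len(weights)
    out_rows, out_cols = reg.rows - k + 1, reg.cols - k + 1
    output = []
    for _ in range(out_rows):
        row = []
        for _ in range(out_cols):
            row.append(multiply_accumulate(reg.window(k), weights))
            reg.shift("left")
        # Complete the horizontal cycle so the columns return to their start.
        for _ in range(reg.cols - out_cols):
            reg.shift("left")
        reg.shift("up")
        output.append(row)
    return output


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
all_ones = [[1] * 3 for _ in range(3)]
result = convolve_by_shifting(image, all_ones)
```

In this model the input data is stored once and only moved between neighboring elements; the window never needs a rearranged copy of the data.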
Description
CROSS REFERENCE TO RELATED APPLICATION
The present application is a continuation application of International Patent Application No. PCT/JP2020/012728 filed on Mar. 23, 2020, which designated the U.S. and claims the benefit of priority from Japanese Patent Application No. 2019-062744 filed on Mar. 28, 2019. The entire disclosures of all of the above applications are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to a convolutional computation device that performs a convolution calculation.
BACKGROUND
In a convolution calculation, output data is generated by convolving weight data forming a predetermined filter with the input data. In a conceivable convolutional computation device, the convolution calculation is processed by converting the convolution operation into a matrix operation.
SUMMARY
According to an example, a convolutional computation device may include: a two-dimensional circulation shift register unit that has a plurality of storage elements, cyclically shifts the data among the plurality of storage elements, provides at least one input window in a predetermined area, and selects the data stored in one of the storage elements disposed in the input window as input data; and at least one multiplier-accumulator that generates output data by performing a multiply-accumulate operation on the input data input from the two-dimensional circulation shift register unit and weight data from a predetermined filter.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other objects, features and advantages of the present disclosure will become more apparent from the below-described detailed description made with reference to the accompanying drawings. In the drawings:
FIG. 1 is a block diagram showing a DFP system according to the first embodiment of the present disclosure;
FIG. 2 is a block diagram showing a DFP according to the first embodiment of the present disclosure;
FIG. 3 is a block diagram showing a convolutional computation circuit according to the first embodiment of the present disclosure;
FIG. 4 is a block diagram showing a storage element according to the first embodiment of the present disclosure;
FIG. 5 is a block diagram showing a multiplier-accumulator according to the first embodiment of the present disclosure;
FIG. 6 is a schematic diagram showing selection of input data by shift operation according to the first embodiment of the present disclosure;
FIG. 7 is a conceptual diagram showing the movement of the input data range according to the first embodiment of the present disclosure;
FIG. 8 is a diagram showing a table of a series of shift operations according to the first embodiment of the present disclosure;
FIG. 9 is a block diagram showing a convolutional computation circuit according to the second embodiment of the present disclosure;
FIG. 10 is a diagram showing a table of a series of shift operations according to the second embodiment of the present disclosure;
FIG. 11 is a block diagram showing a convolutional computation circuit according to the third embodiment of the present disclosure;
FIG. 12 is a diagram showing a table of a constant matrix used in the multiply-accumulate operation of the third embodiment of the present disclosure; and
FIG. 13 is a conceptual diagram showing the movement of the input data range according to the third embodiment of the present disclosure.
DETAILED DESCRIPTION
As a result of detailed examination, the inventor found the following difficulty with the conceivable convolutional computation device. Because the convolution operation is converted into a matrix operation, the input data must first be converted into a form on which the matrix operation can be performed. The converted data contains overlapping copies of the original input data, so the hardware and the amount of data processing increase, and the power consumption increases accordingly.
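The overlap in the converted data can be made concrete with a minimal sketch. It assumes the matrix conversion is the common patch-unrolling approach often called im2col; the text does not name the method, so the function name `im2col`, the 5x5 input, and the 3x3 filter size are illustrative assumptions only.

```python
def im2col(image, k):
    """Unroll every k x k patch of the image into one row of a matrix."""
    rows, cols = len(image), len(image[0])
    patches = []
    for i in range(rows - k + 1):
        for j in range(cols - k + 1):
            patches.append([image[i + a][j + b] for a in range(k) for b in range(k)])
    return patches


# A 5 x 5 input whose elements are numbered 0..24 so copies are easy to spot.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
matrix = im2col(image, 3)

original_elements = sum(len(row) for row in image)      # elements stored once
converted_elements = sum(len(patch) for patch in matrix)  # elements after unrolling

# The center element (value 12) lies inside every 3 x 3 patch of a 5 x 5 input.
copies_of_center = sum(patch.count(12) for patch in matrix)

# With the patches unrolled, the convolution becomes a matrix-vector product.
flat_weights = [1] * 9  # hypothetical all-ones 3 x 3 filter, flattened
outputs = [sum(x * w for x, w in zip(patch, flat_weights)) for patch in matrix]
```

Under these assumptions the 25 original elements grow to 81 stored values, which is the kind of duplicated data handling the passage above identifies as a source of extra hardware and power consumption.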
A convolutional computation device with reduced power consumption is provided. One embodiment of the present disclosure provides a convolutional computation device including: a two-dimensional circulation shift register having a plurality of storage elements arranged two-dimensionally to store data, respectively, shifting the data cyclically among the plurality of storage elements, setting an input window in a predetermined area, and selecting the data stored in the storage elements in the input window as input data; and a multiplier-accumulator generating output data by performing a multiply-accumulate operation on the input data input from the two-dimensional circulation shift register and weight data from a predetermined filter. According to this, the power consumption in the convolutional computation device is reduced.
First Embodiment
A first embodiment of the present disclosure will be described below with reference to FIGS. 1 to 8. The convolutional computation circuit is used for a data flow processor (Data Flow Processor, hereinafter referred to as “