
KR-20260065277-A - CONVOLUTIONAL OPERATION METHOD USING BLOCK-BASED DISTRIBUTED INPUT OUTPUT AND COMPRESSION


Abstract

The present invention relates to a convolution operation method using block-unit distributed input/output and compression, comprising: a first step of reading each row vector of a target matrix to be operated on from one auxiliary storage device into main memory, performing first zero padding and a first Fast Fourier Transform to obtain a first row vector set matrix, transposing and compressing the obtained first row vector set matrix, dividing the compressed result into first compression blocks that are submatrices, and distributing and storing the plurality of first divided blocks across a plurality of auxiliary storage devices; a second step of reading a column vector block from among the plurality of first compression blocks from the plurality of auxiliary storage devices, decompressing it and rearranging it by matrix transposition, performing second zero padding and a second Fast Fourier Transform to obtain a submatrix of the same size as the row vector set matrix, multiplying the obtained submatrix by a transfer function via the Hadamard product and then performing a first inverse Fast Fourier Transform to obtain a second row vector set matrix, unpadding, transposing, and compressing the obtained second row vector set matrix, dividing the compressed result into second compression blocks corresponding to the submatrices, and distributing and storing the plurality of second divided blocks across the plurality of auxiliary storage devices; and a third step of reading a row vector block from among the plurality of second compression blocks from the plurality of auxiliary storage devices, decompressing it, performing a second inverse Fast Fourier Transform to obtain a third row vector set matrix, unpadding the obtained third row vector set matrix, and storing it by sequential writing at the location of the target matrix. In the input/output process of the convolution operation, sequential access is thereby used for both row vectors and column vectors, improving the input/output performance of the auxiliary storage devices, and data compression reduces the data size so that the data can be distributed across four or more auxiliary storage devices, effectively reducing the input/output time required for the convolution operation.

Inventors

  • 김덕수
  • 이재홍

Assignees

  • 한국기술교육대학교 산학협력단

Dates

Publication Date
2026-05-08
Application Date
2024-11-01

Claims (7)

  1. A convolution operation method using block-unit distributed I/O and compression, comprising: a first step of reading each row vector of a target matrix to be operated on from one auxiliary storage device into main memory, performing first zero padding and a first Fast Fourier Transform to obtain a first row vector set matrix, transposing and compressing the obtained first row vector set matrix, dividing the compressed result into first compression blocks that are submatrices, and distributing and storing the plurality of first divided blocks across a plurality of auxiliary storage devices; a second step of reading a column vector block from among the plurality of first compression blocks from the plurality of auxiliary storage devices, decompressing it and rearranging it by matrix transposition, performing second zero padding and a second Fast Fourier Transform to obtain a submatrix of the same size as the row vector set matrix, multiplying the obtained submatrix by a transfer function via the Hadamard product and then performing a first inverse Fast Fourier Transform to obtain a second row vector set matrix, unpadding, transposing, and compressing the obtained second row vector set matrix, dividing the compressed result into second compression blocks corresponding to the submatrices, and distributing and storing the plurality of second divided blocks across the plurality of auxiliary storage devices; and a third step of reading a row vector block from among the plurality of second compression blocks from the plurality of auxiliary storage devices, decompressing it, performing a second inverse Fast Fourier Transform to obtain a third row vector set matrix, unpadding the obtained third row vector set matrix, and storing it by sequential writing at the location of the target matrix.
  2. The method of claim 1, wherein in the first step, each row vector of the target matrix is read into main memory by sequential access, and the plurality of first compression blocks are distributed and stored in the plurality of auxiliary storage devices by sequential access.
  3. The method of claim 2, wherein in the first step, a row-direction Fourier transform is performed on the target matrix, and the result is transposed, divided into blocks, and distributed and stored in the plurality of auxiliary storage devices.
  4. The method of claim 2, wherein in the second step, the column vector block among the plurality of first compression blocks is read from the plurality of auxiliary storage devices by sequential access, and the plurality of second compression blocks are distributed and stored in the plurality of auxiliary storage devices by sequential access.
  5. The method of claim 2, wherein in the second step, the transfer function multiplied with the submatrix via the Hadamard product has the same size as the Fourier space of the zero-padded target matrix.
  6. The method of claim 5, wherein in the second step, after the multiplication by the transfer function via the Hadamard product, the submatrix is cropped according to its size and position.
  7. The method of claim 6, wherein in the third step, the row vector block among the plurality of second compression blocks is read from the plurality of auxiliary storage devices by sequential access and stored at the location of the target matrix by sequential access.
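Ignoring the distribution to multiple storage devices and the compression that the claims perform between passes, the row-then-column FFT pipeline of claim 1 can be sketched in memory as follows. This is an illustrative NumPy outline only; the function name `fft2_convolve_stepwise` is hypothetical and not the patent's implementation.

```python
import numpy as np

def fft2_convolve_stepwise(A, K):
    """Sketch of a row-then-column 2D FFT convolution.

    A: target matrix, K: convolution kernel. Distribution across
    auxiliary storage devices and compression (performed between the
    FFT passes in the claims) are omitted; only the math is shown.
    """
    m = A.shape[0] + K.shape[0] - 1   # zero-padded row count
    n = A.shape[1] + K.shape[1] - 1   # zero-padded column count

    # Step 1: zero-pad each row and FFT along rows, then transpose so
    # the next pass can again proceed row by row (sequential access).
    S1 = np.fft.fft(A, n=n, axis=1).T          # first row vector set matrix, transposed

    # Step 2: zero-pad/FFT the new rows (the original columns), multiply
    # by the transfer function (the kernel's 2D FFT) via the Hadamard
    # product, inverse-FFT along that axis, and transpose back.
    S2 = np.fft.fft(S1, n=m, axis=1)           # completes the 2D FFT of padded A
    H = np.fft.fft2(K, s=(m, n)).T             # transfer function in matching layout
    S2 = np.fft.ifft(S2 * H, axis=1).T         # second row vector set matrix

    # Step 3: inverse FFT along rows; the result would be written back
    # sequentially to the location of the target matrix.
    return np.fft.ifft(S2, axis=1).real
```

The result equals the full linear convolution of `A` and `K`, because zero padding to size `(m, n)` makes the circular convolution implied by the FFT coincide with the linear one.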

Description

Convolution Operation Method Using Block-Unit Distributed Input/Output and Compression

The present invention relates to a convolution operation method using block-unit distributed input/output and compression, which can improve the input/output performance of auxiliary storage devices by using sequential access for both row vectors and column vectors during the input/output process of a convolution operation, and which can distribute data across four or more auxiliary storage devices by reducing the data size through data compression, thereby effectively reducing the input/output time required for the convolution operation.

As is well known, convolution is a mathematical operator that reverses and shifts one signal or function, multiplies it by another signal, and integrates the product. It is used to compose functions and signals and is applied in a wide range of fields such as artificial intelligence, signal processing, and holography. The convolution theorem states that the convolution of two signals is equal to the pointwise product of the two signals in Fourier space. The computational complexity of a direct convolution of one-dimensional functions is O(n²), but because the computational complexity of the Fast Fourier Transform (FFT) is O(n log n), applying the theorem reduces the complexity of the convolution and allows faster computation. Also, since the result given by the convolution theorem is a circular convolution, zero padding, that is, appending zeros to the discrete signal, must be performed to obtain a linear convolution. Here, the Fast Fourier Transform (FFT) is a fast algorithm that reduces the computational complexity of the Discrete Fourier Transform from O(n²) to O(n log n), and the computational complexity of the two-dimensional FFT is O(n² log n). Meanwhile, wavefront propagation is one technique that makes use of convolution operations.
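The convolution theorem and the role of zero padding described above can be demonstrated in a few lines of NumPy; the function name `fft_linear_convolve` is illustrative, not part of the patent:

```python
import numpy as np

def fft_linear_convolve(x, h):
    """Linear convolution via the convolution theorem.

    Both signals are zero-padded to length len(x) + len(h) - 1 so that
    the circular convolution produced by the FFT equals the linear one.
    """
    n = len(x) + len(h) - 1           # minimum zero-padded length
    X = np.fft.rfft(x, n)             # FFT of zero-padded x
    H = np.fft.rfft(h, n)             # FFT of zero-padded h
    return np.fft.irfft(X * H, n)     # pointwise product, inverse FFT

x = np.array([1.0, 2.0, 3.0])
h = np.array([0.0, 1.0, 0.5])
direct = np.convolve(x, h)            # O(n^2) direct linear convolution
fast = fft_linear_convolve(x, h)      # O(n log n) via the FFT
assert np.allclose(direct, fast)
```

Dropping the padding (using `n = len(x)`) would instead yield the circular convolution, which is exactly why zero padding appears in each step of the claimed method.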
To perform convolution operations on matrices larger than main memory, the data can be kept in auxiliary storage and processed in three stages: read, one-dimensional matrix operation, and write. By sequentially reading the row vectors of the matrix, performing a one-dimensional Fourier transform, transposing them into column vectors, and writing them in the column direction to auxiliary storage, reads use sequential access while writes use stride access; read performance improves, but write performance remains relatively low. In addition, convolution using multiple auxiliary storage devices is a technique that distributes the input/output load across several devices. By partitioning the matrix through implicit convolution and spreading the input/output across up to four auxiliary storage devices, the input/output load on each device is reduced, shortening the data input/output time of the operation. The conventional convolution operation described above has the problem that writing column vectors to auxiliary storage takes a long time because of the low write throughput of the stride access method, and performance improvement is limited because at most four auxiliary storage devices can be used. FIG. 1 is a flowchart illustrating a convolution operation process using block-unit distributed input/output and compression according to an embodiment of the present invention, and FIGS. 2 to 9 are drawings for explaining the detailed configuration of the convolution operation process using block-unit distributed input/output and compression according to an embodiment of the present invention.
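The block-unit distributed write with compression can be sketched as follows. This is a minimal illustration only: `zlib` stands in for whichever compressor the method uses, plain Python lists stand in for the auxiliary storage devices, and the function names are hypothetical.

```python
import zlib
import numpy as np

def distribute_blocks(M, num_devices, block_rows):
    """Split a matrix into row blocks, compress each block, and assign
    the blocks round-robin to devices (here: lists standing in for
    auxiliary storage). Compression shrinks each sequential write, so
    the I/O can be spread over more than four devices."""
    devices = [[] for _ in range(num_devices)]
    for i, start in enumerate(range(0, M.shape[0], block_rows)):
        block = np.ascontiguousarray(M[start:start + block_rows])
        payload = zlib.compress(block.tobytes())   # compressed sequential write
        devices[i % num_devices].append((start, block.shape, payload))
    return devices

def gather_blocks(devices, shape, dtype):
    """Read the blocks back sequentially per device, decompress them,
    and reassemble the matrix at its original location."""
    M = np.empty(shape, dtype=dtype)
    for dev in devices:
        for start, bshape, payload in dev:
            raw = zlib.decompress(payload)
            M[start:start + bshape[0]] = np.frombuffer(raw, dtype=dtype).reshape(bshape)
    return M
```

Because each block is written and read as one contiguous compressed payload, every device sees only sequential access, which is the access pattern the present invention relies on for both row-vector and column-vector passes.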
The advantages and features of the embodiments of the present invention, and the methods for achieving them, will become clear by referring to the embodiments described below in detail together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below but may be implemented in various different forms. These embodiments are provided merely to ensure that the disclosure of the present invention is complete and to fully inform those skilled in the art of the scope of the invention, and the present invention is defined only by the scope of the claims. Throughout the specification, the same reference numerals refer to the same components. In describing the embodiments of the present invention, specific descriptions of known functions or configurations will be omitted if it is determined that such detailed descriptions could unnecessarily obscure the essence of the invention. Furthermore, the terms described below are defined in consideration of their functions in the embodiments of the present invention, and these definitions may vary depending on the intentions or practices of the user or operator. Therefore, such definitions should be based on the content throughout this specification. Hereinafter