KR-20260064116-A - METHOD OF OPERATION OF MEMORY DEVICE FOR ACCELERATING COMPUTING IN MEMORY

KR 20260064116 A

Abstract

The present application relates to a method of operating a memory device. A method of operating a memory device according to some embodiments of the present application may include: loading a first weight into a first memory macro to generate first data of first matrix data, and loading the first data into a second memory macro; loading a second weight into the first memory macro to generate m data (where m is a natural number) of second matrix data, loading the m data into the second memory macro, and performing a first matrix operation with the first data; and loading the first weight into the first memory macro to generate nth data (where n is a natural number other than 1) of the first matrix data, loading the nth data into the second memory macro, and performing the first matrix operation with the nth data among the m data of the second matrix data.
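The interleaving described in the abstract can be illustrated with a short sketch. Assuming the first matrix operation is a query-key dot product as in attention (NumPy, the variable names, and the matrix shapes are illustrative assumptions, not part of the disclosure), generation of the nth key row in the first macro alternates with score computation in the second macro:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 3                         # illustrative sizes

X = rng.standard_normal((m, d))     # input data
W_K = rng.standard_normal((d, d))   # first weight (key)
W_Q = rng.standard_normal((d, d))   # second weight (query)

# Step 1: load W_K into the first macro, generate the first key row,
# and hand it to the second macro.
k1 = X[0] @ W_K

# Step 2: load W_Q, generate the m query rows, and let the second macro
# perform the first matrix operation with k1 -> m result values (claim 4).
Q = X @ W_Q
first_col = Q @ k1                  # m scores against the first key row

# Step 3: reload W_K, regenerate the nth key row, and multiply it with
# the nth of the m stored query rows -> one result value each (claim 5).
diag = [Q[n] @ (X[n] @ W_K) for n in range(1, m)]

print(first_col.shape, len(diag))   # (3,) 2
```

The point of the ping-pong between the two weight loads is that key generation and score computation occupy different macros, so the steps can overlap in time, as the timing diagram of FIG. 4 suggests.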

Inventors

  • 박종선
  • 박준우
  • 이경호

Assignees

  • Korea University Research and Business Foundation (고려대학교 산학협력단)

Dates

Publication Date
2026-05-07
Application Date
2024-10-31

Claims (10)

  1. A method of operating a memory device, comprising: loading a first weight into a first memory macro to generate first data of first matrix data, and loading the first data into a second memory macro; loading a second weight into the first memory macro to generate m data (where m is a natural number) of second matrix data, loading the m data into the second memory macro, and performing a first matrix operation with the first data; and loading the first weight into the first memory macro to generate nth data (where n is a natural number other than 1) of the first matrix data, loading the nth data into the second memory macro, and performing the first matrix operation with the nth data among the m data of the second matrix data.
  2. The method of claim 1, wherein the second memory macro performs a transpose-matrix multiplication as the first matrix operation.
  3. The method of claim 1, wherein the first matrix data is matrix data based on a key, and the second matrix data is matrix data based on a query.
  4. The method of claim 1, wherein performing the first matrix operation with the first data generates m result values.
  5. The method of claim 1, wherein performing the first matrix operation with the nth data among the m data of the second matrix data generates one result value.
  6. A method of operating a memory device, comprising: performing, in a first memory macro, a second matrix operation between rth data (where r is a natural number) of third matrix data and an rth column of fourth matrix data; loading a third weight into a second memory macro to generate (r+1)th data of the third matrix data, and loading the (r+1)th data into the first memory macro; and performing the second matrix operation between the (r+1)th data of the third matrix data and an (r+1)th column of the fourth matrix data.
  7. The method of claim 6, wherein the first memory macro performs a matrix multiplication as the second matrix operation.
  8. The method of claim 6, wherein the third matrix data is matrix data based on a value, and the fourth matrix data is matrix data based on a result of a softmax operation.
  9. The method of claim 6, wherein the second matrix operation generates result values corresponding to the number of rows of the fourth matrix data.
  10. The method of claim 6, wherein loading the (r+1)th data is performed simultaneously with performing the second matrix operation between the rth data of the third matrix data and the rth column of the fourth matrix data.
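Claims 6 to 10 can be read as a column-by-column accumulation of the attention output, where the third matrix data are value rows and the fourth matrix data holds softmax results. A minimal sketch under that reading (NumPy, the names, and the shapes are illustrative assumptions; the simultaneity of claim 10 is only mimicked by the loop structure, not actually parallel here):

```python
import numpy as np

rng = np.random.default_rng(1)
L, d = 3, 4                          # illustrative sizes

X = rng.standard_normal((L, d))      # input data
W_V = rng.standard_normal((d, d))    # third weight (value)
S = rng.random((L, L))               # fourth matrix data: softmax results

out = np.zeros((L, d))
v = X[0] @ W_V                       # value row 0, generated ahead of time
for r in range(L):
    # In hardware, generating the (r+1)th value row in the second macro
    # overlaps with the rth second matrix operation in the first macro
    # (claim 10); here the two happen in the same loop iteration.
    v_next = X[r + 1] @ W_V if r + 1 < L else None
    out += np.outer(S[:, r], v)      # rth value row x rth column of S
    v = v_next

# out has one result row per row of S (claim 9)
assert np.allclose(out, S @ (X @ W_V))
```

Summing the rank-one products over r reproduces the full product of the softmax matrix with the value matrix, which is why the final assertion holds.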

Description

Method of Operation of Memory Device for Accelerating Computing in Memory

The present invention relates to a method of operating a memory device for accelerating computing in memory.

Deep neural networks (DNNs) are a machine learning technique currently used in various areas such as image analysis, object recognition, and image segmentation, and generate output values by multiplying input data and weights through matrix operations. Although a memory technology called computing in memory (CIM) has recently attracted attention as an accelerator for deep neural networks, it still requires a significant amount of time to perform multiple matrix operations.

FIG. 1 is a block diagram showing a memory device according to some embodiments of the present application. FIG. 2 is a block diagram schematically illustrating the steps of a first matrix operation method according to some embodiments of the present application. FIGS. 3A to 3E are drawings illustrating a first matrix operation method according to some embodiments of the present application. FIG. 4 is a timing diagram for a first matrix operation according to some embodiments of the present application. FIG. 5 is a block diagram schematically illustrating the steps of a second matrix operation method according to some embodiments of the present application. FIGS. 6A and 6B are drawings illustrating a second matrix operation method according to some embodiments of the present application. FIG. 7 is a block diagram showing a processor of a memory device according to some embodiments of the present application.

In the following, embodiments of the present invention are described clearly and in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing a memory device according to some embodiments of the present application. Referring to FIG. 1, a memory device (10) according to some embodiments may include a first memory macro (100), a second memory macro (200), and a control unit (300). In some embodiments, the memory device (10), as a device for performing computing-in-memory (CIM) operations, can perform various kinds of data processing and matrix operations. For example, the memory device (10) can execute various types of neural networks trained by machine learning and/or deep learning.

The memory macros (100, 200) may include a plurality of unit cells to store input data and weights and to perform matrix operations. Here, the plurality of unit cells may be composed of at least one of volatile memory and non-volatile memory. Volatile memory may include SRAM (static RAM), DRAM (dynamic RAM), and SDRAM (synchronous DRAM), and non-volatile memory may include ROM (read-only memory), PROM (programmable ROM), EEPROM (electrically erasable and programmable ROM), EPROM (erasable programmable ROM), flash memory, PRAM (phase-change RAM), MRAM (magnetic RAM), RRAM (resistive RAM), and FRAM (ferroelectric RAM). However, this is merely illustrative, and the present disclosure is not limited thereto.

The memory macros (100, 200) can be electrically connected to each other to transmit data. Specifically, the memory macros (100, 200) can perform one or more CIM operations based on stored weights and input data, and generate output data as the result of a CIM operation. Here, one of the memory macros (100, 200) transmits its output data to the other, in correspondence with the direction in which weights and/or input data are supplied, and the other memory macro can perform a matrix operation on the received output data and its own weights to output a result value. A more detailed explanation is provided below with reference to FIGS. 2 to 7.

The control unit (300) is electrically connected to the memory macros (100, 200) and can control the memory macros (100, 200).
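The macro-to-macro data path described above can be modeled in software (this is purely illustrative; the class and method names are assumptions, not the patent's architecture): each macro keeps its weight resident, and the output of one macro is streamed into the next for the following matrix operation.

```python
import numpy as np

class Macro:
    """Toy model of a CIM memory macro: the weight stays resident in the
    unit-cell array, and each call performs one matrix operation in place."""
    def __init__(self, weight):
        self.weight = np.asarray(weight)   # stored in the unit cells

    def compute(self, x):
        # in-memory multiply-accumulate over the stored weight
        return np.asarray(x) @ self.weight

rng = np.random.default_rng(2)
first = Macro(rng.standard_normal((4, 4)))   # e.g. holds a key weight
second = Macro(rng.standard_normal((4, 4)))  # e.g. holds stored query data

x = rng.standard_normal(4)
# The output of the first macro is routed to the second, which performs
# the next matrix operation on it and emits the result value.
result = second.compute(first.compute(x))
print(result.shape)                          # (4,)
```

Because each macro only ever reads the operand that arrives and its own resident weight, no weight traffic is needed between the two operations, which is the property the interleaved method of FIGS. 2 to 6 exploits.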
For example, the control unit (300) can transmit the weights and input data required for a CIM operation to either one of the memory macros (100, 200), and can receive the result values output from either one of the memory macros (100, 200). In some embodiments, the control unit (300) may generate a first weight, a second weight, a third weight, and a fourth weight from input data supplied from the outside. Here, the first weight may be a weight for a key, the second weight may be a weight for a query, the third weight may be a weight for a value, and the fourth weight may be a weight for the result of a softmax operation.

FIG. 2 is a block diagram schematically illustrating the steps of a first matrix operation method according to some embodiments of the present application. Referring to FIG. 2, the control unit (300) of the memory device (10) can load a first weight (W_K) and a second weight (W_Q) into the first memory macro (100). Specifically, the control unit (300) can generate the first weight (W_K) and the second weight (W_Q) from input data (X) supplied from the outside. The first memory macro (100) can load the first weight (W_K) and the second weight (W_Q)