KR-102962746-B1 - A METHOD AND APPARATUS FOR PERFORMING PROCESS IN MEMORY

Abstract

The following disclosure relates to a memory device that performs processing in memory (PIM). The device comprises a weight handler that receives weights from a memory bank, a multiply-accumulate (MAC), and a register file that stores input data and operation results. The weight handler controls the distribution of the weights based on the specifications of the MAC, and the MAC performs operations on the weights and on the input received from the register file according to the result of that distribution.
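
The dataflow described in the abstract can be pictured with a small software model. This is only an illustrative sketch, not the patented hardware: the names `MacSpec`, `RegisterFile`, `weight_handler`, `mac`, and `pim_step` are hypothetical, and it assumes that "distribution" means handing the same weight row to every MAC unit, with each unit working on its own input vector.

```python
from dataclasses import dataclass, field


@dataclass
class MacSpec:
    """Hypothetical description of the MAC: only the unit count is modeled."""
    num_units: int


@dataclass
class RegisterFile:
    """Holds input vectors (one per MAC unit) and the operation results."""
    inputs: list
    results: list = field(default_factory=list)


def weight_handler(weight_row, spec):
    """Assumption: distribute one weight row from the bank to every MAC unit."""
    return [list(weight_row) for _ in range(spec.num_units)]


def mac(input_vec, weight_row):
    """Multiply-accumulate over one input vector and one weight row."""
    acc = 0
    for x, w in zip(input_vec, weight_row):
        acc += x * w
    return acc


def pim_step(bank_row, regfile, spec):
    """Read one weight row, distribute it, compute, and store the results."""
    distributed = weight_handler(bank_row, spec)
    regfile.results = [mac(x, w) for x, w in zip(regfile.inputs, distributed)]
    return regfile.results


rf = RegisterFile(inputs=[[1, 2, 3], [4, 5, 6]])
out = pim_step([1, 0, 2], rf, MacSpec(num_units=2))
print(out)
```

In this model the weight row is fetched once and used by both MAC units, which is the point of letting the weight handler, rather than each unit, talk to the memory bank.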

Inventors

  • 이원조
  • 차상훈
  • 이선정

Assignees

  • 삼성전자주식회사

Dates

Publication Date
2026-05-08
Application Date
2024-01-03

Claims (19)

  1. A memory device that performs Process In Memory (PIM), comprising: a weight handler that receives weights from a memory bank; a Multiply-Accumulate (MAC) including a plurality of MAC units; and a register file that stores input data and operation results, wherein the weight handler, based on the specifications of the MAC: distributes the weights to the plurality of MAC units using a broadcasting unit, in correspondence with the number of MAC units; controls the distribution of the weights through an asynchronous FIFO, in correspondence with the different frequencies of the MAC units; and stores the weights in a buffer so that an operation corresponding to the frequency of each MAC unit can be performed, and wherein the MAC performs an operation on the weights and on the input received from the register file, based on the result of the weight distribution.
  2. The memory device of claim 1, wherein the specifications of the MAC include at least one of the number of MAC units included in the MAC and information regarding the frequency of each MAC unit included in the MAC.
  3. (deleted)
  4. The memory device of claim 1, wherein the weight handler distributes the weights using the broadcasting unit, in correspondence with the number of MAC units, so that the MAC units can perform operations simultaneously.
  5. The memory device of claim 1, wherein the weight handler transfers the weights to the MAC through the asynchronous FIFO in correspondence with the frequency of the MAC unit.
  6. The memory device of claim 5, wherein the weight handler reuses the weights stored in the buffer so that an operation corresponding to the frequency of each MAC unit included in the MAC is performed.
  7. The memory device of claim 1, wherein the weight handler inputs the weights into the asynchronous FIFO in accordance with the specifications of the MAC, and distributes the weights output from the asynchronous FIFO to the MAC units through the broadcasting unit and the buffer.
  8. The memory device of claim 1, wherein the register file comprises one or more scalar register files (SRF) or one or more vector register files (VRF).
  9. The memory device of claim 1, wherein the memory includes a DRAM (Dynamic Random Access Memory).
  10. A method of operating a memory device that performs PIM, the method comprising: storing input data in a register file, reading weights from a memory bank, and passing the weights to a weight handler; controlling, through the weight handler, the distribution of the weights based on the specifications of a MAC; performing an operation on the weights and on the input received from the register file, based on the result of the weight distribution; and storing the operation result in the register file, wherein controlling the distribution of the weights comprises: distributing the weights to a plurality of MAC units using a broadcasting unit, in correspondence with the number of MAC units included in the MAC (Multiply-Accumulate) comprising the plurality of MAC units; controlling the distribution of the weights through an asynchronous FIFO, in correspondence with the different frequencies of the MAC units; and storing the weights in a buffer so that an operation corresponding to the frequency of each MAC unit can be performed.
  11. The method of claim 10, wherein the specifications of the MAC include at least one of the number of MAC units included in the MAC and information regarding the frequency of each MAC unit included in the MAC.
  12. (deleted)
  13. The method of claim 10, wherein controlling the distribution comprises distributing the weights using the broadcasting unit, in correspondence with the number of MAC units, so that the MAC units can perform operations simultaneously.
  14. The method of claim 10, wherein controlling the distribution comprises transmitting the weights to the MAC through the asynchronous FIFO in correspondence with the frequency of the MAC unit.
  15. The method of claim 14, wherein transmitting the weights to the MAC further comprises reusing the weights stored in the buffer so that an operation corresponding to the frequency of each MAC unit included in the MAC is performed.
  16. The method of claim 10, wherein controlling the distribution comprises inputting the weights into the asynchronous FIFO in accordance with the specifications of the MAC, and distributing the weights output from the asynchronous FIFO to the MAC units through the broadcasting unit and the buffer.
  17. The method of claim 10, wherein the register file comprises one or more scalar register files or one or more vector register files.
  18. The method of claim 10, wherein the memory includes a DRAM (Dynamic Random Access Memory).
  19. A computer program stored on a computer-readable recording medium that, in combination with hardware, executes the method of any one of claims 10, 11, and 13 through 18.
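
The three mechanisms recited in the claims (broadcasting unit, asynchronous FIFO, and reuse buffer) can be sketched in software as follows. This is a rough behavioral model under stated assumptions, not the claimed circuit: a `collections.deque` stands in for the asynchronous FIFO and does not model clock-domain crossing; each MAC unit's frequency is assumed to be an integer multiple of the memory clock (`freq_ratios`, a hypothetical parameter); and a unit running at ratio k is assumed to reuse its buffered weight row for k different input vectors per memory clock. The function and variable names are illustrative.

```python
from collections import deque


def dot(xs, ws):
    """Multiply-accumulate over one input vector and one weight row."""
    acc = 0
    for x, w in zip(xs, ws):
        acc += x * w
    return acc


def run_weight_handler(weight_rows, inputs_per_unit, freq_ratios):
    """Behavioral sketch of weight distribution to MAC units.

    weight_rows:     rows read from the memory bank, one per memory clock
    inputs_per_unit: inputs_per_unit[u] lists the input vectors of MAC unit u
    freq_ratios:     assumed integer ratio of each unit's clock to the memory clock
    """
    if len(inputs_per_unit) != len(freq_ratios):
        raise ValueError("one frequency ratio is required per MAC unit")
    for inputs, ratio in zip(inputs_per_unit, freq_ratios):
        if len(inputs) != ratio:
            raise ValueError("this sketch gives each unit one input per clock ratio")

    fifo = deque(weight_rows)  # stand-in for the asynchronous FIFO
    # results[u][i] collects one output per weight row for input i of unit u
    results = [[[] for _ in inputs] for inputs in inputs_per_unit]
    rows_consumed = 0  # each row is read from the bank once, then popped here once

    while fifo:
        row = fifo.popleft()
        rows_consumed += 1
        # Broadcasting unit: every MAC unit gets its own buffered copy of the row.
        buffers = [list(row) for _ in freq_ratios]
        for unit, ratio in enumerate(freq_ratios):
            # A faster unit reuses the buffered row for several inputs.
            for slot in range(ratio):
                x = inputs_per_unit[unit][slot]
                results[unit][slot].append(dot(x, buffers[unit]))
    return results, rows_consumed


weights = [[1, 0], [0, 1]]
inputs = [[[1, 2]], [[3, 4], [5, 6]]]  # unit 0 at 1x, unit 1 at 2x
results, rows_consumed = run_weight_handler(weights, inputs, freq_ratios=[1, 2])
print(results, rows_consumed)
```

With identity weights the outputs simply reproduce the inputs, which makes it easy to see that three input vectors were processed while only two weight rows were taken from the bank.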

Description

A memory device and apparatus for performing process in memory

The following disclosure relates to a memory device performing PIM and a method of operating the same.

PIM refers to a computing architecture in which memory and processing units are closely integrated or located in the same physical location. In traditional computing architectures, the processor and memory are separate, so data is typically transferred between the two elements for processing. In a PIM architecture, processing and memory functions are performed within the same physical location and, where possible, within the same hardware components. PIM can improve speed and efficiency by reducing data movement between the processor and memory. By placing processing closer to the data, PIM is intended to improve the performance of computations, particularly those involving large datasets.

FIG. 1 is a schematic diagram illustrating a general PIM operation of a memory device according to one embodiment.
FIG. 2 is a block diagram schematically illustrating a memory device according to one embodiment.
FIG. 3 is a flowchart for explaining the operation of a memory device according to one embodiment.
FIG. 4 briefly illustrates the structure of a memory device including a PIM unit according to one embodiment.
FIG. 5 is a schematic diagram illustrating an example of a general PIM operation according to one embodiment.
FIG. 6 schematically illustrates a PIM operation by a weight handler according to one embodiment.
FIGS. 7 and 8 are block diagrams schematically illustrating a PIM operation using a plurality of MAC units according to one embodiment.
FIGS. 9 and 11 are block diagrams schematically illustrating a PIM operation using one or more Faster MAC units according to one embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only and may be modified and implemented in various forms.
Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of this specification includes modifications, equivalents, or substitutions included in the technical concept described by the embodiments.

Terms such as "first" or "second" may be used to describe various components, but these terms should be interpreted solely for the purpose of distinguishing one component from another. For example, the first component may be named the second component, and similarly, the second component may be named the first component.

When it is stated that a component is "connected" to another component, it should be understood that it may be directly connected to or coupled with that other component, or that there may be other components in between.

The singular expression includes the plural expression unless the context clearly indicates otherwise. In this specification, terms such as "comprising" or "having" are intended to specify the existence of the described features, numbers, steps, actions, components, parts, or combinations thereof, and should be understood as not precluding the existence or addition of one or more other features, numbers, steps, actions, components, parts, or combinations thereof.

In this document, each of the phrases such as "A or B", "at least one of A and B", "at least one of A or B", "A, B or C", "at least one of A, B and C", and "at least one of A, B, or C" may include any one of the items listed together in the corresponding phrase, or all possible combinations thereof.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as generally understood by those skilled in the art.
Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant technology, and should not be interpreted in an idealized or overly formal sense unless explicitly defined in this specification.

The embodiments can be implemented in various forms of products, such as personal computers, laptop computers, tablet computers, smartphones, televisions, smart home appliances, intelligent vehicles, kiosks, and wearable devices. The embodiments will be described in detail below with reference to the attached drawings. In the description with reference to the attached drawings, identical components are given the same reference numeral regardless of the drawing number, and redundant descriptions thereof will be omitted.

FIG. 1 is a schematic diagram illustrating a general PIM operation of a memory device according to one embodiment.

Referring to FIG. 1, in a conventional PIM (Processing In Memory) operation, if M vectors are received as inputs (110), the memory device must repeat the reading of the inputs (110) and the reading of the weights (120) M times to perform operations on the inputs (110). That is, if the input (110) is an M × K input matrix, the memory device may need to read