CN-122024780-A - Memory device, operation method thereof and in-memory processing device
Abstract
Memory devices, methods of operating the same, and in-memory processing devices are disclosed. A memory device includes a processing-in-memory (PIM) block configured to perform an operation between a weight value and an input value, the weight value being represented by a weight scaling factor and a weight element, and the input value being represented by an input scaling factor and an input element. The PIM block includes a first scaling register file storing the input scaling factor; a second scaling register file storing the weight scaling factor; a scalar register file (SRF) storing the input element; a plurality of arithmetic logic units (ALUs) configured to perform, in parallel and in response to an operation command received from a host, a first operation between the input scaling factor and the weight scaling factor and a second operation between the input element and the weight element; and an accumulator configured to accumulate and store operation results of the first operation and the second operation.
Inventors
- LI XUANZHENG
- LIU ZAIXUN
- LI SHUOHAN
- Che Xiangxun
- Han Yuanduo
Assignees
- Samsung Electronics Co., Ltd. (三星电子株式会社)
Dates
- Publication Date
- 20260512
- Application Date
- 20251106
- Priority Date
- 20241112
Claims (20)
- 1. A memory device, comprising: an in-memory processing block configured to perform an operation between a weight value represented by a weight scaling factor and a weight element and an input value represented by an input scaling factor and an input element, wherein the in-memory processing block includes: a first scaling register file storing the input scaling factor; a second scaling register file storing the weight scaling factor; a scalar register file storing the input element; a plurality of arithmetic logic units configured to perform a first operation between the input scaling factor and the weight scaling factor and a second operation between the input element and the weight element in parallel in response to an operation command received from a host; and an accumulator configured to accumulate and store operation results from the first operation and the second operation.
- 2. The memory device of claim 1, further comprising: a control circuit configured to provide, in parallel and in response to the operation command, a first operation signal according to the first operation and a second operation signal according to the second operation, the first operation signal commanding a first arithmetic logic unit among the plurality of arithmetic logic units and the second operation signal commanding a second arithmetic logic unit among the plurality of arithmetic logic units, so that the first operation and the second operation are performed in parallel.
- 3. The memory device of claim 2, wherein, to accumulate and store the operation results, the control circuit is configured to provide a third operation signal to the accumulator after a first specific period has elapsed since the first and second operation signals were provided, the third operation signal indicating conversion of a first partial result of the first operation and a second partial result of the second operation into an operation result represented in a specific data format.
- 4. The memory device of claim 3, wherein, to accumulate and store the operation results, the control circuit is further configured to provide a fourth operation signal to the accumulator after a second specific period has elapsed since the third operation signal was provided, the fourth operation signal indicating addition of the operation result to a pre-stored value of the accumulator.
- 5. The memory device of claim 1, further comprising: an accumulation register file, wherein the accumulator comprises: a data type converter configured to generate an operation result by combining a first partial result of the first operation and a second partial result of the second operation into a specific data format; and an adder configured to add the generated operation result to a pre-stored value in the accumulation register file.
- 6. The memory device of claim 1, wherein the input scaling factor comprises an exponent component of the input value, the input element comprises a mantissa component of the input value, the weight scaling factor comprises an exponent component of the weight value, and the weight element comprises a mantissa component of the weight value.
- 7. The memory device of claim 6, wherein the plurality of arithmetic logic units comprises: a first arithmetic logic unit configured to perform, as the first operation, addition of the exponent component of the input value and the exponent component of the weight value; and a second arithmetic logic unit configured to perform, as the second operation, multiplication of the mantissa component of the input value and the mantissa component of the weight value.
- 8. The memory device of claim 2, wherein the control circuitry is further configured to perform a dot product operation between an input vector comprising a plurality of input values and a weight matrix comprising a plurality of weight values in response to receiving a plurality of operation commands comprising the operation commands from the host.
- 9. The memory device of any one of claims 1 to 7, further comprising a memory bank storing the weight scaling factor and the weight element, wherein the input scaling factor and the input element are received from the host in response to a command preceding the operation command, and the weight scaling factor is loaded from the memory bank in response to a command preceding the operation command.
- 10. The memory device of claim 2, wherein the control circuitry is further configured to receive and process a first plurality of operation commands for sharing data of the first scaling factor and a second plurality of operation commands for sharing data of the second scaling factor from the host without a barrier.
- 11. A method of operation of a memory device, comprising: receiving an operation command from a host; in response to the received operation command, performing, in parallel by a plurality of arithmetic logic units, a first operation between an input scaling factor of an input value and a weight scaling factor of a weight value and a second operation between an input element of the input value and a weight element of the weight value; and accumulating operation results from the first operation and the second operation and storing the accumulated operation results.
- 12. The method of operation of claim 11, wherein performing the first operation and the second operation in parallel by the plurality of arithmetic logic units comprises: in response to the operation command, providing a first operation signal indicating the first operation to a first arithmetic logic unit among the plurality of arithmetic logic units and providing a second operation signal indicating the second operation to a second arithmetic logic unit among the plurality of arithmetic logic units, wherein the providing of the first operation signal and the providing of the second operation signal are performed in parallel by a control circuit.
- 13. The method of operation of claim 12, wherein storing the accumulated operation results comprises providing a third operation signal to an accumulator after a first specific period has elapsed since the first and second operation signals were provided, the third operation signal indicating conversion of a first partial result of the first operation and a second partial result of the second operation into an operation result represented in a specific data format.
- 14. The method of operation of claim 13, wherein storing the accumulated operation results further comprises providing a fourth operation signal to the accumulator after a second specific period has elapsed since the third operation signal was provided, the fourth operation signal indicating addition of the operation result to a pre-stored value of the accumulator.
- 15. The method of operation of claim 11, wherein storing the accumulated operation results comprises: generating an operation result by combining a first partial result of the first operation and a second partial result of the second operation into a specific data format; and adding the generated operation result to a pre-stored value in an accumulation register file.
- 16. The method of operation of claim 11, wherein the input scaling factor comprises an exponent component of the input value, the input element comprises a mantissa component of the input value, the weight scaling factor comprises an exponent component of the weight value, and the weight element comprises a mantissa component of the weight value.
- 17. The method of operation of claim 16, wherein performing the first operation and the second operation in parallel by the plurality of arithmetic logic units comprises: performing, by a first arithmetic logic unit and as the first operation, addition of the exponent component of the input value and the exponent component of the weight value; and performing, by a second arithmetic logic unit and as the second operation, multiplication of the mantissa component of the input value and the mantissa component of the weight value.
- 18. The method of operation of any one of claims 11 to 17 further comprising performing a dot product operation between an input vector comprising a plurality of input values and a weight matrix comprising a plurality of weight values in response to receiving a plurality of operation commands comprising the operation commands from a host.
- 19. The method of operation of any one of claims 11 to 17, further comprising: in response to a command preceding the operation command, receiving the input scaling factor from the host and storing the received input scaling factor in a first scaling register file; in response to a command preceding the operation command, receiving the input element from the host and storing the received input element in a scalar register file; and in response to a command preceding the operation command, loading the weight scaling factor from a memory bank and storing the loaded weight scaling factor in a second scaling register file.
- 20. An in-memory processing apparatus, comprising: a plurality of arithmetic logic units configured to perform, for each of a plurality of input elements sharing a same input scaling factor and each of a plurality of weight elements sharing a same weight scaling factor: a first operation between the same input scaling factor received from a first scaling register file and the same weight scaling factor received from a second scaling register file; and a second operation between the respective input element and the respective weight element received from a scalar register file.
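The shared-scaling-factor arrangement of claim 20 can be illustrated with a small software model. The sketch below is only an illustration under stated assumptions, not the claimed hardware: it assumes each scaling factor is a power-of-two exponent (in the spirit of claims 6 and 16), that elements are small signed integers, and that the first operation is exponent addition. The names `Block` and `block_dot` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A group of elements sharing one scaling factor (hypothetical model)."""
    scale_exp: int        # shared scaling factor, assumed to be a power-of-two exponent
    elements: List[int]   # per-value elements, assumed to be signed integers


def block_dot(inputs: Block, weights: Block) -> float:
    """Dot product of two blocks that each carry a single shared scaling factor."""
    if len(inputs.elements) != len(weights.elements):
        raise ValueError("blocks must have the same number of elements")
    # First operation: combine the two shared scaling factors once per block pair.
    exp_sum = inputs.scale_exp + weights.scale_exp
    # Second operation: multiply each input element by its weight element.
    products = [x * w for x, w in zip(inputs.elements, weights.elements)]
    # Merge the two partial results into one floating-point value.
    return sum(products) * 2.0 ** exp_sum


inputs = Block(scale_exp=-2, elements=[1, 2, 3])    # values 0.25, 0.5, 0.75
weights = Block(scale_exp=1, elements=[4, -5, 6])   # values 8, -10, 12
result = block_dot(inputs, weights)
```

In this model the scaling-factor operation runs once for the whole block while the element multiplications run per pair, which is the work split the claim describes between the two kinds of operation.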
Description
Memory device, operation method thereof and in-memory processing device
The present application claims the benefit of Korean Patent Application No. 10-2024-0160451, filed on November 12, 2024, in the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.
Technical Field
The following description relates to memory devices, methods of operating the same, and in-memory processing devices.
Background
Efficient, high-performance neural network processing is important for devices such as computers, smartphones, tablets, and wearable devices. Hardware accelerators dedicated to specialized tasks may be used to improve processing performance and reduce the power consumption of such devices. For example, multiple hardware accelerators may be connected to form computational graphs for applications such as natural language processing (NLP), language translation, and text generation. Thus, a subsystem for accelerating NLP, language translation, and text generation may include multiple dedicated hardware accelerators with efficient streaming interconnections for data transfer between the hardware accelerators. A near-memory accelerator may be a hardware accelerator implemented near memory. In-memory computing (IMC) may be an implementation of a hardware accelerator within a memory.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one or more general aspects, a memory device includes a processing-in-memory (PIM) block configured to perform operations between weight values represented by weight scaling factors and weight elements and input values represented by input scaling factors and input elements. The PIM block may include a first scaling register file storing the input scaling factors; a second scaling register file storing the weight scaling factors; a scalar register file (SRF) storing the input elements; a plurality of arithmetic logic units (ALUs) configured to perform a first operation between the input scaling factors and the weight scaling factors and a second operation between the input elements and the weight elements in parallel in response to an operation command received from a host; and an accumulator configured to accumulate and store operation results from the first operation and the second operation.
The memory device may include a control circuit configured to provide a first operation signal and a second operation signal in parallel in response to the operation command, the first operation signal commanding a first ALU of the plurality of ALUs according to the first operation and the second operation signal commanding a second ALU of the plurality of ALUs according to the second operation, so that the first operation and the second operation are performed in parallel.
To accumulate and store the operation results, the control circuit may be configured to provide a third operation signal to the accumulator after a first specific period has elapsed since the first operation signal and the second operation signal were provided, the third operation signal indicating conversion of a first partial result of the first operation and a second partial result of the second operation into an operation result represented in a specific data format.
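The register files, the parallel first and second operation signals, and the delayed third operation signal described above, together with the fourth operation signal recited in claim 4, can be sketched as a toy software model. Everything in it is an assumption made for illustration: the class and field names are hypothetical, the periods are counted in abstract cycles, the scaling factors are treated as power-of-two exponents, and the weight element is passed in by the caller as a stand-in for data read from a memory bank.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PimBlockModel:
    """Toy model of the PIM block; all names and timings are hypothetical."""
    input_scale_rf: Dict[int, int] = field(default_factory=dict)   # first scaling register file
    weight_scale_rf: Dict[int, int] = field(default_factory=dict)  # second scaling register file
    srf: Dict[int, int] = field(default_factory=dict)              # scalar register file (input elements)
    acc: float = 0.0         # value currently held by the accumulator
    first_period: int = 1    # assumed cycles between signals 1/2 and signal 3
    second_period: int = 1   # assumed cycles between signal 3 and signal 4

    def execute(self, cycle: int, idx: int, weight_element: int) -> List[Tuple[int, str]]:
        """Process one operation command and return its (cycle, signal) timeline."""
        # First and second operation signals are issued in the same cycle.
        exp_sum = self.input_scale_rf[idx] + self.weight_scale_rf[idx]  # first ALU
        product = self.srf[idx] * weight_element                        # second ALU
        # Third signal: merge the two partial results into one value.
        convert_cycle = cycle + self.first_period
        result = product * 2.0 ** exp_sum
        # Fourth signal: add the merged result to the stored accumulator value.
        accumulate_cycle = convert_cycle + self.second_period
        self.acc += result
        return [
            (cycle, "first"),
            (cycle, "second"),
            (convert_cycle, "third"),
            (accumulate_cycle, "fourth"),
        ]


model = PimBlockModel()
model.input_scale_rf[0] = -1
model.weight_scale_rf[0] = 2
model.srf[0] = 3
timeline = model.execute(cycle=10, idx=0, weight_element=5)
```

The model evaluates the two ALU operations one after the other because it is ordinary sequential code. Only the returned timeline reflects that both signals are issued in the same cycle.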
To accumulate and store the operation results, the control circuit may be configured to provide a fourth operation signal to the accumulator after a second specific period has elapsed since the third operation signal was provided, the fourth operation signal indicating addition of the operation result to a pre-stored value of the accumulator.
The memory device may include an accumulation register file (ARF), wherein the accumulator may include a data type converter configured to generate an operation result by merging a first partial result of the first operation and a second partial result of the second operation into a specific data format, and an adder configured to add the generated operation result to a pre-stored value in the ARF.
The input scaling factor may include an exponent component of the input value, the input element may include a mantissa component of the input value, the weight scaling factor may include an exponent component of the weight value, and the weight element may include a mantissa component of the weight value.
The plurality of ALUs may include a first ALU configured to perform addition of an exponent component of an input value and an exponent component of a weight value as a first operation, and a second ALU configured to perform multiplication of a mantissa component of an input value and a mantiss