EP-4742052-A1 - MEMORY DEVICE AND OPERATING METHOD WITH DATA FORMAT OPERATION

Abstract

A memory device includes a processing-in-memory (PIM) block configured to perform an operation between a weight value, which is represented by a weight scale factor and a weight element, and an input value, which is represented by an input scale factor and an input element, wherein the PIM block includes a first scale register file storing the input scale factor, a second scale register file storing the weight scale factor, a scalar register file (SRF) storing the input element, a plurality of arithmetic logic units (ALUs) configured to, in response to an operation command received from a host, perform, in parallel, a first operation between the input scale factor and the weight scale factor and a second operation between the input element and the weight element, and an accumulator configured to accumulate and store an operation result from the first operation and the second operation.

Inventors

  • Lee, Sunjung
  • YU, JAEHOON
  • LEE, SUK HAN
  • CHA, SANGHOON
  • HAN, Wontak

Assignees

  • Samsung Electronics Co., Ltd.

Dates

Publication Date: 2026-05-13
Application Date: 2025-11-11

Claims (15)

  1. A memory device, comprising: a processing-in-memory, PIM, block configured to perform an operation between a weight value, which is represented by a weight scale factor and a weight element, and an input value, which is represented by an input scale factor and an input element, wherein the PIM block comprises: a first scale register file storing the input scale factor; a second scale register file storing the weight scale factor; a scalar register file, SRF, storing the input element; a plurality of arithmetic logic units, ALUs, configured to, in response to an operation command received from a host, perform, in parallel, a first operation between the input scale factor and the weight scale factor and a second operation between the input element and the weight element; and an accumulator configured to accumulate and store an operation result from the first operation and the second operation.
  2. The memory device of claim 1, further comprising a control circuit configured to, for the performing of the first operation and the second operation in parallel, in response to the operation command, provide, in parallel, a first operation signal instructing a first ALU among the plurality of ALUs according to the first operation and provide a second operation signal instructing a second ALU among the plurality of ALUs according to the second operation.
  3. The memory device of claim 2, wherein, for the accumulating and storing of the operation result, the control circuit is configured to provide the accumulator with a third operation signal indicating a conversion into the operation result represented in a specific data format from a first partial result of the first operation and a second partial result of the second operation, at a timing after a first specific cycle has elapsed from the first operation signal and the second operation signal.
  4. The memory device of claim 3, wherein, for the accumulating and storing of the operation result, the control circuit is configured to provide the accumulator with a fourth operation signal indicating an addition of the operation result to a pre-stored value of the accumulator, at a timing after a second specific cycle has elapsed from the third operation signal.
  5. The memory device of one of claims 1 to 4, further comprising: an accumulation register file, ARF, wherein the accumulator comprises: a data type converter configured to generate the operation result by merging a first partial result of the first operation and a second partial result of the second operation into a specific data format; and an adder configured to add the generated operation result to a pre-stored value in the ARF.
  6. The memory device of one of claims 1 to 5, wherein the input scale factor comprises an exponent component of the input value, the input element comprises a mantissa component of the input value, the weight scale factor comprises an exponent component of the weight value, and the weight element comprises a mantissa component of the weight value.
  7. The memory device of claim 6, wherein the plurality of ALUs comprises: a first ALU configured to perform, as the first operation, an addition of the exponent component of the input value and the exponent component of the weight value; and a second ALU configured to perform, as the second operation, a multiplication of the mantissa component of the input value and the mantissa component of the weight value.
  8. The memory device of one of claims 1 to 7, configured to, in response to receiving, from the host, a plurality of operation commands comprising the operation command, perform a dot product operation between an input vector comprising a plurality of input values and a weight matrix comprising a plurality of weight values.
  9. The memory device of one of claims 1 to 8, further comprising a memory bank storing the weight scale factor and the weight element, wherein the input scale factor and the input element are received from the host, in response to a command preceding the operation command, and the weight scale factor is loaded from the memory bank, in response to a command preceding the operation command.
  10. The memory device of one of claims 1 to 9, configured to receive, from the host, and process a first plurality of operation commands for data sharing a first scale factor and a second plurality of operation commands for data sharing a second scale factor, without a fence.
  11. The memory device of one of claims 1 to 10, configured to: in response to a command preceding the operation command, receive the input scale factor from the host and store the received input scale factor in a first scale register file; and in response to a command preceding the operation command, receive the input element from the host and store the received input element in the scalar register file, SRF; and in response to a command preceding the operation command, load the weight scale factor from a memory bank and store the loaded weight scale factor in the second scale register file.
  12. A method of operating a processing-in-memory, PIM, device comprising a plurality of arithmetic logic units, ALUs, configured to, for each of a plurality of input elements sharing a same input scale factor and each of a plurality of weight elements sharing a same weight scale factor, the method comprises performing: a first operation between the same input scale factor received from a first scale register file and the same weight scale factor received from a second scale register file; and a second operation between the respective input element received from a scalar register file, SRF, and the respective weight element.
  13. The operating method of claim 12, wherein the first operation and the second operation are performed in parallel by the plurality of ALUs, and further comprising: in response to the operation command, providing a first ALU among the plurality of ALUs with a first operation signal indicating the first operation; and providing a second ALU among the plurality of ALUs with a second operation signal indicating the second operation, wherein the providing of the first operation signal and the providing of the second operation signal are performed in parallel by a control circuit.
  14. The operating method of claim 13, further comprising storing an accumulated operation result by providing an accumulator with a third operation signal indicating a conversion into the operation result represented in a specific data format from a first partial result of the first operation and a second partial result of the second operation, at a timing after a first specific cycle has elapsed from the first operation signal and the second operation signal.
  15. The operating method of claim 14, wherein the storing of the accumulated operation result comprises providing the accumulator with a fourth operation signal indicating an addition of the operation result and a pre-stored value of the accumulator, at a timing after a second specific cycle has elapsed from the third operation signal.
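
As an illustration of claims 5 to 8 and 12, the following Python sketch models how one operation command might be handled in software terms. It is only a sketch under stated assumptions: it assumes each value equals its element multiplied by two raised to its scale factor, with integer elements, which the claims do not specify. The class name `PimBlockSketch`, the method `operate`, and the function `dot_product` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PimBlockSketch:
    """Toy software model of one PIM block (all names are hypothetical)."""
    input_scale: int = 0    # first scale register file: shared input exponent
    weight_scale: int = 0   # second scale register file: shared weight exponent
    srf: List[int] = field(default_factory=list)  # SRF: input elements (mantissas)
    arf: float = 0.0        # one accumulation register file (ARF) entry

    def operate(self, lane: int, weight_element: int) -> None:
        """Model one operation command for the input element in SRF slot `lane`."""
        # First operation (first ALU): add the two exponent components.
        exponent_sum = self.input_scale + self.weight_scale
        # Second operation (second ALU): multiply the two mantissa components.
        # Hardware would run both at once; here they are just independent steps.
        mantissa_product = self.srf[lane] * weight_element
        # Data type converter: merge both partial results into one value.
        # Assumption: value = element * 2**scale (not stated in the claims).
        merged = mantissa_product * 2.0 ** exponent_sum
        # Adder: add the merged result to the value already held in the ARF.
        self.arf += merged


def dot_product(input_scale, input_elements, weight_scale, weight_elements):
    """Issue one operation per element pair and return the accumulated result."""
    block = PimBlockSketch(input_scale, weight_scale, list(input_elements))
    for lane, weight_element in enumerate(weight_elements):
        block.operate(lane, weight_element)
    return block.arf
```

For instance, `dot_product(-2, [3, -1, 4], 1, [2, 5, -6])` accumulates (6 - 5 - 24) x 2^(-1) = -11.5. The exponent sum is identical for every element pair because the scale factors are shared, which is what allows the scale-factor addition to proceed alongside the element multiplication.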

Description

BACKGROUND

1. Field

The following description relates to a memory device and operating method with a data format operation.

2. Description of Related Art

Efficient and high-performance neural network processing is important for devices such as computers, smartphones, tablets, and wearables. Increased processing performance, combined with decreasing power consumption of these devices, has enabled the implementation of hardware accelerators dedicated to specialized tasks. For example, a plurality of hardware accelerators may be connected to generate a computation graph for applications such as natural language processing (NLP), language translation, and text generation. Therefore, a subsystem for accelerating NLP, language translation, and text generation may include a plurality of specialized hardware accelerators having efficient streaming interconnections for data transmission between the hardware accelerators.

A near-memory accelerator may be a hardware accelerator implemented near a memory. In-memory computing (IMC) may be an implementation of a hardware accelerator inside a memory.

SUMMARY

The invention is claimed in the independent claims. Preferred embodiments are specified in the dependent claims.

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

In one or more general aspects, a memory device includes a processing-in-memory (PIM) block configured to perform an operation between a weight value, which is represented by a weight scale factor and a weight element, and an input value, which is represented by an input scale factor and an input element, wherein the PIM block may include a first scale register file storing the input scale factor, a second scale register file storing the weight scale factor, a scalar register file (SRF) storing the input element, a plurality of arithmetic logic units (ALUs) configured to, in response to an operation command received from a host, perform, in parallel, a first operation between the input scale factor and the weight scale factor and a second operation between the input element and the weight element, and an accumulator configured to accumulate and store an operation result from the first operation and the second operation.

The memory device may include a control circuit configured to, for the performing of the first operation and the second operation in parallel, in response to the operation command, provide, in parallel, a first operation signal instructing a first ALU among the plurality of ALUs according to the first operation and provide a second operation signal instructing a second ALU among the plurality of ALUs according to the second operation.

For the accumulating and storing of the operation result, the control circuit may be configured to provide the accumulator with a third operation signal indicating a conversion into the operation result represented in a specific data format from a first partial result of the first operation and a second partial result of the second operation, at a timing after a first specific cycle has elapsed from the first operation signal and the second operation signal.

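The signal ordering described above can be pictured with a small Python sketch. It is an illustration only: the function name `signal_schedule` and its parameters are hypothetical, and the delay values passed in are placeholders, since the application refers only to a "first specific cycle" and a "second specific cycle" without giving numbers. The fourth operation signal is taken from claim 4.

```python
from typing import List, Tuple


def signal_schedule(issue_cycle: int,
                    first_delay: int,
                    second_delay: int) -> List[Tuple[int, str]]:
    """Return (cycle, signal) pairs for one operation command.

    first_delay and second_delay stand in for the unspecified
    "first specific cycle" and "second specific cycle".
    """
    # The conversion signal follows the two ALU signals by first_delay cycles.
    convert_cycle = issue_cycle + first_delay
    # The addition signal follows the conversion signal by second_delay cycles.
    add_cycle = convert_cycle + second_delay
    return [
        # The first and second signals are provided in parallel (same cycle).
        (issue_cycle, "first operation signal: first ALU, scale factors"),
        (issue_cycle, "second operation signal: second ALU, elements"),
        (convert_cycle, "third operation signal: accumulator, convert"),
        (add_cycle, "fourth operation signal: accumulator, add"),
    ]


if __name__ == "__main__":
    # Placeholder delays of 2 and 1 cycles, chosen only for the demonstration.
    for cycle, signal in signal_schedule(10, 2, 1):
        print(cycle, signal)
```

With the placeholder delays of 2 and 1, the four signals fall on cycles 10, 10, 12, and 13. Real delays would depend on ALU and converter latency, which this section does not state.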
For the accumulating and storing of the operation result, the control circuit may be configured to provide the accumulator with a fourth operation signal indicating an addition of the operation result to a pre-stored value of the accumulator, at a timing after a second specific cycle has elapsed from the third operation signal.

The memory device may include an accumulation register file (ARF), wherein the accumulator may include a data type converter configured to generate the operation result by merging a first partial result of the first operation and a second partial result of the second operation into a specific data format, and an adder configured to add the generated operation result to a pre-stored value in the ARF.

The input scale factor may include an exponent component of the input value, the input element may include a mantissa component of the input value, the weight scale factor may include an exponent component of the weight value, and the weight element may include a mantissa component of the weight value.

The plurality of ALUs may include a first ALU configured to perform, as the first operation, an addition of the exponent component of the input value and the exponent component of the weight value, and a second ALU configured to perform, as the second operation, a multiplication of the mantissa component of the input value and the mantissa component of the weight value.

The memory device may be configured to, in response to receiving, from the host, a plurality of operation commands comprising the operation command, perform a dot product operat