
CN-121979484-A - System and method of operation of a three-dimensional accelerator, and multiply-accumulate device


Abstract

A system and method of operation of a three-dimensional accelerator and a multiply-accumulate device are disclosed. In one aspect, a system includes a plurality of memory layers, each including a set of memory banks. The system may include a multiply-accumulate layer comprising a multiply-accumulate array having a plurality of multiply-accumulate devices. Each of the multiply-accumulate devices may be coupled to a respective one of the set of memory banks by at least one via structure. The memory banks and multiply-accumulate devices may each be arranged in a predetermined number of rows and columns.
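The coupling described in the abstract can be pictured as a simple structural model. The sketch below makes several assumptions that the abstract itself does not state. Every memory layer's banks and the MAC devices are assumed to share one aligned rows x cols grid. The via structure under bank (row, col) of each layer is assumed to land on MAC device (row, col). The names `Bank`, `Stack3D`, `bank_for_mac` and `couplings` are hypothetical and illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Bank:
    """One memory bank, addressed by (layer, row, col). Hypothetical model."""
    layer: int
    row: int
    col: int


@dataclass
class Stack3D:
    """Hypothetical model: `num_layers` memory layers stacked over one MAC layer,
    with each layer's banks and the MAC devices laid out on the same
    predetermined rows x cols grid."""
    num_layers: int
    rows: int
    cols: int

    def bank_for_mac(self, layer: int, row: int, col: int) -> Bank:
        # Assumption: the via structure under bank (row, col) of a layer
        # reaches MAC device (row, col), i.e. a one-to-one grid match.
        if not (0 <= layer < self.num_layers
                and 0 <= row < self.rows
                and 0 <= col < self.cols):
            raise IndexError("coordinate outside the stack")
        return Bank(layer, row, col)

    def couplings(self) -> List[Tuple[Tuple[int, int], Bank]]:
        # One (MAC device coordinate, bank) pair per MAC device per memory layer.
        return [((r, c), self.bank_for_mac(l, r, c))
                for l in range(self.num_layers)
                for r in range(self.rows)
                for c in range(self.cols)]
```

For example, `Stack3D(num_layers=4, rows=2, cols=3).couplings()` yields 24 pairs. Each of the 6 MAC devices is paired with exactly one bank in each of the 4 memory layers.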

Inventors

  • Murat Kerem Akarwadar
  • SUN XIAOYU
  • BRIAN CLIFTON
  • Peng Xiaochen
  • MAKOTO YABUUCHI
  • CHI YUDE
  • ZHANG CONGYONG

Assignees

  • Taiwan Semiconductor Manufacturing Co., Ltd. (台湾积体电路制造股份有限公司)

Dates

Publication Date
2026-05-05
Application Date
2026-01-08
Priority Date
2025-01-09

Claims (10)

  1. A system for a three-dimensional accelerator, comprising: a plurality of memory layers, each including a set of memory banks; and a multiply-accumulate layer comprising a multiply-accumulate array having a plurality of multiply-accumulate devices, wherein each multiply-accumulate device of the plurality of multiply-accumulate devices is coupled to a respective memory bank of the set of memory banks by at least one via structure.
  2. The system of claim 1, wherein the set of memory banks and the plurality of multiply-accumulate devices are each arranged in a predetermined number of rows and columns.
  3. The system of claim 1, further comprising an input buffer circuit that provides at least a portion of an input vector to at least one row of the plurality of multiply-accumulate devices of the multiply-accumulate array.
  4. The system of claim 1, wherein the multiply-accumulate layer is defined on a first semiconductor die and the plurality of memory layers are defined on a plurality of second semiconductor dies stacked on top of the first semiconductor die.
  5. The system of claim 1, wherein a column of the plurality of multiply-accumulate devices of the multiply-accumulate array is used to generate a partial sum.
  6. The system of claim 1, wherein the set of memory banks of each of the plurality of memory layers is coupled to the multiply-accumulate array using a shared interconnect structure.
  7. The system of claim 1, wherein the set of memory banks includes a predetermined number of memory components.
  8. The system of claim 1, wherein the plurality of memory layers are coupled to the multiply-accumulate layer using a face-to-back stack having a plurality of hybrid bonds and a plurality of through-silicon vias.
  9. A multiply-accumulate apparatus, comprising: a multiply-accumulate array comprising a plurality of multiply-accumulate devices defined on a first semiconductor die; an input buffer for storing at least one input vector; and a plurality of interconnect structures, each corresponding to a respective row of the plurality of multiply-accumulate devices, the plurality of interconnect structures including a semiconductor via coupled to at least one second semiconductor die, wherein the multiply-accumulate array is configured to: receive the at least one input vector from the input buffer; receive a plurality of data values from the at least one second semiconductor die via the plurality of interconnect structures; and generate a set of partial sums using the at least one input vector and the plurality of data values.
  10. A method of operating a three-dimensional accelerator, comprising: storing a set of weight values in a memory layer of a three-dimensional accelerator circuit; receiving an input operand for a multiply-accumulate operation from an input buffer; providing the set of weight values from the memory layer to a set of multiply-accumulate tiles of a multiply-accumulate layer of the three-dimensional accelerator circuit, the set of weight values being provided using a set of via structures coupling the memory layer to the multiply-accumulate layer; and generating, using the multiply-accumulate layer, an output vector based on the set of weight values and the input operand.
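Claims 3, 5, 9 and 10 together describe a dataflow. Portions of an input vector enter the MAC array by row. Each MAC device combines its input with a weight value read from its memory bank through a via. Each column produces a partial sum, and the partial sums form an output vector. The sketch below is a functional model of that dataflow, not a circuit-level one.

It rests on one hypothetical mapping that the claims do not specify. Memory layer k holds the weight tile for input rows k*R to (k+1)*R - 1 of a larger weight matrix. Each layer therefore contributes one set of column partial sums, which are added into the output vector. The actual weight mapping is the subject of FIG. 4 and is not given in this section. The names `mac_array_partial_sums` and `accelerator_matvec` are illustrative only.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mac_array_partial_sums(x_chunk: Sequence[float], weights: Matrix) -> List[float]:
    """One pass through a rows x cols MAC array (functional model).

    Row r receives input element x_chunk[r] from the input buffer; MAC device
    (r, c) multiplies it by weights[r][c], the value read from its memory bank;
    column c accumulates its products into a single partial sum.
    """
    rows, cols = len(weights), len(weights[0])
    if len(x_chunk) != rows:
        raise ValueError("input chunk length must equal the number of MAC rows")
    partial = [0.0] * cols
    for c in range(cols):
        for r in range(rows):
            partial[c] += x_chunk[r] * weights[r][c]
    return partial


def accelerator_matvec(x: Sequence[float], layer_tiles: List[Matrix]) -> List[float]:
    """Hypothetical mapping: memory layer k stores the weight tile for input
    rows k*R .. (k+1)*R - 1, so each layer yields one set of partial sums
    that is accumulated into the output vector."""
    rows = len(layer_tiles[0])
    cols = len(layer_tiles[0][0])
    if len(x) != rows * len(layer_tiles):
        raise ValueError("input length must equal rows * number of memory layers")
    out = [0.0] * cols
    for k, tile in enumerate(layer_tiles):
        chunk = x[k * rows:(k + 1) * rows]  # portion of the input vector for this pass
        for c, p in enumerate(mac_array_partial_sums(chunk, tile)):
            out[c] += p
    return out
```

Take `x = [1, 2, 3, 4]` with two 2×2 tiles, `[[1, 0], [0, 1]]` and `[[1, 1], [2, 0]]`. The first memory layer contributes the partial sums `[1, 2]` and the second contributes `[11, 3]`. Together they give the output vector `[12, 5]`, the same result as multiplying `x` by the stacked 4×2 weight matrix.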

Description

Technical Field

An embodiment of the present disclosure provides a system and method of operation for a three-dimensional accelerator for artificial intelligence (AI) operations.

Background

An integrated circuit (IC) may contain various types of hardware circuit devices or logic, including field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), logic gates, registers, or transistors, in addition to various interconnections between circuit devices. The IC may be fabricated from a semiconductor material, for example as part of an electronic device such as a computer, portable device, smartphone, Internet of Things (IoT) device, or the like. The development and increasing complexity of ICs have driven demand for higher computational efficiency and speed. In particular, an IC may be configurable and/or programmable to perform computations in a sequence, or with variations, as required by a manufacturer, developer, technician, programmer, or the like.

Disclosure of Invention

A system for a three-dimensional accelerator includes a plurality of memory layers, each including a set of memory banks. The system includes a multiply-accumulate layer comprising a multiply-accumulate array having a plurality of multiply-accumulate devices. Each of the multiply-accumulate devices is coupled to a respective one of the set of memory banks by at least one via structure. A multiply-accumulate apparatus includes a multiply-accumulate array including a plurality of multiply-accumulate devices defined on a first semiconductor die. The multiply-accumulate apparatus comprises an input buffer for storing at least one input vector. The multiply-accumulate apparatus further comprises a plurality of interconnect structures, each corresponding to a respective row of the multiply-accumulate devices.
The interconnect structures include a semiconductor via coupled to at least one second semiconductor die. The multiply-accumulate array is configured to receive an input vector from the input buffer, to receive a plurality of data values from the second semiconductor die via the interconnect structures, and to generate a set of partial sums using the input vector and the data values. A method of operating a three-dimensional accelerator includes storing a set of weight values in a memory layer of a three-dimensional (3D) accelerator circuit. The method may include receiving input operands for a multiply-accumulate operation from an input buffer. The method may include providing the set of weight values from the memory layer to a set of multiply-accumulate tiles of a multiply-accumulate layer of the 3D accelerator circuit, the set of weight values being provided using a set of via structures that couple the memory layer to the multiply-accumulate layer. The method may further include generating, using the multiply-accumulate layer, an output vector based on the set of weight values and the input operands.

Drawings

The various aspects of the disclosure are best understood from the following detailed description when read with the accompanying drawing figures. It should be noted that, in accordance with standard practice in the industry, the various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.

FIG. 1 illustrates a perspective block diagram of an example three-dimensional (3D) accelerator circuit implemented to accelerate artificial intelligence (AI) operations, in accordance with some embodiments of the present disclosure;
FIG. 2 illustrates a perspective block diagram of an example multiply-accumulate (MAC) array layer that may be included in the 3D accelerator circuit of FIG. 1, in accordance with some embodiments of the present disclosure;
FIG. 3 illustrates a cross-sectional block diagram of an interconnection between a memory layer and a MAC layer of a 3D accelerator circuit described herein, in accordance with some embodiments of the present disclosure;
FIG. 4 illustrates a block diagram showing an example mapping of a set of weight values to memory components in a memory layer of a 3D accelerator circuit described herein, in accordance with some embodiments of the present disclosure;
FIG. 5 illustrates a block diagram showing how other data values stored according to the mapping shown in FIG. 4 may be processed using the 3D accelerator circuit described herein, in accordance with some embodiments of the present disclosure;
FIG. 6 illustrates a block diagram showing how further data values stored according to the mappings shown in FIGS. 4 and 5 may be processed using the 3D accelerator circuit described herein, in accordance with some embodiments of the present disclo