US-12619367-B2 - Block copy
Abstract
An interconnected stack of one or more Dynamic Random Access Memory (DRAM) die also has one or more custom logic, controller, or processor die. The custom die(s) of the stack include direct channel interfaces that allow direct access to memory regions on one or more DRAMs in the stack. The direct channels are time-division multiplexed such that each DRAM die is associated with a time slot on a direct channel. The custom die configures a first DRAM die to read a block of data and transmit it via the direct channel using a time slot that is assigned to a second DRAM die. The custom die also configures the second memory device to receive the first block of data in its assigned time slot and write the block of data.
Inventors
- Michael Raymond Miller
- Steven C. Woo
- Thomas Vogelsang
Assignees
- RAMBUS INC.
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-08-05
Claims (20)
- 1. An integrated circuit stack, comprising: a set of stacked memory integrated circuits comprising a first memory device, a second memory device, and a third memory device, each comprising memory cell arrays; a controller electrically coupled to, and stacked with, the set of stacked memory devices comprising at least a portion of the integrated circuit stack; and intra-device interconnect coupling the controller, the first memory device, the second memory device, and the third memory device to each other, the controller to configure the first memory device to read a first block of data from at least a first memory cell array of the first memory device and to transmit, in a first transmission, the first block of data via the intra-device stack interconnect, the controller to also configure the second memory device and the third memory device to receive the first block of data via the first transmission, the controller to also configure the second memory device to write the first block of data transmitted by the first memory device in the first transmission to at least a second memory cell array of the second memory device and the controller to also configure the third memory device to write the first block of data transmitted by the first memory device in the first transmission to at least a third memory cell array of the third memory device.
- 2. The integrated circuit stack of claim 1, wherein the second at least one memory cell array and the third memory cell array are each a one of directly above and directly below the first at least one memory cell array.
- 3. The integrated circuit stack of claim 2, wherein the first at least one memory cell array is a one of directly above and directly below the controller.
- 4. The integrated circuit stack of claim 1, wherein the controller communicates with the first memory device, the second memory device, and the third memory device via the intra-device stack interconnect using time-division multiplexing.
- 5. The integrated circuit stack of claim 4, wherein the time-division multiplexing uses time allocations that are cycled at a memory cell array cycle time.
- 6. The integrated circuit stack of claim 4, wherein the first memory device is configured to use a first time slot to transmit data to the controller, the second memory device is configured to use a second time slot to receive data from the controller, the third memory device is configured to use a third time slot to receive data from the controller, the first memory device to transmit the first block of data directly to the second memory device by transmitting the first block of data during the second time slot, the first memory device to transmit the first block of data directly to the third memory device by transmitting the first block of data during the third time slot.
- 7. The integrated circuit stack of claim 4, wherein the first memory device is configured to use a first time slot to receive data from the controller, the second memory device is configured to use a second time slot to receive data from the controller, the third memory device is configured to use a third time slot to receive data from the controller, the second memory device to receive the first block of data directly from the first memory device by sampling the first block of data during the second time slot, the third memory device to receive the first block of data directly from the first memory device by sampling the first block of data during the third time slot.
- 8. An integrated circuit stack, comprising: a set of stacked integrated circuit memory devices that include a first memory device, a second memory device, and a third memory device, the set of stacked memory devices each comprising memory cell circuitry; and a processing device electrically coupled to, and stacked with, the set of stacked memory devices to form a first device stack, the processing device comprising a controller, the controller to communicate data with the first set of stacked memory devices using time-division multiplexing wherein respective ones of the set of stacked memory devices communicate data with the controller using respective ones of a set of non-overlapping and periodically repeating time slots that repeat at a frequency, the first memory device to communicate data with the controller using a first non-overlapping and periodically repeating time slot of the set of non-overlapping and periodically repeating time slots, the second memory device to communicate data with the controller using a second non-overlapping and periodically repeating time slot of the set of non-overlapping and periodically repeating time slots, the third memory device to communicate data with the controller using a third non-overlapping and periodically repeating time slot of the set of non-overlapping and periodically repeating time slots, the controller to configure the first memory device to transmit a block of data directly to the second memory device using at least one instance of the second non-overlapping and periodically repeating time slot without the data being re-transmitted by the controller, the controller to configure the first memory device to transmit the block of data directly to the third memory device using at least one instance of the third non-overlapping and periodically repeating time slot without the data being re-transmitted by the controller, the controller to configure the second memory device to write the block of data transmitted by the first memory device in the at least one instance of the second non-overlapping and periodically repeating time slot to memory cell circuitry of the second memory device, the controller to configure the third memory device to write the block of data transmitted by the first memory device in the at least one instance of the third non-overlapping and periodically repeating time slot to memory cell circuitry of the third memory device.
- 9. The integrated circuit stack of claim 8, wherein the set of non-overlapping and periodically repeating time slots repeats with a duration substantially equal to a core cycle time of each of the set of stacked memory devices.
- 10. The integrated circuit stack of claim 8, wherein the controller is positioned in alignment with a first memory region of the first memory device, a second memory region of the second memory device, and a third memory region of the third memory device.
- 11. The integrated circuit stack of claim 10, wherein the controller, the first memory device, the second memory device, and the third memory device are electrically coupled using through-silicon vias.
- 12. The integrated circuit stack of claim 10, wherein communication between the controller and the set of stacked memory devices includes commands communicated via a command/address bus and data communicated via a data bus.
- 13. The integrated circuit stack of claim 10, wherein the set of stacked memory devices includes a fourth memory device, the fourth memory device to communicate with the controller using a fourth non-overlapping and periodically repeating time slot of the set of non-overlapping and periodically repeating time slots.
- 14. The integrated circuit stack of claim 13, wherein the controller is to configure the first memory device to communicate directly with the fourth memory device using at least one instance of the fourth non-overlapping and periodically repeating time slot.
- 15. The integrated circuit stack of claim 14, wherein the controller is to configure the first memory device to communicate directly with the second memory device using a second instance of the second non-overlapping and periodically repeating time slot and to communicate with the fourth memory device using a first instance of the fourth non-overlapping and periodically repeating time slot that is a next successive fourth non-overlapping and periodically repeating time slot after the instance of second non-overlapping and periodically repeating time slot.
- 16. A controller, comprising: a command/address interface to communicate commands and addresses with a plurality of stacked memory devices via first set of shared interconnections using respectively assigned time-division multiplexing slots to address commands and addresses transmitted by the controller to all of the plurality of stacked memory devices, to respective ones of the plurality of stacked memory devices, each of the first set of shared interconnections being connected with each of the plurality of stacked memory devices; and a data interface to communicate data with a plurality of the plurality of stacked memory devices via a second set of shared interconnections using time-division multiplexing that uses a plurality of ones of a set of non-overlapping and periodically repeating time slots assigned to respective ones of the plurality of stacked memory devices to address data communication between the controller and respectively addressed plurality of ones of the plurality of stacked memory devices, each of the second set of shared interconnections being connected with each of the plurality of stacked memory devices.
- 17. The controller of claim 16, wherein the controller is to configure a first memory device of the plurality of stacked memory devices to transmit data in a first instance of a first periodically repeating time slot that is assigned to a second memory device of the plurality of stacked memory devices for communication with the controller, and to configure the first memory device of the plurality of stacked memory devices to transmit data in a second instance of the first periodically repeating time slot that is assigned to a third memory device of the plurality of stacked memory devices for communication with the controller.
- 18. The controller of claim 17, wherein the controller is to configure the second memory device to receive data from the first memory device in the instance of the first periodically repeating time slot, and is to configure the third memory device to receive data from the first memory device in the second instance of the first periodically repeating time slot.
- 19. The controller of claim 16, wherein the controller is to transmit, to a first memory device of the plurality of stacked memory devices, a first indicator of a first instance of a first periodically repeating time slot that the first memory device is to use to transmit data, the first periodically repeating time slot being assigned to a second memory device of the plurality of stacked memory devices for communication with the controller, and the controller is also to transmit, to the first memory device of the plurality of stacked memory devices, a second indicator of a second instance of the first periodically repeating time slot that the first memory device is to use to transmit data, the second periodically repeating time slot being assigned to a third memory device of the plurality of stacked memory devices for communication with the controller.
- 20. The controller of claim 16, wherein the controller is to transmit, to a first memory device of the plurality of stacked memory devices, a first indicator of a first instance of a first periodically repeating time slot that the first memory device is to receive data from a second memory device of the plurality of stacked memory devices, the first periodically repeating time slot being assigned to the first memory device for communication with the controller, the controller is also to transmit, to the first memory device of the plurality of stacked memory devices, a second indicator of a second instance of the first periodically repeating time slot that the first memory device is to receive data from a third memory device of the plurality of stacked memory devices, the second periodically repeating time slot being assigned to the first memory device for communication with the controller.
Description
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is an isometric illustration of an integrated circuit device stack.
FIG. 2 is an isometric illustration of a high-bandwidth memory (HBM) compatible integrated circuit device stack.
FIGS. 3A-3C illustrate example time multiplexed operations on a shared direct channel.
FIGS. 4A-4C illustrate an example direct block copy on a shared direct channel.
FIGS. 5A-5C illustrate an example multiple destination direct block copy on a shared direct channel.
FIGS. 6A-6B illustrate an example of concurrent block copies on a shared direct channel.
FIG. 7 is a flowchart illustrating a method of operating memory devices in a device stack.
FIG. 8 is a flowchart illustrating a direct block copy among memory devices in a device stack.
FIG. 9 is a flowchart illustrating a method of direct communication among memory devices in a device stack.
FIG. 10 is a flowchart illustrating a method of configuring memory devices in a device stack for direct communication.
FIG. 11 is a block diagram of a processing system.
DETAILED DESCRIPTION OF THE EMBODIMENTS
In an embodiment, an interconnected stack of one or more Dynamic Random Access Memory (DRAM) die has one or more custom logic, controller, or processor die. Custom die may be attached as a last step and interconnected vertically with the DRAM die(s) by shared through-silicon via (TSV) connections that carry data and control signals throughout the stack. The custom die(s) of the stack may include interfaces that allow direct access to memory regions on one or more DRAMs in the stack. These interfaces may access DRAM memory regions via TSVs that are not used for I/O outside of the stack. These additional (e.g., per processing element) interfaces allow processing elements to have more direct access to the data in the DRAM stack than using other I/O's. These direct memory channels allow more rapid access to the data in the DRAM stack.
In an embodiment, the direct memory channels (direct channels) interconnect one or more DRAM regions on each DRAM die of the stack to the custom die. The direct channels may comprise command, address, and data busses that are shared between the multiple DRAM dies and the custom die. The direct channels are time-division multiplexed such that each DRAM die is associated with a time slot on a direct channel. The time slots may be configured such that each DRAM region is able to cycle at its core frequency while the custom die receives/transmits at a multiple of that core frequency. For example, if there are four DRAM dies in the stack, each DRAM die may generally transmit and/or receive in a unique one of 4 time slots while the custom die transmits and/or receives every time slot. Thus, the time slot assigned to a DRAM die may be used by the custom die to uniquely identify/address the die.
In an embodiment, the custom die configures a first DRAM die to read a block of data and transmit it via the intra-device stack interconnect using a time slot that is assigned to a second DRAM die. The custom die also configures the second memory device to receive the first block of data in its ‘normal’ (i.e., assigned) time slot and write the block of data. In this manner, the block of data is communicated directly between the first DRAM die and the second DRAM die without passing via the custom die. By not passing the block of data via the custom die, the additional time slots and latency that would be associated with the custom die receiving and then re-transmitting the block of data are avoided.
FIG. 1 is an isometric illustration of an integrated circuit device stack. In FIG. 1, processing system 100 comprises integrated circuit die 111, memory device die 131, and memory device die 132. Integrated circuit die 111, memory device die 131, and memory device die 132 are stacked with each other.
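The time-slot addressing and direct block copy described above can be sketched as a small behavioral model. This is an illustration under stated assumptions, not the implementation of the embodiments: the names `DramDie`, `DirectChannel`, `configure_direct_copy`, `tx_plan`, and `rx_plan` are hypothetical, the shared data bus is modeled as a single value per slot, and the command/address bus, core timing, and the custom die's own traffic are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DramDie:
    """Hypothetical model of one DRAM die on the shared direct channel."""
    die_id: int
    home_slot: int                                # slot assigned to this die
    cells: dict = field(default_factory=dict)     # address -> stored block
    tx_plan: dict = field(default_factory=dict)   # slot -> address to read and drive
    rx_plan: dict = field(default_factory=dict)   # slot -> address to sample and write


class DirectChannel:
    """Shared data bus: at most one die may drive it in any time slot."""

    def __init__(self, dies):
        self.dies = dies
        self.num_slots = len(dies)  # assumption: one slot per DRAM die

    def run_slot(self, tick):
        """Advance one time slot; return the block driven on the bus, if any."""
        slot = tick % self.num_slots
        drivers = [d for d in self.dies if slot in d.tx_plan]
        if len(drivers) > 1:
            raise RuntimeError("bus contention in slot %d" % slot)
        if not drivers:
            return None
        src = drivers[0]
        block = src.cells[src.tx_plan.pop(slot)]
        # Any die configured to sample this slot writes the block locally.
        for die in self.dies:
            if die is not src and slot in die.rx_plan:
                die.cells[die.rx_plan.pop(slot)] = block
        return block


def configure_direct_copy(src, src_addr, dst, dst_addr):
    """Stand-in for the custom die's configuration step: the source drives
    the bus in the destination's assigned slot, and the destination samples
    its own assigned slot."""
    src.tx_plan[dst.home_slot] = src_addr
    dst.rx_plan[dst.home_slot] = dst_addr


# Four dies, each assigned the slot equal to its index (an assumption).
dies = [DramDie(die_id=i, home_slot=i) for i in range(4)]
dies[0].cells[0x10] = b"block-A"
configure_direct_copy(dies[0], 0x10, dies[2], 0x20)

channel = DirectChannel(dies)
driven_ticks = [t for t in range(channel.num_slots)
                if channel.run_slot(t) is not None]
print(driven_ticks, dies[2].cells[0x20])
```

In this model the copy occupies only the destination's slot. Relaying the same block through the custom die would instead use the source's slot for the read and a later instance of the destination's slot for the write, which is the extra slot usage and latency the direct copy avoids.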
Integrated circuit die 111 includes a two-dimensional array with 3 rows and 4 columns of processing elements (PEs) and/or controllers 111aa-111cd. In other words, die 111, and processing elements 111aa-111cd in particular, may be or include memory controller circuitry and optionally other processing circuitry (e.g., a CPU). Memory device die 131 is illustrated with a two-dimensional array with 3 rows and 4 columns of memory regions 131aa-131cd. Likewise, memory device die 132 is illustrated with a two-dimensional array with 3 rows and 4 columns of memory regions 132aa-132cd. It should be understood that the selection of 3 rows and 4 columns is merely for the purposes of illustration. Any number of rows and/or columns is contemplated. Note that in FIG. 1, some DRAM regions (e.g., DRAM regions 131ca-131cc and 132ca-132cc) are obscured by die 111 or memory device die 131 and are therefore not visible in FIG. 1. In an embodiment of processing system 100, each PE/controller 111aa-111cd of integrated circuit die 111 is intercoupled to its nearest neighbors in the left and right directions and the front and back directions. In another embodiment of processin