CN-122003964-A - Acceleration unit with modular architecture

CN122003964A

Abstract

A processing system (100) includes one or more Accelerator Units (AUs) (114), each having a modular architecture. To this end, each AU includes a connection circuit (116) and one or more memory stacks (122) disposed on the connection circuit. Each AU further includes one or more interposer dies (118), each disposed on the connection circuit such that each of the one or more interposer dies is communicatively coupled to a corresponding memory stack via the connection circuit. Furthermore, each interposer die of each AU includes a set of circuits (578, 580) configured to concurrently support two or more types of compute dies (300, 400).

Inventors

  • Alan D. Smith
  • Michael Mantor
  • Mark Fowler
  • Vydhyanathan Kalyanasundharam
  • Samuel Naffziger

Assignees

  • Advanced Micro Devices, Inc. (超威半导体公司)

Dates

Publication Date
2026-05-08
Application Date
2024-10-09
Priority Date
2023-10-09

Claims (20)

  1. An Accelerator Unit (AU) (114), the AU comprising: a connection circuit (116); one or more memory stacks (122) disposed on the connection circuit; and one or more interposer dies (118) disposed on the connection circuit such that each of the one or more interposer dies is communicatively coupled to a corresponding one of the one or more memory stacks via the connection circuit, wherein each of the one or more interposer dies is configured to concurrently support two or more compute dies (128).
  2. The AU of claim 1, wherein each compute die comprises a chiplet having one or more compute units (460).
  3. The AU of claim 1, wherein an interposer die of the one or more interposer dies is configured to concurrently support a first type of compute die and a second type of compute die, wherein the first type is different from the second type.
  4. The AU of claim 3, wherein the first type comprises a Core Complex Die (CCD) (300) and the second type comprises an accelerator core die (AD) (400).
  5. The AU of claim 3, wherein the interposer die of the one or more interposer dies comprises a first set of circuits (578) for supporting the compute die of the first type and a second set of circuits (580) for supporting the compute die of the second type, wherein the first set of circuits is different from the second set of circuits.
  6. The AU of claim 5, wherein the first set of circuits comprises cache coherency circuitry (234) and the second set of circuits comprises graphics coherency circuitry (236).
  7. The AU of claim 6, wherein the second set of circuits comprises a sub-network (238) configured to communicatively couple one or more ADs to a memory management circuit (240).
  8. The AU of claim 1, wherein the one or more interposer dies are disposed on the connection circuit such that each of the one or more interposer dies is communicatively coupled to one or more other interposer dies.
  9. A processing system (100), the processing system comprising: a memory (106); and an Accelerator Unit (AU) (104), the AU comprising: a connection circuit (116); one or more memory stacks (122) disposed on the connection circuit; and one or more interposer dies (118) disposed on the connection circuit such that each of the one or more interposer dies is communicatively coupled to a corresponding one of the one or more memory stacks via the connection circuit, wherein each of the one or more interposer dies is configured to concurrently support two or more compute dies (128).
  10. The processing system of claim 9, wherein an interposer die of the one or more interposer dies comprises interconnect circuitry (242) configured to communicatively couple the AU to the memory using a communication protocol.
  11. The processing system of claim 10, wherein the communication protocol comprises a peripheral component interconnect express (PCIe) protocol.
  12. The processing system of claim 9, wherein an interposer die of the one or more interposer dies is configured to concurrently support a first type of compute die and a second type of compute die, wherein the first type is different from the second type.
  13. The processing system of claim 12, wherein the interposer die of the one or more interposer dies comprises a first set of circuits (578) for supporting the compute die of the first type and a second set of circuits (580) for supporting the compute die of the second type, wherein the first set of circuits is different from the second set of circuits.
  14. The processing system of claim 9, wherein the one or more interposer dies are disposed on the connection circuit such that each of the one or more interposer dies is communicatively coupled to one or more other interposer dies.
  15. The processing system of claim 12, wherein the AU comprises a set of registers (905) configured to set one or more partitions (925) of the AU, wherein a partition of the one or more partitions comprises at least one compute die of the AU.
  16. An Accelerator Unit (AU) (104), the AU comprising: a connection circuit (116); one or more memory stacks (122) disposed on the connection circuit; an interposer die (118) disposed on the connection circuit such that the interposer die is communicatively coupled to the one or more memory stacks via the connection circuit; one or more compute dies (128) disposed on the interposer die; and one or more partitions (925), each partition including a respective one of the one or more compute dies and a respective one of the one or more memory stacks.
  17. The AU of claim 16, further comprising a set of registers (905) configured to set the one or more partitions.
  18. The AU of claim 16, wherein the one or more compute dies comprise a first type of compute die and a second type of compute die different from the first type.
  19. The AU of claim 18, wherein the interposer die comprises a first set of circuits (578) for supporting the compute die of the first type and a second set of circuits (580) for supporting the compute die of the second type, wherein the first set of circuits is different from the second set of circuits.
  20. The AU of claim 16, wherein the interposer die is disposed on the connection circuit such that the interposer die is communicatively coupled to one or more other interposer dies.
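Claims 15 and 17 describe a set of registers that configures partitions of the AU, each partition grouping at least one compute die. The patent specifies no programming interface for these registers, so the following is only an illustrative sketch of the concept; the class and method names are hypothetical:

```python
# Hypothetical model of the partition registers of claims 15 and 17.
# Only the concept comes from the patent: a register set that assigns
# each compute die of the AU to a partition.

class PartitionRegisters:
    """Models a register file mapping compute dies to partition IDs."""

    def __init__(self, num_compute_dies):
        # One register per compute die; the value is the partition ID.
        self.regs = [0] * num_compute_dies

    def set_partition(self, die_index, partition_id):
        self.regs[die_index] = partition_id

    def partitions(self):
        """Group die indices by their assigned partition ID."""
        groups = {}
        for die, pid in enumerate(self.regs):
            groups.setdefault(pid, []).append(die)
        return groups

# Example: split a hypothetical 8-die AU into two 4-die partitions.
regs = PartitionRegisters(num_compute_dies=8)
for die in range(4, 8):
    regs.set_partition(die, 1)
print(regs.partitions())  # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
```

In an actual device the partition registers would presumably also steer memory-stack affinity (claim 16 ties each partition to a memory stack), which this sketch omits.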

Description

Acceleration unit with modular architecture

Background

To execute applications, some processing systems include multiple processing devices, such as a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU), that execute instructions, perform operations, or both on behalf of the applications. Many of these processing devices include one or more dies having processor cores configured to execute instructions. The dies are disposed on a silicon interposer that connects the processor cores on the dies to other components of the processing system, such as a host device or memory. However, these silicon interposers are typically configured to support only a set number of dies, limiting the types of instructions and operations that the processing device is configured to perform and limiting the flexibility of the processing device. Furthermore, many of these silicon interposers are configured to support only certain types of dies, again limiting the types of instructions and operations that the processing device is configured to perform and the flexibility of the processing device.

Drawings

The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art, by referencing the accompanying drawings. The use of the same reference symbols in different drawings indicates similar or identical items.

FIG. 1 is a block diagram illustrating a processing system including an Accelerator Unit (AU) with a modular architecture, according to some implementations.
FIG. 2 is a block diagram illustrating an Active Interposer Die (AID) configured to concurrently support one or more compute dies, according to some implementations.
FIG. 3 is a block diagram illustrating an exemplary Core Complex Die (CCD), according to some implementations.
FIG. 4 is a block diagram illustrating an exemplary accelerator core die (AD), according to some implementations.
FIG. 5 is a block diagram illustrating an exemplary architecture for an AID, according to some implementations.
FIG. 6 is a block diagram illustrating an exemplary AU with a modular architecture, according to some implementations.
FIG. 7 is a block diagram illustrating an exemplary architecture of two or more interconnected AUs, according to some implementations.
FIG. 8 is a block diagram illustrating an exemplary AU configured to support two or more types of core dies, according to some implementations.
FIG. 9 is a block diagram illustrating exemplary operations for partitioning AUs having a modular architecture, according to some implementations.
FIGS. 10-12 are each a block diagram illustrating a respective partitioning scheme, according to some implementations.
FIGS. 13-17 are each a block diagram illustrating a respective exemplary architecture of an AU, according to some implementations.

Detailed Description

FIGS. 1-17 relate to a processing system comprising an Accelerator Unit (AU) with a modular architecture. Such a processing system is configured, for example, to execute one or more applications, such as High Performance Computing (HPC) applications, graphics applications, or both. HPC applications include resource-intensive applications such as machine learning applications, neural network applications, artificial intelligence applications, and the like. To execute the instructions and operations of these applications, the processing system includes one or more AUs, each having a modular architecture. For example, such a modular architecture includes connection circuitry implemented as a die. The modular architecture further includes one or more memory stacks disposed on the connection circuitry; such memory stacks include, for example, three-dimensional (3D) stacked memory having one or more memory layers.
In addition, the modular architecture includes one or more Active Interposer Dies (AIDs) disposed on the connection circuitry such that each AID is communicatively coupled to one or more of the memory stacks via the connection circuitry. The AIDs are further disposed on the connection circuitry such that each AID is communicatively coupled to each other AID disposed on the connection circuitry. To execute instructions and operations of one or more applications, each AID of the AU is configured to concurrently support one or more compute dies; that is, one or more compute dies are configured to be disposed on each AID. A compute die includes, for example, one or more chiplets, each chiplet including a processor core, a compute unit, or both, configured to execute one or more instructions, operations, or both, of one or more applications. To support these compute dies, each AID includes an extensible data fabric configured to communicatively couple each compute die supported by the AID to one or more memory stacks (e.g., via the connection circuitry), one or more other compute dies also supported by the same AID, o
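The modular structure described above (connection circuitry, AIDs coupled to memory stacks and to one another, and mixed compute-die types per AID) can be sketched as a small data model. This is purely illustrative; the class names and fields are assumptions, not anything specified in the patent, and only "CCD" and "AD" come from the text:

```python
from dataclasses import dataclass, field

# Illustrative data model of the AU topology described in the text.
# "CCD" and "AD" are the two compute-die types named in the patent;
# all class and field names here are hypothetical.

@dataclass
class ComputeDie:
    kind: str  # "CCD" (core complex die) or "AD" (accelerator core die)

@dataclass
class ActiveInterposerDie:
    # Memory stacks reachable via the connection circuitry.
    memory_stacks: list = field(default_factory=list)
    # Compute dies disposed on this AID.
    compute_dies: list = field(default_factory=list)

    def supports_mixed_types(self):
        return len({d.kind for d in self.compute_dies}) > 1

@dataclass
class AcceleratorUnit:
    aids: list = field(default_factory=list)

    def peers(self, aid):
        # Each AID is coupled to every other AID on the connection circuitry.
        return [a for a in self.aids if a is not aid]

# Example: one AU with two AIDs, each holding one CCD and one AD.
au = AcceleratorUnit(aids=[
    ActiveInterposerDie(memory_stacks=["stack0"],
                        compute_dies=[ComputeDie("CCD"), ComputeDie("AD")]),
    ActiveInterposerDie(memory_stacks=["stack1"],
                        compute_dies=[ComputeDie("CCD"), ComputeDie("AD")]),
])
print(au.aids[0].supports_mixed_types())  # True
print(len(au.peers(au.aids[0])))          # 1
```

The model captures only the connectivity claims: every AID sees its own memory stacks and every other AID, and a single AID may host dies of different types concurrently.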