CN-121998007-A - Three-dimensional integrated and substrate interconnected memory and calculation integrated neural network processor

Abstract

The application relates to a memory-computing integrated neural network processor with heterogeneous storage media based on three-dimensional integration and substrate interconnection, and a control method thereof. The processor comprises an off-chip volatile memory chip, an off-chip nonvolatile memory chip and a logic chip. The off-chip volatile memory chip is connected with the logic chip and is used for information interaction with the logic chip; the off-chip nonvolatile memory chip is connected with the logic chip and is used for information interaction with the logic chip. The logic chip is configured to implement any combination of executing a computing task, buffering data and performing information interaction. The off-chip volatile memory chip and the logic chip are integrated in the vertical direction by a three-dimensional stacking technology to form a three-dimensional integrated module, and the three-dimensional integrated module and the off-chip nonvolatile memory chip are integrated by a substrate interconnection technology. Within the three-dimensional integrated module, the off-chip volatile memory chip and the logic chip are bonded on the same surface (coplanar bonding) so as to meet the high-speed access and high-bandwidth data interaction requirements of the off-chip volatile memory chip.

Inventors

CHEN PEIYU
GE XIAOHUAN
WANG ZHIXUAN
LIU YING

Assignees

杭州微纳核芯电子科技有限公司

Dates

Publication Date: 20260508
Application Date: 20260409

Claims (20)

1. A three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor, comprising: an off-chip volatile memory chip connected with a logic chip and used for information interaction with the logic chip; an off-chip nonvolatile memory chip connected with the logic chip and used for information interaction with the logic chip; and the logic chip, configured to implement any combination of the following functions: executing a computing task, caching data, and performing information interaction; wherein the off-chip volatile memory chip and the logic chip are integrated by a three-dimensional stacking technology to form a three-dimensional integrated module, and the three-dimensional integrated module and the off-chip nonvolatile memory chip are integrated by a substrate interconnection technology; and wherein, in the three-dimensional integrated module, the off-chip volatile memory chip and the logic chip are bonded in a coplanar bonding mode.
2. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 1, wherein the logic chip is a computing chip for executing computing tasks, and a digital in-memory computing unit is integrated inside the logic chip.
3. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 1, wherein the logic chip is a data cache chip for caching data.
4. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 1, wherein the logic chip is an interface chip for performing information interaction.
5. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 2, wherein at least one off-chip nonvolatile memory chip controller, at least one off-chip volatile memory chip controller, and a global controller are integrated on the logic chip, wherein: the off-chip nonvolatile memory chip controller is used for accessing the off-chip nonvolatile memory; the off-chip volatile memory chip controller is used for accessing the off-chip volatile memory; and the global controller is used for managing the off-chip volatile memory chip controller, the off-chip nonvolatile memory chip controller and the digital in-memory computing unit, and for dynamically adapting the data storage scheme and the computing data flow so as to globally optimize storage capacity, data bandwidth, access latency and computing efficiency.
6. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of any one of claims 1-5, wherein the off-chip volatile memory chip is a DRAM chip.
7. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 6, wherein, in a large language model computing scenario, the off-chip volatile memory chip is configured to store key-value (KV) cache data, activation values, and intermediate computing results.
8. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of any one of claims 1-5, wherein the off-chip nonvolatile memory chip is Flash.
9. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 8, wherein, in a large language model computing scenario, the off-chip nonvolatile memory chip is configured to store Q-generation weights, K-generation weights, V-generation weights, attention weights, and FFN weights.
10. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of any one of claims 1-5, wherein the off-chip volatile memory chip consists of one off-chip volatile memory module, or the off-chip volatile memory chip is formed by integrating at least two off-chip volatile memory modules using a three-dimensional stacking technology, wherein the off-chip volatile memory modules are integrated front to front, and, in the structure formed by three-dimensionally stacking the at least two off-chip volatile memory modules, the side close to the logic control layer is defined as the front of the off-chip volatile memory chip and the side close to the memory array layer is defined as the back of the off-chip volatile memory chip.
11. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of any one of claims 1-5, wherein the off-chip nonvolatile memory chip consists of one off-chip nonvolatile memory module, or the off-chip nonvolatile memory chip is formed by integrating at least two off-chip nonvolatile memory modules using a three-dimensional stacking technology, wherein the off-chip nonvolatile memory modules are integrated front to front, and, in the structure formed by three-dimensionally stacking the at least two off-chip nonvolatile memory modules, the side close to the logic control layer is defined as the front of the off-chip nonvolatile memory chip and the side close to the memory array layer is defined as the back of the off-chip nonvolatile memory chip.
12. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 2, wherein the logic chip consists of one logic module, or the logic chip is formed by integrating at least two logic modules using a three-dimensional stacking technology.
13. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 1, wherein the three-dimensional stacking technology is one or more of a through-silicon-via technology, a hybrid bonding technology, or a flip-chip technology.
14. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 2, wherein the digital in-memory computing unit is formed by integrating an on-chip volatile memory and a computing unit, the on-chip volatile memory being one or more of SRAM, eDRAM, DRAM, Flash, MRAM, and ReRAM.
15. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 5, wherein the global controller is configured to perform fine-grained dynamic scheduling of data based on the interconnection characteristics of the bonding scheme of the three-dimensional stacking technology and the inherent properties of each storage medium.
16. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 5, wherein the global controller is configured to support configuration of data storage mapping rules by upper-level software.
17. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 5, wherein said off-chip volatile memory chip comprises an interface compliant with a JEDEC standard, the interface being adapted to said off-chip volatile memory chip controller such that said neural network processor is accessible by external devices as standard DRAM memory; and the protocol of the interface satisfies one or more of LPDDR5, LPDDR6, HBM2, HBM3e, HBM4, GDDR5, GDDR6, GDDR7, DDR5, and DDR6.
18. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 8, wherein said Flash comprises an interface conforming to JEDEC or industry-wide storage interface standards, such that the Flash of said neural network processor is accessible as standard Flash memory; and the protocol of the interface satisfies one or more of SPI, QSPI, Octal SPI, eMMC, UFS, NVMe, and Parallel Flash.
19. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of claim 1, wherein the off-chip volatile memory chip or the logic chip, and the off-chip nonvolatile memory chip, are interconnected with the substrate through a silicon interposer.
20. The three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor of any one of claims 1-5, 12-17 and 19, wherein the off-chip volatile memory chip is stacked above the logic chip, the front side of the off-chip volatile memory chip being three-dimensionally integrated with the front side of the logic chip, and the back side of the logic chip being interconnected with the substrate.
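
For illustration only, the data placement recited in claims 7 and 9, together with the software-configurable mapping rules of claim 16, can be modeled as a small lookup table. This is a minimal sketch, not the claimed hardware: the names `Medium`, `DEFAULT_MAPPING`, `GlobalControllerModel`, `configure` and `place`, and the string labels for each kind of data, are assumptions introduced here and do not appear in the application.

```python
from enum import Enum


class Medium(Enum):
    """The two off-chip media of claim 1 (member names are hypothetical)."""
    VOLATILE_DRAM = "3D-stacked with the logic chip"
    NONVOLATILE_FLASH = "substrate-interconnected"


# Default placement following claims 7 and 9; the keys are hypothetical labels.
DEFAULT_MAPPING = {
    "kv_cache": Medium.VOLATILE_DRAM,
    "activation": Medium.VOLATILE_DRAM,
    "intermediate_result": Medium.VOLATILE_DRAM,
    "q_weight": Medium.NONVOLATILE_FLASH,
    "k_weight": Medium.NONVOLATILE_FLASH,
    "v_weight": Medium.NONVOLATILE_FLASH,
    "attention_weight": Medium.NONVOLATILE_FLASH,
    "ffn_weight": Medium.NONVOLATILE_FLASH,
}


class GlobalControllerModel:
    """Toy software model of the mapping-rule table only, not of the hardware."""

    def __init__(self):
        # Copy the defaults so that per-instance overrides do not leak.
        self._mapping = dict(DEFAULT_MAPPING)

    def configure(self, data_kind, medium):
        """Let upper-level software add or override one mapping rule (claim 16)."""
        if not isinstance(medium, Medium):
            raise TypeError("medium must be a Medium member")
        self._mapping[data_kind] = medium

    def place(self, data_kind):
        """Return the medium that should hold this kind of data."""
        try:
            return self._mapping[data_kind]
        except KeyError:
            raise ValueError(f"no mapping rule for {data_kind!r}") from None


controller = GlobalControllerModel()
print(controller.place("kv_cache").name)
print(controller.place("ffn_weight").name)
```

In this sketch, frequently rewritten data defaults to the stacked DRAM and read-mostly weights default to Flash, and a call such as `controller.configure("activation", Medium.NONVOLATILE_FLASH)` stands in for a rule change issued by upper-level software.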

Description

Three-dimensional integrated and substrate interconnected memory and calculation integrated neural network processor
Technical Field
The application relates to the technical field of semiconductor chips, and in particular to a heterogeneous-storage-medium memory-computing integrated neural network processor based on three-dimensional integration and substrate interconnection, and a control method thereof.
Background
Large language models (LLMs) are a core evolution direction of natural language processing (NLP) technology in the field of artificial intelligence (AI) and have developed exponentially in recent years. Their technical maturity and breadth of application continue to advance, and they show irreplaceable advantages in many key fields. Meanwhile, the development of LLMs has gradually revealed a core rule that is widely verified by academia and industry, namely the scaling law. Its core meaning is that, given sufficient high-quality training data and matching computing resources, the core performance indicators of LLMs (such as the perplexity of language generation and the accuracy on downstream tasks) show a significant positive correlation with the model parameter scale. As shown in fig. 1, the model performance score increases continuously as the number of parameters increases exponentially.
Conventional computer systems trade off capacity, speed and cost through a memory hierarchy (register → on-chip cache → main memory → local disk → remote storage), as shown in fig. 3, with each tier differing significantly in relative cost, storage density, bandwidth, access latency and power consumption.
However, when handling the integration and interconnection of heterogeneous storage media (volatile memory versus nonvolatile memory), existing three-dimensional integration schemes often fail to match device physical characteristics to task requirements in a refined way, and mainly suffer from the following technical pain points. Existing integration schemes often adopt a uniform stacking or interconnection strategy for different types of memory and ignore the large differences in their access characteristics. The access latency of volatile memory chips (e.g., DRAM) is typically on the order of nanoseconds (ns), and their performance depends heavily on the bit width and signal integrity of the I/O interface, whereas the latency of nonvolatile memory chips (e.g., Flash) is on the order of microseconds (μs) and is limited by internal mechanisms rather than link bandwidth. In the prior art, if the nonvolatile memory is blindly incorporated into a high-density three-dimensional stack, increasing the I/O density does not improve its performance linearly; instead, expensive vertical interconnection resources (such as TSVs) are wasted. In addition, in a three-dimensional stacked structure of a logic chip and a volatile memory chip, both chips are core devices with high power consumption and high heat generation, so tight three-dimensional stacking causes a thermal coupling problem: heat is easily conducted and accumulated between the chips, or blocked heat dissipation paths degrade performance. Accordingly, there is a need to improve existing neural network processors.
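The latency contrast described above can be made concrete with a first-order model in which one access costs a fixed internal latency plus the link transfer time. The sketch below is illustrative only: the function names, the 4 KiB access size, the 10 GB/s baseline link and the 50 ns and 50 μs latencies are assumptions chosen to match the nanosecond and microsecond orders of magnitude stated in the text, not figures from the application.

```python
def access_time_ns(size_bytes, latency_ns, bandwidth_gb_per_s):
    """First-order model: fixed internal latency plus link transfer time."""
    # 1 GB/s equals 1 byte per nanosecond, so size / bandwidth is in ns.
    return latency_ns + size_bytes / bandwidth_gb_per_s


def speedup_from_wider_link(size_bytes, latency_ns, bandwidth_gb_per_s, factor):
    """Speedup of one access when the link bandwidth is multiplied by factor."""
    base = access_time_ns(size_bytes, latency_ns, bandwidth_gb_per_s)
    wide = access_time_ns(size_bytes, latency_ns, bandwidth_gb_per_s * factor)
    return base / wide


# All numbers below are assumed, illustrative values.
ACCESS_BYTES = 4096           # one 4 KiB access
LINK_GB_PER_S = 10.0          # baseline link bandwidth
DRAM_LATENCY_NS = 50.0        # nanosecond-order latency (volatile memory)
FLASH_LATENCY_NS = 50_000.0   # microsecond-order latency (nonvolatile memory)

dram_gain = speedup_from_wider_link(ACCESS_BYTES, DRAM_LATENCY_NS, LINK_GB_PER_S, 4)
flash_gain = speedup_from_wider_link(ACCESS_BYTES, FLASH_LATENCY_NS, LINK_GB_PER_S, 4)
print(f"DRAM speedup from a 4x wider link:  {dram_gain:.2f}x")
print(f"Flash speedup from a 4x wider link: {flash_gain:.3f}x")
```

Under these assumed numbers, quadrupling the link bandwidth speeds up the DRAM access by roughly a factor of three but improves the Flash access by less than one percent, which is the reasoning behind reserving dense vertical interconnect for the volatile memory and connecting the nonvolatile memory through the substrate.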
Disclosure of Invention
The application provides a three-dimensionally integrated and substrate-interconnected memory-computing integrated neural network processor, comprising an off-chip volatile memory chip, an off-chip nonvolatile memory chip and a logic chip. The off-chip volatile memory chip is connected with the logic chip and is used for information interaction with the logic chip; the off-chip nonvolatile memory chip is connected with the logic chip and is used for information interaction with the logic chip. The logic chip is configured to implement any combination of the following functions: executing a computing task, caching data, and performing information interaction. The off-chip volatile memory chip and the logic chip are integrated by a three-dimensional stacking technology to form a three-dimensional integrated module, and the three-dimensional integrated module and the off-chip nonvolatile memory chip are integrated by a substrate interconnection technology. In the three-dimensional integrated module, the off-chip volatile memory chip and the logic chip are bonded in a coplanar bonding mode. In the application, the off-chip volatile memory chip and the logic chip are bonded in a coplanar bonding mode. First, because the coplanar surfaces are of the same material, bonding is easier and the process cost is reduced. In addit