CN-121998005-A - Three-dimensional integrated and substrate interconnected memory and calculation integrated neural network processor
Abstract
The application relates to a three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor and a control method thereof. The processor comprises an off-chip volatile memory chip, an off-chip nonvolatile memory chip, and a logic chip. The off-chip volatile memory chip is connected with the logic chip and exchanges information with it; the off-chip nonvolatile memory chip is likewise connected with the logic chip and exchanges information with it. The logic chip is configured to implement any combination of executing computing tasks, caching data, and implementing information interaction. The off-chip nonvolatile memory chip and the logic chip are integrated by a three-dimensional stacking technology to form a three-dimensional integrated module, and the three-dimensional integrated module and the off-chip volatile memory chip are integrated by a substrate interconnection technology. Within the module, the logic chip is positioned below the off-chip nonvolatile memory chip along the vertical direction and is bonded to the off-chip nonvolatile memory chip through its back surface, so as to meet the high heat dissipation requirements of the off-chip volatile memory chip and the logic chip.
Inventors
- YAN FENGYUN
- CHEN PEIYU
- WANG ZHIXUAN
- LIU YING
Assignees
- 无锡微纳核芯电子科技有限公司
- 杭州微纳核芯电子科技有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-30
Claims (20)
- 1. A three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor, comprising: an off-chip volatile memory chip, connected with a logic chip and configured to exchange information with the logic chip; an off-chip nonvolatile memory chip, connected with the logic chip and configured to exchange information with the logic chip; and the logic chip, configured to implement any combination of the following functions: executing computing tasks, caching data, and implementing information interaction; wherein the off-chip nonvolatile memory chip and the logic chip are integrated by a three-dimensional stacking technology to form a three-dimensional integrated module, and the three-dimensional integrated module and the off-chip volatile memory chip are integrated by a substrate interconnection technology; and wherein, in the three-dimensional integrated module, the logic chip is located below the off-chip nonvolatile memory chip along the vertical direction, and the logic chip is bonded to the off-chip nonvolatile memory chip through its back surface.
- 2. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 1, wherein the logic chip is a computing chip for executing computing tasks, and a digital in-memory computing unit is integrated on the logic chip.
- 3. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 1, wherein the logic chip is a data cache chip for caching data.
- 4. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 1, wherein the logic chip is an interface chip for implementing information interaction.
- 5. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 2, wherein at least one off-chip nonvolatile memory chip controller, at least one off-chip volatile memory chip controller, and a global controller are integrated on the logic chip; the off-chip nonvolatile memory chip controller is configured to access the off-chip nonvolatile memory chip; the off-chip volatile memory chip controller is configured to access the off-chip volatile memory chip; and the global controller is configured to manage the off-chip volatile memory chip controller, the off-chip nonvolatile memory chip controller, and the digital in-memory computing unit, and to dynamically adapt the data storage scheme and the computing data flow so as to globally optimize storage capacity, data bandwidth, access latency, and computing efficiency.
- 6. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of any one of claims 1-5, wherein the off-chip volatile memory chip is a DRAM chip.
- 7. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 6, wherein, in a large language model computing scenario, the off-chip volatile memory chip is configured to store Key-Value cache data, activation values, and intermediate computation results.
- 8. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of any one of claims 1-5, wherein the off-chip nonvolatile memory chip is Flash.
- 9. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 8, wherein, in a large language model computing scenario, the off-chip nonvolatile memory chip is configured to store Q-generation weights, K-generation weights, V-generation weights, attention weights, and FFN weights.
- 10. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of any one of claims 1-5, wherein the off-chip volatile memory chip consists of one off-chip volatile memory module, or is formed by integrating at least two off-chip volatile memory modules using the three-dimensional stacking technology; wherein each off-chip volatile memory module is integrated front to front, and, in the structure formed by three-dimensionally stacking the at least two off-chip volatile memory modules, the side close to the logic control layer is defined as the front of the off-chip volatile memory chip and the side close to the memory array layer is defined as the back of the off-chip volatile memory chip.
- 11. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of any one of claims 1-5, wherein the off-chip nonvolatile memory chip consists of one off-chip nonvolatile memory module, or is formed by integrating at least two off-chip nonvolatile memory modules using the three-dimensional stacking technology; wherein each off-chip nonvolatile memory module is integrated front to front, and, in the structure formed by three-dimensionally stacking the at least two off-chip nonvolatile memory modules, the side close to the logic control layer is defined as the front of the off-chip nonvolatile memory chip and the side close to the memory array layer is defined as the back of the off-chip nonvolatile memory chip.
- 12. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 2, wherein the logic chip consists of one logic module, or is formed by integrating at least two logic modules using the three-dimensional stacking technology.
- 13. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 1, wherein the three-dimensional stacking technology is one or more of a through-silicon-via technology, a hybrid bonding technology, or a flip-chip technology.
- 14. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 2, wherein the digital in-memory computing unit is formed by integrating an on-chip volatile memory and a computing unit, the on-chip volatile memory being one or more of SRAM, eDRAM, DRAM, Flash, MRAM, and ReRAM.
- 15. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 5, wherein the global controller is configured to perform fine-grained dynamic scheduling of data based on the interconnection characteristics of the bonding scheme of the three-dimensional stacking technology and the inherent properties of each storage medium.
- 16. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 5, wherein the global controller is configured to support configuration of data storage mapping rules by upper-level software.
- 17. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 5, wherein the off-chip volatile memory chip comprises an interface conforming to a JEDEC standard, the interface being adapted to the off-chip volatile memory chip controller such that the neural network processor can be accessed by external devices as standard DRAM memory; and the interface protocol conforms to one or more of LPDDR5, LPDDR6, HBM2, HBM3e, HBM4, GDDR5, GDDR6, GDDR7, DDR5, and DDR6.
- 18. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 8, wherein the Flash comprises an interface conforming to a JEDEC standard or an industry-wide storage interface standard, such that the Flash of the neural network processor can be accessed as standard Flash memory; and the interface protocol conforms to one or more of SPI, QSPI, Octal SPI, eMMC, UFS, NVMe, and Parallel Flash.
- 19. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of claim 1, wherein the off-chip nonvolatile memory chip or the logic chip, and the off-chip volatile memory chip, are interconnected with the substrate through a silicon interposer.
- 20. The three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor of any one of claims 1-5, wherein the front side of the logic chip is interconnected with the front side of the off-chip volatile memory chip through the substrate.
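The data placement described in claims 7, 9, and 16 can be illustrated with a minimal Python sketch. It is not part of the application: the names Tier, DEFAULT_RULES, GlobalController, configure, and place, as well as the category labels, are hypothetical and chosen only for illustration. The sketch assumes that the global controller holds a table mapping each data category to a storage medium, and that upper-level software may override individual entries.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    """Off-chip storage media named in claims 6 and 8."""
    DRAM = "off-chip volatile memory (DRAM)"
    FLASH = "off-chip nonvolatile memory (Flash)"


# Default placement following claims 7 and 9.
# The category labels are hypothetical names used only in this sketch.
DEFAULT_RULES = {
    "kv_cache": Tier.DRAM,
    "activation": Tier.DRAM,
    "intermediate_result": Tier.DRAM,
    "q_weight": Tier.FLASH,
    "k_weight": Tier.FLASH,
    "v_weight": Tier.FLASH,
    "attention_weight": Tier.FLASH,
    "ffn_weight": Tier.FLASH,
}


@dataclass
class GlobalController:
    """Hypothetical stand-in for the global controller of claims 5 and 16."""
    # Each controller gets its own copy so overrides do not leak between instances.
    rules: dict = field(default_factory=lambda: dict(DEFAULT_RULES))

    def configure(self, category: str, tier: Tier) -> None:
        """Let upper-level software override one mapping rule (claim 16)."""
        self.rules[category] = tier

    def place(self, category: str) -> Tier:
        """Return the storage medium that should hold data of this category."""
        if category not in self.rules:
            raise ValueError(f"no mapping rule for category {category!r}")
        return self.rules[category]


if __name__ == "__main__":
    controller = GlobalController()
    print(controller.place("kv_cache").name)    # DRAM
    print(controller.place("ffn_weight").name)  # FLASH
```

In this sketch, frequently rewritten data (Key-Value cache, activation values, intermediate results) defaults to DRAM, while the read-mostly model weights default to Flash. The claims do not specify how mapping rules are encoded, so the dictionary used here is only one possible representation.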
Description
Three-dimensional integrated and substrate interconnected memory and calculation integrated neural network processor
Technical Field
The application relates to the technical field of semiconductor chips, and in particular to a three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor and a control method thereof.
Background
Large language models (LLMs) have become a core direction in the evolution of natural language processing (NLP) technology within the field of artificial intelligence (AI), and have developed at an exponential pace in recent years. Their technical maturity and breadth of application continue to advance, and they offer irreplaceable advantages in many key fields. Meanwhile, the development of LLMs has gradually revealed a core rule that is widely verified by academia and industry, namely the scaling law. The essence of this rule is that, given sufficient high-quality training data and matching computing resources, the core performance metrics of LLMs (such as the perplexity of language generation and the accuracy on downstream tasks) show a marked positive correlation with the model parameter scale. As shown in fig. 1, the model performance score increases continuously as the number of parameters increases exponentially.
Conventional computer systems trade off capacity, speed, and cost through a storage hierarchy (register → on-chip cache → main memory → local disk → remote storage), as shown in fig. 3, with each tier differing significantly in relative cost, storage density, bandwidth, access latency, and power consumption.
However, as the integration level increases, the thermal management problem of the chip becomes increasingly prominent, and existing integrated architectures face serious challenges. In a conventional fully vertically stacked three-dimensional architecture, a high-compute logic chip, a volatile memory chip for high-speed reading and writing, and a nonvolatile memory chip are often stacked on the same vertical axis. Since both the logic chip and the volatile memory chip are high heat sources, such dense stacking causes heat to accumulate heavily in the vertical direction, where it is difficult to dissipate. In particular, the heat generated by high-frequency switching in the volatile memory chip is superimposed on the operating heat of the logic chip, so that a hotspot easily forms, the system is forced to reduce its frequency, and processor performance suffers. Accordingly, there is a need to improve existing neural network processors.
Disclosure of Invention
The application provides a three-dimensional integrated and substrate-interconnected memory-computing-integrated neural network processor, which comprises an off-chip volatile memory chip, an off-chip nonvolatile memory chip, and a logic chip. The off-chip volatile memory chip is connected with the logic chip and exchanges information with it; the off-chip nonvolatile memory chip is connected with the logic chip and exchanges information with it. The logic chip is configured to implement any combination of the following functions: executing computing tasks, caching data, and implementing information interaction. The off-chip nonvolatile memory chip and the logic chip are integrated by a three-dimensional stacking technology to form a three-dimensional integrated module, and the three-dimensional integrated module and the off-chip volatile memory chip are integrated by a substrate interconnection technology. In the three-dimensional integrated module, the logic chip is located below the off-chip nonvolatile memory chip in the vertical direction and is bonded to the off-chip nonvolatile memory chip through its back surface.
The back of the logic chip is bonded to the off-chip nonvolatile memory chip because the logic chip has a larger heat dissipation requirement when executing computing tasks, and because the back of the logic chip consists mainly of a silicon substrate, which to a certain extent prevents the heat of the logic chip from diffusing to the off-chip nonvolatile memory chip. Meanwhile, the front of the logic chip consists mainly of metal and can dissipate heat through the substrate and a heat dissipation structure connected with the substrate. In addition, the off-chip volatile memory chip dissipates more heat because of its high-speed access characteristics, and the two chips with higher heat dissipation are arranged separately through the substrate interconnection technology, so that the excessive concentration of the heat can be avoided, and the whole hea