
CN-121998004-A - Three-dimensional integrated and substrate interconnected memory and calculation integrated neural network processor

CN 121998004 A

Abstract

The application relates to a compute-in-memory neural network processor based on three-dimensional integration and substrate interconnection, and a control method thereof. The processor comprises an off-chip volatile memory chip, an off-chip nonvolatile memory chip, and a logic chip. The off-chip volatile memory chip is connected with the logic chip for information interaction with the logic chip; the off-chip nonvolatile memory chip is likewise connected with the logic chip for information interaction with the logic chip. The logic chip is configured to realize any combination of the following functions: executing computing tasks, performing data caching, and realizing information interaction. The off-chip volatile memory chip and the off-chip nonvolatile memory chip are integrated in the vertical direction by a three-dimensional stacking technology to form a three-dimensional integrated module, and the three-dimensional integrated module is integrated with the logic chip by a substrate interconnection technology. In this way, the logic chip, which generates more heat, and the off-chip volatile memory chip dissipate heat separately, which facilitates thermal management and improves computing efficiency.

Inventors

  • WANG ZHIXUAN
  • YAN FENGYUN
  • CHEN PEIYU
  • LIU YING

Assignees

  • 无锡微纳核芯电子科技有限公司
  • 杭州微纳核芯电子科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-01-30

Claims (20)

  1. A compute-in-memory neural network processor with three-dimensional integration and substrate interconnection, comprising an off-chip volatile memory chip, an off-chip nonvolatile memory chip, and a logic chip, wherein: the off-chip volatile memory chip is connected with the logic chip for information interaction with the logic chip; the off-chip nonvolatile memory chip is connected with the logic chip for information interaction with the logic chip; the logic chip is configured to implement any combination of the following functions: executing computing tasks, performing data caching, and realizing information interaction; and the off-chip volatile memory chip and the off-chip nonvolatile memory chip are integrated by a three-dimensional stacking technology to form a three-dimensional integrated module, and the three-dimensional integrated module and the logic chip are integrated by a substrate interconnection technology.
  2. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor of claim 1, wherein the off-chip volatile memory chip is located on top in the vertical direction.
  3. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor of claim 1, wherein the off-chip nonvolatile memory chip is located on top in the vertical direction.
  4. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor according to any one of claims 1-3, wherein the logic chip is a computing chip for performing computing tasks, the logic chip being integrated with a digital in-memory computing unit.
  5. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor according to any one of claims 1-3, wherein the logic chip is a data cache chip for data caching.
  6. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor according to any one of claims 1-3, wherein the logic chip is an interface chip for realizing information interaction.
  7. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor of claim 4, wherein at least one off-chip nonvolatile memory chip controller, at least one off-chip volatile memory chip controller, and a global controller are integrated on the logic chip; the off-chip nonvolatile memory chip controller is used for accessing the off-chip nonvolatile memory chip; the off-chip volatile memory chip controller is used for accessing the off-chip volatile memory chip; and the global controller is used for managing the off-chip volatile memory chip controller, the off-chip nonvolatile memory chip controller, and the digital in-memory computing unit, and for realizing dynamic adaptation of the data storage scheme and the computing data stream, so as to globally optimize storage capacity, data bandwidth, access latency, and computing efficiency.
  8. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor according to any one of claims 1-3, wherein the off-chip volatile memory chip is a DRAM chip.
  9. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor of claim 8, wherein the off-chip volatile memory chip is configured to store Key-Value cache data, activation values, and intermediate computation results in a large language model computing scenario.
  10. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor according to any one of claims 1-3, wherein the off-chip nonvolatile memory chip is a Flash chip.
  11. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor of claim 10, wherein, in a large language model computing scenario, the off-chip nonvolatile memory chip is configured to store Q-projection weights, K-projection weights, V-projection weights, attention weights, and FFN weights.
  12. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor according to any one of claims 1-3, wherein the off-chip volatile memory chip is constituted by one off-chip volatile memory module, or is formed by integrating at least two off-chip volatile memory modules using a three-dimensional stacking technology.
  13. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor according to any one of claims 1-3, wherein the off-chip nonvolatile memory chip is constituted by one off-chip nonvolatile memory module, or is formed by integrating at least two off-chip nonvolatile memory modules using a three-dimensional stacking technology.
  14. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor according to any one of claims 1-3, wherein the logic chip is constituted by one logic module, or is formed by integrating at least two logic modules using a three-dimensional stacking technology.
  15. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor according to any one of claims 1-3, wherein the three-dimensional stacking technology is one or more of a through-silicon via technology, a hybrid bonding technology, or a flip-chip technology.
  16. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor of claim 4, wherein the digital in-memory computing unit is formed by integrating a computing unit with an on-chip memory, the on-chip memory being one or more of SRAM, eDRAM, DRAM, Flash, MRAM, and ReRAM.
  17. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor of claim 7, wherein the global controller is configured to perform fine-grained dynamic scheduling of data based on the interconnection characteristics of the bonding scheme of the three-dimensional stacking technology and the inherent properties of each storage medium.
  18. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor of claim 7, wherein the global controller is configured to support configuration of data storage mapping rules by upper-level software.
  19. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor of claim 7, wherein the off-chip volatile memory chip comprises an interface conforming to a JEDEC standard, the interface being adapted to the off-chip volatile memory chip controller integrated on the logic chip, such that the off-chip volatile memory chip of the neural network processor is accessible by an external device as a standard DRAM memory chip; the protocol of the interface satisfies one or more of LPDDR5, LPDDR6, HBM2, HBM3e, HBM4, GDDR5, GDDR6, GDDR7, DDR5, and DDR6.
  20. The three-dimensional integrated and substrate-interconnected compute-in-memory neural network processor of claim 10, wherein the Flash comprises an interface conforming to JEDEC or industry-standard storage interface standards, such that the Flash of the neural network processor is accessible as a standard Flash memory chip; the protocol of the interface satisfies one or more of SPI, QSPI, octal SPI, eMMC, UFS, NVMe, and parallel Flash.
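Claims 7, 9, and 11 together describe a global controller that maps the tensors of a large language model onto heterogeneous storage media. As a minimal illustrative sketch of such a placement policy (not the patented implementation; all names and categories here are hypothetical, drawn only from the data types the claims list):

```python
# Illustrative sketch of the data-placement policy suggested by claims 7, 9 and 11:
# long-lived model weights go to the nonvolatile Flash chip, while frequently
# rewritten KV-cache data, activations and intermediate results go to DRAM.
# All identifiers are hypothetical; this is not the patented implementation.

WEIGHT_KINDS = {"q_proj", "k_proj", "v_proj", "attention", "ffn"}
DYNAMIC_KINDS = {"kv_cache", "activation", "intermediate"}

def place(tensor_kind: str) -> str:
    """Return the storage medium chosen for a given tensor kind."""
    if tensor_kind in WEIGHT_KINDS:
        return "flash"   # read-mostly data that must survive power-off
    if tensor_kind in DYNAMIC_KINDS:
        return "dram"    # high-throughput, frequently rewritten data
    raise ValueError(f"unknown tensor kind: {tensor_kind}")

if __name__ == "__main__":
    for kind in ("ffn", "kv_cache", "activation", "q_proj"):
        print(kind, "->", place(kind))
```

In the claimed architecture this decision would be taken by the global controller on the logic chip, and claim 18 additionally allows the mapping rules to be reconfigured by upper-level software rather than fixed as above.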

Description

Three-dimensional integrated and substrate interconnected compute-in-memory neural network processor

Technical Field

The application relates to the technical field of semiconductor chips, and in particular to a compute-in-memory neural network processor with heterogeneous storage media, three-dimensional integration, and substrate interconnection, and a control method thereof.

Background

Large language models (LLMs) have become a core evolution direction of natural language processing (NLP) technology in the field of artificial intelligence (AI), and have shown exponential development in recent years. Their technical maturity and breadth of application continue to advance, and they have irreplaceable advantages in a number of key fields. Meanwhile, the development of LLMs has gradually revealed a core rule widely verified by academia and industry, namely the scaling law. Its core meaning is that, with the support of sufficient high-quality training data and matching computing resources, the core performance indices of LLMs (such as the perplexity of language generation and the accuracy on downstream tasks) show a significant positive correlation with the model parameter scale. As shown in fig. 1, the model performance score increases continuously as the number of parameters grows exponentially. Conventional computer systems achieve a trade-off among capacity, speed, and cost through a hierarchical storage architecture (register → on-chip cache → main memory → local disk → remote storage); as shown in fig. 3, each tier exhibits significant differences in relative cost, storage density, bandwidth, access latency, and power consumption.
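The hierarchy described above trades capacity and cost against speed tier by tier. A minimal sketch of this ordering property (ordinal ranks only, since the description gives no absolute figures):

```python
# Ordinal model of the storage hierarchy from the description
# (register -> on-chip cache -> main memory -> local disk -> remote storage).
# Lower index means lower access latency and, by the usual trade-off,
# higher relative cost per byte and smaller capacity. No absolute
# figures are implied; this is an illustrative sketch only.

HIERARCHY = ["register", "on-chip cache", "main memory",
             "local disk", "remote storage"]

def latency_rank(tier: str) -> int:
    """Return the tier's position in the hierarchy (0 = fastest)."""
    return HIERARCHY.index(tier)

# The defining monotonic property of the hierarchy:
assert latency_rank("register") < latency_rank("main memory") < latency_rank("remote storage")
```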
However, in the process of pursuing high-performance integration, existing neural network processor architectures mainly face the following technical challenges.

First, the heat dissipation bottleneck brought by high-density integration. In high-performance computing scenarios, logic chips and volatile memory (e.g., DRAM) are the primary heat sources. Existing three-dimensional integration schemes tend to stack logic chips vertically and directly with memory chips, or to attach them closely through high-density packaging. While this arrangement shortens the interconnect path, it results in a severe thermal coupling effect: the heat of the two chips is superimposed and difficult to dissipate effectively. In particular, when the logic chip runs at full speed, the accumulated heat can significantly affect the stability of the temperature-sensitive memory chip, which in turn forces the processor to maintain thermal equilibrium by lowering its clock frequency, severely limiting overall performance.

Second, the limitations of a single storage medium. Conventional storage architectures typically rely on a single type of storage medium (only volatile storage, or only nonvolatile storage) to handle all data. However, the data types involved in neural network computing tasks are complex and varied (for example, frequently read and written intermediate feature data versus weight data stored for a long time), and a single storage medium can hardly satisfy the requirements of high throughput, low latency, and data non-volatility at the same time. As a result, the utilization of storage resources is low, flexible matching according to data characteristics is difficult, and overall computing efficiency is reduced. Accordingly, there is a need for improvements to existing neural network processors.
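The thermal bottleneck described above, where accumulated heat forces the processor to lower its clock to maintain thermal equilibrium, can be sketched as a simple throttling loop. This is illustrative only; the thresholds, step sizes, and names are hypothetical and not taken from the patent:

```python
# Illustrative sketch of thermal down-conversion: when die temperature
# exceeds a limit, the clock is stepped down to restore thermal
# equilibrium; otherwise it slowly recovers toward its maximum.
# Thresholds and step sizes are hypothetical, not from the patent.

T_LIMIT_C = 95.0                      # hypothetical junction temperature limit
F_MIN_MHZ, F_MAX_MHZ = 400.0, 2000.0  # hypothetical clock range

def throttle(freq_mhz: float, temp_c: float) -> float:
    """Return the next clock frequency given the current temperature."""
    if temp_c > T_LIMIT_C:
        return max(F_MIN_MHZ, freq_mhz * 0.9)   # step down 10% when too hot
    return min(F_MAX_MHZ, freq_mhz * 1.02)      # slowly recover otherwise

f = throttle(2000.0, 100.0)   # over the limit -> frequency drops to 1800.0
```

Separating the hot logic chip from the memory stack via substrate interconnection, as the application proposes, is intended precisely to keep such down-conversion from being triggered.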
Disclosure of Invention

The application provides a compute-in-memory neural network processor with three-dimensional integration and substrate interconnection, comprising an off-chip volatile memory chip, an off-chip nonvolatile memory chip, and a logic chip. The off-chip volatile memory chip is connected with the logic chip for information interaction with the logic chip; the off-chip nonvolatile memory chip is connected with the logic chip for information interaction with the logic chip. The logic chip is configured to realize any combination of the following functions: executing computing tasks, performing data caching, and realizing information interaction. The off-chip volatile memory chip and the off-chip nonvolatile memory chip are integrated by a three-dimensional stacking technology to form a three-dimensional integrated module, and the three-dimensional integrated module and the logic chip are integrated by a substrate interconnection technology. In application scenarios with high computing-power demand, the heat dissipation capability of the neural network processor directly affects the performance of the chip, and for the neural network processor, the heat of the packag