
CN-121998008-A - Three-dimensional integrated heterogeneous storage medium storing and calculating integrated neural network processor


Abstract

The application relates to a compute-in-memory neural network processor based on three-dimensionally integrated heterogeneous storage media, and to a control method thereof. The processor comprises an off-chip volatile memory chip, an off-chip nonvolatile memory chip and a logic chip. The off-chip volatile memory chip and the off-chip nonvolatile memory chip are each connected to the logic chip and exchange information with it. The logic chip is configured to implement any combination of the following functions: executing computing tasks, caching data and handling information exchange. The off-chip volatile memory chip, the logic chip and the off-chip nonvolatile memory chip are integrated using a three-dimensional stacking technique. The off-chip volatile memory chip sits at the top in the vertical direction and is connected to an external heat dissipation structure. Because of this arrangement, the off-chip volatile memory chip, which generates more heat, can dissipate heat efficiently through the attached external heat dissipation structure, and its thermal impact on peripheral modules is kept as small as possible.
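The placement argument in the abstract can be illustrated with a toy one-dimensional thermal-resistance model. This sketch is not from the patent. The model, the die powers and the resistance values are all hypothetical. It assumes all heat leaves through a heat sink on the top die, every die is isothermal, and all die-to-die interfaces have equal thermal resistance. Following the description, the DRAM is given the largest total power. The sketch compares the stacking order of claim 2 (DRAM on top) with the reverse order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Die:
    name: str
    power_w: float  # hypothetical steady-state power dissipation, watts


def stack_temperatures(stack_bottom_to_top, r_interface=1.0, r_heatsink=2.0, t_ambient=25.0):
    """Steady-state die temperatures (deg C) for a 1D series thermal model.

    Assumptions (not from the patent): all heat leaves through a heat sink
    on the top die (adiabatic bottom), each die is isothermal, and every
    die-to-die interface has the same thermal resistance r_interface (K/W).
    """
    total_power = sum(d.power_w for d in stack_bottom_to_top)
    # The top die sits on the heat sink, which carries the whole stack's heat.
    t = t_ambient + r_heatsink * total_power
    temps = {stack_bottom_to_top[-1].name: t}
    # Walk downward: heat generated at or below die i must cross the interface above it.
    for i in range(len(stack_bottom_to_top) - 2, -1, -1):
        power_below = sum(d.power_w for d in stack_bottom_to_top[: i + 1])
        t += r_interface * power_below
        temps[stack_bottom_to_top[i].name] = t
    return temps


dram, logic, flash = Die("DRAM", 3.0), Die("logic", 2.0), Die("Flash", 0.5)
dram_on_top = stack_temperatures([flash, logic, dram])  # claim 2 order, listed bottom to top
dram_on_bottom = stack_temperatures([dram, logic, flash])
print(dram_on_top)     # {'DRAM': 36.0, 'logic': 38.5, 'Flash': 39.0}
print(dram_on_bottom)  # {'Flash': 36.0, 'logic': 41.0, 'DRAM': 44.0}
```

With these made-up numbers, putting the highest-power die on top keeps the temperature-sensitive DRAM coolest. It also lowers the peak temperature of the stack, because less heat has to cross each interface on its way up. A lumped model like this ignores downward conduction into the substrate, lateral spreading and the logic chip's local hot spots. It only shows the direction of the effect, not its size.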

Inventors

  • CHEN PEIYU
  • YE LE
  • WANG ZHIXUAN
  • LIU YING

Assignees

  • 杭州微纳核芯电子科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-04-09

Claims (20)

  1. A compute-in-memory neural network processor based on three-dimensionally integrated heterogeneous storage media, comprising: an off-chip volatile memory chip connected to a logic chip and configured to exchange information with the logic chip; an off-chip nonvolatile memory chip connected to the logic chip and configured to exchange information with the logic chip; and the logic chip, configured to implement any combination of the following functions: executing computing tasks, caching data, and handling information exchange; wherein the off-chip volatile memory chip, the logic chip and the off-chip nonvolatile memory chip are integrated using a three-dimensional stacking technique, and the off-chip volatile memory chip is located at the top in the vertical direction and can be connected to an external heat dissipation structure.
  2. The neural network processor of claim 1, wherein the off-chip volatile memory chip, the logic chip and the off-chip nonvolatile memory chip are stacked in this order, from top to bottom, in the vertical direction using a three-dimensional stacking technique.
  3. The neural network processor of claim 2, wherein the off-chip volatile memory chip is bonded front-to-front (face-to-face) with the logic chip, and the off-chip nonvolatile memory chip is bonded to the logic chip with either its front side or its back side.
  4. The neural network processor of claim 2, wherein the off-chip volatile memory chip is bonded back-to-back with the logic chip, and the off-chip nonvolatile memory chip is bonded to the logic chip with either its front side or its back side.
  5. The neural network processor of any of claims 1-4, wherein a global controller is integrated on the logic chip, the global controller being connected to both the off-chip volatile memory chip and the off-chip nonvolatile memory chip and configured to dynamically schedule data based on: the bonding modes between the logic chip and each of the off-chip volatile and nonvolatile memory chips; the task characteristics; and the storage-medium attributes of the off-chip volatile and nonvolatile memory chips.
  6. The neural network processor of claim 5, wherein the global controller is configured to store data of high-bandwidth tasks in the memory chip that is bonded face-to-face with the logic chip.
  7. The neural network processor of claim 5, wherein the global controller is configured to store data of thermally sensitive tasks in the memory chip that is bonded front-to-back or back-to-back with the logic chip.
  8. The neural network processor of claim 1, wherein the off-chip volatile memory chip, the off-chip nonvolatile memory chip and the logic chip are stacked in this order, from top to bottom, in the vertical direction using a three-dimensional stacking technique.
  9. The neural network processor of any of claims 1-4, wherein the logic chip is a computing chip that performs computing tasks and has a digital in-memory computing unit integrated on it.
  10. The neural network processor of any of claims 1-4, wherein the logic chip is a data cache chip used for data caching.
  11. The neural network processor of any of claims 1-4, wherein the logic chip is an interface chip used for information exchange.
  12. The neural network processor of any of claims 1-4, wherein the logic chip has integrated on it at least one off-chip nonvolatile memory chip controller, at least one off-chip volatile memory chip controller, and a global controller, wherein: the off-chip nonvolatile memory chip controller is used for accessing the off-chip nonvolatile memory chip, and the off-chip volatile memory chip controller is used for accessing the off-chip volatile memory chip; and the global controller is used for managing the off-chip volatile memory chip controller, the off-chip nonvolatile memory chip controller and the digital in-memory computing unit, and for dynamically adapting the data storage scheme and the computing data flow so as to globally optimize storage capacity, data bandwidth, access latency and computing efficiency.
  13. The neural network processor of any of claims 1-4, wherein the off-chip volatile memory chip is a DRAM chip.
  14. The neural network processor of claim 13, wherein, in a large language model computing scenario, the off-chip volatile memory chip is configured to store key-value (KV) cache data, activation values and intermediate computation results.
  15. The neural network processor of any of claims 1-4, wherein the off-chip nonvolatile memory chip is a Flash chip.
  16. The neural network processor of claim 15, wherein, in a large language model computing scenario, the off-chip nonvolatile memory chip is configured to store the weights used to generate Q, K and V (the query, key and value projection weights), the attention weights, and the feed-forward network (FFN) weights.
  17. The neural network processor of any of claims 1-4, wherein the off-chip volatile memory chip consists of one off-chip volatile memory module, or is formed by integrating at least two off-chip volatile memory modules using a three-dimensional stacking technique, the off-chip volatile memory modules being bonded front-to-front. In the stacked structure formed by the at least two off-chip volatile memory modules, the side closer to the logic control layer is defined as the front of the off-chip volatile memory chip, and the side closer to the memory array layer is defined as its back.
  18. The neural network processor of any of claims 1-4, wherein the off-chip nonvolatile memory chip consists of one off-chip nonvolatile memory module, or is formed by integrating at least two off-chip nonvolatile memory modules using a three-dimensional stacking technique, the off-chip nonvolatile memory modules being bonded front-to-front. In the stacked structure formed by the at least two off-chip nonvolatile memory modules, the side closer to the logic control layer is defined as the front of the off-chip nonvolatile memory chip, and the side closer to the memory array layer is defined as its back.
  19. The neural network processor of any of claims 1-4, wherein the logic chip consists of one logic module, or is formed by integrating at least two logic modules using a three-dimensional stacking technique.
  20. The neural network processor of any of claims 1-4, wherein the three-dimensional stacking technique is one or more of through-silicon via (TSV), hybrid bonding and flip-chip techniques.
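Claims 5 to 7, 14 and 16 describe a global controller that places data according to three things: the storage medium, the bonding mode and the task characteristics. The claims do not say how these criteria are ranked. The following is a minimal sketch under an assumed priority. The medium rule comes first: weights go to nonvolatile memory, and frequently rewritten data goes to volatile memory. The bonding preferences of claims 6 and 7 then break ties among the allowed chips. The names `place`, `Tensor`, `MemoryChip` and the `kind` labels are hypothetical, not an interface from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class Bonding(Enum):
    """Bonding mode between a memory chip and the logic chip (claims 3, 4, 6, 7)."""
    FACE_TO_FACE = "F2F"
    FACE_TO_BACK = "F2B"
    BACK_TO_BACK = "B2B"


@dataclass(frozen=True)
class MemoryChip:
    name: str
    volatile: bool    # True for DRAM-like media, False for Flash-like media
    bonding: Bonding  # how this chip is bonded to the logic chip


@dataclass(frozen=True)
class Tensor:
    name: str
    kind: str                        # hypothetical labels: "weight", "kv_cache", "activation", "intermediate"
    high_bandwidth: bool = False     # task characteristic: bandwidth-hungry access
    thermally_sensitive: bool = False


def place(tensor: Tensor, chips: list) -> MemoryChip:
    """Hypothetical placement rule for the global controller of claim 5."""
    # Medium attribute (claims 14, 16), assumed to take priority: weights go to
    # nonvolatile memory; KV cache, activations and intermediates go to volatile memory.
    want_volatile = tensor.kind != "weight"
    allowed = [c for c in chips if c.volatile == want_volatile]
    if not allowed:
        raise ValueError(f"no suitable medium for {tensor.name}")
    # Bonding mode (claims 6, 7) breaks ties among the allowed chips.
    if tensor.high_bandwidth:
        preferred = [c for c in allowed if c.bonding is Bonding.FACE_TO_FACE]
    elif tensor.thermally_sensitive:
        preferred = [c for c in allowed if c.bonding is not Bonding.FACE_TO_FACE]
    else:
        preferred = []
    return (preferred or allowed)[0]


# Claim-3 style stack: DRAM bonded face-to-face with the logic chip, Flash back-to-back.
stack = [
    MemoryChip("DRAM", volatile=True, bonding=Bonding.FACE_TO_FACE),
    MemoryChip("Flash", volatile=False, bonding=Bonding.BACK_TO_BACK),
]
print(place(Tensor("kv_cache_layer0", "kv_cache", high_bandwidth=True), stack).name)        # DRAM
print(place(Tensor("ffn_weights_layer0", "weight", thermally_sensitive=True), stack).name)  # Flash
```

In the claim-3 configuration the medium rule and the bonding rule happen to agree. The bonding tie-break only changes the result when several chips of the same medium are available. A real controller would also weigh capacity, endurance and the dynamic load, which this sketch leaves out.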

Description

Technical Field

The application relates to the technical field of semiconductor chips. In particular, it relates to a compute-in-memory neural network processor based on three-dimensionally integrated heterogeneous storage media, and to a control method thereof.

Background

Large language models (LLMs) have become a core direction in the evolution of natural language processing (NLP) within artificial intelligence (AI), and they have developed at an exponential pace in recent years. Their technical maturity and breadth of application keep advancing, and they offer irreplaceable advantages in many key fields. LLM development has also revealed a pattern widely confirmed by academia and industry: the scaling law. Its core content is that, given enough high-quality training data and matching computing resources, the core performance of an LLM improves markedly as its parameter count grows. This performance is measured, for example, by the perplexity of the generated text, which falls, and by the accuracy on downstream tasks, which rises. As shown in fig. 1, the model performance score keeps increasing as the number of parameters grows exponentially.

Conventional computer systems trade off capacity, speed and cost through a memory hierarchy (register → on-chip cache → main memory → local disk → remote storage). As shown in fig. 3, the tiers differ significantly in relative cost, storage density, bandwidth, access latency and power consumption.
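The capacity pressure that the scaling law puts on the memory hierarchy can be made concrete with a standard back-of-envelope estimate. This estimate is not from the patent, and the model dimensions below are hypothetical. Weight storage grows with parameter count. The KV cache holds one key vector and one value vector per layer, head and token, so it grows linearly with sequence length and batch size. These are the two data classes that claims 14 and 16 assign to DRAM and to Flash, respectively.

```python
def weight_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Storage for model weights (e.g. 2 bytes per parameter for FP16)."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """KV cache size: one K and one V vector per layer, KV head and token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical 7-billion-parameter decoder; dimensions are illustrative only.
w = weight_bytes(7_000_000_000)
kv = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
print(f"weights:  {w / 1e9:.2f} GB")   # weights:  14.00 GB
print(f"KV cache: {kv / 1e9:.2f} GB")  # KV cache: 2.15 GB
```

Doubling the context length doubles the KV cache, while the weights stay fixed. Weights are large and read-mostly, whereas the KV cache is smaller but rewritten on every generated token. This difference is why the two data classes suit different storage media.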
However, three-dimensional stacked structures deliver high performance but also pose serious thermal-management challenges. In existing three-dimensional integration schemes, the stacking order is usually chosen mainly for convenient interconnect routing or for electrical performance, and the different thermal properties of different types of chips are often ignored. A volatile memory chip (such as DRAM) that is read and written frequently generates considerable heat and has a long thermal cycle, yet it is highly sensitive to temperature. At the same time, the logic chip has local hot spots with high heat-flux density. The resulting heat-dissipation bottleneck can force a chip to lower its clock frequency because of overheating (thermal throttling), which reduces computing efficiency. Existing neural network processors therefore need improvement.

Disclosure of Invention

The application provides a compute-in-memory neural network processor based on three-dimensionally integrated heterogeneous storage media. The processor comprises an off-chip volatile memory chip, an off-chip nonvolatile memory chip and a logic chip. The off-chip volatile memory chip and the off-chip nonvolatile memory chip are each connected to the logic chip and exchange information with it. The logic chip is configured to implement any combination of the following functions: executing computing tasks, caching data and handling information exchange. The off-chip volatile memory chip, the logic chip and the off-chip nonvolatile memory chip are integrated using a three-dimensional stacking technique. The off-chip volatile memory chip is located at the top in the vertical direction and can be connected to an external heat dissipation structure.

The application places the off-chip volatile memory chip (such as a DRAM chip) on the top layer of the three-dimensional stack for two reasons. First, consider the other two chips. The logic chip has a high heating density when it carries heavy computing loads. The off-chip nonvolatile memory chip (such as Flash) has high storage density and generates only local heat. Compared with both, the off-chip volatile memory chip has a higher overall heat output and a longer thermal cycle, so placing it on the top layer reduces the thermal impact of this high-heat chiplet on the surrounding modules. Second, the top position can contact the heat dissipation structures outside the package (such as a heat-dissipation lid or a thermal pad) more directly. This gives a shorter and more efficient heat-conduction path, so that heat from the off-chip volatile memory chip, and heat conducted up from the lower chips to the top layer, can be dissipated quickly, so that the problems of upward accumulation of lower heat and heat dissipation blockage of the top