CN-122021756-A - Three-dimensional stacked neural network processor based on integrated memory and calculation
Abstract
The application relates to artificial intelligence, in particular to a three-dimensional stacked neural network packaging structure based on computing-in-memory and a processing method. The device comprises a computing layer, a storage layer, a three-dimensional integration unit and a memory interface control device. The computing layer comprises a neural network processing unit based on a computing-in-memory architecture and is used for executing neural network computation; the storage layer comprises a storage unit and is used for storing data required by the computing layer; the three-dimensional integration unit is positioned between the computing layer and the storage layer and is used for realizing vertical connection between the computing layer and the storage layer; and the memory interface control device is configured to receive an access instruction from an external main control chip and parse the access instruction, through a protocol remapping mechanism, into a control instruction for the neural network processing unit or the storage unit. The device and at least one memory chip are integrated in the same packaging structure and share the memory interface control device to interact with the outside. Thus, processor performance is greatly improved, and chip design effort and area are saved.
Inventors
- WANG ZHIXUAN
- YAN FENGYUN
- CHEN PEIYU
- LIU YING
Assignees
- 无锡微纳核芯电子科技有限公司
- 杭州微纳核芯电子科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260203
Claims (16)
- 1. A three-dimensional stacked neural network processor based on computing-in-memory, wherein the packaging structure comprises: a computing layer, comprising a neural network processing unit based on a computing-in-memory architecture and used for executing neural network computation; a storage layer, comprising a storage unit and used for storing data required by the computing layer; a three-dimensional integration unit, positioned between the computing layer and the storage layer and used for realizing vertical connection between the computing layer and the storage layer; and a memory interface control device, configured to receive an access instruction from an external main control chip and parse the access instruction, through a protocol remapping mechanism, into a control instruction for the neural network processing unit or the storage unit; wherein the processor and at least one memory chip are integrated in the same packaging structure and share the memory interface control device to interact with the outside.
- 2. The processor of claim 1, wherein the protocol remapping mechanism includes address remapping, and the memory interface control device is configured to map an access operation targeting a particular address in the external memory address space to a launch, configuration, or status query instruction for the neural network processing unit.
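The address remapping of claim 2 can be illustrated with a minimal behavioral sketch. The window boundaries, register offsets, and instruction names below are assumptions for illustration; the patent does not specify a concrete address map. Accesses that fall inside a reserved window of the external memory address space decode into NPU control instructions, while all other addresses pass through as ordinary memory accesses.

```python
# Minimal sketch of the address-remapping mechanism of claim 2.
# Window base, size, offsets, and instruction names are illustrative only.

NPU_WINDOW_BASE = 0xF000_0000   # hypothetical reserved window in the
NPU_WINDOW_SIZE = 0x1000        # external memory address space

# Offsets inside the window mapped to NPU control instructions.
NPU_REGS = {
    0x000: "LAUNCH",        # start a neural network computation
    0x004: "CONFIGURE",     # write configuration parameters
    0x008: "STATUS_QUERY",  # read computation status
}

def remap(address: int) -> tuple:
    """Decode a host access the way the memory interface control device
    would: either an NPU control instruction or a plain memory access."""
    offset = address - NPU_WINDOW_BASE
    if 0 <= offset < NPU_WINDOW_SIZE and offset in NPU_REGS:
        return ("npu", NPU_REGS[offset])
    return ("memory", address)
```

For example, `remap(0xF000_0000)` yields an NPU launch instruction, while an address outside the window passes through unchanged as a memory access.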
- 3. The processor of claim 1, wherein the protocol remapping mechanism includes instruction remapping, and the memory interface control device is configured to interpret a reserved instruction or a rarely used instruction of the storage unit as a control instruction for the neural network processing unit.
- 4. The processor of claim 1, wherein the three-dimensional integration unit implements the vertical connection between the computing layer and the storage layer by at least one of through-silicon via technology, flip-chip technology, hybrid bonding technology, and micro-bump connection technology.
- 5. The processor of claim 1, wherein the memory interface control device is disposed in the computing layer or the storage layer.
- 6. The processor of claim 5, further comprising an error checking and correction circuit, wherein the deployment location of the error checking and correction circuit is associated with the location of the memory interface control device: when the memory interface control device is disposed in the storage layer, the error checking and correction circuit is integrated in the storage layer; and when the memory interface control device is disposed in the computing layer, the error checking and correction circuit is integrated in the computing layer or the storage layer.
- 7. The processor of claim 1, further comprising a timing compensation module for dynamically adjusting data transfer timing to maintain interface timing consistency with the memory chip when the memory interface control device and the three-dimensional integration unit introduce additional signal delays.
- 8. The processor of claim 1, wherein the calculation result output by the computing-in-memory neural network processing unit is transmitted via at least one of the following paths: writing into a designated area of the storage layer; transmitting to the external main control chip through the memory interface control device; or transmitting to the memory chip through the three-dimensional integration unit.
- 9. A processing method based on the computing-in-memory neural network processor, applied to a co-packaging structure, wherein the co-packaging structure comprises the three-dimensional stacked neural network processor based on computing-in-memory and a memory chip; the method comprises the following steps: the processor receives and parses a designated instruction sent by an external main control chip; if the parsing result is a memory access instruction, executing a corresponding data read/write operation on the memory chip or the storage layer in the processor; and if the parsing result is a calculation instruction, triggering the computing layer of the processor to execute calculation.
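The parse-and-dispatch flow of claim 9 can be sketched as follows. The instruction encoding and handler names are assumptions for illustration, since the patent does not define a concrete instruction format: a memory access instruction results in a read or write to the storage side, while a calculation instruction triggers the computing layer.

```python
# Sketch of the method of claim 9: the processor parses a designated
# instruction from the external main control chip and dispatches it.
# The dict-based instruction encoding here is illustrative only.

def handle_instruction(instr: dict, memory: dict, compute):
    """Dispatch a parsed instruction: memory read/write operations go to
    the memory chip or storage layer (modeled by `memory`), and compute
    instructions trigger the computing layer (modeled by `compute`)."""
    kind = instr["kind"]
    if kind == "read":                    # memory access: read
        return memory.get(instr["addr"])
    if kind == "write":                   # memory access: write
        memory[instr["addr"]] = instr["data"]
        return None
    if kind == "compute":                 # trigger the computing layer
        return compute(instr["args"])
    raise ValueError(f"unknown instruction kind: {kind}")
```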
- 10. The method of claim 9, wherein the step of the computing layer of the processor executing calculation comprises: the computing layer acquiring weight data and input feature data for the neural network calculation from the storage layer; performing a matrix-vector multiplication operation on the weight data and the input feature data in a computing-in-memory array of the computing layer; and writing the calculation result obtained by the operation back to a designated area of the storage layer.
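The computation step of claim 10 is at its core a matrix-vector multiplication executed inside the computing-in-memory array, with the result written back to a designated region of the storage layer. A behavioral sketch follows; the storage-layer layout and the key names are assumptions, and the element-wise multiply-and-sum stands in for the analog accumulation a real CIM array performs on its bit lines.

```python
# Behavioral model of claim 10: fetch weights and input features from
# the storage layer, perform matrix-vector multiplication as the CIM
# array would, and write the result back to a designated area.

def cim_matvec(weights, features):
    """Matrix-vector product: each row of weights is multiplied
    element-wise with the input feature vector and summed, mimicking
    per-row accumulation in a computing-in-memory array."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def compute_step(storage: dict) -> None:
    weights = storage["weights"]        # fetched from the storage layer
    features = storage["features"]
    result = cim_matvec(weights, features)
    storage["result_region"] = result   # write-back to designated area
```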
- 11. The method of claim 9, further comprising a data verification step: as data passes through a memory interface control device in the processor, the data is checked and/or corrected by an error checking and correction circuit associated with the memory interface control device.
- 12. The method of claim 9, further comprising a timing coordination step: when the clock frequency or the data transmission path changes, dynamically adjusting the delay of signals inside the processor through a timing compensation module in the processor, so as to keep the signals timing-consistent with the memory chip on the shared interface.
- 13. The method of claim 11, wherein the step of writing back the calculation result of the operation comprises at least one of the following paths: writing the calculation result back to the storage layer within the processor; transmitting the calculation result to the memory chip in the co-packaging structure; or outputting the calculation result to the main control chip through the memory interface control device.
- 14. A server, comprising a memory and one or more processors, the memory being configured to store instructions for execution by the one or more processors, and the one or more processors being configured to perform the method of any one of claims 9 to 13.
- 15. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any of claims 9 to 13.
- 16. A computer readable storage medium having instructions stored thereon, which when executed on a computer, cause the computer to perform the method of any of claims 9 to 13.
Description
Three-dimensional stacked neural network processor based on integrated memory and calculation

Technical Field

The application relates to artificial intelligence, in particular to a three-dimensional stacked neural network processor based on computing-in-memory and a processing method.

Background

With the rapid development of artificial intelligence technology, deep neural network models represented by the Transformer architecture place extremely high demands on the computing power and data storage bandwidth of hardware. The traditional von Neumann architecture physically separates the computing unit from the storage unit, so data must be frequently moved between them over a system bus, causing the "memory wall" bottleneck of low computing density and severe energy efficiency loss; it is therefore difficult to meet the requirements of large models in real-time inference scenarios. To break through this bottleneck, computing-in-memory (CIM) technology and three-dimensional integration technology are gradually becoming the focus of industry research. CIM technology eliminates the overhead of moving weight data by executing matrix-vector multiplication directly in the memory array, while three-dimensional integration technology (such as through-silicon vias (TSVs) and hybrid bonding) builds ultra-high-density interconnection channels through vertical stacking of chips, greatly improving inter-layer transmission bandwidth. However, in practical applications, especially in mobile end-side scenarios such as smartphones, smart wearables, and the Internet of Things, the following core pain points still exist. First, packaging space is extremely limited. 
The PCB area of mobile end-side devices is extremely precious, and conventional neural network processors (NPUs) are typically deployed as separate chips, which not only occupy additional packaging area but also increase the complexity of system wiring. Second, interface resources conflict with compatibility requirements. Existing high-performance AI acceleration chips often require custom interfaces or dedicated control logic, which demands complex hardware adaptation by the main control chip (SoC). In end-side devices, the pin resources of the main control chip are limited, and it is difficult to sacrifice existing memory interface resources for NPU deployment. Finally, signal consistency and integration efficiency are low. Most existing three-dimensional stacking schemes only solve the bandwidth problem inside a single chip, but fail to effectively solve the problem of coexistence and cooperative work between a processor and a standard memory chip (such as conventional DRAM). When the processor is packaged separately from the external memory, the overall energy efficiency is still limited by interface delay and signal attenuation.

Disclosure of Invention

To address the above problems, the embodiments of the application provide a three-dimensional stacked neural network processor based on computing-in-memory and a processing method. 
The three-dimensional stacked neural network processor based on computing-in-memory comprises a computing layer, a storage layer, a three-dimensional integration unit, a memory interface control device and a protocol remapping mechanism. The computing layer comprises a neural network processing unit based on the computing-in-memory architecture and is used for executing neural network calculation; the storage layer comprises a storage unit and is used for storing data required by the computing layer; the three-dimensional integration unit is located between the computing layer and the storage layer and is used for realizing vertical connection between the computing layer and the storage layer; the memory interface control device is used for receiving an access instruction from an external main control chip and parsing the access instruction, through the protocol remapping mechanism, into a control instruction for the neural network processing unit or the storage unit; and the processor is integrated in the same packaging structure with at least one memory chip and shares the memory interface control device with the memory chip to exchange information with the outside of the packaging structure. According to the application, the computing layer, the storage layer and the external memory chip (such as DRAM dies) are integrated in the same packaging structure, so the occupied PCB area is significantly reduced, solving the pain point that terminal equipment (such as mobile phones and Internet of Things terminals) has limited space and cannot independently deploy a high-performance AI chip. In addition, the computing chip and the standard memory chip share t