
CN-122021754-A - AI processor and method based on compute-in-memory integration, three-dimensional integration and memory sharing

CN122021754A

Abstract

The application relates to an AI processor based on compute-in-memory integration, three-dimensional integration and memory sharing, and a control method thereof. The AI processor comprises a storage unit, a computing unit, an interface device and a control module. The computing unit is coupled with at least one storage unit through a three-dimensional integration technology and comprises a neural network processing unit for executing neural-network-related computation. The interface device is arranged on the computing unit and communicatively couples the AI processor with an external master control device. The control module is arranged on the storage unit and switches the storage unit between a first working mode and a second working mode: in the first working mode, the storage unit is communicatively coupled with the external master control device through the interface device and performs storage operations in response to read-write requests; in the second working mode, the storage unit is communicatively coupled with the neural network processing unit and assists in performing computation operations in response to its computation requests. Through this switching of working modes, the processor time-division-multiplexes high-performance AI acceleration with a general storage function.

Inventors

  • CHEN PEIYU
  • YE LE
  • WANG ZHIXUAN
  • LIU YING

Assignees

  • 无锡微纳核芯电子科技有限公司
  • 杭州微纳核芯电子科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-02-03

Claims (20)

  1. An AI processor based on compute-in-memory integration, three-dimensional integration and memory sharing, comprising: a storage unit for storing system data, neural network weight data, feature data and intermediate calculation results; a computing unit communicatively coupled with the storage unit through a three-dimensional integration technology, the computing unit comprising a neural network processing unit based on a compute-in-memory array, the neural network processing unit being configured to execute neural-network-related computation using the data of the storage unit, and the three-dimensional integration technology being at least one selected from the group consisting of hybrid bonding, through-silicon vias, flip-chip and micro-bump connection; an interface device arranged on the computing unit for communicatively coupling the AI processor with an external master control device; and a control module arranged on the storage unit for switching the storage unit between a first working mode and a second working mode, wherein: in the first working mode, the storage unit establishes communication coupling with the external master control device through the interface device and performs storage operations in a first storage space in response to read-write requests of the external master control device; and in the second working mode, the storage unit establishes communication coupling with the neural network processing unit through a computation-dedicated interface and assists in performing computation operations in the first storage space of the storage unit in response to computation requests of the neural network processing unit.
  2. The AI processor of claim 1, wherein the control module is configured to switch modes by switching the control-signal receiving end of the storage unit: in the first working mode, the control end of the storage unit responds to control signals sent by the external master control device and does not respond to control signals sent by the neural network processing unit; in the second working mode, the control end of the storage unit responds to control signals sent by the neural network processing unit and does not respond to control signals sent by the external master control device.
  3. The AI processor of claim 2, wherein the mode-switching control mode of the control module comprises at least one of: switching modes in response to a predefined instruction sent by the external master control device via the interface device; and monitoring a working-load parameter of the AI processor in real time and executing a mode switch when a neural network computing task is triggered or the load of the neural network processing unit meets a preset condition.
  4. The AI processor of claim 1, wherein the interface device is compatible with at least one JEDEC memory standard protocol selected from LPDDR4, LPDDR4X, LPDDR5, LPDDR5X, LPDDR6, HBM2, HBM3, HBM4, DDR5, DDR6, GDDR5, GDDR6, GDDR7, UFS and eMMC.
  5. The AI processor of claim 1, wherein the data stored by the storage unit comprise pre-trained neural network weights, feature vectors input in real time, intermediate calculation results generated by the neural network processing unit during computation, output results after computation by the neural network processing unit is completed, general system data of the external master control device, and operating-system instructions.
  6. The AI processor of claim 1, wherein the compute-in-memory array of the neural network processing unit is configured to perform matrix-vector multiplication and multiply-accumulate operations in a neural network and supports massively parallel computation.
  7. The AI processor of claim 1, wherein, in the first working mode, the storage unit is schedulable by an operating system of the external master control device for storing instructions and general data of non-artificial-intelligence tasks, thereby supplementing the memory capacity available to the operating system.
  8. The AI processor of claim 1, wherein, in the second working mode, the neural network processing unit accesses the storage unit via a vertical interconnect path formed by the three-dimensional integration technology, the communication bandwidth of the vertical interconnect path is higher than that of the interface device, and data interaction between the storage unit and the neural network processing unit does not need to pass through the external master control device, thereby reducing data-handling energy consumption.
  9. The AI processor of claim 1, wherein the external master control chip recognizes the AI processor as a standard memory device and can be interconnected with it without designing a dedicated driver or modifying the hardware architecture for the AI processor.
  10. The AI processor of claim 1, wherein the computation results output by the neural network processing unit are transmitted via at least one of: writing directly into a designated storage area of the storage unit; transmitting to the external master control chip via the interface device; or transmitting to other expansion storage units via a three-dimensional integrated connection structure.
  11. The AI processor of claim 1, wherein the AI processor is adapted to mobile terminal devices, PCs, servers and high-performance computing devices by adjusting the protocol configuration of the interface device, the number of storage units, and the number of neural network processing units in the computing unit.
  12. The AI processor of claim 1, wherein the interface device is further provided with a data buffer module for temporarily storing data transmitted between the external master control chip and the storage unit and between the external master control chip and the computing unit, thereby avoiding congestion during data transmission and improving the stability of data interaction.
  13. A control method of an AI processor, applied to the AI processor based on compute-in-memory integration, three-dimensional integration and memory sharing according to any one of claims 1 to 12, comprising the steps of: detecting the working state of the AI processor and/or an external instruction received by the AI processor; and switching the storage unit between the first working mode and the second working mode based on the working state and/or the external instruction, wherein: in the first working mode, the storage unit establishes communication coupling with the external master control device through the interface device and performs storage operations in response to read-write requests of the external master control device; and in the second working mode, the storage unit establishes communication coupling with the neural network processing unit via a computation-dedicated interface and performs computation operations in response to computation requests of the neural network processing unit.
  14. The AI processor control method of claim 13, wherein the mode switching is triggered in response to a predefined command from the external master control device or automatically based on a system load parameter.
  15. The AI processor control method of claim 13, wherein switching from the first working mode to the second working mode comprises: the control module shields access requests from the external master control device received by the interface device; switching the control-signal receiving end of the storage unit from responding to the external master control device to responding to the neural network processing unit; activating the vertical interconnect communication link between the neural network processing unit and the storage unit; and waking up the neural network processing unit and loading the configuration parameters required by the neural network computation.
  16. The AI processor control method of claim 13, wherein switching from the second working mode to the first working mode comprises: the control module pauses the current computing task of the neural network processing unit or waits for its completion; switching the control-signal receiving end of the storage unit from responding to the neural network processing unit back to responding to the external master control device; releasing the shielding of the interface device and restoring the access of the external master control device to the storage unit; and setting the neural network processing unit to an idle state.
  17. The AI processor control method of claim 14, wherein the condition for automatically triggering a switch based on a system load parameter comprises at least one of: the length of the computing task queue of the neural network processing unit exceeds a preset threshold; the memory access frequency of the external master control device to the storage unit is lower than a preset threshold; the priority of a neural network computing task in the system is higher than that of a general data storage task; and the currently executed task is detected to be a memory-access-intensive AI task such as large-language-model inference or image recognition.
  18. The AI processor control method of claim 14, wherein the system load parameter monitored by the control module comprises at least one of: the read-write request frequency of the external master control device to the storage unit; the length of the computing task queue of the neural network processing unit; the free storage capacity of the storage unit; the single-pass data demand of a neural network computing task; and the computation load duty cycle of the neural network processing unit.
  19. The AI processor control method of claim 14, wherein, in the second working mode, the data interaction logic between the storage unit and the neural network processing unit comprises: the storage unit broadcasts pre-stored neural network weights to the neural network processing unit; the storage unit receives intermediate calculation results output by the neural network processing unit and temporarily stores them in a designated storage block; and after the neural network processing unit completes all computation, the storage unit writes the final calculation result into a preset result storage block.
  20. The AI processor control method of claim 14, wherein the external instruction that triggers a mode switch is a command conforming to a JEDEC memory standard protocol.
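The mode-switching logic of claims 14 to 18 can be sketched as a small behavioral model. This is an illustrative assumption, not the claimed hardware design: the class and attribute names (`ControlModule`, `task_queue`, `host_access_freq`) and the threshold values are invented here, since the claims leave concrete values open.

```python
from collections import deque
from enum import Enum, auto


class Mode(Enum):
    STORAGE = auto()  # first working mode: serves the external master control device
    COMPUTE = auto()  # second working mode: serves the neural network processing unit


class ControlModule:
    """Behavioral sketch of the control module (claims 14-18); names and
    thresholds are illustrative assumptions."""

    QUEUE_THRESHOLD = 8        # claim 17: task-queue length threshold (assumed value)
    ACCESS_FREQ_THRESHOLD = 10  # claim 17: host access-frequency threshold (assumed value)

    def __init__(self):
        self.mode = Mode.STORAGE
        self.task_queue = deque()      # pending NPU computing tasks
        self.host_access_freq = 100    # host read-write requests per interval (assumed)
        self.interface_shielded = False
        self.vertical_link_active = False
        self.npu_awake = False
        self.signal_source = "host"    # which side the storage unit's control end obeys

    def should_enter_compute(self):
        # Claim 17: any single condition may trigger the automatic switch.
        return (len(self.task_queue) > self.QUEUE_THRESHOLD
                or self.host_access_freq < self.ACCESS_FREQ_THRESHOLD)

    def switch_to_compute(self):
        # Claim 15: first -> second working mode, in order.
        self.interface_shielded = True     # shield host access requests
        self.signal_source = "npu"         # re-route the control-signal receiving end
        self.vertical_link_active = True   # activate the 3D vertical interconnect link
        self.npu_awake = True              # wake the NPU and load configuration
        self.mode = Mode.COMPUTE

    def switch_to_storage(self):
        # Claim 16: second -> first working mode, in order.
        self.task_queue.clear()            # pause / drain the current computing task
        self.signal_source = "host"        # control signals come from the host again
        self.interface_shielded = False    # restore host access via the interface device
        self.npu_awake = False             # set the NPU to an idle state
        self.mode = Mode.STORAGE
```

A usage pass: with an empty task queue and a busy host the controller stays in storage mode; once the queue grows past the (assumed) threshold, `should_enter_compute()` turns true and the two switch methods walk through the claim 15/16 step sequences.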

Description

AI processor and method based on compute-in-memory integration, three-dimensional integration and memory sharing

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to an AI processor based on compute-in-memory integration, three-dimensional integration and memory sharing, and a control method thereof.

Background

By combining PIM technology with three-dimensional integration technology, a new type of neural network processor device can be constructed, typically comprising a compute-in-memory computing layer and a data storage layer, stacked by three-dimensional integration. In theory, this architecture can break through the bottlenecks of computational intensity and bandwidth at the same time and greatly improve the processing efficiency of the neural network. The present invention relates generally to the field of semiconductor integrated circuit technology. More particularly, the present invention relates to a neural network processor device based on Processing-in-Memory (PIM) and three-dimensional integration (3D Integration) techniques, and further to a method for managing and allocating resources in the device, in particular for arbitrating access to shared memory resources by an external host processor and an internal compute engine.

While the above-described "PIM + 3D integration" processor device has advantages in terms of theoretical performance, when integrated into a complete computing system (e.g., a smartphone or an edge computing device) it faces a core system-level integration challenge: efficient communication adaptation with an external host chip (e.g., a system-on-chip, SoC). Existing communication interface schemes have obvious defects, as follows:

1. PCIe interface scheme. PCIe is a universal high-speed bus standard, but this approach has two major limitations. First, hardware compatibility is poor: the host SoC must support the PCIe protocol, yet end-side applications (such as smartphones) are extremely sensitive to power consumption, cost and chip area, and host SoCs generally do not integrate a PCIe interface, so the processor cannot be adapted to end-side scenarios. Second, the PCIe protocol stack is complex and introduces extra hardware overhead (such as protocol processing units) in low-power end-side scenarios, which runs contrary to the requirement for lightweight end-side integration.

2. Custom interface scheme. A custom interface realizes data interaction through a private communication protocol. Although performance can be optimized for a specific scenario, the expansibility problems are even more serious. Co-design binding: deep co-design with interconnected devices (such as the host SoC) is required, which strictly limits the selection range of host chips (the scheme cannot be compatible with general-purpose SoCs on the market). High popularization cost: a private protocol requires independently formulated technical specifications and the development of matching drivers and testing tools, which significantly increases system-design complexity, development cycle and cost, and seriously hinders the universality and market adoption of the processor.

Accordingly, there is a need for improvements to existing neural network processors.
Disclosure of Invention

To solve the above problems, the application provides an AI processor (namely the neural network processor) based on compute-in-memory integration, three-dimensional integration and memory sharing, comprising a storage unit, a computing unit, an interface device and a control module. The storage unit is configured to store system data, neural network weight data, feature data and intermediate calculation results. The computing unit is communicatively coupled with the storage unit through a three-dimensional integration technology; it comprises a neural network processing unit based on a compute-in-memory array and is configured to perform neural-network-related computation using the data of the storage unit; the three-dimensional integration technology is selected from at least one of hybrid bonding, through-silicon vias, flip-chip and micro-bump connection. The interface device is arranged on the computing unit and is used for communicatively coupling the AI processor with an external master control device. The control module is arranged on the storage unit and is used for switching the storage unit between a first working mode and a second working mode: in the first working mode, the storage unit is communicatively coupled with the external master control device through the interface device and performs storage operations in a first storage space in response to read-write requests of the external master control device; in the second working mode, the storage unit is communicatively coupled with the neural network processing unit and assists in performing computation operations in response to computation requests of the neural network processing unit.
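The memory-sharing arbitration described above (and in claim 2) amounts to the storage unit's control end obeying exactly one control-signal source per working mode. A minimal sketch, assuming a flat cell array and the invented source labels `"host"` and `"npu"` (the patent does not specify such an interface):

```python
class SharedMemory:
    """Minimal model of the shared storage unit: control signals from the
    non-active side are ignored. Names and sizes are illustrative assumptions."""

    def __init__(self, size=16):
        self.cells = [0] * size
        self.active_source = "host"  # first working mode by default

    def write(self, source, addr, value):
        if source != self.active_source:
            return False             # control signal from the other side: not responded to
        self.cells[addr] = value
        return True

    def read(self, source, addr):
        # Reads succeed only for the currently active control-signal source.
        return self.cells[addr] if source == self.active_source else None
```

This also illustrates the memory-sharing benefit: a weight written by the host in the first working mode remains in place, so after the mode switch the NPU reads it over the vertical interconnect without any copy passing back through the host.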