CN-122018784-A - Storage controller, storage system, computer device, and model inference method
Abstract
The invention relates to the technical field of artificial intelligence, and provides a storage controller, a storage system, a computer device, and a model inference method. The storage controller comprises a controller chip, which includes a task input/output unit and a neural network processing unit. The task input/output unit is configured to receive a task execution instruction from a computing unit; the neural network processing unit is configured to execute the computation task of at least one expert network in a mixture-of-experts model according to the task execution instruction to obtain a first computation result; and the task input/output unit is further configured to write the first computation result to a destination address indicated by the task execution instruction, so that the computing unit can combine it with the second computation results of the other expert networks in the mixture-of-experts model to obtain an inference result. The invention can meet the performance requirements of large-scale models, in particular mixture-of-experts models, for high-concurrency, low-latency inference.
Inventors
- LUO TING
- WU DAWEI
- LIN YIN
- CHEN QIANG
Assignees
- 得一微电子股份有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-23
Claims (10)
- 1. A storage controller comprising a controller chip, the controller chip comprising: a task input/output unit configured to receive a task execution instruction from a computing unit; and a neural network processing unit configured to execute the computation task of at least one expert network in a mixture-of-experts model according to the task execution instruction to obtain a first computation result; wherein the task input/output unit is further configured to write the first computation result to a destination address indicated by the task execution instruction, so that the computing unit can combine it with the second computation results of the remaining expert networks in the mixture-of-experts model to obtain an inference result.
- 2. The storage controller according to claim 1, further comprising an internal cache unit, wherein the controller chip further comprises a controller unit configured to receive expert classification information from the computing unit and, according to the expert classification information, load weight parameters of a first type of expert network in the mixture-of-experts model from a memory chip into a cache unit of the computing unit, load weight parameters of a second type of expert network in the mixture-of-experts model into the internal cache unit, and keep weight parameters of the remaining third type of expert network in the mixture-of-experts model in the memory chip.
- 3. The storage controller according to claim 2, wherein the hotness of the weight parameters of the second type of expert network is lower than that of the first type of expert network and higher than that of the third type of expert network.
- 4. The storage controller according to claim 2, wherein the neural network processing unit comprises a computation subunit and a cache subunit; the controller unit is further configured to load the target weight parameters of the expert network indicated by the task execution instruction from the internal cache unit or the memory chip into the cache subunit; and the computation subunit is configured to execute the corresponding computation task according to the target weight parameters in the cache subunit to obtain the first computation result.
- 5. The storage controller according to claim 2, wherein the internal cache unit is bonded to the controller chip and uses the same address encoding rules as the controller chip.
- 6. The storage controller according to any one of claims 1 to 5, wherein the task input/output unit is configured to, upon receiving a plurality of task execution instructions from a same computing unit, compile the plurality of task execution instructions into a task stream instruction.
- 7. The storage controller according to any one of claims 1 to 5, wherein the task input/output unit is configured to, upon receiving a plurality of task execution instructions from different computing units, aggregate the plurality of task execution instructions into a batch task instruction.
- 8. A storage system, comprising: a memory chip; and the storage controller according to any one of claims 1 to 7.
- 9. A computer device, comprising: a computing unit; and the storage system according to claim 8.
- 10. A model inference method, comprising: sending, by a computing unit, a task execution instruction to a storage controller; executing, by the storage controller according to the task execution instruction, the computation task of at least one expert network in a mixture-of-experts model to obtain a first computation result; executing, by the computing unit, the computation tasks of the remaining expert networks in the mixture-of-experts model to obtain a second computation result; and obtaining, by the computing unit, an inference result according to the first computation result and the second computation result.
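Claims 2 and 3 describe a three-tier placement of expert weight parameters driven by expert classification (hotness) information. The following Python snippet is a minimal sketch of one possible placement policy; the tier names, hotness thresholds, and data structures are illustrative assumptions and not part of the claims.

```python
# Illustrative sketch of the hotness-driven, three-tier weight placement in claims 2-3.
# Tier names, thresholds, and structures are assumptions for illustration only.

def place_expert_weights(expert_hotness, hot_threshold=0.6, warm_threshold=0.2):
    """Map each expert to a storage tier based on its access hotness.

    expert_hotness: dict of expert_id -> hotness score in [0, 1]
                    (e.g., derived from routing statistics sent by the computing unit).
    Returns a dict of expert_id -> tier name.
    """
    placement = {}
    for expert_id, hotness in expert_hotness.items():
        if hotness >= hot_threshold:
            # First type: hottest experts, loaded into the computing unit's cache unit.
            placement[expert_id] = "computing_unit_cache"
        elif hotness >= warm_threshold:
            # Second type: warm experts, loaded into the controller's internal cache unit.
            placement[expert_id] = "controller_internal_cache"
        else:
            # Third type: cold experts, kept in the memory chip.
            placement[expert_id] = "memory_chip"
    return placement


if __name__ == "__main__":
    hotness = {"expert_0": 0.9, "expert_1": 0.5, "expert_2": 0.05}
    print(place_expert_weights(hotness))
    # {'expert_0': 'computing_unit_cache', 'expert_1': 'controller_internal_cache',
    #  'expert_2': 'memory_chip'}
```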
Description
Storage controller, storage system, computer device, and model inference method
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a storage controller, a storage system, a computer device, and a model inference method.
Background
With the rapid development of artificial intelligence technology, the parameter counts and computation demands of large-scale models are growing exponentially. Current large-scale models generate massive numbers of random access requests during training and inference, placing extremely high requirements on the bandwidth, latency, and concurrent processing capability of the storage system. For example, the mixture-of-experts model, as a typical sparsely activated architecture, activates only part of its expert networks during inference, so its access pattern is highly irregular, which further increases the load pressure on the storage system. Traditional storage architectures struggle to meet the performance requirements of the mixture-of-experts model for high-concurrency, low-latency inference.
Disclosure of Invention
Embodiments of the invention provide a storage controller, a storage system, a computer device, and a model inference method, which can meet the performance requirements of large-scale models, in particular mixture-of-experts models, for high-concurrency, low-latency inference.
In a first aspect, an embodiment of the invention provides a storage controller, comprising a controller chip, the controller chip comprising: a task input/output unit configured to receive a task execution instruction from a computing unit; and a neural network processing unit configured to execute the computation task of at least one expert network in a mixture-of-experts model according to the task execution instruction to obtain a first computation result. The task input/output unit is further configured to write the first computation result to a destination address indicated by the task execution instruction, so that the computing unit can combine it with the second computation results of the remaining expert networks in the mixture-of-experts model, which the computing unit executes itself, to obtain an inference result.
In a second aspect, an embodiment of the invention provides a storage system, comprising: a memory chip; and the storage controller provided by the embodiments of the invention.
In a third aspect, an embodiment of the invention provides a computer device, comprising: a computing unit; and the storage system provided by the embodiments of the invention.
In a fourth aspect, an embodiment of the invention provides a model inference method, comprising: sending, by a computing unit, a task execution instruction to a storage controller; executing, by the storage controller according to the task execution instruction, the computation task of at least one expert network in a mixture-of-experts model to obtain a first computation result; executing, by the computing unit, the computation tasks of the remaining expert networks in the mixture-of-experts model to obtain a second computation result; and obtaining, by the computing unit, an inference result according to the first computation result and the second computation result.
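The flow described in the first and fourth aspects, in which the storage controller computes some experts, writes the first computation result back to the destination address, and the computing unit merges it with its own second computation results, can be illustrated with a short sketch. The following Python snippet is a minimal illustration only; the function names, instruction fields, toy scalar experts, and gate-weighted-sum merge rule are assumptions and not part of the disclosure.

```python
# Minimal sketch of the model inference flow in the first and fourth aspects.
# The expert split, gating weights, and weighted-sum merge are illustrative assumptions.

def storage_controller_execute(instruction, inputs):
    """Experts offloaded to the storage controller's NPU produce the first computation result."""
    return {eid: [w * x for x in inputs] for eid, w in instruction["offloaded_experts"].items()}

def computing_unit_execute(instruction, inputs):
    """Remaining experts run on the computing unit and produce the second computation result."""
    return {eid: [w * x for x in inputs] for eid, w in instruction["local_experts"].items()}

def merge(first_result, second_result, gate):
    """Computing unit combines both results (here: a gate-weighted sum) into the inference result."""
    all_outputs = {**first_result, **second_result}
    dim = len(next(iter(all_outputs.values())))
    return [sum(gate[eid] * out[i] for eid, out in all_outputs.items()) for i in range(dim)]

if __name__ == "__main__":
    instruction = {"offloaded_experts": {"e0": 2.0}, "local_experts": {"e1": -1.0}}
    gate = {"e0": 0.7, "e1": 0.3}
    x = [1.0, 2.0, 3.0]
    first = storage_controller_execute(instruction, x)   # runs on the storage controller
    second = computing_unit_execute(instruction, x)      # runs on the computing unit
    print(merge(first, second, gate))                    # approximately [1.1, 2.2, 3.3]
```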
The invention provides a new storage controller architecture. The storage controller comprises a controller chip, and the controller chip comprises a task input/output unit and a neural network processing unit. The task input/output unit receives a task execution instruction from the computing unit; the neural network processing unit executes the computation task of at least one expert network in the mixture-of-experts model based on the task execution instruction to obtain a first computation result; and the task input/output unit writes the first computation result to the destination address indicated by the task execution instruction, so that the computing unit can combine it with the second computation results of the other expert networks it has computed itself to obtain an inference result. With this architecture, the storage controller changes from a traditional data-moving role into an intelligent processing unit with its own compute capability, so that computation tasks are executed locally at the storage side, the frequency of data transfers to and from the main computing unit is greatly reduced, and the energy efficiency of the system is improved. Combined with the dynamic activation characteristic of the mixture-of-experts model, the storage controller can intelligently schedule its compute resources according to task instructions, achieving multi-expert parallel processing and low-latency response. This meets the performance requirements of large-scale models, in particular mixture-of-experts models, for high-concurrency, low-latency inference, and provides a feasible path for efficient deployment of mixture-of-experts models at the edge.
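The task-instruction scheduling mentioned above is elaborated in claims 6 and 7: instructions from the same computing unit are compiled into a task stream instruction, while instructions from different computing units are aggregated into a batch task instruction. The following Python snippet sketches one such grouping policy; the instruction fields (such as "source") and the returned structures are illustrative assumptions, not the claimed instruction format.

```python
# Illustrative sketch of the instruction aggregation described in claims 6 and 7.
# The instruction fields and the grouping key ("source") are assumptions for illustration.
from collections import defaultdict

def aggregate(instructions):
    """Group pending task execution instructions by their source computing unit.

    Instructions from the same computing unit are compiled into one task stream
    instruction; instructions from different computing units are aggregated into
    one batch task instruction containing the per-unit streams.
    """
    by_source = defaultdict(list)
    for instr in instructions:
        by_source[instr["source"]].append(instr)

    task_streams = [
        {"type": "task_stream", "source": src, "tasks": tasks}
        for src, tasks in by_source.items()
    ]
    if len(task_streams) == 1:
        return task_streams[0]                           # single computing unit: one task stream
    return {"type": "batch_task", "streams": task_streams}  # multiple units: one batch instruction

if __name__ == "__main__":
    pending = [
        {"source": "cpu0", "expert": "e0", "dst": 0x1000},
        {"source": "cpu0", "expert": "e1", "dst": 0x2000},
        {"source": "gpu0", "expert": "e2", "dst": 0x3000},
    ]
    print(aggregate(pending)["type"])  # batch_task
```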