CN-121979646-A - Heterogeneous computing scheduling method, system, equipment and storage medium
Abstract
The disclosure provides a heterogeneous computing scheduling method, system, device, and storage medium, and relates to the technical field of artificial intelligence. In some embodiments of the disclosure, a page table mapping relation of an input/output memory management unit is configured according to a task descriptor without relying on system core participation, completing address translation preprocessing; a virtual address interval to be accessed subsequently by a neural network processing unit is predicted; before the neural network processing unit initiates a memory access request, a hardware page table walk of the input/output memory management unit is triggered according to the virtual address interval to obtain page table entries, and the page table entries are loaded into an input/output translation lookaside buffer; and a start signal is sent to the neural network processing unit so that it accesses the shared main memory through the input/output memory management unit based on the page table mapping relation. Through this division of labor between the system core and the control core, the system overhead caused by operating-system participation in low-level scheduling is reduced, as is the additional power consumption overhead.
Inventors
- PAN MINQIANG
Assignees
- 北京微核芯科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260409
Claims (10)
- 1. A heterogeneous computing scheduling method, applied to a control core in a heterogeneous computing scheduling system, comprising: receiving a task descriptor issued by a system core, wherein the task descriptor is generated by the system core according to a task request corresponding to a received AI task; configuring, without relying on system core intervention, a page table mapping relation of an input/output memory management unit according to the task descriptor; predicting, according to the access behavior characteristics and historical execution information of the AI task, a virtual address interval to be accessed subsequently by a neural network processing unit; triggering, before the neural network processing unit initiates a memory access request, a hardware page table walk of the input/output memory management unit according to the virtual address interval to obtain page table entries, and loading the page table entries into an input/output translation lookaside buffer; and sending a start signal to the neural network processing unit so that the neural network processing unit accesses the shared main memory through the input/output memory management unit according to the start signal and the page table mapping relation.
- 2. The method of claim 1, wherein configuring the page table mapping relation of the input/output memory management unit according to the task descriptor comprises: parsing the task descriptor to obtain virtual address information; and configuring, according to the virtual address information, the page table mapping relation in the input/output memory management unit through a dedicated interface.
- 3. The method of claim 1, wherein predicting the virtual address interval to be accessed subsequently by the neural network processing unit according to the access behavior characteristics and historical execution information of the AI task comprises: inputting the access behavior characteristics and historical execution information of the AI task into an address prediction model to obtain the virtual address interval; wherein the input features of the address prediction model comprise at least static prior features, dynamic observation features, and statistical association features; the static prior features comprise access pattern parameters parsed from the task descriptor, the access pattern parameters comprising a data stride, tensor dimensions, a data block size, and a starting virtual address; the dynamic observation features comprise a virtual address access history, or a statistical representation thereof, generated by the neural network processing unit in the current task execution stage; and the statistical association features comprise historical task access statistics or feature fingerprints corresponding to the current operator type.
- 4. The method of claim 3, wherein the virtual address interval is inferred from the input features in at least one of the following ways: when the access pattern is linear or quasi-linear, inferring from the relation between the data stride and the address offset; maintaining an address access record window of configurable length and identifying a periodic access pattern from the distribution of address differences within the window; and inferring from an operator access state transition model combined with task progress information of the neural network processing unit.
- 5. The method of claim 1, further comprising, after predicting the virtual address interval to be accessed subsequently by the neural network processing unit according to the access behavior characteristics and historical execution information of the AI task: judging whether an address translation entry corresponding to a target virtual address already exists in the input/output translation lookaside buffer, and if so, skipping the prefetch operation for that target virtual address; adjusting the number of page table entries prefetched per batch according to the memory bandwidth utilization, the number of remaining cache entries in the input/output translation lookaside buffer, and the throughput requirement of the neural network processing unit; and, for missed virtual addresses, triggering the input/output memory management unit through a dedicated interface to perform a page table walk, and loading the obtained page table entries into the input/output translation lookaside buffer.
- 6. The method of claim 1, further comprising, after predicting the virtual address interval to be accessed subsequently by the neural network processing unit according to the access behavior characteristics and historical execution information of the AI task: when a virtual address access initiated by the neural network processing unit misses the prefetched page table entries, or when the miss rate of the input/output translation lookaside buffer exceeds a preset hit-rate or miss threshold within a preset window time, reducing the prediction depth and opening the highest-priority address translation channel; performing feedback calibration of the address prediction model according to the actual memory access behavior sequence; and resetting the prediction state when the AI task is observed to enter a new operator stage.
- 7. A heterogeneous computing scheduling system, comprising: a system core, configured to generate a task descriptor according to a task request corresponding to an AI task and transmit the task descriptor to a control core; the control core, configured to configure a page table mapping relation of an input/output memory management unit according to the task descriptor, predict a virtual address interval to be accessed subsequently by a neural network processing unit according to the access behavior characteristics and historical execution information of the AI task, and trigger page table walks and load page table entries before the neural network processing unit initiates a memory access request; the neural network processing unit, configured to execute the AI task and access a shared main memory through the input/output memory management unit; the input/output memory management unit, configured to perform hardware page table walks to obtain page table entries and load the page table entries into an input/output translation lookaside buffer; and the shared main memory, configured to store data buffers and the task descriptor.
- 8. An electronic device, comprising: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to execute the instructions to implement the steps of the method of any one of claims 1-6.
- 9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-6.
- 10. A computer program product comprising a computer program/instructions which, when executed by a processor, implements the steps of the method of any one of claims 1-6.
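The control-core flow of claim 1 can be modeled in software. The sketch below is illustrative only: `IommuModel`, `ControlCore`, the descriptor fields, and the 4 KiB page size are assumptions, not interfaces named by the patent; the real mechanism is hardware configured through a dedicated interface.

```python
# Software model of the claim-1 flow: configure the IOMMU mapping from the
# task descriptor, predict the virtual interval, prefetch page table entries
# into the IOTLB, then signal the NPU to start. All names are hypothetical.

PAGE = 4096  # assumed page size

class IommuModel:
    """Toy IOMMU: a page table plus an IOTLB cache of page table entries."""
    def __init__(self):
        self.page_table = {}   # virtual page -> physical page
        self.iotlb = {}        # cached translations

    def map_range(self, vaddr, paddr, size):
        # Page table mapping configured from the task descriptor,
        # without system core intervention (claims 1-2).
        for off in range(0, size, PAGE):
            self.page_table[(vaddr + off) // PAGE] = (paddr + off) // PAGE

    def walk_and_fill(self, vpage):
        # Hardware page table walk triggered ahead of the NPU access;
        # the resulting entry is loaded into the IOTLB.
        self.iotlb[vpage] = self.page_table[vpage]

class ControlCore:
    def __init__(self, iommu):
        self.iommu = iommu

    def schedule(self, desc):
        # 1. Configure the IOMMU mapping from the task descriptor.
        self.iommu.map_range(desc["vaddr"], desc["paddr"], desc["size"])
        # 2. Predict the virtual interval the NPU will touch next
        #    (here: a naive stride-based guess from descriptor parameters).
        start = desc["vaddr"]
        end = start + desc["stride"] * desc["count"]
        # 3. Prefetch page table entries into the IOTLB before any access.
        for vpage in range(start // PAGE, (end - 1) // PAGE + 1):
            self.iommu.walk_and_fill(vpage)
        # 4. Send the start signal to the NPU.
        return "start"
```

In this model the NPU's subsequent accesses to the predicted interval hit the pre-filled IOTLB, which is the latency saving the claims aim at.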
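The two lighter inference strategies of claim 4 (stride extrapolation for linear access, and a configurable-length history window whose address differences expose repetition) can be sketched as follows. The class name, window length, and dominance test are assumptions for illustration; the operator state-transition model of the third strategy is only stubbed as a fallback.

```python
# Illustrative address predictor for claims 3-4: a deque serves as the
# configurable-length access record window; a single dominant address
# difference signals a linear/quasi-linear pattern to extrapolate.
from collections import Counter, deque

class AddressPredictor:
    def __init__(self, window_len=8):
        self.window = deque(maxlen=window_len)  # configurable record window

    def observe(self, vaddr):
        self.window.append(vaddr)

    def predict_interval(self, lookahead=4):
        """Return the (start, end) virtual interval expected next, or None."""
        if len(self.window) < 2:
            return None
        addrs = list(self.window)
        diffs = [b - a for a, b in zip(addrs, addrs[1:])]
        # Linear / quasi-linear: one stride dominates the window.
        stride, freq = Counter(diffs).most_common(1)[0]
        if freq >= len(diffs) - 1 and stride > 0:
            base = addrs[-1]
            return (base + stride, base + stride * (lookahead + 1))
        # Otherwise defer to the operator access state transition model.
        return None
```

A periodic (rather than strictly linear) pattern would show up as a repeating cycle in `diffs`; extending the dominance test to cycle detection is straightforward but omitted here.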
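Claim 5's prefetch policy has two parts: skip pages whose translations are already cached, and scale the per-batch prefetch count with bandwidth headroom and remaining IOTLB capacity. A minimal sketch, with assumed weighting and clamping bounds the patent does not specify:

```python
# Hypothetical prefetch policy for claim 5. batch_size() trades off memory
# bandwidth utilization, free IOTLB entries, and NPU throughput demand;
# select_prefetch() drops already-cached pages (hit -> skip prefetch).

def batch_size(bw_util, free_entries, npu_demand, max_batch=32):
    """Pick how many page table entries to prefetch in one batch."""
    headroom = max(0.0, 1.0 - bw_util)   # unused memory bandwidth fraction
    n = int(min(free_entries, npu_demand * headroom))
    return max(1, min(n, max_batch))     # clamp to [1, max_batch]

def select_prefetch(vpages, iotlb):
    """Keep only virtual pages whose translations are not yet cached."""
    return [vp for vp in vpages if vp not in iotlb]
```

The surviving pages would then be handed to the IOMMU's page table walker through the dedicated interface, as in claim 5's final step.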
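The degradation path of claim 6 can likewise be sketched: within a sliding window, a miss rate above threshold halves the prediction depth and opens the highest-priority translation channel, and entering a new operator stage resets the predictor state. The thresholds, window size, and halving rule are assumptions.

```python
# Illustrative miss-rate monitor for claim 6. Field names and the
# depth-halving policy are hypothetical; the patent only requires that
# the prediction depth shrink and a priority translation path open.

class MissMonitor:
    def __init__(self, miss_threshold=0.3, window=100):
        self.miss_threshold = miss_threshold
        self.window = window
        self.hits = self.misses = 0
        self.depth = 8                 # current prediction depth
        self.priority_channel = False  # highest-priority translation path

    def record(self, hit):
        self.hits += hit
        self.misses += not hit
        if self.hits + self.misses >= self.window:
            rate = self.misses / (self.hits + self.misses)
            if rate > self.miss_threshold:
                self.depth = max(1, self.depth // 2)  # reduce depth
                self.priority_channel = True          # fast-path walks
            self.hits = self.misses = 0               # restart the window

    def new_operator_stage(self):
        # Claim 6: reset prediction state on entering a new operator stage.
        self.depth = 8
        self.priority_channel = False
```

Feedback calibration of the address prediction model itself (the middle step of claim 6) would consume the same hit/miss stream but is not modeled here.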
Description
Heterogeneous computing scheduling method, system, equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, and in particular to a heterogeneous computing scheduling method, system, device, and storage medium.
Background
With the rapid development of artificial intelligence technology, the requirements of edge computing scenarios for real-time performance, energy efficiency, and data processing throughput have become increasingly stringent. In a typical edge AI system, a heterogeneous computing architecture consisting of a Central Processing Unit (CPU) and a Neural Network Processing Unit (NPU) has become the mainstream scheme. To maximize system performance and reduce data movement overhead, efficient memory sharing between the CPU and the NPU, in particular "zero-copy" data interaction, is a key technology breakthrough. In the prior art, data interaction between a CPU and an NPU has mainly evolved through schemes such as data copying, coarse-grained mapping of shared physical memory, and IOMMU-based virtual memory sharing. The zero-copy scheme based on static IOMMU mapping is usually implemented by an operating system or a device driver: before the NPU task is started, an IOMMU page table is pre-configured to establish a mapping relation between CPU virtual addresses and NPU virtual addresses, and the mapping is kept essentially unchanged during task execution.
Although the prior art realizes zero-copy data sharing to a certain extent, when dealing with high-dynamism, low-latency edge AI application scenarios it cannot adapt to the bursty and non-contiguous data access characteristics of the NPU, causing frequent address translation misses and expensive page table walk operations, which increase access latency and power consumption. Heterogeneous computing scheduling methods therefore currently suffer from long access latency and high additional power consumption overhead.
Disclosure of Invention
The disclosure provides a heterogeneous computing scheduling method, system, device, and storage medium, so as to at least solve the problems of long access latency and high additional power consumption overhead in the prior art. The technical scheme of the present disclosure is as follows. An embodiment of the disclosure provides a heterogeneous computing scheduling method, applied to a control core in a heterogeneous computing scheduling system, comprising the following steps: receiving a task descriptor issued by a system core, wherein the task descriptor is generated by the system core according to a task request corresponding to a received AI task; configuring, without relying on system core intervention, a page table mapping relation of an input/output memory management unit according to the task descriptor; predicting, according to the access behavior characteristics and historical execution information of the AI task, a virtual address interval to be accessed subsequently by a neural network processing unit; triggering, before the neural network processing unit initiates a memory access request, a hardware page table walk of the input/output memory management unit according to the virtual address interval to obtain page table entries, and loading the page table entries into an input/output translation lookaside buffer; and sending a start signal to the neural network processing unit so that the neural network processing unit accesses the shared main memory through the input/output memory management unit according to the start signal and the page table mapping relation. Optionally, configuring the page table mapping relation of the input/output memory management unit according to the task descriptor includes: parsing the task descriptor to obtain virtual address information; and configuring, according to the virtual address information, the page table mapping relation in the input/output memory management unit through a dedicated interface. Optionally, predicting the virtual address interval to be accessed subsequently by the neural network processing unit according to the access behavior characteristics and historical execution information of the AI task includes: inputting the access behavior characteristics and historical execution information of the AI task into an address prediction model to obtain the virtual address interval; the input features of the address prediction model comprise at least static prior features, dynamic observation features, and statistical association features; the static prior features comprise access pattern parameters parsed from the task