CN-121328639-B - Artificial intelligence acceleration method and system based on heterogeneous hardware

CN121328639B

Abstract

The application provides an artificial intelligence acceleration method and system based on heterogeneous hardware, relating to the field of computer technology. The method collects the cache miss rate, instruction throughput, and chip power-consumption waveform produced while heterogeneous hardware executes an artificial intelligence inference task, and mines them to obtain a performance jitter value. Correlating the jitter value with the computation links of the inference task yields a cache sensitivity, and feature extraction on the power-consumption waveform yields an access index. Taking the cache sensitivity and the access index as input, a mapping strategy is obtained through reinforcement learning. When monitoring of the task execution state triggers a preset condition, a priority adjustment amount is dynamically calculated and preemptive scheduling is executed. The inference task is thereby accelerated, and both response efficiency and the energy-efficiency ratio under high-concurrency scenarios are improved.
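As a rough illustration only (not part of the patent text), one pass of the scheduling loop described in the abstract might be sketched as follows. All function names, metric representations, thresholds, and the simple formulas for jitter, sensitivity, and access index are hypothetical stand-ins for the patent's unspecified computations.

```python
# Hypothetical sketch of the abstract's pipeline; every name and formula
# here is illustrative, not the patent's actual method.

def schedule_step(metrics, task, policy, threshold=0.5):
    """One pass of the acceleration loop described in the abstract."""
    # 1. Derive a performance jitter value from the miss-rate trace
    #    (here: a simple peak-to-trough swing).
    jitter = max(metrics["miss_rate"]) - min(metrics["miss_rate"])
    # 2. Cache sensitivity: associate the jitter with the task's
    #    computation links (here: a per-link weighting).
    sensitivity = [jitter * w for w in task["link_weights"]]
    # 3. Access index: a simple feature of the power-consumption waveform
    #    (here: mean absolute sample-to-sample change).
    wave = metrics["power_wave"]
    access_index = sum(abs(b - a) for a, b in zip(wave, wave[1:])) / len(wave)
    # 4. The (sensitivity, access_index) state feeds the RL mapping policy.
    hardware = policy(sensitivity, access_index)
    # 5. If the preset condition triggers, compute a priority adjustment.
    adjust = (jitter - threshold) if jitter > threshold else 0.0
    return hardware, adjust
```

The point of the sketch is only the data flow: raw hardware counters are condensed into two state features, a learned policy maps the task to hardware, and a scalar adjustment drives later preemptive rescheduling.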

Inventors

  • WANG YANG
  • FAN JIANXUN
  • ZHAO ZIFENG

Assignees

  • 紫光恒越技术有限公司
  • 北京紫光智算信息技术有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-12-17

Claims (10)

  1. An artificial intelligence acceleration method based on heterogeneous hardware, characterized by comprising the following steps: collecting the cache miss rate and instruction throughput generated by each heterogeneous hardware when executing an artificial intelligence inference task, together with the chip power-consumption waveform caused by cache miss events; performing mining processing on the cache miss rate and the instruction throughput to obtain a performance jitter value, correlating the performance jitter value with the computation links corresponding to the artificial intelligence inference task to obtain a cache sensitivity, and performing feature extraction on the chip power-consumption waveform to obtain an access index, wherein the performance jitter value refers to a feature pattern reflecting fluctuations in hardware execution efficiency caused by differences in data locality, the cache sensitivity refers to vector data that integrates the jitter intensity and position distribution of each computation link and characterizes the link's degree of cache dependency, and the access index refers to a composite index that fuses a pulse-waveform feature set with a pulse-density index and reflects the frequency and intensity of memory accesses; taking the cache sensitivity and the access index as state input and performing resource mapping with a reinforcement learning strategy to obtain a mapping strategy between artificial intelligence inference tasks and heterogeneous hardware; based on the mapping strategy, monitoring the task execution state on each heterogeneous hardware and, when the task execution state on any heterogeneous hardware meets a preset condition corresponding to the performance jitter value or the access index, dynamically calculating a priority adjustment amount for the corresponding artificial intelligence inference task with a dynamic priority scheduling mechanism according to the task's real-time execution progress and remaining deadline; and executing preemptive task rescheduling according to the priority adjustment amount so as to accelerate the artificial intelligence inference task.
  2. The method of claim 1, wherein taking the cache sensitivity and the access index as state input and performing resource mapping with a reinforcement learning strategy to obtain a mapping strategy between artificial intelligence inference tasks and heterogeneous hardware comprises: constructing a state representation vector based on the cache sensitivity and the access index; generating a resource mapping action through a decision network in the reinforcement learning strategy based on the state representation vector, wherein the resource mapping action is defined as selecting target execution hardware from a heterogeneous hardware set for the artificial intelligence inference task to be scheduled and assigning an initial execution priority; during task execution, calculating a corresponding composite reward value based on the degree to which the real-time performance data matches the performance jitter value and the access index, respectively; and performing resource mapping according to the composite reward value to obtain the mapping strategy.
  3. The method of claim 2, wherein calculating the corresponding composite reward value based on the degree to which the real-time performance data matches the performance jitter value and the access index, respectively, comprises: calculating a task completion reward value based on the number of tasks completed within a preset deadline; calculating a system benefit value based on the change in the number of tasks completed by the system per unit time relative to a historical benchmark; calculating a first penalty value based on a comparison of a first matching degree with a first preset threshold, and a second penalty value based on a comparison of a second matching degree with a second preset threshold, wherein the first matching degree is the degree to which the real-time performance data matches the performance jitter value and the second matching degree is the degree to which it matches the access index; calculating a resource penalty value based on the degree of decline in instruction throughput; and performing weighted fusion of the task completion reward value, the system benefit value, the first penalty value, the second penalty value, and the resource penalty value to obtain the composite reward value.
  4. The method of claim 1, wherein dynamically calculating the priority adjustment amount of the corresponding artificial intelligence inference task with a dynamic priority scheduling mechanism according to the real-time execution progress and the remaining deadline of the artificial intelligence inference task comprises: calculating an urgency quantization value from the real-time execution progress, the remaining computation amount, and the remaining deadline; and dynamically generating, through the dynamic priority scheduling mechanism and based on the urgency quantization value, a priority adjustment amount corresponding to the artificial intelligence inference task, wherein a positive adjustment amount raises the task's priority and a negative adjustment amount lowers it.
  5. The method of claim 4, wherein dynamically generating a priority adjustment amount for the artificial intelligence inference task through the dynamic priority scheduling mechanism based on the urgency quantization value comprises: calculating an urgency factor based on the urgency quantization value; calculating a suitability factor from the performance matching degree and the index deviation degree of the heterogeneous hardware; normalizing the urgency factor and the suitability factor to obtain an urgency index and a suitability index; combining the urgency index and the suitability index through the dynamic priority scheduling mechanism to obtain a preliminary priority adjustment amount; and applying range-constraint processing to the preliminary priority adjustment amount to obtain the priority adjustment amount corresponding to the artificial intelligence inference task.
  6. The method of claim 1, wherein performing mining processing on the cache miss rate and the instruction throughput to obtain a performance jitter value comprises: segmenting the cache miss rate and the instruction throughput to obtain time-sequence segments; detecting peaks and troughs of the cache miss rate within the time-sequence segments with a waveform feature identification method to obtain a target interval; analyzing the instruction throughput and the corresponding cache miss rate within the time-sequence segments with a time-sequence association analysis method to determine association periods in which the instruction throughput decreases while the cache miss rate increases; and determining the performance jitter value based on the target interval and the association periods.
  7. The method of claim 1, wherein executing preemptive task rescheduling according to the priority adjustment amount to accelerate artificial intelligence inference tasks comprises: constructing a task queue based on the priority adjustment amount and sorting all artificial intelligence inference tasks in the task queue by priority to obtain a task sequence; determining a target inference task and the corresponding target heterogeneous hardware based on the task sequence; executing a preemption operation on the target heterogeneous hardware to suspend the currently running low-priority task and save the corresponding hardware execution state, loading the target inference task onto the target heterogeneous hardware, and continuing execution of the target inference task based on the hardware execution state; and applying a priority recalculation strategy to the suspended low-priority task, updating its priority adjustment amount, and reinserting it into the task queue, so as to accelerate the artificial intelligence inference tasks.
  8. An artificial intelligence acceleration system based on heterogeneous hardware, comprising: an acquisition module, configured to collect the cache miss rate and instruction throughput generated by each heterogeneous hardware when executing an artificial intelligence inference task, together with the chip power-consumption waveform caused by cache miss events; an association module, configured to perform mining processing on the cache miss rate and the instruction throughput to obtain a performance jitter value, correlate the performance jitter value with the computation links corresponding to the artificial intelligence inference task to obtain a cache sensitivity, and perform feature extraction on the chip power-consumption waveform to obtain an access index, wherein the performance jitter value refers to a feature pattern reflecting fluctuations in hardware execution efficiency caused by differences in data locality, the cache sensitivity refers to vector data that integrates the jitter intensity and position distribution of each computation link and characterizes the link's degree of cache dependency, and the access index refers to a composite index that fuses a pulse-waveform feature set with a pulse-density index and reflects the frequency and intensity of memory accesses; a mapping module, configured to take the cache sensitivity and the access index as state input and perform resource mapping with a reinforcement learning strategy to obtain a mapping strategy between artificial intelligence inference tasks and heterogeneous hardware; a calculation module, configured to monitor, based on the mapping strategy, the task execution state on each heterogeneous hardware and, when the task execution state on any heterogeneous hardware meets a preset condition corresponding to the performance jitter value or the access index, dynamically calculate a priority adjustment amount for the corresponding artificial intelligence inference task with a dynamic priority scheduling mechanism according to the task's real-time execution progress and remaining deadline; and a scheduling module, configured to execute preemptive task rescheduling according to the priority adjustment amount so as to accelerate the artificial intelligence inference task.
  9. An electronic device, comprising: a memory for storing a computer program; and a processor for implementing the steps of the heterogeneous-hardware-based artificial intelligence acceleration method of any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the heterogeneous-hardware-based artificial intelligence acceleration method of any one of claims 1 to 7.
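Claims 4 and 5 describe computing an urgency factor and a suitability factor, normalizing both, combining them through the dynamic priority scheduling mechanism, and range-constraining the result. A minimal sketch of that arithmetic follows; the particular weights, normalization functions, clamp bounds, and the urgency formula are assumptions not specified in the claims.

```python
def priority_adjustment(progress, remaining_work, remaining_deadline,
                        perf_match, index_dev,
                        w_urgency=0.6, w_fit=0.4, bound=5.0):
    """Sketch of the claim-4/5 dynamic priority adjustment.

    A positive return value raises the task's priority and a negative one
    lowers it (claim 4). All weights, squashing functions, and bounds here
    are illustrative choices, not the patent's.
    """
    # Urgency quantization (claim 4): work still to do per unit of time left.
    urgency = remaining_work * (1.0 - progress) / max(remaining_deadline, 1e-9)
    # Suitability factor (claim 5): performance match minus index deviation.
    fit = perf_match - index_dev
    # Normalize both factors into comparable ranges (assumed squashings).
    u_idx = urgency / (1.0 + urgency)          # maps [0, inf) -> [0, 1)
    f_idx = (fit + 1.0) / 2.0                  # maps [-1, 1] -> [0, 1]
    # Combine into a preliminary adjustment, centered so it can go negative.
    prelim = w_urgency * (u_idx - 0.5) + w_fit * (f_idx - 0.5)
    # Range-constraint processing (claim 5's final step).
    return max(-bound, min(bound, prelim * 2 * bound))
```

For example, a task that is half done with ample headroom on well-matched hardware receives a modest positive boost, while a poorly matched, non-urgent task is pushed toward the negative clamp bound.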

Description

Artificial intelligence acceleration method and system based on heterogeneous hardware

Technical Field

The application relates to the field of computer technology, and in particular to an artificial intelligence acceleration method and system based on heterogeneous hardware.

Background

In scenarios such as cloud-computing data centers and intelligent-driving edge nodes, artificial intelligence inference tasks often face high concurrency and mixed execution of multiple models. Heterogeneous hardware clusters have become a mainstream deployment scheme because they combine computing-power and energy-efficiency advantages. How to accurately adapt heterogeneous hardware of different architectures to various inference tasks, improving resource utilization while guaranteeing task real-time performance, has therefore become a core problem for heterogeneous-hardware-based artificial intelligence acceleration methods. In the prior art, two types of methods are commonly adopted for accelerating artificial intelligence inference. The first statically allocates tasks according to hardware computing-power parameters, for example assigning compute-intensive tasks preferentially to graphics processing units (GPUs) and memory-intensive tasks to central processing units (CPUs) with large caches. The second dynamically adjusts task placement by monitoring hardware load rates in real time and applying a polling or weighted allocation mechanism. Some schemes in both categories additionally preset fixed priority scheduling rules in combination with task deadlines.
However, the main defect of the prior art is that resource scheduling is performed only according to surface-level indicators such as computing power or load. As a result, the fit between hardware resources and task demands is insufficient, the root cause of hardware performance fluctuation during task execution can be neither accurately located nor adjusted in time, and it is difficult to balance task execution efficiency with optimal utilization of hardware resources.

Disclosure of Invention

The application aims to provide an artificial intelligence acceleration method and system based on heterogeneous hardware, so as to solve the prior-art problem that task execution efficiency and optimal utilization of hardware resources are difficult to balance. To solve this technical problem, in a first aspect the application provides an artificial intelligence acceleration method based on heterogeneous hardware, comprising: collecting the cache miss rate and instruction throughput generated by each heterogeneous hardware when executing an artificial intelligence inference task, together with the chip power-consumption waveform caused by cache miss events; performing mining processing on the cache miss rate and the instruction throughput to obtain a performance jitter value, correlating the performance jitter value with the computation links corresponding to the artificial intelligence inference task to obtain a cache sensitivity, and performing feature extraction on the chip power-consumption waveform to obtain an access index; taking the cache sensitivity and the access index as state input and performing resource mapping with a reinforcement learning strategy to obtain a mapping strategy between artificial intelligence inference tasks and heterogeneous hardware; based on the mapping strategy, monitoring the task execution state on each heterogeneous hardware and, when the task execution state on any heterogeneous hardware meets a preset condition corresponding to the performance jitter value or the access index, dynamically calculating a priority adjustment amount for the corresponding artificial intelligence inference task with a dynamic priority scheduling mechanism according to the task's real-time execution progress and remaining deadline; and executing preemptive task rescheduling according to the priority adjustment amount so as to accelerate the artificial intelligence inference task. Optionally, taking the cache sensitivity and the access index as state input and performing resource mapping with a reinforcement learning strategy to obtain a mapping strategy between an artificial intelligence inference task and heterogeneous hardware includes: constructing a state representation vector based on the cache sensitivity and the access index; and generating a resource mapping action through a decision network in the reinforcement learning strategy based on the state representation vector, wherein the resource mapping action is defined as selecting target execution hardware from a heterogeneous hardware set for the artificial intelligence inference task to be scheduled and assigning an initial execution priority;
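The "mining processing" step (detailed in claim 6) segments the two traces, detects miss-rate peaks and troughs, and finds periods where instruction throughput falls while the cache miss rate rises. A toy sketch of that logic is below; the fixed segment length, the peak criterion, and the scalar jitter summary are all illustrative choices, since the patent does not fix them.

```python
def performance_jitter(miss_rate, throughput, seg=4):
    """Toy version of claim 6's jitter mining; all thresholds illustrative.

    Returns (jitter_value, associated_periods), where an associated period
    is the index of a segment in which instruction throughput drops while
    the cache miss rate rises.
    """
    # Split both traces into fixed-length time-sequence segments.
    segments = [(miss_rate[i:i + seg], throughput[i:i + seg])
                for i in range(0, len(miss_rate) - seg + 1, seg)]
    target, associated = [], []
    for idx, (m, t) in enumerate(segments):
        # "Target interval": the segment containing the global miss-rate peak.
        if max(m) == max(miss_rate):
            target.append(idx)
        # Association analysis: throughput decreasing while miss rate rises.
        if t[-1] < t[0] and m[-1] > m[0]:
            associated.append(idx)
    # Scalar jitter value: miss-rate swing inside the flagged segments.
    flagged = set(target) | set(associated)
    jitter = max((max(segments[i][0]) - min(segments[i][0]) for i in flagged),
                 default=0.0)
    return jitter, associated
```

A production version would presumably use proper peak/trough detection and statistical correlation rather than endpoint comparisons, but the sketch shows how the target interval and the association periods jointly determine the jitter value.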