CN-117348989-B - Layer-by-layer scheduling method for mixed deep neural network task in embedded real-time system
Abstract
The invention discloses a layer-by-layer scheduling method for mixed deep neural network tasks in an embedded real-time system. The method accounts for the limited CPU and GPU resources of the embedded real-time system, the basic conditions of task schedulability, and the time constraints of task scheduling, and designs a scheduling mechanism that allocates mixed deep neural network tasks layer by layer. By constructing a deep neural network task model and a layer task model, the method formalizes the task scheduling overhead, the worst-case response time, and the sum of task utilizations under the minimum-layer task-mapping schedule, and sets up an optimization function together with constraint conditions and a solving method, so that, while keeping the worst-case response time short, the heterogeneous CPU and GPU computing resources of the real-time system are utilized in a more balanced manner, improving the real-time performance and schedulability of deep neural network tasks in the embedded real-time system.
Inventors
- ZHU KUN
- FENG JIAXIN
Assignees
- Nanjing University of Aeronautics and Astronautics (南京航空航天大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20230912
Claims (4)
- 1. A layer-by-layer scheduling method for mixed deep neural network tasks in an embedded real-time system, characterized in that the method divides the mixed deep neural network tasks into real-time tasks and best-response tasks, models both the deep neural network task and the neural-network layer tasks, uses an optimization method that accounts for the total execution time of a task on the heterogeneous CPU and GPU computing resources, the time overhead of layer-by-layer mapping scheduling, and the worst-case response time, obtains a total task-utilization function for the layer-mapped schedule of the mixed deep neural network tasks, and finds an optimal layer-by-layer allocation of the deep neural network tasks under the constraint that the response time is as small as possible, so as to realize balanced utilization of the heterogeneous CPU and GPU computing resources; the method comprises: (1) constructing a deep neural network task model for scheduled execution in the embedded real-time system, wherein a neural network task is represented by τ_i=(C_i,T_i,D_i,L_i,R_i,S_i), comprising the worst-case execution time C_i of a single run of the task, the period T_i, the deadline D_i, the number of layers L_i of the task, the ready time R_i of the periodic task, and the start time S_i; (2) constructing a layer task model of the deep neural network task, wherein a layer task is represented by a tuple comprising the worst-case execution times when running on the CPU cluster and on the GPU cluster with its quantized model, together with the sum of the corresponding maximum times for quantizing the input and the output; (3) constructing a layer-by-layer allocation scheduling mechanism for the deep neural network tasks, used for solving the optimal scheme under which the mixed deep neural network tasks are scheduled and executed layer by layer on the CPU or GPU platform, comprising designing the allocation mapping groups and calculating the total execution time of a deep neural network task on the heterogeneous CPU and GPU platforms, its overhead on the CPU and GPU clusters, its total overhead on the heterogeneous CPU-GPU system, its worst-case response time, and the utilization of a single task; (4) constructing an optimization function for the layer-by-layer allocation of the deep neural network tasks, comprising a target equation for the layer-wise mapping of the deep neural network tasks to heterogeneous CPU and GPU resources, and setting the basic schedulability condition, the task start-time constraints, the task completion-time constraints, and the resource-limiting conditions.
- 2. The layer-by-layer scheduling method for mixed deep neural network tasks in an embedded real-time system according to claim 1, wherein the deep neural network tasks in step (1) comprise real-time tasks and best-response tasks; if a deep neural network task is a real-time task, it is constrained by its deadline, and once the task is admitted into the system the deadline must always be met; if a deep neural network task is a best-response task, some of its layer tasks may miss the deadline during execution, and when a best-response task has no particular deadline requirement its deadline can be set to an arbitrarily large value; real-time tasks and best-response tasks in the embedded real-time system follow these rules: 1) real-time tasks take precedence over best-response tasks; 2) a real-time task can preempt a best-response task on the same CPU or GPU cluster at any time; 3) the preemption condition is that, when the currently running task is a best-response task, the real-time task may preempt it so that the real-time task's response time remains less than or equal to its deadline.
- 3. The layer-by-layer scheduling method for mixed deep neural network tasks in an embedded real-time system according to claim 1, wherein step (3) designs the layer-by-layer allocation scheduling based on the structural characteristics of neural network tasks; each layer of a deep neural network is allocated to either a GPU or a CPU, the allocation being expressed as a mapping in which the layers mapped onto the CPU clusters and the layers mapped onto the GPU clusters are recorded separately; the total execution time of a deep neural network task on the heterogeneous CPU and GPU platforms is the sum of the execution times of its runs of k consecutive layers mapped on the CPU platform and of its runs of k consecutive layers mapped on the GPU platform; the overhead of a deep neural network task on the CPU cluster is the overhead of the first and last of its layers mapped there, and likewise its overhead on the GPU cluster is the overhead of the first and last of its layers mapped there; these are summed to give the task's total overhead on the heterogeneous CPU-GPU system; the worst-case response time of the task after mapping and the utilization of a single deep neural network task are then expressed in terms of these quantities, where the task period denotes the shortest inter-arrival time of the periodic task.
- 4. The layer-by-layer scheduling method for mixed deep neural network tasks in an embedded real-time system according to claim 1, wherein the optimization function of the layer-by-layer scheduling in step (4) seeks the scheduling scheme that minimizes the total task utilization; the optimization is subject to the following constraints: the basic schedulability condition; the condition that each task's start time is greater than or equal to its ready time; the condition that the start time of each layer of a task is greater than or equal to the completion time of the previous layer task; the condition that the task completion time is less than or equal to the latest deadline; and the resource constraint that at most one task runs on any CPU or GPU at the same time; finally, the optimization method is solved to obtain the allocation groups that assign the deep neural network tasks layer by layer, and in the embedded real-time system the different layer tasks are executed on the different CPUs and GPUs according to this layer-task allocation.
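As an informal illustration of claims 3 and 4 (not part of the claimed method), the following Python sketch enumerates every layer-to-{CPU, GPU} assignment for one task, charges a fixed switching overhead at the boundaries of each contiguous run of layers plus the initial entry and final exit (a simplifying assumption, since the claims' overhead formulas are not reproduced in this text), rejects mappings whose worst-case response time exceeds the deadline, and returns the feasible mapping with minimum utilization:

```python
from itertools import product

def best_layer_mapping(layers, period, deadline, switch_cost):
    """layers: per-layer worst-case execution times, e.g. {'cpu': 4.0, 'gpu': 1.0}.
    Enumerates all 2^L layer->cluster assignments (acceptable for small L),
    keeps those meeting the deadline, and returns (utilization, mapping)
    with the smallest utilization, or None if no mapping is schedulable."""
    best = None
    for mapping in product(('cpu', 'gpu'), repeat=len(layers)):
        exec_time = sum(layer[m] for layer, m in zip(layers, mapping))
        # Overhead is charged where consecutive layers change cluster,
        # plus first entry onto and final exit from a cluster (assumed).
        boundaries = sum(a != b for a, b in zip(mapping, mapping[1:]))
        response = exec_time + switch_cost * (boundaries + 2)
        if response <= deadline:            # completion-time constraint
            utilization = response / period  # U_i with the period as T_i
            if best is None or utilization < best[0]:
                best = (utilization, mapping)
    return best
```

For example, with three layers whose CPU/GPU execution times are (4, 1), (4, 1) and (2, 3), a period of 20, a deadline of 10 and a switch cost of 0.5, the search maps the first two layers to the GPU and the last one to the CPU.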
Description
Layer-by-layer scheduling method for mixed deep neural network task in embedded real-time system

Technical Field

The invention belongs to big data processing technology, in particular to the field of real-time system task scheduling, and specifically relates to a method for scheduling mixed deep neural network tasks layer by layer in an embedded real-time system.

Background

In the modern internet of things, autonomous driving, intelligent roadside infrastructure, computer vision, smart cities, smart homes, aerospace, and similar applications all depend on deep neural networks whose scheduled execution runs in embedded real-time systems. These applications need the real-time system to support the execution of multiple deep neural network tasks, but embedded real-time systems have limited computing resources and, when scheduling deep learning tasks, suffer from poor real-time performance and long response times; they cannot provide worst-case timing guarantees for deep neural network tasks, which makes such applications insufficiently reliable, for example creating potential safety hazards in autonomous driving. Although current embedded real-time hardware platforms are increasingly equipped with heterogeneous CPU and GPU cores to improve the average-case response time of deep neural network inference jobs, these cores are often underutilized when multiple deep neural network tasks arrive simultaneously, and their benefit is less pronounced in the worst case than in the average case. Furthermore, neural network tasks include real-time tasks and best-response tasks. To complete the scheduled execution of such hybrid neural network tasks, the embedded real-time system must respect the timing constraints of task scheduling and achieve better task schedulability with shorter response times.
Disclosure of Invention

The invention aims to provide a layer-by-layer scheduling method for mixed deep neural network tasks in an embedded real-time system, which mainly solves the problem of formalizing optimization-based allocation scheduling of mixed deep neural network tasks in an embedded real-time system, and realizes a layer-allocation scheduling mechanism in an embedded edge real-time environment with heterogeneous CPU and GPU resources. In the technical scheme, the mixed deep neural network tasks are divided into real-time tasks and best-response tasks, the deep neural network task and the neural-network layer tasks are modeled, and the optimization method accounts for the total execution time of a task on the heterogeneous CPU and GPU computing resources, the time overhead of layer-by-layer mapping scheduling, and the worst-case response time; a total task-utilization function of the layer-mapped schedule of the mixed deep neural network tasks is thus obtained, and the optimal layer-by-layer allocation of the deep neural network tasks is found under the constraint that the response time is as small as possible, so that balanced utilization of the heterogeneous CPU and GPU computing resources is realized.
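The division into real-time and best-response tasks described above (with the preemption rules detailed in claim 2) can be illustrated by a minimal dispatcher sketch; the earliest-deadline tie-break within a class is an assumption of this sketch, not part of the claims:

```python
def may_preempt(incoming_is_realtime, running_is_realtime):
    """Claim 2, rule 2: a real-time task may preempt a best-response task
    on the same CPU/GPU cluster; no other preemption is allowed."""
    return incoming_is_realtime and not running_is_realtime

def pick_next(ready):
    """ready: list of (name, is_realtime, deadline) tuples.
    Real-time tasks take precedence over best-response tasks (claim 2,
    rule 1); within a class, earlier deadline first (assumed tie-break)."""
    return min(ready, key=lambda t: (not t[1], t[2]))
```

For example, with one best-response task and two real-time tasks ready, `pick_next` selects the real-time task with the earlier deadline.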
Further, the method comprises the following implementation steps: (1) constructing a deep neural network task model for scheduled execution in the embedded real-time system, wherein a neural network task is represented by τ_i=(C_i,T_i,D_i,L_i,R_i,S_i), comprising the worst-case execution time of a single run of the task, the period, the deadline, the number of layers of the task, the ready time of the periodic task, and the start time; (2) constructing a layer task model of the deep neural network task, wherein a layer task is represented by a tuple comprising the worst-case execution times when running on the CPU cluster and on the GPU cluster with its quantized model, together with the sum of the corresponding maximum times for quantizing the input and the output; (3) constructing a layer-by-layer allocation scheduling mechanism for the deep neural network tasks, used for solving the optimal scheme under which the mixed deep neural network tasks are scheduled and executed layer by layer on the CPU or GPU platform, comprising designing the allocation mapping groups and calculating the total execution time of a deep neural network task on the heterogeneous CPU and GPU platforms, its overhead on the CPU and GPU clusters, its total overhead on the heterogeneous CPU-GPU system, its worst-case response time, and the utilization of a single task; (4) constructing an optimization function for the layer-by-layer allocation of the deep neural network tasks, comprising a target equation for the layer-wise mapping of the deep neural network tasks to heterogeneous CPU and GPU resources, and setting the basic schedulability condition, the task start-time constraints, the task completion-time constraints, and the resource-limiting conditions.
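As a rough, illustrative sketch of steps (1) and (4) above (field names beyond the tuple τ_i=(C_i,T_i,D_i,L_i,R_i,S_i) are hypothetical, and the check below covers only the timing constraints on a single cluster), the task model and the start/completion-time constraints can be written as:

```python
from dataclasses import dataclass

@dataclass
class DnnTask:
    """tau_i = (C_i, T_i, D_i, L_i, R_i, S_i) from step (1)."""
    C: float  # worst-case execution time of a single run of the task
    T: float  # period
    D: float  # deadline (absolute, for this run)
    L: int    # number of layers
    R: float  # ready time of the periodic task
    S: float  # start time

def feasible_sequence(tasks):
    """tasks: DnnTask instances in their execution order on one cluster.
    Checks step (4)'s constraints: each start time is at least the ready
    time and the previous task's completion (only one task runs at a time
    on the resource); each completion is no later than the deadline.
    Returns the list of (start, finish) times, or None if a deadline is
    missed."""
    schedule, now = [], 0.0
    for t in tasks:
        start = max(now, t.R)    # start >= ready time, start >= prior finish
        finish = start + t.C
        if finish > t.D:         # completion-time constraint violated
            return None
        schedule.append((start, finish))
        now = finish
    return schedule
```

In the full method, this feasibility check would apply per cluster after the layer-to-cluster allocation has been fixed by the optimization.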