CN-122019178-A - Task scheduling method and device, electronic equipment and storage medium


Abstract

The application provides a task scheduling method, a task scheduling device, electronic equipment and a storage medium. The method comprises: collecting the load state of each execution unit in a scheduling environment, the queue state of a task queue to be processed and the system resource state of a scheduling controller to generate an environment state vector; inputting the environment state vector into a reinforcement learning strategy model to obtain a scheduling policy, the scheduling policy being used for determining a task allocation scheme for each execution unit; distributing the tasks to be processed to the corresponding execution units for execution according to the scheduling policy, and collecting feedback data during task execution; calculating, based on the feedback data, a comprehensive reward value that fuses a task execution efficiency index and a task business value index; and updating the reinforcement learning strategy model by using the comprehensive reward value. In this way, the overall processing efficiency and the resource utilization efficiency of the system are kept in continuously optimized balance.

Inventors

ZHOU GUOJING

Assignees

Beijing QIYI Century Science & Technology Co., Ltd. (北京奇艺世纪科技有限公司)

Dates

Publication Date: 20260512
Application Date: 20260205

Claims (10)

1. A task scheduling method, the method comprising: collecting the load state of each execution unit in a scheduling environment, the queue state of a task queue to be processed and the system resource state of a scheduling controller, so as to generate an environment state vector; inputting the environment state vector into a reinforcement learning strategy model to obtain a scheduling policy, wherein the scheduling policy is used for determining a task allocation scheme for each execution unit; distributing the tasks to be processed to the corresponding execution units for execution according to the scheduling policy, and collecting feedback data during task execution; calculating, based on the feedback data, a comprehensive reward value that fuses a task execution efficiency index and a task business value index; and updating the reinforcement learning strategy model by using the comprehensive reward value.
2. The method of claim 1, wherein the scheduling policy includes a global concurrent task upper limit and a concurrent task weight corresponding to each execution unit; and the distributing the tasks to be processed to the corresponding execution units for execution according to the scheduling policy comprises: calculating the concurrent execution limit of each execution unit according to the global concurrent task upper limit and the concurrent task weight corresponding to each execution unit; for each execution unit, determining a task group to be processed corresponding to the execution unit from the task queue to be processed; selecting a corresponding number of tasks to be processed from the task group to be processed according to the concurrent execution limit corresponding to the execution unit; and issuing the selected tasks to be processed to the execution unit.
3. The method according to claim 2, wherein the selecting a corresponding number of tasks to be processed from the task group to be processed according to the concurrent execution limit corresponding to the execution unit comprises: calculating a task score for each task in the task group to be processed according to the priority and the estimated execution duration of the task; and selecting a corresponding number of tasks as the tasks to be processed in descending order of task score.
4. The method of claim 2, wherein the scheduling policy further comprises a task execution timeout threshold corresponding to each execution unit; and after the tasks to be processed are issued to the execution unit, the method further comprises: monitoring the task execution duration while the execution unit executes a task to be processed; and performing a timeout handling operation when the task execution duration exceeds the task execution timeout threshold corresponding to the execution unit.
5. The method of claim 1, wherein the calculating, based on the feedback data, a comprehensive reward value that fuses a task execution efficiency index and a task business value index comprises: calculating a delay reward item based on the task execution time in the feedback data; calculating a resource utilization reward item based on the resource consumption data in the feedback data; calculating a timeout penalty item based on the task completion status in the feedback data; calculating a business anomaly discovery reward item based on the task execution result in the feedback data; and performing weighted fusion of the delay reward item, the resource utilization reward item, the timeout penalty item and the business anomaly discovery reward item according to preset weights to obtain the comprehensive reward value.
6. The method of claim 1, wherein the updating the reinforcement learning strategy model by using the comprehensive reward value comprises: inputting the environment state vector and the comprehensive reward value into the reinforcement learning strategy model; calculating, by a value evaluation network in the reinforcement learning strategy model, a value evaluation score of the current state based on the environment state vector and the comprehensive reward value; and updating, by a strategy generation network in the reinforcement learning strategy model, strategy parameters based on the value evaluation score and the comprehensive reward value.
7. The method according to claim 1, wherein the method further comprises: when a new execution unit joins the scheduling environment, using the current reinforcement learning strategy model as a base model, generating a scheduling policy based on an environment state vector containing the state of the new execution unit, and performing task distribution according to the scheduling policy; collecting feedback data of task execution, and calculating a comprehensive reward value according to the feedback data; and updating the base model by using the comprehensive reward value to obtain a reinforcement learning strategy model adapted to the scheduling environment containing the new execution unit.
8. A task scheduling device, the device comprising: an acquisition module, used for collecting the load state of each execution unit in a scheduling environment, the queue state of a task queue to be processed and the system resource state of a scheduling controller, so as to generate an environment state vector; an input module, used for inputting the environment state vector into a reinforcement learning strategy model to obtain a scheduling policy, wherein the scheduling policy is used for determining a task allocation scheme for each execution unit; a distribution module, used for distributing the tasks to be processed to the corresponding execution units for execution according to the scheduling policy and collecting feedback data during task execution; a calculation module, used for calculating, based on the feedback data, a comprehensive reward value that fuses a task execution efficiency index and a task business value index; and an updating module, used for updating the reinforcement learning strategy model by using the comprehensive reward value.
9. Electronic equipment, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing the task scheduling method of any one of claims 1-7 when executing the program stored in the memory.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the task scheduling method of any one of claims 1-7.
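
For illustration only, the weighted fusion recited in claim 5 can be sketched in Python as follows. The claims do not specify how each item is computed or what the preset weights are, so everything concrete here is an assumption made for the example: the `Feedback` record and its fields, the `target_s` latency target, the linear form of each item and the weight values.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """Hypothetical feedback record for one executed task."""
    elapsed_s: float        # task execution time in seconds
    cpu_utilization: float  # resource consumption as a fraction in [0, 1]
    timed_out: bool         # task completion status
    anomaly_found: bool     # task execution result


# Assumed preset weights; the claims only state that the weights are preset.
WEIGHTS = {"delay": 0.4, "resource": 0.2, "timeout": 0.3, "anomaly": 0.1}


def composite_reward(fb: Feedback, target_s: float = 10.0,
                     weights: dict = WEIGHTS) -> float:
    """Fuse the four items into one scalar reward (illustrative forms only)."""
    # Delay reward item: larger when the task finishes well inside the target.
    delay_reward = max(0.0, 1.0 - fb.elapsed_s / target_s)
    # Resource utilization reward item: here simply the utilization fraction.
    resource_reward = fb.cpu_utilization
    # Timeout penalty item: 1 when the task timed out, else 0.
    timeout_penalty = 1.0 if fb.timed_out else 0.0
    # Business anomaly discovery reward item: 1 when an anomaly was found.
    anomaly_reward = 1.0 if fb.anomaly_found else 0.0
    # Weighted fusion; the penalty item enters with a negative sign.
    return (weights["delay"] * delay_reward
            + weights["resource"] * resource_reward
            - weights["timeout"] * timeout_penalty
            + weights["anomaly"] * anomaly_reward)
```

Under these assumptions, a task that finishes quickly and finds an anomaly receives a positive reward, while a task that times out is pushed negative by the penalty item.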

Description

Task scheduling method and device, electronic equipment and storage medium

Technical Field

The present application relates to the technical field of computer system resource management, and in particular to a task scheduling method, a task scheduling device, electronic equipment and a storage medium.

Background

As enterprise digital transformation deepens, business auditing systems that ensure data accuracy and business process correctness have become critical. Such systems need to run large-scale structured query language (SQL) verification tasks against many different types of data sources in order to discover data anomalies in real time. In this scenario, the efficiency and intelligence of task scheduling directly affect how quickly anomalies are discovered and how well the computing resources of the system are utilized overall.

A common current practice is a scheduling scheme based on fixed rules: fixed task execution parameters are preset for each type of database, and tasks are distributed to the corresponding databases for execution in a preset rule order.

However, when this prior art has to cope with diverse data source types and dynamically changing workloads, its fixed scheduling parameters and rule order are difficult to adjust adaptively, so the overall task processing efficiency and resource utilization efficiency of the system cannot reach an optimal state.

Disclosure of Invention

The embodiments of the present application aim to provide a task scheduling method, a task scheduling device, electronic equipment and a storage medium, so as to solve the problem in the prior art that a scheduling scheme based on fixed parameters and rules lacks adaptive adjustment capability and therefore cannot bring overall system efficiency to an optimal state under diverse data sources and dynamic loads.
The specific technical scheme is as follows:

In a first aspect, the present application provides a task scheduling method, including: collecting the load state of each execution unit in a scheduling environment, the queue state of a task queue to be processed and the system resource state of a scheduling controller, so as to generate an environment state vector; inputting the environment state vector into a reinforcement learning strategy model to obtain a scheduling policy, wherein the scheduling policy is used for determining a task allocation scheme for each execution unit; distributing the tasks to be processed to the corresponding execution units for execution according to the scheduling policy, and collecting feedback data during task execution; calculating, based on the feedback data, a comprehensive reward value that fuses a task execution efficiency index and a task business value index; and updating the reinforcement learning strategy model by using the comprehensive reward value.

In one possible implementation manner, the scheduling policy includes a global concurrent task upper limit and a concurrent task weight corresponding to each execution unit; and the distributing the tasks to be processed to the corresponding execution units for execution according to the scheduling policy comprises: calculating the concurrent execution limit of each execution unit according to the global concurrent task upper limit and the concurrent task weight corresponding to each execution unit; for each execution unit, determining a task group to be processed corresponding to the execution unit from the task queue to be processed; selecting a corresponding number of tasks to be processed from the task group to be processed according to the concurrent execution limit corresponding to the execution unit; and issuing the selected tasks to be processed to the execution unit.
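
As a minimal sketch of the distribution steps just described, the Python below derives a concurrent execution limit for each execution unit from the global upper limit and the per-unit weights, groups the queue by target unit, and takes up to that many tasks per unit. The text above does not give the formula, so the proportional share rounded down is an assumption, as are the function names `concurrency_limits` and `plan_dispatch`, the `(task_id, unit)` queue layout, and the choice to take tasks in queue order.

```python
from collections import defaultdict


def concurrency_limits(global_limit: int, weights: dict) -> dict:
    """Split the global concurrent task upper limit across execution units.

    Assumption: each unit gets a share proportional to its weight,
    rounded down, so the shares never exceed the global limit in total.
    """
    total = sum(weights.values())
    if total <= 0:
        return {unit: 0 for unit in weights}
    return {unit: int(global_limit * w / total) for unit, w in weights.items()}


def plan_dispatch(queue: list, global_limit: int, weights: dict) -> dict:
    """Return, per execution unit, the task ids to issue in this round.

    `queue` is a list of (task_id, unit) pairs in arrival order
    (a hypothetical layout chosen for this example).
    """
    limits = concurrency_limits(global_limit, weights)
    # Build the task group to be processed for each execution unit.
    groups = defaultdict(list)
    for task_id, unit in queue:
        groups[unit].append(task_id)
    # Take at most `limit` tasks from each unit's group, in queue order.
    return {unit: groups[unit][:limit] for unit, limit in limits.items()}
```

For example, with a global upper limit of 4 and two units of equal weight, each unit may run at most two tasks, so a unit with three queued tasks has one held back for a later round.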
In one possible implementation manner, the selecting a corresponding number of tasks to be processed from the task group to be processed according to the concurrent execution limit corresponding to the execution unit includes: calculating a task score for each task in the task group to be processed according to the priority and the estimated execution duration of the task; and selecting a corresponding number of tasks as the tasks to be processed in descending order of task score.

In one possible implementation manner, the scheduling policy further includes a task execution timeout threshold corresponding to each execution unit; and after the tasks to be processed are issued to the execution unit, the method further comprises: monitoring the task execution duration while the execution unit executes a task to be processed; and performing a timeout handling operation when the task execution duration exceeds the task execution timeout threshold corresponding to the execution unit.

In one possible implementation manner, the calculating, based on the feedback data, a comprehensive reward va