CN-121996421-A - Dynamic resource scheduling optimization method based on deep reinforcement learning
Abstract
The application relates to the technical field of cloud computing and cluster resource management, and discloses a dynamic resource scheduling optimization method based on deep reinforcement learning. The method uses a kernel-level probe to collect hardware performance counter data and generates a normalized state vector; identifies the resource bottleneck type from the normalized state vector and generates an action-space orthogonal mask vector accordingly; inputs the normalized state vector into a reinforcement learning network to output an original action vector; projects the original action vector with the action-space orthogonal mask vector to obtain a constrained action vector and candidate scheduling weight parameters; applies the candidate scheduling weight parameters in an offline simulation environment to calculate a topological difference entropy and a dynamic damping coefficient; generates effective weight parameters according to the dynamic damping coefficient; and updates the effective weight parameters to the scheduler. Through the hardware interference topology mask and the dynamic damping mechanism, the application avoids physical bottlenecks of the underlying microarchitecture and ensures topology stability during cluster resource optimization.
Inventors
- LI ZHIPENG
- WANG YANHUI
- ZHU MENGYING
Assignees
- Henan Logistics Vocational College (河南物流职业学院)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-27
Claims (10)
- 1. A dynamic resource scheduling optimization method based on deep reinforcement learning, characterized by comprising the following steps: collecting hardware performance counter data with a kernel-level probe deployed on a computing node, and generating a normalized state vector in combination with a pre-stored hardware capacity baseline; identifying the resource bottleneck type of the computing node according to the normalized state vector, and generating an action-space orthogonal mask vector according to the resource bottleneck type; inputting the normalized state vector into a deep reinforcement learning network to output an original action vector, and projecting the original action vector with the action-space orthogonal mask vector to obtain a constrained action vector and candidate scheduling weight parameters; and applying the candidate scheduling weight parameters in an offline simulation environment, calculating a topological difference entropy and a dynamic damping coefficient, generating effective weight parameters according to the dynamic damping coefficient, and updating the effective weight parameters to a cluster scheduler.
- 2. The dynamic resource scheduling optimization method based on deep reinforcement learning according to claim 1, wherein collecting hardware performance counter data with a kernel-level probe deployed on a computing node and generating a normalized state vector in combination with a pre-stored hardware capacity baseline comprises: acquiring the hardware performance counter increments of a service container within a specified time window through an extended Berkeley Packet Filter (eBPF) context-switch monitoring point attached to the kernel scheduler, wherein the hardware performance counter increments comprise at least the instructions per clock cycle, the last-level cache miss count, the memory bandwidth consumption, and the pipeline stall cycle count; and dividing each hardware performance counter increment by the corresponding physical limit in the hardware capacity baseline to obtain the normalized state vector characterizing the compute saturation, the memory bandwidth saturation, and the cache miss rate variance (see the normalization sketch following the claims).
- 3. The dynamic resource scheduling optimization method based on deep reinforcement learning according to claim 2, wherein identifying the resource bottleneck type of the computing node according to the normalized state vector comprises: judging whether the memory bandwidth saturation exceeds a preset memory saturation threshold, and if so, determining the resource bottleneck type to be a memory-bandwidth-bound state; otherwise, judging whether the cache miss rate variance exceeds a preset cache conflict threshold, and if so, determining the resource bottleneck type to be a cache interference state; otherwise, judging whether the compute saturation exceeds a preset compute saturation threshold, and if so, determining the resource bottleneck type to be a front-end compute-bound state; otherwise, determining the resource bottleneck type to be an unsaturated normal state (see the classification sketch following the claims).
- 4. The dynamic resource scheduling optimization method based on deep reinforcement learning according to claim 3, wherein generating an action-space orthogonal mask vector according to the resource bottleneck type comprises: defining the dimensions of the scheduling policy weight vector to comprise a CPU resource utilization scoring weight, a memory resource utilization scoring weight, an anti-affinity scoring weight, and a load balancing scoring weight; when the resource bottleneck type is the memory-bandwidth-bound state, generating an action-space orthogonal mask vector that sets the dimensions corresponding to the CPU resource utilization scoring weight and the load balancing scoring weight to locked; and when the resource bottleneck type is the cache interference state, generating an action-space orthogonal mask vector that sets only the dimension corresponding to the anti-affinity scoring weight to active and sets all other dimensions to locked (see the mask-generation sketch following the claims).
- 5. The dynamic resource scheduling optimization method based on deep reinforcement learning according to claim 1, wherein projecting the original action vector with the action-space orthogonal mask vector to obtain a constrained action vector comprises: performing a Hadamard product operation between the original action vector and the action-space orthogonal mask vector, forcibly setting the components of the original action vector corresponding to locked dimensions to zero, to obtain the constrained action vector (see the projection-and-update sketch following the claims).
- 6. The dynamic resource scheduling optimization method based on deep reinforcement learning according to claim 5, further comprising, after obtaining the constrained action vector: acquiring the currently effective scheduling weight parameters; and incrementally updating the currently effective scheduling weight parameters with the constrained action vector and a base step-size coefficient, and clipping the updated result to a preset allowed weight interval to obtain the candidate scheduling weight parameters.
- 7. The dynamic resource scheduling optimization method based on deep reinforcement learning according to claim 1, wherein applying the candidate scheduling weight parameters in an offline simulation environment and calculating a topological difference entropy comprises: establishing, in memory, an offline simulation environment containing a historical task request sequence and node resource state snapshots; performing pre-scheduling simulation of the historical task request sequence in the offline simulation environment with the currently effective scheduling weight parameters and with the candidate scheduling weight parameters, respectively, to generate a reference scheduling placement result and a candidate scheduling placement result; and calculating the normalized Hamming distance between the reference scheduling placement result and the candidate scheduling placement result as the topological difference entropy (see the entropy sketch following the claims).
- 8. The dynamic resource scheduling optimization method based on deep reinforcement learning according to claim 7, wherein calculating the dynamic damping coefficient comprises: comparing the topological difference entropy with a preset safe migration threshold; when the topological difference entropy is less than or equal to the safe migration threshold, setting the dynamic damping coefficient to 1; and when the topological difference entropy exceeds the safe migration threshold, calculating the dynamic damping coefficient with an exponential decay function, such that the larger the topological difference entropy, the smaller the dynamic damping coefficient (see the damping sketch following the claims).
- 9. The dynamic resource scheduling optimization method based on deep reinforcement learning according to claim 8, wherein generating the effective weight parameters according to the dynamic damping coefficient comprises: scaling, by the dynamic damping coefficient, the change of the candidate scheduling weight parameters relative to the currently effective scheduling weight parameters, and calculating the effective weight parameters therefrom.
- 10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method of any one of claims 1 to 9.
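The following is a minimal, non-authoritative sketch of the normalization step in claim 2. The field names (ipc, peak_mem_bw_bytes, llc_miss_rate_samples) and baseline keys are illustrative assumptions: the claim specifies only that each counter increment is divided by its physical limit, and the variance term implies per-window sampling of the miss rate.

```python
import numpy as np

def normalize_state(counter_deltas: dict, capacity_baseline: dict) -> np.ndarray:
    """Divide each raw counter increment by its physical limit from the
    pre-stored hardware capacity baseline (claim 2). Keys are hypothetical."""
    compute_saturation = counter_deltas["ipc"] / capacity_baseline["peak_ipc"]
    mem_bw_saturation = (counter_deltas["mem_bw_bytes"]
                         / capacity_baseline["peak_mem_bw_bytes"])
    # Variance of per-window LLC miss rates, a proxy for cache-conflict churn
    # (an assumption about how the claimed variance is obtained).
    miss_rates = np.asarray(counter_deltas["llc_miss_rate_samples"])
    cache_miss_variance = float(np.var(miss_rates))
    return np.array([compute_saturation, mem_bw_saturation, cache_miss_variance])
```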
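The ordered decision chain of claim 3 reads directly as code: memory bandwidth is tested first, then cache interference, then front-end compute saturation. The threshold values below are placeholders, since the claim requires only that they be preset; this is a sketch, not the disclosed configuration.

```python
from enum import Enum

class Bottleneck(Enum):
    MEMORY_BANDWIDTH_BOUND = "memory_bandwidth_bound"
    CACHE_INTERFERENCE = "cache_interference"
    FRONTEND_COMPUTE_BOUND = "frontend_compute_bound"
    NORMAL = "unsaturated_normal"

# Placeholder thresholds; the patent only requires preset values.
MEM_SAT_THRESHOLD = 0.8
CACHE_CONFLICT_THRESHOLD = 0.05
COMPUTE_SAT_THRESHOLD = 0.9

def classify_bottleneck(compute_sat: float, mem_bw_sat: float,
                        cache_miss_var: float) -> Bottleneck:
    """Ordered decision chain from claim 3."""
    if mem_bw_sat > MEM_SAT_THRESHOLD:
        return Bottleneck.MEMORY_BANDWIDTH_BOUND
    if cache_miss_var > CACHE_CONFLICT_THRESHOLD:
        return Bottleneck.CACHE_INTERFERENCE
    if compute_sat > COMPUTE_SAT_THRESHOLD:
        return Bottleneck.FRONTEND_COMPUTE_BOUND
    return Bottleneck.NORMAL
```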
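For claim 4's mask generation, a sketch assuming the four-dimension weight order given in the claim; the string labels mirror the Bottleneck enum values of the previous sketch, and 1.0/0.0 encode active/locked dimensions.

```python
import numpy as np

# Dimension order of the scheduling-policy weight vector (claim 4):
# CPU utilization, memory utilization, anti-affinity, load balancing.
DIM_CPU, DIM_MEM, DIM_ANTI_AFFINITY, DIM_BALANCE = 0, 1, 2, 3

def make_mask(bottleneck: str) -> np.ndarray:
    """1.0 marks an active dimension, 0.0 a locked one."""
    mask = np.ones(4)
    if bottleneck == "memory_bandwidth_bound":
        # Lock the CPU-utilization and load-balancing score weights.
        mask[[DIM_CPU, DIM_BALANCE]] = 0.0
    elif bottleneck == "cache_interference":
        # Only the anti-affinity score weight remains active.
        mask[:] = 0.0
        mask[DIM_ANTI_AFFINITY] = 1.0
    return mask
```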
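Claims 5 and 6 combine into a single projection-and-update step: a Hadamard product zeroes the locked components, the surviving components nudge the live weights by a base step size, and the result is clipped to the allowed interval. The step size and bounds here are illustrative assumptions.

```python
import numpy as np

def project_and_update(raw_action: np.ndarray, mask: np.ndarray,
                       current_weights: np.ndarray,
                       base_step: float = 0.05,
                       w_min: float = 0.0, w_max: float = 10.0) -> np.ndarray:
    """Hadamard projection (claim 5) followed by the clipped incremental
    update (claim 6). base_step, w_min, w_max are placeholder values."""
    constrained = raw_action * mask                    # zero locked dimensions
    candidate = current_weights + base_step * constrained
    return np.clip(candidate, w_min, w_max)            # clip to allowed interval
```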
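Claim 7's topological difference entropy is a normalized Hamming distance over two placement results obtained by replaying the same historical task sequence. A sketch, assuming each result is a list mapping task index to assigned node ID:

```python
def topological_difference_entropy(reference: list, candidate: list) -> float:
    """Claim 7: fraction of tasks whose assigned node differs between the
    reference replay and the candidate replay."""
    assert reference and len(reference) == len(candidate)
    differing = sum(r != c for r, c in zip(reference, candidate))
    return differing / len(reference)
```

For example, a reference placement ["n1", "n2", "n1"] against a candidate ["n1", "n3", "n1"] yields an entropy of 1/3.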
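Claims 8 and 9 gate how much of the candidate change is actually applied. A sketch, assuming a hypothetical decay rate and safe-migration threshold; the patent specifies only that the coefficient is 1 inside the safe region and decays exponentially beyond it.

```python
import math
import numpy as np

SAFE_MIGRATION_THRESHOLD = 0.1   # placeholder preset value
DECAY_RATE = 8.0                 # placeholder steepness of the decay

def damping_coefficient(entropy: float) -> float:
    """Claim 8: 1.0 inside the safe region, exponential decay beyond it."""
    if entropy <= SAFE_MIGRATION_THRESHOLD:
        return 1.0
    return math.exp(-DECAY_RATE * (entropy - SAFE_MIGRATION_THRESHOLD))

def effective_weights(current: np.ndarray, candidate: np.ndarray,
                      entropy: float) -> np.ndarray:
    """Claim 9: scale the candidate's change relative to the live weights."""
    alpha = damping_coefficient(entropy)
    return current + alpha * (candidate - current)
```

The exponential form means small topology perturbations pass through nearly unchanged, while candidate weights that would reshuffle much of the cluster are damped back toward the current weights.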
Description
Dynamic resource scheduling optimization method based on deep reinforcement learning

Technical Field

The invention relates to the technical field of cloud computing and cluster resource management, in particular to a dynamic resource scheduling optimization method based on deep reinforcement learning.

Background

In a cloud-native computing environment, container co-location is commonly adopted to improve data center resource utilization, scheduling online services and offline jobs onto the same physical cluster. A container orchestration system (e.g., Kubernetes) makes scheduling decisions primarily from the logical relationship between container resource requests (Requests) and the allocatable resources of the nodes, and selects the best node through a weighted scoring mechanism. However, the existing default scheduling mechanism focuses mainly on the allocation of logical resources such as CPU core counts and memory capacity, and cannot perceive the physical resources of the underlying microarchitecture. In co-location scenarios, even when node CPU and memory utilization have not reached their upper limits, severe performance interference can still arise between containers competing for memory bus bandwidth, the last-level cache (LLC), or pipeline execution units. Such microarchitecture-level resource contention causes response latency jitter for critical services, and a traditional scheduler based on logical quotas cannot identify this hidden bottleneck, making it difficult to raise resource utilization further while guaranteeing quality of service.

To address resource scheduling in complex scenarios, related work has begun to introduce deep reinforcement learning algorithms, using their adaptive decision-making capability to adjust the scheduling policy dynamically. Although reinforcement learning can learn optimized scheduling parameters through interaction with the environment, directly applying an end-to-end reinforcement learning model in a production environment carries significant stability risks. On the one hand, reinforcement learning is essentially a trial-and-error mechanism; without physical-rule constraints, the model is prone, during the exploration phase, to generating scheduling actions that violate hardware characteristics, such as continuing to add memory-intensive loads to nodes whose memory bandwidth is already saturated, causing drastic degradation of cluster performance. On the other hand, deep neural networks are sensitive to small changes in the state input and easily output widely fluctuating action values, causing the scheduling weight parameters to oscillate frequently and sharply. Such parameter instability triggers large-scale task migration and rescheduling within the cluster, damages the stability of the cluster topology, and increases system overhead and operational risk. Therefore, how to use reinforcement learning to optimize scheduling efficiency while effectively avoiding underlying physical bottlenecks and ensuring smooth evolution of the scheduling policy is the technical problem to be solved.
Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a dynamic resource scheduling optimization method based on deep reinforcement learning, which solves two technical problems: the existing cluster resource scheduling mechanism cannot perceive the physical bottlenecks of the underlying microarchitecture, causing application performance interference; and directly applying a reinforcement learning algorithm to scheduling optimization lacks physical constraints, causing cluster topology oscillation.

The method uses a kernel-level probe deployed on a computing node to collect hardware performance counter data and, in combination with a pre-stored hardware capacity baseline, generates a normalized state vector. Specifically, the hardware performance counter increments of a service container within a specified time window are read through an extended Berkeley Packet Filter (eBPF) context-switch monitoring point attached to the operating system kernel scheduler, the increments comprising the instructions per clock cycle, the last-level cache miss count, the memory bandwidth consumption, and the pipeline stall cycle count.
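The disclosure's probe is an in-kernel eBPF hook on the scheduler's context-switch point. As a loose user-space stand-in for experimentation, one could sample comparable counters for a container's PID with the Linux perf CLI. The event aliases and CSV parsing below are assumptions about a typical perf build, not the patented mechanism.

```python
import subprocess

# Common perf event aliases; availability varies by CPU and perf version.
EVENTS = "instructions,cycles,LLC-load-misses"

def sample_counters(pid: int, window_s: float = 1.0) -> dict:
    """Return raw counter increments over the window as {event: count},
    using `perf stat` in CSV mode (-x,) attached to one PID."""
    cmd = ["perf", "stat", "-e", EVENTS, "-x", ",", "-p", str(pid),
           "--", "sleep", str(window_s)]
    # perf writes its CSV report to stderr.
    out = subprocess.run(cmd, capture_output=True, text=True).stderr
    counts = {}
    for line in out.splitlines():
        fields = line.split(",")
        # Field 0 is the value, field 2 the event name; rows reading
        # "<not counted>" or "<not supported>" are skipped.
        if len(fields) >= 3 and fields[0].strip().isdigit():
            counts[fields[2].strip()] = int(fields[0])
    return counts
```

The resulting increments would then feed the normalization step sketched after the claims.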