CN-121996352-A - Data center resource scheduling optimization method based on artificial intelligence

Abstract

The invention relates to the technical field of cloud computing and data center operation and maintenance, and provides a data center resource scheduling optimization method based on artificial intelligence, comprising the following steps: multidimensional data acquisition; data preprocessing; construction and training of a hybrid intelligent scheduling model; real-time scheduling decision generation; and scheduling execution with closed-loop optimization. Driven by a hybrid intelligent model, the method achieves multi-objective collaborative optimization of resource utilization, SLA compliance rate and energy consumption through multidimensional data acquisition, feature engineering, hybrid intelligent model construction and closed-loop iterative optimization, and improves the intelligence level and dynamic adaptability of data center resource scheduling.

Inventors

CHEN FEI
SHEN DAWEI
XU HAITAO
ZHAO JIANHUA

Assignees

北京阿尔法风控科技有限公司

Dates

Publication Date: 20260508
Application Date: 20260109

Claims (10)

1. A data center resource scheduling optimization method based on artificial intelligence, characterized by comprising the following steps: S1, multidimensional data acquisition: acquiring physical resource data, virtual resource data, business service data and environment perception data of a data center to form a raw scheduling data set; S2, data preprocessing: performing noise filtering, missing-value completion and normalization on the raw scheduling data set, and extracting, through feature engineering, resource utilization time-series features, load fluctuation features, service QoS requirement features and energy consumption constraint features to obtain a model input feature set; S3, hybrid intelligent scheduling model construction and training: constructing a hybrid intelligent model that integrates a load prediction sub-model and a reinforcement learning decision sub-model, taking the model input feature set as training data, setting a resource capacity constraint, an SLA constraint and an energy consumption constraint as model constraint conditions, and training the model to output an optimal resource scheduling strategy; S4, real-time scheduling decision generation: inputting real-time running state data of the data center into the trained hybrid intelligent scheduling model and, in combination with the current service priority weights, generating an optimized scheduling scheme for virtual machine/container resource allocation, server load balancing and dynamic resource adjustment; S5, scheduling execution and closed-loop optimization: executing the optimized scheduling scheme, collecting the post-scheduling resource utilization, SLA compliance rate, energy consumption per unit of service and migration overhead, and using an incremental learning algorithm to update the parameters of the hybrid intelligent scheduling model online based on these indicators, thereby achieving dynamic iterative optimization of the scheduling strategy.
2. The data center resource scheduling optimization method based on artificial intelligence according to claim 1, wherein the multidimensional data in step S1 specifically comprise: physical resource data, including the CPU utilization, memory occupancy, remaining storage capacity, network bandwidth utilization and hardware health state parameters of the servers; virtual resource data, including the number of virtual machines/containers, the requested resource amounts, the current load intensity, the migration history and the life cycle state; business service data, including the business type, service level agreement (SLA) parameters, business priority coefficient, request response time threshold and business concurrency; and environment perception data, including the machine room zone temperature, air conditioner operating power, total power consumption of the power supply and energy consumption of the cooling system.
3. The data center resource scheduling optimization method based on artificial intelligence according to claim 1, wherein the feature engineering in step S2 is implemented as follows: extracting, by a sliding window method, the mean, variance and trend slope of the resource utilization over the most recent N time slices to generate the resource utilization time-series features, where N is a preset positive integer from 5 to 30; decomposing the load data by wavelet transform and extracting the load fluctuation frequency, amplitude and abrupt-change features to form the load fluctuation features; quantifying the service QoS requirements into a response time weight, an availability weight and a reliability weight to construct a service QoS requirement feature vector; and generating the energy consumption constraint features based on the mapping between server energy consumption and resource utilization.
4. The data center resource scheduling optimization method based on artificial intelligence according to claim 1, wherein the hybrid intelligent scheduling model in step S3 is constructed as follows: the load prediction sub-model uses an improved Transformer network with an attention mechanism optimization module added to the encoder, takes the resource utilization time-series features and the business concurrency features as input, and outputs predicted resource loads for the next T time slices, where T is a preset positive integer from 3 to 10; the reinforcement learning decision sub-model uses the proximal policy optimization (PPO) algorithm, takes the predicted load values, the real-time resource state and the service priority coefficients as the state space, takes the resource allocation ratio, the virtual machine/container migration path and server wake/sleep control as the action space, constructs a reward function whose objectives are to maximize the resource utilization compliance rate, minimize the SLA violation rate and minimize the energy consumption per unit of service, and is trained to obtain the decision model; the reward function R is calculated by the following formula: [formula not reproduced in the source text], where U is the resource utilization compliance rate, E is the energy consumption per unit of service, S is the SLA violation rate, and α, β and γ are preset weight coefficients [condition on the weights not reproduced in the source text].
5. The data center resource scheduling optimization method based on artificial intelligence according to claim 1, wherein the constraint conditions in step S3 are specifically: a resource capacity constraint, under which the total number of CPU cores, memory capacity, storage space and network bandwidth allocated to all virtual resources each do not exceed the maximum available resources of the corresponding physical server; an SLA constraint, under which the service request response time is less than or equal to a preset response threshold, the service availability is greater than or equal to 99.9%, and the data transmission reliability is greater than or equal to 99.99%; an energy consumption constraint, under which the total energy consumption of the data center is less than or equal to a preset energy consumption upper limit, and the power consumption of a single server is less than or equal to 85% of its rated power consumption; and a migration overhead constraint, under which the network bandwidth occupied during virtual machine/container migration is less than or equal to 30% of the total bandwidth, and the migration time is less than or equal to 500 ms.
6. The data center resource scheduling optimization method based on artificial intelligence according to claim 1, wherein the optimized scheduling scheme in step S4 specifically comprises: a physical resource allocation scheme, which defines the number of CPU cores, memory size, storage partition and network bandwidth quota assigned to each virtual machine/container; the creation/destruction time of each virtual machine/container, the cross-server migration path and the load balancing target node; and a server control scheme, which sets the sleep/wake-up strategy of idle servers, the load migration ratio of overloaded servers and the resource allocation priority of heterogeneous servers.
7. The data center resource scheduling optimization method based on artificial intelligence according to claim 1, wherein the incremental learning algorithm in step S5 is an improved online gradient descent (OGD) algorithm which, by setting a forgetting factor λ for historical parameters (λ ∈ [0.8, 0.95]), fine-tunes the model parameters only on newly collected operation indicator samples, thereby preserving the effect of earlier training and avoiding catastrophic forgetting.
8. The data center resource scheduling optimization method based on artificial intelligence according to claim 1, further comprising a step S0 of model initialization: importing in advance the historical operation data of the data center for the most recent 6 months, pre-training the hybrid intelligent scheduling model, and setting the number of model network layers, the number of hidden-layer neurons, the number of training iterations and the constraint condition thresholds to obtain an initial trained model.
9. The data center resource scheduling optimization method based on artificial intelligence according to claim 1, wherein step S4 further comprises a scheduling scheme verification process: a data center simulation environment is built using digital twin technology, constraint satisfaction is verified for the generated optimized scheduling scheme, and if any constraint is violated, the scheduling scheme is regenerated through the action space adjustment mechanism of the reinforcement learning decision sub-model until all constraint conditions are met.
10. The data center resource scheduling optimization method based on artificial intelligence according to claim 4, wherein the attention mechanism optimization module of the improved Transformer network introduces a load fluctuation weight coefficient so as to attend differently to the resource utilization features of different time slices, thereby improving the accuracy of load prediction.
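The preprocessing named in step S2 of claim 1 (noise filtering, missing-value completion, normalization) can be illustrated with a minimal Python sketch. The claims do not say which filter, fill rule or normalization is used, so the median filter, forward fill and min-max scaling below, along with all function names, are assumptions chosen for illustration.

```python
from statistics import median


def fill_missing(series):
    """Forward-fill None entries; leading gaps take the first valid value (assumed rule)."""
    first = next((v for v in series if v is not None), None)
    if first is None:
        raise ValueError("series has no valid samples")
    filled, last = [], first
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def median_filter(series, k=3):
    """Suppress spikes with a median over a window of k samples (truncated at the edges)."""
    half = k // 2
    return [median(series[max(0, i - half): i + half + 1]) for i in range(len(series))]


def min_max(series):
    """Scale values to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0] * len(series)
    return [(v - lo) / (hi - lo) for v in series]


def preprocess(series):
    """Missing-value completion, then noise filtering, then normalization."""
    return min_max(median_filter(fill_missing(series)))
```

For example, `preprocess([None, 10.0, None, 90.0, 20.0, 30.0])` fills the two gaps, damps the isolated spike at 90.0, and rescales the result to the unit interval.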
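The sliding-window features of claim 3 (mean, variance and trend slope over the most recent N time slices) can be sketched as follows. The claim does not specify population versus sample variance or how the slope is fitted; this sketch assumes population variance and an ordinary least-squares slope against the sample index, and the function names are hypothetical.

```python
from statistics import fmean, pvariance


def trend_slope(values):
    """Least-squares slope of values against their index 0..n-1 (assumed fitting method)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def window_features(utilization, n=5):
    """Return (mean, variance, slope) over the most recent n utilization samples."""
    if not 5 <= n <= 30:
        raise ValueError("n must lie in 5..30, the range stated in claim 3")
    if len(utilization) < n:
        raise ValueError("need at least n samples")
    window = utilization[-n:]
    return fmean(window), pvariance(window), trend_slope(window)
```

Calling `window_features(samples, n=5)` on a utilization series that rises by 0.1 per time slice yields a slope of about 0.1, which is the kind of trend signal the load prediction sub-model would receive.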
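The reward formula of claim 4 did not survive in the claim text. One form consistent with the stated objectives (maximize U, minimize E and S) is a weighted linear combination. This is an assumption offered for illustration only, not the formula of the patent; the normalization of the weights and the rescaling of E to a range comparable with U and S are likewise assumed.

```latex
R = \alpha U - \beta \tilde{E} - \gamma S,
\qquad \alpha, \beta, \gamma \ge 0,
\qquad \alpha + \beta + \gamma = 1 \quad \text{(assumed normalization)}
```

Here $\tilde{E}$ denotes the energy consumption per unit of service E rescaled to $[0, 1]$, so that no single term dominates the reward merely because of its units.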
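The numeric thresholds of claim 5 lend themselves to a simple feasibility check. The sketch below covers only part of the constraint set (CPU and memory capacity, the 85% single-server power limit, the 30% migration bandwidth limit and the 500 ms migration time limit); the class names, field names and the choice to return a list of violated constraints are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Server:
    cpu_cores: int
    memory_gb: float
    rated_power_w: float


@dataclass
class Allocation:
    cpu_cores: int
    memory_gb: float


def violations(server, allocations, power_w, migration_bw_share, migration_ms):
    """Return the names of violated constraints; an empty list means the scheme is feasible."""
    found = []
    # Resource capacity: the sum over all virtual resources must fit on the physical server.
    if sum(a.cpu_cores for a in allocations) > server.cpu_cores:
        found.append("cpu_capacity")
    if sum(a.memory_gb for a in allocations) > server.memory_gb:
        found.append("memory_capacity")
    # Energy: single-server power draw at most 85% of rated power.
    if power_w > 0.85 * server.rated_power_w:
        found.append("server_power")
    # Migration overhead: at most 30% of total bandwidth and at most 500 ms.
    if migration_bw_share > 0.30:
        found.append("migration_bandwidth")
    if migration_ms > 500:
        found.append("migration_time")
    return found
```

A check of this kind could serve as the constraint satisfaction test applied to a candidate scheme before execution, with a non-empty result triggering regeneration of the scheme.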
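Claim 7 does not state how the forgetting factor λ enters the online gradient descent update. One possible reading, shown here purely as an assumption, blends the previous parameters with the freshly updated ones: θ ← λ·θ_old + (1 − λ)·(θ_old − η·∇L). The sketch applies this to a linear model with squared loss; the model, the loss and the name `ogd_update` are illustrative stand-ins rather than the patent's scheduling model.

```python
def ogd_update(theta, x, y, lr=0.1, lam=0.9):
    """One damped online gradient step on a single new sample (x, y).

    Assumed rule: theta_new = lam * theta_old + (1 - lam) * (theta_old - lr * grad),
    so a larger lam keeps more of the historical parameters.
    """
    if not 0.8 <= lam <= 0.95:
        raise ValueError("lam must lie in [0.8, 0.95], the range stated in claim 7")
    pred = sum(t * xi for t, xi in zip(theta, x))
    # Gradient of the squared error (pred - y)^2 with respect to theta.
    grad = [2 * (pred - y) * xi for xi in x]
    candidate = [t - lr * g for t, g in zip(theta, grad)]
    return [lam * t + (1 - lam) * c for t, c in zip(theta, candidate)]
```

Under this reading the update is equivalent to a gradient step with effective learning rate (1 − λ)·η, which limits how far any single new sample can pull the parameters away from what earlier training established.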

Description

Data center resource scheduling optimization method based on artificial intelligence

Technical Field

The invention relates to the technical field of cloud computing and data center operation and maintenance, and in particular to a data center resource scheduling optimization method based on artificial intelligence.

Background

With the rapid development of cloud computing, big data and artificial intelligence, data centers keep growing in scale, service types are increasingly complex, and the number of virtual resources such as virtual machines and containers is growing exponentially, which places higher demands on the real-time performance, reliability and energy efficiency of resource scheduling. Traditional data center resource scheduling relies mainly on static rules (such as round-robin scheduling and load-threshold-triggered scheduling) or on a single algorithm (such as scheduling based on a genetic algorithm or particle swarm optimization), and has the following notable shortcomings:

Traditional methods mostly perform simple trend prediction from historical statistics and cannot effectively capture the nonlinear fluctuation of resource loads or the abrupt changes in business concurrency, so the scheduling scheme lags behind actual load changes and servers easily become overloaded or resources sit idle.

Weak multi-constraint collaborative optimization: data center resource scheduling must satisfy the resource capacity constraint, the SLA (service level agreement) constraint and the energy consumption constraint at the same time, but traditional methods usually focus on a single objective (such as maximizing resource utilization alone), which raises the SLA violation rate or pushes energy consumption over its limit and makes a multi-objective balance difficult to achieve.

Poor dynamic adaptability: in complex scenarios such as heterogeneous server clusters and dynamically adjusted service priorities, static rules or a single algorithm generalize poorly, the scheduling strategy cannot be adjusted according to the real-time running state, and model updates depend on offline retraining, so the method cannot adapt to dynamic changes in the data center.

Imbalance between migration cost and scheduling efficiency: virtual machine/container migration is an important means of load balancing, but traditional scheduling methods do not fully account for costs such as the network bandwidth occupied and the time consumed during migration, which easily causes service interruption or response delay and degrades service quality.

Therefore, a resource scheduling optimization method is needed that can accurately predict load changes, jointly satisfy multiple constraints and adapt dynamically, so as to solve the problems of low resource utilization, high SLA violation rate and excessive energy consumption in the prior art.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention aims to provide a data center resource scheduling optimization method based on artificial intelligence.

Specifically, the invention provides a data center resource scheduling optimization method driven by a hybrid intelligent model based on artificial intelligence, which achieves multi-objective collaborative optimization of resource utilization, SLA compliance rate and energy consumption through multidimensional data acquisition, feature engineering, hybrid intelligent model construction and closed-loop iterative optimization, and improves the intelligence level and dynamic adaptability of data center resource scheduling.

The technical scheme of the invention is a data center resource scheduling optimization method based on artificial intelligence, comprising the following steps:

S0. Model initialization configuration

Historical operation data of the data center for the most recent 6 months (including physical resource data, virtual resource data, business service data and environment perception data) are imported in advance, the hybrid intelligent scheduling model is pre-trained, and the number of model network layers (6 to 12 Transformer encoder layers; 3 to 5 hidden layers for the PPO network), the number of hidden-layer neurons (512 to 1024), the number of training iterations (1000 to 5000) and the constraint condition thresholds (such as the SLA response time threshold and the energy consumption upper limit) are set, to obtain an initial trained model.

S1. Multidimensional data acquisition

Multidimensional operation data of the data center are collected to form