CN-121995767-A - Data center air conditioner energy-saving control method based on joint prediction and reinforcement learning

Abstract

The invention discloses a data center air conditioner energy-saving control method based on joint prediction and reinforcement learning. An XGBoost model is introduced to predict the machine room temperature 5 minutes ahead with high precision, and the predicted values are embedded into the reinforcement learning state space so that the control strategy can make look-ahead decisions. A DQN reinforcement learning model and a reward function R are constructed, and experience replay and a delayed target-network update mechanism are introduced in DQN training to improve convergence stability and prevent the strategy from falling into a local optimum. Expert-guided PCA feature dimensionality reduction and an air conditioner-temperature zone correlation matrix are further introduced to improve model convergence speed and generalization performance. The control method was deployed in a data center with more than 2,000 cabinets, where the measured energy consumption was reduced by 12.3% and the PUE was optimized to 1.21. Experiments show that the invention significantly improves energy efficiency while guaranteeing machine room safety, and provides a new approach to greening data centers.
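The idea of embedding the predicted temperature into the state and scoring it with a reward can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not give the reward formula, so the hinge-style form R = -α·max(0, T - T_threshold) - β·E, the helper names `build_state` and `reward`, and the default weights are all hypothetical. The 25 °C threshold is the alarm threshold stated in claim 4.

```python
from typing import List, Sequence

T_THRESHOLD_C = 25.0  # temperature alarm threshold stated in claim 4


def build_state(env_features: Sequence[float], predicted_max_temp: float) -> List[float]:
    """Append the predicted 5-minute maximum temperature to the current observations."""
    return [*env_features, predicted_max_temp]


def reward(predicted_max_temp: float, energy: float,
           alpha: float = 1.0, beta: float = 0.1) -> float:
    """Assumed reward form (not from the source): penalize predicted
    over-temperature with weight alpha and energy use with weight beta."""
    overshoot = max(0.0, predicted_max_temp - T_THRESHOLD_C)
    return -alpha * overshoot - beta * energy


# Hypothetical observations: point temperature, outdoor temperature, supply temperature.
state = build_state([22.5, 30.1, 18.0], predicted_max_temp=24.2)
print(state)
print(reward(24.2, energy=10.0))  # below threshold: only the energy term applies
print(reward(26.0, energy=10.0))  # above threshold: the overshoot is penalized as well
```

With this form, an action that keeps the predicted maximum below the threshold is scored purely on energy use, which is one way to express "save energy without breaching the safety limit".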

Inventors

AI CHUANXIAN
DING FUJIANG
CHEN YINGHU
SONG ZIJIAN
GAO FUYI
CHEN CHEN
ZHANG LIANG

Assignees

National Computer Network and Information Security Management Center, Yunnan Branch (国家计算机网络与信息安全管理中心云南分中心)

Dates

Publication Date: 20260508
Application Date: 20260304

Claims (10)

1. A data center air conditioner energy-saving control method based on joint prediction and reinforcement learning, characterized by comprising the following steps:
S1, collecting historical data: collecting environmental data of a data center machine room over a historical period, wherein the environmental data comprises time information, point location temperature, outdoor temperature, air conditioning parameters and dynamic indexes;
S2, training a prediction model: constructing a data center temperature prediction model based on the XGBoost algorithm, and completing model training and model optimization with the data center historical data collected in step S1 as training samples, to obtain a trained data center temperature prediction model;
S3, acquiring current environmental data of the data center machine room, and, based on the model trained in step S2, outputting a temperature prediction result from the optimized temperature prediction model with the current environmental data as input features;
S4, embedding the temperature prediction result obtained in step S3 into the environmental data of the data center machine room to obtain a data center state S, so that the DQN can make look-ahead decisions by including the prediction information in the state space;
S5, constructing a DQN reinforcement learning model and a reward function R, taking the data center state S and the control quantity A executed by the air conditioning unit as input data, and outputting an updated control quantity A' to be executed by the air conditioning unit;
S6, repeating steps S4 and S5 to perform multi-round iterative training optimization of the DQN reinforcement learning model, and obtaining a trained DQN reinforcement learning model once the PUE value of the data center and the feedback value R of the reward function stabilize;
S7, based on the DQN reinforcement learning model trained in step S6, taking the data center state S embedded with the temperature prediction result output by the XGBoost algorithm as input data, and outputting the energy-saving control quantity A of the data center air conditioners.
2. The data center air conditioner energy-saving control method based on joint prediction and reinforcement learning according to claim 1, wherein in step S1 the air conditioning parameters comprise supply air temperature, return air temperature, fan rotating speed, fan frequency and compressor frequency, and the dynamic indexes comprise temperature change rate and IT load power change; and in step S2 the temperature prediction result is the maximum temperature of the data center over the next 5 min.
3. The data center air conditioner energy-saving control method based on joint prediction and reinforcement learning according to claim 1, wherein in step S2 the XGBoost algorithm predicts the ambient temperature of the data center machine room by progressively optimizing a loss function through serial CART regression trees, and the constructed loss function L is as follows: $L = \sum_{i} (y_i - \hat{y}_i)^2 + \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2$, where $y_i$ represents the true temperature value of the i-th sample; $\hat{y}_i$ represents the predicted temperature value of the i-th sample; $\gamma$ represents the regularization parameter for tree complexity, used to control splitting, with a value of 0.1; $T$ represents the total number of CART regression trees; $\lambda$ represents the L2 regularization coefficient, with a value of 0.1; and $\lVert w \rVert^2$ represents the L2 norm of the leaf node weights.
4. The data center air conditioner energy-saving control method based on joint prediction and reinforcement learning according to claim 1, wherein the reward function R in step S4 is designed as follows: [formula omitted in source], where T represents the maximum temperature of the data center over the next 5 min; T_threshold represents the temperature alarm threshold, which is set to 25 °C; E represents the energy consumption of the machine room; α represents the weight coefficient for temperature control; and β represents the weight coefficient for energy consumption optimization.
5. The data center air conditioner energy-saving control method based on joint prediction and reinforcement learning according to claim 1, wherein the sample data in step S1 is further subjected to dimensionality reduction according to the following steps:
S1.1, performing standardized preprocessing on the environmental data to obtain standardized feature vectors, wherein the standardized preprocessing comprises missing-value handling, outlier removal and normalization;
S1.2, selecting and recommending, through expert identification based on prior knowledge of feature engineering and engineering structures, the indoor temperature points closely related to outdoor temperature, air conditioning parameters and dynamic indexes;
S1.3, based on the principal component analysis (PCA) method, obtaining eigenvalues and eigenvectors from the covariance for each temperature point, sorting by eigenvalue, and determining the minimum number of features required for the cumulative covariance ratio to exceed a preset threshold according to the following formula: [formula omitted in source], where x_j represents the j-th feature, k represents the total number of features, and f represents the cumulative covariance ratio function, a larger function value indicating a better fit; the preset threshold takes a value of 85% to 95%.
6. The data center air conditioner energy-saving control method based on joint prediction and reinforcement learning according to claim 4, wherein the iterative training optimization flow of the DQN reinforcement learning model in step S5 is as follows:
S5.1, initializing the policy of the DQN reinforcement learning model and the target network;
S5.2, at the current time step, determining the control quantity of the action to be executed in the action network using an ε-greedy algorithm according to the current state;
S5.3, after executing action A on the machine room air conditioners according to the control quantity, updating the parameter settings of the air conditioning units, predicting the temperature data for the next five minutes with the XGBoost temperature prediction model, and obtaining the maximum temperature value T among the predicted temperatures;
S5.4, constructing a new data center state S, calculating the reward value of the executed action according to the reward function R, and feeding the reward value of the executed action back to the DQN reinforcement learning model to update the model policy;
S5.5, repeating steps S5.2 to S5.4 until the reward value of the executed action stabilizes, completing the iterative training optimization of the model.
7. The data center air conditioner energy-saving control method based on joint prediction and reinforcement learning according to claim 6, wherein an experience replay pool is constructed; for each time step executed in step S5.5, the data center state S, the action A, the reward value R and the new data center state S' are stored in the experience replay pool; and samples are drawn from the experience replay pool every several time steps to update the model weights using the DQN and to complete the periodic update of the target network.
8. The data center air conditioner energy-saving control method based on joint prediction and reinforcement learning according to claim 6, wherein in step S6 the iterative training optimization of the DQN reinforcement learning model ends when the following condition is satisfied: [formula omitted in source], where the condition relates the reward value of the j-th iteration round, the reward value of the (j+n)-th iteration round, and a constraint coefficient whose value ranges from 0.01 to 0.1.
9. The data center air conditioner energy-saving control method based on joint prediction and reinforcement learning according to claim 6, wherein the variables strongly correlated with the future temperature prediction are obtained by performing Pearson correlation analysis on the input features of the XGBoost temperature prediction model, and the strongly correlated variables comprise the air conditioner supply air temperature set value, the point location temperature, the average point location temperature over a 20-min time period, and the temperature change rate.
10. The data center air conditioner energy-saving control method based on joint prediction and reinforcement learning according to claim 6, wherein, when the DQN reinforcement learning model is iteratively trained, the action network adopts a multi-layer perceptron (MLP) whose structure comprises an input layer of 64 nodes and a hidden layer of 32 nodes, ReLU activation functions are adopted, and the dimension of the output layer is the action space size |A|; the target network is a copy of the action network and is trained stably through a delayed update mechanism with a delayed update coefficient τ = 0.01.
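The training mechanics named in claims 6, 7, 8 and 10 can be illustrated with a small standard-library sketch. It is not the patented implementation. It assumes that the "epsilon algorithm" is ε-greedy action selection, that the delayed update coefficient τ = 0.01 denotes a soft (Polyak) target update, and that the stopping condition is a relative change in reward over n rounds (the claimed formula is not recoverable from the source). All function and class names are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

Transition = Tuple[List[float], int, float, List[float]]  # (S, A, R, S')


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayPool:
    """Fixed-capacity experience replay pool of (S, A, R, S') tuples."""

    def __init__(self, capacity: int) -> None:
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def store(self, transition: Transition) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Draw without replacement; never ask for more than is stored.
        return rng.sample(list(self._buffer), min(batch_size, len(self._buffer)))

    def __len__(self) -> int:
        return len(self._buffer)


def soft_update(target: List[float], online: List[float], tau: float = 0.01) -> List[float]:
    """Assumed delayed update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


def has_converged(rewards: Sequence[float], n: int, eps: float = 0.05) -> bool:
    """Assumed stopping rule: relative change between rounds j and j+n is at most eps."""
    if len(rewards) <= n:
        return False
    r_j, r_jn = rewards[-1 - n], rewards[-1]
    return abs(r_jn - r_j) <= eps * max(abs(r_j), 1e-9)


rng = random.Random(0)
pool = ReplayPool(capacity=1000)
pool.store(([22.0, 24.1], 1, -1.2, [21.8, 23.9]))  # hypothetical transition
action = epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0, rng=rng)  # greedy choice
target_weights = soft_update([0.0, 0.0], [1.0, -1.0])
print(action, len(pool), target_weights, has_converged([-10.0, -10.2], n=1))
```

In a full implementation the flat weight lists would be replaced by the parameters of the 64-node and 32-node MLP from claim 10, and the sampled batch would drive a gradient step on the action network before the target update.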

Description

Data center air conditioner energy-saving control method based on joint prediction and reinforcement learning

Technical Field

The invention belongs to the technical field of data center air conditioner refrigeration control, and particularly relates to a data center air conditioner energy-saving control method based on joint prediction and reinforcement learning.

Background

The actual IT load in a data center machine room is unevenly distributed, which makes the distribution of cold and heat in the room uneven, and opposing airflows can cause the air conditioners to work against one another. The IT load fluctuates and differs between day and night, and the outdoor ambient temperature varies with the seasons. Simple regulation keeps the temperature within the normal range but cannot adjust the overall strategy in advance as the load changes, so the air conditioners cannot operate in their optimal state. The uneven temperature and humidity in the machine room and the mutual interference between air conditioners not only waste a large amount of cooling capacity but also cause the air conditioners to start and stop repeatedly under fluctuating disturbances, which creates potential safety hazards. Moreover, the air conditioning system accounts for up to 40% of a data center's energy consumption, and traditional control methods such as PID or threshold rules lack predictive capability: they can only run the precision air conditioners according to simple control logic, which leads to overcooling or lagging adjustment.
Conventional reinforcement learning (RL) algorithms respond only to historical and current states and cannot anticipate sudden load changes or thermal inertia effects, so they suffer from response lag and insufficient foresight; existing multi-agent RL additionally suffers from high complexity and difficult convergence.

Disclosure of Invention

Aiming at the technical problems and defects of the prior art, the invention provides a data center air conditioner energy-saving control method based on joint prediction and reinforcement learning. The model control strategy has forward-looking decision capability: it can reduce energy consumption by 12.3%, reduce the over-temperature duration by 81%, and optimize the PUE to 1.21. It addresses the lagging or overcooling regulation of conventional control methods, and uses the predicted value as an extended state of a partially observable MDP to overcome the short-sightedness of traditional RL.

Technical scheme: to achieve the aim of the invention, the invention adopts the following technical scheme. A data center air conditioner energy-saving control method based on joint prediction and reinforcement learning comprises the following steps:

S1, collecting historical data: collecting environmental data of a data center machine room over a historical period, wherein the environmental data comprises time information, point location temperature, outdoor temperature, air conditioning parameters and dynamic indexes;
S2, training a prediction model: constructing a data center temperature prediction model based on the XGBoost algorithm, and completing model training and model optimization with the data center historical data collected in step S1 as training samples, to obtain a trained data center temperature prediction model;
S3, acquiring current environmental data of the data center machine room, and, based on the model trained in step S2, outputting a temperature prediction result from the optimized temperature prediction model with the current environmental data as input features;
S4, embedding the temperature prediction result obtained in step S3 into the environmental data of the data center machine room to obtain a data center state S, so that the DQN can make look-ahead decisions by including the prediction information in the state space;
S5, constructing a DQN reinforcement learning model and a reward function R, taking the data center state S and the control quantity A executed by the air conditioning unit as input data, and outputting an updated control quantity A' to be executed by the air conditioning unit;
S6, repeating steps S4 and S5 to perform multi-round iterative training optimization of the DQN reinforcement learning model, and obtaining a trained DQN reinforcement learning model once the PUE value of the data center and the feedback value R of the reward function stabilize;
S7, based on the DQN reinforcement learning model trained in step S6, taking the data center state S embedded with the temperature prediction result output by the XGBoost algorithm as input data, and outputting the energy-saving control quantity A of the data center air conditioners.

Further, in step S1, the air condition