CN-116430732-B - Heat exchange station control method based on reinforcement learning
Abstract
The invention discloses a heat exchange station control method based on reinforcement learning. The method has two parts: the first learns a simulation environment model of the heat exchange station by generative adversarial learning, and the second trains the heat exchange station control strategy with the PPO reinforcement learning technique, yielding an effective control method. The simulation environment model is built from the historical data of the heat exchange station and is learned adversarially, so that the simulated environment can still simulate well in states that do not appear in the historical data. The control strategy is then trained with PPO inside the adversarially learned simulation environment model. Because PPO training is stable and has low variance, the resulting control strategy can accomplish the control target specified by the designed reward function, and the controlled temperature does not lag.
Inventors
- YU YANG
- HE JIAFEI
Assignees
- Nanjing University (南京大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20230417
Claims (6)
- 1. A heat exchange station control method based on reinforcement learning, characterized by comprising the following steps: step 1, determining the control target to be achieved by the heat exchange station, including the state and the action of the heat exchange station; step 2, acquiring and collecting historical operating data of the heat exchange station; step 3, learning a simulation environment model of the heat exchange station from the historical data by generative adversarial learning, and outputting the model; step 4, setting different reward functions according to the control targets of the heat exchange station; step 5, training a PPO agent with the PPO reinforcement learning technique based on the simulation environment model and the reward function; step 6, performing real-time control of the heat exchange station with the trained PPO agent. Step 3 specifically comprises: step 3.1, defining a generator and a discriminator; step 3.2, inputting the current state and action of the heat exchange station into the generator, which outputs the predicted state at the next moment; step 3.3, inputting the predicted state at the next moment into the discriminator, computing a reward signal, and passing it to the generator; step 3.4, updating and training the generator according to the reward signal using the PPO reinforcement learning method; step 3.5, updating and training the discriminator by gradient descent on a binary classification loss function; step 3.6, outputting the simulation environment model of the heat exchange station when training is finished. In step 4, different reward functions are set according to the control targets of the heat exchange station so as to suit different scenarios, specifically: if the goal is to hold the secondary network return water temperature near a target temperature, the reward function is set in terms of that target temperature; if energy consumption is the control target, the reward function is set in terms of the water cost and the electricity cost currently consumed by the heat exchange station; if there are several control targets, the reward function is defined as a weighted sum. In step 5, the PPO agent is trained with PPO reinforcement learning based on the simulation environment model and the reward function; the PPO agent adopts an Actor-Critic architecture, and the specific training steps are: step 5.1, initializing the Actor and Critic networks; step 5.2, computing with the Actor network, from the current state of the heat exchange station, the action to be executed by the heat exchange station; step 5.3, computing the state of the heat exchange station at the next moment with the simulation environment model; step 5.4, computing the reward signal with the reward function; step 5.5, saving the results of steps 5.2 to 5.4 and using them to update the Actor and Critic networks; step 5.6, ending training and outputting the control strategy of the PPO agent once the Actor and Critic networks reach the set number of training iterations.
- 2. The heat exchange station control method based on reinforcement learning as set forth in claim 1, wherein the state of the heat exchange station in step 1 is represented by the following quantities: the outdoor temperature, the primary network supply water temperature, the secondary network return water temperature, the secondary network outlet water temperature, the primary network pressure after dirt removal, the secondary network pressure after dirt removal, the pressure at the secondary network outlet pipe, and the secondary network return water flow rate.
- 3. The heat exchange station control method based on reinforcement learning as set forth in claim 2, wherein the action of the heat exchange station in step 1 is represented by the following quantities: the opening setpoint of the regulating valve, in percent, and the speed setpoint of the circulating water pump.
- 4. The heat exchange station control method based on reinforcement learning according to claim 3, wherein acquiring and collecting the historical operating data of the heat exchange station in step 2 specifically comprises: step 2.1, collecting the historical data of the state and the action of the heat exchange station at each moment; step 2.2, applying simple preprocessing to the collected historical data; step 2.3, arranging the historical data in chronological order.
- 5. The heat exchange station control method based on reinforcement learning according to claim 1, wherein the simulation environment model obtained in step 3 can also be used for early warning, specifically comprising: step 7.1, computing the predicted value of the heat exchange station at the next moment with the simulation environment model; step 7.2, reading the sensor data of the heat exchange station to obtain the true value at the next moment; step 7.3, comparing the predicted value with the true value and, if their difference exceeds a set threshold, repeating steps 3 to 5 to retrain the simulation environment model and the control strategy of the PPO agent.
- 6. The heat exchange station control method based on reinforcement learning according to claim 1, wherein the loss function of the PPO reinforcement learning is expressed in terms of the following quantities: the parameters of the PPO agent's neural network; the ratio of the probability that the current PPO agent selects an action to the probability with which the agent selected that action when the data was recorded; and the advantage function.
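The claims name three computations whose exact formulas are not reproduced in this text: the reward functions of step 4, the early-warning comparison of claim 5, and the PPO loss of claim 6. The following Python sketch illustrates one plausible reading of each and should not be taken as the patented formulas. Everything specific in it is an assumption made for illustration: the function names, the negative-absolute-deviation and negative-cost reward forms, the per-element threshold test, and the use of the standard PPO clipped surrogate with a clipping parameter `epsilon` (the claims do not mention clipping).

```python
from typing import Sequence


def temperature_reward(return_temp: float, target_temp: float) -> float:
    """Assumed form: penalize the absolute deviation of the secondary network
    return water temperature from the target temperature."""
    return -abs(return_temp - target_temp)


def energy_reward(water_cost: float, electricity_cost: float) -> float:
    """Assumed form: penalize the total water and electricity cost."""
    return -(water_cost + electricity_cost)


def weighted_reward(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Combine several control targets by weighted addition (claim 1, step 4).
    The weights themselves are hypothetical tuning values."""
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    return sum(w * r for w, r in zip(weights, rewards))


def ppo_clipped_term(ratio: float, advantage: float, epsilon: float = 0.2) -> float:
    """Per-sample term of the standard PPO clipped surrogate objective.
    `ratio` is the new-policy / old-policy action probability ratio and
    `advantage` is the advantage estimate, the two quantities named in claim 6.
    `epsilon` is an assumed clipping parameter not stated in the claims."""
    clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
    return min(ratio * advantage, clipped_ratio * advantage)


def needs_retraining(predicted: Sequence[float], actual: Sequence[float],
                     threshold: float) -> bool:
    """Early-warning check in the spirit of claim 5: flag retraining when any
    predicted state element differs from the sensor reading by more than
    `threshold`. Comparing element by element is an assumption."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return any(abs(p - a) > threshold for p, a in zip(predicted, actual))
```

In use, a training loop would average `ppo_clipped_term` over a batch and maximize it, while a monitoring loop would call `needs_retraining` once per control step with the simulation model's prediction and the next sensor reading. The clipping keeps a single update from moving the policy far from the one that collected the data, which is consistent with the stability the abstract attributes to PPO.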
Description
Heat exchange station control method based on reinforcement learning
Technical Field
The invention relates to a heat exchange station control method based on reinforcement learning. It belongs to the application of reinforcement learning, a computer technology, in the energy field, and is particularly suited to heating scenarios.
Background
A heat exchange station is a facility where heat is collected and exchanged. It delivers the high-temperature hot water or steam produced by a thermal power plant to residential districts and transfers the heat into the residential pipe network to heat homes. Depending on the heat supply mode, stations are divided into direct supply stations and indirect supply stations. A heat exchange station system consists mainly of two parts, the primary network and the secondary network. The primary network supplies heat to the whole system: hot water enters the heat exchanger of the system through the primary network pipeline and releases heat while it remains in the heat exchanger. This hot water is called the primary network supply water. The secondary network supplies heat to users: water in the secondary network pipeline absorbs heat in the heat exchanger and then circulates through the heating pipeline to deliver heat to users. Water in this state is called the secondary network outlet water. After one full circuit, the water returns to the heat exchange station, where it is called the secondary network return water. The system as a whole is generally considered well controlled when the secondary network return water temperature is kept within a target range. Traditional heat exchange station control methods are generally manual control and proportional-integral-derivative (PID) control.
However, manual control of a heat exchange station is labor-intensive and relies entirely on the operating experience of staff, so the station temperature easily becomes unstable and the heat supplied to users falls below standard. A PID controller has difficulty satisfying both dynamic and static performance requirements. Integral saturation (windup) and noise amplification by the derivative term limit its control performance, and the traditional way of extracting the error signal is too simple, which easily disturbs the control system. A good heat exchange station control method should therefore take full account of the current external environment and the current state of the station in order to make reasonable control decisions and avoid problems such as unstable control and lag. In recent years, with the wide application of deep learning in various fields, deep reinforcement learning has achieved a series of results in fields such as robot control and games. However, its application is also limited: reinforcement learning improves its control strategy by interacting continuously with the environment, and many application scenarios cannot provide an environment and data for such training. In some scenarios, for example, trial and error is so costly that the agent cannot use it to improve its own strategy. Most reinforcement learning techniques therefore cannot be applied directly to real scenarios: current mainstream techniques all require a simulation environment that supports continuous interaction, and in practice the simulation environment corresponding to each heat exchange station is different.
Therefore, a method is needed that can apply reinforcement learning to the real scenario of a heat exchange station, so that control of the station is highly flexible and robust.
Disclosure of Invention
This summary is provided to introduce, in simplified form, a selection of concepts that are further described in the detailed description below. It is not intended to identify key features or essential features of the claimed subject matter, nor to limit the scope of the claimed subject matter. Aiming at the problems and the defects existing in the prior art, the invention aims to provide a heat exchange station control method based on reinforcement learning, which is realized mainly by establishing a simulation environment model, training a reinforcement learning control strategy and migrating to a learning process of a real environment, the simulation environment of the heat exchange station is established, the problem that the reinforcement learning technology has high trial and error cost in practica