CN-122022233-A - Water storage scheduling rule extraction method based on reinforcement learning
Abstract
The invention belongs to the technical field of reservoir scheduling, and in particular discloses a water storage scheduling rule extraction method based on reinforcement learning, which comprises the following steps: (1) establishing an advance water storage optimal scheduling model with the maximum total power generation and the minimum flood control risk rate in the scheduling period as targets; (2) taking multi-year reservoir inflow as input, interacting with a water storage scheduling environment, and training an agent based on a reinforcement learning algorithm to extract a water storage scheduling rule; and (3) performing simulated scheduling according to the obtained scheduling rule. By constructing the advance water storage optimal scheduling model and training with a reinforcement learning algorithm, the invention extracts a set of feasible optimal water storage scheduling rules that comprehensively consider the overall benefits of the water storage period and the high-water-level operation period, and helps reservoir managers make scheduling decisions that meet the demands of all parties under different inflow conditions.
Inventors
- WU DI
- ZENG WEI
- MA LI
- WANG YINGJIE
- GONG LANQIANG
- MU YONGJUN
- CHENG RUIXIONG
- LIU PAN
Assignees
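- 中国电建集团贵阳勘测设计研究院有限公司 (PowerChina Guiyang Engineering Corporation Limited)
- 华能澜沧江水电股份有限公司 (Huaneng Lancang River Hydropower Inc.)
- 武汉大学 (Wuhan University)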
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-16
Claims (7)
- 1. A water storage optimal scheduling rule extraction method based on reinforcement learning, characterized by comprising the following steps: (1) establishing an advance water storage optimal scheduling model with the maximum total power generation and the minimum flood control risk rate in the scheduling period as targets; (2) taking multi-year reservoir inflow as input, interacting with the water storage scheduling environment, training an agent based on a reinforcement learning algorithm, and extracting the water storage scheduling rule; and (3) performing simulated scheduling according to the obtained scheduling rule.
- 2. The method for extracting water storage optimal scheduling rules based on reinforcement learning according to claim 1, wherein in step (1), one objective function of the advance water storage optimal scheduling model is to maximize the total power generation in the reservoir scheduling period: $\max E = \sum_{t=1}^{T} E(t)$; $E(t) = \frac{1}{Y} \sum_{y=1}^{Y} K \, Q(y,t) \, H(y,t) \, \Delta t$; wherein $E$ is the total power generation in the reservoir scheduling period, 10^8 kW·h; $E(t)$ is the multi-year average power generation at time t, 10^8 kW·h; the reservoir scheduling period comprises the reservoir water storage period and the subsequent high-water-level operation period; $T$ is the total number of time steps in the scheduling period; $Y$ is the total number of years; $\Delta t$ is the time step, in days; $K$ is the output coefficient; $Q(y,t)$ is the power generation flow of the reservoir at time t in year y, m³/s; and $H(y,t)$ is the net power generation head of the reservoir at time t in year y, m.
- 3. The method for extracting water storage optimal scheduling rules based on reinforcement learning according to claim 2, wherein in step (1), the other objective function of the advance water storage optimal scheduling model is to minimize the reservoir flood control risk: [formulas not reproduced in this text]; wherein FCR(t) is the maximum flood control risk of the reservoir at time t in the water storage period; FCR(y, t) is the flood control risk of the reservoir at time t in year y; V(y, t) is the storage capacity of the reservoir at time t in year y, m³; VS(t) is the storage capacity corresponding to the staged flood control limit water level of the reservoir at time t, m³; VU and VL are the storage capacities corresponding to the normal water storage level and the flood limit water level of the reservoir, respectively, m³; and the formulas also contain a penalty coefficient.
- 4. The method for extracting water storage optimal scheduling rules based on reinforcement learning according to claim 3, wherein in step (2), the reinforcement learning framework used to train the agent comprises the agent and the environment; on the basis of a Markov decision process, the agent acquires knowledge by interacting with the environment in discrete time steps, is trained with maximization of the value estimate as the target, that is, by updating the value estimate, and finally obtains an optimal action policy; in the reinforcement learning algorithm, the agent interacts with an environment containing cascade reservoir scheduling knowledge to obtain knowledge samples: $\langle s_t, a_t, r_t, s_{t+1} \rangle$; wherein $s_t$ is the state of the agent in period t, consisting of the scheduling time t and the water level $Z_t$, the water level $Z_t$ being a discrete value within the water level constraint range of the reservoir; $a_t$ is the action of the agent in period t; and $r_t$ is the reward for the action in period t, taken as the power generation benefit E(t) produced by the scheduling.
- 5. The method for extracting water storage optimal scheduling rules based on reinforcement learning according to claim 4, wherein the value obtained when the agent takes a given action in a given state of the Markov decision process is called the action value and is denoted as the Q value; the reinforcement learning algorithm approaches the optimal policy of the multi-stage problem through Q-value updates, and the optimal Q values at the beginning and end of a period satisfy the Bellman equation: $q(s_t, a_t) = r_t + \gamma \sum_{s_{t+1} \in S} P_{s_t, s_{t+1}} \max_{a_{t+1}} q(s_{t+1}, a_{t+1})$; wherein $q(s_t, a_t)$ represents the value of taking action $a_t$ in state $s_t$; $\gamma$ is the discount rate, which controls the influence of future benefits on the present; $P_{s_t, s_{t+1}}$ is the Markov state transition probability of the cascade reservoir moving from state $s_t$ to the next state $s_{t+1}$, which describes the randomness of the inflow in that state; and $S$ is the set of states.
- 6. The method for extracting water storage optimal scheduling rules based on reinforcement learning according to claim 5, wherein an ε-greedy strategy is adopted in the reinforcement learning to determine the decision action of each stage, and the action selection probability in the ε-greedy strategy is expressed as follows: [formula not reproduced in this text]; wherein the expression gives the probability of randomly selecting an action in state $s_t$ at time t and the probability of selecting the action with the highest value estimate in state $s_t$ at time t, and ε is the greedy rate.
- 7. The method for extracting water storage optimal scheduling rules based on reinforcement learning according to claim 6, wherein in step (3), the trained reinforcement learning water storage optimal scheduling rule is applied to the validation-period data in the reservoir inflow data; the two objectives of the water storage scheduling model are converted into a single objective, and simulated scheduling of the reservoir is performed subject to the water balance constraint, the reservoir capacity curve constraint, the discharge capacity constraint of the discharge facilities, the upper and lower water level limits and water level variation constraint, the output constraint and the boundary condition constraint; the comprehensive benefit index is expressed as follows: [formula not reproduced in this text]; wherein R is the comprehensive benefit index and a is a weight.
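The agent training outlined in claims 4 to 6 can be illustrated, under strong simplifying assumptions, as tabular Q-learning with ε-greedy action selection. The sketch below is not the patented implementation: the toy environment `step`, the grid sizes, the learning rate `ALPHA`, and all numeric values are hypothetical, and `EPSILON` is taken here to be the probability of a random action, which the source does not confirm.

```python
import random

# All values are hypothetical toy settings, not taken from the patent.
LEVELS = 5             # discretized water levels Z_t (indices 0..4)
HORIZON = 10           # scheduling steps t = 0..9
ACTIONS = (-1, 0, 1)   # lower, hold, or raise the level by one grid step
GAMMA = 0.95           # discount rate
ALPHA = 0.1            # learning rate (assumed; not given in the source)
EPSILON = 0.1          # assumed probability of choosing a random action


def step(z, action, inflow):
    """Toy environment: return (next_level, reward) for one period."""
    z_next = min(max(z + action, 0), LEVELS - 1)   # water-level bounds
    release = inflow - (z_next - z)                # crude water balance
    head = 1.0 + 0.5 * (z + z_next) / 2            # head grows with level
    reward = max(release, 0.0) * head              # stand-in for E(t)
    return z_next, reward


def choose_action(q, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    values = q[state]
    return values.index(max(values))


def train(inflow_years, episodes=2000, seed=0):
    """Learn a Q table over states (t, Z_t) from multi-year inflow series."""
    random.seed(seed)
    q = {(t, z): [0.0] * len(ACTIONS)
         for t in range(HORIZON + 1) for z in range(LEVELS)}
    for _ in range(episodes):
        inflows = random.choice(inflow_years)  # one historical year per episode
        z = 0
        for t in range(HORIZON):
            a = choose_action(q, (t, z), EPSILON)
            z_next, r = step(z, ACTIONS[a], inflows[t])
            # Sampled Bellman update toward r + gamma * max over next actions
            target = r + GAMMA * max(q[(t + 1, z_next)])
            q[(t, z)][a] += ALPHA * (target - q[(t, z)][a])
            z = z_next
    return q


def extract_rule(q):
    """Greedy scheduling rule: best action for every (time, level) state."""
    return {s: ACTIONS[v.index(max(v))] for s, v in q.items() if s[0] < HORIZON}
```

Calling `extract_rule(train(inflow_years))` with a list of yearly inflow series (each of length `HORIZON`) returns a lookup table from (time step, water level) to a level change, which is one simple way to represent an extracted scheduling rule.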
Description
Water storage scheduling rule extraction method based on reinforcement learning
Technical Field
The invention belongs to the technical field of reservoir scheduling, and relates to a water storage scheduling rule extraction method based on reinforcement learning.
Background
As new reservoirs are continuously built and put into operation, the share of runoff retained during the water storage period grows, so that river flow is markedly reduced in areas such as the middle and lower reaches of the Yangtze River, which leaves insufficient capacity for public-welfare scheduling of the river basin such as drought relief and ecological operation. With limited storable water, how to balance flood control, power generation, water storage, shipping and other factors according to the inflow of each reservoir is an engineering problem that requires scientific study. At present, real-time water storage operation follows conventional scheduling along the designed water storage scheduling line, which adapts poorly to different inflow volumes: the power generation benefit of the reservoir is low in dry years, and the flood control risk is high in wet years. On this basis, a water storage scheduling rule extraction method based on reinforcement learning is adopted to obtain optimal scheduling rules that balance flood control risk and power generation benefit under different inflow conditions in the water storage period.
Disclosure of Invention
The invention aims to solve the problem that existing reservoir water storage scheduling adapts poorly to different inflow conditions, and provides a water storage scheduling rule extraction method based on a reinforcement learning algorithm. With the maximum total power generation and the minimum flood control risk rate in the scheduling period as targets, the method extracts optimal reservoir water storage scheduling rules and provides a reference for reservoir operation in the water storage period. The method comprehensively considers the overall benefits of the water storage period and the high-water-level operation period, and helps reservoir managers make scheduling decisions that meet the demands of all parties under different inflow conditions.
The invention adopts the following technical scheme to achieve this purpose: a water storage optimal scheduling rule extraction method based on reinforcement learning, comprising the following steps:
(1) establishing an advance water storage optimal scheduling model with the maximum total power generation and the minimum flood control risk rate in the scheduling period as targets;
(2) taking multi-year reservoir inflow as input, interacting with the water storage scheduling environment, training an agent based on a reinforcement learning algorithm, and extracting the water storage scheduling rule;
(3) performing simulated scheduling according to the obtained scheduling rule.
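As a rough illustration of the power generation target in step (1), the sketch below evaluates a multi-year average generation per time step and its sum over the scheduling period. It assumes the form E(t) = (1/Y) Σ_y K·Q(y,t)·H(y,t)·Δt suggested by the symbol definitions in this document; the function names, the default output coefficient `k=8.5`, and the 24-hour step are hypothetical choices for the example, and results are in kW·h rather than 10^8 kW·h.

```python
def mean_generation_per_step(flow, head, k=8.5, hours_per_step=24.0):
    """Return the multi-year average generation E(t), in kW.h, for each step t.

    flow[y][t] is the generation flow in m^3/s and head[y][t] the net head in m
    for year y and time step t. k and hours_per_step are illustrative
    assumptions, not values from the source.
    """
    years = len(flow)
    steps = len(flow[0])
    return [
        sum(k * flow[y][t] * head[y][t] * hours_per_step for y in range(years))
        / years
        for t in range(steps)
    ]


def total_generation(flow, head, **kwargs):
    """Return E, the sum of E(t) over all time steps of the scheduling period."""
    return sum(mean_generation_per_step(flow, head, **kwargs))
```

Averaging over years before summing over time steps mirrors the definition of E(t) as a multi-year average, so a single rule is scored against the whole inflow record rather than against one year.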
In the above method for extracting water storage optimal scheduling rules based on reinforcement learning, in step (1), one objective function of the advance water storage optimal scheduling model is to maximize the total power generation in the reservoir scheduling period: $\max E = \sum_{t=1}^{T} E(t)$; $E(t) = \frac{1}{Y} \sum_{y=1}^{Y} K \, Q(y,t) \, H(y,t) \, \Delta t$; wherein $E$ is the total power generation in the reservoir scheduling period, 10^8 kW·h; $E(t)$ is the multi-year average power generation at time t, 10^8 kW·h; the reservoir scheduling period comprises the reservoir water storage period and the subsequent high-water-level operation period; $T$ is the total number of time steps in the scheduling period; $Y$ is the total number of years; $\Delta t$ is the time step, in days; $K$ is the output coefficient; $Q(y,t)$ is the power generation flow of the reservoir at time t in year y, m³/s; and $H(y,t)$ is the net power generation head of the reservoir at time t in year y, m.
In the above method for extracting water storage optimal scheduling rules based on reinforcement learning, in step (1), the other objective function of the advance water storage optimal scheduling model is to minimize the reservoir flood control risk: [formulas not reproduced in this text] wherein FCR(t) is the maximum flood control risk of the reservoir at time t in the water storage period; FCR(y, t) is the flood control risk of the reservoir at time t in year y; V(y, t) is the storage capacity of the reservoir at time t in year y, m³; VS(t) is the storage capacity corresponding to the stage flo