
CN-122021989-A - Deep reinforcement learning-based method for extracting long-term scheduling rules for hydro-wind-solar systems

CN 122021989 A

Abstract

The invention lies at the intersection of clean renewable energy utilization and reservoir scheduling, and discloses a deep reinforcement learning-based method for extracting long-term scheduling rules for hydro-wind-solar systems. The method comprises the following steps: (1) constructing a long-term hydro-wind-solar optimal scheduling model and determining its objective function and constraints; (2) building a deep reinforcement learning framework and determining the state variables, action variables and reward function; (3) solving with the proximal policy optimization (PPO) deep reinforcement learning algorithm to obtain the scheduling rules. The invention exploits the learning ability of deep reinforcement learning in complex environments to define a framework for the long-term hydro-wind-solar scheduling problem, applies an advanced deep reinforcement learning algorithm to derive reasonable scheduling rules, and can effectively guide the medium- and long-term scheduling of hydro-wind-solar complementary systems.

Inventors

  • MA LI
  • XIAO HAIBIN
  • WU DI
  • DUAN YONGJIE
  • GONG LANQIANG
  • YUAN YIQI
  • ZHANG AONAN
  • LIU PAN
  • LI DACHENG

Assignees

  • PowerChina Guiyang Engineering Corporation Limited (中国电建集团贵阳勘测设计研究院有限公司)
  • Huaneng Lancang River Hydropower Inc. (华能澜沧江水电股份有限公司)
  • Wuhan University (武汉大学)

Dates

Publication Date
2026-05-12
Application Date
2025-12-16

Claims (8)

  1. A deep reinforcement learning-based method for extracting long-term scheduling rules for a hydro-wind-solar system, comprising the following steps: (1) constructing a long-term hydro-wind-solar optimal scheduling model and determining its objective function and constraints; (2) building a deep reinforcement learning framework and determining the state variables, action variables and reward function; (3) solving with the proximal policy optimization (PPO) deep reinforcement learning algorithm to obtain the scheduling rules.
  2. The method according to claim 1, wherein in step (1) the objective function maximizes the total generation of the hydro-wind-solar complementary system:
     max E = Σ_{t=1}^{T} N_t·Δt;
     N_t = N_{h,t} + Σ_{j=1}^{J} N_{w,j,t} + Σ_{m=1}^{M} N_{pv,m,t} − N_{c,t};
     where N_t is the total output of the complementary system in period t; N_{h,t}, N_{w,j,t} and N_{pv,m,t} are the outputs of the hydropower station, wind farm j and photovoltaic station m in period t, with J and M the numbers of wind farms and photovoltaic stations; N_{c,t} is the curtailed output in period t; and Δt is the length of the period. The hydropower output and the curtailed output are calculated as:
     N_{h,t} = K·Q_t·H_t;
     N_{c,t} = max(N_{h,t} + Σ_{j} N_{w,j,t} + Σ_{m} N_{pv,m,t} − C, 0);
     where K is the comprehensive output coefficient of the hydropower station; H_t is the net head in period t; Q_t is the generating discharge of the hydropower station in period t; and C is the transmission capacity of the complementary system.
  3. The method according to claim 2, wherein in step (1) the constraints include the water balance and net head equations:
     V_{t+1} = V_t + (I_t − O_t)·Δt;
     H_t = Z_t − Z_{d,t} − ΔH_t;
     where V_t is the reservoir storage at the beginning of period t; I_t and O_t are the reservoir inflow and outflow in period t; and Z_t, Z_{d,t} and ΔH_t are the reservoir water level at the beginning of period t, the downstream water level and the head loss, respectively.
  4. The method according to claim 3, wherein in step (2) the three elements of the deep reinforcement learning framework comprise the state variables, the action variables and the reward function.
  5. The method according to claim 4, wherein the state variables are defined as the reservoir inflow, the total wind-solar output, the reservoir water level and the month:
     s_t = {I_t, N_{ws,t}, Z_t, m_t};
     where N_{ws,t} is the total wind-solar output in period t and m_t is the month of period t; the action variable is defined as the reservoir water level at the end of the period:
     a_t = Z_{t+1};
     and the reward function is defined as the period generation of the hydro-wind-solar complementary system:
     r_t = N_t·Δt.
  6. The method according to claim 5, wherein in step (3) the proximal policy optimization algorithm comprises two neural networks, namely a policy network and a value network.
  7. The method according to claim 6, wherein the objective function of the policy network is:
     L^{CLIP}(θ) = E_t[ min( r_t(θ)·Â_t, clip(r_t(θ), 1−ε, 1+ε)·Â_t ) ] + c·S[π_θ];
     r_t(θ) = π_θ(a_t|s_t) / π_{θ_old}(a_t|s_t);
     S[π_θ] = −Σ_a π_θ(a|s_t)·log π_θ(a|s_t);
     where r_t(θ) is the ratio of the new policy to the old policy; π_θ is the updated policy; π_{θ_old} is the pre-update policy; θ denotes the parameters of the policy network; Â_t is the estimated advantage function; ε is a hyperparameter limiting the magnitude of policy updates; S[π_θ] is the policy entropy; and c is the weight of the policy entropy.
  8. The method according to claim 7, wherein the objective function of the value network is:
     L^{V}(φ) = E_t[(V_φ(s_t) − R_t)²];
     where V_φ(s_t) is the estimate of the state value, φ denotes the parameters of the value network, and R_t is the total return; Â_t is calculated using generalized advantage estimation as:
     Â_t = Σ_{l=0}^{∞} (γλ)^l·δ_{t+l};
     δ_t = r_t + γ·V_φ(s_{t+1}) − V_φ(s_t);
     where γ is the discount factor and λ is a hyperparameter.
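The objective function, constraints and reward of claims 2–5 can be sketched as a single simulation step of the scheduling environment. This is an illustrative sketch only: the linear storage-to-level curve and the numeric values of K, the transmission capacity C, the downstream level and the head loss are all assumptions, not the patent's calibration.

```python
# Hypothetical sketch of one scheduling period of the hydro-wind-solar model
# of claims 2-5. All parameter values and the storage-to-level relation are
# illustrative assumptions.

def step_period(V_t, I_t, Q_t, N_wind, N_pv, K=8.5, C=1000.0,
                dt=1.0, Z_downstream=100.0, head_loss=2.0):
    """Advance the reservoir one period; return (V_next, reward).

    V_t : reservoir storage at the beginning of period t
    I_t : inflow during period t
    Q_t : generating discharge during period t
    N_wind, N_pv : total wind / photovoltaic output in period t
    """
    # Water balance constraint: V_{t+1} = V_t + (I_t - O_t) * dt
    V_next = V_t + (I_t - Q_t) * dt

    # Assumed linear storage-to-level curve (for the sketch only)
    Z_t = 150.0 + 0.01 * V_t
    # Net head: reservoir level minus downstream level minus head loss
    H_t = Z_t - Z_downstream - head_loss

    # Hydropower output: N_h = K * Q * H
    N_hydro = K * Q_t * H_t

    # Curtailment: output beyond the transmission capacity C is discarded
    total = N_hydro + N_wind + N_pv
    N_curt = max(total - C, 0.0)

    # Reward = delivered energy of the complementary system in this period
    reward = (total - N_curt) * dt
    return V_next, reward
```

A reinforcement learning agent would observe (I_t, wind-solar output, Z_t, month) and choose the end-of-period level, from which Q_t follows via the water balance.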

Description

Deep reinforcement learning-based method for extracting long-term scheduling rules for hydro-wind-solar systems

Technical Field

The invention lies at the intersection of clean renewable energy utilization and reservoir scheduling, and relates to a deep reinforcement learning-based method for extracting long-term scheduling rules for hydro-wind-solar systems.

Background

Wind power, photovoltaic power and hydropower exhibit natural diurnal and seasonal complementarity, so a hydro-wind-solar complementary development mode has gradually taken shape. However, because wind and solar resources are random and uncertain, traditional reservoir scheduling rules struggle to meet the operational requirements of reservoirs in a complementary system, which makes extracting long-term hydro-wind-solar scheduling rules challenging. Conventional long-term scheduling rules are mostly based on linear scheduling functions and can hardly capture the nonlinear relationships between the complex decision factors and the optimal decision variables in a hydro-wind-solar complementary system. On this basis, the invention proposes a deep reinforcement learning-based method for extracting long-term hydro-wind-solar scheduling rules.
Disclosure of Invention

Aiming at the problem that existing scheduling rules, being based on linear scheduling functions, can hardly capture the nonlinear relationships between the complex decision factors and the optimal decision variables in a hydro-wind-solar complementary system, the invention provides a deep reinforcement learning-based method for extracting long-term hydro-wind-solar scheduling rules. By mining the nonlinear, high-dimensional structure of the scheduling rules with deep reinforcement learning, it offers a new approach to the long-term scheduling of hydro-wind-solar complementary systems.

The technical scheme is a deep reinforcement learning-based method for extracting long-term hydro-wind-solar scheduling rules, comprising the following steps:

(1) Constructing a long-term hydro-wind-solar optimal scheduling model and determining its objective function and constraints. The objective function maximizes the total generation of the hydro-wind-solar complementary system:

max E = Σ_{t=1}^{T} N_t·Δt;
N_t = N_{h,t} + Σ_{j=1}^{J} N_{w,j,t} + Σ_{m=1}^{M} N_{pv,m,t} − N_{c,t};

where N_t is the total output of the complementary system in period t; N_{h,t}, N_{w,j,t} and N_{pv,m,t} are the outputs of the hydropower station, wind farm j and photovoltaic station m in period t, with J and M the numbers of wind farms and photovoltaic stations; N_{c,t} is the curtailed output in period t; and Δt is the length of the period. The hydropower output and the curtailed output are calculated as:

N_{h,t} = K·Q_t·H_t;
N_{c,t} = max(N_{h,t} + Σ_{j} N_{w,j,t} + Σ_{m} N_{pv,m,t} − C, 0);

where K is the comprehensive output coefficient of the hydropower station; H_t is the net head in period t; Q_t is the generating discharge in period t; and C is the transmission capacity of the complementary system. The constraints include the water balance and net head equations:

V_{t+1} = V_t + (I_t − O_t)·Δt;
H_t = Z_t − Z_{d,t} − ΔH_t;

where V_t is the reservoir storage at the beginning of period t; I_t and O_t are the reservoir inflow and outflow in period t; and Z_t, Z_{d,t} and ΔH_t are the reservoir water level at the beginning of period t, the downstream water level and the head loss, respectively.

(2) Building a deep reinforcement learning framework and determining the state variables, action variables and reward function. The state variables are defined as the reservoir inflow, the total wind-solar output, the reservoir water level and the month:

s_t = {I_t, N_{ws,t}, Z_t, m_t};

where N_{ws,t} is the total wind-solar output in period t and m_t is the month of period t. The action variable is defined as the reservoir water level at the end of the period: a_t = Z_{t+1}. The reward function is defined as the period generation of the complementary system: r_t = N_t·Δt.

(3) Solving with the proximal policy optimization (PPO) algorithm to obtain the scheduling rules. The PPO algorithm comprises two neural networks, a policy network and a value network. The objective function of the policy network is:

L^{CLIP}(θ) = E_t[ min( r_t(θ)·Â_t, clip(r_t(θ), 1−ε, 1+ε)·Â_t ) ] + c·S[π_θ];
r_t(θ) = π_θ(a_t|s_t) / π_{θ_old}(a_t|s_t);

where r_t(θ) is the ratio of the new policy to the old policy; π_θ is the updated policy; π_{θ_old} is the pre-update policy; θ denotes the parameters of the policy network; Â_t is the estimated advantage; ε is a hyperparameter limiting the magnitude of policy updates; S[π_θ] is the policy entropy; and c is the weight of the policy entropy. The objective function of the value network is:

L^{V}(φ) = E_t[(V_φ(s_t) − R_t)²];

where V_φ(s_t) is the estimate of the state value, φ denotes the parameters of the value network, and R_t is the total return.
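The policy-network objective in claim 7 is the standard PPO clipped surrogate (Schulman et al., 2017). A minimal single-sample sketch for a discrete action distribution follows; the function name and the hyperparameter names `eps` and `entropy_coef` are assumptions of this sketch, not the patent's notation.

```python
import math

# Sketch of the PPO clipped surrogate objective of claim 7 for a single
# (state, action) sample with a discrete action distribution.

def ppo_objective(logp_new, logp_old, advantage, probs_new,
                  eps=0.2, entropy_coef=0.01):
    """L = min(r*A, clip(r, 1-eps, 1+eps)*A) + c*S, to be maximized."""
    # Probability ratio r_t(theta) = pi_new(a|s) / pi_old(a|s)
    ratio = math.exp(logp_new - logp_old)

    # Clipped surrogate term limits how far one update can move the policy
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    surrogate = min(unclipped, clipped)

    # Policy entropy S = -sum_a pi(a|s) * log pi(a|s) encourages exploration
    entropy = -sum(p * math.log(p) for p in probs_new if p > 0)

    return surrogate + entropy_coef * entropy
```

In training, this quantity is averaged over a batch of sampled transitions and maximized by gradient ascent on the policy parameters.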
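The advantage estimator of claim 8 (generalized advantage estimation) can be computed in one backward pass over a finite trajectory, since Â_t = δ_t + γλ·Â_{t+1}. The episode layout assumed here, a `values` list with one extra bootstrap entry for the final state, is a convention of this sketch.

```python
# Sketch of generalized advantage estimation as in claim 8:
#   delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
#   A_t     = sum_{l>=0} (gamma * lam)^l * delta_{t+l}

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Return advantage estimates for one trajectory.

    rewards : list of r_t, length T
    values  : list of V(s_t), length T+1 (last entry bootstraps the tail)
    """
    T = len(rewards)
    advantages = [0.0] * T
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With γ = λ = 1 and zero value estimates, the advantages reduce to the undiscounted rewards-to-go, which is a quick sanity check on the recursion.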