CN-121984048-A - Transfer deep reinforcement learning multi-scenario energy storage adaptive configuration method, system, device and medium for photovoltaic-storage direct-flexible (PSDF) systems
Abstract
The invention relates to the technical field of energy storage configuration for photovoltaic-storage direct-flexible (PSDF) systems, and discloses a transfer deep reinforcement learning multi-scenario energy storage adaptive configuration method, system, device and medium for such systems. The method comprises: acquiring historical operating data and equipment parameters, and constructing a set of typical operating scenarios with a clustering algorithm; constructing a bi-level Markov decision model, in which the upper-level model takes the system's annual comprehensive economic benefit as its reward function and introduces an investment limit-violation penalty, while the lower-level model takes the daily operating benefit as its reward function subject to system safety constraints; improving the TD3 algorithm with a multi-scenario transfer strategy and a smoothness-constraint technique; and solving the bi-level Markov decision model to form a multi-scenario energy storage adaptive configuration scheme that balances investment benefit and operating benefit. The invention enables adaptive planning of the energy storage capacity of a PSDF system across multiple scenarios, reduces investment and operation-and-maintenance costs, and improves the power supply reliability and operating economy of the PSDF system.
Inventors
- TAN ZHUKUI
- HAN XUE
- GAO JIPU
- WANG YU
- ZHANG XUAN
- ZHANG YUANYUAN
- ZHANG LONG
- LIN CHENGHUI
- FENG QIHUI
- ZHANG YUELANG
- XU YUTAO
- WANG YANG
- MAO JUNYI
- LI DINGGUO
- GAO YUAN
Assignees
- Guizhou Power Grid Co., Ltd. (贵州电网有限责任公司)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2025-12-19
Claims (10)
- 1. A transfer deep reinforcement learning multi-scenario energy storage adaptive configuration method for a photovoltaic-storage direct-flexible (PSDF) system, characterized by comprising the following steps: acquiring historical photovoltaic, energy storage, load, electricity price and equipment parameters, and constructing a set of typical operating scenarios of the PSDF system with a clustering algorithm; constructing a bi-level Markov decision model based on the typical scenario set, wherein the upper-level model takes the system's annual comprehensive economic benefit as its reward function and introduces an investment limit-violation penalty, and the lower-level model takes the daily operating benefit as its reward function subject to system safety constraints, realizing multi-scenario adaptive operation; improving the TD3 algorithm with a multi-scenario transfer strategy and a smoothness-constraint technique; and solving the bi-level Markov decision model with the improved TD3 algorithm, forming, through interactive iteration between the upper and lower levels, a multi-scenario energy storage adaptive configuration scheme that balances investment benefit and operating benefit.
- 2. The method of claim 1, wherein taking the system's annual comprehensive economic benefit as the upper-level reward function and introducing an investment limit-violation penalty comprises: constructing an upper-level state space comprising the external environment information of the typical scenarios and the system equipment parameters, the external environment information including the photovoltaic output characteristics, load characteristics, electricity price level and relevant equipment parameters of each scenario; constructing an upper-level action space defined as the rated installed capacity and rated power of the energy storage; and defining the upper-level reward function to comprise the system's annual operating revenue, the equivalent annual investment cost of the energy storage equipment, the annual operation-and-maintenance cost, and an investment limit-violation penalty term.
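The "equivalent annual investment cost" in the upper-level reward is conventionally obtained by annualizing the lump-sum investment with a capital recovery factor. The sketch below illustrates one plausible form of such a reward; the unit costs, discount rate, lifetime, sizing limits and penalty weight are illustrative assumptions, not values disclosed in the patent.

```python
def crf(r, n):
    """Capital recovery factor: converts a lump-sum investment into an
    equivalent annual cost over n years at discount rate r."""
    return r * (1 + r) ** n / ((1 + r) ** n - 1)

def upper_reward(annual_revenue, cap_kwh, p_kw,
                 c_e=1200.0, c_p=800.0, c_om=0.02,
                 r=0.06, n=10, cap_max=5000.0, p_max=2500.0,
                 penalty=1e4):
    """Illustrative upper-level reward: annual operating revenue minus
    equivalent annual investment cost, annual O&M cost, and a
    limit-violation penalty when the sizing exceeds feasible bounds.
    All coefficients are assumed values for demonstration only."""
    invest = c_e * cap_kwh + c_p * p_kw          # lump-sum investment
    annual_cost = crf(r, n) * invest + c_om * invest
    violation = max(0.0, cap_kwh - cap_max) + max(0.0, p_kw - p_max)
    return annual_revenue - annual_cost - penalty * violation
```

For example, CRF(6%, 10 years) ≈ 0.1359, i.e. each yuan invested costs about 0.136 yuan per year in equivalent annual terms.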
- 3. The method of claim 2, wherein taking the daily operating benefit as the lower-level reward function subject to system safety constraints comprises: constructing a lower-level state space for intra-day operation, specifically comprising the current energy storage state of charge, photovoltaic generation, load demand and electricity price signal, while distinguishing the operating conditions of the different typical scenarios; constructing a lower-level action space defined as the charging/discharging power of the energy storage and the amounts of electricity purchased from and sold to the grid at each time step; and defining the lower-level reward function to comprise the system's daily operating revenue and safety-constraint penalty terms, the penalties comprising a state-of-charge limit-violation penalty and a grid-connection power limit-violation penalty.
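A minimal sketch of the penalized lower-level step reward described in claim 3 is given below; the SoC band, grid-exchange limit and penalty weights are assumed values chosen for illustration.

```python
def lower_reward(revenue, soc, p_grid,
                 soc_min=0.1, soc_max=0.9, p_grid_max=500.0,
                 k_soc=1000.0, k_grid=10.0):
    """Illustrative lower-level reward: intra-day operating revenue
    minus penalties for state-of-charge and grid-connection power
    limit violations. Weights k_soc and k_grid are assumptions."""
    soc_viol = max(0.0, soc_min - soc) + max(0.0, soc - soc_max)
    grid_viol = max(0.0, abs(p_grid) - p_grid_max)
    return revenue - k_soc * soc_viol - k_grid * grid_viol
```

With no violations the reward equals the revenue; any excursion outside the safe SoC band or grid limit is charged linearly.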
- 4. The method of claim 3, wherein the lower-level model further comprises: a mathematical model of the energy storage equipment that updates its state of charge according to the charging/discharging actions; the mathematical model defines the self-discharge rate, charging efficiency and discharging efficiency of the energy storage equipment, and ensures that its state of charge always remains within preset safety limits.
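The storage model in claim 4 can be sketched as a standard state-of-charge transition with self-discharge and charge/discharge efficiencies, clipped to the safe band; the numeric parameter values below are illustrative assumptions.

```python
def soc_update(soc, p_ch, p_dis, cap_kwh, dt=1.0,
               sigma=0.001, eta_ch=0.95, eta_dis=0.95,
               soc_min=0.1, soc_max=0.9):
    """State-of-charge transition: self-discharge at rate sigma per
    step, charging efficiency eta_ch, discharging efficiency eta_dis.
    The result is clipped to the preset safe band [soc_min, soc_max].
    All parameter values here are assumed for illustration."""
    soc_next = ((1.0 - sigma) * soc
                + eta_ch * p_ch * dt / cap_kwh      # energy stored
                - p_dis * dt / (eta_dis * cap_kwh)) # energy delivered
    return min(soc_max, max(soc_min, soc_next))
```

Charging 100 kW for one hour into a 1000 kWh unit at 50% SoC raises the SoC to about 0.5945 under these assumed efficiencies.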
- 5. The method of claim 4, wherein improving the TD3 algorithm with the multi-scenario transfer strategy and the smoothness-constraint technique comprises: in the multi-scenario transfer strategy, embedding scenario information into the actor (policy) network and the critic (value) network, so that the algorithm shares knowledge across different scenarios and adapts accordingly; and introducing an action-smoothing loss term and a parameter-smoothing loss term into the loss function of the actor network, so as to suppress action jumps between adjacent time steps and limit parameter drift.
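The two regularizers of claim 5 can be sketched as squared-difference penalties: one over actions at adjacent time steps, one over the distance between current and previous network parameters. The functional forms and weights below are plausible assumptions, not the patent's exact losses.

```python
import numpy as np

def smoothness_losses(actions, theta, theta_prev, lam_a=0.1, lam_p=0.01):
    """Illustrative smoothness terms added to the actor loss:
    - action smoothing: mean squared jump between adjacent time steps,
      discouraging oscillating charge/discharge commands;
    - parameter smoothing: squared drift from the previous parameters,
      limiting abrupt policy changes between updates.
    Weights lam_a and lam_p are assumed values."""
    actions = np.asarray(actions, dtype=float)
    diffs = np.diff(actions, axis=0)                  # a_{t+1} - a_t
    l_action = lam_a * float(np.mean(np.sum(diffs ** 2, axis=-1)))
    drift = np.asarray(theta, dtype=float) - np.asarray(theta_prev, dtype=float)
    l_param = lam_p * float(np.sum(drift ** 2))
    return l_action, l_param
```

Both terms would simply be added to the TD3 actor's policy-gradient loss before backpropagation.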
- 6. The method of claim 5, wherein solving the bi-level Markov decision model with the improved TD3 algorithm and forming, through interactive iteration between the upper and lower levels, a multi-scenario energy storage adaptive configuration scheme that balances investment benefit and operating benefit comprises: initializing the actor networks, twin critic networks, target networks and experience replay buffers of the upper and lower levels; preliminarily selecting an energy storage configuration scheme according to empirical rules and economic constraints, while superimposing exploration noise during training and clipping actions to the feasible region; based on the energy storage capacity and power given by the upper level, training the lower-level actor network to output the energy storage charging/discharging and electricity purchasing actions for each intra-day time step, while returning instantaneous benefit and constraint feedback; updating the actor network under a delayed-update schedule, with action- and parameter-smoothing regularization added; feeding the daily operating benefit and constraint results output by the lower level back to the upper level, where they are combined with the investment and operation-and-maintenance costs and the investment limit-violation penalty to form the upper-level reward; storing the interaction samples at the upper level and updating the upper-level TD3 twin critic networks and actor network, likewise with action- and parameter-smoothness regularization; transmitting the updated upper-level energy storage configuration scheme to the lower level, where the intra-day operating schedule is retrained under the new energy storage capacity and its results are returned, forming a loop of iterative optimization; and when training reaches the preset iteration limit, outputting the optimal energy storage configuration and the corresponding operating strategy.
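The upper/lower interaction loop of claim 6 can be sketched schematically as follows, with both TD3 agents replaced by simple random-search stubs so the example stays self-contained; only the information flow (sizing passed down, operating benefit and costs combined into an upper-level reward) follows the patent, and every numeric coefficient is an illustrative assumption.

```python
import random

def bilevel_search(n_outer=200, seed=0):
    """Schematic bi-level iteration: the upper 'agent' proposes a
    clipped-feasible sizing, the lower 'agent' returns an operating
    benefit for that sizing, and the upper reward nets off an
    annualized cost. Stubs stand in for the TD3 networks."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_outer):
        # Upper level: sizing action with exploration noise,
        # clipped to the feasible region (stub for the upper actor).
        cap = min(5000.0, max(100.0, rng.uniform(0.0, 6000.0)))   # kWh
        pwr = min(2500.0, max(50.0, rng.uniform(0.0, 3000.0)))    # kW
        # Lower level: dispatch policy "trained" under this sizing
        # returns annual operating benefit (stub: concave in capacity).
        op_benefit = 365.0 * (0.2 * cap - 2e-5 * cap ** 2 + 0.1 * pwr)
        # Upper-level reward: operating benefit minus annualized
        # investment and O&M cost (illustrative coefficients).
        annual_cost = 0.136 * (120.0 * cap + 80.0 * pwr)
        reward = op_benefit - annual_cost
        if best is None or reward > best[0]:
            best = (reward, cap, pwr)
    return best
```

In the patent's method the two stubs would each be an improved-TD3 agent, and the loop would run until the preset iteration limit rather than a fixed sample count.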
- 7. The method of claim 6, wherein constructing the set of typical operating scenarios of the PSDF system with a clustering algorithm comprises: splicing the preprocessed multidimensional data into sample vectors; and, based on these sample vectors, applying a clustering algorithm to the historical multidimensional operating data to generate a scenario set representing the different typical operating conditions of the system.
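Claim 7 does not name a specific clustering algorithm; the sketch below uses plain k-means as one common choice, where each sample vector concatenates the preprocessed daily PV, load and price profiles and the centroids serve as the typical scenes. The initialization scheme and k are assumptions for illustration.

```python
import numpy as np

def typical_scenes(samples, k=4, n_iter=50):
    """K-means clustering of daily sample vectors (one row per day).
    Returns (centers, labels): centroids as typical operating scenes,
    and each day's scene assignment. Deterministic spread-out
    initialization is an assumed choice for reproducibility."""
    x = np.asarray(samples, dtype=float)
    centers = x[np.linspace(0, len(x) - 1, k).astype(int)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # assign each day to its nearest centroid (squared distance)
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned days
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return centers, labels
```

In practice k would be chosen by a validity index (e.g. silhouette score), and a library implementation with multiple restarts would be preferable to this minimal version.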
- 8. A transfer deep reinforcement learning multi-scenario energy storage adaptive configuration system for a photovoltaic-storage direct-flexible (PSDF) system, employing the method of any one of claims 1-7 and comprising: an acquisition module for acquiring historical photovoltaic, energy storage, load, electricity price and equipment parameters and constructing a set of typical operating scenarios of the PSDF system with a clustering algorithm; a bi-level Markov decision model construction module for constructing a bi-level Markov decision model based on the typical scenario set, wherein the upper-level model takes the system's annual comprehensive economic benefit as its reward function and introduces an investment limit-violation penalty, and the lower-level model takes the daily operating benefit as its reward function subject to system safety constraints, realizing multi-scenario adaptive operation; an optimization module for improving the TD3 algorithm with a multi-scenario transfer strategy and a smoothness-constraint technique; and a solving and output module for solving the bi-level Markov decision model with the improved TD3 algorithm and forming, through interactive iteration between the upper and lower levels, a multi-scenario energy storage adaptive configuration scheme that balances investment benefit and operating benefit.
- 9. A computer device, comprising a memory and a processor, the memory storing computer-executable instructions which, when executed by the processor, implement the steps of the transfer deep reinforcement learning multi-scenario energy storage adaptive configuration method of any one of claims 1 to 7.
- 10. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the transfer deep reinforcement learning multi-scenario energy storage adaptive configuration method of any one of claims 1 to 7.
Description
Transfer deep reinforcement learning multi-scenario energy storage adaptive configuration method, system, device and medium for photovoltaic-storage direct-flexible (PSDF) systems

Technical Field

The invention relates to the technical field of energy storage configuration for photovoltaic-storage direct-flexible (PSDF) systems, and in particular to a transfer deep reinforcement learning multi-scenario energy storage adaptive configuration method, system, device and medium.

Background

With the large-scale integration of renewable energy into power grids and the development of distributed energy systems, the importance of PSDF systems for improving regional power supply reliability, optimizing energy scheduling and achieving green low-carbon targets has become increasingly prominent. Under photovoltaic generation uncertainty and load fluctuation, a PSDF system can be flexibly scheduled through its energy storage equipment, achieving local power self-sufficiency, reducing dependence on the external grid, and improving energy utilization efficiency. However, in complex multi-scenario operating environments, how to scientifically size the energy storage capacity and formulate an adaptive operating strategy, while accounting for the system's economy, the energy storage lifetime and power supply reliability, remains a key problem restricting the efficient operation of PSDF systems. In recent years, deep reinforcement learning, with its strong adaptability and long-term benefit optimization, has made it possible to address this problem.
One line of research on PSDF energy storage systems models the optimal capacity and power of the energy storage with economy as the objective and solves the configuration with a particle swarm algorithm; however, it only optimizes the configuration for a single scenario and generalizes poorly across multiple scenarios. Another study proposes a capacity configuration design method for the key devices of a civil-building PSDF system based on an improved energy valley optimization algorithm, which can configure multiple system devices, including the energy storage, in a unified way, but does not consider the subsequent intra-day optimal operation of the system. A further approach converts the energy storage operation problem into an agent optimization problem and proposes a TD3 algorithm that accounts for the system's operating constraints, guaranteeing safe operation of the energy storage system; however, that TD3 algorithm trains poorly across multiple scenarios and its output strategy oscillates strongly, affecting the operating stability of the system and the service life of the equipment.

Disclosure of Invention

The present invention has been made in view of the above problems in the prior art.
The invention therefore provides a transfer deep reinforcement learning multi-scenario energy storage adaptive configuration method and system for PSDF systems, solving the problems that conventional PSDF system configuration methods generalize poorly across multiple scenarios, struggle to account for intra-day optimal operation, and produce strongly oscillating output strategies. To solve these technical problems, the invention provides the following technical scheme. In a first aspect, the invention provides a transfer deep reinforcement learning multi-scenario energy storage adaptive configuration method for a PSDF system, including: acquiring historical photovoltaic, energy storage, load, electricity price and equipment parameters, and constructing a set of typical operating scenarios of the PSDF system with a clustering algorithm; constructing a bi-level Markov decision model based on the typical scenario set, wherein the upper-level model takes the system's annual comprehensive economic benefit as its reward function and introduces an investment limit-violation penalty, and the lower-level model takes the daily operating benefit as its reward function subject to system safety constraints, realizing multi-scenario adaptive operation; improving the TD3 algorithm with a multi-scenario transfer strategy and a smoothness-constraint technique; and solving the bi-level Markov decision model with the improved TD3 algorithm to form a multi-scenario energy storage adaptive configuration scheme that balances investment benefit and operating benefit.