CN-122000910-A - Photovoltaic-energy storage-charging multi-stage scheduling and market bidding optimization method and device
Abstract
The invention discloses a multi-stage scheduling and market bidding optimization method and device for photovoltaic-energy storage-charging. The method comprises: constructing, based on historical operation data, a data-driven stochastic environment model that reflects the uncertainty of photovoltaic output, electricity price fluctuation, and charging load by combining time-series clustering with a non-homogeneous Markov chain; modeling the scheduling and bidding problem of the integrated PV-storage-charging station as a multi-stage Markov decision process model that jointly optimizes day-ahead decisions and multiple intraday rolling adjustments; training the multi-stage Markov decision process model with a deep reinforcement learning algorithm to obtain a policy network that adapts to a variety of uncertainty scenarios and satisfies the physical constraints of the equipment; and deploying the trained policy network in an energy management system to achieve globally coordinated scheduling and bidding for the integrated PV-storage-charging station. The stated advantages of the invention are markedly improved economic benefit, greatly enhanced robustness, globally coordinated and optimized decisions, strong real-time decision-making capability, and good scalability and portability.
Inventors
- ZHANG ZHI
- CHEN QIANG
- HU XIAOXIAO
- ZHANG XINYU
- GU TAIYU
- Ji Xinzhe
- WANG ZEBIN
- LIU XIMU
- LIU HAOYU
- CHEN YUQI
- LIU HANGYU
- SUN JIAZHENG
- Bian Gechen
- ZHU YIDONG
- TIAN YE
- YE YUJIAN
- SHI KEJIAN
- WANG SHANSHAN
Assignees
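- State Grid Liaoning Electric Power Co., Ltd. Electric Power Research Institute (国网辽宁省电力有限公司电力科学研究院)
- Southeast University (东南大学)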
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-04
Claims (10)
- 1. A multi-stage scheduling and market bidding optimization method for photovoltaic-energy storage-charging, comprising: constructing, based on historical operation data, a data-driven stochastic environment model reflecting the uncertainty of photovoltaic output, electricity price fluctuation, and charging load, wherein the data-driven stochastic environment model consists of Monte Carlo random scenarios constructed by combining time-series clustering with a non-homogeneous Markov chain; modeling the scheduling and bidding problem of the integrated PV-storage-charging station as a multi-stage Markov decision process model that jointly optimizes day-ahead decisions and multiple intraday rolling adjustments; training the multi-stage Markov decision process model with a deep reinforcement learning algorithm to obtain a policy network that adapts to a variety of uncertainty scenarios and satisfies the physical constraints of the equipment; and deploying the trained policy network in an energy management system to produce globally coordinated scheduling and bidding action instructions for the integrated PV-storage-charging station.
- 2. The method of claim 1, wherein the data-driven stochastic environment model is constructed by: collecting and preprocessing multi-dimensional historical operation data to obtain multi-dimensional time-series variables; extracting synchronized feature vectors from the multi-dimensional time-series variables period by period; discretizing the continuous state space into a finite number of state clusters using the density-based clustering algorithm DBSCAN; estimating the transition probability matrix for each period from the historical state transition frequencies, and applying Laplace smoothing to improve the generalization capability of the model; constructing a non-homogeneous Markov chain model, whose initial distribution is obtained by counting the state frequencies of the first period and which describes the joint dynamics of how the uncertainties evolve over time; and generating Monte Carlo random scenarios of the uncertainties from the trained non-homogeneous Markov chain to serve as the data-driven stochastic environment model.
- 3. The method of claim 2, wherein discretizing the continuous state space into a finite number of state clusters using the density-based clustering algorithm DBSCAN comprises: for each scheduling period, constructing a set of normalized multi-dimensional feature vectors from the historical operation data, wherein each feature vector comprises at least the photovoltaic output, the grid electricity price, and the electric vehicle charging load, and optionally also weather and time-encoding information; clustering the feature vector set of each period with the DBSCAN algorithm, wherein the neighborhood radius ε is determined from the knee point of the k-distance curve and the minimum number of points MinPts is set to the feature dimension plus 1; dividing the data into typical operation modes according to the clustering result, assigning a unique state number to each valid cluster, and computing the cluster center and covariance matrix of each valid cluster for subsequent scenario generation; and evaluating the clustering quality with the silhouette coefficient and, if its mean is below a preset threshold, adjusting the parameters and re-clustering to ensure that the state partition is reasonable and well separated.
- 4. The method of claim 2, wherein estimating the transition probability matrix for each period from the historical state transition frequencies and applying Laplace smoothing to improve the generalization capability of the model comprises: counting the historical state transition frequencies; computing the raw transition probabilities by maximum likelihood estimation from the counted historical state transition frequencies; and smoothing the transition probability matrix with Laplace smoothing.
- 5. The method of claim 2, wherein constructing the non-homogeneous Markov chain model comprises: constructing a first-order Markov chain; estimating the initial state distribution of the first-order Markov chain; and storing the smoothed transition probability matrix P_t of each period as a model parameter, which together with the initial state distribution of the first-order Markov chain forms the non-homogeneous Markov chain.
- 6. The method of claim 2, wherein generating the Monte Carlo random scenarios of the uncertainties from the trained non-homogeneous Markov chain comprises: for a daily operation scenario, determining the discrete state of each scenario in the first period by random sampling from the initial state distribution; sampling the next state from the current state period by period in time order, using the transition probability matrix P_t of each period, to construct a complete state transition path over 96 periods; after the full discrete state sequence is obtained, mapping the state number of each period to the center vector of the corresponding cluster and superimposing a Gaussian perturbation consistent with the statistical characteristics of that cluster to generate continuous feature values; and truncating the resulting values to the range [0, 1] and converting them by inverse normalization into actual physical quantities, namely photovoltaic output, electricity price, and charging demand.
- 7. The method of claim 1, wherein, when training the multi-stage Markov decision process model, the agent is trained with the PPO algorithm on 10,000 generated Monte Carlo random scenarios, with the training parameters set to a learning rate lr = 1e-4, a batch size of 64, 2048 samples per update, 10 training epochs per update, and a discount factor γ = 0.99, and wherein, after training completes at about 2,000 training episodes, performance is evaluated on 100 independent test scenarios.
- 8. A photovoltaic-energy storage-charging multi-stage scheduling and market bidding optimization device, comprising: a scenario construction unit, configured to construct, based on historical operation data, a data-driven stochastic environment model reflecting the uncertainty of photovoltaic output, electricity price fluctuation, and charging load, wherein the data-driven stochastic environment model consists of Monte Carlo random scenarios constructed by combining time-series clustering with a non-homogeneous Markov chain; a decision model building unit, configured to model the scheduling and bidding problem of the integrated PV-storage-charging station as a multi-stage Markov decision process model that jointly optimizes day-ahead decisions and multiple intraday rolling adjustments; a training unit, configured to train the multi-stage Markov decision process model with a deep reinforcement learning algorithm to obtain a policy network that adapts to a variety of uncertainty scenarios and satisfies the physical constraints of the equipment; and a deployment unit, configured to deploy the trained policy network in an energy management system to produce globally coordinated scheduling and bidding action instructions for the integrated PV-storage-charging station.
- 9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the photovoltaic-energy storage-charging multi-stage scheduling and market bidding optimization method according to any one of claims 1 to 7.
- 10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the photovoltaic-energy storage-charging multi-stage scheduling and market bidding optimization method according to any one of claims 1 to 7.
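The scenario-generation steps of claims 4 to 6 (counting transitions, Laplace smoothing, and sampling a path from the non-homogeneous Markov chain) can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes add-alpha smoothing with a hypothetical parameter `alpha`, a single state set shared by all periods, and independent per-feature Gaussian noise in place of the full cluster covariance matrix. The function names and the toy data are illustrative only, and the inverse normalization step is omitted.

```python
import random

def laplace_smooth(counts, alpha=1.0):
    """Convert a K x K matrix of transition counts into smoothed row probabilities."""
    k = len(counts)
    probs = []
    for row in counts:
        total = sum(row) + alpha * k
        probs.append([(c + alpha) / total for c in row])
    return probs

def estimate_chain(state_paths, n_states, alpha=1.0):
    """Estimate the initial distribution and one transition matrix per period.

    state_paths: list of daily state sequences, all of the same length.
    """
    horizon = len(state_paths[0])
    init_counts = [0] * n_states
    trans_counts = [[[0] * n_states for _ in range(n_states)]
                    for _ in range(horizon - 1)]
    for path in state_paths:
        init_counts[path[0]] += 1  # state frequency of the first period
        for t in range(horizon - 1):
            trans_counts[t][path[t]][path[t + 1]] += 1
    total = sum(init_counts)
    init_dist = [c / total for c in init_counts]
    matrices = [laplace_smooth(m, alpha) for m in trans_counts]
    return init_dist, matrices

def sample_scenario(init_dist, matrices, centers, stds, rng):
    """Sample one state path, then map states to noisy, truncated feature values."""
    states = list(range(len(init_dist)))
    s = rng.choices(states, weights=init_dist)[0]
    path = [s]
    for p_t in matrices:  # period-specific matrix: the chain is non-homogeneous
        s = rng.choices(states, weights=p_t[s])[0]
        path.append(s)
    values = []
    for s in path:
        # Assumption: diagonal Gaussian noise around the cluster center.
        noisy = [rng.gauss(mu, sd) for mu, sd in zip(centers[s], stds[s])]
        values.append([min(1.0, max(0.0, v)) for v in noisy])  # truncate to [0, 1]
    return path, values

# Toy history: 3 days, 4 periods, 2 states (a real day would have 96 periods).
history = [[0, 1, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
init_dist, matrices = estimate_chain(history, n_states=2)

# Hypothetical normalized cluster centers: [PV output, price, charging load].
centers = [[0.2, 0.5, 0.3], [0.8, 0.6, 0.7]]
stds = [[0.05, 0.05, 0.05], [0.05, 0.05, 0.05]]
path, values = sample_scenario(init_dist, matrices, centers, stds, random.Random(0))
```

Because smoothing adds `alpha` to every count, transitions never observed in the history keep a small nonzero probability, so the sampled scenarios are not restricted to paths that appear verbatim in the historical data.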
Description
Photovoltaic-energy storage-charging multi-stage scheduling and market bidding optimization method and device
Technical Field
The invention relates to the technical field of smart grid and energy system optimization and control, and in particular to a photovoltaic-energy storage-charging multi-stage scheduling and market bidding optimization method and device.
Background
According to the latest statistics of the International Energy Agency (IEA), cumulative global installed photovoltaic capacity has exceeded 1 TW, the number of electric vehicles in use has exceeded 26 million, and the compound annual growth rate has remained above 40%. In this context, stations that integrate photovoltaics (PV), an energy storage system (ESS), and electric vehicle charging facilities (Integrated Photovoltaic-Energy Storage-Electric Vehicle Station, IPEES, also called a PV-storage-charging integrated station) have emerged and become an important component of distributed energy systems.
As a new type of integrated energy system, the PV-storage-charging integrated station has clear technical and economic advantages: (1) the photovoltaic system supplies clean renewable electricity and effectively reduces carbon emission intensity; (2) the energy storage system smooths the intermittency and fluctuation of photovoltaic output through energy time-shifting, achieving peak shaving and valley filling; (3) the electric vehicle charging load acts as a flexible, controllable mobile energy storage unit and can participate in demand-side response; and (4) through unified coordinated control, the station as a whole can maximize the local consumption rate of photovoltaic power and reduce grid transmission losses.
According to published studies, a properly configured PV-storage-charging integrated station can raise the photovoltaic utilization rate above 90%, improve the overall energy efficiency of the system by 15-25%, and reduce operating costs by 20-30%.
As power market reform continues to deepen, China has established a multi-time-scale spot market system comprising a day-ahead market, an intraday market, and a real-time market, together with ancillary service markets covering multiple products such as frequency regulation, peak shaving, and reserve. In this market environment, the PV-storage-charging integrated station, as an emerging market participant, faces a complex decision optimization problem: in the day-ahead stage it must determine a baseline operation plan for the whole day (including the energy delivered to the grid, the energy storage charge and discharge plan, the ancillary service capacity bids, and so on), and in multiple intraday stages it must dynamically adjust its strategy according to real-time information so as to maximize the overall benefit for the whole day.
The prior art mainly falls into the following categories.
First, deterministic optimization methods
Deterministic optimization methods build a mixed-integer linear programming (MILP) or nonlinear programming (NLP) model on deterministic forecasts (such as photovoltaic output forecasts, load forecasts, and electricity price forecasts) and solve it with a commercial solver. The objective function of a typical model takes the form of equation (1), where Π is the total profit, λ_t is the electricity price at time t, P_grid,t is the power exchanged with the grid (positive for electricity sales, negative for electricity purchases), C_op,t is the operation and maintenance cost, R_as,t is the ancillary service revenue, and Δt is the length of a time period.
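The body of equation (1) is not reproduced in this text. One form consistent with the symbol definitions above is sketched below as an assumption, not as the patent's exact expression. The number of periods T is a hypothetical symbol, and applying Δt to every term assumes that the cost and revenue terms are expressed as rates per unit time.

```latex
\max \; \Pi = \sum_{t=1}^{T} \left( \lambda_t P_{\mathrm{grid},t} - C_{\mathrm{op},t} + R_{\mathrm{as},t} \right) \Delta t
```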
These methods have the advantages of a clear model and fast solution, but they have significant drawbacks: (1) they depend entirely on forecast accuracy, so the optimization result may fail badly when actual operation deviates from the forecast; (2) they cannot handle forecast uncertainty and lack robustness to extreme weather; and (3) day-ahead and intraday decisions are made separately, without considering the coupling between stages.
Second, stochastic optimization and robust optimization methods
To overcome the shortcomings of deterministic methods, researchers have introduced stochastic programming and robust optimization theory. Stochastic programming generates multiple uncertainty scenarios and builds a two-stage or multi-stage stochastic programming model, equation (2), where x is the first-stage decision variable (the day-ahead plan), y_s is the second-stage decision variable under scenario s (the intraday adjustment), ξ_s is the realized value of the uncertain parameters in scenario s, and E_s denotes the expectation over all scenarios. Robust optimization instead describes the range of parameter fluctuation with an uncertainty set U:
$$\min_{x} \max_{\xi} f(x, \xi) \quad \text{s.t.} \quad \xi \in U \tag{3}$$
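The difference between the expectation criterion of stochastic programming and the worst-case criterion of robust optimization can be illustrated with a small Python sketch. Everything in it is a hypothetical toy: the candidate day-ahead commitments, the scenario values and probabilities, and the `imbalance_cost` function are assumptions chosen for illustration. The second-stage recourse is folded into the cost function rather than optimized explicitly, and the uncertainty set U is taken to be the finite scenario set.

```python
def expected_cost(cost, x, scenarios, probs):
    """Stochastic-programming criterion: probability-weighted average cost."""
    return sum(p * cost(x, xi) for xi, p in zip(scenarios, probs))

def worst_case_cost(cost, x, uncertainty_set):
    """Robust criterion: the largest cost over the uncertainty set."""
    return max(cost(x, xi) for xi in uncertainty_set)

def imbalance_cost(x, xi):
    """Hypothetical cost of committing energy x when the realized PV energy is xi."""
    shortfall = max(0.0, x - xi)  # committed more than was produced
    surplus = max(0.0, xi - x)    # produced more than was committed
    return 2.0 * shortfall + 1.0 * surplus  # shortfall penalized more heavily

candidates = [2.0, 4.0, 6.0, 8.0]  # candidate day-ahead commitments x
scenarios = [2.0, 6.0, 8.0]        # realizations xi_s of the uncertain parameter
probs = [0.2, 0.5, 0.3]            # scenario probabilities

x_stochastic = min(
    candidates, key=lambda x: expected_cost(imbalance_cost, x, scenarios, probs)
)
x_robust = min(
    candidates, key=lambda x: worst_case_cost(imbalance_cost, x, scenarios)
)
print(x_stochastic, x_robust)  # prints: 6.0 4.0
```

In this toy case the expectation criterion selects the larger commitment because the low-output scenario is unlikely, while the worst-case criterion selects a more conservative commitment that limits the loss if that scenario occurs.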