CN-122026319-A - Reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction
Abstract
A reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction addresses a conflict that arises under high photovoltaic penetration: point prediction loses accuracy on fluctuation days, while probabilistic prediction is costly. The method extracts normalized output-trajectory features and ramp-intensity features from the previous day's photovoltaic output data. It combines them with calendar priors to construct a decision state that does not depend on weather forecasts. A deep reinforcement learning network learns the mapping "previous-day shape → next-day fluctuation risk → prediction-mode selection" and outputs the switching action between point prediction and probabilistic prediction. A reward function that combines a fluctuation-degree index with prediction cost drives the policy to prefer point prediction on typical days and to switch automatically to probabilistic prediction on fluctuation days. The method jointly optimizes prediction accuracy, robustness, and computational cost, and improves the adaptability and economy of photovoltaic prediction systems under complex weather and multi-site conditions.
Inventors
- YAO KAIWEN
- QU YINPENG
- CHEN HAOYANG
- Tao Penghao
Assignees
- 湖南大学 (Hunan University)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-01-14
Claims (10)
- 1. A reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction, characterized by comprising the following steps: S1, acquiring historical photovoltaic output data and constructing daily samples: historical output time-series data of the photovoltaic power stations are acquired while the stations are in normal operation or awaiting prediction, the data comprising timestamps and the photovoltaic active-power sequences of a plurality of sites, and the data are reorganized according to a preset intra-day time window to form a daily station output matrix $\mathbf{P}_d$; S2, extracting previous-day morphological features and constructing the state: the previous-day output matrix $\mathbf{P}_{d-1}$ is normalized and differenced to extract a feature tensor $\mathbf{X}_{d-1}$ characterizing output shape and fluctuation, the feature tensor comprising normalized output-trajectory features and ramp-intensity features; at the same time, a calendar prior feature vector $\mathbf{c}_d$ of target day $d$ is extracted, comprising sine and cosine features of the day's position within the year and day-of-week features; $\mathbf{X}_{d-1}$ and $\mathbf{c}_d$ together constitute a forecast-free decision state $s_d$; S3, constructing a switching policy network based on deep reinforcement learning and outputting the switching result: a deep Q-network (DQN) is constructed as the switching policy network, comprising a convolutional neural network branch that processes $\mathbf{X}_{d-1}$ and a multilayer perceptron branch that processes $\mathbf{c}_d$, with an action-value vector $Q(s_d,\cdot)$ output at a fusion layer; an action $a_d$ is selected from $Q(s_d,\cdot)$ by an $\varepsilon$-greedy policy or a greedy policy, and a binary switching result is output, one value of which indicates that the point prediction model is adopted for target day $d$ and the other that the probabilistic prediction model is adopted; S4, constructing the reward function and performing offline training: a fluctuation index $v_j(d)$ is calculated for each site $j$ from the actual output curve of target day $d$; a threshold $\tau_j$ is set for each site from the distribution of fluctuation indices in the training set; surrogate losses representing the point-prediction cost and the probabilistic-prediction cost are calculated from $v_j(d)$ and $\tau_j$ and, together with a probabilistic-prediction usage cost term, form the reward $r(d)$ used to update the DQN parameters; the DQN is trained offline with experience replay and a target network to obtain the trained switching policy; S5, automatic policy output and prediction routing for the validation set: the state $s_d$ of the validation set is generated day by day in chronological order, the trained switching policy is invoked to output the switching result of each target day, and the result is written into a policy file; according to the policy file, the prediction task of each target day is automatically routed to the point prediction model or the probabilistic prediction model for execution, generating the corresponding point prediction result or probabilistic prediction result.
- 2. The reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction according to claim 1, characterized in that S1 comprises the following sub-steps: S1.1, collecting hourly photovoltaic power data for each site, and aligning and de-duplicating the timestamps; S1.2, selecting the sample points from 06:00 to 18:00 of each day and resampling so that the number of intra-day sample points is fixed at 13; S1.3, interpolating and aligning missing intra-day points, the interpolation preferably being linear interpolation or spline interpolation over the time index; S1.4, forming the stacked daily sample tensor, where N is the number of sample days and 13 is the number of intra-day sample points.
- 3. The reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction according to claim 1, characterized in that S2 comprises the following sub-steps: S2.1, normalizing the previous-day output matrix $\mathbf{P}_{d-1}$ per site by its maximum value or a quantile to obtain the normalized trajectory features; S2.2, computing the absolute differences between adjacent time points of the normalized trajectory features to obtain the ramp-intensity features; S2.3, stacking the normalized trajectory features and the ramp-intensity features along the channel dimension to form the morphological feature tensor $\mathbf{X}_{d-1}$; S2.4, computing the calendar prior features $\mathbf{c}_d$ of target day $d$, comprising sine and cosine encodings of the day's position within the year and a one-hot day-of-week encoding; S2.5, combining $\mathbf{X}_{d-1}$ and $\mathbf{c}_d$ into the forecast-free decision state $s_d$.
- 4. The reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction according to claim 1, characterized in that the deep Q-network (DQN) of S3 outputs the action-value vector through the following sub-steps: S3.1, the convolutional branch extracts features from the morphological feature tensor $\mathbf{X}_{d-1}$ with at least three convolutional layers, each followed by a batch normalization layer and a nonlinear activation function, to obtain an image embedding vector; S3.2, the prior branch maps the calendar prior feature vector $\mathbf{c}_d$ with at least two fully connected layers to obtain a prior embedding vector; S3.3, the image embedding vector and the prior embedding vector are concatenated and fed into a fused fully connected layer, which outputs the action-value vector $Q(s_d,\cdot)\in\mathbb{R}^{K}$, where $K$ is the dimension of the action space.
- 5. The reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction according to claim 4, characterized in that the action space of S3.3 is one of the following: (1) a unified action space for all sites, $K=2$, corresponding to the output $\{0,1\}$; (2) a site-differentiated action space for multiple sites, $K=2^{M}$ with $M$ the number of sites, in which action $a$ encodes the per-site switching results as a bitmask and the $j$-th bit of $a$ indicates whether site $j$ adopts point prediction or probabilistic prediction on the target day.
- 6. The reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction according to claim 1, characterized in that the fluctuation index $v$ in S4 is constructed from one of the following or a combination thereof: (1) the ratio of the mean absolute ramp intensity, i.e. the mean absolute power difference between adjacent intra-day time points, to the intra-day mean power; (2) the ratio of the standard deviation of the intra-day power sequence to its mean; (3) a morphological fluctuation index based on the intra-day peak-valley difference, ramp extremes, or the number of peaks. The associated expression is evaluated over segments or statistical windows of day $d$ and involves the switching policy $\pi$, the policy-network parameters $\theta$, the expectation operator $\mathbb{E}$, and a loss function $\ell$.
- 7. The reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction according to claim 1, characterized in that the site threshold $\tau_j$ in S4 is constructed as follows: for each site $j$, the $q$-quantile of the site's training-set fluctuation-index samples is computed as $\tau_j$, where $q$ is a preset quantile parameter for distinguishing typical days from fluctuation days and $\tau_j$ characterizes the fluctuation level of site $j$; for global switching, the site thresholds are aggregated into a single global threshold by a weighted mean.
- 8. The reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction according to claim 1, characterized in that the reward $r(d)$ in S4 is constructed from surrogate loss functions comprising: (1) a point-prediction surrogate loss; (2) a probabilistic-prediction surrogate loss. A probabilistic-prediction usage cost term, scaled by a preset coefficient, is also included. The probabilistic-prediction loss is parameterized by a base cost and a sensitivity coefficient to above-threshold fluctuation. The point-prediction loss is parameterized by a base cost and an above-threshold penalty slope. The reward $r(d)$ is the reinforcement-learning reward of day $d$: over the $M$ sites, indexed by $j$, it aggregates the surrogate loss corresponding to the mode selected for each site. Each loss is evaluated on $v_j(d)$, the fluctuation index of site $j$ on day $d$, and on $\tau_j$, the threshold of site $j$.
- 9. The reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction according to claim 1, characterized in that the offline training of S4 uses a deep Q-learning algorithm comprising: (1) selecting actions with an $\varepsilon$-greedy strategy and gradually decaying $\varepsilon$ to balance exploration and exploitation; (2) using an experience replay mechanism that randomly draws mini-batches from stored historical interaction samples for parameter updates; (3) computing the target Q value with a target network mechanism or a double DQN mechanism to improve training stability; (4) minimizing the Q-value estimation error with a Huber loss or a mean squared error loss, and updating the network parameters with the Adam optimizer.
- 10. The reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction according to claim 1, characterized in that the automatic policy output for the validation set in S5 comprises: building the state $s_d$ from the previous-day output matrix $\mathbf{P}_{d-1}$, outputting the switching result with the trained DQN, and writing the result into the policy file, which drives the automatic routing of the subsequent point-prediction and probabilistic-prediction workflows.
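The daily sample construction of claim 2 (S1.1 to S1.4) can be sketched in pure Python for a single site. This is a minimal illustration under several assumptions. The input is taken to be already hourly, so S1.1 only floors timestamps to the hour, and a duplicate timestamp keeps the last value. S1.3 uses linear interpolation on the hour index. Hours before the first or after the last observation take the nearest observed value, an edge rule the claim does not specify. The names `build_daily_samples`, `_fill_day`, and `HOURS` are hypothetical.

```python
from datetime import datetime

# Hourly slots of the 06:00-18:00 window, inclusive: 13 points per day (S1.2).
HOURS = list(range(6, 19))


def _fill_day(points):
    """S1.3: fill missing hours by linear interpolation on the hour index.

    points maps hour -> power for one day. Hours before the first or after the
    last observation take the nearest observed value (assumed edge rule).
    """
    known = sorted(points)
    row = []
    for h in HOURS:
        if h in points:
            row.append(points[h])
            continue
        left = [k for k in known if k < h]
        right = [k for k in known if k > h]
        if left and right:
            h0, h1 = left[-1], right[0]
            w = (h - h0) / (h1 - h0)
            row.append(points[h0] + w * (points[h1] - points[h0]))
        elif left:
            row.append(points[left[-1]])
        else:
            row.append(points[right[0]])
    return row


def build_daily_samples(records):
    """Turn (datetime, power) records of ONE site into (dates, N x 13 matrix)."""
    # S1.1: align timestamps to the hour; a duplicate timestamp keeps the last value.
    aligned = {}
    for ts, p in records:
        aligned[ts.replace(minute=0, second=0, microsecond=0)] = float(p)
    # S1.2: keep only the 06:00-18:00 window, grouped by calendar day.
    by_day = {}
    for ts, p in aligned.items():
        if 6 <= ts.hour <= 18:
            by_day.setdefault(ts.date(), {})[ts.hour] = p
    # S1.4: stack the filled days in chronological order.
    dates = sorted(by_day)
    return dates, [_fill_day(by_day[d]) for d in dates]


if __name__ == "__main__":
    recs = [(datetime(2024, 6, 1, h), float(h)) for h in (6, 7, 8, 10, 12, 18)]
    dates, matrix = build_daily_samples(recs)
    print(dates[0], matrix[0])
```

Sub-hourly data would need a real aggregation step, for example an hourly mean, in place of the simple floor-and-overwrite used here.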
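The forecast-free state of claim 3 (S2.1 to S2.5) can be sketched in pure Python. The following choices are assumptions the claim leaves open:

- The per-site scale is taken from that site's history.
- The quantile uses a nearest-rank position.
- The ramp channel is left-padded with a zero so both channels have the same length for channel stacking.
- The within-year phase uses a 365.25-day period.

All function names are hypothetical. Only measured previous-day output and the calendar enter the state. No weather input is used.

```python
import math
from datetime import date


def site_scale(history, q=None):
    """S2.1 scale for one site: its maximum, or its q-quantile if q is given.

    Uses the nearest-rank position round(q * (n - 1)); this choice is an assumption.
    """
    xs = sorted(history)
    if q is None:
        return xs[-1]
    return xs[round(q * (len(xs) - 1))]


def morphology_tensor(prev_day, scales):
    """Two-channel tensor built from the previous-day matrix (sites x intra-day points).

    Channel 0: trajectory normalized by each site's scale (S2.1).
    Channel 1: |difference| between adjacent points (S2.2), left-padded with 0.0 so
    both channels have the same length and can be stacked (S2.3; padding assumed).
    """
    traj = [[p / s for p in row] for row, s in zip(prev_day, scales)]
    ramp = [[0.0] + [abs(b - a) for a, b in zip(row, row[1:])] for row in traj]
    return [traj, ramp]


def calendar_prior(day):
    """S2.4: sin/cos of the day-of-year phase plus a one-hot day-of-week vector."""
    phase = 2 * math.pi * day.timetuple().tm_yday / 365.25  # assumed period
    one_hot = [0.0] * 7
    one_hot[day.weekday()] = 1.0  # Monday = index 0
    return [math.sin(phase), math.cos(phase)] + one_hot


def decision_state(prev_day, scales, target_day):
    """S2.5: state = (previous-day morphology tensor, target-day calendar prior)."""
    return morphology_tensor(prev_day, scales), calendar_prior(target_day)


if __name__ == "__main__":
    x, c = decision_state([[0, 5, 10, 5] + [0] * 9], [10.0], date(2024, 1, 1))
    print(x[1][0][:4], c)
```

Normalizing by a fixed per-site scale rather than by each day's own maximum keeps the overall level of a dull day visible to the network. Which variant the patent intends is not stated.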
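The site-differentiated action space of claim 5 can be sketched as a plain bitmask codec. The bit order (bit j for site j) and the meaning of each bit value (0 for point prediction, 1 for probabilistic prediction) are assumptions. The claim only states that the j-th bit selects the mode of site j. The function names are hypothetical.

```python
def action_space_size(num_sites, unified):
    """K = 2 for the unified action space, K = 2 ** M for the per-site space."""
    return 2 if unified else 2 ** num_sites


def encode_action(modes):
    """Pack per-site modes into one action index; bit j holds site j's mode.

    Assumed convention: 0 = point prediction, 1 = probabilistic prediction.
    """
    a = 0
    for j, m in enumerate(modes):
        if m not in (0, 1):
            raise ValueError(f"site {j}: mode must be 0 or 1, got {m!r}")
        a |= m << j
    return a


def decode_action(a, num_sites):
    """Unpack action index a into a list of per-site modes."""
    if not 0 <= a < 2 ** num_sites:
        raise ValueError(f"action {a} outside [0, {2 ** num_sites})")
    return [(a >> j) & 1 for j in range(num_sites)]


if __name__ == "__main__":
    a = encode_action([1, 0, 1])  # sites 0 and 2 probabilistic, site 1 point
    print(a, decode_action(a, 3))
```

Because K grows as 2^M, the DQN output layer needs 2^M units. The per-site variant is therefore practical only for a modest number of sites.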
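The fluctuation indices of claim 6 and the per-site quantile thresholds of claim 7 can be sketched together, since the thresholds are quantiles of the indices. Index (3) is returned as separate components, because the claim does not fix how the peak-valley difference, ramp extremes, and peak count are combined. The quantile uses linear interpolation at position q·(n − 1), the same convention as NumPy's default linear method. The aggregation weights, for example installed capacity, and all function names are hypothetical.

```python
import statistics


def ramp_ratio(p):
    """Index (1): mean |P[t+1] - P[t]| divided by the intra-day mean power."""
    mean = statistics.fmean(p)
    ramps = [abs(b - a) for a, b in zip(p, p[1:])]
    return statistics.fmean(ramps) / mean if mean > 0 else 0.0


def coeff_of_variation(p):
    """Index (2): population standard deviation divided by the mean."""
    mean = statistics.fmean(p)
    return statistics.pstdev(p) / mean if mean > 0 else 0.0


def shape_components(p):
    """Index (3) components: relative peak-valley gap, relative max ramp, peak count."""
    mean = statistics.fmean(p)
    scale = mean if mean > 0 else 1.0
    max_ramp = max(abs(b - a) for a, b in zip(p, p[1:]))
    peaks = sum(1 for a, b, c in zip(p, p[1:], p[2:]) if b > a and b > c)
    return (max(p) - min(p)) / scale, max_ramp / scale, peaks


def quantile(values, q):
    """Linearly interpolated quantile at position q * (n - 1)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def site_thresholds(train_v, q):
    """Claim 7: each site's threshold = q-quantile of its training-set indices."""
    return {site: quantile(v, q) for site, v in train_v.items()}


def global_threshold(tau, weights):
    """Claim 7, global switching: weighted mean of the site thresholds."""
    total = sum(weights[s] for s in tau)
    return sum(weights[s] * tau[s] for s in tau) / total


if __name__ == "__main__":
    day = [0.0, 4.0, 0.0, 4.0]
    print(ramp_ratio(day), coeff_of_variation(day), shape_components(day))
```

A day is then labeled a fluctuation day for site j when its index exceeds that site's threshold. The choice of q thus controls roughly what share of training days counts as fluctuating.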
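The sketch below assumes one plausible form of the claim 8 surrogate losses. In this hinge form, cost grows linearly with the excess of the fluctuation index $v$ over the site threshold $\tau$:

- $L_{\mathrm{pt}} = \alpha_{\mathrm{pt}} + \beta_{\mathrm{pt}}\max(0, v-\tau)$
- $L_{\mathrm{prob}} = \alpha_{\mathrm{prob}} + \beta_{\mathrm{prob}}\max(0, v-\tau) + \lambda C$
- $r(d) = -\frac{1}{M}\sum_{j} L_{a_j}(v_j(d), \tau_j)$

The functional forms, the mean aggregation, the 0/1 action convention, and every parameter value are assumptions, not the patent's definitions.

```python
from dataclasses import dataclass

POINT, PROB = 0, 1  # assumed action convention


@dataclass(frozen=True)
class CostParams:
    """Hypothetical cost coefficients, not values taken from the patent."""
    alpha_pt: float = 0.1    # base cost of point prediction
    beta_pt: float = 2.0     # point-prediction above-threshold penalty slope
    alpha_prob: float = 0.3  # base cost of probabilistic prediction
    beta_prob: float = 0.5   # probabilistic sensitivity to above-threshold fluctuation
    lam: float = 1.0         # preset coefficient on the usage cost
    usage_cost: float = 0.2  # probabilistic-prediction usage cost term C


def surrogate_loss(mode, v, tau, c):
    """Assumed hinge-form surrogate loss for one site on one day."""
    excess = max(0.0, v - tau)
    if mode == POINT:
        return c.alpha_pt + c.beta_pt * excess
    return c.alpha_prob + c.beta_prob * excess + c.lam * c.usage_cost


def reward(modes, v, tau, c=CostParams()):
    """r(d): negative mean surrogate loss over the M sites (assumed aggregation)."""
    losses = [surrogate_loss(m, vj, tj, c) for m, vj, tj in zip(modes, v, tau)]
    return -sum(losses) / len(losses)


if __name__ == "__main__":
    # Site 0 has a calm day, site 1 a fluctuating one; both thresholds are 0.2.
    print(reward([POINT, PROB], [0.1, 1.0], [0.2, 0.2]))
```

Two conditions drive the intended behavior: $\beta_{\mathrm{pt}} > \beta_{\mathrm{prob}}$ and $\alpha_{\mathrm{pt}} < \alpha_{\mathrm{prob}} + \lambda C$. Together they make point prediction cheaper below a break-even excess of $(\alpha_{\mathrm{prob}} + \lambda C - \alpha_{\mathrm{pt}})/(\beta_{\mathrm{pt}} - \beta_{\mathrm{prob}})$ and probabilistic prediction cheaper above it. This is the typical-day and fluctuation-day behavior the reward is meant to induce.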
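The four training mechanics listed in claim 9 can be sketched without a deep-learning framework by operating on plain lists of Q-values. The linear ε schedule and its constants, the buffer capacity, the discount factor, and the Huber δ are assumptions. The network itself, and the Adam step on the Huber loss, would normally come from a framework, for example `torch.nn.HuberLoss` and `torch.optim.Adam` in PyTorch, and are not shown.

```python
import random
from collections import deque


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=1000):
    """(1) Linearly decayed exploration rate (the linear schedule is assumed)."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def epsilon_greedy(q_values, eps, rng):
    """(1) Random action with probability eps, otherwise the greedy action."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return argmax(q_values)


class ReplayBuffer:
    """(2) Fixed-capacity experience replay; the oldest transitions are evicted first."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)


def td_target(r, q_next_target, q_next_online, gamma, done, double=True):
    """(3) Target Q value computed with the target network's estimates.

    With double=True (double DQN), the online network picks the next action and
    the target network evaluates it, which reduces overestimation.
    """
    if done:
        return r
    if double:
        return r + gamma * q_next_target[argmax(q_next_online)]
    return r + gamma * max(q_next_target)


def huber(err, delta=1.0):
    """(4) Huber loss on a TD error: quadratic near zero, linear in the tails."""
    a = abs(err)
    return 0.5 * err * err if a <= delta else delta * (a - 0.5 * delta)


if __name__ == "__main__":
    rng = random.Random(0)
    print(epsilon_at(500), epsilon_greedy([0.2, 0.7], epsilon_at(500), rng))
```

In this switching task each transition links one day's state to the next day's state. The discount factor therefore sets how much the policy weighs future days relative to the current day's reward.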
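Claim 10 writes the switching results to a policy file that drives the downstream point and probabilistic prediction workflows. The sketch below makes several hypothetical choices:

- The file is a CSV with `date`, `site`, and `mode` columns.
- Action 0 means point prediction and action 1 means probabilistic prediction.
- The `point_model` and `prob_model` callables stand in for the real predictors.

Any format that maps a (target day, site) pair to a mode would serve.

```python
import csv
import io


def write_policy(decisions, fh):
    """Write (date, site, action) rows; action 0 -> point, 1 -> probabilistic (assumed)."""
    writer = csv.writer(fh)
    writer.writerow(["date", "site", "mode"])
    for day, site, action in decisions:
        writer.writerow([day, site, "probabilistic" if action == 1 else "point"])


def read_policy(fh):
    """Load the policy file into {(date, site): mode}."""
    return {(row["date"], row["site"]): row["mode"] for row in csv.DictReader(fh)}


def dispatch(policy, day, site, point_model, prob_model, inputs):
    """Route one (day, site) prediction task to the model named in the policy file."""
    mode = policy[(day, site)]
    model = prob_model if mode == "probabilistic" else point_model
    return mode, model(inputs)


if __name__ == "__main__":
    # With real files, open them with newline="" as the csv module recommends.
    buf = io.StringIO()
    write_policy([("2024-06-01", "A", 0), ("2024-06-01", "B", 1)], buf)
    buf.seek(0)
    policy = read_policy(buf)
    print(dispatch(policy, "2024-06-01", "B", sum, lambda x: (min(x), max(x)), [1, 2, 3]))
```

Keeping the decision in a file rather than inside either predictor lets the policy and the prediction models be retrained and deployed independently.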
Description
Reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction
Technical Field
The invention belongs to the technical field of new-energy power prediction and power-system operation control, and particularly relates to a reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction.
Background
Photovoltaic penetration keeps rising in distribution networks, microgrids, and regional grids. As a result, output uncertainty has grown from a "negligible disturbance" into a key constraint on dispatch planning, reserve allocation, energy-storage scheduling, and market clearing. Photovoltaic output fluctuation stems mainly from rapid irradiance changes caused by cloud occlusion, aerosol variation, local convection, and similar factors. It is markedly time-varying and non-stationary, so the same station exhibits two clearly different output patterns on different days: "typical days" (regular, smooth variation) and "fluctuation days" (frequent ramps and pronounced abrupt changes). Existing photovoltaic power prediction systems typically employ point prediction models (e.g., single-valued regression based on statistics or deep learning) and probabilistic prediction models (e.g., quantile regression, scenario generation, or Bayesian inference). Point prediction models are lightweight, fast at inference, and easy to integrate. However, their errors are often markedly amplified on fluctuation days, and these errors carry asymmetric risks, since both underestimation and overestimation can adversely affect regulation.
Probabilistic prediction models provide uncertainty characterization and are more robust on fluctuation days. However, they generally require more complex modeling, greater inference overhead, and higher operating costs for training, calibration, storage, and output post-processing. Existing methods share the following shortcomings. (1) The prediction model lacks an adaptive selection mechanism. Most systems have no interpretable, learnable switching logic between point and probabilistic prediction and instead rely on fixed strategies or manual rules, which adapt poorly to seasonal variation and to changes in the proportion of fluctuation days. (2) The decision stage depends heavily on future information. Many switching rules rely on future weather forecasts, cloud imagery, or irradiance predictions, and tend to fail when weather data are missing or of unstable quality. (3) Deep coupling between the switching logic and the point/probabilistic predictors increases engineering complexity. Embedding the switching logic in the prediction model often leads to long training chains, difficult parameter tuning, high online maintenance cost, and difficulty achieving "lightweight deployment plus robust operation". A technical solution is therefore needed that meets three requirements. It must not depend on future weather information. It must automatically identify the next-day fluctuation risk from historical operating data alone. It must output the prediction-model selection, reducing overall system cost and improving deployability while maintaining robustness.
Disclosure of Invention
The invention aims to provide a forecast-free reinforcement learning switching method for photovoltaic power prediction services. On a daily scale, the method decides from the previous day's output shape whether to adopt point prediction or probabilistic prediction for the next day. It thereby addresses several problems of high-penetration photovoltaic scenarios: the insufficient robustness of point prediction on fluctuation days, the high cost of probabilistic prediction that makes it impractical to use on every day, the strong dependence on future weather information, and the high degree of engineering coupling. To solve these technical problems, the invention adopts the following technical scheme: A reinforcement learning-based adaptive switching method for photovoltaic point-probability joint prediction comprises the following steps: S1, acquiring historical photovoltaic output data and constructing daily samples: historical output time-series data of the photovoltaic power stations are acquired while the stations are in normal operation or awaiting prediction, the data comprising timestamps and the photovoltaic active-power sequences of a plurality of sites, and the data are reorganized according to a preset intra-day time window to form a daily station output matrix $\mathbf{P}_d$; S2, extracting previous-day morphological features and constructing the state: the previous-day output matrix $\mathbf{P}_{d-1}$ is normalized and differenced to extract a feature tensor $\mathbf{X}_{d-1}$ characterizing output shape and fluctuation. The characteristic tensor comprises n