CN-121984028-A - Reactive power optimization and voltage control method based on reinforcement learning


Abstract

A reactive power optimization and voltage control method based on reinforcement learning combines an Informer model, a Reformer model, a proximal policy optimization (PPO) agent, the interior point method and reactive power-voltage droop control, and is used for reactive power optimization and voltage control of a power distribution network. The method first predicts the future active output of renewable energy sources. A PPO agent then observes the measured and predicted values of the renewable active output and outputs a weight vector that fuses them into an equivalent active output. The interior point method then performs reactive power optimization to generate reactive power instructions, and finally reactive power-voltage droop control realizes the control of the reactive output of the renewable energy sources. The method suppresses the effect of fluctuations in renewable active output on the distribution network voltage and improves the voltage stability of the distribution network.

Inventors

YIN LINFEI
LIAO YONGXING

Assignees

Guangxi University (广西大学)

Dates

Publication Date: 20260505
Application Date: 20260124

Claims (4)

1. A reactive power optimization and voltage control method based on reinforcement learning, characterized by comprising the following steps:

Step (1): at the current moment, collect meteorological data related to the active output of the photovoltaic power generation unit, collect the active output of the photovoltaic power generation unit, and construct from the collected meteorological data and active output the time-series input data of the Informer model, given by formula (1). In formula (1), the entries are, for each moment of the input sequence: the global horizontal irradiance of the photovoltaic power station; the direct normal irradiance of the photovoltaic power station; the diffuse horizontal irradiance of the photovoltaic power station; the temperature of the photovoltaic power generation unit; the ambient wind speed of the photovoltaic power station; and the active output of the photovoltaic power generation unit. At the same moment, collect meteorological data related to the active output of the wind power generation unit, collect the active output of the wind power generation unit, and construct from the collected meteorological data and active output the time-series input data of the Reformer model, given by formula (2). In formula (2), the entries are, for each moment of the input sequence: the ambient temperature of the wind farm; the relative humidity of the wind farm environment; the ambient wind speed of the wind farm; the wind direction at the wind farm; the ambient air pressure of the wind farm; and the active output of the wind power generation unit.

Step (2): input the photovoltaic time-series data into the Informer model, which outputs the predicted values of photovoltaic active output at future moments, given by formula (3). In formula (3), the entries are the predicted values of photovoltaic active output at each predicted moment, and the prediction moments are defined in terms of the control period. Input the wind power time-series data into the Reformer model, which outputs the predicted values of wind power active output at future moments, given by formula (4). In formula (4), the entries are the predicted values of wind power active output at each predicted moment.

Step (3): combine the four quantities from steps (1) and (2) into the distribution network observation matrix at the current moment, given by formula (5).

Step (4): use the observation matrix as the input of the proximal policy optimization (PPO) agent; the action output by the PPO agent is the weight vector at the current moment, given by formula (6). In formula (6), the entries are the photovoltaic output weight value and the wind power output weight value at the current moment, each subject to the condition given with the formula.

Step (5): calculate the renewable energy equivalent active output at the current moment, given by formula (7). In formula (7), the entries are the photovoltaic equivalent active output and the wind power equivalent active output at the current moment; the formula uses a row-minimum function, a row-maximum function and the element-wise product.

Step (6): construct the reactive power optimization model of the distribution network, with minimization of the node voltage deviation as the objective function, given by formula (8). Formula (8) is expressed in terms of the total number of nodes of the distribution network, the voltage amplitude of each node, and the voltage reference value of each node. The equality constraints are given by formulas (9) and (10), which are expressed in terms of: the active and reactive power generated by the generator at each node; the active and reactive loads at each node; the voltage amplitude of each node; the conductance and susceptance entries of the distribution network admittance matrix; and the voltage phase angle difference between two nodes, that is, the difference between their voltage phase angles. The inequality constraints comprise the node voltage constraints of formulas (11) and (12), expressed in terms of the lower and upper limits of the node voltage amplitude and of the node voltage phase angle, and the generator output constraints of formulas (13) and (14), expressed in terms of the lower and upper limits of the active power and of the reactive power generated by the generator at each node. The decision variables are given by formulas (15) and (16): the generator reactive output decision variable vector consists of the reactive power decision variable of the photovoltaic power generation unit and that of the wind power generation unit; the generator node voltage decision variable vector consists of the node voltage decision variable of the photovoltaic power generation unit and that of the wind power generation unit.

Step (7): solve the reactive power optimization model constructed in step (6) by the interior point method, and extract the decision variable optimization results to obtain the reactive output reference value of the photovoltaic power generation unit, the node voltage reference value of the photovoltaic power generation unit, the reactive output reference value of the wind power generation unit, and the node voltage reference value of the wind power generation unit.

Step (8): from the reactive output reference value and the node voltage reference value of the photovoltaic power generation unit, generate the reactive power-voltage droop control curve of the photovoltaic power generation unit, given by formula (17). Formula (17) is expressed in terms of: the maximum reactive power the photovoltaic power generation unit is allowed to output; the node voltage of the photovoltaic power generation unit; the voltage regulation dead zone of the photovoltaic node; the width parameter of the linear regulation region; and the droop control coefficient. From the reactive output reference value and the node voltage reference value of the wind power generation unit, generate the reactive power-voltage droop control curve of the wind power generation unit, given by formula (18). Formula (18) is expressed in terms of: the maximum reactive power the wind power generation unit is allowed to output; the node voltage of the wind power generation unit; and the voltage regulation dead zone of the wind power node.

Step (9): update the parameters of the photovoltaic droop control curve to the photovoltaic inverter of the photovoltaic power generation unit in the distribution network, and update the parameters of the wind power droop control curve to the doubly-fed converter of the wind power generation unit in the distribution network.

Step (10): the parameters of the renewable energy equipment in the distribution network are updated once per control period. At the moments within the period at which no parameter update takes place, first collect the node voltage amplitudes of the distribution network, then quantify the voltage amplitudes into the reward value of the PPO agent by formula (19). Formula (19) is expressed in terms of: the reward value obtained by the PPO agent at the current moment; the Frobenius norm (F-norm); the voltage amplitudes of node 1, node 2, and so on through the last node of the distribution network at each of the collected moments; and an all-ones matrix with the stated numbers of rows and columns.

Step (11): input the reward value into the PPO agent of step (4).

Step (12): when the time reaches the next control moment, jump to step (1).
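The reward quantification of step (10) can be illustrated with a minimal sketch. Formula (19) itself does not survive in this text, so the form below is an assumption built only from the quantities the claim names: a matrix of node voltage amplitudes collected within the control period, an all-ones matrix of the same shape, and the Frobenius norm. The negative sign, the use of per-unit voltages, and the names `voltage_reward` and `v_samples` are hypothetical.

```python
import numpy as np

def voltage_reward(v_samples) -> float:
    """Assumed reward: negative Frobenius norm of (V - 1).

    v_samples: 2-D array, one row per distribution network node and one
    column per collected moment, in per-unit (assumption).
    """
    v = np.asarray(v_samples, dtype=float)
    ones = np.ones_like(v)  # all-ones matrix with the same rows and columns
    return -float(np.linalg.norm(v - ones, ord="fro"))

# Hypothetical example: 2 nodes, 2 collected moments.
v_dev = np.array([[1.03, 1.00],
                  [0.96, 1.00]])
print(voltage_reward(v_dev))
```

Under this assumed form the reward is largest (zero) when every collected voltage equals 1 p.u. and decreases as voltages deviate, which is consistent with the stated goal of stabilizing the distribution network voltage.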
2. The reinforcement learning-based reactive power optimization and voltage control method according to claim 1, wherein the Informer model in step (2) outputs, from the input photovoltaic time-series data, the predicted values of photovoltaic active output at future moments; the Informer model contains a ProbSparse self-attention module, self-attention distillation, and a generative decoder. The ProbSparse self-attention formula is formula (20), in which the sparsity score of a query is computed against the key matrix, which is composed of all key vectors; the formula takes, over all keys, the maximum of the similarity between the query vector and each key vector, and is expressed in terms of a given query vector, a given key vector, the vector dimension, and the total number of key vectors. The self-attention distillation formula is formula (21), which relates the input sequence of one encoder layer to the input sequence of the next encoder layer through an attention module function, a one-dimensional convolution function, an activation function, and a max-pooling function. The formula of the generative decoder is formula (22), in which the decoder input sequence is formed by a concatenation operation applied to the start token and the target-sequence position encoding; the formula is further expressed in terms of the set of real numbers, the start-token length, the target-sequence length, and the hidden-layer dimension.
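The ProbSparse sparsity score described in claim 2 can be sketched as follows. Because formula (20) is not legible here, the sketch assumes it matches the max-minus-mean measurement of the published Informer model: for each query, the maximum scaled dot-product similarity over all keys minus the mean similarity. The function names and the top-u selection helper are hypothetical.

```python
import numpy as np

def probsparse_score(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Assumed sparsity score per query: max_j(s_ij) - mean_j(s_ij),
    with s_ij = q_i . k_j / sqrt(d)."""
    d = Q.shape[-1]                      # vector dimension
    scores = Q @ K.T / np.sqrt(d)        # shape (num_queries, num_keys)
    return scores.max(axis=1) - scores.mean(axis=1)

def top_queries(Q: np.ndarray, K: np.ndarray, u: int) -> np.ndarray:
    """Indices of the u queries with the highest sparsity score."""
    return np.argsort(-probsparse_score(Q, K))[:u]

# Hypothetical toy data: 2 queries, 2 keys, dimension 2.
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
K = np.array([[2.0, 0.0], [0.0, 0.0]])
print(probsparse_score(Q, K), top_queries(Q, K, 1))
```

A query whose similarities are nearly uniform scores close to zero and contributes little beyond an average of the values, so only the highest-scoring queries need full attention.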
3. The reinforcement learning-based reactive power optimization and voltage control method according to claim 1, wherein the Reformer model in step (2) outputs, from the input wind power time-series data, the predicted values of wind power active output at future moments; the Reformer model comprises locality-sensitive hashing (LSH) attention, a reversible residual network, and a chunked feed-forward network. The LSH attention formula is formula (23), whose terms are: the normalized exponential (softmax) function; the query matrix; the transpose of the key matrix; the dimension of the attention head; a mask matrix; and the value matrix. The reversible residual network comprises forward computation and reverse reconstruction. The forward computation is given by formulas (24) to (27): the input is split along the channel dimension into two parts; forward propagation produces two outputs, which are concatenated to give the final output; of the two functions involved, one is a function incorporating the LSH attention and the other is a feed-forward network function. The reverse reconstruction is given by formula (28). The chunked feed-forward network is given by formula (29), in which the input to the feed-forward network is divided into input blocks, a standard feed-forward network function is applied to each block, and the number of blocks is a preset hyperparameter.
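The reversible residual network and the chunked feed-forward network of claim 3 can be sketched as below. Formulas (24) to (29) are not legible in this text, so the sketch assumes the standard reversible coupling (y1 = x1 + f(x2), y2 = x2 + g(y1)) and position-wise chunking. Here `f` stands in for the attention function and `g` for the feed-forward function; both, along with all function names, are hypothetical placeholders.

```python
import numpy as np

def rev_forward(x, f, g):
    """Forward pass: split channels, apply the two residual couplings."""
    x1, x2 = np.split(x, 2, axis=-1)
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return np.concatenate([y1, y2], axis=-1)

def rev_inverse(y, f, g):
    """Reverse reconstruction: recover the input from the output alone."""
    y1, y2 = np.split(y, 2, axis=-1)
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return np.concatenate([x1, x2], axis=-1)

def chunked_ffn(x, ffn, c):
    """Apply a position-wise feed-forward function block by block."""
    blocks = np.array_split(x, c, axis=0)   # c blocks along the sequence
    return np.concatenate([ffn(b) for b in blocks], axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                 # 5 positions, 8 channels
y = rev_forward(x, np.tanh, lambda h: 0.5 * h)
print(np.allclose(rev_inverse(y, np.tanh, lambda h: 0.5 * h), x))
```

Because the input can be rebuilt from the output, intermediate activations need not be stored for backpropagation, and chunking bounds the peak memory of the feed-forward layer without changing its result.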
4. The reinforcement learning-based reactive power optimization and voltage control method according to claim 1, wherein the proximal policy optimization (PPO) agent in step (4) outputs the weight vector at the current moment from the input observation matrix. The objective function of the PPO agent is formula (30), whose terms are: a policy clipping term; a value function term; an entropy regularization term; the weight of the value function loss term; the weight of the policy entropy regularization term; the policy network parameters; the value network parameters; and the current policy. The policy clipping term is formula (31), in which a mathematical expectation is taken of a minimum function; the policy probability ratio is the ratio of the current policy to the old policy, evaluated for the action taken at a moment given the state at that moment; the formula further uses the advantage function estimate at that moment and a clipping function with a clipping threshold. The value function term is formula (32), expressed in terms of the value function and the discounted return. The entropy regularization term is formula (33), expressed in terms of the policy entropy.
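The objective of claim 4 can be sketched as follows. Formulas (30) to (33) are not legible here, so the sketch assumes the standard PPO form: a clipped surrogate term, minus a weighted squared-error value function term, plus a weighted entropy term. The signs, the default weights `c1` and `c2`, the clipping threshold `eps`, and all argument names are assumptions for illustration.

```python
import numpy as np

def ppo_objective(logp_new, logp_old, adv, values, returns, entropy,
                  eps=0.2, c1=0.5, c2=0.01) -> float:
    """Assumed objective to maximize: L_clip - c1 * L_value + c2 * L_entropy."""
    ratio = np.exp(logp_new - logp_old)            # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    l_clip = np.minimum(ratio * adv, clipped * adv).mean()
    l_value = ((values - returns) ** 2).mean()     # value function error
    l_entropy = entropy.mean()                     # policy entropy bonus
    return float(l_clip - c1 * l_value + c2 * l_entropy)

# Hypothetical batch of two transitions with probability ratio 2.
logp_old = np.zeros(2)
logp_new = np.log(np.array([2.0, 2.0]))
adv = np.array([1.0, -1.0])
zeros = np.zeros(2)
print(ppo_objective(logp_new, logp_old, adv, zeros, zeros, zeros))
```

In the toy batch, the positive-advantage sample is capped at 1.2 by the clip while the negative-advantage sample keeps its full penalty of -2, which shows how clipping limits the gain from large policy changes but not the loss.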

Description

Reactive power optimization and voltage control method based on reinforcement learning

Technical Field

The invention belongs to the fields of artificial intelligence, power systems, smart grids and power system optimal scheduling, and relates to a power system reactive power optimization and voltage control method based on neural networks, reinforcement learning, the interior point method and reactive power-voltage droop control. It is suitable for reactive power optimization and voltage control of a power distribution network with a high proportion of renewable energy access.

Background

As the penetration of renewable energy sources in power systems continues to increase, power systems, and distribution networks in particular, are easily affected by fluctuations in the active output of renewable energy sources, which cause grid voltage fluctuations. Existing reactive power optimization methods cannot actively predict the active output of renewable energy sources in the distribution network, so reactive power instructions are easily mismatched with the renewable active output, and the distribution network voltage ultimately fluctuates severely. In addition, existing reactive power optimization methods cannot actively observe fluctuations in the renewable active output, cannot actively generate reactive power instructions matched to those fluctuations, and therefore cannot stabilize the grid voltage by actively managing the reactive power of the system.
In addition, existing reinforcement-learning-based grid voltage control methods feed only the real-time grid state to the agent and do not use predicted values of renewable active output, so the actions output by the agent struggle to adapt to the rapid voltage fluctuations caused by renewable energy access. Therefore, a reactive power optimization and voltage control method based on reinforcement learning is proposed to solve the problems that existing reactive power optimization and voltage control methods neither incorporate predicted values of renewable active output nor have the capability to actively manage the reactive power of the power system. Predicting the active output of renewable energy sources with the Informer model avoids optimizing reactive power for the current moment alone while ignoring future changes in renewable active output, which helps generate reactive power instructions better matched to future output fluctuations. Having a proximal policy optimization (PPO) agent observe both the measured and predicted values of renewable active output gives the reactive power optimization the capability to actively manage the reactive power of the power system, which helps stabilize the distribution network voltage.
By combining the Informer model with the proximal policy optimization (PPO) agent, the agent can observe the degree of future fluctuation in renewable active output, which helps it generate actions better matched to those fluctuations.

Disclosure of Invention

The invention provides a reactive power optimization and voltage control method based on reinforcement learning, which comprises the following steps in use:

Step (1): at the current moment, collect meteorological data related to the active output of the photovoltaic power generation unit, collect the active output of the photovoltaic power generation unit, and construct from the collected meteorological data and active output the time-series input data of the Informer model, given by formula (1). In the formula, Is that The horizontal irradiance of the photovoltaic power station at the moment; Is that The horizontal irradiance of the photovoltaic power station at the moment; Is that The horizontal irradiance of the photovoltaic power station at the moment; Is that The photovoltaic power station at moment directly transmits irradiance to the normal direction; Is that The photovoltaic power station at moment directly transmits irradiance to the normal direction; Is that The photovoltaic power station at moment directly transmits irradiance to the no