CN-121983981-A - Direct-current transmitting end transient voltage rise suppression system and method based on reinforcement learning
Abstract
The invention discloses a reinforcement learning-based system and method for suppressing transient voltage rise at the sending end of a direct current (DC) transmission system, and belongs to the technical field of high-voltage direct current (HVDC) transmission. The system comprises a transient state sensing module, a dynamic power grid partitioning module, a reinforcement learning control module and an execution module. The transient state sensing module collects operating data at the DC sending end and extracts transient features; the operating data and the transient features are transmitted to the dynamic power grid partitioning module and the reinforcement learning control module, respectively. The dynamic power grid partitioning module partitions the grid dynamically based on the transient features and outputs the partition information to the reinforcement learning control module. The reinforcement learning control module runs a deep residual reinforcement learning algorithm that takes the transient features and the partition information as input and outputs optimal control instructions. The execution module responds to the control instructions and suppresses the transient voltage rise through the coordinated action of multiple devices. The system and method offer fast response, strong adaptability and high suppression accuracy, avoid secondary faults, and are well suited to engineering practice.
Inventors
- WANG XUEBIN
- ZHANG JIE
- WEN XISHAN
- CHEN XIAOYUE
- YIN XIYU
- ZHANG LINYU
- DING YUJIE
- FU GUOBIN
- SONG RUI
- WANG SHENGJIE
- WANG SHENGFU
- YANG KAIXUAN
- ZHAO DONGNING
- ZHAO JINCHAO
Assignees
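- State Grid Qinghai Electric Power Company Electric Power Research Institute (国网青海省电力公司电力科学研究院)
- State Grid Qinghai Electric Power Company (国网青海省电力公司)
- Wuhan University (武汉大学)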
Dates
- Publication Date
- 20260505
- Application Date
- 20260120
Claims (10)
- 1. A reinforcement learning-based DC sending-end transient voltage rise suppression system, characterized by comprising a transient state sensing module, a dynamic power grid partitioning module, a reinforcement learning control module and an execution module; the transient state sensing module collects operating data at the DC sending end and extracts transient features, and the operating data and the transient features are transmitted to the dynamic power grid partitioning module and the reinforcement learning control module, respectively; the dynamic power grid partitioning module partitions the grid dynamically based on the transient features and outputs the partition information to the reinforcement learning control module; the reinforcement learning control module runs a deep residual reinforcement learning algorithm that takes the transient features and the partition information as input and outputs optimal control instructions; and the execution module responds to the control instructions and suppresses the transient voltage rise through the coordinated action of multiple devices.
- 2. The reinforcement learning-based DC sending-end transient voltage rise suppression system according to claim 1, wherein the transient state sensing module comprises a broadband voltage sensor, a current sensor, a synchrophasor measurement unit (PMU) and a data preprocessing unit; the broadband voltage sensor and the current sensor acquire the DC sending-end bus voltage signal and the inverter outlet current signal, which are synchronously sampled and transmitted by the synchrophasor measurement unit; the data preprocessing unit applies a wavelet transform with a Mexican hat wavelet basis to the acquired signals to remove noise and to extract the transient voltage rate of change, the current surge peak and the transient duration. The wavelet transform is $W(a,b)=\frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty}x(t)\,\psi\!\left(\frac{t-b}{a}\right)\mathrm{d}t$, where $a$ is the scale factor, $b$ is the translation factor, $x(t)$ is the input signal and $\psi(t)$ is the Mexican hat wavelet basis function.
- 3. The reinforcement learning-based DC sending-end transient voltage rise suppression system according to claim 1, wherein the dynamic power grid partitioning module comprises an edge computing unit and a topology analysis chip; the edge computing unit computes the node energy interaction degree with a high-order prior energy model, using the energy interaction formula [formula], whose quantities are two node state variables, the node energy exchange level, the homogeneity of hybrid energy nodes, an energy interaction overrun penalty factor and an energy interaction threshold; the topology analysis chip performs dynamic partitioning by minimizing a Markov random field (MRF) energy function and locates the areas most vulnerable to transient voltage rise, the energy function being [formula], which combines an energy supply-demand balance term, a partition energy demand term and an energy storage dispatch term through weight coefficients.
- 4. The reinforcement learning-based DC sending-end transient voltage rise suppression system according to claim 1, wherein the reinforcement learning control module comprises an FPGA chip, a bidirectional target network storage unit and an experience replay buffer; the FPGA chip runs a deep residual reinforcement learning algorithm whose real-time decision response time during a transient is ≤ 50 μs; the algorithm stabilizes training with bidirectional target networks, introduces a residual gradient term to compensate for function approximation errors and optimize the critic network update, explores with an ε-greedy strategy, and treats training as converged when the average reward fluctuation over 1000 consecutive episodes is ≤ 5%; the algorithm update formula is [formula], whose quantities are an adaptive learning rate, a residual weight and the greedy action output by the reinforcement learning actor network; the bidirectional target network storage unit stores the parameters of the forward and reverse target networks and corrects the action-value function through bidirectional value propagation; and the experience replay buffer stores at least 10^6 transient scenarios, so that the algorithm balances trying new control strategies against reusing historically effective ones.
- 5. The reinforcement learning-based DC sending-end transient voltage rise suppression system according to claim 1, wherein the execution module comprises a modular multilevel converter (MMC), a superconducting current controller and an adaptive reactor group, whose coordinated response time is ≤ 200 μs; the modular multilevel converter adjusts the IGBT switching frequency according to the control instruction; the superconducting current controller sets a current limit based on the transient current peak, according to the current-limiting formula [formula]; and the adaptive reactor group adjusts its reactive compensation through soft switching to balance the energy imbalance between partitions, with a response time ≤ 50 μs.
- 6. The reinforcement learning-based DC sending-end transient voltage rise suppression system according to claim 1, further comprising a safety guarantee module in bidirectional communication with the transient state sensing module, the reinforcement learning control module and the execution module. The safety guarantee module comprises an operating state monitoring unit, a control instruction checking unit and a safety response unit. The operating state monitoring unit collects system parameters in real time, including bus voltage, converter current, equipment temperature and reactive compensation, against the following safety thresholds: voltage 0.9 to 1.05 times the rated voltage, current 1.1 times the rated current, equipment temperature ≤ 85 °C, and reactive compensation within ±20% of the rated reactive capacity of the system; an early warning signal is triggered in real time whenever a threshold is exceeded. The control instruction checking unit builds a library of historically optimal instructions, compares each real-time control instruction with the matching instruction in the library, and judges the instruction invalid when their similarity is below 85%. The safety response unit checks instructions for logic conflicts and execution risk; when a logic conflict or high risk is detected, it cuts off the abnormal control signal in real time and regenerates the instruction together with the reinforcement learning control module.
- 7. The reinforcement learning-based DC sending-end transient voltage rise suppression system according to claim 3, wherein the energy interaction threshold is set from the rated system capacity and the reference capacity according to [formula].
- 8. The reinforcement learning-based DC sending-end transient voltage rise suppression system according to claim 4, wherein the deep residual reinforcement learning algorithm constructs a Markov decision process whose state space and action space are given by [formula], the state variables being the voltage variation, the partition energy imbalance degree, the load power and the line inductance, and the action variables being the modulation factor of the modular multilevel converter, the number of switched groups of the adaptive reactor and the superconducting current limit.
- 9. A method using the reinforcement learning-based DC sending-end transient voltage rise suppression system according to any one of claims 1 to 8, wherein a computer-readable storage medium drives the transient state sensing module to collect DC sending-end operating data and extract transient features, which are transmitted to the dynamic power grid partitioning module and the reinforcement learning control module, respectively; when the voltage exceeds [value] times the rated voltage and [second trigger condition] holds, the suppression process is started; the dynamic power grid partitioning module is controlled to partition the grid dynamically based on the received transient features and then output the partition information to the reinforcement learning control module; the reinforcement learning control module is invoked to take the transient features and the partition information as input and output an optimal control instruction through its deep residual reinforcement learning algorithm; the execution module responds to the control instruction through the coordinated action of the modular multilevel converter, the superconducting current controller and the adaptive reactor group to suppress the transient voltage rise; the safety guarantee module is enabled to monitor the operating state of the system and the validity and safety of the control instructions in real time, and when a parameter exceeds its threshold, or an instruction is invalid or has a logic conflict, it cuts off the abnormal control signal in real time and triggers a safety early warning; and after the transient state sensing module feeds back the voltage recovery status and suppression is confirmed successful, the reinforcement learning control module is driven to store the valid data and update the algorithm parameters.
- 10. The method according to claim 9, wherein the computer-readable storage medium stores a computer program that, when executed by a processor, controls the transient state sensing module, the dynamic power grid partitioning module, the reinforcement learning control module, the execution module and the safety guarantee module to work cooperatively.
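Claim 2 names a Mexican hat wavelet transform for preprocessing but gives no implementation details. The following Python sketch, assuming NumPy and uniformly sampled signals, discretizes the continuous wavelet transform as a Riemann sum and uses the response at a single scale only to locate the transient window. The function names, the edge padding, the `rel_threshold` of 0.2 and the plain finite difference used for the voltage rate of change are illustrative assumptions, not the patented preprocessing.

```python
import numpy as np


def mexican_hat(t):
    """Unnormalised Mexican hat (Ricker) wavelet: (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t**2) * np.exp(-(t**2) / 2.0)


def cwt_mexican_hat(x, dt, scales):
    """Riemann-sum approximation of W(a, b) = a^(-1/2) * integral x(t) psi((t - b) / a) dt,
    with the shift b placed on every sample instant."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) * dt
    # diff[j, k] = t_k - b_j: rows index the shift b, columns index the time t
    diff = t[None, :] - t[:, None]
    coeffs = np.empty((len(scales), len(x)))
    for row, a in enumerate(scales):
        coeffs[row] = mexican_hat(diff / a) @ x * dt / np.sqrt(a)
    return coeffs


def transient_features(v, i, dt, scale, rel_threshold=0.2):
    """Hypothetical feature extraction: (max |dv/dt|, peak |i|, transient duration)."""
    pad = int(np.ceil(8 * scale / dt))
    # Edge padding stops the truncated signal ends from looking like transients
    v_pad = np.pad(np.asarray(v, dtype=float), pad, mode="edge")
    w = cwt_mexican_hat(v_pad, dt, [scale])[0][pad:-pad]
    active = np.abs(w) > rel_threshold * np.abs(w).max()
    duration = active.sum() * dt
    rate = np.abs(np.gradient(np.asarray(v, dtype=float), dt)).max()
    peak = np.abs(np.asarray(i, dtype=float)).max()
    return rate, peak, duration
```

Because the Mexican hat wavelet has zero mean, flat pre-fault and post-fault segments produce almost no response, so the flagged window clusters around the voltage step. Production code would more likely use an FFT-based transform than this O(n²) matrix form.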
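Claim 3 partitions the grid by minimizing a Markov random field energy, but the energy terms themselves are not reproduced in this text. As a hedged illustration, the sketch below assumes a Potts-type energy: a per-node `unary` cost matrix stands in for the supply-demand balance and partition demand terms, and the edge `weights` stand in for the node energy interaction degree. It minimizes the energy with iterated conditional modes (ICM), a generic MRF solver chosen here for brevity; the claim does not say which solver the topology analysis chip uses.

```python
import numpy as np


def mrf_energy(labels, unary, edges, weights, lam):
    """Potts energy: sum_i unary[i, L_i] + lam * sum_(i,j) w_ij * [L_i != L_j]."""
    labels = np.asarray(labels)
    energy = float(unary[np.arange(len(labels)), labels].sum())
    for (i, j), w in zip(edges, weights):
        if labels[i] != labels[j]:
            energy += lam * w  # penalty for cutting a strongly interacting branch
    return energy


def icm_partition(unary, edges, weights, lam, max_iter=50):
    """Iterated conditional modes: relabel one node at a time to its locally
    cheapest partition until no label changes (a local minimum of the energy)."""
    n, k = unary.shape
    neighbours = [[] for _ in range(n)]
    for (i, j), w in zip(edges, weights):
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))
    labels = unary.argmin(axis=1)  # start from the unary-only optimum
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            cost = unary[i].astype(float)
            for j, w in neighbours[i]:
                cost += lam * w * (np.arange(k) != labels[j])
            best = int(cost.argmin())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels
```

ICM never increases the energy but only guarantees a local minimum; graph-cut or belief-propagation solvers are common alternatives when a better optimum is needed.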
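Claim 4 specifies an ε-greedy strategy, a replay buffer holding at least 10^6 transient scenarios, and convergence when the average reward fluctuation over 1000 consecutive episodes is ≤ 5%. The Python sketch below covers only this bookkeeping; the bidirectional target networks and the residual-gradient update are omitted because their formula is not available. How "fluctuation" is measured is an assumption: here every episode reward in the window must lie within 5% of the window mean. The class and function names are hypothetical.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """FIFO experience replay; the oldest transitions are discarded once full."""

    def __init__(self, capacity=1_000_000):
        self.data = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.data.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement over stored transitions
        idx = rng.sample(range(len(self.data)), batch_size)
        return [self.data[k] for k in idx]

    def __len__(self):
        return len(self.data)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise take the greedy (argmax-Q) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return int(np.argmax(q_values))


def has_converged(episode_rewards, window=1000, tol=0.05):
    """Assumed criterion: every reward in the last `window` episodes lies within
    tol * |mean| of the window mean."""
    if len(episode_rewards) < window:
        return False
    recent = np.asarray(episode_rewards[-window:], dtype=float)
    mean = recent.mean()
    if mean == 0.0:
        return False
    return bool(np.max(np.abs(recent - mean)) <= tol * abs(mean))
```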
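The thresholds in claim 6 map directly onto a limit check. In this sketch every quantity except temperature is expressed in per-unit of its rated value. Two points are assumptions because the claim does not settle them: the 1.1 times rated current figure is treated as an upper limit, and instruction similarity is measured as cosine similarity to the nearest entry in the historical instruction library.

```python
import numpy as np

# Thresholds quoted in claim 6 (per-unit of rated values, temperature in deg C)
V_MIN_PU, V_MAX_PU = 0.90, 1.05
I_MAX_PU = 1.10
T_MAX_C = 85.0
Q_MAX_PU = 0.20          # |reactive compensation| / rated reactive capacity
MIN_SIMILARITY = 0.85


def check_limits(v_pu, i_pu, temp_c, q_pu):
    """Return the names of all monitored quantities outside their safety thresholds."""
    alarms = []
    if not V_MIN_PU <= v_pu <= V_MAX_PU:
        alarms.append("voltage")
    if i_pu > I_MAX_PU:
        alarms.append("current")
    if temp_c > T_MAX_C:
        alarms.append("temperature")
    if abs(q_pu) > Q_MAX_PU:
        alarms.append("reactive")
    return alarms


def is_valid_instruction(command, library, min_similarity=MIN_SIMILARITY):
    """Assumed metric: cosine similarity to the closest instruction in the library."""
    cmd = np.asarray(command, dtype=float)
    lib = np.atleast_2d(np.asarray(library, dtype=float))
    cmd_norm = np.linalg.norm(cmd)
    lib_norms = np.linalg.norm(lib, axis=1)
    usable = lib_norms > 0  # zero vectors have no defined direction
    if cmd_norm == 0 or not usable.any():
        return False
    similarity = (lib[usable] @ cmd) / (lib_norms[usable] * cmd_norm)
    return bool(similarity.max() >= min_similarity)
```

Any non-empty list from `check_limits`, or a `False` from `is_valid_instruction`, would correspond to the early-warning and cut-off paths described in the claim.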
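Claim 8 frames the controller as a Markov decision process over four state variables and three action variables. A minimal Python encoding could look like the following; the class names `TVRState` and `TVRAction` and the discretized action grid (convenient for value-based action selection) are assumptions, and units and grid resolution are left to the caller.

```python
from dataclasses import astuple, dataclass
from itertools import product

import numpy as np


@dataclass(frozen=True)
class TVRState:
    voltage_variation: float       # voltage variation
    partition_imbalance: float     # partition energy imbalance degree
    load_power: float              # load power
    line_inductance: float         # line inductance

    def to_vector(self):
        """Flatten into the input vector for the actor/critic networks."""
        return np.array(astuple(self), dtype=float)


@dataclass(frozen=True)
class TVRAction:
    mmc_modulation: float          # MMC modulation factor
    reactor_groups: int            # number of switched adaptive-reactor groups
    sc_current_limit: float        # superconducting current-limit setting


def build_action_grid(modulations, group_counts, current_limits):
    """Enumerate a discrete action set (an assumption) as the Cartesian product."""
    return [
        TVRAction(m, g, c)
        for m, g, c in product(modulations, group_counts, current_limits)
    ]
```

A continuous actor network could output the three action variables directly instead; the grid form simply makes the ε-greedy selection of claim 4 easy to apply.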
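The method of claim 9 amounts to a sense, trigger, partition, decide, check, act and learn loop. The sketch below wires one pass of that loop from hypothetical callables, one per module. Because the trigger thresholds are not given in the claim, the trigger test is supplied by the caller, and running the safety check once before execution is a simplification of the continuous monitoring the claim describes.

```python
from typing import Any, Callable, Tuple


def suppression_cycle(
    sense: Callable[[], Tuple[Any, Any]],
    should_trigger: Callable[[Any], bool],
    partition: Callable[[Any], Any],
    decide: Callable[[Any, Any], Any],
    safety_ok: Callable[[Any, Any], bool],
    execute: Callable[[Any], None],
    on_abort: Callable[[Any], None],
    learn: Callable[[Any, Any, Any, Any], None],
) -> str:
    """One pass of the claim-9 flow; every callable is a hypothetical module interface."""
    measurements, features = sense()              # transient state sensing module
    if not should_trigger(features):              # trigger conditions not met
        return "idle"
    zones = partition(features)                   # dynamic power grid partitioning
    command = decide(features, zones)             # deep residual RL controller
    if not safety_ok(measurements, command):      # safety guarantee module
        on_abort(command)                         # cut off signal, raise warning
        return "aborted"
    execute(command)                              # MMC, superconducting controller, reactors
    _, after = sense()                            # feedback on voltage recovery
    if should_trigger(after):
        return "not_recovered"
    learn(features, zones, command, after)        # store data, update parameters
    return "suppressed"
```

Treating "recovered" as "the trigger condition no longer holds" is itself an assumption; a real implementation might require the voltage to stay within limits for a settling period.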
Description
Direct-current transmitting end transient voltage rise suppression system and method based on reinforcement learning
Technical Field
The invention relates to the technical field of high-voltage direct current transmission, and in particular to a reinforcement learning-based system and method for suppressing transient voltage rise at the DC sending end.
Background
With the large-scale grid integration of renewable energy, the DC sending-end system exhibits low inertia and strong nonlinearity, and transient voltage rise (TVR) occurs frequently, seriously threatening equipment safety and system stability. Traditional suppression methods, such as adding shunt reactors or optimizing the converter control strategy, respond slowly, adapt poorly and depend on accurate mathematical models, so they struggle to cope with complex and changeable transient scenarios. Reinforcement learning (RL), being model-free and adaptive, offers a new approach to power system control, but existing applications suffer from unstable training, failure to incorporate grid topology, and insufficient generalization across transient scenarios. An efficient suppression scheme that integrates grid partition awareness with advanced reinforcement learning algorithms is therefore needed.
Disclosure of Invention
The invention aims to provide a reinforcement learning-based system and method for suppressing transient voltage rise at the DC sending end that solves the above technical problems.
To achieve the above purpose, the invention provides a reinforcement learning-based DC sending-end transient voltage rise suppression system and method, comprising a transient state sensing module, a dynamic power grid partitioning module, a reinforcement learning control module and an execution module. The transient state sensing module collects operating data at the DC sending end and extracts transient features; the operating data and the transient features are transmitted to the dynamic power grid partitioning module and the reinforcement learning control module, respectively. The dynamic power grid partitioning module partitions the grid dynamically based on the transient features and outputs the partition information to the reinforcement learning control module. The reinforcement learning control module runs a deep residual reinforcement learning algorithm that takes the transient features and the partition information as input and outputs optimal control instructions, and the execution module responds to the control instructions and suppresses the transient voltage rise through the coordinated action of multiple devices.
Preferably, the transient state sensing module comprises a broadband voltage sensor, a current sensor, a synchrophasor measurement unit and a data preprocessing unit. The broadband voltage sensor and the current sensor acquire the DC sending-end bus voltage signal and the inverter outlet current signal, which are synchronously sampled and transmitted by the synchrophasor measurement unit. The data preprocessing unit applies a wavelet transform with a Mexican hat wavelet basis to the acquired signals to remove noise and to extract the transient voltage rate of change, the current surge peak and the transient duration. The wavelet transform is $W(a,b)=\frac{1}{\sqrt{a}}\int_{-\infty}^{+\infty}x(t)\,\psi\!\left(\frac{t-b}{a}\right)\mathrm{d}t$, where $a$ is the scale factor, $b$ is the translation factor, $x(t)$ is the input signal and $\psi(t)$ is the Mexican hat wavelet basis function.
Preferably, the dynamic power grid partitioning module comprises an edge computing unit and a topology analysis chip. The edge computing unit computes the node energy interaction degree with a high-order prior energy model, using the energy interaction formula [formula], whose quantities are two node state variables, the node energy exchange level, the homogeneity of hybrid energy nodes, an energy interaction overrun penalty factor and an energy interaction threshold. The topology analysis chip performs dynamic partitioning by minimizing a Markov random field energy function and locates the areas most vulnerable to transient voltage rise, the energy function being [formula], which combines an energy supply-demand balance term, a partition energy demand term and an energy storage dispatch term through weight coefficients.
Preferably, the reinforcement learning control module comprises an FPGA chip, a bidirectional target network storage unit and an experience replay buffer, wherein the FPGA chip runs a deep residual reinforcement learning algorithm, and the response time of real-time decisions in a trans