
CN-121192745-B - Network source cooperative primary frequency modulation optimization method and system based on reinforcement learning


Abstract

The invention discloses a network source cooperative primary frequency modulation optimization method and system based on reinforcement learning, in the technical field of power dispatching and power generation. The method comprises: constructing a hierarchical reinforcement learning framework; based on a high-level strategy network, taking system frequency deviation, frequency modulation demand prediction and unit running state as inputs and outputting a dynamic frequency modulation weight vector; based on a bottom-level strategy network, taking the dynamic frequency modulation weight vector and the real-time frequency deviation as inputs and outputting an output adjustment instruction for each frequency modulation unit; verifying the output adjustment instruction in real time; and, if the verification is not passed, triggering a strategy re-planning mechanism that iteratively calculates a new instruction. The invention addresses the slow response and low frequency modulation precision of existing frequency modulation methods, which cannot effectively cope with the frequency fluctuation caused by large-scale renewable energy access. Through hierarchical reinforcement learning optimization, it achieves efficient power grid frequency regulation under renewable energy access and improves the efficiency and precision of primary frequency modulation.

Inventors

  • ZHANG CHUANFU
  • LI XIA
  • LIU YANG
  • SONG ZHUOLI
  • LIU HAICHEN
  • LI XIFENG
  • SUN YINGHE

Assignees

  • 国电电力发展股份有限公司
  • 国电电力发展股份有限公司和禹水电开发公司

Dates

Publication Date
2026-05-12
Application Date
2025-09-27

Claims (8)

  1. A network source cooperative primary frequency modulation optimization method based on reinforcement learning, characterized by comprising the following steps: constructing a hierarchical reinforcement learning framework comprising a high-level strategy network and a bottom-level strategy network; based on the high-level strategy network, taking system frequency deviation, frequency modulation demand prediction and unit running state as inputs, and outputting a dynamic frequency modulation weight vector; based on the bottom-level strategy network, taking the dynamic frequency modulation weight vector and the real-time frequency deviation as inputs, and outputting a power output adjustment instruction for each frequency modulation unit, wherein each unit corresponds to one heterogeneous sub-strategy network; verifying the output adjustment instruction in real time against preset safety constraints, and executing frequency modulation control of each frequency modulation unit if the verification is passed; and, if the verification is not passed, triggering a strategy re-planning mechanism in which the high-level strategy network regenerates the dynamic frequency modulation weight vector and a new instruction is iteratively calculated, the new instruction being transmitted to a speed regulator through a network source cooperative primary frequency modulation optimization device; wherein constructing the hierarchical reinforcement learning framework comprises: the high-level strategy network adopting an attention mechanism to dynamically capture the state characteristics of key units; the bottom-level strategy network adopting a heterogeneous network architecture comprising heterogeneous sub-strategy networks for different types of units, each heterogeneous sub-strategy network corresponding to one frequency modulation unit; and a cross-layer information fusion channel being configured between the high-level strategy network and the bottom-level strategy network, with bidirectional transmission of weight vectors and state information performed over the cross-layer information fusion channel; and wherein outputting the dynamic frequency modulation weight vector based on the high-level strategy network comprises: collecting power grid frequency measurement data in real time and calculating a system frequency deviation value; acquiring frequency modulation demand prediction information from a dispatching center while acquiring operation state parameters of each frequency modulation unit; and inputting the system frequency deviation, the frequency modulation demand prediction and the unit running state into the high-level strategy network, and calculating and outputting the dynamic frequency modulation weight vector, which comprises three dimensions: a frequency recovery weight, an economy weight and a unit service life weight.
  2. The reinforcement learning-based network source cooperative primary frequency modulation optimization method according to claim 1, wherein offline training is combined with online fine-tuning, the hierarchical strategy first being trained in a digital twin system and then migrated to an actual power grid for operation, the method comprising: constructing a training environment based on the digital twin system, and training the high-level strategy network and the bottom-level strategy network in the training environment; and migrating the trained high-level strategy network and the trained bottom-level strategy network to the actual power grid system, and fine-tuning the parameters of the high-level strategy network and the bottom-level strategy network online, in the primary frequency modulation optimization device, based on actual operation data.
  3. The reinforcement learning-based network source cooperative primary frequency modulation optimization method according to claim 1, wherein inputting the system frequency deviation, the frequency modulation demand prediction and the unit running state into the high-level strategy network, and calculating and outputting the dynamic frequency modulation weight vector, comprises: extracting key features of the unit running state through the attention mechanism, and generating feature attention weights; performing spatio-temporal correlation analysis on the system frequency deviation and the frequency modulation demand prediction to generate a demand-deviation coupling coefficient; and fusing the feature attention weights and the demand-deviation coupling coefficient, and outputting a three-dimensional dynamic frequency modulation weight vector, wherein the weight value of each dimension is normalized by a Softmax function.
  4. The reinforcement learning-based network source cooperative primary frequency modulation optimization method according to claim 3, wherein outputting the output adjustment instruction of each frequency modulation unit based on the bottom-level strategy network, taking the dynamic frequency modulation weight vector and the real-time frequency deviation as inputs, comprises: distributing the dynamic frequency modulation weight vector to each heterogeneous sub-strategy network; each heterogeneous sub-strategy network receiving real-time frequency deviation data; and calculating a reference output value of the corresponding unit based on the dynamic frequency modulation weight vector, and generating a final output adjustment instruction in combination with the real-time frequency deviation.
  5. The reinforcement learning-based network source cooperative primary frequency modulation optimization method according to claim 4, wherein calculating the reference output value of the corresponding unit based on the dynamic frequency modulation weight vector, and generating the final output adjustment instruction in combination with the real-time frequency deviation, comprises: calling the corresponding heterogeneous sub-strategy network according to the unit type, inputting the frequency recovery weight of the dynamic frequency modulation weight vector, and calculating an initial power adjustment quantity; superposing a differential term of the real-time frequency deviation on the initial power adjustment quantity for dynamic compensation, and generating the reference output value; and combining the economy weight and the unit service life weight of the dynamic frequency modulation weight vector, performing multi-objective optimization correction on the reference output value, and outputting a final output adjustment instruction that satisfies the ramp rate constraint of the unit.
  6. The reinforcement learning-based network source cooperative primary frequency modulation optimization method according to claim 1, wherein verifying the output adjustment instruction in real time against the preset safety constraints comprises: the preset safety constraints comprising a multi-stage verification channel, the multi-stage verification channel comprising a pre-verification channel applied when an instruction is generated and a final verification channel applied before the instruction is issued; and activating the corresponding verification channel based on the real-time task stage, and verifying the output adjustment instruction in real time.
  7. The reinforcement learning-based network source cooperative primary frequency modulation optimization method according to claim 6, wherein, if the verification is not passed, triggering the strategy re-planning mechanism, regenerating the dynamic frequency modulation weight vector by the high-level strategy network and iteratively calculating a new instruction comprises: recording the type and degree of the safety constraint violation of the instruction that failed verification, and generating a re-planning trigger signal; feeding the re-planning trigger signal back to the high-level strategy network through the cross-layer information fusion channel, and adjusting the weight distribution strategy of the attention mechanism; and regenerating the dynamic frequency modulation weight vector based on the corrected unit state characteristics and the updated frequency modulation demand prediction, and starting iterative computation of the bottom-level strategy network until an instruction that passes safety verification is output.
  8. A network source cooperative primary frequency modulation optimization system based on reinforcement learning, characterized in that the system comprises: a learning framework construction module, used for constructing a hierarchical reinforcement learning framework comprising a high-level strategy network and a bottom-level strategy network; a global frequency modulation analysis module, used for outputting a dynamic frequency modulation weight vector based on the high-level strategy network, taking system frequency deviation, frequency modulation demand prediction and unit running state as inputs; a detail output analysis module, used for outputting output adjustment instructions of all frequency modulation units based on the bottom-level strategy network, taking the dynamic frequency modulation weight vector and the real-time frequency deviation as inputs, wherein each unit corresponds to one heterogeneous sub-strategy network; a real-time safety verification module, used for verifying the output adjustment instruction in real time against preset safety constraints, and executing frequency modulation control of each frequency modulation unit if the verification is passed; and a strategy re-planning module, used for triggering a strategy re-planning mechanism if the verification is not passed, in which the high-level strategy network regenerates the dynamic frequency modulation weight vector and a new instruction is iteratively calculated, the new instruction being transmitted to a speed regulator through a network source cooperative primary frequency modulation optimization device; further, in the learning framework construction module, the hierarchical reinforcement learning framework comprises the high-level strategy network and the bottom-level strategy network, between which weight vectors and state information are transmitted bidirectionally, and the high-level strategy network adopts an attention mechanism to dynamically capture the state characteristics of key units; the global frequency modulation analysis module is further configured to perform the following steps: acquiring power grid frequency measurement data in real time and calculating a system frequency deviation value; acquiring frequency modulation demand prediction information from a dispatching center while acquiring operation state parameters of each frequency modulation unit; and inputting the system frequency deviation, the frequency modulation demand prediction and the unit running state into the high-level strategy network, and calculating and outputting the dynamic frequency modulation weight vector, which comprises three dimensions: a frequency recovery weight, an economy weight and a unit service life weight.
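
As an illustration of claims 3 to 5, the following minimal Python sketch normalizes three fused scores into the dynamic frequency modulation weight vector with a Softmax function and then derives one unit's output adjustment instruction. It is only a sketch under stated assumptions: the claims do not give formulas, so the proportional rule that stands in for a heterogeneous sub-strategy network, the cost-based correction factor, and every name and parameter (`gain`, `kd`, `econ_cost`, `wear_cost`, `ramp_limit`) are hypothetical and not taken from the patent.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable Softmax; outputs are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_weight_vector(fused_scores: Sequence[float]) -> List[float]:
    """Map three fused scores to (frequency recovery, economy, unit life) weights."""
    if len(fused_scores) != 3:
        raise ValueError("expected exactly three fused scores")
    return softmax(fused_scores)


def output_adjustment(
    weights: Sequence[float],
    freq_dev: float,       # real-time frequency deviation (Hz)
    prev_freq_dev: float,  # deviation at the previous sample (Hz)
    dt: float,             # sampling interval (s)
    gain: float,           # hypothetical proportional gain (MW/Hz)
    kd: float,             # hypothetical differential gain (MW*s/Hz)
    econ_cost: float,      # hypothetical normalized cost of moving, in [0, 1]
    wear_cost: float,      # hypothetical normalized wear of moving, in [0, 1]
    ramp_limit: float,     # largest allowed output change per step (MW)
) -> float:
    """Return one unit's output adjustment (MW) within its ramp limit."""
    w_freq, w_econ, w_life = weights
    # Initial power adjustment from the frequency recovery weight.
    # Assumption: a proportional rule stands in for the sub-strategy network;
    # output rises when the frequency is below nominal (negative deviation).
    initial = -gain * w_freq * freq_dev
    # Superpose a differential term of the frequency deviation.
    reference = initial - kd * (freq_dev - prev_freq_dev) / dt
    # Assumed multi-objective correction: shrink the move as weighted costs grow.
    scale = max(0.0, 1.0 - w_econ * econ_cost - w_life * wear_cost)
    corrected = reference * scale
    # Enforce the unit's ramp rate constraint.
    return max(-ramp_limit, min(ramp_limit, corrected))
```

For example, `output_adjustment(dynamic_weight_vector([0.0, 0.0, 0.0]), -0.1, -0.1, 1.0, 300.0, 0.0, 0.0, 0.0, 5.0)` requests more than the ramp limit allows, so the result is clipped to `5.0`. In the claimed method, a trained sub-strategy network per unit type would replace the proportional rule.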

Description

Network source cooperative primary frequency modulation optimization method and system based on reinforcement learning

Technical Field

The invention relates to the technical field of power dispatching and power generation, in particular to a network source cooperative primary frequency modulation optimization method and system based on reinforcement learning.

Background

With the rapid development of renewable energy, the access of intermittent sources such as wind and solar has greatly increased the complexity and difficulty of power grid frequency modulation. Traditional frequency regulation methods generally rely on adjusting large thermal power generating units, which respond slowly and are costly to operate, and therefore struggle to cope with frequently fluctuating grid frequency. In addition, the frequency modulation requirements of modern power systems are increasingly complex and involve the coordination of multiple units. Traditional reinforcement learning methods often lack a targeted strategy for power grid frequency regulation and have difficulty balancing the real-time performance, the economy and the unit health of frequency regulation at the same time.

Disclosure of Invention

The application provides a network source cooperative primary frequency modulation optimization method and system based on reinforcement learning, which solve the technical problems of slow response and low frequency modulation precision that arise because conventional frequency modulation methods cannot effectively cope with the frequency fluctuation caused by large-scale renewable energy access.
The first aspect of the application provides a network source cooperative primary frequency modulation optimization method based on reinforcement learning. The method comprises: constructing a hierarchical reinforcement learning framework comprising a high-level strategy network and a bottom-level strategy network; based on the high-level strategy network, taking system frequency deviation, frequency modulation demand prediction and unit running state as inputs, and outputting a dynamic frequency modulation weight vector; based on the bottom-level strategy network, taking the dynamic frequency modulation weight vector and the real-time frequency deviation as inputs, and outputting output adjustment instructions of all frequency modulation units, wherein each unit corresponds to one heterogeneous sub-strategy network; verifying the output adjustment instructions in real time against preset safety constraints, and executing frequency modulation control of all frequency modulation units if the verification is passed; and, if the verification is not passed, triggering a strategy re-planning mechanism in which the high-level strategy network regenerates the dynamic frequency modulation weight vector and new instructions are iteratively calculated, the new instructions being transmitted to a speed regulator through a network source cooperative primary frequency modulation optimization device.
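
The verification and re-planning loop described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the constraint set (ramp rate and output limits), the `UnitLimits` type, the callable signatures for the two strategy networks and the `max_iters` cap are all assumptions introduced here. The source iterates until an instruction passes verification and does not state an iteration limit.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence, Tuple

Violation = Tuple[int, str, float]  # (unit index, violation type, magnitude in MW)


@dataclass
class UnitLimits:
    """Hypothetical per-unit safety constraints (MW)."""
    p_min: float
    p_max: float
    ramp: float


def check_commands(
    commands: Sequence[float],
    outputs: Sequence[float],
    limits: Sequence[UnitLimits],
) -> List[Violation]:
    """Record the type and degree of every safety constraint violation."""
    violations: List[Violation] = []
    for i, (dp, p, lim) in enumerate(zip(commands, outputs, limits)):
        if abs(dp) > lim.ramp:
            violations.append((i, "ramp", abs(dp) - lim.ramp))
        target = p + dp
        if target > lim.p_max:
            violations.append((i, "p_max", target - lim.p_max))
        elif target < lim.p_min:
            violations.append((i, "p_min", lim.p_min - target))
    return violations


def plan_with_replanning(
    high_level: Callable[[Any, List[Violation]], Any],
    low_level: Callable[[Any, Any], List[float]],
    state: Any,
    outputs: Sequence[float],
    limits: Sequence[UnitLimits],
    max_iters: int = 5,  # assumed safeguard, not taken from the source
) -> Optional[List[float]]:
    """Return verified commands, or None if none pass within max_iters."""
    feedback: List[Violation] = []
    for _ in range(max_iters):
        # The high-level network regenerates the weight vector, using the
        # violations of the previous attempt as the re-planning signal.
        weights = high_level(state, feedback)
        commands = low_level(weights, state)
        feedback = check_commands(commands, outputs, limits)
        if not feedback:
            return commands  # verification passed: safe to issue
    return None
```

Returning `None` when the cap is reached leaves the fallback decision to the caller, for instance keeping the governor's existing setpoint; that choice is also an assumption of this sketch.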
The application also provides a network source cooperative primary frequency modulation optimization system based on reinforcement learning, comprising a learning framework construction module, a global frequency modulation analysis module, a detail output analysis module, a real-time safety verification module and a strategy re-planning module. The learning framework construction module is used for constructing a hierarchical reinforcement learning framework comprising a high-level strategy network and a bottom-level strategy network. The global frequency modulation analysis module is used for outputting a dynamic frequency modulation weight vector based on the high-level strategy network, taking system frequency deviation, frequency modulation demand prediction and unit running state as inputs. The detail output analysis module is used for outputting output adjustment instructions of all frequency modulation units based on the bottom-level strategy network, taking the dynamic frequency modulation weight vector and the real-time frequency deviation as inputs, wherein each unit corresponds to one heterogeneous sub-strategy network. The real-time safety verification module is used for verifying the output adjustment instructions in real time against preset safety constraints, and executing frequency modulation control of all frequency modulation units if the verification is passed. The strategy re-planning module is used for triggering a strategy re-planning mechanism if the verification is not passed, in which the high-level strategy network regenerates the dynamic frequency modulation weight vector and new instructions are iteratively calculated, the new instructions being sent to a speed regulator through the network source cooperative primary frequency modulation optimization device. One or more techni