
CN-122001026-A - Distributed energy intelligent agent regulation and control method and system based on deep reinforcement learning

CN122001026A

Abstract

The invention discloses a distributed energy agent regulation and control method and system based on deep reinforcement learning, comprising the following steps: collecting distributed energy operation data and network constraint data in a micro-grid or park power distribution network, and performing preprocessing; constructing a reinforcement learning environment state representation vector, and configuring a distributed energy agent set; acquiring a disturbance antigen fingerprint entry set to generate an immune affinity score sequence; generating a normal regulation and control action instruction set and issuing it for execution, or entering a micro-disturbance sampling flow; generating a feasible action domain and a micro-disturbance action candidate set, and performing exploration control contract verification; selecting a target micro-disturbance action and issuing it for execution, or generating a corrected micro-disturbance action and issuing it for execution; and updating the deep reinforcement learning policy network. The invention adopts deep reinforcement learning together with a disturbance immune mechanism to realize cooperative regulation and control of distributed energy agents, and offers high safety, strong adaptability and good stability.

Inventors

  • LIU XIAOHONG
  • CAO CHENXI
  • DU QINGSONG

Assignees

  • 北京集联软件科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-04-08

Claims (10)

  1. A distributed energy agent regulation and control method based on deep reinforcement learning, characterized by comprising the following steps: collecting distributed energy operation data and network constraint data in a micro-grid or park power distribution network, and performing preprocessing to generate a standardized operation data set; constructing a reinforcement learning environment state representation vector based on the standardized operation data set, and forming a state sequence; mapping the state sequence to a distributed energy agent set, and defining an action switching frequency constraint and an action amplitude boundary; calculating a disturbance fingerprint vector and a disturbance grade identifier based on the state sequence, and acquiring a disturbance antigen fingerprint entry set to generate an immune affinity score sequence; generating an exploration trigger judgment result based on the immune affinity score sequence and the disturbance grade identifier, generating a normal regulation and control action instruction set through a deep reinforcement learning policy network and issuing it for execution when the exploration trigger judgment result is that exploration is forbidden, and entering a micro-disturbance sampling flow when the exploration trigger judgment result is that exploration is triggered; under the condition of triggered exploration, generating a feasible action domain and a micro-disturbance action candidate set, and performing exploration control contract verification to generate a contract satisfaction identification value and a violation positioning vector; when the contract satisfaction identification value indicates satisfaction, selecting a target micro-disturbance action and issuing it for execution, and when the contract satisfaction identification value indicates non-satisfaction, performing minimal correction projection on the micro-disturbance action candidate set according to the violation positioning vector, generating a corrected micro-disturbance action and issuing it for execution, and generating a disturbance response sample set; and updating the deep reinforcement learning policy network based on the disturbance response sample set to generate a distributed energy agent regulation and control result.
  2. The distributed energy agent regulation and control method based on deep reinforcement learning according to claim 1, wherein the distributed energy operation data comprises photovoltaic output data, wind power output data, energy storage charge and discharge power data, energy storage state of charge data, charging pile load data, controllable load power data and reactive compensation device operation data; the network constraint data comprises node voltage data, line current-carrying data, tie-line power flow data, frequency deviation data and communication delay data; and the preprocessing comprises time alignment, outlier rejection, missing-value interpolation and normalization.
  3. The distributed energy agent regulation and control method based on deep reinforcement learning according to claim 1, wherein forming the state sequence specifically comprises: based on the standardized operation data set, splicing photovoltaic output data, wind power output data, energy storage charge and discharge power data, energy storage state of charge data, charging pile load data, controllable load power data and reactive compensation device operation data according to a unified time index to obtain a device-side state component; extracting node voltage data, line current-carrying data, tie-line power flow data and frequency deviation data from the standardized operation data set, performing node-level and line-level aggregation according to spatial position identifiers and measuring point identifiers, calculating a voltage boundary approximation value, a current-carrying boundary approximation value, a tie-line power flow boundary approximation value and a frequency deviation representative value, and generating a network-side constraint approximation component; calculating fluctuation degree components and drift degree components of the photovoltaic output data, wind power output data, charging pile load data, controllable load power data and communication delay data within a control period, and aligning them to generate an uncertainty disturbance component; performing a dimension consistency check on the device-side state component, the network-side constraint approximation component and the uncertainty disturbance component, and splicing them to form a reinforcement learning environment state representation vector; and writing the reinforcement learning environment state representation vectors of successive moments into a state buffer queue in time order according to the control period to generate the state sequence.
  4. The distributed energy agent regulation and control method based on deep reinforcement learning according to claim 1, wherein defining the action switching frequency constraint and the action amplitude boundary specifically comprises: acquiring a distributed energy unit identifier corresponding to each reinforcement learning environment state representation vector in the state sequence, and establishing the distributed energy agent set based on the distributed energy unit identifiers; performing agent mapping on the state sequence, and splitting the state sequence into a plurality of agent state subsequence sets according to the distributed energy unit identifiers; configuring an action channel set for each distributed energy agent, writing channel identifiers, channel dimension values and channel action object identifiers into the action channel set, and generating an action channel configuration result; reading the action amplitude boundary of each channel according to the action channel configuration result to form an action amplitude boundary set; and generating the action switching frequency constraint according to the action channel configuration result.
  5. The distributed energy agent regulation and control method based on deep reinforcement learning according to claim 1, wherein generating the immune affinity score sequence specifically comprises: obtaining the reinforcement learning environment state representation vectors corresponding to adjacent control periods in the state sequence, extracting the components corresponding to photovoltaic output data, wind power output data, charging pile load data, controllable load power data and communication delay data, and generating disturbance calculation input sub-vectors; calculating a load mutation characteristic value and a renewable output fluctuation characteristic value of the disturbance calculation input sub-vectors within a single control period; extracting the network-side constraint approximation component from the state sequence, respectively generating a node voltage boundary approximation characteristic value and a line congestion approximation characteristic value, and calculating a communication delay drift characteristic value from the components corresponding to the communication delay data; sequentially splicing the load mutation characteristic value, the renewable output fluctuation characteristic value, the node voltage boundary approximation characteristic value, the line congestion approximation characteristic value and the communication delay drift characteristic value to generate the disturbance fingerprint vector, and performing grade mapping on the disturbance fingerprint vector to generate the disturbance grade identifier; and obtaining the disturbance antigen fingerprint entry set from a disturbance immune library, extracting the disturbance antigen fingerprint vector of each disturbance antigen fingerprint entry, and performing similarity calculation between the disturbance fingerprint vector and each disturbance antigen fingerprint vector to generate the immune affinity score sequence.
  6. The distributed energy agent regulation and control method based on deep reinforcement learning according to claim 1, wherein entering the micro-disturbance sampling flow specifically comprises: obtaining the immune affinity score sequence and the disturbance grade identifier, calculating a maximum immune affinity score and an average immune affinity score based on the immune affinity score sequence, and generating an affinity aggregation value set; comparing the affinity aggregation value set with an affinity threshold value to generate an unknown disturbance judgment value; performing grade matching between the disturbance grade identifier and a grade threshold value to generate a high-risk disturbance judgment value, and combining the unknown disturbance judgment value and the high-risk disturbance judgment value to generate the exploration trigger judgment result; when the exploration trigger judgment result is that exploration is forbidden, extracting the reinforcement learning environment state representation vector corresponding to the current control period from the state sequence, mapping it to the distributed energy agent set according to the distributed energy unit identifiers, and generating an agent input vector set; inputting the agent input vector set into the deep reinforcement learning policy network, generating a normal regulation and control action instruction set, and writing it into a regulation and control instruction queue; performing validity screening on the normal regulation and control action instruction set in the regulation and control instruction queue according to the action amplitude boundary set and the action switching frequency constraint, generating a validated normal regulation and control action instruction set, and issuing it for execution; and when the exploration trigger judgment result is that exploration is triggered, entering the micro-disturbance sampling flow.
  7. The distributed energy agent regulation and control method based on deep reinforcement learning according to claim 1, wherein generating the contract satisfaction identification value and the violation positioning vector specifically comprises: when the exploration trigger judgment result is that exploration is triggered, reading the operation constraints within the control period, and extracting the action channel set, the action amplitude boundary set and the action switching frequency constraint; determining the channel value range of each channel according to the action amplitude boundary set, and splicing according to the channel identifiers to generate an action amplitude feasible domain; combining the action switching frequency constraint and the action amplitude feasible domain to generate the feasible action domain; extracting the network-side constraint approximation component corresponding to the current control period and the energy storage state of charge data from the state sequence, and writing them into a contract judgment input set; generating a micro-disturbance action candidate set for each distributed energy agent within the feasible action domain; calculating an action amplitude cost value, an action switching cost value and an exploration duration cost value for each candidate action in the micro-disturbance action candidate set, combining them into an exploration cost value, comparing the exploration cost value with an exploration cost upper bound, screening out the candidate actions that do not satisfy the exploration cost upper bound, and generating a cost-constrained micro-disturbance action candidate set; and performing exploration control contract verification on the cost-constrained micro-disturbance action candidate set according to the contract judgment input set, respectively generating a voltage out-of-limit judgment value, a line thermal stability judgment value, a power balance judgment value, an energy storage state of charge boundary judgment value, an action switching frequency judgment value and an exploration duration judgment value, and thereby generating the contract satisfaction identification value and the violation positioning vector.
  8. The distributed energy agent regulation and control method based on deep reinforcement learning according to claim 1, wherein generating the disturbance response sample set specifically comprises: acquiring the contract satisfaction identification value and the violation positioning vector, extracting candidate action vector sets from the cost-constrained micro-disturbance action candidate set, and generating a to-be-executed micro-disturbance action set; when the contract satisfaction identification value indicates satisfaction, screening the to-be-executed micro-disturbance action set, outputting a target micro-disturbance action vector, and writing it into the regulation and control instruction queue to generate a micro-disturbance regulation and control action instruction; when the contract satisfaction identification value indicates non-satisfaction, locating the constraint violation position according to the violation positioning vector, determining the constraint violation type, reading the action amplitude boundary set and the action switching frequency constraint, performing minimal correction projection on the to-be-executed micro-disturbance action set, generating a corrected micro-disturbance action vector, and writing it into the regulation and control instruction queue to generate a corrected micro-disturbance regulation and control action instruction; issuing the micro-disturbance regulation and control action instruction or the corrected micro-disturbance regulation and control action instruction to the corresponding distributed energy agent set for execution, collecting distributed energy operation data and network constraint data during execution, and generating an execution feedback data set; and preprocessing the execution feedback data set, constructing an execution feedback state representation vector, and writing the disturbance response sample set in association with the micro-disturbance regulation and control action instruction or the corrected micro-disturbance regulation and control action instruction.
  9. The distributed energy agent regulation and control method based on deep reinforcement learning according to claim 1, wherein generating the distributed energy agent regulation and control result specifically comprises: extracting the reinforcement learning environment state representation vector, the micro-disturbance regulation and control action instruction or the corrected micro-disturbance regulation and control action instruction, and the execution feedback state representation vector from the disturbance response sample set to generate an update sample set; inputting the update sample set into the deep reinforcement learning policy network, calculating the network update amount according to the update sample set, and performing a parameter update to generate an updated deep reinforcement learning policy network; and invoking the updated deep reinforcement learning policy network to perform inference, outputting the regulation and control action instruction set corresponding to each distributed energy agent, and issuing it for execution to generate the distributed energy agent regulation and control result.
  10. A distributed energy agent regulation and control system based on deep reinforcement learning, which performs the distributed energy agent regulation and control method based on deep reinforcement learning as set forth in any one of claims 1 to 9, comprising: a data acquisition module, used for collecting distributed energy operation data and network constraint data in a micro-grid or park power distribution network and preprocessing the collected data; a state representation construction module, used for constructing a reinforcement learning environment state representation vector and forming a state sequence; an agent mapping module, used for mapping the state sequence to a distributed energy agent set and defining an action switching frequency constraint and an action amplitude boundary; a disturbance immunity evaluation module, used for calculating a disturbance fingerprint vector and a disturbance grade identifier based on the state sequence, acquiring a disturbance antigen fingerprint entry set and generating an immune affinity score sequence; an exploration trigger module, used for generating an exploration trigger judgment result based on the immune affinity score sequence and the disturbance grade identifier, generating a normal regulation and control action instruction set through the deep reinforcement learning policy network and issuing it for execution when the exploration trigger judgment result is that exploration is forbidden, and entering a micro-disturbance sampling flow when the exploration trigger judgment result is that exploration is triggered; a contract verification module, used for generating a feasible action domain and a micro-disturbance action candidate set under the triggered-exploration condition, performing exploration control contract verification, and generating a contract satisfaction identification value and a violation positioning vector; a micro-disturbance execution module, used for selecting a target micro-disturbance action and issuing it for execution when the contract satisfaction identification value indicates satisfaction, performing minimal correction projection on the micro-disturbance action candidate set according to the violation positioning vector when the contract satisfaction identification value indicates non-satisfaction, generating a corrected micro-disturbance action and issuing it for execution, and generating a disturbance response sample set; and a policy update module, used for updating the deep reinforcement learning policy network based on the disturbance response sample set to generate a distributed energy agent regulation and control result.
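The four preprocessing operations named in claim 2 (time alignment, outlier rejection, missing-value interpolation, normalization) could be sketched as below. This is an illustrative sketch, not the patent's implementation: the sample format, the plausibility range `[lo, hi]`, and filling gaps with the mean of the nearest known neighbours are all assumptions.

```python
# Hypothetical preprocessing chain for one operation-data channel.
# series: list of (timestamp, value) samples; lo/hi: assumed plausible range.

def preprocess(series, lo, hi):
    # 1. time alignment: sort samples by timestamp
    aligned = sorted(series, key=lambda s: s[0])
    # 2. outlier rejection: values outside [lo, hi] become missing (None)
    vals = [v if lo <= v <= hi else None for _, v in aligned]
    # 3. missing-value interpolation: fill each gap with the mean of the
    #    nearest known neighbours (one-sided fill at the edges)
    for i, v in enumerate(vals):
        if v is None:
            left = next((vals[j] for j in range(i - 1, -1, -1)
                         if vals[j] is not None), None)
            right = next((vals[j] for j in range(i + 1, len(vals))
                          if vals[j] is not None), None)
            vals[i] = (left if right is None else
                       right if left is None else (left + right) / 2)
    # 4. min-max normalization to [0, 1]
    m, span = min(vals), (max(vals) - min(vals)) or 1.0
    return [(v - m) / span for v in vals]

# unsorted input with one outlier (99.0) that gets rejected and interpolated
print(preprocess([(2, 0.4), (1, 0.2), (3, 99.0), (4, 0.8)], 0.0, 1.0))
```

A production version would interpolate with time-weighted values and handle all-missing channels, but the step order matches the claim.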
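The state assembly of claim 3 (splice device-side, network-side and uncertainty components, check dimensions, push into a state buffer queue) can be illustrated as follows; the component dimensions and queue length are invented for the example.

```python
from collections import deque

# Assumed component sizes: 7 device-side signals, 4 network-side
# constraint approximations, 5 uncertainty/disturbance measures.
DEVICE_DIM, NETWORK_DIM, UNCERT_DIM = 7, 4, 5

def build_state(device, network, uncertainty):
    # dimension consistency check before splicing, as in the claim
    assert (len(device), len(network), len(uncertainty)) == \
        (DEVICE_DIM, NETWORK_DIM, UNCERT_DIM), "component dimension mismatch"
    return device + network + uncertainty        # spliced state vector

# bounded state buffer queue, e.g. one day of 15-minute control periods
state_buffer = deque(maxlen=96)
state_buffer.append(build_state([0.1] * 7, [0.9] * 4, [0.05] * 5))
print(len(state_buffer[0]))                      # 7 + 4 + 5 = 16
```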
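The action-channel configuration of claim 4 could be represented as plain records per agent: channel identifier, dimension, controlled object, amplitude boundary, and a switching-frequency limit. All identifiers and numeric values here are hypothetical.

```python
# Hypothetical per-agent action-channel configuration (claim 4).
# The channel names "P"/"Q", object paths and limits are made up.

def configure_agent(unit_id):
    channels = [
        {"channel": "P", "dim": 1, "object": f"{unit_id}/active_power",
         "amplitude": (-1.0, 1.0)},   # action amplitude boundary, p.u.
        {"channel": "Q", "dim": 1, "object": f"{unit_id}/reactive_power",
         "amplitude": (-0.3, 0.3)},
    ]
    return {"unit": unit_id,
            "channels": channels,
            "max_switches_per_hour": 6}  # action switching frequency constraint

cfg = configure_agent("ess-01")
print([c["amplitude"] for c in cfg["channels"]])
```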
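Claim 5 leaves the similarity measure for the immune affinity scores unspecified; a common choice, used here purely as an assumption, is cosine similarity between the disturbance fingerprint and each stored antigen fingerprint. The fingerprints and library are fabricated for illustration.

```python
import math

# Hypothetical immune-affinity scoring: cosine similarity of the
# disturbance fingerprint against every antigen in the immune library.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def affinity_scores(fingerprint, antigen_library):
    return [cosine(fingerprint, antigen) for antigen in antigen_library]

# fingerprint components (claim 5 order): load mutation, renewable output
# fluctuation, voltage boundary approximation, line congestion, delay drift
fp = [0.8, 0.6, 0.1, 0.2, 0.05]
library = [[0.9, 0.5, 0.1, 0.1, 0.0],   # known load-surge antigen
           [0.0, 0.1, 0.9, 0.8, 0.1]]   # known congestion antigen
scores = affinity_scores(fp, library)
print(max(scores) > 0.9)                 # fingerprint matches a known antigen
```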
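The exploration-trigger decision of claim 6 reduces to threshold logic: exploration is triggered when the disturbance looks unknown (both affinity aggregates below an affinity threshold) or high-risk (grade at or above a grade threshold). The thresholds and the AND/OR combination chosen here are illustrative assumptions.

```python
# Hypothetical exploration-trigger decision (claim 6). Thresholds are made up.
AFFINITY_THRESHOLD = 0.7   # below this, no stored antigen is a good match
GRADE_THRESHOLD = 3        # grades >= 3 are treated as high-risk

def exploration_trigger(affinity_scores, disturbance_grade):
    max_aff = max(affinity_scores)
    avg_aff = sum(affinity_scores) / len(affinity_scores)
    unknown = max_aff < AFFINITY_THRESHOLD and avg_aff < AFFINITY_THRESHOLD
    high_risk = disturbance_grade >= GRADE_THRESHOLD
    return unknown or high_risk   # True -> enter micro-disturbance sampling

print(exploration_trigger([0.95, 0.2], 1))  # known, low-risk -> no exploration
print(exploration_trigger([0.30, 0.2], 1))  # unknown disturbance -> explore
```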
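The exploration-cost screening inside claim 7 (amplitude cost plus switching cost plus duration cost, compared against an upper bound) can be sketched with a weighted sum; the weights, the cost bound, and the candidate encoding are assumptions, not taken from the patent.

```python
# Hypothetical exploration-cost screening of micro-disturbance candidates.
# Each candidate is (amplitude, switch count, exploration duration).

def screen_candidates(candidates, cost_bound, w=(1.0, 0.5, 0.2)):
    kept = []
    for amp, switches, duration in candidates:
        cost = w[0] * abs(amp) + w[1] * switches + w[2] * duration
        if cost <= cost_bound:        # drop candidates over the upper bound
            kept.append((amp, switches, duration))
    return kept

cands = [(0.10, 1, 2),   # cost 1.0 -> kept
         (0.80, 4, 5),   # cost 3.8 -> screened out
         (0.05, 0, 1)]   # cost 0.25 -> kept
print(screen_candidates(cands, cost_bound=1.1))
```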
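The minimal correction projection of claim 8 can be read, for box-shaped amplitude boundaries, as clipping each channel flagged by the violation positioning vector to its nearest feasible value; the bounds and mask encoding here are illustrative assumptions (a full version would also project against the switching-frequency constraint).

```python
# Hypothetical minimal-correction projection onto per-channel amplitude bounds.

def minimal_correction(action, bounds, violation_mask):
    """action: per-channel values; bounds: (lo, hi) per channel;
    violation_mask: 1 where contract verification flagged the channel."""
    corrected = []
    for value, (lo, hi), violated in zip(action, bounds, violation_mask):
        # clip only flagged channels to the closest boundary (minimal change)
        corrected.append(min(max(value, lo), hi) if violated else value)
    return corrected

bounds = [(-0.5, 0.5), (0.0, 1.0), (-0.2, 0.2)]
print(minimal_correction([0.9, 0.4, -0.6], bounds, [1, 0, 1]))
```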
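The policy update of claim 9 is stated abstractly ("calculate the network update amount, perform a parameter update"). As a stand-in for the deep network, the sketch below does a one-step REINFORCE-style gradient ascent on a linear Gaussian policy mean; the samples, learning rate, and linear parameterization are all assumptions for illustration only.

```python
# Hypothetical policy update on disturbance response samples.
# Policy: action ~ N(w . state, 1); weights are nudged so that actions
# followed by positive reward become more likely in that state.

def update_policy(weights, samples, lr=0.1):
    """samples: list of (state, action, reward) tuples."""
    for state, action, reward in samples:
        mean = sum(w * s for w, s in zip(weights, state))
        # grad of log N(action | mean, 1) w.r.t. mean is (action - mean)
        for i, s in enumerate(state):
            weights[i] += lr * reward * (action - mean) * s
    return weights

w = update_policy([0.0, 0.0], [([1.0, 0.5], 0.4, 1.0)])
print(w)   # weights move toward producing action 0.4 in this state
```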

Description

Distributed energy intelligent agent regulation and control method and system based on deep reinforcement learning

Technical Field

The invention relates to the field of distributed energy collaborative regulation and control, and in particular to a distributed energy agent regulation and control method and system based on deep reinforcement learning.

Background

With the large-scale access of photovoltaic power generation units, wind power generation units, energy storage devices, charging piles, controllable loads and reactive compensation devices in micro-grids and park power distribution networks, the distributed energy operating state exhibits strong volatility, complex coupling relations and changeable constraint conditions. In the prior art, distributed energy units are generally scheduled through a rule base, model predictive control or conventional reinforcement learning, so as to realize active power distribution, reactive support adjustment and energy storage charge and discharge control. However, the prior art mostly focuses on static scheduling or single-objective optimization: it is difficult to simultaneously characterize the joint change process of distributed energy operation data and network constraint data; an effective recognition mechanism for unknown and high-risk disturbances is lacking; the boundaries of the voltage out-of-limit, line thermal stability, power balance and energy storage state of charge constraints cannot be kept synchronized during exploration; regulation and control actions may exceed their limits, the risk of online exploration is high, and policy update stability is easily insufficient. It is therefore difficult to meet the safe cooperative regulation and control requirements of distributed energy agents in complex operation scenarios.
Disclosure of Invention

The invention aims to provide a distributed energy agent regulation and control method based on deep reinforcement learning, which adopts deep reinforcement learning together with a disturbance immune mechanism to realize cooperative regulation and control of distributed energy agents, and offers high safety, strong adaptability and good stability. According to an embodiment of the invention, the distributed energy agent regulation and control method based on deep reinforcement learning comprises the following steps: collecting distributed energy operation data and network constraint data in a micro-grid or park power distribution network, and performing preprocessing to generate a standardized operation data set; constructing a reinforcement learning environment state representation vector based on the standardized operation data set, and forming a state sequence; mapping the state sequence to a distributed energy agent set, and defining an action switching frequency constraint and an action amplitude boundary; calculating a disturbance fingerprint vector and a disturbance grade identifier based on the state sequence, and acquiring a disturbance antigen fingerprint entry set to generate an immune affinity score sequence; generating an exploration trigger judgment result based on the immune affinity score sequence and the disturbance grade identifier, generating a normal regulation and control action instruction set through a deep reinforcement learning policy network and issuing it for execution when the exploration trigger judgment result is that exploration is forbidden, and entering a micro-disturbance sampling flow when the exploration trigger judgment result is that exploration is triggered; under the condition of triggered exploration, generating a feasible action domain and a micro-disturbance action candidate set, and performing exploration control contract verification to generate a contract satisfaction identification value and a violation positioning vector; when the contract satisfaction identification value indicates satisfaction, selecting a target micro-disturbance action and issuing it for execution, and when the contract satisfaction identification value indicates non-satisfaction, performing minimal correction projection on the micro-disturbance action candidate set according to the violation positioning vector, generating a corrected micro-disturbance action and issuing it for execution, and generating a disturbance response sample set; and updating the deep reinforcement learning policy network based on the disturbance response sample set to generate a distributed energy agent regulation and control result.

Optionally, the distributed energy operation data includes photovoltaic output data, wind power output data, energy storage charge and discharge power data, energy storage state of charge data, charging pile load data, controllable load power data and reactive compensation device operation data, and the network constraint data includes node voltage data, line current-carrying data, tie-line power flow data, frequency deviation data and communication delay data, a