
CN-121998207-A - Multi-agent power distribution network scheduling method and device based on dynamic information aggregation

CN121998207A

Abstract

The invention provides a multi-agent power distribution network scheduling method and device based on dynamic information aggregation, relating to the technical field of power system operation control. The power distribution network is divided into a plurality of control areas, a scheduling optimization model is established and converted into a Markov game model, and reinforcement learning agents are constructed for distributed optimization. Within the resulting distributed learning architecture, each agent makes decisions and updates its model independently based on local observations, so that global information sharing is avoided and the data privacy of each region is effectively protected. Each optimized reinforcement learning agent dynamically encrypts its local observation variables at the current moment to generate encrypted local information, so that information is encrypted during inter-agent interaction and the risk of privacy leakage is further reduced. Real-time scheduling is completed by dynamically generating aggregation weights, which improves the cooperative efficiency and decision performance of the multiple agents and solves the problems of privacy leakage risk and low cooperative efficiency caused by static aggregation strategies.

Inventors

  • MA XIANG
  • HUANG YINQIANG
  • HUANG YUANJIE
  • LIU HAOTIAN
  • WU XUEFENG
  • CHEN WENJIN
  • SHEN XI
  • HUANG JIANFENG
  • ZHENG RAN
  • SONG XIN
  • LU HANG
  • QI YUCHEN

Assignees

  • 国网浙江省电力有限公司金华供电公司 (State Grid Zhejiang Electric Power Co., Ltd., Jinhua Power Supply Company)
  • 清华四川能源互联网研究院 (Sichuan Energy Internet Research Institute, Tsinghua University)

Dates

Publication Date
2026-05-08
Application Date
2026-04-10

Claims (12)

  1. A multi-agent power distribution network scheduling method based on dynamic information aggregation, characterized by comprising the following steps: dividing a power distribution network into a plurality of control areas, each containing distributed resources, establishing a power distribution network dispatching optimization model with the objective of minimizing the operating cost of the distributed resources, and setting constraint conditions; converting the power distribution network dispatching optimization model into a Markov game model comprising a plurality of agents, wherein each agent corresponds to one control area, takes a local observation variable as input, and produces a local control instruction as output; constructing a corresponding reinforcement learning agent for each agent in the Markov game model; performing distributed reinforcement learning based on a reinforcement learning algorithm to optimize the Markov game model, obtaining the optimized reinforcement learning agent corresponding to each control area; and using each optimized reinforcement learning agent to dynamically encrypt the local observation variable at the current moment according to the global information acquired at the previous moment, generating encrypted local information; dynamically aggregating the encrypted local information of all agents by dynamically generating aggregation weights, obtaining the global information at the current moment for the agents' cooperative perception; generating a local control instruction for the corresponding area based on the encrypted local information; and completing real-time scheduling of the power distribution network.
  2. The multi-agent power distribution network scheduling method based on dynamic information aggregation according to claim 1, wherein the distributed resources comprise micro gas turbines, distributed photovoltaics, and distributed energy storage; the constraint conditions comprise equipment constraints, voltage constraints, and power flow constraints; and the equipment constraints comprise micro gas turbine constraints, distributed photovoltaic constraints, and distributed energy storage constraints.
  3. The multi-agent power distribution network scheduling method based on dynamic information aggregation according to claim 1, wherein the step of converting the power distribution network dispatching optimization model into a Markov game model comprising a plurality of agents comprises: defining each control area as an agent, the number of agents being equal to the number of control areas of the power distribution network; defining a local observation variable of each agent at the current moment, the local observation variable comprising the load of the local area nodes, the micro gas turbine output, the distributed photovoltaic output, the distributed energy storage output, and the energy storage state of charge; defining an action variable of each agent at the current moment, the action variable comprising a local control instruction for controlling the controllable equipment among the local area's distributed resources; and defining a reward function of the agents at the current moment, the reward function comprising an operating cost term and a voltage limit-violation penalty term, wherein all agents share the same reward function to achieve cooperation.
  4. The multi-agent power distribution network scheduling method based on dynamic information aggregation according to claim 1, wherein the step of constructing a corresponding reinforcement learning agent for each agent in the Markov game model comprises, for each agent: constructing a local measurement embedding network for mapping the local observation variable at the current moment into a first implicit feature vector; constructing a global information embedding network for mapping the global information at the previous moment into a second implicit feature vector; constructing an information encryption network for fusing the first implicit feature vector at the current moment with the second implicit feature vector at the previous moment and outputting the local information vector encrypted at the current moment; constructing a policy network for generating the local action variable at the current moment from the encrypted local information vector at the current moment; constructing a value network for estimating the expected cumulative discounted reward from the encrypted local information vector and the local action variable at the current moment; and constructing an entropy coefficient network for adaptively outputting an entropy coefficient from the encrypted local information vector.
  5. The multi-agent power distribution network scheduling method based on dynamic information aggregation according to claim 4, wherein the steps of using each optimized reinforcement learning agent to dynamically encrypt the local observation variable at the current moment according to the global information acquired at the previous moment, generating encrypted local information, dynamically aggregating the encrypted local information by dynamically generating aggregation weights to obtain the global information at the current moment for the agents' cooperative perception, generating a local control instruction for the corresponding area based on the encrypted local information, and completing real-time scheduling of the power distribution network comprise: at the current moment, each optimized agent generating an encrypted local information vector through its local measurement embedding network, global information embedding network, and information encryption network, from the local observation variable acquired in real time at the current moment and the global information at the previous moment; dynamically aggregating the encrypted local information vectors of all reinforcement learning agents by dynamically generating aggregation weights, obtaining the global information at the current moment for each agent's cooperative perception and for encryption and aggregation at the next moment; each agent inputting its encrypted local information vector into the policy network to generate a local control instruction for the corresponding area; and all agents issuing their control instructions to complete the power distribution network dispatch at the current moment.
  6. The multi-agent power distribution network scheduling method based on dynamic information aggregation according to claim 4, wherein before the step of dynamically aggregating the encrypted local information by dynamically generating aggregation weights to obtain the global information at the current moment for the agents' cooperative perception, the method further comprises: setting a global aggregator, and setting clusters each consisting of a plurality of control areas grouped according to geographic proximity or tightness of electrical connection, wherein each cluster is provided with a cluster aggregator that is either an independent edge computing node or served concurrently by the agent of one of the control areas in the cluster; the cluster aggregator receiving the encrypted local information sent by all control areas in the cluster, performing a first-level aggregation, generating a cluster feature vector, and sending it to the global aggregator; and the global aggregator performing a second-level aggregation of all cluster feature vectors to generate the global information.
  7. The multi-agent power distribution network scheduling method based on dynamic information aggregation according to claim 5, wherein the step of dynamically aggregating the encrypted local information vectors of the reinforcement learning agents by dynamically generating aggregation weights to obtain the global information at the current moment for each agent's cooperative perception and for encryption and aggregation at the next moment comprises: calculating the cosine similarity between each agent's encrypted local information vector at the current moment and the global information at the previous moment, and taking this cosine similarity as the dynamic weight of that agent's global contribution; each agent broadcasting its encrypted local information vector at the current moment together with the corresponding cosine similarity; and computing the weighted average of the agents' encrypted local information vectors at the current moment according to the cosine similarities, obtaining the global information at the current moment for each agent's cooperative perception at the next moment, so as to obtain the encrypted local information and compute the dynamic weights at the next moment, thereby completing information encryption and dynamic information aggregation.
  8. The multi-agent power distribution network scheduling method based on dynamic information aggregation according to claim 2, wherein the micro gas turbine constraints comprise an active power output constraint, a reactive power output constraint, and a ramping constraint: the active power output constraint requires that the active power output of the micro gas turbine at the current node of the current area at the current moment lie within that micro gas turbine's upper and lower active power limits; the reactive power output constraint requires that its reactive power output at the current moment lie within its upper and lower reactive power limits; and the ramping constraint requires that the change in its reactive power output between two adjacent moments not exceed its ramp rate upper limit.
  9. The multi-agent power distribution network scheduling method based on dynamic information aggregation according to claim 2, wherein the distributed photovoltaic constraints comprise a capacity constraint, namely that the sum of the squares of the active and reactive power outputs of the distributed photovoltaic at the current node of the current area at the current moment does not exceed the square of that distributed photovoltaic's installed capacity.
  10. The multi-agent power distribution network scheduling method based on dynamic information aggregation according to claim 2, wherein the distributed energy storage constraints comprise an active power output constraint, state-of-charge upper and lower limit constraints, and a state-of-charge transfer constraint: the active power output constraint requires that the active power output of the distributed energy storage at the current node of the current area at the current moment lie within its upper and lower active power limits; the state-of-charge constraints require that its state of charge at the current moment lie within its upper and lower state-of-charge limits; and the state-of-charge transfer constraint is: when the distributed energy storage is charging, its state of charge at the current moment equals its state of charge at the previous moment minus the product of its active power output at the current moment and the charging efficiency; when discharging, its state of charge at the current moment equals its state of charge at the previous moment minus the ratio of its active power output at the current moment to the discharging efficiency.
  11. The multi-agent power distribution network scheduling method based on dynamic information aggregation according to claim 4, wherein in the step of performing distributed reinforcement learning based on a reinforcement learning algorithm, the distributed reinforcement learning is performed using an improved SAC (soft actor-critic) algorithm, specifically comprising: each agent interacting with the power distribution network environment and storing the interaction samples in its own experience pool; randomly sampling training batches from the experience pool and computing the value loss function, policy loss function, and entropy loss function of each agent; updating the parameters of the local measurement embedding network, the global information embedding network, and the information encryption network based on the value loss function and the policy loss function; updating the parameters of the policy network based on the policy loss function; updating the parameters of the value network based on the value loss function; updating the parameters of the entropy coefficient network based on the entropy loss function; and repeating the above steps until the policies of all agents converge.
  12. A multi-agent power distribution network scheduling device based on dynamic information aggregation, characterized by comprising: a model building module for dividing the power distribution network into a plurality of control areas, each containing distributed resources, establishing a power distribution network scheduling optimization model with the objective of minimizing the operating cost of the distributed resources, and setting constraint conditions; a model conversion module for converting the power distribution network dispatching optimization model into a Markov game model comprising a plurality of agents, wherein each agent corresponds to one control area, takes a local observation variable as input, and produces a local control instruction as output; a model reinforcement module for constructing a corresponding reinforcement learning agent for each agent in the Markov game model; a model optimization module for performing distributed reinforcement learning based on a reinforcement learning algorithm to optimize the Markov game model and obtain the optimized reinforcement learning agent corresponding to each control area; and a real-time scheduling module for using the optimized reinforcement learning agents to dynamically encrypt the local observation variable at the current moment according to the global information obtained at the previous moment, generate encrypted local information, dynamically aggregate the encrypted local information by dynamically generating aggregation weights to obtain the global information at the current moment for the agents' cooperative perception, generate the local control instruction for the corresponding area based on the encrypted local information, and complete real-time scheduling of the power distribution network.
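The per-agent pipeline of claims 4 and 5 (local measurement embedding, global information embedding, fusion into an encrypted local vector, then a policy output) can be sketched in NumPy. This is an illustrative sketch only, not the patented implementation: the layer dimensions, random weight initialization, and `tanh` activations are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(w, b, x):
    """One dense layer with a tanh activation (an assumed choice)."""
    return np.tanh(w @ x + b)

# Hypothetical dimensions for a single agent.
obs_dim, glob_dim, hid = 5, 4, 8
W_loc, b_loc = rng.normal(size=(hid, obs_dim)), np.zeros(hid)
W_glb, b_glb = rng.normal(size=(hid, glob_dim)), np.zeros(hid)
W_enc, b_enc = rng.normal(size=(glob_dim, 2 * hid)), np.zeros(glob_dim)
W_pol, b_pol = rng.normal(size=(2, glob_dim)), np.zeros(2)

def act(obs, prev_global):
    """Map a local observation and last moment's global information to
    an encrypted local vector and a local control action."""
    h_loc = layer(W_loc, b_loc, obs)          # local measurement embedding
    h_glb = layer(W_glb, b_glb, prev_global)  # global information embedding
    # Information "encryption" network: fuse the two implicit features.
    z = layer(W_enc, b_enc, np.concatenate([h_loc, h_glb]))
    action = layer(W_pol, b_pol, z)           # policy network output
    return z, action
```

Only the encrypted vector `z` would ever leave the agent, which is how the scheme keeps raw local observations private.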
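The dynamic aggregation of claim 7 amounts to a cosine-similarity-weighted average of the agents' encrypted local vectors against the previous global vector. A minimal sketch, assuming negative similarities are clipped to zero and the weights are normalized to a convex combination (the claim does not specify either detail):

```python
import numpy as np

def dynamic_aggregate(encrypted_locals, prev_global, eps=1e-12):
    """Fuse encrypted local vectors into the current global information.

    Each agent's weight is the cosine similarity between its encrypted
    local vector and the previous moment's global information, so agents
    aligned with the prevailing system state contribute more."""
    locals_ = np.stack([np.asarray(v, dtype=float) for v in encrypted_locals])
    g = np.asarray(prev_global, dtype=float)
    sims = locals_ @ g / (np.linalg.norm(locals_, axis=1) * np.linalg.norm(g) + eps)
    w = np.clip(sims, 0.0, None)   # assumption: drop negative similarity
    w = w / (w.sum() + eps)        # normalize to a convex combination
    return w @ locals_
```

In the scheme of claim 7 the similarities are broadcast alongside the vectors, so every agent can run this same computation locally and arrive at the same global vector.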
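The device constraints of claims 9 and 10 reduce to simple arithmetic checks and updates. A sketch under an assumed sign convention (active output p > 0 when discharging, p < 0 when charging, expressed in per-unit energy per time step; the efficiency defaults are illustrative, not from the patent):

```python
def pv_capacity_ok(p, q, s_max):
    """Claim 9: active² + reactive² must not exceed installed capacity²."""
    return p * p + q * q <= s_max * s_max

def next_soc(soc_prev, p, eta_ch=0.95, eta_dis=0.95):
    """Claim 10 state-of-charge transfer: charging multiplies the output
    by the charging efficiency; discharging divides by the discharging
    efficiency, so losses shrink the usable energy in both directions."""
    if p < 0:                      # charging: negative output raises the SoC
        return soc_prev - p * eta_ch
    return soc_prev - p / eta_dis  # discharging (or idle, p == 0)
```

For example, discharging 0.1 p.u. at 95 % efficiency drains more than 0.1 p.u. of stored charge, while charging with 0.1 p.u. stores slightly less than 0.1 p.u.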

Description

Multi-agent power distribution network scheduling method and device based on dynamic information aggregation

Technical Field

The invention relates to the technical field of power system operation control, in particular to a multi-agent power distribution network scheduling method and device based on dynamic information aggregation.

Background

The traditional centralized scheduling mode faces problems such as single points of failure, high communication pressure, slow response, and poor scalability. To improve the flexibility and scalability of regulation, a distributed control architecture is used to realize efficient collaborative management of wide-area distributed resources and to meet the requirements of safe, economic, and stable operation of the power distribution network. However, distributed generation resources are vast in number and varied in type; if an optimization method based on a physical model is adopted, an accurate model of every device must be maintained and globally coordinated, so modeling and maintenance costs are high and model accuracy is strongly affected by parameter uncertainty. For this reason, reinforcement learning methods, which do not require an accurate model, have emerged as a viable path. By deploying a local agent in each area and using a multi-agent reinforcement learning algorithm for autonomous decision-making and collaborative optimization, the centralized modeling burden can be reduced, the system's dynamic changes can be accommodated, and a new data-driven paradigm is provided for the efficient regulation of large-scale distributed resources. At present, however, distributed regulation methods for power distribution networks based on multi-agent reinforcement learning still have obvious shortcomings in the design of the cooperation mechanism.
Existing communication architectures rely on explicit sharing of observation or action information among agents, and some methods even require global state visibility or synchronization of neural network parameters. Such information sharing mechanisms easily expose the sensitive operational data of the areas governed by each party, such as load fluctuations, generation output, and energy storage states; they carry a risk of privacy disclosure and can hardly meet the data security and privacy protection requirements of the multiple parties in a power system. On the other hand, mainstream multi-agent reinforcement learning methods based on information aggregation generally fuse the agents' information with fixed weights or a simple average, and lack dynamic awareness of the system's operating situation. Such a mechanism cannot distinguish critical information from redundant information, so important regulation signals are diluted during aggregation while invalid or duplicate information is over-weighted. In scenarios where the operating state of the power distribution network changes rapidly, a static aggregation strategy can hardly adapt to dynamic requirements, constraining the cooperative efficiency and decision performance of the multi-agent system.
Disclosure of Invention

To solve the technical problems in the prior art of privacy disclosure risk and the low cooperative efficiency caused by static aggregation strategies, the invention provides a multi-agent power distribution network scheduling method and device based on dynamic information aggregation. By constructing a distributed learning architecture, each agent makes decisions and updates its model independently based on local observations, so global information sharing is avoided; meanwhile, the local observation variables are dynamically encrypted during information interaction, and aggregation weights are dynamically generated from the system's operating state for information aggregation, thereby achieving low-cost, high-information-utilization implicit coordination while guaranteeing the data privacy of each region. To achieve the above purpose, the present invention provides the following technical solutions: The invention provides a multi-agent power distribution network scheduling method based on dynamic information aggregation, which comprises the steps of dividing a power distribution network into a plurality of control areas, each containing distributed resources, establishing a power distribution network scheduling optimization model with the objective of minimizing the operating cost of the distributed resources, setting constraint conditions, and converting the power distribution network scheduling optimization model into a Markov game model comprising a plurality of agents, wherein each agent corresponds to one control area and each agent takes a local observation variable as i