CN-115411728-B - Multi-micro-grid system coordination control method integrating Q learning and potential game
Abstract
A multi-micro-grid system coordination control method integrating Q learning and potential game belongs to the technical field of micro-grid coordination control. It solves the problem of achieving multi-micro-grid coordination control with micro-grid income maximization and inter-micro-grid output balance as the targets, and builds a coordination control method integrating reinforcement learning and potential game on the basis of a multi-micro-grid distributed coordination architecture and a potential game optimization strategy. The method makes full use of the distributed characteristic of the potential game: each micro-grid is regarded as an intelligent agent under a distributed coordination control structure; a potential game model is established with the aim of maximally improving and balancing the economy of each single micro-grid system and of the overall multi-micro-grid system; the Q learning algorithm of reinforcement learning is then taken as the carrier, and the potential game and the reinforcement learning algorithm are fused by parameter transmission. This yields an optimal Nash equilibrium solution, improves optimization performance, improves the economy of the multi-micro-grid system, and balances the benefits of the overall system and of the individuals within it.
Inventors
- LIU WEI
- ZHANG SICONG
Assignees
- Nanjing University of Science and Technology (南京理工大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20220926
Claims (9)
- 1. A multi-micro-grid system coordination control method integrating Q learning and potential game, characterized by comprising the following steps:
  S1, under a multi-micro-grid distributed game architecture, constructing a target optimization decision model that maximizes the output benefit of each micro-grid and balances the output among micro-grids, and setting a power balance constraint condition and a micro-grid output constraint condition;
  S2, carrying out linear weighting processing on the target optimization decisions to obtain a local payment function, then designing a global potential function and a local utility function that satisfy the potential equation, establishing a potential game strategy set, and constructing a potential game model with a distributed characteristic;
  S3, fusing potential game control and the Q learning algorithm by parameter transmission, and solving the potential game model to obtain and analyze the game optimization result, which specifically comprises the following steps:
  (a) initializing the game parameters and Q values, discretizing the potential game strategy set, and transmitting the strategy set to the Q learning state set;
  (b) designing a power variation value ΔP that accounts for the rated capacity of the micro-grid and avoids system instability caused by excessive power fluctuation, and composing the Q learning action set from ΔP;
  (c) collecting the information of the neighbor micro-grids, calculating the utility function of each micro-grid, transmitting the utility function value as the instant reward in the Q learning algorithm, and updating the Q value in the Q learning algorithm;
  (d) selecting the optimal action by a greedy strategy, updating the state value according to the selected action, and transmitting the state value to the game optimization strategy;
  (e) judging whether Nash equilibrium is reached; if so, continuing to the next step, otherwise returning to step (c);
  (f) judging whether the convergence condition is met; if so, obtaining the final micro-grid output plan, otherwise returning to step (c).
- 2. The multi-micro-grid system coordination control method integrating Q learning and potential game according to claim 1, wherein the optimization decision model in step S1 is constructed as follows:
  1) the objective maximizing the net benefit of the micro-grid output is:
  F_{1,i} = c·P_i − m_i·P_i²  (1)
  wherein F_{1,i} is the net return in the output return of micro-grid i, P_i is the output of micro-grid i in the multi-micro-grid system, c is the unit electricity price, and m_i is the output cost coefficient of micro-grid i;
  2) the power difference between each micro-grid and its neighboring micro-grids is minimized to balance the micro-grid outputs, with the objective function:
  F_{2,i} = Σ_{j∈I_i} (P_i − P_j)²  (2)
  wherein F_{2,i} is the power difference between micro-grid i and its adjacent micro-grids j, I_i is the adjacency set of micro-grid i, and P_j is the output of the adjacent micro-grid j of micro-grid i.
- 3. The multi-micro-grid system coordination control method integrating Q learning and potential game according to claim 2, wherein the power balance constraint condition and the micro-grid output constraint condition in step S1 are specifically:
  Σ_{i∈N} P_i = P_load, 0 ≤ P_i ≤ P_{i,max}, i = 1, 2, …, N_MG  (3)
  wherein P_load is the total load of the multi-micro-grid system, N is the set of potential game participants, P_{i,max} is the rated capacity of micro-grid i, and N_MG is the number of micro-grids in the multi-micro-grid system.
- 4. The multi-micro-grid system coordination control method integrating Q learning and potential game according to claim 3, wherein the linear weighting processing in step S2 is as follows:
  F_i(P_i, P_{−i}) = ω_1·F_{1,i} − ω_2·F_{2,i}  (4)
  wherein F_i(P_i, P_{−i}) is the local payment function of micro-grid i, P_{−i} is the output of the micro-grids other than micro-grid i in the multi-micro-grid system, and ω_1 and ω_2 are respectively the weighting coefficients of the two objective functions.
- 5. The multi-micro-grid system coordination control method integrating Q learning and potential game according to claim 4, wherein the global potential function Φ(P_i, P_{−i}) in step S2 is:
  Φ(P_i, P_{−i}) = Σ_{i∈N} F_i(P_i, P_{−i})  (5)
  and the local utility function is:
  U_i(P_i, P_{−i}) = F_i(P_i, P_{−i}) + Σ_{j∈I_i} F_j(P_i, P_{−i})  (6)
  wherein U_i(P_i, P_{−i}) is the local utility function and F_j(P_i, P_{−i}) is the local payment function of the neighbor micro-grid j of micro-grid i.
- 6. The multi-micro-grid system coordination control method integrating Q learning and potential game according to claim 5, wherein the potential game strategy set in step S2 is designed as follows:
  (1) the potential game strategy set Ω_i is designed according to the micro-grid output constraint as:
  Ω_i = {P_i | 0 ≤ P_i ≤ P_{i,max}}  (7)
  (2) the potential game strategy obtained by solving must lie within the capacity limit of the micro-grid, and must also satisfy the power balance constraint of the multi-micro-grid system.
- 7. The multi-micro-grid system coordination control method integrating Q learning and potential game according to claim 6, wherein the discrete interval length ΔP of the potential game strategy set in step (a) is:
  ΔP = (P_max − P_min) / M  (8)
  wherein M is the number of divided intervals, and P_max and P_min are determined by the upper and lower limits of the potential game strategy set.
- 8. The multi-micro-grid system coordination control method integrating Q learning and potential game according to claim 7, wherein the Q value in the Q learning algorithm in step (c) is updated by the formula:
  Q^{k+1}(P_i^k, ΔP_i^k) = Q^k(P_i^k, ΔP_i^k) + α·[U_i^k + γ·Q^k(P_i^{k+1}, ΔP_i^{k,max}) − Q^k(P_i^k, ΔP_i^k)]  (9)
  wherein ΔP_i ∈ A is the action value at each step of Q learning, α ∈ (0, 1) is the learning rate of the Q learning algorithm, and γ ∈ (0, 1) is the discount parameter; Q^{k+1} is the Q value at the (k+1)-th iteration, Q^k is the Q value at the k-th iteration, ΔP_i^k is the output variation value of the i-th micro-grid, U_i^k is the utility function value of the i-th micro-grid at the k-th iteration, ΔP_i^{k,max} is the output change value corresponding to the maximum Q value at the k-th iteration of the i-th micro-grid, and P_i^{k+1} is the output value of the i-th micro-grid after changing by ΔP_i^k.
- 9. The multi-micro-grid system coordination control method integrating Q learning and potential game according to claim 8, wherein the formula for selecting the optimal action by the greedy strategy in step (d) is:
  ΔP_i* = argmax_{ΔP_i∈A} Q(P_i, ΔP_i)  (10)
  wherein ΔP_i* is the optimal action selected by the greedy strategy.
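Steps (a)-(f) of claim 1 together with the update rules of claims 7-9 describe one optimization loop. The following is a minimal Python sketch of that loop under explicit assumptions: the payment and utility forms follow claims 4-5 with a quadratic cost term assumed, and all numeric parameters (price, cost coefficients, weights, learning rate) are illustrative, not taken from the patent.

```python
import random

def payment(i, P, price, m, w1, w2, neighbors):
    # Local payment F_i (claim 4): weighted net benefit minus the
    # neighbour output imbalance; quadratic cost form is an assumption.
    f1 = price * P[i] - m[i] * P[i] ** 2
    f2 = sum((P[i] - P[j]) ** 2 for j in neighbors[i])
    return w1 * f1 - w2 * f2

def utility(i, P, price, m, w1, w2, neighbors):
    # Local utility (claim 5): own payment plus the neighbours' payments.
    return payment(i, P, price, m, w1, w2, neighbors) + sum(
        payment(j, P, price, m, w1, w2, neighbors) for j in neighbors[i])

def coordinate(n, p_max, price, m, neighbors, M=10, alpha=0.5,
               gamma=0.8, eps=0.2, w1=1.0, w2=0.5, sweeps=300, seed=1):
    rng = random.Random(seed)
    dp = p_max / M                      # step (a): discretise the strategy set
    moves = (-1, 0, 1)                  # step (b): actions change output by dp
    Q = [[[0.0] * 3 for _ in range(M + 1)] for _ in range(n)]
    s = [rng.randint(0, M) for _ in range(n)]    # state index -> output s*dp
    for _ in range(sweeps):
        for i in range(n):
            # step (d): epsilon-greedy action selection
            a = (rng.randrange(3) if rng.random() < eps
                 else max(range(3), key=lambda k: Q[i][s[i]][k]))
            s2 = min(M, max(0, s[i] + moves[a]))  # stay within rated capacity
            P = [k * dp for k in s]
            P[i] = s2 * dp
            # step (c): the utility value serves as the instant reward
            r = utility(i, P, price, m, w1, w2, neighbors)
            Q[i][s[i]][a] += alpha * (r + gamma * max(Q[i][s2])
                                      - Q[i][s[i]][a])
            s[i] = s2
    # steps (e)-(f): after the sweeps, read off the final output plan
    return [k * dp for k in s]
```

For an illustrative three-micro-grid ring, `coordinate(3, 10.0, 5.0, [0.1]*3, [[1, 2], [0, 2], [0, 1]])` returns a three-element output plan in which every output lies inside the capacity limits by construction, since state transitions are clipped to the discretised strategy set.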
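A useful sanity check on the claim 4-5 construction: with symmetric neighbour sets, the sum of the local payments acts as an exact potential for the local utilities, i.e. a unilateral output change moves the potential Φ by exactly the amount it moves U_i. A small numeric check in Python, where the payoff form and every parameter value are illustrative assumptions:

```python
def F(i, P, neighbors, price=5.0, m=0.2, w1=1.0, w2=0.5):
    # Local payment F_i in the style of claim 4; parameters are made up.
    return (w1 * (price * P[i] - m * P[i] ** 2)
            - w2 * sum((P[i] - P[j]) ** 2 for j in neighbors[i]))

def U(i, P, neighbors):
    # Local utility U_i (claim 5): own payment plus neighbours' payments.
    return F(i, P, neighbors) + sum(F(j, P, neighbors) for j in neighbors[i])

def Phi(P, neighbors):
    # Global potential: sum of all local payments.
    return sum(F(i, P, neighbors) for i in range(len(P)))

neighbors = [[1], [0, 2], [1]]       # three micro-grids on a line, symmetric
P_old = [4.0, 6.0, 5.0]
P_new = [7.0, 6.0, 5.0]              # micro-grid 0 deviates unilaterally
dU = U(0, P_new, neighbors) - U(0, P_old, neighbors)
dPhi = Phi(P_new, neighbors) - Phi(P_old, neighbors)
# dU and dPhi coincide: (U_i, Phi) satisfies the exact potential equation
```

This identity is what lets each micro-grid improve its own utility in a fully distributed way while still driving the global potential upward.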
Description
Multi-micro-grid system coordination control method integrating Q learning and potential game
Technical Field
The invention belongs to the technical field of micro-grid coordinated control, and relates to a multi-micro-grid system coordination control method integrating Q learning and potential game.
Background
With the rapid development of renewable energy technology and the large-scale, high-penetration integration of distributed energy into the power distribution network, single micro-grid systems are gradually transforming into multi-micro-grid systems. A multi-micro-grid offers higher reliability and can effectively improve the in-situ accommodation of renewable energy, but because of its large scale, high complexity, and diversified investment bodies, the traditional centralized control method is difficult to meet the control requirements, and the overall benefit of the system is difficult to balance against the individual benefits within it; see "A multiagent-based hierarchical energy management strategy for multi-microgrids considering adjustable power and demand response" (V. H. Bui et al., IEEE Transactions on Smart Grid 9.2 (2018): 1323-1333). It is therefore necessary to study a multi-micro-grid distributed coordination control method that effectively coordinates the economic relationship between the whole and the individual and improves system economy. Reinforcement learning lets an intelligent agent continuously improve its own behavior through interaction with the environment: the agent selects actions that act on the environment, obtains reward or punishment feedback from the environment, and selects the next action according to the feedback and the environmental change, so that actions beneficial to the target are retained and actions unfavorable to the target are discarded.
The Q learning algorithm is an off-policy control algorithm in reinforcement learning based on value-function iteration; its principle is to use a Q value table containing previous experience as the initial value of subsequent iterative computation, thereby shortening the convergence time of the algorithm. The potential game (PG) is a subclass of non-cooperative games, first proposed by Monderer and Shapley in 1996. It maps the variation of individual benefits into a potential function: when an individual increases its own benefit by adjusting its strategy, the value of the potential function increases synchronously, and a Nash equilibrium solution can be obtained indirectly by finding the maximum of the potential function. The potential game has a distributed characteristic and is suitable for solving distributed optimization problems; it possesses the finite improvement property (FIP), and every finite potential game has a pure-strategy Nash equilibrium, so it has great advantages in terms of algorithm complexity and computation amount. In the prior art, the coordinated game optimization of multi-micro-grid systems mainly adopts traditional methods such as the master-slave (Stackelberg) game and the Cournot oligopoly game.
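The finite improvement property can be demonstrated on a toy two-player game with a made-up quadratic payoff (not the patent's model): letting one player at a time switch to a strictly better reply must terminate, and the terminal profile is a pure-strategy Nash equilibrium.

```python
def pay(i, mine, other):
    # Toy symmetric payoff; the -mine*other interaction term admits an
    # exact potential, so unilateral improvements cannot cycle.
    return 3 * mine - mine ** 2 - mine * other

def best_response_path(strategies, start):
    # Follow unilateral strict improvements until none exists (FIP).
    profile = list(start)
    improved = True
    while improved:
        improved = False
        for i in (0, 1):
            cur = pay(i, profile[i], profile[1 - i])
            best = max(strategies, key=lambda s: pay(i, s, profile[1 - i]))
            if pay(i, best, profile[1 - i]) > cur:
                profile[i] = best
                improved = True
    return tuple(profile)

eq = best_response_path(range(4), (0, 0))
# At eq, neither player gains by deviating: a pure-strategy Nash equilibrium
```

The loop terminates because each strict improvement raises the game's potential, which can take only finitely many values on a finite strategy set.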
For example, the literature "Economic optimization method of multi-stakeholder in a multi-microgrid system based on Stackelberg game theory" (Q. Wu et al., Energy Reports 8 (2022): 345-351) proposes a method for optimizing the energy management of a micro-grid system based on the Stackelberg game, and the literature "Cournot oligopoly game-based local energy trading considering renewable energy uncertainty costs" (Y. J. Zhang et al., Renewable Energy 159.3 (2020): 1117-1127) applies the Cournot oligopoly game to the electricity market to improve transactions between power generation companies and customers and to balance profits among multiple suppliers; however, these methods are difficult to combine with a distributed optimization control method, or their Nash equilibrium solving process is complex. The literature "A Potential Game Approach to Distributed Operational Optimization for Microgrid Energy Management with Renewable Energy and Demand Response" (J. Zeng et al., IEEE Transactions on Industrial Electronics 66.6 (2019): 4479-4489) applies the potential game to the fully distributed operational optimization of a micro-grid energy management system, but when the number of game participants is large and the strategy set is large, the computation required for solving remains heavy and the solution effect of the algorithm still needs improvement. The literature (Liu Hong et al., Key Laboratory of Smart Grid of the Ministry of Education, Tianjin University, 2019) discloses a grid-connected integrated energy micro-grid coordination scheduling model and method based on multi-agent gaming and reinforcement learning, aiming at the problem that the traditional centralized optimization scheduling method is difficult to comprehensively reflect the benefit requirements of different intelligent agents in an integrated energy micro