CN-122021044-A - Multi-agent simulation system for bidding decision of electric power market


Abstract

The invention discloses a multi-agent simulation system for electric power market bidding decisions, comprising an environment simulation module and a multi-agent module. The environment simulation module comprises a market clearing unit and a market settlement unit, which execute a market clearing algorithm and perform financial settlement to generate reward signals and market environment state information. The multi-agent module comprises a plurality of agents, each consisting of an agent policy network and an agent action decoder; the module receives the reward signals and market environment state information fed back by the environment simulation module and performs autonomous policy learning and bidding decisions. The agent action decoder comprises a raw action receiving and parsing unit, a normalization and scale mapping unit, a constraint projection and structure integration unit, and a compliant bid output unit. The invention significantly improves both the convergence speed of multi-agent reinforcement learning in electric power market simulation and the economic performance of the learned bidding strategies.

Inventors

SONG YUHUI
LIU ZHANHONG
JING CHAOXIA
PAN ZHANHUA

Assignees

South China University of Technology (华南理工大学)
Guangzhou Zhilian Juneng Technology Co., Ltd. (广州智联聚能科技有限公司)

Dates

Publication Date: 20260512
Application Date: 20260210

Claims (9)

1. A multi-agent simulation system for power market bidding decisions, comprising an environment simulation module and a multi-agent module, wherein: the environment simulation module is used for simulating the core trading and settlement process of an electric power market and comprises a market clearing unit and a market settlement unit; the market clearing unit is used for receiving bid information submitted by market participants and executing a preset market clearing algorithm to determine a market clearing result comprising the cleared (winning) quantity and the clearing price; the market settlement unit is used for performing financial settlement according to the market clearing result determined by the market clearing unit, generating a reward signal for driving the update of the agents' bidding strategies, and generating market environment state information comprising the market clearing result and the financial settlement information; the multi-agent module is used for receiving the reward signal and the market environment state information fed back by the environment simulation module, performing autonomous strategy learning and bidding decisions, generating bid information, and submitting the bid information to the market clearing unit; the multi-agent module comprises a plurality of independent agents, each agent representing a market participant and comprising an agent policy network and an agent action decoder; the agent policy network is used for learning and updating the agent's internal bidding strategy according to the reward signal received from the environment simulation module, performing bidding decision calculation based on the received market environment state information, and outputting a low-dimensional control point parameter vector representing the agent's bidding strategy; and the agent action decoder is used for receiving the low-dimensional control point parameter vector, converting it, through a decoding process based on the mathematical properties of Bezier curves, into a high-dimensional segmented bidding curve that fully complies with the electric power market bidding specification, and submitting the high-dimensional segmented bidding curve to the market clearing unit.
2. The multi-agent simulation system for power market bidding decisions according to claim 1, wherein the agent action decoder comprises: a raw action receiving and parsing unit, used for receiving the low-dimensional control point parameter vector output by the agent policy network and applying a differentiable softplus transformation to it to obtain a positive-valued sequence whose components are all strictly positive; a normalization and scale mapping unit, used for mapping the positive-valued sequence into a sequence of effective control points defining a Bezier curve, wherein the price boundary constraint is enforced by fixing the first and last control points, and the intermediate control points are constructed by accumulating the normalized positive-valued sequence so as to satisfy the monotonically increasing condition; a constraint projection and structure integration unit, used for receiving the effective control point sequence, generating a Bezier curve via Bernstein polynomials, and determining the agent's continuous bidding curve function, wherein, owing to the mathematical properties of Bezier curves, the curve strictly satisfies the monotonically increasing condition and the price boundary constraint; and a compliant bid output unit, used for uniformly and discretely sampling the continuous bidding curve function with a number of sampling points equal to the number of bidding segments specified by the market, taking the function value at each sampling point as the price of the corresponding bidding segment so as to generate the final high-dimensional segmented bidding curve, and returning the final high-dimensional segmented bidding curve to the environment simulation module.
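The four decoder units of claim 2 can be chained into a single function. The following Python sketch is illustrative only and assumes one reading of claims 6 to 9: softplus increments, ordinates built from cumulative normalized increments between a fixed price floor and cap, a degree-n Bernstein form, and B samples at t_k = (k-1)/(B-1) with both endpoints included (the exact sampling grid is an assumption). The function `decode_bid_curve` and its parameters are hypothetical, not taken from the patent.

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    """Numerically stable softplus, ln(1 + e^x); the result is always > 0."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def decode_bid_curve(raw_action: Sequence[float], p_min: float, p_max: float,
                     num_segments: int) -> List[float]:
    """Hypothetical decoder: (n-1)-dim raw action -> num_segments monotone prices.

    Assumes num_segments >= 2 and p_min < p_max.
    """
    n = len(raw_action) + 1  # Bezier degree; control points P_0..P_n
    # Raw action receiving and parsing unit: strictly positive increments.
    delta = [softplus(a) for a in raw_action]
    total = sum(delta)  # S
    # Normalization and scale mapping unit: fixed endpoints, cumulative interior.
    ys = [p_min]
    running = 0.0
    for d in delta:
        running += d
        ys.append(p_min + (p_max - p_min) * running / total)
    ys.append(p_max)
    # Constraint projection (Bernstein form) + compliant bid output (sampling).
    prices = []
    for k in range(num_segments):
        t = k / (num_segments - 1)  # assumed sampling grid, endpoints included
        prices.append(sum(math.comb(n, i) * t ** i * (1 - t) ** (n - i) * y
                          for i, y in enumerate(ys)))
    return prices


prices = decode_bid_curve([0.3, -1.2, 2.0, 0.0], p_min=0.0, p_max=1500.0,
                          num_segments=10)
print([round(p, 1) for p in prices])
```

Because every step is built from smooth operations, the map from the raw action to each segment price is differentiable, which is the property the description contrasts with post-processing by projection or clipping.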
3. The multi-agent simulation system for power market bidding decisions according to claim 2, wherein the preset market clearing algorithm executed by the market clearing unit comprises a security-constrained unit commitment (SCUC) algorithm and a security-constrained economic dispatch (SCED) algorithm, both formulated as mixed-integer linear programming (MILP) models; the MILP models are solved by calling a commercial solver to clear the power market and determine a market clearing result comprising the cleared quantity and the clearing price of each agent.
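The SCUC/SCED models of claim 3 require a full MILP formulation and a commercial solver, which cannot be reproduced in a few lines. As a much simpler stand-in that only shows the shape of a clearing result (cleared quantities plus a clearing price), the sketch below performs single-node, merit-order, uniform-price clearing with no unit commitment, network, or security constraints. The function `clear_uniform_price` and the offer tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical offer layout: (agent_id, segment_quantity_MW, segment_price)
Offer = Tuple[str, float, float]


def clear_uniform_price(offers: List[Offer],
                        demand: float) -> Tuple[float, Dict[str, float]]:
    """Toy merit-order clearing: accept the cheapest segments until demand is met.

    The price of the last (marginal) accepted segment becomes the uniform
    clearing price. No network, unit-commitment, or security constraints.
    """
    awarded: Dict[str, float] = {}
    remaining = demand
    clearing_price = 0.0
    for agent, qty, price in sorted(offers, key=lambda o: o[2]):
        if remaining <= 0.0:
            break
        take = min(qty, remaining)
        awarded[agent] = awarded.get(agent, 0.0) + take
        remaining -= take
        clearing_price = price
    if remaining > 1e-9:
        raise ValueError("offered supply is insufficient to meet demand")
    return clearing_price, awarded


offers = [("G1", 50.0, 20.0), ("G1", 50.0, 40.0),
          ("G2", 80.0, 30.0), ("G2", 20.0, 60.0)]
price, awarded = clear_uniform_price(offers, demand=150.0)
print(price, awarded)
```

In the claimed system this role is played by the MILP-based SCUC/SCED clearing; the toy version is only useful for exercising the agent loop without a solver.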
4. The multi-agent simulation system for power market bidding decisions according to claim 3, wherein the financial settlement performed by the market settlement unit comprises: calculating, according to the market clearing result, the net settlement profit of each agent as the reward signal that drives its policy update; and generating, based on the market clearing result and the financial settlement information, market environment state information characterizing the operating condition of the market.
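Claim 4 does not fix the profit formula. A common textbook choice, assumed here, is net profit = clearing price × cleared quantity − generation cost, with a hypothetical quadratic cost a·q² + b·q per agent. The sketch also bundles the clearing result and the settlement into a plain dictionary as one possible form of the market environment state information; the function name `settle` and the dictionary keys are illustrative.

```python
from typing import Dict, Tuple


def settle(clearing_price: float, awarded: Dict[str, float],
           cost_coeffs: Dict[str, Tuple[float, float]]) -> Dict[str, object]:
    """Compute per-agent net profit (used as the reward) and a state summary.

    cost_coeffs[agent] = (a, b) for an assumed cost a*q**2 + b*q.
    """
    rewards: Dict[str, float] = {}
    for agent, qty in awarded.items():
        a, b = cost_coeffs[agent]
        revenue = clearing_price * qty
        cost = a * qty ** 2 + b * qty
        rewards[agent] = revenue - cost
    return {
        "clearing_price": clearing_price,
        "awarded": dict(awarded),
        "rewards": rewards,
        "total_cleared": sum(awarded.values()),
    }


state = settle(40.0, {"G1": 70.0, "G2": 80.0},
               {"G1": (0.01, 25.0), "G2": (0.0, 28.0)})
print(state["rewards"])
```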
5. The multi-agent simulation system for power market bidding decisions according to claim 4, wherein the agent policy network is a policy model based on a deep reinforcement learning algorithm, which updates the agent's internal bidding policy parameters according to the reward signal, performs computation based on the market environment state information, and outputs a low-dimensional control point parameter vector characterizing the agent's bidding policy.
6. The multi-agent simulation system for power market bidding decisions according to claim 5, wherein the raw action receiving and parsing unit performs the following operations: a. receiving the low-dimensional control point parameter vector output by the agent policy network, the dimension of which is n-1, smaller than the number of segments B of the high-dimensional segmented bidding curve conforming to the electric power market bidding specification, thereby reducing the dimension of the action space; b. transforming the low-dimensional control point parameter vector with the softplus function to obtain the positive-valued sequence, as follows: let the low-dimensional control point parameter vector output by the agent policy network be $\mathbf{a} = (a_1, a_2, \dots, a_{n-1}) \in \mathbb{R}^{n-1}$, where $\mathbb{R}^{n-1}$ denotes the (n-1)-dimensional real vector space and $a_i$ denotes the i-th component of $\mathbf{a}$; applying the differentiable softplus transformation yields the positive-valued sequence $\boldsymbol{\delta} = (\delta_1, \dots, \delta_{n-1})$: $$\delta_i = \operatorname{softplus}(a_i) = \ln\left(1 + e^{a_i}\right), \quad i = 1, \dots, n-1,$$ where $\operatorname{softplus}(\cdot)$ is the softplus function and $\delta_i$ is the i-th component of the transformed positive-valued sequence. This differentiable transformation ensures that every component is strictly greater than zero and yields a smooth, continuously differentiable positive-valued sequence over the entire domain.
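Evaluating ln(1 + e^x) literally overflows for large x, so softplus is usually computed in the algebraically equivalent form max(x, 0) + ln(1 + e^(-|x|)). The sketch below uses that form and also shows that the derivative of softplus is the logistic sigmoid, which is mathematically strictly positive, so a gradient flows back to every component of the raw action. The function names and the sample raw action are illustrative assumptions.

```python
import math


def softplus(x: float) -> float:
    """Stable softplus: max(x, 0) + log1p(exp(-|x|)) == ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softplus_grad(x: float) -> float:
    """d softplus / dx = sigmoid(x), computed without overflow."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


raw_action = [-40.0, -1.0, 0.0, 3.0, 800.0]  # hypothetical policy output
delta = [softplus(a) for a in raw_action]
print(delta)
print([softplus_grad(a) for a in raw_action])
```

For large positive inputs softplus behaves like the identity, and for large negative inputs it approaches zero from above without ever reaching it in exact arithmetic.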
7. The multi-agent simulation system for power market bidding decisions according to claim 6, wherein the normalization and scale mapping unit performs the following operations: a. satisfying the boundary constraint by fixing the first and last control points, specifically: a1. define $\{P_0, P_1, \dots, P_n\}$ as the effective control point sequence of an n-th order Bezier curve, where $P_i = (x_i, y_i)$; the abscissa $x_i$ of each effective control point is a fixed normalized electricity quantity value, and the ordinate $y_i$ is the bid price; a2. define the price interval specified by the market as $[p_{\min}, p_{\max}]$, where $p_{\min}$ is the lower price limit and $p_{\max}$ is the upper price limit; a3. fix the ordinate of the first control point $P_0$ of the Bezier curve at the lower price limit and the ordinate of the last control point $P_n$ at the upper price limit: $$y_0 = p_{\min}, \qquad y_n = p_{\max};$$ the abscissas of the first and last control points are fixed at 0 and 1, respectively, so that fixing these two points guarantees that the start and end points of the generated Bezier curve fall exactly on the boundaries of the price interval; b. constructing the intermediate control points by accumulating the normalized positive-valued sequence, specifically: b1. calculate the sum S of the positive-valued sequence: $$S = \sum_{j=1}^{n-1} \delta_j,$$ where $\delta_j$ is the j-th component of the positive-valued sequence; b2. construct the intermediate control points by accumulating the normalized positive-valued sequence: $$y_i = p_{\min} + \left(p_{\max} - p_{\min}\right) \frac{1}{S} \sum_{j=1}^{i} \delta_j, \quad i = 1, \dots, n-1,$$ where $y_i$ is the ordinate of the i-th effective control point. In this way, the positive-valued sequence produced by the raw action receiving and parsing unit is mapped into an effective control point sequence that rigorously satisfies the monotonicity condition and the price boundary constraint.
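Steps a and b can be sketched as follows, under two stated assumptions: the intermediate abscissas are taken as evenly spaced, x_i = i/n (the claim only says they are fixed normalized quantity values), and the cumulative sum for y_i runs from j = 1 to i with S as the total of all n-1 increments. Under that reading the last intermediate ordinate equals p_max, so the control-point ordinates are non-decreasing rather than strictly increasing; the resulting Bezier curve is still strictly increasing on [0, 1], because its derivative is a non-negatively weighted sum of the ordinate differences and the first difference is positive. The helper `control_points` is hypothetical.

```python
from typing import List, Sequence, Tuple


def control_points(delta: Sequence[float], p_min: float,
                   p_max: float) -> List[Tuple[float, float]]:
    """Build P_0..P_n from n-1 positive increments (hypothetical helper)."""
    if any(d <= 0.0 for d in delta):
        raise ValueError("all increments must be strictly positive")
    if not p_min < p_max:
        raise ValueError("require p_min < p_max")
    n = len(delta) + 1
    s = sum(delta)                      # S in step b1
    points = [(0.0, p_min)]             # P_0: abscissa 0, price floor
    cumulative = 0.0
    for i, d in enumerate(delta, start=1):
        cumulative += d
        # Step b2: assumed evenly spaced abscissa i/n, cumulative ordinate.
        points.append((i / n, p_min + (p_max - p_min) * cumulative / s))
    points.append((1.0, p_max))         # P_n: abscissa 1, price cap
    return points


print(control_points([1.0, 1.0, 2.0], p_min=0.0, p_max=100.0))
# -> [(0.0, 0.0), (0.25, 25.0), (0.5, 50.0), (0.75, 100.0), (1.0, 100.0)]
```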
8. The multi-agent simulation system for power market bidding decisions according to claim 7, wherein the constraint projection and structure integration unit generates the Bezier-curve-based continuous bidding curve function from the Bernstein polynomial formula as follows: based on the effective control point sequence $\{P_0, \dots, P_n\}$, an n-th order Bezier curve is generated from the Bernstein polynomial formula and taken as the agent's continuous bidding curve function $\mathbf{C}(t)$: $$\mathbf{C}(t) = \sum_{i=0}^{n} b_{i,n}(t)\, P_i, \quad t \in [0, 1],$$ where t is the curve parameter, whose range [0, 1] corresponds to the complete path from the start point to the end point of the curve, and $b_{i,n}(t)$ is the Bernstein basis polynomial: $$b_{i,n}(t) = \binom{n}{i} t^{i} (1 - t)^{n-i},$$ where $\binom{n}{i}$ is the binomial coefficient. The continuous bidding curve function passes through the first effective control point at t = 0, i.e., $\mathbf{C}(0) = P_0$, and through the last effective control point at t = 1, i.e., $\mathbf{C}(1) = P_n$. Moreover, the entire curve lies inside the convex hull of the effective control point sequence, which ensures that the curve satisfies the boundary constraints.
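The Bernstein form can be evaluated directly. The sketch below (function names hypothetical) evaluates both coordinates of C(t) and lets one check the properties the claim relies on: endpoint interpolation, the basis polynomials summing to one (which underlies the convex-hull property), and, for evenly spaced abscissas x_i = i/n as assumed in this sketch, x(t) = t, so that the curve parameter coincides with the normalized quantity. For high degrees, de Casteljau's algorithm is the numerically preferred way to evaluate the same curve.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def bernstein(i: int, n: int, t: float) -> float:
    """Bernstein basis polynomial b_{i,n}(t) = C(n, i) t^i (1 - t)^(n - i)."""
    return math.comb(n, i) * t ** i * (1.0 - t) ** (n - i)


def bezier_point(ctrl: Sequence[Point], t: float) -> Point:
    """Evaluate C(t) = sum_i b_{i,n}(t) P_i for control points P_0..P_n."""
    n = len(ctrl) - 1
    x = sum(bernstein(i, n, t) * px for i, (px, _) in enumerate(ctrl))
    y = sum(bernstein(i, n, t) * py for i, (_, py) in enumerate(ctrl))
    return x, y


ctrl = [(0.0, 0.0), (0.25, 25.0), (0.5, 50.0), (0.75, 100.0), (1.0, 100.0)]
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(t, bezier_point(ctrl, t))
```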
9. The multi-agent simulation system for power market bidding decisions according to claim 8, wherein the compliant bid output unit generates, by uniform discrete sampling, the high-dimensional segmented bidding curve conforming to the power market bidding specification as follows: the curve parameter t is sampled uniformly on the interval [0, 1] at B sampling points $t_k$, k = 1, …, B, where B is the number of bidding segments specified by the market; the price $p_k$ of the k-th bidding segment is taken as the value of the continuous bidding curve function $\mathbf{C}(t)$ at the sampling point $t_k$, i.e., the price (ordinate) component of $\mathbf{C}(t_k)$: $$p_k = \sum_{i=0}^{n} b_{i,n}(t_k)\, y_i,$$ where $b_{i,n}$ is the Bernstein basis polynomial and $y_i$ is the ordinate of the i-th effective control point; finally, the high-dimensional discrete bid vector $\mathbf{p} = (p_1, \dots, p_B)$ is output and combined with the bidding intervals corresponding to each segment to form a high-dimensional segmented bidding curve that satisfies the market monotonicity constraint and boundary constraint.
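The sampling step and the pairing of prices with quantity intervals can be sketched as below. Two assumptions are made that the claim does not specify: the B sampling points include both endpoints, t_k = (k-1)/(B-1), and the bid quantity range [0, capacity] is split into B equal intervals. The function `sample_segments` and its `capacity` parameter are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, float]  # (quantity_from, quantity_to, price)


def sample_segments(ys: Sequence[float], num_segments: int,
                    capacity: float) -> List[Segment]:
    """Sample the price component of a degree-n Bezier curve at B points.

    ys are control-point ordinates y_0..y_n; requires num_segments >= 2.
    """
    n = len(ys) - 1
    segments: List[Segment] = []
    for k in range(1, num_segments + 1):
        t_k = (k - 1) / (num_segments - 1)  # assumed uniform grid
        price = sum(math.comb(n, i) * t_k ** i * (1.0 - t_k) ** (n - i) * y
                    for i, y in enumerate(ys))
        q_from = capacity * (k - 1) / num_segments
        q_to = capacity * k / num_segments
        segments.append((q_from, q_to, price))
    return segments


for seg in sample_segments([0.0, 25.0, 50.0, 100.0, 100.0], 10, 200.0):
    print(seg)
```

Including both endpoints means the first segment is priced exactly at the floor and the last exactly at the cap; a midpoint grid, t_k = (k - 0.5)/B, would be an equally valid uniform choice if the market rules prefer interior prices.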

Description

Technical Field

The invention relates to the technical field of electric power market simulation and multi-agent reinforcement learning, and in particular to a multi-agent simulation system for electric power market bidding decisions.

Background

With the continued development of new-type power systems dominated by renewable energy, the structure, participants, and trading mechanisms of the power market are becoming increasingly complex. The large-scale grid integration of intermittent sources such as wind and photovoltaic power, together with the emergence of new market participants such as independent energy storage, virtual power plants, and flexible loads, makes the behavioral strategies of market participants highly heterogeneous, dynamic, and interactive. To understand the dynamic evolution of the market in depth, evaluate the effectiveness of new market mechanism designs, predict participants' strategic behavior, and prevent market risks, simulation methods based on multi-agent systems (MAS) have become an indispensable tool for electric power market research. In a multi-agent simulation framework, traditional market participants such as power generators, electricity retailers, and large consumers are modeled as agents with autonomous sensing, decision-making, and learning capabilities. Each agent typically employs a reinforcement learning (RL) algorithm to learn optimal bidding or trading strategies through continuous interaction with the environment (i.e., the simulated market), with the goal of maximizing its own long-term benefit (e.g., profit).
This approach can effectively simulate the dynamic process in which participants in a real market make independent decisions based on local information and play strategic games against one another, thereby providing insight into dynamic equilibria and strategic interactions that is difficult to obtain with traditional optimization models. However, introducing high-fidelity electric power market rules into large-scale multi-agent reinforcement learning simulation raises two major technical challenges. First, the "curse of dimensionality" of the action space. To realistically reflect power market operation, an agent is usually required to submit a high-dimensional segmented bidding curve, i.e., a supply or demand curve composed of multiple price-quantity pairs arranged in monotonically non-decreasing order. For an agent that submits a B-segment bid, the original action space has dimension at least B (if only prices are bid) or 2B (if both prices and quantities are bid), and B is typically large (e.g., 10 segments or more). When the system contains tens or even hundreds of agents, the joint action space grows exponentially in size, so reinforcement learning training becomes extremely inefficient and converges slowly or not at all, which severely limits the scale and practicality of the simulation system. Second, satisfying the "hard constraints" of market rules. Electric power market bidding is subject to strict rules, the most central of which are the monotonicity constraint and the price boundary constraint. A standard reinforcement learning agent outputs unconstrained continuous or discrete values, so there is no guarantee that its action vectors naturally satisfy these complex engineering and economic constraints.
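The scale problem in the first challenge above can be made concrete with simple arithmetic. The numbers below are illustrative assumptions, not figures from the patent: 100 agents, B = 10 bid segments, a hypothetical 10-level discretization of every action dimension, and a decoder vector of n-1 = 4 components. The helper names are hypothetical.

```python
import math


def joint_action_dim(num_agents: int, per_agent_dim: int) -> int:
    """Dimension of the joint action space: one block per agent."""
    return num_agents * per_agent_dim


def log10_joint_grid_size(num_agents: int, per_agent_dim: int,
                          levels: int) -> float:
    """log10 of the number of joint actions if each dimension has `levels` values."""
    return joint_action_dim(num_agents, per_agent_dim) * math.log10(levels)


N_AGENTS, B, N_MINUS_1, LEVELS = 100, 10, 4, 10  # illustrative values only
for label, dim in [("price-only, B", B), ("price+quantity, 2B", 2 * B),
                   ("decoder, n-1", N_MINUS_1)]:
    print(f"{label:20s} joint dim = {joint_action_dim(N_AGENTS, dim):5d}, "
          f"joint grid ~ 10^{log10_joint_grid_size(N_AGENTS, dim, LEVELS):.0f}")
```

The joint dimension grows linearly with the number of agents, while the number of discrete joint actions grows exponentially; a low-dimensional decoder input shrinks the exponent but does not by itself change that scaling law.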
Traditional solutions mostly adopt a "post-processing" approach: after the agent outputs an action, a projection algorithm (such as isotonic regression via the pool-adjacent-violators algorithm, PAVA) is applied to force monotonicity, and clipping is applied to enforce the boundary constraints. This approach has clear drawbacks: 1) the post-processing operations are often non-differentiable and cut off the gradient backpropagation path from the final market benefit (reward) to the policy network parameters, so the agent cannot learn directly from gradient signals how to generate actions that are both compliant and profitable, which leads to unstable training and suboptimal strategies; and 2) they add extra computational cost. In summary, the prior art struggles to preserve simulation fidelity while achieving efficient and feasible large-scale multi-agent reinforcement learning simulation. An innovative system architecture is therefore urgently needed that embeds market rules into the agents' decision process, fundamentally achieving intelligent dimensionality reduction of the action space and automatic constraint satisfaction, and thereby supporting large-scale, efficient, high-fidelity multi-agent simulation of the electric power market. It is to this urgent need that the present invention