
CN-121192798-B - Deep reinforcement learning-based multi-agent cooperative management method for energy storage power station

CN 121192798 B

Abstract

The invention discloses a multi-agent cooperative management method for an energy storage power station based on deep reinforcement learning, and relates to the technical field of intelligent control. The method comprises the following steps: S1, collecting energy storage cooperative sensing data and preprocessing it; S2, dividing the system into three types of control units, evaluating the operation cooperative deviation degree of the energy storage power station, dynamically selecting an action strategy for each control unit, and constructing an action value evaluation model; S3, constructing state-action-reward triplet samples and iteratively optimizing the action value evaluation model round by round; S4, deploying the control units into a simulated operation environment, executing action decisions and state interaction feedback, and judging, after each action decision is executed, whether a strategy correction mechanism is triggered. The method addresses the problems that, in existing energy storage power stations, coordination among subsystems is not timely and battery state estimation errors are difficult to feed back promptly, which leads to strategy lag and unbalanced equipment operation.

Inventors

  • SUN HAIWANG
  • ZHANG HAORAN
  • LIU XINGNAN
  • LI XUEQIANG
  • LIU SHENGCHUN
  • QIN GUOQIANG
  • ZHOU MENG
  • WANG XINGHAO
  • ZHU HONGJUAN
  • LIU JIAHAO

Assignees

  • 天津提尔科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2025-11-24

Claims (7)

  1. A multi-agent cooperative management method for an energy storage power station based on deep reinforcement learning, characterized by comprising the following steps:
     S1, collecting energy storage cooperative sensing data, and performing time alignment, anomaly rejection, standardization and normalization on the data to obtain preprocessed energy storage cooperative sensing data;
     S2, inputting the preprocessed energy storage cooperative sensing data into a battery management control unit, an energy management control unit and a data acquisition and monitoring control unit, evaluating the operation cooperative deviation degree of the energy storage power station, dynamically selecting an action strategy for each control unit, and constructing an action value evaluation model for each control unit based on the evaluation result and the action strategies;
     the specific steps of inputting the preprocessed energy storage cooperative sensing data into the three control units and evaluating the operation cooperative deviation degree of the energy storage power station are as follows: the preprocessed data are input into a multi-control-unit architecture and divided by function into three control units, namely the battery management control unit, the energy management control unit and the data acquisition and monitoring control unit, which respectively execute battery state estimation, electric quantity scheduling and equipment operation monitoring tasks; the mean voltage and mean temperature of all cells are calculated; the voltage of the i-th cell is subtracted from the mean voltage, the difference is divided by the mean voltage, and the ratio is squared to obtain a voltage deviation factor; the temperature of the i-th cell is subtracted from the mean temperature, the difference is divided by the mean temperature, and the ratio is squared to obtain a temperature deviation factor; the temperature deviation factor is added to the voltage deviation factor to obtain a cell state fluctuation factor; the absolute value of the battery pack current is divided by the rated battery current to obtain a load intensity factor; the absolute value of the grid frequency minus the rated grid frequency is divided by the rated grid frequency to obtain a grid disturbance factor; the load intensity factor is multiplied by the grid disturbance factor to obtain an operation disturbance amplification factor; and the cell state fluctuation factor is multiplied by the operation disturbance amplification factor to obtain the per-cell operation cooperative deviation term (illustrated in the first sketch following the claims);
     the specific steps of dynamically selecting an action strategy for each control unit and constructing an action value evaluation model for each control unit based on the evaluation result and the action strategies are as follows: the operation cooperative deviation evaluation value is compared with a cooperative deviation threshold in real time and the action strategy selection logic is switched dynamically, wherein, when the operation cooperative deviation evaluation value is less than or equal to the cooperative deviation threshold, the current operating state of the energy storage power station is judged to be coordinated and each control unit selects an action strategy aimed at power efficiency; an action value evaluation model is constructed for each control unit based on a deep neural network, with the operation cooperative deviation evaluation value and the energy storage cooperative sensing data combined to form the state vector and the action strategy taken as the action vector;
     S3, extracting the energy storage cooperative sensing data and the operation cooperative deviation evaluation results, constructing a disturbance risk evaluation value, extracting state vectors and action vectors, constructing state-action-reward triplet samples, performing forward inference and backward correction on the action value evaluation model, and iteratively optimizing the action value evaluation model round by round;
     the specific steps of extracting the energy storage cooperative sensing data and the operation cooperative deviation evaluation results and constructing the disturbance risk evaluation value are as follows: a sliding time window of fixed length is set as the training period; within each training period, the energy storage cooperative sensing data sequence and the operation cooperative deviation evaluation value sequence are extracted, the maximum and minimum charge-discharge power values and the maximum and minimum cell temperature values are obtained, and the mean cell temperature, the mean battery pack current and the mean operation cooperative deviation evaluation value are calculated; the minimum charge-discharge power value is subtracted from the maximum charge-discharge power value, the difference is divided by the rated battery power, one is added to the ratio, and the natural logarithm is taken to obtain a power fluctuation coefficient; the minimum cell temperature value is subtracted from the maximum cell temperature value and the difference is divided by the mean cell temperature to obtain a temperature difference distribution coefficient; the absolute value of the mean battery pack current is divided by the rated battery current to obtain a load intensity coefficient; one is added to the mean operation cooperative deviation evaluation value to obtain a cooperative disturbance correction coefficient; and the power fluctuation coefficient, the temperature difference distribution coefficient, the load intensity coefficient and the cooperative disturbance correction coefficient are multiplied in sequence to obtain the disturbance risk evaluation value (illustrated in the second sketch following the claims);
     S4, loading a regulation and control target for each control unit, deploying each control unit into a simulated operation environment, executing action decisions and state interaction feedback, and, after each action decision is executed, evaluating the deviation degree between the current operation state and the action strategy and judging whether to trigger a strategy correction mechanism, so as to realize a closed loop of action strategy optimization.
  2. The multi-agent cooperative management method for an energy storage power station based on deep reinforcement learning according to claim 1, wherein the specific steps of collecting energy storage cooperative sensing data are as follows: energy storage cooperative sensing data are collected during operation of the energy storage power station, the data comprising cell voltage, cell temperature, battery pack current, battery cycle count, ambient temperature, grid voltage, grid frequency, transformer temperature, and charge-discharge power.
  3. The multi-agent cooperative management method for an energy storage power station based on deep reinforcement learning according to claim 1, wherein the specific steps of performing time alignment, anomaly rejection, standardization and normalization on the energy storage cooperative sensing data to obtain preprocessed energy storage cooperative sensing data are as follows: the energy storage cooperative sensing data are processed by an interpolation filling method based on uniform sampling-period alignment, so that data from different acquisition sources are aligned to a consistent time sequence; the data are screened by an anomaly identification method based on sliding-window differential detection and physical range constraints, so that records with instantaneous mutations or out-of-range values are identified and rejected; the data are numerically transformed by a max-min standardization method based on data-type grouping, so that the numerical scales and distribution characteristics of different channels are unified; and the data are uniformly mapped by a symmetric interval normalization method, so that all numerical data are mapped into a unified numerical range.
  4. The multi-agent cooperative management method for an energy storage power station based on deep reinforcement learning according to claim 1, wherein the specific steps of extracting state vectors and action vectors, constructing state-action-reward triplet samples, performing forward inference and backward correction on the action value evaluation model, and iteratively optimizing the action value evaluation model round by round are as follows: the disturbance risk evaluation value is taken as the reward signal, state vectors and action vectors are extracted, state-action-reward triplet samples are constructed, and the samples are stored in an experience sample pool; using a cyclic sample extraction strategy, state-action-reward triplet samples are drawn from the experience sample pool in batches, and forward inference is performed according to the current action strategy with the state vector as input so as to output an action evaluation value; the deviation between the action evaluation value and the disturbance risk evaluation value is calculated to obtain a loss amount, a backward correction operation is executed on the action value evaluation model with the loss amount as the training basis, and the action value evaluation model is updated to complete training for the current training period; after each training period ends, the next iteration begins, and the action value evaluation model is trained continuously until convergence (illustrated in the third sketch following the claims).
  5. The multi-agent cooperative management method for an energy storage power station based on deep reinforcement learning according to claim 1, wherein the specific steps of loading a regulation and control target for each control unit, deploying each control unit into a simulated operation environment, and executing action decisions and state interaction feedback are as follows: based on the trained action value evaluation model, a scheduling behavior target is configured for each control unit, wherein the battery management control unit loads a control strategy targeting cell thermal control and state balance, the energy management control unit loads a power distribution strategy balancing economic benefit and battery life, and the data acquisition and monitoring control unit loads an operation monitoring strategy centered on operation boundary constraints; after strategy loading and scope configuration are completed, the control units are deployed into the simulated operation environment and execute action decisions and state interaction feedback, wherein the simulated operation environment comprises a battery cluster assembly, a bidirectional converter assembly and a power grid interface assembly.
  6. The multi-agent cooperative management method for an energy storage power station based on deep reinforcement learning according to claim 1, wherein the specific steps of evaluating, after each action decision is executed, the deviation degree between the current operation state and the action strategy are as follows: after each action decision is executed, the state vector of the current training period is extracted, inference is performed with the action value evaluation model, and an action evaluation value is output; cell voltage output values and cell temperature output values are extracted from the action evaluation value; the corresponding measured voltage is subtracted from the voltage output value of the i-th cell to obtain a voltage deviation value; the corresponding measured temperature is subtracted from the temperature output value of the i-th cell and the absolute value is taken to obtain a temperature deviation value; the voltage deviation value and the temperature deviation value of each cell are added to obtain the state deviation value of the i-th cell; the state deviation values of all cells are summed and divided by the number of cells to obtain the average state deviation value; the rated grid frequency is subtracted from the grid frequency, the absolute value is taken, and the result is divided by the rated grid frequency to obtain a frequency disturbance coefficient; and the average state deviation value is multiplied by the frequency disturbance coefficient to obtain a state drift evaluation value (illustrated in the fourth sketch following the claims).
  7. The multi-agent cooperative management method for an energy storage power station based on deep reinforcement learning according to claim 6, wherein the specific steps of judging whether to trigger the strategy correction mechanism are as follows: the state drift evaluation value is compared with a state offset threshold in real time; when the state drift evaluation value is smaller than the state offset threshold, the current operation state is judged to be normal and the control unit maintains the existing action strategy; when the state drift evaluation value is greater than or equal to the state offset threshold, the current operation state is judged to have deviated from the control target and the strategy correction mechanism is triggered, whereupon the state drift evaluation value is used as an adjustment signal to apply interference suppression to the action evaluation value corresponding to the current action strategy, the action value sequence is dynamically reconstructed, a corrected action evaluation value sequence is output, and the action selection logic is guided toward the state target based on the corrected sequence, thereby realizing adaptive response and dynamic correction of the action strategy with respect to the deviated state.
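
Sketch 1: a minimal Python reading of the per-cell operation cooperative deviation term defined in claim 1, step S2. The claim does not say how the per-cell terms are aggregated into the station-level operation cooperative deviation evaluation value, so a simple mean over cells is assumed; all function and variable names are illustrative rather than taken from the patent.

```python
import numpy as np

def operation_cooperative_deviation(cell_voltages, cell_temperatures,
                                    pack_current, rated_current,
                                    grid_frequency, rated_frequency):
    """Per-cell operation cooperative deviation terms (claim 1, step S2 reading)."""
    v = np.asarray(cell_voltages, dtype=float)
    t = np.asarray(cell_temperatures, dtype=float)
    v_mean, t_mean = v.mean(), t.mean()

    voltage_dev = ((v_mean - v) / v_mean) ** 2        # voltage deviation factor per cell
    temperature_dev = ((t_mean - t) / t_mean) ** 2    # temperature deviation factor per cell
    cell_fluctuation = voltage_dev + temperature_dev  # cell state fluctuation factor

    load_intensity = abs(pack_current) / rated_current                           # load intensity factor
    grid_disturbance = abs(grid_frequency - rated_frequency) / rated_frequency   # grid disturbance factor
    amplification = load_intensity * grid_disturbance                            # operation disturbance amplification factor

    terms = cell_fluctuation * amplification          # per-cell cooperative deviation terms
    return terms, float(terms.mean())                 # mean aggregation over cells is an assumption

# Example with four cells on a 50 Hz grid (values are illustrative).
terms, evaluation_value = operation_cooperative_deviation(
    cell_voltages=[3.30, 3.28, 3.35, 3.31],
    cell_temperatures=[25.0, 26.5, 24.8, 25.4],
    pack_current=120.0, rated_current=200.0,
    grid_frequency=49.92, rated_frequency=50.0)
```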
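Sketch 2: a minimal Python reading of the disturbance risk evaluation value from claim 1, step S3. The power fluctuation coefficient is read as ln(1 + (P_max − P_min)/P_rated) and the temperature difference distribution coefficient as (T_max − T_min)/T_mean; both readings are reconstructions from the claim wording and the window statistics it extracts, and all names are illustrative.

```python
import numpy as np

def disturbance_risk(power_window, cell_temp_window, pack_current_window,
                     deviation_window, rated_power, rated_current):
    """Disturbance risk evaluation value over one sliding training window (claim 1, step S3 reading)."""
    p = np.asarray(power_window, dtype=float)         # charge-discharge power samples
    t = np.asarray(cell_temp_window, dtype=float)     # cell temperature samples
    i = np.asarray(pack_current_window, dtype=float)  # battery pack current samples
    d = np.asarray(deviation_window, dtype=float)     # operation cooperative deviation values

    power_fluct = np.log1p((p.max() - p.min()) / rated_power)  # power fluctuation coefficient (assumed reading)
    temp_spread = (t.max() - t.min()) / t.mean()                # temperature difference distribution coefficient (assumed reading)
    load_intensity = abs(i.mean()) / rated_current              # load intensity coefficient
    correction = 1.0 + d.mean()                                 # cooperative disturbance correction coefficient

    return float(power_fluct * temp_spread * load_intensity * correction)

# Example over one window (values are illustrative).
risk = disturbance_risk(power_window=[180.0, 60.0, 240.0, 150.0],
                        cell_temp_window=[25.1, 26.4, 27.0, 25.8],
                        pack_current_window=[110.0, 95.0, 130.0, 120.0],
                        deviation_window=[0.004, 0.007, 0.005, 0.006],
                        rated_power=500.0, rated_current=200.0)
```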
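Sketch 3: a minimal training-step sketch for claim 4, fitting the action value evaluation model to the disturbance-risk reward over batches of state-action-reward triplets drawn from an experience sample pool. The claim specifies neither the network architecture nor the loss function; the small fully connected network, the mean-squared-error loss, and the use of PyTorch are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ActionValueModel(nn.Module):
    """Action value evaluation model: state and action in, scalar evaluation out."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 64),
                                 nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def train_round(model, optimizer, sample_pool, batch_size=32):
    """One training round over the experience sample pool of (state, action, reward) triplets."""
    loss_fn = nn.MSELoss()
    for start in range(0, len(sample_pool), batch_size):
        batch = sample_pool[start:start + batch_size]
        states = torch.stack([s for s, _, _ in batch])
        actions = torch.stack([a for _, a, _ in batch])
        rewards = torch.tensor([r for _, _, r in batch])

        values = model(states, actions)   # forward inference: action evaluation values
        loss = loss_fn(values, rewards)   # deviation from the disturbance-risk reward
        optimizer.zero_grad()
        loss.backward()                   # backward correction
        optimizer.step()

# Example: one training round on a synthetic pool (dimensions and data are illustrative).
model = ActionValueModel(state_dim=10, action_dim=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
pool = [(torch.randn(10), torch.randn(3), float(torch.rand(1))) for _ in range(128)]
train_round(model, optimizer, pool)
```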
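Sketch 4: a minimal Python reading of the state drift evaluation value and the correction trigger from claims 6 and 7. Taking the absolute value of the voltage deviation (claim 6 states it explicitly only for the temperature deviation) and dividing the frequency term by the rated grid frequency (by analogy with the grid disturbance factor in claim 1) are assumptions; all names are illustrative.

```python
import numpy as np

def state_drift(pred_voltages, meas_voltages, pred_temps, meas_temps,
                grid_frequency, rated_frequency):
    """State drift evaluation value after an action decision (claim 6 reading)."""
    pv, mv = np.asarray(pred_voltages, float), np.asarray(meas_voltages, float)
    pt, mt = np.asarray(pred_temps, float), np.asarray(meas_temps, float)

    cell_dev = np.abs(pv - mv) + np.abs(pt - mt)   # per-cell state deviation (abs on voltage assumed)
    mean_dev = cell_dev.mean()                     # average state deviation value

    freq_coeff = abs(grid_frequency - rated_frequency) / rated_frequency  # frequency disturbance coefficient
    return float(mean_dev * freq_coeff)

def needs_correction(drift_value, drift_threshold):
    """Claim 7: trigger the strategy correction mechanism when drift >= threshold."""
    return drift_value >= drift_threshold

# Example (values are illustrative).
drift = state_drift(pred_voltages=[3.31, 3.29], meas_voltages=[3.30, 3.28],
                    pred_temps=[25.2, 26.0], meas_temps=[25.0, 26.4],
                    grid_frequency=49.9, rated_frequency=50.0)
if needs_correction(drift, drift_threshold=0.01):
    pass  # interference suppression and action value sequence reconstruction would follow here
```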

Description

Deep reinforcement learning-based multi-agent cooperative management method for energy storage power station

Technical Field

The invention relates to the technical field of intelligent control, in particular to a multi-agent cooperative management method for an energy storage power station based on deep reinforcement learning.

Background

With the continuously rising penetration of new energy generation, energy storage power stations play an increasingly prominent role in the power system and have become an important means of supporting grid frequency regulation, peak shaving and valley filling, and improving power quality. Because an energy storage system comprises multiple complex subsystems whose operating states exhibit strong coupling, time-varying and nonlinear characteristics, traditional control strategies based on rules or a single optimization objective struggle to meet the optimal management requirements of the energy storage system under multiple scenarios and dynamic disturbances. How to realize cooperative control and adaptive optimization of the multiple control units of an energy storage power station has therefore become a key research direction in intelligent power dispatching and intelligent energy storage operation and maintenance.

For example, the invention with publication number CN116596028A discloses an Informer-based method for managing the electric quantity of an energy storage power station, involving a cloud server and multiple energy storage power station subsystems. The cloud server hosts Informer network models, trains them on data collected and uploaded by the subsystems, and outputs predicted SOC values of the energy storage power station under different temperatures and working conditions. Each subsystem downloads the corresponding model weight file from the cloud server according to its own operating environment. Meanwhile, the cloud server continuously receives data uploaded by the subsystems, performs incremental learning, and continuously refines the prediction temperature range in 5 °C steps. The method alleviates the problem of excessively long training time caused by insufficient computing power in the subsystems, and automatically updates the weight files in time when external conditions change, removing the need for manual updates and improving prediction accuracy.

For example, the invention with publication number CN119378636A provides a swarm-intelligence-based parallel training method for the multiple control units of a power system. The method constructs a system simulation model of the new-type power system to be controlled, builds initial multi-control units based on the simulation model, initializes the particle swarm parameters and the initial multi-control units, initializes multiple training control units, and performs distributed parallel training on them until all training control units are trained, yielding the current-round multi-control-unit group; it then obtains the fitness of each control unit in the current-round group, updates the particle swarm parameters, and accordingly updates the basic parameters of each control unit so as to refresh the training control units; the distributed parallel training is repeated until preset training conditions are met, and the new-type power system is controlled based on the resulting control unit group.

However, although the above technical solutions make some progress in energy storage management prediction and multi-control-unit training, they still fall short in real-time coordination between control units, autonomous decision response, and adaptability under disturbance, and they have difficulty meeting the optimization control requirements of an energy storage power station under high-frequency dynamic regulation. In addition, existing methods rely on fixed model architectures and centralized inference, lack a dynamic trade-off mechanism for the operating risk of the energy storage system, and cannot achieve effective coordination of the control units under non-ideal working conditions. In view of these problems, a multi-agent cooperative management method for an energy storage power station based on deep reinforcement learning is needed.

Disclosure of the Invention

Technical problem to be solved

Aiming at the defects of the prior art, the invention provides a multi-agent cooperative management method for an energy storage power station