CN-122025710-A - Multi-stage charging optimization method and device for flow battery based on target agent model


Abstract

The invention discloses a multi-stage charging optimization method and device for a flow battery based on a target agent model. The method comprises: obtaining battery state data of the current sampling period and the control quantity actually executed in the previous sampling period; constructing a to-be-optimized control sequence for the current sampling period; recursively calling the target agent model to process the battery state data and the to-be-optimized control sequence, and outputting a battery state prediction sequence in the prediction time domain; constructing an objective function from the battery state prediction sequence, the to-be-optimized control sequence, and the control quantity; solving the objective function, subject to the to-be-optimized control sequence and the battery state prediction sequence meeting preset constraint conditions, to obtain the optimal control sequence of the current sampling period; and extracting the first-step control quantity of the optimal control sequence as the control quantity actually executed in the next sampling period, thereby driving the flow battery to perform multi-stage charging optimization. The invention enables real-time, safe, efficient, and adaptive optimal control of the flow battery charging process.
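The loop summarized in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: `toy_model` is a hypothetical hand-written one-step predictor standing in for the trained target agent model, the state is reduced to state of charge and open-circuit voltage, and `pick_sequence` replaces the constrained solver with an exhaustive choice among a few hypothetical candidate sequences. All numeric values are invented for illustration.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float]    # (state of charge, open-circuit voltage); toy layout, an assumption
Control = Tuple[float, float]  # (charging current, electrolyte flow rate)
Model = Callable[[State, Control], State]


def toy_model(x: State, u: Control) -> State:
    """Hypothetical one-step predictor; NOT the trained network from the patent."""
    soc, _ocv = x
    current, _flow = u
    soc_next = min(1.0, soc + 0.001 * current)  # invented linear charging rule
    ocv_next = 1.2 + 0.4 * soc_next             # invented linear OCV curve
    return (soc_next, ocv_next)


def rollout(model: Model, x0: State, controls: List[Control]) -> List[State]:
    """Recursive prediction: each predicted state is the input to the next call."""
    states: List[State] = []
    x = x0
    for u in controls:
        x = model(x, u)
        states.append(x)
    return states


def pick_sequence(model: Model, x0: State,
                  candidates: List[List[Control]], soc_ref: float) -> List[Control]:
    """Stand-in solver: pick the candidate with the smallest SOC tracking cost."""
    def cost(seq: List[Control]) -> float:
        return sum((s[0] - soc_ref) ** 2 for s in rollout(model, x0, seq))
    return min(candidates, key=cost)


def control_step(model: Model, x_now: State,
                 candidates: List[List[Control]], soc_ref: float) -> Control:
    """One sampling period: optimize the whole sequence, apply only its first step."""
    return pick_sequence(model, x_now, candidates, soc_ref)[0]


if __name__ == "__main__":
    x = (0.20, 1.28)
    candidates = [[(i, 2.0)] * 3 for i in (20.0, 40.0, 60.0)]
    for k in range(3):
        u = control_step(toy_model, x, candidates, soc_ref=0.9)
        x = toy_model(x, u)  # the plant is simulated here by the same toy model
        print(k, u, round(x[0], 3))
```

Only the first element of the chosen sequence is applied before the optimization is repeated with fresh state data, which is what lets the controller correct for prediction error at every sampling period.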

Inventors

MAO HENGSHAN
LIU JUN
WANG JIE
LIU XIAOJIE
WANG XIAO
SUN DIEFEI
XIONG BINYU

Assignees

中电建新能源集团股份有限公司 (PowerChina New Energy Group Co., Ltd.)
武汉理工大学 (Wuhan University of Technology)

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-25

Claims (12)

1. A multi-stage charging optimization method for a flow battery based on a target agent model, characterized by comprising the following steps: acquiring battery state data of a current sampling period and a control quantity actually executed in a previous sampling period; constructing a to-be-optimized control sequence of the current sampling period, recursively calling a target agent model to process the battery state data and the to-be-optimized control sequence, and outputting a battery state prediction sequence in a prediction time domain, wherein the to-be-optimized control sequence consists of the to-be-optimized control quantities of all control steps in a control time domain; constructing an objective function according to the battery state prediction sequence, the to-be-optimized control sequence and the control quantity, and solving the objective function under the condition that the to-be-optimized control sequence and the battery state prediction sequence meet preset constraint conditions, to obtain an optimal control sequence of the current sampling period; and extracting a first-step control quantity of the optimal control sequence as the control quantity actually executed in the next sampling period, so as to drive the flow battery to carry out multi-stage charging optimization.
2. The method of claim 1, wherein the battery state data includes at least a pentavalent vanadium ion concentration in a cell stack, a pentavalent vanadium ion concentration in a liquid storage tank, a state of charge, a cell stack voltage, and an open circuit voltage, the control amount includes at least a charging current and an electrolyte flow rate, the battery state prediction sequence includes at least a pentavalent vanadium ion concentration prediction value in a cell stack, a pentavalent vanadium ion concentration prediction value in a liquid storage tank, a state of charge prediction value, a cell stack voltage prediction value, and an open circuit voltage prediction value for each prediction step in a prediction time domain, and the control amount to be optimized includes at least a charging current to be optimized and an electrolyte flow rate to be optimized.
3. The method of claim 1, wherein the target agent model is trained in the following manner: collecting historical operation data of the flow battery during a multi-stage charging process comprising constant-current, constant-power and constant-voltage stages, wherein the historical operation data comprise, at different sampling moments, the charging current, the electrolyte flow rate, the pentavalent vanadium ion concentration in the cell stack, the pentavalent vanadium ion concentration in the liquid storage tank, the state of charge, the cell stack voltage and the open circuit voltage; taking the historical operation data of the previous sampling moment as input features, taking the pentavalent vanadium ion concentration in the cell stack, the pentavalent vanadium ion concentration in the liquid storage tank, the state of charge, the cell stack voltage and the open circuit voltage of the next sampling moment as prediction targets, and training a neural network model; and obtaining the target agent model when the error between the predicted value output by the neural network model and the prediction target is smaller than a preset error threshold, wherein the predicted value is obtained by the neural network model processing the input features.
4. The method of claim 1, wherein recursively calling the target agent model to process the battery state data and the to-be-optimized control sequence comprises: taking the battery state data of the current sampling period as an initial input state, inputting the first-step control quantity in the to-be-optimized control sequence and the initial input state into the target agent model, and outputting the battery state data of the first prediction step; inputting the battery state data of the first prediction step and the second-step control quantity in the to-be-optimized control sequence into the target agent model, and outputting the battery state data of the second prediction step; and repeating in sequence until the battery state data of all prediction steps in the prediction time domain have been output, so as to form the battery state prediction sequence in the prediction time domain.
5. The method of claim 1, wherein said constructing an objective function from said battery state prediction sequence, said control sequence to be optimized, and said control quantity comprises: determining the sum of squares of deviations between the open-circuit voltage predicted values and the open-circuit voltage reference values over all prediction steps in the prediction time domain, and the sum of squares of deviations between the state-of-charge predicted values and the state-of-charge reference value over all prediction steps in the prediction time domain; determining the sum of squares of the control quantity variation of each control step in the control time domain according to the control sequence to be optimized and the control quantity; and constructing the objective function according to the sum of squares of deviations between the open-circuit voltage predicted values and the open-circuit voltage reference values, the sum of squares of deviations between the state-of-charge predicted values and the state-of-charge reference value, and the sum of squares of the control quantity variation.
6. The method of claim 5, wherein constructing the objective function based on the sum of squares of deviations of the open-circuit voltage predicted value and the open-circuit voltage reference value, the sum of squares of deviations of the state-of-charge predicted value and the state-of-charge reference value, and the sum of squares of the control quantity variation comprises: constructing the objective function according to the following formula: $J = w_1 \sum_{n=1}^{N_P} \left(U_{bocv,n} - U_{bocv,n}^{ref}\right)^2 + w_2 \sum_{n=1}^{N_P} \left(SOC_n - SOC_{ref}\right)^2 + w_3 \sum_{m=1}^{N_c} \lVert \Delta u_m \rVert^2$, where $U_{bocv,n}$ is the open-circuit voltage predicted value of the n-th prediction step; $U_{bocv,n}^{ref}$ is the open-circuit voltage reference value of the n-th prediction step; $SOC_n$ is the state-of-charge predicted value of the n-th prediction step; $SOC_{ref}$ is the state-of-charge reference value; $w_1$, $w_2$ and $w_3$ are weight coefficients; $\Delta u_m$ is the control quantity variation of the m-th control step; $N_P$ is the number of steps in the prediction time domain; $N_c$ is the number of steps in the control time domain; the first summation is the sum of squares of deviations between the open-circuit voltage predicted values and the open-circuit voltage reference values; the second summation is the sum of squares of deviations between the state-of-charge predicted values and the state-of-charge reference value; and the third summation is the sum of squares of the control quantity variation.
7. The method of claim 1, wherein the preset constraint conditions at least comprise a state constraint and a control input constraint, the state constraint at least comprises that a stack voltage predicted value is within a preset stack voltage safety upper and lower limit range, a pentavalent vanadium ion concentration predicted value in a stack is within a preset stack pentavalent vanadium ion concentration safety upper and lower limit range, a pentavalent vanadium ion concentration predicted value in a liquid storage tank is within a preset liquid storage tank safety upper and lower limit range, and the control input constraint at least comprises that a charging current to be optimized is within a preset charging current upper and lower limit range, and an electrolyte flow to be optimized is within a preset electrolyte flow upper and lower limit range.
8. The method of claim 1, wherein said solving said objective function comprises: solving the objective function using a nonlinear programming algorithm, wherein the nonlinear programming algorithm iteratively searches for the to-be-optimized control sequence that satisfies the preset constraint conditions and minimizes the objective function value, and that control sequence is taken as the optimal control sequence.
9. A multi-stage charging optimization device for a flow battery based on a target agent model, characterized by comprising: an acquisition module, used for acquiring battery state data of a current sampling period and a control quantity actually executed in a previous sampling period; a model processing module, used for constructing a to-be-optimized control sequence of the current sampling period, recursively calling a target agent model to process the battery state data and the to-be-optimized control sequence, and outputting a battery state prediction sequence in a prediction time domain, wherein the to-be-optimized control sequence consists of the to-be-optimized control quantities of all control steps in a control time domain; a function solving module, used for constructing an objective function according to the battery state prediction sequence, the to-be-optimized control sequence and the control quantity, and solving the objective function under the condition that the to-be-optimized control sequence and the battery state prediction sequence meet preset constraint conditions, to obtain an optimal control sequence of the current sampling period; and an extraction optimization module, used for extracting the first-step control quantity of the optimal control sequence as the control quantity actually executed in the next sampling period, so as to drive the flow battery to carry out multi-stage charging optimization.
10. An electronic device, comprising: a memory and a processor in communication with each other, the memory having stored therein computer instructions which, upon execution, cause the processor to perform the steps of the method of any of claims 1 to 8.
11. A computer storage medium storing computer program instructions which, when executed, implement the steps of the method of any one of claims 1 to 8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
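
As an illustration of the objective function and constraint checks recited in claims 5 to 7, the following Python sketch evaluates a weighted sum of the three sum-of-squares terms and tests upper and lower limits. It is a hedged example, not the claimed implementation: the weight names `w1`, `w2`, `w3`, the unscaled squared difference over both control components, the use of the previously executed control as the baseline for the first control step, and all numeric values are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Control = Tuple[float, float]  # (charging current, electrolyte flow rate)


def objective(ocv_pred: Sequence[float], ocv_ref: Sequence[float],
              soc_pred: Sequence[float], soc_ref: float,
              controls: Sequence[Control], u_prev: Control,
              weights: Tuple[float, float, float] = (1.0, 1.0, 0.1)) -> float:
    """Weighted sum of OCV tracking, SOC tracking, and control-variation terms."""
    w1, w2, w3 = weights  # hypothetical weight coefficients
    ocv_term = sum((p - r) ** 2 for p, r in zip(ocv_pred, ocv_ref))
    soc_term = sum((s - soc_ref) ** 2 for s in soc_pred)
    du_term = 0.0
    prev = u_prev  # assumption: step 1 is compared with the last executed control
    for u in controls:
        du_term += sum((a - b) ** 2 for a, b in zip(u, prev))
        prev = u
    return w1 * ocv_term + w2 * soc_term + w3 * du_term


def feasible(trajectories: Dict[str, List[float]],
             bounds: Dict[str, Tuple[float, float]]) -> bool:
    """True if every value of every bounded quantity lies within [lower, upper]."""
    return all(lo <= v <= hi
               for name, (lo, hi) in bounds.items()
               for v in trajectories[name])


if __name__ == "__main__":
    controls = [(10.0, 2.0), (12.0, 2.0)]
    cost = objective([1.5, 1.6], [1.5, 1.5], [0.5, 0.6], 0.6,
                     controls, u_prev=(10.0, 2.0), weights=(1.0, 1.0, 1.0))
    ok = feasible(
        {"stack_voltage": [1.40, 1.50], "current": [u[0] for u in controls]},
        {"stack_voltage": (1.0, 1.6), "current": (0.0, 60.0)},  # invented limits
    )
    print(round(cost, 4), ok)
```

A nonlinear programming solver as in claim 8 would minimize `objective` over the control sequence while keeping `feasible` true; the solver itself is omitted here.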

Description

Multi-stage charging optimization method and device for flow battery based on target agent model

Technical Field

The invention relates to the technical field of flow battery charging control, in particular to a multi-stage charging optimization method and device for a flow battery based on a target agent model.

Background

The all-vanadium redox flow battery (VRFB, hereinafter "flow battery") has become one of the core technologies for large-scale electrochemical energy storage thanks to its independently designable capacity and power, high operational safety, and long cycle life. It is widely used in renewable energy integration, grid peak shaving, frequency regulation, and distributed energy systems. In practical applications of all-vanadium redox flow batteries, the charge control strategy significantly affects energy efficiency, electrolyte utilization, and system life.

On the one hand, existing charge optimization for all-vanadium redox flow batteries generally relies on a high-fidelity electrochemical mechanism model. Such a model can accurately describe electrolyte transport, ion migration, and stack reactions, but its computational complexity is extremely high and it is difficult to run in real time in a battery management system, so the optimization results cannot be applied directly to online control. On the other hand, existing multi-stage charging methods are mostly based on empirical stage division or fixed-threshold adjustment. They lack the ability to predict the future evolution of the battery state and often cannot achieve adaptive control under complex operating conditions. As a result, when electrolyte concentrations are imbalanced or power fluctuates, these conventional methods are prone to reduced charging efficiency and even to the risk of voltage limit violations.
Furthermore, although some studies have introduced data-driven methods, most remain limited to offline prediction or state estimation of battery performance metrics. They lack deep coupling with an optimal control framework and cannot directly produce an executable charging strategy, which limits their practical value. In summary, the prior art has the following problems: the mechanism model is computationally expensive and difficult to optimize in real time; empirical staging strategies lack predictive capability, safety, and adaptability; and data-driven methods are disconnected from optimal control, so no closed loop can be formed. No effective solution to these problems has yet been proposed.

Disclosure of Invention

The embodiments of this specification provide a multi-stage charging optimization method and device for a flow battery based on a target agent model, to address the shortcomings of existing charging optimization methods for all-vanadium flow batteries in terms of real-time performance, safety, adaptability, and control integration.
In a first aspect, embodiments of this specification provide a multi-stage charging optimization method for a flow battery based on a target agent model, including: acquiring battery state data of a current sampling period and a control quantity actually executed in a previous sampling period; constructing a to-be-optimized control sequence of the current sampling period, recursively calling a target agent model to process the battery state data and the to-be-optimized control sequence, and outputting a battery state prediction sequence in a prediction time domain, wherein the to-be-optimized control sequence consists of the to-be-optimized control quantities of all control steps in a control time domain; constructing an objective function according to the battery state prediction sequence, the to-be-optimized control sequence and the control quantity, and solving the objective function under the condition that the to-be-optimized control sequence and the battery state prediction sequence meet preset constraint conditions, to obtain an optimal control sequence of the current sampling period; and extracting a first-step control quantity of the optimal control sequence as the control quantity actually executed in the next sampling period, so as to drive the flow battery to carry out multi-stage charging optimization. In some embodiments, the battery state data at least comprises a pentavalent vanadium ion concentration in the cell stack, a pentavalent vanadium ion concentration in the liquid storage tank, a state of charge, a cell stack voltage and an open circuit voltage, the control quantity at least comprises a charging current and an electrolyte flow rate, the battery state prediction sequence at least comprises a pentavalent vanadium ion concentration predicted value in the cell stack, a pentavalent vanadium ion concentration predicted value in the liquid storage tank, a state-of-charge predicted value, a cell stack voltage predicte