CN-122001214-A - DC-DC converter control method based on intelligent reinforcement learning algorithm

CN122001214A

Abstract

The invention relates to a DC-DC converter control method based on an agent reinforcement learning algorithm, aimed at a DC-DC buck converter supplying power to a constant power load. First, an agent is built: the agent obtains a state observation composed of preset environment variable values of the DC-DC buck converter, together with a reward value corresponding to the difference between the output voltage and a preset reference voltage, and through the agent's processing outputs a corresponding action signal for controlling the DC-DC buck converter. The agent is then trained by reinforcement learning, and the trained model controls the DC-DC buck converter, so that the output voltage of the DC-DC buck converter has good adaptability and dynamic response characteristics under different operating conditions. The designed agent exhibits good steady-state performance in application, can handle the model parameter uncertainty caused by component aging, and ensures stability in practical use.
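The closed loop the abstract describes, state observation in, action signal out, reward from the voltage error, can be sketched in miniature as follows. The state layout, reward shape, toy converter response, and all numeric values here are illustrative assumptions, not taken from the patent.

```python
# Hedged sketch of the agent/converter control loop from the abstract: the
# agent observes converter variables, emits an action that the PWM stage turns
# into a duty ratio, and receives a reward based on the voltage error.
# The toy "converter" model and the reward shape are illustrative assumptions.
import math

V_REF = 12.0  # preset reference voltage (illustrative value)

def reward(v_out, v_ref=V_REF):
    """One plausible reward of the voltage error (exponential, base e)."""
    return math.exp(-abs(v_out - v_ref))

def pwm_duty(action):
    """Clamp the raw action signal into a valid duty ratio in [0, 1]."""
    return min(1.0, max(0.0, action))

def converter_step(v_out, duty, E=24.0, gain=0.1):
    """Toy first-order stand-in for the buck converter's voltage response."""
    return v_out + gain * (duty * E - v_out)

# One control step: perfect tracking (v_out == V_REF) yields the maximum reward.
v = converter_step(V_REF, pwm_duty(0.5))
r = reward(v)
```

Note that the real reward function is only characterized in claim 6 as an exponential function with base e; the exact exponent used here is a stand-in.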

Inventors

  • ZHANG XUEYONG
  • ZHANG CHAOHONG
  • LIANG CHUANCHUAN
  • XU SHEN
  • SUN WEIFENG

Assignees

  • SOUTHEAST UNIVERSITY (东南大学)

Dates

Publication Date
2026-05-08
Application Date
2025-12-02

Claims (9)

  1. A DC-DC converter control method based on an agent reinforcement learning algorithm, characterized in that the following steps are executed for a DC-DC buck converter supplying power to a constant power load: Step I, build an agent, wherein the agent obtains a state observation composed of preset environment variable values of the DC-DC buck converter and a reward value obtained by applying a reward function to the difference between the output voltage of the DC-DC buck converter and a preset reference voltage; the agent processes the obtained state observation and reward value, outputs a corresponding action signal, and outputs a corresponding duty-ratio control signal to the DC-DC buck converter through a PWM module for control; and Step II, perform reinforcement learning training on the agent based on the operation of the DC-DC buck converter to obtain a trained model, and control the operation of the DC-DC buck converter with the trained model.
  2. The method for controlling a DC-DC converter based on an agent reinforcement learning algorithm according to claim 1, wherein the agent comprises a Critic network and an Actor network; the Critic network comprises a Critic optimizer, a Critic evaluation network, and a Critic target network; the Actor network comprises an Actor optimizer, an Actor evaluation network, and an Actor target network; and the reinforcement learning training of step II is performed according to the following steps:
     Step A. Initialize the parameters of the Actor network and the Critic network and an experience replay pool; the agent stores in the experience replay pool the state observation s_t composed of the preset environment variable values of the DC-DC buck converter at time t, the action signal a_t at time t, and the reward r_t obtained by applying the reward function to the difference between the output voltage at time t and the preset reference voltage, together with the state observation s_{t+1} composed of the preset environment variable values at time t+1; when the amount of data in the experience replay pool reaches a preset threshold, a batch of preset size is randomly sampled from the pool and the following steps B to E are executed repeatedly to update the Actor network and the Critic network.
     Step B. The Actor evaluation network computes the action signal a_t = μ(s_t | θ^μ) for time t and sends it to the Critic evaluation network, while the action signal a_t is output through the PWM module as the corresponding duty-ratio control signal that controls the DC-DC buck converter; at the same time, the Actor target network computes the action signal a′_{t+1} = μ′(s_{t+1} | θ^{μ′}) for time t+1 and sends it to the Critic target network, where θ^μ denotes the set of parameters to be trained in the Actor evaluation network, μ denotes the Actor evaluation network function, θ^{μ′} denotes the set of parameters to be trained in the Actor target network, and μ′ denotes the Actor target network function; then proceed to step C.
     Step C. The Critic target network computes, for the state observation s_{t+1} and the action signal a′_{t+1}, the corresponding evaluation Q′(s_{t+1}, a′_{t+1} | θ^{Q′}), and then obtains the target value y_t = r_t + γ·Q′(s_{t+1}, a′_{t+1} | θ^{Q′}), i.e. the evaluation at time t of the action signal at time t+1, and sends it to the Critic optimizer, where γ denotes the discount factor; at the same time, the Critic evaluation network computes, for the state observation s_t and the action signal a_t, the corresponding evaluation Q(s_t, a_t | θ^Q) and sends it to the Critic optimizer and the Actor evaluation network, where θ^Q denotes the set of parameters to be trained in the Critic evaluation network and Q denotes the Critic evaluation network function; then proceed to step D.
     Step D. The Critic optimizer computes the loss function result L = E[(y_t − Q(s_t, a_t | θ^Q))²], where E[·] denotes the expectation, and minimizes it to obtain the parameter-update gradient for the set of parameters θ^Q to be trained in the Critic evaluation network; the gradient is returned to the Critic evaluation network, which updates θ^Q accordingly; at the same time, the Critic evaluation network sends the updated θ^Q to the Critic target network, which updates its set of parameters to be trained by the soft update θ^{Q′} ← τ·θ^Q + (1 − τ)·θ^{Q′}, where τ denotes the preset update amplitude; then proceed to step E.
     Step E. The Actor evaluation network forwards the received evaluation result Q(s_t, a_t | θ^Q) to the Actor optimizer, which computes the sampled policy gradient ∇_{θ^μ} J = E[∇_a Q(s_t, a | θ^Q)|_{a = μ(s_t)} · ∇_{θ^μ} μ(s_t | θ^μ)] and returns it to the Actor evaluation network; the Actor evaluation network updates θ^μ according to this gradient and sends the updated θ^μ to the Actor target network, which updates its parameters by θ^{μ′} ← τ·θ^μ + (1 − τ)·θ^{μ′}, where ∇ denotes the gradient operator, ∇_{θ^μ} μ denotes the gradient of μ with respect to the set of parameters θ^μ to be trained in the Actor evaluation network, and ∇_a Q denotes the gradient of Q with respect to the action signal.
  3. The method for controlling a DC-DC converter based on an agent reinforcement learning algorithm according to claim 2, wherein in step II, during the reinforcement learning training of the agent, transition samples (s_t, a_t, r_t, s_{t+1}) are stored in the experience replay pool and sampled in batches of a preset size for the reinforcement learning training.
  4. The method for controlling a DC-DC converter based on an agent reinforcement learning algorithm according to claim 2, wherein in step II, after the trained model is obtained through the reinforcement learning training of the agent, the operation of the DC-DC buck converter is controlled by the Actor evaluation network in the trained model: the Actor evaluation network obtains the state observation s_t composed of the preset environment variable values of the DC-DC buck converter at time t, computes the corresponding action signal a_t = μ(s_t | θ^μ), and outputs the corresponding duty-ratio control signal to the DC-DC buck converter through the PWM module.
  5. The method for controlling a DC-DC converter based on an agent reinforcement learning algorithm according to any one of claims 1 to 3, wherein the preset environment variable values of the DC-DC buck converter comprise, at two adjacent times t−1 and t: the input voltage, the inductor current, the output voltage, the output current, the reference voltage, and the error between the output voltage and the reference voltage.
  6. The method for controlling a DC-DC converter based on an agent reinforcement learning algorithm according to any one of claims 1 to 3, wherein the reward value r_t is obtained by applying the reward function to the difference e_t between the output voltage of the DC-DC buck converter and the preset reference voltage, the reward function being an exponential function with the natural constant e as its base.
  7. The method for controlling a DC-DC converter based on an agent reinforcement learning algorithm according to any one of claims 1 to 3, further comprising: adding an exploration-noise update to the action signal computed by the Actor evaluation network, and then sending the noise-updated action signal to the PWM module for processing, so that the corresponding duty-ratio control signal is output to the DC-DC buck converter.
  8. The method for controlling a DC-DC converter based on an agent reinforcement learning algorithm according to any one of claims 1 to 3, wherein the DC-DC buck converter comprises a voltage source E, an NMOS tube N1, a diode D, an inductor L, a capacitor C, and a resistor R; the positive electrode of the voltage source E is connected to the source of the NMOS tube N1; the drain of the NMOS tube N1 is connected to one end of the inductor L and to the cathode of the diode D; the gate of the NMOS tube N1 receives the duty-ratio control signal output by the agent through the PWM module; the other end of the inductor L is connected to one end of the capacitor C and one end of the resistor R; the negative electrode of the voltage source E is connected to the anode of the diode D, the other end of the capacitor C, and the other end of the resistor R; and the two ends of the resistor R are connected to the two ends of the constant power load.
  9. The method for controlling a DC-DC converter based on an agent reinforcement learning algorithm according to claim 7, wherein the averaged-switch dynamic state-space model of the DC-DC buck converter is:
     L·di_L/dt = d·E − v_o
     C·dv_o/dt = i_L − v_o/R − P/v_o
     wherein L denotes the inductance value in the DC-DC buck converter, C denotes the capacitance value in the DC-DC buck converter, E denotes the input voltage in the DC-DC buck converter, v_o denotes the output voltage in the DC-DC buck converter, P denotes the constant power of the load, R denotes the resistance value of the resistor R, i_L denotes the current through the inductor L in the DC-DC buck converter, d denotes the duty ratio, di_L/dt denotes the rate of change of the inductor current with respect to time, and dv_o/dt denotes the rate of change of the output voltage with respect to time.
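The averaged buck-with-CPL model in claim 9 can be checked numerically: at the steady-state operating point the two derivatives must vanish. The sketch below assumes illustrative parameter values (E, L, C, R, P); none of these figures come from the patent.

```python
# Hedged sketch of the averaged-switch buck converter model with a constant
# power load (CPL) from claim 9. Parameter values are illustrative assumptions.

def buck_cpl_derivatives(i_L, v_o, d, E=24.0, L=1e-3, C=1e-3, R=10.0, P=10.0):
    """Right-hand side of the averaged state-space model:
         L * di_L/dt = d*E - v_o
         C * dv_o/dt = i_L - v_o/R - P/v_o
    The -P/v_o term is the CPL current, whose negative incremental impedance
    is what destabilizes the cascade."""
    di_L = (d * E - v_o) / L
    dv_o = (i_L - v_o / R - P / v_o) / C
    return di_L, dv_o

def equilibrium(d, E=24.0, R=10.0, P=10.0):
    """Steady-state operating point: v_o* = d*E, i_L* = v_o*/R + P/v_o*."""
    v = d * E
    return v / R + P / v, v
```

Plugging the equilibrium point back into the derivatives gives zero, confirming the model is self-consistent at steady state.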

Description

DC-DC converter control method based on intelligent reinforcement learning algorithm

Technical Field

The invention relates to a DC-DC converter control method based on an agent reinforcement learning algorithm, and belongs to the technical field of DC-DC buck converter control.

Background

In recent years, the use of DC micro-grids has expanded in many industrial applications, since they offer more advantages than AC micro-grids. However, because a constant power load (CPL) exhibits a negative incremental impedance characteristic, cascading it with a DC-DC converter reduces power supply reliability. To mitigate the destabilizing effect of the CPL, the prior art has proposed advanced control methods such as proportional-integral-derivative (PID) control, sliding-mode control, and model predictive control. These model-driven control methods have a certain adaptability to changes in operating conditions, but may exhibit slow dynamic response and have difficulty adapting to different circuit element parameters. More recently, intelligent control methods for DC-DC converters, such as genetic algorithms and fuzzy neural networks, have become a trend for coping with uncertain or partially uncertain dynamics; however, these methods have limited learning and generalization ability.

Disclosure of Invention

The invention aims to provide a DC-DC converter control method based on an agent reinforcement learning algorithm, so that the output voltage has good adaptability and dynamic response characteristics under different operating conditions.
The invention designs a DC-DC converter control method based on an intelligent reinforcement learning algorithm, comprising the following steps. Step I: build an agent; the agent obtains a state observation composed of preset environment variable values of the DC-DC buck converter and a reward value obtained by applying a reward function to the difference between the output voltage of the DC-DC buck converter and a preset reference voltage; the agent processes the obtained state observation and reward value, outputs a corresponding action signal, and outputs a corresponding duty-ratio control signal to the DC-DC buck converter through a PWM module for control. Step II: perform reinforcement learning training on the agent based on the operation of the DC-DC buck converter to obtain a trained model, and control the operation of the DC-DC buck converter with the trained model. As a preferred technical scheme of the invention, the agent comprises a Critic network and an Actor network; the Critic network comprises a Critic optimizer, a Critic evaluation network, and a Critic target network; the Actor network comprises an Actor optimizer, an Actor evaluation network, and an Actor target network; and the reinforcement learning training of step II is performed according to the following steps.
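The evaluation/target split described above is typically maintained with a soft (Polyak) update, so the target networks track the evaluation networks slowly. A minimal sketch, using plain Python lists as stand-in parameter sets and an assumed update amplitude tau:

```python
# Hedged sketch of the four-network Actor-Critic layout (an evaluation and a
# target network for each of Actor and Critic). Parameter vectors are plain
# lists; the values and tau are illustrative assumptions.

def soft_update(target_params, eval_params, tau):
    """Target-network soft update: theta' <- tau*theta + (1 - tau)*theta'."""
    return [tau * p + (1.0 - tau) * tp
            for p, tp in zip(eval_params, target_params)]

# Initialize target networks as copies of the evaluation networks.
actor_eval = [0.5, -0.3]
actor_target = list(actor_eval)
critic_eval = [1.0, 2.0]
critic_target = list(critic_eval)

# After one training step the evaluation parameters change...
actor_eval = [0.6, -0.2]
# ...and the target network tracks them slowly (tau = 0.01 here).
actor_target = soft_update(actor_target, actor_eval, tau=0.01)
```

With a small tau the target parameters move only a fraction of the way toward the evaluation parameters per step, which stabilizes the bootstrapped targets used by the Critic.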
Step A: initialize the parameters of the Actor network and the Critic network and an experience replay pool; the agent stores in the experience replay pool the state observation s_t composed of the preset environment variable values of the DC-DC buck converter at time t, the action signal a_t at time t, and the reward r_t obtained by applying the reward function to the difference between the output voltage at time t and the preset reference voltage, together with the state observation s_{t+1} composed of the preset environment variable values at time t+1; when the amount of data in the experience replay pool reaches a preset threshold, a batch of preset size is randomly sampled from the pool and the following steps B to E are executed repeatedly to update the Actor network and the Critic network. Step B: the Actor evaluation network computes the action signal a_t = μ(s_t | θ^μ) for time t and sends it to the Critic evaluation network, while the action signal a_t is output through the PWM module as the corresponding duty-ratio control signal that controls the DC-DC buck converter; at the same time, the Actor target network computes the action signal a′_{t+1} = μ′(s_{t+1} | θ^{μ′}) for time t+1 and sends it to the Critic target network, where θ^μ denotes the set of parameters to be trained in the Actor evaluation network, μ denotes the Actor evaluation network function, θ^{μ′} denotes the set of parameters to be trained in the Actor target network, and μ′ denotes the Actor target network function; then proceed to step C. Step C: the Critic target network computes, for the state observation s_{t+1} and the action signal a′_{t+1}, the corresponding evaluation, from which the evaluation at time t of the action signal at time t+1 is obtained.
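The Critic-side arithmetic of steps C and D, building the bootstrapped target y_t = r_t + γ·Q′(s_{t+1}, a′_{t+1}) and scoring the evaluation network with a squared error, can be sketched on toy numbers. The Q-values, rewards, and gamma below are illustrative assumptions, not values from the patent.

```python
# Hedged numeric sketch of the Critic update: target values and squared-error
# loss over a sampled batch of transitions. All numbers are illustrative.

def critic_targets(rewards, next_q_values, gamma=0.99):
    """y_t = r_t + gamma * Q'(s_{t+1}, a'_{t+1}) for each sampled transition."""
    return [r + gamma * q_next for r, q_next in zip(rewards, next_q_values)]

def critic_loss(q_values, targets):
    """Mean squared error between Q(s_t, a_t) and the targets y_t."""
    return sum((y - q) ** 2 for q, y in zip(q_values, targets)) / len(q_values)

rewards = [1.0, 0.5]   # r_t from the reward function
next_q = [2.0, 1.0]    # Q'(s_{t+1}, a'_{t+1}) from the Critic target network
q_now = [2.5, 1.5]     # Q(s_t, a_t) from the Critic evaluation network

y = critic_targets(rewards, next_q)
loss = critic_loss(q_now, y)
```

Minimizing this loss with respect to the Critic evaluation network's parameters yields the gradient that step D feeds back for the parameter update.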