CN-121989906-A - Mode switching self-learning system and method for hybrid electric vehicle


Abstract

The invention provides a mode switching self-learning system and method for a hybrid electric vehicle, and relates to the field of hybrid electric vehicle power control. The system comprises an engine work optimizing module, a basic mode switching control module and a self-learning compensation control module. An optimal engine starting and pre-working rotating speed curve is obtained through optimization with an online dynamic programming algorithm; the optimal basic output torque of the double motors is obtained by combining PI closed-loop control with a particle swarm optimization algorithm; the double-motor compensation torque at time t is generated based on the SAC (soft actor-critic) reinforcement learning algorithm; and the compensation torque and the basic torque are combined linearly to form the actual output torque that completes the mode switching. The invention improves the smoothness of mode switching, reduces energy consumption, outputs accurate compensation torque under multiple working conditions, enhances the working-condition adaptability and robustness of mode switching control, and meets the real-time control requirement of the whole vehicle.
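As a rough illustration of the final step described above, the sketch below forms a basic torque with a discrete PI law and adds a compensation torque to it. The class `PIController`, the function `actual_torque`, the unit weights in the linear combination and all numeric values are assumptions made for illustration; the abstract does not give them.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete proportional-integral controller (hypothetical helper)."""
    kp: float
    ki: float
    integral: float = 0.0

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral of the speed error, then form the PI output.
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral


def actual_torque(base_torque: float, compensation_torque: float) -> float:
    # Linear combination of basic and compensation torque.
    # Unit weights are an assumption; the source only says "combined linearly".
    return base_torque + compensation_torque


pi = PIController(kp=2.0, ki=0.5)           # illustrative gains
base = pi.step(error=10.0, dt=0.01)         # basic torque from the speed error
total = actual_torque(base, compensation_torque=-1.5)
print(base, total)
```

In this sketch the compensation torque is a fixed number; in the described system it would come from the trained reinforcement learning policy.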

Inventors

ZHANG HONGDANG
HUANG YUHAN
SHI YAO
WANG JIAMING
SU FEI
RONG XIANGWEI
YANG HONGTU

Assignees

Changzhou Vocational Institute of Mechatronic Technology (常州机电职业技术学院)

Dates

Publication Date: 20260508
Application Date: 20260324

Claims (10)

1. A mode switching self-learning system for a hybrid electric vehicle, characterized in that the hybrid electric vehicle matched with the system is a power-split hybrid electric vehicle whose transmission structure comprises a double planetary gear mechanism, a torsional damper (1), a locking clutch (2), an engine (3), a first motor (4), a second motor (5) and an output shaft (6); the double planetary gear mechanism comprises a first planetary gear (7) and a second planetary gear (8), each composed of a sun gear (70), a planet carrier (71) and a gear ring (72); the engine (3) is connected with the planet carrier (71) of the first planetary gear (7) through the torsional damper (1); the gear ring (72) of the second planetary gear (8) is connected with the output shaft (6); the first motor (4) and the second motor (5) are connected with the sun gear (70) of the first planetary gear (7) and the sun gear (70) of the second planetary gear (8), respectively; the gear ring (72) of the first planetary gear (7) is connected with the gear ring (72) of the second planetary gear (8); the planet carrier (71) of the second planetary gear (8) is locked to the frame through the locking clutch (2); the self-learning system comprises an engine work optimizing module, a basic mode switching control module and a self-learning compensation control module; the engine work optimizing module takes the mode switching time, the power source energy consumption, the engine speed tracking error and the engine terminal speed tracking error as four optimization targets, and obtains an optimal engine starting and pre-working rotating speed curve based on an online dynamic programming algorithm; the basic mode switching control module takes the optimal starting and pre-working rotating speed curve output by the engine work optimizing module as the rotating speed tracking reference, and combines proportional-integral closed-loop control with a particle swarm parameter optimization algorithm to determine the optimal basic output torque of the double motors adapted to the mode switching process; the self-learning compensation control module generates a double-motor compensation torque for different acceleration working conditions based on a soft actor-critic reinforcement learning algorithm; the double-motor compensation torque and the double-motor optimal basic output torque are combined linearly to obtain the double-motor actual output torque; and the double-motor actual output torque is used as the motor torque command for torque coordination control of the hybrid electric vehicle to complete the mode switching.
2. The hybrid electric vehicle mode switching self-learning system according to claim 1, wherein, in the engine work optimizing module, a dynamic differential equation of the mode switching process of the hybrid electric vehicle is established by taking as state variables the rotational speed of the engine (3), the rotational speed of the planet carrier (71) of the first planetary gear (7), the rotation angle error between the engine (3) and the first planetary gear (7), the rotational speed of the gear ring (72) of the second planetary gear (8), the rotational speed of the output shaft (6), and the rotation angle error between the second planetary gear (8) and the output shaft (6); taking the output torques of the first motor (4) and the second motor (5) as the system control input variables; and taking the rotational speed of the output shaft (6) and the vehicle speed as output variables: $\dot{x} = A x + B u + C w$; $x = [\omega_e,\ \omega_{c1},\ \theta_{e,c1},\ \omega_{r2},\ \omega_{out},\ \theta_{r2,out}]^{T}$; $u = [T_{m1},\ T_{m2}]^{T}$; $w = [T_e,\ T_L]^{T}$; where $x$ is the state variable of the mode switching process; $\omega_e$ represents the engine speed; $\omega_{c1}$ represents the planet carrier speed of the first planetary gear; $\theta_{e,c1}$ represents the rotation angle error between the engine and the first planetary gear; $\omega_{r2}$ represents the gear ring speed of the second planetary gear; $\omega_{out}$ represents the rotational speed of the output shaft; $\theta_{r2,out}$ represents the rotation angle error between the second planetary gear and the output shaft; $u$ is the motor output torque; $T_{m1}$ and $T_{m2}$ represent the output torque of the first motor and the output torque of the second motor, respectively; $w$ is the known input disturbance of the system; $T_e$ and $T_L$ represent the output torque of the engine and the load at the output shaft end, respectively; and $A$, $B$ and $C$ are the coefficient matrices of the state variables, the control input variables and the known input disturbance, respectively.
3. The hybrid electric vehicle mode switching self-learning system according to claim 2, wherein the engine work optimizing module, according to the dynamic differential equation of the mode switching process, takes a weighted function of the mode switching time, the power source energy consumption, the engine speed tracking error and the engine terminal speed tracking error as the optimization objective function, takes the engine starting and pre-working curve as the quantity to be optimized, and obtains the optimal engine starting and pre-working curve through online optimization; the mathematical expression of the online optimization model is: $\min J = \lambda_1 J_t + \lambda_2 J_E + \lambda_3 J_\omega + \lambda_4 J_f$, subject to the dynamic differential equation of the mode switching process; where min represents minimization; $J$ represents the total optimization objective function of the engine work optimizing module; $J_t$ represents the mode switching time function; $J_E$ represents the power source energy consumption function; $J_\omega$ represents the engine speed tracking error function; $J_f$ represents the engine terminal speed tracking error function; $t$ represents time; $\lambda_1$, $\lambda_2$, $\lambda_3$ and $\lambda_4$ represent the weight coefficients of the mode switching time function, the power source energy consumption function, the engine speed tracking error function and the engine terminal speed tracking error function, respectively; $u$ represents the motor output torque; $\omega_e^{*}$ and $\omega_e$ represent the optimal engine starting and pre-working target rotating speed and the actual rotating speed, respectively; and $e_f$ represents the engine terminal speed tracking error; the engine work optimizing module solves the optimization model iteratively with an online dynamic programming algorithm to obtain the optimal rotating speed curve of the engine starting stage and the pre-working stage in the mode switching process, wherein the solving process takes the dynamic differential equation of the mode switching process of the hybrid electric vehicle as a constraint condition, thereby combining the dynamic characteristics with the optimization targets and realizing the optimal planning of the engine rotating speed curve.
4. The hybrid electric vehicle mode switching self-learning system of claim 3, wherein the basic mode switching control module is configured to: combine the output shaft optimal rotating speed curve converted from the target vehicle speed with the optimal engine starting and pre-working rotating speed curve output by the engine work optimizing module to obtain a multi-dimensional total rotating speed tracking target $r$; calculate the deviation $e$ between the total rotating speed tracking target and the state variable of the mode switching process; and take the deviation $e$ as the input state variable of the proportional-integral control and calculate the double-motor output torque based on the proportional-integral control, the calculation formulas being: $r = [\omega_e^{*},\ \omega_{out}^{*}]^{T}$; $e = [\omega_e^{*} - \omega_e,\ \omega_{out}^{*} - \omega_{out}]^{T}$; $u = K_p e + K_i \int_0^{t} e \, d\tau$; where $\omega_e^{*}$ represents the optimal engine starting and pre-working target rotating speed; $\omega_{out}^{*}$ represents the target rotating speed of the output shaft, calculated from the target vehicle speed; $\omega_{out}$ represents the real-time rotating speed of the output shaft and $\omega_e$ represents the engine speed, both being components of the state variable $x$ of the mode switching process; $K_p$ is the proportional coefficient, used to suppress the rotating speed deviation at the current moment; $K_i$ is the integral coefficient, used to eliminate the steady-state rotating speed error over the whole mode switching process; and $u$ is the motor output torque.
5. The hybrid electric vehicle mode switching self-learning system of claim 4, wherein the basic mode switching control module takes a weighted function of the mode switching jerk, the double-motor energy and the rotating speed tracking error as the optimization objective function, takes the proportional coefficient $K_p$ and the integral coefficient $K_i$ as the optimization parameters, sets upper and lower limit constraints for the optimization parameters according to the hardware performance of the double motors and the smoothness requirement of the whole vehicle, and constructs a constrained parameter optimization model as follows: $\min J_b = \mu_1 J_j + \mu_2 J_m + \mu_3 J_s$; $K_{p,\min} \le K_p \le K_{p,\max}$, $K_{i,\min} \le K_i \le K_{i,\max}$; where min represents minimization; $J_b$ represents the optimization objective function in the basic mode switching control module; $J_j$ represents the jerk function; $J_m$ represents the motor energy function; $J_s$ represents the rotating speed tracking error function; $\mu_1$, $\mu_2$ and $\mu_3$ represent the weight coefficients of the jerk function, the motor energy function and the rotating speed tracking error function, respectively; $K_p$ is the proportional coefficient; $K_i$ is the integral coefficient; $K_{p,\min}$ and $K_{p,\max}$ represent the minimum and maximum values of $K_p$, respectively; $K_{i,\min}$ and $K_{i,\max}$ represent the minimum and maximum values of $K_i$, respectively; $a$ is the acceleration of the whole vehicle; $\omega_e^{*}$ is the target engine speed; and $\omega_e$ is the actual engine speed; the basic mode switching control module iteratively optimizes $K_p$ and $K_i$ with a particle swarm optimization algorithm, obtains the optimal proportional coefficient $K_p^{*}$ and the optimal integral coefficient $K_i^{*}$, and substitutes them into the proportional-integral control calculation formula to obtain the optimal basic output torque $u_b$ of the double motors.
6. The hybrid electric vehicle mode switching self-learning system according to claim 1, wherein the self-learning compensation control module trains the motor compensation amount with a soft actor-critic reinforcement learning algorithm, taking the vehicle speed, the acceleration, the engine speed, the vehicle speed tracking error, the motor output torque, the jerk and the mode switching time as the observation space, taking the motor compensation torque as the action space, and taking a weighted function of the vehicle speed tracking error, the jerk and the mode switching time as the reward; a random acceleration curve is generated in the reset function for tracking training, so that a motor compensation torque suitable for multiple working conditions is obtained; and the motor compensation torque is combined with the optimal motor output torque obtained by the basic mode switching control module to form the actual motor output torque.
7. The hybrid electric vehicle mode switching self-learning system of claim 6, wherein the observation space is the set of state parameters at time t, $s_t = \{v_t,\ \dot{v}_t,\ \omega_{e,t},\ e_{v,t},\ T_{m,t},\ j_t,\ \tau_t\}$, where $v_t$ is the speed of the whole vehicle, $\dot{v}_t$ is the acceleration of the whole vehicle, $\omega_{e,t}$ is the engine speed, $e_{v,t}$ is the vehicle speed tracking error, $T_{m,t}$ is the motor output torque, $j_t$ is the mode switching jerk, and $\tau_t$ is the elapsed mode switching time; the action space is the motor compensation torque at time t, $a_t = \Delta T_{m,t}$; and the reward is the reward function at time t, $r_t$, a weighted function of the vehicle speed tracking error, the jerk and the mode switching time that characterizes how well the current motor compensation amount suits the working condition, $r_t$ being positively correlated with the mode switching control effect.
8. The hybrid electric vehicle mode switching self-learning system of claim 7, wherein the training process of the soft actor-critic reinforcement learning algorithm comprises the steps of initialization, environment interaction, experience storage, sample sampling, target Q value calculation, network updating, target network soft updating and iterative convergence, specifically: S1, initialization: initializing the policy network $\pi_\phi$, two critic networks $Q_{\theta_1}$ and $Q_{\theta_2}$, two target critic networks $Q_{\bar{\theta}_1}$ and $Q_{\bar{\theta}_2}$, the entropy temperature coefficient $\alpha$ and the experience replay pool R; S2, environment interaction: in the whole vehicle control environment, sampling an action from the stochastic policy according to the current state $s_t$ for exploration, the action expression being $a_t = f_\phi(\epsilon_t;\ s_t)$; where the subscript t denotes time t; $\pi_\phi$ is the policy network; $\epsilon_t$ is a random noise variable; and $f_\phi$ is the action generating function based on the policy network and the random noise, which maps the observed state and the noise into a specific motor torque compensation amount; S3, experience storage: storing the experience data $(s_t,\ a_t,\ r_t,\ s_{t+1})$ obtained by environment interaction in the experience replay pool R, where $r_t$ is the value of the reward function at time t and $s_{t+1}$ is the observation state at time t+1; S4, sample sampling: randomly sampling from the experience replay pool R a batch of N experience data samples $(s_i,\ a_i,\ r_i,\ s_{i+1})$, where i is the sample number, $s_i$ is the observation state of the i-th sample, $a_i$ is the motor compensation torque of the i-th sample, $r_i$ is the value of the reward function of the i-th sample, and $s_{i+1}$ is the next-moment observation state of the i-th sample; S5, target Q value calculation: for the next-moment state $s_{i+1}$, sampling an action $a'_{i+1}$ from the current policy network and calculating the target Q value $y_i$ of the i-th sample with the entropy term: $y_i = r_i + \gamma \left( \min_{j=1,2} Q_{\bar{\theta}_j}(s_{i+1},\ a'_{i+1}) - \alpha \log \pi_\phi(a'_{i+1} \mid s_{i+1}) \right)$; where i denotes the i-th data sample; $a'_{i+1}$ is the action obtained by sampling the policy network at $s_{i+1}$; $\gamma$ is the discount factor, taking a value between 0 and 1; and $\log \pi_\phi(a'_{i+1} \mid s_{i+1})$ is the log probability of that action; S6, critic network updating: training the two critic networks with the mean square error loss, the loss function expression being $L_Q = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1,2} \left( Q_{\theta_j}(s_i,\ a_i) - y_i \right)^2$; where $L_Q$ is the mean square error of the two critic networks; $Q_{\theta_j}(s_i,\ a_i)$ is the Q value estimated under $s_i$ and $a_i$; and N is the batch size; S7, policy network updating: updating the parameters $\phi$ of the policy network $\pi_\phi$ by minimizing the policy loss, the policy loss expression being $L_\pi = \frac{1}{N} \sum_{i=1}^{N} \left( \alpha \log \pi_\phi(\tilde{a}_i \mid s_i) - \min_{j=1,2} Q_{\theta_j}(s_i,\ \tilde{a}_i) \right)$ with $\tilde{a}_i = f_\phi(\epsilon_i;\ s_i)$; where $L_\pi$ is the policy network loss; S8, target network soft updating: iteratively updating the parameters of the two target critic networks by soft updating; and S9, iterative convergence: repeating steps S2 to S8, continuously interacting with the whole vehicle control environment and updating the network parameters until the network loss stabilizes or the preset maximum number of iteration steps is reached, at which point training is judged to have converged.
9. The hybrid electric vehicle mode switching self-learning system according to claim 8, wherein the motor compensation torque suitable for multiple working conditions obtained after training convergence is converted into a look-up table for storage; the look-up table takes the real-time working condition parameters of the whole vehicle as input and outputs the matched motor compensation torque; and the motor compensation torque retrieved from the look-up table is linearly combined with the optimal basic output torque of the double motors to obtain the actual output torque of the double motors, which is used as the final motor control command in the mode switching process of the hybrid electric vehicle.
10. A mode switching self-learning method for a hybrid electric vehicle, applied to the hybrid electric vehicle mode switching self-learning system according to any one of claims 1 to 9, characterized in that the method comprises the following steps: when the engine work optimizing module receives a mode switching instruction, it takes the mode switching time, the power source energy consumption, the engine speed tracking error and the engine terminal speed tracking error as four optimization targets and solves for the optimal engine starting and pre-working rotating speed curve based on an online dynamic programming algorithm; the basic mode switching control module takes the optimal starting and pre-working rotating speed curve output by the engine work optimizing module as the rotating speed tracking reference, and combines proportional-integral closed-loop control with a particle swarm parameter optimization algorithm to determine the optimal basic output torque of the double motors adapted to the mode switching process; the self-learning compensation control module generates a double-motor compensation torque for different acceleration working conditions based on a soft actor-critic reinforcement learning algorithm; the double-motor compensation torque and the double-motor optimal basic output torque are combined linearly to obtain the double-motor actual output torque; and the double-motor actual output torque is used as the motor torque command to control the hybrid electric vehicle to complete the mode switching.
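The constrained gain search recited in claim 5 can be pictured with a small particle swarm over the two PI gains. The sketch below is only an illustration under stated assumptions: the single-inertia plant in `simulate_cost`, the cost weights, the gain bounds and the swarm coefficients (0.7, 1.5, 1.5) are all hypothetical stand-ins, not values from the patent.

```python
import numpy as np


def simulate_cost(kp, ki, dt=0.01, steps=200, inertia=0.2):
    """Toy weighted cost of PI speed tracking on one inertia (assumed plant)."""
    omega, integral, prev_acc = 0.0, 0.0, 0.0
    track, energy, jerk = 0.0, 0.0, 0.0
    for k in range(steps):
        target = min(100.0, 100.0 * k * dt)      # speed ramp, then hold
        err = target - omega
        integral += err * dt
        torque = kp * err + ki * integral         # PI control law
        acc = torque / inertia
        omega += acc * dt
        track += err ** 2 * dt                    # tracking error term
        energy += abs(torque * omega) * dt        # motor energy term
        jerk += ((acc - prev_acc) / dt) ** 2 * dt # jerk term
        prev_acc = acc
    # Weighted sum of the three terms; the weights are illustrative only.
    return 1.0 * track + 1e-3 * energy + 1e-8 * jerk


def pso(cost, lower, upper, n_particles=12, iters=30, seed=0):
    """Basic global-best particle swarm with box constraints via clipping."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    pos = rng.uniform(lower, upper, size=(n_particles, lower.size))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([cost(*p) for p in pos])
    g = int(np.argmin(pbest_cost))
    gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)    # enforce gain limits
        costs = np.array([cost(*p) for p in pos])
        improved = costs < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = costs[improved]
        g = int(np.argmin(pbest_cost))
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
    return gbest, gbest_cost


# Bounds are [Kp_min, Ki_min] and [Kp_max, Ki_max]; the numbers are assumed.
best, best_cost = pso(simulate_cost, [0.1, 0.0], [10.0, 20.0])
print(best, best_cost)
```

Clipping positions to the bounds is one simple way to honor upper and lower limits on the gains; the claims do not say how the constraints are enforced.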

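Steps S5, S6 and S8 of claim 8 (target Q value, critic loss, target network soft update) can be sketched numerically as below. The formulas are the standard soft actor-critic forms from the reinforcement learning literature and are assumed here, since the claim text does not reproduce its own expressions; the function names, the soft-update rate `tau` and the sample numbers are hypothetical.

```python
import numpy as np


def sac_targets(rewards, next_q1, next_q2, next_log_prob, alpha, gamma):
    """Soft target: y = r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    min_next_q = np.minimum(next_q1, next_q2)   # clipped double-Q estimate
    return rewards + gamma * (min_next_q - alpha * next_log_prob)


def critic_loss(q_pred, targets):
    """Mean squared error between one critic's Q estimates and the targets."""
    return float(np.mean((q_pred - targets) ** 2))


def soft_update(target_params, online_params, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return tau * online_params + (1.0 - tau) * target_params


# Two illustrative samples: rewards, target-critic outputs and log-probabilities.
rewards = np.array([1.0, 0.0])
y = sac_targets(
    rewards,
    next_q1=np.array([2.0, 3.0]),
    next_q2=np.array([1.5, 4.0]),
    next_log_prob=np.array([-1.0, -2.0]),
    alpha=0.2,
    gamma=0.9,
)
loss = critic_loss(np.array([2.0, 3.0]), y)
new_target = soft_update(np.zeros(2), np.ones(2), tau=0.1)
print(y, loss, new_target)
```

A full implementation would compute these quantities from neural networks and back-propagate the losses; plain arrays are used here only to make the arithmetic visible.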
Description

Mode switching self-learning system and method for hybrid electric vehicle

Technical Field

The invention relates to the field of hybrid electric vehicle power control, in particular to a mode switching self-learning system and method for a hybrid electric vehicle.

Background

Hybrid electric vehicle technology and pure electric vehicle technology will coexist for a long time and are strategically complementary. A hybrid electric vehicle that combines multiple power sources with coupling mechanisms such as planetary gears has the technical characteristics of both series and parallel hybrid power systems, offers excellent power performance without driving range anxiety, delivers clear energy saving and emission reduction benefits, and has become a main research configuration in the current hybrid electric vehicle field. Whole vehicle control of a hybrid electric vehicle must not only improve steady-state fuel economy and power performance through power coupling and complementary power sources, but also coordinate the transient torque precisely to ensure driving smoothness. In actual driving, depending on the driving environment and the working condition demands, the hybrid electric vehicle has to switch frequently between the pure electric mode and the hybrid power mode. The transient mode switching process has to complete operations such as engine starting and multi-power-source torque matching, which places very high demands on the control strategy.

Meanwhile, the parameters of traditional control strategies are fixed values, so it is difficult to maintain switching smoothness across different acceleration and vehicle speed working conditions. The working-condition adaptability of the mode switching control is weak, and problems such as large rotating speed deviation, vehicle jerk and hesitation easily occur, which degrades the driving experience. In recent years, comprehensive control methods that combine optimization algorithms, feedback control and intelligent algorithms have provided a new direction for mode switching control of hybrid electric vehicles. How to achieve accurate control of the mode switching process through algorithm fusion, while giving the control strategy a self-learning capability so that it adapts to a wide range of working conditions, has become a research focus for those skilled in the art.

Disclosure of Invention

The invention aims to overcome at least one technical problem in the prior art and provides a mode switching self-learning system and method for a hybrid electric vehicle.
In one aspect, an embodiment of the invention provides a mode switching self-learning system for a hybrid electric vehicle. The hybrid electric vehicle matched with the system is a power-split hybrid electric vehicle whose transmission structure comprises a double planetary gear mechanism, a torsional vibration damper, a locking clutch, an engine, a first motor, a second motor and an output shaft. The double planetary gear mechanism comprises a first planetary gear and a second planetary gear, each composed of a sun gear, a planet carrier and a gear ring. The engine is connected with the planet carrier of the first planetary gear through the torsional vibration damper; the gear ring of the second planetary gear is connected with the output shaft; the first motor and the second motor are connected with the sun gear of the first planetary gear and the sun gear of the second planetary gear, respectively; the gear ring of the first planetary gear is connected with the gear ring of the second planetary gear; and the planet carrier of the second planetary gear is locked to the frame through the locking clutch. The self-learning system comprises an engine work optimizing module, a basic mode switching control module and a self-learning compensation control module. The engine work optimizing module takes the mode switching time, the power source energy consumption, the engine speed tracking error and the engine terminal speed tracking error as four optimization targets, and obtains an optimal engine starting and pre-working rotating speed curve based on an online dynamic programming algorithm. The basic mode switching control module takes the optimal starting and pre-working rotating speed curve output by the engine work optimizing module as the rotating speed tracking reference, and combines proportional-integral closed-loop control with a particle swarm parameter optimization algorithm to determine the optimal basic output torque of the double motors adapted to the mode switching process. The self-learning compensation control module generates a double-motor compensation torque for different acceleration working conditions based on a soft actor-critic reinforcement learning algorithm; the double-motor compensation torque and the double-motor optimal basic output torque are combined linearly to obtain the double-motor actual output torque, which is used as the motor torque command for torque coordination control of the hybrid electric vehicle to complete the mode switching.