CN-121978898-A - Automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning

Abstract

The invention relates to the technical field of automobile train power system control and intelligent prediction, and in particular to an automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning. The system consists of a data preprocessing module, a deep learning module, a theoretical constraint module, and a SAC reinforcement learning optimization module. It uses deep learning to capture complex patterns in vehicle data, integrates the theoretical constraints of vehicle dynamics into the prediction, and uses SAC reinforcement learning to adaptively adjust the constraint strength, so that vehicle operating conditions are predicted with higher accuracy and in closer agreement with physical laws.

Inventors

YANG ZHIGANG
WANG LEI

Assignees

Shaanxi Heavy Duty Automobile Co., Ltd. (陕西重型汽车有限公司)

Dates

Publication Date: 20260505
Application Date: 20260106

Claims (8)

1. An automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning, characterized by comprising: a data preprocessing module for cleaning, normalizing, and serializing vehicle history data and dividing it into a training set, a validation set, and a test set; a deep learning module for training a deep learning model on the training set data and predicting future vehicle features from a historical vehicle state sequence; a theoretical constraint module for establishing a vehicle theoretical feature generator and modeling the physical relationships among key features based on vehicle dynamics principles; a fusion module for fusing the theoretical constraints with the prediction results of the deep learning module through adjustable constraint strength parameters to generate prediction results that conform to physical laws; and a SAC reinforcement learning optimization module for constructing a SAC reinforcement learning agent that automatically learns the optimal constraint strength of each feature by comparing the error between the prediction results and the actual values, so as to balance prediction accuracy and physical plausibility.
2. The automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning according to claim 1, wherein the data preprocessing module receives raw vehicle history data comprising engine speed, vehicle speed, gear, throttle, coolant temperature, fan speed, fuel consumption rate, and engine torque.
3. The automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning according to claim 1, wherein the vehicle features are divided into a power group, a transmission group, and a thermal management group, and different attention calculations are applied within groups and between groups.
4. The automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning according to claim 1, wherein the deep learning prediction module adopts a Transformer architecture comprising: a) an input layer for receiving the normalized vehicle feature sequence; b) a positional encoding layer for adding position information to the input sequence; c) a Transformer encoder comprising 3 encoder layers, each containing a multi-head self-attention mechanism and a feedforward neural network; d) an output layer for generating the prediction sequence.
5. The automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning according to claim 1, wherein the theoretical constraint module constructs a theoretical constraint model based on vehicle dynamics principles to generate theoretical feature values that conform to physical laws: 1) Vehicle dynamics modeling: constructing an engine speed-vehicle speed-gear relation model: n_engine = v × i_transmission × i_final / (2π × r_wheel), where n_engine is the engine speed, v is the vehicle speed, i_transmission is the transmission gear ratio, i_final is the final drive ratio, and r_wheel is the wheel radius; establishing an engine speed-torque-throttle position relation model: T_engine = f(n_engine, throttle_position), where f is a nonlinear function fitted to experimental data; constructing a coolant temperature-fan speed relation model; and designing a fuel consumption rate calculation model; 2) Theoretical feature generation: calculating theoretical feature values from the physical models; applying physical boundary constraints to the features so that the generated values stay within the physically possible range; and applying rate-of-change constraints so that the feature change rates stay within a physically reasonable range.
6. The automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning according to claim 1, wherein the constraint fusion module comprises: 1) Constraint strength parameterization: setting an independent constraint strength parameter for each feature, and constructing a constraint strength matrix that represents the constraint strength of different features at different time steps; 2) Weighted fusion calculation: the fusion result is calculated according to the following formula: enhanced_prediction(t,f) = (1 − constraint_level(f)) × dl_prediction(t,f) + constraint_level(f) × theory_features(t,f), where t is the time step index, f is the feature index, and constraint_level(f) is the constraint strength parameter of feature f; 3) Gradient preservation mechanism: a gradient preservation mechanism is designed to ensure that gradients flow back correctly to the deep learning model during backpropagation, yielding a differentiable structure whose constraint strength is adjusted automatically.
7. The automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning according to claim 1, wherein determining the yaw state comprises: obtaining the steering wheel angle through a steering angle sensor, obtaining the current vehicle yaw rate through a yaw angle sensor, obtaining the current vehicle speed through a wheel speed sensor, calculating the ideal yaw rate of the vehicle according to a specified algorithm, and judging the current yaw state of the vehicle by comparing the ideal yaw rate with the actual yaw rate.
8. The automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning according to claim 1, wherein the SAC reinforcement learning optimization module comprises: 1) State space design: constructing state features based on prediction error and physical plausibility, and discretizing the state into 10 state levels; 2) Action space design: setting 10 discrete constraint strength levels ranging from 0.0 to 1.0, with an independent action space for each feature; 3) Reward function design: designing a composite reward function: reward = α × prediction_accuracy + (1 − α) × physical_validity, where α is the balance coefficient.
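
For illustration only, the formulas recited in claims 5, 6, and 8 can be sketched in Python as follows. This is a minimal sketch and not the implementation disclosed in the application: the function names, the units (vehicle speed in m/s, wheel radius in m, giving engine speed in revolutions per second), the even spacing of the 10 constraint strength levels, and the sample values are all assumptions made for the example.

```python
import math


def engine_speed_rev_per_s(v_mps, i_transmission, i_final, r_wheel_m):
    """Claim 5 relation: n_engine = v * i_transmission * i_final / (2*pi*r_wheel).

    Assumed units (not stated in the source): v in m/s, r_wheel in m,
    so the result is in revolutions per second.
    """
    return v_mps * i_transmission * i_final / (2.0 * math.pi * r_wheel_m)


def fuse(dl_prediction, theory_features, constraint_level):
    """Claim 6 weighted fusion.

    dl_prediction and theory_features are lists of rows indexed [t][f];
    constraint_level holds one value in [0, 1] per feature f.
    """
    return [
        [(1.0 - c) * dl + c * th
         for dl, th, c in zip(dl_row, th_row, constraint_level)]
        for dl_row, th_row in zip(dl_prediction, theory_features)
    ]


def constraint_levels(n_levels=10):
    """Claim 8 action space: discrete levels from 0.0 to 1.0.

    Even spacing is an assumption; the claim only gives the count and range.
    """
    return [i / (n_levels - 1) for i in range(n_levels)]


def reward(prediction_accuracy, physical_validity, alpha):
    """Claim 8 reward: alpha * prediction_accuracy + (1 - alpha) * physical_validity."""
    return alpha * prediction_accuracy + (1.0 - alpha) * physical_validity


# Hypothetical sample: two time steps, two features (engine speed, vehicle speed).
dl = [[1500.0, 60.0], [1520.0, 61.0]]
theory = [[1480.0, 60.0], [1500.0, 62.0]]
levels = constraint_levels()
# Feature 0 is pulled partly toward theory; feature 1 keeps the model output.
enhanced = fuse(dl, theory, [levels[3], levels[0]])
```

A constraint level of 0.0 leaves the deep learning prediction unchanged and 1.0 replaces it with the theoretical value, so each per-feature level chosen by the agent sets how strongly the physical model corrects that feature.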

Description

Automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning

Technical Field

The invention relates to the technical field of automobile train power system control and intelligent prediction, and in particular to an automobile train power domain control method based on Transformer dual-path output and SAC reinforcement learning.

Background

In the control of an automobile train power system (engine, transmission, wheels/whole vehicle), gear shifting and energy consumption control strategies often need to know key short-term future state quantities in advance, such as engine speed, output torque, vehicle speed, cooling system temperature, and current gear. Existing rule-based or univariate prediction methods struggle to capture the coupling among multiple features. Moreover, when the gear and the continuous features are predicted together by a single regression, fractional gear values, jumps, and invalid values easily arise, so a gear shifting assistance system cannot use the prediction directly, which in turn hampers the coordination of energy consumption optimization with power performance and ride smoothness targets.

For the usage scenarios and characteristics of automobile trains, predictive information obtained from maps, GPS, and similar sources is applied to a large-model control system, and a large-model integrated power domain control technology is developed in combination with estimation and prediction of the vehicle running state. By analyzing the road ahead in advance, the model predicts the vehicle's running state parameters on that road so that they can be adjusted ahead of time, thereby saving fuel.

Control of the accelerator, fan, and gear is realized by integrating advanced network communication technology and maps. Data such as maps, vehicle data analysis, summaries of driver habits, and real-time traffic conditions are introduced into power domain control, improving its adaptability and personalization and meeting the driver's requirements for economy, power, and comfort in different environments. Periodic and regional adjustment of the control technology is realized through a large-model algorithm, which ensures the regional adaptability of the vehicle and improves the economy of the automobile train.

In recent years, deep learning has made remarkable progress in time series prediction; the Transformer architecture, for example, can effectively capture long-sequence dependencies. However, purely data-driven deep learning models still have clear shortcomings in vehicle operating condition prediction: (1) they need a large amount of training data; (2) they generalize poorly to abnormal operating conditions; and (3) they may produce predictions that violate the physical laws of the vehicle. Although these models perform well when data is plentiful, they may deviate severely from reality under conditions not covered by the training data, and may even produce physically impossible outputs.

Traditional physical models are built on engineering experience and physical laws. They predict dynamic characteristics accurately under specific operating conditions but lack generality and adaptivity. Such models typically require extensive prior knowledge and a complex parameter calibration process, and as operating conditions become more complex, the difficulty of model construction increases exponentially.

Researchers have tried to combine physical models with deep learning, for example in physics-informed neural networks (PINNs), but most of these methods use a fusion strategy with fixed weights and cannot dynamically adjust the strength of the theoretical constraints according to different features and operating conditions. Furthermore, existing approaches lack an effective, systematic mechanism for dynamically balancing the contributions of the data-driven and theory-driven components.

Therefore, a more flexible and adaptive fusion method is needed, one that fully exploits the ability of deep learning to mine complex patterns from data, introduces theoretical vehicle dynamics constraints to ensure the physical plausibility of the predictions, and adaptively balances the data-driven and theory-driven contributions through an intelligent optimization mechanism. The multi-step time series prediction method capable of realizing continuous dynamic state prediction and discrete