CN-121857347-B - Fully-differentiable full-link physics-enhanced automatic driving control method and system
Abstract
The present disclosure relates to a fully-differentiable full-link physics-enhanced automatic driving control method and system. The method comprises the steps of constructing a fully-differentiable full-link PERL dynamics model to reconstruct real vehicle dynamics with high fidelity; establishing a differentiable preview-based PID execution layer that calculates lateral errors and longitudinal speed errors based on a dynamic preview distance; generating steering wheel angle commands and longitudinal acceleration commands using a PID control law supporting gradient feedback; and using a SAC network to adaptively adjust the PID gain parameters and the preview distance in real time according to the current vehicle state. The fully-differentiable closed-loop training architecture designed by the invention directly guides end-to-end optimization of the control parameters using the gradient information of the dynamics residual, effectively alleviates the Sim2Real model-distortion problem, ensures the physical interpretability and smoothness of the control strategy, and at the same time remarkably improves the path-tracking precision and robustness of the vehicle under complex working conditions through the trained strategy network.
Inventors
- LIANG CHENG
- CUI YIXIN
- YANG SHUO
- CHEN SHIYANG
- WU NAN
- WANG YIZHI
- SONG YAO
- HUANG YANJUN
Assignees
- Tongji University (同济大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-03-18
Claims (9)
- 1. A fully-differentiable full-link physics-enhanced automatic driving control method, comprising: training a fully-differentiable full-link PERL dynamics model, wherein the model is composed of a physical prior layer and a residual regression layer in parallel, is trained by minimizing the error between the vehicle state predicted by the physical layer plus the residual regression layer and the real vehicle state, and has its parameters frozen after training; after freezing the parameters of the fully-differentiable full-link PERL dynamics model, pre-training a SAC strategy network, wherein the SAC strategy network is configured to adaptively adjust PID gain parameter values and the preview distance in real time based on the vehicle state; after SAC pre-training is completed, performing joint fine-tuning of the fully-differentiable full-link PERL dynamics model and the SAC strategy network, wherein, in fine-tuning training, the PERL dynamics model predicts the vehicle state at the next moment based on the current vehicle state and control quantity, the next-moment state is input into the SAC strategy network, the SAC strategy network outputs the corresponding PID gain parameter values and preview distance, the actuator calculates the lateral error and longitudinal speed error based on the dynamic preview distance, a PID control law supporting gradient feedback generates the respective control quantities based on the errors and gain parameter values, the automatic driving vehicle updates its state under the control quantities, and the prediction-error gradient in fine-tuning training is back-propagated directly to the SAC strategy network through the chain rule by exploiting the full-link differentiability of the PERL dynamics model and the actuator; during inference, the trained SAC network generates PID gain parameter values and a preview distance based on the actual vehicle state, so that automatic driving control is realized.
- 2. The method of claim 1, wherein the control quantity consists of a steering wheel angle command and a longitudinal acceleration command.
- 3. The method of claim 1, wherein the residual regression layer is trained with a loss function carrying physical-constraint regularization: $\mathcal{L} = \left\| x_{t+1}^{\text{real}} - \left( \hat{x}_{t+1}^{\text{phys}} + \Delta\hat{x}_{t+1} \right) \right\|^2 + \lambda \left\| \Delta\hat{x}_{t+1} \right\|^2$, where $x_{t+1}^{\text{real}}$ denotes the real-world label data at time $t+1$, $\hat{x}_{t+1}^{\text{phys}}$ denotes the nominal state prediction of the physical prior layer at time $t+1$, $\Delta\hat{x}_{t+1}$ denotes the state-quantity correction output by the residual regression layer, and $\lambda$ denotes the regularization coefficient of the physical-confidence constraint term.
- 4. The method of claim 1, wherein the residual regression layer is a fully-connected multi-layer perceptron that takes as input the concatenated vector of vehicle state and control quantity, $[x_t; u_t]$, and applies layer normalization to accelerate convergence, with vehicle state $x_t = [X, Y, \psi, v_x, v_y, \omega]^{\top}$ and control quantity $u_t = [\delta, a]^{\top}$, and outputs a correction $\Delta\hat{x}_{t+1}$ of the same dimension as the vehicle state, where $X$ and $Y$ are the position coordinates of the vehicle's center of mass, $\psi$ is the vehicle heading angle, $v_x$ is the longitudinal velocity of the vehicle, $v_y$ is the lateral velocity of the vehicle, $\omega$ is the yaw rate of the vehicle, $\delta$ is the steering angle command, and $a$ is the longitudinal acceleration command.
- 5. The method of claim 1, wherein the physical prior layer is based on classical Newtonian mechanics, and a three-degree-of-freedom single-track (bicycle) dynamics model is selected to calculate the motion trend of the vehicle in the ideal case.
- 6. The method of claim 1, wherein the actuator calculates the lateral error and longitudinal speed error based on the dynamic preview distance using a differentiable error calculation method built on dynamic preview and a soft attention mechanism, comprising: based on the preview distance $L_p$ output by the SAC network, calculating a virtual preview point in the field of view ahead of the vehicle, $P_p = \left( X + L_p\cos\psi,\; Y + L_p\sin\psi \right)$, where $(X, Y)$ is the centroid position of the vehicle at the current moment and $\psi$ is the heading angle of the vehicle at the current moment; calculating the attention weight of each point on the reference trajectory using a Gaussian kernel over the Euclidean distance, $w_j = \exp\left(-\|P_p - p_j\|^2/\tau\right) \big/ \sum_{k=0}^{N} \exp\left(-\|P_p - p_k\|^2/\tau\right)$, where $\tau$ is the temperature coefficient, $p_j$ is the $j$-th discrete path point on the reference trajectory, $n_j$ is the normal vector at the $j$-th path point, $j$ ranges from 0 to $N$, and $N$ is the total number of reference trajectory points; obtaining the matching point and its normal vector by weighted summation, $p_m = \sum_j w_j\, p_j$ and $n_m = \sum_j w_j\, n_j$; calculating the lateral error $e_{lat} = n_m^{\top}\left(P_p - p_m\right)$; and calculating the longitudinal speed error from the reference speed $v_j$ of each waypoint, $e_v = \sum_j w_j\, v_j - v_x$, where $v_x$ is the current longitudinal speed of the vehicle.
- 7. The method of claim 1, wherein the PID control law supporting gradient feedback simulates the mechanical limits of the physical actuator using purely algebraic calculations, comprising: based on the lateral distance error and longitudinal speed error at the current moment, respectively combining the integral and derivative of the corresponding historical errors to calculate the unlimited lateral control quantity $u_{lat}$ and longitudinal control quantity $u_{lon}$; and applying smooth amplitude limiting with the $\tanh$ function to generate the steering wheel angle command $\delta$ and the longitudinal acceleration command $a$: $\delta = \delta_{max}\tanh\left(u_{lat}/\delta_{max}\right)$, $a = a_{max}\tanh\left(u_{lon}/a_{max}\right)$, where $\delta_{max}$ is the mechanical limit of the steering wheel angle and $a_{max}$ is the mechanical limit of the longitudinal acceleration.
- 8. A computer-readable storage medium, characterized in that it stores a computer program that can be loaded by a processor to perform the method according to any one of claims 1 to 7.
- 9. A fully-differentiable full-link physics-enhanced automatic driving control system, comprising: a PERL dynamics model training module configured to train a fully-differentiable full-link PERL dynamics model, wherein the model consists of a physical prior layer and a residual regression layer in parallel and predicts the vehicle state at the next moment based on the current vehicle state and control quantity; a SAC strategy network training module configured to pre-train a SAC strategy network after the parameters of the fully-differentiable full-link PERL dynamics model are frozen, the SAC strategy network being configured to adaptively adjust the PID gain parameter values and preview distance in real time based on the input vehicle state; and a joint fine-tuning module configured to perform joint fine-tuning of the fully-differentiable full-link PERL dynamics model and the SAC strategy network after SAC pre-training is completed, wherein, in fine-tuning training, the current vehicle state serves as input to the PERL dynamics model, the predicted next-moment vehicle state is input into the SAC strategy network, the SAC strategy network outputs the corresponding PID gain parameter values and preview distance, the actuator calculates the lateral error and longitudinal speed error based on the dynamic preview distance, a PID control law supporting gradient feedback generates the respective control quantities based on the errors and gain parameter values, the automatic driving vehicle updates its state under the action of the control quantities, and the prediction-error gradient in fine-tuning training is back-propagated directly to the SAC strategy network through the chain rule by exploiting the full-link differentiability of the PERL dynamics model and the actuator; during inference, the trained SAC network generates PID gain parameter values and a preview distance based on the actual vehicle state to realize automatic driving control.
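The soft-attention error calculation of claim 6 can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the function name, signature, and the sign convention for the lateral error (projection of the preview point onto the weighted normal) are assumptions for the sketch.

```python
import numpy as np

def preview_errors(state, L_p, ref_xy, ref_normals, ref_speeds, tau=1.0):
    """Lateral and speed errors via dynamic preview + Gaussian soft attention.

    state:       (X, Y, psi, v_x) -- centroid position, heading, long. speed
    L_p:         preview distance (output by the policy network)
    ref_xy:      (N, 2) discrete reference waypoints
    ref_normals: (N, 2) unit normals of the reference path at each waypoint
    ref_speeds:  (N,)   reference speed at each waypoint
    tau:         temperature coefficient of the Gaussian kernel
    """
    X, Y, psi, v_x = state
    # virtual preview point ahead of the vehicle
    p = np.array([X + L_p * np.cos(psi), Y + L_p * np.sin(psi)])
    # Gaussian-kernel attention weights over all waypoints (softmax form)
    d2 = np.sum((ref_xy - p) ** 2, axis=1)
    w = np.exp(-d2 / tau)
    w = w / w.sum()
    # matching point and its normal by weighted summation
    p_m = w @ ref_xy
    n_m = w @ ref_normals
    n_m = n_m / np.linalg.norm(n_m)
    # signed lateral error: projection of (preview point - matching point)
    # onto the matched normal
    e_lat = float(n_m @ (p - p_m))
    # longitudinal speed error against the attention-weighted reference speed
    e_v = float(w @ ref_speeds - v_x)
    return e_lat, e_v
```

Because every operation is smooth (exponential, normalization, weighted sums), the errors are differentiable with respect to both the vehicle state and the preview distance, which is what lets gradients flow back through this layer during fine-tuning.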
Description
Fully-differentiable full-link physics-enhanced automatic driving control method and system

Technical Field

The invention relates to the technical field of automatic driving and intelligent control, in particular to a fully-differentiable full-link physics-enhanced control method and system, which executes a decision process in real time through an on-board control system and actuates the required vehicle subsystems to jointly perform dynamic driving tasks.

Background

With the development of automatic driving technology, algorithm verification based on simulation has become a key link in system development. However, when migrating a control strategy trained in a simulation environment to a real vehicle, the challenges of model distortion and insufficient control robustness are always present. The prior art offers two main types of solutions, each with obvious limitations.

The first is the control method based on a conventional physical model. Existing solutions typically employ an idealized single-track or double-track dynamics model in conjunction with a PID or MPC controller. However, an ideal model can hardly describe accurately the unmodeled dynamics (e.g., mechanical transmission backlash, actuator response lag) and environmental uncertainties (e.g., nonlinear friction between tire and ground, microscopic road-surface irregularities) present in real vehicles. The gap between model and reality means that control parameters which perform excellently in simulation yield large tracking errors, or even system instability, on a real vehicle. In addition, the parameters of a traditional PID controller ($K_p$, $K_i$, $K_d$) and the preview distance are usually fixed values or static look-up-table values, and cannot meet the demand for dynamic adjustment of the vehicle under limit working conditions.

The second type is the end-to-end intelligent control method based on data driving.
Existing schemes of this type mainly utilize deep reinforcement learning (such as SAC and DDPG) or imitation learning algorithms to directly construct a mapping from sensor data to control commands (such as steering wheel angle and acceleration). Although such methods have powerful nonlinear fitting and adaptation capabilities, they have significant drawbacks. First, the control commands output by a pure neural network lack dynamic constraints and easily exhibit high-frequency jitter, which not only wears the actuator but also fails to meet ride-comfort requirements; when an unseen working condition is encountered, abnormal commands are very likely to be output, creating a potential safety hazard. Second, training efficiency is low and convergence is difficult: conventional model-free reinforcement learning generally updates the strategy only through sparse reward signals, treats the environment as a "black box", and does not exploit the differential information of the dynamics model to guide the optimization direction, so sampling efficiency is low and the training process is long and unstable. Third, Sim2Real generalization is poor: policy performance drops drastically when physical parameters of the real world (such as the road friction coefficient) are inconsistent with simulation, because purely data-driven models tend to over-fit the features of the simulation environment.

Disclosure of the Invention

The invention aims to provide a fully-differentiable full-link physics-enhanced automatic driving control method and system, so as to solve the problems of large tracking errors, or even system instability, on a real vehicle caused by simulation-model distortion and by the lack of physical constraints in purely end-to-end reinforcement learning for automatic driving control.
The method comprises the following steps: training a fully-differentiable full-link PERL dynamics model, the model being composed of a physical prior layer and a residual regression layer in parallel and trained by minimizing the error between the vehicle state predicted by the physical layer plus the residual regression layer and the real vehicle state, with the model parameters frozen after training; after freezing the parameters of the fully-differentiable full-link PERL dynamics model, pre-training a SAC strategy network configured to adaptively adjust the PID gain parameter values and the preview distance in real time based on the vehicle state; and, after SAC pre-training is completed, performing joint fine-tuning of the fully-differentiable full-link PERL dynamics model and the SAC strategy network, wherein, in fine-tuning training, the PERL dynamics model predicts the vehicle state at the next moment based on the current vehicle state and control quantity, the predicted next-moment state is input into the SAC strategy network, and the SAC strategy network outputs the corresponding PID gain parameter values and preview distance.