CN-119408530-B - Airport guided vehicle automatic parking method based on reinforcement learning


Abstract

The invention provides a reinforcement-learning-based automatic parking method for an airport guided vehicle, comprising: step 1, setting the vehicle contour coordinates and the parking coordinate system of the airport guided vehicle; step 2, building a vehicle kinematic model based on the parking coordinate system; step 3, building a vehicle collision determination mechanism; step 4, setting the state space and the action space of the agent; step 5, setting the instant reward mechanism of the agent during motion; step 6, improving the experience replay mechanism and the agent exploration mechanism of the algorithm; and step 7, training the improved automatic parking algorithm framework with the apron layout, and inputting the resulting real-time acceleration and steering angle into the vehicle kinematic model to realize automatic parking control. The invention shortens the training period and improves the learning effect, so that the vehicle can park automatically in a confined space in the complex apron environment, which reduces the workload of ground staff and helps avoid safety accidents.

Inventors

WANG YUXIAO
LI GUANCONG
ZHAO YUYU

Assignees

Civil Aviation University of China (中国民航大学)

Dates

Publication Date: 20260505
Application Date: 20240910

Claims (8)

1. A reinforcement-learning-based automatic parking method for an airport guided vehicle, characterized by comprising the following steps: step 1, setting the vehicle contour coordinates and the parking coordinate system of the airport guided vehicle; step 2, establishing a vehicle kinematic model based on the parking coordinate system; step 3, setting a vehicle collision determination mechanism, which specifically comprises setting a condition satisfied by the vehicle contour vertex coordinates when the vehicle collides and a condition satisfied by the areas of the geometric figures constructed around a collision point P; the condition on the vehicle contour vertex coordinates is given by formula (2), in which the symbols denote the vehicle's vertex coordinates and the obstacle coordinates, respectively; the condition on the areas of the geometric figures constructed around the collision point P is given by formula (3); step 4, setting the state space and the action space of the agent; step 5, setting an instant reward mechanism for the agent during motion, which comprises a positive reward for guiding the vehicle toward the berth, a target reward obtained when the vehicle is parked, and a penalty obtained when the vehicle collides; the positive reward for guiding the vehicle toward the berth is given by formula (4), in which one symbol denotes the parking target position; the target reward obtained when the vehicle is parked is given by formula (5), in which the symbols denote a positive base reward constant and a time penalty factor, respectively; the penalty obtained when the vehicle collides is given by formula (6); step 6, constructing an automatic parking algorithm framework based on deep reinforcement learning, and improving the experience replay mechanism and the agent exploration mechanism of the algorithm to obtain an improved automatic parking algorithm framework; and step 7, training the improved automatic parking algorithm framework with the apron layout to obtain the real-time acceleration and steering angle of the airport guided vehicle, and inputting the real-time acceleration and steering angle into the vehicle kinematic model to realize automatic parking control.
2. The reinforcement-learning-based automatic parking method for an airport guided vehicle of claim 1, wherein the vehicle contour coordinates of the airport guided vehicle in step 1 comprise: the vehicle contour vertices; the rear axle center point of the vehicle, taken as the reference point for the motion; the projections of the front and rear axles on the ground; the lengths of the front and rear overhangs; the wheelbase; the steering angle of the front wheels; the angle between the vehicle body and the horizontal direction; and the driving speed of the vehicle in the parking coordinate system.
3. The reinforcement-learning-based automatic parking method for an airport guided vehicle of claim 1, wherein the vehicle kinematic model in step 2 is given by formula (1).
4. The reinforcement-learning-based automatic parking method for an airport guided vehicle of claim 1, wherein in step 4 the state space and the action space of the agent are each given by an expression, and the symbols in the action space denote the acceleration of the vehicle and the front-wheel steering angle, respectively.
5. The reinforcement-learning-based automatic parking method for an airport guided vehicle of claim 1, wherein the improved automatic parking algorithm framework of step 6 comprises an Actor decision module, a Critic evaluation module, a deep reinforcement learning environment, and an experience replay pool; the experience replay pool provides the experience samples required for the agent's learning, each experience sample comprising the state of the airport guided vehicle at the current moment, the action information, the instant reward, and the state of the airport guided vehicle at the next moment, wherein the state of the airport guided vehicle comprises the lateral and longitudinal position, the heading angle, and the speed of the vehicle, and the action information of the airport guided vehicle comprises the steering angle of the wheels and the acceleration; the rewards of the agent's actions are accumulated by combining the instant reward with future rewards discounted by a factor γ, which motivates the agent to pursue the long-term goal; the Critic evaluation module outputs the evaluated value of the action executed by the agent; and the Actor decision module outputs the action executed by the agent, which comprises the acceleration and the steering angle of the airport guided vehicle.
6. The reinforcement-learning-based automatic parking method for an airport guided vehicle of claim 1, wherein improving the experience replay mechanism and the agent exploration mechanism in step 6 comprises: step 6.1, updating the experience samples in the experience replay pool based on the Actor decision module and the Critic evaluation module, namely outputting the action information of the airport guided vehicle at the current moment, substituting that action information into the kinematic model to obtain the state information of the airport guided vehicle, calculating the accumulated reward of the agent from the state information of the airport guided vehicle at each moment and the instant reward mechanism, and finally storing the updated information in the experience replay pool as an experience sample; step 6.2, guiding the agent to park within the set motion state space based on the updated experience samples in the experience replay pool; and step 6.3, designing a prioritized experience replay mechanism and screening the updated experience samples to obtain high-quality experience samples, thereby completing the improvement of the experience replay mechanism and the agent exploration mechanism.
7. The reinforcement-learning-based automatic parking method for an airport guided vehicle of claim 6, wherein obtaining the high-quality experience samples in step 6.3 comprises: calculating the mean and the variance of the instant rewards of all updated experience samples, designing a priority from that mean and variance, and screening by the priority to obtain the high-quality experience samples; the priority is given by formula (7), in which the symbols denote the experience sample, the episode reward mean, and the weights of the reward mean and the reward variance, respectively.
8. The reinforcement-learning-based automatic parking method for an airport guided vehicle of claim 1, wherein obtaining the real-time acceleration and steering angle of the airport guided vehicle in step 7 comprises: step 7.1, based on the screened high-quality experience samples, iteratively updating the parameters of the Actor decision module and the Critic evaluation module by a gradient optimization method, designing an environment-exploration decay mechanism for the training process, and completing the training of the improved automatic parking algorithm framework; and step 7.2, after training is completed, extracting the action output of the Actor decision module, which comprises the acceleration and the steering angle, and inputting it into the vehicle kinematic model to realize automatic parking control; the gradient optimization and parameter update of the Critic evaluation module are given by formula (8), in which the symbols denote the network learning rate and the number of experiences selected from the experience batch, respectively; the gradient optimization and parameter update of the Actor decision module are given by formula (9); and the environment-exploration decay mechanism is given by formula (10), in which the symbols denote the initial standard deviation and the decay rate, respectively.
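As a rough illustration of how steps 2 and 7.2 fit together, the sketch below advances a rear-axle-referenced vehicle model by one time step from an acceleration and a front-wheel steering angle. Formula (1) is not reproduced in this text, so the update equations used here are the standard kinematic bicycle model and are an assumption, as are the names `VehicleState` and `step`, the default wheelbase, and the time step `dt`. The exponential form of `decayed_sigma` is likewise only an assumed stand-in for formula (10), which the claims describe in terms of an initial standard deviation and a decay rate.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    x: float      # rear-axle centre, parking coordinate system (m)
    y: float
    theta: float  # angle between vehicle body and horizontal axis (rad)
    v: float      # longitudinal speed (m/s)


def step(state, accel, steer, wheelbase=2.8, dt=0.1):
    """One explicit-Euler step of the standard kinematic bicycle model.

    Assumed form; the wheelbase and dt defaults are illustrative values.
    """
    x = state.x + state.v * math.cos(state.theta) * dt
    y = state.y + state.v * math.sin(state.theta) * dt
    theta = state.theta + state.v * math.tan(steer) / wheelbase * dt
    v = state.v + accel * dt
    return VehicleState(x, y, theta, v)


def decayed_sigma(sigma0, decay_rate, episode):
    """Exploration-noise standard deviation after `episode` episodes.

    Assumed exponential decay, not the patent's formula (10).
    """
    return sigma0 * math.exp(-decay_rate * episode)
```

In a training loop, the Actor's output (acceleration, steering angle) plus Gaussian noise drawn with `decayed_sigma(...)` would be passed to `step` to obtain the next state; after training, the noise would simply be dropped.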

Description

Airport guided vehicle automatic parking method based on reinforcement learning

Technical Field

The invention relates to an airport guided vehicle automatic parking method based on reinforcement learning, and belongs to the technical field of intelligent driving.

Background

With the rapid development of the global aviation industry, apron traffic scheduling has become more complex and more frequent. Automatic parking technology, as an important component of intelligent ground traffic management, has gradually become a key requirement for airport guided vehicle operation. It plans the driving path at each moment from real-time feedback on the vehicle's environment and, combined with the vehicle's own state information, accurately controls speed and steering. This effectively avoids parking accidents caused by driver inexperience, reduces the waiting time of the guided vehicle, and helps ensure the normal operation of aircraft and related ground facilities.

Automatic parking technology originated in the early 1990s and used an algorithm structure of path planning followed by trajectory tracking control: once a berth is found, a parking path is first planned by geometric curve splicing or numerical fitting, and a conventional control method then tracks that path. However, parking involves several control variables such as speed and acceleration, and these early methods are control algorithms that depend on a set of preset rules. They must be redesigned or re-tuned whenever the parking environment or requirements change, and therefore generalize poorly.
In contrast, advances in computer science and artificial intelligence have made deep reinforcement learning practical to apply. A control algorithm based on deep reinforcement learning learns an optimal policy by interacting with the environment; in particular, after training on a large amount of diverse data, the agent can gradually adapt to various unknown environments and output correct actions. Among such algorithms, the Deep Deterministic Policy Gradient (DDPG) algorithm is widely applied to continuous, high-precision control tasks such as automatic parking because of its stability and its ability to handle continuous state and action spaces. The algorithm combines off-policy learning with an experience replay mechanism, which breaks the correlation between learning samples while keeping the training process stable. The learning capacity of a deep neural network is used to model the dynamic relationship between the high-dimensional real-time state of the vehicle in the parking environment and the decision to execute, and the network's internal parameters are adjusted continuously during training so that the output action minimizes the prediction error and maximizes the task return, thereby achieving safe and stable parking.

The airport apron holds many flight ground service facilities, such as boarding bridges, power supply vehicles, and tow tractors, so the parking environment of the airport guided vehicle is complex and the available space is limited.
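To make the experience replay mechanism mentioned above more concrete, here is a minimal sketch of a fixed-capacity replay pool with uniform random sampling, which is the baseline behaviour usually paired with DDPG. The class name `ReplayBuffer`, its method names, and the tuple layout are assumptions chosen for illustration and do not come from the patent text.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity experience pool with uniform sampling (illustrative)."""

    def __init__(self, capacity):
        # When the deque is full, appending silently evicts the oldest sample.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement decorrelates consecutive steps.
        batch = rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states = zip(*batch)
        return states, actions, rewards, next_states

    def __len__(self):
        return len(self.buffer)
```

Because the pool has a fixed capacity, old samples are overwritten regardless of how useful they were, and uniform sampling treats every stored transition as equally valuable.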
As a result, completing the guided vehicle's parking task with the DDPG algorithm typically has the following drawbacks: (1) Feedback from the parking environment is generally highly random, and the experience replay pool of the algorithm has limited capacity, so the agent's exploration of the environment during training generates a large number of invalid experience samples that overwrite and drown out the successful samples, making the training period too long and the parking quality poor. (2) The algorithm uses a deterministic policy, so the agent lacks sufficient exploration capability early in training, while the added environment-exploration noise can interfere with the agent's policy learning later on, causing the training process to oscillate and lowering learning efficiency.

Disclosure of Invention

The invention aims to solve the technical problem of poor parking quality in the prior art, and accordingly provides an airport guided vehicle automatic parking method based on reinforcement learning. The technical scheme adopted to solve this problem is an airport guided vehicle automatic parking method based on reinforcement learning, comprising the following steps: step 1, setting the vehicle contour coordinates and the parking coordinate system of the airport guided vehicle; step 2, establishing a vehicle kinematic model based on the parking coordinate system; step 3, settin