
CN-121543451-B - Intelligent vehicle autonomous lane change decision system and method based on proximal policy optimization

CN 121543451 B

Abstract

An intelligent vehicle autonomous lane change decision system and method based on proximal policy optimization, belonging to the technical field of intelligent transportation and automatic driving. The method addresses the shortcomings of the prior art in state space design, safety constraint modeling, training strategy optimization and related aspects. The system comprises a traffic simulation environment module, a state observation and feature construction module, a proximal policy optimization lane change decision module, a reward calculation and safety supervision module, a training control and data recording module, and a policy export and application module. The training environment is built on a microscopic traffic simulation platform; by configuring different traffic flow densities and speed distributions, the training scenarios are brought closer to real road traffic, so that the policy adapts better across different operating conditions.

Inventors

  • HAN JIAYI
  • JIANG YUNCHUAN
  • YANG LU
  • SONG DONGJIAN
  • XU WENHAO
  • YAN ZIXUAN
  • YANG XIN

Assignees

  • Jilin University
  • Beijing Institute of Technology

Dates

Publication Date
2026-05-05
Application Date
2026-01-19

Claims (7)

  1. An intelligent vehicle autonomous lane change decision system based on proximal policy optimization, characterized in that the system comprises: a traffic simulation environment module, a state observation and feature construction module, a proximal policy optimization lane change decision module, a reward calculation and safety supervision module, a training control and data recording module, and a policy export and application module; the traffic simulation environment module is used for constructing and initializing the simulation environment and traffic flow parameters based on the urban traffic simulation software SUMO and for providing controllable overtaking conditions; the state observation and feature construction module is used for acquiring, at each simulation time step and through the traffic control interface, the running information of the ego vehicle and the preceding vehicle from the traffic simulation environment module, and for combining and normalizing this information into a state vector; the proximal policy optimization lane change decision module comprises a policy network sub-module and a value network sub-module, wherein the policy network sub-module, after receiving the state vector, outputs the probabilities of two discrete actions, keeping the lane and changing lanes; when the policy network is trained, a proximal policy optimization algorithm is adopted and its clipping mechanism is improved, specifically: first, according to the collision and safety penalty term of each time step, the training samples are divided into ordinary samples and high-risk samples; a basic clipping coefficient ε_base is adopted for ordinary samples and a stricter clipping coefficient ε_safe is used for high-risk samples, with ε_safe < ε_base, so that the clipping range applied to the probability ratio in the clipped objective function of the proximal policy optimization algorithm is switched adaptively between ε_base and ε_safe according to the safety level of the sample; the high-risk label is defined as follows: define the inter-vehicle distance at the k-th prediction step as d_k = d_0 + Δv · k · Δt, wherein d_0 is the actual longitudinal distance between the ego vehicle and the preceding vehicle, Δv is the current relative velocity between the preceding vehicle and the ego vehicle, Δt is the simulation time step, k indexes the prediction steps, and the upper limit of the prediction steps is denoted K; if there exists some k (1 ≤ k ≤ K) such that d_k is smaller than the dynamic safety distance d_safe, or the current actual distance d_0 is itself already smaller than d_safe, a potential or actual collision risk exists and the sample is marked as high-risk, otherwise it is marked as ordinary; d_safe denotes the longitudinal dynamic safety distance, and further coefficients specify the penalty factor for collisions and severe unsafe behaviour, the safety weight coefficients, and the absolute upper bounds of the space-efficiency reward, the speed-deviation penalty and the lateral track-deviation penalty within a single simulation step (illustrative code sketches of this risk pre-judgment and of the adaptive clipping follow the claims list); the reward calculation and safety supervision module is used for evaluating the result of each simulation step, calculating the composite reward, constructing a dynamic safety distance that varies with speed, judging whether a collision risk exists in the near future, and applying a penalty in advance or terminating the current episode when such a risk exists; the composite reward is expressed as R_t = w_1·r_space,t + w_2·r_speed,t + w_3·r_lat,t + w_4·r_safe,t, wherein R_t is the total reward of the t-th time step, r_space,t is the space-efficiency reward, r_speed,t is the speed-deviation penalty, r_lat,t is the lateral track-deviation penalty, r_safe,t is the collision and safety penalty, and w_1 to w_4 are the weight coefficients of the respective parts; the training control and data recording module organizes the whole reinforcement learning training process into a complete closed-loop training procedure; and the policy export and application module is used for exporting the policy network parameters as a model file after training converges, and for integrating the model file into a practically applied simulation platform or a hardware-in-the-loop system.
  2. The intelligent vehicle autonomous lane change decision system based on proximal policy optimization of claim 1, wherein the traffic simulation environment module initializes the simulation environment and constructs the traffic flow parameters by constructing a two-lane straight road scene, setting scene parameters including the road length, the number of lanes, the lane width and the lane speed limit, and setting the arrival rate, vehicle types and speed range of the background traffic flow through configuration files.
  3. The intelligent vehicle autonomous lane change decision system based on proximal policy optimization of claim 2, wherein, when the state observation and feature construction module performs state vector construction, a six-dimensional state vector is formed and normalized (a TraCI-based sketch of this construction follows the claims list), its six dimensions being, respectively: the longitudinal position of the ego vehicle, used to represent the driving progress of the ego vehicle along the road; the longitudinal speed of the ego vehicle, used to represent the current driving speed; the longitudinal relative distance between the preceding vehicle in the current lane and the ego vehicle, used to measure the following distance; the longitudinal relative speed between the preceding vehicle in the current lane and the ego vehicle, used to judge how the following distance is changing; the longitudinal relative distance between the preceding vehicle in the target lane and the ego vehicle, used to evaluate the safety gap of the target lane; and the longitudinal relative speed between the preceding vehicle in the target lane and the ego vehicle, used to judge how the safety gap of the target lane changes over time.
  4. The intelligent vehicle autonomous lane change decision system based on proximal policy optimization of claim 3, wherein, in the policy network sub-module, the state vector is input into the policy network, which outputs the probability distribution over the two actions of lane keeping and lane changing; the policy network adopts a two-layer or three-layer fully connected neural network structure, and the hidden layers use a nonlinear activation function.
  5. The intelligent vehicle autonomous lane change decision system based on proximal policy optimization of claim 4, wherein the training control and data recording module adopts a four-stage progressive training strategy, specifically: stage one, a low-traffic simple scene, in which only a small number of low-speed vehicles are placed ahead of the ego vehicle and the background traffic is sparse, the main purpose being to let the agent learn basic car-following and safe lane changing while keeping a safe distance; stage two, a fixed overtaking task scene, in which, while keeping the traffic density low, a clearly slower preceding vehicle and a relatively free target lane are set, guiding the agent to learn to actively initiate lane changes and complete overtaking, improving task completion efficiency; stage three, a multi-condition random scene, which increases the number of background vehicles and randomizes the initial positions and speeds of the ego vehicle and the preceding vehicle, so that the agent experiences a variety of overtaking distances, speed differences and cut-in situations, enhancing the adaptability of the policy to changing operating conditions; and stage four, a high-density complex scene, which raises the traffic density and adds multiple interfering vehicles, further training the agent in a complex interactive environment so that it maintains high safety and success rates under high load and strong interference.
  6. The intelligent vehicle autonomous lane change decision system based on proximal policy optimization of claim 5, wherein the training control and data recording module organizes the entire reinforcement learning training process, including the environment initialization of each episode, simulation time step advancement, state acquisition, action execution, reward recording, advantage function calculation and triggering of the proximal policy optimization update, and records the parameters of each training episode.
  7. An intelligent vehicle autonomous lane change decision method based on proximal policy optimization, characterized in that it is performed using a system according to any one of claims 1 to 6, the method being specifically as follows (a skeleton of this loop follows the claims list): S1, initializing the simulation environment and the traffic flow parameters, setting the road geometry, number of lanes and lane speed limit parameters as well as the arrival rate and speed distribution of the background traffic flow, and starting the SUMO simulation; S2, setting the initial positions and speeds of the ego vehicle and the preceding vehicle, starting one simulation episode, and clearing the data records of the previous episode; S3, at each simulation time step, the state observation and feature construction module collects the states of the ego vehicle and the preceding vehicle from the simulation environment and generates a state vector of fixed dimensionality; S4, inputting the current state vector into the proximal policy optimization lane change decision module, whose policy network outputs a lane keeping or lane changing action decision; S5, executing the action in the traffic simulation environment module, with SUMO and its built-in driver model automatically updating the positions and speeds of the vehicles; S6, the reward calculation and safety supervision module reads the updated vehicle states, calculates the total reward of the current time step, and judges whether the dynamic safety distance is violated or a collision occurs; S7, the training control and data recording module records the state acquisition, action execution, reward and value estimation data of the current step, and when the number of experience samples reaches a preset threshold, the proximal policy optimization algorithm is invoked to update the policy network and the value network over multiple epochs; S8, repeating steps S2 to S7 until the number of training episodes or the performance indicators reach the preset requirements, and finally the policy export and application module exports the final policy model for subsequent deployment and application.
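
The six-dimensional state vector of claim 3 can be obtained through SUMO's TraCI interface, which claim 1 calls the traffic control interface. The sketch below is a minimal, hedged illustration only: the vehicle and lane identifiers, the 100 m sensing range and the normalization bounds are assumptions introduced here, not values given in the patent.

```python
# Illustrative only: a six-dimensional, normalized observation built from SUMO
# via TraCI, roughly following claim 3. IDs, bounds and ranges are assumptions.
import numpy as np
import traci

ROAD_LEN, V_MAX, D_MAX = 1000.0, 33.3, 100.0  # assumed normalization bounds

def leader_on_lane(ego_id, lane_id):
    """Nearest vehicle ahead of the ego on a given lane: (gap, relative speed)."""
    ego_pos = traci.vehicle.getLanePosition(ego_id)
    ego_v = traci.vehicle.getSpeed(ego_id)
    best = None
    for vid in traci.lane.getLastStepVehicleIDs(lane_id):
        if vid == ego_id:
            continue
        gap = traci.vehicle.getLanePosition(vid) - ego_pos
        if gap > 0.0 and (best is None or gap < best[0]):
            best = (gap, traci.vehicle.getSpeed(vid) - ego_v)
    return best if best is not None else (D_MAX, 0.0)  # no leader: saturate

def build_state(ego_id, current_lane, target_lane):
    x = traci.vehicle.getLanePosition(ego_id)
    v = traci.vehicle.getSpeed(ego_id)
    d_cur, dv_cur = leader_on_lane(ego_id, current_lane)
    d_tgt, dv_tgt = leader_on_lane(ego_id, target_lane)
    return np.array([x / ROAD_LEN,
                     v / V_MAX,
                     min(d_cur, D_MAX) / D_MAX,
                     dv_cur / V_MAX,
                     min(d_tgt, D_MAX) / D_MAX,
                     dv_tgt / V_MAX], dtype=np.float32)
```

Gaps here are measured between lane-position reference points; a production implementation would subtract vehicle lengths and handle roads spanning multiple edges.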
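Claim 1's collision pre-judgment predicts the gap up to K steps ahead with d_k = d_0 + Δv·k·Δt, compares it against a speed-dependent safety distance, and folds the result into a weighted composite reward. The sketch below mirrors that structure only; the constant-time-headway form of d_safe, the horizon K = 20 and all weights are assumptions, since the patent does not disclose concrete values.

```python
def dynamic_safe_distance(v_ego, t_headway=1.5, d_min=5.0):
    """Assumed speed-dependent safe distance: standstill gap plus a time headway."""
    return d_min + t_headway * v_ego

def is_high_risk(d0, dv, v_ego, dt=0.1, K=20):
    """True if the current or any predicted gap d_k = d_0 + dv*k*dt (k = 1..K)
    falls below the dynamic safety distance. dv is leader speed minus ego speed."""
    d_safe = dynamic_safe_distance(v_ego)
    if d0 < d_safe:
        return True
    return any(d0 + dv * k * dt < d_safe for k in range(1, K + 1))

def composite_reward(r_space, r_speed, r_lat, r_safe, w=(1.0, 0.5, 0.5, 2.0)):
    """Weighted sum R_t = w1*r_space + w2*r_speed + w3*r_lat + w4*r_safe."""
    return w[0] * r_space + w[1] * r_speed + w[2] * r_lat + w[3] * r_safe
```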
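The modified clipping mechanism in claim 1 amounts to giving high-risk samples a tighter clipping range ε_safe < ε_base in the clipped surrogate objective. The PyTorch sketch below shows one way to realize the per-sample switch; the coefficient values (0.2 and 0.1), the network architecture and the advantage estimation are assumptions not specified by the claims.

```python
import torch

def ppo_policy_loss(new_logp, old_logp, advantage, risk_flag,
                    eps_base=0.2, eps_safe=0.1):
    """Clipped PPO policy loss with a risk-dependent clipping coefficient.

    new_logp / old_logp: log pi(a|s) under the current and behaviour policies;
    risk_flag: 1.0 for high-risk samples, 0.0 otherwise (1-D tensors, same length).
    """
    ratio = torch.exp(new_logp - old_logp)
    # per-sample clipping range, switched by the safety level of the sample
    eps = torch.where(risk_flag.bool(),
                      torch.full_like(ratio, eps_safe),
                      torch.full_like(ratio, eps_base))
    clipped_ratio = torch.maximum(torch.minimum(ratio, 1.0 + eps), 1.0 - eps)
    # standard clipped surrogate: take the pessimistic of the two terms
    return -torch.min(ratio * advantage, clipped_ratio * advantage).mean()
```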
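The method of claim 7 (steps S1 to S8) is a conventional on-policy training loop around SUMO. The skeleton below only mirrors that loop structure; the callables passed in (reset_episode, observe, apply_action, step_reward, ppo_update), the policy object and the batch threshold are placeholders assumed for illustration.

```python
import traci

def train(sumo_cmd, num_episodes, policy, reset_episode, observe,
          apply_action, step_reward, ppo_update, batch_size=2048):
    traci.start(sumo_cmd)                      # S1: launch SUMO with road/traffic config
    buffer = []
    for _ in range(num_episodes):              # S2: new episode, previous records cleared
        reset_episode()                        # place ego and lead vehicle, reset bookkeeping
        done = False
        while not done:
            state = observe()                  # S3: fixed-dimension state vector
            action = policy.act(state)         # S4: lane keep (0) or lane change (1)
            apply_action(action)               # issue the manoeuvre to SUMO
            traci.simulationStep()             # S5: SUMO advances all vehicles one step
            reward, done = step_reward()       # S6: composite reward + safety check
            buffer.append((state, action, reward, done))  # S7: record the transition
            if len(buffer) >= batch_size:
                ppo_update(buffer)             # multi-epoch PPO update of policy/value nets
                buffer = []
    traci.close()                              # S8: stop once episode/metric targets are met
```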

Description

Intelligent vehicle autonomous lane change decision system and method based on proximal policy optimization

Technical Field

The invention belongs to the technical field of intelligent transportation and automatic driving, and in particular relates to an intelligent vehicle autonomous lane change decision system and method based on proximal policy optimization.

Background

With the continuously growing number of motor vehicles in use, traffic congestion and traffic accidents have become increasingly prominent. Research and statistics show that driver-related factors dominate the causes of traffic accidents, with cognitive errors, decision errors and mis-operations accounting for an extremely high proportion of accidents, which has driven research into automatic driving decision technology oriented towards safety and efficiency. Existing automatic driving lane change decision methods fall into two main categories: rule- or model-based methods, such as finite state machines, behaviour trees and model predictive control (MPC), and methods based on deep reinforcement learning. Rule- or model-based methods trigger lane changing, deceleration, lane keeping and other actions through manually set thresholds and logic; they have a clear structure and are easy to implement, but the rules are hard to enumerate exhaustively in complex traffic scenes, adaptability to environmental change is limited, and emergencies are difficult to handle. Deep-reinforcement-learning-based decision methods use deep neural networks to extract features from high-dimensional environment states and learn driving policies for different scenarios through repeated trial and error in a simulation environment. Existing research has used algorithms such as DQN, DDPG and TD3 to realize vehicle behaviour or lane change decisions and has achieved certain results in simulation. There is also work that combines traffic simulation platforms such as SUMO with Gym environments and trains a lane change policy with the PPO algorithm, demonstrating the feasibility of deep reinforcement learning for lane change tasks. However, existing schemes still have shortcomings in state space design, safety constraint modeling, training strategy optimization and related aspects, such as verification only in simple scenes, the lack of a dynamic safety distance and collision pre-judgment mechanism, and slow training convergence.
On the one hand, existing schemes based on algorithms such as the deep Q-network (DQN), the deep deterministic policy gradient (DDPG) and the twin delayed deep deterministic policy gradient (TD3) are prone to policy update oscillation and slow convergence in high-dimensional states and complex traffic scenes; a large number of simulation samples and long training times are needed to obtain a usable policy, so the training cost is high and efficiency is low. On the other hand, many methods simply add a collision penalty to the reward function and lack explicit modeling of the dynamic safety distance and of short-horizon collision risk, so the agent frequently produces unsafe behaviours such as rear-end collisions and sudden braking during the exploration stage, the proportion of effective samples is low, and the policy tends to fall into local optima. Some studies use self-built simplified simulation environments or only a small amount of vehicle information, so that the microscopic traffic flow characteristics of urban traffic simulation software such as SUMO are insufficiently exploited and factors such as traffic flow density and speed distribution are not fully considered, leaving the lane change policy with insufficient adaptability and generalization under different operating conditions. Conventional methods are also trained directly in a fixed, high-difficulty scene, so the agent frequently collides or fails the task in the early training stage, effective reward signals are sparse, the training period is long, and the tuning and verification cost is high.

Disclosure of Invention

In order to solve the above technical problems, the invention provides an intelligent vehicle autonomous lane change decision system and method based on proximal policy optimization. In a first aspect, the system comprises: a traffic simulation environment module, a state observation and feature construction module, a proximal policy optimization lane change decision module, a reward calculation and safety supervision module, a training control and data recording module, and a policy export and application module; the traffic simulation environment module is used for constructing and initializing the simulation environment and traffic flow parameters based on the urban traffic simulation software SUMO and for providing controllable overtaking conditions.