CN-121997461-A - Automatic driving decision method based on space-time diagram network and risk perception reinforcement learning


Abstract

The invention relates to the technical field of automatic driving and intelligent transportation, and discloses an automatic driving decision method based on space-time diagram network and risk perception reinforcement learning. The method comprises: obtaining the states of an autonomous vehicle and its surrounding traffic participants, abstracting them into nodes of a traffic graph, and generating a traffic state representation containing node temporal features and connection relations; performing information propagation and aggregation on the node temporal features to generate an autonomous vehicle node representation; constructing an observed traffic state and performing Bayesian inference with the potential behaviors of the surrounding traffic participants as hidden variables to output a risk situation description; and finally fusing the autonomous vehicle node representation with the risk situation description to generate an enhanced state that serves as the reinforcement learning input, with the risk situation description as a constraint and maximization of the cumulative expectation as the objective, so as to obtain the optimal driving decision. The method thereby achieves safe, efficient and robust decision control of autonomous vehicles in complex traffic scenes.

Inventors

WANG HAOCONG
JIANG LEI
WANG XIAOMIN

Assignees

Southwest Jiaotong University (西南交通大学)

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-28

Claims (10)

1. An automatic driving decision method based on space-time diagram network and risk perception reinforcement learning, characterized by comprising the following steps: S1, acquiring state information of an autonomous vehicle and its surrounding traffic participants, abstracting the autonomous vehicle and the surrounding traffic participants into nodes of a traffic graph for the current traffic scene, acquiring the connection relations among the nodes, and generating a traffic state representation comprising node temporal features and the connection relations; S2, based on the traffic state representation, performing information propagation and aggregation on the node temporal features to generate an autonomous vehicle node representation that incorporates the spatial interaction of the surrounding traffic participants; S3, constructing an observed traffic state, taking the potential behaviors or behavior intentions of the surrounding traffic participants as hidden variables, performing Bayesian inference, and calculating the posterior probability distribution of the hidden variables to obtain a risk situation description of the potential risk level in the current traffic scene; and S4, fusing the autonomous vehicle node representation with the risk situation description to generate an enhanced state representation containing traffic interaction situation information and risk uncertainty information, using the enhanced state representation as the state input of reinforcement learning, and obtaining the optimal driving decision of the autonomous vehicle by taking the risk situation description as a constraint and maximization of the cumulative expectation as the objective.
2. The automatic driving decision method based on space-time diagram network and risk perception reinforcement learning according to claim 1, wherein the node temporal features are recursively updated by a gated recurrent unit and expressed as:

$$h_i^t = f\left(h_i^{t-1},\, x_i^t\right), \qquad x_i^t = \left[s_i^t,\ v_i^t,\ a_i^t,\ l_i^t,\ d_i^t,\ c_i^t,\ y_i^t\right]^{\top}$$

wherein $h_i^t$ represents the temporal feature of node $i$ at time $t$; $f(\cdot)$ represents a nonlinear state update function; $h_i^{t-1}$ represents the temporal feature of node $i$ at time $t-1$; $x_i^t$ represents the state information of node $i$ at time $t$; $s_i^t$ represents the longitudinal position of node $i$ at time $t$ in the lane in which it is located; $v_i^t$ represents the speed of node $i$ at time $t$; $a_i^t$ represents the acceleration of node $i$ at time $t$; $l_i^t$ represents the identifier of the lane in which node $i$ is located at time $t$; $d_i^t$ represents the lateral offset of node $i$ from the lane centerline at time $t$; $c_i^t$ represents the constraint on whether node $i$ is allowed to perform a lane change at time $t$; $y_i^t$ represents the lateral position of node $i$ at time $t$ in the lane in which it is located; $i = 0$ represents the autonomous vehicle, the remaining indices being surrounding traffic participants; and $\top$ represents the transpose operation.
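As an illustration only, one step of the gated recurrent update described in claim 2 can be sketched in plain Python. The claim does not give the gate equations, so the standard GRU form is assumed here, and the function name `gru_step` and the parameter keys (`Wz`, `Uz`, `bz`, and so on) are hypothetical.

```python
import math


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def _affine(W, x, U, h, b):
    """Return W @ x + U @ h + b for list-of-lists matrices."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]


def gru_step(h_prev, x, p):
    """One GRU update of a node's temporal feature (assumed standard form).

    h_prev: previous temporal feature, x: current state vector,
    p: dict of weight matrices and biases (hypothetical names).
    """
    z = [_sigmoid(v) for v in _affine(p["Wz"], x, p["Uz"], h_prev, p["bz"])]  # update gate
    r = [_sigmoid(v) for v in _affine(p["Wr"], x, p["Ur"], h_prev, p["br"])]  # reset gate
    gated = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(v) for v in _affine(p["Wh"], x, p["Uh"], gated, p["bh"])]
    # Convention assumed: new = (1 - z) * old + z * candidate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]
```

Calling `gru_step` once per time step for every node, with the seven-component state vector as `x`, yields the recursively updated temporal features; some libraries swap the roles of `z` and `1 - z`, which is an equivalent convention.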
3. The automatic driving decision method based on space-time diagram network and risk perception reinforcement learning according to claim 1, wherein the connection relationship between the nodes is at least one of a relative distance relationship, a relative position relationship and a lane topology relationship.
4. The automatic driving decision method based on space-time diagram network and risk perception reinforcement learning according to claim 1, wherein step S2 specifically comprises:

S21, determining candidate neighbors of each node by a distance threshold or a Top-K method, and generating a candidate neighbor set $\mathcal{N}_i^{c}$ for each node, where $\mathcal{N}_i^{c}$ represents the candidate neighbor set of node $i$;

S22, calculating interaction scores among nodes based on the candidate neighbor set, namely:

$$e_{ij}^t = \phi\left(h_i^t, h_j^t, r_{ij}^t\right) = w^{\top}\sigma\left(W\left[h_i^t \,\|\, h_j^t \,\|\, r_{ij}^t\right] + b\right), \qquad r_{ij}^t = \left[\Delta s_{ij},\ \Delta y_{ij},\ \Delta v_{ij},\ \Delta a_{ij},\ \Delta l_{ij}\right]^{\top}$$

wherein $e_{ij}^t$ represents the interaction score between node $i$ and node $j$ at time $t$; $\phi(\cdot)$ represents a function describing the interaction correlation or influence strength between nodes; $h_i^t$ and $h_j^t$ represent the temporal features of node $i$ and node $j$ at time $t$; $r_{ij}^t$ represents the relative state information between node $i$ and node $j$ at time $t$; $\Delta s_{ij}$ represents the longitudinal relative position between node $i$ and node $j$; $\Delta y_{ij}$ represents the lateral relative position between node $i$ and node $j$; $\Delta v_{ij}$ represents the relative velocity between node $i$ and node $j$; $\Delta a_{ij}$ represents the relative acceleration between node $i$ and node $j$; $\Delta l_{ij}$ represents the spatial relative relationship of the lanes in which node $i$ and node $j$ are located; $\top$ represents the transpose operation; $w$ represents a weight vector; $\sigma(\cdot)$ represents a nonlinear activation function; $\|$ represents the concatenation operation; and $W$ and $b$ represent a weight matrix and a bias vector, respectively;

S23, based on the interaction scores among nodes, selecting the Top-K2 nodes of the candidate neighbor set as the final neighbor set $\mathcal{N}_i$, where $\mathcal{N}_i$ represents the final neighbor set of node $i$ and $K_2$ represents the number of final neighbors participating in the spatial interaction aggregation;

S24, mapping the interaction scores among nodes into interaction weights based on the final neighbor set, namely:

$$\alpha_{ij}^t = \frac{\exp\left(e_{ij}^t\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(e_{ik}^t\right)}$$

wherein $\alpha_{ij}^t$ represents the interaction weight between node $i$ and node $j$; $\exp(\cdot)$ represents the exponential function; and $e_{ij}^t$ represents the interaction score between node $i$ and node $j$ at time $t$;

S25, based on the final neighbor set and the interaction weights, performing weighted aggregation on the temporal features of the nodes to generate the spatial interaction features of the nodes, and obtaining the autonomous vehicle node representation that incorporates the spatial interaction of the surrounding traffic participants, namely:

$$g_i^t = \sum_{j \in \mathcal{N}_i} \alpha_{ij}^t\, h_j^t, \qquad G^t = \left\{g_i^t\right\}, \qquad g_0^t \in G^t$$

wherein $g_i^t$ represents the spatial interaction feature of node $i$ at time $t$; $G^t$ represents the set of spatial interaction features of the nodes at time $t$; $g_0^t$ represents the spatial interaction feature of the autonomous vehicle node at time $t$, that is, the autonomous vehicle node representation that incorporates the spatial interaction of the surrounding traffic participants; and $h_j^t$ represents the temporal feature of node $j$ at time $t$.
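The scoring, Top-K2 selection, weight normalization and weighted aggregation of steps S22 to S25 can be sketched as follows. This is a minimal sketch under assumptions: `tanh` stands in for the unspecified nonlinear activation, the weights are normalized with a softmax over the final neighbor set, and all function and argument names are hypothetical.

```python
import math


def interaction_score(h_i, h_j, r_ij, W, b, w):
    """Score one node pair from concatenated features (tanh is an assumed activation)."""
    x = h_i + h_j + r_ij  # list concatenation of the three inputs
    hidden = [
        math.tanh(sum(wk * xk for wk, xk in zip(row, x)) + bk)
        for row, bk in zip(W, b)
    ]
    return sum(wk * hk for wk, hk in zip(w, hidden))


def softmax_weights(scores):
    """Map {neighbor: score} to {neighbor: weight}, weights summing to 1."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {j: math.exp(e - m) for j, e in scores.items()}
    total = sum(exps.values())
    return {j: v / total for j, v in exps.items()}


def aggregate(scores, features, k2):
    """Keep the k2 highest-scoring neighbors and return the weighted feature sum."""
    final = sorted(scores, key=scores.get, reverse=True)[:k2]
    alpha = softmax_weights({j: scores[j] for j in final})
    dim = len(next(iter(features.values())))
    return [sum(alpha[j] * features[j][d] for j in final) for d in range(dim)]
```

With two equally scored neighbors retained, `aggregate` returns the plain average of their temporal features, which is a quick sanity check on the weighting.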
5. The automatic driving decision method based on space-time diagram network and risk perception reinforcement learning according to claim 4, wherein the formula for determining the candidate neighbors of each node by the distance threshold is:

$$\mathcal{N}_i^{c} = \left\{\, j \;\middle|\; \left|s_j^t - s_i^t\right| \le d_{th} \,\right\}$$

wherein $\mathcal{N}_i^{c}$ represents the candidate neighbors of node $i$; $s_i^t$ represents the longitudinal position of node $i$ at time $t$ in the lane in which it is located; $s_j^t$ represents the longitudinal position of node $j$ at time $t$ in the lane in which it is located; and $d_{th}$ represents the distance threshold.
6. The automatic driving decision method based on space-time diagram network and risk perception reinforcement learning according to claim 4, wherein the formula for determining the candidate neighbors of each node by the Top-K method is:

$$\mathcal{N}_i^{c} = \underset{j}{\operatorname{Top-}K_1}\left(-\left|s_j^t - s_i^t\right|\right)$$

wherein $\mathcal{N}_i^{c}$ represents the candidate neighbors of node $i$; $K_1$ represents the upper limit on the number of candidate neighbors; $s_i^t$ represents the longitudinal position of node $i$ at time $t$ in the lane in which it is located; and $s_j^t$ represents the longitudinal position of node $j$ at time $t$ in the lane in which it is located.
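The two candidate-neighbor rules of claims 5 and 6 can be illustrated with a short Python sketch. It assumes that both rules compare longitudinal positions only, as the claim definitions suggest, and that a node is not its own neighbor; the function names are hypothetical.

```python
def neighbors_by_threshold(s, i, d_th):
    """Indices j whose longitudinal distance to node i is at most d_th (self excluded by assumption)."""
    return [j for j in range(len(s)) if j != i and abs(s[j] - s[i]) <= d_th]


def neighbors_by_top_k(s, i, k1):
    """The k1 nodes closest to node i in longitudinal position (self excluded by assumption)."""
    others = [j for j in range(len(s)) if j != i]
    return sorted(others, key=lambda j: abs(s[j] - s[i]))[:k1]
```

For longitudinal positions `[0.0, 12.0, -30.0, 55.0]` and node 0, a threshold of 40 and a Top-2 rule both keep nodes 1 and 2. The threshold rule can return any number of neighbors, while the Top-K rule bounds the size of the set.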
7. The automatic driving decision method based on space-time diagram network and risk perception reinforcement learning according to claim 1, wherein step S3 specifically comprises:

S31, constructing an observed traffic state, namely:

$$o^t = \left[\, x_0^t \,\|\, g_0^t \,\|\, \operatorname{Pool}\left(\left\{\left(x_j^t,\ r_{0j}^t\right)\right\}_{j \in \mathcal{N}_0}\right) \,\|\, u^t \,\right]$$

wherein $o^t$ represents the observed traffic state; $x_0^t$ represents the state information of the autonomous vehicle node at time $t$; $\|$ represents the concatenation operation; $g_0^t$ represents the autonomous vehicle node representation at time $t$ that incorporates the spatial interaction of the surrounding traffic participants; $\operatorname{Pool}(\cdot)$ represents a set pooling operator; $\mathcal{N}_0$ represents the final neighbor set of the autonomous vehicle node; $x_j^t$ represents the state information of node $j$ at time $t$; $r_{0j}^t$ represents the relative state information between the autonomous vehicle node and node $j$ at time $t$; and $u^t$ represents a further term at time $t$ whose definition is not legible in the source text;

S32, taking the potential behaviors or behavior intentions of the surrounding traffic participants as hidden variables, performing Bayesian inference in combination with the observed traffic state, and calculating the posterior probability distribution of the hidden variables, namely:

$$p\left(z \mid o^t\right) = \frac{p\left(o^t \mid z\right)\, p(z)}{p\left(o^t\right)}$$

wherein $p(z)$ represents the prior probability distribution of the hidden variable $z$; $p(o^t \mid z)$ represents the likelihood of the observed traffic state $o^t$ given the hidden variable $z$; $p(z \mid o^t)$ represents the posterior probability distribution of the hidden variable $z$ given the observed traffic state $o^t$; and $p(o^t)$ represents the probability distribution function of the observed traffic state $o^t$;

S33, based on the posterior probability distribution of the hidden variables, taking the neighbor nodes of the autonomous vehicle node as target nodes, calculating the relative longitudinal and lateral displacements between the autonomous vehicle node and each target node over the prediction horizon, and combining them with longitudinal and lateral safety envelope half-thresholds to construct a joint safety degree, so as to obtain the most dangerous value for a collision between the autonomous vehicle and each target node over the prediction horizon, and generating an overall risk assessment value, which serves as the risk situation description of the potential risk level in the current traffic scene.
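The Bayesian inference of step S32 can be illustrated for a discrete hidden variable. This is a minimal sketch that assumes a finite set of behavior hypotheses; the labels `"keep"` and `"cut_in"` and the function name `posterior` are hypothetical, and the claims do not say how the prior or the likelihood is obtained.

```python
def posterior(prior, likelihood):
    """Bayes' rule over discrete hypotheses.

    prior[z] is p(z); likelihood[z] is p(observation | z).
    Returns p(z | observation) for every hypothesis z.
    """
    joint = {z: prior[z] * likelihood[z] for z in prior}
    evidence = sum(joint.values())  # p(observation), the normalizing constant
    if evidence == 0.0:
        raise ValueError("observation has zero probability under every hypothesis")
    return {z: p / evidence for z, p in joint.items()}


# Hypothetical example: a neighbor that usually keeps its lane, observed drifting toward the ego lane.
belief = posterior({"keep": 0.7, "cut_in": 0.3}, {"keep": 0.2, "cut_in": 0.8})
```

In this example the evidence is 0.7 × 0.2 + 0.3 × 0.8 = 0.38, so the posterior probability of `"cut_in"` is 0.24 / 0.38, roughly 0.63, even though its prior was only 0.3.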
8. The automatic driving decision method based on space-time diagram network and risk perception reinforcement learning according to claim 7, wherein step S33 specifically comprises:

S331, calculating the predicted longitudinal and lateral positions of the target node at a future time given the hidden variable, the longitudinal prediction being:

$$\hat{s}_j(\tau \mid z) = s_j^t + v_j^t\,\tau + \tfrac{1}{2}\, a_j(z)\,\tau^2, \qquad \tau \in [0, T]$$

wherein $\hat{s}_j(\tau \mid z)$ represents the predicted longitudinal position of target node $j$ at future time $\tau$ given the hidden variable $z$; $s_j^t$ represents the longitudinal position of target node $j$ at time $t$; $v_j^t$ represents the speed of target node $j$ at time $t$; $\tau$ represents the continuous time variable in the prediction horizon; $a_j(z)$ represents the acceleration of target node $j$ given the hidden variable $z$; the predicted lateral position $\hat{y}_j(\tau \mid z)$ of target node $j$ at future time $\tau$ given the hidden variable $z$ is determined from $y_j^t$, the lateral position of target node $j$ at time $t$ in the lane in which it is located, from $y_j^{c}$, the lateral coordinate of the centerline of the lane in which target node $j$ is located, and from $\tau$ and $T$; and $T$ represents the total length of the prediction horizon;

S332, based on the predicted longitudinal and lateral positions of the target node, calculating the relative longitudinal and lateral displacements between the autonomous vehicle node and the target node at the future time given the hidden variable, namely:

$$\Delta s_j(\tau \mid z) = \left|\hat{s}_j(\tau \mid z) - s_0(\tau)\right|, \qquad \Delta y_j(\tau \mid z) = \left|\hat{y}_j(\tau \mid z) - y_0(\tau)\right|$$

wherein $\Delta s_j(\tau \mid z)$ represents the relative longitudinal displacement between the autonomous vehicle node and target node $j$ at future time $\tau$ given the hidden variable $z$; $s_0(\tau)$ represents the longitudinal position of the autonomous vehicle at future time $\tau$; $\Delta y_j(\tau \mid z)$ represents the relative lateral displacement between the autonomous vehicle node and target node $j$ at future time $\tau$ given the hidden variable $z$; and $y_0(\tau)$ represents the lateral position of the autonomous vehicle node at future time $\tau$ in the lane in which it is located;

S333, acquiring the geometric dimensions of the vehicles and constructing collision envelope half-thresholds, the half-threshold $D_j^{s}$ for determining a collision between the autonomous vehicle node and target node $j$ in the longitudinal direction and the half-threshold $D_j^{y}$ for determining a collision in the lateral direction being:

$$D_j^{s} = \frac{L_0 + L_j}{2}, \qquad D_j^{y} = \frac{W_0 + W_j}{2}$$

wherein $L_0$ and $L_j$ represent the vehicle body lengths of the autonomous vehicle node and target node $j$, respectively, and $W_0$ and $W_j$ represent the vehicle body widths of the autonomous vehicle node and target node $j$, respectively;

S334, judging whether the relative longitudinal displacement $\Delta s_j(\tau \mid z)$ is less than or equal to the envelope half-threshold $D_j^{s}$ and the relative lateral displacement $\Delta y_j(\tau \mid z)$ is less than or equal to the envelope half-threshold $D_j^{y}$; if yes, the autonomous vehicle node and target node $j$ collide, and step S335 is performed; otherwise, no collision occurs;

S335, introducing safety margins and calculating a longitudinal safety envelope half-threshold and a lateral safety envelope half-threshold, namely:

$$\tilde{D}_j^{s} = D_j^{s} + \delta_s, \qquad \tilde{D}_j^{y} = D_j^{y} + \delta_y$$

wherein $\tilde{D}_j^{s}$ represents the longitudinal safety envelope half-threshold for a collision between the autonomous vehicle node and target node $j$; $\tilde{D}_j^{y}$ represents the lateral safety envelope half-threshold for a collision between the autonomous vehicle node and target node $j$; and $\delta_s$ and $\delta_y$ both represent safety margins;

S336, calculating the joint safety degree $\rho_j(\tau \mid z)$ of a collision between the autonomous vehicle and target node $j$ at future time $\tau$ from the relative displacements, the longitudinal safety envelope half-threshold and the lateral safety envelope half-threshold, using the maximum operator $\max(\cdot)$;

S337, calculating the most dangerous value in the prediction horizon based on the joint safety degree, namely:

$$\rho_j^{t} = \min_{\tau \in [0, T]} \rho_j(\tau \mid z)$$

wherein $\rho_j^{t}$ represents the most dangerous value between the autonomous vehicle node and target node $j$ at time $t$, and $\min(\cdot)$ represents taking the minimum value;

S338, constructing the continuous risk quantity $r_j^t$ of target node $j$ at time $t$ from the most dangerous value in the prediction horizon, using a truncation operator $\operatorname{clip}(\cdot)$;

S339, based on the continuous risk quantities of the target nodes, taking the maximum continuous risk quantity to generate the overall risk assessment, namely:

$$R^t = \max_{j \in \mathcal{N}_0} r_j^t$$

wherein $R^t$ represents the overall risk assessment at time $t$, that is, the risk situation description of the potential risk level.
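The risk computation of claim 8 can be sketched end to end for one behavior hypothesis per target. Several pieces are assumptions made only so the sketch runs, and are not taken from the claims: the target moves linearly toward its lane centerline over the horizon, the ego vehicle holds constant speed and lateral position, the joint safety degree is the larger of the two displacements normalized by their safety envelope half-thresholds, and the risk is one minus the worst safety degree clipped to [0, 1]. All names are hypothetical.

```python
def clip(x, lo=0.0, hi=1.0):
    return max(lo, min(hi, x))


def target_risk(ego, tgt, accel, horizon=3.0, steps=30, margin_s=1.0, margin_y=0.3):
    """Risk in [0, 1] posed by one target under one assumed behavior (acceleration accel)."""
    # Safety envelope half-thresholds: half the summed body sizes plus a margin.
    d_s = 0.5 * (ego["length"] + tgt["length"]) + margin_s
    d_y = 0.5 * (ego["width"] + tgt["width"]) + margin_y
    worst = float("inf")
    for k in range(steps + 1):
        tau = horizon * k / steps
        s_t = tgt["s"] + tgt["v"] * tau + 0.5 * accel * tau ** 2       # constant acceleration
        y_t = tgt["y"] + (tgt["y_center"] - tgt["y"]) * tau / horizon  # assumed linear drift
        s_e = ego["s"] + ego["v"] * tau                                # assumed constant speed
        # Assumed joint safety degree: below 1 only if BOTH axes are inside the envelope.
        safety = max(abs(s_t - s_e) / d_s, abs(y_t - ego["y"]) / d_y)
        worst = min(worst, safety)                                     # most dangerous instant
    return clip(1.0 - worst)


def overall_risk(ego, targets, accels):
    """Largest per-target risk; 0.0 when there are no targets."""
    return max((target_risk(ego, t, a) for t, a in zip(targets, accels)), default=0.0)
```

Taking the maximum over the two normalized displacements reflects that a collision requires overlap in both directions at once, so a large gap on either axis is enough to be safe; weighting the per-hypothesis risks by the posterior probabilities would be one way to bring the Bayesian inference in, which this sketch leaves out.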
9. The automatic driving decision method based on space-time diagram network and risk perception reinforcement learning according to claim 1, wherein step S4 specifically comprises:

S41, fusing the autonomous vehicle node representation with the risk situation description to generate an enhanced state representation containing traffic interaction situation information and risk uncertainty information, namely:

$$\tilde{s}^t = F\left(g_0^t, R^t\right) = W_f\left[\, g_0^t \,\|\, R^t \,\right] + b_f$$

wherein $\tilde{s}^t$ represents the enhanced state representation at time $t$; $F(\cdot)$ represents the fusion operation function; $g_0^t$ represents the spatial interaction feature of the autonomous vehicle node at time $t$, that is, the autonomous vehicle node representation that incorporates the spatial interaction of the surrounding traffic participants; $R^t$ represents the overall risk assessment at time $t$, that is, the risk situation description of the potential risk level; $W_f$ and $b_f$ both represent learnable parameters; and $\|$ represents the concatenation operation;

S42, constructing the driving behavior action, namely:

$$A^t = \left(A_{lon}^t,\ A_{lat}^t,\ A_{lc}^t\right)$$

wherein $A^t$ represents the driving behavior action at time $t$; $A_{lon}^t$ represents the discrete longitudinal driving behavior at time $t$; $A_{lat}^t$ represents the discrete lateral driving behavior at time $t$; and $A_{lc}^t$ represents the discrete lane-change behavior at time $t$;

S43, constructing a reward function based on the enhanced state representation and the driving behavior action, the reward function combining, with four non-negative weight coefficients, a gain term for being close to the desired speed at time $t$, the magnitudes of the longitudinal acceleration and of the jerk at time $t$, a lane-change penalty term at time $t$, and a lane-center offset penalty term at time $t$;

S44, constructing a safety cost constraint based on the risk situation description, namely:

$$c^t = \max\left(R^t,\ c_{v}^t,\ c_{lc}^t\right)$$

wherein $c^t$ represents the safety cost at time $t$; $\max(\cdot)$ represents taking the maximum value; $c_{v}^t$ represents the overspeed constraint at time $t$; and $c_{lc}^t$ represents the lane-change constraint at time $t$;

S45, constructing an optimization objective that maximizes the cumulative expectation based on the reward function and the safety cost constraint; and

S46, when the cumulative expectation is at its maximum, using the corresponding driving behavior as the optimal driving decision strategy of the autonomous vehicle, so as to obtain the optimal driving decision of the autonomous vehicle.
10. The automatic driving decision method based on space-time diagram network and risk perception reinforcement learning according to claim 9, wherein the optimization objective that maximizes the cumulative expectation is:

$$J^{*} = \max_{\pi}\ \mathbb{E}_{\pi}\left[\sum_{t}\left(r^t - \lambda\, c^t\right)\right]$$

wherein $J^{*}$ represents the optimization objective that maximizes the cumulative expectation; $\pi$ represents the driving decision strategy; $\mathbb{E}$ represents the expectation operator; $\lambda$ represents the safety cost penalty factor; and $r^t$ and $c^t$ represent the reward and the safety cost at time $t$.
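How a safety cost penalty factor trades reward against safety cost can be shown with a short Python sketch that scores a single recorded episode. It assumes an undiscounted sum, since the claims define no discount factor, and assumes the safety cost is the maximum of the overall risk and the two constraint terms; the function names are hypothetical.

```python
def safety_cost(risk, overspeed, lane_change_violation):
    """Safety cost as the worst of the three terms (assumed combination)."""
    return max(risk, overspeed, lane_change_violation)


def penalized_return(rewards, costs, penalty):
    """Undiscounted sum of (reward - penalty * cost) over one episode."""
    return sum(r - penalty * c for r, c in zip(rewards, costs))


# Hypothetical two-step episode: the second step carries a risk of 0.4.
costs = [safety_cost(0.0, 0.0, 0.0), safety_cost(0.4, 0.1, 0.0)]
score = penalized_return([1.0, 0.5], costs, penalty=2.0)
```

Here the second step contributes 0.5 − 2.0 × 0.4 = −0.3, so the episode scores about 0.7 instead of 1.5. Averaging such scores over many episodes estimates the expectation for a given policy, and a larger penalty factor makes risky steps cost more.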

Description

Automatic driving decision method based on space-time diagram network and risk perception reinforcement learning

Technical Field

The invention relates to the technical field of automatic driving and intelligent transportation, in particular to an automatic driving decision method based on space-time diagram network and risk perception reinforcement learning.

Background

With the continuous development of automatic driving technology, the autonomous decision-making ability of autonomous vehicles in complex traffic environments has become a focus of research and application. In actual traffic scenarios, an autonomous vehicle often has to interact with multiple traffic participants at the same time, such as other vehicles, pedestrians or non-motorized vehicles, and to make driving decisions in dynamic environments with multiple lanes and dense traffic flow. Such scenes involve a large number of traffic participants, complex mutual influences and states that change continuously over time, and therefore place high demands on the safety and stability of an automatic driving decision method.

In the prior art, automatic driving decisions are generally based on modeling and analyzing the traffic scene state, and the driving decision is generated by processing the vehicle's own state and the surrounding environment information. In practice, however, traffic participants are related not only through directly observable quantities such as position and speed, but also through complex interactions shaped by lane structure, traffic rules and mutual influence. Meanwhile, the driving behavior of traffic participants is uncertain, and their future behavior is often difficult to describe accurately from the deterministic state at the current moment alone.

To address these problems, some existing methods attempt to improve decision performance by introducing mechanisms such as structured modeling, behavior prediction or risk assessment. However, in a complex dynamic traffic scene it remains difficult to describe, within a unified decision framework, the interaction relations between traffic participants, the evolution of states over time and the potential behavior risks at the same time. In particular, when driving safety, traffic efficiency and driving comfort must all be considered, conventional decision methods struggle to constrain traffic risk effectively, so that the decision process tends to respond inadequately to potentially dangerous scenes.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides an automatic driving decision method based on space-time diagram network and risk perception reinforcement learning. By constructing a structured traffic state representation with temporal awareness and explicitly modeling the behavior uncertainty of the surrounding traffic participants, the method achieves safe, efficient and robust driving decision control of the autonomous vehicle in complex traffic scenes, and thereby addresses the prior-art problem of unsafe decisions by autonomous vehicles in real traffic environments.
In order to achieve the aim of the invention, the invention adopts the following technical scheme: an automatic driving decision method based on space-time diagram network and risk perception reinforcement learning, comprising the following steps: S1, acquiring state information of an autonomous vehicle and its surrounding traffic participants, abstracting the autonomous vehicle and the surrounding traffic participants into nodes of a traffic graph for the current traffic scene, acquiring the connection relations among the nodes, and generating a traffic state representation comprising node temporal features and the connection relations; S2, based on the traffic state representation, performing information propagation and aggregation on the node temporal features to generate an autonomous vehicle node representation that incorporates the spatial interaction of the surrounding traffic participants; S3, constructing an observed traffic state, taking the potential behaviors or behavior intentions of the surrounding traffic participants as hidden variables, performing Bayesian inference, and calculating the posterior probability distribution of the hidden variables to obtain a risk situation description of the potential risk level in the current traffic scene; and S4, fusing the autonomous vehicle node representation with the risk situation description to generate an enhanced state representation containing traffic interaction situation information and risk uncertainty information, using the enhanced state representation as the state input of reinforcement learning, and realizi