CN-121990000-A - Automatic driving decision method, system, terminal equipment and medium

Abstract

The invention provides an automatic driving decision method, system, terminal equipment and medium. The method comprises: obtaining environmental state information of an automatic driving vehicle and predicting the motion trajectories of obstacles to generate candidate action schemes; checking the candidate action schemes and removing those that violate a core traffic rule to obtain a compliance scheme set; constructing a constraint compliance Markov decision process model and limiting its dynamic action space to the compliance scheme set; calculating reward values of all schemes in the compliance scheme set using a segmented reward function; solving the constraint compliance Markov decision process model to obtain an optimal strategy and outputting the optimal action scheme; and controlling the automatic driving vehicle to execute the optimal action scheme. The invention overcomes a fundamental structural contradiction in the prior art: adhering strictly to traffic rules fails to minimize injury, while pursuing injury optimization breaks baseline traffic rules and blurs liability. The invention provides absolute rule compliance while actively pursuing minimization of collision injury.
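
The rule-checking step described above, in which candidates that violate a core traffic rule are removed before any optimization takes place, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Candidate` fields and the two example rules (`stays_in_lane`, `no_wrong_way`) are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate action scheme; the field names are assumptions."""
    name: str
    crosses_solid_line: bool
    drives_against_traffic: bool


# A core traffic rule is modelled as a Boolean predicate: True means compliant.
Rule = Callable[[Candidate], bool]


def stays_in_lane(c: Candidate) -> bool:
    return not c.crosses_solid_line


def no_wrong_way(c: Candidate) -> bool:
    return not c.drives_against_traffic


def compliance_set(candidates: List[Candidate], rules: List[Rule]) -> List[Candidate]:
    """Keep only candidates that pass every rule (a hard filter, not a weighted cost)."""
    return [c for c in candidates if all(rule(c) for rule in rules)]


candidates = [
    Candidate("brake_straight", False, False),
    Candidate("swerve_left", True, True),
    Candidate("swerve_right", True, False),
]
compliant = compliance_set(candidates, [stays_in_lane, no_wrong_way])
print([c.name for c in compliant])
```

Because the filter is Boolean, a rule violation cannot be traded off against a lower injury estimate; the later reward calculation only ever sees the schemes that survive this step.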

Inventors

PENG LIQIANG
MAO XINZHI
SUN BAOXUE

Assignees

Hunan Normal University (湖南师范大学)

Dates

Publication Date: 20260508
Application Date: 20260408

Claims (10)

1. An automatic driving decision method applied to an automatic driving vehicle facing unavoidable collision risk, comprising: acquiring environmental state information of the automatic driving vehicle, predicting motion trajectories of surrounding obstacles, and generating a set of discrete candidate action schemes based on the dynamics state of the automatic driving vehicle; performing Boolean logic verification on the candidate action schemes one by one based on preset core traffic rules, and removing the candidate action schemes that violate the core traffic rules to obtain a compliance scheme set consisting of schemes that fully conform to the core traffic rules; constructing a constraint compliance Markov decision process model, and limiting the dynamic action space of the constraint compliance Markov decision process model to the compliance scheme set; calculating reward values of all schemes in the compliance scheme set using a segmented reward function in the constraint compliance Markov decision process model, wherein the segmented reward function sequentially calculates a primary ethical target cost and a secondary ethical target cost, the primary ethical target cost represents the total severity of collision injury estimated based on collision energy and the vulnerability of the collision object, and the secondary ethical target cost represents the variance of the expected injury values of all impacted parties, so as to measure the fairness of injury distribution; and solving the constraint compliance Markov decision process model to obtain an optimal strategy, outputting the optimal action scheme corresponding to the maximum expected cumulative reward in the compliance scheme set, and controlling the automatic driving vehicle to execute the optimal action scheme.
2. The automatic driving decision method of claim 1, wherein calculating the reward value of each scheme in the compliance scheme set using a segmented reward function comprises: obtaining the reward value by the formula $R(s,a) = -\left[C_1(s,a) + \lambda \cdot \delta(s,a) \cdot C_2(s,a)\right]$, wherein $s$ represents the current environmental state of the automatic driving vehicle when it faces a collision risk; $a$ represents an action scheme in the compliance scheme set; $R(s,a)$ represents the reward value, i.e. the negative cost, of executing action scheme $a$ in the current environmental state $s$; $C_1(s,a)$ represents the primary ethical target cost; $C_2(s,a)$ represents the secondary ethical target cost; $\lambda$ represents a preset secondary target weight; and $\delta(s,a)$ represents a hierarchical activation function for controlling when the secondary ethical target cost is introduced.
3. The automatic driving decision method of claim 2, wherein the value of the hierarchical activation function $\delta(s,a)$ is determined as follows: calculating the primary ethical target cost of all candidate action schemes in the compliance scheme set and obtaining the current minimum primary ethical target cost; when the difference between the primary ethical target cost $C_1(s,a)$ of candidate action scheme $a$ and the current minimum primary ethical target cost is smaller than a preset target tolerance threshold $\varepsilon$, determining that candidate action scheme $a$ enters the secondary target optimization stage and assigning $\delta(s,a)$ a value of 1; and when the difference between the primary ethical target cost $C_1(s,a)$ of candidate action scheme $a$ and the current minimum primary ethical target cost is greater than or equal to the preset target tolerance threshold $\varepsilon$, assigning $\delta(s,a)$ a value of 0.
4. The automatic driving decision method of claim 3, wherein, in the step of acquiring environmental state information of the automatic driving vehicle, the environmental state information includes relative position coordinates of surrounding obstacles, speed, acceleration, curvature, and a violation level attribute of the potential collision object, the violation level attribute being used to quantify the severity of the traffic violation of the potential collision object.
5. The automatic driving decision method of claim 4, wherein, in the step of performing Boolean logic verification on the candidate action schemes one by one based on preset core traffic rules, the core traffic rules are stored in a remotely updatable rule base in a structured data format, and each core traffic rule comprises a unique rule identifier, applicable road sign and marking conditions, and a machine-executable trajectory judgment function.
6. The automatic driving decision method according to claim 1, wherein, when the compliance scheme set is an empty set, an extreme dilemma emergency strategy is triggered and a preset maximum emergency braking command is directly output.
7. The automatic driving decision method according to claim 1, characterized in that the automatic driving decision method further comprises: encrypting and storing the input data, the intermediate calculation results and the finally output optimal action scheme of the whole decision process to form a data chain for audit.
8. An automatic driving decision system, comprising: a sensing and predicting module for acquiring environmental state information of the automatic driving vehicle and predicting motion trajectories of surrounding obstacles; a scheme generation module for generating a set of discrete candidate action schemes based on the dynamics state of the automatic driving vehicle; a two-stage decision engine comprising a first-stage filtering unit and a second-stage optimizing unit, wherein the first-stage filtering unit is used for performing Boolean logic verification on the candidate action schemes one by one based on preset core traffic rules and removing the candidate action schemes that violate the core traffic rules to obtain a compliance scheme set consisting of schemes that fully conform to the core traffic rules, and the second-stage optimizing unit is used for constructing a constraint compliance Markov decision process model, limiting the dynamic action space of the constraint compliance Markov decision process model to the compliance scheme set, calculating reward values of all schemes in the compliance scheme set using a segmented reward function in the constraint compliance Markov decision process model, solving the constraint compliance Markov decision process model to obtain an optimal strategy, and outputting the optimal action scheme corresponding to the maximum expected cumulative reward in the compliance scheme set; and an execution control module for controlling the automatic driving vehicle to execute the optimal action scheme.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 7.
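
The segmented reward and tolerance-based activation described in the method claims, together with the empty-set fallback, can be illustrated with a short sketch. It is a simplified reading under several assumptions, not the claimed implementation: each scheme is given directly as a list of per-party expected injury values (the claims derive these from collision energy and vulnerability), the Markov decision process is reduced to a single-step choice, and the names `lam`, `eps`, `segmented_rewards`, `choose_action` and the `"max_emergency_brake"` label are hypothetical.

```python
from statistics import pvariance
from typing import Dict, List


def primary_cost(injuries: List[float]) -> float:
    """Total expected injury severity over all impacted parties (assumed given)."""
    return sum(injuries)


def secondary_cost(injuries: List[float]) -> float:
    """Variance of expected injury values, used as the fairness measure."""
    return pvariance(injuries)


def segmented_rewards(schemes: Dict[str, List[float]], lam: float, eps: float) -> Dict[str, float]:
    """Reward = -(primary + lam * activation * secondary) for each compliant scheme."""
    c1 = {name: primary_cost(v) for name, v in schemes.items()}
    c1_min = min(c1.values())
    rewards = {}
    for name, v in schemes.items():
        # Activation is 1 only when the primary cost is within eps of the minimum.
        active = 1 if c1[name] - c1_min < eps else 0
        rewards[name] = -(c1[name] + lam * active * secondary_cost(v))
    return rewards


def choose_action(schemes: Dict[str, List[float]], lam: float = 0.5, eps: float = 0.1) -> str:
    """Single-step stand-in for solving the decision model (an assumption)."""
    if not schemes:
        # Empty compliance set: fall back to maximum emergency braking.
        return "max_emergency_brake"
    rewards = segmented_rewards(schemes, lam, eps)
    return max(rewards, key=rewards.get)


# Hypothetical compliant schemes with expected injury per impacted party.
schemes = {
    "brake": [1.0, 0.2],
    "nudge_right": [0.625, 0.625],
    "hold": [2.0, 0.0],
}
rewards = segmented_rewards(schemes, lam=0.5, eps=0.1)
print(rewards)
print(choose_action(schemes))
```

In this example `brake` has the lowest total injury, but `nudge_right` lies within the tolerance of that minimum and distributes the injury evenly, so the fairness term decides between the two. `hold` is far from the minimum, so its secondary cost is never considered.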

Description

Automatic driving decision method, system, terminal equipment and medium

Technical Field

The invention belongs to the technical field at the intersection of artificial intelligence and vehicle engineering, and particularly relates to an automatic driving decision method, system, terminal equipment and medium, which are especially suitable for automatic driving schemes that must make ethical decisions when collision risk is unavoidable.

Background

As automatic driving technology spreads to public roads, the core challenge for a high-level automatic driving system facing unavoidable collision risk shifts from simple perception and control precision to the more complex problems of social compatibility and the ethics of responsibility. When the system predicts that injury cannot be avoided, the decision logic of the algorithm directly determines how the injury is allocated. For this problem, the prior art falls largely into two types of approach.

The first is an absolute rule-based approach, which performs only traffic rule compliance checking. If all alternatives violate the rules, or if several alternatives with different injury levels exist within the compliance envelope, the system simply executes the default maximum emergency braking. The disadvantage of this solution is that the algorithm is passive: it gives up the opportunity to actively reduce injury by using vehicle mobility within the compliance boundary, resulting in a suboptimal safety outcome.

The second is a comprehensively weighted multi-objective optimization approach, which converts the degree of violation and the various injury severity indexes into cost terms of the same magnitude and combines them by linear weighting in a single cost function. The disadvantage of this solution is that the absolute rigidity of traffic rules is compromised. In pursuit of overall injury minimization, the system may actively cross a line or drive against the direction of traffic, which not only makes algorithmic behaviour unpredictable and triggers a crisis of public trust, but also greatly obscures the attribution of responsibility after an accident.

These drawbacks create a need in the art for a new architecture that provides both absolute compliance and active optimization, and the present invention is presented in this context.

Disclosure of Invention

The invention provides an automatic driving decision method, system, terminal equipment and medium. It aims to overcome the fundamental structural contradiction in prior-art automatic driving ethical decision-making, in which adhering to traffic rules fails to minimize injury, while pursuing optimal injury outcomes breaks baseline traffic rules and blurs responsibility. The invention thereby provides absolute rule compliance while actively seeking to minimize collision injury within the legally compliant space.

In a first aspect, the present invention provides an automatic driving decision method for use with an automatic driving vehicle facing unavoidable collision risk, the method comprising the steps of: acquiring environmental state information of the automatic driving vehicle, predicting motion trajectories of surrounding obstacles, and generating a set of discrete candidate action schemes based on the dynamics state of the automatic driving vehicle; performing Boolean logic verification on the candidate action schemes one by one based on preset core traffic rules, and removing the candidate action schemes that violate the core traffic rules to obtain a compliance scheme set consisting of schemes that fully conform to the core traffic rules; constructing a constraint compliance Markov decision process model, and limiting the dynamic action space of the constraint compliance Markov decision process model to the compliance scheme set; calculating reward values of all schemes in the compliance scheme set using a segmented reward function in the constraint compliance Markov decision process model, wherein the segmented reward function sequentially calculates a primary ethical target cost and a secondary ethical target cost, the primary ethical target cost represents the total severity of collision injury estimated based on collision energy and the vulnerability of the collision object, and the secondary ethical target cost represents the variance of the expected injury values of all impacted parties, so as to measure the fairness of injury distribution; and solving the constraint compliance Markov decision process model to obtain an optimal strategy, outputting the optimal action scheme corresponding to the maximum expected cumulative reward in the compliance scheme set, a