CN-121978920-A - Automatic driving decision control method based on hierarchical strategy network

Abstract

The invention discloses an automatic driving decision control method based on a hierarchical strategy network. The method comprises: collecting core automatic driving data; constructing driving scenes; constructing a hierarchical strategy network architecture based on a hierarchical driving primitive design and an arbiter-expert policy collaboration mechanism; performing digital-twin-driven training and verification on the hierarchical strategy network; deploying the trained model to a real vehicle system; and further fine-tuning it with real vehicle data feedback. The method combines the interpretability of the modular approach with the high performance of the end-to-end approach, handles both conventional driving scenes and extreme long-tail scenes accurately by means of the hierarchical driving primitive design and digital twin simulation training, and is suitable for the decision control module of automatic driving systems at level L4 and above.

Inventors

An Mande
LIU SHIYU
ZHANG HUALONG
XING YU
LIU YUXIANG
HAN YIMEI
LIU CHANG
ZHAO LIANG

Assignees

Shenyang Aerospace University (沈阳航空航天大学)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-13

Claims (4)

1. An automatic driving decision control method based on a hierarchical strategy network, characterized by comprising the following steps: collecting core automatic driving data; constructing driving scenes; constructing a hierarchical policy network architecture based on a hierarchical driving primitive design and an arbiter-expert policy collaboration mechanism; performing digital-twin-driven training and verification on the hierarchical policy network; and deploying the trained model to a real vehicle system and further fine-tuning it with real vehicle data feedback.
2. The hierarchical policy network based automatic driving decision control method according to claim 1, wherein the hierarchical driving primitive design comprises: establishing driving primitive division principles, including a mutual exclusivity principle under which each primitive corresponds to a unique driving scene; dividing the driving primitives into four layers according to scene frequency and priority: layer 0, system state primitives, covering non-driving scenes (system shutdown, stationary standby, and fault states), with the highest priority; layer 1, safety override primitives, covering extreme emergency scenes (emergency braking and evasive steering), accounting for less than 1% of driving time, with high priority; layer 2, tactical transition primitives, covering short-term transition scenes (lane changes, merging, and intersection crossing), accounting for about 1% of driving time, with medium priority; and layer 3, steady-state driving primitives, covering conventional default scenes (lane keeping, car following, and low-speed creeping), accounting for about 99% of driving time, with the lowest priority; and setting a layer preemption rule under which a higher-priority layer overrides a lower-priority layer, namely layer 0 overrides all other layers, layer 1 overrides layers 2 and 3, and layer 2 overrides layer 3, thereby ensuring that safety responses take priority in extreme scenes.
3. The hierarchical policy network based automatic driving decision control method according to claim 1, wherein constructing the hierarchical policy network architecture comprises: constructing a shared perception backbone network that fuses camera, lidar, GPS, and vehicle dynamics data and generates a unified bird's-eye-view (BEV) environment representation as the system's global source of environment information; designing an arbiter module that uses a high-level neural network classifier, takes the BEV environment representation and the navigation target as input, and outputs a candidate expert policy suited to the current scene; deploying a policy gateway and verifier module that embeds preset safety rules and traffic regulations and performs safety verification on the candidate expert policy; constructing an expert policy group by training a dedicated end-to-end model for each driving primitive, each expert policy outputting control commands only for its corresponding scene; and establishing a state transition mechanism in which the hierarchical policy network is abstracted as a finite state machine and the arbiter controls smooth transitions between driving primitives.
4. The hierarchical policy network based automatic driving decision control method according to claim 1, wherein the digital-twin-driven training and verification comprises: constructing a virtual training environment with diverse scenes; generating a scene data set based on the virtual training environment; training the network module by module, namely first training the shared perception backbone network, then training each expert policy separately to fit its corresponding scene, and finally training the arbiter; and performing closed-loop verification on a digital twin platform by simulating various driving scenes, testing the decision accuracy and safety performance of the system, and iteratively optimizing the network parameters.
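
The layer preemption rule of claim 2 and the arbiter-to-verifier flow of claim 3 can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `Layer`, `Primitive`, `preempts`, `select_primitive`, and the `is_safe` callback are hypothetical, the neural arbiter is replaced by a plain list of candidate primitives, and the behavior of keeping the active primitive when every candidate is rejected is an assumption. Returning to a lower-priority layer once a primitive has finished is not modeled.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Iterable


class Layer(IntEnum):
    """The four primitive layers; a lower number means a higher priority."""
    SYSTEM_STATE = 0      # system shutdown, stationary standby, fault
    SAFETY_OVERRIDE = 1   # emergency braking, evasive steering
    TACTICAL = 2          # lane change, merging, intersection crossing
    STEADY_STATE = 3      # lane keeping, car following, low-speed creeping


@dataclass(frozen=True)
class Primitive:
    """Hypothetical record pairing a driving primitive with its layer."""
    name: str
    layer: Layer


def preempts(candidate: Primitive, active: Primitive) -> bool:
    """Layer 0 overrides all, layer 1 overrides 2-3, layer 2 overrides 3."""
    return candidate.layer < active.layer


def select_primitive(
    active: Primitive,
    candidates: Iterable[Primitive],
    is_safe: Callable[[Primitive], bool],
) -> Primitive:
    """Choose the next primitive from the arbiter's candidates.

    `is_safe` stands in for the policy gateway and verifier: candidates it
    rejects are dropped. The highest-priority survivor replaces the active
    primitive only if it preempts it (assumed behavior).
    """
    verified = [c for c in candidates if is_safe(c)]
    if not verified:
        return active
    best = min(verified, key=lambda c: c.layer)
    return best if preempts(best, active) else active


lane_keep = Primitive("lane_keeping", Layer.STEADY_STATE)
lane_change = Primitive("lane_change", Layer.TACTICAL)
brake = Primitive("emergency_braking", Layer.SAFETY_OVERRIDE)

# Both candidates pass verification; the layer-1 primitive wins.
current = select_primitive(lane_keep, [lane_change, brake], is_safe=lambda p: True)
print(current.name)  # emergency_braking
```

Encoding the layers as an `IntEnum` reduces the preemption rule to a single integer comparison, which keeps the priority ordering in one place.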

Description

Automatic driving decision control method based on hierarchical strategy network

Technical Field

The invention belongs to the technical field of automatic driving and artificial intelligence, and in particular relates to an automatic driving decision control method based on a hierarchical strategy network.

Background

The core goal of an automatic driving system is safe and efficient autonomous navigation. The prior art follows two main architectural approaches: the modular approach and the end-to-end approach.

The modular approach decomposes the driving task into independent sub-modules such as perception, prediction, planning, and control, which cooperate in a standardized pipeline. The perception module processes sensor data and identifies surrounding targets, the prediction module predicts the future positions of those targets, the planning module generates the desired trajectory, and the control module converts the trajectory into physical control commands. This approach offers strong interpretability, easy embedding of safety rules, and support for parallel development, but it suffers from error accumulation across modules, high system brittleness, and suboptimal overall performance, and it copes poorly with complex dynamic driving environments.

The end-to-end approach maps sensor input directly to driving control commands through a single neural network, without intermediate sub-modules. It can learn complex environmental relationships that are difficult to cover with manual programming, has the potential to exceed human performance, and reduces the cost of manual engineering. However, it also has key drawbacks: the "black box" problem (opaque internal workings), poor adaptation to long-tail scenes (poor performance because data on rare scenes is insufficient), and unverifiable safety (hard safety rules cannot be embedded).

Both existing architectures therefore have clear limitations: the modular approach is trustworthy but lacks generality, while the end-to-end approach performs well but its safety is in doubt. Neither can meet the dual core requirements of safety and efficiency at the same time. In extreme long-tail scenes in particular (such as emergency braking, evasive steering, and passage through complex intersections), conventional methods either make decision errors because module errors accumulate or fail to form an effective response strategy because data is insufficient, which has become a key bottleneck for the practical deployment of automatic driving technology.

In summary, a digital-twin-driven hierarchical strategy network method for extreme automatic driving scenes is urgently needed.

Disclosure of Invention

In view of the above, the invention aims to provide an automatic driving decision control method based on a hierarchical policy network, so as to solve the problems that existing automatic driving decision architectures struggle to balance safety and high performance and handle extreme long-tail scenes inadequately.
The technical solution provided by the invention is an automatic driving decision control method based on a hierarchical strategy network, comprising the following steps: collecting core automatic driving data; constructing driving scenes; constructing a hierarchical policy network architecture based on a hierarchical driving primitive design and an arbiter-expert policy collaboration mechanism; performing digital-twin-driven training and verification on the hierarchical policy network; and deploying the trained model to a real vehicle system and further fine-tuning it with real vehicle data feedback.

Preferably, the hierarchical driving primitive design comprises: establishing driving primitive division principles, including a mutual exclusivity principle under which each primitive corresponds to a unique driving scene; dividing the driving primitives into four layers according to scene frequency and priority: layer 0, system state primitives, covering non-driving scenes (system shutdown, stationary standby, and fault states), with the highest priority; layer 1, safety override primitives, covering extreme emergency scenes (emergency braking and evasive steering), accounting for less than 1% of driving time, with high priority; layer 2, tactical transition primitives, covering short-term transition scenes (lane changes, merging, and intersection crossing), accounting for about 1% of driving time, with medium priority; and layer 3, steady-state driving primitives, covering conventional default scenes (lane keeping, car following, and low-speed creeping), accounting for about 99% of driving time, with the lowest priority; setti