CN-122021782-A - Training method, planning method, equipment and medium for automatic driving planning model

Abstract

The present application relates to the field of automatic driving technology, and more particularly to a training method for an automatic driving planning model, a planning method for an automatic driving vehicle, and a corresponding apparatus and medium. The training method comprises: obtaining traffic environment information at the current moment, historical state information of traffic participants, and a planned trajectory of a host vehicle from the current moment to the next moment, generated by a planning model; determining an associated participant from the traffic participants based on the planned trajectory of the host vehicle and the historical state information of the traffic participants; predicting a response trajectory of the associated participant from the current moment to the next moment by using a pre-constructed response model, based on the traffic environment information at the current moment, the planned trajectory of the host vehicle, and the historical state information of the associated participant; updating the state information of the host vehicle and of the associated participant by using the planned trajectory of the host vehicle and the response trajectory of the associated participant; and repeating these steps with the next moment serving as the new current moment, so as to train the planning model.

Inventors

REN SHAOQING
CHENG JIN
CHENG ZHENGXIN
LIU GUOYI
WANG CHENGFA
CHEN KUNSHENG
YANG JIN

Assignees

安徽蔚来智驾科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260413

Claims (13)

1. A training method for an automatic driving planning model, characterized by comprising the following steps: acquiring traffic environment information at the current moment, historical state information of traffic participants, and a planned trajectory of a host vehicle from the current moment to the next moment, generated by a planning model; determining at least one associated participant from the traffic participants based on the planned trajectory of the host vehicle and the historical state information of the traffic participants; predicting a response trajectory of the at least one associated participant from the current moment to the next moment by using a pre-constructed response model, based on the traffic environment information at the current moment, the planned trajectory of the host vehicle, and the historical state information of the at least one associated participant; updating state information of the host vehicle and the associated participant using the planned trajectory of the host vehicle and the response trajectory of the at least one associated participant; and repeating the above steps with the next moment as a new current moment, so as to train the planning model.
2. The method of claim 1, wherein determining at least one associated participant from the traffic participants comprises: screening out, from the traffic participants, those that have a collision risk with the host vehicle in the time period from the current moment to the next moment, as the associated participants.
3. The method of claim 2, wherein screening out traffic participants that have a collision risk with the host vehicle in the time period from the current moment to the next moment comprises: for each traffic participant, acquiring its original trajectory recorded in the dataset for the time period; judging, frame by frame, whether the distance between the planned trajectory of the host vehicle and the original trajectory of the traffic participant is smaller than a preset safety threshold; and if the judgment result for at least one frame is yes, determining that the traffic participant has a collision risk with the host vehicle.
4. The method of claim 1, wherein determining at least one associated participant from the traffic participants comprises: excluding, from the traffic participants, targets that travel in the same direction as the host vehicle and are located in front of the host vehicle.
5. The method of claim 1, wherein the response model is a pre-trained world model for simulating driving behavior of the associated participant in the real physical world and outputting its response trajectory in a future period based on the input traffic environment information, the historical state information of the associated participant, and the planned trajectory of the host vehicle.
6. The method of claim 1, wherein the method further comprises: generating, by the planning model, a planned trajectory of the host vehicle from the current moment to the next moment once every preset number of time frames, wherein the current moment and the next moment are separated by the preset number of time frames.
7. The method of claim 1, wherein the method further comprises: for other traffic participants not determined to be associated participants, their status information from the current time to the next time is determined based on their original trajectories recorded in the dataset.
8. The method of claim 1, wherein the method further comprises: constructing a reward signal for reinforcement learning training based on the planned trajectory of the host vehicle and the response trajectory of the associated participant, wherein the reward signal is used for evaluating the quality of the planned trajectory generated by the planning model.
9. The method of claim 8, wherein constructing a reward signal for reinforcement learning training comprises: constructing a negative reward signal when a collision occurs between the planned trajectory of the host vehicle and the response trajectory of the at least one associated participant; and constructing a positive reward signal when a rollout according to the planned trajectory of the host vehicle and the response trajectory of the at least one associated participant results in both passing safely.
10. The method of claim 1, wherein updating the state information of the host vehicle and the associated participant with the planned trajectory of the host vehicle and the response trajectory of the at least one associated participant comprises: storing, in time-frame order, the planned trajectory of the host vehicle and the response trajectory of the associated participant as the state sequences of the host vehicle and the associated participant, respectively, for the time period from the current moment to the next moment.
11. A method of planning for an autonomous vehicle, comprising: acquiring traffic environment information and motion state information of traffic participants at the current moment; and inputting the traffic environment information and the motion state information of the traffic participants into a pre-trained planning model, and generating a planned trajectory of the host vehicle from the current moment to the next moment by using the planning model, wherein the planning model is trained by the method of any one of claims 1-10.
12. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the execution of the computer program causing the method according to any one of claims 1-11 to be performed.
13. A computer readable storage medium, characterized in that the computer readable storage medium comprises instructions which, when run, perform the method according to any of claims 1-11.
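
The frame-by-frame screening of claim 3 and the reward construction of claim 9 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the representation of a trajectory as a list of (x, y) points, the point-to-point Euclidean distance test, the default thresholds, and the reward magnitudes of -1 and +1 are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # assumed representation: (x, y) in metres, one per frame


def has_collision_risk(ego_plan: Sequence[Point],
                       other_track: Sequence[Point],
                       threshold: float) -> bool:
    """Compare the two trajectories frame by frame.

    Returns True if, in at least one frame, the distance between the host
    vehicle and the other participant is below the threshold.
    """
    return any(math.dist(e, o) < threshold
               for e, o in zip(ego_plan, other_track))


def select_associated(ego_plan: Sequence[Point],
                      logged_tracks: Dict[str, Sequence[Point]],
                      safety_threshold: float = 2.0) -> List[str]:
    """Screen the logged (original) trajectories for collision risk.

    The 2.0 m default is a hypothetical value, not taken from the source.
    """
    return [pid for pid, track in logged_tracks.items()
            if has_collision_risk(ego_plan, track, safety_threshold)]


def reward(ego_plan: Sequence[Point],
           response_tracks: Sequence[Sequence[Point]],
           collision_distance: float = 1.0) -> float:
    """Negative reward on collision, positive reward if all pass safely.

    A collision is approximated here as two points coming closer than
    collision_distance; the -1.0 / +1.0 magnitudes are assumptions.
    """
    collided = any(has_collision_risk(ego_plan, track, collision_distance)
                   for track in response_tracks)
    return -1.0 if collided else 1.0
```

A practical system would likely compare vehicle footprints rather than centre points, and claim 4's exclusion of same-direction targets ahead of the host vehicle would need heading information that this sketch omits.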

Description

Training method, planning method, equipment and medium for automatic driving planning model

Technical Field

The present application relates to the field of automatic driving technology, and more particularly to a training method for an automatic driving planning model, a planning method for an automatic driving vehicle, and an electronic device and a computer-readable storage medium capable of implementing the above methods.

Background

In the training of an automatic driving planning model, it is often necessary to simulate the interaction between the host vehicle and other traffic participants in a simulation environment. In the prior art, the behavior of other traffic participants is mostly simulated by replaying historical data or by applying fixed rules. When the host vehicle adopts a driving strategy different from the one in the historical data, the behavior of the other traffic participants cannot adapt to the host vehicle's decisions, so the interactions in the simulation environment differ greatly from real driving scenes. This interaction distortion is particularly prominent in complex traffic situations such as negotiation and yielding, and it degrades the training of the planning model.

It should be noted that the information disclosed in the Background section above is only intended to aid understanding of the background of the application, and may therefore include information that does not constitute prior art already known to those of ordinary skill in the art.

Disclosure of Invention

To solve or at least alleviate one or more of the above problems, the present application provides a training method for an automatic driving planning model, a planning method for an automatic driving vehicle, and an electronic device and a computer-readable storage medium implementing the above methods.
According to a first aspect of the present application, there is provided a training method for an automatic driving planning model, characterized by comprising the steps of: acquiring traffic environment information at a current time, historical state information of traffic participants, and a planned trajectory of a host vehicle from the current time to a next time, generated by a planning model; determining at least one associated participant from the traffic participants based on the planned trajectory of the host vehicle and the historical state information of the traffic participants; predicting a response trajectory of the at least one associated participant from the current time to the next time using a pre-constructed response model, based on the traffic environment information at the current time, the planned trajectory of the host vehicle, and the historical state information of the at least one associated participant; updating state information of the host vehicle and the associated participant using the planned trajectory of the host vehicle and the response trajectory of the at least one associated participant; and repeating the above steps with the next time as a new current time, so as to train the planning model.
Alternatively or additionally to the above, in a training method for an automatic driving planning model according to an embodiment of the present application, determining at least one associated participant from the traffic participants includes screening out, from the traffic participants, those that have a collision risk with the host vehicle during the period from the current time to the next time, as the associated participants.
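
The iterative steps of the first aspect can be sketched as a closed-loop rollout in Python. This is only an illustration under stated assumptions: `closed_loop_rollout`, the callable signatures for `planner`, `response_model`, and `select_associated`, the opaque `env` object, and the (x, y) state representation are all hypothetical. Participants that are not selected follow their logged trajectories, as in claim 7, and the parameter update of the planning model is left out.

```python
from typing import Any, Callable, Dict, List, Tuple

Point = Tuple[float, float]  # assumed state: (x, y) position for one frame
Track = List[Point]


def closed_loop_rollout(
    planner: Callable[[Any, Track, Dict[str, Point]], Track],
    response_model: Callable[[Any, Track, Track], Track],
    select_associated: Callable[[Track, Dict[str, Track]], List[str]],
    logged_tracks: Dict[str, Track],
    env: Any,
    ego_start: Point,
    frames_per_step: int,
    num_steps: int,
) -> Tuple[Track, Dict[str, Track]]:
    """Roll the scene forward, one planning interval at a time."""
    ego_history: Track = [ego_start]
    # Frame 0 of every logged trajectory is the initial participant state.
    histories = {pid: [track[0]] for pid, track in logged_tracks.items()}

    for step in range(num_steps):
        start = step * frames_per_step + 1
        end = start + frames_per_step

        # 1. The planning model proposes the host trajectory for this interval.
        current = {pid: hist[-1] for pid, hist in histories.items()}
        ego_plan = planner(env, ego_history, current)

        # 2. Choose associated participants from plan + logged trajectories.
        window = {pid: t[start:end] for pid, t in logged_tracks.items()}
        associated = set(select_associated(ego_plan, window))

        # 3. Associated participants react via the response model;
        #    all others simply replay their logged trajectory.
        for pid, hist in histories.items():
            if pid in associated:
                hist.extend(response_model(env, ego_plan, hist))
            else:
                hist.extend(window[pid])

        # 4. Update the host state; the next time becomes the current time.
        ego_history.extend(ego_plan)

    return ego_history, histories
```

The returned state sequences are what a training procedure would then score, for example with a reinforcement learning reward, before updating the planning model.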
Alternatively or additionally to the above, in the training method for an automatic driving planning model according to an embodiment of the present application, screening out traffic participants that have a collision risk with the host vehicle during the period from the current time to the next time includes: for each traffic participant, acquiring its original trajectory recorded in a dataset for the period; judging, frame by frame, whether the distance between the planned trajectory of the host vehicle and the original trajectory of the traffic participant is smaller than a preset safety threshold; and if the judgment result for at least one frame is yes, determining that the traffic participant has a collision risk with the host vehicle.
Alternatively or additionally to the above, in the training method for an automatic driving planning model according to an embodiment of the present application, determining at least one associated participant from the traffic participants includes excluding, from the traffic participants, targets that travel in the same direction as the host vehicle and are located in front of it.
Alternatively or additionally to the above, in the training method for an automatic driving planning model according to an embodiment of the present application, the response model is a pre-trained world model for simulating driving behavior of the associated participant in the real physical world based on the input traffic environmen