CN-116342652-B - Track prediction model training method, track prediction device and medium


Abstract

Disclosed are a method for training a trajectory prediction model, a trajectory prediction method, a device and a medium. The training method comprises: obtaining a first extracted feature corresponding to a first image sequence; applying random disturbance to the first extracted feature through the trajectory prediction model to be trained to obtain first disturbance features corresponding to the random disturbance; generating, through the trajectory prediction model to be trained, a multi-modal trajectory prediction result for an obstacle in the first image sequence based on the first disturbance features; training the trajectory prediction model to be trained based on a second image sequence and the multi-modal trajectory prediction result; and, in response to the trained model meeting a preset training end condition, determining the trained model as the target trajectory prediction model. Embodiments of the disclosure predict trajectories directly in the camera view, so depth information does not need to be recovered, which improves the accuracy of the trajectory prediction result.

Inventors

Dai Shengzhe

Assignees

北京地平线信息技术有限公司

Dates

Publication Date: 20260512
Application Date: 20230316

Claims (14)

1. A method of training a trajectory prediction model, comprising: acquiring a first extracted feature corresponding to a first image sequence, wherein the first image sequence comprises images respectively acquired at a plurality of historical moments by a first camera arranged on a first movable device; applying random disturbance to the first extracted feature through a trajectory prediction model to be trained to obtain first disturbance features corresponding to the random disturbance; generating a multi-modal trajectory prediction result of an obstacle in the first image sequence through the trajectory prediction model to be trained based on the first disturbance features corresponding to the random disturbance; training the trajectory prediction model to be trained based on a second image sequence and the multi-modal trajectory prediction result, wherein the second image sequence comprises images respectively acquired by the first camera at a plurality of future moments after the plurality of historical moments; and in response to the trained trajectory prediction model meeting a preset training end condition, determining the trained trajectory prediction model as a target trajectory prediction model.
2. The method of claim 1, wherein the applying random disturbance to the first extracted feature through the trajectory prediction model to be trained to obtain first disturbance features corresponding to the random disturbance comprises: determining a first feature disturbance distribution parameter corresponding to the first extracted feature through a conditional variational autoencoder in the trajectory prediction model to be trained; and based on the first feature disturbance distribution parameter, applying random disturbance to the first extracted feature through the conditional variational autoencoder to obtain first disturbance features respectively corresponding to a plurality of random disturbances; and wherein the training the trajectory prediction model to be trained based on the second image sequence and the multi-modal trajectory prediction result comprises: training the trajectory prediction model to be trained based on the second image sequence, the multi-modal trajectory prediction result and the first feature disturbance distribution parameter.
3. The method of claim 2, wherein the training the trajectory prediction model to be trained based on the second image sequence, the multi-modal trajectory prediction result and the first feature disturbance distribution parameter comprises: acquiring a second extracted feature corresponding to the second image sequence; determining a second feature disturbance distribution parameter corresponding to the second extracted feature through the conditional variational autoencoder; determining a first model loss value by comparing the first feature disturbance distribution parameter with the second feature disturbance distribution parameter; and training the trajectory prediction model to be trained based on the first model loss value.
4. The method of claim 1, wherein the multi-modal trajectory prediction result comprises a plurality of predicted trajectories of the obstacle and a predicted probability value for each of the plurality of predicted trajectories; and the training the trajectory prediction model to be trained based on the second image sequence and the multi-modal trajectory prediction result comprises: determining a future real trajectory of the obstacle based on the second image sequence; determining a first trajectory loss value based on the future real trajectory and the plurality of predicted trajectories; determining a probability loss value based on the future real trajectory, the plurality of predicted trajectories and the predicted probability values respectively corresponding to the plurality of predicted trajectories; determining a second model loss value based on the first trajectory loss value and the probability loss value; and training the trajectory prediction model to be trained based on the second model loss value.
5. The method of claim 4, wherein the determining a first trajectory loss value based on the future real trajectory and the plurality of predicted trajectories comprises: determining a similarity between each of the plurality of predicted trajectories and the future real trajectory; determining a weight for each of the plurality of predicted trajectories based on the similarity corresponding to that predicted trajectory; determining a second trajectory loss value for each of the plurality of predicted trajectories based on the similarity corresponding to that predicted trajectory; and weighting the second trajectory loss value of each of the plurality of predicted trajectories by the weight of that predicted trajectory to obtain the first trajectory loss value.
6. The method of claim 4, wherein the determining a probability loss value based on the future real trajectory, the plurality of predicted trajectories and the predicted probability values respectively corresponding to the plurality of predicted trajectories comprises: determining a similarity between each of the plurality of predicted trajectories and the future real trajectory; determining a reference probability value for each of the plurality of predicted trajectories based on the similarity corresponding to that predicted trajectory; and determining the probability loss value based on the predicted probability value and the reference probability value corresponding to each of the plurality of predicted trajectories.
7. The method of claim 1, wherein the acquiring the first extracted feature corresponding to the first image sequence comprises: superimposing a plurality of images in the first image sequence along the channel direction to obtain a superimposed image; and performing feature extraction on the superimposed image to obtain the first extracted feature corresponding to the first image sequence; or, alternatively, the acquiring the first extracted feature corresponding to the first image sequence comprises: obtaining obstacle information corresponding to each of a plurality of images in the first image sequence to obtain a plurality of pieces of obstacle information, wherein each piece of obstacle information comprises obstacle bounding box information or obstacle segmentation result information; and performing feature extraction on an obstacle information sequence formed by the plurality of pieces of obstacle information to obtain the first extracted feature corresponding to the first image sequence.
8. The method of claim 7, wherein the performing feature extraction on the obstacle information sequence formed by the plurality of pieces of obstacle information to obtain the first extracted feature corresponding to the first image sequence comprises: sequentially taking each piece of obstacle information in the obstacle information sequence as current obstacle information; in response to the current obstacle information being the first obstacle information in the obstacle information sequence, generating an extracted feature corresponding to the current obstacle information based on the current obstacle information through a feature extractor in the trajectory prediction model to be trained; in response to the current obstacle information not being the first obstacle information in the obstacle information sequence, generating, through the feature extractor, an extracted feature corresponding to the current obstacle information based on the current obstacle information and the extracted feature corresponding to the obstacle information preceding the current obstacle information; and in response to the current obstacle information being the last obstacle information in the obstacle information sequence, determining the extracted feature corresponding to the current obstacle information as the first extracted feature corresponding to the first image sequence.
9. The method of claim 1, wherein the multi-modal trajectory prediction result comprises a plurality of predicted trajectories of the obstacle; and each of the plurality of predicted trajectories is characterized by target data comprising the coordinates of each of the four corner points of a bounding box of the obstacle at each of the plurality of future moments.
10. A trajectory prediction method, comprising: acquiring a third extracted feature corresponding to a third image sequence, wherein the third image sequence comprises images respectively acquired at a plurality of moments by a second camera arranged on a second movable device; applying random disturbance to the third extracted feature through a target trajectory prediction model to obtain second disturbance features corresponding to the random disturbance; and generating a multi-modal trajectory prediction result of an obstacle in the third image sequence through the target trajectory prediction model based on the second disturbance features corresponding to the random disturbance; wherein the target trajectory prediction model is trained by the method of training a trajectory prediction model according to any one of claims 1-9.
11. A training device for a trajectory prediction model, comprising: a first acquisition module configured to acquire a first extracted feature corresponding to a first image sequence, wherein the first image sequence comprises images respectively acquired at a plurality of historical moments by a first camera arranged on a first movable device; a first disturbance applying module configured to apply random disturbance, through a trajectory prediction model to be trained, to the first extracted feature acquired by the first acquisition module to obtain first disturbance features corresponding to the random disturbance; a first generation module configured to generate a multi-modal trajectory prediction result of an obstacle in the first image sequence through the trajectory prediction model to be trained based on the first disturbance features obtained by the first disturbance applying module; a training module configured to train the trajectory prediction model to be trained based on a second image sequence and the multi-modal trajectory prediction result generated by the first generation module, wherein the second image sequence comprises images respectively acquired by the first camera at a plurality of future moments after the plurality of historical moments; and a determining module configured to determine, in response to the trajectory prediction model trained by the training module meeting a preset training end condition, the trained trajectory prediction model as a target trajectory prediction model.
12. A trajectory prediction device, comprising: a second acquisition module configured to acquire a third extracted feature corresponding to a third image sequence, wherein the third image sequence comprises images respectively acquired at a plurality of moments by a second camera arranged on a second movable device; a second disturbance applying module configured to apply random disturbance, through a target trajectory prediction model, to the third extracted feature acquired by the second acquisition module to obtain second disturbance features corresponding to the random disturbance; and a second generation module configured to generate a multi-modal trajectory prediction result of an obstacle in the third image sequence through the target trajectory prediction model based on the second disturbance features obtained by the second disturbance applying module; wherein the target trajectory prediction model is trained by the method of training a trajectory prediction model according to any one of claims 1-9.
13. A computer-readable storage medium storing a computer program for executing the method of training a trajectory prediction model of any one of claims 1 to 9 or for executing the trajectory prediction method of claim 10.
14. An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to read the executable instructions from the memory and execute them to implement the method of training a trajectory prediction model of any one of claims 1 to 9 or the trajectory prediction method of claim 10.
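
The loss terms recited in claims 4 to 6 can be illustrated with a minimal sketch. The claims do not fix any formula, so everything concrete here is an assumption made for illustration: the similarity is taken as the negative mean point-to-point distance, the weights and the reference probability values are both taken as a softmax over the similarities, the probability loss is taken as a cross-entropy, and the second model loss is taken as a plain sum. The names `second_model_loss`, `mean_point_distance`, `softmax` and `temperature` are hypothetical.

```python
import math


def mean_point_distance(pred, truth):
    """Mean Euclidean distance between matching points of two trajectories."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def second_model_loss(pred_tracks, pred_probs, true_track, temperature=1.0):
    """Return (total, trajectory_loss, probability_loss) for one obstacle.

    pred_tracks: list of predicted trajectories, each a list of (x, y) points
    pred_probs:  predicted probability value of each predicted trajectory
    true_track:  future real trajectory, a list of (x, y) points
    """
    distances = [mean_point_distance(t, true_track) for t in pred_tracks]
    # Assumed similarity: the closer a prediction is, the higher its score.
    similarities = [-d / temperature for d in distances]
    # Assumed weighting: softmax of the similarities (claim 5 weights).
    weights = softmax(similarities)
    # Assumed per-trajectory loss: the distance itself, which is a monotone
    # function of the similarity defined above.
    trajectory_loss = sum(w * d for w, d in zip(weights, distances))
    # Assumed reference probabilities: the same softmax (claim 6).
    reference_probs = weights
    eps = 1e-12  # guards against log(0)
    probability_loss = -sum(
        r * math.log(p + eps) for r, p in zip(reference_probs, pred_probs)
    )
    # Assumed combination: unweighted sum of the two terms (claim 4).
    return trajectory_loss + probability_loss, trajectory_loss, probability_loss
```

Under these assumptions, a prediction set that assigns high probability to the trajectory closest to the future real trajectory yields a lower probability loss than one that favors a distant trajectory. For the four-corner representation of claim 9, each trajectory would simply list the four corner points for every future moment.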

Description

Track prediction model training method, track prediction device and medium

Technical Field

The disclosure relates to machine vision technology, and in particular to a training method for a trajectory prediction model, a trajectory prediction method, a device and a medium.

Background

Autonomous driving technology is increasingly used in movable devices such as vehicles. Trajectory prediction is an important part of autonomous driving, and its results strongly affect the safety and comfort of the vehicle.

Conventional trajectory prediction schemes predict trajectories in the Bird's-Eye View (BEV). Such schemes must first recover depth information, and the recovered depth is often inaccurate, which reduces the accuracy of the trajectory prediction result.

Disclosure of Invention

The present disclosure is proposed to solve the problem that trajectory prediction results obtained with conventional trajectory prediction schemes have low accuracy. Embodiments of the disclosure provide a training method for a trajectory prediction model, a trajectory prediction method, a device and a medium.
According to an aspect of the embodiments of the present disclosure, there is provided a training method for a trajectory prediction model, including: acquiring a first extracted feature corresponding to a first image sequence, wherein the first image sequence comprises images respectively acquired at a plurality of historical moments by a first camera arranged on a first movable device; applying random disturbance to the first extracted feature through a trajectory prediction model to be trained to obtain first disturbance features corresponding to the random disturbance; generating a multi-modal trajectory prediction result of an obstacle in the first image sequence through the trajectory prediction model to be trained based on the first disturbance features corresponding to the random disturbance; training the trajectory prediction model to be trained based on a second image sequence and the multi-modal trajectory prediction result, wherein the second image sequence comprises images respectively acquired by the first camera at a plurality of future moments after the plurality of historical moments; and in response to the trained trajectory prediction model meeting a preset training end condition, determining the trained trajectory prediction model as a target trajectory prediction model.
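
The disturbance step and the comparison of distribution parameters can be sketched as follows. The source states only that a conditional variational autoencoder determines disturbance distribution parameters and applies random disturbance, and that two sets of parameters are compared. This sketch therefore assumes a diagonal Gaussian parameterized by a mean and a log-variance, assumes the disturbance is added to the feature, and assumes the comparison is a KL divergence in the direction shown. The names `perturb_features` and `gaussian_kl` are hypothetical.

```python
import math
import random


def perturb_features(features, mu, log_var, num_modes, rng):
    """Draw num_modes disturbed copies of one feature vector.

    Each copy adds a sample z = mu + sigma * eps (eps ~ N(0, 1)) to the
    feature, which is the usual reparameterization form. Additive
    disturbance is an assumption of this sketch.
    """
    disturbed = []
    for _ in range(num_modes):
        sample = [
            m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)
        ]
        disturbed.append([f + z for f, z in zip(features, sample)])
    return disturbed


def gaussian_kl(mu_a, log_var_a, mu_b, log_var_b):
    """KL(N(mu_a, var_a) || N(mu_b, var_b)) for diagonal Gaussians."""
    total = 0.0
    for ma, la, mb, lb in zip(mu_a, log_var_a, mu_b, log_var_b):
        total += (
            0.5 * (lb - la)
            + (math.exp(la) + (ma - mb) ** 2) / (2.0 * math.exp(lb))
            - 0.5
        )
    return total


# Usage: four disturbed features from one extracted feature, one per mode.
rng = random.Random(0)
modes = perturb_features([1.0, 2.0], [0.0, 0.0], [0.0, 0.0], 4, rng)
```

Each disturbed feature would then be decoded into one predicted trajectory, so the number of random disturbances sets the number of modes in the prediction result.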
According to another aspect of the embodiments of the present disclosure, there is provided a trajectory prediction method, including: acquiring a third extracted feature corresponding to a third image sequence, wherein the third image sequence comprises images respectively acquired at a plurality of moments by a second camera arranged on a second movable device; applying random disturbance to the third extracted feature through a target trajectory prediction model to obtain second disturbance features corresponding to the random disturbance; and generating a multi-modal trajectory prediction result of an obstacle in the third image sequence through the target trajectory prediction model based on the second disturbance features corresponding to the random disturbance.
According to still another aspect of the embodiments of the present disclosure, there is provided a training device for a trajectory prediction model, including: a first acquisition module configured to acquire a first extracted feature corresponding to a first image sequence, wherein the first image sequence comprises images respectively acquired at a plurality of historical moments by a first camera arranged on a first movable device; a first disturbance applying module configured to apply random disturbance, through a trajectory prediction model to be trained, to the first extracted feature acquired by the first acquisition module to obtain first disturbance features corresponding to the random disturbance; a first generation module configured to generate a multi-modal trajectory prediction result of an obstacle in the first image sequence through the trajectory prediction model to be trained based on the first disturbance features obtained by the first disturbance applying module; a training module configured to train the trajectory prediction model to be trained based on a second image sequence and the multi-modal trajectory prediction result generated by the first generation module, wherein the second image sequence comprises images respectively acquired by the first camera at a plurality of future moments after the plurality of historical moments; and a determining module configured to determine, in response to the trajectory prediction model trained by the training module meeting a preset training end condition, the trained trajectory prediction model as a target trajectory prediction model. According to still another aspect of the emb