EP-4738193-A1 - COMPUTER IMPLEMENTED SYSTEM AND METHOD FOR GENERATING DATA SETS REPRESENTING TRAFFIC SCENARIOS


Abstract

The present invention is directed to a computer implemented system (100) and method for generating data sets (3) representing traffic scenarios to be used for training and/or testing automated driving (AD) functions. The system (100) comprises one or more processors and a memory that stores one or more programs that are configured to be executed by the one or more processors. Said programs comprise at least one first Large Language Model (LLM) and/or Vision Language Model (VLM) being trained at least on formatted text descriptions of traffic scenarios and on natural language to perform a task given a prompt. In addition, said programs include instructions to • Transform an initial data set (1) into a formatted text description of a corresponding real traffic scenario, • Construct a first prompt (11) as input for the first LLM or VLM acting as modifier agent (10), wherein the first prompt (11) includes the formatted text description of said real traffic scenario and a user specification of a wanted modification of said real traffic scenario, • Apply the first prompt (11) to the modifier agent (10) and obtain a modified formatted text description of a modified traffic scenario as output (12), and • Transform the modified formatted text description into a corresponding data set (3) representing the modified traffic scenario.
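
The four steps named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the application: the `Agent` fields, the use of JSON as the "formatted text description", the prompt wording, and all function names are assumptions made for illustration, and `stub_modifier_agent` is a deterministic stand-in for the LLM or VLM acting as modifier agent (10).

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, List


@dataclass
class Agent:
    """One traffic participant; the fields are illustrative assumptions."""
    agent_id: str
    kind: str
    x: float
    y: float
    speed: float


def scenario_to_text(agents: List[Agent]) -> str:
    """Step 1: turn a data set into a formatted text description (JSON here)."""
    return json.dumps([asdict(a) for a in agents], sort_keys=True)


def build_first_prompt(scenario_text: str, user_spec: str) -> str:
    """Step 2: combine the scenario text with the wanted modification."""
    return (
        "Scenario:\n" + scenario_text + "\n"
        "Requested modification:\n" + user_spec + "\n"
        "Return only the modified scenario in the same format."
    )


def text_to_scenario(text: str) -> List[Agent]:
    """Step 4: parse the modified text back into a data set."""
    return [Agent(**item) for item in json.loads(text)]


def generate_modified_scenario(
    agents: List[Agent],
    user_spec: str,
    modifier_agent: Callable[[str], str],
) -> List[Agent]:
    """Run steps 1 to 4 with any callable that maps a prompt to text."""
    prompt = build_first_prompt(scenario_to_text(agents), user_spec)
    modified_text = modifier_agent(prompt)  # Step 3: apply the first prompt
    return text_to_scenario(modified_text)


def stub_modifier_agent(prompt: str) -> str:
    """Hypothetical stand-in for an LLM/VLM: doubles every vehicle's speed."""
    scenario_line = prompt.split("\n")[1]  # the JSON sits on the second line
    items = json.loads(scenario_line)
    for item in items:
        if item["kind"] == "vehicle":
            item["speed"] *= 2
    return json.dumps(items)


initial = [
    Agent("ego", "vehicle", 0.0, 0.0, 10.0),
    Agent("p1", "pedestrian", 5.0, 2.0, 1.5),
]
modified = generate_modified_scenario(
    initial, "Double the speed of all vehicles.", stub_modifier_agent
)
```

Passing the modifier agent as a plain callable keeps the sketch independent of any particular model API; a real model call would replace the stub.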

Inventors

YAO, YU
Bhatnagar, Salil
HALLGARTEN, MARCEL

Assignees

Robert Bosch GmbH

Dates

Publication Date: 20260506
Application Date: 20241031

Claims (10)

Computer implemented system (100) for generating data sets (3) representing traffic scenarios to be used for training and/or testing automated driving (AD) functions, said system (100) at least comprising: one or more processors and a memory that stores one or more programs that are configured to be executed by the one or more processors, said programs comprising at least one first Large Language Model (LLM) and/or Vision Language Model (VLM) being trained at least on formatted text descriptions of traffic scenarios and on natural language to perform a task given a prompt, wherein said one or more programs include instructions to • Transform an initial data set (1) into a formatted text description of a corresponding real traffic scenario, • Construct a first prompt (11) as input for the first LLM or VLM acting as modifier agent (10), wherein the first prompt (11) includes the formatted text description of said real traffic scenario and a user specification of a wanted modification of said real traffic scenario, • Apply the first prompt (11) to the modifier agent (10) and obtain a modified formatted text description of a modified traffic scenario as output (12), and • Transform the modified formatted text description into a corresponding data set (3) representing the modified traffic scenario.
Computer implemented system according to claim 1, wherein said one or more programs comprise a set of interacting LLMs and/or VLMs being trained at least on formatted text descriptions of traffic scenarios and on natural language to perform a task given a prompt, wherein said one or more programs include instructions to • Construct a second prompt (21) as input for a second LLM or VLM acting as quality assurance (QA) agent (20) to evaluate the output of the modifier agent (10), wherein the second prompt (21) includes the input (11) and the output (12) of the modifier agent (10) and at least one quality criterion, • Apply the second prompt (21) to the QA agent (20) and i. when the QA agent (20) indicates that the quality criterion is not met, construct another first prompt (11) for the modifier agent (10) in order to further modify the formatted text description of the given real traffic scenario, ii. or when the QA agent (20) indicates that the quality criterion is met, transform the modified formatted text description into a corresponding data set (3) representing the modified traffic scenario.
Computer implemented system according to any one of claims 1 or 2, wherein said programs comprise a set of interacting LLMs and/or VLMs being trained at least on formatted text descriptions of traffic scenarios and on natural language to perform a task given a prompt, and wherein the set of interacting LLMs and/or VLMs comprises at least one junior modifier agent specialized in at least one subtask of scenario modification, and wherein the modifier agent is configured to • analyze the first prompt in order to break down the corresponding task into several subtasks, • distribute said subtasks to appropriately specialized junior modifier agents, • merge the outputs of said junior modifier agents in order to generate a modified formatted text description of a modified traffic scenario as output.
Computer implemented system according to any one of claims 2 or 3, wherein the set of interacting LLMs and/or VLMs further comprises at least one junior QA agent specialized in at least one subtask of evaluation, and wherein the QA agent is configured to • analyze the second prompt in order to break down the corresponding task into several subtasks, • distribute said subtasks to appropriately specialized junior QA agents and • merge the outputs of said specialized junior QA agents in order to check whether the quality criterion is met and to construct another first prompt for the modifier agent, if necessary.
Computer implemented system according to any one of claims 1 to 4, said one or more programs further comprising a first Neural Network (NN) (31) being trained at least on data and on formatted text descriptions of traffic scenarios, wherein the one or more programs include instructions to access said first NN in order to transform the initial data set (1) into a formatted text description of the corresponding real traffic scenario.
Computer implemented system according to any one of claims 1 to 5, further comprising a second NN (35) being trained on data and on formatted text descriptions of traffic scenarios, wherein the one or more programs include instructions to access said second NN (35) in order to transform the modified formatted text description into a corresponding data set (3) representing the modified traffic scenario.
Computer implemented method for generating data sets (3) representing traffic scenarios to be used for training and/or testing automated driving (AD) functions, wherein at least one first Large Language Model (LLM) and/or Vision Language Model (VLM) is used for modification and/or augmentation of initial data sets (1) representing real traffic scenarios, said first LLM or VLM being trained at least on formatted text descriptions of traffic scenarios and on natural language to perform a task given a prompt, said method comprising the following steps: • Transforming an initial data set (1) into a formatted text description of a corresponding real traffic scenario, • Constructing a first prompt (11) as input for the first LLM or VLM acting as modifier agent (10), wherein the first prompt (11) includes the formatted text description of said real traffic scenario and a user specification of a wanted modification of said real traffic scenario, • Applying the first prompt (11) to the modifier agent (10) and obtaining a modified formatted text description of a modified traffic scenario as output (12), • transforming the modified formatted text description into a corresponding data set (3) representing the modified traffic scenario.
Computer implemented method according to claim 7, wherein a set of interacting LLMs and/or VLMs is used for modification and/or augmentation of initial data sets (1) representing real traffic scenarios, said LLMs and/or VLMs being trained at least on formatted text descriptions of traffic scenarios and on natural language to perform a task given a prompt, said method further comprising the following steps: • Constructing a second prompt (21) as input for a second LLM or VLM acting as the quality assurance (QA) agent (20) to evaluate the output of the modifier agent (10), wherein the second prompt (21) includes the input (11) and the output (12) of the modifier agent (10) and at least one quality criterion, • Applying the second prompt (21) to the QA agent (20) and i. when the QA agent (20) indicates that the quality criterion is not met, constructing another first prompt (11) for the modifier agent (10) in order to further modify the formatted text description of the given real traffic scenario, ii. or when the QA agent (20) indicates that the quality criterion is met, transforming the modified formatted text description into a corresponding data set (3) representing the modified traffic scenario.
Computer implemented method for training and/or testing automated driving (AD) functions, especially prediction functions and planning functions, wherein data sets generated by a system and/or method according to any one of the preceding claims are used as training data sets or evaluation and testing data sets.
Computer implemented automated driving (AD) function, especially for prediction of traffic scene development and behavior planning for single traffic participants, being trained, tested and/or evaluated on data sets generated by a system and/or method according to any one of the preceding claims 1 to 8.
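
The interaction between the modifier agent (10) and the QA agent (20) recited in claims 2 and 8 can be sketched as a simple loop. The following Python is an illustrative assumption, not the claimed implementation: the section markers in the prompts, the `max_rounds` limit, the `(passed, feedback)` return shape of the QA agent, and both stub agents are hypothetical and do not appear in the application.

```python
from typing import Callable, Tuple


def modify_with_qa(
    scenario_text: str,
    user_spec: str,
    criterion: str,
    modifier_agent: Callable[[str], str],
    qa_agent: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,  # assumption: the claims do not state a round limit
) -> Tuple[str, int]:
    """Loop modifier -> QA until the criterion is met; return (text, rounds)."""
    first_prompt = (
        f"### SCENARIO\n{scenario_text}\n### MODIFICATION\n{user_spec}"
    )
    for round_no in range(1, max_rounds + 1):
        output = modifier_agent(first_prompt)
        # The second prompt includes the modifier's input, its output,
        # and at least one quality criterion.
        second_prompt = (
            f"### INPUT\n{first_prompt}\n"
            f"### OUTPUT\n{output}\n"
            f"### CRITERION\n{criterion}"
        )
        passed, feedback = qa_agent(second_prompt)
        if passed:
            return output, round_no
        # Criterion not met: construct another first prompt.
        first_prompt = (
            f"### SCENARIO\n{scenario_text}\n### MODIFICATION\n{user_spec}\n"
            f"### PREVIOUS ATTEMPT\n{output}\n### FEEDBACK\n{feedback}"
        )
    raise RuntimeError("quality criterion not met within max_rounds")


def stub_modifier(prompt: str) -> str:
    """Hypothetical model stand-in: only correct once it sees QA feedback."""
    speed = 30 if "### FEEDBACK" in prompt else 20
    return f"ego: vehicle, speed={speed}"


def stub_qa(prompt: str) -> Tuple[bool, str]:
    """Hypothetical QA stand-in: checks the OUTPUT section for speed=30."""
    output = prompt.split("### OUTPUT\n")[1].split("\n### CRITERION")[0]
    if "speed=30" in output:
        return True, ""
    return False, "ego speed must be 30"


final_text, rounds_used = modify_with_qa(
    "ego: vehicle, speed=10",
    "set ego speed to 30",
    "ego speed equals 30",
    stub_modifier,
    stub_qa,
)
```

Only the accepted output would then be passed on to the final transformation into a data set (3). The round limit is added here so that the sketch always terminates.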

Description

Technical field

Safety requirements for automated driving (AD) functions are very high in order to guarantee that vehicles using such driving systems always operate in a safe manner and do not cause any unnecessary and foreseeable accidents. Moreover, in dangerous traffic situations, the performance of such driving systems should be comparable to the behavior of a competent human driver who tries to avoid or at least minimize risk.

Nowadays, many AD functions use machine learning (ML), especially deep learning (DL), to handle the enormous variety of possible traffic scenarios. Consequently, the performance of such driving functions strongly depends on the amount and quality of the training data used to learn an AD function. In this context, it is of utmost importance that the training data covers a large variety of traffic scenarios and situations. In addition, testing and evaluation of such AD functions is only meaningful and significant when carried out on data sets representing all kinds of different traffic scenarios and situations.

The choice of training and testing data depends heavily on the specific AD function being developed, the available resources, and the desired level of performance and safety. A comprehensive approach often utilizes a combination of data sources to ensure robustness and generalization to real-world scenarios.

Training and testing data for AD functions come from a diverse range of sources, each with its own strengths and weaknesses. They can be broadly categorized as follows:
• Real-world data: Collected from sensors on vehicles driving in real-world scenarios. This data is highly representative of real-world complexity, including unpredictable events and behavior and rare edge cases, but is expensive and time-consuming to collect, label, and curate. Privacy concerns are also a significant factor.
• Simulated data: Generated using simulation environments that recreate real-world physics and sensor models. This data is cost-effective, allows for controlled experimentation and testing of edge cases, and avoids privacy issues. However, it can struggle to fully capture the complexity and unpredictability of the real world, leading to a "reality gap."
• Synthetic data: Artificially generated data that may or may not be based on real-world scenarios. It is useful for augmenting existing datasets, creating specific scenarios, and addressing class imbalances. However, it needs careful validation to ensure relevance and avoid introducing biases.

Hybrid approaches combine multiple data sources to leverage the strengths of each, for instance by using real-world data for common scenarios and simulated data for rare edge cases. This is becoming increasingly common as a way to achieve robust and reliable performance.

Within these categories, the training and testing data can further be classified by:
• Sensor modality, i.e. camera, LiDAR, radar, GPS, IMU, etc.
• Scenario coverage, i.e. urban, highway, rural, different weather conditions, etc.
• Labeling quality: manually labeled, semi-supervised, or unsupervised.

In most cases the sensor data collected from real-world scenarios has to be preprocessed to transform it into input data for the AD function to be trained or tested. This preprocessing is crucial since it has to transform raw, noisy, and heterogeneous data into a consistent, manageable, and informative format that still represents the initial real-world scenario. Common preprocessing steps are:
• Sensor calibration to correct distortions and inaccuracies inherent in individual sensors, e.g., camera lens distortion, LiDAR reflectivity variations,
• Sensor synchronization to align data streams from different sensors in time and space,
• Data cleaning, filtering and noise removal,
• Data transformation to convert data between different coordinate systems, e.g., sensor coordinates to vehicle coordinates, or to a global map frame,
• Feature extraction to extract relevant features from the raw data. For images, this could involve object detection, lane detection, or semantic segmentation. For LiDAR, it might involve ground plane removal, point cloud clustering, or generating occupancy grids.
• Data formatting to convert data into a suitable format for the chosen machine learning framework, e.g., creating tensors for deep learning models.

Although a large number of data sets have been collected when driving in real-world traffic, there is still the need for more training and testing data, e.g. in order to increase the coverage of different traffic scenarios.

The present invention aims to automatically generate data sets representing significant traffic scenarios to be added to the training data and/or testing data of an AD function. The proposed method starts with a data set representing a real traffic scenario, i.e. which has been collected from real-world traffic. This real-world data set is then modified and/or augmented automatically such that it represents a well-defined modification of the initial real traffic scenario. Ther