CN-121979713-A - Automatic driving test scene deduction and generation method based on world model
Abstract
The invention discloses a world-model-based method for the deduction and generation of automatic driving test scenes, comprising the following steps: acquiring raw multi-source traffic scene data and constructing a training data set; converting traffic participant states and environment states into a unified scene state representation based on the training data set; constructing a generative world model on the scene state representation and learning the temporal evolution laws of traffic scenes; generating an initial scene for automatic driving testing with the generative world model under given test constraints, and performing temporal deduction to form a continuously evolving test scene sequence; guiding the deduction process to generate test scenes containing complex interactions or potential conflicts; and judging the validity of the test scenes and outputting them to a test platform. The invention achieves automated, high-quality generation and deduction of automatic driving test scenes, provides comprehensive, realistic, and complex test scene support for the safety verification of automatic driving systems, and has significant application value.
Inventors
- LEI CAILIN
- ZHANG ZIHANG
- CHEN GUN
- FU QIYUAN
- ZHAO CONG
- JI YUXIONG
Assignees
- 重庆交通大学 (Chongqing Jiaotong University)
Dates
- Publication Date
- 20260505
- Application Date
- 20260120
Claims (10)
- 1. A method for the deduction and generation of automatic driving test scenes based on a world model, characterized by comprising the following steps: S1, acquiring raw multi-source traffic scene data and constructing a training data set; S2, converting traffic participant states and environment states into a unified scene state representation based on the training data set; S3, constructing a generative world model based on the scene state representation and learning the temporal evolution laws of traffic scenes; S4, generating an initial scene for automatic driving testing with the generative world model under given test constraints; S5, performing temporal deduction on the initial scene with the generative world model to form a continuously evolving test scene sequence; S6, guiding the temporal deduction process to generate test scenes containing complex interactions or potential conflicts; S7, judging the validity of the test scenes and outputting them to the test platform.
- 2. The method according to claim 1, wherein in step S1 the raw multi-source traffic scene data includes time-series trajectory data of traffic participants, comprising motion information over continuous time steps for dynamic traffic elements such as the ego vehicle, other vehicles, pedestrians, and bicycles, and road topology and traffic facility data, comprising static or quasi-static environmental information such as lane lines, curbs, traffic signs, and traffic signal positions and states.
- 3. The method according to claim 2, wherein in step S2 the scene state is represented as a scene tensor X ∈ ℝ^(E×T×D), where E is the total number of elements to be modeled in the scene, T is the total number of physical time steps modeled, and D is the feature dimension jointly modeled for each element; the scene tensor employs a multi-tensor structure to model the different types of traffic scene elements, including at least a traffic participant tensor and a traffic signal tensor.
- 4. The method according to claim 3, wherein in step S3 the generative world model is built on a diffusion model for learning the conditional probability distribution of future scene states under given context conditions.
- 5. The method according to claim 4, wherein step S3 specifically includes: forward noising, namely gradually adding Gaussian noise to the real scene tensor X to obtain a noised version Z_t at each step; determining the denoising target, namely the model learns to predict the velocity signal required to remove the noise from Z_t; and constructing the loss function, namely adopting a mean square error loss adapted to the sparse-tensor characteristics, in which the prediction of all features is supervised for valid time steps, while for invalid time steps only the validity features are supervised and the targets of the corresponding real-valued features are set to zero.
- 6. The method according to claim 1, wherein step S4 specifically includes parsing and formalizing the test constraints into conditional inputs of the generative world model, the conditional inputs including at least a patch mask and a patch context, and reconstructing an initial scene state conforming to the conditional inputs through the conditional generation process of the generative world model.
- 7. The method according to claim 1, wherein in step S5 the temporal deduction is implemented through a closed-loop interactive framework formed by the generative world model and a planner, and specifically comprises: constructing the closed-loop interactive framework of the generative world model and the automatic driving planner; at each deduction step, constructing conditions from the historical state, the ego trajectory intention provided by the planner, and map information, and calling the generative world model to predict the scene state over a future time horizon; updating the simulation with the next-moment state and triggering the planner to replan; and performing autoregressive deduction cyclically to form a continuous test scene sequence.
- 8. The method according to claim 7, wherein step S6 specifically includes formalizing the complex interaction or risk pattern to be tested as an objective function, and, at each step of the temporal deduction, applying gradient-based adjustment to the generation process of the generative world model according to the objective function, so as to guide the deduction process toward scenes conforming to the preset test target.
- 9. The method according to claim 1, wherein the quantitative criteria on which the validity determination in step S7 is based include traffic rule constraints, comprising checks on traffic signal compliance, lane keeping, right-of-way rules, and speed limits, and physical plausibility constraints, comprising checks on kinematic continuity, collision detection, and dimensional plausibility.
- 10. The method according to claim 1, wherein outputting to the test platform in step S7 includes converting the scene tensors that pass the validity determination into the standard format of the target test platform and integrating them into an automated test pipeline.
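The scene-tensor representation of claim 3 can be sketched in a few lines. This is a minimal NumPy illustration: the sizes E, T, D, the per-element feature layout, and the `scene_state` helper are assumptions for illustration, not part of the patent.

```python
import numpy as np

# Illustrative sizes (not from the patent): E agents, T time steps, D features.
E, T, D = 8, 20, 6
S = 2  # number of traffic signal heads (assumed)

# Traffic participant tensor: e.g. (x, y, vx, vy, heading, valid) per step.
participants = np.zeros((E, T, D), dtype=np.float32)
# Traffic signal tensor: e.g. (x, y, red, yellow, green, valid) per step.
signals = np.zeros((S, T, D), dtype=np.float32)

def scene_state(participants, signals):
    """Multi-tensor scene representation: one tensor per element type."""
    assert participants.ndim == 3 and signals.ndim == 3
    return {"participants": participants, "signals": signals}

scene = scene_state(participants, signals)
```

The multi-tensor structure keeps heterogeneous element types (agents vs. signals) in separate tensors while sharing the common time axis T.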
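The specific formulas of claim 5 are elided in this record. As a hedged sketch only — assuming the standard variance-preserving diffusion forward process and the widely used v-prediction parameterization, which may differ from the patent's exact equations — the three steps might look like:

```python
import numpy as np

def forward_noise(x, alpha_t, sigma_t, eps):
    # Standard forward diffusion step: z_t = alpha_t * x + sigma_t * eps
    return alpha_t * x + sigma_t * eps

def v_target(x, alpha_t, sigma_t, eps):
    # Standard v-prediction target: v = alpha_t * eps - sigma_t * x
    return alpha_t * eps - sigma_t * x

def masked_mse(pred_v, true_v, valid):
    # valid: (E, T) boolean mask of effective time steps.
    # Valid steps: all D features are supervised against the true target.
    # Invalid steps: targets of real-valued features are set to zero; only
    # the validity channel (assumed to be the last) keeps its true target.
    target = true_v.copy()
    target[~valid, :-1] = 0.0
    return float(np.mean((pred_v - target) ** 2))
```

Zeroing real-valued targets on invalid steps keeps padded entries of the sparse scene tensor from pulling the model toward arbitrary values.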
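The patch-mask/patch-context conditioning of claim 6 resembles inpainting-style conditional generation. A toy sketch, in which the `denoise_step` callable stands in for the learned denoiser and the per-step clamping scheme is an assumption:

```python
import numpy as np

def apply_condition(sample, patch_context, patch_mask):
    # Clamp entries marked known (patch_mask True) to the conditioning
    # context after each denoising step -- an inpainting-style scheme.
    return np.where(patch_mask, patch_context, sample)

def conditional_generate(denoise_step, patch_context, patch_mask,
                         steps=10, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(patch_context.shape)  # start from pure noise
    for _ in range(steps):
        z = denoise_step(z)  # placeholder for the learned denoiser
        z = apply_condition(z, patch_context, patch_mask)
    return z
```

Entries outside the mask are free for the model to fill in, which is how an initial scene "conforming to the conditional input" is reconstructed.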
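The closed-loop deduction of claim 7 reduces to an autoregressive loop between planner and world model. A skeletal sketch in which both callables are placeholders:

```python
def closed_loop_rollout(world_model, planner, initial_state, horizon):
    """Autoregressive deduction: the planner proposes an ego intention,
    the world model predicts the next scene state conditioned on the
    history and that intention, and the loop repeats."""
    history = [initial_state]
    for _ in range(horizon):
        ego_plan = planner(history)                  # replanning each step
        next_state = world_model(history, ego_plan)  # next-state prediction
        history.append(next_state)
    return history
```

Because each predicted state is appended to the history before the planner runs again, the planner always replans against the evolving scene, as the claim requires.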
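The gradient guidance of claim 8 can be illustrated with a finite-difference stand-in for the objective gradient; a real system would use automatic differentiation, and the guidance `scale` is an assumed hyperparameter:

```python
import numpy as np

def numerical_grad(objective, z, eps=1e-4):
    # Finite-difference gradient of a scalar objective w.r.t. the sample.
    g = np.zeros_like(z)
    it = np.nditer(z, flags=["multi_index"])
    for _ in it:
        i = it.multi_index
        zp = z.copy(); zp[i] += eps
        zm = z.copy(); zm[i] -= eps
        g[i] = (objective(zp) - objective(zm)) / (2 * eps)
    return g

def guided_step(denoise_step, z, objective, scale=0.1):
    """One guided denoising step: nudge the sample down the objective's
    gradient so the rollout drifts toward the desired risk pattern."""
    z = denoise_step(z)
    return z - scale * numerical_grad(objective, z)
```

An objective encoding, say, small inter-vehicle gaps would steer the deduction toward near-conflict scenes without leaving the model's learned distribution entirely.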
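The physical-plausibility checks of claim 9 can be sketched as simple kinematic-continuity and collision tests. The thresholds (`a_max`, `radius`) and circle-shaped agents are illustrative assumptions, and the traffic-rule checks are omitted here:

```python
import numpy as np

def kinematically_continuous(xy, dt=0.1, a_max=8.0):
    # xy: (T, 2) positions; reject implied accelerations beyond a_max m/s^2.
    v = np.diff(xy, axis=0) / dt
    a = np.diff(v, axis=0) / dt
    return bool(np.all(np.linalg.norm(a, axis=1) <= a_max))

def no_collision(centers, radius=1.0):
    # centers: (E, 2) agent positions at one step; circle approximation.
    d = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore self-distances
    return bool(np.all(d > 2 * radius))

def scene_valid(xy_per_agent, dt=0.1):
    """Combine the physical checks; a full validity judgment would also
    cover traffic signals, lane keeping, right of way, and speed limits."""
    E, T, _ = xy_per_agent.shape
    kinematic_ok = all(kinematically_continuous(xy_per_agent[e], dt)
                       for e in range(E))
    collision_free = all(no_collision(xy_per_agent[:, t]) for t in range(T))
    return kinematic_ok and collision_free
```
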
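The export step of claim 10 might, in the simplest case, serialize validated trajectories into a generic interchange format. The JSON schema below is invented for illustration; a real pipeline would target the platform's own standard format, such as OpenSCENARIO:

```python
import json
import numpy as np

def export_scene(participants, dt=0.1):
    """Serialize a validated participant tensor (E, T, D) into a generic
    JSON trajectory format (field names are illustrative assumptions)."""
    E, T, _ = participants.shape
    scenario = {
        "dt": dt,
        "agents": [
            {"id": e,
             # assume the first two features are (x, y) positions
             "trajectory": participants[e, :, :2].tolist()}
            for e in range(E)
        ],
    }
    return json.dumps(scenario)
```
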
Description
Automatic driving test scene deduction and generation method based on world model

Technical Field

The invention relates to the technical field of automatic driving testing and simulation, and in particular to a world-model-based method for the deduction and generation of automatic driving test scenes.

Background

With the rapid development of automatic driving technology, automatic driving systems have made significant progress in perception, decision-making, and control. However, their safety verification still depends heavily on large-scale, highly complex test scenarios. Existing test methods rely mainly on manually constructed scenes or rule-based simulation, which suffer from insufficient scene coverage and difficulty in reflecting real traffic evolution laws; in particular, they lack the capability to systematically model the long-term evolution of traffic scenes, making it difficult to generate test scenes with continuous interaction relationships and complex evolving behaviors.

In recent years, world-model technology has offered a new solution to these problems owing to its strong temporal reasoning and prediction capabilities. A world model builds an implicit understanding of the evolution laws of the physical environment by learning from historical states, and on this basis performs multi-step, closed-loop simulation rollouts. In the traffic domain, a generative world model can learn complex interaction patterns and spatio-temporal evolution distributions among traffic participants, and between participants and the environment, from massive amounts of real data, providing core capability support for automated, high-quality scene generation. However, how to effectively apply world models to automatic driving testing, and how to construct a complete framework that can automatically deduce, guide, and generate high-risk scenes while ensuring their rationality, remains an open technical problem.
Disclosure of Invention

In view of the above problems, the invention provides a world-model-based method for the deduction and generation of automatic driving test scenes, which automatically and efficiently generates test scenes that are comprehensive in coverage, complex in evolution, and consistent with real traffic laws, thereby improving the efficiency and depth of test verification for automatic driving systems.

To solve the problems in the background art and achieve the above technical purpose, the invention is realized by the following technical scheme. A world-model-based method for the deduction and generation of automatic driving test scenes comprises the following steps: S1, acquiring raw multi-source traffic scene data and constructing a training data set; S2, converting traffic participant states and environment states into a unified scene state representation based on the training data set; S3, constructing a generative world model based on the scene state representation and learning the temporal evolution laws of traffic scenes; S4, generating an initial scene for automatic driving testing with the generative world model under given test constraints; S5, performing temporal deduction on the initial scene with the generative world model to form a continuously evolving test scene sequence; S6, guiding the temporal deduction process to generate test scenes containing complex interactions or potential conflicts; S7, judging the validity of the test scenes and outputting them to the test platform.

Further, in step S1, the raw multi-source traffic scene data includes time-series trajectory data of traffic participants, and road topology and traffic facility data. The time-series trajectory data includes motion information over continuous time steps for dynamic traffic elements such as the ego vehicle, other vehicles, pedestrians, and bicycles; the road topology and traffic facility data includes static or quasi-static environmental information such as lane lines, curbs, traffic signs, and traffic signal positions and states.

Further, in step S2, the scene state is represented as a scene tensor X ∈ ℝ^(E×T×D), where E is the total number of elements to be modeled in the scene, T is the total number of physical time steps modeled, and D is the feature dimension jointly modeled for each element. The scene tensor adopts a multi-tensor structure to model the different types of traffic scene elements, including at least a traffic participant tensor and a traffic signal tensor.

Further, in step S3, a generative world model is constructed based on a diffusion model for learning the conditional probability distribution of future scene states under given context conditions.

Further, step S3 specifically includes: forward noising, namely gradually adding Gaussian noise to the real scene tensor X to obtain the noised versions of each step The