US-20260127819-A1 - DIFFUSION-GUIDED OBJECT INSERTION AND LIGHT SOURCE MANIPULATION FOR AUTONOMOUS DRIVING
Abstract
Methods and systems for training a model include generating a relightable neural radiance field (NeRF) reconstruction of an input video of a driving scene. A virtual object is inserted into the driving scene using the NeRF reconstruction to create a simulated scene. Scene intrinsics are optimized within the simulated scene. An autonomous driving model is trained using the simulated scene.
Inventors
- Bingbing Zhuang
- Ziyu Jiang
- Shanlin Sun
Assignees
- NEC LABORATORIES AMERICA, INC.
Dates
- Publication Date
- 20260507
- Application Date
- 20251104
Claims (20)
- 1. A computer-implemented method for training a model, comprising: generating a relightable neural radiance field (NeRF) reconstruction of an input video of a driving scene; inserting a virtual object into the driving scene using the NeRF reconstruction to create a simulated scene; optimizing scene intrinsics within the simulated scene; and training a model for driving using the simulated scene.
- 2. The method of claim 1, wherein generating the relightable NeRF reconstruction includes predicting albedo and material properties using a machine learning model, jointly optimized with a light source.
- 3. The method of claim 1, wherein optimizing the scene intrinsics includes optimizing a loss function that includes a score distillation sampling (SDS) loss from light source manipulation, an SDS loss from inserting the virtual object, and a rendering loss from the NeRF reconstruction.
- 4. The method of claim 1, wherein generating the NeRF reconstruction includes a signed distance field (SDF) geometry representation of the input video.
- 5. The method of claim 1, wherein inserting the virtual object includes tone matching to match a color profile of the input video.
- 6. The method of claim 1, wherein the virtual object includes information about physically-based rendering textures and materials.
- 7. The method of claim 1, further comprising manipulating a light source in the simulated scene to change a direction and/or a time of day.
- 8. The method of claim 1, wherein training the model for driving includes performing reinforcement learning on a neural network model using the simulated scene and a plurality of additional driving scenes.
- 9. The method of claim 1, further comprising analyzing a new video using the model for driving to determine a driving action and performing the driving action in an autonomous vehicle.
- 10. The method of claim 9, wherein the driving action is selected from the group consisting of acceleration, braking, and steering.
- 11. A system for training a model, comprising: a hardware processor; and a memory that stores a computer program which, when executed by the hardware processor, causes the hardware processor to: generate a relightable neural radiance field (NeRF) reconstruction of an input video of a driving scene; insert a virtual object into the driving scene using the NeRF reconstruction to create a simulated scene; optimize scene intrinsics within the simulated scene; and train a model for driving using the simulated scene.
- 12. The system of claim 11, wherein generation of the relightable NeRF reconstruction includes prediction of albedo and material properties using a machine learning model, jointly optimized with a light source.
- 13. The system of claim 11, wherein optimization of the scene intrinsics includes optimization of a loss function that includes a score distillation sampling (SDS) loss from light source manipulation, an SDS loss from inserting the virtual object, and a rendering loss from the NeRF reconstruction.
- 14. The system of claim 11, wherein generation of the NeRF reconstruction includes a signed distance field (SDF) geometry representation of the input video.
- 15. The system of claim 11, wherein insertion of the virtual object includes tone matching to match a color profile of the input video.
- 16. The system of claim 11, wherein the virtual object includes information about physically-based rendering textures and materials.
- 17. The system of claim 11, wherein the computer program further causes the hardware processor to manipulate a light source in the simulated scene to change a direction and/or a time of day.
- 18. The system of claim 11, wherein training of the model for driving includes reinforcement learning on a neural network model using the simulated scene and a plurality of additional driving scenes.
- 19. The system of claim 11, wherein the computer program further causes the hardware processor to analyze a new video using the model for driving to determine a driving action and to perform the driving action in an autonomous vehicle.
- 20. The system of claim 19, wherein the driving action is selected from the group consisting of acceleration, braking, and steering.
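For orientation, the four steps recited in claim 1 can be pictured as a minimal pipeline sketch. Every function and data structure below is a hypothetical stand-in for a component described in the specification (the real system would use a NeRF renderer and learned intrinsics, not dictionaries):

```python
# Minimal, self-contained sketch of the four-step method of claim 1.
# All names are illustrative placeholders, not part of the claimed system.

def reconstruct_relightable_nerf(video_frames):
    # Step 1: build a relightable reconstruction of the driving scene
    # (placeholder: records the frames and an estimated light source).
    return {"frames": list(video_frames), "light": "estimated"}

def insert_object(nerf, virtual_object):
    # Step 2: composite a virtual object into the reconstructed scene.
    return {"nerf": nerf, "objects": [virtual_object]}

def optimize_scene_intrinsics(scene):
    # Step 3: refine albedo/material so the insertion looks photorealistic.
    scene["intrinsics_optimized"] = True
    return scene

def build_simulated_scene(video_frames, virtual_object):
    nerf = reconstruct_relightable_nerf(video_frames)
    scene = insert_object(nerf, virtual_object)
    return optimize_scene_intrinsics(scene)

# Step 4 would train a driving model on scenes produced this way.
scene = build_simulated_scene(["frame0", "frame1"], "white_truck")
```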
Description
RELATED APPLICATION INFORMATION

This application claims priority to U.S. Patent Application No. 63/716,903, filed on Nov. 6, 2024, incorporated herein by reference in its entirety.

BACKGROUND

Technical Field

The present invention relates to autonomous driving systems and, more particularly, to training machine learning systems for autonomous driving.

Description of the Related Art

Autonomous driving systems may make use of machine learning systems to incorporate large amounts of information from recorded driving scenes. This training makes it possible for the autonomous driving systems to react to their present environment, analyzing information from cameras and other sensors to navigate safely.

SUMMARY

A method for training a model includes generating a relightable neural radiance field (NeRF) reconstruction of an input video of a driving scene. A virtual object is inserted into the driving scene using the NeRF reconstruction to create a simulated scene. Scene intrinsics are optimized within the simulated scene. A model is trained for driving using the simulated scene.

A system for training a model includes a hardware processor and a memory that stores a computer program. When executed by the hardware processor, the computer program causes the hardware processor to generate a NeRF reconstruction of an input video of a driving scene, to insert a virtual object into the driving scene using the NeRF reconstruction to create a simulated scene, to optimize scene intrinsics within the simulated scene, and to train a model for driving using the simulated scene.

These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:

FIG. 1 is a diagram of a simulated driving scene that includes a virtual object with controlled lighting, in accordance with an embodiment of the present invention;

FIG. 2 is a block/flow diagram of a method for training and using a model for autonomous driving, in accordance with an embodiment of the present invention;

FIG. 3 is a diagram of an autonomous vehicle that can collect video about a driving scene and automatically respond to driving conditions, in accordance with an embodiment of the present invention;

FIG. 4 is a diagram of an exemplary neural network architecture that can be used to implement part of an autonomous driving model, in accordance with an embodiment of the present invention; and

FIG. 5 is a diagram of an exemplary deep neural network architecture that can be used to implement part of an autonomous driving model, in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Training a machine learning model for an autonomous driving system requires large amounts of data for both training and verification before road testing. That training data should be highly diverse to cover different possible scenarios in the real world, especially safety-critical scenarios. Collecting such data purely from real driving logs is challenging, especially for corner cases that rarely happen in the real world but are important for verification. Training data may therefore be simulated to cover scenarios where real-world training data is sparse or unavailable.

A simulation pipeline may include reconstructing a digital twin of the background as a neural radiance field (NeRF), then editing the digital twin to create photorealistic variations, for example including virtual object insertion and light source manipulation. Virtual object insertion, in the context of autonomous driving, enables the simulation of new safety-critical scenarios.
For example, object insertion may add a white truck driving in the wrong lane towards the autonomous vehicle, or traffic barriers sitting in the middle of the road. Light source manipulation helps to simulate data captured at different times of day, such as dawn, dusk, and evening.

Photorealism for both virtual object insertion and light source manipulation benefits from accurate light source estimation, which, however, suffers from a high degree of ambiguity because it further relies on an accurate decomposition of intrinsic scene properties such as albedo and material. The rich natural image prior offered by large diffusion models may be used to address this challenge, enabling highly photorealistic object insertion and light source manipulation. As a result, the simulated training data is more realistic and trains the autonomous driving system to provide superior results.

Referring now to FIG. 1, an example driving scene is shown. An initial scene may be captured by a camera that is mounted on an autonomous vehicle 102, and may show the surroundings of the autonomous vehicle 102 from a particular perspective.
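The composite objective recited in claim 3 — an SDS term from relighting, an SDS term from object insertion, and a NeRF rendering term — can be illustrated with a minimal numeric sketch. The squared-difference stand-ins and the weights `w_relight`, `w_insert`, and `w_render` are illustrative assumptions only; an actual SDS loss backpropagates a pretrained diffusion model's denoising residual through the renderer rather than comparing against a fixed target:

```python
# Hedged sketch of the three-term loss of claim 3. The per-term
# computations are mock-ups, not the patent's actual formulation.

def sds_loss(rendered, diffusion_target):
    # SDS nudges the render toward images a diffusion prior finds likely;
    # mocked here as a squared difference against a fixed "target".
    return sum((r - t) ** 2 for r, t in zip(rendered, diffusion_target))

def rendering_loss(rendered, observed):
    # Standard photometric reconstruction term against the input video.
    return sum((r - o) ** 2 for r, o in zip(rendered, observed))

def total_loss(render_relit, render_inserted, render_plain,
               target_relit, target_inserted, observed,
               w_relight=1.0, w_insert=1.0, w_render=1.0):
    # Weighted sum of the two SDS terms and the rendering term.
    return (w_relight * sds_loss(render_relit, target_relit)
            + w_insert * sds_loss(render_inserted, target_inserted)
            + w_render * rendering_loss(render_plain, observed))
```

In an actual system the three renders would come from the same relightable NeRF under different edits, so minimizing this sum jointly refines the scene intrinsics used by all of them.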