EP-4742186-A1 - PROBABILISTIC PREDICTION OF OCCLUDED PEDESTRIANS AND OTHER ANIMATE OBJECTS IN AUTOMOTIVE ENVIRONMENTS
Abstract
The disclosed systems and techniques are directed to identifying and responding to the presence of target objects in occluded areas of driving environments. The techniques include training, using perception data associated with a first driving scene, a first machine learning model (MLM) to determine a location, within the first driving scene, of a target object masked with a masking transformation. The techniques further include training, using an output of the first MLM for a training driving scene, a second MLM to generate a map of probabilities of one or more target objects to be in an occluded region of the training driving scene, the training driving scene comprising at least one of the first driving scene or a second driving scene, and causing the second MLM to be deployed on an autonomous vehicle.
Inventors
- ZHANG, Yunheng
- LI, Yuncheng
- JIANG, Chiyu
- CHEN, Kaifei
- SUN, Zheng
- HUANG, Yue
Assignees
- WAYMO LLC
Dates
- Publication Date
- 2026-05-13
- Application Date
- 2025-11-11
Claims (15)
- A method comprising: training, using perception data associated with a first driving scene, a first machine learning model to determine a location, within the first driving scene, of a target object masked with a masking transformation; training, using an output of the first machine learning model for a training driving scene, a second machine learning model to generate a map of probabilities of one or more target objects to be in an occluded region of the training driving scene, the training driving scene comprising at least one of the first driving scene or a second driving scene; and causing the second machine learning model to be deployed on an autonomous vehicle.
- The method of claim 1, wherein the perception data associated with the first driving scene comprises: context data comprising locations and types of objects in the first driving scene; and roadgraph data representing one or more drivable lanes in the first driving scene.
- The method of claim 1 or 2, wherein the masking transformation comprises a plurality of random shifts of the target object sampled from a reference distribution.
- The method of claim 3, wherein the first machine learning model comprises a diffusion model and wherein training the first machine learning model comprises reversing the plurality of random shifts.
- The method of any of claims 1-4, wherein training the second machine learning model comprises: processing, using the first machine learning model, the training driving scene to obtain the output of the first machine learning model that comprises a reference map of probabilities of the one or more target objects to be in the occluded region of the training driving scene; and using the reference map of probabilities as ground truth in training of the second machine learning model.
- The method of claim 5, wherein using the reference map of probabilities as ground truth in training of the second machine learning model comprises: changing one or more parameters of the second machine learning model to reduce a loss value characterizing a difference between the map of probabilities generated by the second machine learning model and the reference map of probabilities generated by the first machine learning model.
- The method of any of claims 1-6, further comprising: processing, using the second machine learning model, live perception data associated with a live driving scene to generate a live map of probabilities for one or more live target objects to be in an occluded region of the live driving scene; and causing the autonomous vehicle to perform an avoidance action based on the live map of probabilities.
- The method of claim 7, wherein causing the autonomous vehicle to perform the avoidance action comprises: determining that an individual probability of the live map of probabilities is above a threshold probability; optionally wherein the threshold probability depends on at least one of: a distance between the autonomous vehicle and a location associated with the individual probability, a type of the autonomous vehicle, or a condition of a driving surface of the live driving scene.
- A system comprising: a memory device; and one or more processing devices communicatively coupled to the memory device, the one or more processing devices to: train, using perception data associated with a first driving scene, a first machine learning model to determine a location, within the first driving scene, of a target object masked with a masking transformation; train, using an output of the first machine learning model for a training driving scene, a second machine learning model to generate a map of probabilities of one or more target objects to be in an occluded region of the training driving scene, the training driving scene comprising at least one of the first driving scene or a second driving scene; and cause the second machine learning model to be deployed on an autonomous vehicle.
- The system of claim 9, wherein the perception data associated with the first driving scene comprises: context data comprising locations and types of objects in the first driving scene; and roadgraph data representing one or more drivable lanes in the first driving scene.
- The system of claim 9 or 10, wherein the masking transformation comprises a plurality of random shifts of the target object sampled from a reference distribution; optionally wherein the first machine learning model comprises a diffusion model and wherein to train the first machine learning model, the one or more processing devices are to reverse the plurality of random shifts.
- The system of any of claims 9-11, wherein to train the second machine learning model, the one or more processing devices are to: process, using the first machine learning model, the training driving scene to obtain the output of the first machine learning model that comprises a reference map of probabilities of the one or more target objects to be in the occluded region of the training driving scene; and use the reference map of probabilities as ground truth in training of the second machine learning model; optionally wherein to use the reference map of probabilities as ground truth in training of the second machine learning model, the one or more processing devices are to: change one or more parameters of the second machine learning model to reduce a loss value characterizing a difference between the map of probabilities generated by the second machine learning model and the reference map of probabilities generated by the first machine learning model.
- A computing system of a fleet of autonomous vehicles, comprising: a first memory device; and a first processing device communicatively coupled to the first memory device, the first processing device to: train, using perception data associated with a first driving scene, a first machine learning model to determine a location, within the first driving scene, of a target object masked with a masking transformation; train, using an output of the first machine learning model for a training driving scene, a second machine learning model to generate a map of probabilities of one or more target objects to be in an occluded region of the training driving scene, the training driving scene comprising at least one of the first driving scene or a second driving scene; and cause the second machine learning model to be deployed on an autonomous vehicle of the fleet of autonomous vehicles.
- The computing system of claim 13, wherein the masking transformation comprises a plurality of random shifts of the target object sampled from a reference distribution, and wherein the first machine learning model comprises a diffusion model and wherein to train the first machine learning model, the first processing device is to reverse the plurality of random shifts; and/or wherein to train the second machine learning model, the first processing device is to: process, using the first machine learning model, the training driving scene to obtain the output of the first machine learning model that comprises a reference map of probabilities of the one or more target objects to be in the occluded region of the training driving scene; and use the reference map of probabilities as ground truth in training of the second machine learning model.
- The computing system of claim 13 or 14, further comprising: a second memory device of the autonomous vehicle of the fleet of autonomous vehicles; and a second processing device of the autonomous vehicle, the second processing device communicatively coupled to the second memory device, the second processing device to: process, using the second machine learning model, live perception data associated with a live driving scene to generate a live map of probabilities for one or more live target objects to be in an occluded region of the live driving scene; determine that an individual probability of the live map of probabilities is above a threshold probability; and cause the autonomous vehicle to perform an avoidance action based on the individual probability; optionally wherein the threshold probability depends on at least one of: a distance between the autonomous vehicle and a location associated with the individual probability, a type of the autonomous vehicle, or a condition of a driving surface of the live driving scene.
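Claims 3 and 4 describe masking a target object with a plurality of random shifts drawn from a reference distribution, and training a diffusion-style teacher model by reversing those shifts. The following is a minimal sketch of that masking-and-reversal scheme, not an implementation from the patent: the Gaussian reference distribution, the step count, and all function names are illustrative assumptions, and the learned reversal network is stood in for by an exact arithmetic reversal.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_target(location, num_steps=10, sigma=0.5):
    """Masking transformation (claim 3): a plurality of random shifts of the
    target object, here sampled from a Gaussian reference distribution."""
    shifts = rng.normal(0.0, sigma, size=(num_steps, 2))  # (dx, dy) per step
    masked = location + shifts.sum(axis=0)
    return masked, shifts

def reverse_shifts(masked, shifts):
    """Reversing the shifts (claim 4). In the patent a diffusion model learns
    to approximate this reversal from scene context; here it is exact."""
    return masked - shifts.sum(axis=0)

# Hypothetical ground-truth target location (x, y) in meters
loc = np.array([12.0, -3.5])
masked, shifts = mask_target(loc)
recovered = reverse_shifts(masked, shifts)
assert np.allclose(recovered, loc)
```

In the claimed system, the reversal would be performed by a trained denoiser conditioned on perception context (claim 2's object locations/types and roadgraph data), so that the recovered location reflects where a target object plausibly is, given the scene.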
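Claims 5 and 6 describe teacher-student distillation: the teacher's probability map serves as ground truth, and the student's parameters are changed to reduce a loss between the two maps. A minimal sketch of that loss-reduction step, under stated assumptions: the 8x8 grid size, the per-cell binary cross-entropy loss, and the logit-table "student" (a stand-in for a real network's output head) are all illustrative choices, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 8x8 reference map of occupancy probabilities from the teacher
teacher_map = rng.uniform(0.01, 0.99, size=(8, 8))

# Student: a per-cell logit table standing in for a network's output head
student_logits = np.zeros((8, 8))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce_loss(logits, target):
    """Per-cell binary cross-entropy between student map and teacher map."""
    p = sigmoid(logits)
    return float(np.mean(-(target * np.log(p) + (1 - target) * np.log(1 - p))))

lr = 1.0
losses = []
for _ in range(200):
    losses.append(bce_loss(student_logits, teacher_map))
    # Per-cell gradient of BCE w.r.t. the logit is sigmoid(logit) - target;
    # the update "changes one or more parameters ... to reduce a loss value"
    student_logits -= lr * (sigmoid(student_logits) - teacher_map)

assert losses[-1] < losses[0]  # loss decreases as the student matches the teacher
```

The design point of the distillation is that the teacher can be slow (it runs a diffusion reversal per scene offline), while the student produces the probability map in a single pass suitable for on-vehicle deployment.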
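Claims 7, 8, and 15 describe comparing individual probabilities of the live map against a threshold that may depend on, among other things, the distance between the vehicle and the location in question. A hedged sketch of that decision rule follows; the linear distance scaling, the `base` and `near_m` constants, and the function names are hypothetical illustrations of one way such a dependence could look.

```python
def threshold_for(distance_m, base=0.3, near_m=10.0):
    """Hypothetical distance-dependent threshold (claim 8): be more cautious
    (use a lower threshold) for occluded locations close to the vehicle."""
    return base * min(1.0, distance_m / near_m)

def needs_avoidance(probs, distances):
    """True if any cell's occupancy probability exceeds its threshold,
    triggering an avoidance action per claim 7."""
    return any(p > threshold_for(d) for p, d in zip(probs, distances))

# Occluded cells as (probability, distance-to-AV) pairs
probs = [0.05, 0.25, 0.10]
dists = [30.0, 6.0, 15.0]
print(needs_avoidance(probs, dists))  # cell at 6 m: 0.25 > 0.18, so True
```

The claims also allow the threshold to depend on the vehicle type and the driving-surface condition (e.g., braking distance on wet pavement); those inputs would simply become additional parameters of `threshold_for`.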
Description
TECHNICAL FIELD
The instant specification generally relates to autonomous vehicles. More specifically, the instant specification relates to detection of occluded pedestrians and other animate objects in automotive environments.
BACKGROUND
An autonomous (fully and partially self-driving) vehicle (AV) operates by sensing an outside environment with various electromagnetic (e.g., radar and optical) and non-electromagnetic (e.g., audio and humidity) sensors. Some autonomous vehicles chart a driving path through the environment based on the sensed data. The driving path can be determined based on Global Positioning System (GPS) data and road map data. While the GPS and the road map data can provide information about static aspects of the environment (buildings, street layouts, road closures, etc.), dynamic information (such as information about other vehicles, pedestrians, street lights, etc.) is obtained from contemporaneously collected sensing data. Precision and safety of the driving path and of the speed regime selected by the autonomous vehicle depend on timely and accurate identification of various objects present in the outside environment and on the ability of a driving algorithm to process the information about the environment and to provide correct instructions to the vehicle controls and the drivetrain.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure is illustrated by way of examples, and not by way of limitation, and can be more fully understood with reference to the following detailed description when considered in connection with the figures, in which:
FIG. 1 is a diagram illustrating components of an example autonomous vehicle (AV) capable of deploying systems that predict presence of target objects in occluded areas of driving environments, in accordance with some implementations of the present disclosure.
FIG. 2 is a diagram illustrating example training of a teacher model to identify locations of target objects in driving environments, in accordance with some implementations of the present disclosure.
FIG. 3 illustrates one example driving scene that can be used as a training input to train the teacher model of FIG. 2 to identify locations of target objects in driving environments, in accordance with some implementations of the present disclosure.
FIG. 4 is a diagram illustrating an example training of a student model to predict a likelihood of presence of target objects in occluded areas of driving environments, in accordance with some implementations of the present disclosure.
FIG. 5 is a diagram illustrating example inference operations that deploy a trained occluded object prediction model to predict likelihoods of presence of target objects in occluded areas of driving environments, in accordance with some implementations of the present disclosure.
FIG. 6 illustrates schematically a decision-making process used for trajectory planning by an autonomous vehicle, in accordance with some implementations of the present disclosure.
FIG. 7 illustrates an example method of training and deploying machine learning models to predict likelihoods of presence of target objects in occluded areas of driving environments, in accordance with some implementations of the present disclosure.
FIG. 8 depicts a block diagram of an example computer device capable of training and/or deploying machine learning models to predict likelihoods of presence of target objects in occluded areas of driving environments, in accordance with some implementations of the present disclosure.
SUMMARY
In one implementation, disclosed is a method that includes training, using perception data associated with a first driving scene, a first machine learning model to determine a location, within the first driving scene, of a target object masked with a masking transformation.
The method further includes training, using an output of the first machine learning model for a training driving scene, a second machine learning model to generate a map of probabilities of one or more target objects to be in an occluded region of the training driving scene, the training driving scene comprising at least one of the first driving scene or a second driving scene, and causing the second machine learning model to be deployed on an autonomous vehicle. In another implementation, disclosed is a system that includes a memory device and one or more processing devices communicatively coupled to the memory device. The one or more processing devices are to train, using perception data associated with a first driving scene, a first machine learning model to determine a location, within the first driving scene, of a target object masked with a masking transformation. The one or more processing devices are further to train, using an output of the first machine learning model for a training driving scene, a second machine learning model to generate a map of probabilities of one or more target objects to be in an occluded region of the training driving scene, the