EP-4736149-A1 - SYSTEM AND METHODS FOR TRAINING AND VALIDATION OF AN END-TO-END ARTIFICIALLY INTELLIGENT NEURAL NETWORK FOR AUTONOMOUS DRIVING AT SCALE


Abstract

The technology disclosed comprises systems and methods for training and validating an end-to-end neural-network learning model configured for autonomous driving. The end-to-end neural-network learning model is trained using human-operated driving demonstration data, from which training examples of driving tasks and driving routes are curated, including examples of particularly difficult driving tasks. The difficulty of driving tasks is determined using a combination of entropy measurements in training, evaluation of model performance, and manual labeling. The conditional imitation learning model can be configured as a memory-augmented transformer model that leverages a memory-cached frame buffer to access previous states in a driving trajectory. The disclosed technology can be applied to passenger vehicles or autonomous robots for delivery tasks.

Inventors

KENTLEY-KLAY, Tim
DUVAUD, Werner
HAINAUT, Aurèle
DELOCHE, Maxime
CARRÉ, Ludovic

Assignees

HYPRLABS, INC.

Dates

Publication Date: 20260506
Application Date: 20240629

Claims (20)

What is claimed is:
1. A computer-implemented method for building a training data set for training an end-to-end neural network for autonomous driving tasks from at least a hundred thousand hours of operator supervised driving data, the method including: collecting, from a fleet of human operator supervised vehicles, demonstration data from driving tasks comprising: human operators in the fleet each supervising the vehicle through a driving task that has an intended route for at least a next 3 seconds while the vehicle captures data for a sequence of driving states including at least video from one camera, location data from a GNSS receiver, a velocity vector of travel, steering wheel orientation, and accelerator/brakes actuation; the hundred-thousand hours of operator supervised driving data including encounters with a distribution of driving tasks following the intended routes including lane keeping, turning, arriving at a destination, parking, navigating in proximity to moving and parked vehicles and pedestrians, obeying traffic signals, and avoiding collisions; and first entropic situations organically arising during driving tasks and captured in the driving data; directing part of the human operator supervised driving to create and resolve second entropic situations by imposing on a particular driving task a particular entropic situation for a particular human operator in the fleet to execute, flagging at least starts of executing the second entropic situations, and capturing the driving data as the particular human operator extricates the vehicle from the particular entropic situation; curating, from the captured driving data, a training set of driving data to imitate, wherein the curating includes: selecting a representative sample of base routine driving situations with starts and ends; identifying the first entropic situations and selecting a set of first entropic situations with starts and ends; locating the flagged second entropic situations and selecting a set of second entropic situations with starts and ends; and excluding or labelling as negative examples driving tasks, if any, that produced an avoidable collision with a moving or stationary object; and saving the curated driving data for use in training an end-to-end conditional imitation learning model to automate the vehicle.
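The curating steps recited in claim 1 can be illustrated with a short sketch. This is an illustration under stated assumptions, not the claimed implementation: the `Segment` record, its `kind` labels, the `curate` function, and the use of uniform random sampling to pick routine situations are all hypothetical names and choices introduced for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical record for one driving situation with a start and an end."""
    start_s: float
    end_s: float
    kind: str  # assumed labels: "routine", "organic_entropic", "directed_entropic"
    avoidable_collision: bool = False


def curate(segments, routine_sample_size, seed=0):
    """Split segments into a training set to imitate and a set of negative examples.

    Assumptions: every entropic segment is kept, routine segments are sampled
    uniformly at random, and avoidable collisions are labelled as negatives
    rather than dropped.
    """
    rng = random.Random(seed)
    routine, entropic, negatives = [], [], []
    for seg in segments:
        if seg.avoidable_collision:
            negatives.append(seg)      # excluded from the data to imitate
        elif seg.kind == "routine":
            routine.append(seg)        # candidate base routine situations
        else:
            entropic.append(seg)       # organic or directed entropic situations
    k = min(routine_sample_size, len(routine))
    training = entropic + rng.sample(routine, k)
    return training, negatives
```

A real selector would likely stratify the routine sample by driving task (lane keeping, turning, parking, and so on) rather than sampling uniformly; the uniform sample here only keeps the sketch short.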
2. The method of claim 1, further including: initializing an end-to-end conditional imitation learning model to automate the vehicle, wherein the end-to-end conditional imitation learning model is configured to imitate a behavioral policy of steering and accelerator/brakes actuation, defined by a probability distribution of actions given states in the curated training data set, whereby the imitated behavioral policy is leveraged to predict a driving control action in response to (i) a present state including a visual image, location, intended path, and steering and accelerator/brake actuation, (ii) a compressed representation from at least five earlier states over at least three seconds, and (iii) at least one intended route condition; and training the conditional imitation learning model with the curated training data set to imitate the behavioral policy of the human operators in the fleet, wherein the training further comprises optimizing the imitated behavioral policy by minimizing a dissimilarity metric between the imitated behavioral policy and the human operator behavioral policy until a pre-defined stopping point is reached.
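The training objective of claim 2, minimizing a dissimilarity metric between the imitated policy and the human operator policy until a stopping point is reached, can be sketched as follows. The claim does not name a specific metric or stopping rule; the Kullback-Leibler divergence over discretized action bins and the patience-based rule below are assumptions chosen for illustration, and `kl_divergence` and `should_stop` are hypothetical names.

```python
import math


def kl_divergence(p_human, q_model, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same action bins."""
    if len(p_human) != len(q_model):
        raise ValueError("distributions must cover the same action bins")
    total = 0.0
    for p, q in zip(p_human, q_model):
        if p > 0.0:
            # eps guards against log of zero when the model assigns no mass
            total += p * math.log(p / max(q, eps))
    return total


def should_stop(loss_history, patience=3, min_delta=1e-4):
    """Assumed stopping rule: stop once the best loss of the last `patience`
    evaluations improves on the earlier best by less than `min_delta`."""
    if len(loss_history) <= patience:
        return False
    best_before = min(loss_history[:-patience])
    recent_best = min(loss_history[-patience:])
    return best_before - recent_best < min_delta
```

The divergence is zero when the model reproduces the operator's action distribution exactly and grows as the model shifts probability away from actions the operator actually took.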
3. The method of claim 1 or claim 2, wherein the training further comprises satisfying a focal loss function that emphasizes training to handle the first and second entropic situations.
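To make the role of a focal loss concrete, the sketch below uses the commonly cited form FL(p) = -alpha * (1 - p)^gamma * log(p), where p is the probability the model assigns to the demonstrated action. The claim does not specify the formulation or its parameters; the form, the default `gamma` and `alpha` values, and the function name are assumptions for illustration.

```python
import math


def focal_loss(p_true, gamma=2.0, alpha=1.0, eps=1e-12):
    """Focal loss for one sample.

    p_true is the model probability of the demonstrated (human) action.
    Well-predicted samples (p_true near 1) are down-weighted by (1 - p)^gamma,
    so poorly predicted samples dominate the gradient.
    """
    p = min(max(p_true, eps), 1.0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)
```

With `gamma=0` the expression reduces to ordinary cross-entropy. With `gamma > 0`, a routine frame the model already predicts with high probability contributes far less loss than a hard frame, which is one way training could emphasize the entropic situations.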
4. The method of any one of claims 1-3, further including: the vehicle receiving an updated intended route for the at least three seconds in at least one sample in the selected set of driving data to imitate; and the updated intended route being used as part of the sample beginning when the vehicle adopted the updated intended route, whereby the intended route changed during the training sample.
5. The method of any one of claims 1-4, further including the intended route having an origin to a destination.
6. The method of any one of claims 1-5, wherein the data captured for the sequence of driving states further includes accelerometer G-forces.
7. The method of claim 6, further including using an accelerometer analysis to identify the first entropic situations.
8. The method of any one of claims 1-7, further including using entropy between imitated behavior and the driving control action predicted by the conditional imitation learning model to identify the first entropic situations.
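One way to read claim 8 is as a per-frame cross-entropy between the operator's demonstrated action and the model's predicted action distribution, with runs of high-scoring frames marked as candidate first entropic situations. The sketch below follows that reading; the discretized action bins, the threshold, the `min_len` parameter, and both function names are assumptions and do not come from the source.

```python
import math


def frame_cross_entropy(predicted_probs, human_action_bin, eps=1e-12):
    """Cross-entropy of one frame: -log of the probability the model gave
    to the action bin the human operator actually chose."""
    return -math.log(max(predicted_probs[human_action_bin], eps))


def find_entropic_spans(scores, threshold, min_len=1):
    """Return (start, end) frame index pairs, end exclusive, for each run of
    consecutive frames whose score exceeds the threshold."""
    spans = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i                    # a new run begins here
        elif start is not None:
            if i - start >= min_len:
                spans.append((start, i))     # run ended on the previous frame
            start = None
    if start is not None and len(scores) - start >= min_len:
        spans.append((start, len(scores)))   # run extends to the last frame
    return spans
```

Each returned pair supplies the "starts and ends" that the curating step needs for a selected situation.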
9. A computer-implemented method for building a training data set for training autonomous delivery tasks from at least a hundred hours of operator supervised driving data, the method including: collecting, from a fleet of human operator supervised transporters, demonstration data from delivery tasks comprising: human operators in the fleet each supervising the transporter through a delivery task that has an intended route for at least the next 3 seconds while the transporter captures data for a sequence of driving states including at least video from one camera, returns from at least one radar or LiDAR, location data from a GNSS receiver, velocity vector of travel, steering orientation, and accelerator/brakes actuation; the hundred hours of operator supervised driving data including encounters with a distribution of delivery tasks including lane keeping, turning, arriving at a destination, navigating in the presence of moving and parked vehicles and pedestrians, obeying traffic signals, and avoiding collisions; first entropic situations organically arising during delivery tasks and captured in the driving data; and directing part of the human operator supervised driving to create and resolve second entropic situations by having a confounding operator take over the supervised driving and creating a second entropic situation for the human operator to resolve, flagging take over and relinquishment of control by the confounding operator, and capturing the driving data as the human operator extricates the transporter from the second entropic situation; curating, from the captured driving data, a training set of at least 15 hours of driving data to imitate, wherein the curating includes: selecting a representative sample of base routine driving situations with starts and ends; identifying the first entropic situations and selecting a set of first entropic situations with starts and ends; locating the flagged second entropic situations and selecting a set of second entropic situations with starts and ends; and excluding or labelling as negative examples driving tasks, if any, that produced an avoidable collision with a moving or stationary object; and saving the curated driving data for use in training an end-to-end conditional imitation learning model to automate the vehicle.
10. The method of claim 9, further including: initializing an end-to-end conditional imitation learning model to automate the vehicle, wherein the end-to-end conditional imitation learning model is configured to imitate a behavioral policy of steering and accelerator/brakes actuation, defined by a probability distribution over actions and states in the curated training data set, such that the imitated behavioral policy is leveraged to predict a driving control action in response to (i) the present state, (ii) at least five earlier states over at least three seconds, and (iii) at least one intended route condition; and training the conditional imitation learning model with the curated training data set, such that the conditional imitation learning model is trained to imitate the behavioral policy of the human operators in the fleet, wherein the training further comprises optimizing the imitated behavioral policy by minimizing a dissimilarity metric between the imitated behavioral policy and the human operator behavioral policy until a pre-defined stopping point is reached.
11. The computer-implemented method of claim 9 or claim 10, with further training the end-to-end neural network for autonomous delivery tasks from at least a hundred hours of confounded autonomous driving data, the method including: collecting, from a fleet of autonomous transporters, demonstration data from autonomous delivery tasks comprising: autonomous transporters in the fleet each operating the conditional imitation learning model as an autonomous agent to supervise the transporter through a delivery task that has an intended route from an origin to a destination while the transporter captures data for a sequence of driving states including at least video from one camera, returns from at least one radar or LiDAR, location data from a GNSS receiver, velocity vector of travel, steering orientation, and accelerator/brakes actuation; third entropic situations organically arising during delivery tasks and captured in the driving data; taking over part of the delivery tasks to create and resolve fourth entropic situations by having the confounding operator take over the autonomous driving and creating a fourth entropic situation for the autonomous agent to resolve, flagging take over and relinquishment of control by the confounding operator, and capturing the driving data as the autonomous transporter extricates itself from the fourth entropic situation; curating, from the confounded autonomous driving data, a further autonomous training set of at least 10 hours of driving data to imitate, wherein the curating includes: identifying the third entropic situations and selecting a set of third entropic situations with starts and ends; locating the flagged fourth entropic situations and selecting a set of fourth entropic situations with starts and ends; and update training the conditional imitation learning model with the further curated autonomous training data set, such that the conditional imitation learning model training is reinforced by autonomous resolution of the fourth entropic situations.
12. The method of any one of claims 9-11, further including: the vehicle receiving an updated intended route for the at least three seconds in at least one sample in the selected set of driving data to imitate; and the updated intended route being used as part of the sample beginning when the vehicle adopted the updated intended route, whereby the intended route changed during the training sample.
13. An end-to-end conditional imitation learning model, including a stack of processors trained by imitation learning to control an autonomous vehicle, the processors running on processing hardware coupled to memory, further including: an input receiving processor that receives a video camera feed, a steering orientation feed, an accelerator / brake feed, a velocity vector feed, a current location feed, and an intended course feed; a first-in first-out frame buffer that holds at least nine prior frames of embeddings from a second stage processor, the frames spanning at least three seconds of travel by the autonomous vehicle; a first stage processor that embeds the video camera feed into an embedding space; the second stage processor that further processes output from the first stage processor combined with the at least nine prior frames, the steering orientation feed, the accelerator / brake feed, the velocity vector feed, the current location feed, and the intended course feed for at least a next three seconds of operation and produces a frame output; and a third classification processor that converts the frame output from the second stage into actuation signals directed to control the steering wheel and the accelerator / brake.
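The first-in first-out frame buffer of claim 13 can be sketched with a bounded double-ended queue. This is a minimal sketch, not the claimed hardware or software: the `FrameBuffer` class and its method names are hypothetical, embeddings are represented as plain Python objects, and the default capacity of nine simply mirrors the "at least nine prior frames" in the claim.

```python
from collections import deque


class FrameBuffer:
    """Bounded FIFO of prior frame embeddings (hypothetical helper class)."""

    def __init__(self, capacity=9):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # deque with maxlen discards the oldest item when a new one is appended
        self._frames = deque(maxlen=capacity)

    def push(self, embedding):
        """Store the newest second-stage embedding, evicting the oldest if full."""
        self._frames.append(embedding)

    def history(self):
        """Return the buffered embeddings, oldest first."""
        return list(self._frames)

    def is_full(self):
        return len(self._frames) == self._frames.maxlen

    def __len__(self):
        return len(self._frames)
```

At each time step the second stage processor would read `history()` alongside the current first-stage output and then `push` its own output, so the buffer always holds the most recent prior frames.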
14. The trainer of claim 13, wherein the first stage processor and the second stage processor are transformers, and the third classification processor is a fully connected neural network or multi-layer perceptron.
15. The trainer of claim 13 or claim 14, wherein the input receiving processor further receives a radar or LiDAR feed and the first stage processor inputs include the radar or LiDAR feed.
16. A training set selector for building a training data set for training an end-to-end neural network for autonomous driving tasks from at least 100,000 hours of operator supervised driving data, wherein the driving data includes demonstration data collected from a fleet of human operator supervised vehicles comprising: human operators in the fleet each supervising the vehicle through a driving task that has an intended route for at least the next 3 seconds while the vehicle captures data for a sequence of driving states including at least video from one camera, location data from a GNSS receiver, a velocity vector of travel, steering wheel orientation, and accelerator/brakes actuation; the 100,000 hours of operator supervised driving data including encounters with a distribution of driving tasks following the intended routes including lane keeping, turning, arriving at a destination, parking, navigating in the presence of moving and parked vehicles and pedestrians, obeying traffic signals, and avoiding collisions; and first entropic situations organically arising during driving tasks and captured in the driving data; second entropic situations created by imposing on a particular driving task a particular entropic situation for a particular human operator in the fleet to execute and resolve, with at least starts of executing the second entropic situations flagged, and driving data captured as the particular human operator extricated the vehicle from the particular entropic situation; wherein the training set selector comprises: a base situation selection processor configured to automatically select a representative sample of base routine driving situations with starts and ends; a first selection processor configured to automatically identify the first entropic situations and select a set of first entropic situations with starts and ends; a second selection processor configured to automatically locate the flagged second entropic situations and select a set of second entropic situations with starts and ends; and a manual curation GUI configured to interact with a user who excludes or labels as negative examples driving tasks, if any, that produced an avoidable collision with a moving or stationary object; and a training set builder configured to save the curated driving data for use in training an end-to-end conditional imitation learning model to automate the vehicle.
16. A vehicle operation validator method, utilizing the trained stack of any of claims 13-15, an onboard validation processor, and a central validation processor, including for each vehicle in a fleet: the trained stack receiving the feeds, processing the frames, and outputting steering and acceleration actuation signals; the onboard validation processor comparing the actuation signals from the trained ML stack to operator generated actuation signals, detecting deviations and flagging the deviations; the central validator receiving the feeds, the actuation signals, and the flagged deviations; and the central validation processor processing the flagged deviations in near real time; finding at least one instance in which operation of the vehicle in accordance with the actuation signals that gave rise to the flagged deviations would have led to a virtual incident; and reporting the virtual incident in the near real time for human devised corrective training.
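The onboard comparison step in the validator method above can be sketched as a per-frame check of model output against operator input. The claim does not say how a deviation is measured; the normalized signal ranges, the absolute-difference test, the tolerance values, and the names `Actuation` and `flag_deviations` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actuation:
    """Hypothetical actuation sample for one frame."""
    steering: float      # assumed normalized steering command in [-1, 1]
    accel_brake: float   # assumed normalized pedal: positive accelerates, negative brakes


def flag_deviations(model_signals, operator_signals, steer_tol=0.1, pedal_tol=0.2):
    """Return indices of frames where the trained stack and the operator disagree
    by more than the given tolerance on steering or on accelerator/brake."""
    flagged = []
    for i, (model, operator) in enumerate(zip(model_signals, operator_signals)):
        steer_gap = abs(model.steering - operator.steering)
        pedal_gap = abs(model.accel_brake - operator.accel_brake)
        if steer_gap > steer_tol or pedal_gap > pedal_tol:
            flagged.append(i)
    return flagged
```

The flagged frame indices, together with the feeds and both sets of actuation signals, are what would be forwarded to the central validation processor, which then decides whether following the model's signals would have led to a virtual incident.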
17. The vehicle operation validator method of claim 16, further utilizing an onboard incident sensor on each vehicle in the fleet and a central incident calculator, further including: the onboard incident sensor detecting an incident from at least one vehicle in the fleet and flagging the incident; the central incident calculator processing the flagged incident from the at least one vehicle in the near real time; finding at least one flagged incident in which the operator’s actuations leading to the flagged incident and the trained ML stack’s actuation signal coincided; and reporting the flagged incident coincidence in the near real time as a candidate for human devised corrective training.
18. A vehicle operation validator method, utilizing the trained stack of any of claims 13-15, an onboard validation processor, and a central validation processor, including for each vehicle in a fleet: the trained stack receiving the feeds, processing the frames, and outputting the actuation signals; an operator observing and intervening to take over control of the operation of the vehicle; the onboard validation processor comparing the actuation signals from the trained ML stack to operator intervention actuation signals, detecting deviations and flagging the deviations; the central validator receiving the feeds, the actuation signals and the flagged deviations from at least one vehicle in the fleet; and the central validation processor processing the flagged deviations in near real time; finding at least one instance in which operation of the vehicle in accordance with the actuation signals that gave rise to the flagged deviations would have led to a virtual incident; and reporting the virtual incident in the near real time for human devised corrective training.
19. The vehicle operation validator method of claim 18, further utilizing an onboard incident sensor on each vehicle in the fleet and a central incident calculator, further including: the onboard incident sensor detecting an incident from at least one vehicle in the fleet and flagging the incident; the central incident calculator processing the flagged incident from the at least one vehicle in the near real time; finding at least one flagged incident in which the operator’s actuations leading to the flagged incident and the trained ML stack’s actuation signal coincided; and reporting the flagged incident coincidence in the near real time as a candidate for human devised corrective training.
20. A vehicle operation validator method, utilizing the trained stack of any of claims 13-15, an onboard validation processor, and a central validation processor, including for each vehicle in a fleet: the trained stack receiving the feeds, processing the frames, and outputting the actuation signals; an operator at least once taking over control, creating an entropic situation, and relinquishing control, producing a flagged deviation; the onboard validation processor comparing the actuation signals from the trained ML stack to operator generated actuation signals, detecting deviations and flagging the deviations; the central validator receiving the feeds, the actuation signals, the flagged deviations and the flagged incident from at least one vehicle in the fleet; the central validation processor processing the flagged deviations in near real time; finding at least one instance in which operation of the vehicle in accordance with the actuation signals that gave rise to the flagged deviations would have led to a virtual incident; and reporting the virtual incident in the near real time as a candidate for human devised corrective training.

Description

SYSTEM AND METHODS FOR TRAINING AND VALIDATION OF AN END-TO-END ARTIFICIALLY INTELLIGENT NEURAL NETWORK FOR AUTONOMOUS DRIVING AT SCALE

CROSS-REFERENCE

[0001] This application claims the benefit of and priority to U.S. Provisional Application No. 63/524,213 filed 29 June 2023, titled “Scalable Training and Validation For an End-To-End Autonomous Driving Model” (Atty. Docket No. HYPR 1001-1).

RELATED CASES

[0002] This application is related to the following commonly owned applications which are incorporated by reference herein for all purposes.

[0003] U.S. Patent No. 18/731,115, filed 31-May-2024, titled “System and Methods For Providing Driver Assistance Alerts Using an End-To-End Artificially Intelligent Collision Avoidance System and Advanced Driver Assistance Systems” (Atty. Docket No. HYPR 1002-1).

[0004] U.S. CIP Patent Application No. , filed contemporaneously, titled “System and Methods For Providing Driver Assistance Alerts Using an End-To-End Artificially Intelligent Collision Avoidance System and Advanced Driver Assistance Systems” (Atty. Docket No. HYPR 1002-3).

[0005] U.S. Patent Application No. 18/431,827, filed 2 February 2024, titled “Multi-Functional Inventory Storage and Delivery System” (Atty. Docket No. HYPR 1000-2) which claims priority to U.S. Provisional Application 63/443,342 filed 3 February 2023, titled “Multi-Functional Inventory Storage and Delivery System” (Atty. Docket No. HYPR 1000-1).

FIELD OF THE TECHNOLOGY DISCLOSED

[0006] The technology disclosed relates to end-to-end neural networks configured for autonomous driving. In particular, the technology disclosed relates to a scalable method and apparatus for training and validating an end-to-end network configured for autonomous driving.

BACKGROUND

[0007] The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section or associated with the subject matter provided as background should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves can also correspond to implementations of the technology disclosed.

Autonomous driving technology has been of great interest to academia, industry, and public sector in recent years thanks to the advantages offered in both driver and rider satisfaction and safety. Vehicle automation can already be observed in today’s market via the use of semi-automated systems such as advanced driver assistance systems and partially automated functions for a wide range of tasks including lane changing, speed control, and parking maneuvers. Automation of these tasks is highly desirable to drivers due to the increased convenience, assurance, and comfort while driving. Moreover, advancements to autonomous driving are beneficial to public safety, infrastructure, and vehicle longevity due to the potential reduction in number and severity of vehicle accidents offered by advanced driver assistance systems. Additionally, autonomous driving technology is highly relevant to a plethora of other robotic apparatuses and methods including space probes, industrial robot arms, military drones, and delivery robots. For example, the E-commerce industry can benefit from the use of autonomous delivery robots that bypass efficiency, cost, quality, and environmental pollution concerns addressed with traditional delivery methods.

Despite over forty years of research on autonomous vehicle development bolstered by advancements in artificial intelligence, computer vision, sensor technology, and network infrastructure, fully autonomous vehicles are not yet available for individual or commercial use on the market. The Society of Automotive Engineers defines six levels of driving automation ranging from zero (fully controlled by a human agent) to five (fully autonomously controlled). Although progress is substantial, safety and reliability performance is still lacking.

Traditional autonomous driving systems, characterized by an aggregation of independent submodules responsible for individual tasks such as perception, localization, mapping, and path-planning, are challenging to optimize due to the complexity of the systems, requiring large teams of expensive engineers, often over 1000, and the enormous volume of data necessary to develop these systems which are sensor and compute heavy. Furthermore, the manual labelling of this data, or supervised learning, necessary for the artificial intelligence systems configured for traditional autonomous driving is expensive. Many data formats required by traditional autonomous driving systems, such as pre-built high-definition maps, are not only expensive to construct and label, but pose risks to safety and generalizability due to the limited capacity to react in situations where the real world environment does not correlate to the map as expected. The drawbacks asso