JP-7855147-B2 - Target-based movement forecasting


Inventors

Choudhury, Sanjiban
Kumar, Sumit
Marchetti-Bowick, Micol

Assignees

Aurora Operations, Inc.

Dates

Publication Date: 20260507
Application Date: 20231221
Priority Date: 20221228

Claims (19)

A computer-implemented method comprising: (a) obtaining state data related to a plurality of actors in an environment of an autonomous vehicle and map data representing a plurality of lanes in the environment; (b) determining, based on the state data and the map data, a plurality of potential targets including at least one potential target for each individual actor of the plurality of actors, wherein each potential target includes a location of a potential destination in the environment and a target path to the location of the potential destination; (c) processing the state data, the map data, and the plurality of potential targets using a machine learning prediction model to determine (i) a predicted target for each individual actor of the plurality of actors, (ii) a predicted interaction between each individual actor and other actors of the plurality of actors based on the predicted target, and (iii) a continuous trajectory for each individual actor based on the predicted target, wherein the machine learning prediction model includes a graph neural network including a plurality of nodes and a plurality of edges, the plurality of nodes including (i) a plurality of actor nodes each corresponding to a respective actor of the plurality of actors and (ii) a plurality of target nodes each corresponding to a respective potential target of the plurality of potential targets, and the plurality of edges including (iii) one or more actor-target edges that each connect an individual actor node and an individual target node and (iv) one or more target-target edges that each connect at least two of the plurality of target nodes; and (d) initiating a movement of the autonomous vehicle based on the predicted target, the predicted interaction, or the continuous trajectory for each individual actor.
The computer-implemented method according to claim 1, wherein step (c) further comprises determining a probability of each potential target for each individual actor, wherein the predicted target includes the individual potential target having the highest probability.
The computer-implemented method according to claim 1, wherein the target path of the potential target includes a nominal path defined in the map data.
The computer-implemented method according to claim 3, wherein the map data includes a plurality of map features related to the potential target, and the plurality of map features include a plurality of waypoints for the nominal path defined in the map data.
The computer-implemented method according to claim 1, wherein the predicted interaction is further based on potential targets of the other actors that intersect with the predicted target of the individual actor in the environment.
The computer-implemented method according to claim 5, wherein the predicted interaction includes a probability distribution over one or more interaction types between the individual actor and the other actors.
The computer-implemented method according to claim 6, wherein the probability of the predicted interaction between the individual actor and the other actors is based on one or more distances between the individual actor and a shared space associated with the predicted target and the potential targets of the other actors.
The computer-implemented method according to claim 1, wherein the state data includes a plurality of actor states related to the individual actor, and the plurality of actor states indicate one or more positions, one or more velocities, or one or more headings of the individual actor at a current time and at one or more past times.
The computer-implemented method according to claim 1, wherein step (c) comprises: processing the state data, the map data, and the plurality of potential targets using a first portion of the machine learning prediction model to generate a plurality of feature embeddings corresponding to the plurality of actors and the plurality of potential targets; and processing the plurality of feature embeddings to determine the predicted target for each individual actor, the predicted interaction between each individual actor and the other actors based on the predicted target, and the continuous trajectory for each individual actor based on the predicted target.
The computer-implemented method according to claim 9, wherein the plurality of feature embeddings include actor feature embeddings corresponding to the actor nodes, target feature embeddings corresponding to the target nodes, actor-target feature embeddings corresponding to the actor-target edges, and target-target feature embeddings corresponding to the target-target edges.
The computer-implemented method according to claim 10, wherein step (c) comprises: determining the predicted target for each individual actor by processing at least one of the actor feature embeddings or the actor-target feature embeddings; and determining the predicted interaction between each individual actor and the other actors, based on the predicted target, by processing the target-target feature embeddings.
The computer-implemented method according to claim 9, further comprising: receiving vehicle motion data for the autonomous vehicle, the vehicle motion data indicating a potential movement of the autonomous vehicle; generating a conditional feature embedding based on the potential movement of the autonomous vehicle; linking the conditional feature embedding to at least one of the plurality of feature embeddings; and determining the predicted target, the predicted interaction, and the continuous trajectory for each individual actor by processing the plurality of feature embeddings using a machine learning conditional prediction model.
The computer-implemented method according to claim 9, wherein step (c) comprises: performing one or more rounds of message passing to generate a plurality of updated feature embeddings; and determining the predicted target, the predicted interaction, and the continuous trajectory for each individual actor based on the plurality of updated feature embeddings.
A computing system comprising: one or more processors; and one or more non-transitory computer-readable media that store instructions executable by the one or more processors to perform operations, the operations comprising: (a) obtaining state data related to a plurality of actors in an environment of an autonomous vehicle and map data representing a plurality of lanes in the environment; (b) determining, based on the state data and the map data, a plurality of potential targets including at least one potential target for each individual actor of the plurality of actors, wherein each potential target includes a location of a potential destination in the environment and a target path to the location of the potential destination; (c) processing the state data, the map data, and the plurality of potential targets using a machine learning prediction model to determine (i) a predicted target for each individual actor of the plurality of actors, (ii) a predicted interaction between each individual actor and other actors of the plurality of actors based on the predicted target, and (iii) a continuous trajectory for each individual actor based on the predicted target, wherein the machine learning prediction model includes a graph neural network including a plurality of nodes and a plurality of edges, the plurality of nodes including (i) a plurality of actor nodes each corresponding to a respective actor of the plurality of actors and (ii) a plurality of target nodes each corresponding to a respective potential target of the plurality of potential targets, and the plurality of edges including (iii) one or more actor-target edges that each connect an individual actor node and an individual target node and (iv) one or more target-target edges that each connect at least two of the plurality of target nodes; and (d) initiating a movement of the autonomous vehicle based on the predicted target, the predicted interaction, or the continuous trajectory for each individual actor.
The computing system according to claim 14, wherein operation (c) further comprises determining a probability of each potential target for each individual actor, wherein the predicted target includes the individual potential target having the highest probability.
The computing system according to claim 14, wherein at least one actor represents the autonomous vehicle, and the state data relates to at least one of the autonomous vehicle's position estimation system or inertial measurement device.
The computing system according to claim 16, wherein the map data includes a plurality of map features relating to the potential target, and the plurality of map features include a plurality of waypoints for a nominal route defined in the map data.
A control system for an autonomous vehicle, the control system comprising: one or more processors; and one or more non-transitory computer-readable media that store instructions executable by the one or more processors to perform operations, the operations comprising: (a) obtaining state data related to a plurality of actors in an environment of the autonomous vehicle and map data representing a plurality of lanes in the environment; (b) determining, based on the state data and the map data, a plurality of potential targets including at least one potential target for each individual actor of the plurality of actors, wherein each potential target includes a location of a potential destination in the environment and a target path to the location of the potential destination; (c) processing the state data, the map data, and the plurality of potential targets using a machine learning prediction model to determine (i) a predicted target for each individual actor of the plurality of actors, (ii) a predicted interaction between each individual actor and other actors of the plurality of actors based on the predicted target, and (iii) a continuous trajectory for each individual actor based on the predicted target, wherein the machine learning prediction model includes a graph neural network including a plurality of nodes and a plurality of edges, the plurality of nodes including (i) a plurality of actor nodes each corresponding to a respective actor of the plurality of actors and (ii) a plurality of target nodes each corresponding to a respective potential target of the plurality of potential targets, and the plurality of edges including (iii) one or more actor-target edges that each connect an individual actor node and an individual target node and (iv) one or more target-target edges that each connect at least two of the plurality of target nodes; and (d) initiating a movement of the autonomous vehicle based on the predicted target, the predicted interaction, or the continuous trajectory for each individual actor.
The control system for an autonomous vehicle according to claim 18, wherein operation (c) further comprises determining a probability of each potential target for each individual actor, wherein the predicted target includes the individual potential target having the highest probability.

Description

Claim of Priority

This application claims the benefit of U.S. Patent Application No. 18/147,316, filed December 28, 2022, and U.S. Patent Application No. 18/471,960, filed September 21, 2023, both of which are incorporated herein by reference in their entirety.

An autonomous platform can process data to perceive the environment in which it operates. For example, an autonomous vehicle can use various sensors to perceive its surroundings and identify objects around it. Based on the perceived surrounding environment, the autonomous vehicle can identify an appropriate route and travel along that route with little to no human intervention.

This disclosure describes an improved intent prediction technique that can be used in autonomous platforms for motion prediction and, ultimately, motion planning. The technique includes a machine learning model (e.g., a graph neural network) trained to generate a number of discrete intent outputs and continuous motion outputs based on historical actor observations and map geometry for traffic scenes. The outputs include (i) target probabilities (e.g., the probability that an actor on a road will follow a nominal path), (ii) interaction probabilities (e.g., the probability that an actor will yield or reverse yield to another actor), and (iii) a continuous target-based trajectory for an actor. For example, an autonomous platform such as an autonomous vehicle can use these outputs to more accurately predict the future movements of actors in its environment while planning its own movements. In some cases, additional machine learning models (e.g., typified graph neural networks) can be used to adjust the outputs based on the predicted movement of the autonomous platform.
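The target-probability output described above, together with the selection of the highest-probability target recited in the claims, can be illustrated with a minimal sketch. The softmax normalization, the function names, and the target labels are assumptions made for illustration only; the source does not state how raw model scores are converted into probabilities.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_target(target_scores):
    """Return (predicted_target_id, probabilities) for a single actor.

    target_scores maps a hypothetical target id to a raw model score.
    The predicted target is the potential target with the highest probability.
    """
    ids = list(target_scores)
    probs = softmax([target_scores[i] for i in ids])
    by_id = dict(zip(ids, probs))
    best = max(by_id, key=by_id.get)
    return best, by_id


# Hypothetical scores for three potential targets of one actor.
scores = {"keep_lane": 2.0, "turn_left": 0.5, "turn_right": -1.0}
best, probs = predict_target(scores)
```

Here `best` is the potential target with the largest probability and `probs` sums to one across that actor's potential targets, which is what allows the per-actor outputs to be read as a distribution.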
The techniques described here can improve the speed, efficiency, and accuracy of predicting the future movement of dynamic actors in traffic scenarios, thereby improving decision-making and response times with respect to dynamic actors when planning the movement of the autonomous platform.

More specifically, this disclosure relates to a machine learning prediction model for predicting the future movement of actors in a traffic scene based on state data and map data for the environment. Actors include both the autonomous platform and other dynamic objects in the traffic scene. State data may include current and historical observations such as position, velocity, and heading for each actor in the scene. Map data can identify multiple lanes in the traffic scene and different lane features for each of the multiple lanes.

Based on the actors' current states and the map data, the model may determine a number of potential targets for individual actors in the traffic scene. Each potential target may include a short-term destination (e.g., a potential destination within the next 5 seconds) and a target path to that destination that the actor can reach from its location in the traffic scene (e.g., multiple waypoints along one or more traffic lanes). The machine learning prediction model may process the state data and map data to determine the probability that an actor (i) will follow a target path and/or (ii) will interact with other actors in the traffic scene while following a target path. The model may also determine (iii) a continuous trajectory for the actor conditioned on the target path.

The machine learning prediction model may include a graph neural network with multiple nodes and edges. The nodes may include one or more target nodes of a target node type and one or more actor nodes of an actor node type. The edges may include one or more actor-target edges of an actor-target edge type and one or more target-target edges of a target-target edge type.
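The graph layout described above, with two node types and two edge types, can be sketched as a plain data structure. This is an illustrative sketch only: the `SceneGraph` class, the `build_scene_graph` helper, and the rule that links two targets when their paths share a waypoint id are assumptions standing in for whatever geometric intersection test a real system would use.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class SceneGraph:
    """Hypothetical container for a graph with typed nodes and edges."""
    actor_nodes: list = field(default_factory=list)
    target_nodes: list = field(default_factory=list)          # (actor_id, target_id)
    actor_target_edges: list = field(default_factory=list)    # (actor_id, target_node)
    target_target_edges: list = field(default_factory=list)   # (target_node, target_node)


def build_scene_graph(candidate_targets):
    """Build the graph from {actor_id: {target_id: [waypoint ids]}}.

    Assumption: two target paths are treated as intersecting when they
    share at least one waypoint id.
    """
    graph = SceneGraph()
    paths = {}
    for actor_id, targets in candidate_targets.items():
        graph.actor_nodes.append(actor_id)
        for target_id, waypoints in targets.items():
            node = (actor_id, target_id)
            graph.target_nodes.append(node)
            # Each actor is connected to every one of its own potential targets.
            graph.actor_target_edges.append((actor_id, node))
            paths[node] = set(waypoints)
    for a, b in combinations(graph.target_nodes, 2):
        # Only link targets of different actors whose paths overlap.
        if a[0] != b[0] and paths[a] & paths[b]:
            graph.target_target_edges.append((a, b))
    return graph


# Hypothetical scene: car_1 has two potential targets, car_2 has one.
scene = {
    "car_1": {"straight": ["w1", "w2", "w3"], "left": ["w1", "w4"]},
    "car_2": {"straight": ["w5", "w2", "w6"]},
}
graph = build_scene_graph(scene)
```

In this toy scene the only target-target edge joins the two "straight" targets, because both paths pass through waypoint `w2`; the "left" target of `car_1` shares nothing with `car_2` and stays unlinked.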
An actor-target edge may connect a particular actor to one of its potential targets, and a target-target edge may connect two targets associated with a "shared space" (e.g., the space where the two corresponding target paths intersect), which could lead to an interaction between the two actors pursuing those targets. The nodes and edges may contain feature representations encoded from different features of the state data and map data, depending on their respective node and edge types. For example, an actor node may contain an actor feature representation encoded based on the current state and one or more past states of a particular actor. A target node may contain a target feature representation encoded based on the waypoint information of the corresponding target path. An actor-target edge may contain an actor-target feature representation encoded based on actor state information in a path tangent frame. Furthermore, a target-target edge may contain a target-target feature representation encoded based on actor state information for the two actors that can pursue the two targets associated with the shared space. Message passing may be performed over several rounds to update the feature representations based on information from adjacent nodes and edges. The ou