
US-12617096-B2 - Coordination of multiple robots using graph neural networks


Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for controlling a plurality of robots. One of the methods includes: obtaining state data representing a current state of the environment; generating, from the state data, graph data representing a graph of the current state of the environment; processing the graph data using a graph neural network to generate a graph output that comprises a respective updated feature representation for each of the robot nodes in the graph; and selecting, based on the graph output, a respective action to be performed by each of the robots.

Inventors

  • Matthew Lai
  • Jonathan Karl Scholz
  • Jose Enrique Chen

Assignees

  • GDM HOLDING LLC

Dates

Publication Date
2026-05-05
Application Date
2022-09-15

Claims (18)

  1. A method performed by one or more computers for controlling a plurality of robots to cause the robots to move to a plurality of target locations in an environment, the method comprising repeatedly performing the following operations: obtaining state data representing a current state of the environment; generating, from the state data, graph data representing a graph of the current state of the environment, the graph comprising a plurality of nodes and a plurality of edges, wherein each edge connects a respective pair of nodes from the plurality of nodes, wherein the plurality of nodes comprises a respective robot node for each of the robots and a respective target node for each of the target locations, wherein the graph includes one or more edges between at least one of the robot nodes and at least one of the target nodes, and wherein the graph data comprises a respective initial feature representation for each of the robot nodes and for each of the target nodes; processing the graph data using a graph neural network to generate a graph output that comprises a respective updated feature representation for each of the robot nodes; selecting, based on the graph output, a respective action to be performed by each of the robots; and controlling at least one of the robots using the selected actions.
  2. The method of claim 1, wherein the environment includes one or more obstacles, wherein the graph comprises a respective obstacle node for each of the one or more obstacles, and wherein the graph data comprises a respective initial feature representation for each of the obstacle nodes.
  3. The method of claim 2, wherein the graph includes edges between each robot node and each obstacle node.
  4. The method of claim 3, wherein the graph does not include any edges between any two obstacle nodes.
  5. The method of claim 3, wherein the graph does not include any edges between any obstacle node and any target node in the graph.
  6. The method of claim 1, wherein the graph includes edges between each robot node and each other robot node.
  7. The method of claim 1, wherein the graph includes edges between each robot node and each target node.
  8. The method of claim 1, wherein the graph does not include any edges between any two target nodes.
  9. The method of claim 1, wherein the graph data comprises edge data representing the edges in the graph.
  10. The method of claim 9, wherein the graph neural network includes one or more graph layers, each of the graph layers configured to update, for any given node, the feature representation for the given node based only on feature representations for nodes that are connected to the node by an edge in the graph.
  11. The method of claim 1, wherein selecting the respective action to be performed by each of the robots comprises: predicting the respective action by processing the graph output.
  12. The method of claim 1, wherein selecting the respective action to be performed by each of the robots comprises: performing one or more planning iterations using the graph output to generate plan data; and selecting actions using the plan data.
  13. One or more computer-readable storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform the operations of the method of claim 1.
  14. A method performed by one or more computers for controlling a plurality of robots to cause the robots to perform a task that involves moving to a plurality of target locations in an environment, the method comprising repeatedly performing the following operations: performing a plurality of planning iterations starting from a current state of the environment to generate plan data, wherein performing each planning iteration comprises: traversing through states of the environment starting from the current state until a leaf state of the environment is reached; generating, from state data characterizing the leaf state, graph data representing a graph of the leaf state of the environment, the graph comprising a plurality of nodes and a plurality of edges, wherein each edge connects a respective pair of nodes from the plurality of nodes, wherein the plurality of nodes comprises a respective robot node for each of the robots and a respective target node for each of the target locations, wherein the graph includes one or more edges between at least one of the robot nodes and at least one of the target nodes, and wherein the graph data comprises a respective initial feature representation for each of the robot nodes and for each of the target nodes; processing the graph data using a graph neural network to generate a graph output that comprises a respective updated feature representation for each of the robot nodes; generating, from the graph output, an update to the plan data; and updating the plan data using the generated update; after performing the plurality of planning iterations, selecting an action using the plan data; and controlling at least one of the robots using the selected action.
  15. The method of claim 14, wherein generating, from the graph output, an update to the plan data comprises: generating a summary feature of the leaf state from the graph output; and processing the summary feature using a value prediction neural network to predict a value score that represents a predicted value of being in the leaf state for successfully completing the task.
  16. The method of claim 14, wherein generating, from the graph output, an update to the plan data comprises, for each robot node: processing the updated feature representation for the robot node using a policy neural network to generate a policy output that defines a probability distribution over a set of possible actions to be performed by the corresponding robot when the environment is in the leaf state.
  17. The one or more computers of claim 14, comprising one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform the operations of claim 14.
  18. A system comprising: one or more computers; and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: obtaining state data representing a current state of an environment; generating, from the state data, graph data representing a graph of the current state of the environment, the graph comprising a plurality of nodes and a plurality of edges, wherein each edge connects a respective pair of nodes from the plurality of nodes, wherein the plurality of nodes comprises a respective robot node for each of a plurality of robots and a respective target node for each of a plurality of target locations, wherein the graph includes one or more edges between at least one of the robot nodes and at least one of the target nodes, and wherein the graph data comprises a respective initial feature representation for each of the robot nodes and for each of the target nodes; processing the graph data using a graph neural network to generate a graph output that comprises a respective updated feature representation for each of the robot nodes; selecting, based on the graph output, a respective action to be performed by each of the robots; and controlling at least one of the robots using the selected actions.
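
The claims above specify a particular edge topology (robot-robot, robot-target and robot-obstacle edges, but no target-target, obstacle-obstacle or obstacle-target edges) and, in claim 10, graph layers that update a node only from the nodes it is connected to by an edge. The following is a minimal illustrative sketch of that structure, not the patented implementation; the class and function names, feature shapes, and the tanh update rule are assumptions made for this example.

    # Sketch of the graph topology of claims 1-10 and of a neighbour-only
    # message-passing layer (claim 10). Illustrative assumptions throughout.
    from dataclasses import dataclass
    from typing import List, Tuple

    import numpy as np


    @dataclass
    class EnvGraph:
        num_robots: int
        num_targets: int
        num_obstacles: int
        edges: List[Tuple[int, int]]  # directed (source, destination) node-index pairs


    def build_graph(num_robots: int, num_targets: int, num_obstacles: int) -> EnvGraph:
        """Connect every robot to every other robot, every target and every obstacle;
        add no target-target, obstacle-obstacle or obstacle-target edges."""
        robots = range(num_robots)
        targets = range(num_robots, num_robots + num_targets)
        obstacles = range(num_robots + num_targets, num_robots + num_targets + num_obstacles)
        edges = []
        for r in robots:
            edges += [(r, other) for other in robots if other != r]             # robot <-> robot
            edges += [(r, t) for t in targets] + [(t, r) for t in targets]      # robot <-> target
            edges += [(r, o) for o in obstacles] + [(o, r) for o in obstacles]  # robot <-> obstacle
        return EnvGraph(num_robots, num_targets, num_obstacles, edges)


    def gnn_layer(graph: EnvGraph, node_feats: np.ndarray,
                  w_msg: np.ndarray, w_update: np.ndarray) -> np.ndarray:
        """One message-passing step: aggregate messages only along edges of the graph,
        then update each node from its own features plus its aggregated messages."""
        messages = np.zeros_like(node_feats)
        for src, dst in graph.edges:
            messages[dst] += np.tanh(node_feats[src] @ w_msg)
        return np.tanh(np.concatenate([node_feats, messages], axis=-1) @ w_update)

Stacking the per-robot, per-target and per-obstacle initial feature representations into one node-feature matrix (robots first) and applying such a layer one or more times yields the updated robot-node representations, i.e. the first num_robots rows, from which actions are then selected.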

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a U.S. National Stage application under 35 U.S.C. § 371 and claims the benefit of priority to International Application No. PCT/EP2022/075656, having an international filing date of Sep. 15, 2022, which claims the benefit of priority to U.S. Provisional Patent Application No. 63/252,799, filed on Oct. 6, 2021, the disclosure of which is hereby incorporated by reference in its entirety.

BACKGROUND

This specification relates to robotics, and more particularly to planning the movements of a plurality of robots. Robotics planning refers to scheduling the physical movements of robots in order to perform tasks. Certain applications require the coordination of multiple robots in a complex environment. For example, in industrial applications, multiple robot arms can simultaneously operate in a complex workspace to maximize production throughput. Each of the multiple robot arms can be controlled to move along a respective motion trajectory in order to reach one of multiple specified target locations and perform an operation. Coordination of the motion paths and schedules of the multiple robot arms is critical so that the operational tasks are accomplished in an optimal time frame without the robot arms colliding with each other or with obstacle objects in the work environment.

SUMMARY

This specification describes technologies that relate to using graph neural network (GNN) processing to coordinate the actions of a plurality of robots in an environment. In one innovative aspect, there is described a method for planning the actions to be performed by a plurality of robots in an environment using a graph neural network. A computing system can repeatedly perform the method to generate actions to be performed by each of the robots at each of multiple time steps. The generated robot actions can be communicated to the robots to control their operations in the environment at each time step. The goal of generating the robot actions is to control the robots to accomplish specified tasks in an optimal time frame without the robots colliding with each other or with other objects in the environment. As an example, the specified tasks can include moving each of the robots to a respective one of a plurality of target locations. As another example, the specified task can include moving one or more of the robots along a specified path, such as controlling a robotic arm holding a milling bit to closely follow a milling pattern. As another example, the specified task can include the plurality of robots cooperating to perform a task, such as controlling two robotic arms to hold two workpieces while a third robotic arm performs welding.

In some implementations, the environment can be a physical environment, e.g., a physical workcell, in which the one or more robots operate. In some other implementations, the environment can be a virtual representation of a physical environment, e.g., a simulated operational environment, in which simulations of robot motions can be conducted. In the case of a simulated operational environment, the system can plan the actions for the robots interactively with a simulator that receives the planned actions generated by the planning system and outputs updated state observations of the environment. The planning process can start with obtaining state data representing a current state of the environment.
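
As a rough orientation before the detailed graph construction below, the repeated control loop, here shown for the simulated setting in which a simulator returns updated state observations, might be sketched as follows. Every name and signature in this sketch (simulator.reset()/step(), state_to_graph(), gnn(), policy_head(), value_head()) is an illustrative assumption rather than an API defined by the patent.

    # Hedged sketch of the outer control/planning loop with a simulator.
    import numpy as np


    def control_episode(simulator, state_to_graph, gnn, policy_head, value_head, num_steps):
        state = simulator.reset()
        for _ in range(num_steps):
            # 1. Encode the current environment state as graph data.
            graph, node_feats = state_to_graph(state)
            # 2. Run the graph neural network to get updated per-node features.
            updated = gnn(graph, node_feats)
            robot_feats = updated[: graph.num_robots]
            # 3. Per-robot policy output: a distribution over possible actions.
            action_logits = policy_head(robot_feats)      # shape [num_robots, num_actions]
            actions = np.argmax(action_logits, axis=-1)   # greedy selection for this sketch
            # 4. Optional value estimate, e.g. to guide planning iterations (claims 14-15).
            value = value_head(robot_feats.mean(axis=0))  # summary feature -> scalar value
            # 5. Apply the selected actions; the simulator returns the next state.
            #    A planner would instead use `value` and the policy output to update plan data.
            state = simulator.step(actions)
        return state

In the planning variant (claims 12 and 14-16), the same graph encoding and GNN call are applied to leaf states reached during planning iterations, and the policy and value outputs update the plan data rather than being executed directly.
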
The computing system generates graph data representing a graph of the current state of the environment. The computing system processes the graph data using a graph neural network to generate a graph output, and selects a respective action to be performed by each of the robots based on the graph output.

The graph representing the current state of the environment includes a plurality of nodes and a plurality of edges. Each edge connects a respective pair of nodes from the plurality of nodes. The plurality of nodes includes a respective robot node for each of the robots and a respective target node for each of the target locations. The graph data includes a respective initial feature representation for each of the robot nodes and for each of the target nodes. As an example, the initial feature representation for each robot node can include one or more coordinates for tooltips, coordinates for each of the joints, and current joint angles. The initial feature representation for each target node can include coordinates of the target location.

In some implementations, the initial feature representation for one or more of the target nodes can further include compatibility information for the corresponding one or more targets. The compatibility information for a target node can identify, for example, a subset of the robots that are compatible for operating on the corresponding target. For example, the plurality of robots can be configured with various tooltips that are compatible or incompatible for operating o