JP-7856754-B2 - Method for training a robotic device controller, method for controlling a robotic device, robotic device control system, computer program elements, and computer-readable media

JP 7856754 B2

Inventors

  • チェン、リン、ツァイ
  • チア、イー、チョン
  • クリッティン、コーキーリー
  • シャーリー

Assignees

  • Deconstruct Technologies Private Limited

Dates

Publication Date
2026-05-11
Application Date
2021-09-17

Claims (20)

  1. A method for training a robotic device controller, comprising training a neural network that includes an encoder network, a decoder network, and a policy model, wherein, for each of a plurality of digital training input images: the encoder network encodes the digital training input image into features in a latent space; the decoder network determines, from the features, whether each of a plurality of regions shown in the digital training input image is traversable, and determines information regarding the distance, in terms of relative depth, between the viewpoint of the digital training input image and the region; and the policy model determines, from the features, control information for controlling the movement of the robotic device; and wherein at least the policy model is trained in a supervised manner using control-information ground-truth data of the digital training input images.
  2. The method according to claim 1, wherein training the encoder network and the decoder network includes training an autoencoder comprising the encoder network and the decoder network.
  3. The method according to claim 1 or 2, comprising training the encoder network together with the decoder network.
  4. The method according to any one of claims 1 to 3, comprising training the encoder network together with the decoder network and the policy model.
  5. The method according to any one of claims 1 to 4, wherein the decoder network includes a semantic decoder and a depth decoder, and the neural network is trained such that, for each digital training input image: the semantic decoder determines, from the features, whether each of the plurality of regions shown in the digital training input image is traversable; and the depth decoder determines, from the features, information regarding the distance between the viewpoint of the digital training input image and each of the plurality of regions shown in the digital training input image.
  6. The method according to claim 5, wherein the semantic decoder is trained in a supervised manner.
  7. The method according to claim 5, wherein the depth decoder is trained in a supervised manner or in an unsupervised manner.
  8. The method according to any one of claims 1 to 7, wherein one or more of the encoder network, the decoder network, and the policy model are convolutional neural networks.
  9. The method according to any one of claims 1 to 8, wherein the control information includes control information for each of a plurality of robot device movement commands.
  10. The method according to any one of claims 1 to 9, wherein the policy model is trained such that the neural network determines the control information from features into which the encoder network encodes multiple training input images.
  11. A method for controlling a robotic device, comprising: training a robotic device controller according to the method described in any one of claims 1 to 10; acquiring one or more digital images showing the surroundings of the robotic device; encoding the one or more digital images into one or more features using the encoder network; supplying the one or more features to the policy model; and controlling the robotic device in accordance with the control information output by the policy model in response to the one or more features.
  12. The method according to claim 11, comprising receiving the one or more digital images from one or more cameras of the robotic device.
  13. The method according to claim 11 or 12, wherein the control information includes control information for each of a plurality of robotic device movement commands, and the method includes receiving an instruction for a robotic device movement command and controlling the robotic device in accordance with the control information for the instructed robotic device movement command.
  14. The method according to any one of claims 11 to 13, wherein the policy model is trained such that the neural network determines the control information from features into which the encoder network encodes multiple training input images, and the method further comprises: acquiring multiple digital images showing the surroundings of the robotic device; encoding the multiple digital images into multiple features using the encoder network; supplying the multiple features to the policy model; and controlling the robotic device in accordance with the control information output by the policy model in response to the multiple features.
  15. The method according to claim 14, wherein the plurality of digital images include images received from different cameras.
  16. The method according to claim 14 or 15, wherein the plurality of digital images include images taken from different viewpoints.
  17. The method according to any one of claims 14 to 16, wherein the plurality of digital images include images taken at different times.
  18. A robotic device control system configured to carry out the method described in any one of claims 1 to 17.
  19. A computer program element including program instructions that, when executed by one or more processors, cause the one or more processors to carry out the method described in any one of claims 1 to 17.
  20. A computer-readable medium containing program instructions that, when executed by one or more processors, cause the one or more processors to carry out the method described in any one of claims 1 to 17.

Description

Various aspects of this disclosure relate to apparatuses and methods for controlling robotic devices, and to apparatuses and methods for training robotic device controllers.

Robotic devices, such as mobile robots, can be remotely controlled by a human user. For this purpose, the human user can, for example, be supplied with images from the robot's perspective and react accordingly, e.g. maneuver the robot around obstacles. However, this requires precise input from the user at the right time, and therefore constant attention from the human user. A method is therefore desirable that allows robots to move more autonomously in accordance with high-level commands from human users, such as "move forward" (along a path such as a corridor), "turn right," or "turn left."

According to various embodiments, a method is provided for training a robotic device controller, comprising training a neural network including an encoder network, a decoder network, and a policy model, wherein, for each of a plurality of digital training input images: the encoder network encodes the digital training input image into features in a latent space; the decoder network determines from the features whether each of a plurality of regions shown in the digital training input image is traversable, and determines information regarding the distance between the viewpoint of the digital training input image and the region; and the policy model determines from the features control information for controlling the movement of the robotic device, with at least the policy model being trained in a supervised manner using control-information ground-truth data of the digital training input images.

According to one embodiment, training the encoder network and the decoder network includes training an autoencoder comprising the encoder network and the decoder network. According to one embodiment, the method includes training the encoder network together with the decoder network.
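The architecture described above can be illustrated with a minimal numpy sketch: a shared encoder produces latent features that feed a semantic (traversability) head, a depth head, and a policy head. All class names, layer sizes, and the single-linear-layer structure are hypothetical simplifications for illustration; the disclosure itself leaves the network architectures open (e.g. convolutional networks, per claim 8).

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class Encoder:
    """Maps a (flattened) input image to features in a latent space."""
    def __init__(self, in_dim, latent_dim):
        self.W = rng.normal(0.0, 0.1, (in_dim, latent_dim))
    def __call__(self, image):
        return relu(image @ self.W)

class SemanticDecoder:
    """Predicts a traversability probability for each image region."""
    def __init__(self, latent_dim, n_regions):
        self.W = rng.normal(0.0, 0.1, (latent_dim, n_regions))
    def __call__(self, z):
        return 1.0 / (1.0 + np.exp(-(z @ self.W)))  # sigmoid per region

class DepthDecoder:
    """Predicts relative depth (viewpoint-to-region distance) per region."""
    def __init__(self, latent_dim, n_regions):
        self.W = rng.normal(0.0, 0.1, (latent_dim, n_regions))
    def __call__(self, z):
        return z @ self.W

class PolicyModel:
    """Maps latent features to control information for the robotic device."""
    def __init__(self, latent_dim, n_controls):
        self.W = rng.normal(0.0, 0.1, (latent_dim, n_controls))
    def __call__(self, z):
        return z @ self.W

IN_DIM, LATENT, REGIONS, CONTROLS = 64, 16, 8, 3
enc = Encoder(IN_DIM, LATENT)
sem = SemanticDecoder(LATENT, REGIONS)
dep = DepthDecoder(LATENT, REGIONS)
pol = PolicyModel(LATENT, CONTROLS)

image = rng.normal(size=IN_DIM)   # stand-in for a flattened camera image
z = enc(image)                    # features in the latent space
traversable = sem(z)              # traversability probability per region
depth = dep(z)                    # relative depth per region
control = pol(z)                  # control outputs
```

Because all three heads read the same latent features, training them jointly encourages the encoder to capture geometry and traversability that are also useful for control.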
According to one embodiment, the method includes training the encoder network together with the decoder network and the policy model.

According to one embodiment, the decoder network includes a semantic decoder and a depth decoder, and the neural network is trained such that, for each digital training input image, the semantic decoder determines from the features whether each of the multiple regions shown in the digital training input image is traversable, and the depth decoder determines from the features information regarding the distance between the viewpoint of the digital training input image and each of the multiple regions shown in the image.

According to one embodiment, the semantic decoder is trained in a supervised manner. According to one embodiment, the depth decoder is trained in a supervised manner or in an unsupervised manner.

According to one embodiment, one or more of the encoder network, the decoder network, and the policy model are convolutional neural networks. According to one embodiment, the control information includes control information for each of multiple robotic device movement commands. According to one embodiment, the policy model is trained such that the neural network determines the control information from features into which the encoder network encodes multiple training input images.

According to one embodiment, a method for controlling a robotic device is provided, comprising: training a robotic device controller according to the method of any one of the embodiments described above; acquiring one or more digital images showing the surroundings of the robotic device; encoding the one or more digital images into one or more features using the encoder network; supplying the one or more features to the policy model; and controlling the robotic device in accordance with the control information output by the policy model in response to the one or more features.
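Joint supervised training of the semantic head, the depth head, and the policy head can be sketched as a weighted sum of per-head losses. The specific loss choices below (binary cross-entropy for traversability, L1 for relative depth, mean squared error against control ground truth) and all numbers and weights are illustrative assumptions, not taken from the disclosure, which only requires that at least the policy model be trained in a supervised manner.

```python
import numpy as np

def bce(p, y, eps=1e-7):
    """Binary cross-entropy: a supervised loss for per-region traversability."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def l1(pred, target):
    """L1 loss: a common supervised choice for relative-depth regression."""
    return float(np.mean(np.abs(pred - target)))

def mse(pred, target):
    """Mean squared error against control-information ground truth."""
    return float(np.mean((pred - target) ** 2))

# Hypothetical predictions and ground truth for one training image.
trav_pred  = np.array([0.9, 0.2, 0.8])
trav_gt    = np.array([1.0, 0.0, 1.0])
depth_pred = np.array([0.5, 1.2, 0.3])
depth_gt   = np.array([0.4, 1.0, 0.35])
ctrl_pred  = np.array([0.6, -0.1])
ctrl_gt    = np.array([0.5,  0.0])

# Joint objective for training encoder, both decoders, and policy together;
# the loss weights are hypothetical hyperparameters.
w_sem, w_depth, w_ctrl = 1.0, 1.0, 1.0
loss = (w_sem * bce(trav_pred, trav_gt)
        + w_depth * l1(depth_pred, depth_gt)
        + w_ctrl * mse(ctrl_pred, ctrl_gt))
```

If the depth decoder were instead trained in an unsupervised manner, the L1 term against ground-truth depth would be replaced by a self-supervised objective (for example, a photometric reconstruction loss across viewpoints).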
According to one embodiment, the method includes receiving the one or more digital images from one or more cameras of the robotic device.

According to one embodiment, the control information includes control information for each of a plurality of robotic device movement commands, and the method includes receiving an instruction for a robotic device movement command and controlling the robotic device in accordance with the control information for the instructed robotic device movement command.

According to one embodiment, the policy model is trained such that the neural network determines the control information from features into which the encoder network encodes multiple training input images. The method includes: acquiring multiple digital images showing the surroundings of the robotic device; encoding the multiple digital images into multiple features using the encoder network; supplying the multiple features to the policy model; and controlling the robotic device in accordance with the control information output by the policy model in response to the multiple features.

According to one embodiment, the multiple digital images include images received from different cameras.
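When the policy head outputs control information for each of several movement commands, control at inference time reduces to selecting the output row for the instructed high-level command. The command names, the (linear velocity, angular velocity) output format, and the concrete numbers below are illustrative assumptions.

```python
import numpy as np

# One high-level movement command per policy-output row; names are hypothetical.
COMMANDS = ["forward", "turn_left", "turn_right"]

def select_control(policy_output, command):
    """Pick the control row for the instructed high-level movement command."""
    return policy_output[COMMANDS.index(command)]

# Stand-in policy output: one (linear_v, angular_v) pair per command.
policy_output = np.array([
    [0.8,  0.0],   # forward: drive straight
    [0.3,  0.6],   # turn_left: slow down, rotate left
    [0.3, -0.6],   # turn_right: slow down, rotate right
])

# The user issues a high-level command; the robot follows the matching output.
linear_v, angular_v = select_control(policy_output, "turn_left")
```

This is what lets the human user steer with coarse commands only: the policy output already encodes how to execute each command given the current camera views, so no precisely timed low-level input is needed.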