EP-4737069-A1 - DEVICE AND METHOD FOR LEARNING ROBOT DEVICE DYNAMICS
Abstract
According to various embodiments, a method for learning robot device dynamics is described, comprising providing demonstrations for movements of a robot device, wherein each demonstration demonstrates dynamics of the robot device by indicating a sequence of demonstrated states of the robot device in an ambient space, for each demonstration, encoding the demonstrated states of the sequence of demonstrated states of the demonstration to encoded demonstrated states in a latent space by an encoding function which maps states from the ambient space to the latent space, determining latent space velocities at the encoded demonstrated states according to predetermined dynamics in the latent space, determining predicted velocities at the demonstrated states in ambient space from the determined latent space velocities at the encoded demonstrated states according to the Jacobian of an inverse of the encoding function and determining a loss for the demonstration including a prediction loss determined from a difference of the predicted velocities at the demonstrated states and the demonstrated velocities and training the encoding function to reduce a total loss including the losses determined for at least some of the demonstrations.
Inventors
- Rozo, Leonel
- Beik-Mohammadi, Hadi
- LI, JIALIN
Assignees
- Robert Bosch GmbH
Dates
- Publication Date
- 20260506
- Application Date
- 20241104
Claims (11)
- A method for learning robot device dynamics, comprising: providing (201) demonstrations for movements of a robot device (101), wherein each demonstration demonstrates dynamics of the robot device (101) by indicating a sequence of demonstrated states of the robot device (101) in an ambient space; for each demonstration, encoding (202) the demonstrated states of the sequence of demonstrated states of the demonstration to encoded demonstrated states in a latent space by an encoding function which maps states from the ambient space to the latent space; determining (203) latent space velocities at the encoded demonstrated states according to predetermined dynamics in the latent space; determining (204) predicted velocities at the demonstrated states in ambient space from the determined latent space velocities at the encoded demonstrated states according to the Jacobian of an inverse of the encoding function; and determining (205) a loss for the demonstration including a prediction loss determined from a difference of the predicted velocities at the demonstrated states and the demonstrated velocities; and training (206) the encoding function to reduce a total loss including the losses determined for at least some of the demonstrations.
- The method of claim 1, wherein the predetermined dynamics in the latent space are contractive.
- The method of claim 2, wherein the predetermined dynamics in the latent space are given by a matrix with predetermined eigenvalues.
- The method of claim 3, comprising determining the eigenvalues according to a predetermined contraction rate and/or a predetermined contraction ratio.
- The method of claim 3 or 4, wherein the total loss further includes an alignment loss term which rewards that the demonstrated trajectories (after being encoded to trajectories of encoded states, i.e. encoded trajectories) match the direction of the eigenvector of the largest eigenvalue of the predetermined dynamics in latent space (i.e. the loss encourages the ambient eigen-axis (i.e. the decoder image of the latent space axis given by the direction of the eigenvector of the largest eigenvalue) to align with the demonstrated trajectories).
- The method of any one of claims 1 to 5, wherein the total loss further comprises a manifold matching loss term which rewards that the set of points which the inverse of the encoding function generates from the encoded demonstrated states matches the set of demonstrated states in ambient space.
- The method of any one of claims 1 to 6, comprising constructing the encoding function from a sequence of diffeomorphisms wherein training the encoding function comprises adjusting parameters of the diffeomorphisms.
- A method for controlling a robot device (101), comprising learning robot device dynamics according to any one of claims 1 to 7 and following the determined robot device dynamics for controlling the robot device (101).
- A robot device controller (106), configured to perform a method of any one of claims 1 to 8.
- A computer program comprising instructions which, when executed by a computer, make the computer perform a method according to any one of claims 1 to 8.
- A computer-readable medium comprising instructions which, when executed by a computer, make the computer perform a method according to any one of claims 1 to 8.
Description
The present disclosure relates to devices and methods for learning robot device dynamics (for robot device control).
To ensure the safety of fully autonomous robots, stability guarantees are crucial in preventing undesirable and potentially harmful actions. Learning dynamic skills from demonstrations provides an efficient method to model highly dynamic motions from a few examples. However, stability guarantees are hard to provide in dynamical systems that are learned from demonstrations, especially when the learned dynamics are governed by neural networks. Therefore, effective approaches for learning dynamics in a way that stability is ensured are desirable. Moreover, it is desirable to be able to efficiently learn dynamics for different tasks and skills such that the robot is able to autonomously handle different control scenarios.
The publication C. Durkan et al. "Neural spline flows", in Advances in neural information processing systems, 32, 2019, referred to as reference [1] in the following, describes monotonic rational-quadratic splines, which enhance the flexibility of both coupling and autoregressive transforms while retaining analytic invertibility.
According to various embodiments, a method for learning robot device dynamics is provided, comprising providing demonstrations for movements of a robot device, wherein each demonstration demonstrates dynamics of the robot device by indicating a sequence of demonstrated states of the robot device in an ambient space, and, for each demonstration,
- encoding the demonstrated states of the sequence of demonstrated states of the demonstration to encoded demonstrated states in a latent space by an encoding function which maps states from the ambient space to the latent space,
- determining latent space velocities at the encoded demonstrated states according to predetermined dynamics in the latent space,
- determining predicted velocities at the demonstrated states in ambient space from the determined latent space velocities at the encoded demonstrated states according to the Jacobian of an inverse of the encoding function, and
- determining a loss for the demonstration including a prediction loss determined from a difference of the predicted velocities at the demonstrated states and the demonstrated velocities,
and training the encoding function to reduce a total loss including the losses determined for at least some of the demonstrations.
In other words, rather than training the dynamics in latent space, predetermined latent space dynamics are used, and what is trained is the encoding function (and thus, equivalently, the decoding function, i.e. the inverse of the encoding function), i.e. the mapping between the ambient space and the latent space. In the end, dynamics are learned in this manner because, for controlling the robot device, states of the robot device may be mapped to the latent space using the trained encoding function, velocities in latent space may be determined, and, using the Jacobian of the inverse of the encoding function (as in training), velocities in ambient space for controlling the robot device may be determined.
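The steps above can be illustrated with a minimal two-dimensional sketch. Everything concrete in it is an assumption made for illustration and is not taken from the disclosure: the encoding function is a single shear map with one trainable parameter `a`, the predetermined latent dynamics are linear with the hypothetical eigenvalues in `EIGS`, the demonstrations are synthetic, and training uses plain gradient descent with a finite-difference gradient. Only the overall flow (encode, latent velocity, Jacobian of the inverse, prediction loss, training of the encoder) follows the text.

```python
import math

# Assumed predetermined latent dynamics: z_dot = A z with A = diag(EIGS).
# Negative eigenvalues make this latent system contractive.
EIGS = (-1.0, -3.0)

def encode(x, a):
    """Toy encoding function (a shear, hence invertible): ambient -> latent."""
    return (x[0], x[1] + a * math.tanh(x[0]))

def decode_jacobian(z, a):
    """Jacobian of the inverse map g(z) = (z0, z1 - a*tanh(z0)) at z."""
    return ((1.0, 0.0), (-a * (1.0 - math.tanh(z[0]) ** 2), 1.0))

def predicted_velocity(x, a):
    """Ambient velocity: Jacobian of the inverse times the latent velocity."""
    z = encode(x, a)
    z_dot = (EIGS[0] * z[0], EIGS[1] * z[1])
    J = decode_jacobian(z, a)
    return (J[0][0] * z_dot[0] + J[0][1] * z_dot[1],
            J[1][0] * z_dot[0] + J[1][1] * z_dot[1])

def prediction_loss(demo, a):
    """Mean squared difference of predicted and demonstrated velocities."""
    total = 0.0
    for x, x_dot in demo:
        p = predicted_velocity(x, a)
        total += (p[0] - x_dot[0]) ** 2 + (p[1] - x_dot[1]) ** 2
    return total / len(demo)

def total_loss(demos, a):
    """Total loss: sum of the per-demonstration losses."""
    return sum(prediction_loss(d, a) for d in demos)

def train(demos, a=0.0, lr=0.05, steps=300, eps=1e-5):
    """Gradient descent on the encoder parameter (central finite difference)."""
    for _ in range(steps):
        grad = (total_loss(demos, a + eps) - total_loss(demos, a - eps)) / (2 * eps)
        a -= lr * grad
    return a

def make_demo(a_true, x0, dt=0.05, n=40):
    """Synthetic demonstration: (state, velocity) pairs from a known encoder."""
    demo, x = [], x0
    for _ in range(n):
        v = predicted_velocity(x, a_true)
        demo.append((x, v))
        x = (x[0] + dt * v[0], x[1] + dt * v[1])
    return demo

# Two synthetic demonstrations generated with a_true = 0.8.
demos = [make_demo(0.8, (1.5, 1.0)), make_demo(0.8, (-1.0, 0.5))]
a_hat = train(demos)  # the encoder parameter is what gets trained
```

In this toy setting the latent dynamics stay fixed throughout and only the encoder parameter changes, which mirrors the "in other words" remark above. A realistic encoder would have many parameters, and the Jacobian of its inverse would more plausibly be obtained by automatic differentiation than written by hand as it is here.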
The method described above allows effectively learning complex contractive dynamics, making it well-suited for various robotic applications. In particular, according to various embodiments, a learning approach denoted as Injective Contractive Flow (ICF) with contraction guarantees is provided. It provides explicit control over the contraction rate and contraction ratio, allowing for precise adjustments to the stability and contractive properties of the learned dynamics, via simple canonical latent dynamics, which are then transformed via a diffeomorphic mapping (e.g., a normalizing flow) so that the resulting dynamics match the desired observed velocities. This ensures that complex non-linear contractive dynamical systems can be learned and reproduced.
Moreover, the learning approach according to various embodiments incorporates injective functions that map the canonical dynamics to high-dimensional systems. This allows the learning of high-dimensional contractive dynamical systems within a low-dimensional latent space. Additionally, this method is capable of learning dynamics on the Lie group SO(3), which can be used to model orientation dynamics. This extends the applicability of the learning method to scenarios involving rotational movements such as a robot's end-effector motion.
In the following, various examples are given.
Example 1 is a method for learning robot device dynamics as described above.
Example 2 is the method of example 1, wherein the predetermined dynamics in the latent space are contractive. Thus, stability of control when using the learned robot device dynamics is ensured.
Example 3 is the method of example 2, wherein the predetermined dynamics in the latent space are given by a matrix with predetermined eigenvalues. This allows easily setting the contraction behaviour. The learned dynamics can be achieve