CN-114137950-B - Method and equipment for performing social perception model predictive control on robot equipment

Abstract

The invention relates to a method and equipment for performing socially aware model predictive control of a robotic device. In particular, it relates to a computer-implemented method for determining a control trajectory of a robotic device (1), comprising the steps of: performing (S1-S8) information-theoretic model predictive control, applying a control trajectory sample prior (u*) in each time step to obtain a control trajectory for a given time horizon (t_f); and determining (S21) the control trajectory sample prior (u*) using a data-driven trajectory prediction model, the trajectory prediction model being trained to output a control trajectory sample as the control trajectory sample prior (u*) based on an actual state of the robotic device (1).

Inventors

A. Ludenko
PALMIERI LUCA
K. O. Arras

Assignees

Robert Bosch GmbH

Dates

Publication Date: 20260505
Application Date: 20210811
Priority Date: 20200812

Claims (8)

1. A computer-implemented method for determining a control trajectory of a robotic device (1), comprising the steps of: performing (S1-S8) information-theoretic model predictive control, applying a control trajectory sample prior (u*) in each time step to obtain a control trajectory for a given time horizon (t_f); and determining (S21) the control trajectory sample prior (u*) using a data-driven trajectory prediction model, the trajectory prediction model being trained to output a control trajectory sample as the control trajectory sample prior (u*) based on an actual state of the robotic device (1); wherein the control trajectory sample prior (u*) is obtained from a modeled control trajectory sample output by the trajectory prediction model and the control trajectory sample obtained in the last time step; wherein the control trajectory sample prior (u*) is obtained by summing the modeled control trajectory sample output by the trajectory prediction model and the control trajectory sample obtained in the last time step, in particular each weighted by its trajectory cost according to a given cost function; and wherein the modeled control trajectory sample is considered only if its trajectory cost is higher than the trajectory cost of the control trajectory sample obtained in the last time step.
2. The method according to claim 1, wherein, at each time step, the information-theoretic model predictive control iteratively evaluates a number of control trajectory samples derived from the control trajectory sample prior (u*) according to a given distribution to obtain a further control trajectory sample, wherein the further control trajectory sample is determined from a combination of a plurality of weighted control trajectory samples, the weights in particular being determined based on the cost of each of the plurality of control trajectory samples.
3. The method according to claim 1 or 2, wherein the data-driven trajectory prediction model comprises a machine learning model, in particular a neural network, in particular of the type of one of a soft actor-critic network, a trust region network, a policy optimization network, a proximal policy optimization network, or a deep deterministic policy gradient network.
4. The method according to claim 1 or 2, wherein the robotic device (1) is controlled to act according to the determined control trajectory.
5. A control unit (11), such as a data processing device, for determining a control trajectory of a robotic device (1), the control unit (11) being configured to perform the steps of: performing information-theoretic model predictive control, applying a control trajectory sample prior (u*) in each time step to obtain a control trajectory for a given time horizon (t_f); and determining the control trajectory sample prior (u*) using a data-driven trajectory prediction model, which is trained to output a control trajectory sample as the control trajectory sample prior (u*) based on the actual state of the robotic device (1); wherein the control trajectory sample prior (u*) is obtained from a modeled control trajectory sample output by the trajectory prediction model and the control trajectory sample obtained in the last time step; wherein the control trajectory sample prior (u*) is obtained by summing the modeled control trajectory sample output by the trajectory prediction model and the control trajectory sample obtained in the last time step, in particular each weighted by its trajectory cost according to a given cost function; and wherein the modeled control trajectory sample is considered only if its trajectory cost is higher than the trajectory cost of the control trajectory sample obtained in the last time step.
6. A robotic device (1), comprising: an actuation unit configured to move the robotic device (1) according to a control trajectory; and a control unit (11) according to claim 5.
7. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to perform the steps of the method of any of claims 1 to 4.
8. A machine-readable medium comprising instructions which, when executed by a computer, cause the computer to perform the steps of the method of any of claims 1 to 4.
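The prior combination recited in claims 1 and 5 (a sum of the modeled control trajectory sample and the sample from the last time step, each weighted by its trajectory cost) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the exponential (softmin) weighting, the temperature `lam`, and the function name `combine_priors` are assumptions, since the claims only require weighting according to a given cost function. The claims' additional condition on when the modeled sample is considered is not reproduced here.

```python
import numpy as np


def combine_priors(u_model, u_prev, cost_fn, lam=1.0):
    """Cost-weighted sum of two candidate control sequences (hypothetical helper).

    u_model: control sequence from the trajectory prediction model, shape (T, m)
    u_prev:  control sequence obtained in the last time step, shape (T, m)
    cost_fn: assumed trajectory cost function, lower cost is better
    lam:     assumed temperature of the exponential weighting
    """
    costs = np.array([cost_fn(u_model), cost_fn(u_prev)])
    # Softmin weights; subtracting the minimum cost keeps exp() from underflowing.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # The lower-cost candidate dominates the resulting prior u*.
    return w[0] * u_model + w[1] * u_prev
```

With equal costs the result is the plain average of the two candidates; as the cost gap grows relative to `lam`, the prior approaches the cheaper candidate.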

Description

Technical Field

The present invention relates to model predictive control for planning trajectories of robotic devices.

Background

For the autonomous control of a robotic device, motion planning is used to accomplish navigation tasks efficiently. Based on sensor inputs, the actual state of the robotic device and of its environment is determined, and trajectories or motion paths are developed that also take into account dynamic obstacles, such as moving objects or individuals in the environment.

In general, model predictive control (MPC) is an efficient technique for solving open-loop optimal control problems over a rolling (receding) horizon. Classical MPC techniques work well when the goal is to stabilize a constrained system around an equilibrium point or trajectory. G. Williams et al., "Information theoretic MPC for model-based reinforcement learning", IEEE International Conference on Robotics and Automation (ICRA), 2017, discloses an information-theoretic approach that overcomes some of the inherent limitations of standard MPC techniques. In contrast to standard MPC techniques, information-theoretic model predictive control (IT-MPC) can handle arbitrary system dynamics and arbitrary nonlinear cost definitions.

Disclosure of Invention

According to the invention, a method for planning a trajectory, in particular of a robotic device, according to claim 1 is provided, as well as a control unit and a robotic device according to the further independent claims. Further embodiments are indicated in the dependent claims.
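As a reference point for the IT-MPC of Williams et al. cited in the Background, its sampling-based update (the MPPI algorithm) can be summarized as below. The symbols are our notation, not the patent's: u_t is the current control sequence over a horizon of T steps, K perturbed sequences are sampled with Gaussian noise of covariance Σ, S_k is the cost of the k-th sampled rollout, and λ is a temperature.

```latex
\begin{aligned}
v_t^k &= u_t + \epsilon_t^k, \qquad \epsilon_t^k \sim \mathcal{N}(0, \Sigma), \qquad t = 0,\dots,T-1,\; k = 1,\dots,K \\
w_k &= \frac{\exp\!\left(-\tfrac{1}{\lambda}\,(S_k - \rho)\right)}{\sum_{j=1}^{K} \exp\!\left(-\tfrac{1}{\lambda}\,(S_j - \rho)\right)}, \qquad \rho = \min_{j} S_j \\
u_t &\leftarrow u_t + \sum_{k=1}^{K} w_k\, \epsilon_t^k
\end{aligned}
```

Because the samples v_t^k are centered on u_t, the quality of the update depends on how good the starting sequence u_t already is, which is the point the description takes up next.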
According to a first aspect, a computer-implemented method for determining a control trajectory of a robotic device is provided, comprising the steps of: performing information-theoretic model predictive control, applying a control trajectory sample prior in each time step to obtain a control trajectory for a given time horizon; and determining the control trajectory sample prior using a data-driven trajectory prediction model that is trained to output a control trajectory sample as the control trajectory sample prior based on an actual state of the robotic device.

In general, IT-MPC generates an open-loop sequence of sampled controls by minimizing the Kullback-Leibler (KL) divergence between the current control distribution and the optimal control distribution derived from the desired cost function. The control samples of IT-MPC are drawn from a normal distribution centered on the previous control sequence. The method therefore works well as long as the optimal control sequence changes only slightly from one time step to the next. If the optimal control sequence changes significantly, however, for example due to a new target position or the unexpected appearance of a dynamic obstacle, local sampling around the previous control sequence may lead to poor convergence toward the optimal control sequence. Standard IT-MPC is therefore often unsuitable for navigating mobile robots in crowded environments.

Furthermore, the information-theoretic model predictive control may, at each time step, iteratively evaluate a plurality of control trajectory samples derived from the control trajectory sample prior according to a given distribution to obtain a further control trajectory sample, wherein the further control trajectory sample is determined from a combination of a plurality of weighted control trajectory samples, the weights being determined based on the cost of each of the plurality of control trajectory samples.
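The sampling step described above (normally distributed samples around a prior, cost-based weights, weighted combination) can be sketched in a few lines. This is a minimal MPPI-style sketch under stated assumptions, not the patent's implementation: the single-integrator dynamics, the quadratic goal cost, and all function and parameter names (`it_mpc_update`, `sigma`, `lam`, `shift`) are hypothetical choices for illustration.

```python
import numpy as np


def rollout_cost(u_seq, x0, dynamics, stage_cost):
    """Total cost of applying the control sequence u_seq from state x0."""
    x, total = x0, 0.0
    for u in u_seq:
        total += stage_cost(x, u)
        x = dynamics(x, u)
    return total


def it_mpc_update(u_prior, x0, dynamics, stage_cost, num_samples=256,
                  sigma=0.5, lam=1.0, rng=None):
    """One sampling-based IT-MPC update of a control sequence of shape (T, m)."""
    rng = np.random.default_rng() if rng is None else rng
    # Zero-mean Gaussian perturbations: the sampled sequences u_prior + eps
    # are centred on the prior, as in standard IT-MPC.
    eps = rng.normal(0.0, sigma, size=(num_samples,) + u_prior.shape)
    costs = np.array([rollout_cost(u_prior + e, x0, dynamics, stage_cost)
                      for e in eps])
    # Exponential weights; subtracting the minimum cost avoids underflow.
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    # Further sample: prior plus cost-weighted average of the perturbations.
    return u_prior + np.tensordot(w, eps, axes=1)


def shift(u_seq):
    """Warm start for the next time step: drop the first control, repeat the last."""
    return np.vstack([u_seq[1:], u_seq[-1:]])


# Illustrative usage: a 2-D single integrator driven toward a goal.
DT, GOAL = 0.1, np.array([1.0, 1.0])


def dynamics(x, u):
    return x + DT * u


def stage_cost(x, u):
    return float(np.sum((x - GOAL) ** 2))


rng = np.random.default_rng(0)
u = np.zeros((20, 2))
# Several iterations at a fixed start state (time is not advanced here).
for _ in range(10):
    u = it_mpc_update(u, np.zeros(2), dynamics, stage_cost, rng=rng)
```

In a receding-horizon loop, `shift(u)` would serve as the prior for the next time step; this is exactly the "previous control sequence" whose locality causes the convergence problem described above.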
It may be provided that the data-driven trajectory prediction model comprises a neural network, in particular of the type of one of a soft actor-critic network, a trust region network, a policy optimization network, a proximal policy optimization network, or a deep deterministic policy gradient network. One way to overcome the above limitation of poor convergence toward the optimal control sequence is an informed sampling process, in which a deep reinforcement learning algorithm provides a predicted control trajectory as the control trajectory prior. This essentially involves a data-driven trajectory prediction model, i.e. a machine learning model, that generates a control trajectory prior based on the current state of the environment, including the robot pose and the detected poses of obstacles, where a pose comprises the position and orientation of the robot, or of a part of the robot, within the environment. The data-driven trajectory prediction model is trained to estimate the control trajectory prior using a training dataset that contains optimized control trajectories for different states of the environment. The optimized control trajectory may be provided by any kind of trajectory optimization process of the robotic device in a crowde
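The training setup described above (states of the environment paired with optimized control trajectories, and a model mapping state to a control trajectory prior) can be illustrated with a deliberately simple stand-in. Ridge regression is swapped in here for the neural network or deep reinforcement learning policy named in the text, purely to show the input/output shapes. The class name `LinearTrajectoryPrior`, the flat state feature vector (robot pose and obstacle poses concatenated), and the `ridge` parameter are all hypothetical.

```python
import numpy as np


class LinearTrajectoryPrior:
    """Hypothetical stand-in for the data-driven trajectory prediction model.

    Maps a state feature vector (e.g. robot pose and detected obstacle poses,
    concatenated) to a control sequence of shape (T, m) via ridge regression.
    """

    def __init__(self, horizon, control_dim, ridge=1e-3):
        self.T, self.m, self.ridge = horizon, control_dim, ridge
        self.W = None

    def fit(self, states, controls):
        """states: (N, d); controls: (N, T, m) optimized control trajectories."""
        X = np.hstack([states, np.ones((states.shape[0], 1))])  # bias column
        Y = controls.reshape(controls.shape[0], -1)               # flatten (T, m)
        A = X.T @ X + self.ridge * np.eye(X.shape[1])
        self.W = np.linalg.solve(A, X.T @ Y)
        return self

    def predict(self, state):
        """Return a control trajectory prior of shape (T, m) for one state."""
        x = np.append(state, 1.0)
        return (x @ self.W).reshape(self.T, self.m)
```

The output of `predict` would play the role of the modeled control trajectory sample that is combined with the previous time step's sequence to form the prior u*.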