CN-121973218-A - Robot adaptive trajectory generation method and system
Abstract
The invention discloses a robot adaptive trajectory generation method and system, belonging to the technical field of intelligent robot control. The method first extracts the joint input-output distribution from a plurality of groups of expert demonstration trajectories, constructs a probabilistic reference trajectory containing means and covariances through a Gaussian mixture model (GMM) and Gaussian mixture regression (GMR), then derives a non-parametric kernelized movement primitive (KMP) prediction model based on an information-theoretic principle, and directly outputs a smooth trajectory and its uncertainty for any high-dimensional input. On this basis, the invention defines the key adjustable parameters of the KMP model as learnable variables and embeds them in a reinforcement learning policy framework. By setting a reward function and performing online optimization with a reinforcement learning algorithm, the generated trajectory continuously improves the performance indices of a specific task while preserving the expert demonstration style, realizing adaptive trajectory generation for robots in complex dynamic scenes and thereby balancing sample efficiency, task performance, and operational safety.
Inventors
- Dou Yahui
- Cui Yu
- Wang Xinhua
- Liu Yiyang
- Liu Pu
Assignees
- 中国机械总院集团郑州机械研究所有限公司
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-02-05
Claims (6)
- 1. A robot adaptive trajectory generation method, comprising: S1, acquiring a plurality of groups of expert demonstration trajectories, wherein the expert demonstration trajectories consist of input variable sequences and corresponding robot output variable sequences; S2, performing Gaussian mixture modeling on the joint distribution of the expert demonstration trajectories, and extracting a probabilistic reference trajectory through Gaussian mixture regression, wherein the probabilistic reference trajectory comprises an output mean vector and a covariance matrix corresponding to each input point; S3, constructing a kernel function, and deriving a non-parametric kernelized movement primitive (KMP) prediction model by minimizing KL divergence based on the probabilistic reference trajectory and the kernel function, wherein the KMP prediction model is used for predicting the mean and covariance of the output for any query input; S4, identifying and extracting from the KMP model a group of adjustable parameters sensitive to trajectory performance as learnable variables, wherein the learnable variables comprise at least one of a kernel function bandwidth, a covariance scaling factor, and a local coordinate system offset; S5, packaging the KMP model into a policy function differentiable with respect to the learnable variables, and defining a reward function according to a target task; S6, interacting with the environment through a reinforcement learning algorithm, and optimizing the learnable variables using the obtained reward signal; and S7, applying the optimized learnable variables to the KMP model to generate, in real time, a robot motion trajectory meeting the task performance indices.
- 2. The method of claim 1, wherein in S1, the input variable sequence comprises at least one of time, robot joint position, robot end-effector speed, acceleration, pose, external sensor signals, or user input.
- 3. The robot adaptive trajectory generation method of claim 1, wherein in S3, the kernel function comprises a linear kernel, a polynomial kernel, or a radial basis function kernel.
- 4. The method of claim 1, wherein in S6, the reinforcement learning algorithm comprises proximal policy optimization (PPO), soft actor-critic (SAC), or deep deterministic policy gradient (DDPG).
- 5. The method of claim 1, further comprising, when new task constraints are received, encoding the new task constraints as Gaussian distributions and adding them as new data points to the reference trajectory database, wherein confidence weights of the covariance matrices of the Gaussian distributions are defined as additional learnable variables and optimized together in S7.
- 6. A robot adaptive trajectory generation system, comprising: an acquisition module for acquiring a plurality of groups of expert demonstration trajectories, wherein the expert demonstration trajectories consist of input variable sequences and corresponding robot output variable sequences; a Gaussian mixture modeling module for performing Gaussian mixture modeling on the joint distribution of the expert demonstration trajectories and extracting a probabilistic reference trajectory through Gaussian mixture regression, wherein the probabilistic reference trajectory comprises an output mean vector and a covariance matrix corresponding to each input point; a kernel function construction module for constructing a kernel function, deriving a non-parametric kernelized movement primitive (KMP) prediction model by minimizing KL divergence based on the probabilistic reference trajectory and the kernel function, and predicting the mean and covariance of the output for any query input; an identification and extraction module for identifying and extracting from the KMP model a group of adjustable parameters sensitive to trajectory performance as learnable variables, wherein the learnable variables comprise at least one of a kernel bandwidth, a covariance scaling factor, and a local coordinate system offset; a definition module for packaging the KMP model into a policy function differentiable with respect to the learnable variables and defining a reward function according to a target task; an optimization module for interacting with the environment through a reinforcement learning algorithm and optimizing the learnable variables using the obtained reward signal; and a trajectory generation module for applying the optimized learnable variables to the KMP model and generating, in real time, a robot motion trajectory meeting the task performance indices.
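As an illustration of the KMP prediction step (S3) and of the kernel function construction module, the following is a minimal numpy sketch, not the patent's implementation: given a probabilistic reference trajectory (inputs, output means, output variances) such as GMR would produce, it predicts the output mean and variance at a query input using the standard KMP formulation. The function names, the toy sine data, and the scalar-output simplification are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, h):
    """RBF kernel with bandwidth h (one of the learnable variables of S4)."""
    return np.exp(-(a - b) ** 2 / (2.0 * h ** 2))

def kmp_predict(s_query, s_ref, mu_ref, var_ref, h=0.1, lam=1.0):
    """Kernelized movement primitive prediction for a 1-D output.

    s_ref, mu_ref, var_ref: probabilistic reference trajectory from GMR
    (input points, output means, output variances). Returns the predicted
    output mean and variance at s_query."""
    K = rbf(s_ref[:, None], s_ref[None, :], h)           # N x N kernel matrix
    k_star = rbf(s_query, s_ref, h)                      # cross-kernel vector
    A = np.linalg.solve(K + lam * np.diag(var_ref), np.eye(len(s_ref)))
    mean = k_star @ A @ mu_ref
    var = (len(s_ref) / lam) * (rbf(s_query, s_query, h) - k_star @ A @ k_star)
    return mean, max(var, 0.0)

# toy reference trajectory: a sine demonstration summarized by 20 GMR points
s = np.linspace(0.0, 1.0, 20)
mu = np.sin(2 * np.pi * s)
var = np.full_like(s, 0.05)
m, v = kmp_predict(0.5, s, mu, var)
```

By symmetry of the toy data about the query point, the predicted mean here is close to zero while the returned variance quantifies the model's uncertainty, mirroring the "smooth trajectory plus uncertainty" output described in the abstract.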
Description
Robot adaptive trajectory generation method and system
Technical Field
The invention relates to the technical field of intelligent robot control, and in particular to a robot adaptive trajectory generation method and system.
Background
In the field of intelligent robots, enabling robots to quickly learn human skills and to perform autonomous optimization according to actual task demands is a core challenge for realizing truly intelligent operation. Current technical routes fall into two major categories: imitation learning and reinforcement learning. Imitation learning methods, such as dynamic movement primitives (DMP), probabilistic movement primitives (ProMP), and kernelized movement primitives (KMP), can efficiently reproduce complex motor skills from a small number of expert demonstrations. Among them, KMP is one of the most advanced imitation learning tools at present, owing to its non-parametric nature, natural support for high-dimensional input, and flexible online constraint injection mechanism. However, the fundamental limitation of imitation learning is its "passive reproduction" property: it can only learn "how to do it" and cannot understand "why to do it" or "how to do it better". When the demonstrations themselves are sub-optimal, or the task environment or goal changes, pure imitation often cannot guarantee optimal performance. Reinforcement learning (RL) instead provides a paradigm of "active optimization": through trial-and-error interaction with the environment and reward feedback, RL can learn a policy that maximizes long-term return. Deep reinforcement learning has succeeded in many complex control tasks.
However, its application in the robotics field faces serious challenges: the high-dimensional continuous action space makes sample efficiency extremely low, random exploration can cause safety risks, the training process is unstable, and a reliable policy is difficult to converge to. To combine the advantages of both, researchers have proposed various fusion schemes, such as initializing the RL policy with behavioral cloning (BC) or adding an imitation regularization term to the RL loss function. However, these approaches typically treat the imitation module as a fixed, black-box prior, failing to exploit the rich, interpretable, structured parameters inside advanced imitation models such as KMP. The prior art lacks a mechanism that takes the style or adaptability of the KMP model as a direct optimization object of RL, so as to precisely improve task performance while preserving the demonstration manifold. In summary, existing imitation learning and reinforcement learning methods still have significant shortcomings:
- Fractured prior and optimization: imitation learning provides the prior and reinforcement learning is responsible for the optimization, but the two lack deep coupling;
- Sample efficiency bottleneck: pure reinforcement learning requires a large number of trial-and-error interactions on a real robot to converge, and each interaction is time-consuming and risky, making the approach difficult to apply to real tasks;
- Task performance blind spots: pure imitation learning cannot perform directional optimization for specific engineering indices (such as energy consumption, precision, and compliance);
- Safety and stability: the random exploration of RL is risky in physical systems and lacks the safety constraints provided by demonstrations.
Therefore, there is a need for a robot adaptive trajectory generation method and system that achieves high efficiency, high robustness, and task-oriented performance by finely tuning the intrinsic parameters of a structured KMP model through reinforcement learning.
Disclosure of Invention
The invention provides a robot adaptive trajectory generation method and system, which takes the structured KMP model as the framework, finely tunes the internal parameters of the model through reinforcement learning, and realizes adaptive trajectory generation for robots in complex dynamic scenes, thereby balancing sample efficiency, task performance, and operational safety. In order to achieve the above purpose, the invention adopts the following technical scheme. The invention provides a robot adaptive trajectory generation method, comprising the following steps: S1, acquiring a plurality of groups of expert demonstration trajectories, wherein the expert demonstration trajectories consist of input variable sequences and corresponding robot output variable sequences; S2, performing Gaussian mixture modeling on the joint distribution of the expert demonstration trajectories, and extracting a probabilistic reference trajectory through Gaussian mixture regression
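The online optimization of the learnable variables (S5-S6) can be sketched as follows. The patent names PPO, SAC, or DDPG as the reinforcement learning algorithm; for a self-contained illustration this sketch substitutes a crude stochastic hill-climb over a single learnable variable (the kernel bandwidth), with a hypothetical tracking reward standing in for the task reward function of S5. The reward design, the toy data, and all names are assumptions, not the patent's method.

```python
import numpy as np

rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 50)
demo = np.sin(2 * np.pi * s) + rng.normal(0.0, 0.1, s.size)  # noisy demonstration
target = np.sin(2 * np.pi * s)                               # task-optimal path

def rollout_reward(h):
    """Hypothetical task reward (S5 analogue): negative tracking error of a
    bandwidth-h kernel smoother of the demonstration against the target."""
    K = np.exp(-(s[:, None] - s[None, :]) ** 2 / (2.0 * h ** 2))
    smoothed = K @ demo / K.sum(axis=1)
    return -float(np.mean((smoothed - target) ** 2))

# crude stochastic hill-climb over the learnable bandwidth (S6 analogue);
# the method as claimed would use PPO, SAC, or DDPG here instead
h, best = 0.5, rollout_reward(0.5)
for _ in range(100):
    cand = abs(h + rng.normal(0.0, 0.05)) + 1e-3
    r = rollout_reward(cand)
    if r > best:
        h, best = cand, r
```

The loop mirrors the claimed structure: a rollout produces a reward signal, and the learnable variable is updated only when the reward improves, so the optimized bandwidth can then be applied back to trajectory generation as in S7.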