
CN-121589819-B - Control method and device for a robot end effector


Abstract

The application discloses a control method and a control device for a robot end effector. The method comprises: receiving a task to be performed and a point cloud image corresponding to the end effector at a target moment; performing prediction on the task and the point cloud image with a prediction model in a pre-trained action model to obtain an output result, wherein the output result represents a three-dimensional bottleneck representation of the end effector's motion trajectory; parsing the output result with a policy model in the action model to generate an action sequence for the end effector; and controlling the end effector to execute the operations corresponding to the action sequence. The method at least solves the technical problem in the related art that generating an action sequence for the end effector is complex because it relies mainly on large amounts of modeling data.

Inventors

  • Li Xuelong
  • Bai Chenjia
  • Yang Siyuan
  • Zhang Yang
  • Zhang Chi

Assignees

  • China Telecom Artificial Intelligence Technology (Beijing) Co., Ltd. (中电信人工智能科技(北京)有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-01-26

Claims (9)

  1. A method for controlling an end effector of a robot, comprising: receiving a task to be performed and a point cloud image corresponding to the end effector at a target moment; performing prediction on the task to be performed and the point cloud image with a prediction model in a pre-trained action model to obtain an output result, wherein the output result represents a three-dimensional bottleneck representation of a motion trajectory of the end effector, and the prediction model is determined by: acquiring a first training data set, wherein the first training data set comprises at least point cloud images corresponding to the end effector and sensor data of the end effector, and the data in the first training data set contain no action labels; and training the prediction model with the first training data set to obtain a trained prediction model, the training comprising: encoding the point cloud images corresponding to the end effector and the sensor data of the end effector to obtain trajectory vectors; concatenating a plurality of trajectory vectors in temporal order to obtain a latent-space feature trajectory set; sampling a plurality of fixed-length latent-space feature segments from the latent-space feature trajectory set, wherein each latent-space feature segment comprises a history part and a future part; and predicting the masked latent-space feature segments with a bidirectional encoder to obtain the three-dimensional bottleneck representation (a code sketch of this pretraining follows the claims); parsing the output result with a policy model in the action model to generate an action sequence for the end effector; and controlling the end effector to execute the operation corresponding to the action sequence.
  2. The method of claim 1, wherein the action model is further determined by: acquiring a second training data set, wherein the second training data set comprises at least a historical action sequence of the end effector and a motion trajectory of the end effector; determining conditional input data from an output result of the prediction model; and training the policy model with the second training data set and the conditional input data to obtain a trained policy model.
  3. The method of claim 1, wherein training the prediction model using the masked latent-space feature segments to obtain the trained prediction model further comprises: decoding the three-dimensional bottleneck representation together with a target latent-space feature segment using a bidirectional decoder to obtain a predicted trajectory, wherein the target latent-space feature segment is a latent-space feature segment whose future part is completely occluded.
  4. The method of claim 2, wherein determining the conditional input data from the output result of the prediction model comprises: acquiring a fixed-length motion sub-trajectory segment of the end effector from the second training data set; extracting features from the history part of the motion sub-trajectory segment with a point cloud encoder to obtain a history vector; and determining the conditional input data from the history vector and the output result of the prediction model.
  5. The method of claim 4, wherein training the policy model with the second training data set and the conditional input data to obtain the trained policy model comprises: adding noise to the action sequence corresponding to the motion sub-trajectory segment of the end effector to obtain a target action sequence; and training the policy model with the target action sequence and the conditional input data to obtain the trained policy model.
  6. The method of claim 1, wherein parsing the output result with the policy model in the action model to generate the action sequence of the end effector comprises: acquiring a historical three-dimensional representation vector from before the target moment; determining input condition data for the target moment from the output result and the historical three-dimensional representation vector; and parsing the input condition data for the target moment with the policy model to obtain the action sequence of the end effector.
  7. A control device for an end effector of a robot, comprising: a receiving module configured to receive a task to be performed and a point cloud image corresponding to the end effector at a target moment; a prediction module configured to perform prediction on the task to be performed and the point cloud image with a prediction model in a pre-trained action model to obtain an output result, wherein the output result represents a three-dimensional bottleneck representation of a motion trajectory of the end effector, and the prediction model is determined by: acquiring a first training data set, wherein the first training data set comprises at least point cloud images corresponding to the end effector and sensor data of the end effector, and the data in the first training data set contain no action labels; and training the prediction model with the first training data set to obtain a trained prediction model, the training comprising: encoding the point cloud images corresponding to the end effector and the sensor data of the end effector to obtain trajectory vectors; concatenating a plurality of trajectory vectors in temporal order to obtain a latent-space feature trajectory set; sampling a plurality of fixed-length latent-space feature segments from the latent-space feature trajectory set, wherein each latent-space feature segment comprises a history part and a future part; and predicting the masked latent-space feature segments with a bidirectional encoder to obtain the three-dimensional bottleneck representation; a generation module configured to parse the output result with a policy model in the action model to generate an action sequence for the end effector; and an operation module configured to control the end effector to execute the operation corresponding to the action sequence.
  8. A computer device comprising a memory for storing program instructions and a processor, coupled to the memory, configured to perform the method of controlling a robot end effector of any one of claims 1 to 6.
  9. A computer program product comprising computer instructions which, when executed by a processor, implement the method of controlling a robot end effector of any one of claims 1 to 6.
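
To make the self-supervised pretraining referenced from claim 1 (and the decoding step of claim 3) concrete, the sketch below implements a masked latent-trajectory model in PyTorch. The Transformer backbone, the mean-pooled bottleneck, the learned mask token, and all layer sizes are illustrative assumptions; the claims specify only a bidirectional encoder, a bidirectional decoder, and masking of the future part of each segment.

    import torch
    import torch.nn as nn

    class MaskedTrajectoryModel(nn.Module):
        """Bidirectional encoder/decoder over latent trajectory segments.

        A minimal sketch of the pretraining in claims 1 and 3; architecture
        details are assumptions, not taken from the patent.
        """

        def __init__(self, latent_dim=256, bottleneck_dim=64, n_heads=4, n_layers=2):
            super().__init__()
            enc_layer = nn.TransformerEncoderLayer(latent_dim, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(enc_layer, n_layers)   # bidirectional: no causal mask
            self.to_bottleneck = nn.Linear(latent_dim, bottleneck_dim)  # the "3D bottleneck representation"
            dec_layer = nn.TransformerEncoderLayer(latent_dim, n_heads, batch_first=True)
            self.decoder = nn.TransformerEncoder(dec_layer, n_layers)
            self.from_bottleneck = nn.Linear(bottleneck_dim, latent_dim)
            self.mask_token = nn.Parameter(torch.zeros(latent_dim))

        def forward(self, segment, history_len):
            # segment: (B, T, latent_dim) latent features of a fixed-length
            # trajectory segment; the future part (t >= history_len) is replaced
            # by a learned mask token before encoding.
            masked = segment.clone()
            masked[:, history_len:] = self.mask_token
            encoded = self.encoder(masked)
            bottleneck = self.to_bottleneck(encoded.mean(dim=1))  # pooled bottleneck
            # Claim 3: decode the bottleneck together with a target segment whose
            # future part is completely occluded, yielding a predicted trajectory.
            target = segment.clone()
            target[:, history_len:] = self.mask_token
            cond = self.from_bottleneck(bottleneck).unsqueeze(1)
            predicted = self.decoder(target + cond)
            return bottleneck, predicted

    def pretrain_step(model, segment, history_len, optimizer):
        """One self-supervised step: no action labels are needed (claim 1)."""
        bottleneck, predicted = model(segment, history_len)
        loss = nn.functional.mse_loss(predicted[:, history_len:], segment[:, history_len:])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Because the loss reconstructs only the occluded future latents from the bottleneck, the model can be trained on unlabeled trajectory data, which is what the "no action labels" condition of claim 1 allows.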

Description

Control method and device for a robot end effector

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a control method and a control device for a robot end effector.

Background

The robot end effector (dexterous hand) poses a significant challenge for manipulation tasks because of its high-dimensional action space. Conventional control methods typically rely on optimization and focus on a narrow task window, such as grasping. Recently, end-to-end approaches have made significant progress in dexterous manipulation. Reinforcement learning has become a popular method: data are generated through elaborately designed reward functions and continuous interaction with the environment, the training data set and the policy network are continuously updated, and a dexterous hand can eventually accomplish tasks such as playing the piano, in-hand object reorientation, and dynamic throwing. Although reinforcement-learning-based approaches achieve some results, challenges remain. For deployment in the real world, a corresponding Sim-to-Real (simulation-to-reality) scheme must be designed to handle policy adaptation on real robots. Furthermore, the reliance of reinforcement learning algorithms on reward design limits their applicability in certain scenarios. Imitation learning is another widely used method; in particular, the introduction of diffusion models enables imitation learning algorithms to model richer robot trajectory information. A key problem remains, however: imitation learning relies on large amounts of high-quality teleoperation data, and modeling high-dimensional, multimodal action sequences is very difficult. Existing imitation learning paradigms mainly predict the robot's future trajectory from 2D image data, yet robot trajectories are rich in three-dimensional spatial information and are usually characterized with other modalities such as point clouds.

Disclosure of the Invention

The embodiments of the application provide a control method and a control device for a robot end effector, which at least solve the technical problem in the related art that generating an action sequence for the end effector is complex because it relies mainly on large amounts of modeling data. According to one aspect of the embodiments of the application, a control method for a robot end effector is provided, comprising: receiving a task to be performed and a point cloud image corresponding to the end effector at a target moment; performing prediction on the task to be performed and the point cloud image with a prediction model in a pre-trained action model to obtain an output result, wherein the output result represents a three-dimensional bottleneck representation of a motion trajectory of the end effector; parsing the output result with a policy model in the action model to generate an action sequence for the end effector; and controlling the end effector to execute the operation corresponding to the action sequence.
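
For orientation, the inference flow just summarized can be sketched in Python as below. The patent names the components but not their interfaces, so prediction_model, policy_model, robot, and the optional history argument (the historical three-dimensional representation vectors of claim 6) are all hypothetical stand-ins.

    import torch

    def control_end_effector(task, point_cloud, prediction_model, policy_model,
                             robot, history=None):
        with torch.no_grad():
            # Step 1: the prediction model maps the task and the current point
            # cloud image to a 3D bottleneck representation of the trajectory.
            bottleneck = prediction_model(task, point_cloud)
            # Step 2: the policy model parses the bottleneck (optionally combined
            # with historical 3D representation vectors) into an action sequence.
            condition = (bottleneck if history is None
                         else torch.cat([history, bottleneck], dim=-1))
            action_sequence = policy_model(condition)
        # Step 3: drive the end effector through the generated action sequence.
        for action in action_sequence:
            robot.execute(action)
        return action_sequence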
Optionally, the action model is determined by: acquiring a first training data set, wherein the first training data set comprises at least point cloud images corresponding to the end effector and sensor data of the end effector, and the data in the first training data set contain no action labels; training the prediction model with the first training data set to obtain a trained prediction model, wherein the prediction model comprises a bidirectional encoder and a bidirectional decoder; acquiring a second training data set, wherein the second training data set comprises at least a historical action sequence of the end effector and a motion trajectory of the end effector; determining conditional input data from an output result of the prediction model; and training the policy model with the second training data set and the conditional input data to obtain a trained policy model. Optionally, training the prediction model with the first training data set to obtain a trained prediction model comprises: encoding the point cloud images corresponding to the end effector and the sensor data of the end effector to obtain trajectory vectors; concatenating a plurality of trajectory vectors in temporal order to obtain a latent-space feature trajectory set; sampling a plurality of fixed-length latent-space feature segments from the latent-space feature trajectory set, wherein each latent-space feature segment comprises a history part and a future part; and masking the future part of the latent-space feature segments according to a pr
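
On the training side, the conditional training of the policy model described in claims 4 and 5 can be sketched as a single denoising step in Python. The DDPM-style cosine noise schedule is an assumption (the text only states that noise is added to the action sequence), and the policy and point_cloud_encoder interfaces are hypothetical.

    import torch
    import torch.nn as nn

    def policy_training_step(policy, point_cloud_encoder, bottleneck,
                             sub_trajectory, actions, optimizer, n_steps=100):
        """One denoising-style training step for the policy model.

        actions: (B, T, action_dim) action sequence for the sampled
        motion sub-trajectory segment; interfaces are assumptions.
        """
        # Conditional input data (claim 4): a history vector extracted from the
        # history part of the sub-trajectory, combined with the prediction
        # model's bottleneck output.
        history_vec = point_cloud_encoder(sub_trajectory)
        condition = torch.cat([history_vec, bottleneck], dim=-1)

        # Claim 5: add noise to the action sequence to obtain the target
        # action sequence (cosine schedule assumed, not specified in the text).
        t = torch.randint(0, n_steps, (actions.shape[0],))
        alpha_bar = torch.cos(0.5 * torch.pi * t.float() / n_steps).view(-1, 1, 1) ** 2
        noise = torch.randn_like(actions)
        noisy_actions = alpha_bar.sqrt() * actions + (1 - alpha_bar).sqrt() * noise

        # Train the policy model to recover the injected noise.
        predicted_noise = policy(noisy_actions, t, condition)
        loss = nn.functional.mse_loss(predicted_noise, noise)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()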