CN-121973230-A - Double-arm robot control method based on single-arm trajectory prior guidance


Abstract

The invention discloses a double-arm robot control method based on single-arm trajectory prior guidance. The method comprises: acquiring a double-arm robot data set; obtaining skill primitives of different action types based on a single-arm action prediction model, selecting skill primitives of different action types for the left arm and the right arm, and predicting the next action intention of each arm; constructing a double-arm diffusion generation model that establishes a mapping from an initial action distribution to the real double-arm cooperative action distribution; training the single-arm action prediction model and the double-arm diffusion generation model on a double-arm robot training data set; obtaining the action prediction priors of the left arm and the right arm; constructing a Gaussian distribution with the single-arm prior as its mean and a preset variance as its radius; sampling an initial action vector from the Gaussian distribution; and inputting the initial action vector into the trained double-arm diffusion generation model, which denoises it to generate the double-arm action. The invention balances the independence of single-arm motion with the coordination of double-arm control and improves the double-arm action control effect.
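
The inference steps summarized above (prior-centered Gaussian sampling followed by denoising) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the per-arm action size `ARM_DIM`, the function names, the Euler integration, and the toy velocity field that stands in for the trained double-arm generation model are all assumptions introduced here.

```python
import numpy as np

ARM_DIM = 7  # hypothetical per-arm action size (e.g. 6-DoF pose + gripper)


def sample_initial_action(prior_left, prior_right, variance, rng):
    """Sample x0 ~ N([prior_left, prior_right], variance * I)."""
    mean = np.concatenate([prior_left, prior_right])
    return mean + np.sqrt(variance) * rng.standard_normal(mean.shape)


def denoise(x0, velocity_fn, num_steps=10):
    """Euler-integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


def make_toy_velocity(target):
    """Assumed stand-in for the trained model: straight-line flow to `target`."""
    def velocity(x, t):
        return (target - x) / (1.0 - t)
    return velocity


rng = np.random.default_rng(0)
prior_left = np.zeros(ARM_DIM)   # placeholder single-arm prior, left arm
prior_right = np.ones(ARM_DIM)   # placeholder single-arm prior, right arm
x0 = sample_initial_action(prior_left, prior_right, variance=0.01, rng=rng)
target = np.linspace(0.0, 1.0, 2 * ARM_DIM)  # pretend cooperative action
action = denoise(x0, make_toy_velocity(target))
```

Because the initial vector starts near the single-arm priors rather than at pure noise, the generation model only has to correct the priors toward a coordinated double-arm action, which is the role the abstract assigns to the prior.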

Inventors

Xie Yugeng
XU RUOTAO
HUANG YAN
WU SI
XU YONG

Assignees

超级机器人研究院(黄埔)
华南理工大学

Dates

Publication Date: 20260505
Application Date: 20260324

Claims (10)

1. A double-arm robot control method based on single-arm trajectory prior guidance, characterized by comprising the following steps: acquiring a double-arm robot data set; obtaining skill primitives of different action types based on a single-arm action prediction model, selecting skill primitives of different action types for the left arm and the right arm according to double-arm instructions, and respectively predicting the next action intentions of the left arm and the right arm; constructing a double-arm diffusion generation model for establishing a mapping from an initial action distribution to the real double-arm cooperative action distribution; training the single-arm action prediction model based on a double-arm robot training data set; training the double-arm diffusion generation model based on the double-arm robot training data set; acquiring action prediction priors of the left arm and the right arm based on the trained single-arm action prediction model; and constructing a Gaussian distribution with the single-arm action prediction prior as its mean and a preset variance as its radius, sampling from the Gaussian distribution to obtain an initial action vector, inputting the initial action vector into the trained double-arm diffusion generation model, and denoising the initial action vector to generate the double-arm action.
2. The double-arm robot control method based on single-arm trajectory prior guidance according to claim 1, wherein the double-arm robot data set comprises serialized multimodal information and natural language instructions, the multimodal information comprising environmental observations, which include a plurality of visual images, and joint actions of the two arms, which include the end positions of the left arm and the right arm.
3. The double-arm robot control method based on single-arm trajectory prior guidance according to claim 1, wherein selecting skill primitives of different action types for the left arm and the right arm according to double-arm instructions, and respectively predicting the next action intentions of the left arm and the right arm, specifically comprises: constructing a visual alignment module, whose input is the three-dimensional voxel features obtained by encoding the current observation image and projecting it into three-dimensional space, and whose output is a left-arm observation feature and a right-arm observation feature; constructing an action selection module, which receives the spatial voxel feature vector from the visual alignment module and the task instruction feature from the language encoder, and outputs a left-arm language feature and a right-arm language feature; and generating the action priors of the left arm and the right arm respectively based on a single-arm policy network.
4. The double-arm robot control method based on single-arm trajectory prior guidance according to claim 3, wherein, in the visual alignment module, a mask generator generates a left-arm mask and a right-arm mask; each mask is multiplied element-wise with the original input voxel feature vector, and the masked features are each added to the original input voxel feature vector to obtain the left-arm observation feature and the right-arm observation feature (formulas omitted in this text). The quantities defined for these formulas are: the input current observation image; the mask generator and its trainable parameters; the image encoder ViT; the element-wise multiplication operation; the concatenation operation; the left-arm mask, the masked left-arm observation feature, and the concatenated left-arm observation feature; and the right-arm mask, the masked right-arm observation feature, and the concatenated right-arm observation feature. The loss function of the visual alignment module (formula omitted in this text) is defined using the KL divergence.
5. The double-arm robot control method based on single-arm trajectory prior guidance according to claim 3, wherein, in the action selection module, a voxel feature sequence is obtained through linear projection and pooling, and the voxel feature sequence and the task instruction feature sequence are concatenated along the sequence dimension to obtain a fused feature sequence; the fused feature sequence is input into a Transformer model, and the output of the Transformer model is mapped by a linear layer to a primitive category space whose dimension equals the total number of action categories; a skill primitive library is constructed to store the action semantic embedding vectors corresponding to the action categories, and the vectors in the pre-constructed skill primitive library are combined by a weighted linear combination to obtain the language features of the left arm and the right arm (formulas omitted in this text). The quantities defined for these formulas are: the left-arm observation feature; the task instruction feature; the weight predictor and its trainable parameters; the left-arm weights; the task-oriented compensation of the left arm; the action primitives of different types; the combined left-arm language feature; the right-arm observation feature; the right-arm weights; the task-oriented compensation of the right arm; and the combined right-arm language feature. The loss function of the action selection module (formula omitted in this text) involves an L1 norm, an L2,1 norm, and an adjustable balance parameter.
6. The double-arm robot control method based on single-arm trajectory prior guidance according to claim 3, wherein generating the action priors of the left arm and the right arm respectively based on the single-arm policy network comprises the following steps: taking the left-arm observation feature and the left-arm language feature as one input pair, taking the right-arm observation feature and the right-arm language feature as another input pair, and simultaneously inputting the current proprioceptive state of the robot arm; and performing semantic association and spatial alignment through a cross-attention layer, predicting the end-effector pose and the gripper state as probability distributions, and outputting discretized action predictions to obtain the action priors of the left arm and the right arm.
7. The double-arm robot control method based on single-arm trajectory prior guidance according to claim 1, wherein constructing the double-arm diffusion generation model for establishing a mapping from the initial action distribution to the real double-arm cooperative action distribution specifically comprises the following steps: inputting the current proprioceptive state of the robot arm into a state MLP to obtain a proprioception token; inputting the single-arm action priors of the left arm and the right arm into a noisy-trajectory MLP to obtain an action prior token; independently encoding the images of all viewpoints of the double-arm robot with a pre-trained visual encoder, extracting a 2D feature map, and back-projecting each pixel of the 2D feature map into the 3D world coordinate system of the robot; inputting the natural language instruction into a pre-trained text encoder and extracting language instruction tokens; and constructing a 3D flow-matching Transformer model, in which a cross-attention module takes the noisy trajectory token as the query and sequentially takes the 3D scene visual tokens O and the language instruction features as keys and values, a self-attention module processes the interaction between the noisy trajectory token and the other internal features, and a feedforward neural network produces the final double-arm action.
8. The double-arm robot control method based on single-arm trajectory prior guidance according to claim 3, wherein the single-arm action prediction model is trained based on the double-arm robot training data set with a loss function (formulas omitted in this text) in which BCE denotes the cross-entropy loss of behavior cloning, representing the gap between the real expert action and the predicted action; two of the symbols are adjustable parameters; and the remaining two terms are the loss functions of the visual alignment module and the action selection module, respectively.
9. The double-arm robot control method based on single-arm trajectory prior guidance according to claim 1, wherein training the double-arm diffusion generation model based on the double-arm robot training data set specifically comprises: inputting environmental observation data and language instructions into the single-arm action prediction model with frozen parameters, and respectively outputting the predicted action trajectories of the left arm and the right arm; superimposing random noise sampled from a Gaussian distribution on the predicted action trajectories of the left arm and the right arm; and randomly replacing the input predicted action trajectories of the left arm and the right arm with all-zero vectors with a set probability.
10. The double-arm robot control method based on single-arm trajectory prior guidance according to claim 1, wherein the double-arm diffusion generation model is trained using a conditional flow matching algorithm.
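
The training-time handling of the prior in claim 9 and the conditional flow matching objective in claim 10 can be illustrated with the sketch below. It is a hedged example, not the claimed implementation: the claims do not specify the probability path, so a straight-line path with a mean-squared-error velocity loss (a common conditional flow matching choice) is assumed, and the use of the perturbed prior as the path's starting point is likewise an assumption. All function names, the 14-dimensional action vector, and the numeric settings are hypothetical.

```python
import numpy as np


def perturb_prior(prior, noise_std, drop_prob, rng):
    """Noise the frozen single-arm prediction, or zero it with prob. drop_prob."""
    if rng.random() < drop_prob:
        return np.zeros_like(prior)  # prior dropped for this sample
    return prior + noise_std * rng.standard_normal(prior.shape)


def cfm_pair(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t, and its velocity."""
    x_t = (1.0 - t) * x0 + t * x1
    return x_t, x1 - x0


def cfm_loss(pred_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((pred_velocity - target_velocity) ** 2))


rng = np.random.default_rng(0)
prior = np.full(14, 0.5)   # placeholder left+right predicted actions
expert = np.ones(14)       # placeholder expert double-arm action
x0 = perturb_prior(prior, noise_std=0.1, drop_prob=0.1, rng=rng)
t = rng.random()           # random time in [0, 1)
x_t, u_t = cfm_pair(x0, expert, t)
# A real model would predict the velocity from (x_t, t, conditions);
# an all-zero prediction is used here only to evaluate the loss.
loss = cfm_loss(np.zeros_like(u_t), u_t)
```

Replacing the prior with zeros for a fraction of samples plausibly keeps the generation model from depending entirely on the single-arm prediction, while the added noise exposes it to imperfect priors of the kind it will see at inference.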

Description

Double-arm robot control method based on single-arm trajectory prior guidance

Technical Field

The invention relates to the technical field of robot control, and in particular to a double-arm robot control method based on single-arm trajectory prior guidance.

Background

Existing control methods for double-arm manipulators mainly comprise rule-based planning methods, optimization-based motion planning methods, and learning-based methods. Rule-based and optimization-based methods typically rely on accurate environment modeling and dynamics constraints, and avoid collisions and complete tasks by jointly planning the motion trajectories of the two arms. These methods achieve a high success rate in known environments with well-defined task constraints, but they place high demands on perception accuracy and model accuracy. Once the environment or the task conditions change, re-modeling and parameter re-tuning are often required, so the methods lack flexibility and adaptability. In addition, joint planning of double-arm trajectories generally causes a significant increase in the dimension of the search space, leading to high computational complexity and difficulty in meeting real-time requirements.

In recent years, with the development of deep learning, data-driven learning methods have gradually been introduced into double-arm manipulator control. Imitation learning and reinforcement learning are widely used to learn double-arm manipulation policies from demonstration data or interaction data. These methods model the double-arm action as a whole, so that the manipulator can autonomously learn complex manipulation behaviors to a certain extent, reducing the dependence on accurate modeling and hand-designed rules.

However, existing double-arm learning methods still have the following shortcomings, which limit their effectiveness in real scenarios:

(1) High dependence on double-arm demonstration data makes data acquisition costly. In addition, the double-arm action space is high-dimensional and its combinations are complex, so a limited quantity of demonstration data can hardly cover all possible cooperation modes, which reduces data utilization efficiency and limits the generalization ability of the algorithm.

(2) Existing control architectures have difficulty balancing the rationality of single-arm motion against the coordination of the two arms. In double-arm manipulation tasks, spatial constraints and potential conflicts inevitably exist between the manipulator arms and the environment. However, the prior art often falls into one of two difficulties when handling this problem. The first is end-to-end direct coordination: some methods attempt to directly generate complete joint action trajectories in the high-dimensional double-arm action space, so that single-arm motion generation and double-arm coordination adjustment are strongly coupled in the same learning process. This modeling approach forces the model to handle, in a single prediction, both the rationality of single-arm motion (such as smoothness and reaching the target) and the geometric coordination between the two arms; the learning burden is excessive, convergence is difficult, and the model tends to attend to one aspect at the expense of the other. The second is hierarchical separation: a dedicated coordination module is designed to schedule or revise independent single-arm policies, forcibly separating control from coordination. This reduces the dimensionality but ignores the inherent deep connection between low-level single-arm action features and high-level coordination logic; the coordination module often cannot accurately understand the intention of each arm, so it tends to hinder single-arm execution and cause misalignment between the arms.

Disclosure of Invention

To overcome the defects and shortcomings of the prior art, namely that double-arm data is scarce and that existing architectures struggle to balance direct coordination against hierarchical separation, the invention provides a double-arm robot control method based on single-arm trajectory prior guidance. The method neither outputs double-arm actions directly nor simply superimposes single-arm actions; instead, it uses single-arm action prediction as prior information to guide double-arm action generation. Specifically, a single-arm policy predicts the intended action of each arm, and the prediction result is injected as a prior distribution into the initial stage of the double-arm action generation model.

To achieve the above purpose, the invention adopts the following technical scheme:

The invention provides a double-arm robot control method based on single-arm trajectory prior guidance, which comprises the following steps:

Acquiring a double-arm robot data set