CN-121980909-A - Self-adaptive positioning method for data-driven micro-assembly robot


Abstract

The invention discloses a data-driven self-adaptive positioning method and device for a micro-assembly robot, together with electronic equipment and a medium. The method comprises the following steps: S1, training with the PPO algorithm in a simulation environment based on an existing physical model to obtain an initial action policy network; S2, using the initial action policy network to collect training data in the actual environment and training a plurality of probability models; S3, iteratively optimizing an action sequence by model predictive control based on the trained probability models, executing the optimized actions in the simulation environment, and re-collecting a reinforcement learning training data set to iteratively update the action policy; and S4, using the updated action policy to collect training data in the actual environment again in order to further optimize the probability models, and iterating in a loop until the action policy meets the positioning performance requirement. The method reduces the hardware performance required by the assembly environment and improves the self-adaptive capability and positioning accuracy of the micro-assembly robot in complex environments.

Inventors

WU YILONG
ZHANG XIANMIN
QIU ZHICHENG
WANG RIXIN

Assignees

South China University of Technology (华南理工大学)

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-15

Claims (10)

1. A self-adaptive positioning method for a data-driven micro-assembly robot, characterized by comprising the following steps: S1, training with a proximal policy optimization algorithm in a simulation environment based on an existing physical model to obtain an initial action policy network; S2, using the initial action policy network to collect training data in the actual environment and training a plurality of probability models, wherein the probability models are used to approximate, within the simulation environment, the dynamic behavior of the robot in the actual environment; S3, constructing candidate action sequences in the simulation environment based on the action policy network, optimizing the action sequences by model predictive control based on the trained probability models, executing the optimized actions in the simulation environment, and collecting a reinforcement learning training data set again to iteratively update the action policy; S4, using the updated action policy to collect training data in the actual environment again in order to further optimize the probability models, and repeating the iteration until the action policy meets the preset positioning performance requirement.
2. The self-adaptive positioning method for a micro-assembly robot according to claim 1, wherein the proximal policy optimization algorithm in S1 takes as input the pose error between the end of the micro-assembly robot and the target positioning point, the expression being: [expression not reproduced in the source text], whose two components represent, respectively, the position error in the Cartesian coordinate system and the quaternion error between the end pose and the target pose.
3. The self-adaptive positioning method for a micro-assembly robot according to claim 1, wherein the proximal policy optimization algorithm in S1 takes the relative displacement of each joint motor as output and normalizes the output with a function [function name not reproduced in the source text] to ensure the stability of the output, the expression being: [expression not reproduced in the source text], whose components represent, respectively, the relative displacement of the rotary joint and the relative displacement of each linear joint.
4. The self-adaptive positioning method for a micro-assembly robot according to claim 1, wherein the reward of the proximal policy optimization algorithm in S1 is composed of a plurality of terms, expressed as: [expression not reproduced in the source text], the terms comprising a weight matrix, a main pose reward, a limit penalty and a collision penalty; the main pose reward is built from a position reward and an orientation reward [expressions not reproduced in the source text]; the position reward is a function of the position error between the end of the micro-assembly robot and the target positioning point, the error being computed from the three-dimensional coordinates of the end of the micro-assembly robot and of the target point in the world coordinate system, and a fixed offset is used for fine tuning; the position reward function is divided into two parts, the first part being a staged reward whose hyperparameters adjust the curvature of the position reward curve at each position and which uses a smoothing function to smooth the gradient of the curve, and the second part being a continuous reward; the orientation reward depends on a hyperparameter and on the angle error between the end of the micro-assembly robot and the target positioning point, the angle error being calculated from the quaternion of the end of the assembly robot and the quaternion of the target pose; the limit penalty [expression not reproduced in the source text] is defined by two 3-dimensional vectors that define the working space of the end of the micro-assembly robot, the three-dimensional coordinates of the end of the micro-assembly robot, and an adjustable hyperparameter; the collision penalty [expression not reproduced in the source text] is defined by the contact forces in different directions at the end of the micro-assembly robot, which are obtained directly from the simulation environment information, the 2-norm of the contact force vector, and a preset threshold.
5. The self-adaptive positioning method for a micro-assembly robot according to claim 1, wherein the probability model in S2 is defined as the average of a plurality of probabilistic neural network models, each neural network being a multi-layer perceptron with a probabilistic output, expressed as: [expression not reproduced in the source text], where k is the number of models and the i-th term denotes the i-th probabilistic neural network model; each probabilistic neural network model gives the probability that the next state occurs, conditioned on the current state and the current motor action; the loss function is a weighted sum of the negative log-likelihood and the mean squared error of the state transition, expressed as: [expression not reproduced in the source text], whose symbols denote, respectively, a weighting coefficient, a random sample drawn from the Gaussian distribution output by the network, the ground-truth value from the data, and the mean and variance of the state change predicted by the model.
6. The self-adaptive positioning method for a micro-assembly robot according to claim 1, wherein S3 comprises the following steps: S3.1, generating N candidate action sequences at each time step from the current policy network output and random noise; S3.2, using the set of probability models to roll out the future states of each action sequence in parallel, computing the expected cumulative reward, and iteratively optimizing the mean of the action distribution through an exponential weighting mechanism; and S3.3, taking the optimized action mean as the optimal action for the current state, applying the optimal action in the simulation environment, and collecting the corresponding state changes and rewards for updating the action policy in the proximal policy optimization algorithm.
7. The self-adaptive positioning method for a micro-assembly robot according to claim 6, wherein the action sequence generation formula is: [expression not reproduced in the source text], whose symbols denote, respectively, a weight coefficient, the action mean at time t after m iterations, the action predicted at time t by the action policy network from the current state, and sampled random noise; the iterative formula for the action mean is: [expression not reproduced in the source text], which involves a hyperparameter and the discounted reward of the predicted trajectory after m iterations, the latter being expressed as: [expression not reproduced in the source text], which involves a discount factor and a weight penalizing the action magnitude.
8. An apparatus for implementing the self-adaptive positioning method for a micro-assembly robot according to any one of claims 1 to 7, comprising: a policy initialization training module, used for training with a proximal policy optimization algorithm in a simulation environment based on the existing physical model to obtain an initial action policy network; a probability model construction module, used for collecting training data in the actual environment with the initial action policy network so as to train a plurality of probability models, the probability models approximating, within the simulation environment, the dynamic behavior of the robot in the actual environment; an optimization and policy update module, used for optimizing the action sequence based on the trained probability models, executing the optimized actions in the simulation environment, and re-collecting the reinforcement learning training data set to iteratively update the action policy; and an iteration control module, used for collecting training data in the actual environment again with the updated action policy, further optimizing the probability models, and repeating the iteration until the action policy meets the preset positioning performance requirement.
9. An electronic device comprising a processor and a memory, wherein the memory stores a computer program, and wherein the processor implements the method for adaptively positioning a micro-assembly robot according to any one of claims 1-7 when executing the computer program.
10. A storage medium having stored therein a computer program which, when executed by a processor, implements the method of adaptive positioning of a micro-assembly robot according to any of claims 1-7.
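
The action optimization described in claims 5 to 7 can be illustrated with a small sketch. The formulas themselves are not reproduced in this text, so the code below is not the claimed method: it assumes a standard exponentially weighted (MPPI-style) mean update, a candidate of the form `beta * mean + (1 - beta) * policy(s) + noise`, and a toy one-dimensional reward. All names (`ensemble_step`, `sample_candidate`, `optimize_actions`, `kappa`, `beta`, `sigma`) and the toy models are hypothetical.

```python
import math
import random


def ensemble_step(models, state, action):
    """Average the next-state predictions of all k models (toy stand-in for claim 5)."""
    predictions = [model(state, action) for model in models]
    return sum(predictions) / len(predictions)


def sample_candidate(models, policy, state, target, mean, beta, sigma,
                     gamma, action_penalty, rng):
    """Build one candidate action sequence and score it under the ensemble.

    ASSUMED candidate form: a_t = beta * mean_t + (1 - beta) * policy(s_t) + noise.
    ASSUMED reward: negative distance to target minus a penalty on action magnitude.
    """
    actions, ret, s = [], 0.0, state
    for t, mu in enumerate(mean):
        a = beta * mu + (1.0 - beta) * policy(s, target) + rng.gauss(0.0, sigma)
        s = ensemble_step(models, s, a)  # predicted next state
        ret += (gamma ** t) * (-abs(s - target) - action_penalty * a * a)
        actions.append(a)
    return actions, ret


def optimize_actions(models, policy, state, target, horizon=5, n_candidates=64,
                     n_iters=4, beta=0.5, sigma=0.2, kappa=5.0, gamma=0.95,
                     action_penalty=0.01, seed=0):
    """Iteratively refine the action mean with exponential weighting of returns."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    for _ in range(n_iters):
        candidates = [sample_candidate(models, policy, state, target, mean, beta,
                                       sigma, gamma, action_penalty, rng)
                      for _ in range(n_candidates)]
        best = max(ret for _, ret in candidates)
        # Subtracting the best return keeps exp() numerically stable.
        weights = [math.exp(kappa * (ret - best)) for _, ret in candidates]
        total = sum(weights)
        mean = [sum(w * acts[t] for w, (acts, _) in zip(weights, candidates)) / total
                for t in range(horizon)]
    return mean


# Toy 1-D setup: three hypothetical "models" that differ only in actuator gain.
models = [lambda s, a, g=g: s + g * a for g in (0.8, 0.9, 1.0)]
# Hypothetical policy: a clipped proportional step toward the target.
policy = lambda s, target: max(-1.0, min(1.0, 0.5 * (target - s)))

plan = optimize_actions(models, policy, state=0.0, target=1.0)
print("first optimized action:", plan[0])
```

In the scheme of claim 6, only the first element of the optimized mean would be applied in the simulation environment, and the resulting state change and reward would be stored for the next policy update. The sketch uses deterministic toy models for brevity; the probabilistic networks of claim 5 would replace the lambdas.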

Description

Self-adaptive positioning method for data-driven micro-assembly robot

Technical Field

The invention belongs to the technical field of micro-assembly, and particularly relates to a data-driven self-adaptive positioning method and device for a micro-assembly robot, electronic equipment and a medium.

Background

Micro-assembly is a high-precision manufacturing process in which micro parts at the micrometer-to-millimeter scale are assembled accurately within a micrometer-scale error range in order to build a microsystem with specific functions. The key technical steps of the micro-assembly process include precise positioning, reliable clamping and controlled release of the micro devices. In actual assembly, the micro-assembly robot is generally required to have high positioning precision, a miniaturized structure and flexible pose adjustment capability so as to meet assembly requirements under complex working conditions.

At present, mainstream high-precision positioning schemes mostly use a servo motor with an integrated encoder, or an external sensor such as a grating ruler, and compensate the actual output of the motor in real time with the sensor feedback, thereby achieving high-precision positioning control. This scheme offers good cost-effectiveness and stability in assembly applications involving large robotic arms, wide workspaces and larger parts. However, it is not suitable for micro-assembly scenarios. The fundamental reason is that integrating a high-precision encoder and external sensors greatly increases the volume and weight of the motor and the end actuating mechanism. In a micro robotic arm structure this directly increases the load on a single joint, which in turn increases the driving burden on the other joint motors and ultimately degrades the overall dynamic performance and positioning precision of the system.
In addition, the size of the sensor and motor bodies makes the micro robotic arm bulky, so that it is difficult to maintain the necessary flexibility and accessibility in a small-scale operating space. At the same time, the influence of environmental noise on positioning accuracy is further amplified in micro-assembly tasks. Because microscale operations are highly sensitive to errors in position and force, even small vibrations, slight disturbances or elastic changes in the structure can lead to significant assembly deviations.

Conventional control algorithms (e.g., PID control, impedance control, etc.) typically rely on manual experience for parameter tuning, which is time-consuming and labor-intensive, and they struggle to maintain stable performance in complex and varied environments. As robot structures become more precise and assembly scenarios more diverse, manual parameter tuning alone can hardly meet the requirements of rapid deployment and high-precision operation. Therefore, how to obtain and adaptively adjust control parameters quickly while maintaining high-precision control in a dynamic environment has become a key problem urgently awaiting a breakthrough in the micro-assembly field. The present invention has been made in view of this.

Disclosure of Invention

The invention aims to provide a data-driven self-adaptive positioning method, device, electronic equipment and medium for a micro-assembly robot, which can effectively reduce the dependence of the assembly environment on hardware performance, reduce assembly cost, and improve the self-adaptive capability and positioning precision of the micro-assembly robot in complex environments.
The technical scheme of the invention is as follows: A self-adaptive positioning method for a data-driven micro-assembly robot comprises the following steps: S1, training with a proximal policy optimization algorithm (Proximal Policy Optimization, PPO algorithm) in a simulation environment based on an existing physical model to obtain an initial action policy network; S2, using the initial action policy network to collect training data in the actual environment and training a plurality of probability models that approximate the dynamic behavior of the robot in the actual environment; S3, optimizing an action sequence by Model Predictive Control (MPC) based on the trained probability models, executing the optimized actions in the simulation environment, and collecting a reinforcement learning training data set again to iteratively update the action policy; S4, using the updated action policy to collect training data in the actual environment again in order to further optimize the probability models, and iterating until the action policy meets the preset positioning performance requirement. Further, the physical model in S1 is a geometric model exported from SolidWorks software, and a dynamic model of each joint of the robot set in the simulation environment isaacl