CN-121973209-A - Smart grabbing planning method based on conditional diffusion generation and accessibility sensing

CN121973209A

Abstract

The invention belongs to the technical field of robots, and particularly relates to a smart grabbing planning method based on conditional diffusion generation and accessibility sensing. The method comprises: initializing grabbing gestures and obtaining a multi-finger smart hand grabbing simulation data set, wherein the simulation data set comprises scene point cloud data, palm gesture data for stable grabbing, and finger joint angle data; inputting the simulation data set into a conditional diffusion generation model to obtain a complete grabbing gesture, wherein the conditional diffusion generation model extracts local features from the scene point cloud and obtains the complete grabbing gesture according to the local features; training a stacked self-encoder according to a mechanical-arm reachability graph to obtain a grabbing reachability estimator; inputting the complete grabbing gesture into the grabbing reachability estimator to obtain a reachability probability; and obtaining a target grabbing gesture based on the reachability probability, and grabbing according to the target grabbing gesture.

Inventors

  • Li Shiqi
  • Hu Yufeng
  • Li Xiao
  • Gu Leyuan
  • Fu Lequn

Assignees

  • Huazhong University of Science and Technology (华中科技大学)

Dates

Publication Date
2026-05-05
Application Date
2026-02-03

Claims (8)

  1. A smart grip planning method based on conditional diffusion generation and accessibility sensing, comprising: initializing a grabbing gesture to obtain a multi-finger smart hand grabbing simulation data set, wherein the simulation data set comprises scene point cloud data, palm gesture data for stable grabbing, and finger joint angle data; inputting the multi-finger smart hand grabbing simulation data set into a conditional diffusion generation model to obtain a complete grabbing gesture, wherein the conditional diffusion generation model is used for extracting local features in the scene point cloud and obtaining the complete grabbing gesture according to the local features; training a stacked self-encoder according to a mechanical-arm reachability graph to obtain a grabbing reachability evaluator; inputting the complete grabbing gesture into the grabbing reachability evaluator to obtain a reachability probability; and, based on the reachability probability combined with a grabbing stability probability, acquiring a target grabbing gesture and grabbing according to the target grabbing gesture.
  2. The smart grip planning method based on conditional diffusion generation and accessibility sensing of claim 1, wherein initializing a grip gesture and obtaining a multi-finger smart hand grip simulation dataset comprises: invoking the GraspNet API to generate a 6D pose of a two-finger gripper; converting the 6D pose of the two-finger gripper into a palm pose of the multi-finger smart hand, assigning a pre-grab joint angle, and obtaining an initial multi-finger grabbing pose; constructing a multi-energy function from micro-force closure energy, contact attraction energy, penetration rejection energy, self-penetration energy and joint overrun energy; performing gradient-descent optimization on the initial multi-finger grabbing pose using the multi-energy function to obtain an optimized grabbing pose; and verifying the optimized grabbing pose in a physical simulator and filtering to form the multi-finger smart hand grabbing simulation data set.
  3. The smart grip planning method based on conditional diffusion generation and reachability awareness as claimed in claim 2, wherein the multi-energy function is: E = E_fc + w1·E_con + w2·E_pen + w3·E_spen + w4·E_joint; wherein E is the multi-energy function, E_fc is the micro-force closure energy, E_con is the contact attraction energy, E_pen is the penetration rejection energy, E_spen is the self-penetration energy, E_joint is the joint overrun energy, and w1, w2, w3 and w4 respectively represent the weights of the corresponding energy terms.
  4. The smart grabbing planning method based on conditional diffusion generation and accessibility sensing according to claim 1, wherein the conditional diffusion generation model comprises a local feature extraction module and a grabbing gesture generation module; the local feature extraction module is used for extracting local features in the scene point cloud; and the grabbing gesture generation module is used for acquiring the complete grabbing gesture according to the local features.
  5. The smart grip planning method based on conditional diffusion generation and accessibility awareness of claim 4, wherein extracting local features in a scene point cloud comprises: calculating, for each point p on the object surface, a graspability score s(p), wherein the graspability score s(p) represents the degree of graspability of the area surrounding point p; and selecting points on the object surface whose graspability scores meet the target as feature seed points, and acquiring the local features by farthest point sampling of a preset number of feature seed points.
  6. A smart grip planning method based on conditional diffusion generation and accessibility sensing as recited in claim 4, wherein obtaining the complete grip pose from the local features comprises: obtaining feature vectors of feature seed points according to the local features; encoding the initial wrist pose as a target-dimension vector conditioned on the feature vectors; applying a forward diffusion process to the target-dimension vector to generate noisy samples; training a denoising network with the noisy samples to predict the diffusion velocity, and denoising in reverse with an ordinary differential equation to obtain a denoised wrist pose vector; projecting the denoised wrist pose vector back to a preset space through singular value decomposition (SVD) to obtain a target wrist pose; and concatenating the target wrist pose with the feature vectors of the feature seed points, inputting the result into an MLP, and regressing the joint angles to obtain the complete grabbing pose.
  7. The smart grip planning method based on conditional diffusion generation and accessibility awareness of claim 1, wherein training a stacked self-encoder from the mechanical-arm reachability graph and acquiring a grip reachability evaluator comprises: discretizing the working space of the mechanical arm into equidistant voxel grids; based on the equidistant voxel grids, randomly sampling terminal poses using forward kinematics and marking reachable voxels; when constructing the reachability graph with forward kinematics, switching to uniform inverse-kinematics sampling once the ratio of newly found voxels falls below a threshold, and completing unexplored areas to obtain a complete reachability graph; uniformly sampling palm poses and their standard labels from the complete reachability graph to generate training samples; and training a stacked self-encoder with the training samples to acquire the grabbing reachability evaluator.
  8. The smart grip planning method based on conditional diffusion generation and reachability sensing of claim 7, wherein training the stacked self-encoder with the training samples and obtaining the grip reachability evaluator comprises a pre-training phase and a fine-tuning phase; the pre-training phase greedily trains the self-encoder layer by layer, minimizing the mean square error between input and reconstruction; and the fine-tuning phase is supervised by reachability labels, updating all weights by back propagation.
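The multi-energy function of claim 3 is a weighted sum of a micro-force closure term and four penalty terms. The sketch below illustrates the shape of that objective only: the claims do not disclose the analytic form of any term, so `joint_limit_energy` is a hypothetical stand-in for the joint overrun energy and the remaining energies are passed in as precomputed scalars.

```python
import numpy as np

def joint_limit_energy(q, q_min, q_max):
    """Quadratic penalty on joint angles outside [q_min, q_max] (a stand-in
    for the joint overrun energy; the patent does not give its exact form)."""
    below = np.minimum(q - q_min, 0.0)
    above = np.maximum(q - q_max, 0.0)
    return float(np.sum(below ** 2 + above ** 2))

def multi_energy(terms, weights):
    """Weighted sum: force-closure term plus four weighted penalty terms,
    mirroring the five energies listed in claim 2/3."""
    e_fc, e_con, e_pen, e_spen, e_joint = terms
    w1, w2, w3, w4 = weights
    return e_fc + w1 * e_con + w2 * e_pen + w3 * e_spen + w4 * e_joint

# Example: one joint (1.9 rad) exceeds the 1.5 rad limit by 0.4 -> 0.16.
q = np.array([0.1, 1.9, -0.5])
e = joint_limit_energy(q, np.full(3, -1.5), np.full(3, 1.5))
```

In the claimed method this scalar objective would be minimized by gradient descent over the full multi-finger grasp pose; here only the evaluation is shown.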
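Claim 5 selects high-graspability surface points as feature seeds and then spreads a fixed budget over them with farthest point sampling. The sketch below assumes a precomputed per-point score (the scoring model itself is not specified in the claims) and implements the standard greedy FPS loop; the threshold and budget values are illustrative.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set."""
    chosen = [0]
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dist))            # farthest remaining point
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return np.array(chosen)

def select_seed_points(points, scores, score_thresh, k):
    """Keep points whose graspability score clears the threshold, then FPS."""
    candidates = points[scores >= score_thresh]
    return candidates[farthest_point_sampling(candidates, k)]

rng = np.random.default_rng(0)
cloud = rng.uniform(-1, 1, size=(200, 3))     # synthetic scene point cloud
scores = rng.uniform(0, 1, size=200)          # stand-in graspability scores
seeds = select_seed_points(cloud, scores, score_thresh=0.5, k=16)
```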
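Claim 6 projects the denoised wrist pose back to a valid rotation through SVD. Assuming the rotation part is carried as a (generally non-orthogonal) 3x3 matrix after denoising, the nearest proper rotation is the orthogonal Procrustes projection sketched below.

```python
import numpy as np

def project_to_rotation(m):
    """Nearest rotation matrix to an arbitrary 3x3 matrix:
    U @ diag(1, 1, det(U V^T)) @ V^T, the sign guarding against reflections."""
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt

# A denoised sample is close to, but not exactly, a rotation matrix.
noisy = np.eye(3) + 0.1 * np.random.default_rng(1).normal(size=(3, 3))
r = project_to_rotation(noisy)
```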
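Claim 7 builds the reachability graph by voxelizing the workspace and marking voxels hit by forward-kinematics samples. The sketch below uses a planar two-link arm as a stand-in for the real manipulator (link lengths and grid bounds are illustrative) and omits the inverse-kinematics fallback the claim uses to complete unexplored regions.

```python
import numpy as np

def fk_2link(q, l1=0.5, l2=0.4):
    """Forward kinematics of a planar 2-link arm (illustrative stand-in)."""
    x = l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1])
    y = l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def build_reachability_map(n_samples, lo=-1.0, hi=1.0, res=0.1, seed=0):
    """Mark voxels hit by FK samples; returns a boolean occupancy grid."""
    n_vox = int(round((hi - lo) / res))
    grid = np.zeros((n_vox, n_vox), dtype=bool)
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        q = rng.uniform(-np.pi, np.pi, size=2)    # random joint configuration
        ij = np.floor((fk_2link(q) - lo) / res).astype(int)
        if np.all(ij >= 0) and np.all(ij < n_vox):
            grid[ij[0], ij[1]] = True
    return grid

grid = build_reachability_map(5000)
coverage = grid.mean()    # fraction of workspace voxels marked reachable
```

For this arm the reachable set is an annulus of radii 0.1 to 0.9, so a substantial but partial fraction of the 20x20 grid is marked.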
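Claim 8's pre-training phase greedily trains the self-encoder layer by layer against reconstruction MSE. A minimal sketch with linear layers and plain gradient descent is below; the actual network sizes, nonlinearities and optimizer are not disclosed, and the supervised fine-tuning phase with reachability labels is only indicated, not implemented.

```python
import numpy as np

def pretrain_layer(x, hidden, lr=0.05, steps=300, seed=0):
    """Train one linear autoencoder layer on x by gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w = rng.normal(scale=0.1, size=(d, hidden))   # encoder weights
    v = rng.normal(scale=0.1, size=(hidden, d))   # decoder weights
    for _ in range(steps):
        h = x @ w                                 # codes
        g = -2.0 * (x - h @ v) / x.size           # dMSE / d(reconstruction)
        gv = h.T @ g
        gw = x.T @ (g @ v.T)
        v -= lr * gv
        w -= lr * gw
    return w, v

def pretrain_stack(x, sizes):
    """Greedy layer-by-layer pretraining; returns the encoder weight list.
    Fine-tuning (claim 8, phase 2) would then update all weights with
    reachability labels via backprop, which is not sketched here."""
    weights, h = [], x
    for hidden in sizes:
        w, _ = pretrain_layer(h, hidden)
        weights.append(w)
        h = h @ w                                 # feed codes to next layer
    return weights

rng = np.random.default_rng(1)
data = rng.normal(size=(128, 8))                  # synthetic pose features
w1, v1 = pretrain_layer(data, 6)
mse_zero = float(np.mean(data ** 2))                      # trivial baseline
mse_after = float(np.mean((data - data @ w1 @ v1) ** 2))  # reconstruction MSE
encoders = pretrain_stack(data, sizes=[6, 4])
```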

Description

Smart grabbing planning method based on conditional diffusion generation and accessibility sensing

Technical Field

The invention belongs to the technical field of robots, and particularly relates to a smart grabbing planning method based on conditional diffusion generation and accessibility sensing.

Background

The multi-finger smart hand has become an important research direction in the field of robot grabbing due to its human-like multi-joint structure and high-degree-of-freedom operation capability, particularly in complex application scenarios such as precision assembly and human-machine cooperation. However, grip planning based on the multi-finger smart hand faces many challenges. First, many existing grabbing data sets are based on two-finger grabbing; such data sets are not directly applicable to multi-finger grabbing scenes and cannot effectively address physical constraint problems in multi-finger grabbing such as contact stability, force closure, collision detection and joint angle limits. Second, conventional grabbing planning methods often rely too heavily on global geometric features of the object, while in actual scenes complete geometric information cannot be obtained due to occlusion, environmental complexity or view-angle limitations. Modeling methods based on global features are prone to failure when information is missing and cannot generate grabbing gestures that satisfy the actual contact conditions. Furthermore, regarding the kinematic reachability of the mechanical arm, most studies place objects within a known reach of the arm and assume that all generated gripping poses can be executed. In a practical environment, however, the position of the object may exceed the working space of the mechanical arm, resulting in gripping gestures that are kinematically unreachable.
Existing planning systems often lack a mechanism for quantitatively detecting unreachability and cannot effectively screen out grasping gestures that are kinematically unreachable, so a generated grasping gesture that is theoretically "optimal" may be unexecutable due to the motion limits of the mechanical arm. Related researchers have carried out designs and studies on smart grabbing planning methods, but defects remain: methods that generate grabbing gestures by point cloud filtering and random sampling do not introduce micro-force-closure optimization, so hand penetration or unstable grabbing easily occurs; training data generated on a voxel grid is limited by voxel resolution and is not sensitive enough to key local geometric features such as object bulges and pits; schemes that predict grabbing gestures through a gesture detection network rely on RGB image semantic features, are prone to gesture deviation when three-dimensional geometric information is missing, and do not integrate physical constraints such as joint angle limits; methods that combine behavior cloning with reinforcement-learning trajectory optimization depend on dynamic interaction, and because force-closure conditions are not included in the loss function the generated gestures may cause grabbing failures; and many current schemes study only the grabbing gesture itself without evaluating the mechanical-arm reachability of that gesture, so the generated gesture may be unexecutable because the inverse kinematics solution fails. In summary, the prior art has significant shortcomings in the physical rationality of the data set, the generalization of the generative model, reachability evaluation, and other aspects.
Therefore, a unified framework integrating a large-scale high-quality data set, grasp gesture generation based on local feature perception, and kinematic reachability evaluation is needed to realize stable and efficient grasp planning for multi-finger smart hands in complex scenes.

Disclosure of Invention

In order to solve the above technical problems, the invention provides a smart grabbing planning method based on conditional diffusion generation and accessibility sensing, which can realize stable and efficient grabbing planning of a multi-finger smart hand in a complex scene. To achieve the above object, the invention provides a smart grabbing planning method based on conditional diffusion generation and accessibility sensing, comprising: initializing a grabbing gesture to obtain a multi-finger smart hand grabbing simulation data set, wherein the simulation data set comprises scene point cloud data, palm gesture data for stable grabbing, and finger joint angle data; inputting the multi-finger smart hand grabbing simulation data set into a conditional diffusion generation model to obtain a complete grabbing gesture, wherein the conditional diffusion generation model is used for extracting local features in the scene point cloud and obtaining