CN-122008246-A - Dexterous hand motion planning method, device and medium based on visual perception
Abstract
The invention discloses a dexterous hand motion planning method, device and medium based on visual perception, relating to the technical field of robot control. The method comprises: performing object instance segmentation and six-degree-of-freedom (6-DoF) object pose estimation on multi-modal visual data acquired in real time, and generating an environmental state representation comprising object categories, segmentation masks and pose information; starting global re-planning to generate a new planning instruction when an arbitration decision indicates that a comprehensive risk differential index does not meet a preset stable-execution condition; and continuously updating the environmental state representation while the executable main trajectory is executed. According to the invention, covariance analysis is performed on the 6-DoF pose estimation result of the object and a counterfactual perturbation is applied, so that a multi-state environment representation unfolded around the perception uncertainty is formed at the trajectory planning stage, and multi-trajectory planning under the constraint of the same task instruction is realized.
Inventors
- YANG ZIHE
- GAO GUANJIAN
- CUI HUAFENG
- CHENG GUANGWEI
- GAO ZHIKAI
- LIU JUNYAN
- HAN WENCHAO
Assignees
- Hangzhou Heiman Technology Co., Ltd. (杭州黑漫科技有限公司)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-04-07
Claims (10)
- 1. A dexterous hand motion planning method based on visual perception, characterized by comprising the following steps: performing object instance segmentation and six-degree-of-freedom (6-DoF) object pose estimation on multi-modal visual data acquired in real time, and generating an environmental state representation comprising object categories, segmentation masks and pose information; inputting the environmental state representation and a task instruction into a neural-symbolic hybrid planner for trajectory planning, and generating an executable main trajectory that satisfies kinematic and collision constraints; based on a perception uncertainty characterization derived from the environmental state representation, applying a counterfactual perturbation to the 6-DoF object pose, and generating a corresponding shadow trajectory through a trajectory generator; computing score values of the executable main trajectory and the shadow trajectory in terms of feasibility margin, expected execution risk and visual-feature-space consistency, and fusing the score values to obtain a comprehensive risk differential index; comparing the comprehensive risk differential index with a preset dynamic threshold to determine an arbitration decision; and, when the arbitration decision indicates that the comprehensive risk differential index does not meet a preset stable-execution condition, starting global re-planning to generate a new planning instruction, and continuously updating the environmental state representation during execution of the executable main trajectory.
- 2. The dexterous hand motion planning method based on visual perception according to claim 1, wherein the multi-modal visual data comprises synchronously acquired two-dimensional color image data, depth image data spatially aligned with the two-dimensional color image data, and three-dimensional point cloud data reconstructed from the depth image data.
- 3. The dexterous hand motion planning method based on visual perception according to claim 2, wherein the task instruction is a task-level instruction set that predefines a manipulation objective, manipulation style and manipulation order of the dexterous hand; the neural-symbolic hybrid planner generates an initial trajectory through its neural network part, and outputs the executable main trajectory after its symbolic logic part applies the kinematic and collision constraints; the kinematic and collision constraints refer to physical limits on the angles, angular velocities and angular accelerations of the joints of the dexterous hand, and to minimum safety distances maintained between the dexterous hand, the mechanical arm and obstacles in the environment.
- 4. The dexterous hand motion planning method based on visual perception according to claim 3, wherein the neural network part comprises a feedforward encoding structure for encoding the environmental state representation and the task instruction, a recurrent prediction network for temporal unrolling, and a fully connected output layer for outputting a sequence of joint-space waypoints; the symbolic logic part comprises a process of verifying joint-space waypoint reachability, a process of performing collision detection based on obstacle information, and a process of performing trajectory optimization for joint-space waypoints that do not satisfy the kinematic and collision constraints.
- 5. The dexterous hand motion planning method based on visual perception according to claim 4, wherein the perception uncertainty characterization is obtained by performing covariance analysis on the 6-DoF pose estimation of the object; the corresponding shadow trajectory is generated by the trajectory generator through the following steps: extracting a confidence parameter characterizing the current perception reliability from the multi-modal visual data; determining the direction and magnitude of the counterfactual perturbation based on the confidence parameter, taking the counterfactually perturbed 6-DoF object pose as a new environmental state representation, and inputting the new environmental state representation to the trajectory generator; the trajectory generator reuses the neural network part of the neural-symbolic hybrid planner to generate the shadow trajectory in a computationally simplified manner.
- 6. The dexterous hand motion planning method based on visual perception according to claim 5, wherein the score values are fused to obtain the comprehensive risk differential index through the following steps: computing a feasibility margin score value and an expected execution risk score value based on the executable main trajectory; computing a visual-feature-space consistency score value based on the changes of the executable main trajectory and the shadow trajectory in the visual feature space; and computing the differences between the feasibility margin, expected execution risk and visual-feature-space consistency score values of the executable main trajectory and the corresponding score values of the shadow trajectory, and performing weighted fusion on the differences to generate the comprehensive risk differential index.
- 7. The dexterous hand motion planning method based on visual perception according to claim 6, wherein the preset dynamic threshold is set based on the environmental state representation and historical operational data of the task instruction; the arbitration decision is determined through the following steps: when the comprehensive risk differential index is lower than the preset dynamic threshold, the arbitration decision is to output the executable main trajectory; and when the comprehensive risk differential index is higher than the preset dynamic threshold, the arbitration decision is to trigger global re-planning.
- 8. The dexterous hand motion planning method based on visual perception according to claim 7, wherein the preset stable-execution condition is the state in which the comprehensive risk differential index is lower than the preset dynamic threshold; global re-planning is started to generate a new planning instruction, and the environmental state representation is continuously updated during execution of the executable main trajectory, through the following steps: triggering a global re-planning signal through the arbitration decision, re-executing executable main trajectory generation based on the environmental state representation and the task instruction, and outputting the new planning instruction; and interrupting the current execution through the control layer and switching the control layer to drive the dexterous hand based on the new planning instruction, thereby performing online updating of the environmental state representation and real-time adjustment of the motion planning.
- 9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the dexterous hand motion planning method based on visual perception according to any one of claims 1 to 8.
- 10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the dexterous hand motion planning method based on visual perception according to any one of claims 1 to 8.
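The score fusion and arbitration of claims 6 and 7 can be sketched as follows. This is a minimal illustration, not the patented implementation: the three per-criterion scores, the weight values and the function names are assumptions made for the example.

```python
import numpy as np

def risk_differential(main_scores, shadow_scores, weights=(0.4, 0.3, 0.3)):
    """Fuse per-criterion score differences between the executable main
    trajectory and its counterfactual shadow trajectory.

    Scores are ordered (feasibility margin, expected execution risk,
    visual-feature-space consistency). The weights are illustrative.
    """
    diff = np.abs(np.asarray(main_scores) - np.asarray(shadow_scores))
    return float(np.dot(weights, diff))

def arbitrate(index, threshold):
    """Claim-7-style decision: execute the main trajectory while the fused
    index stays below the (dynamic) threshold, else trigger re-planning."""
    return "execute_main_trajectory" if index < threshold else "global_replan"

# A small divergence between main and shadow trajectories keeps execution
# going; raising the bar (lowering the threshold) triggers re-planning.
idx = risk_differential((0.9, 0.1, 0.8), (0.7, 0.3, 0.5))  # 0.23
```

A hypothetical dynamic threshold of 0.3 would let this trajectory execute, while a stricter 0.2 would force global re-planning.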
Description
Dexterous hand motion planning method, device and medium based on visual perception

Technical Field

The invention relates to the technical field of robot control, and in particular to a dexterous hand motion planning method, device and medium based on visual perception.

Background

In assembly, sorting and fine-manipulation scenarios, robotic dexterous hands increasingly adopt a technical route that combines visual perception with learned planning. A common pipeline comprises multi-modal visual acquisition, instance segmentation and six-degree-of-freedom (6-DoF) pose estimation, task instruction encoding and joint-space trajectory prediction, with executable trajectory generation performed under kinematic and collision constraints; related methods continue to be applied and extended in complex object manipulation and unstructured environments. In the prior art, a single-shot or single-point pose estimate is usually fed to the end-to-end planning stage as a deterministic input. The pose uncertainty caused by perception noise, occlusion and pose jitter is difficult for the planning stage to express explicitly, so the planned output lacks a quantitative basis for its sensitivity to perception disturbances, and trajectory evaluation easily becomes inconsistent with the actual execution risk. The core contradiction concentrates on the influence of perception uncertainty on planning reliability, which is difficult to describe computably at the generation stage.

Disclosure of Invention

The present invention has been made in view of the above problems in the prior art. The invention therefore provides a dexterous hand motion planning method based on visual perception, which solves the problem that perception uncertainty is difficult for the planner to quantify explicitly, which in turn impairs reliable judgment of dexterous hand trajectories.
In order to solve the above technical problems, the invention provides the following technical scheme. The invention provides a dexterous hand motion planning method based on visual perception, comprising the steps of: performing object instance segmentation and 6-DoF object pose estimation on multi-modal visual data acquired in real time, and generating an environmental state representation comprising object categories, segmentation masks and pose information; inputting the environmental state representation and a task instruction into a neural-symbolic hybrid planner for trajectory planning, and generating an executable main trajectory that satisfies kinematic and collision constraints; applying a counterfactual perturbation to the 6-DoF object pose based on a perception uncertainty characterization derived from the environmental state representation, and generating a corresponding shadow trajectory through a trajectory generator; computing score values of the executable main trajectory and the shadow trajectory in terms of feasibility margin, expected execution risk and visual-feature-space consistency, and fusing the score values to obtain a comprehensive risk differential index; comparing the comprehensive risk differential index with a preset dynamic threshold to determine an arbitration decision, and determining the trajectory processing path according to the arbitration decision: when the arbitration decision indicates that the comprehensive risk differential index meets the preset stable-execution condition, sending the executable main trajectory to the control layer to drive the dexterous hand; when it does not, starting global re-planning to generate a new planning instruction; and continuously updating the environmental state representation during execution.
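The counterfactual perturbation step above can be illustrated as follows. The 6-DoF parameterization, the choice of the covariance eigen-direction, and the linear confidence-based scaling are all assumptions made for the sketch, not details fixed by the patent text.

```python
import numpy as np

def counterfactual_pose(pose, cov, confidence, scale=1.0):
    """Perturb a 6-DoF pose (x, y, z, roll, pitch, yaw) along the most
    uncertain direction of its estimation covariance.

    Lower perception confidence yields a larger perturbation, so the
    shadow trajectory probes a plausibly worse world state. The
    (1 - confidence) scaling rule is an illustrative assumption.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)       # principal uncertainty axes
    worst_dir = eigvecs[:, np.argmax(eigvals)]   # direction of largest variance
    magnitude = scale * (1.0 - confidence) * np.sqrt(eigvals.max())
    return np.asarray(pose, dtype=float) + magnitude * worst_dir
```

The perturbed pose would then replace the estimated pose in the environmental state representation fed to the trajectory generator, yielding the shadow trajectory whose scores are compared against those of the main trajectory.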
As a preferred scheme of the dexterous hand motion planning method based on visual perception, the multi-modal visual data comprises synchronously acquired two-dimensional color image data, depth image data spatially aligned with the two-dimensional color image data, and three-dimensional point cloud data reconstructed from the depth image data. The task instruction refers to a task-level instruction set that predefines the manipulation objective, manipulation mode and manipulation order of the dexterous hand; the neural-symbolic hybrid planner generates an initial trajectory through its neural network part, and outputs the executable main trajectory after its symbolic logic part applies the kinematic and collision constraints; the kinematic and collision constraints refer to physical limits on the angles, angular velocities and angular accelerations of the joints of the dexterous hand, and to minimum safety distances maintained between the dexterous hand, the mechanical arm and obstacles in the environment. The neural network part comprises a feedforward encoding structure for encoding the environmental state representation and the task instruction, a recurrent prediction network for temporal unrolling, and a fully connected output layer for outputting a sequence of joint-space waypoints; the symbolic logic portion includes a process