CN-122008187-A - Redundant rope-driven mechanical arm perception decision execution intelligent control method and system


Abstract

The invention discloses an integrated perception-decision-execution intelligent control method and system for a redundant rope-driven mechanical arm, and relates to the technical field of robot motion planning. The method comprises: processing an environment point cloud to obtain a three-dimensional environment model; generating a plurality of groups of collision-free expert trajectory sequences based on the three-dimensional environment model and a simplified rigid-link model of the mechanical arm; training a motion control model based on an improved reinforcement learning algorithm using a three-stage learning strategy, in which the first stage trains in an obstacle-free static-target environment, the second stage trains in an obstacle-free dynamic-target environment, and the third stage trains in an environment with random obstacles and a dynamic target, the collision-free expert trajectory sequences being introduced into the improved reinforcement learning algorithm as expert experience; and acquiring the real-time pose of the mechanical arm, the target position, and the obstacle category, obtaining the corresponding expert experience, and performing real-time motion control of the mechanical arm with the trained motion control model. The invention enables collision-free motion planning of the redundant rope-driven mechanical arm in occluded scenes.
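The three-stage learning strategy summarized in the abstract can be pictured as a curriculum in which each stage starts from the weights of the previous one. The following is a minimal sketch, not the patented training procedure: `STAGES`, `train_stage`, and `run_curriculum` are hypothetical names, the weights are a plain dictionary of floats, and `train_stage` is a placeholder that perturbs the weights instead of running a reinforcement learning loop. The `expert` flag is off for the first stage because claim 1 states that only the second and third stages use the improved algorithm with expert experience.

```python
import copy
import random

# Hypothetical stage descriptors; the flags mirror the three environments in the text.
STAGES = [
    {"name": "stage1", "obstacles": False, "dynamic_target": False, "expert": False},
    {"name": "stage2", "obstacles": False, "dynamic_target": True, "expert": True},
    {"name": "stage3", "obstacles": True, "dynamic_target": True, "expert": True},
]


def train_stage(init_weights, stage, steps=10, seed=0):
    """Placeholder for a reinforcement learning loop in the given environment.

    It only perturbs a copy of the weights so that the hand-off between
    stages can be demonstrated; no learning happens here.
    """
    rng = random.Random(seed)
    weights = copy.deepcopy(init_weights)
    for _ in range(steps):
        for name in weights:
            weights[name] += rng.uniform(-0.01, 0.01)
    return weights


def run_curriculum(initial_weights):
    """Train the stages in order, initializing each from the previous result."""
    weights = initial_weights
    checkpoints = {}
    for stage in STAGES:
        weights = train_stage(weights, stage)
        checkpoints[stage["name"]] = copy.deepcopy(weights)
    return weights, checkpoints
```

Calling `run_curriculum({"w": 0.0, "b": 1.0})` returns the stage-three weights together with a checkpoint per stage; the caller's initial dictionary is left unmodified.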

Inventors

Cheng Biyi
CHEN RUI
ZHANG CHUBIN
LIN BOHAO
Huang Bingting
YAN TIANTIAN
ZHANG XINDE
WANG HONGJUN
HUANG KAIXIANG
LI JIAXIANG
Zhong Chiliang
CHEN BAIHAO
MA CHUANG
CHEN JINGNING

Assignees

South China Agricultural University (华南农业大学)

Dates

Publication Date: 20260512
Application Date: 20251231

Claims (10)

1. An integrated perception-decision-execution intelligent control method for a redundant rope-driven mechanical arm, characterized by comprising the following steps: acquiring an environment point cloud using a depth camera, and processing the environment point cloud to obtain a three-dimensional environment model, wherein the three-dimensional environment model comprises a plurality of obstacle category models; performing parameterized three-dimensional modeling of the mechanical arm, and establishing an equivalent simplified rigid-link model; generating a plurality of groups of collision-free expert trajectory sequences based on the simplified rigid-link model and the three-dimensional environment model; training a motion control model based on an improved reinforcement learning algorithm using a three-stage learning strategy, wherein the first stage trains a first motion control model based on the reinforcement learning algorithm in an obstacle-free static-target environment; the second stage trains a second motion control model based on the improved reinforcement learning algorithm in an obstacle-free dynamic-target environment, the second motion control model taking the weight parameters of the first motion control model obtained in the first stage as initialization parameters; the third stage trains a third motion control model based on the improved reinforcement learning algorithm in an environment with random obstacles and a dynamic target, the third motion control model taking the weight parameters of the second motion control model obtained in the second stage as initialization parameters; and the improvement in the improved reinforcement learning algorithm comprises introducing the collision-free expert trajectory sequences as expert experience during training; and acquiring the real-time pose of the mechanical arm, the target position, and the obstacle category, determining the corresponding expert experience according to the obstacle category, inputting the real-time pose, the target position, and the expert experience into the trained third motion control model, and realizing real-time motion control of the mechanical arm through the output control signals.
2. The integrated perception-decision-execution intelligent control method for a redundant rope-driven mechanical arm according to claim 1, characterized in that processing the environment point cloud to obtain the three-dimensional environment model comprises: performing outlier removal and voxel-grid downsampling on the environment point cloud; segmenting the point cloud into independent objects by a clustering or region-growing algorithm, the independent objects comprising targets and obstacles; calculating the center of each segmented point cloud segment and performing shape analysis using principal component analysis (PCA); estimating the size of each independent object based on the shape analysis result, and obtaining the obstacle category models in combination with the obstacle categories; and reconstructing the multi-frame point clouds into a complete three-dimensional environment model.
3. The integrated perception-decision-execution intelligent control method for a redundant rope-driven mechanical arm, characterized by comprising: collecting a plurality of images containing different categories of obstacles; preprocessing the images and extracting features to obtain high-dimensional feature vectors that represent obstacle information; processing the high-dimensional feature vectors to obtain low-dimensional feature representations; and inputting the low-dimensional feature representations into a pre-trained classifier to obtain the obstacle categories.
4. The integrated perception-decision-execution intelligent control method for a redundant rope-driven mechanical arm, characterized in that the mechanical arm is formed by a plurality of rigid link segments connected in sequence through universal joints, the driving mechanism of the mechanical arm comprises a plurality of cooperatively driven ropes, and the simplified rigid-link model is constructed by the Denavit-Hartenberg (D-H) parameter method.
5. The integrated perception-decision-execution intelligent control method for a redundant rope-driven mechanical arm according to claim 1, wherein generating a plurality of groups of collision-free expert trajectory sequences based on the simplified rigid-link model and the three-dimensional environment model comprises: generating the collision-free expert trajectory sequences using an artificial potential field method, in which a virtual force acting on the end effector of the mechanical arm is calculated from the target position and the obstacle category model, and the virtual force is mapped to the desired velocities of the movable joints of the mechanical arm through a Jacobian matrix.
6. The integrated perception-decision-execution intelligent control method for a redundant rope-driven mechanical arm according to claim 1, wherein the improvement in the improved reinforcement learning algorithm further comprises adding a reward shaping function to the reward function, wherein the reward shaping function is calculated based on the expert experience and is expressed as follows: [formula missing from source text]; in the formula, the quantities defined are, in order: the reward shaping function; a hyperparameter for adjusting the reward sensitivity; and the minimum Euclidean distance between the current state and all expert states in the expert experience buffer.
7. The method of claim 6, wherein the training process of the second motion control model or the third motion control model based on the improved reinforcement learning algorithm comprises: the state vector in the state space comprises the angles and angular velocities of a plurality of movable joints of the mechanical arm, the speeds of a plurality of driving motors, the rope lengths, the three-dimensional coordinates of the end effector, and the three-dimensional coordinates of the target point, wherein the state vector used in training the third motion control model further comprises the three-dimensional coordinates of all obstacles; the action vector in the action space comprises the speed signals of the plurality of driving motors; the reward function comprises the environment-provided reward and the reward shaping function, combined according to a formula [formula missing from source text] in which a hyperparameter adjusts the strength of the expert experience guidance; the environment-provided reward is composed of the following terms [formulas missing from source text]: a basic distance penalty, computed from the Euclidean distance from the current end effector to the target and a distance weight parameter with a value range of [0.5, 10]; a control penalty, which penalizes violent control actions and is computed from the accelerations of the individual joints and a control weight parameter with a value range of [0.00001, 0.01]; a success reward, which is a sparse reward based on a logical decision and is computed from a success determination distance and a success weight parameter with a value range of [0, 3000]; and a path-aware reward, with a path-aware weight parameter with a value range of [0, 50]; in the model training and parameter updating stage, a batch of experience tuples is randomly sampled from the main experience buffer, and for each state in the batch the reward shaping function is calculated based on the expert experience, thereby obtaining the combined reward; the combined reward is then used to construct the target Q value, and gradient updates of the value network and the policy network are performed based on the target Q value, wherein the value network parameters are updated by minimizing the mean squared Bellman error, and the policy network parameters are updated by maximizing the Q value estimated by the value network together with the policy entropy; and the soft update of the target network is then performed.
8. The integrated perception-decision-execution intelligent control method for a redundant rope-driven mechanical arm according to claim 1, wherein the model networks of the first stage and the second stage are multi-layer perceptron networks and the model network of the third stage is a long short-term memory (LSTM) network; during training of the third motion control model, the state vector is first normalized; the normalized state vector undergoes feature processing and dimension raising; temporal dependencies are captured and dynamic features are extracted by a two-layer unidirectional LSTM network to obtain a hidden state sequence H; and the hidden state of the last time step of H is extracted and input into a four-layer fully connected network for deep nonlinear mapping and feature fusion, yielding the input feature vectors of the policy network and the value network.
9. The integrated perception-decision-execution intelligent control method for a redundant rope-driven mechanical arm according to claim 1, wherein expert experience samples account for 0-30% of the total training samples in the training process.
10. An integrated perception-decision-execution intelligent control system for a redundant rope-driven mechanical arm, characterized in that the system is implemented based on the integrated perception-decision-execution intelligent control method for a redundant rope-driven mechanical arm according to any one of claims 1-9, and comprises: a model building module comprising an environment model building sub-module and a mechanical arm model building sub-module, wherein the environment model building sub-module is configured to acquire an environment point cloud using a depth camera and process the environment point cloud to obtain a three-dimensional environment model, the three-dimensional environment model comprising a plurality of obstacle category models; an expert trajectory generation module configured to generate a plurality of groups of collision-free expert trajectory sequences based on the simplified rigid-link model and the three-dimensional environment model; a motion control model training module configured to train a motion control model based on an improved reinforcement learning algorithm using a three-stage learning strategy, wherein the first stage trains a first motion control model based on the reinforcement learning algorithm in an obstacle-free static-target environment, the second stage trains a second motion control model based on the improved reinforcement learning algorithm in an obstacle-free dynamic-target environment, the second motion control model taking the weight parameters of the first motion control model obtained in the first stage as initialization parameters, and the third stage trains a third motion control model based on the improved reinforcement learning algorithm in an environment with random obstacles and a dynamic target, the third motion control model taking the weight parameters of the second motion control model obtained in the second stage as initialization parameters; and an intelligent motion control module configured to acquire the real-time pose of the mechanical arm, the target position, and the obstacle category, determine the corresponding expert experience according to the obstacle category, input the real-time pose, the target position, and the expert experience into the trained third motion control model, and realize real-time motion control of the mechanical arm through the output control signals.
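Claim 2 lists voxel-grid downsampling and PCA-based shape analysis among the point cloud processing steps. The sketch below illustrates just those two steps with NumPy, under assumptions of its own: the function names `voxel_downsample` and `pca_extent` are hypothetical, each voxel is reduced to the centroid of its points, and object size is taken as the extent of the points along the principal axes. Outlier removal, clustering, and multi-frame reconstruction are not shown.

```python
import numpy as np


def voxel_downsample(points, voxel_size):
    """Replace all points that fall into the same voxel by their centroid."""
    points = np.asarray(points, dtype=float)
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # guard against shape differences across NumPy versions
    n_voxels = int(inverse.max()) + 1
    sums = np.zeros((n_voxels, 3))
    np.add.at(sums, inverse, points)  # accumulate the points of each voxel
    counts = np.bincount(inverse, minlength=n_voxels)
    return sums / counts[:, None]


def pca_extent(points):
    """Return the centroid, principal axes (as rows), and extent along each axis."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = centered.T @ centered / max(len(points) - 1, 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # largest variance first
    axes = eigvecs[:, order].T
    projected = centered @ axes.T
    extent = projected.max(axis=0) - projected.min(axis=0)
    return centroid, axes, extent
```

For a segment shaped like an axis-aligned 4 x 2 x 1 box, `pca_extent` returns an extent close to `[4, 2, 1]`, which is the kind of size estimate the claim attaches to each segmented object.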
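Claim 5 generates expert trajectories with an artificial potential field whose virtual force on the end effector is mapped to joint velocities through the Jacobian. The following sketch shows one common way to do this and is not taken from the patent: it assumes a planar serial arm with revolute joints instead of the universal-joint arm of claim 4, point obstacles, the classic attractive and repulsive potential terms, and the Jacobian-transpose mapping. All function names, gains (`k_att`, `k_rep`, `gain`), and the influence radius `rho0` are hypothetical.

```python
import numpy as np


def forward_kinematics(q, link_lengths):
    """End-effector position of a planar serial arm with relative joint angles q."""
    angles = np.cumsum(q)
    return np.array([np.sum(link_lengths * np.cos(angles)),
                     np.sum(link_lengths * np.sin(angles))])


def jacobian(q, link_lengths):
    """2 x n Jacobian of the end-effector position with respect to q."""
    angles = np.cumsum(q)
    J = np.zeros((2, len(q)))
    for i in range(len(q)):
        J[0, i] = -np.sum(link_lengths[i:] * np.sin(angles[i:]))
        J[1, i] = np.sum(link_lengths[i:] * np.cos(angles[i:]))
    return J


def virtual_force(p, target, obstacles, k_att=1.0, k_rep=0.05, rho0=0.5):
    """Attractive pull toward the target plus repulsion from nearby point obstacles."""
    p = np.asarray(p, dtype=float)
    force = k_att * (np.asarray(target, dtype=float) - p)
    for obs in obstacles:
        diff = p - np.asarray(obs, dtype=float)
        rho = np.linalg.norm(diff)
        if 1e-9 < rho < rho0:  # repulsion acts only inside the influence radius
            force += k_rep * (1.0 / rho - 1.0 / rho0) / rho**2 * (diff / rho)
    return force


def apf_step(q, target, obstacles, link_lengths, gain=0.5, dt=0.05):
    """One integration step: joint velocity = gain * J^T * virtual force (assumed mapping)."""
    p = forward_kinematics(q, link_lengths)
    q_dot = gain * jacobian(q, link_lengths).T @ virtual_force(p, target, obstacles)
    return q + dt * q_dot
```

Repeating `apf_step` from a start configuration and recording the visited joint angles yields a trajectory that could serve as expert experience. The Jacobian transpose is used here because it avoids a matrix inverse; a pseudo-inverse mapping would be an equally plausible reading of the claim.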
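Claims 6 and 7 describe a reward shaping term driven by the minimum Euclidean distance to the expert states, a combined reward, and a soft update of the target network, but the formulas themselves are missing from this text. The sketch below therefore rests on stated assumptions rather than on the patent's expressions: the shaping term is taken to be `exp(-k * d_min)`, the combined reward is taken to be the environment reward plus `beta` times the shaping term, and the soft update is the usual Polyak averaging with rate `tau`. The names `k`, `beta`, `tau`, and all function names are hypothetical.

```python
import numpy as np


def min_expert_distance(state, expert_states):
    """Minimum Euclidean distance from `state` to any row of `expert_states`."""
    diffs = np.asarray(expert_states, dtype=float) - np.asarray(state, dtype=float)
    return float(np.min(np.linalg.norm(diffs, axis=1)))


def shaping_reward(state, expert_states, k=1.0):
    """ASSUMED form exp(-k * d_min): equals 1 on an expert state and decays with distance."""
    return float(np.exp(-k * min_expert_distance(state, expert_states)))


def combined_reward(r_env, state, expert_states, beta=0.5, k=1.0):
    """ASSUMED combination: environment reward plus beta times the shaping term."""
    return r_env + beta * shaping_reward(state, expert_states, k)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return {name: (1.0 - tau) * target_params[name] + tau * online_params[name]
            for name in target_params}
```

In a training loop of the kind claim 7 outlines, `combined_reward` would be evaluated for each sampled state before the target Q value is built, and `soft_update` would be applied to the target network parameters after each gradient step.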

Description

Redundant rope-driven mechanical arm perception decision execution intelligent control method and system

Technical Field

The invention relates to the technical field of robot motion planning, and in particular to an integrated perception-decision-execution intelligent control method and system for a redundant rope-driven mechanical arm.

Background

The drawbacks of relying on manual labor for complex operation tasks are particularly pronounced in outdoor scenes with complex structure. Because such scenes are characterized by severe occlusion, multiple targets, and irregular distribution, they place higher requirements on the robot's safe reachability, operational flexibility, and multi-physics coupling capability in complex environments. On the perception side, multimodal sensors are mature: they can output color and depth images and generate three-dimensional point clouds, which lets a robot build a three-dimensional representation of a complex environment, localize targets, and detect obstacles, and thus supports robot localization and basic obstacle avoidance. However, in structured environments that contain branches and leaves, cables, brackets, shelves, pipelines, equipment corners, human bodies, and similar objects with irregular shapes, large scale differences, or complex occlusion relations, coarse-grained point clouds or simple geometric approximations alone make it difficult to analyze in fine detail the spatial relations between key obstacles and operation objects, or the corresponding risk and operability constraints.
On the decision and control side, traditional kinematics, static path planning, and local obstacle avoidance methods mostly assume regular environment geometry, rigid mechanical arms, and accurate models. For redundant flexible mechanisms such as rope-driven continuum mechanical arms, nonlinear strong coupling, complex deformation, and high state dimensionality make planning and control based on accurate models computationally expensive and highly sensitive to modeling errors and noise, so real-time performance and robustness are difficult to achieve together. Deep reinforcement learning offers a new approach to controlling flexible redundant mechanical arms by learning complex policies through interaction in a high-dimensional state-action space. However, it generally starts from a random policy and relies on a large number of trial-and-error samples, so sample efficiency is low, training cost is high, and training on a real robot carries safety and wear risks; introducing expert demonstrations or prior trajectories can improve convergence speed and stability to some extent. Existing research, however, mostly remains limited to simplified environments or rigid mechanical arm platforms, and no unified perception, decision, and control framework has been formed that targets complex redundant mechanisms such as rope-driven continuum arms and fully exploits high-quality environment perception information. Therefore, how to achieve, in an unstructured environment, an integrated perception-decision-execution method that combines active perception, path planning, and obstacle-avoidance motion through software-hardware co-design, and thereby achieve efficient, safe, and robust operation in complex outdoor dynamic environments, is a key problem that the prior art urgently needs to solve.
Disclosure of Invention

To solve the above technical problems, the invention provides an integrated perception-decision-execution intelligent control method and system for a redundant rope-driven mechanical arm. According to one aspect of the invention, an integrated perception-decision-execution intelligent control method for a redundant rope-driven mechanical arm is provided, comprising the following steps: acquiring an environment point cloud using a depth camera, and processing the environment point cloud to obtain a three-dimensional environment model, wherein the three-dimensional environment model comprises a plurality of obstacle category models; performing parameterized three-dimensional modeling of the mechanical arm, and establishing an equivalent simplified rigid-link model; generating a plurality of groups of collision-free expert trajectory sequences based on the simplified rigid-link model and the three-dimensional environment model; training a motion control model based on an improved reinforcement learning algorithm using a three-stage learning strategy, wherein the first stage trains a first motion control model based on the reinforcement learning algorithm in an obstacle-free static-target environment, and the second stage trains a second motion control model based on the improved reinforcement learning algorithm in an obstacle-free dynamic