
CN-117315019-B - Mechanical arm grabbing method based on deep learning

CN 117315019 B

Abstract

The invention discloses a deep-learning-based mechanical arm grasping method. RGB and depth images of the object to be grasped are obtained with a depth camera. The depth image is scanned for invalid pixels, which are reconstructed so that the neural network can estimate a more accurate grasp pose. The RGB image is then aligned with the depth map and center-cropped to produce an RGB-D image. A pre-trained improved GGCNN network model takes the RGB-D image of the object to be grasped as input and generates the grasp pose information of the object. The grasp pose is expressed in the coordinate system of the mechanical arm base, namely a Cartesian coordinate system. Finally, the grasp information is input and the mechanical arm is controlled to carry out the grasping task. The invention addresses the poor generalization and learning ability of the original network, so that the improved neural network performs better in estimating the grasp pose of the object and is practical in the field of high-precision grasping.
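The pose-selection step summarized above reduces to an argmax over the grasp quality map, reading the angle and width maps at the winning pixel. A minimal NumPy sketch (the function and array names are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def select_grasp(quality, angle, width):
    """Pick the pixel with the highest grasp quality and read the
    corresponding values from the angle and width maps at that pixel."""
    row, col = np.unravel_index(np.argmax(quality), quality.shape)
    return (row, col), float(angle[row, col]), float(width[row, col])
```

Here each input is one of the three 300x300 maps produced by the network; the returned pixel is later converted to a Cartesian pose via hand-eye calibration.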

Inventors

  • Lai Junfeng
  • Cheng Shuangxiong
  • Song Yongduan
  • Ling Kai
  • Wang Pan
  • Wei Jia
  • Zhao Mengwen
  • Chen Yutong
  • Deng Yongjing

Assignees

  • Chongqing University (重庆大学)

Dates

Publication Date
2026-05-12
Application Date
2023-09-06

Claims (3)

  1. A mechanical arm grasping method based on deep learning, characterized by comprising the following steps:
     S1, before the mechanical arm starts to grasp, adjust its initial state, namely the working start point, so that the depth camera sits at a fixed height perpendicular to the XOY plane, and obtain an RGB image and a depth image of the object to be grasped through the depth camera;
     S2, detect the depth image to find invalid pixels and reconstruct them, while aligning the RGB image with the depth image and center-cropping the aligned result to obtain a 300×300-pixel RGB-D image;
     S3, feed the RGB-D image of the object to be grasped into the pre-trained improved GGCNN network model, which outputs three 300×300-pixel grasp maps: a pixel-level grasp quality map, grasp angle map and grasp width map;
     S4, according to the generated grasp quality map, locate the pixel with the highest grasp quality and read the corresponding grasp angle and grasp width from the grasp angle map and grasp width map, thereby generating the grasp pose information of the object to be grasped;
     S5, convert the obtained grasp pose information through hand-eye calibration into the mechanical arm base coordinate system, namely a Cartesian coordinate system;
     S6, input the grasp information and control the mechanical arm to carry out the grasping task.
     The depth map is repaired as follows: first, acquire the depth map of the object to be grasped, which may contain invalid readings; second, compute a 2D gradient of the depth map; third, threshold the 2D gradient to generate a binary mask; fourth, flood-fill the binary mask while generating a mask of all invalid points in the depth image; fifth, dilate the generated mask; sixth, pass the mask to an OpenCV image inpainting function, which extrapolates and reconstructs the invalid data from the surrounding pixels.
     The improved GGCNN network model adds an IBN module to the GGCNN network structure, together with ECA_ResNet blocks, which combine the efficient channel attention module ECANet with the residual module of a residual network, to improve the performance of the model.
  2. The deep-learning-based mechanical arm grasping method of claim 1, wherein the improved GGCNN network model comprises a convolution part that extracts shallow image features, a dilated convolution part, an ECA_ResNet part and a final output part.
     The convolution part comprises four convolution layers and two max-pooling layers: two identical convolution layers, each a 5×5 convolution module followed by an IBN module and a ReLU function, then a max-pooling layer, followed by another two convolution layers with the same parameters and a second max-pooling layer; the convolution part uses 16 filters.
     The dilated convolution part comprises two dilated convolution layers with the same structure: a 5×5 dilated convolution module with dilation = 2 followed by a ReLU activation, then a 5×5 dilated convolution module with dilation = 4 followed by another ReLU activation; the dilated convolution layers use 32 filters.
     The ECA_ResNet part consists of six ECA_ResNet blocks in series; each block comprises a 3×3 convolution module, then a BN module, then a ReLU activation, then a second 3×3 convolution module with a BN module, and finally ECANet; the ECA_ResNet blocks use 32 filters.
     The output part comprises two stages: a 3×3 convolution module followed by a ReLU activation, and then three linear-mapping convolution layers, each a 3×3 convolution module, which map the features to the grasp success rate, grasp angle and grasp width outputs respectively.
  3. The deep-learning-based mechanical arm grasping method as claimed in claim 1, wherein, when training the improved GGCNN network model, the following criteria are used to measure grasp accuracy: (1) the difference between the predicted grasp angle and the true grasp angle of the mechanical arm is less than or equal to a specified threshold of 30 degrees; (2) the intersection-over-union IoU between the predicted grasp rectangle and the true grasp rectangle of the mechanical arm is larger than 25%, with the IoU given by
     IoU = |A ∩ B| / |A ∪ B|
     where A denotes the predicted grasp rectangle, B denotes the true grasp rectangle, and the IoU is the ratio of the intersection to the union of the two rectangles.

Description

Mechanical arm grabbing method based on deep learning

Technical Field
The invention relates to a method for grasping unknown objects with a mechanical arm based on deep learning, and belongs to the field of intelligent robots.

Background
An important feature of intelligent robots is the ability to sense and interact with the environment. Among the many capabilities of a robot, grasping is the most basic and important. In industrial production, robots complete large numbers of heavy pick-and-place tasks every day, while household robots provide convenience for the elderly and the disabled, mainly through daily grasping tasks. Therefore, giving robots perception capability, and using that perception to grasp better, has long been an important research topic in robotics and machine vision. Robotic grasping systems with visual perception are typically composed of grasp detection, grasp planning and control units. To complete a grasping task, the robot first needs to detect the object to be grasped. In early grasping work, the object was mainly placed in a simple, structured scene; the grasping scheme was derived by mechanical analysis of the object's geometry and typically involved statics and kinematics constraints requiring complex calculations. With the successful application and continued development of deep learning in instance segmentation and recognition, it has been widely used in robotic grasp detection. One approach combines Convolutional Neural Networks (CNNs) with cameras to recognize the structural features of objects and evaluate candidate grasp poses; such systems allow the robot to grasp objects of various shapes, with the CNN used to train a shape detection model.
This approach generates grasp points by sensing the shape of the object, but is not effective for irregularly shaped objects. Another approach directly generates a grasping scheme, using a learning method to score grasp quality. Such a scheme is usually produced by first detecting and identifying the position of the object and then applying traditional geometric analysis to make the final grasp plan; the geometric analysis typically involves a large amount of computation. Deep learning networks can therefore be used to train grasp detection directly, obtaining the grasping scheme from the image end to end. In this method, a model is proposed that outputs the optimal grasp pose of the robot from an input image: a convolutional neural network extracts features from the scene and then predicts the grasp configuration of the object of interest. An example is the real-time grasp synthesis method for closed-loop grasping, the Generative Grasping Convolutional Neural Network (GGCNN), which predicts the quality and pose of a grasp at every pixel. However, the performance of GGCNN and its subsequently proposed upgraded network GGCNN2 on the Cornell and Jacquard grasping datasets, and in actual mechanical arm grasping, still needs to be improved.

Disclosure of the Invention
Aiming at the performance problems of GGCNN and GGCNN2 on the test sets and in actual grasping, the invention modifies the GGCNN network to provide a deep learning network for estimating the grasp pose of a manipulator, making the model more practical in the field of high-precision grasping, and provides a mechanical arm grasping method based on this network.
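The modification central to the invention is the ECA_ResNet block described in claim 2: a 3×3 convolution, BN, ReLU, a second 3×3 convolution with BN, then ECANet, wrapped around an identity shortcut. A minimal PyTorch sketch of such a block; the channel count follows claim 2's 32 filters, while the ECA kernel size and class names are assumptions:

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: per-channel weights from a 1-D conv
    over the globally pooled channel descriptor."""
    def __init__(self, k=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        # (B, C, H, W) -> (B, C, 1, 1) -> (B, 1, C) for the 1-D conv
        w = self.pool(x).squeeze(-1).transpose(-1, -2)
        w = torch.sigmoid(self.conv(w)).transpose(-1, -2).unsqueeze(-1)
        return x * w  # reweight each channel

class ECAResNetBlock(nn.Module):
    """Residual block per claim 2: 3x3 conv -> BN -> ReLU -> 3x3 conv ->
    BN -> ECANet, plus the identity shortcut."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            ECA(),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))
```

Claim 2 places six such blocks in series between the dilated convolution part and the output head; stacking them is a plain `nn.Sequential` of six instances.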
The main technical scheme adopted by the invention is a mechanical arm grasping method based on deep learning, comprising the following steps. S1, before the mechanical arm starts to grasp, adjust its initial state, namely the working start point, so that the depth camera sits at a fixed height perpendicular to the XOY plane, and obtain an RGB image and a depth image of the object to be grasped through the depth camera. S2, detect the depth image, find the invalid pixels and reconstruct the data so that the neural network can obtain a more accurate grasp pose; at the same time, align the RGB image with the depth map and center-crop the aligned result to obtain a 300×300-pixel RGB-D image. S3, feed the RGB-D image of the object to be grasped into the pre-trained improved GGCNN network model, which outputs three 300×300-pixel grasp maps: a pixel-level grasp quality map, grasp angle map and grasp width map. S4, obtain the pixel with the highest grasp quality according
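The hand-eye coordinate conversion mentioned in the abstract and claim 1 (mapping the chosen pixel, with its depth, into the arm-base Cartesian frame) is a standard back-projection. A minimal sketch, assuming a pinhole intrinsic matrix K and a calibrated 4x4 homogeneous transform T_base_cam; both names are illustrative, not from the patent:

```python
import numpy as np

def pixel_to_base(u, v, z, K, T_base_cam):
    """Back-project pixel (u, v) with depth z (metres) into the camera
    frame, then map it into the robot-base frame via the hand-eye
    calibration result T_base_cam (4x4 homogeneous transform)."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Point in the camera frame, as a homogeneous coordinate.
    p_cam = np.array([(u - cx) * z / fx, (v - cy) * z / fy, z, 1.0])
    # Rigid transform into the base (Cartesian) frame.
    return (T_base_cam @ p_cam)[:3]
```

With T_base_cam obtained from hand-eye calibration, the grasp angle from the angle map completes the planar grasp pose sent to the arm controller.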