CN-115719377-B - Six-degree-of-freedom pose estimation dataset automatic acquisition system
Abstract
An embodiment of the invention discloses an automatic acquisition system for a six-degree-of-freedom pose estimation data set, comprising a data acquisition platform and a data processing device. The data acquisition platform uses a depth camera to shoot an RGBD image sequence of a target scene. A data labeling software algorithm installed on the data processing device processes the RGBD image sequence of the target scene, performs three-dimensional reconstruction of the target objects to obtain three-dimensional models, and, based on those models, automatically labels the segmentation mask information and six-degree-of-freedom pose information of the objects in the target scene. The system can automatically label the pose information, segmentation mask information, and three-dimensional model information of scene objects in an RGB-D image sequence captured by a depth camera, and the resulting data set can be used to train and test deep-learning-based robot grasping neural network models.
Inventors
- CHEN PENG
- SUN HANXIANG
- BAO BEIYUAN
- CHEN HAIYONG
- LIU WEIPENG
Assignees
- Hebei University of Technology (河北工业大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2022-11-24
Claims (5)
- 1. An automatic acquisition system for a six-degree-of-freedom pose estimation data set, comprising a data acquisition platform and a data processing device, characterized in that the data acquisition platform uses a depth camera to shoot an RGBD image sequence of a target scene; the data processing device is provided with a data labeling software algorithm for processing the RGBD image sequence of the target scene, performing three-dimensional reconstruction of the target objects to obtain three-dimensional models, and automatically labeling the segmentation mask information and six-degree-of-freedom pose information of the objects in the target scene based on the three-dimensional models; the data processing device processes the RGBD image sequence of the target scene with the data labeling software algorithm in the following steps: Step 1, point cloud alignment: find, in the kth frame image of the RGBD image sequence, the three-dimensional coordinates of the 4 corner points of each of the m ArUco markers, namely X_k = {x_ki | i = 1, 2, …, 4m}, and determine the correspondence of X_k between frames; from the X_k of each frame image and its correspondence with the other frames, compute the transformation matrix T_1k between the kth frame point cloud and the first frame point cloud with a point cloud global registration algorithm; if the image sequence has n frames in total, this yields the point cloud transformation set T_1 = {T_1k | k = 1, 2, …, n}, where the kth frame point cloud is aligned with the 1st frame point cloud after the T_1k transformation; Step 2, three-dimensional reconstruction: fit a plane to X_k with a random sample consensus algorithm and filter out the points near that plane in each frame point cloud; transform the kth frame point cloud by T_1k from the set T_1, stitch the transformed point clouds into a complete point cloud of the objects, remove noise by point cloud smoothing to obtain a reconstructed point cloud M, and segment the reconstructed point cloud M with a Euclidean cluster segmentation algorithm into per-object point clouds O = {o_i | i = 1, 2, …, c}; Step 3, surface triangulation: compute the normal vectors of each object point cloud, unify the normal vector directions of each object point cloud with a normal propagation method based on a tree-layered Riemannian graph, and triangulate the surfaces to obtain the object three-dimensional models m_i, i = 1, 2, …, c; Step 4, segmentation information labeling: project the triangular faces of each object three-dimensional model m_i, i = 1, 2, …, c, one by one onto the camera plane to obtain the segmentation mask sequence of the object; Step 5, pose information labeling: compute the oriented bounding box of each object three-dimensional model m_i and the pose transformation matrix T_mi of that bounding box, and compute the six-degree-of-freedom pose of the object model m_i in the kth frame point cloud from the pose transformation relation T_1k between the first frame point cloud and the kth frame point cloud.
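The frame-to-frame alignment of Step 1 fits a rigid transform from the matched ArUco corner coordinates. A minimal sketch using the standard Kabsch/Umeyama least-squares fit (an illustration of the idea, not the patent's specific global registration algorithm; the function name is ours):

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rigid transform mapping src points onto dst points.

    src, dst: (N, 3) arrays of matched 3-D points, e.g. the ArUco
    corner sets X_k and X_1 of two frames (N = 4m).
    Returns a 4x4 homogeneous matrix T with dst ~= R @ src + t.
    """
    src_c = src - src.mean(axis=0)          # center both point sets
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T
```

A transform fitted this way from the kth frame's corners to the first frame's corners plays the role of T_1k in the set T_1 of the claim.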
- 2. The automatic acquisition system of claim 1, wherein the data acquisition platform comprises an electric turntable and a mechanical arm; a target object is placed on the electric turntable, which receives a control instruction issued by an upper computer and rotates accordingly; the depth camera is mounted at the end of the mechanical arm and shoots RGBD images of the target object from all angles to obtain the RGBD image sequence.
- 3. The automatic acquisition system of claim 2, wherein the electric turntable receives a control command issued by the upper computer through the RS-232 communication module, and the control command comprises a rotation angle and a rotation speed of the electric turntable.
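Claim 3 specifies only that the command carries a rotation angle and speed, not a wire format. As one hedged illustration, a host-side helper might frame such a command as an ASCII line for the RS-232 link (the `ROT` verb, field order, and line terminator below are assumptions of ours, not the patent's protocol):

```python
def make_rotate_command(angle_deg: float, speed_dps: float) -> bytes:
    """Frame a hypothetical turntable rotate command for an RS-232 link.

    The 'ROT,<angle>,<speed>\r\n' layout is an illustrative convention
    only; the patent does not disclose the actual command format.
    """
    if not 0.0 <= angle_deg <= 360.0:
        raise ValueError("angle out of range")
    if speed_dps <= 0.0:
        raise ValueError("speed must be positive")
    return f"ROT,{angle_deg:.1f},{speed_dps:.1f}\r\n".encode("ascii")
```

The resulting bytes would then be written to the serial port by the upper computer (e.g. with pyserial's `Serial.write`).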
- 4. The automatic acquisition system of claim 2, wherein the electric turntable comprises a stepper motor, a belt, an acrylic plate, a base plate, and a circular slide rail; the stepper motor is mounted on the base plate, the belt is mounted above the circular slide rail, the circular slide rail is mounted above the base plate by two studs, the acrylic plate is mounted above the belt by two studs, and the acrylic plate bears an ArUco marker.
- 5. The automatic acquisition system of claim 1, wherein in the surface triangulation step, the normal vectors of each object point cloud are unified by the normal propagation method based on a tree-layered Riemannian graph, specifically: (1) Estimate normals of the point cloud by principal component analysis with a larger neighborhood radius to obtain coarse normal vectors; (2) Select the point with the minimum curvature as the root node of a minimum spanning tree and mark it; (3) Compute the distances from all unmarked points to the tree, and find the point p_i nearest to the tree and the tree node nearest to p_i; (4) Add p_i to the tree as a new node connected to its nearest tree node, generating a new edge, and mark p_i; (5) Repeat steps (3) and (4) until every point of the point cloud has been added to the minimum spanning tree; (6) Traverse all edges of the minimum spanning tree and compute the angle between the normal vectors of the two nodes joined by each edge; if the angle exceeds 90 degrees, invert the normal vector of the child node, yielding coarse normal vectors of consistent direction; (7) Estimate normals of the point cloud by principal component analysis with a smaller neighborhood radius to obtain precise normal vectors; (8) Compute the angle between the precise and coarse normal vectors of each point, and invert the precise normal vector if the angle exceeds 90 degrees, finally obtaining precise normal vectors of consistent direction.
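The propagation in steps (2)-(6) can be sketched for a small point cloud with a dense Prim-style minimum spanning tree over Euclidean distances (a simplified illustration assuming normals are already PCA-estimated; it propagates from a fixed root rather than selecting the minimum-curvature point, and practical implementations restrict edges to a k-nearest-neighbor Riemannian graph):

```python
import numpy as np

def orient_normals_mst(points, normals, root=0):
    """Flip normals so that MST-adjacent points agree in direction.

    points, normals: (N, 3) arrays; normals are coarse PCA estimates.
    Grows a minimum spanning tree over Euclidean distances (dense
    Prim's algorithm, acceptable for small N) and propagates the
    orientation outward from the root.
    """
    n = len(points)
    normals = normals.copy()
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[root] = True
    for _ in range(n - 1):
        # steps (3)-(4): nearest (tree node, outside point) pair
        d = np.where(in_tree[:, None] & ~in_tree[None, :], dist, np.inf)
        i, j = np.unravel_index(np.argmin(d), d.shape)
        in_tree[j] = True
        # step (6): flip if the angle to the parent normal exceeds 90 deg
        if normals[i] @ normals[j] < 0:
            normals[j] = -normals[j]
    return normals
```

Because each new node's parent is already oriented when the node joins the tree, a single pass suffices to make the coarse normals consistent.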
Description
Six-degree-of-freedom pose estimation dataset automatic acquisition system
Technical Field
The invention relates to the technical field of robotics, and in particular to an automatic acquisition system for a six-degree-of-freedom pose estimation data set, suitable for labeling the training or test data sets of deep learning network models for robot grasping.
Background
With the rapid development of science and technology and of industrial modernization, the robot industry is entering a period of new opportunity: its market share keeps growing, and large numbers of robots are deployed in production tasks such as sorting, assembly, and feeding, playing an important role across industries. Compared with traditional manual operation, robots offer higher operating accuracy, higher system stability, and a higher return on investment. With the rapid rise of artificial intelligence and the continuous iteration of intelligent hardware, computer vision has become ever more closely tied to robotics: using a camera as its "eye", a robot can acquire visual information about a scene and thereby interact with the external environment. Robot grasping is a key link in operations such as sorting, assembly, and feeding, and a key technology for automating and intelligentizing these processes. It can be subdivided into three subtasks: grasp detection, grasp planning, and grasp control.
Grasp detection is the foundation of grasp planning and grasp control. Image and depth information of the grasping scene is collected by cameras, in particular optical instruments such as depth cameras; a three-dimensional point cloud of the scene is constructed; and the position and orientation of the target object are then estimated algorithmically to form a grasp description that guides the robot. A common grasp detection approach therefore registers a three-dimensional model of the target object against the actual point cloud of the object to estimate its pose. Classical algorithms include the template-matching-based LineMOD algorithm and voting-based point-pair-feature algorithms. With the spread of artificial intelligence, more and more deep-learning-based pose estimation algorithms are applied to unordered robot grasping tasks, for example grasping methods based on a modified KeypointRCNN model or on PVNet. These methods typically require large amounts of data to train the network model to the desired pose estimation accuracy, and the richness and labeling quality of the data set directly affect the performance of the network model. Data sets dedicated to six-degree-of-freedom pose estimation, such as LineMOD, HomebrewedDB, and HOPE, do exist; however, because their labeling is done entirely or partly by hand, the work is time-consuming and labor-intensive, and large, wide-ranging labeled data sets are hard to obtain. For example, the LineMOD data set provides only about 1000 pose and segmentation mask labels for just 15 objects, the HomebrewedDB data set labels only 33 objects in 13 scenes, and the HOPE data set labels only 28 toy objects in 50 scenes.
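The segmentation mask labels in such data sets come from projecting the object's three-dimensional model through the camera intrinsics onto the image plane. A minimal pinhole-projection sketch (the intrinsic parameters fx, fy, cx, cy and the point-splatting stand-in for per-triangle rasterization are illustrative assumptions, not a specific data set's pipeline):

```python
import numpy as np

def project_points(pts_cam, fx, fy, cx, cy):
    """Project 3-D points (camera frame, z > 0) to pixel coordinates
    with a pinhole model: u = fx*x/z + cx, v = fy*y/z + cy."""
    z = pts_cam[:, 2]
    u = fx * pts_cam[:, 0] / z + cx
    v = fy * pts_cam[:, 1] / z + cy
    return np.stack([u, v], axis=1)

def rasterize_mask(pts_cam, fx, fy, cx, cy, h, w):
    """Splat projected points into a boolean mask (a crude stand-in
    for projecting and filling each model triangle)."""
    uv = np.round(project_points(pts_cam, fx, fy, cx, cy)).astype(int)
    mask = np.zeros((h, w), dtype=bool)
    ok = ((pts_cam[:, 2] > 0)
          & (uv[:, 0] >= 0) & (uv[:, 0] < w)
          & (uv[:, 1] >= 0) & (uv[:, 1] < h))
    mask[uv[ok, 1], uv[ok, 0]] = True
    return mask
```

With a dense enough model point cloud, the splatted pixels approximate the object's segmentation mask in that view.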
There is also an open-source tool for producing six-degree-of-freedom object pose estimation data sets, ObjectDataSetTools, which automates the production process: given an RGB-D image sequence, it can automatically label pose information, segmentation mask information, and object three-dimensional model information. However, the tool can only label a single object and cannot handle complex scenes or occlusion between objects, and because it performs no normal-direction unification during three-dimensional reconstruction, its reconstruction results often exhibit deformation, tearing, and similar artifacts, greatly reducing the labeling quality of the pose estimation data set.
Disclosure of Invention
In view of the technical defects mentioned in the background, the embodiments of the invention aim to automatically label six-degree-of-freedom pose data of the objects in a captured scene, laying a foundation for unordered robot grasping; the six-degree-of-freedom pose data mainly comprise the pose information of each object, its segmentation mask information in the image, and its three-dimensional model information. To this end, an embodiment of the invention provides an automatic acquisition system for a six-degree-of-freedom pose estimation data