CN-121999042-A - Six-degree-of-freedom pose estimation method based on adaptive weighted feature fusion, electronic device, and computer program product
Abstract
Embodiments of this application relate to a six-degree-of-freedom (6DoF) pose estimation method based on adaptive weighted feature fusion, an electronic device, and a computer program product, belonging to the technical field of image processing. The method comprises the following steps: generating a three-dimensional point cloud of the target object from a depth image; extracting key points of the point cloud with a farthest point sampling algorithm; extracting the corresponding geometric feature vectors through a re-parameterizable residual PointNet network; projecting the key points onto the RGB image plane to obtain color features; feeding the features into an adaptive weighted fusion network, which computes, from the geometric and color features, a weight coefficient characterizing the reliability of the color information; weighting the color features by this coefficient and concatenating them with the geometric features; and finally predicting the 6DoF pose of the object from the fused features. Embodiments of this application improve the accuracy of 6DoF pose estimation in complex scenes such as occlusion and illumination change, and improve the robustness of the system.
Inventors
- GUO HONGXUAN
- ZHAO CHUANHAO
- PENG SHULI
- LIANG YING
- ZHOU JIANGLONG
- XIANG YONGHONG
Assignees
- 航天长屏科技有限公司
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-07
Claims (10)
- 1. A six-degree-of-freedom pose estimation method based on adaptive weighted feature fusion, characterized by comprising the following steps: generating a three-dimensional point cloud of a target object based on a depth image, selecting one point from the three-dimensional point cloud as an initial point by a farthest point sampling algorithm, and calculating the distance from each point in the three-dimensional point cloud to the initial point to generate a multi-dimensional distance array; selecting the maximum point in the multi-dimensional distance array, and iteratively calculating the distance from each point to the maximum point until the number of sampling points reaches a preset requirement, thereby generating three-dimensional point cloud key points; processing the three-dimensional point cloud key points through a re-parameterizable residual PointNet network to extract higher-dimensional geometric features of the three-dimensional point cloud key points, obtaining a geometric feature vector corresponding to each three-dimensional point cloud key point; projecting the three-dimensional point cloud key points onto an RGB image plane using camera parameters to obtain the corresponding color features, and fusing the geometric features and the color features of the three-dimensional point cloud key points through an adaptive weighted fusion network, wherein the adaptive weighted fusion network calculates, from the geometric features and the color features, a weight coefficient characterizing the reliability of the color information, weights the color features by the weight coefficient and then concatenates them with the geometric features, and predicts the six-degree-of-freedom pose of the object from the fused features.
- 2. The six-degree-of-freedom pose estimation method based on adaptive weighted feature fusion according to claim 1, wherein selecting the maximum point in the multi-dimensional distance array and iteratively calculating the distance from each point to the maximum point until the number of sampling points reaches the preset requirement, thereby generating the three-dimensional point cloud key points, further comprises: in each iteration, calculating the minimum distance from each point in the three-dimensional point cloud to all points in the sampling point set, and updating the multi-dimensional distance array; selecting the point corresponding to the current maximum value in the multi-dimensional distance array as a new sampling point and adding it to the sampling point set; and repeating the iterative process until the size of the sampling point set reaches the preset requirement, the sampling point set being the generated three-dimensional point cloud key points.
- 3. The six-degree-of-freedom pose estimation method based on adaptive weighted feature fusion according to claim 2, wherein the farthest point sampling algorithm takes an arbitrary point of the three-dimensional point cloud as the initial point, and uses the Euclidean distance in three-dimensional space to calculate the distance between the initial point and the other points.
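The farthest point sampling loop described in claims 1–3 (seed point, min-distance array, repeated selection of the current farthest point) can be sketched in a few lines. This is a minimal NumPy illustration; the function and variable names are the editor's own, not taken from the patent:

```python
import numpy as np

def farthest_point_sampling(points, n_samples, seed_idx=0):
    """Select n_samples key points from an (N, 3) point cloud via
    farthest point sampling with Euclidean distances (claims 1-3)."""
    selected = [seed_idx]
    # Multi-dimensional distance array: min distance from every point
    # to the current sampling point set.
    dist = np.linalg.norm(points - points[seed_idx], axis=1)
    while len(selected) < n_samples:
        idx = int(np.argmax(dist))                    # current farthest point
        selected.append(idx)
        new_d = np.linalg.norm(points - points[idx], axis=1)
        dist = np.minimum(dist, new_d)                # update min-distance array
    return np.asarray(selected)
```

The `np.minimum` update is what claim 2 describes: after each new sampling point is added, every point's entry in the distance array is replaced by its minimum distance to the enlarged sampling set.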
- 4. The six-degree-of-freedom pose estimation method based on adaptive weighted feature fusion according to claim 1, wherein processing the three-dimensional point cloud key points through the re-parameterizable residual PointNet network, extracting the higher-dimensional geometric features of the three-dimensional point cloud key points, and obtaining the geometric feature vector corresponding to each three-dimensional point cloud key point further comprises: processing the coordinate data of the three-dimensional point cloud key points by a re-parameterizable residual sub-module, wherein in the training stage the re-parameterizable residual sub-module performs forward propagation and gradient back-propagation through a parallel branch structure comprising residual connections so as to update the network parameters; and in the inference stage, fusing the weights of the convolution layers in the parallel branches with the parameters of the batch normalization layers to generate an equivalent single-path convolution kernel, and transforming the input features with the equivalent single-path convolution kernel to obtain the geometric feature vector corresponding to each three-dimensional point cloud key point.
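The inference-time re-parameterization in claim 4 (folding batch normalization into the convolution weights and absorbing an identity shortcut into one equivalent single-path kernel) can be illustrated for a pointwise convolution, which is what a PointNet per-point MLP reduces to. This is a hedged sketch; the patent does not specify the exact branch topology, and the square-weight assumption for the identity branch is the editor's:

```python
import numpy as np

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm layer (gamma, beta, running mean/var) into a
    pointwise convolution with weight w: (out, in) and bias b."""
    scale = gamma / np.sqrt(var + eps)
    return w * scale[:, None], (b - mean) * scale + beta

def reparam_residual(w, b, gamma, beta, mean, var):
    """Merge a conv+BN branch and an identity shortcut into one
    equivalent single-path kernel (requires out dim == in dim)."""
    w_f, b_f = fuse_conv_bn(w, b, gamma, beta, mean, var)
    return w_f + np.eye(w.shape[0]), b_f
```

At inference, `w_eq @ x + b_eq` reproduces `BN(conv(x)) + x` exactly, so the parallel training-time branches collapse into a single matrix multiply per key point.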
- 5. The six-degree-of-freedom pose estimation method based on adaptive weighted feature fusion according to claim 4, wherein in the process of processing the three-dimensional point cloud key points through the re-parameterizable residual PointNet network, the coordinates of the three-dimensional point cloud key points are input into the re-parameterizable residual sub-module, and the output features of the re-parameterizable residual sub-module are up-dimensioned through two channel-expansion convolution layers; the up-dimensioned features are aggregated by a global average pooling operation, and the features before aggregation and after aggregation are added through a residual connection; and deep geometric information characterizing each three-dimensional point cloud key point is calculated from the summed result, and the corresponding geometric feature vector is output.
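The head described in claim 5 (two channel-expansion layers, global average pooling, and a residual addition of the pooled feature onto the per-point features) might look like the following per-cloud sketch. Layer widths, the ReLU activations, and the function name are illustrative assumptions:

```python
import numpy as np

def pointnet_head(feats, W1, W2):
    """feats: (N, D) per-key-point features from the residual sub-module.
    Applies two channel-expansion layers, aggregates with global average
    pooling, and adds the pre-aggregation and aggregated features."""
    h = np.maximum(feats @ W1.T, 0.0)     # first channel expansion
    h = np.maximum(h @ W2.T, 0.0)         # second channel expansion
    g = h.mean(axis=0, keepdims=True)     # global average pooling over points
    return h + g                          # residual: per-point + global feature
```

Broadcasting the pooled `(1, D2)` global feature back onto every point is one common way to realize the "add features before and after aggregation" step; each row of the result is a geometric feature vector for one key point.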
- 6. The six-degree-of-freedom pose estimation method based on adaptive weighted feature fusion according to claim 1, wherein projecting the three-dimensional point cloud key points onto the RGB image plane using camera parameters to obtain the corresponding color features, fusing the geometric features and the color features of the three-dimensional point cloud key points through the adaptive weighted fusion network, calculating by the adaptive weighted fusion network a weight coefficient characterizing the reliability of the color information from the geometric features and the color features, weighting the color features by the weight coefficient and then concatenating them with the geometric features, and predicting the six-degree-of-freedom pose of the object from the fused features, further comprises: projecting the three-dimensional coordinates of the three-dimensional point cloud key points onto the two-dimensional RGB image plane using the camera intrinsic matrix to obtain the two-dimensional pixel coordinates corresponding to each three-dimensional point cloud key point, and obtaining the color feature vector corresponding to each three-dimensional point cloud key point by bilinear interpolation sampling, based on the two-dimensional pixel coordinates, from a global feature map extracted from the RGB image; generating a fused intermediate feature from the color feature vector and the geometric feature vector, processing the fused intermediate feature through a further fully connected layer and a normalization function, and calculating a scalar weight coefficient characterizing the reliability of the color information of the point; and calculating, by a pose regression network, the six-degree-of-freedom pose parameters characterizing the position and rotation of the object in three-dimensional space based on the point-wise fused feature vectors of all key points.
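The projection and sampling step of claim 6 can be sketched with a standard pinhole intrinsic matrix and bilinear interpolation over the feature map extracted from the RGB image. A minimal illustration under the usual pinhole-camera assumptions (no distortion, points already in the camera frame, pixel coordinates safely inside the map); names are hypothetical:

```python
import numpy as np

def project_points(pts, K):
    """Project (N, 3) camera-frame points to (N, 2) pixel coordinates
    using the camera intrinsic matrix K (3x3)."""
    uvw = pts @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def bilinear_sample(feat, uv):
    """Sample an (H, W, C) feature map at float pixel coords uv: (N, 2),
    yielding one color feature vector per key point."""
    u, v = uv[:, 0], uv[:, 1]
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    u1, v1 = u0 + 1, v0 + 1
    du, dv = u - u0, v - v0
    return (feat[v0, u0] * ((1 - du) * (1 - dv))[:, None]
            + feat[v0, u1] * (du * (1 - dv))[:, None]
            + feat[v1, u0] * ((1 - du) * dv)[:, None]
            + feat[v1, u1] * (du * dv)[:, None])
```

Bilinear interpolation matters here because projected key points rarely land on integer pixel centers; nearest-neighbor lookup would discard sub-pixel accuracy.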
- 7. The six-degree-of-freedom pose estimation method based on adaptive weighted feature fusion according to claim 6, wherein generating the fused intermediate feature from the color feature vector and the geometric feature vector is implemented by performing point-wise fusion of the color feature vector and the geometric feature vector through the adaptive weighted fusion network, the two vectors being added after transformation through a fully connected layer followed by an activation function.
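One plausible reading of the fusion in claims 6–7 is: a fully connected layer with an activation produces the fused intermediate feature, a second layer with a sigmoid normalization produces the scalar color-reliability weight, and the weighted color feature is concatenated with the geometric feature. The sketch below follows that reading; the shapes, the ReLU/sigmoid choices, and all names are assumptions, not taken from the patent:

```python
import numpy as np

def adaptive_fuse(geo, col, W1, b1, w2, b2):
    """geo: (N, Dg), col: (N, Dc) per-key-point features.
    Returns the (N, Dg + Dc) fused features and the (N,) per-point
    color-reliability weights in (0, 1)."""
    x = np.concatenate([geo, col], axis=1)
    h = np.maximum(x @ W1.T + b1, 0.0)              # fused intermediate feature
    w = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))        # sigmoid -> scalar weight
    return np.concatenate([geo, col * w[:, None]], axis=1), w
```

With this formulation, a weight near zero suppresses the color channel for points whose appearance is unreliable (e.g. occluded or poorly lit), while geometry always passes through unweighted; the concatenated vectors then feed the pose regression network of claim 6.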
- 8. A six-degree-of-freedom pose estimation system based on adaptive weighted feature fusion for implementing the six-degree-of-freedom pose estimation method based on adaptive weighted feature fusion according to any one of claims 1 to 7, comprising: a multi-dimensional distance data generation module, configured to generate a three-dimensional point cloud of a target object based on a depth image, select one point from the three-dimensional point cloud as an initial point by a farthest point sampling algorithm, and calculate the distance from each point in the three-dimensional point cloud to the initial point to generate a multi-dimensional distance array; a three-dimensional point cloud key point generation module, configured to select the maximum point in the multi-dimensional distance array and iteratively calculate the distance from each point to the maximum point until the number of sampling points reaches a preset requirement, thereby generating three-dimensional point cloud key points; a geometric feature vector calculation module, configured to process the three-dimensional point cloud key points through a re-parameterizable residual PointNet network, extract the higher-dimensional geometric features of the three-dimensional point cloud key points, and obtain the geometric feature vector corresponding to each three-dimensional point cloud key point; and a six-degree-of-freedom pose prediction module, configured to project the three-dimensional point cloud key points onto an RGB image plane using camera parameters, obtain the corresponding color features, and fuse the geometric features and the color features of the three-dimensional point cloud key points through an adaptive weighted fusion network, wherein the adaptive weighted fusion network calculates, from the geometric features and the color features, a weight coefficient characterizing the reliability of the color information, weights the color features by the weight coefficient and then concatenates them with the geometric features, and predicts the six-degree-of-freedom pose of the object from the fused features.
- 9. An electronic device comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus, and wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 6.
- 10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 6.
Description
Technical Field
Embodiments of this application relate to a six-degree-of-freedom pose estimation method based on adaptive weighted feature fusion, an electronic device, and a computer program product, and belong to the technical field of image processing.
Background
In intelligent manufacturing environments, industrial robots are assigned various complex tasks such as welding, handling, assembly, and painting. Most of these operations are implemented through offline programming or manual teaching, so the robot exhibits a limited level of intelligence during task execution. To meet ever-increasing demands on intelligent technology, in particular in the critical task of accurate grasping, the robot has to re-plan its working path in the face of environmental changes. In industrial production, in order to complete loading, sorting, and assembly, the robot must be able to accurately grasp workpieces; this often requires the workpieces to be arranged in a particular order or sparsely and positioned by a 2D vision system until the grasping task is completed. If the workpieces are scattered or piled up, the positioning difficulty increases greatly, making it hard for the robot to grasp them accurately through programming or offline training. Therefore, developing the capability of industrial robots to identify and grasp scattered workpieces is of great significance for improving production efficiency, simplifying processing procedures, and enhancing the flexibility and autonomy of the system.
In assembly operations under visual control, target detection is generally limited to two-dimensional space, with the position and orientation of the target determined through in-plane translation and rotation; because occlusion is uncommon in the traditional assembly process, pose recognition can be performed effectively with conventional point and line feature matching. However, as industrial scenes grow more complex, object pose recognition from a single perspective can no longer meet operational requirements, and in environments with occlusions or stacked objects conventional methods often perform poorly. In recent years, deep learning has developed rapidly; it surpasses conventional methods in computational efficiency, resistance to sensor noise, and adaptability to complex environments, and has therefore become the main direction for object 6DoF pose estimation. With the rapid advance of industrial automation in China, demand is growing for grasping technology that works in cluttered and partially occluded environments; intelligent grasping shows broad application prospects in intelligent logistics, manufacturing assembly, and high-risk operation scenarios, while also posing great challenges for visual recognition and positioning technology. Research and development of high-precision, highly robust, and fast image recognition and positioning algorithms for accurately estimating object pose has become a focus of the automated manufacturing field. By establishing a robust visual recognition and positioning system, the efficiency of robots performing highly repetitive and complex tasks such as welding, assembly, and painting can be improved.
6DoF pose estimation concerns the position and rotation of an object in three-dimensional space. It is critical for accurately judging the spatial position and orientation of an object and determining its pose in the camera coordinate system, and is of great significance for promoting industrial automation and alleviating labor shortages. From the above description of the prior art, it can be seen that the challenges of pose estimation in intelligent industry are extensive and complex, and they directly affect algorithm performance and the feasibility of practical application. 6DoF pose estimation aims to determine the position and orientation of an object in three-dimensional space, which is important for applications such as automated production, robot operation, and quality inspection. In a practical industrial environment, the target object may be occluded by or overlap with other objects, making it harder to accurately estimate its complete pose. Particularly when multiple similar objects are closely arranged, it is especially difficult to identify the specific position and orientation of a single object. The objects involved in industrial applications