
CN-121989252-A - Tactile point cloud completion method based on dexterous-hand topological structure and multi-level feature coding

CN 121989252 A

Abstract

A tactile point cloud completion method based on dexterous-hand topological structure and multi-level feature coding belongs to the technical field of robot control. The method aims to improve the manipulation capability and perception accuracy of a robot in non-visual scenes. A mechanical arm carries a dexterous hand to a target position; the dexterous hand closes gradually until the electronic-skin sensors record a contact signal, and tactile point cloud data of the target are collected. A tactile point cloud completion model based on dexterous-hand topological structure and multi-level feature coding is constructed, and a hierarchical loss function for the model is designed, whose core measure consists of a coarse point cloud loss and a dense point cloud loss, with the chamfer distance CD serving as the geometric similarity index that measures the spatial deviation between the predicted point cloud and the ground-truth point cloud. The invention achieves high-quality completion of complex objects, thereby improving the manipulation capability and perception accuracy of the robot in non-visual scenes.

Inventors

  • ZHANG ZHAN
  • ZHANG ZIQI
  • ZHANG MINGYUAN
  • SONG ZIMING
  • ZUO DECHENG
  • FENG WEI
  • FENG YI
  • SHU YANJUN
  • FAN LIUFENG
  • ZHAO QIRONG
  • ZHOU PEIJIE

Assignees

  • Harbin Institute of Technology (哈尔滨工业大学)

Dates

Publication Date
2026-05-08
Application Date
2026-03-30

Claims (5)

  1. A tactile point cloud completion method based on dexterous-hand topological structure and multi-level feature coding, characterized by comprising the following steps: S1, a mechanical arm carries a dexterous hand to a target position; the dexterous hand closes gradually until an electronic-skin sensor records a contact signal, and tactile point cloud data of the target are collected, comprising the pressure data of the contact points and the three-dimensional coordinate data of the contact points; a training set and a test set are constructed; S2, a tactile point cloud completion model based on dexterous-hand topological structure and multi-level feature coding is constructed, comprising a pressure feature encoding module, a local tactile attention encoding module, a topology-aware graph feature extraction module, a temporal aggregation module and a point cloud reconstruction decoding module which are connected in sequence, wherein the local tactile attention encoding module is additionally connected to the point cloud reconstruction decoding module; S3, a hierarchical loss function for the tactile point cloud completion model based on dexterous-hand topological structure and multi-level feature coding is designed, consisting of two parts, a coarse point cloud loss and a dense point cloud loss, with the chamfer distance CD adopted as the core geometric similarity index measuring the spatial deviation between the predicted point cloud and the ground-truth point cloud; and S4, the tactile point cloud completion model constructed in step S2 is trained, yielding a trained tactile point cloud completion model based on dexterous-hand topological structure and multi-level feature coding, which is used for tactile point cloud completion.
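The five-module pipeline and the skip connection described in claim 1 can be sketched as plain function composition. All names and interfaces below are illustrative stubs, not taken from the patent, which gives only the module names:

```python
# Hypothetical sketch of the claim-1 pipeline. The five callables stand
# for the five sequentially connected modules of S2; the local tactile
# attention codes are also routed to the decoder, mirroring the extra
# connection noted in claim 1.

def complete_tactile_point_cloud(pressure_frames, coords_frames,
                                 encode_pressure, attend_local,
                                 extract_graph, aggregate_time, decode):
    """pressure_frames / coords_frames: per-frame tactile data.
    Returns whatever the decoder produces (here a stub)."""
    # S2.1 + S2.2: per-frame pressure encoding and local attention coding
    local_codes = [attend_local(encode_pressure(p), c)
                   for p, c in zip(pressure_frames, coords_frames)]
    # S2.3: topology-aware graph features per frame
    global_feats = [extract_graph(code) for code in local_codes]
    # S2.4: temporal aggregation over all frames
    temporal = aggregate_time(global_feats)
    # S2.5: decoder consumes the temporal feature AND the local codes
    return decode(temporal, local_codes)
```

With trivial stand-ins (e.g. numbers for features), the data flow can be traced end to end before any real network is attached.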
  2. The tactile point cloud completion method based on dexterous-hand topological structure and multi-level feature coding according to claim 1, wherein step S1 is implemented as follows: S1.1, each finger of the dexterous hand carries three knuckles, and each knuckle is fitted with an electronic-skin array; acquisition runs for a fixed sampling time, recording the contact pressure data together with the pose information of the mechanical arm and the dexterous hand. Let the pressure data of the contact points of the i-th electronic skin at frame t be P_{t,i} ∈ R^{H×W}, containing M_{t,i} contact points, where H is the height and W the width of the electronic-skin matrix. From the physical layout of the electronic skin, the local three-dimensional coordinates of these contact points are obtained as p^{local}_{t,i} = (x_{t,i}, y_{t,i}, z_{t,i}), i = 1, …, N_s; wherein x_{t,i}, y_{t,i} and z_{t,i} respectively denote the x-axis, y-axis and z-axis coordinates of the contact points of the i-th electronic skin at frame t, and N_s is the total number of electronic skins. Let the joint angle vector of the seven joints of the mechanical arm at time t be θ_t = (θ_{t,1}, …, θ_{t,7}), with corresponding joint length vector l = (l_1, …, l_7), wherein θ_{t,k} and l_k respectively denote the angle and the length of the k-th joint. The pose of the mechanical arm tip (the dexterous-hand base coordinate system) in the global coordinate system is T^{base}_t = FK(θ_t, l), wherein FK(·) denotes the forward-kinematics function of the mechanical arm. Let the fixed mounting pose of the i-th electronic skin in the dexterous-hand base coordinate system be T^{mount}_i; its pose matrix in the global coordinate system at time t is then T_{t,i} = T^{base}_t · T^{mount}_i. S1.2, the local point cloud is transformed into the global coordinate system: p^{global}_{t,i} = T_{t,i} · p^{local}_{t,i}, wherein p^{global}_{t,i} denotes the global coordinates of the contact points of the i-th electronic skin at frame t. S1.3, over the fixed number of sampled frames T_s, the tactile arrays and the three-dimensional contact coordinates of all electronic skins are combined into the tactile point cloud P = { p^{global}_{t,i} | t = 1, …, T_s; i = 1, …, N_s }, wherein T_s is the total number of sampled frames.
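Steps S1.1–S1.3 amount to thresholding each pressure matrix for active contacts, lifting the active taxels to local 3-D coordinates via the physical taxel pitch, and chaining the arm's forward-kinematics pose with the skin's fixed mounting pose. A minimal numpy sketch, assuming a planar taxel grid (z = 0 on the pad) and homogeneous 4×4 pose matrices; variable names and the cell-pitch layout are illustrative assumptions:

```python
import numpy as np

def contacts_to_global(P, cell_size, T_arm, T_skin_mount, threshold=0.0):
    """P: (H, W) pressure matrix of one skin patch at frame t.
    cell_size: physical pitch of the taxel grid (metres), assumed uniform.
    T_arm: 4x4 pose of the dexterous-hand base in the global frame
           (the output of the arm's forward kinematics FK).
    T_skin_mount: 4x4 fixed mounting pose of this skin patch in the
                  hand base frame.
    Returns an (M, 3) array of global contact coordinates."""
    rows, cols = np.nonzero(P > threshold)            # active taxels
    # Local coordinates from the physical taxel layout (z = 0 on the pad)
    local = np.stack([cols * cell_size, rows * cell_size,
                      np.zeros_like(rows, dtype=float)], axis=1)
    T = T_arm @ T_skin_mount                          # skin pose in global frame
    homo = np.hstack([local, np.ones((local.shape[0], 1))])
    return (homo @ T.T)[:, :3]
```

Running this once per frame and per skin patch, and pooling the results over all T_s frames, yields the combined tactile point cloud of S1.3.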
  3. The tactile point cloud completion method based on dexterous-hand topological structure and multi-level feature coding according to claim 2, wherein the data ratio of the training set to the test set constructed in step S1 is 8:2.
  4. The tactile point cloud completion method based on dexterous-hand topological structure and multi-level feature coding according to claim 3, wherein step S2 is implemented as follows: S2.1, the pressure feature encoding module is constructed, comprising a first convolution layer, a second convolution layer, a batch-normalization layer and an activation layer; the contact pressure data obtained in step S1 are input into the module, which outputs the extracted local tactile texture and pressure-gradient features: F^{press}_{t,i} = ReLU(Conv(P_{t,i})); wherein F^{press}_{t,i} denotes the extracted local tactile texture and pressure-gradient features of the i-th electronic skin at frame t, ReLU is the rectified linear unit, a commonly used activation function in artificial neural networks, Conv denotes the convolution layers, and P_{t,i} is the pressure reading of the contacts of the i-th electronic skin at frame t. S2.2, the local tactile attention encoding module is constructed, comprising a first linear transformation layer, a second point-attention layer, a third linear transformation layer and a fourth max-pooling layer; the three-dimensional contact coordinates obtained in step S1 and the local tactile texture and pressure-gradient features obtained in step S2.1 are input, yielding the local tactile attention codes: F^{att}_{t,i} = Attn(p^{global}_{t,i}, F^{press}_{t,i}; W_{att}); wherein F^{att}_{t,i} denotes the local tactile attention code of the i-th electronic skin at frame t, Attn is the attention layer, and W_{att} are the model parameters of the attention layer. S2.3, the topology-aware graph feature extraction module is constructed, comprising a first graph convolution layer, a second graph convolution layer, a third max-pooling layer and a fourth linear transformation layer; the local tactile attention codes obtained in step S2.2 are input, and features are propagated and updated between nodes according to the topological prior by a graph neural network, giving the feature of the i-th electronic skin at frame t: F^{graph}_{t,i} = GNN(F^{att}_{t,i}, G; W_{gnn}); wherein GNN is the graph neural network, G is the multi-level knuckle–finger–palm topological structure, and W_{gnn} are the model parameters of the graph neural network. All electronic-skin features are then passed through a linear transformation layer to obtain the global feature of frame t: F^{global}_t = Linear({F^{graph}_{t,i}}; W_{lin}); wherein W_{lin} are the model parameters of the linear layer. S2.4, the temporal aggregation module is constructed: a long short-term memory network LSTM performs time-dimension feature aggregation on the global features obtained in step S2.3, giving the temporal aggregation feature F^{time}_t = LSTM(F^{global}_1, …, F^{global}_t; W_{lstm}); wherein F^{global}_1, …, F^{global}_t are the global features of time frames 1 to t, and W_{lstm} are the model parameters of the LSTM. S2.5, the point cloud reconstruction decoding module is constructed, comprising a first linear transformation layer, a second cross-attention layer, a third convolution layer, a fourth convolution layer and a fifth convolution layer; the temporal aggregation feature obtained in step S2.4 and the local tactile attention codes obtained in step S2.2 are input. First, a multi-layer perceptron MLP produces the coarse point cloud P^{coarse} = MLP(F^{time}; W_{mlp}), wherein W_{mlp} are the model parameters of the MLP layer. Then, a cross-attention module takes each point of the coarse point cloud as a query and the local tactile attention code of each electronic skin as key and value, computes the attention scores between the query points and the features, and forms a weighted sum of the features, yielding the weighted tactile context TC corresponding to each coarse point. Feature splicing then gives F^{cat} = Concat(P^{coarse}, TC, grid); wherein F^{cat} is the spliced feature, grid is a 2D grid, and Concat is the feature-splicing function. Finally, the dense point cloud P^{dense} is obtained.
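The cross-attention step of S2.5 (coarse points as queries, per-skin tactile codes as keys and values, weighted sum as tactile context TC) can be sketched as scaled dot-product attention. The shapes and the 1/sqrt(d) scaling are assumptions; the patent does not specify them:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tactile_cross_attention(coarse_feat, skin_feat):
    """coarse_feat: (Nc, d) one query feature per coarse point.
    skin_feat: (S, d) local tactile attention codes, used as both
    keys and values.
    Returns (Nc, d): the weighted tactile context TC per coarse point."""
    d = coarse_feat.shape[1]
    scores = coarse_feat @ skin_feat.T / np.sqrt(d)   # (Nc, S) attention scores
    return softmax(scores, axis=1) @ skin_feat        # weighted sum of values
```

Each row of the attention weights sums to 1, so every TC vector is a convex combination of the skin features; concatenating TC with the coarse points and a 2D grid then gives the spliced feature fed to the final convolution layers.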
  5. The tactile point cloud completion method based on dexterous-hand topological structure and multi-level feature coding according to claim 4, wherein step S3 is implemented as follows: S3.1, the coarse point cloud loss is set as L_{coarse} = CD(P^{coarse}, P^{gt}), wherein P^{coarse} is the coarse point cloud output by the network, P^{gt} is the ground-truth point cloud, and CD is the chamfer distance. S3.2, the dense point cloud loss is set as L_{dense} = CD(P^{dense}, P^{gt}), wherein P^{dense} is the high-resolution point cloud finally output by the network. S3.3, the hierarchical loss function of the tactile point cloud completion model based on dexterous-hand topological structure and multi-level feature coding is constructed as L = α·L_{coarse} + β·L_{dense}, wherein α and β are the adjustment coefficients of the loss function.
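The hierarchical loss of S3.1–S3.3 reduces to two chamfer-distance terms. A brute-force numpy sketch, using the symbols alpha and beta for the adjustment coefficients (their values are not given in the claims and are assumed here):

```python
import numpy as np

def chamfer_distance(A, B):
    """Symmetric chamfer distance between point sets A (N, 3) and B (M, 3):
    mean nearest-neighbour squared distance in both directions."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)   # (N, M) pairwise
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

def hierarchical_loss(coarse, dense, gt, alpha=1.0, beta=1.0):
    """L = alpha * CD(P_coarse, P_gt) + beta * CD(P_dense, P_gt),
    the two-part loss of S3.3 (coefficient values assumed)."""
    return (alpha * chamfer_distance(coarse, gt)
            + beta * chamfer_distance(dense, gt))
```

The O(N·M) pairwise computation is fine for tactile-scale point counts; large clouds would use a KD-tree or a batched GPU implementation instead.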

Description

Tactile point cloud completion method based on dexterous-hand topological structure and multi-level feature coding

Technical Field

The invention belongs to the technical field of robot control, and particularly relates to a tactile point cloud completion method based on dexterous-hand topological structure and multi-level feature coding.

Background

Point cloud completion is an important task in three-dimensional reconstruction and environment understanding; it aims to reconstruct a partially missing or sparse point cloud into a complete, richly detailed three-dimensional shape. The technology has important application value in intelligent manufacturing, robotic manipulation, autonomous driving, virtual reality and other fields. For example, in robot grasping or assembly scenarios, the sensor can acquire only the surface information of an occluded or partially contacted object, leaving the object's shape incomplete and degrading subsequent recognition, localization and manipulation accuracy. How to achieve high-fidelity three-dimensional point cloud completion under limited observation conditions has therefore become a key problem of interest to academia and industry in recent years. Existing point cloud completion methods fall into two categories: non-contact methods based on vision or lidar, and contact methods based on tactile perception. The former usually acquire sparse or dense point cloud data with a camera, structured light or a lidar sensor and infer the missing region through a deep neural network or a shape-prior model; the latter acquire multimodal tactile signals such as contact force, pressure distribution and temperature change through embedded tactile sensors or electronic skin, and rebuild the local or overall geometric structure of an object from limited contact fragments.
The first category of technology typically collects surface point cloud data of the environment or object with monocular or multi-view vision systems, depth cameras, lidar or similar equipment, and then uses a deep learning network to infer the object's complete geometric structure. It offers good adaptability and data coverage at the macroscopic scale, but suffers from inherent problems: visual point clouds are strongly affected by illumination, viewpoint and occlusion; objects with complex shapes or transparent or reflective materials cannot be modeled accurately; and in fine structures or occluded regions the visual point cloud is extremely sparse or missing, so the completion accuracy is insufficient. Moreover, visual completion models usually reconstruct only from the spatial geometric distribution and can hardly capture contact-level physical characteristics of the object, such as softness, friction coefficient and texture morphology. By contrast, point cloud completion based on tactile sensing has attracted increasing attention in recent years. By directly touching the object surface, tactile sensors can acquire multidimensional information at local contact points, such as pressure distribution, shear force and temperature change, thereby providing surface-attribute and microstructure information that vision cannot perceive. In robotic dexterous hands and electronic-skin systems in particular, multi-point tactile array units can acquire high-resolution local surface information during dynamic contact, which opens new possibilities for reconstructing three-dimensional geometry from the contact perspective.
Tactile point cloud completion methods generally construct a tactile point cloud by spatially mapping the acquired contact points and then perform shape reasoning and completion with a deep network. Compared with visual schemes, tactile completion offers higher contact precision and surface-detail fidelity, and is especially suitable for objects that are occluded, invisible or made of special materials. However, existing tactile point cloud completion techniques still have obvious shortcomings. Most current research focuses on global feature extraction from tactile signals, modeling the distribution of the tactile point cloud at the global level with convolutional networks or transformer models. Such schemes improve the global consistency of the completion to some degree, but ignore the structural characteristics of the tactile data itself. Specifically, a tactile point cloud carries not only geometric and physical information such as position and pressure, but also strong local spatial correlations and topological constraints between the fingers. During contact by the dexterous hand, natural spatial coupling and mechanical transm