CN-121236484-B - Multi-mode depth perception and grabbing system based on transparent object
Abstract
The invention discloses a multi-mode depth perception and grabbing system for transparent objects, which relates to the field of robot operation and comprises a multispectral perception module, a depth correction module, a grabbing gesture generation module and a control module. By combining two perception modes, an RGB-D image and a thermal infrared (TIR) image, the invention comprehensively acquires the visual information and the thermal radiation information of a transparent object, and detects the error sources of the RGB-D camera and the TIR camera through systematic error analysis of the depth map. The system adopts an encoder-decoder model as the core of depth correction: the model extracts complementary features of the RGB-D image and the TIR image through modality-exclusive encoders, and uses a feature fusion module to complete feature alignment and integration, effectively improving depth estimation accuracy on transparent surfaces. Meanwhile, hyperparameter optimization of the depth correction model is carried out by a Bayesian optimization method; through hyperparameter optimization and balanced fitting processing, the system effectively avoids over-fitting and under-fitting, further improving the robustness of the depth correction model and the precision of transparent object grabbing.
Inventors
- HUANG XIN
Assignees
- 广东镭铭洋智能装备有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20251016
Claims (9)
- 1. A multi-mode depth perception and grabbing system based on transparent objects, characterized by comprising: a multispectral perception module for capturing an RGB-D image and a TIR image of the transparent object, spatially aligning the RGB-D image and the TIR image through hand-eye calibration, and performing feature extraction on each to obtain an RGB-D feature map and a TIR feature map; a depth correction module for fine-tuning a depth correction model, processing the RGB-D feature map and the TIR feature map with an encoder-decoder, and repairing the depth map of the transparent object; wherein fine-tuning the depth correction model specifically comprises: performing Bayesian optimization on the depth correction model, randomly selecting a hyperparameter combination for training, and obtaining an initial training result on a validation set; constructing a Bayesian surrogate model that estimates the expected output of the objective function under different hyperparameter combinations together with the corresponding confidence; the Bayesian optimization, based on the surrogate model, selecting hyperparameter combinations for training through an acquisition function, obtaining an improved training result on the validation set, feeding that result back to the surrogate model, and gradually converging to a superior hyperparameter combination, which is applied to the depth correction model; the acquisition function jointly considering the expected output and the confidence of each hyperparameter combination, specifically comprising: computing the expected output of each hyperparameter combination, taking its difference from the target output of the objective function to obtain an expected difference, and coupling the expected difference of each hyperparameter combination with its confidence to obtain an expected estimate for each combination, the acquisition function selecting the hyperparameter combination with the maximum expected estimate; a grabbing gesture generation module for taking the depth map of the transparent object as input, outputting a six-degree-of-freedom grabbing pose of the transparent object, and transmitting the six-degree-of-freedom grabbing pose to the control module; and a control module for obtaining a robot grabbing path based on the six-degree-of-freedom grabbing pose, controlling a robot end effector to execute the grabbing action, and adjusting the grabbing strategy based on real-time feedback.
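The acquisition step in claim 1 can be sketched in miniature. This is a hypothetical illustration, not the patented implementation: the surrogate's per-combination `mean` and `std` values, the `kappa` trade-off weight, and the candidate dictionaries are all invented for the example. The score rewards a small expected difference from the target output while coupling in the confidence term.

```python
def acquisition_score(expected_output, target_output, confidence, kappa=1.0):
    """Couple the expected difference with the confidence (claim 1 sketch).

    `expected_output` and `confidence` (a standard deviation) would come from
    a Bayesian surrogate model; `kappa` is a hypothetical trade-off weight
    not named in the claim.
    """
    expected_diff = abs(expected_output - target_output)
    # A smaller expected difference and a higher uncertainty are both
    # rewarded, balancing exploitation against exploration.
    return -expected_diff + kappa * confidence

def select_combination(candidates, target_output):
    """Pick the hyperparameter combination with the maximum expected estimate."""
    return max(candidates,
               key=lambda c: acquisition_score(c["mean"], target_output, c["std"]))

# Three hypothetical hyperparameter combinations with surrogate predictions.
candidates = [
    {"lr": 1e-3, "mean": 0.12, "std": 0.05},
    {"lr": 1e-2, "mean": 0.30, "std": 0.20},
    {"lr": 1e-4, "mean": 0.10, "std": 0.01},
]
best = select_combination(candidates, target_output=0.0)
```

Under these toy numbers the first combination wins: its expected difference is moderate but its higher uncertainty makes it worth exploring, which is the behavior the claim's coupling of expected difference and confidence is meant to produce.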
- 2. The multi-mode depth perception and grabbing system based on transparent objects of claim 1, wherein obtaining the RGB-D feature map and the TIR feature map comprises the following steps: integrating an RGB-D camera and a TIR camera, wherein the RGB-D camera captures an RGB-D image of the transparent object comprising its color information and depth information, and the TIR camera captures a TIR image of the transparent object comprising its thermal radiation information; obtaining the relative positions of the RGB-D camera and the TIR camera through a calibration plate, aligning the coordinate systems of the two cameras by hand-eye calibration, registering the RGB-D image and the TIR image based on the hand-eye calibration result, and performing delay compensation on the TIR image during registration; performing multi-level feature extraction on the RGB-D image to obtain RGB-D features comprising the color, edge, corner and shape features of the transparent object, forming the RGB-D feature map; and extracting temperature features from the TIR image, including the surface temperature distribution features and temperature gradient features of the transparent object, forming the TIR feature map.
- 3. The multi-mode depth perception and grabbing system based on transparent objects of claim 2, wherein performing delay compensation on the TIR image during registration comprises: registering by matching feature points in the RGB-D image and the TIR image, and removing the noise and artifacts introduced by registration afterwards; matching the feature points in the RGB-D image and the TIR image using feature descriptors, removing wrong matches with a random sample consensus (RANSAC) algorithm, and estimating a transformation matrix by least squares to achieve spatial alignment of the RGB-D image and the TIR image; and during delay compensation, fetching the load parameters of the computing unit from the database and regulating the number of GPU threads, the block size and the memory bandwidth, while regulating the number of CPU threads based on the number of CPU cores and performing multithreaded processing.
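The RANSAC-plus-least-squares step of claim 3 can be sketched as follows. This is a simplified stand-in, assuming a pure translation between the two images rather than the claim's full transformation matrix; the iteration count, tolerance, and synthetic match data are illustrative choices.

```python
import random

def estimate_translation(matches):
    """Least-squares translation from (src, dst) point pairs."""
    n = len(matches)
    dx = sum(d[0] - s[0] for s, d in matches) / n
    dy = sum(d[1] - s[1] for s, d in matches) / n
    return dx, dy

def ransac_translation(matches, iters=200, tol=2.0, seed=0):
    """Remove wrong matches by random sample consensus, then refit on inliers."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        s, d = rng.choice(matches)              # minimal sample: one pair
        dx, dy = d[0] - s[0], d[1] - s[1]       # hypothesized translation
        inliers = [(a, b) for a, b in matches
                   if abs((b[0] - a[0]) - dx) < tol
                   and abs((b[1] - a[1]) - dy) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return estimate_translation(best_inliers), best_inliers

# Synthetic feature matches: true shift (+5, -3), plus one wrong match.
matches = [((x, y), (x + 5.0, y - 3.0)) for x in range(5) for y in range(5)]
matches.append(((0.0, 0.0), (40.0, 40.0)))
(dx, dy), inliers = ransac_translation(matches)
```

The wrong match is excluded from the consensus set, so the least-squares refit recovers the true shift exactly; in the claimed system the same consensus/refit pattern would apply to the full RGB-D-to-TIR transformation.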
- 4. The multi-mode depth perception and grabbing system of claim 3, wherein removing the noise and artifacts introduced during registration comprises: acquiring the image noise of the registered RGB-D image and TIR image respectively; setting the size of a sliding window, moving the sliding window stepwise over the registered RGB-D image and TIR image, computing their local noise standard deviations, and taking these local noise standard deviations as estimates of the noise intensity; and inputting the local noise standard deviations of the registered RGB-D image and TIR image into preset lookup tables in the database, the lookup tables comprising an RGB-D standard deviation lookup table and a TIR standard deviation lookup table, looking up the Gaussian filter standard deviations corresponding to the respective local noise standard deviations, and applying them in the Gaussian filter.
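The sliding-window noise estimate and lookup of claim 4 can be sketched as below. The window size, the toy image, and the entries of `SIGMA_LUT` (the claim's preset lookup table) are all hypothetical values for illustration.

```python
import statistics

def local_noise_std(image, window=3):
    """Slide a window over a 2-D intensity list, returning per-window
    standard deviations (the claim's local noise standard deviations)."""
    h, w = len(image), len(image[0])
    stds = []
    for i in range(h - window + 1):
        for j in range(w - window + 1):
            patch = [image[i + di][j + dj]
                     for di in range(window) for dj in range(window)]
            stds.append(statistics.pstdev(patch))
    return stds

# Hypothetical lookup table: noise-std upper bound -> Gaussian filter sigma.
SIGMA_LUT = [(1.0, 0.5), (5.0, 1.0), (15.0, 2.0), (float("inf"), 3.0)]

def sigma_for_noise(noise_std):
    """Map an estimated noise intensity to a Gaussian filter standard deviation."""
    for bound, sigma in SIGMA_LUT:
        if noise_std <= bound:
            return sigma

image = [[10, 12, 11, 10],
         [11, 10, 12, 11],
         [10, 11, 10, 12],
         [12, 10, 11, 10]]
estimate = max(local_noise_std(image))   # noise-intensity estimate
sigma = sigma_for_noise(estimate)        # filter strength from the table
```

In the claimed system there would be one such table per modality (RGB-D and TIR), and the looked-up sigma would parameterize the Gaussian filter applied to the registered image.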
- 5. The multi-mode depth perception and grabbing system based on transparent objects of claim 1, wherein applying the superior hyperparameter combination to the depth correction model further comprises performing balanced fitting processing on the superior hyperparameter combination, specifically: the balanced fitting processing eliminates over-fitting and under-fitting under the superior hyperparameter combination; the superior hyperparameter combination is applied to the depth correction model and the model is trained again to obtain a verification training result comprising a training loss and a validation loss; if the drop ratio of the training loss exceeds a preset drop-ratio threshold while the validation loss rises, the model is judged to be in an over-fitting state; if the drop ratios of both the training loss and the validation loss exceed the preset drop-ratio threshold, the model is judged to be in an under-fitting state; the specific process for eliminating over-fitting comprises: extracting the drop ratio of the training loss, subtracting the preset drop-ratio threshold to obtain a training-loss drop-ratio difference, and inputting this difference into a preset mapping set of training-loss drop-ratio difference to regularization-coefficient adjustment value in the database to obtain a regularization-coefficient adjustment value by mapping matching; and the specific process for eliminating under-fitting comprises: extracting the drop ratios of the training loss and the validation loss, subtracting the preset drop-ratio threshold from each to obtain a loss drop-ratio difference and a validation drop-ratio difference, inputting the loss drop-ratio difference into a preset mapping set of loss drop-ratio difference to network-layer-number increment value in the database to obtain a network-layer-number increment value by mapping matching, and inputting the validation drop-ratio difference into a preset mapping set of validation drop-ratio difference to training-layer-number increment value in the database to obtain a training-layer-number increment value by mapping matching.
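The classification and mapping steps of claim 5 can be sketched as follows, transcribing the claim's stated conditions directly. The threshold value, the mapping-table entries, and the return labels are hypothetical; the claim itself does not give concrete numbers.

```python
def fitting_state(train_drop_ratio, val_drop_ratio, threshold=0.30):
    """Classify the verification training result per claim 5's conditions.

    Drop ratios are fractional decreases in loss; a negative value means
    the loss rose. The threshold 0.30 is a hypothetical placeholder.
    """
    if train_drop_ratio > threshold and val_drop_ratio < 0:
        return "overfitting"    # training loss falls while validation loss rises
    if train_drop_ratio > threshold and val_drop_ratio > threshold:
        return "underfitting"   # both drop ratios exceed the threshold, as stated
    return "balanced"

def regularization_adjustment(train_drop_ratio, threshold=0.30):
    """Map the drop-ratio difference to a regularization-coefficient
    adjustment through a hypothetical version of the claim's mapping set."""
    diff = train_drop_ratio - threshold
    mapping = [(0.1, 0.01), (0.3, 0.05), (float("inf"), 0.10)]
    for bound, adjustment in mapping:
        if diff <= bound:
            return adjustment
```

A usage sketch: `fitting_state(0.5, -0.1)` reports over-fitting, and `regularization_adjustment(0.35)` looks up the smallest adjustment, since the drop-ratio difference is only 0.05.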
- 6. The multi-mode depth perception and grabbing system based on transparent objects of claim 1, wherein repairing the depth map of the transparent object comprises: extracting encodings of the RGB-D image and the TIR image through the encoder network, sending the encodings to the decoder for processing, and generating a high-resolution depth map of the transparent object through an up-sampling operation, wherein the depth map represents the position of the transparent object in three-dimensional space and each pixel value of the depth map represents the distance from that position to the camera.
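The decoder's up-sampling stage mentioned in claim 6 can be illustrated with a minimal nearest-neighbour sketch. A real decoder would use learned transposed convolutions or interpolation; the coarse depth values here are invented.

```python
def upsample_nearest(depth, factor=2):
    """Nearest-neighbour up-sampling of a coarse decoder depth map.

    Stands in for the decoder's up-sampling operation; each pixel value
    is a camera-to-surface distance, as in claim 6.
    """
    out = []
    for row in depth:
        # Repeat each value horizontally, then the expanded row vertically.
        expanded = [v for v in row for _ in range(factor)]
        out.extend([expanded[:] for _ in range(factor)])
    return out

coarse = [[0.5, 0.7],
          [0.6, 0.8]]          # metres, hypothetical coarse decoder output
fine = upsample_nearest(coarse)  # 4x4 high-resolution depth map
```

Each coarse pixel becomes a 2x2 block in the output, doubling the resolution while preserving the distance values.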
- 7. The multi-mode depth perception and grabbing system based on transparent objects of claim 1, wherein outputting the six-degree-of-freedom grabbing pose of the transparent object is processed as follows: performing systematic error analysis on the depth map, detecting systematic errors caused by the RGB-D camera and the TIR camera, calibrating the error areas, and repairing the depth map with a deep-learning-based inpainting network; and performing feature extraction on the depth map to obtain morphological feature information of the transparent object, generating a plurality of candidate grabbing points based on that morphological feature information, analyzing the pose and position of the transparent object after the candidate grabbing points are generated, and obtaining the six-degree-of-freedom pose of the transparent object with a regression network.
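The candidate-grabbing-point step of claim 7 can be sketched with a simple local-flatness test. This replaces the claim's morphological feature extraction and regression network with a toy heuristic; the gradient threshold and the example depth map are invented, and zero entries model the missing depth typical of transparent surfaces.

```python
def candidate_grasp_points(depth, max_gradient=0.05):
    """Generate candidate grabbing points from a repaired depth map.

    A pixel qualifies when its depth is valid (non-zero) and the local
    surface is flat enough to grip; a simplified stand-in for the claim's
    morphological feature analysis.
    """
    h, w = len(depth), len(depth[0])
    candidates = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if depth[i][j] == 0.0:
                continue  # missing depth: typical on transparent surfaces
            gx = abs(depth[i][j + 1] - depth[i][j - 1]) / 2.0
            gy = abs(depth[i + 1][j] - depth[i - 1][j]) / 2.0
            if max(gx, gy) <= max_gradient:
                candidates.append((i, j, depth[i][j]))
    return candidates

depth = [[0.50, 0.50, 0.50, 0.50],
         [0.50, 0.50, 0.51, 0.50],
         [0.50, 0.00, 0.50, 0.50],
         [0.50, 0.50, 0.50, 0.50]]
candidates = candidate_grasp_points(depth)
```

Only one interior pixel survives both the validity and flatness tests here; in the claimed system such candidates would then feed the pose analysis and the 6-DoF regression network.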
- 8. The multi-mode depth perception and grabbing system based on transparent objects of claim 1, wherein obtaining a robot grabbing path from the six-degree-of-freedom grabbing pose and controlling the robot end effector to execute the grabbing action comprises: after the six-degree-of-freedom grabbing pose of the transparent object is acquired, the robot computes, through inverse kinematics, a grabbing path from its current state to the target object, the path comprising the starting position of the robot end effector, the position of the target object and the rotation angles; after path planning is completed, the robot determines the opening and closing positions, the grabbing angles and the force of the gripper jaws according to the shape, pose and grabbing points of the object; and the robot motion control system generates control instructions directing the end effector to move accurately along the planned path and execute the grabbing operation.
- 9. The multi-mode depth perception and grabbing system based on transparent objects of claim 1, wherein adjusting the grabbing strategy based on real-time feedback comprises: monitoring through a force sensor whether the grabbing action deviates during grabbing, and automatically adjusting the gripper-jaw force based on the deviation, as follows: comparing the grabbing force with a preset standard grabbing-force interval; if the grabbing force lies within the interval, determining that there is no deviation; if it exceeds the upper limit of the interval, determining that the force is to be reduced; and if it is below the lower limit of the interval, determining that the force is to be increased.
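The force-feedback rule of claim 9 reduces to a simple interval comparison, sketched below. The interval bounds and the step size are hypothetical; the claim specifies only the comparison logic, not how much to adjust by.

```python
def adjust_grip_force(measured, low=2.0, high=5.0, step=0.5):
    """Claim 9's feedback rule: compare the measured grip force (newtons)
    with a standard interval [low, high]; bounds and step are hypothetical."""
    if measured > high:
        return measured - step   # above the upper limit: reduce force
    if measured < low:
        return measured + step   # below the lower limit: increase force
    return measured              # within the interval: no deviation
```

For example, a 6.0 N reading on a fragile glass would be stepped down toward the interval, while a 1.0 N reading (slipping grip) would be stepped up.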
Description
Multi-mode depth perception and grabbing system based on transparent object

Technical Field
The invention relates to the field of robot operation, in particular to a multi-mode depth perception and grabbing system based on transparent objects.

Background
Transparent objects (e.g., glasses, bottles) present significant challenges to robotic perception and manipulation due to their specular and refractive properties. Conventional RGB-D sensors often have difficulty capturing accurate depth information from transparent objects or highly reflective surfaces: reflections from a transparent surface introduce sensor noise or cause loss of depth data, making grabbing and manipulation unreliable. Prior art uses TIR (thermal infrared) cameras to identify transparent objects, but because the RGB and TIR cameras are mounted at different positions, the acquired images and depth data exhibit spatial alignment errors. Without precise alignment, subsequent depth restoration and grasp pose estimation are affected, resulting in an inaccurate final grasp pose; in robotic grabbing in particular, precise alignment of sensor data is critical.

One prior-art technique, disclosed in the patent with publication No. CN118700130B, is a hand-held transparent object pose estimation method and robot grabbing control method. The method comprises obtaining RGB and depth images; an RGB-D feature encoder Enco R extracting RGB-D image features; a Deco R1 decoding conventional features; a Deco R2 decoding geometric information and a hand segmentation map to assist pose estimation; filtering transparent-object and background interference in the depth map using a pixel multiplication technique; extracting hand depth features with Enco D and Deco D; fusing and stacking the RGB and depth features; and a pose decoding module decoding the fused features to obtain accurate 6D pose information of the transparent object.
Another prior-art technique, the patent with publication No. CN114619447B, is a grabbing method, grabbing device and robot in the technical field of robots. The method grabs an object to be grabbed in an object container that comprises an edge object located at the edge of the container, and comprises: when the object to be grabbed is an edge object, determining a target direction according to a designated direction of the edge object's pose coordinate system; the included angle between the target direction and the designated direction does not exceed a preset value, the designated direction is perpendicular to the central axis of the object container and points away from that axis, the central axis is perpendicular to the bottom of the container, and the gripper is controlled to grasp the edge object with the target direction as its opening direction.

From the foregoing it can be seen that in the field of robotic grabbing, although background interference is filtered using pixel multiplication techniques, such techniques may have difficulty completely eliminating incomplete depth information when processing transparent objects. Moreover, methods that rely mainly on combining an RGB-D image with a depth map can extract conventional features and geometric information well, but under a complex background or occlusion a large number of interference factors (such as reflections and shadows) appear in the image, so the model cannot accurately estimate the object pose and the robustness of the system suffers.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-mode depth perception and grabbing system based on transparent objects, realized by the following technical scheme. The multi-mode depth perception and grabbing system based on transparent objects comprises: a multispectral perception module for capturing an RGB-D image and a TIR image of the transparent object, spatially aligning the two images through hand-eye calibration, and performing feature extraction on each to obtain an RGB-D feature map and a TIR feature map; a depth correction module for fine-tuning the depth correction model, processing the RGB-D feature map and the TIR feature map with an encoder-decoder, and repairing the depth map of the transparent object; a grabbing gesture generation module for taking the depth map of the transparent object as input, outputting a six-degree-of-freedom grabbing pose of the transparent object, and transmitting it to the control module; and a control module for obtaining a robot grabbing path based on the six-degree-of-freedom grabbing pose, controlling a robot end effector to execute the grabbing action, and adjusting the grabbing strategy based on real-time feedback.