CN-116551697-B - Robot visual closed-loop pouring method and device based on weakly supervised liquid perception

Abstract

The invention relates to a robot visual closed-loop pouring method and device based on weakly supervised liquid perception. The method comprises: semi-automatically collecting pouring-scene images with image-level labels; constructing a weakly supervised liquid perception model that generates class activation maps; extracting a two-dimensional liquid region from the class activation map generated by the trained model; estimating the pose and size of the source container and the target container; reconstructing a three-dimensional liquid trajectory from the two-dimensional liquid region and the estimated poses and sizes; and forming feedback from the pose of the target container and the three-dimensional liquid trajectory to control the robot to adjust the position of the source container while pouring water into the target container, thereby realizing visual closed-loop control. Compared with the prior art, the invention can accurately keep the liquid from spilling outside the target container during pouring.

Inventors

FU YANWEI
XUE XIANGYANG
LIN HAITAO

Assignees

Fudan University (复旦大学)

Dates

Publication Date: 20260508
Application Date: 20230616

Claims (8)

1. A robot visual closed-loop pouring method based on weakly supervised liquid perception, characterized by comprising the following steps: semi-automatically collecting pouring-scene images with image-level labels; extracting a two-dimensional liquid region using the class activation map generated by a trained weakly supervised liquid perception model; estimating the pose and size of the source container and the target container; reconstructing a three-dimensional liquid trajectory based on the two-dimensional liquid region and the pose and size estimates; and forming feedback according to the pose of the target container and the three-dimensional liquid trajectory, controlling the robot to adjust the position of the source container and pour water into the target container, thereby realizing visual closed-loop control; wherein the weakly supervised liquid perception model comprises a plurality of Transformer modules serving as the backbone network for feature extraction and a multi-layer perceptron (MLP) layer for outputting the image category, each Transformer module comprising a multi-head attention block that outputs a plurality of attention maps; and the liquid perception process of the weakly supervised liquid perception model comprises: 1) dividing the collected pouring-scene images into a plurality of image patches of a set size, flattening the patches, and linearly projecting them into a plurality of tokens; 2) feeding the tokens to the Transformer modules to obtain the corresponding attention maps, the expression of the attention map [expression missing in source] being defined in terms of the queries, keys, and values obtained by projection in the multi-head attention block, the number of attention maps, the feature dimensions of the keys and of the values, and the height and width of the image; 3) concatenating all attention maps and feeding them to a feed-forward layer to obtain a feature map; and 4) weighting each feature in the feature map by the weight determined by the final category to generate the final class activation map [expression missing in source], the weighting being indexed over the features in the feature map.
2. The robot visual closed-loop pouring method based on weakly supervised liquid perception according to claim 1, wherein the pouring-scene images comprise positive sample data and negative sample data, collected as follows: for the positive sample data, a pouring action is executed to pour liquid from a source container holding liquid into a target container, the RGB color image frames captured while the liquid flows out are saved, and RGB color image frames of a scene in which a faucet over a washbasin is open and water is flowing are also saved; for the negative sample data, only a source container without liquid is used, and RGB color image frames of a scene in which the faucet over the washbasin is closed and no water flows are also saved.
3. The robot visual closed-loop pouring method based on weakly supervised liquid perception according to claim 1, wherein the weakly supervised liquid perception model is trained by gradient descent with back-propagation; the loss function comprises a classification loss and a feature contrastive loss; the feature contrastive loss comprises a positive contrastive loss and a negative contrastive loss, used respectively to pull together the features of similar regions and to separate foreground features from background features; and the expressions of the positive contrastive loss and the negative contrastive loss [expressions missing in source] are defined in terms of the local foreground feature at each foreground pixel position, the local background feature at each background pixel position, and the numbers of foreground and background local features.
4. The robot visual closed-loop pouring method based on weakly supervised liquid perception according to claim 1, wherein extracting the two-dimensional liquid region using the class activation map generated by the trained weakly supervised liquid perception model specifically comprises: selecting a threshold to binarize the class activation map generated by the trained model, thereby obtaining a mask of the potential two-dimensional liquid region; and extracting a thinned two-dimensional liquid region from that mask by morphological skeletonization.
5. The robot visual closed-loop pouring method based on weakly supervised liquid perception according to claim 1, wherein the pose and size estimation of the source container and the target container specifically comprises: given RGB-D images of the source container and the target container, first detecting the masks of the containers in the images with a 2D detection network, and then feeding the masks into a category-level object pose and size estimation network to obtain the three-degree-of-freedom position, three-degree-of-freedom rotation, and three-dimensional size of each container.
6. The robot visual closed-loop pouring method based on weakly supervised liquid perception according to claim 1, wherein reconstructing the three-dimensional liquid trajectory specifically comprises: given the pose of the source container, defining the equation of a plane aligned with the container mouth and the direction of gravity [equation missing in source], expressed in terms of a point on the plane and of the components of a vector [symbol missing in source] that are respectively perpendicular and parallel to the gravity vector; given the camera intrinsic parameters, writing the ray back-projected from the camera center [equation missing in source] in terms of the ray origin, a scalar that parameterizes any position along the ray, and the direction vector of the ray; computing the 3D intersection of the ray with the plane [equation missing in source]; and, after obtaining the pixel coordinates of the mask of the potential two-dimensional liquid region, estimating the corresponding three-dimensional liquid point cloud.
7. The robot visual closed-loop pouring method based on weakly supervised liquid perception according to claim 1, wherein the visual closed-loop control is performed based on the error computed between the pose of the target container and the end of the three-dimensional liquid trajectory, specifically: computing an initial pouring point from the obtained poses and sizes of the source container and the target container and starting to pour; after the liquid flows out, computing the error between the pose of the target container and the end of the three-dimensional trajectory; and sending the error to a controller that computes the corresponding motion of the robot, thereby completing real-time closed-loop tracking so that the liquid flows into the target container without spilling.
8. A robot visual closed-loop pouring device based on weakly supervised liquid perception, characterized in that it adopts the method of any one of claims 1 to 7 and comprises: a pouring-scene data collection part for semi-automatically collecting pouring-scene images with image-level labels; a weakly supervised liquid perception part for perceiving the potential two-dimensional liquid region from the pouring-scene images; an object pose and size estimation part for estimating the six-degree-of-freedom pose and three-dimensional size of the source container and the target container; a three-dimensional liquid trajectory reconstruction part for reconstructing the detected two-dimensional liquid region into a three-dimensional point cloud by combining the pose and size; and a visual closed-loop control part for forming feedback according to the three-dimensional liquid trajectory and the pose of the target container, controlling the robot to adjust the position of the source container and pour water into the target container.
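
As an illustration of the back-projection step recited in claim 6, the following is a minimal sketch, not the patented implementation. It assumes a pinhole camera with intrinsics fx, fy, cx, cy, a camera center at the origin of the camera frame, and a plane given by a point and a normal expressed in that same frame. All function and parameter names (pixel_ray, intersect_ray_plane, lift_liquid_pixels) are hypothetical, and the ray form origin + t * direction is the standard parameterization, assumed here because the source equations are not reproduced in this text.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def pixel_ray(u: float, v: float, fx: float, fy: float,
              cx: float, cy: float) -> Vec3:
    """Direction of the camera ray through pixel (u, v), pinhole model (assumed)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def intersect_ray_plane(origin: Vec3, direction: Vec3,
                        plane_point: Vec3, plane_normal: Vec3,
                        eps: float = 1e-9) -> Optional[Vec3]:
    """Return the point origin + t * direction lying on the plane, or None."""
    denom = dot(plane_normal, direction)
    if abs(denom) < eps:  # ray is (nearly) parallel to the plane
        return None
    offset = (plane_point[0] - origin[0],
              plane_point[1] - origin[1],
              plane_point[2] - origin[2])
    t = dot(plane_normal, offset) / denom
    if t <= 0.0:  # intersection lies behind the camera
        return None
    return (origin[0] + t * direction[0],
            origin[1] + t * direction[1],
            origin[2] + t * direction[2])


def lift_liquid_pixels(pixels: Sequence[Tuple[float, float]],
                       intrinsics: Tuple[float, float, float, float],
                       plane_point: Vec3, plane_normal: Vec3) -> List[Vec3]:
    """Lift 2D liquid-mask pixels to 3D points on the given plane."""
    fx, fy, cx, cy = intrinsics
    camera_center: Vec3 = (0.0, 0.0, 0.0)  # assumption: camera frame origin
    cloud: List[Vec3] = []
    for u, v in pixels:
        hit = intersect_ray_plane(camera_center,
                                  pixel_ray(u, v, fx, fy, cx, cy),
                                  plane_point, plane_normal)
        if hit is not None:
            cloud.append(hit)
    return cloud
```

In this sketch the plane point and normal would be derived from the estimated pose of the source container and the gravity direction. Pixels whose rays are parallel to the plane, or whose intersection falls behind the camera, are skipped rather than reported as points.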

Description

Robot visual closed-loop pouring method and device based on weakly supervised liquid perception

Technical Field

The invention relates to the technical field of robot grasping control, and in particular to a robot visual closed-loop pouring method and device based on weakly supervised liquid perception.

Background

Robots have made tremendous progress in recent years in grasping and manipulating rigid objects. Manipulating non-rigid objects (such as liquids, cloth, and rope) remains a significant challenge, however, because these objects lack a fixed geometry, which makes them hard to perceive. Among them, perceiving liquids from images is especially challenging: refraction of light is the primary visual cue, and depth sensors have difficulty measuring the depth of liquids. Improving the liquid perception capability of robots is therefore particularly important if they are to take on service tasks such as cooking, beverage service, and plant watering.

Pouring water is a highly relevant task in the field of liquid handling, and it involves several significant challenges: (1) the need for large amounts of pixel-level annotated data for effective training, (2) the lack of prominent visual cues in the image, and (3) the lack of a reliable depth measurement system for pouring liquids accurately.

To address these challenges, researchers have explored additional sensors and other means of generating ground-truth labels, including:
1) Obtaining pixel-level annotations of the real liquid region with a thermal imager and hot water. This method is very time-consuming and relies on additional equipment.
2) Relying on audio rather than visual signals to help the robot pour, which limits its usefulness in noisy environments.
In addition, existing methods use the estimated pose and size of the containers to compute the initial pouring point without any liquid perception as feedback, and therefore cannot ensure that the liquid is poured accurately into the target container.
3) Self-supervised methods that convert colored liquid into transparent liquid without manual labeling in order to detect the liquid region. This approach requires colored liquids and statically placed transparent containers, which limits its applicability in more general environments.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a robot visual closed-loop pouring method and device based on weakly supervised liquid perception that can accurately keep the liquid from spilling outside the target container during pouring.

The aim of the invention can be achieved by the following technical scheme:

According to a first aspect of the present invention, there is provided a robot visual closed-loop pouring method based on weakly supervised liquid perception, the method comprising: semi-automatically collecting pouring-scene images with image-level labels; extracting a two-dimensional liquid region using the class activation map generated by the trained weakly supervised liquid perception model; estimating the pose and size of the source container and the target container; reconstructing a three-dimensional liquid trajectory based on the two-dimensional liquid region and the pose and size estimates; and forming feedback according to the pose of the target container and the three-dimensional liquid trajectory, controlling the robot to adjust the position of the source container and pour water into the target container, thereby realizing visual closed-loop control.
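
The feedback step of the method can be illustrated with a minimal sketch. The source states only that an error between the target container pose and the end of the three-dimensional liquid trajectory is sent to a controller; the proportional law, the gain, the step limit, and the choice of the point furthest along gravity as the trajectory end are all assumptions made here for illustration, and every name (trajectory_end, horizontal_error, control_step) is hypothetical. The sketch further assumes that translating the source container shifts the end of the stream by roughly the same amount.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def unit(v: Sequence[float]) -> Vec3:
    """Normalize a non-zero 3-vector."""
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def trajectory_end(cloud: Sequence[Vec3], gravity: Vec3) -> Vec3:
    """Assumed end of the stream: the point furthest along the gravity direction."""
    g = unit(gravity)
    return max(cloud, key=lambda p: dot(p, g))


def horizontal_error(target_center: Vec3, end_point: Vec3, gravity: Vec3) -> Vec3:
    """Target minus stream end, with the component along gravity removed."""
    g = unit(gravity)
    diff = (target_center[0] - end_point[0],
            target_center[1] - end_point[1],
            target_center[2] - end_point[2])
    along = dot(diff, g)
    return (diff[0] - along * g[0],
            diff[1] - along * g[1],
            diff[2] - along * g[2])


def control_step(target_center: Vec3, cloud: Sequence[Vec3], gravity: Vec3,
                 gain: float = 0.5, max_step: float = 0.02) -> Vec3:
    """One proportional update (assumed law) for the source container position."""
    err = horizontal_error(target_center, trajectory_end(cloud, gravity), gravity)
    step = [gain * e for e in err]
    norm = math.sqrt(dot(step, step))
    if norm > max_step:  # limit the per-cycle displacement
        step = [s * max_step / norm for s in step]
    return (step[0], step[1], step[2])
```

In use, control_step would be called once per camera frame with the latest reconstructed liquid point cloud and the estimated center of the target container opening, and the returned displacement would be passed to the robot's motion interface. Removing the component along gravity keeps the correction in the horizontal plane, which is one simple design choice among several possible ones.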
Preferably, the pouring-scene images comprise positive sample data and negative sample data, collected as follows: for the positive sample data, a pouring action is executed to pour liquid from a source container holding liquid into a target container, the RGB color image frames captured while the liquid flows out are saved, and RGB color image frames of a scene in which a faucet over a washbasin is open and water is flowing are also saved; for the negative sample data, the collection process is similar to that of the positive sample data, but only a source container without liquid is used, and RGB color image frames of a scene in which the faucet over the washbasin is closed and no water flows are also saved.

Preferably, the weakly supervised liquid perception model comprises a plurality of Transformer modules serving as the backbone network for feature extraction and a multi-layer perceptron (MLP) layer for outputting the image category, wherein each Transformer module comprises a multi-head attention block that outputs a plurality of attention maps.

Preferably, the liquid perception process of the weakly supervised liquid perception model comprises: 1) dividing the collected pouring-scene images into a plurality of image patches of a set size, flattening the patches, and linearly projecting the