CN-122008324-A - LAG-Net-based mechanical arm grabbing detection method

Abstract

The invention discloses a LAG-Net-based mechanical arm grabbing detection method comprising the following steps: S1, collecting the Cornell data set and the Jacquard data set, preprocessing the data, and constructing a training set and a verification set; S2, constructing a LAG-Net grabbing detection model, wherein the model adopts a lightweight U-shaped network architecture comprising an encoder with MambaLite selective feature enhancement modules, a bottleneck network formed of lightweight residual blocks, a decoder with a pixel-level grabbing gating module and a width refinement branch, and a grabbing detection head that predicts grabbing position, angle and clamping jaw width in a decoupled manner; and S3, training the LAG-Net grabbing detection model with the training set, evaluating the model based on evaluation indexes, and obtaining an optimal grabbing detection model. Existing grabbing detection methods have a high computational cost and struggle to achieve both detection precision and real-time performance in complex scenes; to address these technical problems, the invention provides a grabbing detection method with efficient feature expression and a lightweight design. Compared with the prior art, the method effectively improves grabbing detection precision and robustness while keeping the model lightweight, reduces computational complexity, achieves efficient and stable grabbing detection in resource-limited environments, and provides an efficient and practical solution for robot grabbing tasks.

Inventors

TAO YANG
WANG SHUIQING
SHI HONGBO
SONG BING
TAN SHUAI

Assignees

East China University of Science and Technology (华东理工大学)

Dates

Publication Date: 20260512
Application Date: 20260414

Claims (7)

1. A LAG-Net-based mechanical arm grabbing detection method, characterized by comprising the following steps: Step 100, collecting the Cornell data set and the Jacquard data set, preprocessing the data, and constructing a training set and a verification set; Step 200, constructing a LAG-Net grabbing detection network model, wherein the LAG-Net comprises an encoder, a bottleneck fusion layer, a decoder and a grabbing detection head; the encoder comprises three stages, each embedding a MambaLite module for selective feature enhancement; the bottleneck fusion layer consists of three cascaded Lightweight Residual Blocks; the decoder contains a pixel-level grabbing gating module and a width refinement branch, and the feature map output by the decoder is processed by the grabbing detection head to output a position prediction, an angle prediction and a width prediction; Step 300, training the LAG-Net grabbing detection model with the training set, and evaluating the model based on an evaluation index to obtain an optimal grabbing detection model.
2. The LAG-Net-based mechanical arm grabbing detection method of claim 1, wherein in step 200, the MambaLite module divides the input feature map X along the channel dimension into three complementary subspaces, namely a global branch, a local branch and an identity branch; the global branch captures long-range dependencies through axial adaptive pooling, so as to reduce the complexity of two-dimensional global modeling from […] to […]; the global branch performs global average pooling along the height direction and along the width direction respectively and generates spatial attention weights through a lightweight gating mechanism; the local branch extracts detail texture features using a 5×5 depthwise separable convolution; the identity branch directly retains the original features; and the global branch proportion λg of the three encoding stages is set to successively increasing values, so as to match the progression of feature requirements from shallow texture to deep semantics.
3. The LAG-Net-based mechanical arm grabbing detection method of claim 1, wherein in step 200, the Lightweight Residual Block decouples the standard 3×3 convolution into a depthwise separable convolution structure, specifically: a depthwise convolution layer (DWConv) applies a 3×3 channel-by-channel convolution; a pointwise convolution layer (PWConv) applies a 1×1 convolution to complete channel fusion, together with a BN normalization layer and a ReLU activation function; and the input is added directly to the output through a residual connection; the three cascaded Lightweight Residual Blocks progressively refine deep semantic features while keeping the number of channels at 4C, and the number of parameters and the amount of computation of the Lightweight Residual Block are reduced by about 8-9 times compared with a standard residual block.
4. The LAG-Net-based mechanical arm grabbing detection method of claim 1, wherein in step 200, the Graspness soft gating mechanism generates a pixel-level spatial gating map M through a 1×1 convolution and a Sigmoid activation function, and multiplies M element-wise with the feature map Fd output by the decoder to obtain the gated feature map F_gated = M ⊗ Fd; the mechanism dynamically suppresses background interference regions and focuses the network on high-probability grabbing regions; finally, the grabbing detection head outputs a grabbing quality map Fq, a cosine component map Fcos and a sine component map Fsin respectively.
5. The LAG-Net-based mechanical arm grabbing detection method of claim 1, wherein in step 200, the width refinement branch is independent of the position and angle prediction branches; it extracts spatial features with a 3×3 depthwise convolution, then fuses channel information through a 1×1 pointwise convolution, together with a BN normalization layer and a ReLU activation function, and the grabbing width map Fw is finally output through the grabbing detection head; because the width branch is designed independently, feature coupling interference between width prediction and the position/angle tasks is avoided.
6. The LAG-Net-based mechanical arm grabbing detection method of claim 1, wherein in step 300, when the LAG-Net grabbing detection model is trained, the AdamW optimizer is used, the initial learning rate is set to 1×10⁻³ and decays to 1×10⁻⁵ under a cosine annealing strategy, training runs for 100 epochs, and the batch size is 32.
7. The LAG-Net-based mechanical arm grabbing detection method of claim 1, wherein in step 300, the evaluation index is the grabbing success rate, where a prediction is counted as successful when the similarity between the predicted grabbing rectangle and a ground-truth labeled grabbing rectangle is greater than 25% and the angle difference between them is smaller than 30 degrees.
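The success criterion of claim 7 can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the claims: that the "similarity" is the intersection-over-union (Jaccard index) of the two rotated rectangles, that a rectangle is given as (x, y, h, w, θ) with w measured along the rotated axis and θ in radians, and that grabbing angles are equivalent modulo 180 degrees. The function names (rect_corners, rect_iou, angle_diff, is_success) are hypothetical and are not taken from the patent.

```python
import math

def rect_corners(x, y, h, w, theta):
    """Corners of a rectangle centred at (x, y) and rotated by theta, counter-clockwise."""
    c, s = math.cos(theta), math.sin(theta)
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in half]

def polygon_area(pts):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))

def clip(subject, clipper):
    """Sutherland-Hodgman clipping of a convex polygon by a convex counter-clockwise polygon."""
    out = subject
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not out:
            break
        inp, out = out, []

        def side(p):
            # > 0 when p lies to the left of the directed edge a -> b (inside).
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        for p, q in zip(inp, inp[1:] + inp[:1]):
            sp, sq = side(p), side(q)
            if sp >= 0:
                out.append(p)
            if sp * sq < 0:  # the edge p -> q crosses the clipping line
                t = sp / (sp - sq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def rect_iou(r1, r2):
    """Intersection-over-union of two rotated rectangles (x, y, h, w, theta)."""
    inter_poly = clip(rect_corners(*r1), rect_corners(*r2))
    inter = abs(polygon_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = r1[2] * r1[3] + r2[2] * r2[3] - inter
    return inter / union if union > 0 else 0.0

def angle_diff(a, b):
    """Smallest difference between two grabbing angles, treated as equal modulo pi."""
    return abs((a - b + math.pi / 2) % math.pi - math.pi / 2)

def is_success(pred, ground_truths, iou_thr=0.25, angle_thr=math.radians(30)):
    """True if pred matches at least one ground-truth rectangle under both thresholds."""
    return any(rect_iou(pred, gt) > iou_thr and angle_diff(pred[4], gt[4]) < angle_thr
               for gt in ground_truths)
```

Under these assumptions, the success rate of step 300 would be the fraction of test images for which is_success returns True for the top-ranked predicted rectangle. For example, a 40-wide, 20-high axis-aligned rectangle shifted by 10 pixels along its width overlaps its unshifted copy with an IoU of 0.6, which passes the 25% threshold.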

Description

LAG-Net-based mechanical arm grabbing detection method

Technical Field

The invention relates to the field of robot grabbing, in particular to a LAG-Net-based mechanical arm grabbing detection method.

Background

With the rapid development of intelligent manufacturing and unmanned operation, vision-based mechanical arm grabbing has become an important research direction in robotics. Efficient and accurate grabbing detection is not only a key technology for automatic sorting, assembly and warehouse logistics automation, but also an important support for the intelligent upgrading of industry. The core task of mechanical arm grabbing detection is to extract the features of a target object from an image acquired by a visual sensor, accurately predict the grabbing position, the grabbing angle and the opening width of the clamping jaws, and guide the mechanical arm to complete a stable grabbing action. In recent years, deep-learning-based methods have made remarkable progress in this field. Typical approaches mostly employ Convolutional Neural Networks (CNNs) or encoder-decoder structures and learn the mapping from images to grabbing parameters end to end. At the same time, some studies have introduced attention mechanisms, Transformers, or State Space Models (SSMs) to enhance global context modeling, thereby improving grabbing accuracy and success rate. However, the prior art has significant limitations in practical industrial applications.
On the one hand, to improve global feature perception of objects in complex scenes, some methods rely on global modeling mechanisms such as the Transformer or SSMs, but these methods usually come with a large number of parameters and high memory consumption, and their computational complexity grows quadratically, so inference latency increases markedly, which hinders real-time deployment on industrial production lines or edge embedded systems. On the other hand, although traditional convolutional networks are more computationally efficient, they are limited by local receptive fields, and existing lightweight schemes mostly rely on channel pruning or separable convolution and cannot achieve global selective feature enhancement without introducing a complex global modeling mechanism. In addition, the mechanical arm grabbing task involves three quantities, namely position, angle and clamping jaw width, and these prediction tasks have different feature requirements; existing methods generally output all predicted quantities simultaneously from a shared feature map and lack an effective task decoupling mechanism, which easily causes feature interference and degrades grabbing precision and stability. As for clamping jaw width prediction, existing methods lack independent modeling, which leads to "grippable but unstable" situations when objects are densely stacked or vary widely in shape.
Disclosure of Invention

The invention aims to solve the technical problems that existing mechanical arm grabbing detection methods have high computational complexity, struggle to balance detection precision and real-time performance, and suffer limited detection performance because the predictions of grabbing position, angle and clamping jaw width are strongly coupled; it provides a LAG-Net-based mechanical arm grabbing detection method that achieves high-precision grabbing detection with efficient feature expression and a lightweight design.

The technical solution adopted by the invention mainly comprises the following steps: Step 100, collecting the Cornell data set and the Jacquard data set, preprocessing the data, and constructing a training set and a verification set; Step 200, constructing a LAG-Net grabbing detection network model, wherein the LAG-Net comprises an encoder, a bottleneck fusion layer, a decoder and a grabbing detection head; the encoder comprises three stages, each embedding a MambaLite module for selective feature enhancement; the bottleneck fusion layer consists of three cascaded Lightweight Residual Blocks; the decoder contains a pixel-level grabbing gating module and a width refinement branch, and the feature map output by the decoder is processed by the grabbing detection head to output a position prediction, an angle prediction and a width prediction; Step 300, training the LAG-Net grabbing detection model with the training set, and evaluating the model based on an evaluation index to obtain an optimal grabbing detection model.

Preferably, in step 100, the Cornell data set and the Jacquard data set are collected and labeled with the grabbing rectangle five-tuple g = (x, y, h, w, θ), and data augmentation operations such as random flipping, rotation, color jittering and the l