Search

CN-121999467-A - Multi-level obstacle classification and identification method for unmanned carrier facing metal processing environment

CN 121999467 A

Abstract

The invention discloses a multi-level obstacle classification and identification method for an unmanned carrier facing a metal processing environment. The method comprises: acquiring the RGB images and depth point cloud data of a binocular camera, and mapping them into the same coordinate system to generate an RGB-D feature tensor; inputting the feature tensor into a dual-stream attention network that extracts RGB texture features and depth geometric features respectively and fuses them at pixel level through a cross attention module; inputting the fused features into an adaptive weighting network resistant to highlight interference, which identifies highlight regions by computing gradient covariance, down-weights the highlight regions with a reflection suppression layer, and generates obstacle candidate region boxes; and inputting the candidate region features into a classification and identification network that outputs obstacle category labels and confidence levels. The invention realizes accurate identification of multi-level obstacles in the metal processing environment, and improves the perception reliability and operation safety of the unmanned carrier.

Inventors

  • ZHANG FENG
  • MING CHENDONG
  • PENG XUNMIN
  • HE CHANG
  • YANG YANG
  • JIANG YANG

Assignees

  • 长沙爱达智能科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-01-19

Claims (7)

  1. A multi-level obstacle classification and identification method for an unmanned carrier facing a metal processing environment, characterized by comprising the following steps: S1, acquiring original RGB image data and corresponding depth point cloud data collected by the forward-looking binocular camera of the unmanned carrier, mapping the RGB image data into the depth point cloud coordinate system, and generating an RGB-D four-dimensional feature tensor; S2, inputting the RGB-D four-dimensional feature tensor into a feature extraction network based on a dual-stream attention mechanism, the network comprising an RGB feature extraction branch and a depth geometric feature extraction branch, wherein the RGB feature extraction branch extracts texture and color features with convolution layers to obtain RGB features, the depth geometric feature extraction branch extracts local geometric structure features with an improved point cloud feature extraction network to obtain depth geometric features, and the RGB features and the depth geometric features are fused at pixel level through a cross attention module to obtain a fusion feature map; S3, inputting the fusion feature map into an adaptive weighted region generation network resistant to highlight interference: first computing, for each pixel of the fusion feature map, the covariance of the luminance gradient and the depth gradient to identify highlight regions; then introducing a reflection suppression weight layer that generates a spatial attention mask from the identified highlight regions, down-weighting the highlight-region features in the fusion feature map and enhancing the non-highlight-region features; and finally generating, from the weighted feature map, a series of candidate region boxes containing potential obstacles; and S4, inputting the local features corresponding to the candidate region boxes into an identification network for obstacle classification, the identification network outputting, for each candidate region, confidence levels over four obstacle classes, selecting the class with the highest confidence as the final recognition result, and outputting the specific class label and confidence of every obstacle.
  2. The multi-level obstacle classification and identification method for an unmanned carrier facing a metal processing environment according to claim 1, wherein in step S1, the step of mapping the RGB image data into the depth point cloud coordinate system and generating the RGB-D four-dimensional feature tensor comprises: S101, acquiring the intrinsic parameter matrix K and the extrinsic parameter matrix of the binocular camera, wherein the intrinsic matrix K comprises the focal length and optical center coordinate parameters, and the extrinsic matrix comprises the rotation matrix R and translation vector t between the left and right cameras; S102, for each three-dimensional point P_i = (X_i, Y_i, Z_i) in the depth point cloud, where X_i, Y_i and Z_i are respectively the X-axis, Y-axis and Z-axis coordinate values of point P_i in the camera coordinate system, computing by projective transformation the corresponding pixel coordinates (u_i, v_i) on the RGB image plane according to: [u_i, v_i, 1]^T = (1/Z_i) · K · [X_i, Y_i, Z_i]^T; wherein u_i is the horizontal pixel coordinate of the i-th point cloud point in the RGB image, v_i is the vertical pixel coordinate of the i-th point cloud point in the RGB image, and K is the intrinsic parameter matrix of the binocular camera; S103, extracting the RGB color values (R_i, G_i, B_i) at pixel coordinates (u_i, v_i) from the RGB image, wherein R_i, G_i and B_i are respectively the red, green and blue channel values at the position corresponding to pixel coordinates (u_i, v_i); S104, fusing the three-dimensional space coordinates of each point cloud point with its corresponding RGB color values to generate six-dimensional data points (X_i, Y_i, Z_i, R_i, G_i, B_i), organized as the RGB-D four-dimensional feature tensor T ∈ R^{H×W×6}, wherein H is the height dimension of the feature tensor T, W is the width dimension of the feature tensor T, 6 is the channel number of the feature tensor T, and R denotes the set of real numbers; the first three channels store the X, Y and Z coordinate information respectively, and the last three channels store the R, G and B color information respectively.
  3. The multi-level obstacle classification and identification method for an unmanned carrier facing a metal processing environment according to claim 2, wherein in step S2, the step of extracting local geometric structure features by the depth geometric feature extraction branch using the improved point cloud feature extraction network comprises: S201, extracting the first three channels of the RGB-D four-dimensional feature tensor T to form a depth coordinate tensor D ∈ R^{H×W×3}, wherein D(i, j) denotes the three-dimensional space coordinates corresponding to pixel position (i, j); S202, reorganizing the depth coordinate tensor D into point cloud data format P ∈ R^{N×3}, wherein N is the total number of point cloud points and p_i is the three-dimensional coordinate vector of the i-th point; S203, the improved point cloud feature extraction network adopts a hierarchical sampling and feature aggregation architecture comprising two scale levels, each scale level comprising a sampling layer, a grouping layer and a feature extraction layer; S204, at the first scale level, sampling N1 center points from the point cloud data P with the farthest point sampling algorithm, wherein N1 < N, forming the first-level center point set C1; for each center point, searching for neighborhood points within a radius of r1 meters to form the i-th local point set; for each local point set, extracting local features with a three-layer multi-layer perceptron and aggregating them by a max pooling operation to obtain the first-level features F1; S205, at the second scale level, sampling N2 center points from the first-level center point set C1, wherein N2 < N1, forming the second-level center point set C2; for each second-level center point, searching for neighborhood points within a radius of r2 meters, and for each local point set, extracting local features with a three-layer multi-layer perceptron and obtaining the second-level features F2 by max pooling; S206, upsampling and fusing the features extracted at the two scale levels through a feature propagation layer: the second-level features F2 are upsampled to the first-level center point positions by distance-based interpolation, the first-level features F1 and the upsampled features are then together upsampled to the original point cloud resolution, and the local geometric structure feature of each point is obtained through a two-layer multi-layer perceptron; S207, reorganizing the point-cloud-format local geometric structure features back into image format to obtain the depth geometric feature tensor G ∈ R^{H×W×128}, where 128 is the channel number of the depth geometric features.
  4. The multi-level obstacle classification and identification method for an unmanned carrier facing a metal processing environment according to claim 3, wherein in step S2, the step of fusing the RGB features and the depth geometric features by the cross attention module comprises: S211, processing the RGB-D four-dimensional feature tensor T through the RGB feature extraction branch, which comprises four convolution blocks, each comprising a convolution layer, a batch normalization layer BN and a ReLU activation function, and finally outputting the RGB feature tensor F_rgb ∈ R^{H'×W'×256}, wherein H' and W' are the spatial dimensions after convolution and 256 is the channel number of the RGB features; S212, adjusting the depth geometric feature tensor G to the same spatial resolution as the RGB features by a downsampling operation: G is downsampled from resolution H×W to H'×W' using bilinear interpolation, and its channel number is then adjusted to 256 by a convolution layer, obtaining the adjusted depth geometric features F_d; S213, generating a query vector Q from the RGB feature tensor F_rgb through a linear transformation layer implemented by a convolution layer, and generating a key vector K and a value vector V from the adjusted depth geometric features F_d through two independent linear transformation layers, both implemented by convolution layers; reshaping the query vector Q, key vector K and value vector V into two-dimensional matrix form respectively; S214, computing the cross attention weight matrix A according to: A = softmax(Q · K^T / √d_k); wherein A is the cross attention weight matrix, K^T is the transpose of the key vector matrix, d_k = 256 is the dimension of the key vectors, i.e. the channel number, and softmax is the normalization function; S215, performing weighted summation of the value vectors V based on the cross attention weight matrix A to generate the attention output features according to: Attn = A · V; wherein Attn is the attention output feature matrix; S216, reshaping the attention output features Attn back into image format and fusing them with the original RGB feature tensor F_rgb through a residual connection, generating the final fusion feature map F_fuse.
  5. The multi-level obstacle classification and identification method for an unmanned carrier facing a metal processing environment according to claim 4, wherein in step S3, the step of computing the covariance of the luminance gradient and depth gradient of each pixel in the fusion feature map and identifying the highlight regions comprises: S301, extracting the RGB color information from the RGB-D four-dimensional feature tensor T and computing the luminance value L(i, j) of each pixel position (i, j); S302, computing the luminance gradient G_L(i, j) from the luminance values L using the Sobel operator; S303, extracting the third channel of depth coordinate information from the RGB-D four-dimensional feature tensor T to obtain the depth value tensor Z, wherein Z(i, j) denotes the Z-axis coordinate value at pixel position (i, j), corresponding to the depth value of the point in the camera coordinate system, and computing the depth gradient G_D(i, j) from the depth value tensor Z using the Sobel operator; S304, for each pixel position (i, j), defining a square neighborhood window Ω(i, j) centered on it, and computing within Ω(i, j) the covariance value c(i, j) of the luminance gradient G_L and the depth gradient G_D; S305, judging highlight regions according to the covariance value c(i, j) and the luminance gradient G_L(i, j), the judging condition being: if c(i, j) > T_c and G_L(i, j) > T_g, then pixel position (i, j) is marked as a highlight region, wherein T_c is the covariance threshold and T_g is the luminance gradient threshold; and generating the highlight region mask M, wherein M(i, j) = 1 when pixel position (i, j) satisfies the highlight region judging condition, and M(i, j) = 0 otherwise.
  6. The multi-level obstacle classification and identification method for an unmanned carrier facing a metal processing environment according to claim 5, wherein in step S3, the step of generating a spatial attention mask from the identified highlight regions, down-weighting the highlight-region features in the fusion feature map and enhancing the non-highlight-region features comprises: S311, downsampling the highlight region mask M to the resolution of the fusion feature map F_fuse by a max pooling operation, obtaining the downsampled highlight region mask M'; S312, generating the reflection suppression weight tensor W from the downsampled highlight mask M' according to: W(i, j) = α · (1 − M'(i, j)) + β · M'(i, j); wherein W(i, j) is the reflection suppression weight at pixel position (i, j), α is the enhancement weight coefficient for non-highlight regions, β is the down-weighting coefficient for highlight regions, and M'(i, j) is the downsampled highlight region mask value at pixel position (i, j); S313, expanding the reflection suppression weight tensor W to the same dimensions as the fusion feature map F_fuse, and multiplying the expanded weight tensor with the fusion feature map F_fuse element by element, generating the weighted feature map F_w; S314, constructing a region proposal network RPN based on the weighted feature map F_w to generate candidate region boxes, and acquiring the classification score and bounding box regression parameters of each candidate region box; S315, defining, at each position of the feature map F_w, anchor boxes of several different scales and aspect ratios; for each anchor box, judging whether it contains an obstacle according to its classification score, and retaining the anchor box if its foreground score is greater than the threshold 0.7; S316, applying non-maximum suppression NMS to all retained candidate region boxes, and keeping the top 2000 highest-scoring candidate region boxes after NMS; S317, representing each candidate region box by a four-tuple (x, y, w, h), wherein x is the horizontal coordinate of the upper-left corner of the candidate region box in the feature map coordinate system, y is the vertical coordinate of the upper-left corner of the candidate region box in the feature map coordinate system, w is the width of the candidate region box, and h is the height of the candidate region box.
  7. The multi-level obstacle classification and identification method for an unmanned carrier facing a metal processing environment according to claim 6, wherein in step S4, the step of inputting the local features corresponding to the candidate region boxes into the fine-grained identification network for obstacle classification comprises: S401, for the i-th candidate region box, extracting the local features of the corresponding region from the weighted feature map F_w, and uniformly resizing candidate regions of different sizes into a fixed-size 7×7×256 local feature tensor by bilinear interpolation; S402, flattening the 7×7×256 local feature tensor into a one-dimensional feature vector, inputting the flattened feature vector into a fully connected classification network comprising three fully connected layers, and outputting the unnormalized scores of C obstacle categories; S404, applying a Softmax function to the unnormalized scores of the C categories to generate the confidence vector over the C obstacle categories, wherein c_k is the confidence of the k-th obstacle category and C is the total number of obstacle categories; S405, selecting the category with the highest confidence as the final recognition result of the i-th candidate region; if the confidence of the final recognition result is less than 0.5, the i-th candidate region is marked as background or unknown obstacle and discarded; S406, outputting an obstacle recognition result list for all candidate regions, each result comprising the candidate region box coordinates (x, y, w, h), the category label, the confidence, and the complete confidence vector.
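The projection step of claim 2 (S102-S104) can be sketched in numpy as follows. This is a minimal illustration, not the patent's implementation: the function name `project_points_to_rgbd` and the rounding/clipping policy are assumptions; only the standard pinhole relation u = fx·X/Z + cx, v = fy·Y/Z + cy stated in the claim is taken from the source.

```python
import numpy as np

def project_points_to_rgbd(points, colors_img, K):
    """Project 3-D points (N,3) through the intrinsic matrix K (3x3),
    sample RGB at the projected pixels, and return 6-D points
    (X, Y, Z, R, G, B) for the points that land inside the image."""
    X, Y, Z = points[:, 0], points[:, 1], points[:, 2]
    u = np.round(K[0, 0] * X / Z + K[0, 2]).astype(int)  # horizontal pixel coord
    v = np.round(K[1, 1] * Y / Z + K[1, 2]).astype(int)  # vertical pixel coord
    h, w, _ = colors_img.shape
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (Z > 0)
    rgb = colors_img[v[valid], u[valid]]                 # (M,3) sampled colors
    return np.hstack([points[valid], rgb])               # (M,6) fused points
```

In a full pipeline the (M,6) points would then be scattered back into the H×W×6 tensor T of claim 2 (S104); that reorganization is omitted here.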
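The sampling and grouping layers of claim 3 (S204) can be sketched as follows, assuming the classic greedy farthest-point-sampling algorithm and a ball-query neighborhood search; function names and the choice of the first point as the initial center are illustrative assumptions.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Greedily pick n_samples points, each farthest from those already
    chosen, approximating uniform coverage of the cloud. points: (N,3)."""
    chosen = [0]                                   # start from the first point
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(dist))                 # farthest remaining point
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(chosen)

def ball_query(points, center, radius):
    """Indices of points within `radius` of `center` (the grouping layer)."""
    return np.where(np.linalg.norm(points - center, axis=1) <= radius)[0]
```

Each ball-query group would then be fed to the three-layer MLP and max-pooled, per S204-S205.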
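The cross-attention fusion of claim 4 (S213-S216) reduces, in matrix form, to softmax(Q·K^T/√d_k)·V plus a residual connection. A minimal numpy sketch, with the learned projections stood in by plain matrices `Wq`, `Wk`, `Wv` (an assumption; the patent implements them as convolution layers):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for stability
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(f_rgb, f_depth, Wq, Wk, Wv):
    """Queries from the RGB features, keys/values from the depth features,
    residual connection back to RGB. f_rgb, f_depth: (HW, C) matrices."""
    Q, K, V = f_rgb @ Wq, f_depth @ Wk, f_depth @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))       # (HW, HW) attention weights
    return f_rgb + A @ V                             # S216: residual fusion
```

Note that when the depth branch contributes nothing (all-zero `f_depth`), the residual path leaves the RGB features unchanged.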
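Claims 5 and 6 together describe highlight detection by local covariance of Sobel gradients, followed by reflection-suppression weighting. A small, unoptimized numpy sketch (loop-based for clarity; window size, default α/β values, and function names are illustrative assumptions):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)

def sobel_mag(img):
    """Gradient magnitude of a 2-D array via 3x3 Sobel filters."""
    h, w = img.shape
    gx, gy = np.zeros_like(img), np.zeros_like(img)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(patch * SOBEL_X)
            gy[i, j] = np.sum(patch * SOBEL_X.T)
    return np.hypot(gx, gy)

def highlight_mask(luma, depth, t_cov, t_lum, win=1):
    """Claim 5: mark pixels where the windowed covariance of luminance and
    depth gradients exceeds t_cov and the luminance gradient exceeds t_lum."""
    gl, gd = sobel_mag(luma), sobel_mag(depth)
    h, w = luma.shape
    mask = np.zeros((h, w))
    for i in range(win, h - win):
        for j in range(win, w - win):
            a = gl[i - win:i + win + 1, j - win:j + win + 1].ravel()
            b = gd[i - win:i + win + 1, j - win:j + win + 1].ravel()
            if np.cov(a, b)[0, 1] > t_cov and gl[i, j] > t_lum:
                mask[i, j] = 1.0
    return mask

def suppression_weights(mask, alpha=1.2, beta=0.3):
    """Claim 6 (S312): boost non-highlight pixels, down-weight highlights."""
    return alpha * (1 - mask) + beta * mask
```

The weight map would then be broadcast across the fusion feature map's channels and multiplied element-wise (S313).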
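The classification head of claim 7 (S404-S405) is a Softmax over per-class scores followed by an argmax and a 0.5 confidence cut-off. A minimal sketch, with the fully connected layers abstracted away (the logits are taken as given; function name and label strings are illustrative):

```python
import numpy as np

def classify_regions(scores, labels, conf_thresh=0.5):
    """Softmax the unnormalized class scores of each candidate region,
    keep the best class per region, and drop detections whose confidence
    is below conf_thresh. scores: (n_regions, n_classes) logits."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    conf = e / e.sum(axis=1, keepdims=True)        # per-class confidences
    results = []
    for region_conf in conf:
        k = int(np.argmax(region_conf))
        if region_conf[k] >= conf_thresh:          # S405: discard weak results
            results.append((labels[k], float(region_conf[k])))
    return results
```

Per S406, a full implementation would also carry along each region's box coordinates and its complete confidence vector.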

Description

Multi-level obstacle classification and identification method for unmanned carrier facing metal processing environment

Technical Field

The invention relates to the technical field of obstacle recognition, and in particular to a multi-level obstacle classification and identification method for an unmanned carrier facing a metal processing environment.

Background

With the rapid development of intelligent manufacturing technology, unmanned carriers (AGVs) are increasingly used in metal processing workshops. The metal processing environment is characterized by strong illumination changes, complex metal surface reflections, and diverse obstacle types, which place extremely high demands on the environment perception and obstacle recognition capabilities of the AGV. Accurate identification and classification of obstacles is one of the key technologies for ensuring safe and efficient AGV operation, and directly affects production safety and operation efficiency.

Currently, AGV obstacle recognition relies primarily on lidar, vision sensors, or a combination of the two. Conventional lidar-based methods, while capable of obtaining accurate range information, face challenges in metal processing environments. Highly reflective metal workpiece surfaces cause specular reflection of the laser beam, so that ranging data are distorted or lost, seriously affecting accurate obstacle detection. Meanwhile, lidar provides only geometric information, making it difficult to distinguish obstacles of different materials and types, and cannot meet the requirement of fine-grained classification. Vision-based obstacle recognition methods can acquire rich texture and color information, but have obvious shortcomings under the strong illumination conditions of a metal processing environment. Highlight areas of metal surfaces form overexposed spots in RGB images, causing feature extraction to fail, while dark areas lose important detail. Existing RGB-D fusion methods combine color images and depth information, but often adopt simple concatenation or weighting strategies in the feature fusion stage and fail to fully consider the correlation between RGB features and depth features, resulting in poor fusion. In addition, obstacles in the metal processing environment are multi-level and multi-scale, comprising static obstacles such as metal plates and workpieces on the ground, dynamic obstacles such as forklifts and other AGVs, and overhead obstacles such as suspended slings and pipelines. Most existing methods adopt single-scale feature extraction, making it difficult to effectively capture the features of obstacles at different scales, and perform poorly in small-target detection and long-distance recognition. Therefore, an obstacle recognition method that effectively suppresses highlight interference and realizes multi-level precise classification according to the characteristics of the metal processing environment is needed.

Disclosure of Invention

In view of the above, the invention provides a multi-level obstacle classification and identification method for an unmanned carrier facing a metal processing environment, which aims to realize precise identification and classification of multi-level obstacles in the metal processing environment, and to improve the perception reliability and operation safety of the unmanned carrier in complex industrial environments, by constructing a highlight-region adaptive suppression mechanism, a multi-scale point cloud feature extraction network and an RGB-depth cross attention fusion module.
The invention provides a multi-level obstacle classification and identification method for an unmanned carrier facing a metal processing environment, comprising the following steps: S1, acquiring original RGB image data and corresponding depth point cloud data collected by the forward-looking binocular camera of the unmanned carrier, mapping the RGB image data into the depth point cloud coordinate system, and generating an RGB-D four-dimensional feature tensor; S2, inputting the RGB-D four-dimensional feature tensor into a feature extraction network based on a dual-stream attention mechanism, wherein the feature extraction network comprises an RGB feature extraction branch and a depth geometric feature extraction branch, the RGB feature extraction branch extracts texture and color features with convolution layers to obtain RGB features, and the depth geometric feature extraction branch extracts local geometric structure features with an improved point cloud feature extraction network to obtain depth geometric features; S3, inputting the fusion feature map into an adaptive weighted region generation network resistant to highlight interference, firstly calculating a covariance matrix of brightness gradi