CN-122008203-A - Multi-mode feature fusion-based weld joint visual detection and robot coordinate conversion method and system

Abstract

The invention relates to the technical field of industrial robot vision control and discloses a weld seam visual detection and robot coordinate conversion method and system based on multi-modal feature fusion, which achieves high-precision automatic detection and positioning of weld seams in complex industrial environments and accurately converts the detection results into robot-executable spatial coordinates. In the modeling stage, images and teaching coordinates are acquired, a template image block is cropped, multi-modal features are pre-computed, the physical distance is calculated, the relevant data are stored, and the index is updated. In the detection stage, an inference block is cropped around the teaching coordinates and passed to detection, assisted by offset re-cropping and a model-size rollback mechanism; the weld is selected by multi-modal fusion similarity screening and its pixel coordinates are computed. Finally, the depth value is obtained, the pixel coordinates are de-distorted to recover the three-dimensional camera coordinates, and the robot base coordinates are obtained by homogeneous transformation and output. The invention is suitable for complex industrial environments with multiple welds, heavy interference, and high reflection, and can significantly improve the automation level and welding precision of welding robots.

Inventors

  • HUANG ZHOU
  • HU LIANG
  • LI CONGCONG
  • ZHOU YICHENG
  • LI GENG
  • LI XIAODONG

Assignees

  • Sichuan Changhong Hongwei Technology Co., Ltd. (四川长虹虹微科技有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-01-30

Claims (12)

  1. A weld seam visual detection and robot coordinate conversion method based on multi-modal feature fusion, characterized in that the method comprises the following steps:
     S1, weld modeling stage: S11, acquiring a reference image and the corresponding depth image, and receiving the manually taught coordinates of the weld center point; S12, cropping a template block from the reference image centered on the teaching point; S13, performing multi-modal feature pre-computation on the template block, including gray-scale conversion, convolutional neural network feature vector extraction, and ORB feature descriptor extraction; S14, calculating the physical distance from the reference line to the first welding point according to the depth value at the first welding point and the position of a preset reference line; S15, saving the template block and the modeling metadata, and updating the template index file for the product type;
     S2, weld detection stage: S21, receiving a run-time image, and obtaining the corresponding teaching point coordinates according to the product type and weld index; S22, cropping an inference block from the input image and feeding it to a neural network detection model to detect the weld; when no target is detected, re-cropping at left and right offsets of the teaching point coordinates and re-detecting; S23, starting a dynamic model-size rollback mechanism: when the default-size model fails to detect a target, automatically switching to secondary detection with a model of larger input size; S24, when multiple candidate targets are detected, computing the multi-modal fusion similarity between each candidate and the template block, and selecting the candidate with the highest similarity as the weld detection result; S25, calculating the pixel coordinates of the welding key points from the weld detection result;
     S3, coordinate conversion stage: S31, obtaining the depth values at the welding key points; S32, de-distorting the pixel coordinates and recovering the three-dimensional points in the camera coordinate system using the depth values; S33, converting the camera coordinates into robot base coordinates through the Eye-to-Hand homogeneous transformation matrix and outputting them.
  2. The method for weld seam visual detection and robot coordinate conversion based on multi-modal feature fusion according to claim 1, wherein in step S12 a square image block of preset size s is cropped from the reference image I, centered on the teaching point (x_t, y_t), as the template block T = I[y_t − s/2 : y_t + s/2, x_t − s/2 : x_t + s/2]; when the cropping window exceeds the image boundary, the out-of-boundary area is filled with black pixels. Here I is the reference image, (x_t, y_t) are the teaching point coordinates, s is the template size, and T is the cropped template image block.
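The boundary-padded crop of claim 2 can be sketched as follows; the function name and array layout are illustrative, not taken from the patent:

```python
import numpy as np

def crop_template(image, center, size):
    """Crop a size x size block centered on the taught point.

    Regions of the window falling outside the image are filled with
    black (zeros), mirroring claim 2. `image` is an H x W (x C) array,
    `center` = (x, y) in pixel coordinates.
    """
    x, y = center
    half = size // 2
    h, w = image.shape[:2]
    out = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    # Intersection of the requested window with the image bounds.
    x0, y0 = max(0, x - half), max(0, y - half)
    x1, y1 = min(w, x - half + size), min(h, y - half + size)
    out[y0 - (y - half): y1 - (y - half),
        x0 - (x - half): x1 - (x - half)] = image[y0:y1, x0:x1]
    return out
```

A window centered near a corner comes back the full requested size, with the out-of-bounds portion zero-filled.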
  3. The method for weld seam visual detection and robot coordinate conversion based on multi-modal feature fusion according to claim 1, wherein in step S13 the gray-scale conversion converts the template image from RGB color space to a gray-scale image according to Gray = 0.299·R + 0.587·G + 0.114·B, where R, G, and B are the pixel values of the red, green, and blue channels and Gray is the converted gray value; the convolutional neural network feature vector extraction uses a pre-trained convolutional neural network to extract a depth feature vector of the template image and normalizes it; the ORB feature descriptor extraction uses the ORB algorithm to extract key points and descriptors of the template image.
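The gray-scale step is a weighted channel sum; the exact coefficients were lost from this copy of the claim, so the sketch below assumes the standard ITU-R BT.601 luma weights that most vision libraries use:

```python
import numpy as np

def rgb_to_gray(rgb):
    """Luma-weighted grayscale conversion.

    Assumes ITU-R BT.601 weights (0.299, 0.587, 0.114); the claim's own
    coefficients were not preserved in this text. `rgb` is an
    H x W x 3 array with channels ordered R, G, B.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b
```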
  4. The method for weld seam visual detection and robot coordinate conversion based on multi-modal feature fusion according to claim 1, wherein in step S14 the physical distance from the reference line to the first welding point is calculated from the pixel distance between them, the depth value at the welding point, a calibrated scale factor, and the image width, yielding the actual physical distance.
  5. The method for weld seam visual detection and robot coordinate conversion based on multi-modal feature fusion according to claim 1, wherein in step S22 cropping an inference block from the input image comprises: cropping a block of preset size from the input image, taking the x coordinate of the input image center as the horizontal center of the crop and the y coordinate of the teaching point as its vertical center.
  6. The method for weld seam visual detection and robot coordinate transformation based on multi-modal feature fusion according to claim 1, wherein in step S24 computing the multi-modal fusion similarity between each candidate target and the template block comprises:
     computing the structural similarity between the candidate target and the template block with the SSIM algorithm: SSIM(x, y) = ((2·μ_x·μ_y + C1)·(2·σ_xy + C2)) / ((μ_x² + μ_y² + C1)·(σ_x² + σ_y² + C2)), where SSIM(x, y) is the structural similarity between the candidate target and the template block, μ_x and μ_y are their means, σ_x² and σ_y² are their variances, σ_xy is the covariance, and C1 and C2 are constants;
     extracting the feature vector f_c of the candidate target with the same convolutional neural network as in the modeling stage and computing its cosine similarity with the template feature vector f_t as the depth feature similarity: S_cnn = (f_c · f_t) / (‖f_c‖ · ‖f_t‖);
     extracting ORB feature descriptors of the candidate target, matching them against the template block's ORB descriptors, and computing the local feature similarity S_orb from the average Hamming distance of the matched point pairs, the number of matched pairs, and the numbers of key points of the candidate target and the template block;
     and performing weighted fusion of the structural, depth feature, and local feature similarities: S = w1·SSIM + w2·S_cnn + w3·S_orb, where w1, w2, and w3 are weight coefficients satisfying w1 + w2 + w3 = 1.
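The cosine-similarity and weighted-fusion parts of claim 6 are straightforward to sketch; the weight values below are illustrative, since the claim only requires that the three coefficients sum to 1:

```python
import numpy as np

def cosine_similarity(f_candidate, f_template):
    """Depth-feature similarity: cosine of the angle between the
    candidate's and template's CNN feature vectors."""
    num = float(np.dot(f_candidate, f_template))
    den = float(np.linalg.norm(f_candidate) * np.linalg.norm(f_template))
    return num / den if den > 0 else 0.0

def fuse_similarity(s_ssim, s_cnn, s_orb, weights=(0.4, 0.4, 0.2)):
    """Weighted fusion S = w1*S_ssim + w2*S_cnn + w3*S_orb.

    The default weights are an assumed example; the patent only
    specifies that w1 + w2 + w3 = 1.
    """
    w1, w2, w3 = weights
    assert abs(w1 + w2 + w3 - 1.0) < 1e-9
    return w1 * s_ssim + w2 * s_cnn + w3 * s_orb
```

In the detection stage, the candidate with the highest fused score would be kept as the weld detection result.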
  7. The method for weld seam visual detection and robot coordinate transformation based on multi-modal feature fusion according to claim 1, wherein in step S25 the welding key points comprise a left welding point and a right welding point, whose coordinates are computed from the left-edge coordinate of the detection box, the width of the detection box, two predetermined transverse scaling factors, the ordinate of the reference edge, and a vertical offset.
  8. The method for weld seam visual detection and robot coordinate transformation based on multi-modal feature fusion according to claim 1, wherein in step S31, when the depth value at a welding key point is missing, depth repair is performed by adaptive surface fitting, comprising: collecting valid depth points within a preset search range around the target point; filtering outliers with the median absolute deviation method, keeping the points that satisfy |z_i − z_med| ≤ k·MAD, where z_i is the depth value of the i-th point, z_med is the median of all depth values, and MAD is the median absolute deviation; preferentially performing a least-squares fit with an elliptic surface model, and falling back to a circular surface model when the fit fails, where the model parameters are to be fitted, (u, v) are pixel coordinates, and z is the depth value; and substituting the target point coordinates into the fitted surface equation to compute the predicted depth value.
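The MAD filter and least-squares surface fit of claim 8 can be sketched as below. The general quadratic z = a·u² + b·v² + c·u·v + d·u + e·v + f is an assumed form of the claim's "elliptic surface model", since the exact equations were lost from this copy:

```python
import numpy as np

def mad_filter(depths, k=3.0):
    """Keep points whose deviation from the median is within k * MAD
    (median absolute deviation), as in claim 8. k is an assumed value."""
    depths = np.asarray(depths, dtype=float)
    med = np.median(depths)
    mad = np.median(np.abs(depths - med))
    if mad == 0:
        return depths
    return depths[np.abs(depths - med) <= k * mad]

def fit_surface_depth(u, v, z, u0, v0):
    """Least-squares fit of z = a*u^2 + b*v^2 + c*u*v + d*u + e*v + f
    (assumed form of the elliptic surface model), then evaluate the
    fitted surface at the target pixel (u0, v0)."""
    u, v, z = (np.asarray(a, dtype=float) for a in (u, v, z))
    A = np.column_stack([u**2, v**2, u * v, u, v, np.ones_like(u)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return float(np.array([u0**2, v0**2, u0 * v0, u0, v0, 1.0]) @ coeffs)
```

The circular fallback of the claim would drop terms to a rotationally symmetric form such as z = a·(u² + v²) + d·u + e·v + f when the richer fit is ill-conditioned.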
  9. The method for weld seam visual detection and robot coordinate transformation based on multi-modal feature fusion according to claim 1, wherein in step S31, when the depth value at a welding key point is missing, the depth value is estimated by a neighborhood statistics method: determining the search direction according to whether the key point is the left or the right point; taking a number of pixels from the rows to the left and right of the key point to form a sampling window, and computing the proportion of valid points in the window; when the valid-point proportion meets a threshold condition, estimating the depth value of the target point by Gaussian-weighted spline interpolation with weight function w_i = exp(−d_i² / (2σ²)), where d_i is the distance of the i-th sample point to the target point and σ is the Gaussian kernel width parameter; and when the condition is not met, searching adjacent rows or taking a non-zero depth value at a preset index along the search direction.
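The Gaussian weight function of claim 9 is standard; the sketch below pairs it with a weighted mean as a simpler stand-in for the claim's Gaussian-weighted spline interpolation:

```python
import numpy as np

def gaussian_weights(distances, sigma):
    """Weight function w_i = exp(-d_i^2 / (2*sigma^2)) from claim 9,
    where d_i is the distance of sample i to the target point and
    sigma is the Gaussian kernel width parameter."""
    d = np.asarray(distances, dtype=float)
    return np.exp(-d**2 / (2.0 * sigma**2))

def estimate_depth(depths, distances, sigma=2.0):
    """Gaussian-weighted average of neighbouring valid depth samples.
    (The claim specifies Gaussian-weighted *spline* interpolation; a
    weighted mean is shown here as an illustrative simplification.)"""
    w = gaussian_weights(distances, sigma)
    return float(np.sum(w * np.asarray(depths, dtype=float)) / np.sum(w))
```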
  10. The method for weld seam visual detection and robot coordinate transformation based on multi-modal feature fusion according to claim 1, wherein in step S32 de-distorting the pixel coordinates and recovering the three-dimensional point in the camera coordinate system from the depth value comprises: taking the original distorted image output by the camera as the input image; using the pre-calibrated camera intrinsic matrix K and distortion coefficients D, calling an undistortion function to convert the original pixel coordinates (u, v) into normalized camera coordinates (x_n, y_n); and then recovering the three-dimensional point in the camera coordinate system from the normalized coordinates and the depth value: P_cam = (x_n·Z, y_n·Z, Z), where Z is the depth value.
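The back-projection of claim 10 can be sketched with the pinhole model. The `undistort` hook stands in for a real undistortion routine (e.g. one built from the calibrated distortion coefficients); when it is None the pixels are assumed distortion-free:

```python
import numpy as np

def backproject(u, v, depth, K, undistort=None):
    """Recover a 3-D point in the camera frame from a pixel and its depth.

    `K` is the 3x3 intrinsic matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    `undistort`, if given, maps raw pixel coords to undistorted ones;
    here it is an assumed hook, and None means no distortion.
    """
    if undistort is not None:
        u, v = undistort(u, v)
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    # Normalised camera coordinates, then scale each axis by the depth.
    xn, yn = (u - cx) / fx, (v - cy) / fy
    return np.array([xn * depth, yn * depth, depth])
```

A pixel at the principal point back-projects onto the optical axis, at distance `depth`.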
  11. The method for weld seam visual detection and robot coordinate transformation based on multi-modal feature fusion according to claim 10, wherein in step S33 converting camera coordinates into robot base coordinates through the Eye-to-Hand homogeneous transformation matrix comprises: converting the three-dimensional point P_cam in the camera coordinate system into the robot base coordinate system with the homogeneous transformation matrix T obtained by Eye-to-Hand calibration: P_base = T · P_cam, where T is a 4 × 4 homogeneous transformation matrix comprising a rotation matrix R and a translation vector t, T = [[R, t], [0, 1]], and P_base is the robot base coordinate.
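Applying the 4 x 4 Eye-to-Hand transform of claim 11 is a one-liner in homogeneous coordinates; the function name is illustrative:

```python
import numpy as np

def camera_to_base(p_cam, T_base_cam):
    """Apply the 4x4 Eye-to-Hand homogeneous transform of claim 11:
    p_base = R @ p_cam + t, expressed via homogeneous coordinates.
    `T_base_cam` is the calibrated camera-to-base matrix [[R, t], [0, 1]]."""
    p_h = np.append(np.asarray(p_cam, dtype=float), 1.0)
    return (T_base_cam @ p_h)[:3]
```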
  12. A weld seam visual detection and robot coordinate conversion system based on multi-modal feature fusion, characterized in that the system comprises a modeling module, a detection module, and a coordinate conversion module; the modeling module comprises: a template cropping unit for cropping a template block from the reference image according to the teaching point coordinates; a feature extraction unit for performing gray-scale conversion, convolutional neural network feature extraction, and ORB feature extraction on the template block; and an index management unit for maintaining the storage structure and version index of the template files; the detection module comprises: an image preprocessing unit for image cropping, resizing, and normalization; a neural network inference unit for performing target detection and outputting candidate targets; a model rollback unit for automatically switching to an alternative model when the default model detects no target; a similarity matching unit for computing the multi-modal fusion similarity between each candidate target and the template block when multiple candidates exist and selecting the candidate with the highest similarity as the detection result; and a key point calculation unit for computing the welding key point coordinates from the detection box position; the coordinate conversion module comprises: a depth repair unit for repairing missing depth values by surface fitting or a neighborhood statistics method; a distortion correction unit for converting pixel coordinates into normalized coordinates using the calibration parameters; and a coordinate transformation unit for transforming camera coordinates into robot base coordinates through the Eye-to-Hand homogeneous transformation matrix.

Description

Multi-mode feature fusion-based weld joint visual detection and robot coordinate conversion method and system

Technical Field

The invention relates to the technical field of industrial robot vision control, in particular to a weld seam visual detection and robot coordinate conversion method and system based on multi-modal feature fusion.

Background

With the continuous development of industrial automation, welding robots are increasingly widely applied in manufacturing, and accurate detection and positioning of weld seams is a key link in guaranteeing automated welding quality. However, existing weld detection and localization techniques still face many challenges in practical industrial scenarios:

(1) Poor environmental adaptability. Existing weld detection systems depend on manually designed image processing rules, such as traditional edge detection and morphological processing. These methods adapt poorly to changes in weld type, are easily disturbed by environmental factors such as illumination changes and metal reflections, achieve low detection accuracy on irregular welds, and cannot meet high-precision welding requirements.

(2) Difficulty in depth information acquisition. The highly reflective surface of metal workpieces causes depth cameras to produce large invalid regions, i.e., depth holes. Most existing depth map restoration algorithms are designed for common scenes and cannot effectively handle the specific depth loss on metal surfaces, so the three-dimensional spatial information of the weld region cannot be accurately recovered.

(3) Multi-target weld identification. In real welding scenes, a single image view often contains several welds of similar appearance. Existing detection systems lack an effective target discrimination mechanism in this situation, find it hard to accurately identify the specific target weld to be welded, and are prone to misjudgment.

(4) Positioning accuracy. Converting the pixel coordinates detected in the image into robot-executable base coordinates requires several steps, including camera distortion correction and hand-eye calibration. Existing systems often ignore the correct ordering of distortion correction or apply calibration parameters improperly, causing noticeable deviations in robot positioning.

(5) Detection model adaptability. A detection model with a fixed input size performs very differently on weld targets of different scales: small welds are easily missed, while large welds may be detected incompletely, and an adaptive multi-scale detection mechanism is lacking.

Therefore, a weld detection and positioning method overcoming the above technical defects is needed to improve the automation level and welding accuracy of industrial welding robots.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a weld seam visual detection and robot coordinate conversion method and system based on multi-modal feature fusion, which can achieve high-precision automatic detection and positioning of welds in complex industrial environments and accurately convert the detection results into robot-executable spatial coordinates.
The technical scheme adopted to solve the technical problem is as follows. In one aspect, the invention provides a weld seam visual detection and robot coordinate conversion method based on multi-modal feature fusion, comprising the following steps: S1, weld modeling stage: S11, acquiring a reference image and the corresponding depth image, and receiving the manually taught coordinates of the weld center point; S12, cropping a template block from the reference image centered on the teaching point; S13, performing multi-modal feature pre-computation on the template block, including gray-scale conversion, convolutional neural network feature vector extraction, and ORB feature descriptor extraction; S14, calculating the physical distance from the reference line to the first welding point according to the depth value at the first welding point and the position of a preset reference line; S15, saving the template block and the modeling metadata, and updating the template index file for the product type; S2, weld detection stage: S21, receiving a run-time image, and obtaining the corresponding teaching point coordinates according to the product type and weld index; S22, cropping an inference block from the input image