CN-122023979-A - Laser radar and camera target level semantic fusion method and system based on back projection
Abstract
The invention relates to the technical field of robot perception and automatic driving, and discloses a lidar and camera target-level semantic fusion method and system based on back projection. The method comprises: constructing a lidar-camera calibration model based on environmental arc features, extracting image arc features and lidar intersection-point coordinates, and solving the projection matrix between the sensors by optimization; and designing a multi-dimensional semantic segmentation algorithm based on back projection, in which the segmentation region is obtained by back-projecting the target bounding box, and unified segmentation of 2D/3D laser point clouds is realized by combining ground point cloud filtering, distance-adaptive clustering, regional point cloud purification, and region growing. The method improves semantic segmentation accuracy under sparse point clouds, distance changes, or a lack of densely labeled data (mIoU on the SemanticKITTI dataset reaches 77.1%, exceeding the prior art by 1.4 percentage points), reduces background interference and computational load, and is suitable for scenarios such as mobile robots and automatic driving.
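The abstract's calibration stage solves a projection matrix between the two sensors by linear least squares and SVD before nonlinear refinement. As an illustrative sketch only (the patent's actual solver uses point-line constraints from arc features plus Levenberg-Marquardt refinement, which are not reproduced here), the linear SVD stage can be written as a standard Direct Linear Transform over hypothetical 3D-2D correspondences:

```python
import numpy as np

def solve_projection_dlt(pts_lidar, pts_img):
    """Estimate a 3x4 projection matrix P (up to scale) from 3D-2D pairs.

    Linear least squares via SVD: each correspondence contributes two rows
    to A, and P is the right singular vector associated with the smallest
    singular value of A.
    """
    A = []
    for (X, Y, Z), (u, v) in zip(pts_lidar, pts_img):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)

def project(P, X):
    """Project a 3D point with P and dehomogenize to pixel coordinates."""
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]
```

At least six non-degenerate (non-coplanar) correspondences are required; the patent then refines such a linear estimate with the Levenberg-Marquardt algorithm against the arc-feature constraints.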
Inventors
- WANG CHAO
- YUAN XINGYU
- WANG SHUTING
- XIONG TIFAN
- ZHANG YUBO
- CHEN HAO
- LIU JIE
Assignees
- 宁波华锐机器人科技有限公司
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-23
Claims (9)
- 1. A lidar and camera target-level semantic fusion method based on back projection, characterized by comprising the following steps: S1, lidar-camera calibration based on environmental arc features: S11, extracting line features and arc features from the image, the line features being obtained with a Contrast-Guided Line Segment Detection (CGLSD) algorithm, and the arc features being obtained with an arc-support line segment detector (ASLD) that extracts elliptic contours from the denoised grayscale image; S12, optimizing the target intersection-point coordinates in the lidar coordinate system according to the lidar angular resolution: the polar coordinates of the laser point closest to the edge are adjusted so that the angular error is confined within the angular resolution of the beam closest to the edge, the range value being that returned by this beam; S13, constructing and solving the projection matrix: an initial projection matrix is solved by linear least squares and singular value decomposition (SVD) under point-line constraints, and then refined by nonlinear optimization with the Levenberg-Marquardt algorithm to obtain the projection matrix P between the lidar and the camera; S2, multi-dimensional laser point cloud semantic segmentation based on back projection: S21, back-projection segmentation: using the projection matrix P obtained in step S13, a target bounding box in the camera image is back-projected into the lidar coordinate system to obtain an approximate region of the target point cloud; S22, ground point cloud filtering: if the input is a 3D point cloud, the ground points are removed with a Simple Morphological Filter (SMRF) to obtain a refined point cloud; if the input is a 2D point cloud, the approximate-region point cloud of step S21 is retained directly; S23, distance-adaptive clustering: a DBSCAN algorithm whose neighborhood radius ε adapts to the distance d between the lidar and the target is applied, where ε depends linearly on d with a constant coefficient; S24, regional point cloud purification: background noise is filtered by similar-point deletion (removing the smaller of two clusters whose similarity exceeds 50%) and enlarged-box deletion (after enlarging the bounding box, removing clusters in which more than 5% of the laser points belonged to the same category before enlargement); S25, region growing: taking the largest point cloud cluster obtained in step S24 as the seed, growth is expanded in the global point cloud to recover the target point cloud truncated by the camera field of view; S3, target-level semantic fusion: the semantically segmented point cloud obtained in step S2 is associated with the target semantic information of the camera image, and a unified target-level semantic fusion result is output.
- 2. The lidar and camera target-level semantic fusion method based on back projection according to claim 1, wherein in step S11 the actual environmental target corresponding to an arc feature is a circular landmark or a sphere, whose image is an ellipse; the ellipse equation is represented through a matrix Q as x^T Q x = 0, its quadratic form satisfying a·u² + b·u·v + c·v² + d·u + e·v + f = 0, where u and v respectively denote the pixel coordinates of a pixel point in the image and x = (u, v, 1)^T.
- 3. The method of claim 1, wherein in step S13 the construction of the lidar-camera projection matrix P satisfies: s·[u, v, 1]^T = P·[X, Y, Z, 1]^T, with P = K·[R | t], where R is a 3×3 rotation matrix, t is a 3×1 translation vector, K is the camera intrinsic matrix, s is the depth value of the point in the camera coordinate system, (u, v) are the image pixel coordinates, (X, Y, Z) are the coordinates of the point in the lidar coordinate system, and p_i denotes the i-th column of the matrix P.
- 4. The lidar and camera target-level semantic fusion method based on back projection according to claim 1, wherein in step S13 the objective function of the nonlinear optimization is E = Σ_{i=1}^{N} (X_i^T P^T Q_i P X_i)², where N is the number of data groups, Q_i is the ellipse matrix corresponding to the i-th group of data, X_i is the lidar intersection-point coordinate, and P and P^T are the lidar-camera calibration matrix and its transpose.
- 5. The lidar and camera target-level semantic fusion method based on back projection according to claim 1, wherein in step S21 the back-projection mapping satisfies X_l = τ(π(X_c)), where π is the central-projection mapping from the camera optical center to the target point, τ is the truncation mapping from the laser scanning plane to the projection line, X_c is the target point in the camera coordinate system, and X_l is the corresponding point in the lidar coordinate system.
- 6. The method of claim 1, wherein in step S23, for a 2D point cloud the adaptive radius ε is selected based on the distance d from the lidar center to each point, and for a 3D point cloud ε is selected directly based on the distance between points.
- 7. The lidar and camera target-level semantic fusion method based on back projection according to claim 1, wherein in step S24 the rule of similar-point deletion is: traverse all clusters; if the similarity between two clusters exceeds 50%, remove the cluster with fewer points; and if one cluster contains another, remove the cluster with the smaller proportion of points.
- 8. The method of claim 1, wherein in step S25 the neighborhood criterion of region growing is: if the Euclidean distance between a candidate point and a seed point is smaller than the radius ε of step S23, the candidate point is classified into the target cluster.
- 9. A system for implementing the lidar and camera target-level semantic fusion method based on back projection, characterized by comprising a processor and a program stored in a processor-readable storage medium; when executing the program, the processor implements the steps of the method of any one of claims 1-8; the system further comprises: a sensor module, comprising a lidar (2D/3D) and a camera, for acquiring environmental point cloud and image data; a feature extraction module for performing the line and arc feature extraction of step S11; a calibration module for performing the coordinate optimization and projection matrix solving of steps S12-S13; a semantic segmentation module for performing the segmentation and purification of steps S21-S25; and a fusion module for performing the association and output of target-level semantic information of step S3.
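To make steps S21 and S23 of the claims concrete, the following is a minimal illustrative sketch (all function names and parameter values are hypothetical assumptions, not taken from the patent) of back-projecting a pixel into the lidar frame given a depth estimate, and of a distance-adaptive DBSCAN radius in the sense of claim 6:

```python
import numpy as np

def backproject_pixel(K, R, t, uv, depth):
    """Back-project pixel (u, v) at a given depth into the lidar frame.

    Inverts s*[u, v, 1]^T = K (R*X + t): first lift the pixel to a
    camera-frame point, then undo the extrinsic rotation and translation.
    """
    x_cam = depth * np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    return np.linalg.inv(R) @ (x_cam - t)

def adaptive_eps(dist, c=0.02, eps_min=0.10):
    """Distance-adaptive DBSCAN radius: grows linearly with target distance
    so that sparser far-range returns still cluster together.
    c and eps_min are illustrative constants, not values from the patent.
    """
    return max(eps_min, c * dist)
```

In practice the four corners of the image bounding box would be back-projected along their viewing rays to bound a frustum-shaped approximate region in the lidar frame, within which the adaptive-radius clustering of step S23 is run.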
Description
Laser radar and camera target level semantic fusion method and system based on back projection

Technical Field

The invention relates to the technical field of robot perception and automatic driving, in particular to a lidar and camera target-level semantic fusion method and system based on back projection.

Background

In the fields of mobile robots and autonomous driving, semantic fusion is one of the core technologies for improving environmental awareness. Lidar can provide high-precision depth information but lacks texture features, while cameras can provide rich color and texture information but their depth estimation is subject to light interference. Fusing the two can compensate for each other's deficiencies; however, the prior art has the following key problems:

Poor calibration accuracy and environmental adaptability: traditional lidar-camera calibration depends on artificial targets such as checkerboards and specific polygons, requires data acquisition at multiple positions, is cumbersome to operate, and easily accumulates errors; targetless calibration methods depend on indoor linear features (such as wall surfaces and wall corners), their accuracy drops sharply in industrial environments or feature-sparse outdoor scenes, and they cannot exploit arc-shaped environmental features (such as circular road signs and pipelines).

Semantic segmentation is sensitive to distance and point cloud sparsity: traditional projection methods in existing segmentation algorithms easily introduce a large number of background points, resulting in low computational efficiency; neural-network-based methods (such as PointPainting and Cylinder3D) require large-scale labeled data; and as target distance increases the point cloud becomes sparser and segmentation accuracy drops markedly (for example, mIoU falls below 60% when the distance exceeds 50 m).

A unified multi-dimensional point cloud segmentation scheme is lacking: 2D lidar point clouds are distributed on a single plane and lack contour features, whereas 3D lidar point clouds are stacks of multiple layers of 2D point clouds; the data structures of the two differ greatly, existing algorithms are designed for a single dimension, and unified semantic segmentation of 2D/3D point clouds cannot be realized.

Deep-learning-based semantic segmentation methods require large amounts of computing resources and long processing time, and struggle to meet application scenarios with high real-time requirements.

Therefore, a high-precision semantic fusion method is needed that can adapt to calibration in complex environments, reduce the influence of distance and point cloud sparsity, and support multi-dimensional point clouds.

Disclosure of Invention

The invention provides a lidar and camera target-level semantic fusion method and system based on back projection, to solve the problems set forth in the background art.
The invention provides the following technical scheme. In order to achieve the above purpose, the lidar and camera target-level semantic fusion method based on back projection comprises the following steps: lidar-camera calibration based on arc features, multi-dimensional point cloud semantic segmentation based on back projection, and target-level semantic fusion.

Optionally, the lidar-camera calibration based on environmental arc features utilizes arc-shaped targets in the environment, including circular road signs and spheres, to establish the spatial correspondence between the lidar and the camera, with the following specific steps:

Step S11, extracting image features. Line feature extraction: a Contrast-Guided Line Segment Detection (CGLSD) algorithm is used, taking edge contrast as a guide to enhance line-segment continuity, filter noise, and extract the linear equations of object edges (since many interfering lines can appear when fitting pixel line segments in an image, line feature extraction can be constrained by a threshold on the fitted line slope). Arc feature extraction: an improved LSD algorithm is adopted to detect arc-support line segments in the denoised grayscale image, and target ellipses whose number of support points and span angle satisfy the screening thresholds are selected. The ellipse is expressed mathematically as a·u² + b·u·v + c·v² + d·u + e·v + f = 0, with the corresponding matrix form x^T Q x = 0, x = (u, v, 1)^T, where u and v respectively denote the pixel coordinates of a pixel point in the image, and a, b, c, d, e, f are the coefficient parameters obtained by converting the elliptic equation into quadratic form.

Step S12, optimizing the lidar intersection-point coordinates: when the lidar scans the target