CN-121982116-A - Multi-mode joint calibration method, system and vehicle
Abstract
The application discloses a multi-modal joint calibration method, a system, and a vehicle, belonging to the technical field of multi-modal perception for automatic driving. The method comprises: extracting point cloud features from point cloud data using the point cloud branch of a pre-trained dual-branch feature extraction network and converting them into point cloud bird's eye view (BEV) features; extracting image features from image data using the image branch and converting them into image BEV features; performing feature fusion on the point cloud BEV features and the image BEV features to obtain fused BEV features; and encoding and decoding the fused BEV features to determine the extrinsic parameters between the lidar and the camera. By unifying point cloud and image data into BEV features through dual-branch feature extraction, and by supporting joint calibration of multiple cameras and different types of lidar through feature fusion and an end-to-end encoding-decoding pipeline, the scheme effectively improves calibration robustness and precision in complex scenes.
Inventors
- XU CHENGJUN
Assignees
- 浙江凌艾未来科技有限公司
- 浙江零跑科技股份有限公司
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-01-20
Claims (10)
- 1. A multi-modal joint calibration method, characterized by comprising the following steps: acquiring point cloud data of a lidar and image data of at least one camera; extracting point cloud features from the point cloud data using a point cloud branch and converting the point cloud features into point cloud bird's eye view (BEV) features, and extracting image features from the image data using an image branch and converting the image features into image BEV features, wherein the point cloud branch and the image branch are the two branches of a pre-trained dual-branch feature extraction network model; performing feature fusion on the point cloud BEV features and the image BEV features to obtain fused BEV features; and encoding and decoding the fused BEV features to determine extrinsic parameters between the lidar and the camera, wherein the extrinsic parameters are used for joint calibration of the lidar and the camera.
- 2. The multi-modal joint calibration method according to claim 1, wherein extracting the point cloud features from the point cloud data using the point cloud branch and converting the point cloud features into the point cloud BEV features comprises: processing the point cloud data with a sparse convolution network to generate three-dimensional voxel features; and flattening the three-dimensional voxel features along the height dimension to obtain the point cloud BEV features.
- 3. The multi-modal joint calibration method according to claim 1, wherein extracting the image features from the image data and converting them into the image BEV features using the image branch comprises: performing feature extraction on the image data with a neural network of a Transformer architecture to obtain image features; projecting the image features into a world coordinate system defined by the coordinate system of the lidar, using a preset initial extrinsic matrix and the intrinsic parameters of the camera, to obtain projected BEV features; and pooling the projected BEV features to obtain the image BEV features.
- 4. The multi-modal joint calibration method according to claim 3, wherein generating the initial extrinsic matrix comprises: acquiring a ground-truth extrinsic matrix between the lidar and the camera; generating a random error matrix, wherein the random error matrix comprises a random Euler angle error and a random displacement error; and computing the product of the ground-truth extrinsic matrix and the inverse of the random error matrix to obtain the initial extrinsic matrix.
- 5. The multi-modal joint calibration method according to claim 1, wherein encoding and decoding the fused BEV features to determine the extrinsic parameters between the lidar and the camera comprises: performing multi-scale feature extraction and up-sampling on the fused BEV features to obtain enhanced BEV features; performing semantic analysis on the enhanced BEV features with a Transformer encoder to obtain semantic features; and determining the extrinsic parameters between the lidar and the camera based on the semantic features.
- 6. The multi-modal joint calibration method according to claim 4, wherein training the dual-branch feature extraction network model comprises: collecting point cloud data samples of the lidar and image data samples of the camera; constructing an initial point cloud branch based on a preset convolution network, constructing an initial image branch based on a neural network of a Transformer architecture, and constructing an initial dual-branch feature extraction network model from the initial point cloud branch and the initial image branch; training the initial point cloud branch with the point cloud data samples to obtain intermediate point cloud BEV features; training the initial image branch with the image data samples to obtain intermediate image BEV features; fusing, encoding, and decoding the intermediate point cloud BEV features and the intermediate image BEV features to generate intermediate extrinsic parameters; computing a loss value for the intermediate extrinsic parameters with a preset loss function; and iteratively optimizing the initial dual-branch feature extraction network model based on the loss value to obtain the pre-trained dual-branch feature extraction network model.
- 7. The multi-modal joint calibration method according to claim 6, wherein the loss function comprises one or more of an attitude error, a displacement error, and a point cloud distance error; wherein the attitude error represents the difference between the predicted quaternion in the intermediate extrinsic parameters and the true quaternion in the ground-truth extrinsic parameters; the displacement error represents the difference between the predicted displacement vector in the intermediate extrinsic parameters and the true displacement vector in the ground-truth extrinsic parameters; and the point cloud distance error represents the distance deviation between the point cloud transformed by the intermediate extrinsic parameters and the originally acquired point cloud data of the lidar.
- 8. The method according to claim 1, wherein after acquiring the point cloud data of the lidar and the image data of the at least one camera, the method comprises: preprocessing the point cloud data and the image data to update them, wherein the preprocessing comprises at least one of: downsampling the point cloud data, downsampling and de-distorting the image data, and performing timestamp alignment on the point cloud data and the image data.
- 9. A multi-modal joint calibration system, comprising: a data acquisition module for acquiring point cloud data of a lidar and image data of at least one camera; a dual-branch feature extraction module for extracting point cloud features from the point cloud data using a point cloud branch and converting the point cloud features into point cloud BEV features, and extracting image features from the image data using an image branch and converting the image features into image BEV features, wherein the point cloud branch and the image branch are the two branches of a pre-trained dual-branch feature extraction network model; a feature fusion module for performing feature fusion on the point cloud BEV features and the image BEV features to obtain fused BEV features; and a calibration module for encoding and decoding the fused BEV features, determining extrinsic parameters between the lidar and the camera, and performing joint calibration of the lidar and the camera using the extrinsic parameters.
- 10. A vehicle comprising a memory and a processor, the memory having stored thereon a computer program which, when executed by the processor, implements the method of any of claims 1-8.
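The construction in claim 4 (initial extrinsic = ground-truth extrinsic multiplied by the inverse of a random error matrix) can be sketched as follows. This is an illustrative reconstruction, not the patented implementation: the function names, the ZYX Euler convention, and the error ranges (±5° rotation, ±0.1 m translation) are all assumptions.

```python
import numpy as np

def euler_to_rotation(rx, ry, rz):
    """Rotation matrix from Euler angles in radians, composed as Rz @ Ry @ Rx."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def make_initial_extrinsic(T_true, max_angle_deg=5.0, max_shift_m=0.1, rng=None):
    """Perturb a 4x4 ground-truth extrinsic: T_init = T_true @ inv(E),
    where E is a rigid transform built from random Euler-angle and
    displacement errors, as in claim 4."""
    rng = np.random.default_rng() if rng is None else rng
    angles = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg, 3))
    shift = rng.uniform(-max_shift_m, max_shift_m, 3)
    E = np.eye(4)
    E[:3, :3] = euler_to_rotation(*angles)
    E[:3, 3] = shift
    return T_true @ np.linalg.inv(E)
```

During training the network can then be supervised to recover the injected error, i.e. to predict the transform that maps the initial extrinsic back to the ground truth.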
Description
Multi-mode joint calibration method, system and vehicle

Technical Field

The application relates to the technical field of multi-modal perception for automatic driving, and in particular to a multi-modal joint calibration method, system, and vehicle.

Background

In a multi-modal perception system for automatic driving, the cooperative operation of a lidar and a camera depends on high-precision sensor calibration, and calibration precision directly determines the precision of key system functions such as obstacle detection and environment perception. Related multi-sensor calibration methods comprise offline calibration and online calibration. In an offline calibration scheme, dedicated targets and high-precision automation equipment must be arranged on the vehicle production line, and extrinsic calibration is completed through a fixed procedure. However, because the factory calibration environment is uniform, the resulting offline parameters are difficult to adapt to complex and changeable road scenes; when a sensor is replaced, suffers a collision, or undergoes long-term use, parameter drift occurs and the sensor must be returned to the factory for recalibration, so the robustness of the calibration result is poor. An online calibration scheme is mainly realized through multi-stage calibration: point, line, or surface features are selected in a specific environment, and the extrinsic parameters are then solved by constructing an optimization function. This scheme depends on environmental information, accumulates error, and yields poor calibration robustness in complex scenes such as those with heavy occlusion.
Disclosure of Invention

A multi-modal joint calibration method, system, and vehicle are provided, solving the problem of poor calibration robustness in the multi-sensor calibration methods of the related art. In a first aspect, a multi-modal joint calibration method is provided, comprising the following steps: acquiring point cloud data of a lidar and image data of at least one camera; extracting point cloud features from the point cloud data using a point cloud branch and converting the point cloud features into point cloud bird's eye view (BEV) features, and extracting image features from the image data using an image branch and converting the image features into image BEV features, wherein the point cloud branch and the image branch are the two branches of a pre-trained dual-branch feature extraction network model; performing feature fusion on the point cloud BEV features and the image BEV features to obtain fused BEV features; and encoding and decoding the fused BEV features to determine extrinsic parameters between the lidar and the camera, wherein the extrinsic parameters are used for joint calibration of the lidar and the camera. In some embodiments, extracting the point cloud features from the point cloud data using the point cloud branch and converting the point cloud features into the point cloud BEV features comprises: processing the point cloud data with a sparse convolution network to generate three-dimensional voxel features; and flattening the three-dimensional voxel features along the height dimension to obtain the point cloud BEV features.
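The point-cloud-branch step described above (three-dimensional voxel features flattened along the height dimension into a BEV map) can be sketched with a dense tensor. A real implementation would produce the voxel features with a sparse convolution library; the dense (C, Z, Y, X) layout and the toy sizes here are assumptions for illustration only.

```python
import numpy as np

def voxels_to_bev(voxel_features):
    """Collapse 3D voxel features of shape (C, Z, Y, X) into a BEV map of
    shape (C*Z, Y, X) by stacking the height (Z) slices into the channel
    dimension, preserving all feature content."""
    c, z, y, x = voxel_features.shape
    return voxel_features.reshape(c * z, y, x)

# Toy example: 16 channels, 8 height bins, a 128x128 ground-plane grid.
voxels = np.random.rand(16, 8, 128, 128).astype(np.float32)
bev = voxels_to_bev(voxels)
```

After this step the point cloud is represented on the same ground-plane grid as the projected image features, which is what makes the subsequent BEV feature fusion possible.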
In some embodiments, extracting the image features from the image data and converting them into the image BEV features using the image branch comprises: performing feature extraction on the image data with a neural network of a Transformer architecture to obtain image features; projecting the image features into a world coordinate system defined by the coordinate system of the lidar, using a preset initial extrinsic matrix and the intrinsic parameters of the camera, to obtain projected BEV features; and pooling the projected BEV features to obtain the image BEV features. In some embodiments, generating the initial extrinsic matrix comprises: acquiring a ground-truth extrinsic matrix between the lidar and the camera; generating a random error matrix, wherein the random error matrix comprises a random Euler angle error and a random displacement error; and computing the product of the ground-truth extrinsic matrix and the inverse of the random error matrix to obtain the initial extrinsic matrix. In some embodiments, encoding and decoding the fused BEV features to determine the extrinsic parameters between the lidar and the camera comprises: performing multi-scale feature extraction and up-sampling on the fused BEV features to obtain enhanced BEV features; performing semantic analysis on the enhanced BEV features with a Transformer encoder to obtain semantic features; and, based on the semantic features, determining the extrinsic parameters between the lidar and the camera.