CN-121995394-A - Monocular vision perception enhancement method based on laser radar

Abstract

The invention discloses a monocular vision perception enhancement method based on a laser radar, and relates to the technical field of intelligent traffic. The method comprises: collecting vehicle video images through a monocular camera; collecting three-dimensional point cloud data containing diffuse-reflection calibration plate features through a laser radar; establishing a mapping from two-dimensional pixels to three-dimensional world coordinates through intrinsic calibration and joint calibration, and generating a full-resolution lookup table; acquiring vehicle three-dimensional coordinates in real time through YOLO object detection; and correcting parameter drift through motion trajectory verification and iterative optimization against fixed reference features. The invention effectively resolves the dilemma that existing single-device schemes are limited in positioning accuracy, economy and stability in a hub scene, and has strong practical value.

Inventors

  • HE SHULIN
  • LIU ZHUQING
  • HUANG CHENBIN
  • CHEN QIRONG
  • ZHENG JIANYING
  • ZHAO XUWEI
  • DENG XIAO
  • CHEN QIAN
  • YANG TIANYI
  • CAI ZENGYI
  • GAO XIAOFEI
  • ZHANG LIANG

Assignees

  • East China Architectural Design & Research Institute Co., Ltd. (华东建筑设计研究院有限公司)

Dates

Publication Date
2026-05-08
Application Date
2026-01-22

Claims (10)

  1. A monocular vision perception enhancement method based on a laser radar, characterized by comprising the following steps:
     Step 100, acquiring vehicle video images of a comprehensive transportation hub parking garage scene through a monocular camera, and acquiring three-dimensional point cloud data of the scene through a laser radar, wherein the three-dimensional point cloud data contains feature information of a diffuse-reflection calibration plate;
     Step 200, based on the vehicle video images and the three-dimensional point cloud data, performing intrinsic calibration of the monocular camera and joint calibration of the camera and the laser radar, and determining the optical parameters of the monocular camera and the relative three-dimensional spatial relationship between the monocular camera and the laser radar deployment point;
     Step 300, based on the optical parameters and the relative three-dimensional spatial relationship, establishing a conversion model from two-dimensional pixel coordinates to three-dimensional world coordinates, and generating a full-resolution lookup table;
     Step 400, extracting vehicle bounding boxes from the vehicle video images using a YOLO object detection algorithm, and mapping the bounding boxes through the full-resolution lookup table to obtain the three-dimensional spatial coordinates of the vehicle, realizing real-time vehicle positioning;
     Step 500, iteratively optimizing the optical parameters and the relative three-dimensional spatial relationship based on a vehicle motion trajectory consistency check constructed from the vehicle three-dimensional spatial coordinates and on fixed reference features of the scene, thereby correcting parameter drift.
  2. The monocular vision perception enhancement method based on a laser radar according to claim 1, wherein step 100 comprises:
     Step 110, deploying the laser radar at an unobstructed position behind the monocular camera, and deploying the diffuse-reflection calibration plate in the comprehensive transportation hub parking garage scene;
     Step 120, controlling the monocular camera and the laser radar through an ROS-dock script to synchronously record multiple sets of calibration data with different poses, changing the pose of the diffuse-reflection calibration plate several times during recording;
     Step 130, collecting vehicle images through the monocular camera under dim-light, occluded and multi-directional driving conditions in the comprehensive transportation hub parking garage scene, screening the collected vehicle images, annotating the screened images with the labelImg tool, and constructing a training data set.
  3. The monocular vision perception enhancement method based on a laser radar according to claim 1, wherein step 200 comprises:
     Step 210, extracting clear frames of the calibration images from the calibration data, processing the clear frames with MATLAB, and solving the intrinsic matrix and distortion coefficients of the monocular camera, wherein the intrinsic matrix and the distortion coefficients together constitute the optical parameters;
     Step 220, filtering the three-dimensional point cloud data with CloudCompare: coarsely filtering irrelevant points by limiting the coordinate range, and finely filtering out the points belonging to the calibration-plate fixture by computing a minimum z-axis height, so as to retain the effective point cloud of the diffuse-reflection calibration plate;
     Step 230, performing corner extraction on the effective point cloud to obtain the three-dimensional corner coordinates of the diffuse-reflection calibration plate;
     Step 240, establishing point-pair constraints between the three-dimensional corner coordinates and the pixel coordinates of the corresponding corners in the calibration images, inputting the optical parameters and undistorting the pixel coordinates, establishing a camera imaging model, recovering the pose of the monocular camera from the perspective projection equation under the point-pair constraints by minimizing the reprojection error with the solvePnP algorithm to obtain the optimal rotation vector and translation vector, converting the rotation vector into a three-dimensional rotation matrix through the Rodrigues transformation, combining the rotation matrix and the translation vector into an extrinsic transformation matrix in homogeneous form, and forming the relative three-dimensional spatial relationship from the extrinsic transformation matrix.
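The Rodrigues conversion and homogeneous extrinsic assembly described in step 240 can be sketched as follows. This is an illustrative sketch, not part of the claims: the function names are hypothetical, and in practice the rotation and translation vectors would come from a PnP solver such as OpenCV's solvePnP, which is assumed here to have already run.

```python
import numpy as np

def rodrigues(rvec):
    """Convert a rotation vector to a 3x3 rotation matrix (Rodrigues' formula)."""
    rvec = np.asarray(rvec, float)
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    # Skew-symmetric cross-product matrix of the unit axis k.
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def extrinsic_matrix(rvec, tvec):
    """Stack R and t into the 4x4 homogeneous extrinsic transformation of step 240."""
    T = np.eye(4)
    T[:3, :3] = rodrigues(rvec)
    T[:3, 3] = np.asarray(tvec, float)
    return T
```

A 90-degree rotation about the z-axis, for example, maps the x-axis onto the y-axis, which gives a quick sanity check that the matrix is a proper rotation.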
  4. The monocular vision perception enhancement method based on a laser radar according to claim 3, wherein the limited coordinate range is set according to the actual size of the comprehensive transportation hub parking garage scene, and the minimum z-axis height is determined by the actual installation height of the calibration-plate fixture.
  5. The monocular vision perception enhancement method based on a laser radar according to claim 3, wherein step 230 comprises:
     Step 231, counting the points in the effective point cloud; if the count is below a preset threshold, outputting an insufficient-point-cloud warning and terminating subsequent operations; if the count reaches or exceeds the preset threshold, executing step 232;
     Step 232, performing plane segmentation on the effective point cloud through a random sample consensus (RANSAC) algorithm, and identifying the inlier points that conform to the planar characteristics of the diffuse-reflection calibration plate;
     Step 233, reducing the inlier point cloud to a two-dimensional planar coordinate system through principal component analysis, mapping it onto the best-fit plane;
     Step 234, automatically clustering the dimension-reduced inlier points with a density-based spatial clustering algorithm using a preset optimal clustering radius parameter;
     Step 235, extracting the extreme points of each cluster along the principal directions as feature boundary endpoints, fitting the feature boundary endpoints with a minimum circumscribed rectangle algorithm to obtain the two-dimensional rectangle parameters of the diffuse-reflection calibration plate, and mapping the rectangle parameters back to the original three-dimensional coordinate system through the inverse principal-component-analysis transformation to obtain the three-dimensional corner coordinates.
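The PCA projection and back-projection of steps 233 and 235 can be sketched as below. This is a simplified illustration, not the claimed method: it uses the PCA axes themselves as the rectangle orientation (taking the axis-aligned bounding rectangle in the plane frame) rather than a full minimum-circumscribed-rectangle search, and the function name is hypothetical.

```python
import numpy as np

def plate_corners_3d(points):
    """Estimate the four corners of a planar calibration-plate point cloud.

    Project the points onto their two principal axes (PCA), take the
    bounding rectangle in that 2-D frame, then map the rectangle corners
    back to 3-D through the inverse transformation.
    """
    pts = np.asarray(points, float)
    centroid = pts.mean(axis=0)
    centered = pts - centroid
    # Rows of vt are the principal directions; the first two span the
    # best-fit plane, the last is the plane normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    plane_axes = vt[:2]                       # 2 x 3
    uv = centered @ plane_axes.T              # N x 2 in-plane coordinates
    (umin, vmin), (umax, vmax) = uv.min(axis=0), uv.max(axis=0)
    rect2d = np.array([[umin, vmin], [umax, vmin],
                       [umax, vmax], [umin, vmax]])
    # Inverse PCA transform: back to the original 3-D coordinate system.
    return rect2d @ plane_axes + centroid
```

For a roughly rectangular, plane-segmented cluster the PCA axes align with the plate edges, so this approximation recovers the same corners as a minimum circumscribed rectangle.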
  6. The monocular vision perception enhancement method based on a laser radar according to claim 1, wherein step 300 comprises:
     Step 310, reading the intrinsic matrix and the distortion coefficients from the optical parameters; for each input image pixel coordinate point, computing its normalized coordinates according to the intrinsic matrix and remapping it back to the pixel coordinate system after distortion correction;
     Step 320, expanding the distortion-corrected pixel point into homogeneous coordinates and transforming it into the normalized camera coordinate system through the inverse of the intrinsic matrix, obtaining a line-of-sight direction vector that starts at the optical center of the monocular camera and passes through the pixel;
     Step 330, transforming the line-of-sight direction vector from the camera coordinate system to the world coordinate system using the three-dimensional rotation matrix of the relative spatial relationship; expressing the world coordinate point, by a parametric ray equation in the world coordinate system, as the optical center coordinate plus the line-of-sight direction vector scaled by a scale parameter; substituting the parametric equation into the ground plane equation to solve for the scale parameter, and computing from it the three-dimensional world coordinate corresponding to the pixel;
     Step 340, creating a three-dimensional array whose dimensions are the vertical pixel resolution, the horizontal pixel resolution and the three-dimensional world coordinate of the corresponding pixel; traversing all pixels row by row, assigning the three-dimensional world coordinate of each pixel obtained by steps 310 to 330 to the corresponding position of the array, and generating and storing the full-resolution lookup table in .npy format.
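The ray casting and lookup-table construction of steps 310 to 340 can be sketched as follows, under simplifying assumptions (not the claimed implementation): distortion is already corrected, the ground plane is z = 0 in world coordinates, R rotates camera-frame vectors into the world frame, and the optical centre is known in world coordinates.

```python
import numpy as np

def build_ground_lut(K, R, cam_center, width, height, path=None):
    """Build a full-resolution lookup table mapping every pixel to the
    3-D world point where its viewing ray meets the ground plane z = 0."""
    C = np.asarray(cam_center, float)
    us, vs = np.meshgrid(np.arange(width), np.arange(height))
    # Homogeneous pixel coordinates, back-projected through K^-1 into
    # camera-frame rays, then rotated into the world frame.
    pix = np.stack([us, vs, np.ones_like(us)], axis=-1).reshape(-1, 3).T
    rays = R @ (np.linalg.inv(K) @ pix)           # 3 x (H*W)
    # Ray equation X = C + s * d; intersecting z = 0 gives s = -C_z / d_z.
    s = -C[2] / rays[2]
    world = C[:, None] + s * rays                 # 3 x (H*W)
    lut = world.T.reshape(height, width, 3)
    if path is not None:
        np.save(path, lut)                        # stored in .npy format
    return lut
```

Indexing the result as `lut[v, u]` then returns the world coordinate of pixel (u, v) in constant time, which is what makes the real-time lookup of step 400 possible.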
  7. The monocular vision perception enhancement method based on a laser radar according to claim 1, wherein step 400 comprises:
     Step 410, selecting a lightweight YOLO-series model and training it on the training data set to obtain a vehicle detection model;
     Step 420, processing the vehicle video images with the vehicle detection model and outputting the vehicle identification information, the bounding box coordinates, and the pixel coordinates of the lower border of the bounding box;
     Step 430, feeding the bounding box coordinates and the lower-border pixel coordinates into the full-resolution lookup table, and obtaining the corresponding three-dimensional spatial coordinates of the vehicle through index mapping.
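Step 430's index mapping can be sketched as a direct array lookup at the midpoint of the bounding box's lower border, i.e. the point where the vehicle meets the ground. The helper below is hypothetical and assumes a full-resolution lookup table of shape (height, width, 3) as described in claim 6.

```python
import numpy as np

def locate_vehicle(lut, bbox):
    """Map a detected bounding box (x1, y1, x2, y2) to 3-D world coordinates
    via the full-resolution lookup table."""
    x1, y1, x2, y2 = bbox
    u = int(round((x1 + x2) / 2))    # horizontal centre of the lower border
    v = int(round(y2))               # lower border row
    h, w = lut.shape[:2]
    # Clamp to the image bounds so partially visible vehicles still index safely.
    u = min(max(u, 0), w - 1)
    v = min(max(v, 0), h - 1)
    return lut[v, u]
```

Because the lookup is a single array access, positioning cost per detection is constant regardless of image resolution.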
  8. The monocular vision perception enhancement method based on a laser radar according to claim 1, wherein step 500 comprises:
     Step 510, constructing a vehicle motion trajectory from the vehicle three-dimensional spatial coordinates and checking its consistency against a kinematic model: if the vehicle coordinates deviate from the prediction of the kinematic model by more than a preset deviation threshold for a preset number of consecutive frames, continuing to step 520; otherwise, jumping to step 420 and continuing to process the real-time vehicle video images through the vehicle detection model;
     Step 520, identifying fixed reference features of the comprehensive transportation hub parking garage scene through the YOLO object detection algorithm, extracting their current pixel coordinates, mapping them through the full-resolution lookup table to current three-dimensional coordinates, and retrieving the true three-dimensional coordinates of the fixed reference features stored during the initial joint calibration stage;
     Step 530, constructing a reprojection-error minimization objective function with the true three-dimensional coordinates of the fixed reference features as the benchmark, and iteratively correcting the optical parameters and the relative three-dimensional spatial relationship through the objective function;
     Step 540, substituting the optimized optical parameters and relative spatial relationship into the conversion model, and re-executing steps 310 to 340 to generate a new full-resolution lookup table.
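The consistency check of step 510 can be sketched with a constant-velocity (uniform linear motion) prediction, one of the kinematic models named in claim 9. The five-frame window and three-centimetre threshold also follow claim 9; the function name is hypothetical.

```python
import numpy as np

def needs_recalibration(track, n_frames=5, threshold=0.03):
    """Flag the calibration for re-optimisation when the observed track
    deviates from a constant-velocity prediction for n_frames consecutive
    frames by more than `threshold` metres.

    track: sequence of per-frame 3-D vehicle positions.
    """
    track = np.asarray(track, float)
    consecutive = 0
    for i in range(2, len(track)):
        # Extrapolate from the previous two observations.
        predicted = track[i - 1] + (track[i - 1] - track[i - 2])
        deviation = np.linalg.norm(track[i] - predicted)
        consecutive = consecutive + 1 if deviation > threshold else 0
        if consecutive >= n_frames:
            return True
    return False
```

A smooth straight-line track passes the check, while a track whose points jitter by tens of centimetres (symptomatic of parameter drift in the lookup table) trips it and would trigger steps 520 to 540.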
  9. The monocular vision perception enhancement method based on a laser radar according to claim 8, wherein the kinematic model comprises a uniform linear motion equation and a circular motion equation, the preset number of frames is five, the preset deviation threshold is three centimeters, and the reprojection-error minimization objective function is
     E = Σ_i ‖ p_i − π(K [R | t] P_i) ‖²,
     where p_i is the pixel coordinate of the current reference feature, P_i is the true three-dimensional coordinate of the reference feature, K is the camera intrinsic matrix, R is the three-dimensional rotation matrix, t is the translation vector, and π is the projection function.
  10. The monocular vision perception enhancement method based on a laser radar according to claim 8, wherein the iterative optimization process triggered in step 500 must satisfy a dual trigger condition: the first condition is that the vehicle motion trajectory consistency deviation continuously exceeds the preset deviation threshold, and the second condition is that the number of effectively identified fixed reference features is no less than three; the iterative optimization process is executed asynchronously.

Description

Monocular vision perception enhancement method based on laser radar

Technical Field

The invention relates to the technical field of intelligent traffic, and in particular to a monocular vision perception enhancement method based on a laser radar.

Background

The rapid development of intelligent transportation is driving comprehensive transportation hubs toward intelligence and efficiency, and real-time, accurate vehicle positioning is the core support for intelligent functions such as parking space management, traffic scheduling and safety control within a hub. A comprehensive transportation hub parking garage, as a key area with dense traffic flow and a complex scene, suffers from changeable illumination, frequent vehicle crossings and prominent occlusion, and places dual demands on the accuracy, stability and economy of the positioning technology, so an efficient positioning solution adapted to this scene is needed. Vehicle positioning in such a scene relies mainly on either a single laser radar or a single monocular camera. A laser radar alone can provide high-precision three-dimensional point cloud data that meets the positioning accuracy requirement, but its purchase and maintenance costs are high, making wide deployment in a large hub scene difficult. A monocular camera alone has seen some application owing to its low cost and easy deployment, but it cannot acquire depth information, so the three-dimensional coordinates of a vehicle are difficult to solve accurately and the actual positioning requirements of a complex scene cannot be met. Neither scheme balances positioning accuracy and economy, which is a key bottleneck restricting the intelligent deployment of comprehensive transportation hub parking garages.
Disclosure of Invention

The invention aims to provide a monocular vision perception enhancement method based on a laser radar that solves the problems described in the background. To this end, the invention provides a monocular vision perception enhancement method based on a laser radar, comprising the following steps: Step 100, acquiring vehicle video images of a comprehensive transportation hub parking garage scene through a monocular camera, and acquiring three-dimensional point cloud data of the scene through a laser radar, wherein the three-dimensional point cloud data contains feature information of a diffuse-reflection calibration plate; Step 200, based on the vehicle video images and the three-dimensional point cloud data, performing intrinsic calibration of the monocular camera and joint calibration of the camera and the laser radar, and determining the optical parameters of the monocular camera and the relative three-dimensional spatial relationship between the monocular camera and the laser radar deployment point; Step 300, based on the optical parameters and the relative three-dimensional spatial relationship, establishing a conversion model from two-dimensional pixel coordinates to three-dimensional world coordinates, and generating a full-resolution lookup table; Step 400, extracting vehicle bounding boxes from the vehicle video images using a YOLO object detection algorithm, and mapping the bounding boxes through the full-resolution lookup table to obtain the three-dimensional spatial coordinates of the vehicle, realizing real-time vehicle positioning; Step 500, iteratively optimizing the optical parameters and the relative three-dimensional spatial relationship based on a vehicle motion trajectory consistency check constructed from the vehicle three-dimensional spatial coordinates and on fixed reference features of the scene, thereby correcting parameter drift.
Preferably, step 100 comprises: Step 110, deploying the laser radar at an unobstructed position behind the monocular camera, and deploying the diffuse-reflection calibration plate in the comprehensive transportation hub parking garage scene; Step 120, controlling the monocular camera and the laser radar through an ROS-dock script to synchronously record multiple sets of calibration data with different poses, changing the pose of the diffuse-reflection calibration plate several times during recording; Step 130, collecting vehicle images through the monocular camera under dim-light, occluded and multi-directional driving conditions in the comprehensive transportation hub parking garage scene, screening the collected vehicle images, annotating the screened images with the labelImg tool, and constructing a training data set. Preferably, step 200 comprises: Step 210, a clear frame of the calibration image in the calibration data is intercepted, MATLAB is used for processing the