CN-121430603-B - Quick robust monocular vision inertial positioning method and system

Abstract

The invention discloses a fast, robust monocular visual-inertial positioning method and system, comprising the steps of: configuring parameters of an IMU and a monocular camera; preprocessing IMU data; initializing the IMU pose; extracting image features from the monocular camera, removing abnormal matching points, and performing IMU pre-integration; initializing monocular vision; optimizing the odometer pose, namely judging whether feature points are mismatched based on the IMU pre-integration result, constructing a feature-point reprojection error Jacobian matrix, and block-diagonalizing the reprojection error Jacobian matrix so as to optimize the inverse depth of the feature points and the image-frame pose separately; selecting key frames based on a key-frame identification rule after the current frame's pose has been updated; and, based on the current frame, eliminating a frame from the sliding window to reserve space for adding the latest frame. According to the invention, optimizing the inverse depth of the feature points and the image-frame pose separately, via block diagonalization of the reprojection error Jacobian matrix, reduces the computational complexity.

Inventors

  • LU WEINING
  • YU XINLONG
  • WANG LONGFEI

Assignees

  • 江淮前沿技术协同创新中心

Dates

Publication Date
2026-05-12
Application Date
2025-12-31

Claims (10)

  1. A fast, robust monocular visual-inertial positioning method, characterized by comprising the steps of: configuring parameters of an IMU and a monocular camera; preprocessing IMU data; initializing the IMU pose; extracting image features from the monocular camera, removing abnormal matching points, and performing IMU pre-integration; initializing monocular vision; and optimizing the odometer pose, namely judging whether feature points are mismatched based on the IMU pre-integration result, constructing a feature-point reprojection error Jacobian matrix, and block-diagonalizing the feature-point reprojection error Jacobian matrix so as to optimize the inverse depth of the feature points and the image-frame pose separately. The step of optimizing the odometer pose specifically comprises:
     Step 6.1, within the sliding window, screening as candidate optimization feature points those preprocessed feature points whose feature-tracking count is greater than or equal to a preset count and whose three-dimensional coordinates have been initialized and reconstructed; judging whether each feature point is mismatched according to the IMU (inertial measurement unit) pre-integration result, and, if a feature point satisfies observability, judging it to be an optimizable feature point;
     Step 6.2, constructing the feature-point reprojection error Jacobian matrix, and computing the Jacobians of the image-frame pose error and of the feature-point global coordinate error respectively;
     Step 6.3, modifying the reprojection error Jacobian matrix, reducing the dimension of the matrix while retaining the feature inverse-depth error term, and iterating over the landmarks several times;
     Step 6.4, iteratively solving the error compensation value of the pose state using the updated, optimized feature-point global coordinate error values.
     Step 6.3 specifically includes: a transformation matrix converts the reprojection error Jacobian matrix into upper-triangular form, yielding the modified reprojection error Jacobian matrix, which comprises the modified feature inverse-depth error term Jacobian and the modified state-variable error term Jacobian; the reprojection residual term is correspondingly decomposed into residual matrix blocks and error matrix blocks. Then, singular value decomposition (SVD) is applied to the feature inverse-depth error term Jacobian within the modified reprojection error Jacobian matrix, factoring it into invertible matrices and a diagonal matrix and yielding the corrected feature-point inverse-depth error term. Based on the filtering update step, the inverse-depth state compensation value of each feature point is solved in turn, the three-dimensional coordinates of the feature points are compensated and updated, and the reprojection errors of the updated feature points are computed to eliminate outliers, using the number of elements of the diagonal matrix and the measurement error term of the feature-point inverse-depth error. The corrected feature-point inverse-depth errors are computed in sequence; the feature-point inverse-depth error is then solved via the inverse of the transformation matrix, the feature-point inverse-depth estimate is updated, and the optimal feature-point inverse-depth estimate is output.
     Step 6.4 specifically includes: using the feature-point inverse-depth error obtained in Step 6.3 to correct the residual function of the state-variable error, and updating the state-variable error via the filtering update step, proceeding from the state-variable estimate to the optimal state-variable estimate, thereby completing the pose optimization of the current frame.
  2. The fast robust monocular visual-inertial positioning method of claim 1, wherein the monocular camera is mounted at the front end of an unmanned aerial vehicle with its optical axis inclined 30 degrees downward relative to the plane of the vehicle's main axis, and the IMU is arranged at the center of mass of the airframe, or at another position on the airframe for which the rotation and translation parameters are accurately calibrated.
  3. The fast robust monocular visual-inertial positioning method of claim 1, wherein the IMU data preprocessing step specifically comprises: Step 2.1, acquiring IMU measurement data and storing it in temporal sampling order to form an IMU measurement data set; and Step 2.2, synchronizing the IMU and monocular camera clocks based on the accurately calibrated sampling-time offset between the IMU and the monocular camera, and dividing the IMU measurement data set into a plurality of subsets, wherein the sampling time of each IMU measurement in a subset is later than the sampling time of the previous monocular camera image frame and no later than the sampling time of the current image frame.
  4. The fast robust monocular visual-inertial positioning method of claim 3, wherein the IMU pose initialization specifically comprises: Step 3.1, sequentially reading the subsets of the IMU measurement data set corresponding to the image frames in temporal sampling order; Step 3.2, setting a counter count = 0 and incrementing count by 1 each time one IMU measurement is processed; Step 3.3, setting angle and position change thresholds and an IMU measurement count threshold count_max; when the angle change or the position change exceeds its threshold, or the measurement count exceeds count_max, initializing the IMU pose, setting the image frame at which IMU pose initialization completes as the 0th image frame, and setting the 0th image frame as a key frame.
  5. The method of claim 1, wherein extracting image features from the monocular camera, removing abnormal matching points, and performing IMU pre-integration specifically comprises: Step 4.1, extracting ORB feature points on the monocular camera image frames, performing inter-frame feature tracking on the ORB feature points using a fast approximate nearest-neighbor method, and recording the matched points; Step 4.2, removing abnormal matching points using MAGSAC and adding the tracked ORB feature points to an image queue; Step 4.3, computing the IMU pre-integration increments between adjacent image frames; and Step 4.4, computing the IMU pose and velocity at the current image-frame time from the initial IMU pose and the IMU data, and updating the covariance terms using the unscented transform (UT).
  6. The fast robust monocular visual-inertial positioning method of claim 1, wherein monocular vision initialization specifically comprises: Step 5.1, taking the pixel position of any feature point extracted in step 4 and the corresponding pixel position of that feature point tracked in the current frame, computing the average feature parallax between the current frame and the 0th image frame; when the average feature parallax meets the threshold requirement and the number of image frames exceeds the sliding-window size, starting the monocular vision initialization step, setting an image-frame sliding window, and storing in it, in order, the 0th image frame, the image frame meeting the requirement, and the intermediate image frames; Step 5.2, selecting, from all monocular camera image frames acquired before initialization in the sliding window, the image frame with the largest parallax relative to the current frame as the reference frame, obtaining, by constructing an SfM problem, the rotation and translation of any k-th image frame in the sliding window in the 0th-frame image coordinate system together with the three-dimensional coordinates of any feature point, and constructing a residual function; Step 5.3, optimizing the residual function using the LM (Levenberg-Marquardt) optimization method; and Step 5.4, reconstructing the three-dimensional coordinates of the feature points based on the optimized true-scale image-frame poses and the feature pixel coordinates.
  7. The fast robust monocular visual-inertial positioning method of claim 1, further comprising, after judging according to the current frame and removing a frame from the sliding window to add the latest frame, steps of loop-closure detection and optimization, specifically comprising: Step 9.1, performing dictionary similarity computation between the current key frame and the historical key frames, and selecting the best candidate historical key frame; Step 9.2, performing RANSAC matching between the current key frame and the candidate historical key frame to remove outliers, and solving the relative pose by PnP; and Step 9.3, constructing a residual function between the matched frames, adding the current key frame to the pose graph, selecting a subset of key frames from the pose graph for pose-graph optimization so as to reduce computational complexity, and correcting the poses of the other key frames.
  8. A fast, robust monocular visual-inertial positioning system, characterized by comprising a parameter configuration module, a data preprocessing module, an initialization module, a pre-integration module, a monocular vision initialization module, and a pose optimization module, wherein: the parameter configuration module is used to configure parameters of the IMU and the monocular camera; the data preprocessing module is used to preprocess IMU data; the initialization module is used to initialize the IMU pose; the pre-integration module is used to extract image features from the monocular camera, remove abnormal matching points, and perform IMU pre-integration; the monocular vision initialization module is used for monocular vision initialization; and the pose optimization module is used to optimize the odometer pose, namely judging whether feature points are mismatched based on the IMU pre-integration result, constructing a feature-point reprojection error Jacobian matrix, and block-diagonalizing the reprojection error Jacobian matrix so as to optimize the inverse depth of the feature points and the image-frame pose separately, selecting key frames based on a key-frame identification rule after the current frame's pose has been updated, and, based on the current frame, removing a frame from the sliding window to reserve space for adding the latest frame. The step of optimizing the odometer pose specifically comprises:
     Step 6.1, within the sliding window, screening as candidate optimization feature points those preprocessed feature points whose feature-tracking count is greater than or equal to a preset count and whose three-dimensional coordinates have been initialized and reconstructed; judging whether each feature point is mismatched according to the IMU (inertial measurement unit) pre-integration result, and, if a feature point satisfies observability, judging it to be an optimizable feature point;
     Step 6.2, constructing the feature-point reprojection error Jacobian matrix, and computing the Jacobians of the image-frame pose error and of the feature-point global coordinate error respectively;
     Step 6.3, modifying the reprojection error Jacobian matrix, reducing the dimension of the matrix while retaining the feature inverse-depth error term, and iterating over the landmarks several times;
     Step 6.4, iteratively solving the error compensation value of the pose state using the updated, optimized feature-point global coordinate error values.
     Step 6.3 specifically includes: a transformation matrix converts the reprojection error Jacobian matrix into upper-triangular form, yielding the modified reprojection error Jacobian matrix, which comprises the modified feature inverse-depth error term Jacobian and the modified state-variable error term Jacobian; the reprojection residual term is correspondingly decomposed into residual matrix blocks and error matrix blocks. Then, singular value decomposition (SVD) is applied to the feature inverse-depth error term Jacobian within the modified reprojection error Jacobian matrix, factoring it into invertible matrices and a diagonal matrix and yielding the corrected feature-point inverse-depth error term. Based on the filtering update step, the inverse-depth state compensation value of each feature point is solved in turn, the three-dimensional coordinates of the feature points are compensated and updated, and the reprojection errors of the updated feature points are computed to eliminate outliers, using the number of elements of the diagonal matrix and the measurement error term of the feature-point inverse-depth error. The corrected feature-point inverse-depth errors are computed in sequence; the feature-point inverse-depth error is then solved via the inverse of the transformation matrix, the feature-point inverse-depth estimate is updated, and the optimal feature-point inverse-depth estimate is output.
     Step 6.4 specifically includes: using the feature-point inverse-depth error obtained in Step 6.3 to correct the residual function of the state-variable error, and updating the state-variable error via the filtering update step, proceeding from the state-variable estimate to the optimal state-variable estimate, thereby completing the pose optimization of the current frame.
  9. The fast robust monocular visual-inertial positioning system of claim 8, wherein the monocular camera is mounted at the front end of an unmanned aerial vehicle with its optical axis inclined 30° downward relative to the plane of the vehicle's main axis, and the IMU is disposed at the center of mass of the airframe, or at another position on the airframe for which the rotation and translation parameters are accurately calibrated.
  10. The fast robust monocular visual-inertial positioning system of claim 8, wherein the preprocessing of IMU data specifically includes: Step 2.1, acquiring IMU measurement data and storing it in temporal sampling order to form an IMU measurement data set; and Step 2.2, synchronizing the IMU and monocular camera clocks based on the accurately calibrated sampling-time offset between the IMU and the monocular camera, and dividing the IMU measurement data set into a plurality of subsets, wherein the sampling time of each IMU measurement in a subset is later than the sampling time of the previous monocular camera image frame and no later than the sampling time of the current image frame.
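The per-frame IMU grouping described in claims 3 and 10 (each subset holds IMU samples strictly after the previous image frame and up to and including the current one, after applying the calibrated IMU-camera time offset) can be sketched as follows. This is an illustrative sketch in Python; the function name and interface are hypothetical, not from the patent.

```python
from bisect import bisect_right

def partition_imu_by_frames(imu_stamps, frame_stamps, time_offset=0.0):
    """Partition time-sorted IMU sample timestamps into per-frame subsets.

    The subset for frame k holds every IMU sample whose offset-corrected
    timestamp t satisfies frame_stamps[k-1] < t <= frame_stamps[k],
    mirroring the claim's 'later than the previous frame, no later than
    the current frame' rule.
    """
    # Apply the accurately calibrated IMU-camera sampling-time offset.
    corrected = [t + time_offset for t in imu_stamps]
    subsets = []
    for prev_t, cur_t in zip(frame_stamps, frame_stamps[1:]):
        lo = bisect_right(corrected, prev_t)  # strictly after the previous frame
        hi = bisect_right(corrected, cur_t)   # up to and including the current frame
        subsets.append(corrected[lo:hi])
    return subsets
```

Using `bisect_right` on both boundaries realizes the half-open interval (prev, cur] in O(log n) per frame on the sorted timestamps.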
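The counter-and-threshold trigger of claim 4 (initialize the IMU pose once the angle change or position change exceeds its threshold, or once the measurement count exceeds count_max) can be sketched like this. The threshold values and function name are illustrative assumptions, not values from the patent.

```python
def should_init_imu(angle_deltas, pos_deltas, angle_thresh, pos_thresh, count_max):
    """Return the 1-based measurement count at which IMU pose
    initialization triggers, or None if it never does.

    Mirrors claim 4: the counter increments per processed IMU
    measurement; initialization fires when the angle change or the
    position change exceeds its threshold, or the count exceeds
    count_max.
    """
    count = 0
    for d_angle, d_pos in zip(angle_deltas, pos_deltas):
        count += 1  # one IMU measurement processed
        if d_angle > angle_thresh or d_pos > pos_thresh or count > count_max:
            return count
    return None
```

The count_max escape ensures initialization still completes on a nearly static platform, at the cost of initializing from weaker excitation.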
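Step 4.3 of claim 5 accumulates IMU pre-integration increments between adjacent image frames. A minimal sketch, assuming an Euler-style integration and omitting the bias terms, noise, and the unscented-transform covariance update that the claim also describes:

```python
import numpy as np

def preintegrate(gyros, accels, dt):
    """Accumulate rotation, velocity, and position increments between two
    image frames from raw IMU samples (illustrative sketch; biases,
    noise, and covariance propagation are omitted).

    gyros, accels: sequences of body-frame angular rate and specific
    force, one sample per IMU measurement. Returns the rotation matrix,
    velocity increment, and position increment expressed in the body
    frame at the start of the interval.
    """
    R = np.eye(3)
    dv = np.zeros(3)
    dp = np.zeros(3)
    for w, a in zip(gyros, accels):
        # Exponential-map rotation update from the gyro sample (Rodrigues).
        theta = np.asarray(w, dtype=float) * dt
        angle = np.linalg.norm(theta)
        if angle > 1e-12:
            axis = theta / angle
            K = np.array([[0.0, -axis[2], axis[1]],
                          [axis[2], 0.0, -axis[0]],
                          [-axis[1], axis[0], 0.0]])
            dR = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
        else:
            dR = np.eye(3)
        # Rotate the accelerometer sample into the start frame, then
        # integrate position (with the pre-update velocity) and velocity.
        acc = R @ np.asarray(a, dtype=float)
        dp += dv * dt + 0.5 * acc * dt ** 2
        dv += acc * dt
        R = R @ dR
    return R, dv, dp
```

Because the increments are expressed relative to the frame at the start of the interval, they can be reused when the starting pose estimate changes, which is the point of pre-integration.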
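The core of Step 6.3 in claims 1 and 8 is separating the reprojection residual into a landmark (inverse-depth) subproblem and a pose subproblem by triangularizing the stacked Jacobian. A hypothetical sketch of that separation using a QR factorization (the claim's transformation matrix is realized here as an orthonormal transform; the factorization choice and all names are assumptions for illustration):

```python
import numpy as np

def split_depth_and_pose(H_x, H_f, r):
    """Decouple the landmark and pose updates from a stacked
    reprojection linearization  r ≈ H_f * delta_f + H_x * dx.

    QR of the feature Jacobian H_f yields Q = [Q1 | Q2]: Q1 spans the
    feature directions (upper-triangular block, used for the
    inverse-depth update), Q2 the left null space of H_f (used for the
    pose update with the landmark eliminated).
    """
    Q, R = np.linalg.qr(H_f, mode='complete')
    k = H_f.shape[1]
    Q1, Q2 = Q[:, :k], Q[:, k:]
    # Pose subproblem: projecting onto Q2 annihilates H_f entirely.
    dx, *_ = np.linalg.lstsq(Q2.T @ H_x, Q2.T @ r, rcond=None)
    # Landmark subproblem: back-substitute the pose update, then solve
    # the triangular feature block for the inverse-depth correction.
    delta_f = np.linalg.solve(R[:k], Q1.T @ (r - H_x @ dx))
    return delta_f, dx
```

Because the two solves act on disjoint blocks, the per-landmark work stays small and independent, which is the complexity reduction the block diagonalization is after.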

Description

Quick robust monocular vision inertial positioning method and system

Technical Field

The invention relates to computer vision technology, and in particular to a monocular visual-inertial positioning method and system.

Background

With the development of robotics, robots are increasingly applied in complex environments such as industrial production, home services, medical assistance, and space exploration. A robot must move autonomously and execute tasks in unknown environments, which requires determining its position in real time and constructing an environment map. Visual SLAM (Simultaneous Localization and Mapping) meets exactly this key requirement, providing basic support for the robot's autonomous navigation, path planning, and the like. Continual advances in computer vision technology have made it possible to acquire environmental information via vision sensors and process it. Vision sensors offer low cost, rich information, and strong portability; they capture a wealth of environmental detail, including texture, color, and depth, providing a richer data basis for solving the SLAM problem. Meanwhile, continuous optimization of image processing, feature extraction, and matching algorithms provides technical support for the development of visual SLAM.
Early visual SLAM research focused mainly on the basic problems of monocular vision systems, such as camera localization, three-dimensional reconstruction, and feature-point matching; its theoretical foundations included filter-based SLAM and triangulation. Limited by the computing power of the time, however, such systems ran slowly, were sensitive to noise and occlusion, lacked robustness, and struggled with dynamic scenes and large-scale environments. Improvements in computer hardware, particularly the spread of GPUs, provided strong support for real-time visual SLAM processing. The PTAM (Parallel Tracking and Mapping) algorithm proposed in 2007 is one of the landmark achievements of this stage: it realized a parallel tracking-and-mapping framework for the first time and introduced the key-frame mechanism. The ORB-SLAM algorithm of 2015 further refined the visual SLAM system, dividing it into three threads for tracking, map creation, and loop-closure detection; it is highly accurate and runs in real time. In addition, global optimization techniques such as graph-based optimization gradually replaced filter methods, and the advent of RGB-D cameras made scene depth information convenient to obtain. Visual SLAM technology has gradually matured and is widely applied and deeply studied in industry and academia.
On the one hand, stronger feature extraction and optimization algorithms have markedly improved the accuracy and robustness of SLAM; on the other hand, multi-modal fusion has become a development focus, such as the deep combination of visual SLAM with an IMU (Inertial Measurement Unit) and fusion with other sensors such as LiDAR, further improving system performance and reliability. LiDAR and RGB-D sensors carry considerable weight, and stereo cameras may require a minimum camera baseline to operate, both of which limit their usability on small aerial robots and similar platforms. Monocular cameras, by contrast, are widely available and inexpensive, which further enhances their attractiveness for broad deployment on resource-constrained platforms; thanks to their light weight and small size, they are widely deployed on micro platforms whose payload capacity is limited. Existing monocular visual odometry schemes struggle to balance computational complexity against positioning accuracy: optimization-based schemes are robust but consume substantial computing resources, while filtering-based schemes reduce computational complexity by eliminating feature points to shrink matrix dimensions, at the cost of reduced positioning repeatability. Existing state-propagation schemes rely on linearization, and linearization errors grow in scenes with aggressive motion.

Disclosure of the Invention

The technical problem the invention aims to solve is to reduce the computational complexity of monocular visual-inertial positioning while maintaining positioning accuracy. The invention solves this technical problem by the following technical means: the fast, robust monocular visual-inertial positioning method comprises the following steps