CN-121977524-A - Visual-laser-inertial coupling SLAM method oriented to underground dynamic environment

CN 121977524 A

Abstract

A visual-laser-inertial coupling SLAM method for an underground dynamic environment comprises real-time segmentation and tracking of dynamic objects based on image and point-cloud fusion, a dynamic-object detection algorithm based on false-match voxel occupancy, and multi-source fusion positioning and mapping. The invention integrates multi-source information from vision, lidar and an inertial measurement unit to achieve high-precision positioning and semantic map construction in complex underground roadway environments. Through multistage dynamic detection and tracking, degradation detection, tightly coupled fusion positioning and deep-learning semantic segmentation, the method maintains high robustness and high precision in underground environments subject to dynamic interference and structural degradation, solves the positioning drift and map distortion caused by dynamic-object interference, and provides technical support for intelligent perception, digital-twin construction and intelligent mine management of underground coal-mine space.

Inventors

  • ZHANG QIUZHAO
  • LU ZHIYING
  • WANG WENDA
  • GUO JIANYE
  • LIU WANLI
  • LI ZENGKE

Assignees

  • China University of Mining and Technology (中国矿业大学)

Dates

Publication Date
2026-05-05
Application Date
2025-12-22

Claims (5)

  1. A visual-laser-inertial coupling SLAM method for an underground dynamic environment, characterized by comprising the following steps:
S1, real-time segmentation and tracking of dynamic objects based on fusion of an image and a point cloud: a multi-target tracking algorithm based on a Kalman filter achieves stable and robust track tracking, and the detections of the current frame are associated with the predicted tracks through an improved Hungarian algorithm that fuses color-histogram features;
S2, a dynamic-object detection algorithm based on false-match voxel occupancy: false-match analysis is performed during pose estimation; after scan-to-map registration, the voxel occupancy within each point-cloud cluster's voxel box is computed, and voxel grids whose occupancy change exceeds a set threshold are removed, realizing high-precision pose estimation and static map construction and providing a purely static point-cloud map for downstream roadway-environment object recognition, segmentation and labeling;
S3, multi-source fusion positioning and mapping: a pose-estimation framework tightly coupling the lidar and the inertial measurement unit builds a high-precision three-dimensional map, fusing the dynamic point-cloud information with the image-and-point-cloud-improved dynamic-object detection; a degradation-detection module is introduced to cope with roadway structural degradation through resampling and additional constraints, realizing high-precision map construction.
  2. The visual-laser-inertial coupling SLAM method for an underground dynamic environment of claim 1, wherein the specific process of S1 is:
S1.1, identifying dynamic objects with a dense optical-flow method: based on the camera pose initially estimated at the front end, the homography matrix H between matching points of adjacent frames t and t+1 is robustly estimated with the RANSAC algorithm, and frame t+1 is warped accordingly to eliminate the background pixel displacement caused by camera motion; the coordinate mapping of the compensation, in homogeneous coordinates, is [x', y', 1]^T ~ H [x, y, 1]^T, where (x, y) and (x', y') are the pixel coordinates before and after compensation, the compensated pair being the image at acquisition time t and the homography-compensated image at time t+1. Both frames are then downsampled to a lower resolution; to preserve the detail and sharpness of the original image after downsampling, bilinear interpolation is used:
I_d(k) = (x2 − x)(y2 − y) I(x1, y1) + (x − x1)(y2 − y) I(x2, y1) + (x2 − x)(y − y1) I(x1, y2) + (x − x1)(y − y1) I(x2, y2),
where I_d(k) is the value of the kth pixel of the downsampled image, I(·,·) is the value of the corresponding pixel in the original image, (x, y) are the pixel's corresponding coordinates in the original image, and (x1, y1), (x2, y1), (x1, y2), (x2, y2) are the 4 nearest pixels, x1 and x2 being the column coordinates of the pixels to the left and right of x, and y1 and y2 the row coordinates of the pixels above and below y. Dense optical flow is computed between the downsampled image pair to obtain a low-resolution flow field, which is upsampled twice to restore the resolution of the original image. With the full-resolution flow field, each pixel's horizontal flow component f_h and vertical flow component f_v give the flow magnitude f = sqrt(f_h^2 + f_v^2), which represents the pixel's displacement between the two frames; if f > f_th, the displacement exceeds the set threshold and the pixel is marked as a dynamic pixel, otherwise as a static pixel.
S1.2, improved Mask R-CNN dynamic-target detection: the ResNet downsampling modules are adjusted, adding an average-pooling layer before the convolution in path B of ResNet-D.
S1.3, improved PAFPN feature pyramid: in PAFPN, several branches are first built from the bottom feature map using different downsampling rates and convolution kernel sizes, each branch generating a feature pyramid; the branch pyramids are then fused through a learnable path-aggregation mechanism, dynamically adjusting the aggregation path with a set of learnable weight matrices according to each branch's feature-map size and channel count; the fused feature pyramid is fed into the subsequent segmentation network.
S1.4, proposal selection with an RPN: let (x_a, y_a, w_a, h_a) denote the center coordinates, width and height of an Anchor Box, (t_x, t_y) the normalized offsets of the center coordinates, and (t_w, t_h) the logarithmic scale factors of width and height; the four predicted values give
x = x_a + w_a t_x,  y = y_a + h_a t_y,  w = w_a exp(t_w),  h = h_a exp(t_h),
where (x, y, w, h) are the center coordinates, width and height of the predicted box and w_a, h_a are the width and height of the Anchor Box.
S1.5, SORT-based dynamic-target tracking: on top of the Hungarian algorithm, color-histogram features are added; a vector representing the image features is obtained by counting the occurrence frequency of each color in the image, enabling target matching. The process is: the three HSV components are quantized into discrete levels over their respective value ranges and combined into a single color index i.
  3. The visual-laser-inertial coupling SLAM method of claim 2, wherein: let n_i be the number of pixels with color index i in the image and N the total number of pixels; the probability of color i occurring is p(i) = n_i / N, and the set {p(i)} is the color histogram. Applying the color histogram to target tracking, the similarity between two histograms p and q is measured with the Bhattacharyya distance: the Bhattacharyya coefficient rho(p, q) = sum_i sqrt(p(i) q(i)) serves as the similarity coefficient, and d = sqrt(1 − rho(p, q)) is the corresponding distance measure, giving the final similarity.
S1.6, data association between image and point cloud: to project the point cloud onto the image, the camera intrinsic and extrinsic parameters must be determined. The intrinsics comprise the focal length f_x in the x-axis direction, the focal length f_y in the y-axis direction, the principal point (c_x, c_y), and the distortion parameters; the intrinsic matrix is K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]. The extrinsics consist of a rotation matrix R and a translation vector t; a point P_l in the lidar frame is transformed into the camera coordinate system by P_c = R P_l + t, and the intrinsic matrix converts P_c to pixel coordinates in the image pixel coordinate system via Z [u, v, 1]^T = K P_c, where (u, v) are the pixel coordinates of the point and Z is the normalization coefficient; the complete projection of a lidar point is therefore Z [u, v, 1]^T = K (R P_l + t).
S1.7, information-fusion algorithm based on object centroids: the instance-segmentation information and dynamic-target information obtained from the image generate dynamic-target labels, which are then removed from the point-cloud map. The specific process is:
S1.7-1, point-cloud clustering yields n clusters, the ith cluster having center point c_i, the set being {c_1, c_2, ..., c_n}, i = 1, 2, ..., n;
S1.7-2, using the intrinsic and extrinsic matrices, the clustered point clouds are projected onto the image; a geometric weighting relation is built from the cluster center and the center point of the instance-segmentation detection box, and a projection-correlation weighting relation from the cluster's projected points inside the box, forming a judgment threshold: the matching score S_ij between the jth point-cloud cluster and the ith 2D detection box is a weighted sum (with weight coefficients) of a geometrically weighted distance term and a projected-point coverage term, where the distance term uses the Euclidean distance between the center point of the ith 2D detection box and the centroid of the jth cluster projected onto the image, normalized by the width and height of the ith detection box, and the coverage term is the fraction of the cluster's projected points falling inside the ith detection box out of the total point count of the jth cluster. The closer the projected cluster centroid is to the detection-box center, the larger S_ij and the stronger the distance correlation, and vice versa; whether the cluster matches the target class is judged by comparing S_ij against the threshold.
S1.7-3, in the point-cloud clusters that received instance-segmentation labels, each point is represented as (x, y, z, l, c, s), where (x, y, z) are the point's coordinates in the three-dimensional world, l is the point-cloud cluster label, c the category label, and s the confidence. During subsequent dynamic-object removal from the point-cloud map, the truly dynamic point-cloud data are extracted from the clusters by combining the dynamic-object instance-segmentation information obtained through target tracking, and are removed from the keyframes used for point-cloud map construction.
  4. The visual-laser-inertial coupling SLAM method for an underground dynamic environment of claim 1, wherein the specific process of S2 is:
S2.1, depth-map-based ground segmentation: the lidar provides raw measurements as range values per laser beam, which are converted into a depth map. Within each column, an angle is computed between every two adjacent rows, representing the slope between the scan points of two adjacent laser lines:
alpha = atan2(|Δz|, |Δx|), with Δz = R_{r−1,c} sin(ξ_{r−1}) − R_{r,c} sin(ξ_r) and Δx = R_{r−1,c} cos(ξ_{r−1}) − R_{r,c} cos(ξ_r),
where alpha is the tilt angle of the line connecting two adjacent scan points, Δz and Δx are the distance differences between the two points in the vertical and horizontal directions, R_{r,c} is the range value at row r and column c of the depth map, R_{r−1,c} that at row r−1, and ξ_{r−1}, ξ_r are the pitch angles of the laser beams in rows r−1 and r; r and c are the row and column coordinates on the depth image. The computed angles form a matrix; each column of the angle matrix is smoothed with Savitzky-Golay filtering, and a BFS algorithm then searches and labels ground points on the angle image: the differences of the angle values in the four-neighborhood (up, down, left, right) are compared, and if the difference does not exceed the threshold, the two adjacent elements are labeled together by the BFS algorithm.
S2.2, depth-map-based clustering segmentation: again using the depth map, for measurement points on any two laser lines an angle beta is computed. Let O be the lidar center and A, B the measurement points of two emitted laser lines; with H the foot of the perpendicular from B onto OA, ||BH|| = d_2 sin(alpha) and ||AH|| = d_1 − d_2 cos(alpha), where d_1 is the range of OA, d_2 the range of OB, and alpha the angle between the two beams, so that
beta = atan2(d_2 sin(alpha), d_1 − d_2 cos(alpha)),
and whether points A and B come from the same object surface is decided from the value of beta. For most targets beta stays relatively large; when the difference of depth values of adjacent lidar lines on the depth map is far greater than the lidar's angular resolution, beta takes a relatively small value. An angle threshold theta is set: if beta is less than theta, the depth difference between A and B is large and they are not considered to belong to the same target; otherwise they are considered to come from the same target. After clustering and segmenting the non-ground point cloud, each cluster target is represented by a bounding rectangle, with different colors denoting different categories. The minimum bounding rectangle of a cluster is computed as follows: for any cluster target, first compute the mean and covariance of the point set, mu = (1/n) Σ p_i and Sigma = (1/n) Σ (p_i − mu)(p_i − mu)^T, where p_i is a point in the cluster point cloud and mu is the point-cloud centroid; eigenvalue decomposition of the covariance matrix, Sigma v_k = lambda_k v_k, yields the eigenvalues and the three eigenvectors v_1, v_2, v_3 of the covariance matrix. The cluster points are transformed into the eigenvector frame, and the position, dimensions and orientation of the bounding rectangle are computed: the box center is derived from the point-cloud mean center mu, the local coordinate-system directions of the box are the eigenvectors, and d is the coordinate of each point on the newly constructed basis.
S2.3, fine dynamic point-cloud rejection based on sliding-window nested point-cloud processing: the sliding window holds N keyframes with pose set {T_1, ..., T_N}; the newly inserted frame has pose T_new. From the depth-clustering result of S2.2, the point-cloud cluster set of the new frame is obtained, and each cluster centroid is projected into every historical keyframe in the window: p_k^i = T_i^{−1} T_new c_k, where p_k^i is the centroid c_k of the kth cluster projected into the coordinate system of the ith historical keyframe, T_new transforms the new keyframe into the world coordinate system, and T_i is the pose transformation matrix of the ith keyframe. The position consistency of each cluster within the window is the standard deviation sigma_k of the projected centroid coordinates around their mean over all historical keyframes. Dynamics are additionally judged from the cluster's volume change rate, the ratio of the cluster's volume in the current keyframe to its average volume over the window. For each cluster, the reprojection error of its points in each keyframe is computed: e_k^i = ||pi(K, T_i, P_j) − p_obs||, where e_k^i is the reprojection error of the kth cluster in the ith keyframe, K is the camera intrinsic matrix, pi(·) the projection function, P_j the world coordinates of the jth point in the kth cluster, and p_obs the observed pixel coordinates of the point in the image; the average reprojection error ē_k of the kth cluster over the window is then computed, with G the total number of clusters. Cluster motion consistency and reprojection error are considered comprehensively: a dynamic score D_k of the kth cluster is formed as a weighted combination (all weights being weight coefficients) of the position-consistency standard deviation, the volume change rate, and the average reprojection error divided by a normalization denominator; a cluster is judged dynamic when D_k satisfies the dynamic-cluster decision condition. The point clouds of clusters judged dynamic are removed; the point clouds in static clusters undergo further point-level consistency verification: for each point, the reprojection-error statistics over the sliding window are computed, namely the reprojection error in the kth keyframe, the average reprojection error over all keyframes of the window, and the standard deviation of the reprojection errors over all keyframes of the window; adaptive thresholds on the average reprojection error and on its standard deviation then determine whether the point is a dynamic point.
  5. The visual-laser-inertial coupling SLAM method of claim 1, wherein the specific process of S3 is:
S3.1, multi-source fusion positioning and mapping based on degradation detection: the degradation-detection module uses the prior pose and de-distorted point cloud provided by the IMU to identify degradation directions and computes additional constraints to achieve robust positioning and mapping in the complex coal-mine roadway environment. The lidar measurement residual function r(x) is expressed over the system state x to be optimized, where p_j is the coordinate of the jth laser point in the lidar coordinate system, T is the pose transformation matrix to be optimized from the IMU coordinate system to the global world coordinate system, and the fixed extrinsic matrix transforms from the lidar to the IMU; using a surfel measurement model, the point-to-plane residual between each matched map point and the frame is used as the constraint function, and the pose is computed by iterative optimization. Degradation detection is performed by fusing the IMU pose prior, the distortion-corrected point cloud and the incremental global map; the Hessian matrix of the cost-function constraint is rewritten in terms of u_i, the coordinate of the ith laser point in the world coordinate system, N, the number of points participating in the computation, n_i, the unit normal vector of the matching plane of the ith laser point, and an auxiliary vector combining the IMU prior pose; its blocks record the constraint information of the rotational and translational degrees of freedom respectively, plus a coupling block between rotation and translation. The SVD algorithm is applied to this matrix, decomposing it as H = V Λ V^T, where V is the orthogonal matrix composed of eigenvectors, whose column vectors span the translation space and the rotation space respectively, and Λ is the diagonal matrix whose diagonal elements are the corresponding eigenvalues.
S3.2, observability detection: by analyzing the magnitudes of the eigenvalues, the system's observability, i.e. positioning capability, in each degree of freedom is classified; the positioning capability along each eigenvector direction is one of three states: none means the direction is completely unlocalizable, partial means the direction is partially localizable, and full means the direction is fully localizable. The positioning capabilities of the system over the 6 degrees of freedom (3 translations and 3 rotations) are summarized by a tuple (o_t, o_r), where o_t denotes the observability of the translational degrees of freedom and o_r that of the rotational degrees of freedom; from this summary of the observability of the system in all degrees of freedom, the system is judged to be nonlocalizable, partially-localizable or localizable.
S3.3, for each degradation direction v a corresponding linear constraint is constructed. When v belongs to the set V_t of translational degradation directions: v^T Δt = v^T Δt_prior, where Δt is the translation increment to be optimized and Δt_prior is the translation prior. When v belongs to the set V_r of rotational degradation directions: v^T Δθ = v^T Δθ_prior, where Δθ is the rotation increment to be optimized and Δθ_prior is the rotation prior. All constraints are represented uniformly in matrix form as C Δx = d, and a projection method performs the constrained optimization update: Δx* = Δx̄ + G(d − C Δx̄), where Δx̄ is the unconstrained update, P is the covariance matrix in the unconstrained state, C and d are the constraint matrix and constraint-value vector respectively, and G is the projection gain matrix.

Description

Visual-laser-inertial coupling SLAM method oriented to underground dynamic environment

Technical Field

The invention relates to a visual-laser-inertial coupling SLAM method for an underground dynamic environment, and belongs to the technical field of intelligent mine monitoring.

Background

The intelligent and digital transformation of coal mines is an important direction for energy security and the high-quality development of mines in China. High-precision three-dimensional modeling and real-time positioning of underground space have become important foundations supporting mine automation, intelligent decision-making and safe production. The underground coal-mine tunnel environment is typically characterized by long, narrow spaces, insufficient illumination, sparse geometric features and severe dust interference. Traditional approaches relying on manual measurement and modeling are inefficient and of limited precision, and can hardly meet the requirements of real-time sensing and dynamic monitoring. In recent years, multi-source fusion simultaneous localization and mapping (Simultaneous Localization and Mapping, SLAM) technology has gradually been applied in the mining field; it can acquire high-density three-dimensional point-cloud data of complex underground roadway environments in real time and with high efficiency, making it an important technical means for constructing high-precision three-dimensional maps. However, in the complex underground coal-mine environment, the positioning accuracy and mapping quality of SLAM algorithms are significantly affected by structural degradation and dynamic interference factors (such as moving personnel, vehicles and equipment).
The presence of dynamic objects destroys the consistency of feature matching, causing pose-estimation errors to accumulate and reducing overall mapping precision and system robustness. Therefore, how to maintain the stability and high precision of a SLAM system in an underground dynamic-interference environment has become one of the key technical bottlenecks for high-precision positioning and semantic mapping in current underground roadway scenes. At present, SLAM research on underground roadways mainly focuses on multi-sensor data fusion, improving positioning accuracy through joint modeling of vision, lidar and inertial measurement units (IMUs). However, in actual coal-mine production, real-time detection, segmentation and tracking of dynamic objects remain under-studied. How to effectively identify and reject the "ghosting" and pseudo-features caused by dynamic targets, and thereby realize dynamic updating of a high-precision three-dimensional map, is a problem that urgently needs to be solved in both academia and engineering practice. Most existing dynamic-scene SLAM algorithms are still built on the static-environment assumption, and their detection and elimination of dynamic targets are limited. In the laser-SLAM field, mainstream methods include dynamic-removal techniques based on voxel filtering, visibility analysis and point-cloud segmentation, but these methods are computationally complex, poor in real-time performance and insufficiently adapted to the underground environment. Realizing three-dimensional point-cloud detection, segmentation and tracking in dynamic scenes is a key direction for improving map precision and system robustness.
In summary, the existing three-dimensional mapping and positioning methods for underground coal-mine roadways mainly suffer from the following problems in dynamic-interference environments: (1) dynamic-object interference is not sufficiently identified and eliminated; (2) the positioning accuracy of multi-source sensor-fusion algorithms decreases in structurally degraded scenes; and (3) the accuracy and real-time performance of point-cloud semantic recognition are insufficient, making high-precision semantic map construction difficult to realize.

Disclosure of Invention

The invention aims to provide a visual-laser-inertial coupling SLAM method for an underground dynamic environment, which can realize multi-source fusion robust positioning and high-precision semantic mapping under underground dynamic-interference conditions, and provides technical support for intelligent perception, digital-twin construction and intelligent mine management of underground coal-mine space. In order to achieve the above object, the present invention provides a visual-laser-inertial coupling SLAM method for an underground dynamic environment, comprising the following steps: S1, carrying out real-time segmentation and tracking of dynamic objects based on fusion of an image and a point cloud, adopting a multi-target tracking algorithm based