
CN-121977525-A - Visual SLAM method based on multi-feature fusion for dynamic scene


Abstract

The application discloses a visual SLAM method based on multi-feature fusion in a dynamic scene, and relates to the technical field of image processing. The method comprises: obtaining an RGB image and a depth image to be processed, wherein the RGB image and the depth image are acquired synchronously; respectively extracting ORB point features and LSD line features from the RGB image; respectively eliminating dynamic ORB point features from the ORB point features and dynamic LSD line features from the LSD line features; and constructing a sparse 3D map based on the ORB point features and LSD line features of the RGB image and the depth image. By combining ORB point features and LSD line features in sparse 3D map construction, and removing the dynamic features (i.e., dynamic elements) among them, the performance of visual SLAM can be remarkably improved.

Inventors

  • Luo Jingwen
  • Liu Jiale

Assignees

  • Yunnan Normal University (云南师范大学)

Dates

Publication Date
2026-05-05
Application Date
2026-01-08

Claims (8)

  1. A visual SLAM method based on multi-feature fusion in a dynamic scene, comprising: acquiring an RGB image and a depth image to be processed, wherein the RGB image and the depth image are acquired synchronously; respectively extracting ORB point features and LSD line features from the RGB image; respectively eliminating dynamic ORB point features from the ORB point features and dynamic LSD line features from the LSD line features extracted from the RGB image; and constructing a sparse 3D map based on the ORB point features and the LSD line features of the RGB image and the depth image; wherein eliminating the dynamic ORB point features comprises: inputting the RGB image into a YOLOv target detection model to obtain a bounding box of a dynamic object; tracking the ORB point features within the bounding box across the RGB images of consecutive frames by an optical flow tracking method to obtain optical flow residuals of those ORB point features; calculating, by an epipolar constraint method, the distance from each ORB point feature within the bounding box to its epipolar line in the RGB images of consecutive frames; eliminating the ORB point features within the bounding box whose distance is greater than a preset distance threshold; determining the ORB point features within the bounding box whose optical flow residual is greater than a preset optical flow residual threshold as dynamic ORB point features; determining an edge region of the bounding box; taking the edge region as the bounding box and repeating the optical flow tracking step to obtain the optical flow residuals of the ORB point features therein, until the dynamic ORB point features of the edge region are determined; and rejecting the dynamic ORB point features; and wherein eliminating the dynamic LSD line features comprises: calculating a geometric error of each LSD line feature within the bounding box in the RGB images of consecutive frames by a geometric error function; determining the LSD line features within the bounding box whose geometric error is greater than a preset error threshold as dynamic LSD line features; determining an edge region of the bounding box; determining, for the LSD line features in the edge region, the optical flow residual at the line segment end points, the LBD descriptor error, the geometric relationship between the line segment and the bounding box, and the global geometric verification result of the line segment; determining the LSD line features for which the optical flow residual, the LBD descriptor error, the geometric relationship between the line segment and the bounding box, and the global geometric verification result all meet the corresponding preset conditions as dynamic LSD line features; and removing the dynamic LSD line features; wherein the geometric error function is defined over an LSD line feature pair successfully matched between the two frames, in terms of the length of each line segment, the midpoint of each line segment, the direction of each line segment, the transformation matrix, and the translation vector.
  2. The method of claim 1, wherein the ORB point features include corner points, main directions of the corner points, and descriptors of the corner points; and extracting ORB point features and LSD line features from the RGB image respectively comprises: constructing an image pyramid of the RGB image; performing corner detection on each layer of sub-RGB image in the image pyramid by a FAST algorithm to obtain the corner points of each layer; uniformly sampling the corner points of each layer by a quadtree algorithm so that the sampled corner points are uniformly distributed in space; calculating the main directions of the corner points of each layer by a gray centroid method; performing Gaussian blur on each layer of sub-RGB image to reduce its noise interference; and calculating the descriptors of the corner points of each layer by a BRIEF algorithm.
  3. The method of claim 1, wherein the LSD line features include the position, direction, and length of a line segment; and extracting ORB point features and LSD line features from the RGB image respectively comprises: performing gradient calculation on the RGB image to obtain the gradient magnitude and gradient direction of each pixel; calculating the level-line (horizontal line) angle of each pixel according to the gradient direction; constructing a unit vector field from the gradient magnitude and the level-line angle of each pixel; performing Gaussian blur on the unit vector field to reduce noise interference; grouping the unit vector field into connected domains; screening target connected domains according to the rectangularity of the connected domains, the target connected domains serving as candidate regions of line segments; taking the side length of the minimum circumscribed rectangle of a target connected domain as the length of the line segment; and fitting the minimum circumscribed rectangle of the target connected domain by a least squares method to obtain the direction and position of the line segment.
  4. The method of claim 1, wherein, before the respective eliminating of the dynamic ORB point features and the dynamic LSD line features extracted from the RGB image, the method further comprises removing invalid LSD line features extracted from the RGB image, comprising: sorting the LSD line features in descending order of line segment length to obtain a sorted LSD line feature set; taking the average length of all line segments in the set as a diameter and the midpoint of each line segment as a center to obtain a plurality of circular ranges; counting the number of ORB point features within each circular range; and eliminating the LSD line features whose midpoints correspond to circular ranges in which the number of ORB point features is greater than a preset point feature threshold.
  5. The method of claim 4, wherein removing invalid LSD line features extracted from the RGB image further comprises merging the LSD line features extracted from the RGB image, comprising: determining the line segment inclination angle between each pair of the LSD line features; screening out initial LSD line feature pairs whose inclination angle is greater than a preset angle threshold; determining the similarity of each initial LSD line feature pair; screening out target LSD line feature pairs whose similarity is greater than a preset similarity threshold; and merging the target LSD line feature pairs by a least squares method.
  6. The method of claim 1, wherein constructing a sparse 3D map based on the ORB point features and the LSD line features of the RGB image and the depth image comprises: matching the ORB point features and the LSD line features in each of the RGB images, respectively; determining a camera pose for acquiring the RGB image based on the matched ORB point features and LSD line features; screening key RGB images according to the camera pose and saving key frame information of each key RGB image, wherein the key frame information comprises at least the depth image acquired synchronously with the key RGB image, the ORB point features of the key RGB image, the LSD line features of the key RGB image, and the camera pose of the key RGB image; determining candidate loop key RGB images according to a bag-of-words (BoW) model and the key frame information of historical key RGB images; determining the similarity between the current key RGB image and a candidate loop key RGB image according to the LSD line features of the current key RGB image and the ORB point features of the candidate loop key RGB image; performing loop closure detection on the current key RGB image based on the similarity; after loop closure detection is completed, globally optimizing the key frame information of the current key RGB image by bundle adjustment (BA); and constructing a sparse 3D map from the optimized key frame information of the current key RGB image.
  7. The method of claim 6, wherein, after loop closure detection is completed, globally optimizing the key frame information of the current key RGB image by BA further comprises: constructing a colored 3D dense point cloud map from the optimized key frame information of the current key RGB image through the PCL point cloud library.
  8. The method of claim 6, wherein matching the ORB point features and the LSD line features in each of the RGB images respectively comprises: matching the ORB point features of each RGB image according to the descriptors in the ORB point features of each RGB image; calculating LBD descriptors from the LSD line features; and matching the LSD line features of each RGB image according to the LBD descriptors.
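Claim 1 lists the quantities entering the geometric error function (a matched LSD line feature pair across two frames, the segment lengths, midpoints, directions, the transformation matrix, and the translation vector) but no recoverable formula. Purely as an assumption, a weighted form consistent with those quantities, for a matched pair $(L_i, L_j)$ with lengths $l$, midpoints $\mathbf{m}$, unit directions $\mathbf{d}$, transformation (rotation) matrix $R$, and translation vector $\mathbf{t}$, might be sketched as:

```latex
E_{\mathrm{geo}}(L_i, L_j) =
    \lambda_l \,\bigl| l_i - l_j \bigr|
  + \lambda_m \,\bigl\| \mathbf{m}_i - \left( R\,\mathbf{m}_j + \mathbf{t} \right) \bigr\|
  + \lambda_d \,\Bigl( 1 - \bigl| \mathbf{d}_i^{\top} R\,\mathbf{d}_j \bigr| \Bigr)
```

Here the weights $\lambda_l, \lambda_m, \lambda_d$ are hypothetical; under such a form, a line pair whose error exceeds the preset error threshold of claim 1 would be flagged as a dynamic LSD line feature.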

Description

Visual SLAM method based on multi-feature fusion for dynamic scenes

Technical Field

The present application relates generally to the field of image processing technology. More particularly, it relates to a visual SLAM method based on multi-feature fusion for use in dynamic scenes.

Background

Visual SLAM (Visual Simultaneous Localization and Mapping, V-SLAM for short) is a technology that enables a robot to localize itself and build a map of an unknown environment at the same time. Its core is to estimate the motion trajectory of the robot in real time from the data of visual sensors such as cameras, and to generate an environment model, thereby providing key technical support for mobile robots to navigate autonomously in complex environments; it is widely applied in fields such as autonomous driving, unmanned aerial vehicles, and AR/VR. In recent years, with the development of artificial intelligence and robotics, visual SLAM has become a leading research hotspot. Conventional visual SLAM systems, such as ORB-SLAM, PTAM, and LSD-SLAM, are typically based on a static-environment assumption, i.e., objects in the observed scene are considered to remain stationary while the system operates. However, real-world environments often contain dynamic objects such as pedestrians, vehicles, and other moving objects, which can significantly degrade the performance of a visual SLAM system, resulting in positioning drift, map distortion, and even system failure. In view of the foregoing, there is a need for a visual SLAM method with multi-feature fusion in dynamic scenes, so as to improve the robustness of visual SLAM in such scenes.
Disclosure of Invention

To solve at least one of the technical problems mentioned above, the present application proposes a visual SLAM method based on multi-feature fusion in a dynamic scene, so as to improve the robustness of visual SLAM in dynamic scenes. The method comprises the following steps: obtaining an RGB image and a depth image to be processed, wherein the RGB image and the depth image are acquired synchronously; respectively extracting ORB point features and LSD line features from the RGB image; respectively eliminating dynamic ORB point features from the ORB point features and dynamic LSD line features from the LSD line features extracted from the RGB image; and constructing a sparse 3D map based on the ORB point features and the LSD line features of the RGB image and the depth image.
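The dynamic-point rejection idea described above (claim 1) combines an optical flow residual test with an epipolar-distance test inside the detected bounding box. A minimal sketch follows, assuming the optical flow residuals and epipolar line coefficients have already been computed; the function names and the thresholds `res_thr` and `dist_thr` are illustrative assumptions, not the patent's implementation:

```python
import math

def epipolar_distance(pt, line):
    """Point-to-epipolar-line distance: |a*x + b*y + c| / sqrt(a^2 + b^2),
    where (a, b, c) are the line coefficients of a*x + b*y + c = 0."""
    x, y = pt
    a, b, c = line
    return abs(a * x + b * y + c) / math.hypot(a, b)

def cull_dynamic_points(points, flow_residuals, epilines,
                        res_thr=2.0, dist_thr=1.5):
    """Keep only in-box points whose optical flow residual AND epipolar
    distance both fall below their preset thresholds; the rest are
    treated as dynamic ORB point features and rejected."""
    static = []
    for pt, res, line in zip(points, flow_residuals, epilines):
        if res > res_thr:
            continue  # inconsistent with the dominant motion -> dynamic
        if epipolar_distance(pt, line) > dist_thr:
            continue  # violates the epipolar constraint -> dynamic
        static.append(pt)
    return static
```

In a full system the residuals would come from pyramidal Lucas-Kanade tracking across consecutive frames and the epipolar lines from the estimated fundamental matrix; both are taken as given here.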
In some embodiments, the ORB point features comprise corner points, main directions of the corner points, and descriptors of the corner points, and extracting the ORB point features and the LSD line features from the RGB image respectively comprises: constructing an image pyramid of the RGB image; performing corner detection on each layer of sub-RGB image in the image pyramid by a FAST algorithm to obtain the corner points of each layer; uniformly sampling the corner points of each layer by a quadtree algorithm so that the sampled corner points are uniformly distributed in space; calculating the main directions of the corner points of each layer by a gray centroid method; performing Gaussian blur on each layer of sub-RGB image to reduce its noise interference; and calculating the descriptors of the corner points of each layer by a BRIEF algorithm.
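The gray centroid step above can be sketched as follows. `corner_orientation` is an illustrative helper (not from the patent) that computes a corner's main direction from the intensity centroid of its surrounding patch, as in ORB's oriented FAST: the angle of the vector from the patch center to the centroid of pixel intensities:

```python
import math

def corner_orientation(patch):
    """Main direction of a corner via the gray (intensity) centroid:
    theta = atan2(m01, m10), with first-order moments m10, m01 taken
    about the patch center. `patch` is a rectangular list of rows."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    m10 = m01 = 0.0
    for y, row in enumerate(patch):
        for x, v in enumerate(row):
            m10 += (x - cx) * v  # horizontal moment
            m01 += (y - cy) * v  # vertical moment
    return math.atan2(m01, m10)
```

For example, a patch whose bright mass lies to the right of center yields an angle near 0, and one whose bright mass lies below center yields an angle near pi/2 (image y pointing down).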
In some embodiments, the LSD line features comprise the positions, directions, and lengths of line segments, and extracting the ORB point features and the LSD line features from the RGB image respectively comprises: performing gradient calculation on the RGB image to obtain the gradient magnitude and gradient direction of each pixel; calculating the level-line (horizontal line) angle of each pixel according to the gradient direction; constructing a unit vector field from the gradient magnitude and the level-line angle of each pixel; performing Gaussian blur on the unit vector field to reduce noise interference; grouping the unit vector field into connected domains; screening target connected domains according to the rectangularity of the connected domains, the target connected domains serving as candidate regions of line segments; taking the side length of the minimum circumscribed rectangle of a target connected domain as the length of the line segment; and fitting the minimum circumscribed rectangle of the target connected domain by a least squares method to obtain the direction and position of the line segment.
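The gradient and level-line-angle step can be illustrated with a small helper. This assumes an LSD-style 2x2 difference mask and defines the level-line angle as the direction perpendicular to the gradient; the helper name and the exact mask are assumptions for illustration, not the patent's precise computation:

```python
import math

def gradient_and_level_line(img, x, y):
    """Gradient at pixel (x, y) from a 2x2 neighborhood, plus the
    level-line angle (perpendicular to the gradient direction).
    `img` is a list of rows of intensities; requires x+1, y+1 in range."""
    gx = (img[y][x + 1] + img[y + 1][x + 1]
          - img[y][x] - img[y + 1][x]) / 2.0
    gy = (img[y + 1][x] + img[y + 1][x + 1]
          - img[y][x] - img[y][x + 1]) / 2.0
    mag = math.hypot(gx, gy)          # gradient magnitude
    angle = math.atan2(gx, -gy)       # level-line angle
    return mag, angle
```

On a vertical step edge (dark left, bright right) the gradient points horizontally, so the level line runs vertically; pixels sharing similar level-line angles would then be grouped into the same connected domain as line-segment candidates.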