CN-121577019-B - Indoor SLAM map construction method and system

CN121577019B

Abstract

The invention provides an indoor SLAM map construction method and system, and relates to the technical field of image processing. The construction method establishes a unified global spatial reference through cross-device spatio-temporal calibration, uses global static features extracted by an environment camera to supplement information in the robot's field-of-view blind areas in real time, accurately removes dynamic interference features by combining continuous-frame analysis, and finally uses the global features as anchor-point constraints to continuously correct the robot pose in back-end optimization. This complete technical chain gives the constructed indoor three-dimensional map higher integrity, precision, and consistency. It markedly improves the robot's positioning robustness and navigation reliability in dynamic environments with dense shelves, reduces dependence on the performance of the robot's onboard sensors and the cost of scene changes, and provides a stable, efficient environment-perception foundation for automated operation in industries such as warehousing and archive management.

Inventors

  • Lv Yizhu
  • Li Li
  • Lu Zhiqiang
  • Lv Nanxin

Assignees

  • 浙江北泰智能科技股份有限公司

Dates

Publication Date
2026-05-08
Application Date
2026-01-26

Claims (10)

  1. An indoor SLAM map construction method, characterized by comprising the following steps: observing a static calibration target through a vision sensor carried by a robot and an environment camera installed indoors, and establishing a coordinate conversion relation between the robot local coordinate system of the vision sensor and the global coordinate system of the environment camera; extracting static features based on the global image acquired by the environment camera, and constructing and updating a global landmark database from the static features in the global coordinate system; extracting local features from a local image and corresponding depth information acquired by the vision sensor, forming a local feature point cloud in the robot local coordinate system based on the local features and the corresponding depth information, converting the static features in the global landmark database to the robot local coordinate system through the coordinate conversion relation, and matching and fusing them with the local feature point cloud to form a fused SLAM point cloud feature set; identifying a dynamic target in the global image and generating a dynamic region mask, converting the mask to the global coordinate system and then, through the coordinate conversion relation, to the robot local coordinate system, and eliminating feature points that fall inside the mask from the fused SLAM point cloud feature set to obtain an updated SLAM point cloud feature set; and constructing a pose graph based on the updated SLAM point cloud feature set, adding the static features in the global landmark database as global anchor points to the pose graph for constrained optimization to obtain an optimized robot pose, and generating the indoor three-dimensional map based on the optimized robot pose, the depth information acquired by the vision sensor, and the global landmark database.
  2. The indoor SLAM map construction method of claim 1, wherein establishing the coordinate conversion relation between the robot local coordinate system of the vision sensor and the global coordinate system of the environment camera comprises: determining a plurality of static calibration targets, and acquiring vision sensor images and environment camera images corresponding to the static calibration targets through the vision sensor and the environment camera respectively; extracting feature point pixel coordinates corresponding to the static calibration targets from the vision sensor images and the environment camera images respectively; obtaining an initial rotation matrix and translation vector through a Perspective-n-Point algorithm according to the feature point pixel coordinates, the camera parameter data, and the actual physical size of the static calibration targets; and optimizing the rotation matrix and translation vector through a preset optimization algorithm to obtain the coordinate conversion relation.
  3. The indoor SLAM map construction method of claim 1, wherein extracting static features based on the global image acquired by the environment camera comprises: processing the global image with a preset semantic segmentation network to obtain a semantic segmentation result, wherein the semantic segmentation result comprises static target class results; and extracting the static features corresponding to the static targets according to the static target class results; wherein the preset semantic segmentation network is constructed based on an improved YOLOv s network, the static features comprise the position and outline information of the static targets in the global image, and the static target classes comprise shelves, upright columns, and channel marking lines.
  4. The indoor SLAM map construction method of claim 3, wherein the construction process of the preset semantic segmentation network comprises: acquiring an original YOLOv s network and improving it to obtain the improved YOLOv s network; wherein the original YOLOv s network comprises a backbone network, a Neck network, and an output network, and the improvement process comprises: replacing the standard Bottleneck residual modules in the backbone network with enhanced residual modules integrating channel and spatial attention mechanisms, so as to strengthen feature extraction for the scene's regular structures; adding a cross-scale attention module in the Neck network; and modifying the output network into an integrated detection-segmentation head.
  5. The indoor SLAM map construction method of claim 3, wherein the semantic segmentation result further comprises dynamic target class results, and wherein identifying a dynamic target and generating a dynamic region mask based on the global image acquired by the environment camera comprises: performing an inter-frame difference operation on the global image to obtain dynamic candidate regions, and comparing the dynamic candidate regions with the semantic segmentation result; if a dynamic candidate region overlaps with a region of the dynamic target class results, judging that the candidate region corresponds to a dynamic target; generating two-dimensional pixel region coordinates of the dynamic target based on the corresponding dynamic candidate region; and converting the two-dimensional pixel region coordinates into a three-dimensional spatial region under the global coordinate system based on the coordinate conversion relation to generate the dynamic region mask.
  6. The indoor SLAM map construction method of claim 1, wherein constructing the pose graph based on the updated SLAM point cloud feature set comprises: determining the attitude angle change data and instantaneous velocity of the robot from the angular velocity and acceleration data acquired by the robot's inertial measurement unit, and calculating the robot's initial motion trajectory from the driving mileage data based on the attitude angle change data and instantaneous velocity; determining key frames based on the initial motion trajectory and the local images, and taking the trajectory pose at the moment corresponding to each key frame as the initial pose of the corresponding key frame node, wherein key frames correspond one-to-one to key frame nodes; taking the static features in the global landmark database as fixed-position nodes to form fixed nodes; based on the updated SLAM point cloud feature set, establishing an observation constraint edge between a key frame node and a fixed node when the key frame observes the static feature corresponding to that fixed node; establishing motion constraint edges between adjacent key frame nodes based on the initial motion trajectory; and obtaining the pose graph from the key frame nodes, fixed nodes, observation constraint edges, and motion constraint edges.
  7. The indoor SLAM map construction method of claim 6, wherein adding the static features in the global landmark database as global anchor points to the pose graph for constrained optimization to obtain the optimized robot pose comprises: solving the pose graph through a nonlinear optimization algorithm to obtain the optimized robot pose; wherein the objective function of the nonlinear optimization algorithm is a weighted sum of a pose prediction error term and a global anchor point constraint error term.
  8. The indoor SLAM map construction method of claim 1, wherein generating the indoor three-dimensional map based on the optimized robot pose, the depth information collected by the vision sensor, and the global landmark database comprises: generating a three-dimensional point cloud map of the robot's directly observed area based on the optimized robot pose and the depth information; invoking the static feature data of the corresponding regions in the global landmark database, supplementing structural information of regions not directly observed by the robot through a three-dimensional reconstruction algorithm, and generating a corresponding three-dimensional geometric model; and registering and fusing the three-dimensional point cloud map and the three-dimensional geometric model under the global coordinate system to form the indoor three-dimensional map.
  9. An indoor SLAM map construction system, comprising: an acquisition unit, configured to observe a static calibration target through a vision sensor carried by a robot and an environment camera installed indoors, and establish a coordinate conversion relation between the robot local coordinate system of the vision sensor and the global coordinate system of the environment camera; a processing unit, configured to extract static features based on the global image acquired by the environment camera, and construct and update a global landmark database from the static features in the global coordinate system; the processing unit being further configured to extract local features from the local images and corresponding depth information acquired by the vision sensor, form a local feature point cloud in the robot local coordinate system based on the local features and the corresponding depth information, convert the static features in the global landmark database to the robot local coordinate system through the coordinate conversion relation, and match and fuse them with the local feature point cloud to form a fused SLAM point cloud feature set; the processing unit being further configured to identify a dynamic target and generate a dynamic region mask based on the global image, convert the mask to the global coordinate system and then, through the coordinate conversion relation, to the robot local coordinate system, and eliminate feature points that fall inside the mask from the fused SLAM point cloud feature set to obtain an updated SLAM point cloud feature set; and a construction unit, configured to construct a pose graph based on the updated SLAM point cloud feature set, add the static features in the global landmark database as global anchor points to the pose graph for constrained optimization to obtain an optimized robot pose, and generate the indoor three-dimensional map based on the optimized robot pose, the depth information acquired by the vision sensor, and the global landmark database.
  10. An indoor SLAM map construction apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the indoor SLAM map construction method according to any one of claims 1 to 8 when executing the computer program.
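The "coordinate conversion relation" of claims 1 and 2 is, in effect, a rigid-body transform between the robot-local frame and the global frame of the environment camera. A minimal numpy sketch of applying such a transform (the rotation, translation, and landmark coordinates below are invented for illustration; the patent itself obtains them via PnP calibration plus refinement):

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def apply_transform(T, points):
    """Apply T to an (N, 3) array of points via homogeneous coordinates."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])
    return (T @ homo.T).T[:, :3]

# Illustrative extrinsics: robot frame rotated 90 degrees about z, offset in x.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([2.0, 0.0, 0.0])

T_local_from_global = make_transform(R, t)
T_global_from_local = np.linalg.inv(T_local_from_global)

# Convert global landmark coordinates into the robot-local frame
# (claim 1's "converting the static features ... to the robot local coordinate system").
landmarks_global = np.array([[1.0, 0.0, 0.5],
                             [0.0, 3.0, 1.0]])
landmarks_local = apply_transform(T_local_from_global, landmarks_global)

# The inverse transform takes them back to the global frame.
recovered = apply_transform(T_global_from_local, landmarks_local)
```

In a real system `T_local_from_global` would come from claim 2's PnP step (e.g. OpenCV's `solvePnP` on the calibration-target correspondences) followed by the "preset optimization algorithm" refinement.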
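Claim 5's dynamic-target logic is: inter-frame differencing proposes candidate regions, and a candidate counts as dynamic only if it overlaps a dynamic-class segment from the semantic segmentation; feature points inside the resulting mask are then eliminated. A toy sketch on synthetic 8x8 "frames" (the threshold, frame contents, and feature coordinates are all illustrative):

```python
import numpy as np

def dynamic_mask(prev_frame, curr_frame, semantic_dynamic, diff_thresh=20):
    """Inter-frame difference gated by the semantic dynamic-class mask."""
    candidate = np.abs(curr_frame.astype(int) - prev_frame.astype(int)) > diff_thresh
    # Keep only candidate pixels that overlap a dynamic-class segment
    # (claim 5's overlap test between candidate regions and segmentation).
    return candidate & semantic_dynamic

def remove_masked_points(points_uv, mask):
    """Drop feature points, given as (u, v) pixel coordinates, that fall inside the mask."""
    return [(u, v) for (u, v) in points_uv if not mask[v, u]]

prev = np.zeros((8, 8), dtype=np.uint8)
curr = prev.copy()
curr[2:4, 2:4] = 200          # a moving person changes these pixels
curr[6, 6] = 200              # isolated flicker outside any dynamic segment

semantic = np.zeros((8, 8), dtype=bool)
semantic[1:5, 1:5] = True     # segmentation labels this region as a dynamic class

mask = dynamic_mask(prev, curr, semantic)
features = [(2, 2), (6, 6), (0, 0)]
kept = remove_masked_points(features, mask)   # (2, 2) is eliminated
```

Note how the semantic gate suppresses the flicker at (6, 6): it differs between frames but is not a dynamic-class region, so its feature point survives, which matches the claim's two-stage test.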
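Claims 6 and 7 describe a pose graph whose objective is a weighted sum of motion (odometry) error terms and global-anchor observation terms. For 2D keyframe positions the residuals are linear, so a plain least-squares sketch shows the anchoring effect (all measurements, weights, and the simulated drift are invented for illustration; a real system would optimize full SE(3) poses with a nonlinear solver):

```python
import numpy as np

# Three keyframe positions x0, x1, x2 in 2D, stacked as a 6-vector.
d01 = np.array([1.0, 0.0])         # odometry: x1 - x0
d12 = np.array([1.0, 0.1])         # odometry: x2 - x1, with drift in y
anchor_obs = np.array([2.0, 0.0])  # a fixed global anchor says x2 is here

w_odom, w_anchor = 1.0, 10.0       # claim 7's weighted sum of error terms

A_rows, b_rows = [], []

def add_edge(i, j, meas, w):
    """Add residual w*((x_j - x_i) - meas); j = -1 makes a unary edge w*(x_i - meas)."""
    row = np.zeros((2, 6))
    if j >= 0:
        row[:, 2 * j:2 * j + 2] = np.eye(2)
        row[:, 2 * i:2 * i + 2] = -np.eye(2)
    else:
        row[:, 2 * i:2 * i + 2] = np.eye(2)
    A_rows.append(w * row)
    b_rows.append(w * meas)

add_edge(0, -1, np.zeros(2), 100.0)    # pin x0 at the origin (gauge fix)
add_edge(0, 1, d01, w_odom)            # motion constraint edges (claim 6)
add_edge(1, 2, d12, w_odom)
add_edge(2, -1, anchor_obs, w_anchor)  # observation edge to a fixed anchor node

A = np.vstack(A_rows)
b = np.hstack(b_rows)
x, *_ = np.linalg.lstsq(A, b, rcond=None)
x2 = x[4:6]  # anchor-corrected position of keyframe 2
```

Without the anchor edge, the drifting odometry would place keyframe 2 at (2.0, 0.1); the heavily weighted anchor pulls it back toward (2.0, 0.0), which is exactly the role claim 7 assigns to the global anchor constraint error term.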
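Claim 8's last step registers the robot's point cloud map with the reconstructed geometric model in the global frame. The patent does not name an algorithm; one standard choice, assuming known point correspondences, is the SVD-based Kabsch solution for the best rigid transform, sketched here on synthetic data:

```python
import numpy as np

def rigid_register(src, dst):
    """Kabsch: find R, t minimizing sum ||R @ src_i + t - dst_i||^2 over correspondences."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance of centered clouds
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so R is a proper rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Synthetic "robot map" points and their counterparts in the global model,
# related by a known rotation about z and a translation.
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 0.2])
dst = src @ R_true.T + t_true

R_est, t_est = rigid_register(src, dst)
```

With noisy real data and unknown correspondences, this closed-form step typically serves as the inner alignment of an ICP-style loop; the fused result is then merged into the indoor three-dimensional map.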

Description

Indoor SLAM map construction method and system

Technical Field

The invention relates to the technical field of image processing, and in particular to an indoor SLAM map construction method and system.

Background

Existing indoor SLAM technology mainly relies on local sensors carried by a mobile robot to realize positioning and mapping. Typical schemes include: lidar-based methods, which acquire high-precision geometric point clouds through scanning but are costly and struggle to provide semantic information; vision-sensor-based methods, which recover the three-dimensional structure and texture of a scene through feature matching but are sensitive to illumination changes and weak-texture environments; and collaborative schemes fusing multi-source data such as vision and inertial measurements to improve system robustness and precision. However, in indoor scenes such as warehouses and archives, where shelves are dense, channels are narrow, and personnel and equipment move frequently, schemes relying on the robot platform's own perception face common bottlenecks. First, the sensor field of view is limited, and local observation blind areas easily form where goods are stacked or shelves cause occlusion, leading to map loss or geometric distortion. Second, although research has focused on dynamic-scene processing, in environments such as warehouses, where dynamic objects resemble the static background in appearance and motion occurs intermittently, accurately identifying and eliminating dynamic features remains a major challenge and easily causes pose estimation drift. More importantly, existing methods essentially rely on the robot's own motion trajectory for recursive map splicing and optimization, and lack a fixed, global, continuous observation reference to anchor the map coordinate system, so accumulated errors cannot be fundamentally restrained and overall map consistency degrades after long-duration, large-scale operation. These drawbacks together limit the practical application of existing SLAM techniques in automated warehouse operations requiring high precision and reliability.

Disclosure of Invention

The present invention solves one or more of the above problems of the related art by providing an indoor SLAM map construction method and system. In a first aspect, the invention provides an indoor SLAM map construction method, including: observing a static calibration target through a vision sensor carried by a robot and an environment camera installed indoors, and establishing a coordinate conversion relation between the robot local coordinate system of the vision sensor and the global coordinate system of the environment camera; extracting static features based on the global image acquired by the environment camera, and constructing and updating a global landmark database from the static features in the global coordinate system; extracting local features from a local image and corresponding depth information acquired by the vision sensor, forming a local feature point cloud in the robot local coordinate system based on the local features and the corresponding depth information, converting the static features in the global landmark database to the robot local coordinate system through the coordinate conversion relation, and matching and fusing them with the local feature point cloud to form a fused SLAM point cloud feature set; identifying a dynamic target in the global image and generating a dynamic region mask, converting the mask to the global coordinate system and then, through the coordinate conversion relation, to the robot local coordinate system, and eliminating feature points that fall inside the mask from the fused SLAM point cloud feature set to obtain an updated SLAM point cloud feature set; and constructing a pose graph based on the updated SLAM point cloud feature set, adding the static features in the global landmark database as global anchor points to the pose graph for constrained optimization to obtain an optimized robot pose, and generating the indoor three-dimensional map based on the optimized robot pose, the depth information acquired by the vision sensor, and the global landmark database. Optionally, establishing the coordinate conversion relation between the robot local coordinate system of the vision sensor and the global coordinate system of the environment camera includes: determining a plurality of static calibration targets, and acquiring visual sensor images and environment camera images corresponding to the static calibration