CN-121999160-A - Visual SLAM key frame selection and map updating method, system and readable storage medium


Abstract

The invention relates to the field of computer vision and autonomous robot navigation, and in particular discloses a visual SLAM key frame selection and map updating method, system and readable storage medium. The method evaluates semantic segmentation quality in real time with a lightweight confidence prediction network running in parallel; adaptively selects semantic key frames by combining geometric and semantic confidence conditions; performs full semantic segmentation on the key frames only; maintains a layered semantic map comprising a high-confidence static layer and a low-confidence observation layer, with migration between the layers managed dynamically from confidence histories; and, in back-end optimization, assigns each map point a weight positively correlated with its confidence. The invention substantially reduces computational cost while maintaining semantic SLAM accuracy, improves system robustness to segmentation noise and dynamic environments, and is suitable for real-time operation on mobile devices.

Inventors

SHEN YURUI
CHEN SIPENG
LIU XINHUA
HAO JINGBIN
Hua Dezheng
YAN JUN

Assignees

宿迁云擎智能科技有限公司 (Suqian Yunqing Intelligent Technology Co., Ltd.)

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-27

Claims (10)

1. A visual SLAM key frame selection and map updating method, characterized by comprising the following steps: S1, for an input current visual frame, generating a global semantic confidence score and a pixel-level semantic confidence heat map of the frame using a lightweight confidence prediction network running in parallel; S2, jointly evaluating the geometric change condition and the semantic change condition of the current frame, the semantic change condition depending at least partially on the confidence information generated in step S1, to determine whether the current frame is selected as a semantic key frame; S3, if the current frame is selected as a semantic key frame, obtaining a dense semantic segmentation result using a main semantic segmentation network, and associating and fusing the segmentation result with the corresponding three-dimensional map points according to the confidence information of the frame; S4, maintaining a layered semantic map comprising at least a high-confidence static layer and a low-confidence observation layer, wherein the high-confidence static layer mainly stores three-dimensional map points that originate from high-confidence semantic key frames, have stable semantic labels and positions, and are used for subsequent SLAM pose optimization; and S5, when performing SLAM back-end optimization, assigning to each reprojection error constraint constructed from a map point of the high-confidence static layer an optimization weight positively correlated with the confidence of that map point.
2. The visual SLAM key frame selection and map updating method of claim 1, wherein step S1 launches two parallel processing threads: the lightweight confidence prediction network and geometric feature extraction.
3. The visual SLAM key frame selection and map updating method of claim 1, wherein the lightweight confidence prediction network of step S1 is trained by using, as the training target, the intersection-over-union (IoU) computed between the segmentation output of the main semantic segmentation network on a training dataset and the ground-truth labels, so that the lightweight network learns to predict the segmentation quality of the main segmentation network on any input image.
4. The visual SLAM key frame selection and map updating method of claim 1, wherein step S2 further comprises a dynamic-score filtering step: calculating the proportion of pixels in the current frame whose pixel-level semantic confidence heat map value is below a second threshold, and, if the proportion exceeds a dynamic region threshold, suppressing selection of the current frame as a semantic key frame.
5. The visual SLAM key frame selection and map updating method of claim 4, wherein the key frame selection condition in step S2 is determined jointly by geometric conditions and semantic conditions, with the semantic conditions dominant, according to the following decision logic: geometric pre-screening: checking whether basic geometric conditions are met, and if not, directly skipping deep semantic processing of the current frame; semantic quality and change assessment: if geometric pre-screening passes, further evaluating the semantic conditions, namely a quality condition, under which, if the global confidence score exceeds a quality threshold, the current frame is considered to have very high segmentation quality and to be a potential key frame, and a change condition, under which the cosine distance between the semantic scene descriptor of the current frame and the descriptor of the last semantic key frame is calculated, and if this distance exceeds a change threshold, the semantic content of the scene is considered to have changed significantly and a new key frame is needed to record it; dynamic interference suppression: calculating the proportion of pixels in the pixel-level confidence heat map whose value is below a low-confidence threshold, and if this proportion exceeds a dynamic region threshold, the current frame is considered too heavily disturbed and is suppressed from becoming a key frame; and final decision: the current frame is selected as a semantic key frame if and only if it passes geometric pre-screening, satisfies both the quality condition and the change condition, and is not suppressed by dynamic interference, which ensures that key frames have both high information value and high quality.
6. The visual SLAM key frame selection and map updating method of claim 5, wherein the semantic scene descriptor of step S2 is extracted from intermediate-layer features of the lightweight confidence prediction network.
7. The visual SLAM key frame selection and map updating method of claim 1, wherein data migration between the high-confidence static layer and the low-confidence observation layer in step S4 comprises: if a semantic observation in the low-confidence observation layer is repeatedly observed over a plurality of consecutive frames and its average confidence rises above a first threshold, migrating the associated three-dimensional points and their semantic information to the high-confidence static layer; and if a map point in the high-confidence static layer is not observed in a plurality of consecutive key frames, decaying its confidence over time, and when the decayed confidence falls below a second threshold, migrating the point to the low-confidence observation layer or marking it for deletion.
8. A visual SLAM key frame selection and map updating system, configured to perform the method of any one of claims 1-7, comprising: an image input module; a confidence prediction module for running the lightweight confidence prediction network; a key frame decision module for performing step S2; a semantic segmentation module for running the main semantic segmentation network; a hierarchical map management module for implementing the map maintenance and updating functions of step S4; and a SLAM optimization engine module for performing the joint pose and map optimization of step S5.
9. The visual SLAM key frame selection and map updating system of claim 8, wherein the system is deployed on an embedded computing platform of a robot, an augmented reality device or an autonomous vehicle, and the lightweight confidence prediction network and the main semantic segmentation network share part of the low-level feature extraction layers to further reduce computation.
10. A computer readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the visual SLAM key frame selection and map updating method described above.
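Claim 3 trains the confidence network to regress the main network's segmentation quality. A minimal sketch of how such per-image regression targets could be produced follows. It assumes the "intersection ratio" is the mean intersection-over-union over classes, and `segment` is a hypothetical callable wrapping the main segmentation network; the claim does not specify the class averaging or how classes absent from an image are handled.

```python
import numpy as np

def mean_iou(pred, gt, num_classes, ignore_index=255):
    """Mean IoU over the classes present in either the prediction or the label."""
    valid = gt != ignore_index                    # mask out unlabeled pixels
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        g = (gt == c) & valid
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue                              # class absent from both maps
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

def build_confidence_targets(images, labels, segment, num_classes):
    """Pair each training image with the main network's IoU on that image."""
    return [(img, mean_iou(segment(img), gt, num_classes))
            for img, gt in zip(images, labels)]
```

The resulting `(image, target)` pairs would then serve as a supervised regression dataset for the lightweight network.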
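The key frame decision of claims 4 to 6 can be sketched as follows. The claim text gives neither threshold values nor an explicit combining formula, so this sketch assumes that the quality condition is `global_score > tau_q`, that the final decision is the conjunction of all conditions (as the closing sentence of claim 5 suggests), and that the scene descriptor is an intermediate feature map reduced by global average pooling and L2 normalization. Every threshold default is a hypothetical placeholder.

```python
import numpy as np

def scene_descriptor(feature_map, eps=1e-12):
    """Global-average-pool a C x H x W intermediate feature map, then L2-normalize."""
    v = feature_map.reshape(feature_map.shape[0], -1).mean(axis=1)
    return v / (np.linalg.norm(v) + eps)

def cosine_distance(a, b):
    """1 minus the cosine similarity of two descriptor vectors."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def low_confidence_ratio(heatmap, tau_low):
    """Fraction of pixels whose confidence is below tau_low (claim 4)."""
    return float(np.mean(heatmap < tau_low))

def is_semantic_keyframe(geometry_ok, global_score, descriptor, last_kf_descriptor,
                         heatmap, tau_q=0.8, tau_c=0.2, tau_low=0.3, tau_dyn=0.25):
    """Claim-5-style decision with hypothetical thresholds."""
    if not geometry_ok:
        return False                               # geometric pre-screening failed
    quality = global_score > tau_q                 # quality condition
    changed = (last_kf_descriptor is None          # no previous key frame yet
               or cosine_distance(descriptor, last_kf_descriptor) > tau_c)
    dynamic = low_confidence_ratio(heatmap, tau_low) > tau_dyn  # suppression
    return quality and changed and not dynamic
```

Checking geometry first mirrors the claim's intent: frames that fail the cheap geometric test never reach the semantic checks.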
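The migration rules of claim 7 can be sketched as a small per-point state machine. All parameters (history window, hit and miss counts, decay factor and thresholds) are hypothetical. Two choices also go beyond the claim text and are assumptions: decay continues after a point is demoted, and deletion uses a separate, lower threshold `tau_delete` rather than the same second threshold.

```python
from dataclasses import dataclass, field

STATIC, OBSERVATION, TO_DELETE = "static", "observation", "to_delete"

@dataclass
class MapPoint:
    layer: str = OBSERVATION                      # new points start unverified
    confidence: float = 0.0
    history: list = field(default_factory=list)  # recent per-frame confidences
    consecutive_hits: int = 0
    missed_keyframes: int = 0

def on_observed(p, conf, min_hits=3, tau_promote=0.7, window=5):
    """Record an observation; promote once repeated and confident enough."""
    p.history = (p.history + [conf])[-window:]
    p.consecutive_hits += 1
    p.missed_keyframes = 0
    p.confidence = sum(p.history) / len(p.history)   # average confidence
    if (p.layer == OBSERVATION and p.consecutive_hits >= min_hits
            and p.confidence > tau_promote):
        p.layer = STATIC                             # first threshold exceeded

def on_missed(p, decay=0.8, min_misses=3, tau_demote=0.4, tau_delete=0.1):
    """Record a key frame in which the point was not observed."""
    p.consecutive_hits = 0
    p.missed_keyframes += 1
    if p.layer == TO_DELETE or p.missed_keyframes < min_misses:
        return
    p.confidence *= decay                            # time decay
    if p.confidence < tau_delete:
        p.layer = TO_DELETE
    elif p.layer == STATIC and p.confidence < tau_demote:
        p.layer = OBSERVATION                        # second threshold crossed
```

With these defaults, a point promoted at an average confidence of 0.85 is demoted after six consecutive missed key frames and marked for deletion after twelve.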

Description

Visual SLAM key frame selection and map updating method, system and readable storage medium

Technical Field

The invention relates to the field of computer vision and autonomous robot navigation, in particular to simultaneous localization and mapping (SLAM), and more particularly to a visual SLAM key frame selection and map updating method, system and readable storage medium.

Background

With the development of technologies such as robotics and autonomous driving, localization and mapping have become core technologies that enable devices to perceive and navigate autonomously. Traditional visual SLAM systems rely mainly on geometric features (such as points, lines and planes) for pose estimation and map construction, but suffer from robustness problems in dynamic scenes, weakly textured environments, or long-term operation. In recent years, semantic SLAM has attached semantic labels to map points by introducing semantic segmentation information, significantly improving the scene understanding and adaptability of SLAM systems. However, existing semantic SLAM technology generally has the following problems:
1. High computational cost: high-quality semantic segmentation models (such as those based on deep convolutional neural networks or Transformers) have high computational complexity and are difficult to run in real time on mobile terminals or embedded devices. Existing schemes generally perform full semantic segmentation on every frame, which results in a low system frame rate.
2. Semantic noise contamination: the segmentation quality of semantic segmentation models varies markedly between scenes, and under illumination changes, occlusion, motion blur and similar conditions the segmentation results may contain many errors. Existing semantic SLAM systems often use all segmentation results indiscriminately, so erroneous semantic information contaminates the map and in turn degrades localization accuracy.
3. Inefficient map maintenance: existing semantic SLAM systems typically bind semantic information to the geometric map in a simple way and lack continuous tracking and dynamic management of semantic confidence. During long-term operation, when the semantics of the environment change (for example, objects are moved, added or removed), the system cannot adaptively update the semantic map, so the map becomes outdated and localization fails.

Disclosure of Invention

To solve the problems described in the background, the invention provides a visual SLAM key frame selection and map updating method, system and readable storage medium that significantly reduce computational complexity and improve system robustness to segmentation noise and dynamic environments while maintaining semantic SLAM accuracy.
To achieve the above objective, the invention provides the following technical solution: a visual SLAM key frame selection and map updating method comprising the following steps: S1, for an input current visual frame, generating a global semantic confidence score and a pixel-level semantic confidence heat map of the frame using a lightweight confidence prediction network running in parallel; S2, jointly evaluating the geometric change condition and the semantic change condition of the current frame, the semantic change condition depending at least partially on the confidence information generated in step S1, to determine whether the current frame is selected as a semantic key frame; S3, if the current frame is selected as a semantic key frame, obtaining a dense semantic segmentation result using a main semantic segmentation network, and associating and fusing the segmentation result with the corresponding three-dimensional map points according to the confidence information of the frame; S4, maintaining a layered semantic map comprising at least a high-confidence static layer and a low-confidence observation layer, wherein the high-confidence static layer mainly stores three-dimensional map points that originate from high-confidence semantic key frames, have stable semantic labels and positions, and are used for subsequent SLAM pose optimization; and S5, when performing SLAM back-end optimization, assigning to each reprojection error constraint constructed from a map point of the high-confidence static layer an optimization weight positively correlated with the confidence of that map point. Further, step S1 launches two parallel processing threads: the lightweight confidence prediction network and geometric feature extraction.
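Step S1 runs confidence prediction and geometric feature extraction side by side. The sketch below illustrates only the scheduling pattern, using Python's `concurrent.futures.ThreadPoolExecutor`. `predict_confidence` and `extract_geometric_features` are hypothetical stand-ins (a constant heat map and a gradient-threshold detector), not the network or feature extractor used in the invention.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def predict_confidence(frame):
    """Stand-in for the lightweight confidence network (hypothetical)."""
    heatmap = np.full(frame.shape[:2], 0.8, dtype=np.float32)
    return float(heatmap.mean()), heatmap        # (global score, pixel heat map)

def extract_geometric_features(frame):
    """Stand-in detector: (x, y) of pixels with a strong intensity gradient."""
    gy, gx = np.gradient(frame.astype(np.float32))
    ys, xs = np.nonzero(np.hypot(gx, gy) > 10.0)
    return np.stack([xs, ys], axis=1)

def process_frame(frame, pool):
    """Submit both S1 branches at once and wait for both results."""
    conf_future = pool.submit(predict_confidence, frame)
    geom_future = pool.submit(extract_geometric_features, frame)
    return conf_future.result(), geom_future.result()
```

In CPython the global interpreter lock limits true parallelism for pure-Python work, so an embedded deployment would more plausibly run the two branches as native threads or on separate accelerator streams.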
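Step S3 fuses the key frame's dense segmentation into the three-dimensional map points according to the frame's confidence information. One simple way to do this, assumed here rather than taken from the invention, is to accumulate per-class evidence at each map point, weighting every label vote by the pixel's heat-map confidence multiplied by the frame's global score. `SemanticPoint` and `fuse_keyframe` are hypothetical names, and each point's pixel position in the key frame is taken as given.

```python
import numpy as np

class SemanticPoint:
    """Confidence-weighted label evidence for one 3-D map point (hypothetical)."""

    def __init__(self, num_classes):
        self.scores = np.zeros(num_classes)

    def fuse(self, label, pixel_conf, frame_score):
        # Each vote is weighted by pixel-level and global confidence (assumption).
        self.scores[label] += pixel_conf * frame_score

    def label(self):
        return int(np.argmax(self.scores))

    def label_confidence(self):
        """Share of the accumulated evidence held by the winning label."""
        total = self.scores.sum()
        return float(self.scores.max() / total) if total > 0 else 0.0

def fuse_keyframe(points, pixel_coords, seg, heatmap, frame_score):
    """points[i] projects to pixel_coords[i] = (u, v) in the semantic key frame."""
    for point, (u, v) in zip(points, pixel_coords):
        point.fuse(int(seg[v, u]), float(heatmap[v, u]), frame_score)
```

A value such as `label_confidence()` could then feed the layer-migration logic of step S4.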
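Step S5 only requires the optimization weight to be positively correlated with map point confidence. The sketch below evaluates a confidence-weighted sum of squared pinhole reprojection errors for a single camera pose. It uses a hypothetical linear weight `w_min + (1 - w_min) * c`; the invention does not state the actual weighting function or the optimizer.

```python
import numpy as np

def project(point_w, R, t, K):
    """Pinhole projection of a 3-D world point to pixel coordinates."""
    p_c = R @ point_w + t                 # world frame -> camera frame
    uvw = K @ (p_c / p_c[2])              # normalize by depth, apply intrinsics
    return uvw[:2]

def confidence_weight(conf, w_min=0.1):
    """Hypothetical weight that increases monotonically with confidence in [0, 1]."""
    return w_min + (1.0 - w_min) * float(np.clip(conf, 0.0, 1.0))

def weighted_reprojection_cost(points_w, observations, confidences, R, t, K):
    """Sum over static-layer points of w(c) * ||observed - projected||^2."""
    cost = 0.0
    for X, uv_obs, conf in zip(points_w, observations, confidences):
        r = np.asarray(uv_obs, dtype=float) - project(np.asarray(X, dtype=float), R, t, K)
        cost += confidence_weight(conf) * float(r @ r)
    return cost
```

In a graph optimizer such as g2o or Ceres, the same weight would typically scale each edge's information matrix, which is equivalent to multiplying the residual by the square root of the weight.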
Further, the lightweight confidence prediction network of step S1 is trained as follows: the intersection-over-union (IoU) between the segmentation output of the main semantic segmentation network on the training dataset and the ground-truth label obt