CN-121982697-A - VSLAM feature point optimization method and system based on semantic stability


Abstract

The invention belongs to the technical field of computer vision and robotics, and relates to a VSLAM feature point optimization method and system based on semantic stability. The method comprises the following steps: generating image frames to be processed that carry initial pose information by processing time-sequential scene image data; generating an initial-identity three-dimensional landmark set containing structural landmarks and transient landmarks; establishing and updating a long-term stability assessment file; judging whether the transient landmarks qualify according to preset identity promotion threshold conditions and generating landmark identity migration instructions; generating an identity-updated three-dimensional landmark set; generating a landmark identity association optimization weight set; and outputting an optimized camera pose and an optimized global map. The invention addresses the problem that indiscriminate processing of landmarks deforms the three-dimensional map and causes the camera trajectory to deviate from the real path, which degrades the long-term stability and positioning accuracy of the system.

Inventors

HUANG CHENGUANG
LI BING
CHEN KAI
ZHANG KANG
HONG YUANQIAN
GUI ZHENGRONG
ZHU JIAJUN
WANG WENYUAN
TAN YIDAO

Assignees

China Construction Fourth Engineering Division Corp., Ltd. (中国建筑第四工程局有限公司)

Dates

Publication Date: 2026-05-05
Application Date: 2025-12-01

Claims (10)

1. A VSLAM feature point optimization method based on semantic stability, characterized by comprising the following steps: S1, acquiring time-sequential scene image data and processing it to generate image frames to be processed that carry initial pose information; S2, identifying semantic information and three-dimensional feature points in the image frames to be processed, and initializing the identity of the three-dimensional feature points according to the semantic information, so as to generate an initial-identity three-dimensional landmark set containing structural landmarks and transient landmarks; S3, calculating and recording the position deviation of the transient landmarks in the initial-identity three-dimensional landmark set across relocalization events in different periods, so as to establish and update a long-term stability assessment file; S4, based on the long-term stability assessment file, judging whether each transient landmark qualifies according to preset identity promotion threshold conditions, so as to generate landmark identity migration instructions; S5, executing the landmark identity migration instructions to change landmark identities in the initial-identity three-dimensional landmark set, so as to generate an identity-updated three-dimensional landmark set; S6, according to the current identity of each landmark in the identity-updated three-dimensional landmark set, applying a differentiated optimization weight distribution strategy to assign an optimization weight to each landmark, so as to generate a landmark identity association optimization weight set; and S7, applying the landmark identity association optimization weight set to perform bundle adjustment optimization on the camera motion trajectory and the three-dimensional map structure, so as to output an optimized camera pose and an optimized global map.
2. The semantic stability-based VSLAM feature point optimization method of claim 1, wherein generating the image frames to be processed carrying initial pose information comprises the steps of: starting a vision sensor and continuously capturing images of the environment to form a timestamped image sequence; preliminarily estimating the camera motion of each frame relative to the previous frame by analyzing feature point matches between adjacent images, and attaching an initial camera pose estimate to each frame; and packaging the image frames together with their initial pose estimates to generate the image frames to be processed carrying initial pose information.
3. The semantic stability-based VSLAM feature point optimization method of claim 1, wherein generating the initial-identity three-dimensional landmark set containing structural landmarks and transient landmarks comprises the steps of: running a semantic segmentation model on the image frame to be processed carrying initial pose information, so as to identify, for each pixel in the image, semantic information indicating whether it belongs to a fixed object type or a potentially movable object type; extracting visual features from the image frame and calculating their three-dimensional space coordinates to form three-dimensional feature points; according to the semantic information of the pixel corresponding to each three-dimensional feature point, initializing the feature point with a structural landmark identity if it comes from a fixed object type, and with a transient landmark identity if it comes from a potentially movable object type; and integrating all three-dimensional feature points assigned a structural landmark identity or a transient landmark identity to generate the initial-identity three-dimensional landmark set.
4. The semantic stability-based VSLAM feature point optimization method of claim 1, wherein establishing and updating the long-term stability assessment file comprises the steps of: creating an independent long-term stability assessment file for each transient landmark in the initial-identity three-dimensional landmark set; when the device successfully triggers a relocalization event in an already-mapped region, checking whether a currently observed transient landmark corresponds to a historical record; and if so, calculating the position deviation between the currently observed global coordinates and the historically recorded global coordinates, and appending the position deviation, the current observation timestamp, and the unique identifier of the relocalization event to that landmark's dedicated long-term stability assessment file.
5. The semantic stability-based VSLAM feature point optimization method of claim 1, wherein the preset identity promotion threshold conditions include a threshold on the number of independent relocalization events and a position deviation tolerance threshold; the long-term stability assessment file of each transient landmark is traversed, the number of unique identifiers of independent relocalization events is counted, and the average of all recorded position deviations is calculated; and when the number of independent relocalization events meets the count threshold and the average meets the position deviation tolerance threshold, an "execute promotion" instruction is generated for the transient landmark and added to the landmark identity migration instructions.
6. The semantic stability-based VSLAM feature point optimization method of claim 1, wherein generating the landmark identity migration instructions comprises the steps of: monitoring the position deviation of existing structural landmarks in new relocalization events; and when the position deviation of a structural landmark persistently exceeds a preset stability condition, generating an "execute demotion" instruction for the structural landmark and adding it to the landmark identity migration instructions.
7. The semantic stability-based VSLAM feature point optimization method of claim 1, wherein generating the identity-updated three-dimensional landmark set comprises the steps of: receiving the landmark identity migration instructions and locking onto the target landmark points they contain; for each transient landmark that receives an "execute promotion" instruction, changing its identity attribute from "transient landmark" to "structural landmark"; for each structural landmark that receives an "execute demotion" instruction, changing its identity attribute from "structural landmark" to "transient landmark"; and recombining all landmark points once the identity attribute modifications are complete to form the identity-updated three-dimensional landmark set.
8. The semantic stability-based VSLAM feature point optimization method of claim 1, wherein generating the landmark identity association optimization weight set comprises the steps of: traversing the identity-updated three-dimensional landmark set and reading the current identity of each landmark point; assigning the highest optimization weight to all landmark points whose identity is "structural landmark", indicating that they have the highest credibility and influence in subsequent optimization calculations; assigning a lower optimization weight to all landmark points whose identity is "transient landmark", indicating that they provide auxiliary, temporary positioning information while their influence on global optimization is suppressed; and binding each landmark point to its corresponding optimization weight to generate a landmark identity association optimization weight set containing all landmarks and their weight mappings.
9. The semantic stability-based VSLAM feature point optimization method of claim 1, wherein outputting the optimized camera pose and the optimized global map comprises the steps of: constructing an objective function that minimizes the reprojection error, defined as the distance between the point obtained by projecting a three-dimensional landmark point back onto the image plane according to the camera pose and the actually observed point; in the objective function, multiplying the reprojection error term of each three-dimensional landmark point by its corresponding optimization weight from the landmark identity association optimization weight set; and solving the weighted objective function to jointly optimize all historical camera poses and the spatial positions of all three-dimensional landmark points, so as to output an optimized camera pose and an optimized global map.
10. A VSLAM feature point optimization system based on semantic stability, characterized by comprising the following modules: a data acquisition and preprocessing module, used for acquiring time-sequential scene image data and processing it to generate image frames to be processed that carry initial pose information; a semantic information and feature recognition module, used for identifying semantic information and three-dimensional feature points in the image frames to be processed, and initializing the identity of the three-dimensional feature points according to the semantic information, so as to generate an initial-identity three-dimensional landmark set containing structural landmarks and transient landmarks; a transient landmark stability assessment module, used for calculating and recording the position deviation of the transient landmarks in the initial-identity three-dimensional landmark set across relocalization events in different periods, so as to establish and update a long-term stability assessment file; a landmark identity promotion judgment module, used for judging, based on the long-term stability assessment file, whether each transient landmark qualifies according to preset identity promotion threshold conditions, so as to generate landmark identity migration instructions; a landmark identity migration execution module, used for executing the landmark identity migration instructions to change landmark identities in the initial-identity three-dimensional landmark set, so as to generate an identity-updated three-dimensional landmark set; an optimization weight distribution module, used for applying a differentiated optimization weight distribution strategy to assign an optimization weight to each landmark according to its current identity in the identity-updated three-dimensional landmark set, so as to generate a landmark identity association optimization weight set; and a weighted bundle adjustment optimization module, used for applying the landmark identity association optimization weight set to perform bundle adjustment optimization on the camera motion trajectory and the three-dimensional map structure, so as to output an optimized camera pose and an optimized global map.
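The identity initialization, stability file, and promotion and demotion judgments recited in claims 3 to 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class and function names, the label set `FIXED_CLASSES`, and the threshold values are all hypothetical, and two interpretations are assumed that the claims leave open (structural landmarks reuse the same file format for monitoring, and "persistently exceeds" means the most recent `demote_after` recorded deviations all exceed the tolerance).

```python
from dataclasses import dataclass, field
from enum import Enum
from math import dist
from statistics import mean


class Identity(Enum):
    STRUCTURAL = "structural"
    TRANSIENT = "transient"


# Assumed label set; the claims only distinguish fixed from potentially movable types.
FIXED_CLASSES = {"wall", "floor", "column"}


@dataclass
class Landmark:
    landmark_id: int
    position: tuple  # global (x, y, z) stored in the map
    identity: Identity
    # Long-term stability file: one (event_id, timestamp, deviation) row per re-observation.
    stability_file: list = field(default_factory=list)


def init_identity(semantic_class):
    """Claim 3: fixed object types start as structural, everything else as transient."""
    return Identity.STRUCTURAL if semantic_class in FIXED_CLASSES else Identity.TRANSIENT


def record_relocalization(lm, event_id, timestamp, observed_position):
    """Claim 4: append deviation, timestamp and event identifier to the landmark's file."""
    deviation = dist(lm.position, observed_position)
    lm.stability_file.append((event_id, timestamp, deviation))
    return deviation


def migration_instructions(landmarks, min_events=3, max_deviation=0.05, demote_after=2):
    """Claims 5 and 6: emit (landmark_id, action) pairs; all thresholds are assumed values."""
    instructions = []
    for lm in landmarks:
        if not lm.stability_file:
            continue
        deviations = [d for _, _, d in lm.stability_file]
        if lm.identity is Identity.TRANSIENT:
            n_events = len({event_id for event_id, _, _ in lm.stability_file})
            if n_events >= min_events and mean(deviations) <= max_deviation:
                instructions.append((lm.landmark_id, "promote"))
        else:
            recent = deviations[-demote_after:]
            if len(recent) == demote_after and all(d > max_deviation for d in recent):
                instructions.append((lm.landmark_id, "demote"))
    return instructions


# A chair re-observed at nearly the same place in three independent events.
chair = Landmark(1, (1.0, 0.0, 0.0), init_identity("chair"))
for event_id, t in enumerate([10.0, 500.0, 900.0]):
    record_relocalization(chair, event_id, t, (1.0, 0.01, 0.0))

# A column-labelled point that is found half a metre away in two events.
pillar = Landmark(2, (0.0, 0.0, 2.0), init_identity("column"))
for event_id, t in enumerate([10.0, 500.0]):
    record_relocalization(pillar, event_id, t, (0.0, 0.5, 2.0))

instructions = migration_instructions([chair, pillar])
print(instructions)
```

With these assumed thresholds the chair qualifies for promotion and the column-labelled point is flagged for demotion. Counting distinct event identifiers rather than raw rows keeps repeated observations within one relocalization event from inflating the evidence.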
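The identity migration, weight assignment, and weighted reprojection objective recited in claims 7 to 9 can be illustrated in the same hedged way. Everything named here is an assumption for illustration: the weight values in `WEIGHTS` (the claims only require structural landmarks to receive the highest weight), the pinhole camera with a hypothetical focal length and principal point, and the use of the squared pixel distance as the error term, which is the usual least-squares choice but is not specified in the claims. The sketch only evaluates the cost; it does not solve the optimization.

```python
from math import hypot

# Assumed values; the claims only require structural > transient.
WEIGHTS = {"structural": 1.0, "transient": 0.2}


def apply_migration(identities, instructions):
    """Claim 7: return a new identity map with promotions and demotions applied."""
    updated = dict(identities)
    for landmark_id, action in instructions:
        if action == "promote" and updated[landmark_id] == "transient":
            updated[landmark_id] = "structural"
        elif action == "demote" and updated[landmark_id] == "structural":
            updated[landmark_id] = "transient"
    return updated


def assign_weights(identities):
    """Claim 8: map every landmark id to the weight of its current identity."""
    return {lid: WEIGHTS[identity] for lid, identity in identities.items()}


def project(pose, point, focal=500.0, center=(320.0, 240.0)):
    """Pinhole projection; pose = (R, t) maps world coordinates into the camera frame."""
    R, t = pose
    x, y, z = (sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3))
    return (focal * x / z + center[0], focal * y / z + center[1])


def weighted_cost(poses, points, observations, weights):
    """Claim 9: sum of per-landmark weights times squared reprojection distance."""
    total = 0.0
    for frame_id, landmark_id, (u, v) in observations:
        pu, pv = project(poses[frame_id], points[landmark_id])
        total += weights[landmark_id] * hypot(pu - u, pv - v) ** 2
    return total


identities = {1: "transient", 2: "structural"}
updated = apply_migration(identities, [(1, "promote"), (2, "demote")])
weights = assign_weights(updated)

identity_pose = (((1, 0, 0), (0, 1, 0), (0, 0, 1)), (0.0, 0.0, 0.0))
poses = {0: identity_pose}
points = {1: (0.0, 0.0, 2.0), 2: (1.0, 0.0, 2.0)}
# Landmark 1 is observed exactly where predicted; landmark 2 is off by (3, 4) pixels.
observations = [(0, 1, (320.0, 240.0)), (0, 2, (573.0, 244.0))]
cost = weighted_cost(poses, points, observations, weights)
print(cost)
```

In this example the demoted landmark is 5 pixels off, so its squared error of 25 contributes 0.2 × 25 = 5 to the cost instead of 25. A real back end would pass the same per-landmark weights to a nonlinear least-squares solver rather than evaluate the cost once.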

Description

VSLAM feature point optimization method and system based on semantic stability

Technical Field

The invention belongs to the technical field of computer vision and robotics, and relates to a VSLAM feature point optimization method and system based on semantic stability.

Background

In visual simultaneous localization and mapping (VSLAM) applications, especially in long-term operation or large-scale exploration scenarios, the system faces serious challenges arising from environmental dynamics. The real world is filled with movable objects such as pedestrians, vehicles, or rearranged furniture. When algorithms erroneously treat such objects as part of the static environment, positioning errors accumulate and the map structure is distorted, eventually rendering the map and the positioning results unreliable. This is a core problem that current technology needs to solve.

A solution commonly adopted in the industry is to globally correct the camera motion trajectory and the three-dimensional map through a back-end optimization algorithm such as bundle adjustment. This method takes all historical camera poses and all observed three-dimensional landmark points as variables and jointly optimizes them by minimizing the reprojection errors of all observations, so as to obtain a globally consistent solution. It performs well in static scenes and is a standard component of back-end optimization in visual SLAM systems.

However, bundle adjustment treats all landmarks indiscriminately. In dynamic scenes this deforms the three-dimensional map and makes the camera trajectory deviate from the real path, which affects the long-term stability and positioning accuracy of the system.
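For reference, the bundle adjustment objective described in this background is commonly written as below. The notation is an assumption for illustration and does not appear in the source: $T_i$ is the pose of camera frame $i$, $X_j$ is the position of three-dimensional landmark $j$, $u_{ij}$ is the observed pixel of landmark $j$ in frame $i$, $\pi$ is the camera projection function, and $\mathcal{O}$ is the set of all observations.

```latex
\min_{\{T_i\},\,\{X_j\}} \; \sum_{(i,j)\in\mathcal{O}} \left\lVert u_{ij} - \pi\!\left(T_i, X_j\right) \right\rVert^{2}
```

Every term in this sum carries the same weight, so an observation of a moved chair pulls on the solution exactly as strongly as an observation of a wall. This is the indiscriminate treatment that the background identifies as the source of map deformation and trajectory drift.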
Disclosure of Invention

In a first aspect, the invention provides a VSLAM feature point optimization method based on semantic stability, which adopts the following technical scheme:

A VSLAM feature point optimization method based on semantic stability comprises the following steps:

S1, acquiring time-sequential scene image data and processing it to generate image frames to be processed that carry initial pose information;
S2, identifying semantic information and three-dimensional feature points in the image frames to be processed, and initializing the identity of the three-dimensional feature points according to the semantic information, so as to generate an initial-identity three-dimensional landmark set containing structural landmarks and transient landmarks;
S3, calculating and recording the position deviation of the transient landmarks in the initial-identity three-dimensional landmark set across relocalization events in different periods, so as to establish and update a long-term stability assessment file;
S4, based on the long-term stability assessment file, judging whether each transient landmark qualifies according to preset identity promotion threshold conditions, so as to generate landmark identity migration instructions;
S5, executing the landmark identity migration instructions to change landmark identities in the initial-identity three-dimensional landmark set, so as to generate an identity-updated three-dimensional landmark set;
S6, according to the current identity of each landmark in the identity-updated three-dimensional landmark set, applying a differentiated optimization weight distribution strategy to assign an optimization weight to each landmark, so as to generate a landmark identity association optimization weight set;
S7, applying the landmark identity association optimization weight set to perform bundle adjustment optimization on the camera motion trajectory and the three-dimensional map structure, so as to output an optimized camera pose and an optimized global map.

In a further scheme of the invention, generating the image frames to be processed carrying initial pose information comprises the following steps: starting a vision sensor and continuously capturing images of the environment to form a timestamped image sequence; preliminarily estimating the camera motion of each frame relative to the previous frame by analyzing feature point matches between adjacent images, and attaching an initial camera pose estimate to each frame; and packaging the image frames together with their initial pose estimates to generate the image frames to be processed carrying initial pose information.

In a further scheme of the invention, generating the initial-identity three-dimensional landmark set containing structural landmarks and transient landmarks comprises the following steps: running a semantic segmentation model on the image frame to be processed carrying initial pose information, so as to identify, for each pixel in the image, semantic information indicating whether it belongs to a fixed object type or a potentially movable object type; extracting visual features fro