CN-122023714-A - Binocular three-dimensional vector map construction and merging method and system


Abstract

The invention provides a binocular stereo vector map construction and merging method and system. The method comprises the following steps. Sparse three-dimensional support points are extracted from binocular images, a three-dimensional triangular mesh is generated, and scene planes are detected by clustering in a plane parameter space. Vertical plane landmarks are extracted from the scene planes and projected to generate plane projection line segments, and a vector planar map is reconstructed by solving an optimization problem with trajectory and topology constraints. Cross-session plan associations are established and an initial transformation is computed through geometric comparison, consistency graph construction and maximum clique search on the vector planar maps. The relative pose is then refined through multi-stage point cloud registration, the observations are transformed and fused into the historical map coordinate system, and a merged map is generated through global optimization. The invention achieves real-time, robust floor plan reconstruction in indoor environments with weak texture and significant viewpoint and illumination changes, and uses the floor plan as a global structural prior to reliably associate and merge multi-session maps.

Inventors

WU YIHONG
WEI HAO
WANG HAOLIN

Assignees

中国科学院自动化研究所

Dates

Publication Date: 20260512
Application Date: 20251229

Claims (10)

1. A binocular stereo vector map construction and merging method, characterized by comprising the following steps: extracting sparse three-dimensional support points from a binocular image, generating a three-dimensional triangular mesh based on the sparse three-dimensional support points, mapping the triangles of the three-dimensional triangular mesh to a plane parameter space, and detecting scene planes by clustering in the plane parameter space, wherein the sparse three-dimensional support points represent the planar structure of the scene; extracting vertical plane landmarks based on the detected scene planes, projecting the vertical plane landmarks onto the ground to generate plane projection line segments, constructing and solving a binary selection optimization problem with a trajectory constraint and a topology closure constraint, and selecting a subset of the plane projection line segments to reconstruct a two-dimensional floor plan, thereby obtaining a vector planar map; and, based on the vector planar map, establishing a plan association between the current vector planar map and the historical vector planar map through geometric comparison, consistency graph construction and maximum clique search, calculating an initial transformation between the current and historical vector planar maps based on the plan association, refining the relative pose between the two maps through multi-stage point cloud registration starting from the initial transformation, transforming the observations of the current vector planar map into the coordinate system of the historical vector planar map using the refined relative pose and fusing them, and performing global optimization to generate a merged map.
2. The method of claim 1, wherein generating a three-dimensional triangular mesh based on the sparse three-dimensional support points and mapping the triangles of the three-dimensional triangular mesh to a plane parameter space comprises: triangulating with the two-dimensional image-plane coordinates of the sparse three-dimensional support points as vertices to obtain a two-dimensional triangular mesh; back-projecting each triangle of the two-dimensional triangular mesh into three-dimensional space using the three-dimensional coordinates of the sparse three-dimensional support points to generate an initial three-dimensional triangular mesh; trimming the initial three-dimensional triangular mesh according to a geometric quality criterion and removing degenerate triangles to obtain an optimized three-dimensional triangular mesh; calculating the unit normal vector of each triangle in the optimized three-dimensional triangular mesh; and calculating, for each triangle in the optimized three-dimensional triangular mesh, its coordinate point in the plane parameter space from its unit normal vector and its distance to the coordinate origin.
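The triangle-to-parameter-space mapping of claim 2 can be sketched as follows. This is a minimal pure-Python illustration, not the patented implementation: the function name `triangle_to_plane_params`, the thresholds `min_area` and `min_quality`, the use of a normalized area-to-edge-length ratio as the "geometric quality criterion", and the (nx, ny, nz, d) parameterization with a sign flip that keeps d non-negative are all assumptions. The claim only states that the point is computed from the unit normal and the distance to the origin.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]
PlaneParams = Tuple[float, float, float, float]  # (nx, ny, nz, d), plane n·x = d


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a: Vec3) -> float:
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


def triangle_to_plane_params(p0: Vec3, p1: Vec3, p2: Vec3,
                             min_area: float = 1e-4,
                             min_quality: float = 0.1) -> Optional[PlaneParams]:
    """Map one 3D triangle to a point in plane parameter space.

    Returns None for degenerate triangles (hypothetical trimming rule).
    """
    e1, e2, e3 = _sub(p1, p0), _sub(p2, p0), _sub(p2, p1)
    c = _cross(e1, e2)
    twice_area = _norm(c)
    area = 0.5 * twice_area
    sum_sq_edges = sum(_norm(e) ** 2 for e in (e1, e2, e3))
    if area < min_area or sum_sq_edges == 0.0:
        return None
    # Quality is 1.0 for an equilateral triangle and tends to 0 for slivers.
    quality = 4.0 * math.sqrt(3.0) * area / sum_sq_edges
    if quality < min_quality:
        return None
    n = (c[0] / twice_area, c[1] / twice_area, c[2] / twice_area)  # unit normal
    d = n[0] * p0[0] + n[1] * p0[1] + n[2] * p0[2]  # signed distance to origin
    if d < 0.0:
        # Canonical sign: the same plane always maps to the same point.
        n, d = (-n[0], -n[1], -n[2]), -d
    return (n[0], n[1], n[2], d)
```

The sign canonicalization matters because two triangles on the same plane but with opposite vertex winding would otherwise land at antipodal points in parameter space and never cluster together.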
3. The method of claim 1, wherein detecting scene planes by clustering in the plane parameter space comprises: performing density-based clustering, in the plane parameter space, on the coordinate points obtained by mapping the triangles, to obtain a plurality of point sets, each point set corresponding to a potentially coplanar region; for each point set, determining the sparse three-dimensional support points of the triangles that generated that point set; fitting a plane to the sparse three-dimensional support points determined for each point set with a random sample consensus (RANSAC) algorithm to obtain a fitted plane model; calculating the inlier ratio of each fitted plane model, the inlier ratio being the number of sparse three-dimensional support points whose distance to the fitted plane model is less than a set threshold divided by the total number of sparse three-dimensional support points used to fit the model; and selecting, based on the calculated inlier ratios, the fitted plane models whose inlier ratio satisfies a set threshold as the detected scene planes.
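The RANSAC fitting and inlier-ratio screening of claim 3 can be illustrated with a plain RANSAC loop. In this hedged sketch the density-based clustering step is assumed to have already grouped the support points (each element of `clusters` is one group); the names `ransac_plane` and `detect_planes`, the distance threshold, the iteration count and the minimum inlier ratio are illustrative values chosen here, not parameters from the patent.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[float, float, float, float]  # (nx, ny, nz, d), plane n·x = d


def plane_from_three(a: Vec3, b: Vec3, c: Vec3) -> Optional[Plane]:
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    n = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    if length < 1e-12:
        return None  # collinear sample, no unique plane
    n = (n[0] / length, n[1] / length, n[2] / length)
    return (n[0], n[1], n[2], n[0] * a[0] + n[1] * a[1] + n[2] * a[2])


def point_plane_distance(p: Vec3, plane: Plane) -> float:
    nx, ny, nz, d = plane
    return abs(nx * p[0] + ny * p[1] + nz * p[2] - d)


def ransac_plane(points: Sequence[Vec3], dist_thresh: float = 0.02,
                 iters: int = 200, seed: int = 0) -> Tuple[Optional[Plane], float]:
    """Fit a plane by RANSAC and return it together with its inlier ratio."""
    pts = list(points)
    if len(pts) < 3:
        return None, 0.0
    rng = random.Random(seed)
    best, best_count = None, 0
    for _ in range(iters):
        plane = plane_from_three(*rng.sample(pts, 3))
        if plane is None:
            continue
        count = sum(1 for p in pts if point_plane_distance(p, plane) < dist_thresh)
        if count > best_count:
            best, best_count = plane, count
    # Inlier ratio: points within dist_thresh divided by all points used for fitting.
    return best, best_count / len(pts)


def detect_planes(clusters: Sequence[Sequence[Vec3]], dist_thresh: float = 0.02,
                  min_inlier_ratio: float = 0.8) -> List[Plane]:
    """Keep one fitted plane per cluster whose inlier ratio passes the threshold."""
    planes = []
    for pts in clusters:
        plane, ratio = ransac_plane(pts, dist_thresh)
        if plane is not None and ratio >= min_inlier_ratio:
            planes.append(plane)
    return planes
```

A cluster that mixes two perpendicular surfaces yields at best about half of its points as inliers, so the ratio test rejects it even though RANSAC always returns some plane.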
4. The method of claim 1, wherein extracting vertical plane landmarks based on the detected scene planes, projecting the vertical plane landmarks onto the ground and generating plane projection line segments comprises: selecting, from the detected scene planes, the planes whose normal vectors satisfy a perpendicularity condition; re-fitting each selected plane to its inliers with a random sample consensus algorithm to obtain a vertical plane landmark; projecting the inliers of each vertical plane landmark onto the ground plane to form a two-dimensional point set; performing line fitting or convex hull computation on each two-dimensional point set to obtain an initial two-dimensional line segment; and merging initial two-dimensional line segments that have similar orientations and are spatially adjacent to form the final plane projection line segments.
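The steps of claim 4 can be sketched as below, assuming a z-up world frame whose ground plane is z = 0 and using a closed-form 2D principal-axis fit as one possible form of the claim's "line fitting". The RANSAC re-fit of each wall is omitted for brevity, and the helper names (`is_vertical`, `fit_segment`, `mergeable`, `merge_segments`, `wall_segments`) and all thresholds are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Point2 = Tuple[float, float]
Segment = Tuple[Point2, Point2]


def is_vertical(normal: Vec3, up: Vec3 = (0.0, 0.0, 1.0), max_tilt_deg: float = 10.0) -> bool:
    """Perpendicularity test: a wall's unit normal is nearly orthogonal to `up`."""
    dot = abs(sum(a * b for a, b in zip(normal, up)))
    return dot < math.sin(math.radians(max_tilt_deg))


def fit_segment(pts: Sequence[Point2]) -> Segment:
    """Principal-axis line fit; the extent comes from projections onto the axis."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts)
    syy = sum((p[1] - my) ** 2 for p in pts)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # direction of largest spread
    dx, dy = math.cos(theta), math.sin(theta)
    ts = [(p[0] - mx) * dx + (p[1] - my) * dy for p in pts]
    return ((mx + min(ts) * dx, my + min(ts) * dy), (mx + max(ts) * dx, my + max(ts) * dy))


def mergeable(a: Segment, b: Segment, max_angle_deg: float = 5.0,
              max_offset: float = 0.05, max_gap: float = 0.3) -> bool:
    (ax, ay), (bx, by) = a
    length = math.hypot(bx - ax, by - ay) or 1.0
    ux, uy = (bx - ax) / length, (by - ay) / length
    (cx, cy), (ex, ey) = b
    other = math.hypot(ex - cx, ey - cy) or 1.0
    if abs(ux * (ex - cx) + uy * (ey - cy)) / other < math.cos(math.radians(max_angle_deg)):
        return False  # orientations differ too much
    # Both endpoints of b must lie close to the infinite line through a.
    if any(abs((px - ax) * uy - (py - ay) * ux) > max_offset for px, py in b):
        return False
    # Gap between the two intervals measured along a's direction (a spans [0, length]).
    tb = sorted((px - ax) * ux + (py - ay) * uy for px, py in b)
    gap = max(tb[0] - length, 0.0 - tb[1], 0.0)
    return gap <= max_gap


def merge_segments(segs: Sequence[Segment], **kw) -> List[Segment]:
    """Greedily merge compatible pairs until no pair can be merged."""
    out = list(segs)
    changed = True
    while changed:
        changed = False
        for i in range(len(out)):
            for j in range(i + 1, len(out)):
                if mergeable(out[i], out[j], **kw) or mergeable(out[j], out[i], **kw):
                    out[i] = fit_segment([*out[i], *out[j]])
                    del out[j]
                    changed = True
                    break
            if changed:
                break
    return out


def wall_segments(planes: Sequence[Tuple[Vec3, Sequence[Vec3]]], **kw) -> List[Segment]:
    """planes: (unit normal, inlier points) for each detected scene plane."""
    segs = [fit_segment([(x, y) for x, y, _z in inliers])  # drop z: project to ground
            for normal, inliers in planes if is_vertical(normal)]
    return merge_segments(segs, **kw)
```

Two collinear wall pieces separated by a small gap (for example, across a door frame) come out as one segment, while a parallel wall on the opposite side of the room stays separate because its perpendicular offset is large.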
5. The binocular stereo vector map construction and merging method according to claim 1, wherein constructing and solving a binary selection optimization problem with a trajectory constraint and a topology closure constraint, and reconstructing a two-dimensional floor plan from a subset of the plane projection line segments to obtain a vector planar map, comprises: taking all plane projection line segments as a candidate line segment set and defining a binary selection variable for each candidate line segment; establishing a binary selection optimization problem whose objective is to minimize a composite energy function comprising a data fitting term, a data coverage term and a model complexity term, subject to the trajectory constraint and the topology closure constraint, wherein the trajectory constraint requires that candidate line segments crossed by the motion trajectory are not selected, and the topology closure constraint requires that the connectivity of the vertices formed by intersecting candidate line segments satisfies a preset closed or open condition; solving the binary selection optimization problem to obtain the optimal value of each binary selection variable; selecting from the candidate line segment set the line segments whose binary selection variables indicate selection, to form an optimal line segment subset for reconstruction; and connecting and combining the line segments in the optimal line segment subset to form the vector planar map.
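To make the binary selection problem of claim 5 concrete, the sketch below enumerates every subset of the candidates that survive the trajectory constraint and keeps the lowest-energy subset in which every touched vertex has degree 2 (one possible "closed" condition). Exhaustive enumeration is only viable for a handful of candidates; a practical system would more likely pass the same objective and constraints to an integer-programming solver. The `Candidate` fields `fit_cost` and `support`, the specific form of the three energy terms and their weights are assumptions for illustration, and endpoints are assumed to be already snapped to shared intersection vertices.

```python
import itertools
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point2 = Tuple[float, float]


@dataclass(frozen=True)
class Candidate:
    a: Point2          # endpoints, assumed snapped to shared intersection vertices
    b: Point2
    fit_cost: float    # hypothetical data-fitting cost (e.g. mean point residual)
    support: float     # hypothetical support (e.g. number of wall points explained)


def _orient(p: Point2, q: Point2, r: Point2) -> float:
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(p1: Point2, p2: Point2, q1: Point2, q2: Point2) -> bool:
    """Proper crossing test; merely touching at an endpoint does not count."""
    return (_orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
            and _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0)


def crosses_trajectory(c: Candidate, traj: Sequence[Point2]) -> bool:
    return any(segments_cross(c.a, c.b, traj[k], traj[k + 1]) for k in range(len(traj) - 1))


def is_closed(chosen: Sequence[Candidate]) -> bool:
    """Closure condition used here: every vertex touched has degree exactly 2."""
    degree = {}
    for c in chosen:
        for v in (c.a, c.b):
            degree[v] = degree.get(v, 0) + 1
    return all(d == 2 for d in degree.values())


def energy(chosen, total_support, w_fit=1.0, w_cov=1.0, w_cx=0.1) -> float:
    fitting = sum(c.fit_cost for c in chosen)                        # data fitting term
    coverage = 1.0 - sum(c.support for c in chosen) / total_support  # data coverage term
    complexity = len(chosen)                                         # model complexity term
    return w_fit * fitting + w_cov * coverage + w_cx * complexity


def select_segments(cands: Sequence[Candidate], traj: Sequence[Point2]) -> List[Candidate]:
    allowed = [c for c in cands if not crosses_trajectory(c, traj)]  # trajectory constraint
    total = sum(c.support for c in cands) or 1.0
    best, best_e = [], float("inf")
    for mask in itertools.product((0, 1), repeat=len(allowed)):  # x_i in {0, 1}
        chosen = [c for c, x in zip(allowed, mask) if x]
        if not chosen or not is_closed(chosen):  # topology closure constraint
            continue
        e = energy(chosen, total)
        if e < best_e:
            best, best_e = chosen, e
    return best
```

With these weights, a spurious segment across the middle of a square room wins on its own (a half-room with fewer segments has lower energy). A trajectory that walks through that segment removes it from consideration, and the full room outline is selected instead, which is the role the trajectory constraint plays.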
6. The binocular stereo vector map construction and merging method according to claim 1, wherein establishing, based on the vector planar map, a plan association between the current vector planar map and the historical vector planar map through geometric comparison, consistency graph construction and maximum clique search, and calculating an initial transformation between the current and historical vector planar maps based on the plan association, comprises: geometrically comparing the wall line segments in the current vector planar map and the historical vector planar map, and establishing an initial set of line segment association pairs based on line segment length similarity; constructing an undirected consistency graph from the initial set of association pairs, wherein the nodes are association pairs and the edges indicate consistency of angles, distances and endpoint connection relations between association pairs; searching for the maximum clique in the undirected consistency graph to obtain an association subset satisfying the geometric consistency constraints; constructing, from the association subset, the centered (mean-subtracted) coordinate matrices of the current vector planar map and the historical vector planar map respectively; performing singular value decomposition based on the two centered coordinate matrices and calculating, from the decomposition result, the optimal rotation matrix that aligns the two point sets, the two point sets being the points on the associated line segments of the current and historical vector planar maps; calculating a translation vector from the optimal rotation matrix and the centroids of the two point sets; constructing an initial transformation hypothesis from the optimal rotation matrix and the translation vector and verifying its trajectory consistency; and taking the verified initial transformation hypothesis as the initial transformation.
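Claim 6 can be sketched end to end in 2D as follows. Several choices here are assumptions rather than the patent's: pair consistency checks only the relative angle and the midpoint distance (the endpoint-connection test is omitted), Bron-Kerbosch with pivoting stands in for the unspecified maximum clique search, segment midpoints serve as the corresponding point sets, and the rotation uses the closed-form 2D solution of the same least-squares (Kabsch) problem that the claimed SVD solves. Trajectory-consistency verification is left out, and all function names and tolerances are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Point2 = Tuple[float, float]
Segment = Tuple[Point2, Point2]
Pair = Tuple[int, int]  # (index in current map, index in historical map)

def _length(s: Segment) -> float:
    return math.dist(s[0], s[1])

def _mid(s: Segment) -> Point2:
    return ((s[0][0] + s[1][0]) / 2.0, (s[0][1] + s[1][1]) / 2.0)

def _angle(s: Segment) -> float:
    return math.atan2(s[1][1] - s[0][1], s[1][0] - s[0][0]) % math.pi  # undirected

def _angle_diff(a: float, b: float) -> float:
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)

def initial_pairs(cur, hist, len_tol=0.2) -> List[Pair]:
    """Geometric comparison: associate wall segments of similar length."""
    return [(i, j) for i, s in enumerate(cur) for j, h in enumerate(hist)
            if abs(_length(s) - _length(h)) <= len_tol * max(_length(s), _length(h))]

def consistent(p: Pair, q: Pair, cur, hist, ang_tol=math.radians(5.0), dist_tol=0.2) -> bool:
    (i, j), (k, l) = p, q
    if i == k or j == l:  # one-to-one association
        return False
    ang_ok = abs(_angle_diff(_angle(cur[i]), _angle(cur[k]))
                 - _angle_diff(_angle(hist[j]), _angle(hist[l]))) <= ang_tol
    dist_ok = abs(math.dist(_mid(cur[i]), _mid(cur[k]))
                  - math.dist(_mid(hist[j]), _mid(hist[l]))) <= dist_tol
    return ang_ok and dist_ok

def max_clique(nodes: Sequence[int], adj: Dict[int, Set[int]]) -> List[int]:
    """Bron-Kerbosch with pivoting; returns one maximum clique."""
    best: List[int] = []

    def expand(r: List[int], p: Set[int], x: Set[int]) -> None:
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = r
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(nodes), set())
    return best

def rigid_2d(src: Sequence[Point2], dst: Sequence[Point2]) -> Tuple[float, Point2]:
    """Least-squares rotation angle and translation with dst ≈ R(theta) src + t."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(p[0] for p in dst) / n, sum(p[1] for p in dst) / n
    s_cos = s_sin = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy  # centered coordinates
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)  # same optimum as the SVD (Kabsch) solution in 2D
    c, s = math.cos(theta), math.sin(theta)
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))

def align_maps(cur: Sequence[Segment], hist: Sequence[Segment]):
    pairs = initial_pairs(cur, hist)
    adj: Dict[int, Set[int]] = {n: set() for n in range(len(pairs))}
    for a, b in combinations(range(len(pairs)), 2):
        if consistent(pairs[a], pairs[b], cur, hist):
            adj[a].add(b)
            adj[b].add(a)
    clique = [pairs[n] for n in max_clique(range(len(pairs)), adj)]
    if len(clique) < 2:
        return None  # not enough consistent associations for a transform
    src = [_mid(cur[i]) for i, _ in clique]
    dst = [_mid(hist[j]) for _, j in clique]
    return clique, rigid_2d(src, dst)
```

A length-only match to an unrelated wall in the historical map becomes an isolated node in the consistency graph, so it never enters the maximum clique and does not bias the transform.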
7. The method of claim 1, wherein refining the relative pose between the current vector planar map and the historical vector planar map through multi-stage point cloud registration based on the initial transformation, transforming the observations of the current vector planar map into the coordinate system of the historical vector planar map based on the refined relative pose and fusing them, and performing global optimization to generate the merged map comprises: performing first-stage point cloud registration, which takes the initial transformation as the initial value, aggregates the point clouds corresponding to the matched plane landmarks in the current and historical vector planar maps, and outputs a first-stage optimized pose; performing second-stage point cloud registration, which takes the first-stage optimized pose as the initial value, builds a local point cloud map from the current vector planar map, registers it against the historical vector planar map, and outputs a final relative pose; calculating the appearance similarity between the current keyframe and the keyframes in the historical vector planar map, evaluating it jointly with the geometric consistency of map point projections, and selecting final candidate keyframes from the historical vector planar map; performing guided feature matching between the final candidate keyframes and the current keyframe with the final relative pose as the initial transformation, and refining the relative pose based on the matching result to obtain an optimized relative pose; transforming all observations in the current vector planar map into the coordinate system of the historical vector planar map using the optimized relative pose; identifying and marking, in the coordinate system of the historical vector planar map, duplicate landmarks from the two maps through spatial proximity and geometric consistency checks, deleting the secondary landmark instances in each group of duplicate landmarks and retaining only one representative landmark, to form a deduplicated fused map; and performing local bundle adjustment, pose graph optimization and global bundle adjustment in sequence on the deduplicated fused map to obtain the merged map.
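The transform-and-deduplicate step of claim 7 can be illustrated for plane landmarks. This sketch assumes planar robot motion, so the relative pose is a yaw angle plus a translation; a plane n·x = d then maps to n' = R n and d' = d + n'·t. Duplicates are grouped with union-find when their normals, offsets and centroids are all close, and the member with the most observations is kept as the representative. The `PlaneLandmark` fields, the thresholds and the "most observations wins" rule are assumptions; the registration stages and the subsequent bundle adjustment and pose graph optimization are not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PlaneLandmark:
    lid: str          # hypothetical identifier, e.g. "hist-0" or "cur-3"
    normal: Vec3      # unit normal; the plane is normal·x = d
    d: float
    centroid: Vec3    # centroid of the landmark's supporting points
    n_obs: int        # number of keyframes observing the landmark


def transform_landmark(lm: PlaneLandmark, yaw: float, t: Vec3) -> PlaneLandmark:
    """Apply x' = Rz(yaw) x + t; a plane n·x = d becomes (R n)·x' = d + (R n)·t."""
    c, s = math.cos(yaw), math.sin(yaw)

    def rot(v: Vec3) -> Vec3:
        return (c * v[0] - s * v[1], s * v[0] + c * v[1], v[2])

    n = rot(lm.normal)
    rc = rot(lm.centroid)
    centroid = (rc[0] + t[0], rc[1] + t[1], rc[2] + t[2])
    d = lm.d + n[0] * t[0] + n[1] * t[1] + n[2] * t[2]
    return PlaneLandmark(lm.lid, n, d, centroid, lm.n_obs)


def is_duplicate(a: PlaneLandmark, b: PlaneLandmark, max_angle_deg: float = 5.0,
                 max_offset: float = 0.1, max_centroid_dist: float = 1.0) -> bool:
    """Geometric consistency (normal, offset) plus spatial proximity (centroid)."""
    cos_angle = sum(x * y for x, y in zip(a.normal, b.normal))
    return (cos_angle >= math.cos(math.radians(max_angle_deg))
            and abs(a.d - b.d) <= max_offset
            and math.dist(a.centroid, b.centroid) <= max_centroid_dist)


def deduplicate(landmarks: Sequence[PlaneLandmark]) -> List[PlaneLandmark]:
    """Group duplicates with union-find and keep the most-observed member of each group."""
    parent = list(range(len(landmarks)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(landmarks)):
        for j in range(i + 1, len(landmarks)):
            if is_duplicate(landmarks[i], landmarks[j]):
                parent[find(i)] = find(j)
    groups: Dict[int, List[PlaneLandmark]] = {}
    for i, lm in enumerate(landmarks):
        groups.setdefault(find(i), []).append(lm)
    return [max(group, key=lambda lm: lm.n_obs) for group in groups.values()]
```

Using union-find rather than pairwise deletion keeps the grouping transitive: if A duplicates B and B duplicates C, all three collapse to a single representative.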
8. A binocular stereo vector map construction and merging system, comprising: a plane extraction module, configured to extract sparse three-dimensional support points from a binocular image, generate a three-dimensional triangular mesh based on the sparse three-dimensional support points, map the triangles of the three-dimensional triangular mesh to a plane parameter space, and detect scene planes by clustering in the plane parameter space, wherein the sparse three-dimensional support points represent the planar structure of the scene; a plan reconstruction module, configured to extract vertical plane landmarks based on the detected scene planes, project the vertical plane landmarks onto the ground to generate plane projection line segments, construct and solve a binary selection optimization problem with a trajectory constraint and a topology closure constraint, and select a subset of the plane projection line segments to reconstruct a two-dimensional floor plan, thereby obtaining a vector planar map; and a map merging module, configured to establish, based on the vector planar map, a plan association between the current vector planar map and the historical vector planar map through geometric comparison, consistency graph construction and maximum clique search, calculate an initial transformation between the two maps based on the plan association, refine the relative pose between the two maps through multi-stage point cloud registration starting from the initial transformation, transform the observations of the current vector planar map into the coordinate system of the historical vector planar map based on the refined relative pose and merge them, and perform global optimization to generate a merged map.
9. An electronic device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the binocular stereo vector map construction and merging method of any one of claims 1 to 6.
10. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the binocular stereo vector map construction and merging method of any one of claims 1 to 6.

Description

Binocular three-dimensional vector map construction and merging method and system

Technical Field

The invention relates to the technical field of computer vision, and in particular to a binocular stereo vector map construction and merging method and system.

Background

Indoor floor plans are important basic data for indoor robot navigation and high-level semantic understanding, and are widely used in applications such as elderly-care service robots in homes and autonomous flying platforms inside complex buildings. Generating a floor plan in real time not only provides a geometric structural prior for localization but also supplies semantic cues for high-level decision making, so its engineering value in large-scale indoor scenes is increasingly prominent. However, existing solutions still have significant limitations in achieving real-time, robust floor plan reconstruction and cross-session map merging. In particular, when only low-cost visual sensors are used, it is difficult to build and exploit a stable representation of the environment's structure under weak texture and large viewpoint and illumination changes. Specifically:

Existing methods have significant limitations in floor plan reconstruction. One class of methods relies on an offline pipeline that requires a complete map of the environment before the floor plan can be reconstructed, which cannot meet the needs of online applications. Another class relies on expensive sensors such as LiDAR or RGB-D cameras, or on computationally expensive deep neural networks, which increases deployment cost and system complexity. Simultaneous localization and mapping (SLAM) systems based on monocular or binocular cameras attempt vectorized mapping with high-level geometric elements such as planes, but they generally lack an understanding of the overall scene layout, which makes it difficult to obtain a complete and compact vectorized indoor representation.
At the same time, plane extraction in weakly textured scenes is not yet sufficiently robust.

In map merging and global localization, visual SLAM has become a key technology in fields such as robotics and augmented reality, but multi-session SLAM systems still face challenges under significant viewpoint and illumination changes. Traditional methods based on handcrafted features lack sufficient invariance to viewpoint and illumination, while learning-based methods generalize poorly and, lacking global context, are prone to perceptual aliasing. Geometry-based global localization methods, such as point cloud matching, are more robust to viewpoint and illumination changes but usually rely on expensive sensors. In addition, on sparse or structurally simple point clouds, local geometric descriptors are not discriminative enough, so they tend to produce many false matches and make the search for the optimal solution harder.

Therefore, there is a need for a real-time, engineering-ready technical solution for floor plan reconstruction and multi-session map merging in large-scale indoor scenes. Such a solution should achieve robust plane extraction and incremental floor plan construction with a low-cost sensor configuration, provide a global structural prior to support cross-session map association and merging, and perform coarse-to-fine local registration and feature matching at the keyframe level, thereby improving the overall robustness and accuracy of the system in environments with complex viewpoint and illumination changes.
Disclosure of Invention

The invention provides a binocular stereo vector map construction and merging method and system, which address the following problems in the prior art: reliance on expensive sensors or high computing power, difficulty in building a robust representation of the environment's structure in real time under weak texture and severe viewpoint and illumination changes, and difficulty in reliably merging cross-session maps. Relying only on a binocular camera and conventional computing resources, the invention achieves real-time, robust floor plan reconstruction in indoor environments with weak texture and significant viewpoint and illumination changes, and uses the floor plan as a global structural prior to reliably associate and merge multi-session maps.

The technical solution provided by the invention is as follows. In a first aspect, the invention provides a binocular stereo vector map construction and merging method, comprising: extracting sparse three-dimensional support points from a binocular image, generating a three-dimensional triangular mesh based on the sparse three-dimensional support points, mapping triangles in the t