
CN-121837514-B - Three-dimensional reconstruction generation method and device for complex scene

CN121837514B

Abstract

The invention belongs to the technical fields of computer vision, three-dimensional reconstruction, scene perception and algorithm design, and discloses a three-dimensional reconstruction generation method and device for complex scenes. The method comprises a transparent and reflective object detection module, a three-dimensional reconstruction refinement generation module and a next-scanning-site prediction module. The three-dimensional scanning device for complex scenes comprises an RGB image acquisition unit and a polarization camera. The invention achieves high-quality, high-efficiency and highly autonomous three-dimensional reconstruction of transparent and reflective objects and heavily occluded scenes, and can be widely applied in industrial inspection, cultural relic protection, reverse engineering and other fields.

Inventors

  • YANG XIN
  • YIN XUEFENG
  • ZHANG ZHAOXUAN
  • JI YINGLIAN
  • CUI YAN
  • YIN BAOCAI

Assignees

  • Dalian University of Technology (大连理工大学)

Dates

Publication Date
2026-05-12
Application Date
2026-03-13

Claims (4)

  1. A three-dimensional reconstruction generation method for complex scenes, characterized by comprising a transparent and reflective object detection module, a three-dimensional reconstruction refinement generation module and a next-scanning-site prediction module, with the following specific steps. Step 1, the transparent and reflective object detection module: for objects with transparent and reflective characteristics in a target scene, a polarization-guided transparent-object depth estimation framework PolarDepth is proposed to estimate the depth of the transparent and reflective regions of the object. PolarDepth takes as input an RGB image acquired by the RGB image acquisition unit of the three-dimensional scanning device, together with a degree-of-linear-polarization image DoLP and an angle-of-linear-polarization image AoLP acquired by the polarization camera, and adopts a dual-branch structure: the RGB branch consists of a monocular depth estimation network module that predicts a preliminary depth map Z0 of the target scene; the polarization branch consists of a shape decoding module that extracts, from DoLP and AoLP, polarization features describing the geometric parameters and material properties of the object surface, the estimated geometric parameters being the zenith angle θ and azimuth angle φ, and the material property being the refractive index η. The zenith angle θ and azimuth angle φ are converted into a surface normal by the formula n = [sin θ cos φ, sin θ sin φ, cos θ]^T, yielding a surface normal map N. The preliminary depth map Z0 output by the RGB branch and the surface normal map N output by the polarization branch are fed into a depth correction module to obtain the corrected depth map Z. Step 2, the three-dimensional reconstruction refinement generation module, implemented as follows. First, from the corrected depth map Z and the RGB image of the corresponding target scene, an initialized point cloud P of the target scene to be reconstructed is obtained by projection and preprocessed: for each point p_i in P, its k-nearest-neighbor set is found and the average distance d_i from p_i to these neighbors is computed, together with the mean μ and standard deviation σ of the average distances over all points; if d_i exceeds μ + α·σ (α a preset threshold factor), p_i is marked as an outlier and removed, finally yielding the outlier-free point cloud x_0. Second, noise is gradually added to the outlier-free point cloud x_0; after t steps this generates
x_t = √(ᾱ_t) · x_0 + √(1 − ᾱ_t) · ε,
where ᾱ_t = ∏_{s=1}^{t} α_s is the cumulative product of α_s = 1 − β_s from s = 1 to t; β_t are the noise scheduling parameters, whose values increase linearly from β_1 to β_T over the diffusion process; and ε is random noise following the standard normal distribution. A reverse denoising process is then carried out, in which a conditional diffusion model predicting the noise ε_θ(x_t, t, c) gradually updates the reverse-denoised point cloud:
x_{t−1} = (1/√α_t) · (x_t − (β_t/√(1 − ᾱ_t)) · ε_θ(x_t, t, c)) + σ_t · z,
where c is a conditional feature comprising a texture mask map, z is random noise following the standard normal distribution, σ_t is the noise scale at step t, and β_t are the noise scheduling parameters, whose values increase linearly from β_1 to β_T over the diffusion process. The dense point cloud x̂_0 obtained after the reverse denoising process is meshed by solving the Poisson equation, expressing x̂_0 as a triangle mesh:
Δχ = ∇ · V,
where Δ is the three-dimensional Laplace operator, ∇ · V is the divergence of the vector field V, and χ is the indicator function defined over the three-dimensional spatial coordinates of each point in the dense point cloud x̂_0. Step 3, the next-scanning-site prediction module, which autonomously predicts the position of the next scanning site.
  2. The three-dimensional reconstruction generation method for complex scenes as recited in claim 1, wherein the specific implementation of step 1 is as follows. First, an RGB image of the target scene is acquired by the RGB image acquisition unit and passed through the monocular depth estimation network module to obtain a preliminary depth map Z0 of the target scene; a material mask map marks the transparent and reflective regions of Z0, with pixels whose depth value is 0 labeled 1 and all others labeled 0, while Z0 simultaneously serves as the prior input to the depth correction module. Second, multiple polarization images of the target scene captured by the polarization camera from the same viewpoint are used to obtain a degree-of-linear-polarization image DoLP and an angle-of-linear-polarization image AoLP, from which the geometric parameters and material properties of object surfaces with transparent and reflective characteristics are computed, as follows. In a single exposure the polarization camera simultaneously acquires light intensity values in the four polarization directions 0°, 45°, 90° and 135°, recorded as I_0, I_45, I_90 and I_135. These four intensity values are converted into the Stokes vector S = [S_0, S_1, S_2]^T, where S_0 denotes the total light intensity, S_1 represents the intensity difference of the linear polarization components in the 0° and 90° directions, and S_2 represents the intensity difference of the linear polarization components in the 45° and 135° directions, computed as:
S_0 = (I_0 + I_45 + I_90 + I_135)/2, S_1 = I_0 − I_90, S_2 = I_45 − I_135.
From S_0, S_1 and S_2, the DoLP and AoLP images are calculated as:
DoLP = √(S_1² + S_2²)/S_0, AoLP = (1/2) · arctan(S_2/S_1).
The relation between DoLP and AoLP on the one hand and the surface normal direction and refractive index on the other is then established, and the shape decoding module mines the transparent and reflective regions of the target scene from DoLP and AoLP to obtain the refractive index η, zenith angle θ and azimuth angle φ of the object surface; the surface normal map N is then constructed in the camera coordinate system by the formula n = [sin θ cos φ, sin θ sin φ, cos θ]^T. Finally, based on the preliminary depth map Z0 and the surface normal map N produced by the shape decoding module, the depth correction module performs global optimization on the depth, yielding the corrected depth map Z.
  3. The three-dimensional reconstruction generation method for complex scenes as recited in claim 1, wherein the specific implementation of step 3 is as follows. First, the dense point cloud of the current scanning site is projected onto a two-dimensional plane to generate an occupancy map, and each grid cell of the occupancy map is classified into one of three categories according to its state: occupied, free or unknown; on this basis, the boundary point set of the occupancy map is extracted as all free cells adjacent to an unknown cell. Second, a graph structure centered on the boundary points is constructed to represent the structural information of the target scene; the node features of the graph comprise the coordinates of each boundary point, the number of boundary points in its neighborhood, and its geodesic distance to the current scanning site. Finally, based on the fused features, a reinforcement learning policy network samples actions from the boundary point set and selects the globally optimal next scanning site; steps 1-3 are then repeated from the new scanning site until the entire target scene is completely reconstructed with the fewest possible scanning sites.
  4. A three-dimensional scanning device for complex scenes, characterized in that the device is used for the three-dimensional reconstruction generation method for complex scenes according to any one of claims 1-3, and comprises: an RGB image acquisition unit serving as the optical imaging component, composed of 8 high-definition ultra-wide-angle fisheye lenses, each equipped with an image sensor; and a polarization camera comprising an image sensor module, a polarization filtering structure module, an image signal processor module, and an interface and control module.
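The Stokes-vector computation in claim 2 and the zenith/azimuth-to-normal conversion in claim 1 can be illustrated with a minimal NumPy sketch. Function and variable names here are illustrative and not taken from the patent; the Stokes convention shown is the standard one for a four-direction division-of-focal-plane polarization camera:

```python
import numpy as np

def stokes_from_intensities(i0, i45, i90, i135):
    """Stokes parameters from the four polarization-direction intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total light intensity
    s1 = i0 - i90                        # 0 deg vs 90 deg linear component
    s2 = i45 - i135                      # 45 deg vs 135 deg linear component
    return s0, s1, s2

def dolp_aolp(s0, s1, s2):
    """Degree and angle of linear polarization from Stokes parameters."""
    dolp = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-12)  # guard divide-by-zero
    aolp = 0.5 * np.arctan2(s2, s1)
    return dolp, aolp

def normal_from_angles(zenith, azimuth):
    """Surface normal n = [sin t cos p, sin t sin p, cos t]^T, as in claim 1."""
    return np.stack([np.sin(zenith) * np.cos(azimuth),
                     np.sin(zenith) * np.sin(azimuth),
                     np.cos(zenith)], axis=-1)

# Fully horizontally polarized light: I0 = 1, I90 = 0, I45 = I135 = 0.5
s0, s1, s2 = stokes_from_intensities(1.0, 0.5, 0.0, 0.5)
dolp, aolp = dolp_aolp(np.array(s0), np.array(s1), np.array(s2))
print(float(dolp), float(aolp))          # DoLP = 1.0, AoLP = 0.0
print(normal_from_angles(0.0, 0.0))      # zenith 0 -> normal [0, 0, 1]
```

For the horizontally polarized test case the degree of linear polarization is 1 and the polarization angle is 0, and a zero zenith angle maps to a normal pointing along the camera axis, matching the formula in the claims.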

Description

Three-dimensional reconstruction generation method and device for complex scenes

Technical Field

The invention belongs to the technical fields of computer vision, three-dimensional reconstruction, scene perception and algorithm design, and relates to a three-dimensional reconstruction generation method and device for complex scenes.

Background

In fields such as scene reconstruction, industrial inspection, reverse engineering and cultural relic protection, high-precision three-dimensional models are often required. However, conventional scanners typically perform three-dimensional scanning with lidar or structured-light modules, which struggle to accommodate complex and diverse scene requirements. For materials with particular complex characteristics in a scene, such as transparent objects, existing methods mostly rely on binocular or multi-view parallax, structured light, depth cameras or neural rendering for three-dimensional reconstruction, and usually infer geometry from intensity or RGB texture information alone. When refraction, specular reflection or multipath propagation is pronounced, such methods are prone to unstable depth estimation, boundary misalignment and loss of surface detail, making it difficult to obtain accurate and reliable geometric priors. By contrast, polarization imaging captures intensity and polarization information simultaneously in a single frame, providing additional constraints on physical properties of the target surface such as the normal direction and the refractive index. Even in complex scenes with strong highlights, stacked transparent layers or partial occlusion, it improves the robustness and precision of depth estimation and supplies higher-quality input data for subsequent three-dimensional generation and completion.
Furthermore, existing three-dimensional reconstruction techniques depend heavily on the completeness of the acquired data. For scenes with complex structure and occlusion, obtaining lossless, high-quality scan data is often challenging, leading to holes, geometric distortion and loss of detail in the reconstructed model. Three-dimensional generation technology offers a new path around this bottleneck: by learning from large-scale three-dimensional datasets, it can intelligently infer and complete missing geometry and texture from incomplete input, generating a complete and plausible refined three-dimensional model. This markedly reduces the hardware requirements and data acquisition burden of the device and improves the efficiency of three-dimensional reconstruction for complex scenes. A three-dimensional reconstruction system addressing these problems is therefore needed.

Disclosure of Invention

The invention aims to solve the problems of easy misalignment and poor completeness of reconstruction results when conventional scanning devices and reconstruction methods face transparent and reflective objects and heavily occluded scenes, and achieves more robust and efficient three-dimensional reconstruction of complex scenes through the fusion of multiple techniques.
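The refinement module described above completes missing geometry with a conditional diffusion model; its forward noising step follows the standard DDPM formulation. A minimal NumPy sketch under that assumption (names and schedule endpoints are illustrative, not taken from the patent):

```python
import numpy as np

def linear_beta_schedule(T, beta_1=1e-4, beta_T=0.02):
    """Noise scheduling parameters increasing linearly from beta_1 to beta_T."""
    return np.linspace(beta_1, beta_T, T)

def forward_noise(x0, t, betas, eps):
    """x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t - 1]  # product of alpha_s for s = 1..t
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
x0 = rng.normal(size=(1024, 3))     # toy stand-in for the outlier-free point cloud
betas = linear_beta_schedule(1000)
eps = rng.normal(size=x0.shape)     # standard normal noise
xt = forward_noise(x0, 1000, betas, eps)

# By t = T the cumulative product alpha_bar_T is tiny, so x_T is almost pure noise.
print(np.cumprod(1.0 - betas)[-1] < 1e-3)
```

The reverse process would iterate the update in claim 1 with a trained noise-prediction network in place of `eps`; only the closed-form forward step is shown here.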
For objects of transparent material, a polarization camera is introduced to accurately sense the optical characteristics of the material, enhancing detection and characterization of transparent objects; a three-dimensional reconstruction generation method then completes, from sparse input, the geometric structure and surface detail lost to occlusion or material interference, improving reconstruction quality and completeness; and a next-scanning-site prediction algorithm dynamically plans the distribution of scanning sites, minimizing occlusion blind spots and optimizing the data acquisition process, finally achieving high-quality, high-efficiency autonomous three-dimensional reconstruction of complex scenes. The technical scheme of the invention is as follows: a three-dimensional reconstruction generation method for complex scenes comprises a transparent and reflective object detection module, a three-dimensional reconstruction refinement generation module and a next-scanning-site prediction module, with the following specific steps: step 1, the transparent and reflective object detection module; for objects with transparent and reflective characteristics in a target scene, a polarization-guided transparent-object depth estimation framework PolarDepth is proposed to estimate the depth of the transparent and reflective regions of the object; PolarDepth takes as input an RGB image acquired by the RGB image acquisition unit of the three-dimensional scanning device together with a degree-of-linear-polarization image DoLP and an angle-of-linear-polarization image