CN-116721231-B - Three-dimensional reconstruction method and system for extensible scene based on unmanned aerial vehicle-mounted positioning
Abstract
The invention discloses a method and system for three-dimensional reconstruction of extensible scenes based on unmanned aerial vehicle onboard positioning, in which reconstruction is performed from the color images and depth images acquired by a depth camera mounted on an unmanned aerial vehicle. The method comprises: uniformly sampling particles within a unit sphere of a 6D state space to obtain a particle swarm template; performing surface measurement on each input depth image frame by projecting its pixels into three-dimensional space, computing the normal of each three-dimensional point, and simultaneously computing a segmentation normal map; adaptively allocating reconstruction voxel memory on the GPU according to the correlation among the three-dimensional point normals; tracking the pose of the fast-moving camera using only the depth image and the portion of the reconstructed three-dimensional model residing in the current GPU active space; as the depth image frame sequence streams in, constructing a dense three-dimensional point cloud model by fusing measurements subject to sensor noise into a truncated signed distance field (TSDF); and, when visualization is required, extracting the three-dimensional surface to generate a three-dimensional mesh model.
Inventors
- HUI HAIGANG
- GOU GUOHUA
- WANG XUANHAO
- ZHANG HAO
- LI JIAJIE
- WANG SHENG
- DENG HONGXING
- XU GUILIN
Assignees
- Wuhan University (武汉大学)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2023-05-17
Claims (10)
- 1. A three-dimensional reconstruction method for an extensible scene based on unmanned aerial vehicle onboard positioning, characterized in that three-dimensional reconstruction is performed based on color images and depth images acquired by a depth camera carried on the unmanned aerial vehicle, the method comprising the following steps: Step 1, uniformly sampling particles within a unit sphere of a 6D state space to obtain a particle swarm template, wherein each particle represents a valid rigid-body motion in three-dimensional space; Step 2, performing surface measurement on each input depth image frame: projecting the pixels of the depth image into three-dimensional space according to the camera intrinsic parameters, computing the normal of each three-dimensional point, and simultaneously computing a segmentation normal map; Step 3, adaptively allocating reconstruction voxel memory on the GPU according to the correlation among the three-dimensional point normals to realize a sparse representation of the surface, and managing the allocated voxel space after the camera leaves the current reconstruction area; Step 4, tracking the pose of the fast-moving camera using only the depth image and the portion of the reconstructed three-dimensional model residing in the current GPU active space, and computing the relative position and attitude of the camera at the time the current frame was captured; and Step 5, as the depth image frame sequence streams in, constructing a dense three-dimensional point cloud model by fusing measurements subject to sensor noise into a truncated signed distance field (TSDF), and extracting the three-dimensional surface when visualization is required to generate a three-dimensional mesh model.
- 2. The method for reconstructing the extensible scene based on unmanned aerial vehicle onboard positioning of claim 1, wherein in step 1 a fixed number of particles is uniformly pre-sampled within a unit sphere of the 6D state space, each particle representing a rigid motion in three-dimensional space; the pre-sampled particle set is called the particle swarm template; and a multi-level particle swarm template is constructed and used in combination with depth images of different resolutions to support pose estimation.
- 3. The three-dimensional reconstruction method of an extensible scene based on unmanned aerial vehicle onboard positioning of claim 1, wherein in step 2 the depth image preprocessing is performed as follows: computing a vertex map V from the intrinsic matrix K of the depth camera and the depth image I_d; computing the corresponding normal map N from the vertex map V; and, on the basis of the normal map N, computing a segmentation normal map by a region-growing method.
- 4. The three-dimensional reconstruction method of an extensible scene based on unmanned aerial vehicle onboard positioning of claim 1, wherein in step 3 the sparse representation of the scene surface constrained by normal correlation is realized as follows: when a new depth image frame of the sequence is passed into the reconstruction system, each pixel of the depth image is traversed, and for each depth measurement I_d(u) a ray is constructed from the camera center through the pixel; with the truncation range of the TSDF set to mu, a line segment is created along the ray within the truncation band from I_d(u) - mu to I_d(u) + mu; the coordinates of the voxels on the segment are obtained from coarse to fine, the hash value h of the segment is computed, and it is checked whether the voxel block b corresponding to the hash value has already been allocated memory on the GPU; if not, new entries are created for it in the hash table and in the voxel block array; using the segmentation normal map as a mask, voxel blocks are allocated at the coarsest resolution level for pixels inside a segmented planar region, while for pixels outside the segmented region voxel space is allocated directly at the level one finer than the coarsest, with a special flag in the corresponding coarsest-level hash table entry implicitly pointing to the position of the sub-block; after voxel memory allocation is completed, the surface roughness r(b) of voxel block b is computed from the normal correlation and used as the criterion for merging and subdividing voxels: the current-level voxels are subdivided when the roughness exceeds a subdivision threshold t_s, and merged when the roughness falls below a merging threshold t_m; a spherical active space containing the current camera field of view is defined, its center placed at a distance of l/2 from the depth camera according to the effective depth range l of the particular depth camera, with radius l; when camera motion causes part of the current scene to leave the camera field of view, the corresponding surface voxels also move out of the active space and are transferred from the GPU to the CPU, and when the camera revisits an area that has already been reconstructed, the voxels are transferred back from the CPU to the GPU.
- 5. The method for three-dimensional reconstruction of an extensible scene based on unmanned aerial vehicle onboard positioning of claim 1, wherein in step 4 fast camera pose estimation from the depth image alone is realized as follows: Step 4.1, sampling from the particle swarm template and, using the segmentation normal map generated from the current depth image as a mask, projecting the depth image into the three-dimensional world with each six-degree-of-freedom pose particle; checking whether the projection lies within the active space and filtering out the particles that fall outside it; computing the proportion of segmented blocks within the L_0-level voxel blocks and rejecting a pose particle if this proportion is below the corresponding threshold; Step 4.2, based on the six-degree-of-freedom pose particles constrained in step 4.1, computing the fitness between the t-th depth image frame and the candidate optimal pose of the k-th iteration, and taking the candidate whose fitness exceeds that of the optimal pose of the previous iteration as the center of the particle sampling range for the next iteration; Step 4.3, providing three combinations of particle swarm templates and depth images at different resolutions and using them in turn over the iterations; and Step 4.4, setting the iteration termination condition as either the change of each of the six pose degrees of freedom before and after an optimization iteration falling below its corresponding threshold, or the iteration count exceeding an upper limit, and repeating steps 4.2 and 4.3 until the termination condition is triggered.
- 6. The method for three-dimensional reconstruction of an extensible scene based on unmanned aerial vehicle onboard positioning of claim 1, wherein in step 5 the reconstruction of the scene surface from depth images affected by sensor noise is realized as follows: Step 5.1, introducing a confidence factor alpha into the truncated signed distance field (TSDF) fusion, the fused value being a weighted average over the number of observations of a model point, where sigma is an empirical constant and gamma denotes the normalized radial distance of the depth camera measurement error from the camera center; Step 5.2, extracting the scene surface by ray casting; during interpolation at a non-integer location, if the point X = (X, Y, Z) and its surrounding voxels are at the same level, the value at the point is solved from the eight corner points of the voxel with the known trilinear interpolation coefficients; if the voxel containing the point and its surrounding voxels span different levels, an equation of the interpolation form F(X, Y, Z) = a_1·XYZ + a_2·XY + a_3·YZ + a_4·XZ + a_5·X + a_6·Y + a_7·Z + a_8 is written for each of the eight grid points around the point, forming a linear equation system.
- 7. An extensible scene three-dimensional reconstruction system based on unmanned aerial vehicle onboard positioning, characterized in that it is used to realize the extensible scene three-dimensional reconstruction method based on unmanned aerial vehicle onboard positioning according to any one of claims 1-6.
- 8. The three-dimensional reconstruction system of an extensible scene based on unmanned aerial vehicle onboard positioning of claim 7, comprising the following modules: a first module for uniformly sampling particles within a unit sphere of a 6D state space to obtain a particle swarm template, wherein each particle represents a valid rigid-body motion in three-dimensional space; a second module for performing surface measurement on each input depth image frame, projecting the pixels of the depth image into three-dimensional space according to the camera intrinsic parameters, computing the normal of each three-dimensional point, and computing a segmentation normal map; a third module for adaptively allocating reconstruction voxel memory on the GPU according to the correlation among the three-dimensional point normals to realize a sparse representation of the surface, and managing the allocated voxel space after the camera leaves the current reconstruction area; a fourth module for tracking the pose of the fast-moving camera using only the depth image and the portion of the reconstructed three-dimensional model residing in the current GPU active space, and computing the relative position and attitude of the camera at the time the current frame was captured; and a fifth module, configured to construct a dense three-dimensional point cloud model, as the depth image frame sequence streams in, by fusing measurements subject to sensor noise into a truncated signed distance field (TSDF), to extract a three-dimensional surface when visualization is required, and to generate a three-dimensional mesh model.
- 9. The three-dimensional reconstruction system of an extensible scene based on unmanned aerial vehicle onboard positioning according to claim 7, wherein the system comprises a processor and a memory, the memory being used to store program instructions and the processor being used to call the program instructions stored in the memory to execute the three-dimensional reconstruction method of the extensible scene based on unmanned aerial vehicle onboard positioning according to any one of claims 1 to 6.
- 10. The three-dimensional reconstruction system of an extensible scene based on unmanned aerial vehicle onboard positioning of claim 7, comprising a readable storage medium, wherein the readable storage medium stores a computer program which, when executed, implements the three-dimensional reconstruction method of an extensible scene based on unmanned aerial vehicle onboard positioning as set forth in any one of claims 1 to 6.
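The uniform 6D particle pre-sampling of claims 1 and 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the rejection-sampling scheme, the particle counts, and the per-level radii are all assumptions.

```python
import numpy as np

def sample_particle_template(n_particles, radius=1.0, seed=0):
    """Pre-sample a fixed set of particles uniformly inside a 6D ball.

    Each particle is a 6-vector (rx, ry, rz, tx, ty, tz): a small rigid-body
    motion (axis-angle rotation plus translation) in three-dimensional space.
    """
    rng = np.random.default_rng(seed)
    particles = []
    while len(particles) < n_particles:
        p = rng.uniform(-radius, radius, size=6)
        if np.linalg.norm(p) <= radius:   # keep only draws inside the 6D ball
            particles.append(p)
    return np.array(particles)

# Multi-level template (claim 2): coarser levels cover a wider radius and are
# paired with lower-resolution depth images during pose iteration.
template = {level: sample_particle_template(1024, radius=r, seed=level)
            for level, r in enumerate([1.0, 0.5, 0.25])}
```

Rejection sampling keeps only draws inside the 6D ball, so the accepted particles are uniform in it; in 6D roughly 8% of cube draws are accepted, which is still cheap for a template computed once offline.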
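The surface measurement of claims 1 and 3 (a vertex map back-projected from intrinsics and depth, normals from neighbouring vertices) follows the standard pinhole back-projection scheme, sketched below with NumPy. The function names and the cross-product normal computation are assumptions in the spirit of the claim; the region-growing segmentation step is omitted.

```python
import numpy as np

def vertex_map(depth, K):
    """Back-project a depth image I_d into a vertex map V using intrinsics K:
    V(u, v) = I_d(u, v) * K^{-1} [u, v, 1]^T (standard pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(float)  # (h, w, 3)
    rays = pix @ np.linalg.inv(K).T          # unit-depth viewing rays
    return rays * depth[..., None]

def normal_map(V):
    """Per-pixel normals from cross products of neighbouring vertices."""
    dx = V[:, 1:, :] - V[:, :-1, :]          # horizontal vertex differences
    dy = V[1:, :, :] - V[:-1, :, :]          # vertical vertex differences
    n = np.cross(dx[:-1, :, :], dy[:, :-1, :])
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-9, None)     # normalise, avoid divide-by-zero

# Demo: a fronto-parallel plane at depth 2 m yields normals (0, 0, 1).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])              # assumed pinhole intrinsics
V = vertex_map(np.full((8, 8), 2.0), K)
N = normal_map(V)
```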
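Claim 4's on-demand voxel-block allocation along each ray's truncation band [I_d(u) - mu, I_d(u) + mu] can be sketched with a spatial hash table. The hash primes and the dictionary-based table are assumed stand-ins (the patent does not state its hash function), and collision handling plus the multi-resolution/segmentation-mask logic of the claim are omitted.

```python
import numpy as np

def block_hash(x, y, z, table_size=2**20):
    """Spatial hash of integer voxel-block coordinates; the primes follow the
    widely used voxel-hashing scheme and are an assumption here."""
    return ((x * 73856093) ^ (y * 19349669) ^ (z * 83492791)) % table_size

def allocate_along_ray(origin, direction, depth, mu, block_size, hash_table):
    """Walk the truncation band [depth - mu, depth + mu] along a viewing ray
    and allocate any voxel block not yet present in the hash table."""
    direction = direction / np.linalg.norm(direction)
    t = depth - mu
    while t <= depth + mu:
        p = origin + t * direction
        key = tuple((p // block_size).astype(int))   # integer block coordinate
        h = block_hash(*key)
        if h not in hash_table:                      # allocate on demand
            hash_table[h] = {"coord": key, "voxels": None}
        t += block_size                              # step one block per sample
    return hash_table

# Demo: one ray along +z, measured depth 2.0 m, truncation mu = 0.1 m.
table = allocate_along_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                           depth=2.0, mu=0.1, block_size=0.05, hash_table={})
```

Only blocks actually crossed by measurement rays receive memory, which is what makes the surface representation sparse.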
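The coarse-to-fine, fitness-driven particle search of claim 5 (steps 4.2-4.4) can be sketched as follows; the fitness function, the level schedule, and the termination thresholds are stand-ins, and the active-space filtering of step 4.1 is omitted.

```python
import numpy as np

def refine_pose(score_fn, template_levels, pose0, max_iter=10, tol=1e-4):
    """Coarse-to-fine particle-swarm pose refinement.

    template_levels: list of (particles, step) pairs, coarse to fine, where
    particles are 6-DoF offsets sampled in a unit ball and step scales them
    into actual pose perturbations. Terminates a level when no candidate
    improves the fitness, the per-DoF change drops below tol, or max_iter
    iterations have run (claim 5, step 4.4).
    """
    pose = np.asarray(pose0, dtype=float).copy()
    best = score_fn(pose)
    for particles, step in template_levels:
        for _ in range(max_iter):
            cands = pose + step * particles          # perturb current best pose
            scores = np.array([score_fn(c) for c in cands])
            i = int(np.argmax(scores))
            if scores[i] <= best:                    # no candidate improved
                break
            delta = np.abs(cands[i] - pose)
            pose, best = cands[i], scores[i]
            if np.all(delta < tol):                  # 6-DoF change below threshold
                break
    return pose, best

# Demo: recover a known 6-DoF offset by coarse-to-fine random search.
rng = np.random.default_rng(0)
particles = rng.uniform(-1.0, 1.0, (256, 6))
levels = [(particles, 0.5), (particles, 0.1), (particles, 0.02)]
target = np.full(6, 0.3)
fitness = lambda p: -np.linalg.norm(p - target)      # stand-in for depth/model fitness
pose, best = refine_pose(fitness, levels, np.zeros(6), max_iter=30)
```

In the patent's setting the fitness would compare the depth image, projected with the candidate pose, against the model in the GPU active space; the quadratic toy fitness above only demonstrates the search loop.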
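Claim 6's TSDF fusion with a confidence factor can be sketched as a running weighted average; the Gaussian form of alpha in sigma and gamma is an assumed reading of the claim, not a formula stated verbatim in this text.

```python
import numpy as np

def fuse_tsdf(tsdf, weight, sdf_meas, gamma=0.0, sigma=0.6, max_weight=100.0):
    """One running weighted-average TSDF update with confidence factor alpha.

    alpha = exp(-gamma^2 / (2 sigma^2)) down-weights measurements taken far
    from the image centre, where depth-camera error grows (gamma: normalised
    radial distance; sigma: empirical constant).
    """
    alpha = np.exp(-(gamma ** 2) / (2.0 * sigma ** 2))
    fused = (weight * tsdf + alpha * sdf_meas) / (weight + alpha)
    return fused, min(weight + alpha, max_weight)    # cap to stay responsive

# Demo: repeatedly fusing noisy measurements of one voxel converges to their mean.
d, w = 0.0, 0.0
for meas in [0.11, 0.09, 0.10, 0.12, 0.08]:
    d, w = fuse_tsdf(d, w, meas)
```

Capping the accumulated weight keeps old observations from dominating, so the model can still adapt if the scene changes.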
Description
Three-dimensional reconstruction method and system for extensible scene based on unmanned aerial vehicle-mounted positioning

Technical Field

The invention belongs to the technical field of three-dimensional reconstruction, and in particular relates to a technical scheme for three-dimensional reconstruction of extensible scenes with onboard real-time positioning on an unmanned aerial vehicle.

Background

Real-scene three-dimensional models serve as an important data basis for representing the real world and are widely applied in fields such as smart cities, interior design, disaster prevention and relief, augmented reality, and mixed reality. Conventional three-dimensional reconstruction typically takes two-dimensional color images as input and outputs a reconstructed three-dimensional model. However, because the input data are fixed, the resulting model is usually incomplete and lacks realism. In addition, the huge computational cost of multi-view depth estimation leads to low reconstruction efficiency, making it difficult to extend traditional three-dimensional reconstruction to applications with strong real-time requirements, such as augmented reality, mixed reality, unmanned aerial vehicle navigation, and post-disaster search of unknown enclosed environments. Although simultaneous localization and mapping (SLAM) technology can achieve real-time reconstruction of a scene with a monocular color camera, the sparse point cloud generated by a monocular SLAM system supports only rudimentary scene reconstruction and can hardly provide a realistic, intuitive three-dimensional representation. Depth cameras (RGB-D cameras), based on time-of-flight or structured light, provide reliable, dense depth measurements in highly integrated, miniaturized devices.
With a monocular color lens and a set of depth-sensing components, a depth camera can capture color images (RGB images) and depth images of the surrounding environment at sufficient resolution and in real time, making real-time scene reconstruction and real-time model updating possible. As depth cameras have become cheaper, they have gained increasing attention from academia and industry, and depth-camera-based three-dimensional reconstruction has developed rapidly over the past decade. With these features combined, consumer-grade depth cameras can even outperform some far more expensive 3D scanning systems, especially in consumer-grade applications. At present, depth-camera-based three-dimensional reconstruction is mostly operated by holding the camera by hand while the acquired color and depth images are processed in real time on a high-performance data processing terminal; this comprises autonomous positioning and attitude determination of the depth camera (camera tracking) and the construction and updating of the three-dimensional model (update reconstruction).
Existing depth-camera-based three-dimensional reconstruction methods have three shortcomings. First, their applicability is limited to slow camera motion, generally below 1 m/s; under fast camera motion the reconstruction system collapses very easily. Second, they depend on high-performance computing equipment: the systems achieve real-time performance only when supported by high-performance graphics hardware, which onboard computers with limited computing resources can hardly provide. Third, constrained by limited graphics memory, the reconstructable scene range is restricted to a fixed and very limited size, and when the camera moves beyond this range the reconstruction system stops working. Owing to these technical defects, a fast-moving unmanned aerial vehicle is greatly restricted from reconstructing a large-scene three-dimensional model in real time with an inexpensive, miniaturized onboard computer of limited computing resources.

Disclosure of Invention

The invention mainly aims to overcome the defects and shortcomings of the prior art, and provides a method for onboard real-time positioning and extensible scene three-dimensional reconstruction for unmanned aerial vehicles. To achieve the above purpose, the technical scheme of the invention is an extensible scene three-dimensional reconstruction method based on unmanned aerial vehicle onboard positioning, which performs three-dimensional reconstruction based on the color images and depth images acquired by a depth camera carried on an unmanned aerial vehicle: Step 1, uniformly sampling particles within a unit sphere of a 6D state space to obtain a particle swarm template, wherein any one particle represents an effective rigid-body motion in three-dimensional space.