
CN-121999114-A - Three-dimensional scene reconstruction method under complex environment

CN121999114A

Abstract

The invention discloses a three-dimensional scene reconstruction method for complex environments, in which scene information is captured and a three-dimensional model is constructed with a neural network. Scene capture first applies standardized processing, such as color correction and noise removal, to high-resolution images of the scene taken from multiple angles, and then extracts the depth information of each pixel from the camera geometry to ensure the consistency and integrity of the data. Model construction uses a neural network: based on the acquired scene data, the sampling process is optimized through spatial decomposition and by assuming that the intersection of a ray with a sampling point is a sphere, which simplifies the input and improves modeling efficiency. The network comprises a density neural network and a color neural network that output the volume density and the color information of the scene respectively, and a volume rendering technique integrates the density and color information to generate a high-quality three-dimensional model. The invention significantly improves the accuracy and efficiency of three-dimensional reconstruction.
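The abstract's "standardized processing" names only color correction and noise removal without fixing an algorithm. As an illustration only, the sketch below uses a gray-world color correction and a 3x3 mean filter; both operators and the function name `normalize_images` are assumptions, not the patent's method.

```python
import numpy as np

def normalize_images(images):
    """Illustrative preprocessing: gray-world color correction followed by a
    3x3 mean-filter denoise. `images` is a list of float arrays in [0, 1]
    of shape (H, W, 3). The specific operators are stand-ins for the
    patent's unspecified 'color correction' and 'noise removal'."""
    out = []
    for img in images:
        # Gray-world assumption: scale each channel so its mean matches the
        # global mean, giving consistent color across different cameras.
        channel_means = img.mean(axis=(0, 1))                     # shape (3,)
        corrected = np.clip(img * (channel_means.mean() / channel_means), 0.0, 1.0)
        # Simple 3x3 mean filter as a stand-in for noise removal.
        h, w = img.shape[:2]
        padded = np.pad(corrected, ((1, 1), (1, 1), (0, 0)), mode="edge")
        denoised = sum(
            padded[i:i + h, j:j + w] for i in range(3) for j in range(3)
        ) / 9.0
        out.append(denoised)
    return out
```

A constant-color image passes through unchanged, which is a quick sanity check that neither step distorts already-consistent data.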

Inventors

  • KURBAN UBUL
  • Xi Peiming
  • YU SENWEN

Assignees

  • 新疆大学 (Xinjiang University)
  • 席培铭 (Xi Peiming)

Dates

Publication Date
2026-05-08
Application Date
2024-11-03

Claims (4)

  1. A three-dimensional scene reconstruction method for complex environments, used to reconstruct a three-dimensional scene, characterized by comprising the following two steps: (1) capturing scene information: capture the scene from multiple viewpoints with a high-resolution camera, ensuring that enough viewpoints are obtained to cover all details and depth information in the complex scene, and apply standardized processing to the captured images, including color correction and noise removal, to obtain high-quality scene data; (2) constructing the three-dimensional model: obtain the corresponding pose information and camera parameters from the collected scene data; sample the scene data by simulating rays from the pose information and camera parameters, assuming that the intersection between a ray passing through the camera origin and a sampling point is a sphere and sampling the region inside the sphere; optimize the ray-sampling process by creating an occupancy grid so that blank regions skip sampling, improving efficiency; use spatial decomposition to decompose the sampled spatial information into feature vectors on three orthogonal multi-level texture planes that represent spatial position; and train with the resulting feature vectors as input to the neural network, which is divided into two parts: the first part is a density neural network that takes the feature vectors representing spatial information as input and outputs the volume density of the scene; the second part is a color neural network whose input is the output of the first network together with direction information processed by spherical harmonic encoding, and whose output is the color of the scene; finally, the volume density and color are rendered into a three-dimensional model of the scene using a volume rendering technique.
  2. The method according to claim 1, wherein in step (1) the capture of the scene from multiple viewpoints with a high-resolution camera, the standardization of the captured images including color correction and noise removal, and the acquisition of high-quality scene data comprise the following steps: (1) capturing scene information: set up multiple cameras and capture scene information from different angles, ensuring that all areas of the complex scene are covered; (2) ensuring high-quality image data: apply color correction to the captured images so that colors are consistent across the different cameras, then remove noise to eliminate interference in the images and improve their clarity and accuracy; (3) extracting depth information: compute the depth of each pixel from the geometric layout of the cameras; (4) generating standardized scene data: standardize all captured images and depth information so that resolution, size, and coordinate system are consistent across all data, and apply common image-processing techniques such as smoothing, sharpening, and contrast adjustment to improve data quality; (5) organizing the data: ensure the integrity and consistency of the scene information and delete redundant or unnecessary data, thereby obtaining high-quality scene data.
  3. The method according to claim 1, wherein in step (2) the construction of the three-dimensional model of the complex scene with a neural network comprises the following steps: (1) ray simulation: using the pose information and parameters of the camera, simulate rays emanating from the camera origin, and determine the path of each ray through the scene by the formula r(t) = o + t·d, where o is the camera origin, d is the direction vector, and t is the distance along the ray; (2) multi-level texture encoding: use cone projection to determine the portion of a ray cast from the camera origin that intersects the scene; during ray sampling, the intersection between the ray and the scene is assumed to be a sphere whose radius is determined from the focal length and the distance between the camera origin and the sampling point, ensuring that sampling stays within a specific region of space; (3) occupancy grid: optimize the ray-sampling process by creating an occupancy grid that uses binary markers so that blank regions are skipped and unnecessary sampling in object-free areas is avoided, which markedly improves the efficiency of ray sampling; (4) spatial decomposition: decompose the spatial information of the sampling points into feature vectors on three orthogonal multi-level texture planes, realizing a multi-resolution feature description of the three-dimensional space, with each plane holding feature vectors that represent spatial position and features; (5) density network: the three-dimensional model of the scene is represented by a neural network composed of two parts; the first part is the density neural network, whose input is the feature vectors derived from the multi-level texture planes and whose output is the volume density of the scene, representing the density of objects in the scene; (6) color network: the second part is the color neural network, whose input is the output of the density neural network together with direction information processed by spherical harmonic encoding, and whose output is the color information of the scene, combining the ray direction with the features of the scene; (7) volume rendering: weight and integrate the density and color of each sampling point with a volume rendering technique to synthesize the density and color information into a three-dimensional scene model and render it. In step (2), determining the sphere radius from the focal length and the distance between the camera origin and the sampling point means calculating the radius according to formula (1), in which f is the focal length, o is the camera origin, and d is the direction vector. In step (4), the spatial information of the sampling points is decomposed by spatial decomposition into feature vectors on three orthogonal multi-level texture planes, and the corresponding multi-level texture query level is calculated by formula (2), in which b_max and b_min are, respectively, the maximum and minimum bounds of the axis-aligned bounding box (AABB) of the three-dimensional space of interest, which delimit the radii of the feature elements at the base level of the multi-level texture.
  4. The method according to claim 3, wherein in steps (5) to (7) representing the three-dimensional model of the scene by a neural network means learning, through the neural network, the feature vectors of spatial information based on multi-level texture encoding, thereby better representing the voxel information and colors of the scene; finally, the density and color of each sampling point are weighted and integrated by a volume rendering technique to synthesize the density and color information into the three-dimensional scene model.
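The weighting and integration named in claims 1, 3, and 4 corresponds to the standard NeRF-style volume-rendering quadrature. As a minimal sketch (the patent does not print its discretization here, so this is the conventional form, and the function name `composite` is an assumption):

```python
import numpy as np

def composite(densities, colors, deltas):
    """Standard volume-rendering quadrature along one ray.

    densities: (N,) volume densities sigma_i from the density network.
    colors:    (N, 3) RGB colors c_i from the color network.
    deltas:    (N,) distances between adjacent samples along the ray.

    alpha_i = 1 - exp(-sigma_i * delta_i)
    T_i     = prod_{j < i} (1 - alpha_j)          (transmittance)
    pixel   = sum_i T_i * alpha_i * c_i
    """
    alphas = 1.0 - np.exp(-densities * deltas)                    # (N,)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas]))[:-1]
    weights = trans * alphas                                      # (N,)
    return (weights[:, None] * colors).sum(axis=0), weights
```

With a very dense first sample, the ray is fully absorbed immediately and the pixel takes that sample's color, which matches the intuition that opaque surfaces occlude everything behind them.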

Description

Three-dimensional scene reconstruction method under complex environment

Technical Field

The invention relates to the technical field of image processing and analysis, and in particular to a three-dimensional scene reconstruction method for complex environments.

Background

Three-dimensional reconstruction is an advanced computational process aimed at automatically recovering the three-dimensional structure of an object or scene from two-dimensional data. With the development of technology and the increasing demand for depth information, accurately recovering a complete three-dimensional model from a limited set of viewpoints has become a key challenge in this field. Three-dimensional reconstruction focuses on using algorithms and computer vision techniques to automatically construct an accurate representation of three-dimensional space by analyzing multiple images or other forms of measurement data. This approach not only provides more spatial and shape information than two-dimensional images, but also reproduces the real appearance and dimensions of objects in environments that cannot be directly measured or accessed. As an important component of computer vision technology, three-dimensional reconstruction has very broad application prospects in fields such as cultural relic protection, medical imaging, and robot navigation. Neural-network-based three-dimensional reconstruction represents an innovative breakthrough in this field, aiming to overcome the limitations of traditional methods. By learning a continuous volumetric representation of a scene from sparse image data through a deep neural network, high-quality images from never-seen perspectives can be generated, providing new possibilities for three-dimensional scene rendering.
Unlike traditional three-dimensional reconstruction methods based on geometric or optical principles, neural-network-based methods do not directly rely on feature-point matching, direct measurement of depth information, or specific hardware devices. They simulate the color and density of each point in a scene by training a deep neural network on a sparse set of two-dimensional images. The network learns to predict the color and density of light passing through the scene, so that high-quality images can be rendered from any viewing angle. Neural-network-based three-dimensional reconstruction still has several unresolved problems: (1) because the structure of the neural network is complex, optimization requires a great deal of computation time; (2) because the multi-layer perceptron in the network does not handle high-frequency details finely enough, models reconstructed in complex environments have low precision; and (3) the ability to process scene details quickly and accurately under observations at different scales is lacking.

Disclosure of Invention

The invention aims to overcome the defects and shortcomings of the prior art and provide a three-dimensional scene reconstruction method for complex environments that not only effectively generates a three-dimensional scene model in a complex environment but also has higher efficiency, reconstructing the three-dimensional model of the scene more quickly and thereby improving the accuracy and effectiveness of the three-dimensional model.
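The rendering step sketched above, predicting per-point color and density and integrating them along each ray, is conventionally written as the volume-rendering integral used in NeRF-style methods. The patent's own equations are not reproduced in this text, so the standard form is given here for reference, with the ray r(t) = o + t·d as in the claims:

```latex
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma\!\big(\mathbf{r}(t)\big)\,
\mathbf{c}\!\big(\mathbf{r}(t), \mathbf{d}\big)\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma\!\big(\mathbf{r}(s)\big)\,ds\right)
```

Here \(\sigma\) is the volume density produced by the density network, \(\mathbf{c}\) is the view-dependent color produced by the color network, and the transmittance \(T(t)\) is the probability that light travels from the near bound \(t_n\) to \(t\) without being absorbed.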
To achieve this purpose, the invention is realized by the following technical scheme. The three-dimensional scene reconstruction method in a complex environment is used to generate a three-dimensional model of a complex scene and comprises the following two steps. First, capturing scene information: capture the scene from multiple viewpoints with a high-resolution camera, ensuring that enough viewpoints are obtained to cover all details and depth information in the complex scene, and standardize the captured images, including color correction and noise removal, to obtain high-quality scene data. Second, constructing a three-dimensional model of the complex scene with a neural network: obtain the corresponding pose information and camera parameters from the acquired scene data; sample the scene data by simulating rays from the pose information and camera parameters, assuming that the intersection between a ray passing through the camera origin and a sampling point is a sphere and sampling the region inside the sphere; optimize the ray-sampling process by creating an occupancy grid so that blank regions skip sampling, improving efficiency; and decompose the sampled spatial information into feature vectors on three orthogonal multi-level texture planes.
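The ray simulation r(t) = o + t·d and the occupancy-grid skip described above can be sketched as follows. This is an illustrative NumPy version under assumed conventions (a binary grid over a cubic AABB, uniform t-steps, and the function name `sample_ray`); the patent does not specify these details.

```python
import numpy as np

def sample_ray(o, d, occupancy, grid_min, grid_max, n_samples=64, t_max=4.0):
    """Sample points along r(t) = o + t*d, skipping samples whose containing
    occupancy-grid cell is marked empty.

    occupancy: binary (R, R, R) array over the axis-aligned box
               [grid_min, grid_max] in each dimension; 1 = may contain objects.
    Returns the kept sample points (M, 3) and their t values (M,)."""
    res = occupancy.shape[0]
    ts = np.linspace(0.0, t_max, n_samples)
    pts = o[None, :] + ts[:, None] * d[None, :]               # (N, 3)
    # Map points into [0, 1) box coordinates; points outside are dropped.
    rel = (pts - grid_min) / (grid_max - grid_min)
    inside = np.all((rel >= 0.0) & (rel < 1.0), axis=1)
    # Integer cell indices, then binary lookup: empty cells skip sampling.
    idx = np.clip((rel * res).astype(int), 0, res - 1)
    occupied = occupancy[idx[:, 0], idx[:, 1], idx[:, 2]].astype(bool)
    keep = inside & occupied
    return pts[keep], ts[keep]
```

Only the surviving points would be fed through the spatial decomposition and the two networks, which is what makes the grid an efficiency optimization rather than a change to the model itself.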