
CN-122023652-A - Structured light three-dimensional reconstruction method based on frequency domain-spatial domain dual-domain fusion

CN 122023652 A

Abstract

The invention discloses a structured light three-dimensional reconstruction method and system based on frequency domain-spatial domain dual-domain fusion, belonging to the technical fields of deep learning and three-dimensional measurement. The method comprises: acquiring and preprocessing fringe images; extracting spatial-domain and frequency-domain features separately through a dual-branch encoder, in which the frequency-domain encoder applies a 2D FFT and a complex-valued convolution network to explicitly extract phase, frequency, and modulation information; performing bidirectional attention fusion of the spatial-domain and frequency-domain features at four scales through a cross-domain attention fusion module, with a gating mechanism adaptively adjusting the fusion weights; guiding the upsampling process with the frequency-domain features through a frequency-domain-guided decoder, which recovers spatial resolution in combination with attention residual skip connections; and outputting a depth map followed by post-processing optimization. The invention fully exploits the frequency-domain physical prior of structured light fringes and achieves high-precision, high-efficiency three-dimensional reconstruction from a single fringe image.
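The frequency-domain branch described in the abstract builds on standard Fourier analysis of the fringe pattern. As a minimal illustration (an FTP-style sketch, not the patented complex-convolution network; the band half-width heuristic is an assumption), the wrapped phase and modulation of a sinusoidal fringe image can be estimated with a 2D FFT and a band-pass filter around the carrier frequency:

```python
import numpy as np

def fringe_frequency_features(img, carrier_col):
    """Estimate wrapped phase and modulation of a vertical sinusoidal
    fringe image via 2D FFT band-pass filtering (FTP-style sketch).
    `carrier_col` is the column offset of the +1 carrier order from the
    center of the shifted spectrum (assumed known here)."""
    h, w = img.shape
    spec = np.fft.fftshift(np.fft.fft2(img))
    # Keep only a narrow band around the +1 carrier order, excluding DC.
    mask = np.zeros_like(spec)
    half = max(1, carrier_col // 2)          # band half-width (heuristic)
    c = w // 2 + carrier_col
    mask[:, max(0, c - half):min(w, c + half)] = 1.0
    analytic = np.fft.ifft2(np.fft.ifftshift(spec * mask))
    phase = np.angle(analytic)               # wrapped phase in (-pi, pi]
    modulation = 2.0 * np.abs(analytic)      # fringe modulation amplitude
    return phase, modulation

# Synthetic fringe: 8 periods across a 128x128 image.
x = np.arange(128)
row = 0.5 + 0.4 * np.cos(2 * np.pi * 8 * x / 128)
img = np.tile(row, (128, 1))
phase, mod = fringe_frequency_features(img, carrier_col=8)
```

For this noise-free synthetic fringe the recovered modulation equals the cosine amplitude (0.4) and the phase increases linearly along each row, which is exactly the phase/modulation information the frequency-domain encoder is said to extract.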

Inventors

  • WANG QIAN
  • LI MING
  • MA JUNHAO

Assignees

  • Beijing University of Technology (北京工业大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-28

Claims (4)

  1. A structured light three-dimensional reconstruction method based on frequency domain-spatial domain dual-domain fusion, characterized by comprising the following steps: S1, performing system calibration on a camera and a projector to obtain the conversion relations among the camera, the projector, and a world coordinate system; S2, acquiring and preprocessing a fringe image: projecting a sinusoidal fringe pattern onto the surface of an object to be measured with the projector, capturing the deformed fringe image with the camera, and normalizing the image; S3, extracting features through a dual-branch encoder: 1) performing multi-layer convolutional encoding of the fringe image through a spatial-domain encoder to extract multi-scale spatial-domain features; 2) performing a frequency-domain transform of the fringe image through a frequency-domain encoder to extract multi-scale frequency-domain features; S4, performing feature fusion through a cross-domain attention fusion module: generating a first attention weight based on the spatial-domain features to adjust the frequency-domain features, generating a second attention weight based on the frequency-domain features to adjust the spatial-domain features, and adaptively adjusting the contribution ratio of the two domains' features through a gating mechanism before fusing them; S5, upsampling through a frequency-domain-guided decoder: generating guiding weights from the frequency-domain features, upsampling the fused features, enhancing them with the guiding weights, and recovering spatial resolution layer by layer; S6, outputting the depth map and performing post-processing optimization.
  2. The method according to claim 1, wherein in step S3 the frequency-domain encoder performs a frequency-domain transform of the fringe image and extracts frequency-domain features including phase information, frequency information, and modulation information.
  3. The structured light three-dimensional reconstruction method based on frequency domain-spatial domain fusion according to claim 1, wherein the cross-domain attention fusion module in step S4 comprises: 1) generating a Query based on the spatial-domain features, generating a Key and a Value based on the frequency-domain features, and calculating a first attention weight to enhance the frequency-domain features; 2) generating a Query based on the frequency-domain features, generating a Key and a Value based on the spatial-domain features, and calculating a second attention weight to enhance the spatial-domain features; 3) generating fusion weights through a gating mechanism and performing adaptive weighted fusion of the enhanced spatial-domain features and the enhanced frequency-domain features.
  4. The method for three-dimensional reconstruction of structured light based on frequency domain-spatial domain fusion according to claim 1, wherein the upsampling process of the frequency-domain-guided decoder in step S5 comprises: 1) generating guiding weights from the frequency-domain features; 2) upsampling the fused features; 3) enhancing the upsampled features with the guiding weights; 4) merging with the encoder features and restoring spatial resolution layer by layer.
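The bidirectional cross-attention and gated fusion recited in claim 3 can be sketched as a toy single-head NumPy example (hypothetical projection matrices and feature sizes; not the patented module, which operates on four scales of convolutional feature maps):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_src, kv_src, Wq, Wk, Wv):
    """Single-head cross-attention: tokens of `query_src` attend to
    tokens of `kv_src` (both of shape [tokens, channels])."""
    Q, K, V = query_src @ Wq, kv_src @ Wk, kv_src @ Wv
    scores = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return scores @ V

rng = np.random.default_rng(0)
n, c = 16, 8                          # 16 tokens (flattened pixels), 8 channels
spatial = rng.normal(size=(n, c))     # spatial-domain features
freq = rng.normal(size=(n, c))        # frequency-domain features
W = [rng.normal(size=(c, c)) * 0.1 for _ in range(6)]

# 1) Spatial queries attend to frequency keys/values -> enhanced freq features.
freq_enh = freq + cross_attention(spatial, freq, *W[:3])
# 2) Frequency queries attend to spatial keys/values -> enhanced spatial features.
spat_enh = spatial + cross_attention(freq, spatial, *W[3:])
# 3) Gated adaptive fusion: a sigmoid gate weights the two enhanced maps.
gate = 1.0 / (1.0 + np.exp(-(spat_enh + freq_enh).mean(axis=-1, keepdims=True)))
fused = gate * spat_enh + (1.0 - gate) * freq_enh
```

The sigmoid gate plays the role of the claim's "adaptive weighted fusion": it lies in (0, 1) per token, so the fused output is a convex combination of the two enhanced feature maps.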

Description

Structured light three-dimensional reconstruction method based on frequency domain-spatial domain dual-domain fusion

Technical Field

The invention relates to the technical fields of three-dimensional measurement and deep learning, and in particular to a structured light three-dimensional reconstruction method based on frequency domain-spatial domain dual-domain fusion.

Background

Structured light three-dimensional measurement is widely applied in industrial inspection, robot navigation, cultural relic restoration, medical measurement, and other fields, owing to its simple system, high measurement speed, and high precision. A typical system consists of a projector and a camera: periodic fringes are projected onto the surface of an object, the deformed fringe image is captured, the correspondence between projector coordinates and the object surface is recovered through phase encoding and decoding, and the three-dimensional shape of the object is finally obtained by triangulation. Conventional structured light three-dimensional reconstruction methods fall into two categories: multi-frame methods (e.g., the phase-shift method and the Gray-code method) and single-frame methods (e.g., Fourier Transform Profilometry, FTP). Multi-frame methods achieve high, sub-pixel-level precision, but they must project multiple patterns (usually 3 to 12 images), so acquisition and processing take long and real-time measurement requirements cannot be met. Single-frame methods are fast, requiring only a single image, but their precision is lower: they are easily affected by noise, lose detail information severely, and struggle to meet high-precision measurement requirements.
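For context, the multi-frame phase-shift method mentioned above recovers the wrapped phase in closed form from N fringe images with equally spaced phase shifts. A minimal NumPy sketch of the standard N-step algorithm (the synthetic phase map is illustrative):

```python
import numpy as np

def phase_shift_wrapped_phase(images):
    """N-step phase-shift algorithm: recover the wrapped phase from
    N fringe images I_k = A + B*cos(phi + 2*pi*k/N)."""
    n = len(images)
    deltas = 2 * np.pi * np.arange(n) / n
    num = sum(I * np.sin(d) for I, d in zip(images, deltas))
    den = sum(I * np.cos(d) for I, d in zip(images, deltas))
    return -np.arctan2(num, den)       # wrapped phase in (-pi, pi]

# Synthetic test: a known phase map and 4 phase-shifted fringe patterns.
phi = np.linspace(-3, 3, 256).reshape(1, -1) * np.ones((64, 1))
imgs = [0.5 + 0.3 * np.cos(phi + 2 * np.pi * k / 4) for k in range(4)]
recovered = phase_shift_wrapped_phase(imgs)
```

Because the algorithm needs N (here 4) captures per measurement, it illustrates exactly the acquisition-time limitation of multi-frame methods that motivates single-frame reconstruction.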
In recent years, deep learning has made breakthrough progress in computer vision, and researchers have begun applying it to structured light three-dimensional reconstruction in an attempt to achieve single-frame, high-precision reconstruction. Existing deep learning methods mainly adopt an end-to-end approach, predicting a depth map directly from the fringe image. Although these methods improve the accuracy and speed of single-frame reconstruction to some extent, they have the following problems. First, existing methods mainly perform feature extraction and processing in the spatial domain (image domain) only, ignoring the physical nature of the structured light fringe image. In fact, a fringe image is essentially a modulated sinusoidal signal whose frequency-domain characteristics (frequency, phase, modulation) are directly related to the depth information of the object. Traditional methods such as Fourier Transform Profilometry (FTP) achieve three-dimensional reconstruction precisely through frequency-domain analysis, yet existing deep learning methods fail to exploit this prior effectively. Second, existing deep learning methods tend to lose high-frequency detail information (e.g., edges and textures) during encoder downsampling; moreover, the upsampling process relies mainly on spatial-domain features, whose recovery capacity is limited, so the reconstruction result is blurred in edge regions and unclear in its details. Finally, in depth-discontinuity regions (object edges and occlusion boundaries), existing methods generally suffer from reduced accuracy and blurred contours, which degrades reconstruction quality. These drawbacks seriously limit the application of deep learning techniques in three-dimensional reconstruction.
Disclosure of Invention

To solve the above problems, the invention provides a structured light three-dimensional reconstruction method based on frequency domain-spatial domain dual-domain fusion, comprising the following steps: S1, performing system calibration on the structured light three-dimensional measurement system to obtain the spatial conversion relations among the camera, the projector, and the world coordinate system; S2, acquiring and preprocessing fringe images to obtain a normalized single-frame deformed fringe image; S3, performing feature extraction on the fringe image with a dual-branch encoder structure to obtain spatial-domain features and frequency-domain features respectively; S4, adaptively fusing the features of different scales through the fusion module; S5, reconstructing the fused features step by step with a decoder structure combined with attention residual skip connections; S6, outputting an absolute phase map and completing the phase-to-depth conversion with the system calibration parameters to realize three-dimensional shape reconstruction. In step S1, the internal parameters and the external parameters of the camera an
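The phase-to-depth conversion in step S6 depends on the calibrated system geometry. One commonly used closed-form model for a crossed-optical-axes FTP setup is h = L0·Δφ / (Δφ − 2π·f0·d); the sketch below uses this textbook relation with hypothetical calibration values, not parameters taken from the patent:

```python
import numpy as np

def phase_to_height(dphase, L0=500.0, d=100.0, f0=0.1):
    """Classic FTP phase-to-height model for a crossed-optical-axes
    setup: h = L0 * dphi / (dphi - 2*pi*f0*d), with L0 the camera-to-
    reference-plane distance (mm), d the camera-projector baseline (mm),
    and f0 the carrier frequency (fringes/mm). All values here are
    hypothetical placeholders, not calibration results from the patent."""
    return L0 * dphase / (dphase - 2 * np.pi * f0 * d)

# Unwrapped phase difference (radians) at three sample pixels.
dphi = np.array([0.0, -0.5, -1.0])
h = phase_to_height(dphi)
```

A zero phase difference maps to zero height (the reference plane), and a growing phase difference maps monotonically to a growing height, which is the behavior the calibration in step S1 must pin down numerically.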