CN-122023718-A - Real-time high-precision 3D face reconstruction and display system based on deep learning

CN122023718A

Abstract

The application relates to the technical fields of computer vision and graphics, and discloses a real-time high-precision 3D face reconstruction and display system based on deep learning. The system comprises a light field emission module, a synchronous acquisition module, and a processing module. The light field emission module is used for emitting a space-time coded multispectral light field toward a human face; the synchronous acquisition module is used for synchronously acquiring the coded image sequence formed after reflection from the face; the processing module is used for processing the coded image sequence through a spectral decoding channel to obtain a first surface attribute comprising a material-aware surface normal and material parameters, and for processing the coded image sequence in parallel through a spatial decoding channel to obtain a second surface attribute comprising a geometry-aware surface normal. The method requires no special hardware, which reduces system cost and the barrier to adoption; it acquires high-precision microscopic geometry and dynamic material information through dual-channel decoding and collaborative fusion, does not depend on large-scale training data, and has good adaptability.

Inventors

  • ZHU HAIQING

Assignees

  • 四维视图技术(汕尾)有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-01-29

Claims (10)

  1. A real-time high-precision 3D face reconstruction and display system based on deep learning, characterized by comprising: a light field emission module, configured to emit a space-time coded multispectral light field toward a human face; a synchronous acquisition module, configured to synchronize with the light field emission module and to acquire a coded image sequence formed after the face reflects the space-time coded multispectral light field; and a processing module, connected to the light field emission module and the synchronous acquisition module, the processing module being configured to: process the coded image sequence through a spectral decoding channel to obtain a first surface attribute; process the coded image sequence through a spatial decoding channel to obtain a second surface attribute; and generate, through a collaborative fusion engine, a final surface attribute of the face based on the first surface attribute and the second surface attribute, the final surface attribute being used for reconstructing and displaying the three-dimensional face.
  2. The real-time high-precision 3D face reconstruction and display system based on deep learning according to claim 1, wherein the space-time coded multispectral light field emitted by the light field emission module is formed by displaying, in preset time slices of a plurality of spectral channels, images superimposed with a preset high-frequency spatial coding pattern.
  3. The real-time high-precision 3D face reconstruction and display system based on deep learning according to claim 1, wherein the spectral decoding channel is configured to: process the coded image sequence to separate base images of different spectral channels; and estimate a material-aware surface normal of the face, as part of the first surface attribute, based on a differential operation between the base images of at least two different spectral channels; wherein the differential operation is expressed as: D(u,v) = I_B(u,v) − I_R(u,v); where (u,v) are the pixel coordinates of the image, D(u,v) is the differential image value at pixel coordinates (u,v), I_B(u,v) is the value of the blue-channel base image at pixel coordinates (u,v), and I_R(u,v) is the value of the red-channel base image at pixel coordinates (u,v).
  4. The real-time high-precision 3D face reconstruction and display system based on deep learning according to claim 3, wherein the spectral decoding channel is further configured to estimate material parameters of the face, as another part of the first surface attribute, based on the base images of the different spectral channels.
  5. The real-time high-precision 3D face reconstruction and display system based on deep learning according to claim 2, wherein the spatial decoding channel is configured to: separate a reflection pattern of the high-frequency spatial coding pattern from the coded image sequence; and estimate a geometry-aware surface normal of the face, as the second surface attribute, based on the geometric deformation between the reflection pattern and the original high-frequency spatial coding pattern.
  6. The real-time high-precision 3D face reconstruction and display system based on deep learning according to claim 1, wherein the collaborative fusion engine is configured to: generate a fusion weight map based on the material parameters in the first surface attribute; and use the fusion weight map to perform a weighted fusion of the material-aware surface normal in the first surface attribute and the geometry-aware surface normal in the second surface attribute, generating a final surface normal as part of the final surface attribute; wherein the weighted fusion is expressed as: N_final(u,v) = Norm(w(u,v)·N_mat(u,v) + (1 − w(u,v))·N_geo(u,v)); where (u,v) are the pixel coordinates corresponding to a surface point, N_final(u,v) is the final surface normal at pixel coordinates (u,v), w(u,v) is the fusion weight value at pixel coordinates (u,v), N_mat(u,v) is the material-aware surface normal at pixel coordinates (u,v), N_geo(u,v) is the geometry-aware surface normal at pixel coordinates (u,v), and Norm(·) is a vector normalization function.
  7. The real-time high-precision 3D face reconstruction and display system based on deep learning according to claim 1, wherein the processing module is further configured to apply the final surface attribute, as a dynamic texture map, to a base face mesh, so as to realize real-time rendering and display of the three-dimensional face.
  8. The real-time high-precision 3D face reconstruction and display system based on deep learning according to claim 3, wherein the base images are obtained by applying a low-pass filter to the coded image sequence.
  9. The real-time high-precision 3D face reconstruction and display system based on deep learning according to claim 2, wherein the high-frequency spatial coding pattern is one of, or a combination of, a checkerboard pattern, a random noise pattern, and a sinusoidal fringe pattern.
  10. A real-time high-precision 3D face reconstruction and display method based on deep learning, using the system according to any one of claims 1 to 9, characterized by comprising the steps of: S1, emitting a space-time coded multispectral light field toward a human face; S2, synchronously acquiring a coded image sequence formed after the face reflects the space-time coded multispectral light field; S3, processing the coded image sequence through a spectral decoding channel to obtain a first surface attribute; S4, processing the coded image sequence through a spatial decoding channel to obtain a second surface attribute; and S5, generating, through a collaborative fusion engine, a final surface attribute of the face based on the first surface attribute and the second surface attribute, and using the final surface attribute for reconstructing and displaying the three-dimensional face.
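The weighted fusion described in claim 6 can be sketched in code. The following is a minimal NumPy version, assuming (H, W, 3) unit-normal maps and an (H, W) weight map with values in [0, 1]; the function and array names are illustrative and not from the patent:

```python
import numpy as np

def fuse_normals(n_mat, n_geo, w):
    """Weighted fusion of the material-aware and geometry-aware normal maps.

    n_mat, n_geo: (H, W, 3) unit-normal maps; w: (H, W) fusion weight map.
    Returns the per-pixel normalized blend
    N_final(u,v) = Norm(w * N_mat + (1 - w) * N_geo).
    """
    blended = w[..., None] * n_mat + (1.0 - w[..., None]) * n_geo
    norm = np.linalg.norm(blended, axis=-1, keepdims=True)
    # Renormalize so every fused normal is again a unit vector.
    return blended / np.clip(norm, 1e-8, None)
```

In practice the weight map would give the material-aware normal more influence in regions (e.g. specular skin) where the spectral channel is more reliable, and fall back to the geometry-aware normal elsewhere.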

Description

Real-time high-precision 3D face reconstruction and display system based on deep learning

Technical Field

The invention relates to the technical fields of computer vision and graphics, and in particular to a real-time high-precision 3D face reconstruction and display system based on deep learning.

Background

High-fidelity three-dimensional face digitization is a key foundational technology in many application fields, such as the metaverse, virtual reality, film and television special effects, and medical cosmetology. Existing three-dimensional face reconstruction schemes fall mainly into two categories. The first is based on passive vision and typically uses a deep learning model to regress the geometry and texture of a three-dimensional face from one or more two-dimensional images. Although this scheme has low hardware requirements and can be realized with an ordinary camera, it essentially solves an ill-posed problem, and the accuracy and level of detail of the reconstruction depend heavily on the quality and coverage of a large-scale training dataset. The generated model therefore often lacks the microscopic surface details of a real face, such as pores and fine wrinkles, and its robustness degrades significantly under illumination, poses, or facial features not covered by the training data. The second category is based on active measurement, for example using dedicated hardware such as structured-light or time-of-flight sensors to obtain depth information directly. This scheme achieves higher geometric reconstruction accuracy, but its dependence on specific hardware makes the system costly and bulky, and difficult to deploy on general consumer-grade devices.
In addition, the traditional structured-light scheme encounters challenges when measuring highly reflective or dark surfaces; moreover, it focuses mainly on capturing geometric shape and falls short of synchronously capturing the dynamic material properties of facial skin as expression and illumination change.

Disclosure of the Invention

In view of the deficiencies of the prior art, the invention provides a real-time high-precision 3D face reconstruction and display system based on deep learning, which addresses the insufficient reconstruction accuracy and data dependence of passive-vision, learning-based methods, as well as the high cost and limited applicability of active-measurement methods based on dedicated hardware. To this end, a first aspect of the invention provides a real-time high-precision 3D face reconstruction and display system based on deep learning, comprising a light field emission module, a synchronous acquisition module, and a processing module. The light field emission module is configured to emit a space-time coded multispectral light field toward a human face. In one embodiment, the light field emission module is a display screen, which displays, in preset time slices of a plurality of spectral channels, images superimposed with a preset high-frequency spatial coding pattern, thereby forming the space-time coded multispectral light field. The high-frequency spatial coding pattern may be one of, or a combination of, a checkerboard pattern, a random noise pattern, and a sinusoidal fringe pattern. The synchronous acquisition module is configured to synchronize with the light field emission module and to acquire the coded image sequence formed after the face reflects the space-time coded multispectral light field. In one embodiment, the synchronous acquisition module is a standard camera.
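To make the display-screen embodiment concrete, the per-time-slice frames can be sketched as below. This is a hypothetical illustration: the checkerboard period, resolution, intensity levels, and R→G→B slice order are assumptions, not details from the patent:

```python
import numpy as np

def checkerboard(h, w, period=8):
    """High-frequency checkerboard coding pattern with values in {0, 1}."""
    yy, xx = np.mgrid[0:h, 0:w]
    return (((yy // period) + (xx // period)) % 2).astype(np.float32)

def coded_light_field_frames(h=480, w=640, period=8):
    """One frame per spectral-channel time slice: a base intensity level with
    the high-frequency checkerboard superimposed on that channel only."""
    pattern = checkerboard(h, w, period)
    frames = []
    for ch in range(3):  # successive time slices light the R, G, B channels in turn
        frame = np.zeros((h, w, 3), dtype=np.float32)
        frame[..., ch] = 0.25 + 0.5 * pattern  # base level plus coding pattern
        frames.append(frame)
    return frames
```

Cycling these frames on the screen while the camera captures one image per slice yields the coded image sequence the decoding channels operate on.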
The processing module is connected to the light field emission module and the synchronous acquisition module, and comprises a spectral decoding channel, a spatial decoding channel, and a collaborative fusion engine. The spectral decoding channel is configured to process the coded image sequence to obtain the first surface attribute. Specifically, the spectral decoding channel applies a low-pass filter to the coded image sequence to separate the base images of the different spectral channels, and then estimates a material-aware surface normal of the face based on a differential operation between the base images of at least two different spectral channels, the differential operation being expressed as: D(u,v) = I_B(u,v) − I_R(u,v); where (u,v) are the pixel coordinates of the image, D(u,v) is the differential image value at pixel coordinates (u,v), I_B(u,v) is the value of the blue-channel base image at pixel coordinates (u,v), and I_R(u,v) is the value of the red-channel base image at pixel coordinates (u,v). In addition, the spectral decoding channel estimates the material parameters of the face based on the base images of the different spectral channels. Th
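The low-pass separation and the differential operation above can be sketched as follows. This is a minimal NumPy version using a separable box filter; the kernel size and function names are illustrative assumptions, and the patent does not specify a particular filter:

```python
import numpy as np

def low_pass(img, k=5):
    """Separable box low-pass filter: strips the high-frequency coding
    pattern, leaving an approximation of the channel's base image."""
    kernel = np.ones(k, dtype=np.float64) / k
    rows = np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")

def spectral_differential(i_blue, i_red, k=5):
    """D(u,v) = I_B(u,v) - I_R(u,v), computed on the low-passed base images
    of the blue and red spectral channels."""
    return low_pass(i_blue, k) - low_pass(i_red, k)
```

The differential image D is the intermediate quantity from which the spectral decoding channel would then derive the material-aware surface normal.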