CN-120107481-B - Single-image three-dimensional head reconstruction method and system based on guided diffusion model


Abstract

The invention discloses a single-image three-dimensional head reconstruction method and system based on a guided diffusion model. The method comprises: obtaining a head portrait from a single image to be reconstructed; reconstructing a three-dimensional head model from the head portrait; mapping the three-dimensional head model onto the head portrait by weak projection under a predicted camera matrix, and using the pixel information of the head portrait to obtain a three-dimensional head model with front texture; performing cylindrical UV expansion on the three-dimensional head model with front texture to obtain a two-dimensional UV texture map and recording the UV mapping relation; inputting the two-dimensional UV texture map into a guided diffusion network model to predict the texture of the invisible area, and repairing it to obtain a complete two-dimensional UV texture map; and pasting the complete two-dimensional UV texture map back onto the three-dimensional head model. The invention can reconstruct a high-fidelity, artifact-free three-dimensional head model with rich and vivid texture details from a single portrait image of individuals of different ages, sexes, races and facial expressions.
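
The weak projection and front-texture steps summarized above can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the shared scale `f / z_ref`, the principal point `(cx, cy)` and the use of bilinear interpolation as the "neighborhood" color rule are all assumptions made for illustration, and the image-axis flip that a real camera matrix would apply is omitted. Only the front-facing threshold `Z > 0.5` comes from claim 3.

```python
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Color = Tuple[float, float, float]


def weak_project(points: Sequence[Point3], f: float, z_ref: float,
                 cx: float, cy: float) -> List[Tuple[float, float]]:
    """Weak-perspective projection: all points share one scale f / z_ref.

    z_ref (a reference depth) and (cx, cy) are assumed parameters.
    """
    s = f / z_ref
    return [(s * X + cx, s * Y + cy) for X, Y, _ in points]


def bilinear_sample(image: Sequence[Sequence[Color]], x: float, y: float) -> Color:
    """Blend the four pixels around (x, y); image[row][col] is an RGB tuple."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)  # clamp to the image bounds
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0

    def lerp(a: Color, b: Color, t: float) -> Color:
        return tuple(p + (q - p) * t for p, q in zip(a, b))

    top = lerp(image[y0][x0], image[y0][x1], ax)
    bottom = lerp(image[y1][x0], image[y1][x1], ax)
    return lerp(top, bottom, ay)


def color_front_vertices(points: Sequence[Point3],
                         image: Sequence[Sequence[Color]],
                         f: float, z_ref: float, cx: float, cy: float,
                         z_front: float = 0.5) -> List[Optional[Color]]:
    """Color vertices with Z > z_front; None marks vertices left for inpainting."""
    pixels = weak_project(points, f, z_ref, cx, cy)
    return [bilinear_sample(image, x, y) if p[2] > z_front else None
            for p, (x, y) in zip(points, pixels)]
```

Vertices that receive `None` correspond to the invisible area whose texture the guided diffusion network model is later asked to predict.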

Inventors

LIU LEYUAN
CHEN XIAOQING
QIAN YUFEI
CHEN JINGYING
LIU SANNVYA
YANG ZONGKAI

Assignees

Central China Normal University (华中师范大学)

Dates

Publication Date: 20260508
Application Date: 20250221

Claims (10)

1. A single-image three-dimensional head reconstruction method based on a guided diffusion model, characterized by comprising the following steps: preprocessing a single image to be reconstructed to obtain a head portrait; reconstructing a three-dimensional head model from the head portrait, optimizing the reconstructed three-dimensional head model, and recovering smooth surface details; mapping the three-dimensional head model with restored smooth surface details onto the head portrait by weak projection under a predicted camera matrix, and using pixel information of the head portrait as texture information of the corresponding three-dimensional points, thereby obtaining a three-dimensional head model with front texture; performing cylindrical UV expansion on the three-dimensional head model with front texture to obtain a two-dimensional UV texture map, and recording the UV mapping relation; inputting the two-dimensional UV texture map into a trained guided diffusion network model to perform conditional prediction of the texture of the invisible area in the two-dimensional UV texture map, and repairing it to obtain a complete two-dimensional UV texture map; and pasting the complete two-dimensional UV texture map back onto the three-dimensional head model with front texture by using the UV mapping relation.
2. The guided diffusion model-based single-image three-dimensional head reconstruction method according to claim 1, wherein the step of pasting the complete two-dimensional UV texture map back onto the three-dimensional head model with front texture further comprises the step of inputting the complete two-dimensional UV texture map into a super-resolution network model to improve the resolution of the complete two-dimensional UV texture map.
3. The guided diffusion model-based single-image three-dimensional head reconstruction method according to claim 1, wherein mapping the three-dimensional head model with restored smooth surface details onto the head portrait by weak projection under the predicted camera matrix, and using pixel information of the head portrait as texture information of the corresponding three-dimensional points, thereby obtaining the three-dimensional head model with front texture, comprises the steps of: performing interpolation processing on the three-dimensional head model with restored smooth surface details, denoting the coordinates of the three-dimensional points of the interpolated three-dimensional head model as V (X, Y, Z), and taking the three-dimensional points whose value on the Z-axis is greater than 0.5 as three-dimensional points whose front texture is to be determined, and the other three-dimensional points as the invisible area whose texture is to be predicted; mapping the three-dimensional points V (X, Y, Z) in three-dimensional space onto the head portrait through the weak projection mapping transformation under the predicted camera matrix to obtain the two-dimensional pixel coordinate point v (x, y) corresponding to each three-dimensional point V (X, Y, Z); and taking the pixel information of the two-dimensional pixel coordinate point v (x, y) of the head portrait as texture information of the corresponding three-dimensional point V (X, Y, Z), thereby obtaining the three-dimensional head model with front texture.
4. The guided diffusion model-based single-image three-dimensional head reconstruction method according to claim 3, wherein the calculation formula of the weak projection mapping transformation is: [formula not reproduced in the source text], where f is the camera focal length.
5. The guided diffusion model-based single-image three-dimensional head reconstruction method according to claim 3, wherein taking the pixel information of the two-dimensional pixel coordinate point v (x, y) of the head portrait as texture information of the corresponding three-dimensional point V (X, Y, Z) comprises the steps of: extracting neighborhood information of the two-dimensional pixel coordinate point v (x, y) of the head portrait, determining the color of the two-dimensional pixel coordinate point v (x, y) according to that neighborhood information, and denoting the color as C_{x,y}; and taking the color C_{x,y} of the two-dimensional pixel coordinate point v (x, y) as texture information of the corresponding three-dimensional point V (X, Y, Z).
6. The guided diffusion model-based single-image three-dimensional head reconstruction method according to claim 1, wherein the cylindrical UV expansion of the three-dimensional head model with front texture comprises the steps of: performing cylindrical UV expansion on the three-dimensional head model with front texture, and denoting the point in UV space corresponding to the three-dimensional point V (X, Y, Z) after cylindrical UV expansion as T (u, v); and rendering the color of each triangular patch of the three-dimensional head model with front texture onto the corresponding UV color area in UV space through diffuse reflection, the calculation formula being: [formula not reproduced in the source text], where R(·) represents diffuse reflection rendering, area(u_{1...n}, v_{1...n}) represents a color region enclosed by a plurality of UV coordinates in UV space, area(V_{1...n}) represents a triangular patch color region in the three-dimensional head model with front texture, [symbol not reproduced] represents the two-dimensional UV texture map, and [symbol not reproduced] represents the front texture of the three-dimensional head model.
7. The guided diffusion model-based single-image three-dimensional head reconstruction method according to claim 6, wherein the UV mapping relation is: [formula not reproduced in the source text], where ρ is the distance from the three-dimensional point V (X, Y, Z) to the Z-axis, θ is the angle of counterclockwise rotation from the positive X-axis direction to the projection of the point (X, Y) on the xy-plane, Z_min is the minimum value of the three-dimensional points of the three-dimensional head model in the Z-axis direction, and Z_max is the maximum value of the three-dimensional points of the three-dimensional head model in the Z-axis direction.
8. The guided diffusion model-based single-image three-dimensional head reconstruction method according to claim 1, wherein the training of the guided diffusion network model comprises the steps of: obtaining a two-dimensional UV texture map from a three-dimensional head model sample through UV mapping, exposing different proportions of the two-dimensional UV texture map to form a training picture sample set, and simultaneously obtaining the mask of the invisible area corresponding to each training picture; and training the diffusion network model with the training picture sample set.
9. The guided diffusion model-based single-image three-dimensional head reconstruction method according to claim 8, wherein the loss function of the diffusion network model is: [formula not reproduced in the source text], where x is a training picture, P is the step set during denoising, γ represents the current noise level, [symbol not reproduced] represents the added noise predicted by the diffusion network model from x and y, ε represents the noise added at each time step, [symbol not reproduced] represents the loss obtained by fitting the noise image [symbol not reproduced] to the original training picture x after denoising at each time step, Z_{ε,γ} represents the predicted noise of the current time step, the step P indicates that the denoising process runs from step 1 and stops at step P, and T indicates matrix transposition.
10. A guided diffusion model-based single-image three-dimensional head reconstruction system, comprising: a preprocessing module, used for preprocessing the single image to be reconstructed to obtain a head portrait; a mesh reconstruction module, used for reconstructing a three-dimensional head model from the head portrait, optimizing the reconstructed three-dimensional head model, and recovering smooth surface details; a weak projection coloring module, used for mapping the three-dimensional head model with restored smooth surface details onto the head portrait by weak projection under a predicted camera matrix, and taking pixel information of the head portrait as texture information of the corresponding three-dimensional points, so as to obtain a three-dimensional head model with front texture; a cylindrical UV mapping module, used for performing cylindrical UV expansion on the three-dimensional head model with front texture to obtain a two-dimensional UV texture map and recording the UV mapping relation; and a UV texture map guided repair module, used for inputting the two-dimensional UV texture map into a trained guided diffusion network model to perform conditional prediction of the texture of the invisible area in the two-dimensional UV texture map, repairing it to obtain a complete two-dimensional UV texture map, and pasting the complete two-dimensional UV texture map back onto the three-dimensional head model with front texture by using the UV mapping relation.
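
As a hedged illustration of the cylindrical UV expansion and of the invisible-area mask that the claims refer to, the sketch below maps each vertex to a UV coordinate and writes the known front colors into a small texture. The mapping used here, u = θ / 2π and v = (Z − Z_min) / (Z_max − Z_min), is an assumption built only from the symbols defined in claim 7; the claimed formula itself is not reproduced in this text and may differ. The nearest-texel write is likewise a simplification standing in for the diffuse reflection rendering of triangular patches in claim 6, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]
Color = Tuple[float, float, float]


def cylindrical_uv(points: Sequence[Point3]) -> List[Tuple[float, float]]:
    """Assumed mapping: u from the angle around the Z-axis, v from height."""
    zs = [p[2] for p in points]
    z_min, z_max = min(zs), max(zs)
    span = (z_max - z_min) or 1.0  # avoid division by zero for flat input
    uvs = []
    for X, Y, Z in points:
        # Counterclockwise angle from the positive X-axis, wrapped to be non-negative.
        theta = math.atan2(Y, X) % (2.0 * math.pi)
        uvs.append((theta / (2.0 * math.pi), (Z - z_min) / span))
    return uvs


def splat_to_texture(uvs: Sequence[Tuple[float, float]],
                     colors: Sequence[Optional[Color]],
                     size: int):
    """Write known vertex colors to their nearest texel.

    Returns (texture, mask); mask value 1 marks texels still to be predicted.
    """
    texture = [[(0.0, 0.0, 0.0)] * size for _ in range(size)]
    mask = [[1] * size for _ in range(size)]
    for (u, v), color in zip(uvs, colors):
        if color is None:  # vertex belongs to the invisible area
            continue
        col = min(int(u * size), size - 1)
        row = min(int((1.0 - v) * size), size - 1)  # v = 1 at the top row
        texture[row][col] = color
        mask[row][col] = 0
    return texture, mask
```

The `(texture, mask)` pair has the same shape as the training inputs described in claim 8: a partially exposed UV texture map together with the mask of its invisible area.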

Description

Single-image three-dimensional head reconstruction method and system based on guided diffusion model

Technical Field

The invention belongs to the fields of computer vision and computer graphics, and in particular relates to a single-image three-dimensional head reconstruction method and system based on a guided diffusion model.

Background

Single-image three-dimensional head reconstruction refers to automatically generating, by an algorithm, a three-dimensional head model represented as a mesh from a single portrait image. With the rise of the metaverse, three-dimensional head reconstruction shows great application potential. It also has broad application prospects in film and television production, game entertainment, intelligent education and other fields.
At present, there are roughly two ways to obtain a three-dimensional head model: (1) manual modeling, or reconstruction based on a high-precision three-dimensional scanner and a stereoscopic vision system; and (2) reconstruction based on deep learning. Mode (1) can yield a high-precision three-dimensional head model, but the equipment used (such as a laser-ranging three-dimensional scanner) is expensive, hard to operate, and difficult to popularize; moreover, the subject easily moves during scanning, which introduces noise and leaves the scanned three-dimensional head model incomplete. Mode (2) reconstructs the three-dimensional head model directly from a single portrait image by deep learning, which greatly reduces cost and greatly improves ease of operation.
In recent years, researchers have proposed a series of deep-learning methods that learn prior knowledge from data, but most of this work focuses on geometry reconstruction and pays little attention to reconstructing the texture of the entire head. Although the prior art has proposed methods for estimating the three-dimensional shape and texture details of the head from unconstrained input (e.g., in-the-wild images), many problems remain. An ideal reconstruction should be high-fidelity and artifact-free: it should faithfully convey the head pose and texture details of the person in the image, and the generated three-dimensional head model should be a complete head, without holes or missing parts, and free of non-human head shapes and other artifacts. Existing texture reconstruction methods fall mainly into two types: methods based on a three-dimensional mesh and methods based on novel view synthesis. Early researchers proposed reconstructing a three-dimensional head model on the basis of a 3DMM (three-dimensional morphable model) combined with UV texture mapping (UV being the two-dimensional texture coordinates corresponding to the vertices of the geometry). However, the 3DMM itself is limited: it can only make rough geometric estimates and uses a uniform template. Even when the full UV texture can be restored, the fixed template makes the result differ markedly from a real human head, and the hair is not fully considered in the reconstruction process. In addition, there is a method that reconstructs a mesh from input image features and repairs defective textures using a diffusion model.
Although current diffusion models show some effectiveness in repairing UV maps, prior research still fails to achieve the desired effect and does not consider the invisible area at the back of the head during UV repair. Another line of work mainly uses image synthesis to generate the head at each view angle, but only a small portion of these approaches can produce a full 360-degree view, and they suffer from low resolution, incomplete images, and generated results that are inconsistent with reality. In summary, although single-image three-dimensional head texture reconstruction has advanced to some extent in recent years, the following problems remain. (1) The reconstructed three-dimensional model has rough texture and low fidelity. Although the texture of the three-dimensional head model reconstructed by some methods looks very realistic, it is difficult to match the identity of the person in the input image. (2) The texture of the area invisible in the input image cannot be reconstructed, or is reconstructed with little detail and poor precision. Although some methods can restore facial texture, their expressive ability is weak for the back texture outside the facial region. (3) The generalization capability of th