CN-116416376-B - Three-dimensional hair reconstruction method, system, electronic equipment and storage medium

CN116416376B

Abstract

The invention discloses a three-dimensional hair reconstruction method, system, electronic device and storage medium, relating to the technical fields of computer graphics and computer vision. The method comprises: acquiring an RGB portrait image to be reconstructed; and inputting the RGB portrait to be reconstructed into a three-dimensional voxel hair reconstruction model to obtain target three-dimensional voxel hair, wherein the three-dimensional voxel hair reconstruction model is obtained by training a reconstruction network with a training portrait set, a three-dimensional hair training set and texts corresponding to the three-dimensional hair training set; the reconstruction network comprises an image encoder and a trained voxel decoder; the trained voxel decoder is determined from a trained VAE-Patch_SNGAN network, which is obtained by training a VAE-Patch_SNGAN network with the three-dimensional hair training set; and a target three-dimensional hair model is reconstructed from the target three-dimensional voxel hair. The invention improves the speed and accuracy of three-dimensional hair reconstruction.

Inventors

  • Zhou Yu
  • Gao Xin
  • Rong Chenchu
  • Yu Yao

Assignees

  • Nanjing University

Dates

Publication Date
2026-05-08
Application Date
2023-03-02

Claims (7)

  1. A three-dimensional hair reconstruction method, the method comprising: acquiring an RGB portrait image to be reconstructed; inputting the RGB portrait to be reconstructed into a three-dimensional voxel hair reconstruction model to obtain target three-dimensional voxel hair, wherein the three-dimensional voxel hair reconstruction model is obtained by training a reconstruction network with a training portrait set, a three-dimensional hair training set and texts corresponding to the three-dimensional hair training set, the reconstruction network comprises an image encoder and a trained voxel decoder, the trained voxel decoder is determined from a trained VAE-Patch_SNGAN network, the trained VAE-Patch_SNGAN network is obtained by training a VAE-Patch_SNGAN network with the three-dimensional hair training set, the VAE-Patch_SNGAN network comprises a three-dimensional voxel generator and a first discriminator, and the three-dimensional voxel generator comprises a voxel encoder and a voxel decoder; and reconstructing a target three-dimensional hair model from the target three-dimensional voxel hair; wherein the training process comprises the following steps: determining the training portrait set, the three-dimensional hair training set and the texts corresponding to the three-dimensional hair training set, wherein the training portrait set comprises a plurality of training RGB portraits, and the three-dimensional hair training set comprises occupancy ground-truth values and direction ground-truth values corresponding to a plurality of initial hair-strand voxels; determining a hair segmentation mask and a hair direction map of each training RGB portrait; determining projection view parameters from the training portrait set and the three-dimensional hair training set, wherein the projection view parameters comprise a rotation matrix and a translation vector; training the VAE-Patch_SNGAN network on the three-dimensional hair training set to obtain the trained VAE-Patch_SNGAN network; constructing the reconstruction network from the image encoder and the trained voxel decoder; rendering the three-dimensional hair training set to obtain a rendered picture set; determining a rendered picture set with text descriptions from the rendered picture set and the texts corresponding to the three-dimensional hair training set; and training the reconstruction network, based on the projection view parameters, the hair segmentation masks and hair direction maps of the training RGB portraits, and the rendered picture set with text descriptions, to obtain the three-dimensional voxel hair reconstruction model; wherein training the VAE-Patch_SNGAN network on the three-dimensional hair training set to obtain the trained VAE-Patch_SNGAN network specifically comprises: constructing the VAE-Patch_SNGAN network from the three-dimensional voxel generator and the first discriminator; inputting all the initial hair-strand voxels into the VAE-Patch_SNGAN network to obtain reconstructed hair-strand voxels corresponding to the initial hair-strand voxels; determining an occupancy value and a direction value of each reconstructed hair-strand voxel; constructing a first loss function of the VAE-Patch_SNGAN network from the occupancy value and direction value of each reconstructed hair-strand voxel and the occupancy ground-truth value and direction ground-truth value corresponding to each initial hair-strand voxel; and fine-tuning the VAE-Patch_SNGAN network based on the first loss function to obtain the trained VAE-Patch_SNGAN network.
  2. The three-dimensional hair reconstruction method according to claim 1, wherein determining the projection view parameters from the training portrait set and the three-dimensional hair training set comprises: determining a plurality of 2D face key points in each training RGB portrait; and determining the projection view parameters from all the 2D face key points and corresponding 3D face key points, wherein the 3D face key points are 3D face key points of a standard human body aligned with the hair strands corresponding to the hair-strand voxels in the three-dimensional hair training set.
  3. The three-dimensional hair reconstruction method according to claim 1, wherein training the reconstruction network, based on the projection view parameters, the hair segmentation masks and hair direction maps of the training RGB portraits, and the rendered picture set with text descriptions, to obtain the three-dimensional voxel hair reconstruction model specifically comprises: inputting the hair segmentation mask and hair direction map of each training RGB portrait into the reconstruction network to obtain a plurality of reconstructed three-dimensional voxel hairs; determining an occupancy value and a direction value of each reconstructed three-dimensional voxel hair; judging whether each rendered picture with a text description in the rendered picture set has a three-dimensional ground truth; if so, constructing a second loss function of the reconstruction network from the occupancy ground-truth value and direction ground-truth value corresponding to the rendered picture with the text description and the occupancy value and direction value of each reconstructed three-dimensional voxel hair, and fine-tuning the reconstruction network based on the second loss function to obtain the three-dimensional voxel hair reconstruction model; and if not, inputting the reconstructed three-dimensional voxel hair corresponding to the rendered picture with the text description but without a three-dimensional ground truth into a differentiable volume renderer, rendering based on the projection view parameters to obtain a rendered hair segmentation mask and a rendered hair direction map, constructing a third loss function of the reconstruction network from the hair segmentation mask and hair direction map of the training RGB portrait and the rendered hair segmentation mask and rendered hair direction map, and fine-tuning the reconstruction network based on the third loss function to obtain the three-dimensional voxel hair reconstruction model.
  4. The three-dimensional hair reconstruction method according to claim 1, wherein determining the three-dimensional hair training set comprises: acquiring an initial three-dimensional hair set, wherein the initial three-dimensional hair set comprises a plurality of groups of initial hair point clouds; and converting each group of initial hair point clouds into initial hair-strand voxels, and determining an occupancy value and a hair direction value of each initial hair-strand voxel, thereby determining the three-dimensional hair training set.
  5. A three-dimensional hair reconstruction system, the system comprising: an RGB portrait acquisition module, configured to acquire an RGB portrait to be reconstructed; a target three-dimensional voxel hair determining module, configured to input the RGB portrait to be reconstructed into a three-dimensional voxel hair reconstruction model to obtain target three-dimensional voxel hair, wherein the three-dimensional voxel hair reconstruction model is obtained by training a reconstruction network with a training portrait set, a three-dimensional hair training set and texts corresponding to the three-dimensional hair training set, the reconstruction network comprises an image encoder and a trained voxel decoder, the trained voxel decoder is determined from a trained VAE-Patch_SNGAN network, the trained VAE-Patch_SNGAN network is obtained by training a VAE-Patch_SNGAN network with the three-dimensional hair training set, the VAE-Patch_SNGAN network comprises a three-dimensional voxel generator and a first discriminator, and the three-dimensional voxel generator comprises a voxel encoder and a voxel decoder; and a target three-dimensional hair reconstruction module, configured to reconstruct a target three-dimensional hair model from the target three-dimensional voxel hair; wherein the training process comprises the following steps: determining the training portrait set, the three-dimensional hair training set and the texts corresponding to the three-dimensional hair training set, wherein the training portrait set comprises a plurality of training RGB portraits, and the three-dimensional hair training set comprises occupancy ground-truth values and direction ground-truth values corresponding to a plurality of initial hair-strand voxels; determining a hair segmentation mask and a hair direction map of each training RGB portrait; determining projection view parameters from the training portrait set and the three-dimensional hair training set, wherein the projection view parameters comprise a rotation matrix and a translation vector; training the VAE-Patch_SNGAN network on the three-dimensional hair training set to obtain the trained VAE-Patch_SNGAN network; constructing the reconstruction network from the image encoder and the trained voxel decoder; rendering the three-dimensional hair training set to obtain a rendered picture set; determining a rendered picture set with text descriptions from the rendered picture set and the texts corresponding to the three-dimensional hair training set; and training the reconstruction network, based on the projection view parameters, the hair segmentation masks and hair direction maps of the training RGB portraits, and the rendered picture set with text descriptions, to obtain the three-dimensional voxel hair reconstruction model; wherein training the VAE-Patch_SNGAN network on the three-dimensional hair training set to obtain the trained VAE-Patch_SNGAN network specifically comprises: constructing the VAE-Patch_SNGAN network from the three-dimensional voxel generator and the first discriminator; inputting all the initial hair-strand voxels into the VAE-Patch_SNGAN network to obtain reconstructed hair-strand voxels corresponding to the initial hair-strand voxels; determining an occupancy value and a direction value of each reconstructed hair-strand voxel; constructing a first loss function of the VAE-Patch_SNGAN network from the occupancy value and direction value of each reconstructed hair-strand voxel and the occupancy ground-truth value and direction ground-truth value corresponding to each initial hair-strand voxel; and fine-tuning the VAE-Patch_SNGAN network based on the first loss function to obtain the trained VAE-Patch_SNGAN network.
  6. An electronic device, comprising: one or more processors; and a storage device having one or more programs stored thereon, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the three-dimensional hair reconstruction method according to any one of claims 1 to 4.
  7. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the three-dimensional hair reconstruction method according to any one of claims 1 to 4.
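To make the "first loss function" of claim 1 concrete, the following is a minimal illustrative sketch in Python/NumPy. The patent text only states that the loss is built from per-voxel occupancy and direction values and their ground truths; the specific choice of binary cross-entropy on occupancy plus an L1 term on direction restricted to occupied voxels is an assumption, and the adversarial and VAE (KL) terms of the full VAE-Patch_SNGAN objective are omitted here. All names (`first_loss`, `w_dir`) are hypothetical.

```python
import numpy as np

def first_loss(occ_pred, dir_pred, occ_true, dir_true, w_dir=1.0, eps=1e-7):
    """Sketch of a voxel reconstruction loss: BCE on occupancy plus an
    L1 penalty on the 3D direction field, supervised only where the
    ground-truth occupancy says hair exists (an assumed formulation).
    occ_*: (...,) occupancy grids in [0, 1]; dir_*: (..., 3) direction grids.
    """
    # Binary cross-entropy between predicted and ground-truth occupancy.
    p = np.clip(occ_pred, eps, 1 - eps)
    bce = -np.mean(occ_true * np.log(p) + (1 - occ_true) * np.log(1 - p))

    # Direction error is only meaningful inside truly occupied voxels.
    mask = occ_true > 0.5
    l1 = np.mean(np.abs(dir_pred[mask] - dir_true[mask])) if mask.any() else 0.0

    return bce + w_dir * l1
```

A perfect prediction drives both terms to (numerically) zero, while flipping the occupancy grid makes the BCE term dominate, which is the qualitative behavior the claim's fine-tuning step relies on.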

Description

Three-dimensional hair reconstruction method, system, electronic equipment and storage medium

Technical Field

The invention relates to the technical fields of computer graphics and computer vision, and in particular to a three-dimensional hair reconstruction method, a three-dimensional hair reconstruction system, electronic equipment and a storage medium.

Background

Hair is an important appearance feature of humans. With the rapid development of industries such as games in recent years, digital virtual human modeling has become increasingly important. Because of the geometric complexity of hair, creating high-quality three-dimensional hair is one of the most time-consuming tasks in modeling high-quality virtual human characters, and much effort has therefore been devoted to automating three-dimensional hair modeling.

Existing methods for reconstructing hair geometry from pictures mainly follow two technical schemes: multi-view and single-view. Multi-view modeling schemes draw on traditional multi-view reconstruction techniques: depth values are solved by feature matching to obtain a rough hair point cloud, and hair is then constructed from the point cloud. Such schemes mostly resort to expensive hardware, such as hair capture systems using omnidirectional controllable light sources, or RGBD images from Kinect scanning. The multi-view scheme can reconstruct more accurate hair geometry, but the equipment is complex and the algorithms are slow, so it is unfriendly to ordinary users and difficult to deploy widely.

Single-view modeling schemes take an RGB portrait image as input and use data-driven or deep-learning methods. The data-driven approach retrieves the most similar hairstyle by comparing the input image with the hairstyles in a database, then further optimizes the retrieved hairstyle to approach the input image. This approach is time-consuming, depends heavily on the database, and has limited ability to generate new data. The deep-learning approach constructs a supervised dataset and learns the mapping between two-dimensional pictures and three-dimensional hairstyles through a neural network. The supervised dataset, i.e. real hairstyle pictures paired with corresponding three-dimensional hairstyles, is often difficult to acquire, so virtual datasets are mostly constructed by rendering the three-dimensional hair in a database. However, three-dimensional hair data are scarce, and rendered pictures differ from real pictures, which makes the network harder to use at inference time. In addition, hair geometry, unlike a human face, cannot be fully represented by a surface mesh, so during differentiable rendering the backward gradient can only be propagated to the surface, which complicates unsupervised learning of the network. The single-view method has the advantage of being user-friendly, and can reconstruct a reasonable hair structure directly from an image; in particular, neural-network-based methods can generate hairstyles that do not exist in the dataset but are similar to it, and their inference is fast. The disadvantage is that usually only the general hair shape and structure is recovered, details are lacking, and hair types absent from the training data (e.g., braids) are difficult to model. The existing three-dimensional hair reconstruction methods therefore suffer from low reconstruction speed and low accuracy.

Disclosure of Invention

The invention aims to provide a three-dimensional hair reconstruction method, a three-dimensional hair reconstruction system, electronic equipment and a storage medium that improve the speed and accuracy of three-dimensional hair reconstruction.
In order to achieve the above object, the present invention provides the following solutions: a three-dimensional hair reconstruction method, the method comprising: acquiring an RGB portrait image to be reconstructed; and inputting the RGB portrait to be reconstructed into a three-dimensional voxel hair reconstruction model to obtain target three-dimensional voxel hair, wherein the three-dimensional voxel hair reconstruction model is obtained by training a reconstruction network with a training portrait set, a three-dimensional hair training set and texts corresponding to the three-dimensional hair training set; the reconstruction network comprises an image encoder and a trained voxel decoder; the trained voxel decoder is determined from a trained VAE-Patch_SNGAN network; the trained VAE-Patch_SNGAN network is obtained by training a VAE-Patch_SNGAN network with the three-dimensional hair training set; and the VAE-Patch_SNGAN network comprises a three-dimensional voxel generator and a first discriminator, the three-dimensional voxel generator comprising a voxel encoder and a voxel decoder.
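The point-cloud-to-voxel conversion described in claim 4 (assigning each initial hair-strand voxel an occupancy value and a hair direction value) can be sketched as follows. This is an illustrative Python/NumPy implementation under assumed conventions; the patent does not specify the grid resolution, the bounding volume, or how multiple strand points falling in one voxel are combined (averaging followed by renormalization is assumed here), and all names (`voxelize_hair`, `grid`, `bounds`) are hypothetical.

```python
import numpy as np

def voxelize_hair(points, directions, grid=32, bounds=(-1.0, 1.0)):
    """Convert a hair point cloud with per-point strand directions into
    (occupancy grid, per-voxel unit direction field).
    points: (N, 3) coordinates inside `bounds`; directions: (N, 3) vectors.
    """
    lo, hi = bounds
    # Map each point to integer voxel indices in a grid^3 volume.
    idx = ((points - lo) / (hi - lo) * grid).astype(int)
    idx = np.clip(idx, 0, grid - 1)

    occ = np.zeros((grid, grid, grid), dtype=np.float32)
    dirs = np.zeros((grid, grid, grid, 3), dtype=np.float32)
    counts = np.zeros((grid, grid, grid), dtype=np.float32)

    for (i, j, k), d in zip(idx, directions):
        occ[i, j, k] = 1.0      # occupancy ground-truth value
        dirs[i, j, k] += d      # accumulate strand directions per voxel
        counts[i, j, k] += 1.0

    # Average accumulated directions and renormalize to unit length.
    mask = counts > 0
    dirs[mask] /= counts[mask][:, None]
    norms = np.linalg.norm(dirs[mask], axis=-1, keepdims=True)
    dirs[mask] = np.where(norms > 1e-8, dirs[mask] / norms, dirs[mask])
    return occ, dirs
```

The resulting occupancy and direction grids correspond to the "occupancy ground-truth values and direction ground-truth values" that claims 1 and 4 attach to the three-dimensional hair training set.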