
US-12626531-B2 - Systems, methods and media for deep shape prediction


Abstract

Exemplary embodiments include a computer-implemented method of training a neural network for facial reconstruction including collecting a set of 3D head scans, combining each feature of each 3D head scan with a weight to create a modified set of 3D head scans, training the neural network using the modified set of head scans, and inputting a real digital facial image into the neural network for facial reconstruction. Further exemplary embodiments include the set of 3D head scans comprising approximately a tenth or less in quantity in comparison to a quantity of the modified set of 3D head scans. The modified set of 3D head scans may comprise features found in the set of 3D head scans or the modified set of 3D head scans may consist of features found in the set of 3D head scans.

Inventors

  • Verónica Costa Teixeira Pinto Orvalho
  • Alexis Paul Benoit Roche
  • Mariana Ribeiro Dias

Assignees

  • Didimo, Inc.

Dates

Publication Date
2026-05-12
Application Date
2023-01-24

Claims (16)

  1. A computer-implemented method of training a neural network for facial reconstruction comprising: collecting a synthetic dataset of non-existing 3D head scans produced by randomly sampling a 3D morphable model and rendered under diverse lighting conditions, camera settings, and pose conditions, wherein the diverse lighting conditions comprise rendering using high dynamic range images (HDRIs); combining each feature of each 3D head scan with a projected weight to create a modified set of 3D head scans; measuring an error between the projected weight and an actual weight; adjusting neural network weights for the error and repeating the measuring and adjusting until the error converges or is near or at zero; training the neural network using the modified set of 3D head scans; and inputting a real digital facial image into the neural network for the facial reconstruction.
  2. The computer-implemented method of claim 1, the synthetic dataset of non-existing 3D head scans comprising approximately a tenth or less in quantity in comparison to a quantity of the modified set of 3D head scans.
  3. The computer-implemented method of claim 1, wherein the modified set of 3D head scans comprises features found in the synthetic dataset of non-existing 3D head scans.
  4. The computer-implemented method of claim 1, the facial reconstruction resulting in an estimate of a subject's head geometry based on a weighted sum of a plurality of individual modified 3D head scans.
  5. The computer-implemented method of claim 1, the facial reconstruction performed without including a face of an actual human in the modified set of 3D head scans.
  6. The computer-implemented method of claim 1, the facial reconstruction including recognition of a feature on the modified set of 3D head scans.
  7. The computer-implemented method of claim 6, the feature being a dimension of a nose.
  8. The computer-implemented method of claim 6, the feature being a dimension of an ear.
  9. The computer-implemented method of claim 1, the facial reconstruction resulting in an estimate of a subject's jawline shape.
  10. The computer-implemented method of claim 1, the facial reconstruction resulting in an estimate of a thickness of a subject's lip.
  11. The computer-implemented method of claim 1, further comprising combining each feature of each 3D head scan with the projected weight, wherein the projected weight is randomly sampled from a 3D morphable-model parameter distribution derived from principal-component analysis of a training dataset, to create a modified set of 3D head scans.
  12. The computer-implemented method of claim 11, wherein the error is computed using a loss function that combines prediction error of the 3D morphable-model parameter distribution and prediction error of a concurrently predicted normal map.
  13. The computer-implemented method of claim 12, further comprising adjusting the neural network's weights for the error, wherein the adjustment is performed by an Adam optimizer executing with a fixed learning rate.
  14. The computer-implemented method of claim 13, further comprising stopping the method when the error converges.
  15. The computer-implemented method of claim 13, further comprising stopping the method when the error is near or at zero.
  16. The computer-implemented method of claim 1, the facial reconstruction resulting in an estimate of a subject's shape of a face.
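The synthetic-dataset step recited in claim 1 can be illustrated with a short sketch. All function names and array shapes below are hypothetical, and the rendering under HDRI lighting, camera, and pose conditions is omitted; the sketch only shows randomly sampling a 3D morphable model's parameter distribution to synthesize non-existing head shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_synthetic_heads(mean_shape, components, stddevs, n_samples, rng=rng):
    """Sample 3DMM parameters at random to produce synthetic (non-existing) heads.

    mean_shape: (V, 3) mean head-mesh vertex positions.
    components: (K, V, 3) principal-component deformations from PCA.
    stddevs:    (K,) per-component standard deviations (the PCA parameter distribution).
    Returns (params, heads): (n_samples, K) parameters, (n_samples, V, 3) meshes.
    """
    K = components.shape[0]
    # Draw each parameter from the Gaussian distribution implied by PCA.
    params = rng.normal(0.0, 1.0, size=(n_samples, K)) * stddevs
    # Each synthetic head = mean shape + weighted sum of component deformations.
    heads = mean_shape + np.einsum("nk,kvd->nvd", params, components)
    return params, heads

# Toy 3DMM with 4 components over a 10-vertex mesh.
mean = np.zeros((10, 3))
comps = rng.normal(size=(4, 10, 3))
sds = np.array([3.0, 2.0, 1.0, 0.5])
params, heads = sample_synthetic_heads(mean, comps, sds, n_samples=6)
```

Each sampled parameter vector would then drive a renderer to produce one synthetic training image.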

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This U.S. Non-Provisional patent application claims the priority benefit of U.S. Provisional Patent Application Ser. No. 63/303,194, filed on Jan. 26, 2022 and titled "Systems, Methods and Media for Deep Shape Prediction," the entirety of which, including all appendices, is hereby incorporated by reference.

FIELD OF TECHNOLOGY

Exemplary embodiments pertain to facial reconstruction.

SUMMARY

Exemplary embodiments include a computer-implemented method of training a neural network for facial reconstruction, including collecting a set of 3D head scans, combining each feature of each 3D head scan with a weight to create a modified set of 3D head scans, training the neural network using the modified set of 3D head scans, and inputting a real digital facial image into the neural network for facial reconstruction. Further exemplary embodiments include the set of 3D head scans comprising approximately a tenth or less in quantity in comparison to the quantity of the modified set of 3D head scans. The modified set of 3D head scans may comprise features found in the set of 3D head scans, or the modified set of 3D head scans may consist of features found in the set of 3D head scans. The facial reconstruction may result in an estimate of a subject's head geometry based on a weighted sum of a plurality of individual modified 3D head scans. The facial reconstruction may result in an estimate of the shape of a subject's face. The facial reconstruction may be performed without including a face of an actual human in the modified set of 3D head scans, and the reconstruction may include recognition of a feature on the modified set of 3D head scans. The feature may be a dimension of a nose, a dimension of an ear, and/or other dimensions. The facial reconstruction may result in an estimate of a subject's jawline shape, an estimate of a thickness of a subject's lip, and/or other estimates.
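The weighted-sum reconstruction mentioned above can be sketched in a few lines. The function name and array shapes are hypothetical assumptions; the sketch only shows estimating head geometry as a weighted sum of individual modified 3D head scans, assuming each scan is stored as a (V, 3) vertex array.

```python
import numpy as np

def reconstruct_head(weights, modified_scans):
    """Estimate head geometry as a weighted sum of modified 3D head scans.

    weights:        (N,) blend weights, e.g. predicted by the network.
    modified_scans: (N, V, 3) vertex positions of N modified head scans.
    Returns a (V, 3) estimated head geometry.
    """
    return np.einsum("n,nvd->vd", weights, modified_scans)

# Toy example: blend three constant-valued "scans" with weights summing to 1.
scans = np.stack([np.full((5, 3), float(i)) for i in range(3)])
w = np.array([0.2, 0.3, 0.5])
head = reconstruct_head(w, scans)  # every vertex coordinate = 0*0.2 + 1*0.3 + 2*0.5
```

With weights constrained to sum to one, the estimate stays within the convex hull of the individual scans.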
Additionally, combining each feature of each 3D head scan with a projected weight may create a modified set of 3D head scans. The error between the projected weight and an actual weight may be measured, and the neural network's weights may be adjusted for the error. The method may be stopped when the error converges and/or when the error is near or at zero.

BRIEF DESCRIPTION OF THE FIGURES

In the description, for purposes of explanation and not limitation, specific details are set forth, such as particular embodiments, procedures, techniques, etc., to provide a thorough understanding of the present technology. However, it will be apparent to one skilled in the art that the present technology may be practiced in other embodiments that depart from these specific details. The accompanying drawings, where like reference numerals refer to identical or functionally similar elements throughout the separate views, together with the detailed description below, are incorporated in and form part of the specification, and serve to further illustrate embodiments of concepts that include the claimed disclosure and explain various principles and advantages of those embodiments. The methods and systems disclosed herein have been represented where appropriate by conventional symbols in the drawings, showing only those specific details that are pertinent to understanding the embodiments of the present disclosure, so as not to obscure the disclosure with details that will be readily apparent to those of ordinary skill in the art having the benefit of the description herein.

FIG. 1 shows exemplary synthetic samples.
FIG. 2A shows a flow chart showcasing a process described herein.
FIG. 2B shows another flow chart showcasing a process described herein.
FIG. 3 shows an exemplary deep neural network.
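The measure-and-adjust loop described in the summary can be sketched with a toy stand-in model. The linear model, array shapes, learning rate, and tolerances below are all illustrative assumptions, not the patented implementation; the sketch only shows a combined loss (3DMM-parameter prediction error plus normal-map prediction error) minimized by an Adam optimizer running with a fixed learning rate, stopping when the error converges or is near zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: flattened renders -> actual 3DMM weights and flattened normal maps.
X = rng.normal(size=(64, 16))       # 64 synthetic renders, 16 features each
P_true = rng.normal(size=(64, 4))   # actual 3DMM weights per sample
N_true = rng.normal(size=(64, 8))   # ground-truth normal maps per sample

# Linear stand-in for the deep network: one weight matrix per prediction head.
params = {"Wp": np.zeros((16, 4)), "Wn": np.zeros((16, 8))}

def loss_and_grads(Wp, Wn):
    # Combined loss: parameter-prediction error plus normal-map error.
    Ep = X @ Wp - P_true
    En = X @ Wn - N_true
    loss = np.mean(Ep**2) + np.mean(En**2)
    gWp = 2.0 * X.T @ Ep / Ep.size
    gWn = 2.0 * X.T @ En / En.size
    return loss, gWp, gWn

# Adam optimizer with a fixed learning rate.
lr, b1, b2, eps = 0.05, 0.9, 0.999, 1e-8
state = {k: [np.zeros_like(v), np.zeros_like(v)] for k, v in params.items()}

prev = np.inf
for t in range(1, 2001):
    loss, gWp, gWn = loss_and_grads(params["Wp"], params["Wn"])
    for name, g in (("Wp", gWp), ("Wn", gWn)):
        m, v = state[name]
        m[:] = b1 * m + (1 - b1) * g          # first-moment estimate
        v[:] = b2 * v + (1 - b2) * g**2       # second-moment estimate
        mhat = m / (1 - b1**t)                # bias correction
        vhat = v / (1 - b2**t)
        params[name] -= lr * mhat / (np.sqrt(vhat) + eps)
    # Stop when the error converges or is near or at zero.
    if abs(prev - loss) < 1e-10 or loss < 1e-8:
        break
    prev = loss
```

In the method itself, the gradients would come from backpropagation through the network rather than this closed-form linear model.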
DETAILED DESCRIPTION

The 3D geometric reconstruction of a human face from a single photo has been a very active research topic over the last 20 years due to its impact on a broad range of applications, such as:

  • Facial recognition;
  • Facial animation and reconstruction of expressions;
  • Building avatars for gaming and VR;
  • In the medical field, the segmentation of anatomical structures and modeling of their variations; and
  • In the forensics field, estimating possible faces from a skull or performing facial aging.

The state of the art for the 3D reconstruction of the human face relies on 3D Morphable Models (3DMMs). A 3DMM is a statistical model that captures head shape variations in a population from a set of 3D head scans. These can be between-subject variations (shape differences across individuals in a neutral facial expression) or within-subject variations (changes in facial expressions). A 3DMM is built by first collecting and aligning meshes reconstructed from the head scans in the same topology. Next, a classical exploratory statistical method known as principal component analysis is used to extract features, which can be thought of as elementary deformations of a mean shape, that contribute most to the variance of the data.