
US-12620205-B2 - Using two-dimensional images and machine learning to identify information pertaining to facial features

US 12620205 B2

Abstract

A method for training a machine learning model using information pertaining to a human face includes generating training data for the machine learning model. Generating the training data includes generating a training input comprising information representing 2D images of human faces corresponding to a beauty target, and generating a target output for the training input. The target output identifies, for each of the 2D images, information identifying one or more facial features represented in the respective 2D image. The method further includes providing the training data to train the machine learning model on (i) a set of training inputs including the training input, and (ii) a set of target outputs including the target output.
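The training-data pairing described in the abstract (per-beauty-target 2D images as inputs, per-image facial-feature labels as target outputs) can be sketched as follows. This is a minimal illustration, not the patented implementation; all class, function, and field names are assumptions introduced for clarity.

```python
from dataclasses import dataclass

# Sketch of the abstract's training-data structure: each training input
# groups the 2D images for one beauty target, and the matching target
# output holds, per image, the facial-feature labels for that image.


@dataclass
class TrainingInput:
    beauty_target: str   # identifier of the beauty target (illustrative)
    images: list         # 2D images for that target (here, opaque image IDs)


@dataclass
class TargetOutput:
    # per-image lists of facial-feature labels, parallel to TrainingInput.images
    features_per_image: list


def generate_training_data(labeled_images):
    """Build (training input, target output) pairs from
    (image_id, beauty_target, feature_labels) records."""
    by_target = {}
    for image_id, beauty_target, features in labeled_images:
        imgs, feats = by_target.setdefault(beauty_target, ([], []))
        imgs.append(image_id)
        feats.append(features)
    inputs, outputs = [], []
    for target, (imgs, feats) in by_target.items():
        inputs.append(TrainingInput(beauty_target=target, images=imgs))
        outputs.append(TargetOutput(features_per_image=feats))
    return inputs, outputs
```

The grouping step mirrors the claim language: the set of training inputs and the set of target outputs stay parallel, so each input's images line up with the labeled features in its target output.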

Inventors

  • Sahara Lotti
  • Jesse Chang

Assignees

  • Brilliance of Beauty, Inc.

Dates

Publication Date
2026-05-05
Application Date
2024-04-10

Claims (20)

  1. A method for using a trained machine learning model using information pertaining to a human face, comprising: providing to the trained machine learning model a first input comprising two-dimensional (2D) image data representing a 2D image of a face of a subject; providing to the trained machine learning model a second input comprising information identifying a three-dimensional (3D) model of the face of the subject corresponding to the 2D image of the face of the subject; generating, using the trained machine learning model, one or more outputs identifying (i) a plurality of facial features represented in the 2D image, (ii) a level of confidence that the plurality of facial features correspond to one or more actual facial features of the subject represented in the 2D image, (iii) an indication of first variation information representing differences between the plurality of facial features represented in the 2D image and one or more target facial features of a target face corresponding to a beauty target, (iv) a level of confidence that the first variation information accurately reflects the differences between the plurality of facial features represented in the 2D image and the one or more target facial features of the target face corresponding to the beauty target, (v) an indication of one or more landmarks of the 3D model, (vi) a level of confidence that the one or more landmarks of the 3D model correspond to the plurality of facial features represented in the 2D image, (vii) an indication of second variation information identifying differences between the one or more landmarks of the 3D model and one or more target landmarks of a target 3D model corresponding to the beauty target, and (viii) a level of confidence that the second variation information reflects the differences between the one or more landmarks of the 3D model and the one or more target landmarks of the target 3D model corresponding to the beauty target; and selecting, among a plurality of beauty products, a first beauty product based on the first variation information and the second variation information.
  2. The method of claim 1, wherein the first variation information describes differences between first relationships and first target relationships, wherein the first relationships are between the plurality of facial features represented in the 2D image of the face of the subject, and wherein the first target relationships are between the one or more target facial features of the target face corresponding to the beauty target.
  3. The method of claim 1, further comprising: determining whether the level of confidence that the first variation information accurately reflects the differences between the plurality of facial features represented in the 2D image and the one or more target facial features of the target face corresponding to the beauty target satisfies a threshold level of confidence; and responsive to determining that the level of confidence satisfies the threshold level of confidence, providing, to a client device, an indication of the first variation information.
  4. The method of claim 1, further comprising: receiving an indication of a user selection of the beauty target among a plurality of beauty targets; and providing to the trained machine learning model a third input comprising information identifying the beauty target selected among the plurality of beauty targets.
  5. The method of claim 1, wherein the second variation information describes differences between second relationships and second target relationships, wherein the second relationships are between the one or more landmarks of the 3D model, and wherein the second target relationships are between the one or more target landmarks of the target 3D model corresponding to the beauty target.
  6. The method of claim 1, further comprising: determining whether the level of confidence that the second variation information accurately reflects the differences between the one or more landmarks of the 3D model and the one or more target landmarks of the target 3D model corresponding to the beauty target satisfies a threshold level of confidence; and responsive to determining that the level of confidence satisfies the threshold level of confidence, providing, to a client device, an indication of the second variation information.
  7. The method of claim 1, wherein the second variation information describes differences between geometric data and target geometric data, the geometric data based on the one or more landmarks of the 3D model, and the target geometric data based on the one or more target landmarks of the target 3D model corresponding to the beauty target.
  8. The method of claim 1, further comprising: providing, to a client device, a first notification identifying the first beauty product.
  9. The method of claim 8, further comprising: providing, to the client device, a second notification identifying instructions on using the first beauty product to reduce the differences between the plurality of facial features represented in the 2D image and the one or more target facial features of the target face corresponding to the beauty target.
  10. A system comprising: a memory; and one or more processing devices communicatively coupled to the memory, the one or more processing devices configured to: provide to a trained machine learning model a first input comprising two-dimensional (2D) image data representing a 2D image of a face of a subject; provide to the trained machine learning model a second input comprising information identifying a three-dimensional (3D) model of the face of the subject corresponding to the 2D image of the face of the subject; generate, with the trained machine learning model, one or more outputs identifying (i) a plurality of facial features represented in the 2D image, (ii) a level of confidence that the plurality of facial features correspond to one or more actual facial features of the subject represented in the 2D image, (iii) an indication of first variation information representing differences between the plurality of facial features represented in the 2D image and one or more target facial features of a target face corresponding to a beauty target, (iv) a level of confidence that the first variation information accurately reflects the differences between the plurality of facial features represented in the 2D image and the one or more target facial features of the target face corresponding to the beauty target, (v) an indication of one or more landmarks of the 3D model, (vi) a level of confidence that the one or more landmarks of the 3D model correspond to the plurality of facial features represented in the 2D image, (vii) an indication of second variation information identifying differences between the one or more landmarks of the 3D model and one or more target landmarks of a target 3D model corresponding to the beauty target, and (viii) a level of confidence that the second variation information reflects the differences between the one or more landmarks of the 3D model and the one or more target landmarks of the target 3D model corresponding to the beauty target; and select, among a plurality of beauty products, a first beauty product based on the first variation information and the second variation information.
  11. The system of claim 10, wherein the first variation information describes differences between first relationships and first target relationships, wherein the first relationships are between the plurality of facial features represented in the 2D image of the face of the subject, and wherein the first target relationships are between the one or more target facial features of the target face corresponding to the beauty target.
  12. The system of claim 10, the one or more processing devices further configured to: determine whether the level of confidence that the first variation information accurately reflects the differences between the plurality of facial features represented in the 2D image and the one or more target facial features of the target face corresponding to the beauty target satisfies a threshold level of confidence; and responsive to determining that the level of confidence satisfies the threshold level of confidence, provide, to a client device, an indication of the first variation information.
  13. The system of claim 10, the one or more processing devices further configured to: receive an indication of a user selection of the beauty target among a plurality of beauty targets; and provide to the trained machine learning model a third input comprising information identifying the beauty target selected among the plurality of beauty targets.
  14. The system of claim 10, wherein the second variation information describes differences between second relationships and second target relationships, wherein the second relationships are between the one or more landmarks of the 3D model, and wherein the second target relationships are between the one or more target landmarks of the target 3D model corresponding to the beauty target.
  15. The system of claim 10, the one or more processing devices further configured to: determine whether the level of confidence that the second variation information accurately reflects the differences between the one or more landmarks of the 3D model and the one or more target landmarks of the target 3D model corresponding to the beauty target satisfies a threshold level of confidence; and responsive to determining that the level of confidence satisfies the threshold level of confidence, provide, to a client device, an indication of the second variation information.
  16. The system of claim 10, wherein the second variation information describes differences between geometric data and target geometric data, the geometric data based on the one or more landmarks of the 3D model, and the target geometric data based on the one or more target landmarks of the target 3D model corresponding to the beauty target.
  17. The system of claim 10, the one or more processing devices further configured to: provide, to a client device, a first notification identifying the first beauty product.
  18. The system of claim 17, the one or more processing devices further configured to: provide, to the client device, a second notification identifying instructions on using the first beauty product to reduce the differences between the plurality of facial features represented in the 2D image and the one or more target facial features of the target face corresponding to the beauty target.
  19. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more processing devices, cause the one or more processing devices to perform operations comprising: providing to a trained machine learning model a first input comprising two-dimensional (2D) image data representing a 2D image of a face of a subject; providing to the trained machine learning model a second input comprising information identifying a three-dimensional (3D) model of the face of the subject corresponding to the 2D image of the face of the subject; generating, using the trained machine learning model, one or more outputs identifying (i) a plurality of facial features represented in the 2D image, (ii) a level of confidence that the plurality of facial features correspond to one or more actual facial features of the subject represented in the 2D image, (iii) an indication of first variation information representing differences between the plurality of facial features represented in the 2D image and one or more target facial features of a target face corresponding to a beauty target, (iv) a level of confidence that the first variation information accurately reflects the differences between the plurality of facial features represented in the 2D image and the one or more target facial features of the target face corresponding to the beauty target, (v) an indication of one or more landmarks of the 3D model, (vi) a level of confidence that the one or more landmarks of the 3D model correspond to the plurality of facial features represented in the 2D image, (vii) an indication of second variation information identifying differences between the one or more landmarks of the 3D model and one or more target landmarks of a target 3D model corresponding to the beauty target, and (viii) a level of confidence that the second variation information reflects the differences between the one or more landmarks of the 3D model and the one or more target landmarks of the target 3D model corresponding to the beauty target; and selecting, among a plurality of beauty products, a first beauty product based on the first variation information and the second variation information.
  20. The non-transitory computer-readable storage medium of claim 19, wherein the first variation information describes differences between first relationships and first target relationships, wherein the first relationships are between the plurality of facial features represented in the 2D image of the face of the subject, and wherein the first target relationships are between the one or more target facial features of the target face corresponding to the beauty target.
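The inference flow of claims 1, 3, 6, and 8 can be sketched as follows: the trained model's variation information is reported to the client only when its confidence level satisfies a threshold, and a beauty product is selected from the two kinds of variation information. This is an illustrative sketch only; the threshold value, the scoring rule, and all names are assumptions not taken from the patent.

```python
# Hypothetical illustration of the claimed confidence-gating and
# product-selection steps. The patent does not specify a threshold
# value or a selection rule; both are stand-ins here.

CONFIDENCE_THRESHOLD = 0.8  # assumed threshold level of confidence


def report_if_confident(variation_info, confidence, send_to_client,
                        threshold=CONFIDENCE_THRESHOLD):
    """Claims 3/6: provide variation information to the client device
    only when its level of confidence satisfies the threshold."""
    if confidence >= threshold:
        send_to_client(variation_info)
        return True
    return False


def select_product(products, first_variation, second_variation):
    """Claim 1, final step: select, among a plurality of beauty
    products, the product scoring highest against the combined
    2D-feature and 3D-landmark variation keys (a simple stand-in
    for whatever selection logic an implementation uses)."""
    keys = list(first_variation) + list(second_variation)

    def score(product):
        # product["reduces"] maps variation keys to reduction amounts
        return sum(product["reduces"].get(k, 0.0) for k in keys)

    return max(products, key=score)
```

In this sketch the same gate serves both claim 3 (first variation information, from the 2D image) and claim 6 (second variation information, from the 3D model), which mirrors how the two claims repeat the same threshold structure over different inputs.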

Description

TECHNICAL FIELD

Aspects and embodiments of the disclosure relate to data processing, and more specifically, to using two-dimensional (2D) images and machine learning to identify information pertaining to facial features.

BACKGROUND

Image processing can include the manipulation of digital images using various techniques and algorithms to improve their quality, extract useful information, or perform specific tasks.

SUMMARY

The following is a simplified summary of the disclosure to provide a basic understanding of some aspects of the disclosure. This summary is not an extensive overview of the disclosure. It is intended to neither identify key or critical elements of the disclosure, nor delineate any scope of the particular embodiments of the disclosure or any scope of the claims. Its sole purpose is to present some concepts of the disclosure in a simplified form as a prelude to the more detailed description that is presented later.

An aspect of the disclosure provides a computer-implemented method for training a machine learning model using information pertaining to a human face, the method comprising: generating training data for the machine learning model, wherein generating the training data comprises: generating a first training input, the first training input comprising information representing 2D images of human faces corresponding to a first beauty target; and generating a first target output for the first training input, wherein the first target output identifies, for each of the 2D images of human faces corresponding to the first beauty target, information identifying one or more facial features represented in the respective 2D image; and providing the training data to train the machine learning model on (i) a set of training inputs comprising the first training input, and (ii) a set of target outputs comprising the first target output.

In some aspects, generating the training data further comprises: generating a second target output for the first training input, wherein the second target output comprises information identifying relationships between the one or more facial features represented in each of the 2D images of human faces corresponding to the first beauty target, wherein the set of target outputs comprises the second target output.

In some aspects, the 2D images of human faces corresponding to the first beauty target are first 2D images, wherein generating the training data further comprises: generating a second training input, the second training input comprising information representing second 2D images of human faces corresponding to a non-beauty target; generating a third target output for the second training input, wherein the third target output identifies, for each of the second 2D images of human faces corresponding to the non-beauty target, information identifying one or more facial features represented in the respective second 2D image; generating a fourth target output for the second training input, wherein the fourth target output comprises information identifying relationships between the one or more facial features represented in each of the 2D images of human faces corresponding to the non-beauty target; and generating a fifth target output for the second training input, wherein the fifth target output comprises information identifying variation information, the variation information representing differences between the information identifying the relationships between the one or more facial features represented in each of the 2D images of human faces corresponding to the first beauty target and the information identifying the relationships between the one or more facial features represented in each of the 2D images of human faces corresponding to the non-beauty target, wherein the set of training inputs comprises the second training input, and wherein the set of target outputs comprises the third target output, the fourth target output, and the fifth target output.

In some aspects, the 2D images of human faces corresponding to the first beauty target are first 2D images, wherein generating the training data further comprises: generating a third training input, the third training input comprising information representing second 2D images of human faces corresponding to a second beauty target among a plurality of beauty targets, wherein the set of training inputs comprises the third training input.

In some aspects, generating the training data further comprises: generating a fourth training input, the fourth training input comprising information identifying three-dimensional (3D) models of human faces corresponding to the first beauty target and the 2D images of human faces corresponding to the first beauty target, wherein the set of training inputs comprises the fourth training input.

In some aspects, generating the training data further comprises: generating a sixth