US-12626533-B2 - Facial expression recognition using enrollment images

US12626533B2

Abstract

Systems and techniques are described herein for processing images to detect expressions of a subject. In one illustrative example, a method of recognizing facial expressions in one or more images includes obtaining, by a computing device, a first image of a person; obtaining expression information based on the first image and an anchor image associated with the person; and determining an expression classification associated with the first image based on the expression information.
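As a concrete illustration of the flow the abstract describes, the following minimal Python sketch caches expression information from an anchor image at enrollment and compares it against expression information extracted from a later image. Every name here (extract_expression_info, ExpressionRecognizer) and the distance-based decision rule are hypothetical stand-ins, not the trained networks the patent claims.

    import numpy as np

    EMBED_DIM = 64

    def extract_expression_info(image: np.ndarray) -> np.ndarray:
        """Placeholder for the neural network feature extractor;
        returns a fixed-length expression feature vector."""
        flat = image.astype(np.float32).ravel()
        return np.resize(flat, EMBED_DIM)

    class ExpressionRecognizer:
        def __init__(self) -> None:
            self.anchor_info: np.ndarray | None = None

        def enroll(self, anchor_image: np.ndarray) -> None:
            # First expression information is extracted from the
            # anchor image once and stored during enrollment.
            self.anchor_info = extract_expression_info(anchor_image)

        def classify(self, first_image: np.ndarray) -> str:
            assert self.anchor_info is not None, "enroll an anchor image first"
            query_info = extract_expression_info(first_image)
            # Placeholder decision: a trained model would map the
            # combined expression information to class probabilities.
            delta = float(np.linalg.norm(query_info - self.anchor_info))
            return "neutral" if delta < 1.0 else "non-neutral"

Note that, per claim 1, the anchor features are extracted once and stored during enrollment, so inference on later frames never re-runs the extractor on the anchor image.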

Inventors

  • Peng Liu
  • Lei Wang
  • Ning Bi
  • Zhen Wang
  • Shiwei Jin

Assignees

  • QUALCOMM INCORPORATED

Dates

Publication Date
2026-05-12
Application Date
2023-01-30

Claims (20)

  1. A method for recognizing facial expressions in one or more images, comprising: obtaining, by a computing device, a first image of a person; obtaining expression information based on the first image of the person and an anchor image of the person using a neural network, wherein the expression information comprises feature vectors associated with facial features, wherein the expression information includes first expression information extracted from the anchor image and stored during enrollment of the person; and determining, using a trained machine learning model, an expression classification associated with the first image of the person based on the expression information obtained based on the first image of the person and the anchor image of the person.
  2. The method of claim 1, wherein the expression classification comprises a facial expression of the person.
  3. The method of claim 1, wherein obtaining the expression information further comprises: obtaining second expression information from the first image of the person using the neural network.
  4. The method of claim 3, further comprising: combining the first expression information and the second expression information into probabilities associated with different expression classifications; and selecting the expression classification as a first expression associated with the first expression information or a second expression associated with the second expression information based on the probabilities associated with the different expression classifications.
  5. The method of claim 4, wherein the neural network is trained based on a first combination of the first expression information and the second expression information and a second combination of the first expression information and the second expression information, and wherein the first expression information and the second expression information are extracted from a reference image of a subject and a biased image of the subject.
  6. The method of claim 5, wherein the first combination of the first expression information and the second expression information comprises a concatenation of the first expression information and the second expression information.
  7. The method of claim 5, wherein the second combination of the first expression information and the second expression information comprises a subtraction of the first expression information and the second expression information.
  8. The method of claim 5, wherein the subject comprises a face of a person, the reference image comprises the person without a facial expression, and the biased image comprises the person with a facial expression.
  9. The method of claim 1, further comprising: obtaining an enrollment image of the person during enrollment of the person.
  10. The method of claim 9, further comprising: obtaining pose information associated with the person based on the enrollment image; obtaining facial information associated with a face of the person based on the enrollment image; and determining whether the enrollment image is selected as the anchor image based on the pose information and the facial information.
  11. The method of claim 1, further comprising: generating the anchor image from the first image based on the first image not being selected as the anchor image.
  12. The method of claim 1, further comprising: determining a state of the computing device based on a facial expression.
  13. An apparatus for recognizing facial expressions in one or more images, comprising: at least one memory; and at least one processor coupled to the at least one memory and configured to: obtain a first image of a person; obtain expression information based on the first image of the person and an anchor image of the person using a neural network, wherein the expression information comprises feature vectors associated with facial features, wherein the expression information includes first expression information extracted from the anchor image and stored during enrollment of the person; and determine, using a trained machine learning model, an expression classification associated with the first image of the person based on the expression information obtained based on the first image of the person and the anchor image of the person.
  14. The apparatus of claim 13, wherein the expression classification comprises a facial expression of the person.
  15. The apparatus of claim 13, wherein the at least one processor is configured to: obtain second expression information from the first image of the person using the neural network.
  16. The apparatus of claim 15, wherein the at least one processor is configured to: combine the first expression information and the second expression information into probabilities associated with different expression classifications; and select the expression classification as a first expression associated with the first expression information or a second expression associated with the second expression information based on the probabilities associated with the different expression classifications.
  17. The apparatus of claim 16, wherein the neural network is trained based on a first combination of the first expression information and the second expression information and a second combination of the first expression information and the second expression information, and wherein the first expression information and the second expression information are extracted from a reference image of a subject and a biased image of the subject.
  18. The apparatus of claim 17, wherein the first combination of the first expression information and the second expression information comprises a concatenation of the first expression information and the second expression information.
  19. The apparatus of claim 17, wherein the second combination of the first expression information and the second expression information comprises a subtraction of the first expression information and the second expression information.
  20. The apparatus of claim 17, wherein the subject comprises a face of a person, the reference image comprises the person without a facial expression, and the biased image comprises the person with a facial expression.
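Claims 6 and 7 (and 18 and 19) name two specific combinations of the first and second expression information: a concatenation and a subtraction. The following is a minimal Python sketch of those two combinations, assuming the expression information is a flat feature vector; the random feature vectors and the linear classification head are placeholders, not the trained neural network the claims recite.

    import numpy as np

    rng = np.random.default_rng(0)

    # First and second expression information as feature vectors,
    # e.g., extracted from the anchor/reference image and the
    # query/biased image by the neural network.
    first_info = rng.standard_normal(128)   # from the anchor image
    second_info = rng.standard_normal(128)  # from the first (query) image

    # First combination: concatenation (claim 6).
    concat_combo = np.concatenate([first_info, second_info])  # shape (256,)

    # Second combination: subtraction (claim 7).
    diff_combo = second_info - first_info                     # shape (128,)

    # A placeholder linear head mapping the combinations to
    # probabilities over expression classifications (claim 4).
    labels = ["neutral", "happy", "sad", "angry", "surprised"]
    W = rng.standard_normal((len(labels), concat_combo.size + diff_combo.size))
    logits = W @ np.concatenate([concat_combo, diff_combo])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    print(labels[int(np.argmax(probs))], probs.round(3))

The subtraction is what makes the anchor useful: it cancels identity-specific appearance shared by both images, leaving a signal dominated by the expression change, while the concatenation preserves the raw features for the model to weigh.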

Description

FIELD

The present disclosure generally relates to facial expression recognition. In some examples, aspects of the present disclosure are related to facial expression recognition using enrollment images.

BACKGROUND

Deep neural networks can be used for various tasks, such as object detection. For example, convolutional neural networks can extract high-level features, such as facial shapes, from an input image and use these high-level features to output a probability that the image includes, for example, a dog, a cat, a boat, or a bird. While deep neural networks can detect faces, detecting facial expressions remains difficult.

SUMMARY

In some examples, systems and techniques are described for recognizing facial expressions in one or more images. The systems and techniques can identify various facial expressions and alter the behavior of a device based on those expressions.

According to at least one example, a method is provided for recognizing facial expressions in one or more images. The method includes: obtaining, by a computing device, a first image of a person; obtaining expression information based on the first image and an anchor image associated with the person; and determining an expression classification associated with the first image based on the expression information.

In another example, an apparatus for recognizing facial expressions in one or more images is provided that includes at least one memory and at least one processor coupled to the at least one memory. The at least one processor is configured to: obtain a first image of a person; obtain expression information based on the first image and an anchor image associated with the person; and determine an expression classification associated with the first image based on the expression information.

In another example, a non-transitory computer-readable medium is provided that has stored thereon instructions that, when executed by one or more processors, cause the one or more processors to: obtain a first image of a person; obtain expression information based on the first image and an anchor image associated with the person; and determine an expression classification associated with the first image based on the expression information.

In another example, an apparatus for recognizing facial expressions in one or more images is provided. The apparatus includes: means for obtaining a first image of a person; means for obtaining expression information based on the first image and an anchor image associated with the person; and means for determining an expression classification associated with the first image based on the expression information.
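The summary repeatedly references an anchor image associated with the person; claim 10 describes selecting that anchor from an enrollment image based on pose information and facial information, and claim 11 describes generating an anchor when none is selected. The following is a minimal Python sketch of such a selection gate; the estimator outputs (yaw_deg, pitch_deg, face_score) and the thresholds are hypothetical, as the document does not specify the selection criteria at this level of detail.

    from dataclasses import dataclass

    import numpy as np

    @dataclass
    class EnrollmentImage:
        image: np.ndarray
        yaw_deg: float     # pose information (hypothetical estimator output)
        pitch_deg: float
        face_score: float  # facial information, e.g., detection confidence in [0, 1]

    # Hypothetical thresholds: prefer a near-frontal, well-detected face.
    MAX_ABS_ANGLE_DEG = 15.0
    MIN_FACE_SCORE = 0.8

    def select_anchor(candidates: list[EnrollmentImage]) -> EnrollmentImage | None:
        """Pick an enrollment image as the anchor based on pose and
        facial information (cf. claim 10); returns None if no candidate
        qualifies, in which case an anchor could instead be generated
        from a captured image (cf. claim 11)."""
        eligible = [c for c in candidates
                    if abs(c.yaw_deg) <= MAX_ABS_ANGLE_DEG
                    and abs(c.pitch_deg) <= MAX_ABS_ANGLE_DEG
                    and c.face_score >= MIN_FACE_SCORE]
        if not eligible:
            return None
        # Among eligible images, prefer the most frontal face.
        return min(eligible, key=lambda c: abs(c.yaw_deg) + abs(c.pitch_deg))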
In some aspects, one or more of the apparatuses described herein is, is part of, and/or includes a mobile device (e.g., a mobile telephone and/or mobile handset and/or so-called "smartphone" or other mobile device), an extended reality (XR) device (e.g., a virtual reality (VR) device, an augmented reality (AR) device, or a mixed reality (MR) device), a head-mounted device (HMD), a vehicle or a computing system, device, or component of a vehicle, a wearable device (e.g., a network-connected watch or other wearable device), a wireless communication device, a camera, a personal computer, a laptop computer, a server computer, another device, or a combination thereof. In some aspects, the apparatus includes a camera or multiple cameras for capturing one or more images. In some aspects, the apparatus further includes a display for displaying one or more images, notifications, and/or other displayable data. In some aspects, the apparatuses described above can include one or more sensors (e.g., one or more inertial measurement units (IMUs), such as one or more gyroscopes, one or more gyrometers, one or more accelerometers, any combination thereof, and/or other sensors).

This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of this patent, any or all drawings, and each claim. The foregoing, together with other features and aspects, will become more apparent upon referring to the following specification, claims, and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Illustrative aspects of the present application are described in detail below with reference to the following figures:

FIG. 1 is a block diagram illustrating an architecture of an image capture and processing system, in accordance with some examples;

FIG. 2 is a diagram illustrating an example of a model for a convolutional neural network