KR-102962772-B1 - Emotion prediction method based on virtual facial expression image augmentation
Abstract
A method for predicting emotions based on augmenting virtual facial expression images is provided. The emotion prediction method according to an embodiment of the present invention acquires a user face image, extracts facial expression features from the acquired image, and predicts the user's emotion from the extracted features. The features are extracted using an expression recognition network, an artificial intelligence model trained to receive the user face image as input and extract facial expression features. The expression recognition network is retrained with virtual face images augmented from face images for which emotion recognition failed. Accordingly, by augmenting the features of facial expression images whose emotions failed to be predicted through error feedback, the performance of facial expression recognition can be improved, enabling the training of a general-purpose facial expression classifier.
Inventors
- 윤주홍 (Yoon Ju-hong)
- 권용훈 (Kwon Yong-hun)
- 김제우 (Kim Je-woo)
- 박민규 (Park Min-kyu)
Assignees
- 한국전자기술연구원 (Korea Electronics Technology Institute)
Dates
- Publication Date
- 20260512
- Application Date
- 20230517
- Priority Date
- 20221214
Claims (12)
- An emotion prediction method comprising: acquiring a user face image; extracting facial expression features from the acquired user face image; and predicting the user's emotion from the extracted facial expression features, wherein the extracting uses an expression recognition network, an artificial intelligence model trained to receive the user face image as input and extract facial expression features; the expression recognition network is retrained with virtual face images augmented from face images for which emotion recognition failed; the extracting comprises: extracting face style features from the face image; extracting facial expression features from the face image; and fusing the extracted face style features and facial expression features; the face style feature extraction comprises: generating face mesh data from the face image; and extracting the face style features from the generated mesh data; whether emotion recognition has failed is identified through the user's feedback or reaction to a service provided based on the emotion prediction result; and the method further comprises generating augmented face images using a generative network that receives as input a feature in which the face style feature of a face image that failed emotion recognition is fused with an emotion label, and generates and outputs a virtual face image.
The generative network forms an adversarial neural network with a discriminator that determines the authenticity of the virtual face images it generates; the generative network is trained to generate virtual face images that lower the discriminator's identification accuracy while maintaining a similarity to real face images above a predetermined threshold; and the discriminator is trained to increase its accuracy in identifying the authenticity of virtual face images generated by the generative network. An emotion prediction method characterized as above.
- delete
- delete
- delete
- delete
- delete
- delete
- delete
- delete
- An emotion prediction system comprising: a facial expression recognition unit that extracts facial expression features from a user face image; and an emotion prediction unit that predicts the user's emotion from the extracted facial expression features, wherein the facial expression recognition unit extracts facial expression features using an expression recognition network, an artificial intelligence model trained to receive the user face image as input and extract facial expression features; the expression recognition network is retrained with virtual face images augmented from face images for which emotion recognition failed; the facial expression recognition unit extracts face style features from the face image, extracts facial expression features from the face image, fuses the extracted face style features and facial expression features, generates face mesh data from the face image, and extracts the face style features from the generated mesh data; and the system further comprises: an error detection unit that detects whether emotion recognition failed through the user's feedback or reaction to a service provided based on the emotion prediction result; and an image augmentation unit that generates augmented face images using a generative network that receives as input a feature in which the face style feature of a face image that failed emotion recognition is fused with an emotion label, and generates and outputs a virtual face image.
The generative network forms an adversarial neural network with a discriminator that determines the authenticity of the virtual face images it generates; the generative network is trained to generate virtual face images that lower the discriminator's identification accuracy while maintaining a similarity to real face images above a predetermined threshold; and the discriminator is trained to increase its accuracy in identifying the authenticity of virtual face images generated by the generative network. An emotion prediction system characterized as above.
- An emotion prediction method comprising: acquiring a user face image; and extracting facial expression features from the acquired user face image, wherein the extracting uses an expression recognition network, an artificial intelligence model trained to receive the user face image as input and extract facial expression features; the expression recognition network is retrained with virtual face images augmented from face images for which emotion recognition, predicted from the extracted facial expression features, failed; the extracting comprises: extracting face style features from the face image; extracting facial expression features from the face image; and fusing the extracted face style features and facial expression features; the face style feature extraction comprises: generating face mesh data from the face image; and extracting the face style features from the generated mesh data; whether emotion recognition has failed is identified through the user's feedback or reaction to a service provided based on the emotion recognition result predicted from the extracted facial expression features; and the method further comprises generating augmented face images using a generative network that receives as input a feature in which the face style feature of a face image that failed emotion recognition is fused with an emotion label, and generates and outputs a virtual face image.
The generative network forms an adversarial neural network with a discriminator that determines the authenticity of the virtual face images it generates; the generative network is trained to generate virtual face images that lower the discriminator's identification accuracy while maintaining a similarity to real face images above a predetermined threshold; and the discriminator is trained to increase its accuracy in identifying the authenticity of virtual face images generated by the generative network. A facial expression recognition method characterized as above.
- An emotion prediction system comprising: an acquisition unit that acquires a user face image; and a facial expression recognition unit that extracts facial expression features from the acquired user face image, wherein the facial expression recognition unit extracts facial expression features using an expression recognition network, an artificial intelligence model trained to receive the user face image as input and extract facial expression features; the expression recognition network is retrained with virtual face images augmented from face images for which emotion recognition, predicted from the extracted facial expression features, failed; the facial expression recognition unit extracts face style features from the face image, extracts facial expression features from the face image, fuses the extracted face style features and facial expression features, generates face mesh data from the face image, and extracts the face style features from the generated mesh data; and the system further comprises: an error detection unit that detects whether emotion recognition failed through the user's feedback or reaction to a service provided based on the emotion recognition result predicted from the extracted facial expression features; and an image augmentation unit that generates augmented face images using a generative network that receives as input a feature in which the face style feature of a face image that failed emotion recognition is fused with an emotion label, and generates and outputs a virtual face image.
The generative network forms an adversarial neural network with a discriminator that determines the authenticity of the virtual face images it generates; the generative network is trained to generate virtual face images that lower the discriminator's identification accuracy while maintaining a similarity to real face images above a predetermined threshold; and the discriminator is trained to increase its accuracy in identifying the authenticity of virtual face images generated by the generative network. A facial expression recognition system characterized as above.
Description
The present invention relates to a method for predicting user emotions, and more specifically, to a method and system for predicting a user's emotions by extracting facial expression information from a user image and using that information.

Conventional emotion prediction technologies have relied on video of human faces or on biosignal sensor data such as pulse and brainwaves. These methods, however, make it difficult to assess the psychological state of users whose facial expressions change little or who intend to conceal their emotions. Biosignal-based approaches also require sensors to be attached and detached each time, adding time and cost. Meanwhile, numerous studies have shown that body language can also serve as key information for understanding a person's psychological state; for example, leg shaking or hand biting indicates a tense, anxious state.

FIG. 1 is a diagram illustrating the configuration of a user facial expression recognition system according to an embodiment of the present invention. FIG. 2 illustrates the structure of a facial expression recognition network, FIG. 3 the structure of an emotion prediction network, FIG. 4 the structure of a virtual facial expression image generation network, and FIG. 5 adversarial neural network-based training. FIGS. 6 and 7 show examples of virtual facial expression images. The present invention is described in more detail below with reference to the drawings.

An embodiment of the present invention presents an emotion prediction method based on virtual facial expression image augmentation.
It is a technology for predicting a user's emotions by continuously extracting facial expression information from images captured from the front of the user and utilizing that information. For facial expression recognition, an embodiment of the present invention applies error feedback to the learning framework. Error feedback enables the recognition of difficult expressions by re-extracting features for expressions whose emotions failed to be predicted and retraining the facial expression recognition network. This contrasts with the existing approach of training the facial expression recognition network end-to-end on given training data (facial expression images and labels). In addition, for retraining the facial expression recognition network, the embodiment applies a feature augmentation technique that diversifies hard-to-recognize expressions, mitigating the performance degradation caused by a lack of training data and enabling facial expression recognition in various environments.

FIG. 1 is a diagram illustrating the configuration of a user facial expression recognition system according to an embodiment of the present invention. As illustrated, the system comprises an image augmentation unit (110), a retraining unit (120), an error detection unit (130), a facial expression recognition unit (140), and an emotion prediction unit (150).

The facial expression recognition unit (140) extracts facial expression feature data from a user's face image; the face image can be obtained by detecting only the face region in the user image. The facial expression recognition unit (140) can be implemented as a facial expression recognition network, described in detail later with reference to FIG. 2.
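The error-feedback cycle described above (predict, detect failures via user feedback, augment the failed images, retrain) can be sketched at a high level. All function names here are illustrative stand-ins for the patent's networks and units, not its implementation:

```python
# High-level sketch of the error-feedback retraining loop:
# predict -> detect failures via user feedback -> augment failures.
# Every function is a stub standing in for a unit in FIG. 1.

def recognize_expression(face_image):
    """Stand-in for the facial expression recognition unit (140)."""
    return {"features": face_image}

def predict_emotion(features):
    """Stand-in for the emotion prediction unit (150)."""
    return "neutral"  # toy predictor: always guesses neutral

def feedback_says_wrong(image, predicted):
    """Stand-in for the error detection unit (130): the user's
    feedback/reaction reveals a wrong prediction."""
    return predicted != image.get("true_emotion", predicted)

def augment(failed_images):
    """Stand-in for the image augmentation unit (110): one virtual
    image per failure (a real system would generate many variants)."""
    return [dict(img, virtual=True) for img in failed_images]

def training_round(images):
    failed = []
    for img in images:
        pred = predict_emotion(recognize_expression(img)["features"])
        if feedback_says_wrong(img, pred):
            failed.append(img)
    # The retraining unit (120) would now retrain the recognizer
    # on these augmented virtual images.
    return augment(failed)

virtual = training_round([{"true_emotion": "sad"},
                          {"true_emotion": "neutral"}])
print(len(virtual))  # → 1: only the misrecognized image is augmented
```

The point of the sketch is the control flow: only images the deployed predictor got wrong re-enter training, via augmentation, which is what distinguishes this loop from one-shot end-to-end training.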
The emotion prediction unit (150) predicts the user's emotion by analyzing the facial expression feature data extracted by the facial expression recognition unit (140). It can be implemented as an emotion prediction network, described in detail later with reference to FIG. 3.

The image augmentation unit (110) generates virtual facial expression images for retraining the facial expression recognition network. It can be implemented as a virtual facial expression image generation network, described in detail later with reference to FIG. 4.

The error detection unit (130) detects whether there is an error in the user's emotion predicted by the emotion prediction unit (150), that is, whether the emotion prediction failed. Failure can be detected through the user's feedback or reaction to the service provided based on the emotion prediction result.

The retraining unit (120) controls the generation of virtual facial expression images by the image augmentation unit (110). Specifically, it causes the image augmentation unit (110) to generate virtual facial expression images fo
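The adversarial objective described in the claims, where the generator must both lower the discriminator's accuracy and keep similarity to real face images above a predetermined threshold, can be captured in a toy loss function. The hinge form and the penalty weight below are assumptions for illustration, not the patent's formulation:

```python
# Toy sketch of the claimed adversarial objective. The generator is
# penalized when (a) the discriminator correctly flags its output as
# fake and (b) similarity to a real face drops below the threshold.

def generator_loss(disc_fake_score: float,
                   similarity: float,
                   sim_threshold: float = 0.8,
                   weight: float = 10.0) -> float:
    """disc_fake_score: discriminator's confidence the image is fake
    (0..1); similarity: resemblance to a real face image (0..1)."""
    fool_term = disc_fake_score                      # lower = discriminator fooled
    sim_term = max(0.0, sim_threshold - similarity)  # hinge: active only below threshold
    return fool_term + weight * sim_term

def discriminator_loss(disc_fake_score: float, disc_real_score: float) -> float:
    """The discriminator is trained the opposite way: score fakes
    high (fake) and reals low (fake), so its loss falls as its
    authenticity identification improves."""
    return (1.0 - disc_fake_score) + disc_real_score

# A generated image that fools the discriminator and stays similar:
print(generator_loss(disc_fake_score=0.1, similarity=0.9))  # → 0.1
# A dissimilar one is heavily penalized despite fooling it:
print(generator_loss(disc_fake_score=0.1, similarity=0.5))  # ≈ 3.1
```

Minimizing `generator_loss` while the discriminator minimizes `discriminator_loss` gives the adversarial push-pull the claims describe: the generator drives the discriminator's accuracy down, the discriminator drives it back up, and the similarity hinge keeps the virtual faces usable as training data.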