
KR-20260064017-A - EMOTION RECOGNITION DEVICE AND METHOD FOR RECOGNIZING FACIAL EMOTIONS


Abstract

The present invention relates to an emotion recognition device for recognizing a facial emotion of an object of interest from an input image, comprising: an image processing module that generates, from the input image, a cropped image in which the object of interest is cut out, a masked image in which the background excluding the object of interest is emphasized, and facial landmarks needed to understand the facial structure and expression of the object of interest; an image and context feature extraction module that extracts image features for the object of interest from the cropped image and context features for the background from the masked image; a landmark feature extraction module that extracts landmark features for the facial expression of the object of interest from the facial landmarks; a feature combining module that combines the extracted image features, context features, and landmark features; and an emotion classification module that classifies the facial emotion of the object of interest based on the combined features.
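For illustration only, the sketch below shows one way the three-branch pipeline summarized in the abstract could be wired together in PyTorch. The backbone choices, feature dimensions, module names, and the 68-landmark assumption are placeholders introduced here, not details taken from the patent.

```python
# Illustrative sketch only (not the patented implementation).
# Assumed: three feature branches (cropped face, masked background image, facial
# landmark coordinates), simple placeholder backbones, and a fused classifier.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    """Placeholder backbone for the image / context feature extraction units."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, out_dim)

    def forward(self, x):                        # x: (B, 3, H, W)
        return self.fc(self.features(x).flatten(1))

class EmotionRecognizer(nn.Module):
    """Combines image, context, and landmark features, then classifies the emotion."""
    def __init__(self, num_landmarks=68, num_emotions=7):
        super().__init__()
        self.image_branch = SmallCNN(128)         # cropped image of the object of interest
        self.context_branch = SmallCNN(128)       # masked image (background emphasized)
        self.landmark_branch = nn.Sequential(     # (x, y) coordinates of facial landmarks
            nn.Linear(num_landmarks * 2, 128), nn.ReLU(), nn.Linear(128, 64),
        )
        self.classifier = nn.Linear(128 + 128 + 64, num_emotions)

    def forward(self, cropped, masked, landmarks):
        fused = torch.cat([
            self.image_branch(cropped),
            self.context_branch(masked),
            self.landmark_branch(landmarks.flatten(1)),
        ], dim=1)                                 # feature combining module
        return self.classifier(fused)             # emotion classification module

# Example usage with random tensors standing in for preprocessed inputs.
model = EmotionRecognizer()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224),
               torch.randn(2, 68, 2))
print(logits.shape)  # torch.Size([2, 7])
```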

Inventors

  • 한동석
  • 사비나 제시카 콜라코

Assignees

  • 경북대학교 산학협력단

Dates

Publication Date
2026-05-07
Application Date
2024-10-31

Claims (10)

  1. An emotion recognition device for recognizing a facial emotion of an object of interest from an input image, comprising: an image processing module that generates, from the input image, a cropped image in which the object of interest is cut out, a masked image in which the background excluding the object of interest is emphasized, and facial landmarks needed to understand the facial structure and expression of the object of interest; an image and context feature extraction module that extracts image features for the object of interest from the cropped image and context features for the background from the masked image; a landmark feature extraction module that extracts landmark features for the facial expression of the object of interest from the facial landmarks; a feature combining module that combines the extracted image features, context features, and landmark features; and an emotion classification module that classifies the facial emotion of the object of interest based on the combined features.
  2. The emotion recognition device of claim 1, wherein the image and context feature extraction module comprises an image feature extraction unit for extracting the image features and a context feature extraction unit for extracting the context features, and wherein each of the image feature extraction unit and the context feature extraction unit comprises at least one convolution layer for extracting the respective features, at least one pooling layer for reducing the dimensionality of a feature map, and at least one specialized block for applying higher weights to important features according to the feature map.
  3. The emotion recognition device of claim 1, wherein the landmark feature extraction module extracts the landmark features by applying a compound scaling method that dynamically adjusts the number of channels and layers, and outputs the extracted landmark features through a multi-scale fully connected layer.
  4. The emotion recognition device of claim 1, further comprising an attention module that applies an attention mechanism to the image features and context features so as to assign higher weights to the features among the image features and context features that are more important for classifying the facial emotion of the object of interest, wherein the attention module comprises at least one spatial attention block that assigns a higher weight to a face region within the input image and at least one channel attention block that assigns a higher weight to an important feature channel.
  5. The emotion recognition device of claim 1, wherein the emotion classification module classifies the facial emotion of the object of interest as a discrete emotion, outputs one of a set of predefined emotion labels, and outputs predicted values of continuous emotion attributes for the classified discrete emotion.
  6. A method for an emotion recognition device to recognize a facial emotion of an object of interest from an input image, comprising: a step in which an image processing module generates, from the input image, a cropped image in which the object of interest is cut out, a masked image in which the background excluding the object of interest is emphasized, and facial landmarks needed to understand the facial structure and expression; a step in which an image and context feature extraction module extracts image features for the object of interest from the cropped image and context features for the background from the masked image; a step in which a landmark feature extraction module extracts landmark features for the facial expression of the object of interest from the facial landmarks; a step in which a feature combining module combines the extracted image features, context features, and landmark features; and a step in which an emotion classification module classifies the facial emotion of the object of interest based on the combined features.
  7. The facial emotion recognition method of claim 6, wherein the image and context feature extraction module comprises an image feature extraction unit for extracting the image features and a context feature extraction unit for extracting the context features, and wherein each of the image feature extraction unit and the context feature extraction unit comprises at least one convolution layer for extracting the respective features, at least one pooling layer for reducing the dimensionality of a feature map, and at least one specialized block for applying higher weights to important features according to the feature map.
  8. The facial emotion recognition method of claim 6, wherein the step of extracting the landmark features comprises: a step of extracting the landmark features by applying a compound scaling method that dynamically adjusts the number of channels and layers; and a step of outputting the extracted landmark features through a multi-scale fully connected layer.
  9. The facial emotion recognition method of claim 6, further comprising, after the step of extracting the image features and context features, a step in which an attention module applies an attention mechanism to the image features and context features so as to assign higher weights to the features among the image features and context features that are more important for classifying the facial emotion of the object of interest, wherein the attention module comprises at least one spatial attention block that assigns a higher weight to a face region within the input image and at least one channel attention block that assigns a higher weight to an important feature channel.
  10. The facial emotion recognition method of claim 6, wherein the step of classifying the facial emotion of the object of interest comprises: a step of classifying the facial emotion of the object of interest as a discrete emotion and outputting one of a set of predefined emotion labels; and a step of outputting predicted values of continuous emotion attributes for the classified discrete emotion.
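Claims 4 and 9 recite spatial and channel attention blocks. The sketch below shows one common way such blocks are built (in the style of CBAM); the block design, reduction ratio, kernel size, and application order are assumptions for illustration rather than details taken from the patent.

```python
# Illustrative channel / spatial attention blocks (CBAM-style). The patent does not
# specify this exact design; reduction ratio and kernel size are assumptions.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Assigns higher weights to important feature channels."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                              # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))             # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))              # global max pooling branch
        weights = torch.sigmoid(avg + mx)[:, :, None, None]
        return x * weights

class SpatialAttention(nn.Module):
    """Assigns higher weights to salient spatial regions (e.g. the face region)."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)              # channel-wise average
        mx = x.amax(dim=1, keepdim=True)               # channel-wise max
        weights = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * weights

# Applying both blocks to a feature map, channel attention first.
feat = torch.randn(2, 64, 28, 28)
feat = SpatialAttention()(ChannelAttention(64)(feat))
print(feat.shape)  # torch.Size([2, 64, 28, 28])
```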
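Claims 5 and 10 describe an emotion classification module that outputs both a discrete emotion label and predicted values of continuous emotion attributes (FIG. 2 refers to VAD attributes). A minimal two-head sketch is given below; the label set, the 320-dimensional fused feature size, and the choice of three VAD dimensions in [0, 1] are assumptions introduced here for illustration.

```python
# Two-headed classification sketch: a discrete emotion label plus continuous
# valence/arousal/dominance (VAD) predictions. Label set and dimensions are assumed.
import torch
import torch.nn as nn

EMOTION_LABELS = ["happiness", "sadness", "fear", "surprise", "disgust", "anger", "neutral"]

class EmotionHead(nn.Module):
    def __init__(self, feat_dim=320, num_emotions=len(EMOTION_LABELS)):
        super().__init__()
        self.discrete = nn.Linear(feat_dim, num_emotions)  # discrete emotion logits
        self.vad = nn.Linear(feat_dim, 3)                  # valence, arousal, dominance

    def forward(self, fused_features):
        logits = self.discrete(fused_features)
        vad = torch.sigmoid(self.vad(fused_features))      # continuous attributes in [0, 1]
        return logits, vad

head = EmotionHead()
logits, vad = head(torch.randn(1, 320))
print(EMOTION_LABELS[logits.argmax(dim=1).item()], vad.squeeze(0).tolist())
```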

Description

Emotion Recognition Device and Method for Recognizing Facial Emotions

The present invention relates to an emotion recognition device and method for recognizing facial emotions.

With the recent advancement of artificial intelligence, research on emotion recognition technology utilizing various methods such as facial recognition, natural language processing, and voice analysis is actively underway. Emotion recognition technology refers to technology that automatically detects and interprets human emotional states by utilizing diverse information such as video, audio, and biosignals; in particular, visual channels that utilize video information have the advantages of high information density and a non-contact nature. Facial expressions, a key element of communication through visual channels, rely heavily on facial landmarks, that is, key points on the face such as the eyes, nose, mouth, eyebrows, and jawline, and these facial landmarks are essential for detecting the fine muscle movements that indicate major emotions such as happiness, sadness, fear, surprise, disgust, and anger.

However, existing emotion recognition technologies fail to sufficiently integrate facial landmark data and image context attributes, and tend to overlook background information within images that could significantly improve the accuracy of emotion recognition. For example, even in a photograph of a person smiling, considering the person's location and situation can provide richer information about their emotional state; yet many current emotion recognition technologies ignore background information or infer emotions by analyzing only facial expressions, which leads to low reliability and accuracy. In addition, existing emotion recognition technologies struggle to handle ambiguity when similar facial expressions, such as surprise and fear, can represent different emotions, and tend to overlook fine facial features that convey subtle emotions. Therefore, to overcome these limitations, research is needed on technologies that can recognize emotional states more accurately by integrating characteristics of the object of interest with contextual features.

FIG. 1 is a diagram showing the internal blocks of an emotion recognition device according to an embodiment of the present invention. FIG. 2 is a diagram illustrating the operation of the emotion recognition device of FIG. 1 classifying emotion labels and predicting VAD attributes. FIG. 3 is a diagram illustrating the detailed operation of the image and context feature learning module of FIG. 1. FIG. 4 is a diagram illustrating the detailed operation of the landmark feature learning module of FIG. 1. FIG. 5 is a diagram illustrating the detailed operation of the attention module of FIG. 1. FIG. 6 is a flowchart showing the operation of an emotion recognition device recognizing facial emotions according to an embodiment of the present invention.

The following detailed description of the invention refers to the accompanying drawings, which illustrate specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention, although different, are not necessarily mutually exclusive.
For example, specific shapes, structures, and characteristics described herein in relation to one embodiment may be implemented in other embodiments without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the following detailed description is not intended to be limiting, and the scope of the invention, where appropriately described, is limited only by the appended claims together with all equivalents of what is claimed therein. Like reference numerals in the drawings refer to the same or similar functions throughout the several aspects.

The components according to the present invention are defined by functional distinction rather than physical distinction, and may be defined by the functions each performs. Each component may be implemented as hardware, or as program code and a processing unit that performs the corresponding function, and the functions of two or more components may be implemented in a single component. Therefore, it should be noted that the names assigned to the components in the following embodiments are not intended to distinguish the components physically but are assigned to indicate the representative function performed by each component, and the technical concept of the present invention is not limited by the names of the components.

Preferred embodiments of the present invention will be described in more detail below with reference to the drawings. FIG. 1 is a diagram showing the