CN-117292426-B - Expression recognition method and system based on uncertainty learning

Abstract

The invention discloses an expression recognition method and system based on uncertainty learning. The method comprises the following steps: S1, obtaining a face image to be recognized; S2, preprocessing the input face image to obtain a standardized face image; S3, initializing the deep learning training conditions; S4, training an expression recognition model with a plurality of modules and updating the model weight parameters, wherein the expression recognition model comprises a face feature extraction module, a multidimensional feature classification module, a potential label distribution learning module and an attention consistency module; and S5, passing the face image through the expression recognition model to obtain the output expression category. By combining potential label distribution learning with attention consistency, the method and system address the uncertainty of facial expressions captured in the wild (field environments): mining the potential label distribution improves the accuracy of expression recognition, while the attention map generation method enhances the interpretability of the model.

Inventors

LI JING
HU TIANYU

Assignees

Tianjin University of Technology (天津理工大学)

Dates

Publication Date: 20260508
Application Date: 20231013

Claims (7)

1. An expression recognition method based on uncertainty learning, characterized by comprising the following steps:
Step 1, acquiring a face image of a user by using a camera and purpose-built image acquisition software, preprocessing the input face image by using a face detector and a key point detector, and obtaining a preprocessed standardized face image;
Step 2, initializing the deep learning training conditions: acquiring and loading pre-trained weight parameters of a deep neural network model, setting the deep learning training parameters, and preprocessing a large-scale facial expression data set for deep learning training;
Step 3, training an expression recognition model by using a deep learning method, wherein the training process of the expression recognition model in step 3 comprises the following sub-steps:
Step 3.1, a face feature extraction module: extracting the facial features of the face image sample through the face feature extraction network of the main branch and of the auxiliary branches of the expression recognition model respectively; the face feature extraction network is a ResNet residual neural network, and the main branch and the auxiliary branches share network weight parameters;
Step 3.2, a multidimensional feature classification module: passing the extracted facial features through the main branch and the plurality of auxiliary branches of the expression recognition model to obtain, for each branch, a predicted probability value and an attention map for each category, adjusting the predicted expression categories of the main branch and the auxiliary branches, and updating the weight parameters of the deep neural network model; the main branch of the expression recognition model comprises a multi-class classifier, which obtains a predicted probability value of the face image to be recognized for each category; each auxiliary branch of the expression recognition model comprises a binary classifier, and the number of auxiliary branches is equal to the number of expression categories; the binary classifier obtains the predicted probability values of the face image to be recognized for the target class and for the other classes, wherein the target class is the expression class that the auxiliary branch containing the binary classifier is responsible for classifying, and the target classes of the auxiliary branches are all different from one another; the attention map is formed by a weighted fusion, performed by the expression recognition model, of the output probability value of each expression category with the face image to be recognized, and the resulting visual heat map reflects, for any predicted expression category, the region of the face image to which the model attends;
Step 3.3, a potential label distribution learning module: acquiring the predicted probability value of each auxiliary branch for its target class, concatenating the plurality of predicted probability values, smoothing the resulting probability distribution and using it as the potential label distribution, and then using the potential label distribution as the ground-truth label to guide the learning of the main branch and update the weight parameters of the deep neural network model;
Step 3.4, an attention consistency module: acquiring the attention map of each auxiliary branch for its target category and the attention maps of the main branch for all categories, concatenating the attention maps of the plurality of auxiliary branches, keeping the attention maps of the main branch and of the auxiliary branches consistent, and updating the weight parameters of the deep neural network model;
Step 4, taking the deep neural network model with the updated weight parameters as the expression recognition model, inputting the face image to be recognized into the expression recognition model, obtaining the predicted values of the model for the different expression categories, and outputting the expression category with the maximum predicted probability as the expression recognition result.
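The final selection in step 4 can be sketched as follows. This is a minimal illustration, assuming that the scores used at inference are raw main-branch outputs converted to probabilities with Softmax; the function names, the label list and the example scores are hypothetical and do not come from the claims.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_expression(main_branch_logits, class_names):
    """Return the class name with the highest probability and that probability."""
    probs = softmax(main_branch_logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical label set and scores, for illustration only.
CLASS_NAMES = ["neutral", "happy", "sad"]
label, confidence = predict_expression([0.1, 2.0, -1.0], CLASS_NAMES)
```

Here `label` is the category whose predicted probability is largest, which corresponds to the "maximum predicted probability" rule stated in step 4.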
2. The expression recognition method based on uncertainty learning according to claim 1, wherein the preprocessing of the face image in step 1 comprises the following sub-steps: Step 1.1, detecting a face in the original image to be recognized by using the OpenCV face detector and, if a face region is detected, cropping the rectangular face region to serve as the face image to be recognized; Step 1.2, detecting the facial key points of the face image to be recognized by using the Dlib facial key point detector, aligning the face in the face image to be recognized according to the facial key points, and obtaining a standardized face image through image normalization.
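The claim does not spell out how the face is aligned from the key points. One common approach, assumed here purely for illustration, is to rotate the image so that the line joining the two eye centers becomes horizontal. The sketch below computes that tilt angle and applies the rotation to a single point; the function names and the coordinates are hypothetical, and a real pipeline would apply the same rotation to the whole image.

```python
import math

def eye_line_angle(left_eye, right_eye):
    """Tilt of the line joining the eye centers, in degrees (image coordinates)."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def rotate_point(point, center, angle_deg):
    """Rotate `point` about `center` by -angle_deg, which levels a tilted eye line."""
    theta = math.radians(-angle_deg)
    x = point[0] - center[0]
    y = point[1] - center[1]
    return (
        center[0] + x * math.cos(theta) - y * math.sin(theta),
        center[1] + x * math.sin(theta) + y * math.cos(theta),
    )

# Hypothetical eye centers taken from detected key points.
left_eye, right_eye = (30.0, 40.0), (70.0, 60.0)
angle = eye_line_angle(left_eye, right_eye)
aligned_right_eye = rotate_point(right_eye, left_eye, angle)
```

After the rotation, both eye centers share the same vertical coordinate, which is the property a downstream normalization step would rely on.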
3. The expression recognition method based on uncertainty learning according to claim 1, wherein the initialization of the deep learning training conditions in step 2 comprises the following sub-steps: Step 2.1, pre-training a deep neural network model on the MS-Celeb-1M face recognition data set to obtain the model weight parameters of the deep neural network model, initializing the deep neural network model with the pre-trained weight parameters as initial values, and setting the initial learning rate and the parameter optimization algorithm for deep learning; the deep neural network model is a ResNet residual neural network model, the initial learning rate is 1e-4, and the parameter optimization algorithm is the Adam algorithm; Step 2.2, training the deep neural network model on a public expression data set collected in the wild (field environment scenes), and re-partitioning the face image samples in the data set into a training set and a validation set, wherein the standardized facial expression images obtained through image preprocessing are used as the face image samples to be recognized; the image preprocessing comprises image augmentation operations such as grayscale transformation, random horizontal flipping and the addition of Gaussian noise.
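The three augmentation operations named above can be sketched in plain Python on a tiny image stored as nested lists. This is only an illustration under stated assumptions: "grayscale transformation" is read as RGB-to-gray conversion with the ITU-R BT.601 luma weights, and the flip probability `flip_prob` and noise level `sigma` are hypothetical values that the claim does not specify.

```python
import random

def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to gray values (BT.601 weights, an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_image]

def horizontal_flip(image):
    """Mirror every row left to right."""
    return [row[::-1] for row in image]

def add_gaussian_noise(image, rng, sigma):
    """Add zero-mean Gaussian noise and clip the result to the range [0, 255]."""
    return [[min(255.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in image]

def augment(rgb_image, rng, flip_prob=0.5, sigma=5.0):
    """Apply grayscale conversion, a random horizontal flip, then Gaussian noise."""
    gray = to_grayscale(rgb_image)
    if rng.random() < flip_prob:
        gray = horizontal_flip(gray)
    return add_gaussian_noise(gray, rng, sigma)

# Hypothetical 2 x 2 RGB image; a seeded generator keeps the example repeatable.
image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (10, 10, 10)]]
augmented = augment(image, random.Random(0))
```

Passing the random generator explicitly keeps the augmentation reproducible, which is convenient when comparing training runs.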
4. The expression recognition method based on uncertainty learning according to claim 3, wherein the sub-steps of adjusting the model parameters according to the loss value in step 3.2 are: Step 3.2.1, calculating the cross-entropy loss value of the main branch and adjusting the weight parameters of the main branch of the deep neural network model, wherein N is the number of facial expression image samples, and y_i and p_i respectively denote the label value and the model prediction of the i-th image sample; Step 3.2.2, calculating the cross-entropy loss values of the plurality of auxiliary branches and adjusting the weight parameters of the auxiliary branches of the deep neural network model, wherein c is the number of facial expression image samples.
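The loss formulas themselves are not reproduced in this text, so the following is a sketch of the standard cross-entropy forms that match the description, not the patent's exact equations. It assumes the main-branch loss is the mean negative log-probability of the labelled class over N samples, and that each auxiliary branch uses a binary cross-entropy with target 1 when the sample's label equals that branch's target class. The averaging over branches and the clamp `EPS` are assumptions.

```python
import math

EPS = 1e-12  # clamp to avoid log(0); an implementation detail, not from the claims

def main_cross_entropy(probs, labels):
    """Mean of -log p_i[y_i] over the N samples (main branch, multi-class)."""
    losses = [-math.log(max(p[y], EPS)) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)

def auxiliary_cross_entropy(target_probs, labels):
    """Binary cross-entropy averaged over samples and auxiliary branches.

    target_probs[i][k] is the probability that auxiliary branch k assigns
    to its own target class k for sample i.
    """
    total, count = 0.0, 0
    for row, y in zip(target_probs, labels):
        for k, p in enumerate(row):
            p = min(max(p, EPS), 1.0 - EPS)
            total += -math.log(p) if k == y else -math.log(1.0 - p)
            count += 1
    return total / count

# Hypothetical predictions for two samples and three expression classes.
main_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
aux_probs = [[0.9, 0.2, 0.1], [0.3, 0.6, 0.2]]
labels = [0, 1]
loss_main = main_cross_entropy(main_probs, labels)
loss_aux = auxiliary_cross_entropy(aux_probs, labels)
```

Both functions return a scalar that would be minimized by the optimizer; how the two losses are weighted against each other is not stated in the claim.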
5. The method according to claim 4, wherein the sub-steps of adjusting the model parameters according to the loss value in step 3.3 are: Step 3.3.1, concatenating the probability predictions of the plurality of auxiliary branches for their target categories, wherein each element is the post-Softmax probability prediction of an auxiliary branch for its own target class; Step 3.3.2, smoothing the concatenated auxiliary-branch probability distribution to serve as the potential label distribution, where T is a hyperparameter and the probability distribution becomes smoother when T > 1; Step 3.3.3, calculating the JS divergence loss value between the main-branch predicted distribution and the potential label distribution, and adjusting the weight parameters of the main branch of the deep neural network model, wherein the main-branch predicted distribution is the post-Softmax predicted probability distribution of the main branch over the different prediction categories.
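Steps 3.3.1 to 3.3.3 can be sketched as follows. The smoothing formula is not reproduced in this text, so the sketch assumes one common temperature form: each concatenated probability is raised to the power 1/T and the result is renormalized, which flattens the distribution when T > 1. The JS divergence uses its standard definition. All names and example values are hypothetical.

```python
import math

def smooth_distribution(target_probs, temperature):
    """Temperature smoothing: p_k^(1/T) renormalized to sum to 1 (assumed form)."""
    powered = [p ** (1.0 / temperature) for p in target_probs]
    total = sum(powered)
    return [v / total for v in powered]

def kl_divergence(p, q):
    """KL(p || q), skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint."""
    mid = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mid) + 0.5 * kl_divergence(q, mid)

# Hypothetical target-class probabilities, one per auxiliary branch (step 3.3.1).
concatenated = [0.9, 0.3, 0.1]
potential_labels = smooth_distribution(concatenated, temperature=2.0)  # step 3.3.2
main_prediction = [0.7, 0.2, 0.1]
loss_js = js_divergence(main_prediction, potential_labels)  # step 3.3.3
```

The JS divergence is symmetric and bounded by ln 2, which keeps the loss finite even when the two distributions disagree strongly.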
6. The method according to claim 4, wherein the sub-steps of adjusting the model parameters according to the loss value in step 3.4 are: Step 3.4.1, concatenating the attention maps of the plurality of auxiliary branches for their target categories, wherein each element is the attention map of an auxiliary branch for its own target category; Step 3.4.2, calculating the Euclidean loss value between the attention maps of the main branch and those of the auxiliary branches, and adjusting the weight parameters of the deep neural network model, wherein the main-branch attention maps are the attention maps of the main branch for the different prediction categories.
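The attention consistency term in steps 3.4.1 and 3.4.2 can be sketched as a squared Euclidean distance between matching maps. The exact formula is not reproduced in this text, so the normalization (a mean over all map elements) is an assumption, as are the function name and the toy maps.

```python
def attention_consistency_loss(main_maps, auxiliary_maps):
    """Mean squared difference between main-branch and auxiliary-branch attention maps.

    main_maps[k] is the main-branch map for class k; auxiliary_maps[k] is the
    map produced by auxiliary branch k for its own target class. Each map is
    a nested list of H rows by W columns.
    """
    total, count = 0.0, 0
    for main_map, aux_map in zip(main_maps, auxiliary_maps):
        for main_row, aux_row in zip(main_map, aux_map):
            for a, b in zip(main_row, aux_row):
                total += (a - b) ** 2
                count += 1
    return total / count

# Hypothetical 2-class example with 1 x 2 attention maps.
main_maps = [[[1.0, 0.0]], [[0.2, 0.8]]]
auxiliary_maps = [[[0.0, 0.0]], [[0.2, 0.8]]]  # step 3.4.1: one map per auxiliary branch
loss_attention = attention_consistency_loss(main_maps, auxiliary_maps)
```

The loss is zero only when every auxiliary-branch map matches the main-branch map for the same class, which is the consistency the module is described as enforcing.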
7. An expression recognition system based on uncertainty learning, which implements the expression recognition method based on uncertainty learning according to any one of claims 1 to 6, characterized by comprising a detection module, a preprocessing module, an input module, a training module and an output module; the detection module is used for detecting a face region in an original image to be recognized by means of the OpenCV face detector and outputting the cropped rectangular face image to the preprocessing module; the preprocessing module is used for acquiring the coordinates of the facial key points by means of the Dlib facial key point detector, preprocessing the face image to be recognized, and outputting the standardized, aligned face image to the input module; the input module is used for preprocessing the facial expression images in the expression data set, initializing the pre-trained expression recognition model, setting the deep learning training parameters, and outputting the preprocessed data set and the initialized expression recognition model to the training module; the training module comprises a cloud server, trains the initialized expression recognition model on the facial expression image samples in the preprocessed facial expression data set, and outputs the trained expression recognition model to the output module; and the output module is used for outputting the predicted facial expression category for the standardized face image from the preprocessing module by means of the trained expression recognition model supplied by the training module.

Description

Expression recognition method and system based on uncertainty learning

Technical Field

The invention belongs to the field of image processing and pattern recognition, and particularly relates to an expression recognition method and system based on uncertainty learning.

Background

Facial expression is one of the most important ways in which humans convey emotion, and it can reflect a person's psychological state and character traits. In recent years, facial expression recognition has attracted a great deal of attention and has become a research hotspot in academia. Facial expression recognition also shows great potential in industry and has been widely applied in fields such as security monitoring and medical care. Because different facial expressions can be extremely similar, and because manually labeled expression images are highly subjective, expression uncertainty has become a key challenge in recent years. The present invention combines potential label distribution learning with attention consistency and provides a facial expression recognition method based on uncertainty learning to address the uncertainty of facial expression labeling.

Disclosure of Invention

The invention provides an expression recognition method and system based on uncertainty learning, which are used for solving the problems described in the background above.
The technical scheme of the invention comprises the following steps:
Step 1, acquiring a face image of a user by using a camera and purpose-built image acquisition software, preprocessing the input face image by using a face detector and a key point detector, and obtaining a preprocessed standardized face image;
Step 1.1, detecting a face in the original image to be recognized by using the OpenCV face detector and, if a face region is detected, cropping the rectangular face region to serve as the face image to be recognized;
Step 1.2, detecting the facial key points of the face image to be recognized by using the Dlib facial key point detector, aligning the face in the face image to be recognized according to the facial key points, and then obtaining a standardized face image through image normalization;
Step 2, initializing the deep learning training conditions: acquiring and loading pre-trained weight parameters of a deep neural network model, setting the deep learning training parameters, and preprocessing a large-scale facial expression data set for deep learning training;
Step 2.1, pre-training a deep neural network model on the MS-Celeb-1M face recognition data set to obtain the model weight parameters of the deep neural network model, initializing the deep neural network model with the pre-trained weight parameters as initial values, and setting the initial learning rate and the parameter optimization algorithm for deep learning; the deep neural network model is a ResNet residual neural network model, the initial learning rate is 1e-4, and the parameter optimization algorithm is the Adam algorithm;
Step 2.2, training the deep neural network model on a public expression data set collected in the wild (field environment scenes), and re-partitioning the face image samples in the data set into a training set and a validation set, wherein the standardized facial expression images obtained through image preprocessing are used as the face image samples to be recognized; the image preprocessing comprises image augmentation operations such as grayscale transformation, random horizontal flipping and the addition of Gaussian noise;
Step 3, training an expression recognition model by using a deep learning method and updating the weight parameters of the expression recognition model by means of a plurality of modules, specifically a face feature extraction module, a multidimensional feature classification module, a potential label distribution learning module and an attention consistency module;
Step 3.1, the face feature extraction module: extracting the facial features of the face image sample through the face feature extraction network of the main branch and of the auxiliary branches of the expression recognition model respectively; the face feature extraction network is a ResNet residual neural network, and the main branch and the auxiliary branches share network weight parameters;
Step 3.2, the multidimensional feature classification module: passing the extracted facial features through the main branch and the plurality of auxiliary branches of the expression recognition model to obtain, for each branch, a predicted probability value and an attention map for each category, adjusting the predicted expression categories of the main branch and the auxiliary branches, and updating the weight parameters of the deep neural network model; the main branch of the expression recognition model comprises a multi-class classifier, which obtains a predicted probability value of the face image to be recognized for each category; The auxiliary branches of the expression recognitio