
CN-116311414-B - Feature-enhanced lightweight network FGNet facial expression recognition method


Abstract

The invention discloses a feature-enhanced lightweight network (FGNet) facial expression recognition method, and belongs to the technical field of pattern recognition. To address the excessive complexity of deep convolutional neural networks, a lightweight network model, FGNet, is designed on the Keras neural network framework, with channel slicing and channel shuffling added on top of the backbone. A lightweight and efficient attention module is designed to exploit the network attention mechanism's ability to extract channel and spatial features, extracting features at different scales and outputting rich multi-scale information, thereby improving the performance of the network model. To address the class imbalance of facial expression database samples, adaptive class weights are then proposed and used to weight a self-defined loss function, further improving the recognition accuracy for each kind of facial expression. Finally, a facial expression recognition system platform is designed and demonstrated to realize end-to-end facial expression recognition.
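The channel shuffling the abstract refers to is the ShuffleNet-style interleaving operation. A minimal NumPy sketch of that operation is shown below; the function name and the channels-last layout are illustrative assumptions, not part of the patent:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Shuffle channels of a feature map shaped (N, H, W, C).

    Reshapes the channel axis into (groups, C // groups), swaps the two
    sub-axes, and flattens back, so channels from different groups become
    interleaved (the ShuffleNet-style operation the patent reuses).
    """
    n, h, w, c = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.reshape(n, h, w, groups, c // groups)
    x = x.transpose(0, 1, 2, 4, 3)  # swap group axis and per-group channel axis
    return x.reshape(n, h, w, c)
```

For example, with 6 channels and 2 groups, channel order [0, 1, 2, 3, 4, 5] becomes [0, 3, 1, 4, 2, 5], so information flows between the grouped branches.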

Inventors

  • ZHOU LIFANG
  • LI SIQIN
  • LI WEISHENG
  • QUAN HAO

Assignees

  • Chongqing University of Posts and Telecommunications (重庆邮电大学)

Dates

Publication Date
20260512
Application Date
20221206

Claims (2)

  1. A feature-enhanced lightweight network (FGNet) facial expression recognition method, characterized by comprising the following steps:

     101. Construct a lightweight face recognition network framework, the feature-enhanced ghost network FGNet, taking the "ghost" network GhostNet as the base model and combining the "ghost" feature-map generation of the ghost module with the channel "shuffle" and feature-reuse characteristics of ShuffleNetV2; "ghost" feature maps are generated by linear operations to reduce the number of network parameters, and channel shuffling is used to strengthen the network model's feature-extraction ability.

     102. Design an attention module that exploits the network attention mechanism's ability to extract channel and spatial features, extracts features of different scales and outputs multi-scale information, and add this attention module to the network model of step 101.

     103. Compute a weight for each class of expression samples from the number of database samples and apply the weights to a self-defined loss function, so that the network balances its training attention across the expression classes.

     104. Import the FGNet network model into a facial expression recognition system to realize real-time facial expression recognition.

     The FGNet network framework of step 101 is designed as follows:

     A1. Feature maps are generated with the "ghost" module proposed in the "ghost" network: a series of linear transformations is applied to an original feature map to generate "phantom" feature maps containing the required information. Specifically, an original feature map $Y'$ is produced by one convolution of the input $x$, and a linear operation is then applied to each original feature in $Y'$ to generate $s$ "phantom" feature maps:

     $y_{ij} = \Phi_{i,j}(y'_i), \quad i = 1, \dots, m, \; j = 1, \dots, s$   (1)

     where $y'_i$ is the $i$-th original feature map in $Y'$ and $\Phi_{i,j}$ is the $j$-th linear operation used to generate the $j$-th "phantom" feature map $y_{ij}$.

     B1. The network module is designed as follows: the input $x$ passes through one layer of 3×3 depthwise separable convolution to obtain a feature matrix, then through a 1×1 two-dimensional convolution; exploiting the structure of the bottleneck ("hourglass") block, a further 1×1 2D convolution followed by one layer of 3×3 depthwise separable convolution produces the module's feature matrix. Following the identity-mapping idea of ResNet, the input $x$ is identity-mapped and fused with this feature matrix to obtain the first-layer output, which in turn serves as the input $x$ of the next layer's depthwise separable convolution, realizing the nested looping of the module; after passing through an $n$-layer network, the output is obtained.

     C1. A channel-slicing operation divides the input feature map into two branches along the channel dimension, with $a$ and $A-a$ channels respectively; the left branch is an identity mapping, while the right branch contains the network module of step A1, with identical numbers of input and output channels.

     D1. The features finally output by step C1 are fed into the attention module designed in step 102 to obtain a multi-scale feature map.

     E1. At the network output, expression images are classified with the loss function designed in step 103, and the final recognition accuracy is obtained.

     In step 102, the attention module extracting features of different scales and outputting rich multi-scale information is designed as follows:

     A2. Multi-scale features are first constructed. The input $X$ is split into $S$ parts, different scale features are extracted from the different parts, and the extracted multi-scale features are finally spliced by Concat, as shown in the following formulas:

     $X = [X_0, X_1, \dots, X_{S-1}]$   (2)

     $F_i = \mathrm{Conv}(K_i \times K_i, G_i)(X_i), \quad i = 0, 1, \dots, S-1$   (3)

     $F = \mathrm{Concat}([F_0, F_1, \dots, F_{S-1}])$   (4)

     where $F_i$ is the output feature map of each part, $F$ is the final output feature map, $\mathrm{Conv}$ is the convolution operation, $K$ is the convolution kernel size, and $G$ is the channel-number group size.

     B2. Attention vectors are obtained through the efficient channel attention (ECA) to extract the different scale features, i.e., attention weights are extracted for the features of the different parts; the output attention vector can be expressed as:

     $Z_i = \mathrm{ECA}(F_i), \quad i = 0, 1, \dots, S-1$   (5)

     C2. To realize the interaction of attention information and fuse cross-dimension information, the attention vectors are spliced:

     $Z = Z_0 \oplus Z_1 \oplus \dots \oplus Z_{S-1}$   (6)

     D2. The attention vectors are re-calibrated by Softmax; the corrected attention vector $att_i$ is defined as:

     $att_i = \mathrm{Softmax}(Z_i) = \dfrac{\exp(Z_i)}{\sum_{j=0}^{S-1} \exp(Z_j)}$   (7)

     E2. The corrected attention vectors are applied to the multi-scale feature map and the result is taken as the output $Y$:

     $Y = \mathrm{Concat}([F_0 \odot att_0, \; F_1 \odot att_1, \; \dots, \; F_{S-1} \odot att_{S-1}])$   (8)

     In step 103, an adaptive class weight is proposed and used to weight a self-defined loss function, specifically:

     A3. The focal loss (Focal Loss) is a loss function optimized from the cross-entropy loss that assigns weight to hard samples through added parameters. The two-class focal loss is generalized for use on the multi-class task; the multi-class focal loss is:

     $L_{FL} = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)$   (9)

     where $p_t$ is the probability the model predicts for the sample's class and reflects how easy the sample is to classify: when $p_t$ is large the sample is easy to classify, and when $p_t$ is small the sample is hard to classify; $\alpha_t$ is the balance parameter, representing the adjusted weight of the corresponding positive sample when the loss is calculated; $\gamma$ is the attenuation parameter and is dominant — when setting the parameter values, $\alpha_t$ should decrease correspondingly as $\gamma$ increases.

     B3. For the class-imbalance problem, adaptive class weights $W_i$ are designed from the total number of samples in the expression library and the number of samples in each class, calculated by the following formula:

     $W_i = \dfrac{N}{C \cdot n_i}$   (10)

     where $N$ is the total number of samples in the expression library, $C$ is the number of classes of the expression database, $i = 1, \dots, C$, $W_i$ is the weight of the $i$-th class, and $n_i$ is the number of samples of the $i$-th class.

     C3. The class weights $W_i$ obtained by formula (10) are used to weight the multi-class focal loss, giving the weighted focal loss $L_{WFL}$:

     $L_{WFL} = -W_i \, \alpha_t (1 - p_t)^{\gamma} \log(p_t)$   (11)

     D3. A center loss (Center Loss) is cascaded to reduce the distance between samples of the same class and increase the distance between samples of different classes; the center loss $L_C$ is defined as:

     $L_C = \dfrac{1}{2} \sum_{i=1}^{m} \lVert x_i - c_{y_i} \rVert_2^2$   (12)

     where $c_{y_i}$ represents the center of the $y_i$-th class, $m$ is the number of samples, and $x_i$ is the feature vector of the $i$-th sample.

     E3. The deep neural network is trained under the joint supervision of the obtained weighted focal loss and the center loss; the final loss function $L$ is defined as:

     $L = L_{WFL} + \lambda L_C$   (13)

     where $\lambda$ balances the loss functions $L_{WFL}$ and $L_C$, further optimizing the recognition result.
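The adaptive class weighting and weighted multi-class focal loss of step 103 can be sketched in NumPy. Because the claim's formulas survive only as prose in this extraction, the weight form `W_i = N / (C * n_i)` and all function names below are illustrative assumptions, not the patent's verbatim definitions:

```python
import numpy as np

def adaptive_class_weights(counts):
    """Per-class weights from sample counts; assumed form W_i = N / (C * n_i),
    so under-represented classes receive proportionally larger weights."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def weighted_focal_loss(probs, labels, weights, alpha=1.0, gamma=2.0):
    """Class-weighted multi-class focal loss.

    probs: (batch, C) softmax outputs; labels: integer class indices.
    Hard samples (low p_t) get a larger (1 - p_t)**gamma modulation, and each
    sample is additionally scaled by its class weight.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    p_t = probs[np.arange(len(labels)), labels]  # predicted prob of true class
    w = np.asarray(weights)[labels]              # per-sample class weight
    return float(np.mean(-w * alpha * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With class counts [10, 10, 20], the minority classes get weight 4/3 and the majority class 2/3, which is the balancing behavior step 103 describes.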
  2. The feature-enhanced lightweight network FGNet facial expression recognition method according to claim 1, wherein the process of importing the FGNet network model into a facial expression recognition system to implement facial expression recognition is as follows: A. First, Python is used to build a facial expression recognition system based on the deep convolutional neural network; the system program is implemented with the Keras, OpenCV and PyQt5 libraries, and controls such as model selection, real-time capture, image opening, time consumed and recognition result are created in the UI interface. B. The operation behaviors of the corresponding UI controls are mapped to the corresponding functions to execute the operations. C. Clicking the "select model" button selects the model file (the FGNet model), after which recognition is performed with that model; clicking "real-time camera" opens the camera to recognize facial expressions in the live picture, or clicking the "select picture" button selects a face picture for expression recognition; after the system recognizes the expression, it displays the time consumed by the recognition, the recognition result with the highest probability, and the recognition probability of each class of expression.
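Step B of claim 2 (mapping UI control actions to the functions that execute them) can be sketched as a plain dispatch table, independent of any GUI toolkit. All names and return strings below are hypothetical stand-ins; in the real system, PyQt5 signal/slot connections would play this role:

```python
# Hypothetical handler functions standing in for the real system's actions.
def select_model():
    return "FGNet model file loaded"

def start_camera():
    return "recognizing expressions from live camera"

def open_picture():
    return "recognizing expression in selected picture"

# Map each UI control name to the function it triggers (claim 2, step B).
CONTROL_HANDLERS = {
    "select model": select_model,
    "real-time camera": start_camera,
    "select picture": open_picture,
}

def on_control_activated(name):
    """Route a UI control activation to its mapped function."""
    return CONTROL_HANDLERS[name]()
```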

Description

Feature-enhanced lightweight network FGNet facial expression recognition method

Technical Field

The invention belongs to the technical field of computer pattern recognition, and particularly relates to a facial expression recognition method.

Background

Facial expression is the richest resource and the most effective way for people to express emotion in non-verbal communication, and it plays a very important role in interpersonal communication. Expressions contain abundant human behavior information and are a main carrier of emotion; a person's subtle emotional reactions and corresponding psychological states can be conveyed through facial expression, which shows the importance of expression information in communication between people. As expression information grows in importance, facial expression recognition technology has attracted increasing attention and is a current research hot spot. Facial expression recognition is the process of facial expression image acquisition, expression image preprocessing, expression feature extraction and expression classification performed by a computer: the computer analyzes a person's expression information to infer their psychological state, ultimately achieving intelligent interaction between human and machine. Expression recognition technology is part of affective computing research; it is a highly challenging multi-disciplinary task intersecting psychology, physiology, computer vision, biometric recognition, affective computing, artificial psychology theory and the like, and it has important roles and significance for natural and harmonious human-machine interaction, distance education, safe driving and so on.
Unlike other fields of image vision, the facial expression recognition process faces several difficulties. First, facial expression features are very fine: subtle changes of the facial features may represent distinctly different emotions. Second, the duration of a facial expression is short, often less than one second, which places high real-time requirements on facial expression detection. Furthermore, expression data differ greatly between different faces and are weakly robust to noise from illumination and image background. These characteristics bring many challenges to the development and application of facial expression recognition. A face picture carries a large amount of information, and the expressions a person makes at different moments in a video sequence are not exactly the same, so effective information such as texture features and facial-feature cues must be extracted during expression recognition. Extracting this effective information is of great significance for improving recognition speed and accuracy. Expression feature extraction is the most important part of the expression recognition research process, and the robustness and completeness of the extracted features have a decisive influence on the final recognition result. In recent years, a series of feature extraction methods have been proposed, such as the local binary pattern, the Gabor wavelet transform and principal component analysis (PCA), which are widely used in many kinds of image processing. After the feature extraction work is completed, the expressions need to be classified according to the extracted features.
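Of the hand-crafted descriptors mentioned above, the local binary pattern is simple enough to sketch. The minimal 3×3 variant below is illustrative background only, not the patent's method, and the bit ordering is an arbitrary choice:

```python
import numpy as np

def lbp_3x3(img):
    """Basic 3x3 local binary pattern for a grayscale image (2-D array).

    Each interior pixel is encoded as an 8-bit number: neighbours that are
    >= the centre contribute a 1 bit, read clockwise from the top-left.
    """
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = img[1:h - 1, 1:w - 1]
    # Clockwise neighbour offsets starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= (neigh >= center).astype(np.uint8) << (7 - bit)
    return out
```

On a flat region every neighbour equals the centre, so the code is 255; around an isolated bright pixel no neighbour reaches the centre, giving 0 — the codes capture local texture, which is exactly why LBP histograms were popular expression features.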
As researchers have continued to explore, a series of feature classification methods, such as Bayesian classifiers, support vector machines and K-nearest-neighbor algorithms, have come into wide use. However, these conventional feature extraction methods mainly rely on researchers manually extracting facial image features, which not only consumes much time and effort but is also limited by human factors and cannot fully describe facial expressions. In addition, traditional classification methods require several prior conditions, can misjudge on class-imbalanced problems, and are sensitive to missing data. Today, with the continuous emergence of ever more powerful computers and large data sets, deep learning algorithms have developed vigorously. Compared with traditional methods, a deep learning algorithm integrates the two processes of feature extraction and classification, reducing the operation flow; it can automatically extract the intrinsic features of sample data, has strong feature extraction capability, can extract more abstract and more essential image features without manual involvement of researchers, has better recognition effect and robustness, and performs excellently in various computer-vision competitions. The convolutional neural network is used