CN-122004860-A - Psychological stress assessment method and system based on multi-level fusion of emotion characteristics
Abstract
The invention discloses a psychological stress assessment method and system based on multi-level fusion of emotion features, relating to the technical fields of image processing, artificial intelligence and deep learning. The method comprises the following steps: obtaining emotion pupil wave and emotion facial image samples; acquiring facial expression features by combining dilated convolution with channel attention and spatial attention; decomposing and recombining the emotion pupil waves by variational mode decomposition and extracting pupil time-series features by deep learning; introducing an emotion cross-attention mechanism to achieve multi-modal, multi-level fusion of the emotion features; and constructing a ridge regression model based on machine learning to complete the psychological stress assessment. The method and system synchronously capture dynamic changes of the pupils and facial expressions and use multi-level fusion to deeply couple the pupil waves with the facial features, thereby reducing subjective bias and improving the accuracy and stability of psychological stress assessment.
Inventors
- LI MI
- Dai Zongan
- LI XIN
- YANG JIAN
Assignees
- 北京工业大学 (Beijing University of Technology)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-03
Claims (10)
- 1. A psychological stress assessment method based on multi-level fusion of emotion characteristics, characterized by comprising the following steps: S1, collecting pupil waves and facial images of a subject in different emotion states to form emotion pupil wave and emotion facial image samples; S2, acquiring facial expression features by combining dilated convolution with channel attention and spatial attention mechanisms; S3, decomposing and recombining the emotion pupil waves by variational mode decomposition (VMD) and extracting pupil spatio-temporal features by deep learning; S4, introducing an emotion cross-attention mechanism to perform multi-modal, multi-level fusion of the emotion features; S5, building a ridge regression model based on machine learning to complete the psychological stress assessment.
- 2. The psychological stress assessment method based on multi-level fusion of emotion characteristics according to claim 1, wherein S1 specifically comprises: S11, presetting an emotion video library comprising calm, sad, happy, fear and tension sub-libraries; S12, randomly selecting one emotion video from each sub-library; S13, integrating the selected emotion videos; S14, playing the integrated emotion videos through a player for the subject to watch; S15, synchronously acquiring expression video information and pupil wave information of the subject with a camera.
- 3. The psychological stress assessment method based on multi-level fusion of emotion characteristics according to claim 1, wherein S2 specifically comprises: S21, preprocessing the expression video information, the preprocessing comprising downsampling, face detection and registration; S22, constructing a dilated convolution network to capture global facial expression information and extract dilated convolution features; S23, applying a channel attention mechanism to the dilated convolution output features, giving higher weights to channels corresponding to expression-related regions and suppressing interference from channels of irrelevant regions; S24, generating a spatial attention weight map, applying a spatial attention mechanism and outputting facial expression feature vectors.
- 4. The psychological stress assessment method based on multi-level fusion of emotional characteristics according to claim 3, wherein S21 specifically comprises: S211, downsampling, namely sampling one frame in every ten starting from the first frame of the expression video, reducing the original video length by 90%; S212, detecting the positions of facial key points, including the eyes, nose and mouth, in each frame of the downsampled video, cropping each image to 224 × 224 according to the key-point coordinates to reduce background noise, and aligning all images with the first frame of the video.
- 5. The psychological stress assessment method based on multi-level fusion of emotion features according to claim 3, wherein in S23 a squeeze-and-excitation (SE) module processes the dilated convolution output features, learns the weight of each channel, assigns higher weights to the channels corresponding to the eye and lip regions, and suppresses interference from channels of the background and hair regions.
- 6. The psychological stress assessment method based on multi-level fusion of emotional characteristics according to claim 1, wherein in S3 the number of decomposition modes is determined from the center frequencies, mode selection and recombination is performed using the correlation coefficient and information entropy so that noise modes are removed and information modes are retained, and the pupil spatio-temporal features are extracted by combining a Mamba model with a self-attention Transformer model.
- 7. The psychological stress assessment method based on multi-level fusion of emotion characteristics according to claim 1, wherein S4 specifically comprises: S41, cross-fusing features of different emotions within the same modality using emotion cross-attention to acquire feature relations among different emotions; S42, cross-fusing the expression modality features and pupil wave modality features using cross-attention to acquire feature relations between the different modalities.
- 8. The psychological stress assessment method based on multi-level fusion of emotional characteristics according to claim 7, wherein S41 specifically comprises: S411, computing the attention weights between different emotions through a weight matrix: Attention Weight = Softmax(QK^T / √d_k); where Q is the Query matrix with dimension n × d_k, n being the number of target positions to be attended; K is the Key matrix with dimension m × d_k; V is the Value matrix with dimension m × d_v; d_k is the dimension of the key vectors and d_v the dimension of the value vectors; T denotes matrix transposition; QK^T represents the pairwise association between the emotions; the Softmax is normalized by rows so that each row sums to 1, representing the attention weight of each emotion over the other emotions; S412, weighting and summing the value vectors with the attention weights to generate the output: Z = Attention Weight · V, where Z is the final output vector.
- 9. The psychological stress assessment method based on multi-level fusion of emotion characteristics according to claim 1, wherein S5 specifically comprises: S51, constructing a training data set, namely collecting expression video data and pupil wave data of the subjects together with DASS-21 stress scale data to generate training samples; S52, sample division and parameter initialization, namely dividing the samples into a training set, a validation set and a test set, setting the candidate range of the ridge regression regularization parameter λ to [0.01, 0.1, 1, 10, 100], and adopting least squares as the basic optimization algorithm; S53, model training and parameter optimization, namely training the ridge regression model on the training set by minimizing a loss function with an L2 regularization term, evaluating the models corresponding to different λ values on the validation set, computing the mean absolute error (MAE) on the validation set, and selecting the λ with the smallest MAE as the final model parameter; S54, model evaluation and psychological stress estimation, namely inputting the test set into the trained ridge regression model and adopting MAE, root mean square error (RMSE) and the coefficient of determination R² as evaluation indexes to realize quantitative assessment of psychological stress.
- 10. A system applying the psychological stress assessment method based on multi-level fusion of emotional characteristics according to any of the preceding claims 1 to 8, comprising: a data acquisition module for acquiring a plurality of emotion categories corresponding to the target emotion indexes and respectively collecting pupil waves and facial images of the subject in each emotion state, comprising an expression feature extraction module and a pupil wave feature extraction module; an emotion feature extraction module for acquiring facial expression features by combining dilated convolution with channel attention and spatial attention mechanisms, comprising a channel attention module and a spatial attention module; a time-series feature extraction module for decomposing and recombining the emotion pupil waves by variational mode decomposition (VMD) and extracting pupil spatio-temporal features by deep learning; a multi-level feature fusion module for introducing an emotion cross-attention mechanism to realize multi-modal, multi-level fusion of the emotion features, comprising a fusion module for features of different emotions within the same modality and a fusion module for features of the same emotion across different modalities; and a stress assessment module for constructing a ridge regression model based on machine learning to complete the psychological stress assessment.
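The preprocessing in claim 4 (keep one frame in ten, crop 224 × 224 around facial key points) can be sketched as below. The `preprocess_video` helper, the frame shapes and the use of a single reference key point per frame are illustrative assumptions; the patent's face landmark detector is external and not reproduced here.

```python
import numpy as np

def preprocess_video(frames, key_points, crop=224, step=10):
    """Temporal downsampling and key-point-centred cropping (S21 sketch).

    frames: (T, H, W, 3) expression video; key_points: (T, 2) (row, col)
    of one reference facial key point per frame (assumed to come from an
    external face-landmark detector). Keeps every `step`-th frame starting
    from frame 1 (a 90% length reduction for step=10), then crops a
    crop x crop window around the key point.
    """
    out = []
    for i in range(0, len(frames), step):
        r, c = key_points[i]
        h, w = frames[i].shape[:2]
        # Clamp the window so the crop stays fully inside the frame.
        r0 = int(np.clip(r - crop // 2, 0, h - crop))
        c0 = int(np.clip(c - crop // 2, 0, w - crop))
        out.append(frames[i][r0:r0 + crop, c0:c0 + crop])
    return np.stack(out)

# Toy 100-frame video: 100 frames -> 10 cropped frames.
frames = np.zeros((100, 480, 640, 3), dtype=np.uint8)
centers = np.tile([[240, 320]], (100, 1))
clips = preprocess_video(frames, centers)
print(clips.shape)  # (10, 224, 224, 3)
```

Alignment with the first frame (registration) would follow the cropping step; it depends on the landmark detector and is omitted here.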
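The squeeze-and-excitation channel attention of claims 3 and 5 can be sketched as a global average pool followed by a small FC-ReLU-FC-sigmoid bottleneck that rescales each channel. The weight shapes and reduction ratio below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, w1, w2):
    """Squeeze-and-excitation channel reweighting (S23 sketch).

    feat: (C, H, W) feature map from the dilated-convolution backbone.
    w1: (C//r, C) squeeze weights, w2: (C, C//r) excitation weights.
    Each channel is scaled by a learned weight in (0, 1), so informative
    channels (e.g. eye/lip regions) can be emphasized and irrelevant
    ones (background, hair) suppressed.
    """
    z = feat.mean(axis=(1, 2))               # squeeze: global average pool -> (C,)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))  # excitation: FC-ReLU-FC-sigmoid -> (C,)
    return feat * s[:, None, None]           # scale each channel

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 8, 8))
w1 = rng.standard_normal((4, 16)) * 0.1      # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((16, 4)) * 0.1
out = se_block(feat, w1, w2)
print(out.shape)  # (16, 8, 8)
```

Because the gate is a sigmoid, every output channel has magnitude no larger than its input channel; training would learn `w1`, `w2` end to end.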
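The emotion cross-attention of claims 7 and 8 is standard scaled dot-product attention; a minimal numpy sketch follows. The matrix sizes (five emotion feature vectors, d_k = 8, d_v = 16) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(Q, K, V):
    """Attention Weight = Softmax(Q K^T / sqrt(d_k)); Z = Attention Weight . V

    Q: (n, d_k) queries, K: (m, d_k) keys, V: (m, d_v) values.
    Each row of the weight matrix sums to 1, i.e. one emotion's (or
    modality's) attention distribution over the others.
    """
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)  # (n, m)
    return weights @ V, weights

# Toy example: 5 emotion feature vectors attending over 5 emotions.
rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 16))
Z, W = cross_attention(Q, K, V)
print(Z.shape, W.shape)  # (5, 16) (5, 5)
```

For the within-modality fusion of S41, Q, K and V would all come from the same modality's per-emotion features; for the cross-modal fusion of S42, Q would come from one modality and K, V from the other.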
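The ridge regression training of claim 9 can be sketched with the closed-form solution and the λ grid [0.01, 0.1, 1, 10, 100], selecting λ by validation MAE and reporting MAE, RMSE and R² on the test set. The synthetic features and labels below stand in for the fused emotion features and DASS-21 scores, which are assumptions of this sketch.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mae(y_true, y_pred):
    return np.abs(y_true - y_pred).mean()

# Synthetic stand-in for fused emotion features (X) and DASS-21 scores (y).
rng = np.random.default_rng(0)
X = rng.standard_normal((120, 10))
w_true = rng.standard_normal(10)
y = X @ w_true + 0.1 * rng.standard_normal(120)
X_tr, y_tr = X[:80], y[:80]          # training set
X_va, y_va = X[80:100], y[80:100]    # validation set
X_te, y_te = X[100:], y[100:]        # test set

# S52/S53: grid search over the candidate regularization strengths,
# keeping the lambda with the smallest validation MAE.
lambdas = [0.01, 0.1, 1, 10, 100]
best_lam = min(lambdas, key=lambda l: mae(y_va, X_va @ ridge_fit(X_tr, y_tr, l)))

# S54: evaluate the final model on the held-out test set.
w = ridge_fit(X_tr, y_tr, best_lam)
pred = X_te @ w
rmse = np.sqrt(((y_te - pred) ** 2).mean())
r2 = 1 - ((y_te - pred) ** 2).sum() / ((y_te - y_te.mean()) ** 2).sum()
print(best_lam, mae(y_te, pred), rmse, r2)
```

In practice the same split-and-grid-search procedure would run on real feature vectors; an off-the-shelf implementation such as scikit-learn's Ridge would also fit the claim.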
Description
Psychological stress assessment method and system based on multi-level fusion of emotion characteristics

Technical Field

The invention relates to the technical fields of image processing, artificial intelligence and deep learning, and in particular to a psychological stress assessment method and system based on multi-level fusion of emotion characteristics.

Background

In modern society, stress has become a ubiquitous phenomenon. The rapid pace and numerous demands of modern life have turned mental stress into an increasingly common mental health problem involving reactions of both body and mind. Stress can cause severe illnesses such as depression, addiction, stroke and heart attack, and excessive stress can also affect brain structure and function. Beyond its impact on physical and mental health, mental stress can lead to significant losses in industry, since individuals under mental stress are more likely to make mistakes when performing daily work tasks. In view of these severe consequences, early detection, clinical intervention and disease prevention are particularly important. Stress, however, is a subjective, multidimensional phenomenon that is difficult to evaluate comprehensively by objective means. Prior-art psychological stress assessment methods that rely only on self-report scales or static images suffer from strong subjectivity, low ecological validity, inability to capture dynamic stress processes, and isolation of the information of each modality.

Disclosure of Invention

The invention aims to provide a psychological stress assessment method and system based on multi-level fusion of emotion characteristics that solve the problems identified above.
In order to achieve the above purpose, the invention provides a psychological stress assessment method based on multi-level fusion of emotion characteristics, comprising the following steps: S1, collecting pupil waves and facial images of a subject in different emotion states to form emotion pupil wave and emotion facial image samples; S2, acquiring facial expression features by combining dilated convolution with channel attention and spatial attention mechanisms; S3, decomposing and recombining the emotion pupil waves by variational mode decomposition (VMD) and extracting pupil spatio-temporal features by deep learning; S4, introducing an emotion cross-attention mechanism to perform multi-modal, multi-level fusion of the emotion features; S5, building a ridge regression model based on machine learning to complete the psychological stress assessment. Preferably, S1 specifically comprises: S11, presetting an emotion video library comprising calm, sad, happy, fear and tension sub-libraries; S12, randomly selecting one emotion video from each sub-library; S13, integrating the selected emotion videos; S14, playing the integrated emotion videos through a player for the subject to watch; S15, synchronously acquiring expression video information and pupil wave information of the subject with a camera.
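The mode selection and recombination of step S3 (keep "information modes" by correlation coefficient and information entropy, discard noise modes) can be sketched as follows. The sketch assumes the modes have already been produced by an external VMD implementation with the mode count chosen from the center frequencies; the thresholds and the use of normalized spectral entropy as the information-entropy measure are illustrative assumptions, not values from the patent.

```python
import numpy as np

def select_and_recombine(signal, modes, corr_thresh=0.3, ent_thresh=0.8):
    """Keep informative VMD modes and sum them back into a clean signal.

    signal: (T,) raw pupil wave; modes: (K, T) decomposed modes
    (assumed to come from an external VMD implementation). A mode is
    retained when it correlates with the raw signal above `corr_thresh`
    and its normalized spectral entropy is below `ent_thresh` (noise
    has a flat spectrum, hence entropy near 1).
    """
    kept = []
    for m in modes:
        corr = np.corrcoef(signal, m)[0, 1]
        p = np.abs(np.fft.rfft(m)) ** 2
        p = p / p.sum()
        # Normalized spectral entropy in [0, 1]: high -> noise-like.
        ent = -(p * np.log(p + 1e-12)).sum() / np.log(len(p))
        if abs(corr) > corr_thresh and ent < ent_thresh:
            kept.append(m)
    return np.sum(kept, axis=0) if kept else np.zeros_like(signal)

# Toy check: a clean 5 Hz tone mode is kept, a white-noise mode dropped.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 512, endpoint=False)
tone = np.sin(2 * np.pi * 5 * t)
noise = 0.2 * rng.standard_normal(512)
signal = tone + noise
recombined = select_and_recombine(signal, np.stack([tone, noise]))
```

The recombined signal would then feed the Mamba + Transformer feature extractor of claim 6, which is not sketched here.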
Preferably, S2 specifically comprises: S21, preprocessing the expression video information, the preprocessing comprising downsampling, face detection and registration; S22, constructing a dilated convolution network to capture global facial expression information and extract dilated convolution features; S23, applying a channel attention mechanism to the dilated convolution output features, giving higher weights to channels corresponding to expression-related regions and suppressing interference from channels of irrelevant regions; S24, generating a spatial attention weight map, applying a spatial attention mechanism and outputting facial expression feature vectors. Preferably, S21 specifically comprises: S211, downsampling, namely sampling one frame in every ten starting from the first frame of the expression video, reducing the original video length by 90%; S212, detecting the positions of facial key points, including the eyes, nose and mouth, in each frame of the downsampled video, cropping each image to 224 × 224 according to the key-point coordinates to reduce background noise, and aligning all images with the first frame of the video. Preferably, in S23 a squeeze-and-excitation (SE) module processes the dilated convolution output features, learns the weight of each channel, assigns higher weights to the channels corresponding to the eye and lip regions, and suppresses interference from channels of the background and hair regions. Preferably, the S3