CN-122020459-A - Multi-mode cognitive load detection method
Abstract
The application discloses a multi-modal cognitive load detection method comprising the steps of: encoding target multi-modal data independently to obtain preliminarily encoded features for each modality; dynamically learning the weight of each modality with a channel attention mechanism based on those features, and reconstructing the multi-modal features via a broadcast Hadamard product; feeding the reconstructed feature vectors into a load-task-sensitive encoder and an individual-sensitive encoder respectively to generate task-sensitive and individual-sensitive features, from which the final features are formed; and classifying cognitive load levels with a multi-layer perceptron based on the final features. The application improves the model's ability to represent multi-modal data, and thereby the performance of cognitive load detection models based on multi-modal data.
Inventors
- LIU YONGJIN
- YANG PEI
Assignees
- Tsinghua University (清华大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260120
Claims (10)
- 1. A method for multi-modal cognitive load detection, comprising: performing independent encoding on the target multi-modal data to obtain preliminarily encoded features for each modality; based on the preliminarily encoded modality features, dynamically learning the weight of each modality with a channel attention mechanism, and realizing the dynamic reconstruction of the multi-modal features by a broadcast Hadamard product; inputting the reconstructed feature vectors into a load-task-sensitive encoder and an individual-sensitive encoder respectively, generating task-sensitive features and individual-sensitive features, and generating the final features; and classifying the cognitive load level with a multi-layer perceptron based on the final features.
- 2. The method of claim 1, further comprising, prior to the independent encoding of the target multi-modal data: dividing each modality in the CL-Drive dataset into non-overlapping samples using a 10-second window, where the corresponding load-level label is constructed from the subject's subjective load score for the corresponding modality signal, under either a two-class or a three-class load-level task; screening the segmented modality data and discarding samples containing 2 seconds of continuously missing or invalid data, to obtain valid samples; partitioning all valid samples in a 10-fold cross-validation manner, taking each fold in turn as the test set with the corresponding remaining 9 folds forming the training set; and using the training set as the target multi-modal data for independent encoding.
- 3. The method according to claim 2, wherein performing independent encoding on the target multi-modal data to obtain the preliminarily encoded modality features comprises: encoding the target multi-modal data with a modality encoding module, the module comprising two parts, a modality encoder for the preliminary encoding of each modality and a modality-attention dynamic fusion part; the modality encoder consists of 3 identically structured convolution blocks and a global average pooling layer, each convolution block comprising, in sequence, a one-dimensional convolution layer, a batch normalization layer, a rectified linear unit, a one-dimensional convolution layer, a batch normalization layer, a rectified linear unit, and a max pooling layer; the encoding process is expressed as f_M = E_M(x_M), where x_M denotes the input data of modality M, E_M the encoder of modality M, and f_M the encoded feature of the corresponding input.
- 4. A method according to claim 3, wherein dynamically learning the weights of the modalities using a channel attention mechanism based on the preliminarily encoded modality features and realizing the dynamic reconstruction of the multi-modal features by a broadcast Hadamard product comprises: taking the preliminarily encoded modality features each as one feature channel to form a feature F = Stack(f_EEG, f_ECG, f_EDA, f_Gaze), where Stack() is a stacking function forming multi-channel data from its inputs as different channels, f_EEG, f_ECG, f_EDA and f_Gaze are the encoded features of the EEG, ECG, EDA and gaze modality data, F_1, F_2, F_3, F_4 denote the 4 stacked channels corresponding to them respectively, and F denotes the stacked four-modality feature; inputting the feature F into a modality attention module to realize modality-wise dynamic reconstruction, the module comprising dynamic weight learning and weight-based feature reconstruction, specifically: s_i = (1/N) Σ_{n=1}^{N} F_i[n] (i = 1, 2, 3, 4), w = σ(LN(δ(LN(Flatten(s))))), F' = w ⊙ F, where N is the dimension of the feature after modality encoding, ⊙ is the broadcast Hadamard product, Flatten() denotes feature flattening, σ denotes the sigmoid activation function, LN() denotes a linear layer, δ is the rectified linear unit activation function, F_i denotes channel i (i = 1, 2, 3, 4), s_i denotes the mean of F_i (i = 1, 2, 3, 4), w holds the learned weights of the four modalities, and F' is the reconstructed feature vector.
- 5. The method of claim 4, wherein inputting the reconstructed feature vectors into the load-task-sensitive encoder and the individual-sensitive encoder respectively, generating task-sensitive features and individual-sensitive features, and generating the final features comprises: encoding the reconstructed feature vector F' separately with the load-task-sensitive encoder E_task and the individual-sensitive encoder E_subj, realizing the decomposition of the input feature into a load-task-sensitive part and an individual-sensitive part, the encoders E_task and E_subj being perceptrons with one hidden layer, the encoding process being expressed as h_task = E_task(F') and h_subj = E_subj(F'), where F' denotes the feature after the preceding encoding stage, and h_task and h_subj are the task-sensitive and subject-sensitive features respectively; and, based on the task-sensitive feature and the subject-sensitive feature, generating the final feature h = Concat(h_task, h_subj), where h is the final feature.
- 6. The method of claim 5, wherein classifying the cognitive load level by a multi-layer perceptron based on the final features comprises: inputting the final feature h into a cognitive load level classification module to realize load-level classification, the module being a multi-layer perceptron comprising two hidden layers, with a Dropout layer and a ReLU activation layer arranged between the two hidden layers; the cognitive load level class is computed as y_hat = MLP(h), where y_hat is the classification result.
- 7. The method according to claim 6, wherein the loss function of the multi-modal cognitive load detection method is specifically L = L_cls + λ_1 L_task + λ_2 L_subj + λ_3 L_rec, where L_cls is the classification loss term, L_task and L_subj are the load-task-sensitive loss term and the individual-sensitive loss term respectively, L_rec is the reconstruction loss term, and λ_1, λ_2, λ_3 are weight coefficients.
- 8. The method as recited in claim 7, further comprising: the classification loss term is a cross-entropy loss function used to optimize the model for cognitive load level classification; the load-task-sensitive loss term and the individual-sensitive loss term respectively optimize towards making samples sharing the same load-level-inducing-task label or the same individual label as close as possible, and otherwise as far apart as possible, improving the model's perception of the load-level-inducing tasks and of individuals; the load-level-inducing-task labels use the load-level class labels of the samples, and the individual labels correspond to the different subject IDs. The load-task-sensitive loss L_task is defined as L_task = -(1/n) Σ_k (1/|P(k)|) Σ_{p∈P(k)} log[exp(sim(z_k, z_p)/τ) / Σ_{j≠k} exp(sim(z_k, z_j)/τ)], where sim(·,·) is the similarity measure function, τ is the temperature parameter, the positive set P(k) = {p ≠ k : 1[y_p = y_k]} is selected by the indicator function 1[·], z_k is the vector of the k-th sample after encoding by the load-task-sensitive encoder, and y_k is the load-level-inducing-task label of the k-th sample; the individual-sensitive loss term L_subj is defined analogously, with the individual labels in place of the task labels. The reconstruction loss L_rec reconstructs the input feature F' from the load-task-sensitive and individual-sensitive codes obtained by the decomposition, and is defined as the mean square error L_rec = MSE(F_hat, F'), where the reconstructed feature vector F_hat = D(Concat(h_task, h_subj)) is obtained through a perceptron D comprising a linear layer, and MSE() denotes the mean square error function.
- 9. A multi-modal cognitive load detection device, comprising: a first module for independently encoding the target multi-modal data to obtain preliminarily encoded features for each modality; a second module for dynamically learning the weight of each modality with a channel attention mechanism based on the preliminarily encoded modality features and realizing the dynamic reconstruction of the multi-modal features by a broadcast Hadamard product; a third module for inputting the reconstructed feature vectors into the load-task-sensitive encoder and the individual-sensitive encoder respectively, generating task-sensitive and individual-sensitive features, and generating the final features; and a fourth module configured to classify the cognitive load level with the multi-layer perceptron based on the final features.
- 10. An electronic device comprising a processor and a memory, wherein the processor implements the method according to any one of claims 1-8 by reading executable program code stored in the memory and running a program corresponding to the executable program code.
Description
Multi-mode cognitive load detection method

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a multi-modal cognitive load detection method.

Background

Existing cognitive load detection methods mainly comprise subjective measurement and objective measurement. Subjective measurement collects a person's experienced cognitive load through rating scales; it is low-cost and easy to administer, but it cannot provide immediate feedback and is strongly affected by subjective bias. Objective measurement is mainly based on physiological, behavioral, and similar signals, using machine learning to map the input signals to a cognitive load level and thereby predict it. In recent years, with the development and popularization of wearable devices, objective cognitive load measurement has attracted considerable research attention. Existing objective methods can be divided, by the number of supported data modalities, into single-modality and multi-modality methods. Single-modality methods rely on one type of data, such as electroencephalography (EEG), photoplethysmography (PPG), electrocardiography (ECG), or galvanic skin response (EDA/GSR). Although single-modality methods are easier to implement and have made some progress, different modalities each provide useful cues for cognitive load detection, and a single-modality method cannot exploit the complementary information across modalities.
In recent years, multi-modality-based cognitive load detection has drawn attention, but conventional multi-modal methods generally adopt simple fusion strategies such as feature concatenation, which limits the model's capacity to represent multi-modal data. Achieving an effective representation of multi-modal data, and thereby improving the performance of objective multi-modal cognitive load measurement, remains challenging.

Disclosure of Invention

The main object of the invention is to provide a multi-modal cognitive load detection method that addresses the limitations of the prior art in multi-modal data representation and in model performance on cognitive load detection tasks. Another object of the present invention is to provide a multi-modal cognitive load detection device. A third object of the present invention is to provide an electronic device. To achieve the above objects, an embodiment of the first aspect of the present invention provides a multi-modal cognitive load detection method, comprising: performing independent encoding on the target multi-modal data to obtain preliminarily encoded features for each modality; based on the preliminarily encoded modality features, dynamically learning the weight of each modality with a channel attention mechanism, and realizing the dynamic reconstruction of the multi-modal features by a broadcast Hadamard product; inputting the reconstructed feature vectors into a load-task-sensitive encoder and an individual-sensitive encoder respectively, generating task-sensitive features and individual-sensitive features, and generating the final features; and classifying the cognitive load level with a multi-layer perceptron based on the final features.
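The four steps summarized above can be strung together as follows; every callable here is a stand-in supplied by the caller (the per-modality encoders, attention module, dual encoders, and classifier are placeholders, not the patented implementations):

```python
import numpy as np

def pipeline(modalities, encoders, attention, enc_task, enc_subj, classifier):
    """End-to-end flow of the four claimed steps, with caller-supplied parts."""
    feats = np.stack([enc(x) for enc, x in zip(encoders, modalities)])  # step 1: per-modality encoding
    f = attention(feats)                                                # step 2: attention-weighted reconstruction
    h = np.concatenate([enc_task(f.ravel()), enc_subj(f.ravel())])      # step 3: dual-encoder decomposition
    return classifier(h)                                                # step 4: load-level classification
```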
Optionally, before the independent encoding of the target multi-modal data, the method further comprises: dividing each modality in the CL-Drive dataset into non-overlapping samples using a 10-second window, where the corresponding load-level label is constructed from the subject's subjective load score for the corresponding modality signal, under either a two-class or a three-class load-level task; screening the segmented modality data and discarding samples containing 2 seconds of continuously missing or invalid data, to obtain valid samples; partitioning all valid samples in a 10-fold cross-validation manner, taking each fold in turn as the test set with the corresponding remaining 9 folds forming the training set; and using the training set as the target multi-modal data for independent encoding. Optionally, performing independent encoding on the target multi-modal data to obtain the preliminarily encoded modality features comprises: encoding the target multi-modal data with a modality encoding module, the module comprising two parts, a modality encoder for the preliminary encoding of each modality and a modality-attention dynamic fusion part; the modality encoder consists of 3 identically structured convolution blocks and a global average pool