CN-116304815-B - Motor imagery electroencephalogram signal classification method based on self-attention mechanism and parallel convolution

CN116304815B

Abstract

A motor imagery electroencephalogram (EEG) signal classification method based on a multi-head self-attention mechanism and parallel convolution, belonging to the field of computer software. To address the difficulty of feature extraction caused by the low signal-to-noise ratio of EEG signals, an improved network model based on EEGNet, called EEG-MATCNet for short, is proposed. First, a parallel convolution layer performs preliminary feature extraction on the raw EEG signal, with convolution kernels of different scales extracting temporal features at different scales. Meanwhile, a multi-head self-attention mechanism computes attention weights between the electrodes, so that spatial features are better extracted during network training. In addition, a temporal convolutional network enlarges the receptive field of the convolution kernels, allowing the model to extract higher-level temporal features. Experiments show that the proposed classification method more effectively improves feature extraction and classification performance for motor imagery EEG signals.

Inventors

  • WANG DAN
  • ZHOU HAO
  • CHEN JIAMING
  • XU MENG

Assignees

  • Beijing University of Technology (北京工业大学)

Dates

Publication Date
2026-05-05
Application Date
2023-03-07

Claims (4)

  1. A motor imagery electroencephalogram signal classification method based on a self-attention mechanism and parallel convolution, characterized by comprising the following steps:
     Step 1: preprocess the data, namely apply band-pass filtering to the motor imagery EEG signals with a band-pass filter, and apply exponential moving-average standardization to the filtered signals.
     Step 2: construct the EEG-MATCNet model, using a parallel convolution layer in place of the ordinary convolution layer of the EEGNet model to extract multi-scale temporal features, adding a spatial self-attention mechanism so that the network better extracts spatial features, and adding a temporal convolutional network to extract higher-level temporal features.
     Step 3: input the training set and validation set from Step 1 into EEG-MATCNet for training.
     Step 4: input the test set from Step 1 into the model trained in Step 3 for classification, and evaluate the classification accuracy.
     Step 2 is specifically as follows. The structure of EEG-MATCNet mainly comprises four parts: a parallel convolution layer, a self-attention layer, a temporal convolutional network layer, and a fully connected layer; the model is built with PyTorch. Each part is described in detail below:
     (1) Parallel convolution layer. The optimal parallel structure was obtained through repeated experiments: branch 1 uses 2 convolution kernels of size (1, 16) with stride 1, branch 2 uses 4 kernels of size (1, 32) with stride 1, and branch 3 uses 8 kernels of size (1, 64) with stride 1 for feature extraction. The padding mode of all 3 branches is set to "same". A batch normalization layer prevents gradient vanishing during network training, and finally an ELU activation function helps the network converge faster.
     (2) Self-attention layer. First, the EEG signal of each electrode, after preliminary temporal feature extraction, is transformed by a linear layer into three vectors: a query vector (Q), a key vector (K), and a value vector (V). Using scaled dot-product attention, the query vector of each electrode is dot-multiplied with the key vectors of all electrodes, and the attention weights are obtained by normalizing with the key-vector dimension d_k. Computing this for each query vector in the sequence yields a vector with the same length as the input sequence and the same dimension as the weight matrix. The output vector of this process is computed as:
     Attention(Q, K, V) = softmax(Q·Kᵀ / √d_k)·V
     After the feature vector strengthened along the spatial axis is obtained, a depthwise convolution with kernel size (C, 1) and stride 1, where C is the number of electrodes, flattens the C-dimensional electrode features into 1 dimension. An average pooling layer with kernel size (1, 4) and stride (1, 4) then reduces the sampling rate. To prevent the model from overfitting, a Dropout mechanism randomly discards parameters learned by the preceding layer, with the drop ratio set to 0.5. The features are then input into a separable convolution layer, which comprises two operations, a depthwise convolution and a pointwise convolution: the depthwise convolution has kernel size (1, 16), stride 1, and "same" padding; the pointwise convolution has kernel size (1, 1), stride 1, and "same" padding. The result is then processed in turn by a normalization layer, an ELU activation layer, an average pooling layer, and a dropout layer, where the average pooling kernel size and stride are both (1, 8) and the drop ratio is set to 0.5.
     (3) Temporal convolutional network layer. To enlarge the receptive field of the convolution kernels, a 2-layer temporal convolution module is introduced, and residual connections are used to avoid the gradient-vanishing problem that may arise during network training. The causal dilated convolution of the first temporal convolution module has kernel size 4 and dilation factor 1, so that each point of the network output contains the feature information of the four most recent input points.
     (4) Fully connected layer. The obtained high-level temporal features are flattened and input into a fully connected layer, to which a max-norm constraint with maximum norm value 0.25 is added for regularization, so as to prevent overfitting. Finally, the features are input into a Softmax classifier, yielding the finally decided motor imagery task class.
  2. The motor imagery electroencephalogram classification method based on a self-attention mechanism and parallel convolution according to claim 1, wherein Step 1 specifically comprises the following steps: (1) apply common average referencing to the raw EEG signals; (2) extract the 4-40 Hz EEG band with a 3rd-order Butterworth band-pass filter; (3) apply exponential moving-average standardization to the filtered EEG signals, with the decay factor set to 0.999 to reduce the influence of numerical differences on model performance; (4) divide the training set into a training set and a validation set at a ratio of 4:1 for subsequent 5-fold cross-validation; (5) select the EEG signals in segments, where each segment represents a complete motor imagery EEG task and each intercepted segment is 4 s long.
  3. The motor imagery electroencephalogram classification method based on a self-attention mechanism and parallel convolution according to claim 1, wherein Step 3 is specifically as follows: using 5-fold cross-validation, divide the training set into 5 parts and run 5 experiments in total, each time taking a different 4 parts as the training set and the remaining part as the validation set; input the training set into the EEG-MATCNet model for training, feeding 64 EEG segments into the network per batch and iterating for 1000 epochs; during training, use a cross-entropy loss function and record the best loss value, stopping the iteration early if no loss lower than the best value is observed within 300 epochs; record the average accuracy on the training and validation sets and save the weights of the best model; use the Adam optimizer with a learning rate of 0.001 to alleviate gradient oscillation during training; train and test the model separately on 9 subjects to obtain 9 validation-set accuracies, and record their average as the final model accuracy.
  4. The motor imagery electroencephalogram classification method based on a self-attention mechanism and parallel convolution according to claim 1, wherein Step 4 is specifically as follows: input the test set from Step 1 into the model trained in Step 3 for classification and recognition, and evaluate the classification accuracy.
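The parallel convolution layer of claim 1 can be sketched in PyTorch (the patent states the model is built with PyTorch). The module name, the input shape, and the concatenation of the three branches along the channel axis are illustrative assumptions; the kernel counts, sizes, stride, "same" padding, batch normalization, and ELU activation follow the claim.

```python
# Sketch of claim 1's parallel convolution layer; module/variable names are assumptions.
import torch
import torch.nn as nn

class ParallelConv(nn.Module):
    def __init__(self):
        super().__init__()
        # Branch 1: 2 kernels of (1, 16); branch 2: 4 kernels of (1, 32);
        # branch 3: 8 kernels of (1, 64). Stride 1 and "same" padding, followed
        # by batch normalization and ELU, as described in the claim.
        def branch(out_ch, k):
            return nn.Sequential(
                nn.Conv2d(1, out_ch, kernel_size=(1, k), stride=1, padding="same"),
                nn.BatchNorm2d(out_ch),
                nn.ELU(),
            )
        self.b1, self.b2, self.b3 = branch(2, 16), branch(4, 32), branch(8, 64)

    def forward(self, x):  # x: (batch, 1, electrodes, samples)
        # Concatenating the branch outputs along the channel axis is an assumption;
        # the claim only specifies the three branches themselves.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)

x = torch.randn(8, 1, 22, 1000)   # e.g. 22 electrodes, 4 s at 250 Hz
y = ParallelConv()(x)
```

Because every branch uses stride 1 with "same" padding, the electrode and time dimensions are preserved and only the channel count grows (2 + 4 + 8 = 14 feature maps).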
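The self-attention computation in claim 1, Attention(Q, K, V) = softmax(Q·Kᵀ/√d_k)·V, can be written out directly. This is a minimal single-head NumPy sketch; the electrode count, vector dimension, and single-head simplification are illustrative assumptions (the patent uses a multi-head variant).

```python
# Scaled dot-product attention over the electrode axis (single head, for illustration).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (n_electrodes, n_electrodes)
    # Row-wise softmax: each electrode's attention weights over all electrodes.
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
n_electrodes, d_k = 22, 16                      # assumed sizes
Q, K, V = (rng.standard_normal((n_electrodes, d_k)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
```

As the claim describes, each query vector is dot-multiplied with every key vector, scaled by √d_k, normalized into weights, and used to mix the value vectors, so the output has the same length and dimension as the input sequence.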
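The receptive-field claim for the temporal convolutional network layer can be checked arithmetically: a causal dilated convolution with kernel size k and dilation d adds (k-1)·d samples of history per layer. With kernel size 4 and dilation 1, each output point covers the four most recent input points, as stated in claim 1. The dilation of the second layer is not given in the claim; the value 2 below is a conventional assumption for TCNs.

```python
# Receptive field of a stack of causal dilated convolutions.
def receptive_field(kernel_size, dilations):
    # Each layer contributes (kernel_size - 1) * dilation extra samples of history.
    return 1 + sum((kernel_size - 1) * d for d in dilations)

one_layer = receptive_field(4, [1])     # first TCN layer: kernel 4, dilation 1
two_layer = receptive_field(4, [1, 2])  # dilation 2 for layer 2 is an assumption
```

With one layer the receptive field is 4 samples, matching the claim; stacking a second layer with dilation 2 would widen it to 10 samples.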
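The exponential moving-average standardization of claim 2, step (3), keeps running estimates of mean and variance with decay factor 0.999 and standardizes each sample with them. The initialization and variance floor below are assumptions; the patent specifies only the decay factor.

```python
# Sketch of exponential moving-average standardization (decay 0.999, per claim 2).
import numpy as np

def ema_standardize(x, decay=0.999, eps=1e-4):
    # x: (n_samples,) single-channel signal; returns the standardized signal.
    out = np.empty_like(x, dtype=float)
    mean, var = x[0], 0.0          # initialization is an assumption
    for t, v in enumerate(x):
        mean = decay * mean + (1 - decay) * v
        var = decay * var + (1 - decay) * (v - mean) ** 2
        out[t] = (v - mean) / np.sqrt(max(var, eps))  # eps floors the variance
    return out

sig = np.sin(np.linspace(0, 20, 1000)) * 50 + 100   # toy signal with offset/scale
z = ema_standardize(sig)
```

Because the statistics adapt over time, this reduces the influence of slow amplitude and baseline differences between recordings, which is the stated purpose in the claim.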
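The 5-fold cross-validation of claim 3 divides the training set into 5 parts and runs 5 experiments, each training on 4 parts and validating on the remaining one. A minimal index-splitting sketch (the interleaved assignment of samples to folds is an assumption; the patent does not specify how the parts are formed):

```python
# Minimal 5-fold split of n sample indices, as in claim 3.
def five_fold_indices(n):
    folds = [list(range(i, n, 5)) for i in range(5)]   # interleaved split (assumption)
    for k in range(5):
        val = folds[k]                                  # 1 part for validation
        train = [i for j, f in enumerate(folds) if j != k for i in f]  # other 4 parts
        yield train, val

splits = list(five_fold_indices(100))
```

Each of the 5 runs would then train EEG-MATCNet on the training indices and report validation accuracy; per the claim, the accuracies are averaged into the final figure.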

Description

Motor imagery electroencephalogram signal classification method based on self-attention mechanism and parallel convolution

Technical Field

The invention discloses a motor imagery electroencephalogram (EEG) signal classification method based on a self-attention mechanism and parallel convolution, which can be used for decoding motor imagery EEG signals and belongs to the field of computing.

Background

Brain-computer interface technology is a frontier research direction of multidisciplinary integration, in which brain electrical signals are acquired and decoded by a device, converted into commands, and then forwarded to an output device to perform the required operations. It is widely applied in fields such as biomedicine, entertainment, education, smart homes, and the military. Brain activity may be recorded by various neuroimaging methods, which may be invasive or non-invasive. The most popular non-invasive acquisition method for brain-computer interfaces at present is electroencephalography (EEG). The popularity of EEG benefits from the low cost of the equipment, fewer complications compared with invasive surgery, portability, ease of setup and use, and the possibility of directly measuring neural activity. In EEG-based brain-computer interfaces, motor imagery (Motor Imagery, MI) is a very classical paradigm; motor imagery EEG signals are the electrical signals produced on the scalp when a person imagines moving different parts of his or her body.
When a person performs a motor imagery task of the hand, the alpha (8-12 Hz) and beta (13-30 Hz) waves of the sensorimotor EEG signal contralateral to the imagined hand decrease in amplitude, known as event-related desynchronization (ERD), while the alpha and beta waves ipsilateral to the imagined hand increase in amplitude, known as event-related synchronization (ERS). According to this rule, a person's intention can be interpreted. Motor imagery is considered one of the most promising paradigms to aid rehabilitation in patients with quadriplegia, spinal cord injury, and amyotrophic lateral sclerosis (ALS). Although brain-computer interface technology based on motor imagery paradigms has been widely applied in fields such as rehabilitation and medical treatment, its decoding performance still cannot adequately meet the needs of practical application. Because the EEG signal is non-Gaussian, non-stationary, and nonlinear, the acquired signal is susceptible to external noise (e.g., power-frequency interference from electrical equipment) and internal noise (e.g., physiological sources such as electro-oculogram signals). In addition, owing to physiological differences between people, the EEG signals of different subjects performing the same motor imagery task may differ considerably, and even the EEG signals of the same subject performing the same imagery task at different times may differ considerably. Therefore, how to extract valid features from motor imagery EEG signals and decode them correctly remains a challenging problem. The flow of a typical motor imagery EEG classification algorithm is shown in the following chart and generally comprises four parts: preprocessing, feature extraction, feature selection, and classification.
Traditional feature extraction methods mainly select features from the time domain, frequency domain, or spatial domain of the EEG signals. For example, temporal features at different time points or over different time periods may be extracted in the time domain by means of the mean, variance, Hjorth parameters, skewness, and so on. Time-frequency features of the raw EEG signals can be extracted using wavelet transforms, power spectral density, and the fast Fourier transform. Spatial features of the EEG signals are extracted using common spatial patterns (Common Spatial Pattern, CSP) and its variants. The extracted features are then classified with methods such as linear discriminant analysis, support vector machines, neural networks, and Bayesian classifiers. However, traditional methods generally require abundant prior knowledge and extensive feature selection, and with the rise of deep learning, more and more researchers have tried applying end-to-end deep models to the classification of motor imagery EEG signals, with good results. Lawhern et al. proposed EEGNet, a compact network based on the CNN architecture, which convolves along the time dimension and applies depthwise convolution along the spatial dimension, achieving stable experimental performance across 4 different paradigms while greatly reducing training parameters.