CN-121971088-A - Electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division
Abstract
The invention discloses an electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division, which relates to the fields of brain-computer interfaces and affective computing and comprises the following steps: S1, performing a data preprocessing flow on the acquired electroencephalogram signals of a user; S2, jointly modeling space-time patterns at different time scales; S3, constructing a dynamic graph structure over the whole-brain channels; S4, performing local state coding on each sub-graph in turn; S5, feeding the global feature representation and the local feature representation together into a cross-attention module; and S6, inputting the resulting deep fusion feature representation into a multi-layer perceptron classifier and outputting the emotion category prediction result. The method fully exploits the multi-scale space-time structure and the global-local topological relations of electroencephalogram signals while effectively suppressing noise interference, and markedly improves emotion recognition accuracy as well as cross-session and cross-subject robustness.
Inventors
- ZHU QIANG
- PAN WEI
- ZHANG ZIMING
- LI CHENFENG
- XIE QIGE
- HAO CHANGPING
- WEI YUCHEN
- LI XINWEI
- HU XINRONG
Assignees
- Wuhan Textile University (武汉纺织大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20251216
Claims (9)
- 1. An electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division, characterized by comprising the following steps: S1, performing a data preprocessing flow on the acquired electroencephalogram signals of a user to obtain electroencephalogram features; S2, inputting the obtained electroencephalogram features into a multi-scale space-time convolution attention module and jointly modeling space-time patterns at different time scales to obtain multi-scale space-time features; S3, inputting the obtained multi-scale space-time features into a global state encoder and constructing a dynamic graph structure over the whole-brain channels to obtain a global feature representation; S4, on the basis of the multi-scale space-time features, dividing the channels into a plurality of functional regions by means of a data-driven dynamic sub-graph partitioning strategy, and performing local state coding on each sub-graph in turn to obtain a local feature representation; S5, feeding the global feature representation and the local feature representation together into a cross-attention module to obtain a deep fusion feature representation; and S6, inputting the obtained deep fusion feature representation into a multi-layer perceptron classifier and outputting the emotion category prediction result.
- 2. The electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division according to claim 1, characterized in that the multi-scale space-time convolution attention module comprises: (1) a channel attention sub-module for weighted recalibration of the input electroencephalogram features along the channel dimension: global average pooling and global max pooling are applied to the input features to obtain two channel-statistics branches, each branch is passed through a one-dimensional convolution and a nonlinear activation, the two branch outputs are summed, normalized into channel weights by a Sigmoid function, and multiplied channel by channel with the original features; (2) a spatial attention sub-module for weighted recalibration of the channel attention output along the time dimension: taking the channel-attention-weighted features as input, average pooling and max pooling are performed along the channel dimension, and the two resulting feature maps are concatenated along the channel dimension and passed through a one-dimensional convolution and a Sigmoid mapping to obtain spatial/temporal attention weights, which are multiplied element by element with the channel attention output; (3) a multi-scale one-dimensional convolution branch for extracting and fusing multi-scale temporal features under different convolution kernel scales: it comprises at least three parallel one-dimensional convolution branches with different kernel sizes, whose outputs are concatenated along the channel dimension, processed by a normalization layer and a nonlinear activation layer, and average-pooled along the time dimension to obtain the multi-scale space-time feature representation.
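The channel-attention recalibration of item (1) can be sketched in plain Python. This is a minimal illustration, not the patented parameterization: the scalar weights `w_avg` and `w_max` stand in for the claim's learned one-dimensional convolutions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(features, w_avg=1.0, w_max=1.0):
    """Recalibrate each channel of a (channels x time) feature map.

    Per channel: take global-average-pooled and global-max-pooled
    statistics, pass them through placeholder scalar weights (standing
    in for the claim's 1-D convolutions), sum, squash with a Sigmoid,
    and rescale the original channel by the resulting weight.
    """
    weights = []
    for ch in features:
        avg_stat = sum(ch) / len(ch)   # global average pooling
        max_stat = max(ch)             # global max pooling
        weights.append(sigmoid(w_avg * avg_stat + w_max * max_stat))
    # multiply the channel weights back onto the original features
    return [[w * v for v in ch] for ch, w in zip(features, weights)]
```

The spatial attention sub-module of item (2) follows the same pool-squash-rescale pattern, with the pooling taken across channels at each time point instead of across time within each channel.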
- 3. The electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division according to claim 1, characterized in that step S1 is specifically implemented as follows: S11, band-pass filtering the original multi-channel EEG signal, retaining the 0.1-75 Hz band to suppress power-line interference and low-frequency drift; S12, re-referencing the filtered EEG signal, that is, subtracting from each channel signal at each time point the mean value of all channels at that time point; S13, segmenting the continuous EEG sequence with a sliding window of length 4 seconds and step 4 seconds to obtain a plurality of non-overlapping time segments; S14, for each time segment, computing differential entropy features over the five frequency bands 0.5-4 Hz, 4-8 Hz, 8-14 Hz, 14-31 Hz and 31-50 Hz to obtain a 5-dimensional band feature for each channel; and S15, applying linear dynamical system (LDS) smoothing along the time dimension to the 5-dimensional band features of each channel, adaptively setting the state-noise and observation-noise covariances based on the band variance to suppress transient jumps and high-frequency noise, thereby obtaining a final three-dimensional input tensor of shape C×T×F, wherein C is the number of channels, T is the number of time segments, and F=5 is the number of frequency bands.
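The differential entropy feature of step S14 has a standard closed form when a band-filtered segment is treated as approximately Gaussian: DE = 0.5·ln(2πe·σ²), where σ² is the segment variance. A minimal sketch, assuming the input is one band-filtered channel segment:

```python
import math

def differential_entropy(band_signal):
    """Differential entropy of one band-filtered channel segment.

    Under the Gaussian assumption common in EEG emotion work,
    DE = 0.5 * ln(2 * pi * e * variance).
    """
    n = len(band_signal)
    mean = sum(band_signal) / n
    var = sum((x - mean) ** 2 for x in band_signal) / n
    return 0.5 * math.log(2 * math.pi * math.e * var)
```

Computing this per channel and per band for every 4-second segment yields the C×T×F tensor described above.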
- 4. The electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division according to claim 1, characterized in that step S2 is specifically implemented as follows: S21, for each frequency-band feature, mapping the input into a feature space with a one-dimensional convolution; S22, introducing a channel attention mechanism on the mapped features to adaptively characterize the differing contributions of electrode channels or feature channels to emotion discrimination; specifically, the channel attention branch performs global average pooling and global max pooling on the features to extract channel-level statistical descriptions, generates channel weights through point-wise convolution and nonlinear activation, and recalibrates the original features channel-wise; S23, further applying a spatial or temporal attention mechanism on the channel attention output, where the spatial or temporal attention branch performs global average pooling and global max pooling along the channel dimension and models local context through convolution; S24, on the attention-weighted features, setting up a plurality of parallel one-dimensional convolution branches with different kernel sizes so as to simultaneously extract short-term fluctuations, medium-term rhythms and longer-range temporal dependency patterns under different receptive-field scales; S25, concatenating and fusing the outputs of the scale branches along the channel dimension; S26, applying normalization and SELU activation to the fused features in turn; and S27, finally average-pooling along the time dimension to obtain a global multi-scale space-time feature representation of the band, providing a compact and robust input representation for subsequent cross-band fusion and classification prediction.
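The parallel branches of step S24 and the time-dimension pooling of step S27 can be illustrated with a toy pure-Python version. The kernels below are placeholders for learned weights, and normalization/SELU are omitted for brevity:

```python
def conv1d_same(signal, kernel):
    """'Same'-padded 1-D convolution (zero padding, stride 1)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(len(signal))]

def multi_scale_branch(signal, kernels):
    """Run parallel 1-D convolutions with different kernel sizes and
    average-pool each branch over time, mimicking the multi-branch
    structure of steps S24-S27 (kernel weights are placeholders)."""
    pooled = []
    for kern in kernels:
        out = conv1d_same(signal, kern)
        pooled.append(sum(out) / len(out))  # time-dimension average pooling
    return pooled  # concatenated multi-scale representation
```

Different kernel lengths give each branch a different receptive field, which is how short-term fluctuations and longer-range dependencies are captured simultaneously.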
- 5. The electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division according to claim 1, characterized in that in step S3 or step S4 a dynamic codebook mechanism is provided for vector-quantizing the edge weights of the dynamic graph and adaptively updating the codewords, the dynamic codebook mechanism comprising: (1) feature mapping: the continuous adjacency matrix produced by the adaptive graph encoder is unfolded edge-wise or element-wise into edge-weight feature vectors, which are normalized, linearly mapped, and projected into a representation space of the same dimension as the codebook; (2) codebook initialization: a learnable codebook containing a plurality of codeword vectors is pre-constructed, the codewords being obtained at the start of training by random initialization or data-statistics-based initialization; (3) nearest-neighbor quantization: the Euclidean distance between each projected edge-weight feature vector and all codeword vectors is computed, the codeword with the smallest distance is selected as the quantization result, the original continuous edge weight is replaced by the corresponding codeword via table lookup, and a decoding linear layer restores it to the adjacency-matrix space, realizing discretization and sparsification of the edge weights; (4) quantization loss constraint: a vector quantization loss containing a reconstruction term and a commitment term is introduced during training to constrain the deviation between the encoder output and the corresponding codeword, which on the one hand reduces the quantization error and on the other hand drives the encoder output to converge toward the selected codeword, thereby stabilizing codebook learning and suppressing training oscillation; (5) dynamic codeword updating: after each training batch, the number of times each codeword was selected is counted and the codeword utilization rate is computed; when the utilization rate of one or more codewords is detected to be below a preset threshold, a dynamic update is executed: a target codeword is selected from among the highly utilized codewords, cloned, perturbed with random noise, and used to replace the under-utilized codeword, or the under-utilized codeword is re-randomly initialized according to the edge-weight feature distribution of the current batch.
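The nearest-neighbor quantization of item (3) and the usage statistics of item (5) can be sketched together. This is a simplified stand-in: the linear projections, decoding layer, and gradient-based loss terms of the claim are omitted, and only the lookup, usage counting, and reconstruction error are shown.

```python
def quantize(edge_vecs, codebook):
    """Nearest-neighbour vector quantization of projected edge weights.

    Each edge-weight vector is replaced by its closest codeword under
    Euclidean distance. Returns the quantized vectors, per-codeword
    usage counts (feeding the dynamic update of item (5)), and the
    summed squared reconstruction error (the basis of item (4)'s loss).
    """
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    quantized, usage = [], [0] * len(codebook)
    recon_loss = 0.0
    for v in edge_vecs:
        idx = min(range(len(codebook)), key=lambda i: sqdist(v, codebook[i]))
        usage[idx] += 1
        quantized.append(codebook[idx])      # table-lookup replacement
        recon_loss += sqdist(v, codebook[idx])
    return quantized, usage, recon_loss
```

Codewords whose usage count stays below a threshold across batches would then be refreshed by cloning a highly used codeword plus noise, as item (5) describes.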
- 6. The electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division according to claim 1, characterized in that step S3 is specifically implemented as follows: S31, constructing an adaptive graph encoder along the channel dimension based on the multi-scale features, fusing spatial-dimension and band-dimension information by applying left- and right-multiplied linear transformations to the feature tensor to obtain a multi-band adjacency tensor, and symmetrically normalizing each band's adjacency matrix to obtain the set of whole-brain dynamic adjacency matrices; S32, mapping the normalized adjacency matrices into a low-dimensional representation space, introducing a learnable edge-weight codebook, performing nearest-neighbor quantization on the edge weights, constraining the deviation between the encoder output and the codewords with a vector quantization loss, counting the utilization rate of each codeword, and adaptively updating codewords that remain unused for a long time through strategies such as cloning active codewords and random resetting; and S33, constructing a residual space-time attention graph convolution module on the quantized global adjacency matrix, performing two-stage graph aggregation and convolution on the node features, introducing channel attention and spatial attention at each stage for adaptive recalibration, and alleviating over-smoothing through residual connections and dropout, so as to obtain a global-level EEG graph feature representation as the whole-brain state coding vector.
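The symmetric normalization of step S31 is the usual D^{-1/2} A D^{-1/2} form applied before graph convolution. A minimal sketch for one band's adjacency matrix:

```python
import math

def sym_normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} of one band's
    adjacency matrix, as applied before graph convolution in S31.
    D is the diagonal degree matrix (row sums of the adjacency)."""
    deg = [sum(row) for row in adj]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    n = len(adj)
    return [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]
```

Normalizing this way keeps the aggregated node features on a comparable scale regardless of how many strong connections each channel has.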
- 7. The electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division according to claim 1, characterized in that step S4 is specifically implemented as follows: S41, performing data-driven automatic clustering of the EEG channels on the training set: first the Pearson correlation coefficient between every pair of channels is computed, the correlation coefficients are converted into distance and similarity measures, the similarity vector of each channel is taken as its feature, and a clustering algorithm divides the channels into a plurality of sub-regions, yielding a region label for each channel; S42, dividing the whole-brain channels into a plurality of sub-graphs according to the region labels, constructing for each sub-graph a local dynamic adjacency matrix and a local dynamic codebook, building a local residual graph attention network, and extracting high-order topological and space-time features within each region; and S43, aggregating the outputs of all sub-graphs to obtain region-level node representations, further constructing a cross-region coarse graph among the region-level nodes, encoding the interactions between different regions through dynamic graph modeling and residual attention graph convolution similar to S31-S33, and extracting global dependency features at the local-region level to obtain the local state coding vector.
- 8. The electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division according to claim 1, characterized in that the data-driven dynamic region partitioning strategy comprises: (1) based on the electroencephalogram features of the training set or the current session, computing the Pearson correlation coefficient between every pair of channels and constructing a channel correlation matrix; (2) converting the correlation coefficients into distance and/or similarity measures, and taking the row of similarity values corresponding to each channel as its clustering feature; (3) clustering all channels with a clustering algorithm and adaptively dividing them into a preset number of functional regions, the cluster label of each channel serving as its channel region label; (4) dividing the channels into a plurality of sub-graphs according to the channel region labels, constructing a local dynamic adjacency matrix within each sub-graph, and applying local graph convolution or graph attention coding to each sub-graph to obtain the corresponding region-level local feature representation.
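The correlation-driven partitioning of items (1)-(3) can be illustrated in pure Python. The greedy grouping below is a simplified stand-in for the full clustering algorithm of item (3), and the `threshold` parameter is an illustrative assumption:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two channel signals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def partition_channels(signals, threshold=0.8):
    """Greedy data-driven grouping: a channel joins the first region
    whose seed channel it correlates with at or above `threshold`,
    otherwise it seeds a new region. (The claim clusters full
    similarity vectors; this greedy pass is a simplified stand-in.)"""
    seeds, labels = [], []
    for i, sig in enumerate(signals):
        for region, s in enumerate(seeds):
            if pearson(sig, signals[s]) >= threshold:
                labels.append(region)
                break
        else:
            seeds.append(i)
            labels.append(len(seeds) - 1)
    return labels
```

The resulting labels then define the sub-graphs on which the local adjacency matrices of item (4) are built.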
- 9. The electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division according to claim 1, characterized in that the cross-attention module comprises a bidirectional cross-attention structure, specifically: a first cross-attention branch takes the global feature representation obtained in step S3 as the query vector and the local feature representation obtained in step S4 as the key and value vectors, and injects local fine-grained information into the global representation through multi-head attention to obtain a first fusion feature; a second cross-attention branch takes the local feature representation obtained in step S4 as the query vector and the global feature representation obtained in step S3 as the key and value vectors, and feeds global context information back to each local region through multi-head attention to obtain a second fusion feature; residual connection and layer normalization are applied to the first and second fusion features respectively, and the final fusion feature representation is obtained through gated weighted summation or feature concatenation.
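The core of each branch is scaled dot-product cross attention. A single-head sketch, with the learned projections and multi-head splitting of the claim omitted:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross attention: each query row
    attends over the key/value rows (global features attending to
    local features, or vice versa; projection matrices omitted)."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

Running this once with global features as queries and once with local features as queries gives the two fusion branches, which are then combined by residual connection, layer normalization, and gated summation or concatenation as claimed.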
Description
Electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division
Technical Field
The invention relates to the fields of brain-computer interfaces and affective computing, in particular to an electroencephalogram emotion recognition method based on multi-scale space-time convolution and dynamic region division.
Background
With the continuous increase of social pressure and the accelerating pace of life, emotional problems have become increasingly prominent, and timely detection of emotional states has become an important need. In recent years, emotion recognition based on electroencephalography (EEG) has become an important means of characterizing the intrinsic neural processes of emotion and a research hotspot in the fields of brain-computer interfaces and affective computing, owing to advantages such as high temporal resolution, non-invasiveness, and resistance to deliberate disguise. EEG signals also have the advantages of high temporal resolution and low acquisition cost, can reflect changes in the brain's electrical activity under different emotional states, and are widely applied in scenarios such as mental-health monitoring, human-computer interaction, and adaptive recommendation. Traditional EEG emotion recognition methods generally extract hand-crafted features such as power spectral density (PSD) and differential entropy (DE) after preprocessing steps such as band-pass filtering and re-referencing, and model them with classifiers such as support vector machines (SVM), naive Bayes, and k-nearest neighbors. These methods can achieve a certain effect in small-scale, within-subject scenarios, but are insufficiently robust to complex noise environments and to cross-session and cross-subject settings, and can hardly mine the latent high-order associations across multiple channels and frequency bands.
In recent years, deep learning methods have been introduced into EEG emotion recognition: convolutional neural networks (CNNs), recurrent neural networks (RNNs/LSTMs), temporal convolutional networks (TCNs) and other models have been used to automatically extract temporal features, while graph neural networks and graph convolutional networks (GCNs) have been used to construct electroencephalogram graph structures with electrodes as nodes and spatial or functional connections as edges, so as to model the associations between brain regions in the graph domain. However, the prior art commonly suffers from the following problems: (1) multi-scale space-time information is under-exploited: EEG signals exhibit marked non-stationarity and multi-scale characteristics, and dynamic patterns in different frequency bands and at different time scales contribute differently to emotional states; part of the existing work performs convolutional or recurrent modeling only at a single time scale or in a fixed frequency band, lacks a joint characterization of multi-band, multi-time-scale space-time features, and can hardly capture both short-term transient changes and the long-term evolution of emotion; (2) brain-region partitioning lacks adaptivity: brain-region-based modeling methods typically rely on anatomical regions or manual empirical partitioning to coarsely divide the electrodes into functional regions.
Such fixed partitioning does not fully account for functional-connectivity differences at the sample and subject levels, and has limited adaptability across datasets and sessions; (3) global and local information fusion is insufficient: some methods focus on whole-brain graph structural modeling and neglect key local brain regions and their fine-grained differences, while others focus on local regions or sub-graphs and, although improving local discrimination, lack unified modeling of global topological dependencies, so that the expressive power and generalization of the models in complex emotion tasks remain limited. In summary, existing EEG emotion recognition techniques are still deficient in multi-scale space-time feature modeling, dynamic graph structure constraint, adaptive brain-region partitioning, and global-local information fusion, and an electroencephalogram emotion recognition method that jointly models multi-scale space-time convolution features, dynamic region partitioning, and global-local topological relations is urgently needed, so as to improve emotion recognition accuracy and cross-session, cross-subject robustness.
Disclosure of Invention
To solve the above technical problems, the invention provides an electroencephalogram emotion recognition method based on multi-scale space-time c