CN-121997205-A - Ternary eye movement classification method, system and device based on optimized temporal convolution
Abstract
The invention discloses a ternary eye movement behavior classification method, system, and device based on optimized temporal convolution, relating to the fields of artificial intelligence and human-computer interaction. The method comprises: receiving eye movement data and extracting speed, direction, and other features at multiple time scales as a basic feature set; constructing an optimized temporal convolutional network that integrates SENet channel attention, multi-head attention, and residual connections to capture temporal features; training the model with a focal loss function and label smoothing; and outputting the classification result. By introducing multi-time-scale feature extraction and a dual attention mechanism, the invention significantly improves the accuracy and robustness of eye movement classification, and effectively addresses the recognition difficulty caused by the complexity and class imbalance of data from special populations.
Inventors
- CHEN YU
Assignees
- 东南大学 (Southeast University)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-02-05
Claims (10)
- 1. A ternary eye movement behavior classification method based on optimized temporal convolution, characterized by comprising the following steps: receiving eye movement behavior data, and extracting features including speed, direction, acceleration, displacement, and standard deviation at multiple time scales as a basic feature set; constructing an optimized temporal convolutional network model comprising a SENet channel attention module, a multi-head attention module, and residual connections, used to capture temporal sequence features; the SENet channel attention module performs squeeze, excitation, and scaling operations on the eye movement data, dynamically adjusting channel weights to obtain weighted output features, while the multi-head attention module extracts information from different feature subspaces through parallel independent attention heads and concatenates the output of each head to obtain the multi-head attention output features; based on the basic feature set, training the optimized temporal convolutional network model with a focal loss function and label smoothing to obtain a trained model; and inputting the eye movement data features to be identified into the trained model and outputting an eye movement behavior classification result.
- 2. The ternary eye movement behavior classification method based on optimized temporal convolution as claimed in claim 1, wherein the basic feature set comprising speed, direction, acceleration, displacement, and standard deviation features is acquired as follows: a step-wise incremental time scale strategy is adopted, and the basic feature set is computed over a plurality of different time steps in order to capture long-range temporal dependencies; each sample consists of a fixed context window, and adjacent sequences are cut with overlap, i.e. neighboring sequences share data; the 5 features of each sample are extracted per time step using Python 3, and the eye movement features are extracted step by step over 8 time scales, capturing behavior patterns during long-term eye movement.
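As an illustration of the step-wise incremental time-scale strategy in claim 2, the following sketch computes the five base features over several overlapping windows. The window sizes, stride, and sampling assumptions here are hypothetical choices for illustration, not values taken from the patent:

```python
import numpy as np

def multiscale_features(x, y, scales=(2, 4, 8, 16), stride=1):
    """Compute the five base features (speed, direction, acceleration,
    displacement, standard deviation) over several time scales, using
    overlapping windows as described in claim 2. `scales` and `stride`
    are illustrative values, not taken from the patent."""
    dx, dy = np.diff(x), np.diff(y)
    speed = np.hypot(dx, dy)                    # per-sample speed
    direction = np.arctan2(dy, dx)              # movement direction (rad)
    accel = np.diff(speed, prepend=speed[0])    # per-sample acceleration
    feats = []
    for w in scales:
        rows = []
        for start in range(0, len(speed) - w + 1, stride):
            s = slice(start, start + w)
            # net displacement across the window
            disp = np.hypot(x[start + w - 1] - x[start],
                            y[start + w - 1] - y[start])
            rows.append([speed[s].mean(), direction[s].mean(),
                         accel[s].mean(), disp, speed[s].std()])
        feats.append(np.asarray(rows))
    return feats
```

Each scale yields a matrix of shape (number of windows, 5); neighboring windows overlap because the stride is smaller than the window length.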
- 3. The ternary eye movement behavior classification method based on optimized temporal convolution as claimed in claim 2, wherein in the squeeze operation of the SENet channel attention module, global average pooling is applied to all elements of each channel, compressing each channel of size $H \times W$ into a scalar: $z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i,j)$, where $z_c$ is the weight coefficient generated for each channel, and $u_c$ is the data of the input feature map on the $c$-th channel, of size $H \times W$.
- 4. The ternary eye movement behavior classification method based on optimized temporal convolution as claimed in claim 3, wherein in the excitation operation of the SENet channel attention module, the excitation formula is: $s = F_{ex}(z, W) = \sigma(W_2\,\delta(W_1 z))$, where $z$ represents the global channel features obtained by squeezing the input feature map; to convert them into per-channel weights, a fully connected layer $W_1$ first reduces the number of channels, the nonlinear function $\delta$ introduces a nonlinear relationship, a second fully connected layer $W_2$ restores the number of channels, and the Sigmoid function $\sigma$ compresses the output values into the interval $(0, 1)$ to obtain the attention weights $s$, where $F_{ex}$ represents the transformation of the two-layer fully connected network.
- 5. The ternary eye movement behavior classification method based on optimized temporal convolution as claimed in claim 4, wherein the scaling (assignment) step of the SENet channel attention module is: $\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$, where $s_c$ represents the weight generated by the channel attention mechanism, which rescales the values of each channel of the input feature map; combining the original feature map $u_c$ with the computed channel attention weight $s_c$ by multiplying $u_c$ by $s_c$ finally yields the channel-attention-weighted feature map.
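The squeeze, excitation, and scaling operations of claims 3-5 can be sketched in a few lines of NumPy. This is a minimal sketch, assuming a (channels, time) feature map as is natural for eye movement sequences, with illustrative reduction-layer sizes; it is not the patent's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(u, w1, w2):
    """SENet channel attention over a (C, T) feature map:
    squeeze (global average pool per channel), excitation
    (FC -> ReLU -> FC -> Sigmoid), then scale each channel by its
    weight, following claims 3-5. w1: (C//r, C) reduction weights,
    w2: (C, C//r) restoration weights; r is a reduction ratio."""
    z = u.mean(axis=1)                          # squeeze: z_c = mean over time
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))   # excitation weights in (0, 1)
    return s[:, None] * u                       # scaling: x~_c = s_c * u_c
```

With zero-initialized weights every channel weight is `sigmoid(0) = 0.5`, which makes the block easy to sanity-check.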
- 6. The ternary eye movement behavior classification method based on optimized temporal convolution as claimed in claim 5, wherein the multi-head attention module extracts information from different feature subspaces through parallel independent attention heads and concatenates the output of each head to obtain the multi-head attention output features, specifically as follows: (61) first, the input feature matrix $X$ is split into multiple subspaces, each of dimension $d_k = d_{model}/h$, where $h$ is the number of attention heads and $d_{model}$ is the dimension of the input features; (62) linear transformations are applied to the input features to obtain the query matrix $Q = XW^Q$, key matrix $K = XW^K$, and value matrix $V = XW^V$, where $W^Q$, $W^K$, $W^V$ are trainable parameter matrices learned by the optimized temporal convolutional network model; (63) each head independently computes attention weights and performs a weighted sum: $\mathrm{head}_i = \mathrm{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i$, where $\sqrt{d_k}$ is the scaling factor that prevents unstable gradients caused by excessively large dot-product values; (64) finally, the outputs of all attention heads are concatenated and passed through a linear transformation to obtain the final output: $\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^O$, where $\mathrm{head}_i$ denotes the output of the $i$-th attention head, $W^O$ is the parameter matrix of the output transformation, and $\mathrm{Concat}$ is the matrix concatenation operation that splices the outputs of the different attention heads into one feature vector.
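Steps (61)-(64) of claim 6 can be sketched as follows. This is a minimal NumPy illustration of scaled dot-product multi-head self-attention; the weight-matrix shapes and head count are assumptions for the example, not values from the patent:

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, h):
    """Multi-head self-attention as in claim 6: project to Q, K, V,
    split into h heads of dimension d_k = d_model // h, apply scaled
    dot-product attention per head, concatenate, project with wo.
    x: (T, d_model); all weight matrices are (d_model, d_model)."""
    T, d_model = x.shape
    d_k = d_model // h
    q, k, v = x @ wq, x @ wk, x @ wv        # (62) linear projections
    heads = []
    for i in range(h):
        s = slice(i * d_k, (i + 1) * d_k)   # (61) i-th subspace
        # (63) scaled dot-product attention for head i
        attn = softmax(q[:, s] @ k[:, s].T / np.sqrt(d_k))
        heads.append(attn @ v[:, s])
    # (64) concatenate heads and apply the output transformation
    return np.concatenate(heads, axis=1) @ wo
```

With identity projections and a constant input, every attention distribution is uniform and the input is reproduced, which gives a simple correctness check.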
- 7. The ternary eye movement behavior classification method based on optimized temporal convolution according to claim 5, wherein the focal loss function is combined with label smoothing to train the optimized temporal convolutional network model, specifically as follows: (71) Focal Loss introduces a modulation factor before the cross-entropy loss to reduce the loss weight of easily classified samples, with the formula: $FL = -\frac{1}{N}\sum_{n=1}^{N}(1 - p_t)^{\gamma}\log(p_t)$, where $(1 - p_t)^{\gamma}$ is the Focal Loss modulation factor, $-\log(p_t)$ is the standard cross-entropy loss in which $y$ is the true class distribution and $p_t$ is the probability the model predicts for the true class, i.e. the Softmax output, $\gamma$ is the adjustment factor controlling the influence of easily classified samples on the loss, and $N$ is the total number of samples; (72) label smoothing is introduced, and the loss function is computed over a smoothed distribution of the true labels: $y^{smooth} = (1 - \varepsilon)\,y + \frac{\varepsilon}{K}$, where $\varepsilon$ is the smoothing coefficient, taking the value 0.1 or 0.01, $K$ is the number of categories, and $y$ is the one-hot encoded label.
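Steps (71) and (72) of claim 7 can be combined as below. This is a sketch under stated assumptions: `eps=0.1` follows the claim, while `gamma=2` is a common Focal Loss default that the patent text does not specify:

```python
import numpy as np

def focal_loss_smoothed(probs, labels, gamma=2.0, eps=0.1):
    """Focal loss combined with label smoothing, following claim 7.
    probs: (N, K) Softmax outputs; labels: (N,) integer classes.
    eps = 0.1 is one of the values named in the claim; gamma = 2 is
    an assumed default, not taken from the patent."""
    n, k = probs.shape
    y = np.eye(k)[labels]                      # one-hot labels
    y_smooth = (1 - eps) * y + eps / k         # (72) label smoothing
    p = np.clip(probs, 1e-12, 1.0)             # numerical safety for log
    # (71) modulation factor (1 - p)^gamma down-weights easy samples
    loss = -(y_smooth * (1 - p) ** gamma * np.log(p)).sum(axis=1)
    return loss.mean()
```

For uniform predictions over 3 classes the loss reduces to $(2/3)^2 \ln 3$ regardless of the smoothing coefficient, since the smoothed label distribution still sums to one.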
- 8. The ternary eye movement behavior classification method based on optimized temporal convolution as claimed in claim 7, wherein the classification result comprises five categories: unlabeled, fixation, saccade, smooth pursuit, and noise, and the optimized temporal convolutional network model outputs the probabilities of these five categories through a fully connected layer and a Softmax function.
- 9. A ternary eye movement behavior classification system based on optimized temporal convolution, for implementing the ternary eye movement behavior classification method based on optimized temporal convolution of any one of claims 1-8, comprising: a data receiving module, used to receive the speed, direction, acceleration, displacement, and standard deviation features of the eye movement data as a basic feature set; a model construction module, used to construct an optimized temporal convolutional network model comprising a SENet channel attention module, a multi-head attention module, and residual connections, used to capture temporal sequence features, wherein the SENet channel attention module performs squeeze, excitation, and scaling operations on the eye movement data, dynamically adjusting channel weights to obtain weighted output features, and the multi-head attention module extracts information from different feature subspaces through parallel independent attention heads and concatenates the output of each head to obtain the multi-head attention output features; a model training module, used to train the optimized temporal convolutional network model with a focal loss function and label smoothing based on the basic feature set, obtaining a trained model; and a recognition output module, used to input the eye movement data features to be recognized into the trained model and output the eye movement behavior classification result.
- 10. A terminal device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that when the computer program is loaded and executed, the processor implements the ternary eye movement classification method based on optimized temporal convolution as claimed in any one of claims 1-8.
Description
Ternary eye movement classification method, system and device based on optimized temporal convolution

Technical Field

The invention relates to the technical field of artificial intelligence and human-computer interaction, and in particular to a ternary eye movement behavior classification method, system, and device based on optimized temporal convolution.

Background

Eye movement behavior is an important physiological signal reflecting human cognitive state, visual attention, and intention, and has broad research and application value in fields such as psychology, cognitive science, neurological diagnosis, and human-computer interaction. Automatically and accurately classifying basic events such as fixation, saccade, and smooth pursuit in the eye movement trajectory provides a key basis for further behavior analysis, state assessment, and interactive system development. Traditional eye movement classification methods rely mainly on hand-designed features (e.g., speed and acceleration thresholds) combined with machine learning algorithms (e.g., support vector machines, decision trees). Such methods depend heavily on domain experts' experience to define features and thresholds, and their limited representational capacity leads to insufficient model generalization, making it difficult to adapt to individual differences and changing eye movement patterns in complex scenes. In recent years, deep learning techniques have achieved significant results in time series data analysis.
As a deep learning model capable of capturing long-range dependencies in time series data, the Temporal Convolutional Network (TCN) shows great potential in eye movement behavior classification. However, the traditional TCN is limited in simultaneously modeling the dynamics of eye movement signals at different time scales, and struggles to fully capture the complete temporal context ranging from instantaneous changes to sustained tracking behavior. Its ability to recognize complex eye movement patterns, especially smooth pursuit events with continuous and dynamically changing characteristics, is therefore limited, which constrains its performance in the ternary eye movement behavior classification task. The invention therefore provides a ternary eye movement behavior classification method, system, and device based on optimized temporal convolution.

Disclosure of Invention

The invention aims to provide a ternary eye movement behavior classification method, system, and device based on optimized temporal convolution that are not only suitable for general eye movement behavior classification, but also address, through the optimized temporal convolutional network, the analysis of the special data of deaf people, providing a basis for future assistive technology development.
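The core operation that lets a TCN capture long-range dependencies, as discussed above, is the dilated causal convolution: the output at time $t$ depends only on inputs at $t, t-d, t-2d, \ldots$, so the receptive field grows exponentially when the dilation doubles per layer. A minimal single-channel sketch (illustrative only, not the patent's network):

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """One dilated causal convolution, the core TCN operation.
    Output at time t depends only on x[t], x[t-d], x[t-2d], ...
    (inputs before the start are zero-padded on the left).
    x: (T,) signal, w: (k,) kernel, dilation: spacing d between taps."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])    # left padding keeps causality
    return np.array([sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
                     for t in range(len(x))])
```

With kernel `[0, 1]` and dilation `d`, the operation is a pure delay by `d` samples, which makes the causal padding easy to verify.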
According to a first aspect, in order to achieve the above object, the invention provides a ternary eye movement behavior classification method based on optimized temporal convolution, comprising the following steps: receiving eye movement behavior data, and extracting features including speed, direction, acceleration, displacement, and standard deviation at multiple time scales as a basic feature set; constructing an optimized temporal convolutional network model comprising a SENet channel attention module, a multi-head attention module, and residual connections, used to capture temporal sequence features; the SENet channel attention module performs squeeze, excitation, and scaling operations on the eye movement data, dynamically adjusting channel weights to obtain weighted output features, while the multi-head attention module extracts information from different feature subspaces through parallel independent attention heads and concatenates the output of each head to obtain the multi-head attention output features; based on the basic feature set, training the optimized temporal convolutional network model with a focal loss function and label smoothing to obtain a trained model; and inputting the eye movement data features to be identified into the trained model and outputting an eye movement behavior classification result. Further, the basic feature set comprising speed, direction, acceleration, displacement, and standard deviation features is acquired as follows: a step-wise incremental time scale strategy is adopted, and the basic feature set is computed over a plurality of different time steps in order to capture long-range temporal dependencies; each sample consists