CN-116167435-B - Dual-teacher sleep-stage feature migration method based on knowledge distillation and domain adaptation

CN116167435B

Abstract

A dual-teacher sleep-stage feature migration method based on knowledge distillation and domain adaptation, belonging to the fields of signal processing and pattern recognition. First, sleep EEG and EOG signals are preprocessed to obtain multiple multi-modal sleep-signal data samples. For each channel of every source-domain and target-domain sample, time-frequency features are extracted with Morlet wavelet transforms at different resolutions and used to pre-train a source-domain teacher and a target-domain teacher. When the student model is trained and optimized, the two teachers, with their feature extractors frozen, are introduced as guides, constraining the student to learn both the features common to the source and target domains and the features specific to the target domain. Experiments show that the proposed model makes full use of the characteristics of the data for feature migration, performs well even when the target-domain data volume is small, and effectively mitigates the drop in accuracy that existing automatic sleep-staging methods suffer when facing a new data set.

Inventors

  • DUAN LIJUAN
  • ZHANG YAN

Assignees

  • Beijing University of Technology (北京工业大学)

Dates

Publication Date
2026-05-12
Application Date
2023-02-22

Claims (4)

  1. A dual-teacher sleep-stage feature migration method based on knowledge distillation and domain adaptation, characterized by comprising the following steps:
  Step 1, acquiring and preprocessing sleep physiological signals of a source domain and a target domain. Public sleep data are taken as source-domain samples, and data from the data set to be tested as target-domain samples. The framework assumes two groups of source-domain training data, SourceTrainData1 and SourceTrainData2, two groups of target-domain training data, TargetTrainData1 and TargetTrainData2, and one group of target-domain test data, TargetTestData. From the polysomnographic signal channels of the raw samples, one sleep EEG channel and one EOG channel are selected; the selected signals are downsampled to a common sampling rate of 100 Hz, and a sliding window of 30 seconds is then used to divide the data into N sample segments, each containing both channels, with each channel's data length being 3000, so that each sliced data matrix has size 2x3000.
  Step 2, extracting time-frequency features of the sample data. For any data segment from Step 1, and for the data of each of the channels it comprises, a Morlet continuous wavelet transform is applied with a set of centre frequencies from 1 Hz to 30 Hz in steps of 0.5 Hz to express the local characteristics of the signal and extract the corresponding time-frequency features, where each row corresponds to one channel of the current sample data. While extracting the time-frequency features, one time point in every 50 is retained to reduce the data dimension, so each data segment yields a time-frequency feature of 59 centre frequencies by 60 time points per channel. The time-frequency features of the N data segments divided in Step 1, extracted from the training and test data, form the time-frequency matrices of the method.
  Step 3, training the dual-teacher feature extractors. The time-frequency matrices extracted in Step 2 are fed as input into the teacher deep learning networks for training. Two teacher models are designed, a source-domain teacher and a target-domain teacher: the time-frequency matrix of the source-domain training data and its real labels are fed to the source-domain teacher to train its feature extractor, and the time-frequency matrix of the target-domain training data and its real labels are fed to the target-domain teacher to train its feature extractor.
  Step 4, knowledge distillation and domain-adaptive feature migration. During training of the student model, the feature extractors of the teacher models trained in Step 3 are imported with their parameters frozen, so they are not updated while the student trains; during training the teachers use their prior knowledge to extract features from the input, and the resulting losses optimize the student model. Student training is divided into two parts: first the time-frequency matrix of the source-domain data extracted in Step 2 is used as training input to the student model, and then the time-frequency matrix of the target-domain data. The input data are fed simultaneously to the student model and to both teacher models, and the feature knowledge extracted by the intermediate layers of the teacher feature extractors is migrated to the student network. The feature migration process uses distillation losses to constrain the student network to learn, from the two teacher models, the domain-general features of the source and target domains and the domain-specific features of the target domain.
  Step 5, classifying target-domain samples with the student model. The time-frequency features of the target-domain sample under test, obtained as in Step 2, are input to the student model trained in Step 4; the features extracted by the student network's feature extractor are fed into a linear classifier with 5 neurons to obtain the classification result.
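Step 2 above (Morlet CWT at centre frequencies 1-30 Hz in 0.5 Hz steps, keeping one time point in every 50) can be sketched as below. This is a minimal illustration, not the patent's exact procedure: the wavelet parameter `W = 6.0` and the scale-frequency mapping follow the common Morlet convention (the patent elides its formulas), and the 100 Hz rate and 3000-sample segments come from claim 2.

```python
import numpy as np

FS = 100                                  # sampling rate after downsampling (Hz)
FREQS = np.arange(1.0, 30.5, 0.5)         # 59 centre frequencies, 1-30 Hz step 0.5
W = 6.0                                   # Morlet time-bandwidth parameter (assumed)

def morlet(M, s, w=W):
    """Complex Morlet wavelet, M samples, scale s (standard morlet2 formula)."""
    x = (np.arange(M) - (M - 1.0) / 2) / s
    return np.exp(1j * w * x) * np.exp(-0.5 * x**2) * np.pi**(-0.25) / np.sqrt(s)

def segment_tf_features(segment, keep_every=50):
    """Morlet CWT magnitude features for one 30 s segment.

    segment: (n_channels, 3000) array -- 2 channels x 30 s x 100 Hz.
    Returns (n_channels, 59, 60): one row per centre frequency, one
    retained time point in every `keep_every` samples.
    """
    scales = W * FS / (2 * np.pi * FREQS)   # map centre frequency to CWT scale
    feats = []
    for ch in segment:
        rows = []
        for s in scales:
            wav = morlet(min(10 * int(np.ceil(s)) | 1, len(ch)), s)
            rows.append(np.abs(np.convolve(ch, wav, mode="same")))
        feats.append(np.stack(rows)[:, ::keep_every])
    return np.stack(feats)

seg = np.random.randn(2, 3000)            # one dummy EEG+EOG segment
tf = segment_tf_features(seg)
print(tf.shape)                           # (2, 59, 60)
```

The per-channel 59x60 output matches the dimension reduction the claim describes: 3000 time points kept every 50 gives 60 columns.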
  2. The dual-teacher sleep-stage feature migration method based on knowledge distillation and domain adaptation according to claim 1, wherein acquiring and preprocessing the original EEG and EOG signals of the source and target domains in Step 1 comprises the following steps. Data samples are partitioned, assuming for the framework two groups of source-domain training data, SourceTrainData1 and SourceTrainData2, two groups of target-domain training data, TargetTrainData1 and TargetTrainData2, and one group of target-domain test data, TargetTestData. From the raw polysomnographic signal channels, one sleep EEG channel and one EOG channel are selected to form the data, where each channel is acquired continuously at its original sampling rate; the length of the resulting time series varies from person to person, corresponding to about 8 hours of sleep. The sampling rates used in different laboratories are 100 Hz, 1000 Hz, 128 Hz and 256 Hz, so signal samples acquired at more than 100 Hz are downsampled, by extracting one point at regular intervals, to 100 Hz. The data are then divided with non-overlapping sliding windows of 30 seconds to obtain N sub-segment samples. After this processing, SourceTrainData1 and SourceTrainData2, TargetTrainData1 and TargetTrainData2, and TargetTestData each yield their divided sub-segment samples, where each sub-segment contains two channels and each channel's data length is 3000.
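The preprocessing in claim 2 (point-picking downsampling to 100 Hz, then cutting into non-overlapping 30 s windows) can be sketched as below. This assumes the original rate is an integer multiple of 100 Hz (e.g. 1000 Hz); the 128/256 Hz cases would need proper resampling, and the plain decimation shown here, which is what the claim literally describes, omits the anti-aliasing filter a production pipeline would normally apply first.

```python
import numpy as np

TARGET_FS = 100      # common sampling rate (Hz)
WIN_SEC = 30         # non-overlapping window length (s)

def preprocess(raw, fs):
    """Downsample a (2, T) EEG+EOG recording to 100 Hz by keeping one
    point at regular intervals, then cut into 30 s segments.

    Returns an array of shape (N, 2, 3000).
    Assumes fs is an integer multiple of TARGET_FS.
    """
    step = fs // TARGET_FS
    x = raw[:, ::step]                     # keep one point every `step` samples
    win = TARGET_FS * WIN_SEC              # 3000 samples per window
    n = x.shape[1] // win                  # number N of complete sub-segments
    return x[:, :n * win].reshape(2, n, win).transpose(1, 0, 2)

raw = np.random.randn(2, 1000 * 3600)      # 1 h dummy recording at 1000 Hz
segs = preprocess(raw, fs=1000)
print(segs.shape)                          # (120, 2, 3000)
```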
  3. The dual-teacher sleep-stage feature migration method based on knowledge distillation and domain adaptation according to claim 1, wherein the training of the dual-teacher feature extractors in Step 3 comprises the following steps. Two teacher models are designed to guide the learning of the student model; the two teachers have the same structure, ResNet110, but different training data. Each teacher consists of a feature extractor and a linear classifier. The feature extractor is composed of a 2D convolution layer, a batch normalization layer, a ReLU activation layer, and three residual modules, where each residual module consists of 18 residual blocks (ResBlock) and each residual block contains a convolution layer, a batch normalization layer, a ReLU activation layer, and a second convolution layer; the classifier consists of average pooling and a linear classification layer with 5 output neurons. 1) Training of the source-domain teacher: the source-domain CWT time-frequency data processed in Step 2 are fed, in minibatches of size 128, through the teacher's feature extractor and classifier in turn, and the model is trained against the real labels of the input data for 200 iterations. 2) Training of the target-domain teacher: the target-domain CWT time-frequency data processed in Step 2 are randomly shuffled and then fed, in minibatches of size 128, through the teacher's feature extractor and classifier in turn, and the model is trained against the real labels of the input data for 200 iterations.
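A minimal NumPy sketch of one residual block of the kind claim 3 describes (conv, batch norm, ReLU, conv, batch norm, identity skip, final ReLU). The 3x3 kernel size is an assumption, since the claim elides kernel sizes, and the per-channel normalization here is a single-sample stand-in for batch normalization; the channel count 16 and spatial size 60x60 match the first residual module as stated in claim 4.

```python
import numpy as np

def conv2d(x, w, stride=1, pad=1):
    """Naive 2D convolution. x: (C_in, H, W); w: (C_out, C_in, k, k)."""
    cin, H, Wd = x.shape
    cout, _, k, _ = w.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    Ho = (H + 2 * pad - k) // stride + 1
    Wo = (Wd + 2 * pad - k) // stride + 1
    out = np.zeros((cout, Ho, Wo))
    for i in range(Ho):
        for j in range(Wo):
            patch = xp[:, i * stride:i * stride + k, j * stride:j * stride + k]
            out[:, i, j] = np.einsum('oijk,ijk->o', w, patch)
    return out

def norm(x, eps=1e-5):
    """Per-channel normalization (single-sample stand-in for batch norm)."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def res_block(x, w1, w2):
    """conv3x3 -> norm -> ReLU -> conv3x3 -> norm, identity skip, final ReLU."""
    h = np.maximum(norm(conv2d(x, w1)), 0)
    h = norm(conv2d(h, w2))
    return np.maximum(h + x, 0)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 60, 60))            # features in residual module 1
w1 = rng.standard_normal((16, 16, 3, 3)) * 0.1
w2 = rng.standard_normal((16, 16, 3, 3)) * 0.1
y = res_block(x, w1, w2)
print(y.shape)                                    # (16, 60, 60)
```

With 18 such blocks per module and three modules, the teacher matches the ResNet110 depth the claim names; the student of claim 4 keeps the same layout with only 2 blocks per module.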
  4. The dual-teacher sleep-stage feature migration method based on knowledge distillation and domain adaptation according to claim 1, wherein the knowledge distillation and domain-adaptive feature migration in Step 4 comprises the following steps. The two teacher models trained in Step 3 are imported with their parameters frozen, so they are not updated while the student model trains. The feature extractor of the student model has only 2 residual blocks (ResBlock) per residual module; the design of the other convolution layers and batch normalization layers is the same as in the teacher models. The wavelet time-frequency feature data obtained in Step 2 are input to the student network and simultaneously to the two teacher models, whose feature extractors and classifiers observe and constrain the output of the student model. When the source-domain data are the input, the source-domain teacher uses a similarity loss, forming a feature distillation method for training the compressed model, to constrain the compressed student model to learn the feature representation of the source domain from the source-domain teacher, while the target-domain teacher uses a difference loss to constrain the student to perform target-domain-specific feature distillation and learn the feature representation of the target domain within the source-domain data. When the target-domain data are the input, the source-domain teacher again uses the similarity loss to constrain the compressed student model to perform domain-general feature distillation, learning the representations in the target-domain features that are similar to the source-domain features; in addition, the target-domain teacher still uses the difference loss to constrain the domain-general features extracted by the student to differ from the target-domain features. Under the similarity loss and the difference loss, the feature distillation process by which the student model extracts domain-general features and target-domain-specific features is specifically as follows:
  1) Target-domain-specific feature distillation. Target-domain-specific feature distillation is guided by the target-domain teacher. On receiving the input, the target-domain teacher uses its trained convolution-layer parameters to extract the key features of the target domain, and the difference loss then drives the source-domain features extracted by the student to differ from these key target-domain features, so that the student model extracts the domain-specific features of the target domain. The design of the loss function comprises a feature transformation and a feature distance metric, and the feature transformation and distance measurement between the target-domain teacher and the student proceed as follows. When the target-domain teacher guides student training, a feature encoding module is applied to the extracted wavelet time-frequency features to reduce their dimension before the distance is measured. The feature encoding module consists of Input Embedding, Multi-Head Attention, Add & Norm, Feed Forward, Add & Norm, and Reduce & Norm: Input Embedding consists of a convolution layer, a batch normalization layer, and a flatten layer that pulls the data into a vector, while Multi-Head Attention consists of linear layers, a Dropout layer, and a softmax function. The feature extractors of the target-domain teacher and the student model output high-dimensional features whose first dimension is the input minibatch of size 128, whose channel dimension is the output channel of the convolution layers in each residual module of the teacher model, taking 16, 32, 64 for the three residual modules respectively, and whose feature dimensions D and E take (60, 60), (30, 30), (15, 15) for the three residual modules respectively. The features extracted by corresponding residual modules of the student and the target-domain teacher are input to the feature encoding module: Input Embedding first performs sequence encoding, preventing information that appears early from being lost as the sequence grows; the position information is then merged into the Multi-Head Attention structure to form a relative positional encoding, and the model is divided into 4 heads, forming 4 subspaces so that the model attends to feature information at different positions; the values of the residual block are then propagated forward through Feed Forward, after which Reduce & Norm reduces the data dimension by averaging over the last two dimensions, yielding the encoded feature representations. After the encoding module, the student model and the target-domain teacher use the difference loss to constrain the student to learn the specific features of the target domain. The position of the feature distance metric is chosen before the last ReLU activation layer in each residual module, and the difference loss enlarges the feature gap by means of an orthogonality criterion, expressed with the Frobenius norm: for a matrix, the squared F-norm is the sum of squares of its elements. As in the similarity loss, the number of residual modules is 3, the channel counts of the three residual modules are 16, 32, 64, and D and E take (60, 60), (30, 30), (15, 15) respectively; the high-order wavelet time-frequency features of each residual module enter the loss as their representations after the feature-encoding transformation.
  2) Domain-general feature distillation. The student extracts the features common to the source and target domains; this feature distillation is guided by the source-domain teacher model. Taking the wavelet time-frequency features obtained in Step 2 as input, training of the student model continues so that it extracts the domain-general features of the source and target domains. On receiving the input, the source-domain teacher uses its trained convolution-layer parameters to extract the key features of the source domain, and the similarity loss constrains the features extracted by the student to match them, so that the student model extracts the domain-general features of the source and target domains. The design of the loss function comprises a feature transformation and a feature distance metric, and the feature transformation and distance measurement between the source-domain teacher and the student model proceed as follows. When the source-domain teacher guides student training, the higher-order features output by each residual module of the feature extractors of the student model and the source-domain teacher model are four-dimensional data features: the first dimension is the input minibatch of size 128, the channel dimension is the output channel of the convolution layers in each residual module of the student model, taking 16, 32, 64 for the three residual modules, and D and E, the feature dimensions extracted by the residual modules, take (60, 60), (30, 30), (15, 15) respectively. The features extracted by each residual module of the source-domain teacher model constrain the corresponding residual module of the student model, and a feature transformation module composed of 1x1 convolutions aligns the channels of the higher-order features extracted by the student. Thereafter, the intermediate features of the teacher and the transformed student features use the similarity loss to constrain the student to extract the domain-general features of the source and target domains. The position of the feature distance metric is again before the last ReLU activation layer in each residual module, and the feature distance is computed with Margin-ReLU: Margin-ReLU removes negative features before the ReLU, so for a negative teacher feature, if the student's value is smaller than the teacher's negative value, the distance is counted as 0, because the ReLU blocks a negative number regardless of its magnitude. In the loss function, the higher-order data features of the wavelet time-frequency features extracted by the corresponding residual modules of the source-domain-teacher and student feature extractors are compared after the 1x1 convolutional feature transformation; the number of residual modules is 3, the channel counts are 16, 32, 64, and the partial distance is summed over the data channels of the features and the feature dimensions D and E, with D and E taking (60, 60), (30, 30), (15, 15) for the three residual modules respectively.
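The exact formulas of the two losses in claim 4 are elided in this text (they appeared as images in the original). The sketch below therefore follows the claim's verbal description under common formulations: a partial-L2 similarity loss with a Margin-ReLU teacher transform for domain-general distillation, and an orthogonality penalty via the Frobenius norm for target-domain-specific distillation. The margin value and the row normalization in the difference loss are assumptions.

```python
import numpy as np

def margin_relu(t, margin=-1.0):
    """Teacher-side transform: keep positive features, clamp negatives
    to a fixed margin (margin value assumed)."""
    return np.where(t > 0, t, margin)

def similarity_loss(student, teacher, margin=-1.0):
    """Partial-L2 distance for domain-general feature distillation.

    Positions where the teacher target is negative and the student is
    already below it contribute 0: the downstream ReLU would block the
    negative value regardless of its magnitude.
    """
    t = margin_relu(teacher, margin)
    mask = ~((t <= 0) & (student <= t))
    return np.sum(((student - t) ** 2) * mask) / student.size

def difference_loss(fs, ft):
    """Orthogonality penalty for target-domain-specific distillation:
    squared Frobenius norm of the Gram matrix between encoded student
    and teacher features; it is 0 when the feature rows are orthogonal."""
    fs = fs / (np.linalg.norm(fs, axis=1, keepdims=True) + 1e-8)
    ft = ft / (np.linalg.norm(ft, axis=1, keepdims=True) + 1e-8)
    return np.sum((fs @ ft.T) ** 2) / fs.shape[0]

rng = np.random.default_rng(0)
s = rng.standard_normal((128, 64))     # encoded student features (minibatch 128)
t = rng.standard_normal((128, 64))     # encoded teacher features
print(similarity_loss(s, t), difference_loss(s, t))
```

In the claimed method these two terms are summed over the three residual modules, with the similarity loss applied after the 1x1-convolution alignment and the difference loss after the transformer-style feature encoding module.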

Description

Dual-teacher sleep-stage feature migration method based on knowledge distillation and domain adaptation

Technical Field

The invention relates to electroencephalogram (EEG) signal processing, feature extraction and deep learning, belongs to the technical fields of signal processing and pattern recognition, and in particular relates to a dual-teacher sleep-stage feature migration method based on knowledge distillation and domain adaptation.

Background

The task of sleep staging is to determine a person's sleep state from the polysomnogram recorded during sleep. Analysis of polysomnograms is a time-consuming, labor-intensive process and, because it requires long continuous assessment, is prone to human error. In recent years, deep learning methods have been developed that rely on the availability of large amounts of labeled data for training. Many studies propose automatic classification of sleep stages using a main signal containing the features of each stage; these techniques fall into three categories: 1) artificial feature extraction with an automatic decision algorithm; 2) application of the extracted features to a deep learning method; 3) end-to-end training with a deep learning method, including convolutional or recurrent neural networks. However, many sleep laboratories still use manual scoring, mainly because there are large differences between the common training data and the data generated by each sleep laboratory. These differences are typically domain-shift problems: the training (source) and test (target) data have different distributions owing to different acquisition devices, channels, environments and so on. The domain-shift problem means that models trained on the source domain cannot achieve good results on the target domain.
Therefore, an effective transfer learning method is needed to address the drop in sleep-staging accuracy when a model migrates to a different domain. Applied to sleep, transfer learning addresses the accuracy loss that occurs when a sleep model must accommodate differences in data distribution arising, in practice, from new data or from differences in the number and position of EEG channels, sampling frequency, experimental protocol, and subjects included. Some studies fine-tune the model to serve each subject, with each subject's data recorded over two nights: the first night is used to fine-tune the model and the second for evaluation. The application of transfer learning to sleep-stage models has brought the study of sleep-staging algorithms into a new stage. However, fine-tuning requires a large number of source data sets to improve generalization ability, and, in order to improve the robustness of the deep learning model to individual differences, it uses pieces of the target-domain data to be tested. A model trained in this way can show weak generalization when tested directly on new data. In machine learning, when two domains have different data distributions but the same task, a domain-adaptation method can be used to migrate a model trained with high accuracy on the source-domain data to a target domain with less data.
The domain-adaptation method can overcome the need for a large number of source data sets to improve generalization. A method has been proposed that performs domain alignment with a completely shared model; it approaches the problem through domain-shared features, but may lose domain-specific information during feature extraction. Knowledge distillation can be regarded as a special case of transfer learning: it is a supervised learning scheme in which a larger network (the teacher) supervises and assists the training of a smaller network (the student), allowing the student model to be simplified while improving its performance and generalization ability. A common training method for knowledge distillation lets the student model fit the per-class probabilities output by the teacher model, much like training the student on ground truth, which ignores the importance of the intermediate process. Feature distillation can increase the amount of information transmitted during distillation and thereby improve the performance of the student network. Current domain-adaptive transfer learning methods are unsupervised and ignore the label information of the data. The invention performs feature migration using a method that combines domain adaptation with feature distillation, and can well learn the characterist