CN-121565504-B - Epilepsy prediction system based on multi-mode biological image and image data processing method
Abstract
The invention relates to the technical field of intelligent medical treatment, and in particular to an epilepsy prediction system based on multi-modal biological images and an image data processing method. The invention first collects multi-modal images, then extracts features from each modality separately, aggregates the image information with an attention pooling mechanism, and then weights and fuses the feature vectors of the different modalities through a fusion module based on a gated attention mechanism to generate a unified multi-modal feature vector, which serves as the input of the prediction module, so that epilepsy prediction can be carried out on the basis of the multi-modal feature vector. By fusing multi-modal information, the invention fully exploits the complementary characteristics of eye images, tongue images and face images, significantly improves the accuracy and robustness of prediction, and has good prospects for clinical application.
Inventors
- CHEN LEI
- LI YULONG
Assignees
- West China Hospital, Sichuan University (四川大学华西医院)
Dates
- Publication Date
- 20260512
- Application Date
- 20260126
Claims (9)
- 1. An epilepsy prediction system based on multi-modal biological images, comprising: a data acquisition module configured to acquire multi-modal image data, wherein the multi-modal image data comprises an eye image set, a tongue image set and a face image set of an epileptic patient acquired during seizure intervals; the eye image set comprises first images of a plurality of designated regions of both eyes, the first images being visible-light images; and the tongue image set comprises a first image and an infrared image of the tongue surface and a first image and an infrared image of the sublingual surface; a feature extraction module configured to perform feature extraction on the eye image set, the tongue image set and the face image set respectively to obtain eye feature vectors, tongue feature vectors and face feature vectors; a weighted aggregation module configured to aggregate the eye feature vectors and the tongue feature vectors using an attention weighted aggregation mechanism to obtain an eye aggregation feature vector and a tongue aggregation feature vector; a fusion module configured to perform fusion processing based on the eye aggregation feature vector, the tongue aggregation feature vector and the face feature vector, obtain the fusion weight of each modality through a gated attention mechanism, and perform fusion processing based on the fusion weights to obtain a joint feature vector; and a prediction module configured to input the joint feature vector into a pre-trained epilepsy prediction model to output an epilepsy prediction result, wherein the epilepsy prediction model is trained using a class-weighted cross-entropy loss, which is specifically: L = -[w_1 · y · log(p̂) + w_0 · (1 - y) · log(1 - p̂)], wherein w_1 is the class weight of the epileptic class, w_0 is the class weight of the non-epileptic class, y ∈ {0, 1} is the true label of the sample, and p̂ is the prediction probability output by the model; the class weights are obtained as follows: w_k = N / N_k, wherein w_k represents the class weight of the k-th class, N represents the total number of samples in the training set, and N_k represents the number of samples of the k-th class.
- 2. The epilepsy prediction system based on multi-modal biological images according to claim 1, wherein the feature extraction module specifically comprises: a first feature extraction unit configured to input the RGB channels of the tongue image set/the face image set into a pre-trained ResNet model and extract feature vectors of the RGB channels; a second feature extraction unit configured to input the infrared channels of the tongue image set/the face image set into a randomly initialized feature extraction network and extract feature vectors of the infrared channels; and a feature fusion unit configured to fuse the feature vectors of the RGB channels and the feature vectors of the infrared channels of the tongue image set and the face image set respectively, to generate fused tongue feature vectors and face feature vectors.
- 3. The epilepsy prediction system based on multi-modal biological images according to claim 1, wherein the attention weighted aggregation mechanism is specifically: computing an attention weight a_i for each feature vector through an attention pooling mechanism: a_i = exp(w^T tanh(V h_i)) / Σ_{j=1}^{n} exp(w^T tanh(V h_j)), wherein h_i is the feature vector of the i-th image, h_j is the feature vector of the j-th image, w is a learnable weight vector used to compute the similarity score of each feature vector, V is a learnable weight matrix that maps the feature vectors into a hidden space, tanh(·) is an activation function, and n is the number of images; and then obtaining an aggregated global feature representation through weighted summation: h_MIL = Σ_{i=1}^{n} a_i h_i, where h_MIL is the aggregated global feature representation.
- 4. The epilepsy prediction system based on multi-modal biological images according to claim 1, wherein the fusion module specifically comprises: a weight processing unit configured to input the feature vector of each modality into a gated attention network comprising at least one linear layer and a Sigmoid activation function to generate a weight in the range (0, 1); a normalization unit configured to normalize the generated weights so that the weights of all modalities sum to 1; and a feature fusion unit configured to perform a weighted summation of the feature vectors of the modalities using the normalized weights to generate a multi-modal fused feature vector.
- 5. An image data processing method, characterized by comprising the following steps: acquiring multi-modal image data, wherein the multi-modal image data comprises an eye image set, a tongue image set and a face image set of an epileptic patient acquired during seizure intervals, and the eye image set comprises first images of a plurality of designated regions of both eyes; performing feature extraction on the eye image set, the tongue image set and the face image set respectively to obtain eye feature vectors, tongue feature vectors and face feature vectors; aggregating the eye feature vectors and the tongue feature vectors respectively using an attention weighted aggregation mechanism to obtain an eye aggregation feature vector and a tongue aggregation feature vector; and performing fusion processing based on the eye aggregation feature vector, the tongue aggregation feature vector and the face feature vector, obtaining the fusion weight of each modality through a gated attention mechanism, and performing fusion processing based on the fusion weights to obtain a joint feature vector, wherein the joint feature vector is used as input data of an epilepsy prediction model to carry out epilepsy prediction, and the epilepsy prediction model is trained using a class-weighted cross-entropy loss, which is specifically: L = -[w_1 · y · log(p̂) + w_0 · (1 - y) · log(1 - p̂)], wherein w_1 is the class weight of the epileptic class, w_0 is the class weight of the non-epileptic class, y ∈ {0, 1} is the true label of the sample, and p̂ is the prediction probability output by the model; the class weights are obtained as follows: w_k = N / N_k, wherein w_k represents the class weight of the k-th class, N represents the total number of samples in the training set, and N_k represents the number of samples of the k-th class.
- 6. The image data processing method according to claim 5, wherein the feature extraction of the tongue image set and the face image set comprises: inputting the RGB channels of the tongue image set/the face image set into a pre-trained ResNet model and extracting feature vectors of the RGB channels; inputting the infrared channels of the tongue image set/the face image set into a randomly initialized feature extraction network and extracting feature vectors of the infrared channels; and fusing the feature vectors of the RGB channels and the feature vectors of the infrared channels of the tongue image set and the face image set respectively, to generate fused tongue feature vectors and face feature vectors.
- 7. The image data processing method according to claim 5, wherein the attention weighted aggregation mechanism is specifically: computing an attention weight a_i for each feature vector through an attention pooling mechanism: a_i = exp(w^T tanh(V h_i)) / Σ_{j=1}^{n} exp(w^T tanh(V h_j)), wherein h_i is the feature vector of the i-th image, h_j is the feature vector of the j-th image, w is a learnable weight vector used to compute the similarity score of each feature vector, V is a learnable weight matrix that maps the feature vectors into a hidden space, tanh(·) is an activation function, and n is the number of images; and then obtaining an aggregated global feature representation through weighted summation: h_MIL = Σ_{i=1}^{n} a_i h_i, where h_MIL is the aggregated global feature representation.
- 8. The image data processing method according to claim 5, wherein obtaining the fusion weight of each modality through a gated attention mechanism comprises: inputting the feature vector of each modality into a gated attention network comprising at least one linear layer and a Sigmoid activation function to generate a weight in the range (0, 1); normalizing the generated weights so that the weights of all modalities sum to 1; and performing a weighted summation of the feature vectors of the modalities using the normalized weights to generate a multi-modal fused feature vector.
- 9. The image data processing method according to any one of claims 5 to 8, wherein the method outputs an epilepsy prediction result through a pre-trained epilepsy prediction model, and the trained structure of the epilepsy prediction model comprises: an encoder whose input is the multi-modal image data and whose outputs are the eye feature vectors, the tongue feature vectors and the face feature vectors; an attention weighted aggregation module whose input is the eye feature vectors or the tongue feature vectors and whose output is the eye aggregation feature vector or the tongue aggregation feature vector; a fusion layer whose inputs are the eye aggregation feature vector, the tongue aggregation feature vector and the face feature vector and whose output is the joint feature vector; and a classifier whose input is the joint feature vector and whose output is the epilepsy prediction result.
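The class-weighted cross-entropy loss of claims 1 and 5 can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation; the function names are ours, and the inverse-frequency form w_k = N / N_k is assumed from the symbols defined in the claims.

```python
import numpy as np

def class_weights(counts):
    """Inverse-frequency class weights w_k = N / N_k (assumed form),
    where N is the total training-set size and N_k the count of class k."""
    total = sum(counts.values())
    return {k: total / n_k for k, n_k in counts.items()}

def weighted_bce(y, p, w1, w0, eps=1e-12):
    """Class-weighted binary cross-entropy for one sample:
    L = -(w1 * y * log(p) + w0 * (1 - y) * log(1 - p))."""
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    return -(w1 * y * np.log(p) + w0 * (1 - y) * np.log(1 - p))
```

For example, with 20 epileptic and 80 non-epileptic training samples, `class_weights({1: 20, 0: 80})` yields weights 5.0 and 1.25, so errors on the rarer epileptic class are penalized more heavily, which counteracts class imbalance.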
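The dual-branch feature extraction of claims 2 and 6 can be illustrated schematically. This is a shape-level sketch with NumPy stand-ins: the real system would use a pretrained ResNet for the RGB branch and a randomly initialised CNN for the infrared branch; the projection matrices and the 64-dimensional branch outputs here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two branches (shapes only, no real convolutions):
W_rgb = rng.normal(size=(64, 3))  # plays the role of a pretrained ResNet head
W_ir = rng.normal(size=(64, 1))   # plays the role of a randomly initialised IR network

def extract_features(rgb_img, ir_img):
    """rgb_img: (H, W, 3) visible-light image; ir_img: (H, W, 1) infrared image.
    Global-average-pool each channel set, project through its branch,
    and concatenate the two branch outputs into one fused modality feature."""
    f_rgb = W_rgb @ rgb_img.mean(axis=(0, 1))  # (64,) RGB-channel features
    f_ir = W_ir @ ir_img.mean(axis=(0, 1))     # (64,) infrared-channel features
    return np.concatenate([f_rgb, f_ir])       # (128,) fused tongue/face feature
```

Concatenation is one plausible reading of "fuse the feature vectors of the RGB channels and the infrared channels"; the claims do not fix the fusion operator.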
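The attention weighted aggregation of claims 3 and 7 is the standard attention-based multiple-instance pooling and can be sketched directly from the claimed formulas (a minimal NumPy version; dimensions are illustrative):

```python
import numpy as np

def attention_pool(H, w, V):
    """Attention pooling over n image feature vectors.
    H: (n, d) feature vectors h_1..h_n; V: (h, d) hidden-space map;
    w: (h,) similarity vector. Returns (h_MIL, a), where
    a_i = softmax_i(w^T tanh(V h_i)) and h_MIL = sum_i a_i h_i."""
    scores = np.tanh(H @ V.T) @ w      # (n,) unnormalised attention scores
    a = np.exp(scores - scores.max())  # numerically stabilised softmax
    a = a / a.sum()
    return a @ H, a
```

The weights a_i sum to 1, so h_MIL is a convex combination of the per-image features: images the network judges more informative dominate the aggregate.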
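The gated-attention fusion of claims 4 and 8 (one sigmoid gate per modality, normalised to sum to 1, then a weighted sum) can be sketched as follows. A minimal NumPy version under assumed shapes; the single-layer scalar gates stand in for the claimed "at least one linear layer and a Sigmoid activation function".

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(feats, gates):
    """feats: dict modality -> (d,) feature vector.
    gates: dict modality -> (weight_vector, bias) of a one-layer gate.
    Each gate emits a scalar in (0, 1) via sigmoid; the scalars are
    normalised to sum to 1 and used for a weighted sum of the features."""
    raw = {m: sigmoid(gates[m][0] @ v + gates[m][1]) for m, v in feats.items()}
    total = sum(raw.values())
    alpha = {m: g / total for m, g in raw.items()}   # normalised fusion weights
    fused = sum(alpha[m] * feats[m] for m in feats)  # joint feature vector
    return fused, alpha
```

Because the gates are input-dependent, the fusion weights adapt per sample: a blurry tongue image can be down-weighted in favour of the eye and face modalities.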
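The end-to-end structure of claim 9 (encoder → attention aggregation → gated fusion → classifier) can be wired together in one forward pass. All shapes, random stand-in parameters and the linear-plus-sigmoid classifier head are illustrative assumptions, not the patented network:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_pool(H, w, V):
    """Attention pooling (claim 3): per-image weights, then weighted sum."""
    scores = np.tanh(H @ V.T) @ w
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return a @ H

d = 8  # feature dimension (illustrative)
# Stand-in encoder outputs: several eye images, several tongue images, one face vector.
eye_feats = rng.normal(size=(4, d))
tongue_feats = rng.normal(size=(3, d))
face_feat = rng.normal(size=d)

# Attention-pool the eye and tongue feature sets (weighted aggregation module).
V = rng.normal(size=(5, d)); w = rng.normal(size=5)
eye_agg = attention_pool(eye_feats, w, V)
tongue_agg = attention_pool(tongue_feats, w, V)

# Gated-attention fusion (claim 4): one sigmoid gate per modality, normalised.
feats = {"eye": eye_agg, "tongue": tongue_agg, "face": face_feat}
gates = {m: rng.normal(size=d) for m in feats}
raw = {m: sigmoid(gates[m] @ feats[m]) for m in feats}
alpha = {m: g / sum(raw.values()) for m, g in raw.items()}
joint = sum(alpha[m] * feats[m] for m in feats)  # joint feature vector

# Classifier head: a linear layer plus sigmoid yields the prediction probability.
w_cls = rng.normal(size=d)
p_hat = sigmoid(w_cls @ joint)
```

In training, p_hat would feed the class-weighted cross-entropy loss of claim 5, with gradients flowing back through the classifier, the fusion gates, the attention parameters and the encoder.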
Description
Epilepsy prediction system based on multi-mode biological image and image data processing method
Technical Field
The invention relates to the technical field of epilepsy prediction in intelligent medical treatment, in particular to an epilepsy prediction system based on multi-modal biological images and an image data processing method.
Background
Epilepsy is a chronic nervous system disease in which transient brain dysfunction is caused by sudden abnormal, hypersynchronous discharges of groups of cerebral neurons; its random, repetitive and paroxysmal character places a serious burden on the life and health of epileptic patients. Epilepsy can lead to sudden loss of control of physical and cognitive functions, can cause mental health problems such as anxiety or depression, and in extreme cases can even result in sudden death. Delayed diagnosis of epilepsy is common; it not only exposes patients to the risk of accidental injuries (such as falls, burns and drowning) but also increases their risk of sudden unexpected death in epilepsy (SUDEP) by a factor of 2-3, placing a heavy burden on patients, families and even society. Therefore, achieving early, accurate diagnosis of epilepsy is critical to improving patient prognosis. At present, early epilepsy diagnosis mainly depends on detailed history-taking and witness descriptions, but the sudden, transient and diverse nature of epileptic seizures poses great challenges to diagnosis, especially for patients with low seizure frequency, nocturnal seizures or atypical symptoms, for whom missed diagnosis or misdiagnosis is very likely. With the development of technology, modern medical techniques such as electroencephalography (EEG), neuroimaging (such as MRI) and genetic testing can provide an objective basis for diagnosis. Among these, the EEG signal is the gold standard for epilepsy monitoring.
For example, the invention patent with publication number CN118452948B discloses an electroencephalogram data preprocessing method for the auxiliary diagnosis of epilepsy, which comprises: obtaining a plurality of electroencephalograms of epileptic patients and the key electroencephalogram values within them; obtaining final epileptic characteristic values of the key electroencephalogram values; obtaining mild-seizure and severe-seizure electroencephalograms according to the fluctuation of those final characteristic values; obtaining the epileptic characteristic values of the mild-seizure and severe-seizure electroencephalograms; correcting the key electroencephalogram values according to the final epileptic characteristic values and the epileptic characteristic values; constructing an original electroencephalogram matrix and obtaining a source signal matrix; and performing dimensionality reduction to obtain a plurality of reduced-dimension source signals. The method can greatly reduce the data dimensionality while improving how well the reduced source signals reflect the characteristics of epilepsy.
For another example, the invention patent with publication number CN118557145B discloses an epileptic signal automatic identification system and method based on multi-modal information fusion: an electroencephalogram acquisition unit acquires the EEG signals of a patient in real time; a video acquisition unit synchronously acquires monitoring video signals of the patient; a data preprocessing unit performs noise reduction on the EEG signal and the monitoring video signal and adds seizure start-stop time stamps to them for time-sequence calibration, obtaining epileptic electroencephalogram data and epileptic video data; a multi-scale convolutional network combines time-domain, frequency-domain, time-frequency-domain and spatio-temporal features of the epileptic electroencephalogram data through a deep-learning feature extraction method to obtain epileptic electroencephalogram multi-feature vectors; a multi-level feature pyramid network processes the epileptic video data through spatial pyramid pooling to obtain multi-scale epileptic video features; a human-body optical-flow estimation module obtains epileptic video optical-flow vectors by capturing the epileptic video data sequence; a multi-modal feature fusion module fuses the electroencephalogram multi-feature vectors, the video multi-scale features and the video optical-flow vectors using a multi-layer perceptron mapping method to obtain epileptic fusion feature data; and an epileptic signal classification module identifies the epileptic fusion feature data through a bidirectional long short-term memory network with a multiple attention mechanism to obtain an epileptic signal classification result.