CN-122024318-A - Multi-mode driver abnormal state detection method and system based on three expert networks

CN 122024318 A

Abstract

The invention discloses a method and a system for detecting driver abnormal states from multi-modal data using a three-expert network. The method comprises: acquiring captured video images of the driver's face or upper body, an electroencephalogram (EEG) time-series signal and an electromyography (EMG) time-series signal; preprocessing the three acquired modal signals; inputting the preprocessed signals into a three-expert network for feature extraction to obtain a visual-spatial feature sequence, an EEG temporal feature sequence and an EMG activity feature sequence; fusing the three feature sequences with a cross-modal attention fusion mechanism to obtain fused feature vectors; and outputting a driver-state classification result after the fused feature vectors pass through global average pooling, a fully connected layer and focal-loss optimization. The method achieves deep cross-modal fusion of the visual, EEG and EMG signals and significantly improves the accuracy and robustness of driver abnormal-state recognition.
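
As a concrete reading of the preprocessing step summarized above (filtering, denoising, timestamp alignment and sliding-window segmentation, detailed in claim 6 below), the following is a minimal NumPy/SciPy sketch. The cut-off frequencies, sampling rates, window length and stride are illustrative assumptions, not values given in the patent.

import numpy as np
from scipy.signal import butter, filtfilt


def bandpass(x, fs, low, high, order=4):
    """Zero-phase band-pass filtering applied channel-wise; x has shape (channels, samples)."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, x, axis=-1)


def align_to_clock(signal, t_signal, t_common):
    """Resample each channel onto a common timestamp grid by linear interpolation."""
    return np.stack([np.interp(t_common, t_signal, ch) for ch in signal])


def sliding_windows(x, win, hop):
    """Segment (channels, samples) into overlapping windows of `win` samples every `hop` samples."""
    starts = range(0, x.shape[-1] - win + 1, hop)
    return np.stack([x[:, s:s + win] for s in starts])  # (num_windows, channels, win)


# Example with assumed sampling rates: EEG at 250 Hz, EMG at 1000 Hz, 4 s windows, 1 s hop.
fs_eeg, fs_emg = 250, 1000
t_common = np.arange(0, 60, 1 / fs_eeg)                       # 60 s common clock at 250 Hz
eeg = bandpass(np.random.randn(32, 60 * fs_eeg), fs_eeg, 0.5, 45.0)
emg_raw = bandpass(np.random.randn(8, 60 * fs_emg), fs_emg, 20.0, 450.0)
emg = align_to_clock(emg_raw, np.arange(0, 60, 1 / fs_emg), t_common)
eeg_windows = sliding_windows(eeg, win=4 * fs_eeg, hop=fs_eeg)  # (57, 32, 1000)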

Inventors

  • LI JUN
  • GE CHENYAN
  • HUANG HAIFENG
  • HU TAO

Assignees

  • Hubei Minzu University (湖北民族大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-29

Claims (10)

  1. A method for detecting abnormal states of a multi-modal driver based on a three-expert network, characterized by comprising the following steps: acquiring a captured video image of the driver's face or upper body, an electroencephalogram (EEG) time-series signal and an electromyography (EMG) time-series signal; preprocessing the three acquired modal signals to obtain preprocessed visual video frames, a preprocessed EEG time-series signal and a preprocessed EMG time-series signal; inputting the three preprocessed modal signals into a three-expert network for feature extraction, wherein the three-expert network comprises a visual-spatial expert branch, an EEG temporal expert branch and an EMG activity expert branch, the preprocessed visual video frames are input into the visual-spatial expert branch to extract a visual-spatial feature sequence, the preprocessed EEG time-series signal is input into the EEG temporal expert branch to extract an EEG temporal feature sequence, and the preprocessed EMG time-series signal is input into the EMG activity expert branch to extract an EMG activity feature sequence; and fusing the visual-spatial feature sequence, the EEG temporal feature sequence and the EMG activity feature sequence with a cross-modal attention fusion mechanism to obtain fused feature vectors, and outputting a driver-state classification result after the fused feature vectors pass through global average pooling, a fully connected layer and focal-loss optimization.
  2. The method of claim 1, wherein inputting the preprocessed visual video frames into the visual-spatial expert branch to extract the visual-spatial feature sequence specifically comprises: dividing each preprocessed visual video frame into patches of a set size, flattening the patches and linearly projecting them into a patch-token sequence; and inputting the patch-token sequence into multi-layer Transformer blocks, each layer comprising a multi-head self-attention module and a feed-forward network, progressively extracting spatial topological features through residual connections and layer normalization, and extracting at least one spatial feature among facial expression, eye-movement trajectory and head pose.
  3. The method according to claim 2, wherein inputting the preprocessed EEG time-series signal into the EEG temporal expert branch to extract the EEG temporal feature sequence specifically comprises: performing channel expansion and local pattern extraction on the preprocessed EEG time-series signal through an initial one-dimensional convolution block to obtain an extracted feature sequence; inputting the extracted feature sequence into a dilated temporal convolutional network block for processing to obtain a processed feature sequence; and inputting the processed feature sequence into a long-sequence state-space model block or an alternative Transformer encoder, and outputting the EEG temporal feature sequence.
  4. The method according to claim 3, wherein inputting the preprocessed EMG time-series signal into the EMG activity expert branch to extract the EMG activity feature sequence specifically comprises: inputting the preprocessed EMG time-series signal into stacked one-dimensional ConvNeXt blocks to extract upper-limb muscle tone and activation-pattern features, and outputting the EMG activity feature sequence.
  5. The method according to claim 4, wherein feature fusion using the cross-modal attention fusion mechanism specifically comprises: applying independent linear projections to the visual-spatial feature sequence, the EEG temporal feature sequence and the EMG activity feature sequence respectively; taking the visual-spatial feature projection as the query vector, representing the context of current visual attention, taking the EEG temporal feature projection as the key vector, representing the state of neural activity, and taking the EMG activity feature projection as the value vector, representing the specific action executed by the muscles; computing with a scaled dot-product attention mechanism so that the visual attention context dominates the attention weighting of the EEG and EMG physiological signals; and extending the attention to multiple parallel heads to capture cross-modal complementary relations in different subspaces.
  6. The method as claimed in claim 1, wherein preprocessing and synchronizing the three acquired modal signals specifically comprises: filtering, denoising and timestamp alignment of the three modal signals, followed by sliding-window segmentation.
  7. A multi-modal driver abnormal state detection system based on a three-expert network, for implementing the method of any one of claims 1-6, the system comprising: a data acquisition module, configured to acquire a video image of the driver's face or upper body, an EEG time-series signal and an EMG time-series signal; a preprocessing module, configured to preprocess the three acquired modal signals to obtain preprocessed visual video frames, a preprocessed EEG time-series signal and a preprocessed EMG time-series signal; a feature extraction module, configured to extract features from the three preprocessed modal signals using a three-expert network, wherein the three-expert network comprises a visual-spatial expert branch, an EEG temporal expert branch and an EMG activity expert branch, the preprocessed visual video frames are input into the visual-spatial expert branch to extract a visual-spatial feature sequence, the preprocessed EEG time-series signal is input into the EEG temporal expert branch to extract an EEG temporal feature sequence, and the preprocessed EMG time-series signal is input into the EMG activity expert branch to extract an EMG activity feature sequence; and a cross-modal attention fusion module, configured to fuse the visual-spatial feature sequence, the EEG temporal feature sequence and the EMG activity feature sequence with a cross-modal attention fusion mechanism to obtain fused feature vectors, and to output a driver-state classification result after the fused feature vectors pass through global average pooling, a fully connected layer and focal-loss optimization.
  8. The system of claim 7, wherein the feature extraction module comprises a visual-spatial feature extraction unit, an EEG temporal feature extraction unit and an EMG activity feature unit; the visual-spatial feature extraction unit is configured to divide each preprocessed visual video frame into patches of a set size, flatten the patches and linearly project them into a patch-token sequence, input the patch-token sequence into multi-layer Transformer blocks, each layer comprising a multi-head self-attention module and a feed-forward network, progressively extract spatial topological features through residual connections and layer normalization, and extract at least one spatial feature among facial expression, eye-movement trajectory and head pose; the EEG temporal feature extraction unit is configured to perform channel expansion and local pattern extraction on the preprocessed EEG time-series signal through an initial one-dimensional convolution block to obtain an extracted feature sequence, input the extracted feature sequence into a dilated temporal convolutional network block for processing to obtain a processed feature sequence, input the processed feature sequence into a long-sequence state-space model block or an alternative Transformer encoder, and output the EEG temporal feature sequence; and the EMG activity feature unit is configured to input the preprocessed EMG time-series signal into stacked one-dimensional ConvNeXt blocks, extract upper-limb muscle tone and activation-pattern features, and output the EMG activity feature sequence.
  9. An electronic device comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, the memory being configured to store a computer program comprising program instructions, characterized in that the processor is configured to invoke the program instructions to perform the method of any one of claims 1-6.
  10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-6.
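
As a concrete reading of claims 2-4, the following is a minimal PyTorch sketch of the three expert branches: a visual-spatial expert (patch splitting, linear projection to a patch-token sequence, multi-layer Transformer blocks), an EEG temporal expert (initial one-dimensional convolution, dilated temporal convolution blocks, and an alternative Transformer encoder standing in for the long-sequence state-space model block), and an EMG activity expert (stacked one-dimensional ConvNeXt blocks). All layer widths, depths, kernel sizes and head counts are illustrative assumptions, not values specified in the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualSpatialExpert(nn.Module):
    """Claim 2: patchify a frame, project patches to tokens, run Transformer blocks."""

    def __init__(self, img_size=224, patch_size=16, dim=256, depth=4, heads=8):
        super().__init__()
        # Non-overlapping patch split + linear projection, implemented as a strided conv.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, (img_size // patch_size) ** 2, dim))
        # Multi-head self-attention + feed-forward, with residuals and layer normalization.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, frames):                   # frames: (B, 3, img_size, img_size)
        x = self.patch_embed(frames)
        x = x.flatten(2).transpose(1, 2)         # (B, N_patches, dim) patch-token sequence
        return self.encoder(x + self.pos_embed)  # visual-spatial feature sequence


class EEGTemporalExpert(nn.Module):
    """Claim 3: initial 1-D conv, dilated temporal conv blocks, then a Transformer encoder."""

    def __init__(self, in_ch=32, dim=128, tcn_layers=4, heads=4):
        super().__init__()
        self.stem = nn.Sequential(               # channel expansion + local pattern extraction
            nn.Conv1d(in_ch, dim, kernel_size=7, padding=3), nn.BatchNorm1d(dim), nn.GELU())
        self.tcn = nn.Sequential(*[nn.Sequential(  # exponentially dilated conv blocks
            nn.Conv1d(dim, dim, kernel_size=3, padding=2 ** i, dilation=2 ** i),
            nn.BatchNorm1d(dim), nn.GELU()) for i in range(tcn_layers)])
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, eeg):                      # eeg: (B, channels, T)
        x = self.tcn(self.stem(eeg))             # (B, dim, T)
        return self.encoder(x.transpose(1, 2))   # EEG temporal feature sequence (B, T, dim)


class ConvNeXtBlock1d(nn.Module):
    """1-D ConvNeXt block: depthwise conv, LayerNorm, inverted-bottleneck MLP, residual."""

    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv1d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pwconv1, self.pwconv2 = nn.Linear(dim, 4 * dim), nn.Linear(4 * dim, dim)

    def forward(self, x):                        # x: (B, dim, T)
        y = self.dwconv(x).transpose(1, 2)       # (B, T, dim)
        y = self.pwconv2(F.gelu(self.pwconv1(self.norm(y))))
        return x + y.transpose(1, 2)


class EMGActivityExpert(nn.Module):
    """Claim 4: stacked 1-D ConvNeXt blocks over the EMG time series."""

    def __init__(self, in_ch=8, dim=128, depth=3):
        super().__init__()
        self.stem = nn.Conv1d(in_ch, dim, kernel_size=7, padding=3)
        self.blocks = nn.Sequential(*[ConvNeXtBlock1d(dim) for _ in range(depth)])

    def forward(self, emg):                      # emg: (B, channels, T)
        return self.blocks(self.stem(emg)).transpose(1, 2)  # EMG activity feature sequence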

Description

Multi-mode driver abnormal state detection method and system based on three expert networks

Technical Field

The invention relates to the technical field of driver state detection, and in particular to a method and a system for detecting abnormal states of a multi-modal driver based on a three-expert network.

Background

Driving has become an indispensable mode of travel in modern society, providing an efficient and convenient means of mobility and supporting economic development and social exchange. However, with the rapid growth in vehicle ownership and the continuing increase in road traffic density, traffic accidents occur frequently and pose a serious threat to public safety and to people's lives and property. These accidents cause enormous casualties and economic losses, and road traffic injuries have become a significant burden on global public health. Human factors are the main cause of these accidents, accounting for 90% or more. Common human factors include speeding, drunk driving, fatigue driving, distracted driving and emotional agitation. Fatigue driving slows the driver's reactions, reduces attention and weakens judgment; related research shows that the fatality rate of serious accidents caused by fatigue driving is extremely high. Distracted driving (for example, using a mobile phone, adjusting equipment or talking to passengers) is equally dangerous, and data from the National Highway Traffic Safety Administration (NHTSA) show that distraction is one of the important causes of fatal collisions. In an emotionally agitated state, the driver is prone to aggressive manoeuvres, further amplifying the accident risk. These abnormal states are often interwoven, appear more readily during long drives or in complex road conditions, and markedly reduce the driver's perception of and response to the road environment, making them a major hidden danger behind traffic accidents.

In practical applications of driver abnormal-state detection, the following prominent shortcomings remain:

  1. Most schemes depend heavily on a single modality, especially the visual modality, so detection performance degrades noticeably under low illumination, occlusion (such as sunglasses or a hat) or strict privacy constraints, while purely physiological schemes are easily disturbed by motion artifacts and lack stability.
  2. Physiological signal acquisition equipment is often highly intrusive, requiring special headbands or multi-point electrodes that are uncomfortable to wear for long periods; EEG and EMG signal processing is complex, making continuous, reliable operation in real driving difficult.
  3. Schemes based on vehicle operating parameters rely on indirect inference, can hardly capture early abnormalities at the physiological level, and offer limited fine-grained discrimination between different abnormal states.
  4. Most existing multi-modal schemes lack fusion depth, relying on simple feature-level concatenation or decision-level weighting, and cannot fully exploit the deep complementary relations among visual-spatial information, EEG temporal dynamics and EMG activity patterns; as a result, recognition accuracy and generalization are limited in composite abnormal states (such as fatigue accompanied by emotional fluctuation) or cross-individual scenarios. At the same time, their computational complexity is high, and real-time performance hardly meets the requirements of in-vehicle embedded deployment.

Disclosure of the Invention

To address the above defects of the prior art, the invention provides a multi-modal driver abnormal-state detection method based on a three-expert network, which achieves deep cross-modal fusion of visual, EEG and EMG signals and significantly improves the accuracy and robustness of driver abnormal-state recognition. In a first aspect, the method provided by an embodiment of the invention comprises: acquiring a captured video image of the driver's face or upper body, an EEG time-series signal and an EMG time-series signal; preprocessing the three acquired modal signals to obtain preprocessed visual video frames, a preprocessed EEG time-series signal and a preprocessed EMG time-series signal; and inputting the three preprocessed modal signals into a three-expert network for feature extraction, wherein the three-expert network comprises a visual-spatial expert branch, an EEG temporal expert branch and an EMG activity expert branch, the preprocessed visual video frames are input into the visual-spatial expert branch to extract a visual-spatial feature sequence, the preprocessed EEG time-series signal is input into the EEG temporal expert branch to extract an EEG temporal feature sequence, and the preprocessed EMG time-series signal is input into the EMG activity expert branch to extract an EMG activity feature sequence.
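
As a concrete reading of the fusion and classification step of claims 1 and 5, the following is a minimal PyTorch sketch: the three modal feature sequences receive independent linear projections, the visual projection serves as the query, the EEG projection as the key and the EMG projection as the value of multi-head scaled dot-product attention, and the fused features are globally average-pooled, passed through a fully connected layer and trained with a focal loss. The dimensions, head count, focal-loss exponent and the assumption that the EEG and EMG windows share the same length are illustrative choices, not values given in the patent.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalAttentionFusion(nn.Module):
    def __init__(self, vis_dim=256, eeg_dim=128, emg_dim=128, d_model=256,
                 heads=8, num_classes=4):
        super().__init__()
        self.q_proj = nn.Linear(vis_dim, d_model)    # visual attention context -> query
        self.k_proj = nn.Linear(eeg_dim, d_model)    # neural activity state    -> key
        self.v_proj = nn.Linear(emg_dim, d_model)    # executed muscle action   -> value
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.fc = nn.Linear(d_model, num_classes)    # fully connected classifier

    def forward(self, vis_seq, eeg_seq, emg_seq):
        q, k, v = self.q_proj(vis_seq), self.k_proj(eeg_seq), self.v_proj(emg_seq)
        fused, _ = self.attn(q, k, v)                # multi-head scaled dot-product attention
        pooled = fused.mean(dim=1)                   # global average pooling over the sequence
        return self.fc(pooled)                       # driver-state logits


def focal_loss(logits, targets, gamma=2.0):
    """Focal loss: down-weights well-classified samples to focus on hard or rare states."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                              # estimated probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()

Because the query is derived from the visual tokens, the attention weights are computed from the visual context against the EEG keys and then applied to the EMG values, which matches the claim that the visual attention context dominates the weighting of the two physiological signals.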