
CN-121971090-A - Depression monitoring method and system integrating millimeter wave radar and vision sensor

CN 121971090 A

Abstract

The invention discloses a depression monitoring method and system integrating a millimeter wave radar and a vision sensor. The monitoring method comprises the following steps: (1) synchronous acquisition of bimodal data; (2) preprocessing of the bimodal data, including preprocessing of the radar signals and of the video data; (3) generation of monomodal intermediate data; (4) pretraining and fine-tuning of the monomodal feature extraction models, including pretraining and fine-tuning of an HRV feature extraction model and training of a micro-expression feature extraction model; (5) monomodal feature extraction; (6) dynamic fusion of the multimodal features; and (7) evaluation and output of the depression state. The monitoring system comprises a data acquisition module, a data preprocessing module, an intermediate data generation module, a model training module, a feature fusion module and a depression evaluation module. The invention realizes depression monitoring that is non-contact, precise and generalizable.

Inventors

  • Qiao Lishan
  • Jiao Bowen
  • Liu Yanan
  • Zhang Limei
  • Xi Xiaoming

Assignees

  • Shandong Jianzhu University (山东建筑大学)

Dates

Publication Date
2026-05-05
Application Date
2026-03-17

Claims (10)

  1. A depression monitoring method integrating a millimeter wave radar and a vision sensor, characterized by comprising the following steps:
Step (1) synchronous acquisition of bimodal data: the millimeter wave radar is started to emit a 77 GHz frequency-modulated continuous wave and acquire human thoracic micro-motion echo signals within a range of 0.5-1.5 m, while the vision sensor is synchronously started to acquire facial video data, the video data being labeled with depression category labels (no depression, mild depression, moderate depression and major depression) and comprising samples from non-depressed subjects;
Step (2) bimodal data preprocessing, comprising radar signal preprocessing and video data preprocessing, wherein the radar signal preprocessing comprises sequentially performing a range-dimension fast Fourier transform on the echo signals of step (1), namely performing an FFT on the echo data of each frame along the sampling-point direction to obtain the signal distribution over different range bins, removing static clutter to eliminate static background interference, and unwrapping the phase to restore continuous phase information, thereby obtaining a clean chest micro-motion signal; the static clutter removal adopts a phasor mean cancellation method with the formula
y'[i, n] = y[i, n] - (1/N) Σ_{k=1}^{N} y[i, k]
wherein y[i, n] is the original echo signal of range bin i at frame n and N is the number of averaged frames; the video data preprocessing comprises uniformly extracting frames from the facial video of step (1) at one frame per 0.1 s, locating the periocular region (landmark points 36-47) and the mouth region (landmark points 48-67) as key regions with a 68-point facial landmark model, cropping pictures containing the key regions (key-region pictures), and applying Gaussian blur to the non-key regions of the key-region pictures, so as to protect the subject's privacy and reduce background interference while keeping the periocular and mouth regions sharp for subsequent micro-expression feature extraction;
Step (3) generation of monomodal intermediate data: a heartbeat interval (IBI) sequence is extracted from the clean chest micro-motion signal of step (2) by generating an acceleration signal through second-order differentiation, detecting its minima as heartbeat peaks, computing the time intervals between adjacent peaks and discarding outliers outside the range 0.3-2 s; a micro-expression dynamic feature sequence is generated by computing inter-frame pixel motion vectors on the key-region pictures of step (2) with the Farneback optical flow method and combining them with facial action unit (AUs) labels to form a spatio-temporal feature sequence;
Step (4) pretraining and fine-tuning of the monomodal feature extraction models, comprising pretraining and fine-tuning of an HRV feature extraction model and training of a micro-expression feature extraction model: the HRV feature extraction model is pretrained by training an LSTM time-series model on the MIT-BIH public physiological data set to learn the general temporal patterns of HRV signals, obtaining a pretrained LSTM model; the HRV feature extraction model is fine-tuned by tuning the pretrained LSTM model on IBI sequences labeled with depression categories, optimizing the model parameters to adapt to the extraction of the depression-related HRV features SDNN, RMSSD and LF/HF, obtaining an HRV feature extractor; the micro-expression dynamic feature sequences generated in step (3) are input into a Transformer model trained end to end with the depression category as label, obtaining a micro-expression feature extractor; the micro-expression feature extractor and the HRV feature extractor extract the features of the visual modality and the radar modality respectively and work independently and in parallel;
Step (5) monomodal feature extraction: the IBI sequence extracted in step (3) is input into the HRV feature extractor, which outputs an HRV feature vector;
Step (6) dynamic fusion of the multimodal features: the bimodal weights are computed and the bimodal feature vectors are concatenated with weighting according to those weights, wherein the dimension of the HRV feature vector output by the HRV feature extractor is set to 32-128 and the dimension of the micro-expression feature vector output by the micro-expression feature extractor is set to 128-256; weighted concatenation according to the computed fusion weights generates a feature fusion matrix, wherein C1 is the prediction confidence of the HRV feature extractor and an error below 5% counts as high confidence;
Step (7) evaluation and output of the depression state: a multi-branch fusion network is adopted in which an LSTM branch processes the HRV features and a Transformer branch processes the micro-expression features; the outputs of the two branches are fused through an attention mechanism whose attention weights are learned automatically during network training, followed by a fully connected classification layer; the depression state evaluator is obtained by training the network on a feature fusion matrix sample set labeled with depression categories, with training parameters settable by reference to the micro-expression model.
  2. The depression monitoring method based on the fusion of the millimeter wave radar and the vision sensor of claim 1, characterized in that in the radar signal preprocessing of step (2), MATLAB software is used to perform the range FFT on the radar echo signals and extract the range-dimension information, the first 200 range bins are selected to cover the 0.5-1.5 m detection range, static clutter is removed by the phasor mean cancellation method with the number of averaged frames set to N = 50, balancing clutter removal effect against signal real-time performance, and phase unwrapping is performed by a least-squares algorithm to eliminate 2π jumps in the phase signal, obtaining a continuous, clean chest micro-motion signal.
  3. The depression monitoring method based on the fusion of the millimeter wave radar and the vision sensor of claim 1, characterized in that in the video data preprocessing of step (2), the OpenCV 4.8.0 open-source library is used to uniformly extract frames from the facial video at one frame per 0.1 s, balancing micro-expression dynamic capture against data volume; the 68-point facial landmark model of the Dlib 19.24.0 library is used to locate the periocular region landmarks 36-47 and the mouth region landmarks 48-67; key-region pictures are cropped to match the resolution requirements of micro-expression feature extraction; and Gaussian blur with a 5×5 kernel is applied to the non-key regions.
  4. The depression monitoring method based on the fusion of the millimeter wave radar and the vision sensor of claim 1, characterized in that in step (3) the micro-expression dynamic feature sequence is generated by computing inter-frame pixel motion vectors with the Farneback optical flow method and combining them with facial action unit (AUs) labels to form the spatio-temporal feature sequence.
  5. The depression monitoring method based on the fusion of the millimeter wave radar and the vision sensor of claim 1, characterized in that the pretraining and fine-tuning of step (4) proceed as follows: for pretraining, the MIT-BIH public physiological signal data set and HRV time-series data of healthy subjects are used to build an LSTM time-series model containing 2 hidden layers with 64-128 neurons each, adapted to HRV temporal feature learning, with a batch size of 32-64, 20-30 training epochs and a learning rate of 0.001-0.005, so as to learn the general temporal patterns of HRV signals; for fine-tuning, the pretrained LSTM model is parameter-fine-tuned on an IBI sequence sample set labeled with depression categories for 10-20 epochs with the learning rate reduced to 1/10 of that of the pretraining stage, preventing the pretraining parameters from being excessively overwritten and adapting the model to the extraction of the depression-related HRV features SDNN, RMSSD and LF/HF, yielding the HRV feature extractor; the micro-expression feature extraction model is trained end to end on a Transformer model comprising 4-6 layers, with a feature dimension set to 32-64 and a batch size of 8, yielding the micro-expression feature extractor.
  6. The method of claim 1, wherein in step (1) the millimeter wave radar adopts a TI AWR1642 radar module, set to a frequency-modulation slope of 70 MHz/μs, a sampling rate of 1000 Hz and a detection angle of ±60°; the vision sensor adopts a high-definition camera with 1080P resolution, a 30 fps frame rate and a 2.8 mm lens focal length; and the radar and the camera are synchronized through a hardware interface so that the timestamp error is kept at or below 10 ms.
  7. The method of claim 1, wherein in step (5) the HRV feature vector has a dimension of 32-128 and the micro-expression feature vector has a dimension of 128-256.
  8. The method of claim 1, wherein in step (6) the prediction confidence C1 of the HRV feature extractor and the recognition accuracy C2 of the micro-expression feature extractor are both computed from the validation-set F1-score, with an error below 5% serving as the criterion for high confidence.
  9. The method of claim 1, wherein in step (7) the output dimension of the fully connected classification layer is 4, the four categories being no depression, mild depression, moderate depression and major depression.
  10. A monitoring system for implementing the depression monitoring method integrating the millimeter wave radar and the vision sensor, comprising: a data acquisition module, comprising the millimeter wave radar and the vision sensor, for synchronously acquiring the thoracic micro-motion echo signals and the facial video data; a data preprocessing module, comprising a radar signal processing unit and a video key-region segmentation unit, for signal denoising and region cropping; an intermediate data generation module, comprising an IBI sequence extraction unit and a micro-expression feature sequence generation unit, for outputting monomodal time-series data; a model training module, comprising a pretraining unit, a fine-tuning unit and an end-to-end training unit, for training the HRV and micro-expression feature extractors respectively; a feature fusion module for computing dynamic weights based on the monomodal confidences and performing feature weighting and concatenation; and a depression evaluation module carrying the multi-branch fusion network and outputting the depression state evaluation result; wherein the output of the data acquisition module is connected to the data preprocessing module; the radar signal processing unit and the video key-region segmentation unit of the data preprocessing module output respectively to the IBI sequence extraction unit and the micro-expression feature sequence generation unit of the intermediate data generation module; the monomodal time-series data output by the intermediate data generation module are connected to the model training module, which yields the HRV feature extractor and the micro-expression feature extractor; the feature fusion module receives the feature vectors output by the two extractors and outputs the feature fusion matrix; the depression evaluation module receives the feature fusion matrix and outputs the depression state evaluation result; and the data transmission among the modules is invoked and controlled by a processor according to the sequence of the monitoring flow.
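The IBI extraction of step (3) and the time-domain HRV features named in claim 5 can be sketched as follows. This is a minimal illustration, not the patented implementation: the synthetic signal, sampling rate and peak-detection details are assumptions; only the second-order differentiation, the minima-as-heartbeat-peaks rule and the 0.3-2 s outlier gate come from the claims.

```python
import numpy as np

def extract_ibi(chest_signal, fs):
    """Return inter-beat intervals (s) from a chest micro-motion signal."""
    accel = np.diff(chest_signal, n=2)          # second-order difference ("acceleration")
    # Local minima of the acceleration signal are taken as heartbeat peaks.
    a = accel[1:-1]
    minima = np.where((a < accel[:-2]) & (a < accel[2:]))[0] + 1
    ibi = np.diff(minima) / fs                  # intervals between adjacent peaks
    return ibi[(ibi >= 0.3) & (ibi <= 2.0)]     # discard outliers per claim 1

def hrv_features(ibi):
    """SDNN and RMSSD (ms) from an IBI sequence given in seconds."""
    nn = ibi * 1000.0
    sdnn = np.std(nn)                           # overall variability
    rmssd = np.sqrt(np.mean(np.diff(nn) ** 2))  # beat-to-beat variability
    return sdnn, rmssd

# Synthetic chest signal: a 1.2 Hz "heartbeat" ripple riding on a
# 0.25 Hz breathing wave, sampled at 100 Hz for 10 s.
fs = 100.0
t = np.arange(0, 10, 1 / fs)
sig = np.sin(2 * np.pi * 0.25 * t) + 0.05 * np.sin(2 * np.pi * 1.2 * t)
ibi = extract_ibi(sig, fs)
sdnn, rmssd = hrv_features(ibi)
```

On this noiseless signal the detected intervals cluster near the 1/1.2 s heartbeat period; real radar data would need band-pass filtering before peak picking, which the sketch omits.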
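Claim 8 derives the fusion weights of step (6) from per-modality confidences (validation-set F1-scores) but the weight formula itself did not survive in the published text; a simple relative normalisation such as the one below is one plausible reading. The vector dimensions follow claim 7; the `fuse` function and its normalisation are assumptions, not the patented formula.

```python
import numpy as np

def fuse(hrv_vec, expr_vec, c1, c2):
    """Weight each modality by its relative confidence, then concatenate."""
    w1 = c1 / (c1 + c2)        # HRV branch weight (assumed normalisation)
    w2 = c2 / (c1 + c2)        # micro-expression branch weight
    return np.concatenate([w1 * hrv_vec, w2 * expr_vec])

hrv_vec = np.ones(64)          # HRV feature vector (dim within 32-128, claim 7)
expr_vec = np.ones(192)        # micro-expression vector (dim within 128-256)
fused = fuse(hrv_vec, expr_vec, c1=0.9, c2=0.6)   # F1-style confidences
```

With these confidences the HRV half of the fused vector is scaled by 0.6 and the micro-expression half by 0.4, so the more reliable modality dominates the fusion matrix fed to the evaluator of step (7).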

Description

Depression monitoring method and system integrating millimeter wave radar and vision sensor

Technical Field

The invention relates to data processing, and in particular to a depression monitoring method and system integrating a millimeter wave radar and a vision sensor.

Background

Early screening for depressive disorder depends on the comprehensive evaluation of physiological indexes and emotional behaviors; common approaches include single physiological-signal monitoring, purely visual micro-expression recognition, and multimodal fusion. The prior art, however, has the following significant limitations: 1. single physiological-signal monitoring schemes (such as wearable devices collecting heart rate) require contact wear, so user compliance is low, and heart rate variability (HRV) extraction accuracy is insufficient because the signal is easily disturbed by motion; 2. purely visual micro-expression recognition schemes are easily affected by illumination and pose occlusion and lack correlation verification with physiological indexes, so the misjudgment rate of the depression state is high; 3. existing multimodal fusion techniques mostly adopt fixed-weight concatenation, do not adjust dynamically for single-modality signal quality, and do not pretrain the HRV feature extraction model on general physiological data, so generalization in learning depression-related temporal patterns is poor.
The psychological-crisis multi-cascade control scheme based on a psychological big model disclosed in CN202510540684 focuses on the generalization and fusion of multi-source data but designs no depression-specific bimodal feature extraction and dynamic fusion mechanism, and although patents such as CN202310290150 involve facial key-region segmentation, physiological-behavioral association analysis is difficult without combining non-contact physiological signals. There is therefore a need for a depression monitoring technique that is at once non-contact, precise and generalizable.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a depression monitoring method and system integrating a millimeter wave radar and a vision sensor that overcome the problems of the prior art. To this end, the invention adopts the following technical means: a depression monitoring method integrating a millimeter wave radar and a vision sensor, comprising the following steps. Step (1) synchronous acquisition of bimodal data: the millimeter wave radar is started to emit a 77 GHz frequency-modulated continuous wave and acquire human thoracic micro-motion echo signals within a distance range of 0.5-1.5 m, while the vision sensor is synchronously started to acquire facial video data; the video data is labeled with depression category labels (no depression, mild depression, moderate depression and major depression) and contains samples from non-depressed subjects.
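The radar chain of step (2) (range-dimension FFT, phasor mean cancellation, phase unwrapping) can be sketched in a few lines of NumPy. The synthetic echo, array shapes, averaging window and chosen range bin below are illustrative assumptions, not parameters from the patent.

```python
import numpy as np

def preprocess_radar(echo, n_avg=50, range_bin=10):
    """echo: complex array of shape (n_frames, samples_per_chirp)."""
    # 1. Range-FFT along the sampling-point direction: one spectrum per
    #    frame, resolving echoes into distance bins.
    profile = np.fft.fft(echo, axis=1)
    # 2. Phasor mean cancellation: subtract the complex mean of the first
    #    n_avg frames from every range bin to remove static clutter.
    clutter = profile[:n_avg].mean(axis=0, keepdims=True)
    cleaned = profile - clutter
    # 3. Take the phase at the chest-wall bin and unwrap the 2*pi jumps,
    #    recovering the continuous micro-motion waveform.
    return np.unwrap(np.angle(cleaned[:, range_bin]))

# Synthetic scene: a static reflector (clutter) plus a "chest" at bin 10
# whose echo phase is modulated by a 0.25 Hz breathing motion of +/-2 rad.
fs, n_frames, n_samples, target_bin = 20.0, 200, 64, 10
t = np.arange(n_frames) / fs
motion = 2.0 * np.sin(2 * np.pi * 0.25 * t)
carrier = np.exp(2j * np.pi * target_bin * np.arange(n_samples) / n_samples)
echo = np.exp(1j * motion)[:, None] * carrier[None, :] + 0.5  # + static clutter
phase = preprocess_radar(echo, n_avg=50, range_bin=target_bin)
```

The recovered phase tracks the injected breathing motion; claim 2 additionally specifies a least-squares unwrapping algorithm and N = 50 averaged frames, where this sketch uses NumPy's sequential `unwrap`.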
Step (2) bimodal data preprocessing comprises radar signal preprocessing and video data preprocessing. The radar signal preprocessing comprises sequentially performing a range-dimension fast Fourier transform on the echo signals of step (1), namely performing an FFT on the echo data of each frame along the sampling-point direction to obtain the signal distribution over different range bins, removing static clutter to eliminate static background interference, and unwrapping the phase to restore continuous phase information, thereby obtaining a clean chest micro-motion signal; the static clutter removal adopts the phasor mean cancellation method, with the formula
y'[i, n] = y[i, n] - (1/N) Σ_{k=1}^{N} y[i, k]
where y[i, n] is the original echo signal of range bin i at frame n and N is the number of averaged frames. The video data preprocessing comprises uniformly extracting frames from the facial video of step (1) at one frame per 0.1 s, locating the periocular region (landmark points 36-47) and the mouth region (landmark points 48-67) as key regions with a 68-point facial landmark model, cropping pictures containing the key regions (key-region pictures), and applying Gaussian blur to the non-key regions of the key-region pictures, so as to protect the subject's privacy and reduce background interference while keeping the periocular and mouth regions sharp for subsequent micro-expression feature extraction. Step (3) generation of monomodal intermediate data: a heartbeat interval (IBI) sequence is extracted from the clean chest micro-motion signal of step (2) by generating an acceleration signal through second-order differentiation, detecting its minima as heartbeat peaks, computing the time intervals between adjacent peaks and removing outliers outside the range 0.3-2 s; a micro-expression dynamic feature sequence is generated by computing inter-frame pixel motion