CN-122000026-A - Disease auxiliary diagnosis system and method based on multi-modal structured causal reasoning

Abstract

The invention discloses a disease auxiliary diagnosis system and method based on multi-modal structured causal reasoning, belonging to the technical field of intelligent auxiliary diagnosis. The system comprises a behavior feature layer, a scale item layer, a symptom latent variable layer, a disease type layer and a result interpretation layer. The behavior feature layer synchronously collects audio data and facial image video data of a target subject to obtain behavior features that represent the subject's behavior changes. The scale item layer maps the behavior features to item vectors of a psychological assessment clinical scale. The symptom latent variable layer aggregates the item vectors into symptom latent variables reflecting different clinical symptom dimensions, according to the correspondence between scale items and clinical symptom dimensions. The disease type layer constructs a structured causal relation graph to obtain disease type influence probabilities, which are used to analyze the influence of symptoms on disease risk. The result interpretation layer outputs readable causal interpretation paths and feature contribution degrees based on the causal structure graph and the disease type prediction results. The invention enables objective, structured and interpretable auxiliary decision-making for the diagnosis of mental diseases.

Inventors

CHEN WEIQI
HAN CHUANHUI

Assignees

Guangdong Polytechnic Normal University (广东技术师范大学)

Dates

Publication Date: 20260508
Application Date: 20260122

Claims (10)

1. A disease auxiliary diagnosis system based on multi-modal structured causal reasoning, comprising: a data acquisition and preprocessing module, configured to synchronously acquire multi-modal physiological data of a target, including audio data and facial image video data, and to perform noise reduction and alignment on the multi-modal physiological data to obtain an audio sequence segment set containing valid information and a corresponding video sequence segment set; a behavior feature extraction module, configured to extract features from the audio sequence segment set and the video sequence segment set respectively, and to fuse the extracted features into a multi-modal behavior feature fusion vector; a scale item prediction module, configured to map the multi-modal behavior feature fusion vector to an item prediction vector of a psychological assessment clinical scale using a regression model based on a multi-layer fully connected neural network; a symptom latent variable construction module, configured to aggregate the item prediction vector into symptom latent variables reflecting different clinical symptom dimensions according to a predefined correspondence between scale items and clinical symptom dimensions; a causal structure learning and dynamic optimization module, configured to construct, based on the multi-modal behavior feature fusion vector, the item prediction vector and predefined disease types, a structured causal relation graph comprising behavior feature nodes, scale item nodes, symptom latent variable nodes and disease type nodes, to apply a directed acyclic constraint to the structured causal relation graph, and to learn a causal adjacency matrix by optimization in combination with a prior adjacency matrix constructed from expert knowledge, thereby obtaining a causal structure graph for constraining a disease type prediction neural network; and a counterfactual reasoning and result interpretation module, configured to perform causal tracing and counterfactual reasoning analysis based on the learned causal structure graph and the disease type prediction result, and to output readable causal interpretation paths and feature contribution degrees.
2. The disease auxiliary diagnosis system based on multi-modal structured causal reasoning according to claim 1, wherein the data acquisition and preprocessing module acquiring the multi-modal physiological data of the target specifically comprises: collecting audio data of the target at a set audio sampling frequency within a set time range; and capturing facial image video frame sequence data of the target at a set video frame rate within the same set time range.
3. The disease auxiliary diagnosis system based on multi-modal structured causal reasoning according to claim 2, wherein the data acquisition and preprocessing module performing noise reduction and alignment on the acquired multi-modal physiological data specifically comprises: performing pre-emphasis and noise reduction on the audio data, where γ is the pre-emphasis coefficient; performing voice activity detection on the audio data to obtain a voice activity mask function; obtaining a silence-removed audio signal from the voice activity mask function and the denoised, pre-emphasized audio data; performing face detection on each frame of the facial image video frame sequence data to obtain a face region; cropping and aligning the face region to obtain aligned video frames; performing illumination equalization and histogram equalization on the aligned video frame sequence to obtain preprocessed frames; setting a segment length in seconds and a segment overlap ratio ρ, and dividing segment intervals along the audio time axis; extracting the corresponding audio segment in each time interval; and simultaneously selecting the set of video frames whose timestamps fall within that interval, each selected frame being identified by the global frame number corresponding to the m-th frame in segment k.
4. The disease auxiliary diagnosis system based on multi-modal structured causal reasoning according to claim 1, wherein the behavior feature extraction module is configured to extract features from the audio sequence segment set and the video sequence segment set and to fuse them by concatenation along the feature dimension to obtain the multi-modal behavior feature fusion vector, specifically comprising: for each segment k in the audio sequence segment set, extracting audio features including speech rate, pause ratio, average energy, fundamental frequency mean and fundamental frequency range, and forming the extracted audio features into a voice feature vector; for the video frame sequence of each segment k in the video sequence segment set, extracting video features including the action unit intensities of each frame, the average activation value of each action unit within the segment, blink frequency, pitch angle, yaw angle, roll angle and mean angular velocity, and forming the extracted video features into a video feature vector; concatenating the voice feature vector and the video feature vector of the same segment k along the feature dimension to obtain a multi-modal fusion feature vector of the segment, and normalizing the multi-modal fusion feature vector; and performing attention-based weighted aggregation over the normalized multi-modal fusion feature vectors of all segments to obtain the multi-modal behavior feature fusion vector.
5. The disease auxiliary diagnosis system based on multi-modal structured causal reasoning according to claim 1, wherein the psychological assessment clinical scale comprises a total of 33 items drawn from the PHQ-9, GAD-7 and HAMD-17 scales, the items forming an item vector, wherein: the PHQ-9 scale contributes items 1-9; the GAD-7 scale contributes items 1-7; and the HAMD-17 scale contributes items 1-17.
6. The disease auxiliary diagnosis system based on multi-modal structured causal reasoning according to claim 1, wherein the symptom latent variable construction module is configured to map the scale item vector to the clinical symptom dimensions to construct the symptom latent variables, wherein: a symptom latent variable set is defined in which F1 represents a depressed mood factor, F2 represents a reduced interest and motivation factor, F3 represents a sleep and rhythm disorder factor, F4 represents a somatic symptom factor, F5 represents a psychomotor state factor, and F6 represents an anxiety and stress factor; the mapping from items to symptom latent variables is completed by a linear mapping matrix and a bias vector, wherein the linear mapping matrix is set according to a predetermined item assignment set: F1 corresponds to PHQ-9, item 2, and HAMD-17, item 1; F2 corresponds to PHQ-9, and HAMD-17, item 7; F3 corresponds to PHQ-9, and HAMD-17, items 4-6; F4 corresponds to PHQ-9, items 4 to 7 and 8, and HAMD-17, items 13 and 16; F5 corresponds to PHQ-9, items 8 and 9, and HAMD-17; F6 corresponds to GAD-7, items 1-7, and HAMD-17, items 10, 11 and 15.
7. The disease auxiliary diagnosis system based on multi-modal structured causal reasoning according to claim 1, wherein the causal structure learning and dynamic optimization module specifically performs the following steps: constructing a causal adjacency matrix A from the causal influence strength from any node v_i to another node v_j in the structured causal relation graph, wherein behavior feature nodes point to scale item nodes, scale item nodes point to symptom latent variable nodes, and symptom latent variable nodes point to disease type nodes; applying to the causal adjacency matrix an acyclicity constraint in matrix-exponential form, expressed with the Hadamard (element-wise) product and the trace function tr(·); learning the causal adjacency matrix through an optimization objective function in which A_prior is the prior adjacency matrix based on expert knowledge, λ is a weight coefficient, and the distance is measured with the Frobenius norm; and, under the causal structure constraint, outputting by the disease type prediction model a disease type prediction result from the symptom latent variable vector F, the result being the predicted risk scores of 3 disease types, computed with the classification model parameters.
8. The disease auxiliary diagnosis system based on multi-modal structured causal reasoning according to claim 1, wherein the counterfactual reasoning and result interpretation module is configured to apply counterfactual interventions to selected variable nodes under the learned causal structure constraint, to analyze the effect of the selected variable nodes on the disease type prediction result, and to generate corresponding causal interpretations from the counterfactual analysis results, wherein: the counterfactual reasoning and result interpretation module applies an intervention operation to a variable node v_i in the causal graph and recalculates the disease type prediction result using a causal reasoning function, in which the symptom latent variable vector resulting from the counterfactual intervention on variable node v_i is recalculated based on the causal adjacency matrix A; the counterfactual reasoning and result interpretation module calculates, from the disease type prediction results before and after the intervention, the contribution degree of variable node v_i to the target disease type node v_d; the contribution degrees are normalized over the set I of variable nodes participating in the analysis to obtain the contribution weight of each variable node; a directed causal path from variable node v_i to the target disease type node v_d is obtained, and a total causal effect score of the path is calculated from the causal effect strengths of the causal edges in the path; and a plurality of causal paths whose causal effect scores exceed a preset threshold are selected and structurally analyzed according to predefined causal interpretation rules, based on the semantic types of the nodes in the paths, the causal connections among the nodes and the corresponding causal effect scores, and the analysis result is converted into a natural language description using a preset natural language template to generate a corresponding natural language interpretation result.
9. A disease auxiliary diagnosis method based on multi-modal structured causal reasoning, comprising the following steps: synchronously acquiring multi-modal physiological data of a target, the multi-modal physiological data comprising audio data and facial image video data; performing noise reduction and alignment on the acquired multi-modal physiological data to obtain an audio sequence segment set containing valid information and a corresponding video sequence segment set; extracting features from the audio sequence segment set and the video sequence segment set respectively, and fusing the extracted features into a multi-modal behavior feature fusion vector; mapping the multi-modal behavior feature fusion vector to an item prediction vector of a psychological assessment clinical scale using a regression model based on a multi-layer fully connected neural network; aggregating the item prediction vector into symptom latent variables reflecting different clinical symptom dimensions according to a predefined correspondence between scale items and clinical symptom dimensions; constructing, based on the multi-modal behavior feature fusion vector, the item prediction vector and predefined disease types, a structured causal relation graph comprising behavior feature nodes, scale item nodes, symptom latent variable nodes and disease type nodes, applying a directed acyclic constraint to the structured causal relation graph, and learning a causal adjacency matrix by optimization in combination with a prior adjacency matrix constructed from expert knowledge, thereby obtaining a causal structure graph for constraining a disease type prediction neural network; and performing causal tracing and counterfactual reasoning analysis based on the learned causal structure graph and the disease type prediction result, and outputting readable causal interpretation paths and feature contribution degrees.
10. The disease auxiliary diagnosis method based on multi-modal structured causal reasoning according to claim 9, wherein the step of obtaining a causal structure graph for constraining a disease type prediction neural network comprises: constructing a causal adjacency matrix A from the causal influence strength from any node v_i to another node v_j in the structured causal relation graph, wherein behavior feature nodes point to scale item nodes, scale item nodes point to symptom latent variable nodes, and symptom latent variable nodes point to disease type nodes; applying to the causal adjacency matrix an acyclicity constraint in matrix-exponential form, expressed with the Hadamard (element-wise) product and the trace function tr(·); learning the causal adjacency matrix through an optimization objective function in which A_prior is the prior adjacency matrix based on expert knowledge, λ is a weight coefficient, and the distance is measured with the Frobenius norm; and, under the causal structure constraint, outputting by the disease type prediction model a disease type prediction result from the symptom latent variable vector F, the result being the predicted risk scores of 3 disease types, computed with the classification model parameters.
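
Illustrative note on claims 7 and 10 (not part of the claims). The claims name a matrix-exponential acyclicity constraint built from the Hadamard product and the trace, and a prior term measured with the Frobenius norm, but the formulas themselves are not reproduced in this text. The sketch below assumes the widely used NOTEARS-style form h(A) = tr(exp(A ∘ A)) - n, with n the number of nodes, and a squared Frobenius penalty λ·‖A - A_prior‖_F²; both forms, and all function names, are assumptions for illustration rather than the patent's stated method.

```python
from math import factorial


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace_expm(m, terms=30):
    """Approximate tr(exp(m)) by the truncated series sum_k tr(m^k) / k!."""
    n = len(m)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # m^0
    total = 0.0
    for k in range(terms):
        total += sum(power[i][i] for i in range(n)) / factorial(k)
        power = mat_mul(power, m)
    return total


def acyclicity(a):
    """Assumed constraint h(A) = tr(exp(A o A)) - n; zero when A has no cycle."""
    n = len(a)
    hadamard = [[a[i][j] * a[i][j] for j in range(n)] for i in range(n)]
    return trace_expm(hadamard) - n


def prior_penalty(a, a_prior, lam):
    """Assumed prior term: lam * squared Frobenius distance between A and A_prior."""
    n = len(a)
    return lam * sum((a[i][j] - a_prior[i][j]) ** 2
                     for i in range(n) for j in range(n))


# Four-node chain: behavior feature -> scale item -> symptom factor -> disease type.
chain = [[0, 0.8, 0, 0], [0, 0, 0.6, 0], [0, 0, 0, 0.9], [0, 0, 0, 0]]
cyclic = [row[:] for row in chain]
cyclic[3][0] = 0.5  # hypothetical back edge that closes a cycle

print(acyclicity(chain), acyclicity(cyclic), prior_penalty(cyclic, chain, 2.0))
```

Under these assumptions the layered chain gives h(A) = 0, while the back edge makes h(A) positive, so an optimizer penalizing h(A) is pushed toward the layered structure the claims describe. The prior term additionally penalizes edges that the expert-knowledge matrix does not contain.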

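Illustrative note on claim 8 (not part of the claims). Claim 8 describes intervening on a variable node, recomputing the prediction, deriving a contribution degree, normalizing contributions over the analyzed node set, and scoring a causal path from its edge strengths; the corresponding formulas are not reproduced in this text. The sketch below is a toy model under stated assumptions: a single disease risk from a logistic output, contribution as the absolute change in risk, normalization by the sum of contributions, and a path score equal to the product of edge strengths. All names and numeric values are hypothetical.

```python
from math import exp, prod


def predict_risk(items, item_to_symptom, symptom_to_risk):
    """Propagate item scores to symptom latent variables, then to one risk in (0, 1)."""
    symptoms = [sum(w * x for w, x in zip(row, items)) for row in item_to_symptom]
    logit = sum(w * f for w, f in zip(symptom_to_risk, symptoms))
    return 1.0 / (1.0 + exp(-logit))


def contribution(items, index, value, item_to_symptom, symptom_to_risk):
    """Assumed contribution: |risk before - risk after| when item `index` is set to `value`."""
    before = predict_risk(items, item_to_symptom, symptom_to_risk)
    intervened = list(items)
    intervened[index] = value  # the intervention do(v_i = value)
    after = predict_risk(intervened, item_to_symptom, symptom_to_risk)
    return abs(before - after)


def normalise(contributions):
    """Scale contributions so they sum to 1 (all zeros if nothing contributes)."""
    total = sum(contributions)
    if total > 0:
        return [c / total for c in contributions]
    return [0.0] * len(contributions)


def path_score(adjacency, path):
    """Assumed path score: product of edge strengths along a directed path."""
    return prod(adjacency[a][b] for a, b in zip(path, path[1:]))


# Hypothetical toy values: three scale items, two symptom factors, one disease type.
items = [2.0, 1.0, 3.0]
item_to_symptom = [[0.5, 0.5, 0.0], [0.0, 0.3, 0.0]]
symptom_to_risk = [0.8, 0.4]

raw = [contribution(items, i, 0.0, item_to_symptom, symptom_to_risk) for i in range(3)]
print(normalise(raw))

adjacency = [[0, 0.8, 0, 0], [0, 0, 0.6, 0], [0, 0, 0, 0.9], [0, 0, 0, 0]]
print(path_score(adjacency, [0, 1, 2, 3]))
```

In this toy setup the third item has no outgoing weight, so intervening on it leaves the risk unchanged and its contribution weight is zero. Paths whose score exceeds a chosen threshold would then be the ones passed to the template-based explanation step described in claim 8.
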
Description

Disease auxiliary diagnosis system and method based on multi-modal structured causal reasoning

Technical Field

The invention relates to the technical field of intelligent auxiliary diagnosis, in particular to a disease auxiliary diagnosis system and method based on multi-modal structured causal reasoning.

Background

Mental diseases such as depressive disorder, anxiety disorder and bipolar disorder have become a matter of wide concern in clinical diagnosis and treatment and in public health. In the existing clinical diagnosis process, a doctor typically obtains the chief complaint and the history of present illness through face-to-face inquiry, the patient fills in self-rating scales such as PHQ-9 and GAD-7, the doctor scores clinician-rated scales such as HAMD-17 based on interview observations and medical history information, and a diagnostic conclusion is formed by combining symptom items, disease course characteristics and the degree of social function impairment. However, this procedure depends heavily on the doctor's experience and the patient's subjective expression, and lacks an objective, quantitative basis.

With the development of computer vision and speech recognition technologies, studies have attempted to use audio and facial video signals for mental disease screening or risk assessment. For example, features such as speech rate, pause ratio, intonation changes and energy are extracted from speech data to reflect emotional and psychomotor state, and features such as action units (AU), blink frequency, gaze changes and head posture are extracted from facial video to reflect emotional expressiveness and psychomotor characteristics.

However, existing intelligent auxiliary methods still have the following defects:

(1) They rely on correlation modeling and lack causal interpretation capability. Most methods use supervised learning to establish a correlational mapping from behavior features to diagnostic labels, so potential causes are hard to distinguish from accompanying phenomena, and the reasons behind a diagnostic conclusion cannot be clearly explained.

(2) They lack a mechanism for alignment with clinical scale structure. Although scales such as PHQ-9, GAD-7 and HAMD-17 are widely used for symptom assessment in clinical practice, existing algorithms often predict diagnostic labels directly from features. No clear and traceable structured correspondence is established among the behavior feature layer, the scale item layer and the symptom dimension layer, so model output is difficult to integrate into the existing diagnosis and treatment process.

(3) Their interpretation results lack path-level readability. Common attention visualization or feature importance ranking can only give the magnitude of feature weights, and can hardly answer the chain-type question of which specific behavior features raise a certain disease risk, through which scale items, and via which symptom dimensions.

(4) They lack counterfactual reasoning and intervention analysis capability. Clinical practice requires evaluating counterfactual questions such as "if the behavior changes, does the disease risk change with it", but traditional models based on correlational output cannot support such intervention analysis and interpretation at the level of structural mechanism.

Therefore, a solution is needed that provides objective, structured and interpretable results and supports counterfactual analysis in auxiliary diagnosis of mental diseases, so as to assist clinical decision-making.
Disclosure of Invention

In view of the above, the present invention provides a disease auxiliary diagnosis system and method based on multi-modal structured causal reasoning, to address the insufficient objectivity, disconnection from clinical scales, lack of interpretability and inability to support counterfactual analysis of existing auxiliary diagnosis methods for mental diseases.

To achieve the above purpose, the present invention adopts the following technical solution.

The invention discloses a disease auxiliary diagnosis system based on multi-modal structured causal reasoning, which comprises: the data acquisition and preprocessing module, which synchronously acquires multi-modal physiological data of a target, including audio data and facial image video data, and performs noise reduction and alignment on the multi-modal physiological data to obtain an audio sequence segment set containing valid information and a corresponding video sequence segment set; the behavior feature extraction module, which is used for extracting features from the obtained audio sequence segment set and video sequence segment set respectively, and fusing the extracted features to obtain a multi-modal behavior feature fusion vector; The scale item predicting module is used for mapping the multi-mo