CN-122024976-A - Robust screening method for risk features of delayed cerebral ischemia after SAH based on hybrid causal structure learning, and related equipment
Abstract
The application provides a robust screening method for risk features of delayed cerebral ischemia after subarachnoid hemorrhage (SAH) based on hybrid causal structure learning, and related equipment, belonging to the technical field of neurocritical care medical data analysis. The method comprises: performing time-series alignment and multimodal preprocessing on SAH clinical data to obtain preprocessed data; establishing a hybrid causal graph based on a medical hard constraint matrix; outputting a minimal confounder adjustment set; performing a consistency check on the average causal effect values output by the estimators and, if the check passes, outputting a fused average causal effect value and its confidence interval; performing a multi-dimensional refutation test on the fused average causal effect value and calculating a clinical causal robustness score; and grading and outputting the candidate risk factor corresponding to the treatment variable as a corroborative causal factor, a potential causal factor or a pseudo-related factor. The application can resolve the conflict between medical priors and data-driven discovery in causal graph construction, and improves the interpretability and credibility of the results.
Inventors
- ZHENG ZELONG
- YU LEI
- LV JIANPING
- ZHANG YI
- GUO XINYING
- ZHANG YANHONG
Assignees
- 广州市第一人民医院(广州消化疾病中心、广州医科大学附属市一人民医院、华南理工大学附属第二医院)
Dates
- Publication Date
- 20260512
- Application Date
- 20251223
Claims (10)
- 1. A method for robust screening of risk features of delayed cerebral ischemia after SAH based on hybrid causal structure learning, the method comprising the steps of: acquiring SAH clinical data; performing time-series alignment and multimodal preprocessing on the SAH clinical data to obtain preprocessed data; setting at least one candidate risk factor to be analyzed as a treatment variable; establishing a hybrid causal graph based on a medical hard constraint matrix according to the preprocessed data and the treatment variable; performing causal effect identification based on the hybrid causal graph, judging whether the causal relationship between the treatment variable and the outcome variable is identifiable, and outputting a minimal confounder adjustment set; based on the minimal confounder adjustment set, numerically estimating the causal effect between the treatment variable and the outcome variable using an ensemble strategy, running at least two heterogeneous estimators in parallel, and calculating the average causal effect value output by each estimator; performing a consistency check on the average causal effect values output by the estimators, and outputting a fused average causal effect value and its confidence interval if the check passes; performing a multi-dimensional refutation test on the fused average causal effect value, and calculating a clinical causal robustness score based on the results of the multi-dimensional refutation test; and, according to the confidence interval of the fused average causal effect value and the clinical causal robustness score, grading and outputting the candidate risk factor corresponding to the treatment variable as a corroborative causal factor, a potential causal factor or a pseudo-related factor.
- 2. The method of claim 1, wherein performing time-series alignment and multimodal preprocessing on the SAH clinical data to obtain preprocessed data comprises: establishing a time axis with the SAH onset time as the temporal reference, and dividing all variables in the SAH clinical data into a baseline variable set, an intermediate process variable set and an outcome variable, wherein the outcome variable comprises the occurrence status of delayed cerebral ischemia; and performing missing-value handling, outlier cleaning and standardized encoding on the data in the baseline variable set, the intermediate process variable set and the outcome variable to obtain the preprocessed data.
- 3. The method of claim 2, wherein setting at least one candidate risk factor to be analyzed as a treatment variable comprises: selecting at least one variable from the baseline variable set or the intermediate process variable set as the treatment variable, wherein the selected treatment variable is the candidate risk factor to be analyzed.
- 4. The method of claim 1, wherein establishing a hybrid causal graph based on a medical hard constraint matrix according to the preprocessed data and the treatment variable comprises: defining a medical hard constraint matrix based on the preprocessed data and medical prior knowledge, wherein forbidden-connection rules among variables are defined by setting a specific value in the medical hard constraint matrix as a forbidden marker, and the forbidden-connection rules at least comprise prohibiting the outcome variable from pointing to any variable in the baseline variable set; based on the medical hard constraint matrix, performing conditional independence tests on the preprocessed data using a constraint-based causal discovery algorithm to generate an initial directed acyclic graph; marking edges in the initial directed acyclic graph that conflict with weak medical prior knowledge as pending edges; for each pending edge, calculating the change in the Bayesian information criterion (BIC) value of the overall model between retaining and removing the edge in the initial directed acyclic graph; when the absolute value of the change exceeds a preset resolution threshold, retaining the pending edge, and otherwise removing the pending edge, to obtain a conflict resolution result; and adjusting the initial directed acyclic graph according to the conflict resolution result to obtain the hybrid causal graph based on the medical hard constraint matrix.
- 5. The method of claim 1, wherein performing causal effect identification based on the hybrid causal graph, judging whether the causal relationship between the treatment variable and the outcome variable is identifiable, and outputting a minimal confounder adjustment set comprises: traversing and analyzing all connecting paths on the topology of the hybrid causal graph based on the back-door criterion or the front-door criterion to judge whether an observed variable set exists, wherein the observed variable set blocks all non-causal paths from the treatment variable to the outcome variable; if the observed variable set does not exist, judging that the current causal relationship is unidentifiable, and terminating the subsequent analysis of the current causal relationship; and if the observed variable set exists, judging that the current causal relationship is identifiable, and then excluding from the observed variable set the nodes identified by preset rules as colliders or mediators, to obtain the minimal confounder adjustment set.
- 6. The method of claim 1, wherein numerically estimating the causal effect between the treatment variable and the outcome variable using an ensemble strategy based on the minimal confounder adjustment set, running at least two heterogeneous estimators in parallel, and calculating the average causal effect value output by each estimator comprises: based on the minimal confounder adjustment set, numerically estimating the causal effect using an ensemble strategy and training heterogeneous estimators, wherein the types of heterogeneous estimator comprise a parametric estimator based on an outcome model, a non-parametric estimator based on a treatment assignment model, and a doubly robust estimator; the parametric estimator based on the outcome model fits the conditional probability distribution of the outcome variable given the treatment variable and the minimal confounder adjustment set; the non-parametric estimator based on the treatment assignment model estimates the propensity score of the treatment variable given the minimal confounder adjustment set and performs weighting or matching based on that score; and the doubly robust estimator combines the information of the outcome-model estimator and the treatment-assignment-model estimator in a single estimate; and running at least two types of heterogeneous estimator in parallel, and separately calculating and outputting the average causal effect value estimated by each heterogeneous estimator.
- 7. The method of claim 1, wherein performing a consistency check on the average causal effect values output by the estimators, and outputting a fused average causal effect value and its confidence interval if the check passes, comprises: calculating the coefficient of variation among the average causal effect values output by the different estimators; comparing the coefficient of variation with a preset consistency threshold and checking whether the effect directions of the average causal effect values are consistent, and judging that the check passes if the coefficient of variation is smaller than the consistency threshold and the effect directions of all the average causal effect values are consistent; and, when the check passes, computing a weighted average with the reciprocal of the variance of each estimate as its weight, to obtain the fused average causal effect value and its confidence interval.
- 8. The method of claim 1, wherein performing a multi-dimensional refutation test on the fused average causal effect value and calculating a clinical causal robustness score based on the results of the multi-dimensional refutation test comprises: replacing the treatment variable in the original data with a randomly generated placebo variable, wherein the placebo variable is a variable that has no causal relationship with the outcome variable; re-running the heterogeneous estimators under the minimal confounder adjustment set, and calculating the effect value of the placebo variable on the outcome variable as a placebo effect value; generating a plurality of data subsets from the original data by random sampling with replacement; independently running the heterogeneous estimators on each data subset, executing the causal effect identification and estimation flow, and calculating the fused average causal effect value to obtain a sequence of subset effect estimates; positing a strong confounder that is not recorded in the data, and setting the strength with which it simultaneously affects the treatment variable and the outcome variable; gradually increasing the influence strength of the strong confounder, and determining the minimum influence strength required to eliminate the statistical significance of the fused average causal effect value as a sensitivity threshold; calculating a placebo test score, a stability score and a sensitivity score by preset scoring rules based on the placebo effect value, the sequence of subset effect estimates and the sensitivity threshold; and performing a weighted summation of the placebo test score, the stability score and the sensitivity score according to preset weight coefficients to calculate the final clinical causal robustness score, wherein the clinical causal robustness score is a scalar value between 0 and 1.
- 9. The method of claim 1, wherein grading and outputting the candidate risk factor corresponding to the treatment variable as a corroborative causal factor, a potential causal factor or a pseudo-related factor according to the confidence interval of the fused average causal effect value and the clinical causal robustness score comprises: presetting a first score threshold and a second score threshold, wherein the first score threshold is higher than the second score threshold; and applying the following decision rule based on the confidence interval of the fused average causal effect value and the clinical causal robustness score: if the confidence interval of the fused average causal effect value does not contain zero at the preset significance level and the clinical causal robustness score is greater than or equal to the first score threshold, judging and outputting the candidate risk factor corresponding to the treatment variable as a corroborative causal factor; if the confidence interval of the fused average causal effect value does not contain zero at the preset significance level and the clinical causal robustness score is smaller than the first score threshold but greater than or equal to the second score threshold, judging and outputting the candidate risk factor corresponding to the treatment variable as a potential causal factor; and if the confidence interval of the fused average causal effect value contains zero or the clinical causal robustness score is smaller than the second score threshold, judging and outputting the candidate risk factor corresponding to the treatment variable as a pseudo-related factor.
- 10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 9.
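The consistency check and fusion recited in claim 7 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function name `fuse_ace`, the default `cv_threshold` of 0.3, and the normal-approximation multiplier `z = 1.96` are assumptions introduced here for illustration. The standard error of the fused value additionally assumes the estimators are independent, which the claims do not state.

```python
import math
from statistics import mean, stdev


def fuse_ace(estimates, variances, cv_threshold=0.3, z=1.96):
    """Check consistency of per-estimator ACE values and fuse them.

    Returns (fused_ace, (ci_low, ci_high)), or None if the check fails.
    cv_threshold and z are illustrative defaults, not values from the source.
    """
    if len(estimates) < 2 or len(estimates) != len(variances):
        raise ValueError("need at least two estimates, one variance each")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Direction check: every estimate must have the same nonzero sign.
    same_sign = all(e > 0 for e in estimates) or all(e < 0 for e in estimates)
    if not same_sign:
        return None

    # Coefficient of variation: sample standard deviation over |mean|.
    cv = stdev(estimates) / abs(mean(estimates))
    if cv >= cv_threshold:
        return None

    # Inverse-variance weighted average of the estimates.
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    fused = sum(w * e for w, e in zip(weights, estimates)) / total

    # Standard error under the assumed independence of the estimators.
    se = math.sqrt(1.0 / total)
    return fused, (fused - z * se, fused + z * se)


# Illustrative inputs only: three estimators with made-up values.
result = fuse_ace([0.10, 0.12, 0.11], [0.0004, 0.0004, 0.0004])
```

A `None` result corresponds to a failed check, in which case no fused value is output. In practice the estimators are fitted on the same data, so a resampling-based interval may be more appropriate than the independence-based one sketched here.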
Description
Robust screening method for risk features of delayed cerebral ischemia after SAH based on hybrid causal structure learning, and related equipment
Technical Field
The application relates to the technical field of neurocritical care medical data analysis, and in particular to a robust screening method for risk features of delayed cerebral ischemia after SAH based on hybrid causal structure learning, and related equipment.
Background
In the related art, the pathophysiology of delayed cerebral ischemia (DCI) is highly complex. It involves multiple processes such as large-vessel vasospasm, microcirculatory disturbance, cortical spreading depolarization, inflammatory response and blood-brain barrier disruption, and the various clinical scores, imaging indexes, biochemical markers and therapeutic measures are interwoven in multiple ways. This multi-factor, multi-pathway complexity makes the features selected by traditional single-risk-factor and regression analyses an unreliable basis for prediction models, which consequently generalize poorly. In summary, the technical problems in the related art remain to be solved.
Disclosure of Invention
The main purpose of the embodiments of the application is to provide a robust screening method for risk features of delayed cerebral ischemia after SAH based on hybrid causal structure learning, and related equipment, which can resolve the conflict between medical priors and data-driven discovery in causal graph construction and accurately distinguish true causal relationships from spurious correlations.
To achieve the above object, an aspect of the embodiments of the present application provides a method for robust screening of risk features of delayed cerebral ischemia after SAH based on hybrid causal structure learning, the method comprising the steps of: acquiring SAH clinical data; performing time-series alignment and multimodal preprocessing on the SAH clinical data to obtain preprocessed data; setting at least one candidate risk factor to be analyzed as a treatment variable; establishing a hybrid causal graph based on a medical hard constraint matrix according to the preprocessed data and the treatment variable; performing causal effect identification based on the hybrid causal graph, judging whether the causal relationship between the treatment variable and the outcome variable is identifiable, and outputting a minimal confounder adjustment set; based on the minimal confounder adjustment set, numerically estimating the causal effect between the treatment variable and the outcome variable using an ensemble strategy, running at least two heterogeneous estimators in parallel, and calculating the average causal effect value output by each estimator; performing a consistency check on the average causal effect values output by the estimators, and outputting a fused average causal effect value and its confidence interval if the check passes; performing a multi-dimensional refutation test on the fused average causal effect value, and calculating a clinical causal robustness score based on the results of the multi-dimensional refutation test; and, according to the confidence interval of the fused average causal effect value and the clinical causal robustness score, grading and outputting the candidate risk factor corresponding to the treatment variable as a corroborative causal factor, a potential causal factor or a pseudo-related factor.
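The final two steps, combining the sub-scores into a clinical causal robustness score and grading the candidate risk factor, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the applicant's implementation. The function names, the weight tuple `(0.3, 0.4, 0.3)`, and the thresholds `0.8` and `0.5` are hypothetical placeholders for the "preset" values, which the source does not specify. The three sub-scores are assumed to be already normalized to the range 0 to 1.

```python
def robustness_score(placebo, stability, sensitivity, weights=(0.3, 0.4, 0.3)):
    """Weighted sum of three sub-scores, each assumed to lie in [0, 1].

    The weights are hypothetical; they must sum to 1 so that the result
    also lies in [0, 1].
    """
    scores = (placebo, stability, sensitivity)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("sub-scores must lie in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


def grade_factor(ci, score, first_threshold=0.8, second_threshold=0.5):
    """Grade a candidate risk factor from its confidence interval and score.

    ci is (ci_low, ci_high) for the fused average causal effect value.
    The thresholds are hypothetical, with first_threshold > second_threshold.
    """
    if first_threshold <= second_threshold:
        raise ValueError("first_threshold must exceed second_threshold")
    ci_low, ci_high = ci
    excludes_zero = ci_low > 0 or ci_high < 0

    if excludes_zero and score >= first_threshold:
        return "corroborative causal factor"
    if excludes_zero and score >= second_threshold:
        return "potential causal factor"
    # Interval contains zero, or the score is below the second threshold.
    return "pseudo-related factor"


# Illustrative inputs only: made-up sub-scores and confidence interval.
score = robustness_score(placebo=1.0, stability=0.5, sensitivity=0.5)
grade = grade_factor((0.05, 0.20), score)
```

Checking the confidence interval before the score mirrors the decision rule: a factor whose interval contains zero is graded as pseudo-related regardless of how high its robustness score is.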
In some embodiments, performing time-series alignment and multimodal preprocessing on the SAH clinical data to obtain preprocessed data comprises: establishing a time axis with the SAH onset time as the temporal reference, and dividing all variables in the SAH clinical data into a baseline variable set, an intermediate process variable set and an outcome variable, wherein the outcome variable comprises the occurrence status of delayed cerebral ischemia; and performing missing-value handling, outlier cleaning and standardized encoding on the data in the baseline variable set, the intermediate process variable set and the outcome variable to obtain the preprocessed data.
In some embodiments, setting at least one candidate risk factor to be analyzed as a treatment variable comprises: selecting at least one variable from the baseline variable set or the intermediate process variable set as the treatment variable, wherein the selected treatment variable is the candidate risk factor to be analyzed.
In some embodiments, establishing a hybrid causal graph based on a medical hard constraint matrix according to the preprocessed data and the treatment variable comprises: defining a medical hard constraint matrix based on the preprocessed data and medical prior knowledge, wherein forbidden-connection rules among variables are defined by setting a specific value in the medical hard constraint matrix as a forbidden marker, and the forbidden-connection rules at least comprise prohibiting the outcome variable from pointing to any variable in the baseline variable set; Based on the medica