CN-122000054-A - Method and system for constructing a risk prediction model for medium-to-giant coronary artery aneurysm in Kawasaki disease


Abstract

The application belongs to the technical field of biological information processing, and particularly relates to a method and a system for constructing a risk prediction model for medium-to-giant coronary artery aneurysm (MGCAA) in Kawasaki disease. The method builds the prediction model from six routine clinical variables using a random forest algorithm (RF ranger) and introduces a SHAP interpretation mechanism, so that the contribution of each variable to the prediction result can be displayed intuitively at both the global and the individual level, improving the transparency and clinical interpretability of the model. External validation and intercept-only recalibration ensure the reliability and applicability of the model in different populations. An online web tool (Shiny App) is deployed so that doctors can enter a patient's clinical data in real time and immediately obtain an individual risk prediction and its interpretation. Children at high risk of MGCAA can thus be identified at an early clinical stage, which assists in formulating personalized treatment plans and early intervention measures.
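
The intercept-only recalibration mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which every predicted risk is shifted by one constant on the log-odds scale, chosen so that the mean recalibrated risk equals the event rate observed in the external cohort. The function names, the bisection solver and the toy data are hypothetical and are not taken from the application, which describes an R-based implementation.

```python
import math


def logit(p, eps=1e-12):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_intercept_shift(probs, outcomes, lo=-50.0, hi=50.0, tol=1e-10):
    """Find the constant delta added to logit(p) such that the sum of
    recalibrated risks equals the number of observed events.

    This is the score equation of a logistic model with the slope fixed
    at 1 (logit(p) used as an offset) and a free intercept. The solution
    is assumed to lie inside the bracket [lo, hi].
    """
    events = sum(outcomes)
    if not 0 < events < len(outcomes):
        raise ValueError("need at least one event and one non-event")
    offsets = [logit(p) for p in probs]

    def excess(delta):
        # Increases monotonically with delta, so bisection is sufficient.
        return sum(sigmoid(o + delta) for o in offsets) - events

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


def recalibrate(probs, delta):
    """Apply the fitted intercept shift to each predicted risk."""
    return [sigmoid(logit(p) + delta) for p in probs]


if __name__ == "__main__":
    # Hypothetical external cohort: the model under-predicts risk here.
    external_probs = [0.1, 0.2, 0.3, 0.4]
    external_outcomes = [0, 0, 1, 1]
    delta = fit_intercept_shift(external_probs, external_outcomes)
    print(delta, recalibrate(external_probs, delta))
```

Because only the intercept moves, the ranking of patients and therefore the discrimination of the model are unchanged; only the average predicted risk is aligned with the new population.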

Inventors

HUANG HONGBIAO
HE YING
LIN FAN

Assignees

Fuzhou University Affiliated Provincial Hospital (福州大学附属省立医院)

Dates

Publication Date: 20260508
Application Date: 20260123

Claims (8)

1. A method for constructing a risk prediction model for medium-to-giant coronary artery aneurysm in Kawasaki disease, characterized by comprising the following steps: S1, collecting clinical data of Kawasaki disease patients, and obtaining initial variables after data processing and screening; S2, screening all initial variables using the RF ranger algorithm combined with recursive feature elimination to obtain 6 core feature variables, namely hemoglobin, diagnosis delay time, triglyceride, neutrophil percentage, oral mucosa change and rash, and training to obtain an initial model for predicting the risk of medium-to-giant coronary artery aneurysm in Kawasaki disease; S3, embedding the SHAP method into the initial model of S2 to obtain, for the risk probabilities output by the initial model, the risk factors and protective factors used to explain the prediction result, performing risk stratification based on the prediction result, determining an optimal risk threshold, and finally obtaining the risk prediction model for medium-to-giant coronary artery aneurysm in Kawasaki disease as a decision basis.
2. The method of claim 1, wherein the data processing comprises multiple imputation of missing data, removal of variables with a large proportion of missing values, ordinal encoding of categorical variables, direct inclusion of continuous variables in the model at their original values, and control of multicollinearity using Spearman correlation analysis.
3. The method of claim 1, wherein the risk stratification comprises: performing receiver operating characteristic (ROC) analysis on the risk probabilities output by the model and determining the risk threshold for risk stratification in combination with the Youden index; assigning the target patient to the low-risk stratum when the predicted risk probability is less than the risk threshold, and to the high-risk stratum when the predicted risk probability is greater than or equal to the risk threshold; and further evaluating, by decision curve analysis, the net benefit of the model's prediction strategy under different risk thresholds, to determine the risk probability interval in which the model has positive net benefit in clinical decisions, which defines the effective range of risk stratification and decision triggering.
4. A risk prediction model for medium-to-giant coronary artery aneurysm in Kawasaki disease, obtained by the construction method of any one of claims 1-3.
5. A system for predicting the risk of medium-to-giant coronary artery aneurysm in Kawasaki disease, employing the model of claim 4 and comprising the following modules: (1) a feature input module for inputting 6 features of a target patient, namely hemoglobin, diagnosis delay time, triglyceride, neutrophil percentage, oral mucosa change and rash; (2) a data processing module comprising the RF ranger algorithm for processing the input features to obtain a prediction result; (3) a result interpretation module comprising a SHAP module for interpreting the prediction result.
6. A method of predicting the risk of medium-to-giant coronary artery aneurysm in Kawasaki disease, comprising using the system of claim 5: obtaining the hemoglobin, diagnosis delay time, triglyceride, neutrophil percentage, oral mucosa change and rash features of a target patient, and inputting these features into the system of claim 4 to obtain a prediction result and an interpretation result.
7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to claim 6 when executing the computer program.
8. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to claim 6.
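
The threshold selection and decision curve analysis recited in claim 3 can be sketched as follows. This is an illustration under stated assumptions rather than the applicant's implementation: the Youden index is taken as sensitivity + specificity - 1, a patient counts as high risk when the predicted probability is greater than or equal to the threshold (as in claim 3), and net benefit uses the standard decision-curve formula TP/n - FP/n x pt/(1 - pt). The function names and the toy data are hypothetical.

```python
def confusion_counts(probs, outcomes, threshold):
    """Count true and false positives when 'high risk' means p >= threshold."""
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return tp, fp


def youden_threshold(probs, outcomes):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Candidate thresholds are the distinct predicted probabilities; ties in J
    are resolved in favour of the lowest threshold.
    """
    pos = sum(outcomes)
    neg = len(outcomes) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one event and one non-event")
    best_t, best_j = None, -1.0
    for t in sorted(set(probs)):
        tp, fp = confusion_counts(probs, outcomes, t)
        j = tp / pos - fp / neg  # sensitivity - (1 - specificity)
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def net_benefit(probs, outcomes, pt):
    """Net benefit of treating patients with p >= pt, for 0 < pt < 1."""
    n = len(outcomes)
    tp, fp = confusion_counts(probs, outcomes, pt)
    return tp / n - fp / n * pt / (1.0 - pt)


if __name__ == "__main__":
    # Hypothetical predicted risks and observed MGCAA outcomes (1 = event).
    probs = [0.05, 0.10, 0.20, 0.35, 0.60, 0.80]
    outcomes = [0, 0, 0, 1, 1, 1]
    threshold, j = youden_threshold(probs, outcomes)
    print("risk threshold:", threshold, "Youden J:", j)
    for pt in (0.10, 0.35, 0.60):
        print("net benefit at", pt, "=", net_benefit(probs, outcomes, pt))
```

Scanning the net benefit over a grid of threshold probabilities and keeping the interval where it stays positive gives the "effective range" of decision triggering that the claim refers to.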

Description

Method and system for constructing a risk prediction model for medium-to-giant coronary artery aneurysm in Kawasaki disease

Technical Field

The invention belongs to the technical field of biological information processing, and particularly relates to a method and a system for constructing a risk prediction model for medium-to-giant coronary artery aneurysm in Kawasaki disease.

Background

Kawasaki disease (KD) is an acute, self-limiting systemic vasculitis that mainly affects small and medium arteries in children, and is the leading cause of acquired heart disease in children in developed countries. Typical clinical manifestations include persistent fever, rash, changes in the oral mucosa, conjunctival congestion, cervical lymphadenopathy, and changes in the extremities. Intravenous immunoglobulin (IVIG) is currently the primary treatment and has significantly reduced the incidence of coronary lesions. However, some patients still develop coronary artery aneurysms (CAA), of which medium-to-giant CAA (MGCAA) is the most severe phenotype. It is often accompanied by structural destruction of the vessel wall and a sustained inflammatory response, leading in the long term to thrombosis, stenosis and major adverse cardiovascular events, and seriously threatening the health of affected children.

To identify high-risk children at an early stage, risk prediction tools and scoring methods have gradually been introduced into clinical practice. For example, the Kobayashi score and the Egami score, proposed by Japanese scholars, weight several clinical and laboratory indicators to predict the risk that a child will be resistant to IVIG treatment. In addition, studies have attempted to build risk models based on traditional statistics to assist physicians in risk stratification and treatment decisions early in the course of the disease.
These methods have advanced Kawasaki disease risk assessment, but most are not specifically directed at predicting MGCAA, so their clinical applicability is limited.

Prior art 1 (patent number: CN 109243604A) discloses a method and a system for constructing a Kawasaki disease risk assessment model based on a neural network algorithm. The method extracts valid samples from a Kawasaki disease sample data set; screens from the feature set 10 features suitable for clinical auxiliary diagnosis (including gender, age, CRP, fibrinogen, albumin, globulin, complement C3, IgG, prealbumin, albumin-to-globulin ratio and the like); randomly divides the samples into a training set and a validation set; trains the model with a neural network method using ten-fold cross-validation and records the optimal parameters; determines a classification threshold t on the validation set from the ROC curve; and constructs the risk assessment model and a corresponding system, which evaluates newly input clinical data and outputs a "KDx score" to assist clinical diagnosis and risk judgment. However, that proposal aims at reducing misdiagnosis and missed diagnosis of Kawasaki disease, so that patients can obtain timely prevention, intervention and treatment at an early stage. It has the following drawbacks:
① Mismatched prediction target: the model is directed to Kawasaki disease diagnosis and general risk assessment, not to the prediction of medium-to-giant coronary artery aneurysm (MGCAA).
② Black-box algorithm: a neural network can improve prediction accuracy, but it lacks good interpretability, and a doctor cannot easily understand the influence of each variable on the result.
③ Relatively many variables: 10 features are needed, so the clinical acquisition cost is high, which hinders rapid adoption.
④ Lack of external validation and recalibration: the patent does not mention external validation or intercept recalibration across populations, so its generalization is limited.
⑤ The system is only a theoretical model and system, and is not deployed as a web tool that doctors can use immediately.

Prior art 2 (patent number: CN 109273094A) discloses a method and system for constructing a Kawasaki disease risk assessment model based on a Boosting algorithm. Its flow is similar to that of prior art 1: valid samples are extracted from a sample data set; 10 feature variables related to clinical diagnosis are screened; the data are randomly divided into a training set and a validation set; a Boosting algorithm (such as XGBoost, AdaBoost, GBM and the like) is used for training; optimal model parameters are determined by ten-fold cross-validation; a classification threshold t is set on the validation set from the ROC curve; and a risk assessment model is finally obtained, with a system constructed to evaluate new samples and output a KDx score. However, this technique has the following drawbacks: ① The pre