CN-121999949-A - Multi-model-based clinical outcome machine learning prediction method, device and storage medium
Abstract
The invention discloses a multi-model-based clinical outcome machine learning prediction method, a device and a storage medium. A plurality of machine learning prediction models are constructed under a unified computing framework and combined with automatic parameter optimization, multi-dimensional performance evaluation, prediction probability calibration and model interpretation analysis. The method thereby realizes systematic modeling and evaluation of the clinical outcome prediction task, avoids the instability caused by dependence on a single model and by manual parameter tuning, and improves the stability, generalization capability and reproducibility of the prediction result. In addition, calibrating the prediction probability and introducing clinical decision curve and clinical impact curve analysis improve the reliability and practicability of the prediction result in risk assessment and clinical decision support. Model interpretation and visual output enhance the transparency and understandability of the prediction process, thereby improving the acceptability and application value of the model in medical research and clinical application.
Inventors
- LI QUANLIN
- LIU ZUQIANG
- GAO PINTING
- MA LIYUN
- TAN YANFANG
- XU PEIRONG
- ZHOU PINGHONG
Assignees
- Zhongshan Hospital, Fudan University (复旦大学附属中山医院)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-27
Claims (10)
- 1. A multi-model-based machine learning prediction method for clinical outcome, characterized by comprising the following steps: S1, acquiring raw data for clinical outcome prediction, and constructing a data set structure comprising training set data and external validation set data, wherein the training set is used for model construction and the external validation set is used for assessing model generalization capability; S2, preprocessing the data in the data set structure obtained in step S1, so that clinical features from different sources and on different scales meet the input requirements of subsequent model training; S3, constructing a plurality of machine learning prediction models of different types under a unified computing framework, wherein constructing multiple models in parallel avoids the limitation that a single model structure imposes on the clinical outcome prediction result and improves the stability and adaptability of the overall prediction result; S4, for each type of machine learning prediction model, presetting a matching parameter search space according to the model structure and training mechanism, generating a plurality of candidate parameter combinations within the parameter search space without manual intervention according to a preset parameter combination generation rule, training the corresponding machine learning prediction model and evaluating its performance by cross-validation to obtain a performance evaluation result for each candidate parameter combination, and automatically determining the optimal parameter configuration of each machine learning prediction model from the candidate parameter combinations based on a preset target performance index; S5, training and validating each machine learning prediction model on the training set by cross-validation, wherein repeated data partitioning and training reduce the influence of chance factors on model performance and thereby improve the prediction stability of the machine learning prediction model on different sample subsets; S6, calculating multi-dimensional performance evaluation indexes for each trained prediction model, so that model prediction performance is quantified comprehensively; S7, comprehensively comparing the plurality of prediction models based on the multi-dimensional performance evaluation results obtained in step S6, and selecting from them an optimal machine learning prediction model for clinical outcome prediction according to a preset model evaluation rule or evaluation strategy; S8, calibrating the prediction probability output by the optimal prediction model determined in step S7 to improve the consistency between the prediction probability and the actual probability of the clinical outcome, wherein the calibration establishes a probability mapping relation based on training data or cross-validation results and applies the probability mapping relation to the prediction probability of a sample to be predicted to obtain the calibrated prediction probability, thereby reducing systematic deviation of the model prediction probability and improving the usability of the prediction result in clinical risk assessment and decision support; S9, based on the calibrated prediction probability obtained in step S8, evaluating the clinical application value of the optimal prediction model at different risk thresholds, wherein a clinical decision curve analysis method is used to calculate the net benefit of the optimal prediction model at each threshold within a preset prediction probability threshold interval, and the net benefit of the optimal prediction model is compared with that of the intervene-on-all strategy and the intervene-on-none strategy, so as to evaluate the clinical decision advantage of the optimal prediction model over the baseline strategies in different risk threshold intervals; S10, performing model interpretation analysis on the optimal prediction model to quantify the degree to which different features influence the clinical outcome prediction result, wherein an interpretation model based on feature contribution is constructed to calculate the contribution value of each input feature to the prediction result, and the contribution values are statistically analyzed to obtain the relative importance of each feature in the prediction process; S11, outputting the prediction result of the optimal prediction model, the calibrated prediction probability, the model performance evaluation result and the model interpretation analysis result for use in medical research analysis, clinical risk assessment, prognosis judgment or auxiliary clinical decision-making.
- 2. The multi-model-based machine learning prediction method for clinical outcome according to claim 1, wherein in step S1, the raw data comprises at least one of clinical basic information data, laboratory examination data, imaging or imaging-derived feature data, pathology data, follow-up data or outcome labeling data; in step S2, the preprocessing operation includes data type conversion and/or balancing of the outcome variable; in step S4, the parameter search space comprises model complexity parameters, regularization parameters, learning rate parameters or sampling ratio parameters, the parameter combination generation rule comprises equally spaced, layered or random combination, and the target performance index comprises the area under the curve, the Youden index and/or a weighted result of multi-dimensional performance indexes; in step S6, the performance evaluation index includes at least one of sensitivity, specificity, accuracy, positive predictive value, negative predictive value, F1 score, Youden index and area under the curve; in step S7, the evaluation rule or evaluation strategy includes at least one of a comparison rule based on a single performance index, a comprehensive evaluation rule based on a plurality of performance indexes, and a comprehensive evaluation rule obtained by weighting different performance indexes; in step S11, the prediction result is output in the form of a table, a graph or an electronic document, and may be stored in a local storage medium or transmitted to an external system over a network, so as to support subsequent clinical application or scientific research analysis.
- 3. The multi-model-based machine learning prediction method for clinical outcome according to claim 1, wherein in step S3, the machine learning prediction model includes an ensemble learning model, a gradient boosting model, a linear or regularized model, a distance-based or kernel-based model, and a neural network model, or is another machine learning model capable of implementing clinical outcome prediction.
- 4. The multi-model-based machine learning prediction method for clinical outcome according to claim 1, wherein in step S4, an early termination mechanism is introduced: if the performance of the machine learning prediction model fails to improve by a preset improvement threshold over several consecutive evaluations, the training process for the current candidate parameter combination is automatically terminated, so as to improve parameter optimization efficiency and reduce the risk of overfitting.
- 5. The multi-model-based machine learning prediction method for clinical outcome according to claim 1, wherein in step S7, the evaluation rule uses the area under the curve as the primary evaluation index, with other performance indexes used as auxiliary criteria.
- 6. The multi-model-based machine learning prediction method for clinical outcome according to claim 1, wherein in step S8, the prediction probability calibration comprises adopting an isotonic regression calibration method, in which the prediction probability output by the optimal prediction model is mapped monotonically so that the calibrated prediction probability is closer to the observed rate of the clinical outcome while the original ordering is preserved; or the prediction probability calibration comprises adopting a logistic regression calibration method, in which a probability calibration model is constructed from the relation between the prediction probability output by the optimal prediction model and the true outcome label, and the prediction probability is adjusted accordingly.
- 7. The multi-model-based machine learning prediction method for clinical outcome according to claim 1, wherein in step S9, a clinical impact curve is constructed, in which the number of samples predicted as high risk by the optimal prediction model at different prediction probability thresholds, and the number of those samples in which the target clinical outcome actually occurs, are counted and displayed visually, so as to show the distribution of the prediction results at different risk thresholds and thereby assist the clinician in risk threshold selection, intervention strategy formulation and clinical decision-making.
- 8. The multi-model-based machine learning prediction method for clinical outcome according to claim 1, wherein in step S10, the overall feature contribution distribution of the optimal prediction model and the prediction decision process for a single sample are visualized as a waterfall plot, a beeswarm plot, a feature dependence plot or a single-sample explanation plot, so as to enhance the transparency and understandability of the model prediction result.
- 9. An electronic device, comprising: one or more processors; and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the multi-model-based machine learning prediction method for clinical outcome of claim 1.
- 10. A computer-readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the multi-model-based machine learning prediction method for clinical outcome of claim 1.
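The isotonic regression calibration named in claim 6 can be illustrated with a minimal sketch. The claims do not specify an implementation, so everything below is an assumption made for illustration: the function names `fit_isotonic` and `calibrate` are hypothetical, the fit uses the standard pool-adjacent-violators algorithm, and lookup is a simple step function rather than an interpolated one. In line with step S8, the mapping would be fitted on training or cross-validation predictions and then applied to new samples.

```python
from bisect import bisect_left


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from raw scores to outcome rates.

    Illustrative pool-adjacent-violators (PAVA) fit. Returns two parallel
    lists: the upper score bound of each block and the block's mean label.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")

    # Pool tied scores first so that each distinct score appears once.
    pooled = {}
    for s, y in zip(scores, labels):
        total, count = pooled.get(s, (0.0, 0))
        pooled[s] = (total + y, count + 1)

    blocks = []  # each block: [label_sum, sample_count, upper_score_bound]
    for s in sorted(pooled):
        total, count = pooled[s]
        blocks.append([total, count, s])
        # Merge backwards while the previous block's mean exceeds this one's,
        # which restores the monotone (order-preserving) constraint.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper

    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def calibrate(prob, uppers, values):
    """Map one raw probability to its calibrated value (step-function lookup)."""
    idx = bisect_left(uppers, prob)
    # Scores above the largest fitted bound fall into the last block.
    return values[min(idx, len(values) - 1)]
```

Because blocks are merged only when they violate monotonicity, a sample with a higher raw probability never receives a lower calibrated probability, which matches the ordering requirement stated in claim 6.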
Description
Multi-model-based clinical outcome machine learning prediction method, device and storage medium
Technical Field
The invention relates to a machine learning prediction method, a device and a storage medium for clinical outcome, which are applicable to medical diagnosis research, disease risk assessment, prognosis prediction and clinical decision support.
Background
With the development of medical informatization and precision medicine, using clinical data to predict whether a patient will develop a specific clinical outcome has become an important research direction. In the prior art, prediction is usually performed with logistic regression, random forest, support vector machine, neural network, gradient boosting models and the like, but the following defects remain:
1. Different clinical data sets differ significantly in sample size, variable dimension, distribution characteristics, noise level and the like. A single model can hardly adapt to all data types, which easily leads to unstable prediction performance or insufficient generalization capability.
2. Model parameters are usually tuned by manual, experience-based setting or simple trial and error. The tuning process is time-consuming and labor-intensive and is difficult to complete efficiently when multiple models are involved. Parameter selection lacks a systematic search strategy, so a globally optimal parameter combination is hard to obtain. Parameter settings also vary widely between researchers, which makes model results difficult to reproduce and hinders the standardization of medical research.
3. Model performance assessment typically reports only a single index such as accuracy or AUC and fails to evaluate model performance comprehensively from multiple dimensions. A single index can hardly reflect the actual performance of the model in clinical application; key factors such as sensitivity, specificity, predictive values and clinical decision value are easily ignored, which limits the reliability of the model in real clinical scenarios.
4. Most existing prediction models only output a prediction result or prediction probability, and the prediction probability is not systematically calibrated, so the prediction probability deviates from the true risk. Existing research also seldom uses methods such as clinical decision curve analysis and clinical impact curves to evaluate the clinical benefit of the model at different risk thresholds, so it is difficult to provide clinicians with intuitive, actionable decision support.
5. Some high-performance machine learning models are black-box models. The prior art often lacks interpretation and analysis of model prediction results, and the contribution of each feature to the prediction result is difficult to determine. Insufficient interpretability makes it hard for doctors to understand the prediction logic of the model and limits its practical application in clinical diagnosis and risk assessment.
Disclosure of Invention
To address the problems of existing clinical outcome prediction methods, namely reliance on experience for model selection, low parameter optimization efficiency, a single performance evaluation dimension, insufficient reliability of prediction probabilities and poor model interpretability, the invention provides a multi-model-based clinical outcome machine learning prediction method, device and storage medium, so as to realize systematic, stable and interpretable modeling of clinical outcome prediction tasks.
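The clinical decision curve and clinical impact curve analyses referred to in the background can be made concrete with a short sketch. The text does not give formulas, so this uses the net benefit definition commonly used in decision curve analysis, NB = TP/n - (FP/n) * pt/(1 - pt), where pt is the risk threshold; the function names and the example data are hypothetical and chosen only for illustration.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on samples whose probability >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the risk threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the baseline strategy that intervenes on every sample."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def decision_curve(y_true, y_prob, thresholds):
    """Rows of (threshold, model, intervene-on-all, intervene-on-none)."""
    return [
        (t, net_benefit(y_true, y_prob, t), net_benefit_treat_all(y_true, t), 0.0)
        for t in thresholds
    ]


def clinical_impact(y_true, y_prob, threshold):
    """Counts for a clinical impact curve at one threshold.

    Returns (number predicted high risk, number of those with the outcome).
    """
    high_risk = [y for y, p in zip(y_true, y_prob) if p >= threshold]
    return len(high_risk), sum(high_risk)


# Hypothetical calibrated probabilities for eight samples.
example_labels = [1, 1, 1, 0, 0, 0, 0, 1]
example_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
example_curve = decision_curve(example_labels, example_probs, [0.2, 0.5])
```

A model is considered useful at a given threshold when its net benefit exceeds both baselines in the same row, that is, the intervene-on-all value and the intervene-on-none value of zero.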
To achieve the above object, a first aspect of the present invention discloses a multi-model-based machine learning prediction method for clinical outcome, characterized by comprising the following steps: S1, acquiring raw data for clinical outcome prediction, and constructing a data set structure comprising training set data and external validation set data, wherein the training set is used for model construction and the external validation set is used for assessing model generalization capability; S2, preprocessing the data in the data set structure obtained in step S1, so that clinical features from different sources and on different scales meet the input requirements of subsequent model training; S3, constructing a plurality of machine learning prediction models of different types under a unified computing framework, wherein constructing multiple models in parallel avoids the limitation that a single model structure imposes on the clinical outcome prediction result and improves the stability and adaptability of the overall prediction result; S4, aiming at different types of machine learning prediction models, respectively p