CN-122025121-A - Method for predicting low muscle mass risk in rheumatoid arthritis patients based on an uncertainty-aware stacked meta-learning structure


Abstract

The invention provides a method for predicting low muscle mass risk in rheumatoid arthritis patients based on an uncertainty-aware stacked meta-learning structure, and belongs to the technical field of machine learning. The method comprises: obtaining a rheumatoid arthritis data set; constructing a base learner set comprising a tabular Transformer network and a gradient boosting tree model; concatenating the out-of-fold prediction probabilities of each base learner with their statistics to form a meta-feature matrix; constructing and training, with the meta-feature matrix as input, a meta-learner comprising a spectral-normalized multi-layer perceptron and a random-feature Gaussian process output layer; collecting data to be predicted and passing them sequentially through the trained base learner set and the trained meta-learner; applying temperature scaling and beta calibration to the prediction result; and taking the calibrated prediction probability as the final prediction result. Through multi-source data fusion, the invention addresses the shortcomings of existing models in practicality, probability reliability and related aspects.
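The stacking step in the abstract can be illustrated with a minimal sketch. It assumes plain Python lists and a hypothetical stand-in base learner (`fit_prevalence` / `predict_prevalence`), not the tabular Transformer or gradient boosting models named above. The statistics follow those listed in claim 4 (mean, variance, range, entropy, logit); the probability ranking statistic is left out, and computing entropy and logit on the mean probability is an assumption of this sketch.

```python
import math


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for clarity)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def oof_predictions(X, y, fit, predict, k=5):
    """Out-of-fold probabilities: each sample is scored by a model
    trained on the other k-1 folds, so it never sees its own label."""
    n = len(X)
    oof = [0.0] * n
    for held_out in kfold_indices(n, k):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            oof[i] = predict(model, X[i])
    return oof


def meta_feature_row(probs, eps=1e-6):
    """Concatenate one sample's base-learner probabilities with summary
    statistics (entropy and logit are taken on the mean probability,
    which is an assumption of this sketch)."""
    mean = sum(probs) / len(probs)
    var = sum((p - mean) ** 2 for p in probs) / len(probs)
    spread = max(probs) - min(probs)
    m = min(max(mean, eps), 1.0 - eps)  # clip to keep log() finite
    entropy = -(m * math.log(m) + (1.0 - m) * math.log(1.0 - m))
    logit = math.log(m / (1.0 - m))
    return list(probs) + [mean, var, spread, entropy, logit]


# Hypothetical stand-in base learner: predicts the training positive rate.
def fit_prevalence(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_prevalence(model, x):
    return model


X = [[float(i)] for i in range(6)]
y = [0, 1, 0, 1, 0, 1]
oof = oof_predictions(X, y, fit_prevalence, predict_prevalence, k=3)
row = meta_feature_row([0.2, 0.4])  # two base learners, one sample
```

In a real pipeline each base learner would contribute one out-of-fold column, and `meta_feature_row` would be applied per sample to build the meta-feature matrix.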

Inventors

JIANG PING
ZHOU FEIYUE
ZHAO XIAOHU
ZHONG SHUAI

Assignees

Affiliated Hospital of Shandong University of Traditional Chinese Medicine (山东中医药大学附属医院)

Dates

Publication Date: 20260512
Application Date: 20260108

Claims (10)

1. A method for predicting low muscle mass risk in a rheumatoid arthritis patient based on an uncertainty-aware stacked meta-learning structure, comprising the steps of: acquiring a rheumatoid arthritis data set comprising sample features and binary class labels, performing feature processing and interaction-feature construction, handling sample imbalance, and dividing the data into a training set, a test set and a validation set; training a base learner set using the training set, and generating, through cross-validation, the out-of-fold prediction probability of each base learner on the training samples; concatenating the out-of-fold prediction probabilities of each base learner with their statistics to form a meta-feature matrix; taking the meta-feature matrix as input, constructing and training a meta-learner comprising a spectral-normalized multi-layer perceptron and a random-feature Gaussian process output layer to obtain a predicted mean risk probability and an uncertainty quantification index for each sample, wherein the training loss function uses an uncertainty-aware threshold-aligned asymmetric loss combined with a GroupDRO worst-group optimization strategy; and collecting data to be predicted, passing them sequentially through the trained base learner set and the trained meta-learner, applying temperature scaling and beta calibration to the prediction result, and taking the calibrated prediction probability as the final prediction result.
2. The method for predicting low muscle mass risk in a rheumatoid arthritis patient based on an uncertainty-aware stacked meta-learning structure of claim 1, wherein the sample features include gender, age, body mass index, neutrophil count, lymphocyte count, hemoglobin, platelet count, alanine aminotransferase, aspartate aminotransferase, cholesterol, albumin, urea, creatinine and uric acid; and the sample imbalance handling includes class weights and oversampling to increase the proportion of positive samples.
3. The method for predicting low muscle mass risk in a rheumatoid arthritis patient based on an uncertainty-aware stacked meta-learning structure of claim 1, wherein the tabular Transformer network comprises FT-Transformer, SAINT or VIME, and the gradient boosting tree model comprises LightGBM, XGBoost or CatBoost; and generating, through cross-validation, the out-of-fold prediction probability of each base learner on the training samples comprises: dividing the training set data into K folds, training each base learner on K-1 folds, generating prediction probabilities on the remaining fold, and traversing all folds to obtain the out-of-fold prediction probability of each base learner.
4. The method for predicting low muscle mass risk in a rheumatoid arthritis patient based on an uncertainty-aware stacked meta-learning structure of claim 1, wherein forming the meta-feature matrix comprises: concatenating, per sample, the out-of-fold prediction probabilities of the base learners into a multi-channel probability vector; calculating statistics of the multi-channel probability vector, the statistics comprising mean, variance, range, entropy, logit and probability ranking; and forming the meta-feature matrix from the multi-channel probability vector and the statistics.
5. The method for predicting low muscle mass risk in a rheumatoid arthritis patient based on an uncertainty-aware stacked meta-learning structure of claim 1, wherein the meta-learner uses a spectral-normalized multi-layer perceptron as a feature extractor and a random-feature Gaussian process head as an output layer; the random-feature Gaussian process is implemented by: feeding the top-layer hidden representation of the spectral-normalized multi-layer perceptron into an approximate radial-basis translation-invariant kernel function and constructing the following mapping: $\phi(h) = \sqrt{2/M}\,[\cos(\omega_1^{T} h + b_1), \ldots, \cos(\omega_M^{T} h + b_M)]^{T}$, wherein $\phi(h)$ is the M-dimensional feature vector after random feature mapping, $M$ is the number of frequency vectors sampled from the spectral density of the approximate radial-basis translation-invariant kernel function, $\omega_m$ is the m-th frequency vector, $h$ is the top-layer hidden representation of the spectral-normalized multi-layer perceptron, $b_m$ is a uniformly sampled offset term, and $T$ denotes matrix transpose; obtaining the predictive mean and variance in the logit domain by approximate Gaussian process posterior inference using the mapped features and a linear output layer; and mapping the predictive mean to a risk probability through a Sigmoid function and approximately propagating the variance to the probability domain to obtain the uncertainty quantification index.
6. The method for predicting low muscle mass risk in a rheumatoid arthritis patient based on an uncertainty-aware stacked meta-learning structure of claim 5, wherein the predictive mean and variance in the logit domain are obtained by approximate Gaussian process posterior inference using the mapped features and the linear output layer as follows: the posterior is calculated as: $p(\beta \mid \mathcal{D}) = \mathcal{N}(\mu, \Sigma)$, $\Sigma = (\lambda I + \sigma_\varepsilon^{-2}\,\Phi^{T}\Phi)^{-1}$, $\mu = \sigma_\varepsilon^{-2}\,\Sigma\,\Phi^{T} y$, wherein $\beta$ is the weight vector of the linear output layer, $\mathcal{D}$ is the training data set, $\mathcal{N}$ is the multivariate normal distribution, $\mu$ is the mean vector of the posterior distribution, $\Sigma$ is the covariance matrix of the posterior distribution, $\Phi$ is the random feature matrix of the training samples, $\lambda$ is the regularization hyperparameter, $\sigma_\varepsilon^{2}$ is the noise hyperparameter, $I$ is the identity matrix, and $y$ is the label; the predictive mean and variance in the logit domain are calculated from the posterior as: $m_i = \mathbb{E}[z_i] = \phi_i^{T}\mu$, $\mathrm{Var}[z_i] = \phi_i^{T}\,\Sigma\,\phi_i$, wherein $m_i$ is the predictive mean of sample i in the logit domain, $\mathbb{E}[z_i]$ is the predictive mean of the random variable z of the i-th sample in the logit domain, $\mathrm{Var}[z_i]$ is the predictive variance of the random variable z of the i-th sample in the logit domain, and $\phi_i$ is the feature of sample i.
7. The method for predicting low muscle mass risk in a rheumatoid arthritis patient based on an uncertainty-aware stacked meta-learning structure of claim 6, wherein the uncertainty quantification index is calculated as follows: $u_i = \mathrm{Var}[p_i] \approx \left(\hat{p}_i(1-\hat{p}_i)\right)^{2} s_i^{2}$, wherein $u_i$ is the uncertainty quantification index, $\mathrm{Var}[\cdot]$ is the variance operator, representing the magnitude of a random variable's fluctuation around its mean, $\hat{p}_i$ is the point estimate of the predicted probability that the i-th sample belongs to the positive class, and $s_i$ is the standard deviation of the random variable z of the i-th sample in the logit domain.
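8. The method for predicting low muscle mass risk in a rheumatoid arthritis patient based on an uncertainty-aware stacked meta-learning structure of claim 7, wherein the loss function $\mathcal{L}$ is as follows: $\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \lambda_{\mathrm{DRO}}\,\mathcal{L}_{\mathrm{DRO}}$, $\mathcal{L}_{\mathrm{task}} = \frac{1}{|\mathcal{D}_{\mathrm{meta}}|}\sum_{i \in \mathcal{D}_{\mathrm{meta}}} \ell_i$, $\ell_i = w_{\mathrm{FN}}\,y_i\,[\tau - (\hat{p}_i - \kappa\,\sigma_i)]_{+} + w_{\mathrm{FP}}\,(1-y_i)\,[(\hat{p}_i + \kappa\,\sigma_i) - \tau]_{+}$, $\mathcal{L}_{\mathrm{DRO}} = \max_{g} \mathcal{L}_g$ with $\mathcal{L}_g = \frac{1}{n_g}\sum_{i \in \mathcal{D}_g} \ell_i$, wherein $\lambda_{\mathrm{DRO}}$ is a weight parameter, $\mathcal{L}_{\mathrm{task}}$ is the task-level average loss, $\mathcal{L}_{\mathrm{DRO}}$ is the worst-group optimization penalty, $\mathcal{D}_{\mathrm{meta}}$ is the set of training samples of the meta-learner, $\ell_i$ is the uncertainty-aware threshold-aligned asymmetric loss of the i-th sample, $[\cdot]_{+}$ is the hinge function, $\mathcal{D}_g$ is the sample subset corresponding to the g-th center, $\mathcal{L}_g$ is the average loss in the g-th center, $n_g$ is the number of samples in the g-th group, $y_i$ is the true label of the i-th sample, $w_{\mathrm{FN}}$ is the false-negative penalty weight, $w_{\mathrm{FP}}$ is the false-positive penalty weight, $\tau$ is the risk threshold, $\hat{p}_i$ is the point estimate of the predicted probability that the i-th sample belongs to the positive class, $\kappa$ is the constant of the confidence level corresponding to the risk threshold $\tau$, and $\sigma_i$ is the standard deviation inferred from the model-parameter posterior for the i-th sample.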
9. The method for predicting low muscle mass risk in a rheumatoid arthritis patient based on an uncertainty-aware stacked meta-learning structure of any one of claims 1-8, wherein, when the data to be predicted are collected and input into the trained base learner set and meta-learner, selective prediction is performed based on the uncertainty quantification index, and the prediction is rejected when a sample's uncertainty quantification index exceeds a set threshold.
10. The method for predicting low muscle mass risk in a rheumatoid arthritis patient based on an uncertainty-aware stacked meta-learning structure of claim 9, wherein a confidence set constructed by Mondrian conformal prediction is used as the prediction set of the data to be predicted, comprising: dividing the validation set into subsets by center number, calculating the nonconformity scores of the corresponding subset for each center, and computing for each center the $(1-\alpha)$ quantile threshold $q_g$ of the nonconformity scores, where $\alpha$ is a preset significance level; calculating the nonconformity score of each candidate label according to the center $g$ to which the data to be predicted belong, and including in the confidence set all candidate labels whose nonconformity score is less than or equal to $q_g$; and, if the confidence set is empty or contains two or more labels, performing selective prediction through the uncertainty quantification index.
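The random feature mapping of claim 5, the propagation of logit-domain variance to the probability domain in claim 7, and the rejection rule of claim 9 can be sketched as follows. This is an illustrative sketch only: the function names, the Gaussian sampling of frequency vectors (the usual choice for a radial-basis kernel), the `length_scale` parameter and the delta-method approximation are assumptions, and the spectral-normalized network and posterior inference are not reproduced.

```python
import math
import random


def sample_rff_params(dim, M, length_scale=1.0, seed=0):
    """Draw M frequency vectors (Gaussian, for an RBF kernel) and
    M offsets uniform on [0, 2*pi). Names and defaults are hypothetical."""
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0 / length_scale) for _ in range(dim)]
              for _ in range(M)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(M)]
    return omegas, offsets


def rff_map(h, omegas, offsets):
    """phi(h)_m = sqrt(2/M) * cos(omega_m . h + b_m)."""
    scale = math.sqrt(2.0 / len(omegas))
    return [scale * math.cos(sum(w_j * h_j for w_j, h_j in zip(w, h)) + b)
            for w, b in zip(omegas, offsets)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_with_uncertainty(logit_mean, logit_var):
    """Map the logit mean to a probability and propagate the logit
    variance with a first-order (delta-method) approximation."""
    p = sigmoid(logit_mean)
    u = (p * (1.0 - p)) ** 2 * logit_var
    return p, u


def selective_predict(logit_mean, logit_var, max_uncertainty):
    """Return (probability, uncertainty); the probability is None when
    the uncertainty index exceeds the rejection threshold."""
    p, u = predict_with_uncertainty(logit_mean, logit_var)
    return (p, u) if u <= max_uncertainty else (None, u)


omegas, offsets = sample_rff_params(dim=3, M=2000, seed=0)
phi = rff_map([0.1, -0.2, 0.3], omegas, offsets)
accepted = selective_predict(2.0, 0.5, max_uncertainty=0.05)
rejected = selective_predict(0.0, 4.0, max_uncertainty=0.05)
```

The inner product of two mapped vectors approximates the kernel value, so `phi` dotted with itself should be close to 1 for a large number of frequency vectors.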

Description

Method for predicting low muscle mass risk in rheumatoid arthritis patients based on an uncertainty-aware stacked meta-learning structure

Technical Field

The invention relates to a method for predicting low muscle mass risk in rheumatoid arthritis patients based on an uncertainty-aware stacked meta-learning structure, and belongs to the technical field of machine learning.

Background

Rheumatoid arthritis is a chronic autoimmune disease in which chronic inflammation, metabolic disorders and nutritional disorders are prevalent, with marked adverse effects on muscle mass and function. Current studies indicate that the incidence of sarcopenia in rheumatoid arthritis patients is 24% to 61.7%. Sarcopenia can seriously impair patients' physical function and quality of life and increases the risk of falls and fractures, further increasing the disease burden. Although awareness of sarcopenia in rheumatoid arthritis (RA) patients is growing, the condition is still overlooked in clinical practice. Currently, the diagnosis of sarcopenia generally involves an assessment of muscle mass, measured mainly by dual-energy X-ray absorptiometry (DXA) or bioelectrical impedance analysis (BIA). However, these methods depend heavily on specialized equipment: they increase the patient's examination burden, and they cannot be carried out in primary medical institutions, which lack the necessary detection equipment. These factors reduce patients' willingness to undergo sarcopenia screening and diagnosis, thereby impeding clinical recognition of the disease and the wider adoption of screening.
Currently, scoring scales or nomograms built from questionnaires, physical tests and routine laboratory indexes are used to predict sarcopenia risk in RA patients. Most adopt logistic regression or a single machine learning (ML) model and focus on indexes such as AUC and accuracy, paying only limited attention to probability calibration and clinical net benefit. Some studies adopt machine learning methods such as random forests, gradient boosting trees and XGBoost, combined with interpretation techniques such as SHAP, to improve predictive performance and provide feature-importance explanations. However, several problems remain: training targets focus on AUC/accuracy and are not directly aligned with clinical "net benefit"; probability outputs are often poorly calibrated, which affects risk stratification and clinical threshold decisions; and few works systematically address sample imbalance and multi-center distribution shift. Some models are deployed as online tools, but most are single-layer models with simple thresholds and lack explicit control of prediction uncertainty. In recent years, to address the reliability of medical AI predictions, academia has proposed a number of uncertainty quantification and selective prediction methods that let the model decline to predict when its confidence is insufficient, thereby improving reliability on the subset of cases for which predictions are given. At the same time, conformal prediction (especially Mondrian conformal prediction) is used to construct prediction sets with finite-sample conditional coverage guarantees at different centers, ensuring that, when a prediction is given, the probability that the true label falls within the prediction set is not below a preset level.
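The per-center (Mondrian) conformal construction mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the nonconformity score is taken as one minus the predicted probability of the candidate label, the quantile uses the standard finite-sample rank ceil((n+1)(1-alpha)), and all function names are hypothetical.

```python
import math


def nonconformity(p_pos, label):
    """Assumed score: 1 minus the predicted probability of `label`."""
    prob = p_pos if label == 1 else 1.0 - p_pos
    return 1.0 - prob


def conformal_quantile(scores, alpha):
    """Finite-sample (1 - alpha) quantile of the calibration scores.
    Returns infinity when the group is too small for that level."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        return float("inf")
    return sorted(scores)[rank - 1]


def fit_mondrian_thresholds(probs, labels, centers, alpha):
    """One threshold per center, from that center's calibration data only."""
    by_center = {}
    for p, y, g in zip(probs, labels, centers):
        by_center.setdefault(g, []).append(nonconformity(p, y))
    return {g: conformal_quantile(s, alpha) for g, s in by_center.items()}


def prediction_set(p_pos, center, thresholds):
    """All candidate labels whose score is within the center's threshold."""
    q = thresholds[center]
    return [label for label in (0, 1) if nonconformity(p_pos, label) <= q]


# Toy calibration data: four samples from center "A", one from center "B".
cal_probs = [0.9, 0.8, 0.3, 0.1, 0.6]
cal_labels = [1, 1, 0, 0, 1]
cal_centers = ["A", "A", "A", "A", "B"]
thresholds = fit_mondrian_thresholds(cal_probs, cal_labels, cal_centers, alpha=0.25)

confident = prediction_set(0.95, "A", thresholds)  # single-label set
ambiguous = prediction_set(0.50, "A", thresholds)  # empty set
tiny_group = prediction_set(0.50, "B", thresholds)  # both labels
```

An empty or two-label set signals that the prediction should not be acted on directly; a small center such as "B" falls back to the full label set because its calibration data cannot support the requested coverage level.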
However, work integrating multi-model stacking, probability calibration, uncertainty-aware loss, selective prediction and Mondrian conformal prediction into a single practicable RA low muscle mass risk prediction system has not been publicly reported, especially in the data scenario of a multi-center RA cohort + NHANES, where significant distribution differences exist. Existing methods for predicting low muscle mass risk in rheumatoid arthritis still have several defects in practical application. First, model training targets are mainly statistical indexes such as AUC (area under the curve) and accuracy, which can hardly reflect the clinical "net benefit" under different thresholds directly and are disconnected from actual clinical decision scenarios (such as thresholds for high-risk population screening, referral or starting intervention). Second, the predicted probabilities of existing models are generally poorly calibrated: predicted values do not match actual incidence, which hinders probability-based risk stratification and individualized communication by doctors. Third, in multi-center or cross-hospital applications, because population structure, detection methods and disease spectrum differ between centers, models are prone to distribution shift, and robustness for centers or special subgroups with smaller sample sizes is insufficient, popularization