CN-122000032-A - Assisted reproduction multi-node clinical decision method based on machine learning
Abstract
The invention relates to the application of artificial intelligence in assisted reproductive technology and discloses a machine-learning-based multi-node clinical decision method for assisted reproduction. The method comprises: collecting sample data and obtaining sample features; transforming and imputing the sample features; screening predictor variables from the sample features using Spearman correlation coefficients; dividing the complete sample into a training set, a test set and a validation set in an 8:1:1 ratio; in the training set, taking the screened predictors as independent variables and the cumulative live birth within 2 years after a single oocyte retrieval as the dependent variable, and constructing a separate prediction model for each option at each of 4 decision stages (the controlled ovarian stimulation protocol, the gonadotropin starting dose, and so on); correcting the model outputs to obtain corrected predicted values; tuning the hyperparameters on the test set; and, on the validation set, comparing live birth outcomes between patients whose treatment conformed to the recommendation and patients whose treatment did not. The invention provides a cumulative live birth rate prediction for each of the possible options, rather than recommending a single best path.
Inventors
- CHEN ZIJIANG
- ZHAO HAN
- ZHAO SHIGANG
- ZHANG HONGHUI
- YIN CHANGJIAN
Assignees
- 山东大学
Dates
- Publication Date
- 20260508
- Application Date
- 20260128
Claims (6)
- 1. A machine-learning-based multi-node clinical decision method for assisted reproduction, characterized by comprising the following steps: Step 1, collecting sample data to obtain sample features; Step 2, transforming and imputing the sample features from step 1; Step 3, screening candidate predictor variables using Spearman correlation coefficients computed on the complete sample formed in step 2, namely selecting indicators whose Spearman correlation coefficient with cumulative live birth has an absolute value greater than 0.05 and a significance (p) value less than 0.05, and including them in the prediction model of step 4, as required, together with the controlled ovarian stimulation protocol, the gonadotropin starting dose, the human chorionic gonadotropin dose, and whether day-3 fresh embryo transfer is performed; Step 4, dividing the data of the complete sample into a training set, a test set and a validation set in an 8:1:1 ratio, and using the training-set samples formed through steps 2 to 4 to construct a cumulative live birth rate prediction model based on a generalized additive model, with the processed data as independent variables and the cumulative live birth rate within 2 years after a single oocyte retrieval as the dependent variable, wherein the test set is used to tune the model hyperparameter K, namely the maximum degrees of freedom of each independent variable, and the model prediction accuracy reaches its maximum when K is 6; Step 5, performing Platt correction on the predicted values output in step 4 to improve the reliability and interpretability of the probabilities output by the model; Step 6, calculating corrected predicted values in the validation set using the generalized additive model and the Platt correction model constructed in steps 4 and 5, and evaluating the accuracy of the cumulative live birth rate prediction; Step 7, according to the stage of the treatment decision, selecting from the variables of step 3 and constructing prediction models that predict the cumulative live birth rate under each treatment option, the models being constructed multiple times because clinical practice is a multi-step decision process in which clinical information is obtained gradually: for example, the gonadotropin starting dose is decided after the protocol is determined; the human chorionic gonadotropin dose is decided after the estradiol level and the number of follicles with a diameter reaching 1.4 cm on the trigger day are obtained; and whether to perform day-3 fresh embryo transfer is decided after information such as the number of oocytes retrieved, the number of oocytes that have extruded the first polar body, the number of normally fertilized embryos and the number of day-3 high-quality embryos is obtained; and Step 8, using the models obtained in steps 4, 5, 6 and 7 to calculate corrected predicted values for all included patients and identifying patients who received suboptimal measures, wherein, for each of the selections of controlled ovarian stimulation protocol, gonadotropin starting dose, human chorionic gonadotropin dose, and day-3 fresh embryo transfer or not, if the predicted live birth probability of the measure with the highest prediction is 5% higher than that of the measure actually used (that is, the expected live birth probability would increase by 5%) and the 95% confidence intervals of the two do not overlap, the patient is considered to have received a suboptimal measure at that stage.
- 2. The machine-learning-based multi-node clinical decision method for assisted reproduction of claim 1, wherein step 1 comprises: Step 1.1, setting sample data collection criteria, wherein patients meeting the criteria are enrolled and followed up to obtain the final outcome, namely the cumulative live birth per single oocyte retrieval; Step 1.2, selecting, according to preset criteria, sample features in the sample data that are related to pregnancy outcome, wherein Spearman analysis confirms that the features are correlated with cumulative live birth, namely the correlation coefficient exceeds 0.05 and the statistical significance p value is less than 0.05.
- 3. The machine-learning-based multi-node clinical decision method for assisted reproduction of claim 1, wherein in step 2 the sample features are transformed and imputed as follows: Step 2.1, transforming and encoding the sample features, namely standardizing and normalizing the collected data on female age, anti-Müllerian hormone, antral follicle count, basal follicle-stimulating hormone, number of follicles of 1.4 cm or larger on the hCG day, number of oocytes retrieved, and number of normally fertilized embryos, so that they meet the input requirements of the prediction model; Step 2.2, if some sample features have missing values, filling the missing values of each such feature by multiple imputation, wherein the missing proportion of every feature is less than 5%, multiple imputation generates five complete data sets, each missing value is imputed by methods such as predictive mean matching so as to preserve the distribution of the original data and the relationships between variables, and the one complete data set closest to the statistical characteristics of the original sample is selected for the subsequent steps; if there are no missing values, no filling is needed; the resulting complete sample without missing values proceeds to step 3.
- 4. The machine-learning-based multi-node clinical decision method for assisted reproduction of claim 1, wherein step 7 specifically comprises: Step 7.1, selection of the controlled ovarian stimulation protocol: using female age, anti-Müllerian hormone, antral follicle count, basal follicle-stimulating hormone, body mass index, and whether infertility is primary, a generalized additive model and a logistic regression model are trained and validated separately in the populations that underwent controlled ovarian stimulation with the long protocol, the short protocol, the antagonist protocol and the ultra-long protocol; Step 7.2, selection of the gonadotropin starting dose: using female age, anti-Müllerian hormone, antral follicle count, basal follicle-stimulating hormone, body mass index, whether infertility is primary, and the controlled ovarian stimulation protocol, the model is constructed and validated separately in the populations started on 75-150 IU, 175-200 IU, 225-250 IU, 275-300 IU and 325-450 IU of gonadotropin (Gn); Step 7.3, selection of the human chorionic gonadotropin dose: using female age, anti-Müllerian hormone, antral follicle count, basal follicle-stimulating hormone, body mass index, whether infertility is primary, the controlled ovarian stimulation protocol, the gonadotropin starting dose and total dose, the estradiol level on the trigger day, and the number of follicles with a diameter reaching 1.4 cm, the model is constructed and validated separately in the populations that received <6000 IU, 8000 IU and 10000-14000 IU of human chorionic gonadotropin; Step 7.4, selection of whether to perform day-3 fresh embryo transfer: using female age, anti-Müllerian hormone, antral follicle count, basal follicle-stimulating hormone, body mass index, the controlled ovarian stimulation protocol, the gonadotropin starting dose and total dose, the estradiol level on the trigger day, the number of follicles with a diameter reaching 1.4 cm, the number of oocytes retrieved, the number of oocytes that have extruded the first polar body, the number of normally fertilized embryos, and the number of day-3 high-quality embryos, the model is constructed and validated separately in the populations with and without day-3 fresh embryo transfer.
- 5. The machine-learning-based multi-node clinical decision method for assisted reproduction of claim 1, wherein in step 4 the generalized additive model provides a predicted cumulative live birth rate for each patient; before the controlled ovarian stimulation protocol is selected, the cumulative live birth rate is predicted from age, anti-Müllerian hormone, antral follicle count, basal follicle-stimulating hormone and body mass index as follows: $g(y) = [f_1(x_1) + \varepsilon_1] + [f_2(x_2) + \varepsilon_2] + [f_3(x_3) + \varepsilon_3] + [f_4(x_4) + \varepsilon_4] + [f_5(x_5) + \varepsilon_5]$; wherein $y$ is the cumulative live birth rate; $x_1, x_2, x_3, x_4, x_5$ are age, anti-Müllerian hormone, antral follicle count, basal follicle-stimulating hormone and body mass index, respectively; $f_1(x_1) + \varepsilon_1, \dots, f_5(x_5) + \varepsilon_5$ are the smoothing functions and error terms used to fit the 5 parameters after normalization and standardization; the smoothing is selected automatically for the prediction model by restricted maximum likelihood estimation so as to accurately reflect the real variation in the data; the test set is used to tune the hyperparameters, and the validation set is used to evaluate model accuracy.
- 6. The machine-learning-based multi-node clinical decision method for assisted reproduction of claim 1, wherein step 5 specifically comprises constructing a logistic regression model using the predicted value $s_i$ and the cumulative live birth outcome (yes or no) of each sample in the test set, with the formula: $y_i = A s_i + B$, $P_i = \frac{1}{1 + e^{-y_i}}$; wherein $s_i$ is the original predicted value of the cumulative live birth rate prediction model, $P_i$ is the corrected predicted value, $A$ is the slope and $B$ is the bias, both obtained from a linear fit between the predicted cumulative live birth probability and the actual cumulative live birth outcome in the test set; $y_i$ is the probability index; and $P_i$ is the sigmoid of the probability index, that is, the index is mapped to a probability between 0 and 1 so as to accurately reflect the likelihood of live birth.
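The 8:1:1 split of claim 1 (step 4), the Platt correction of claim 6, and the suboptimal-measure rule of claim 1 (step 8) can be illustrated with a minimal Python sketch. This is not the inventors' implementation. The function names (`split_8_1_1`, `fit_platt`, `platt_predict`, `is_suboptimal`) are hypothetical, plain gradient descent stands in for whatever logistic-regression solver was actually used, the random shuffle before splitting is an assumption, and "5% higher" is read here as an absolute difference of 0.05 in predicted probability, which the claims do not state explicitly.

```python
import math
import random


def split_8_1_1(n_samples, seed=0):
    """Shuffle sample indices and split them 8:1:1 into
    training, test and validation index lists (shuffling is an assumption)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = int(n_samples * 0.8)
    n_test = int(n_samples * 0.1)
    return (idx[:n_train],
            idx[n_train:n_train + n_test],
            idx[n_train + n_test:])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.5, n_iter=5000):
    """Fit P = sigmoid(A * s + B) by gradient descent on the mean log loss.

    scores: raw model outputs s_i; labels: 1 for live birth, 0 otherwise.
    Returns the slope A and the bias B.
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the index
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def platt_predict(score, a, b):
    """Corrected probability P_i for a raw score s_i."""
    return sigmoid(a * score + b)


def is_suboptimal(best, actual, min_gain=0.05):
    """Step 8 rule. `best` and `actual` are (probability, ci_low, ci_high)
    for the best-predicted option and the option actually used.

    Assumption: the 5% threshold is an absolute probability difference.
    The intervals do not overlap when the lower bound of the best option
    lies above the upper bound of the actual option.
    """
    p_best, low_best, _ = best
    p_actual, _, high_actual = actual
    return (p_best - p_actual) >= min_gain and low_best > high_actual
```

In use, `fit_platt` would be called on the raw predictions and outcomes of the test set, `platt_predict` would then correct predictions for each candidate option, and `is_suboptimal` would compare the best-predicted option with the one the patient actually received.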
Description
Assisted reproduction multi-node clinical decision method based on machine learning

Technical Field

The invention relates to the application of artificial intelligence in assisted reproductive technology, and in particular to a machine-learning-based multi-node clinical decision method for assisted reproduction.

Background

The incidence of infertility is rising year by year and has become a public health problem that cannot be ignored. Assisted reproductive technology, particularly in vitro fertilization and embryo transfer (IVF-ET), has received attention in recent years as a key treatment for infertility. With the development of embryo vitrification and thawing techniques, the cumulative live birth rate achieved through multiple embryo transfers after a single oocyte retrieval has risen year by year and has gained increasing clinical acceptance; in recent years, several high-quality randomized controlled trials have used the cumulative live birth rate as the primary outcome.

IVF-ET is a sequential treatment that depends heavily on fine-grained clinical decisions. All patients pass through the following stages in turn:
1. Selection of the controlled ovarian stimulation protocol, for example from the long, short and antagonist protocols;
2. Selection of the gonadotropin starting dose, with common ranges including 75-150 IU, 175-200 IU, and so on;
3. Selection of the human chorionic gonadotropin dose, with common ranges including <6000 IU, 6000 IU, and so on;
4. Selection of whether to perform fresh embryo transfer.

A decision at any stage presupposes that the patient's current stage is clearly established. For example, the human chorionic gonadotropin dose can be chosen only after the controlled ovarian stimulation protocol and the gonadotropin starting dose have been determined, and the follicular development resulting from those decisions must also be taken into account. Because of this ordering, the choice at each stage can only be made in sequence, rather than being planned together before any decision is made.

Clinical guidelines give guidance on matters such as the selection of the controlled ovarian stimulation protocol, but the complexity of a patient's condition and of her response to treatment often requires the clinician to formulate personalized treatment measures. As a result, the success rate of assisted reproduction for the same patient can differ considerably depending on the experience of the clinician who formulates the plan. The development of artificial intelligence provides support for experience-independent, personalized IVF-ET, and developing assisted reproduction decision-support systems based on big data is a necessary choice in facing the infertility problem. The application provides a machine-learning-based multi-node clinical decision method for assisted reproduction, which aims to realize personalized assisted reproduction based on artificial intelligence.

Disclosure of Invention

The object of the invention is to provide a machine-learning-based multi-node clinical decision method for assisted reproduction, so as to solve the problems of the prior art identified in the Background.
To achieve this object, the invention provides the following technical solution: a machine-learning-based multi-node clinical decision method for assisted reproduction, comprising the following steps:
Step 1, collecting sample data to obtain sample features;
Step 1.1, setting sample data collection criteria, wherein patients meeting the criteria are enrolled and followed up. The patients were infertile for various reasons, such as tubal obstruction or adhesion, and underwent in vitro fertilization and embryo transfer between January 2015 and 2023 at several national or provincial reproductive centers, including the reproductive hospital affiliated with Shandong University; the women's places of residence were distributed across China. After patients with chromosomal abnormalities were excluded, 73413 patients were enrolled, and the final outcome, namely the cumulative live birth per single oocyte retrieval, was obtained after 2 years of follow-up. Patient age, body mass index, ovarian reserve indicators, ovarian response to drug stimulation, oocyte and embryo laboratory indicators, and embryo transfer and live birth information were recorded in the electronic medical record system at the time of treatment. After the patients' informed consent was obtained, the data were accessed through the hospital intranet system with personal identifiers such as medical record numbers removed; the time range and purpose of use were strictly registered and filed after patients with chromosomal abnormalities were excluded, and use in an external network environment was not permitted; the relevant data were provided in full by the medical record system after the patient's informed consent, the