CN-122000069-A - Lung cancer risk prediction method, equipment and storage medium

Abstract

The invention relates to the field of medical intelligent analysis and discloses a lung cancer risk prediction method, equipment and a storage medium. The method comprises: combining clinical features, a GRIK2 methylation value, a HOXA9 methylation value, a PTGER4 methylation value, a SHOX2 methylation value and a PITX2 methylation value to generate a feature vector; classifying the feature vector based on a preset machine learning algorithm to obtain a classified risk value; generating a lung cancer high risk prediction result when the classified risk value is larger than a preset high risk threshold; generating a lung cancer low risk prediction result when the classified risk value is smaller than a preset low risk threshold; and generating a lung cancer medium risk prediction result when the classified risk value is neither larger than the preset high risk threshold nor smaller than the preset low risk threshold. Embodiments of the invention make lung cancer risk prediction consistent and improve its accuracy.

Inventors

Bai Zongke
ZHANG ZHAO
HU LIFU

Assignees

深圳泽医细胞治疗集团有限公司

Dates

Publication Date: 20260508
Application Date: 20260206

Claims (10)

1. A method for predicting lung cancer risk, comprising the steps of: receiving clinical features of a target user, and acquiring a GRIK2 methylation value, a HOXA9 methylation value, a PTGER4 methylation value, a SHOX2 methylation value and a PITX2 methylation value of plasma cfDNA of the target user; combining the clinical features, the GRIK2 methylation value, the HOXA9 methylation value, the PTGER4 methylation value, the SHOX2 methylation value and the PITX2 methylation value to generate a feature vector; classifying the feature vector based on a preset machine learning algorithm to obtain a classified risk value; when the classified risk value is larger than a preset high risk threshold, generating a lung cancer high risk prediction result corresponding to the target user; when the classified risk value is smaller than a preset low risk threshold, generating a lung cancer low risk prediction result corresponding to the target user; and when the classified risk value is not larger than the preset high risk threshold and not smaller than the preset low risk threshold, generating a lung cancer medium risk prediction result corresponding to the target user.
2. The method according to claim 1, further comprising, before the step of classifying the feature vector based on a preset machine learning algorithm to obtain a classified risk value: combining the clinical features, GRIK2 methylation values, HOXA9 methylation values, PTGER4 methylation values, SHOX2 methylation values and PITX2 methylation values of N training samples to generate N training feature vectors, wherein N is a positive integer; labeling the N training feature vectors to obtain labeling results corresponding to the N training feature vectors; classifying each of the N training feature vectors based on a preset machine learning algorithm to generate N predicted values; performing result judgment on the N predicted values according to a preset first hyperparameter and a preset second hyperparameter to obtain N prediction results, wherein the first hyperparameter comprises a parameter of the high risk threshold and the second hyperparameter comprises a parameter of the low risk threshold; calculating the accuracy of the N prediction results based on the labeling results corresponding to the N training feature vectors to obtain a test accuracy; and adjusting parameters of the machine learning algorithm, the first hyperparameter and the second hyperparameter based on the test accuracy, and performing iterative training until the test accuracy converges.
3. The method for predicting lung cancer risk according to claim 2, wherein the step of classifying the N training feature vectors based on a preset machine learning algorithm to generate N predicted values comprises: performing logistic regression classification on the N training feature vectors based on a preset logistic regression algorithm to obtain N first predicted values; performing kernel function classification on the N training feature vectors based on a preset support vector machine to obtain N second predicted values; performing decision classification on the N training feature vectors based on a preset random forest algorithm to obtain N third predicted values; and performing regression mapping classification on the N training feature vectors based on a preset XGBoost algorithm to obtain N fourth predicted values.
4. The method for predicting lung cancer risk according to claim 3, wherein the step of performing result judgment on the N predicted values according to the preset first hyperparameter and the preset second hyperparameter to obtain N prediction results comprises: dividing the N first predicted values, the N second predicted values, the N third predicted values and the N fourth predicted values into results based on the preset first hyperparameter and the preset second hyperparameter to obtain N first prediction results, N second prediction results, N third prediction results and N fourth prediction results.
5. The lung cancer risk prediction method according to claim 4, wherein calculating the accuracy of the N prediction results based on the labeling results corresponding to the N training feature vectors to obtain a test accuracy comprises: separately calculating the accuracy of the N first prediction results, the N second prediction results, the N third prediction results and the N fourth prediction results according to the labeling results corresponding to the N training feature vectors to obtain a first accuracy, a second accuracy, a third accuracy and a fourth accuracy.
6. The method of claim 5, wherein adjusting parameters of the machine learning algorithm, the first hyperparameter and the second hyperparameter based on the test accuracy comprises: based on the first accuracy, adjusting parameters of the logistic regression algorithm, the first hyperparameter and the second hyperparameter for iterative training to obtain a first convergence accuracy; based on the second accuracy, adjusting parameters of the support vector machine, the first hyperparameter and the second hyperparameter for iterative training to obtain a second convergence accuracy; based on the third accuracy, adjusting parameters of the random forest algorithm, the first hyperparameter and the second hyperparameter for iterative training to obtain a third convergence accuracy; based on the fourth accuracy, adjusting parameters of the XGBoost algorithm, the first hyperparameter and the second hyperparameter for iterative training to obtain a fourth convergence accuracy; and determining the type of the machine learning algorithm based on the maximum of the first convergence accuracy, the second convergence accuracy, the third convergence accuracy and the fourth convergence accuracy.
7. The method of claim 1, wherein combining the clinical features, the GRIK2 methylation value, the HOXA9 methylation value, the PTGER4 methylation value, the SHOX2 methylation value and the PITX2 methylation value to generate a feature vector comprises: performing missing value filling and normalization on the clinical features, the GRIK2 methylation value, the HOXA9 methylation value, the PTGER4 methylation value, the SHOX2 methylation value and the PITX2 methylation value to obtain preprocessed clinical features, a preprocessed GRIK2 methylation value, a preprocessed HOXA9 methylation value, a preprocessed PTGER4 methylation value, a preprocessed SHOX2 methylation value and a preprocessed PITX2 methylation value; and filling the preprocessed clinical features, the preprocessed GRIK2 methylation value, the preprocessed HOXA9 methylation value, the preprocessed PTGER4 methylation value, the preprocessed SHOX2 methylation value and the preprocessed PITX2 methylation value one by one into a preset feature frame to generate the feature vector.
8. The method of claim 1, wherein the clinical features comprise age, gender, smoking history category and family history of cancer.
9. A lung cancer risk prediction device, comprising a memory and at least one processor, wherein the memory stores instructions, and the memory and the at least one processor are interconnected by a line; the at least one processor invokes the instructions in the memory to cause the lung cancer risk prediction device to perform the lung cancer risk prediction method of any of claims 1-8.
10. A computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the lung cancer risk prediction method according to any of claims 1-8.
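
The threshold tuning and algorithm selection described in claims 4 to 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claims do not say how the two threshold hyperparameters are adjusted, so an exhaustive grid search is swapped in here; the function names, the tier labels and the grid are hypothetical; and the predicted values are supplied directly instead of coming from trained logistic regression, support vector machine, random forest or XGBoost models.

```python
from itertools import product


def to_tier(value, low, high):
    """Divide one predicted value into a risk tier using the two thresholds."""
    if value > high:
        return "high"
    if value < low:
        return "low"
    return "medium"


def accuracy(values, labels, low, high):
    """Fraction of predicted tiers that match the labeling results."""
    hits = sum(to_tier(v, low, high) == y for v, y in zip(values, labels))
    return hits / len(labels)


def tune_thresholds(values, labels, grid):
    """Return (best_accuracy, low, high) over grid pairs with low <= high.

    Grid search is an assumption; ties keep the first pair encountered.
    """
    candidates = [
        (accuracy(values, labels, lo, hi), lo, hi)
        for lo, hi in product(grid, grid)
        if lo <= hi
    ]
    return max(candidates, key=lambda c: c[0])


def select_algorithm(predictions, labels, grid):
    """Pick the algorithm whose tuned thresholds give the highest accuracy.

    predictions maps a (hypothetical) algorithm name to its N predicted values.
    """
    tuned = {
        name: tune_thresholds(values, labels, grid)
        for name, values in predictions.items()
    }
    best = max(tuned, key=lambda name: tuned[name][0])
    return best, tuned[best]
```

In a real pipeline the model parameters would be retrained alongside the thresholds, and accuracy would normally be measured on held-out data instead of the training samples.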

Description

Lung cancer risk prediction method, equipment and storage medium

Technical Field

The invention relates to the field of medical intelligent analysis, in particular to a lung cancer risk prediction method, lung cancer risk prediction equipment and a storage medium.

Background

Lung cancer has a high incidence rate and a high death rate, so early detection and early treatment of lung cancer are of great clinical significance. The main existing screening technology is low-dose CT (LDCT). As LDCT screening has become widespread, the detection rate of lung nodules has risen rapidly, enabling broader early intervention for lung cancer. However, LDCT has a high false positive rate, which is inherent to the technique. It also involves some radiation exposure, poor population compliance, high equipment cost and low clinical accessibility. In addition, because the interpretation of LDCT lacks a unified standard and relies on the experience and subjective judgment of radiologists, the inconsistency rate among observers is high. Other clinical risk assessment models (e.g., the Brock and Mayo models) are based primarily on demographic and basic imaging features and do not integrate molecular-level information, which results in inadequate predictive performance. A new technical scheme is therefore needed to address the insufficient consistency and prediction accuracy of current lung cancer screening schemes.

Disclosure of Invention

The invention mainly aims to solve the technical problems of insufficient consistency and prediction accuracy of the current lung cancer screening scheme.
The first aspect of the present invention provides a lung cancer risk prediction method, comprising the steps of: receiving clinical features of a target user, and acquiring a GRIK2 methylation value, a HOXA9 methylation value, a PTGER4 methylation value, a SHOX2 methylation value and a PITX2 methylation value of plasma cfDNA of the target user; combining the clinical features, the GRIK2 methylation value, the HOXA9 methylation value, the PTGER4 methylation value, the SHOX2 methylation value and the PITX2 methylation value to generate a feature vector; classifying the feature vector based on a preset machine learning algorithm to obtain a classified risk value; when the classified risk value is larger than a preset high risk threshold, generating a lung cancer high risk prediction result corresponding to the target user; when the classified risk value is smaller than a preset low risk threshold, generating a lung cancer low risk prediction result corresponding to the target user; and when the classified risk value is not larger than the preset high risk threshold and not smaller than the preset low risk threshold, generating a lung cancer medium risk prediction result corresponding to the target user.
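
The combination and three-tier judgment steps of the first aspect can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the marker ordering and the default thresholds 0.3 and 0.7 are assumptions introduced here, and the classified risk value is passed in directly instead of being produced by a trained classifier.

```python
from typing import Mapping, Sequence

# Order of the methylation markers in the feature vector (assumed for illustration).
MARKERS = ("GRIK2", "HOXA9", "PTGER4", "SHOX2", "PITX2")


def build_feature_vector(clinical: Sequence[float],
                         methylation: Mapping[str, float]) -> list:
    """Concatenate the clinical features with the five methylation values."""
    return [float(x) for x in clinical] + [float(methylation[m]) for m in MARKERS]


def risk_tier(risk_value: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a classified risk value to a high, low or medium risk result.

    The default thresholds are placeholders, not values from the patent.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if risk_value > high:
        return "high"
    if risk_value < low:
        return "low"
    # Neither above the high threshold nor below the low threshold.
    return "medium"
```

Note that a risk value exactly equal to either threshold falls into the medium tier, which matches the "not larger than" and "not smaller than" wording of the method.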
Optionally, in a first implementation manner of the first aspect of the present invention, before the step of classifying the feature vector based on a preset machine learning algorithm to obtain a classified risk value, the method further includes: combining the clinical features, GRIK2 methylation values, HOXA9 methylation values, PTGER4 methylation values, SHOX2 methylation values and PITX2 methylation values of N training samples to generate N training feature vectors, wherein N is a positive integer; labeling the N training feature vectors to obtain labeling results corresponding to the N training feature vectors; classifying each of the N training feature vectors based on a preset machine learning algorithm to generate N predicted values; performing result judgment on the N predicted values according to a preset first hyperparameter and a preset second hyperparameter to obtain N prediction results, wherein the first hyperparameter comprises a parameter of the high risk threshold and the second hyperparameter comprises a parameter of the low risk threshold; calculating the accuracy of the N prediction results based on the labeling results corresponding to the N training feature vectors to obtain a test accuracy; and adjusting parameters of the machine learning algorithm, the first hyperparameter and the second hyperparameter based on the test accuracy, and performing iterative training until the test accuracy converges.
Optionally, in a second implementation manner of the first aspect of the present invention, the step of classifying the N training feature vectors based on a preset machine learning algorithm to generate N predicted values includes: performing logistic regression classification processing on