CN-122017106-A - Cancer prediction system and method based on metabonomics mass spectrum data parameterization

CN122017106A

Abstract

The invention provides a cancer prediction system based on metabonomics mass spectrum data parameterization, comprising a sample acquisition module for acquiring a detection sample, a mass spectrum detection module for performing mass spectrum scanning, a mass spectrum data processing module, and a data classification module. The mass spectrum data processing module selectively sums the mass spectrum signals, then fits and classifies the summed signals to obtain metabolite pathway characteristics of the detection sample. The data classification module uses a machine learning algorithm to construct a prediction model from the metabolite pathway characteristics and obtain a final prediction result. The prediction system can improve prediction accuracy and addresses the low accuracy of cancer prediction from the serum metabolome.

Inventors

  • XU TENGFEI
  • WANG YICHEN
  • LIU XUESONG
  • CHEN YONG
  • QIU JUNJIE
  • WANG ZICHUAN

Assignees

  • Zhejiang University (浙江大学)

Dates

Publication Date
2026-05-12
Application Date
2026-04-08

Claims (9)

  1. A cancer prediction system based on metabonomics mass spectrum data parameterization, characterized by comprising a sample acquisition module, a mass spectrum detection module, a mass spectrum data processing module, and a data classification module;
     the sample acquisition module is used for acquiring a detection sample;
     the mass spectrum detection module is used for performing mass spectrum scanning on the detection sample to obtain mass spectrum signals;
     the mass spectrum data processing module is used for selectively summing the mass spectrum signals, then fitting and classifying the summed mass spectrum signals to obtain metabolite pathway characteristics of the detection sample, specifically comprising the following steps:
     S1-1, calculating the mass-to-charge ratios (m/z) of the adduct ions of all metabolites in the biological pathway;
     S1-2, summing the mass spectrum signals obtained by the mass spectrum detection module within a window centered on each mass-to-charge ratio m/z with a radius of 0-50 ppm, to obtain summed mass spectrum signals;
     S1-3, performing radial basis function fitting on the summed mass spectrum signals using an improved expectation-maximization (EM) algorithm to obtain fitted metabolite characteristic parameters;
     S1-4, classifying the fitted metabolite characteristic parameters according to the biological pathway to obtain the metabolite pathway characteristics;
     and the data classification module uses a machine learning algorithm to construct a prediction model from the metabolite pathway characteristics, so as to obtain a final prediction result.
  2. The cancer prediction system based on metabonomics mass spectrum data parameterization of claim 1, wherein the detection sample comprises one or more of a blood sample, a tissue sample, or a urine sample.
  3. The cancer prediction system based on metabonomics mass spectrum data parameterization of claim 1, wherein the detection sample comprises a tumor sample from one or more of colon cancer, cervical cancer, ovarian cancer, prostate cancer, renal cell carcinoma, benign tumor, and clear cell renal cell carcinoma.
  4. The cancer prediction system based on metabonomics mass spectrum data parameterization of claim 1, wherein the mass spectrum data processing module normalizes the retention times of the mass spectrum signals to 0-30 minutes before selectively summing the mass spectrum signals.
  5. The cancer prediction system based on metabonomics mass spectrum data parameterization of claim 1, wherein performing radial basis function fitting on the summed mass spectrum signals using the improved expectation-maximization (EM) algorithm to obtain the fitted metabolite characteristic parameters comprises the following steps:
     S2-1, sampling the center parameter c of the radial basis function of each metabolite mass spectrum signal according to the summed mass spectrum signal intensity;
     S2-2, initializing the width parameters of the radial basis functions;
     S2-3, obtaining the weight parameters of the radial basis functions by multiple linear regression, and optimizing the width parameters initialized in S2-2 by gradient descent, with the width parameters constrained to the range 5 to 60;
     repeating steps S2-2 and S2-3 1-5 times to obtain the width parameters of the radial basis function of each metabolite mass spectrum signal, and fitting the mass spectrum signal of each metabolite with the weight parameters obtained in step S2-3 to obtain the fitted metabolite characteristic parameters.
  6. The cancer prediction system based on metabonomics mass spectrum data parameterization of claim 1, wherein the mass spectrum data processing module calculates the ppm of a metabolite as follows: for metabolites with a mass-to-charge ratio m/z greater than 400, 1 ppm = (m/z)/1,000,000; for metabolites with a mass-to-charge ratio m/z less than or equal to 400, 1 ppm = 400/1,000,000.
  7. The cancer prediction system based on metabonomics mass spectrum data parameterization of claim 1, wherein the machine learning algorithm used by the data classification module comprises any one or more of extreme gradient boosting (XGBoost), support vector machine, random forest, and LightGBM models built with Python's sklearn package, or any one or more of graph attention network, graph convolutional network, and Transformer models built with Python's pytorch package.
  8. A cancer prediction method based on metabonomics mass spectrum data parameterization, comprising the following steps:
     Step 1, aligning the mass spectrum scanning time to a reference, and performing mass spectrum scanning on a detection sample to obtain mass spectrum signals;
     Step 2, calculating the mass-to-charge ratios (m/z) of the adduct ions of all metabolites in the biological pathway;
     Step 3, summing the mass spectrum signals obtained in Step 1 within a window centered on each mass-to-charge ratio m/z from Step 2 with a radius of 0-50 ppm, to obtain summed mass spectrum signals;
     Step 4, performing radial basis function fitting on the summed mass spectrum signals using an improved expectation-maximization (EM) algorithm to obtain fitted metabolite characteristic parameters;
     Step 5, classifying the fitted metabolite characteristic parameters according to the biological pathway to obtain metabolite pathway characteristics;
     Step 6, constructing a prediction model from the metabolite pathway characteristics using a machine learning algorithm to obtain a final prediction result.
  9. The method of claim 8, wherein the mass spectrum signals in Step 1 are derived from the mass spectrometry data of one or more of the datasets MTBLS6039, MTBLS3444, MTBLS1122, MTBLS3838, ST001705, ST002521, and MTBLS4294.
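
The ppm rule of claim 6, the windowed summation of step S1-2, and the alternating fit of steps S2-1 to S2-3 can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names (`ppm_unit`, `sum_window`, `gaussian_basis`, `fit_rbf`), the Gaussian form of the radial basis function, the 10 ppm default radius, the step size, the iteration counts, and the max-scaling of the trace are all assumptions made for illustration. The claims do not state the unit of the width parameter, so the sketch assumes it shares the unit of the time axis (for example, a scan index). The "improved" expectation-maximization algorithm is not specified in the claims, so a plain alternation between least-squares weights and clipped gradient descent on the width stands in for it.

```python
import numpy as np

PPM_FLOOR_MZ = 400.0  # claim 6: at or below this m/z, 1 ppm is fixed at 400/1e6


def ppm_unit(mz):
    """Absolute m/z width of 1 ppm under the rule stated in claim 6."""
    return max(mz, PPM_FLOOR_MZ) / 1_000_000


def sum_window(mz_axis, intensities, target_mz, radius_ppm=10.0):
    """Step S1-2: sum intensities within radius_ppm of target_mz, per scan.

    mz_axis has shape (n_mz,); intensities has shape (n_scans, n_mz).
    """
    half_width = radius_ppm * ppm_unit(target_mz)
    mask = np.abs(mz_axis - target_mz) <= half_width
    return intensities[:, mask].sum(axis=1)


def gaussian_basis(t, centers, width):
    """Gaussian basis matrix (assumed RBF form) and the t-minus-center offsets."""
    diff = t[:, None] - centers[None, :]
    return np.exp(-diff**2 / (2.0 * width**2)), diff


def fit_rbf(t, trace, n_centers=3, width=20.0, n_rounds=3,
            gd_steps=20, lr=1.0, seed=0):
    """Simplified alternating fit loosely following steps S2-1 to S2-3.

    Assumes a non-negative trace with at least n_centers non-zero points.
    Returns centers, width, weights, and the fit to the max-scaled trace.
    """
    rng = np.random.default_rng(seed)
    y = trace / trace.max()  # scale to [0, 1]; a choice made for this sketch
    # S2-1: sample centers with probability proportional to intensity.
    centers = rng.choice(t, size=n_centers, replace=False, p=y / y.sum())
    for _ in range(n_rounds):
        # S2-3, part 1: weights by linear least squares at the current width.
        phi, diff = gaussian_basis(t, centers, width)
        weights = np.linalg.lstsq(phi, y, rcond=None)[0]
        # S2-3, part 2: gradient descent on the width, clipped to [5, 60].
        for _ in range(gd_steps):
            phi, diff = gaussian_basis(t, centers, width)
            resid = phi @ weights - y
            # Gradient of 0.5 * sum(resid**2) with respect to the shared width.
            grad = float(resid @ ((phi * diff**2 / width**3) @ weights))
            width = float(np.clip(width - lr * grad, 5.0, 60.0))
    phi, _ = gaussian_basis(t, centers, width)
    weights = np.linalg.lstsq(phi, y, rcond=None)[0]  # final refit of weights
    return centers, width, weights, phi @ weights
```

In this sketch, `sum_window` would be called once per adduct m/z from step S1-1, and each resulting per-scan trace passed to `fit_rbf`; the returned centers, width, and weights play the role of the fitted metabolite characteristic parameters. A single width shared by all basis functions of one trace is a simplification, and clipping yields the closed interval [5, 60].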

Description

Cancer prediction system and method based on metabonomics mass spectrum data parameterization

Technical Field

The invention relates to the technical field of medical model construction, in particular to a cancer prediction system and method based on metabonomics mass spectrum data parameterization.

Background

Malignant tumors are common diseases. Existing imaging examinations (CT/MRI) are insufficiently sensitive (< 50%) to sub-centimeter lesions, and tissue biopsies are invasive and poorly suited to dynamic monitoring. Developing highly sensitive, non-invasive early diagnosis techniques has become a central need of precision oncology.

CN118604104A discloses a mass spectrometry system for endometrial cancer detection based on multi-module machine learning. It obtains metabolic and polypeptide molecular fingerprint information of a detection sample using a liquid chromatography tandem mass spectrometer (LC-MS) and identifies cancer-related characteristic information by combining several machine learning methods; it then constructs classifiers for identifying endometrial cancer from that characteristic information and several machine learning methods; finally, it integrates the judgments of the independent models to obtain a final identification result. That invention moves beyond the traditional approach of relying on a single biomarker, a single sample, or a single omics module, and effectively improves diagnostic performance for endometrial cancer, colorectal cancer, lung cancer, and other cancer types.

CN118280558A discloses a method and system for diagnosing and analyzing cancer based on mass spectrum data, intended to solve the problem that, when a black-box model is used as the diagnosis system, doctors can hardly understand why the model predicts a specific result. Specifically, mass spectrum scanning files in the same folder are read and converted to mass spectrum data by a file preprocessing algorithm, and the parameterized mass spectrum data are organized into csv format for use. An extreme gradient boosting algorithm is pre-trained on the mass spectrum data using a Bayesian optimization algorithm, the optimal parameters are selected from several pre-training results, and the parameters of the extreme gradient boosting algorithm are adjusted to the retained optimal parameters. The pre-trained model then performs diagnostic classification on mass spectrum data from healthy subjects and cancer patients and gives a diagnosis result. The diagnosis process of the model is analyzed with the SHAP algorithm, which attributes the result to features in the mass spectrum data and outputs the thirty features with the greatest influence on the diagnosis result.

Metabonomics directly reflects the pathophysiological state of the organism by systematically analyzing small-molecule metabolites of 1500 Da or less in biological fluids (serum/urine). Compared with genome and proteome technologies, it offers real-time dynamic monitoring, easy sample acquisition, and controllable cost.

However, three technical bottlenecks remain in the field. First, marker screening is difficult: body fluids contain more than ten thousand types of metabolites with concentrations spanning 9 orders of magnitude, and traditional univariate analysis struggles to distinguish disease-specific fluctuation from individual physiological variation. Second, models generalize poorly: the area under the curve (AUC) of existing machine learning models (such as support vector machines (SVM) and random forests) generally drops by 0.15-0.2 in cross-center validation, owing to insufficient batch effect correction and the lack of modeling of nonlinear metabolic interactions. Third, there are barriers to clinical translation: most studies are limited to a single cancer type, lack multi-cancer differential diagnosis capability, and have not established a quantitative correlation model with TNM stage (tumor-node-metastasis classification).

Disclosure of Invention

To address the systematic errors introduced by the metabolite annotation that traditional metabonomics requires, the invention provides a cancer prediction system based on metabonomics mass spectrum data parameterization. The prediction system can improve prediction accuracy and addresses the low accuracy of cancer prediction from serum metabonomics.

To achieve the above purpose, the invention adopts the following technical scheme: a cancer prediction system based on metabonomics mass spectrum data parameterization comprises a sample acquisition module, a mass spectrum detection module, a mass spectrum data processing module, and a data classification module; the sample acquisition module is used for acquiring a detection sample; the mass spectrum detection module is used for performing mass spectrum scanning on a dete