CN-121215309-B - Colorectal cancer drug sensitivity prediction method and system

Abstract

The invention relates to a method and system for predicting colorectal cancer drug sensitivity. The method comprises: performing whole-transcriptome sequencing on tumor tissue samples from colorectal cancer patients to obtain raw gene expression data; performing mapping analysis on the raw gene expression data to obtain gene expression level data; classifying and labeling drug sensitivity according to primary tumor cell drug sensitivity test results to form a drug sensitivity classification data set; constructing a drug sensitivity prediction model with a deep learning model from the gene expression level data and the drug sensitivity classification data set; and predicting the drug to be predicted with the drug sensitivity prediction model to obtain a drug sensitivity prediction result. Using colorectal cancer gene expression data together with drug sensitivity test data from the corresponding samples, the deep learning model predicts the sensitivity of colorectal cancer to drugs with high prediction accuracy and efficiency.

Inventors

  • XIA YUCHAO
  • LI SHAOLU
  • SUN CHANGHONG

Assignees

  • 北京基石生命科技有限公司

Dates

Publication Date
20260508
Application Date
20251128

Claims (6)

  1. A method for predicting colorectal cancer drug sensitivity, comprising: S1, performing whole-transcriptome sequencing on tumor tissue samples of colorectal cancer patients to obtain raw gene expression data; S2, performing mapping analysis on the raw gene expression data to obtain gene expression level data; S3, classifying and labeling drug sensitivity according to drug sensitivity test results of primary tumor cells to form a drug sensitivity classification data set, wherein the primary tumor cells are micro-tumors (PTC) obtained by a scaffold-free suspension three-dimensional culture method; the drug sensitivity classification data set comprises patient number, drug number, and drug sensitivity response classification information; and the drug sensitivity response classification information comprises labeling a drug response that kills 30% or more of the tumor cells as sensitive, represented by 0, and labeling a drug response that kills less than 30% of the tumor cells as drug-resistant, represented by 1; S4, constructing a drug sensitivity prediction model with a deep learning model from the gene expression level data and the drug sensitivity classification data set, wherein step S4 further comprises: S41, converting the chemical molecular formula of the drug into graph data to obtain a drug structure representation; S42, pre-training with a graph autoencoder mechanism based on TCGA gene expression data and a gene interaction network to obtain a gene expression feature representation; and S43, inputting the gene expression level data, the drug sensitivity classification data set, the drug structure representation, and the gene expression feature representation into a Transformer network for training to construct the drug sensitivity prediction model; and S5, predicting the drug to be predicted with the drug sensitivity prediction model to obtain a drug sensitivity prediction result.
  2. The method according to claim 1, wherein in step S1 the whole-transcriptome sequencing is performed on a NovaSeq X Plus platform, and at least 20G of raw gene expression data is collected for each sample.
  3. The method for predicting colorectal cancer drug sensitivity according to claim 1, wherein step S2 further comprises: S21, performing quality control on the raw gene expression data to obtain quality-controlled data; S22, mapping the quality-controlled data to the reference genome GRCh38 to obtain a gene mapping file; S23, calculating gene expression levels from the gene mapping file to obtain gene expression level data expressed as TPM values.
  4. The method for predicting colorectal cancer drug sensitivity according to claim 1, wherein step S5 further comprises: S51, obtaining gene expression data to be tested from a tumor tissue sample of a colorectal cancer patient to be tested; S52, processing the gene expression data to be tested to obtain gene expression level data to be tested, expressed as TPM values; S53, inputting the gene expression level data to be tested and the structure representation of the corresponding drug to be predicted into the drug sensitivity prediction model, which outputs a drug sensitivity prediction result.
  5. The method according to claim 4, wherein the drug to be predicted in step S5 comprises: fluorouracil, oxaliplatin, epirubicin, irinotecan, and paclitaxel.
  6. A colorectal cancer drug sensitivity prediction system, comprising: a sequencing module, configured as a sequencing platform, for performing whole-transcriptome sequencing on tumor tissue samples of colorectal cancer patients to obtain raw gene expression data; a mapping module for performing mapping analysis on the raw gene expression data to obtain gene expression level data; a labeling module for classifying and labeling drug sensitivity according to drug sensitivity test results of primary tumor cells to form a drug sensitivity classification data set, wherein the primary tumor cells are micro-tumors (PTC) obtained by a scaffold-free suspension three-dimensional culture method; the drug sensitivity classification data set comprises patient number, drug number, and drug sensitivity response classification information; and the drug sensitivity response classification information comprises labeling a drug response that kills 30% or more of the tumor cells as sensitive, represented by 0, and labeling a drug response that kills less than 30% of the tumor cells as drug-resistant, represented by 1; a construction module for constructing a drug sensitivity prediction model with a deep learning model from the gene expression level data and the drug sensitivity classification data set, wherein the construction module is further used for converting the chemical molecular formula of the drug into graph data to obtain a drug structure representation, pre-training with a graph autoencoder mechanism based on TCGA gene expression data and a gene interaction network to obtain a gene expression feature representation, and inputting the gene expression level data, the drug sensitivity classification data set, the drug structure representation, and the gene expression feature representation into a Transformer network for training to construct the drug sensitivity prediction model; and a prediction module, configured as the drug sensitivity prediction model constructed by the construction module, for predicting the drug to be predicted to obtain a drug sensitivity prediction result.
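
The labeling rule stated in claims 1 and 6 can be illustrated with a minimal Python sketch. The names `SensitivityRecord`, `label_response`, and `build_dataset`, and the sample identifiers in the usage note, are hypothetical and do not appear in the patent; the sketch also assumes the tumor cell kill rate is supplied as a percentage.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Label encoding taken from the claims: 0 = sensitive, 1 = drug-resistant.
SENSITIVE = 0
RESISTANT = 1
KILL_RATE_THRESHOLD = 30.0  # percent of tumor cells killed


@dataclass(frozen=True)
class SensitivityRecord:
    """One row of the drug sensitivity classification data set (hypothetical layout)."""
    patient_id: str
    drug_id: str
    label: int


def label_response(kill_rate_percent: float) -> int:
    """Return 0 (sensitive) if the kill rate is >= 30%, otherwise 1 (resistant)."""
    if not 0.0 <= kill_rate_percent <= 100.0:
        raise ValueError("kill rate must be a percentage between 0 and 100")
    return SENSITIVE if kill_rate_percent >= KILL_RATE_THRESHOLD else RESISTANT


def build_dataset(
    assay_results: Iterable[Tuple[str, str, float]]
) -> List[SensitivityRecord]:
    """Convert (patient_id, drug_id, kill_rate_percent) tuples into labeled records."""
    return [
        SensitivityRecord(patient_id, drug_id, label_response(kill_rate))
        for patient_id, drug_id, kill_rate in assay_results
    ]
```

For example, `build_dataset([("P001", "D01", 45.0), ("P001", "D02", 12.5)])` yields one record labeled 0 (sensitive) and one labeled 1 (drug-resistant). A kill rate of exactly 30% falls on the sensitive side, matching the "30% or more" wording of the claims.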

Description

Colorectal cancer drug sensitivity prediction method and system
Technical Field
The invention relates to the technical field of bioinformatics, and in particular to a colorectal cancer drug sensitivity prediction method and system.
Background
Tumor drug sensitivity prediction is an important research direction in precision medicine. It can help doctors select the most appropriate drug treatment scheme for each patient, improving the treatment effect and reducing unnecessary side effects. In recent years, deep learning approaches to predicting sensitivity to anticancer drugs have advanced, and current approaches can predict the sensitivity of different cell lines to anticancer drugs using the CCLE (Cancer Cell Line Encyclopedia) data set. However, current drug sensitivity prediction methods are all based on multi-omics data from tumor cell lines, which do not reflect the actual condition of tumors in patients well, so the prediction results are inaccurate.
Disclosure of Invention
The invention provides a colorectal cancer drug sensitivity prediction method and system to address the shortcomings of the prior art.
The invention provides a colorectal cancer drug sensitivity prediction method, comprising the following steps: S1, performing whole-transcriptome sequencing on tumor tissue samples of colorectal cancer patients to obtain raw gene expression data; S2, performing mapping analysis on the raw gene expression data to obtain gene expression level data; S3, classifying and labeling drug sensitivity according to drug sensitivity test results of primary tumor cells to form a drug sensitivity classification data set; S4, constructing a drug sensitivity prediction model with a deep learning model from the gene expression level data and the drug sensitivity classification data set; S5, predicting the drug to be predicted with the drug sensitivity prediction model to obtain a drug sensitivity prediction result.
According to the method provided by the invention, in step S1 the whole-transcriptome sequencing is performed on a NovaSeq X Plus platform, and at least 20G of raw gene expression data is collected for each sample.
According to the method provided by the invention, step S2 further comprises: S21, performing quality control on the raw gene expression data to obtain quality-controlled data; S22, mapping the quality-controlled data to the reference genome GRCh38 to obtain a gene mapping file; S23, calculating gene expression levels from the gene mapping file to obtain gene expression level data expressed as TPM values.
According to the method provided by the invention, the drug sensitivity classification data set in step S3 comprises: patient number, drug number, and drug sensitivity response classification information.
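Step S23 expresses gene expression levels as TPM (transcripts per million) values. The patent does not say which tool or formula is used, so the following Python sketch only illustrates the standard TPM definition: each gene's read count is divided by its length in kilobases, and the resulting rates are scaled to sum to one million. The function name `counts_to_tpm` and the gene names in the usage note are hypothetical.

```python
from typing import Dict


def counts_to_tpm(
    counts: Dict[str, float], lengths_bp: Dict[str, float]
) -> Dict[str, float]:
    """Convert per-gene read counts to TPM using gene lengths in base pairs.

    Assumes `counts` and `lengths_bp` share the same gene keys; this is an
    illustration of the usual TPM formula, not the patent's own pipeline.
    """
    rates = {}
    for gene, count in counts.items():
        length = lengths_bp[gene]
        if length <= 0:
            raise ValueError(f"gene length must be positive: {gene}")
        # Reads per kilobase of gene length.
        rates[gene] = count / (length / 1000.0)

    total_rate = sum(rates.values())
    if total_rate == 0:
        # No reads at all: every gene has zero expression.
        return {gene: 0.0 for gene in counts}

    # Scale so that the values of one sample sum to one million.
    return {gene: rate / total_rate * 1_000_000 for gene, rate in rates.items()}
```

Calling `counts_to_tpm({"GENE_A": 10, "GENE_B": 20}, {"GENE_A": 1000, "GENE_B": 2000})` gives both genes the same TPM, because the gene with twice the reads is also twice as long. Length normalization of this kind is what makes values comparable across genes within a sample.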
According to the method provided by the invention, the drug sensitivity response classification information comprises: labeling a drug response that kills 30% or more of the tumor cells as sensitive, represented by 0; and labeling a drug response that kills less than 30% of the tumor cells as drug-resistant, represented by 1.
According to the method provided by the invention, the primary tumor cells in step S3 are micro-tumors (PTC) obtained by a scaffold-free suspension three-dimensional culture method.
According to the method provided by the invention, step S4 further comprises: S41, converting the chemical molecular formula of the drug into graph data to obtain a drug structure representation; S42, pre-training with a graph autoencoder mechanism based on TCGA gene expression data and a gene interaction network to obtain a gene expression feature representation; S43, inputting the gene expression level data, the drug sensitivity classification data set, the drug structure representation, and the gene expression feature representation into a Transformer network for training to construct the drug sensitivity prediction model.
According to the method provided by the invention, step S5 further comprises: S51, obtaining gene expression data to be tested from a tumor tissue sample of a colorectal cancer patient to be tested; S52, processing the gene expression data to be tested to obtain gene expression level data to be tested, expressed as TPM values; S53, inputting the gene expres