CN-121986175-A - Methods and compositions for analyzing cell-free biomarkers
Abstract
In various aspects, the disclosure provides methods for detecting a plurality of cancer types, including measuring the levels of one or more target molecules in a sample. In some embodiments, the one or more target molecules comprise cell-free DNA (cfDNA) from a plurality of different target genomic regions that are differentially methylated in at least one of the plurality of cancer types, and a plurality of different polypeptides that are differentially expressed in at least one of the plurality of cancer types. Methods for training a classifier to detect target molecules from cancer are also provided.
Inventors
- David H. Burkhardt
- Ma Xiu.h.lasen
- rose. E. Muntz Bella
- Samuel D. Mesirov
- Kennon Su
- Christopher James. A. V. Yakim
Assignees
- GRAIL (格里尔公司)
Dates
- Publication Date
- 20260505
- Application Date
- 20240823
- Priority Date
- 20240202
Claims (20)
- 1. A method of detecting cancer in a subject, comprising: (a) measuring the level of first target molecules from a first sample of the subject, wherein the first target molecules comprise cell-free DNA (cfDNA) from a plurality of different target genomic regions that are differentially methylated in at least one of a plurality of cancer types; (b) measuring the level of second target molecules from a second sample of the subject, wherein the second target molecules comprise a plurality of different polypeptides that are differentially expressed in at least one of the plurality of cancer types; (c) applying a trained classifier to the measured levels of the first and second target molecules to assign an overall probability score for the cancer, wherein applying the trained classifier comprises (i) applying a first trained model to the measured levels of the first target molecules to assign a first probability score for the cancer, (ii) applying a second trained model to the measured levels of the second target molecules to assign a second probability score for the cancer, and (iii) aggregating the first probability score and the second probability score; and (d) detecting the cancer by identifying that the overall probability score is above a threshold for the presence of the cancer.
- 2. The method of claim 1, wherein the trained classifier is trained on reference samples from (1) reference subjects with known cancer and (2) reference subjects without cancer, using a reference first probability score from the first trained model, a reference second probability score from the second trained model, and a reference overall probability score that aggregates the reference first probability score and the reference second probability score.
- 3. The method of claim 1, wherein the trained classifier assigns an overall probability score to each of a plurality of different cancer types, and detecting the cancer comprises identifying the cancer type as the cancer type having the highest overall probability score.
- 4. The method of claim 1, wherein aggregating the first probability score and the second probability score comprises calculating a product of the first probability score and the second probability score for the cancer.
- 5. The method of claim 1, wherein the first sample and the second sample are the same.
- 6. The method of claim 1, wherein the plurality of different target genomic regions comprises at least 1000, 5000, 10000, 20000, or 30000 target genomic regions.
- 7. The method of claim 1, wherein the plurality of target genomic regions have a total aggregate length of at least 50 kb, 100 kb, 500 kb, or 1000 kb.
- 8. The method of claim 1, wherein each of the plurality of different target genomic regions comprises at least five methylation sites.
- 9. The method of claim 1, wherein measuring the first target molecules comprises sequencing the converted cfDNA or amplified products thereof from the plurality of different target genomic regions, wherein the converted cfDNA comprises cfDNA treated with a deaminating agent.
- 10. The method of claim 9, further comprising treating the cfDNA with the deaminating agent, optionally wherein the deaminating agent is cytosine deaminase or bisulfite.
- 11. The method of claim 9, wherein the sequencing produces at least 100,000 sequencing reads.
- 12. The method of claim 8, wherein measuring the first target molecules comprises enriching the converted cfDNA or amplified products thereof to produce an enriched polynucleotide sample.
- 13. The method of claim 12, wherein the enriching comprises capturing the converted cfDNA or amplified products thereof with a plurality of corresponding bait oligonucleotides.
- 14. The method of claim 13, wherein the plurality of different target genomic regions for enrichment by the bait oligonucleotides are genomic regions identified by the first trained model as differentially methylated in at least one of a plurality of cancer types relative to non-cancer tissue or relative to different types of cancer.
- 15. The method of claim 1, wherein the plurality of different polypeptides comprises at least 5, 10, 25, 50, 100, 200, 500, 1000, 2000, 3000, 5000, or 7500 different polypeptides.
- 16. The method of claim 15, wherein the plurality of different polypeptides comprises (a) a polypeptide selected from the group consisting of the proteins of List 1, (b) a polypeptide selected from the group consisting of the proteins of any one of Lists 2-19, (c) a polypeptide selected from the group consisting of the proteins of List 20, or (d) one or more of CHAD, KRT19, MMP12, PTN, SERPINA3, and SPP1.
- 17. The method of claim 1, wherein the trained classifier distinguishes between subjects with cancer and subjects without cancer with a specificity defined for each of the plurality of cancer types.
- 18. The method of claim 1, wherein the trained classifier has a higher sensitivity for cancer detection than each of the first trained model and the second trained model, optionally wherein the trained classifier has a specificity for cancer detection that is equal to or greater than that of each of the first trained model and the second trained model.
- 19. The method of claim 1, wherein the trained classifier is a binary classifier, a mixture model classifier, a multi-layer perceptron model classifier, or a logistic regression classifier.
- 20. The method of claim 1, wherein the first trained model and/or the second trained model is a binary classifier, a mixture model classifier, a multi-layer perceptron model classifier, or a logistic regression classifier.
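The score combination recited in claims 1, 3, and 4 can be sketched as follows. This is a minimal illustration and not the applicant's implementation: the function names, the `method` parameter, and the cancer-type labels in the usage note are hypothetical. Sum and product are shown as two possible aggregation rules (the product is the one recited in claim 4), and the threshold is assumed to be chosen on the same scale as the aggregated score.

```python
def aggregate_scores(first_score, second_score, method="sum"):
    """Combine the cfDNA-methylation score and the polypeptide score
    into one overall score (hypothetical helper)."""
    if method == "sum":
        return first_score + second_score
    if method == "product":
        return first_score * second_score
    raise ValueError(f"unknown aggregation method: {method!r}")


def detect_cancer(first_score, second_score, threshold, method="sum"):
    """Return (detected, overall_score); detected is True when the
    overall score is above the threshold."""
    overall = aggregate_scores(first_score, second_score, method)
    return overall > threshold, overall


def call_cancer_type(overall_by_type):
    """Given {cancer_type: overall_score}, return the cancer type
    with the highest overall score."""
    return max(overall_by_type, key=overall_by_type.get)
```

For example, `detect_cancer(0.25, 0.5, threshold=0.6)` flags the sample because the summed score 0.75 exceeds 0.6, while `call_cancer_type({"lung": 0.4, "colorectal": 0.9})` selects `"colorectal"`. Note that a summed score can exceed 1, so it should be read as a score rather than a calibrated probability.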
Description
Methods and compositions for analyzing cell-free biomarkers
Cross reference
The present application claims the benefit of U.S. Provisional Application No. 63/578,347, filed on August 23, 2023, and U.S. Provisional Application No. 63/549,406, filed on February 2, 2024, each of which is incorporated herein by reference in its entirety for all purposes.
Technical field and background art
Cancer is a prominent global public health problem. Screening programs and early diagnosis are important for improving disease-free survival and reducing mortality in cancer patients. Because non-invasive methods for early diagnosis promote patient compliance, they can be incorporated into screening programs.
DNA methylation plays an important role in regulating gene expression. Abnormal DNA methylation has been implicated in a number of disease processes, including cancer. DNA methylation profiling using methylation sequencing, such as whole genome bisulfite sequencing (WGBS), is increasingly regarded as an important diagnostic tool for detecting, diagnosing, and/or monitoring cancer. For example, specific patterns of differentially methylated regions can be used as molecular markers for various diseases. However, WGBS is not ideally suited for a product assay: because most of the genome is either not differentially methylated in cancer or has a local CpG density too low to provide a robust signal, only a few percent of the genome may be useful for classification.
Cancer remains a common cause of death worldwide. Treatment options have improved over the past few decades, but survival rates remain low. The success of treatment by surgical resection and drug-based methods depends strongly on identifying tumors early. However, many current detection methods fail to identify tumors before the more advanced stages of the disease.
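To make the methylation readout discussed above concrete, the following sketch shows how bisulfite-type conversion turns methylation state into a sequence difference: unmethylated cytosines read as thymine after conversion and amplification, while methylated cytosines remain cytosine. It also counts CpG sites in a region, the quantity behind the local CpG density remark. The function names, the 0-based position convention, and the assumption that the read is already aligned to the reference at the same length are illustrative choices, not details from the disclosure.

```python
def count_cpg_sites(seq):
    """Count CpG dinucleotides (a C immediately followed by a G)."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")


def bisulfite_convert(seq, methylated_positions):
    """Simulate conversion of one strand.

    Unmethylated C is read as T; methylated C stays C.
    methylated_positions holds 0-based indices of methylated cytosines.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def methylation_calls(reference, converted_read):
    """Map each CpG position in the reference to True if the aligned
    read retains C there (methylated), else False."""
    ref = reference.upper()
    read = converted_read.upper()
    return {
        i: read[i] == "C"
        for i in range(len(ref) - 1)
        if ref[i:i + 2] == "CG"
    }
```

With the toy sequence `"ACGTCGCA"`, which has CpG sites at positions 1 and 4, marking only position 1 as methylated gives the converted read `"ACGTTGTA"`, and `methylation_calls` recovers `{1: True, 4: False}`.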
Disclosure of Invention
In view of the above, there remains a need for a non-invasive test that can identify disease at the earliest stage, when therapeutic intervention has a greater chance of success. Aspects of the present disclosure address this need and provide other advantages as well. For example, some aspects provided herein relate to methods that combine polypeptide detection with assessment of methylation patterns in cell-free DNA (cfDNA) to detect cancer-related biomarkers in a sample from a subject in a non-invasive, cost-effective manner. By analyzing data on both cfDNA methylation patterns and polypeptide levels, a higher sensitivity for detecting cancer biomarkers can be achieved, with a specificity equal to or greater than that of either analyte alone. This improvement in detection allows the identification of true-positive cancer samples that may be missed by analysis of either analyte alone.
In one aspect, the present disclosure provides a method of detecting cancer in a subject. In some embodiments, the method of detecting cancer in a subject includes (a) measuring the level of first target molecules from a first sample of the subject, (b) measuring the level of second target molecules from a second sample of the subject, (c) applying a trained classifier to the measured levels of the first target molecules and the second target molecules to assign an overall probability score to the cancer, and (d) detecting the cancer by identifying that the overall probability score is above a threshold for the presence of the cancer. In some embodiments, the first target molecules comprise cfDNA from a plurality of different target genomic regions that are differentially methylated in at least one of a plurality of cancer types. In some embodiments, the second target molecules comprise a plurality of different polypeptides that are differentially expressed in at least one of the plurality of cancer types.
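The sensitivity and specificity comparison described above can be made concrete with a short sketch. It assumes binary ground-truth labels (True for cancer) and a score threshold applied as "above threshold means detected". The function name is hypothetical, and any inputs passed to it here are placeholders rather than data from the disclosure.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Return (sensitivity, specificity) for calls made at a threshold.

    scores: per-sample scores from a model or classifier.
    labels: True for samples from subjects with cancer, else False.
    A sample is called positive when its score is above the threshold.
    """
    tp = fn = tn = fp = 0
    for score, has_cancer in zip(scores, labels):
        called = score > threshold
        if has_cancer and called:
            tp += 1
        elif has_cancer:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    # Guard against empty classes so the function never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

Evaluating the first trained model, the second trained model, and the combined classifier on the same reference samples, each at a threshold giving matched specificity, lets their sensitivities be compared directly.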
In some embodiments, applying the trained classifier includes (i) applying a first trained model to the measured levels of the first target molecules to assign a first probability score to the cancer, (ii) applying a second trained model to the measured levels of the second target molecules to assign a second probability score to the cancer, and (iii) aggregating the first probability score and the second probability score. In some embodiments, the first sample and the second sample are the same. In some embodiments, the trained classifier is trained on reference samples from (1) reference subjects with known cancer and (2) reference subjects without cancer, using a reference first probability score from the first trained model, a reference second probability score from the second trained model, and a reference overall probability score that aggregates the reference first probability score and the reference second probability score. In some embodiments, the trained classifier assigns an overall probability score to each of a plurality of different cancer types, and detecting the cancer includes identifying the cancer type as the cancer type with the highest overall probability