US-20260128123-A1 - METHOD FOR DETECTING PATIENTS WITH SYSTEMATICALLY UNDER-ESTIMATED TUMOR MUTATIONAL BURDEN WHO MAY BENEFIT FROM IMMUNOTHERAPY
Abstract
Methods for more accurately determining tumor mutational burden (TMB) based on sequence read data for a sample from a subject are described. The methods may comprise, for example, receiving sample data comprising tumor purity data, variant data, variant allele fraction (VAF) data, or any combination thereof, for a sample from a subject; providing the sample data as input to a machine learning model configured to classify the sample according to TMB status based on the input sample data; and outputting a classification of the TMB status of the sample.
Inventors
- Brennan DECKER
- Zoe R. Fleischmann
- Rachel Beth EVANS
- Douglas A. MATA
Assignees
- FOUNDATION MEDICINE, INC.
Dates
- Publication Date: 20260507
- Application Date: 20251007
Claims (20)
- 1. A method for classifying a sample from a subject according to tumor mutational burden (TMB) status, the method comprising: receiving, at one or more processors, sample data comprising tumor purity data, variant data, variant allele fraction (VAF) data, or any combination thereof, for the sample from the subject; providing, using the one or more processors, the sample data as input to a machine learning model configured to classify the sample according to TMB status based on the input sample data; and outputting, using the one or more processors, a classification of the TMB status of the sample.
- 2. The method of claim 1, wherein the classification of the TMB status comprises classification of the sample as being TMB-High or TMB-Low.
- 3. The method of claim 1, wherein the classification of the TMB status comprises classification of the sample as being TMB-High, TMB-Low, or TMB-Indeterminate.
- 4. A method for determining a tumor mutational burden (TMB) status for a sample from a subject, the method comprising: receiving, at one or more processors, sample data comprising tumor purity data, variant data, variant allele fraction (VAF) data, or any combination thereof, for the sample from the subject; providing, using the one or more processors, the sample data as input to a machine learning model configured to determine the TMB of the sample based on the input sample data; and outputting, using the one or more processors, a determination of the TMB of the sample.
- 5. The method of claim 4, wherein the determination of TMB further comprises a determination of a 95% confidence interval for the TMB of the sample.
- 6. The method of claim 1, wherein the variant data comprises short variant data.
- 7. The method of claim 1, wherein the variant data comprises genomic rearrangement data.
- 8. The method of claim 1, wherein the sample data further comprises mutational signature data.
- 9. The method of claim 1, wherein the sample data further comprises copy number alteration (CNA) data.
- 10. The method of claim 9, wherein the copy number alteration (CNA) data comprises copy number signature data.
- 11. The method of claim 1, wherein the sample data further comprises digital histopathology data.
- 12. The method of claim 1, wherein the machine learning model comprises a random forest model, a logistic regression model, a support vector machine (SVM), an XGBoost model, or a neural network.
- 13. The method of claim 1, wherein the machine learning model is trained using at least one training data set comprising sample data for a set of training samples from a cohort of cancer patients.
- 14. The method of claim 13, wherein the training data set comprises sample data for training samples having a tumor purity of at least 10%, 15%, 20%, 25%, or 30%.
- 15. The method of claim 1, further comprising processing sequence read data for a plurality of sequence reads obtained from the sample from the subject to generate the sample data.
- 16. The method of claim 1, wherein the sample comprises a tissue biopsy sample, a liquid biopsy sample, or a normal control.
- 17. The method of claim 16, wherein the sample is a liquid biopsy sample and comprises blood, plasma, cerebrospinal fluid, sputum, stool, urine, or saliva.
- 18. The method of claim 16, wherein the sample is a liquid biopsy sample and comprises circulating tumor cells (CTCs).
- 19. The method of claim 16, wherein the sample is a liquid biopsy sample and comprises cell-free DNA (cfDNA), circulating tumor DNA (ctDNA), or any combination thereof.
- 20. The method of claim 1, wherein the subject has been diagnosed with cancer.
Description
CROSS-REFERENCE TO RELATED APPLICATION
This application is a continuation application of International Application No. PCT/US2024/022232, filed internationally on Mar. 29, 2024, which claims the priority benefit of U.S. Provisional Patent Application Ser. No. 63/458,809, filed Apr. 12, 2023, the disclosures of which are herein incorporated by reference in their entirety.
FIELD OF THE INVENTION
The present disclosure relates generally to methods and systems for analyzing genomic profiling data, and more specifically to methods and systems for: (i) identifying subjects for whom tumor mutational burden is systematically underestimated using genomic profiling data, and/or (ii) making more accurate determinations of TMB in those cases.
BACKGROUND
Tumor mutational burden (TMB) is a complex biomarker that quantifies the number of mutations in a sample from a subject (e.g., the number of non-synonymous somatic mutations per megabase in coding regions of the genome) that may contribute to the immunogenicity of a tumor. TMB is therefore considered a predictor of a subject's (e.g., a cancer patient's) response to immune checkpoint inhibitor (ICPI) therapy. Determination of tumor mutational burden based on DNA sequencing data for a sample from a subject (e.g., a patient) often requires the imposition of a variant allele fraction (VAF) threshold to exclude subclonal or artifactual variants from the calculation of TMB. These VAF thresholds can be considered to impose a tumor purity limit of detection on the accurate determination of TMB. Thus, when tumor purity is low, the application of the VAF threshold for inclusion of detected variants in the TMB calculation can cause some clonal variants to be excluded, thereby causing underestimation of the sample's TMB. Underestimation of the actual TMB for the sample may, in turn, lead to erroneous predictions of a subject's response to ICPI therapy.
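The purity limit of detection described above can be illustrated with a small worked sketch. It assumes a clonal, heterozygous variant in a copy-neutral diploid region, for which the expected VAF is approximately half the tumor purity. The 5% VAF threshold, the 1.0 Mb panel size, the count of 20 clonal variants, and all function names are hypothetical values chosen for illustration; none of them is taken from this disclosure.

```python
# Illustrative sketch only: thresholds, panel size, and names are hypothetical.

VAF_THRESHOLD = 0.05   # hypothetical minimum VAF for a variant to count toward TMB
PANEL_MB = 1.0         # hypothetical size of the sequenced coding region, in megabases
N_CLONAL = 20          # hypothetical number of true clonal somatic variants


def expected_clonal_vaf(purity: float) -> float:
    """Expected VAF of a clonal heterozygous variant in a diploid, copy-neutral region."""
    return purity / 2.0


def tmb_per_mb(vafs, vaf_threshold, panel_mb):
    """Count variants at or above the VAF threshold and normalize per megabase."""
    counted = [v for v in vafs if v >= vaf_threshold]
    return len(counted) / panel_mb


def observed_tmb(purity: float) -> float:
    """TMB that would be reported if every clonal variant sat at its expected VAF."""
    vafs = [expected_clonal_vaf(purity)] * N_CLONAL
    return tmb_per_mb(vafs, VAF_THRESHOLD, PANEL_MB)


# At 40% purity the expected clonal VAF (0.20) clears the threshold.
high_purity_tmb = observed_tmb(0.40)

# At 8% purity the expected clonal VAF (0.04) falls below the threshold,
# so the same 20 clonal variants are all excluded.
low_purity_tmb = observed_tmb(0.08)

# Under these assumptions, the threshold implies a minimum purity of 2 x threshold.
min_purity = 2 * VAF_THRESHOLD

print(high_purity_tmb, low_purity_tmb, min_purity)
```

In this simplified setting the same tumor is reported with very different TMB values depending only on purity, which is the underestimation mechanism the Background describes. Real samples add sampling noise and copy number effects that this sketch ignores.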
Thus, improved methods for determining tumor mutational burden are required to improve the predictive accuracy of this biomarker and associated healthcare outcomes.
BRIEF SUMMARY OF THE INVENTION
Disclosed herein are methods and systems for identifying samples from subjects for which a determination of TMB using conventional approaches is likely to result in an underestimate, and for more accurately determining TMB based on DNA sequencing data for a sample from a subject, particularly for those subjects for whom TMB is systematically underestimated using conventional approaches. The disclosed methods utilize a machine learning (ML)-based approach to identify samples with high risk for missed tumor mutational burden-high (TMB-H) status. The machine learning model is trained to recognize patterns of short variants detected based on sequence read data (e.g., single nucleotide variants (SNVs) and small insertions/deletions) that are recurrently associated with TMB-H status when tumor content is adequate for confident TMB assessment. The trained model can then be applied to variants detected in samples with low tumor content and apparently low TMB to highlight cases that are at risk for a false negative determination of TMB-H status.
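The train-then-apply workflow summarized above can be sketched as follows. This is a minimal illustration, not the disclosed model: it swaps in a small pure-Python logistic regression, uses synthetic toy values, and represents each sample by two hypothetical numeric features standing in for short-variant patterns. The purity cutoff for training (20%), the flagging cutoff (0.5), and all names are assumptions made for the example.

```python
import math

# Illustrative sketch only: data are synthetic and all names/cutoffs are hypothetical.
MIN_TRAIN_PURITY = 0.20  # hypothetical purity needed for a confident TMB label
FLAG_CUTOFF = 0.5        # hypothetical probability cutoff for flagging a case


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features) -> float:
    """Predicted probability that a sample is TMB-H given its feature vector."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y
            grad_b += err
            for j in range(n_features):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


# Synthetic cohort: (tumor purity, hypothetical variant-pattern features, TMB-H label).
cohort = [
    (0.45, [0.8, 0.9], 1), (0.30, [0.7, 0.8], 1), (0.60, [0.9, 0.7], 1),
    (0.50, [0.1, 0.2], 0), (0.35, [0.2, 0.1], 0), (0.25, [0.3, 0.2], 0),
    (0.08, [0.5, 0.5], 0),  # excluded from training: purity too low for a confident label
]

# Train only on samples whose tumor content supports a confident TMB label.
train = [(f, y) for purity, f, y in cohort if purity >= MIN_TRAIN_PURITY]
weights, bias = train_logistic([f for f, _ in train], [y for _, y in train])

# Apply the model to low-purity samples that were reported as apparently TMB-low.
low_purity_cases = {"case_A": [0.75, 0.85], "case_B": [0.15, 0.20]}
flagged = {
    name: predict_proba(weights, bias, feats) >= FLAG_CUTOFF
    for name, feats in low_purity_cases.items()
}
print(flagged)
```

A flagged case here would be one whose variant pattern resembles TMB-H training samples despite an apparently low TMB. Any of the model families named in the claims (e.g., a random forest) could take the place of the logistic regression in this sketch.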
Disclosed herein are methods comprising: providing a plurality of nucleic acid molecules obtained from a sample from a subject; ligating one or more adapters onto one or more nucleic acid molecules from the plurality of nucleic acid molecules; amplifying the one or more ligated nucleic acid molecules from the plurality of nucleic acid molecules; capturing amplified nucleic acid molecules from the amplified nucleic acid molecules; sequencing, by a sequencer, the captured nucleic acid molecules to obtain a plurality of sequence reads that represent the captured nucleic acid molecules; receiving, at one or more processors, sequence read data for the plurality of sequence reads; receiving, at one or more processors, sample data comprising tumor purity data, variant data, variant allele fraction (VAF) data, or any combination thereof, based on the sequence read data; providing, using the one or more processors, the sample data as input to a machine learning model configured to identify the sample as one for which TMB is likely to be underestimated based on the input sample data; and outputting, using the one or more processors, a prediction of whether or not a determination of TMB for the sample is likely to be an underestimated value. In some embodiments, the variant data comprises short variant data. In some embodiments, the variant data comprises genomic rearrangement data. In some embodiments, the sample data further comprises mutational signature data. In some embodiments, the sample data further comprises copy number alteration (CNA) data. In some embodiments, the copy number alteration (CNA) data comprises copy number signature data. In some embodiments, the sample data further comprises digital histopathology data. In some embodiments, the machine learning model comprises a supervised machine l