US-20260128124-A1 - METHODS AND SYSTEMS FOR PREDICTION OF NOVEL PATHOGENIC MUTATIONS


Abstract

Methods and systems for predicting the pathogenicity of variant sequences detected in a sample from a subject are described. The disclosed methods may comprise, for example, receiving sequence read data for a plurality of sequence reads obtained from a sample from a subject; identifying one or more variant sequences based on the sequence read data; providing a variant sequence from the one or more identified variant sequences as input to a trained machine learning model configured to determine a pathogenicity prediction score for the identified variant sequence based on the variant sequence and at least one of additional genomic profiling, demographic, or clinical feature data for the sample or subject; and outputting the pathogenicity prediction score determined for the variant sequence identified in the sample from the subject.

Inventors

Douglas I. LIN
Dean PAVLICK
Garrett M. FRAMPTON
Jonathan Keith KILLIAN
James HABERBERGER

Assignees

FOUNDATION MEDICINE, INC.

Dates

Publication Date: 20260507
Application Date: 20240514

Claims (20)

1. A method for identifying pathogenic variants comprising: receiving, at one or more processors, sequence read data for a plurality of sequence reads obtained from a sample from a subject; identifying, using the one or more processors, one or more variant sequences based on the sequence read data; providing, using the one or more processors, a variant sequence from the one or more identified variant sequences as input to a trained machine learning model configured to determine a pathogenicity prediction score for the identified variant sequence based on the variant sequence and at least one of additional genomic profiling, demographic, or clinical feature data for the sample or subject; outputting, using the one or more processors, the pathogenicity prediction score determined for the variant sequence identified in the sample from the subject; and selecting a treatment for a disease exhibited by the subject based on a pathogenicity prediction score for at least one identified variant sequence that indicates that it is pathogenic.
2. The method of claim 1, further comprising: comparing, using the one or more processors, the pathogenicity prediction score for the variant sequence identified in the sample from the subject to a predetermined pathogenicity threshold, and based on the comparison: reporting the variant sequence as being pathogenic if its pathogenicity prediction score is greater than or equal to the predetermined pathogenicity threshold; or reporting the variant sequence as being not pathogenic if its pathogenicity prediction score is less than the predetermined pathogenicity threshold.
3. The method of claim 1, wherein the trained machine learning model is further configured to output a prediction of whether the variant sequence is a drug resistance gene.
4. (canceled)
5. The method of claim 1, wherein the one or more identified variant sequences comprise one or more single nucleotide substitutions, one or more short insertions, one or more short deletions, or any combination thereof.
6. The method of claim 1, wherein the additional genomic profiling feature data comprises genomic ancestry, microsatellite instability, tumor mutational burden, a determination of somatic versus germline status for the identified variant sequence, or any combination thereof.
7. The method of claim 1, wherein the additional demographic feature data comprises the subject's age, sex, race, or any combination thereof.
8. The method of claim 1, wherein the additional clinical feature data comprises the subject's sample type, disease diagnosis, family history of disease, or any combination thereof.
9. The method of claim 1, wherein the machine learning model comprises a supervised machine learning model.
10. (canceled)
11. The method of claim 1, wherein the trained machine learning model is trained using a training dataset that comprises data for variant sequences identified in samples from a cohort of subjects that includes subjects diagnosed with different cancers.
12. The method of claim 11, wherein the training dataset further comprises additional genomic profiling feature data for the samples from the cohort of subjects.
13. The method of claim 12, wherein the additional genomic profiling feature data comprises genomic ancestry data, microsatellite instability data, tumor mutational burden data, a determination of somatic versus germline status for the identified variant sequence, or any combination thereof.
14. The method of claim 11, wherein the training dataset further comprises additional demographic feature data for the cohort of subjects.
15. The method of claim 11, wherein the training dataset further comprises additional clinical feature data for the cohort of subjects.
16. The method of claim 2, wherein the predetermined pathogenicity threshold is determined on a per-gene basis.
17. The method of claim 1, wherein the disease exhibited by the subject is cancer, and the treatment is an anti-cancer therapy.
18. The method of claim 1, wherein the disease exhibited by the subject is gastrointestinal stromal tumor (GIST), and the treatment is a tyrosine kinase inhibitor.
19. The method of claim 18, wherein treatment with the tyrosine kinase inhibitor is recommended if the variant sequence is determined to be pathogenic and is not predicted to be a tyrosine kinase inhibitor resistance gene.
20. The method of claim 18, wherein treatment with the tyrosine kinase inhibitor is not recommended if the variant sequence is determined to be not pathogenic or is predicted to be a tyrosine kinase inhibitor resistance gene.
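
Illustrative note (not part of the claims): the threshold comparison of claims 2 and 16 and the recommendation conditions of claims 19 and 20 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation. The names `VariantPrediction`, `is_pathogenic`, and `recommend_tki`, the per-gene threshold values, and the fallback default threshold are all hypothetical and chosen only for illustration; the source does not specify any threshold values.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical per-gene thresholds (claim 16). The numbers are illustrative
# assumptions; the source does not disclose actual threshold values.
GENE_THRESHOLDS: Dict[str, float] = {"KIT": 0.80, "PDGFRA": 0.75}

# Assumed fallback for genes without a per-gene threshold (not from the source).
DEFAULT_THRESHOLD = 0.50


@dataclass(frozen=True)
class VariantPrediction:
    """Model outputs for one variant sequence (field names are assumptions)."""
    gene: str
    score: float                # pathogenicity prediction score
    predicted_resistance: bool  # drug-resistance prediction (claim 3)


def is_pathogenic(pred: VariantPrediction) -> bool:
    """Claim 2: pathogenic if the score is >= the predetermined threshold."""
    threshold = GENE_THRESHOLDS.get(pred.gene, DEFAULT_THRESHOLD)
    return pred.score >= threshold


def recommend_tki(pred: VariantPrediction) -> bool:
    """Claims 19 and 20: recommend a tyrosine kinase inhibitor only if the
    variant is pathogenic and is not predicted to confer resistance."""
    return is_pathogenic(pred) and not pred.predicted_resistance
```

In this sketch, a KIT variant scoring exactly at its threshold is reported as pathogenic, reflecting the "greater than or equal to" wording of claim 2, and a pathogenic variant that is also predicted to confer resistance yields no recommendation, consistent with claim 20.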

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the priority benefit of U.S. Provisional Patent Application Ser. No. 63/466,943, filed May 16, 2023, the contents of which are incorporated herein by reference in their entirety.

FIELD

The present disclosure relates generally to methods and systems for analyzing genomic profiling data, and more specifically to methods and systems for predicting novel pathogenic mutations based on variant sequence data and other genomic or clinical data.

BACKGROUND

Genomic profiling techniques have enabled research scientists and clinicians to explore and elucidate the landscape of genetic variants that underlie a variety of disease states, including a variety of genetic disorders and cancers. Gastrointestinal stromal tumor (GIST), for example, is the most common mesenchymal cancer of the digestive tract. Complete genomic profiling (CGP) and analysis of next generation sequencing (NGS) data using variant calling algorithms have identified several variant forms of the KIT, PDGFRA, NF1, SDHA, and BRAF genes in patients diagnosed with GIST. However, the prevalence of primary driver mutations in these genes varies across samples collected from a large cohort of patients, and also varies between sample types (e.g., between tissue and liquid biopsy samples), indicating that additional genomic and/or clinical factors influence the degree to which a mutation in one of these genes is pathogenic. Thus, improved methods for predicting the pathogenicity of genetic mutations based on the detected variant sequences in combination with other genomic and/or clinical data are needed to inform prognosis and treatment selection for patients with genetic disorders and cancers.

BRIEF SUMMARY OF THE INVENTION

Disclosed herein are methods and systems for predicting the pathogenicity of variant sequences detected in a sample from a subject based on the variant sequence data in combination with other genomic, demographic, and/or clinical data for the subject. The disclosed methods comprise the use of a trained machine learning model that is configured to process input data comprising variant sequence data and at least one of additional genomic profile feature data, demographic feature data, and/or clinical feature data for the sample or subject and output a pathogenicity prediction score for the detected variant sequence. The trained machine learning model can be used to predict novel pathogenic mutations for a given disease, e.g., a given type of cancer. In some embodiments, the trained machine learning model may also be used to predict specific treatment-resistant mutations for the given disease, e.g., a given type of cancer.

In some aspects, disclosed herein are methods for predicting the effects of variant sequences, comprising: providing a plurality of nucleic acid molecules obtained from a sample from a subject; ligating one or more adapters onto one or more nucleic acid molecules from the plurality of nucleic acid molecules; amplifying the one or more ligated nucleic acid molecules from the plurality of nucleic acid molecules; capturing amplified nucleic acid molecules from the amplified nucleic acid molecules; sequencing, by a sequencer, the captured nucleic acid molecules to obtain a plurality of sequence reads that represent the captured nucleic acid molecules; receiving, at one or more processors, sequence read data for the plurality of sequence reads obtained from the sample from the subject; identifying, using the one or more processors, one or more variant sequences based on the sequence read data; providing, using the one or more processors, a variant sequence from the one or more identified variant sequences as input to a trained machine learning model configured to determine a pathogenicity prediction score for the identified variant sequence based on the variant sequence and at least one of additional genomic profiling, demographic, or clinical feature data for the sample or subject; and outputting, using the one or more processors, the pathogenicity prediction score determined for the variant sequence identified in the sample from the subject.

In some embodiments, the methods disclosed herein can further comprise: comparing, using the one or more processors, the pathogenicity prediction score for the variant sequence identified in the sample from the subject to a predetermined pathogenicity threshold, and based on the comparison: reporting the variant sequence as being pathogenic if its pathogenicity prediction score is greater than or equal to the predetermined pathogenicity threshold; or reporting the variant sequence as being not pathogenic if its pathogenicity prediction score is less than the predetermined pathogenicity threshold.

In any of the embodiments herein, the trained machine learning model can be further configured to output a prediction of whether the variant sequence is a drug resistance gene. In any of the embodiments herein, the