
US-20260127448-A1 - IDENTIFICATION OF SOMATIC OR GERMLINE ORIGIN FOR CELL-FREE DNA


Abstract

The present disclosure provides systems and methods to detect somatic or germline variants from cell-free DNA (cfDNA). Generally, the systems and methods comprise receiving sequencing information from cfDNA from a subject, said sequencing information comprising cfDNA sequencing reads from a plurality of genomic loci; determining quantitative allele fraction (AF) measures for each of said plurality of genomic loci based on said cfDNA sequencing reads; determining a standard deviation (STDEV) for each of said AF measures; providing a STDEV threshold and an AF threshold; determining whether each of said AF measures has a STDEV above or below said STDEV threshold; determining whether each of said AF measures is above or below said AF threshold; and classifying each locus with a STDEV below said STDEV threshold and an AF measure below said AF threshold as being of somatic origin, or classifying each locus with a STDEV below said STDEV threshold and an AF measure above said AF threshold as being of germline origin.
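The classification rule summarized in the abstract can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the `Locus` type, the function names, the threshold values in the usage note, and the "unclassified" label are hypothetical, and the standard deviation is assumed here to be the binomial standard error of the AF, which the abstract does not specify.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Locus:
    """Read counts at one genomic locus (hypothetical container)."""
    name: str
    alt_reads: int     # reads supporting the variant allele
    total_reads: int   # all reads covering the locus


def allele_fraction(locus: Locus) -> float:
    """AF measure: fraction of reads carrying the variant allele."""
    if locus.total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return locus.alt_reads / locus.total_reads


def af_stdev(locus: Locus) -> float:
    """ASSUMPTION: STDEV is modeled as the binomial standard error of the AF."""
    p = allele_fraction(locus)
    return sqrt(p * (1.0 - p) / locus.total_reads)


def classify(locus: Locus, af_threshold: float, stdev_threshold: float) -> str:
    """Apply the two-threshold rule from the abstract.

    STDEV below threshold and AF below threshold -> "somatic".
    STDEV below threshold and AF above threshold -> "germline".
    Anything else is labeled "unclassified" (an assumption; the abstract
    does not say how such loci are handled).
    """
    if af_stdev(locus) >= stdev_threshold:
        return "unclassified"
    af = allele_fraction(locus)
    if af < af_threshold:
        return "somatic"
    if af > af_threshold:
        return "germline"
    return "unclassified"
```

For example, with illustrative thresholds `af_threshold=0.30` and `stdev_threshold=0.02`, `classify(Locus("x", 50, 5000), 0.30, 0.02)` evaluates a 1% AF measured at high depth, while a locus with only 10 covering reads fails the STDEV check regardless of its AF.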

Inventors

  • Richard B. LANMAN
  • Geoffrey R. Oxnard

Assignees

  • GUARDANT HEALTH, INC.
  • DANA-FARBER CANCER INSTITUTE, INC.

Dates

Publication Date
20260507
Application Date
20260106

Claims (20)

  1. A method for identifying an origin of a genetic variant in a subject, the method comprising: (a) providing a buffy coat specimen comprising genomic DNA and a plasma sample comprising cell-free nucleic acid (cfNA) from the subject; (b) sequencing exons from a plurality of different genes in the cfNA or amplification products thereof to obtain a first set of sequence reads, wherein the cfNA comprises cell-free DNA; (c) analyzing the first set of sequence reads to obtain a plasma allele fraction (AF) measure of the genetic variant in the plasma sample, wherein the plasma AF measure comprises an AF confidence interval at a confidence level; (d) performing germline sequencing of the genomic DNA or amplification products thereof to obtain a second set of sequence reads; (e) analyzing the second set of sequence reads to obtain a buffy coat AF measure of the genetic variant in the buffy coat specimen; and (f) comparing the plasma AF measure with the buffy coat AF measure, thereby identifying the genetic variant to be of somatic origin, germline origin, or indeterminate origin.
  2. The method of claim 1, wherein the cfNA comprises RNA.
  3. The method of claim 1, wherein the sequencing in (b) comprises exome sequencing.
  4. The method of claim 1, wherein the sequencing in (b) comprises whole transcriptome sequencing.
  5. The method of claim 1, wherein the sequencing in (b) is performed at a read depth of at least 3,000 reads per base.
  6. The method of claim 1, wherein the sequencing in (b) is performed at a read depth of at least 8,000 reads per base.
  7. The method of claim 1, wherein the sequencing in (b) is performed at a read depth of 10,000 to about 5,000,000 reads per base.
  8. The method of claim 1, wherein the confidence level in (c) is 95%.
  9. The method of claim 1, wherein the plasma AF is in the range of 5% to 20%.
  10. The method of claim 1, wherein the genetic variant is detected in the plasma sample at a frequency of 0.1% to 5%.
  11. The method of claim 1, wherein the genetic variant is detected in the plasma sample at a frequency of at least 0.1%, 0.25%, 0.5%, or 1.0%.
  12. The method of claim 1, wherein the genetic variant is detected in the plasma sample at a frequency of 0.1% or lower.
  13. The method of claim 1, wherein the genetic variant is selected from the group consisting of a single nucleotide variation (SNV), an indel, a copy number variation (CNV), a fusion, a rearrangement, a repeat, and an aneuploidy.
  14. The method of claim 1, wherein the genetic variant is selected from the group consisting of an insertion, a deletion, a copy number amplification (CNA), a copy number loss (CNL), a transversion, a translocation, and an inversion.
  15. The method of claim 1, wherein the genetic variant is indicative of a cancer in the subject.
  16. The method of claim 15, wherein the cancer is selected from the group consisting of ovarian cancer, pancreatic cancer, breast cancer, colorectal cancer, and non-small cell lung carcinoma.
  17. The method of claim 1, further comprising determining a tissue of origin of the genetic variant.
  18. The method of claim 1, further comprising detecting a presence or absence of residual cancer in the subject.
  19. The method of claim 1, further comprising determining methylation profiles of the cfNA.
  20. The method of claim 1, further comprising filtering out at least a portion of the first set of sequence reads.
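
Steps (c) through (f) of claim 1 can be illustrated with a short Python sketch. The claims do not state how the confidence interval is computed or how the two AF measures are compared, so everything below is an assumption made for illustration: the Wilson score interval, the function names, the hypothetical `noise_floor` parameter, and the decision rule itself.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(alt_reads: int, total_reads: int,
                    confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for an allele fraction (assumed CI method)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided critical value
    p = alt_reads / total_reads
    denom = 1.0 + z * z / total_reads
    center = (p + z * z / (2.0 * total_reads)) / denom
    half = z * sqrt(p * (1.0 - p) / total_reads
                    + z * z / (4.0 * total_reads ** 2)) / denom
    return center - half, center + half


def compare_origin(plasma_alt: int, plasma_total: int,
                   buffy_alt: int, buffy_total: int,
                   confidence: float = 0.95,
                   noise_floor: float = 0.005) -> str:
    """Hypothetical comparison of plasma AF with buffy coat AF.

    - buffy coat AF inside the plasma AF interval   -> "germline"
    - buffy coat AF below noise_floor while the
      plasma interval lies entirely above it        -> "somatic"
    - anything else                                 -> "indeterminate"
    """
    if buffy_total <= 0:
        raise ValueError("buffy_total must be positive")
    low, high = wilson_interval(plasma_alt, plasma_total, confidence)
    buffy_af = buffy_alt / buffy_total
    if low <= buffy_af <= high:
        return "germline"
    if buffy_af < noise_floor and low > noise_floor:
        return "somatic"
    return "indeterminate"
```

The default `confidence=0.95` mirrors the 95% confidence level recited in claim 8; the three return labels mirror the somatic, germline, and indeterminate outcomes of step (f).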

Description

CROSS-REFERENCE
This application is a continuation of U.S. patent application Ser. No. 19/098,448, filed Apr. 2, 2025, which is a continuation of U.S. patent application Ser. No. 16/678,060, filed Nov. 8, 2019, which is a continuation of International Application No. PCT/US2018/033038, filed May 16, 2018, which claims the benefit of U.S. Provisional Application No. 62/507,127, filed May 16, 2017, each of which is incorporated herein by reference in its entirety.
REFERENCE TO SEQUENCE LISTING
The Sequence Listing submitted Jun. 5, 2025, in U.S. patent application Ser. No. 19/098,448, filed Apr. 2, 2025, as an XML file named “38007.0015U6.xml,” created on Jun. 4, 2025, and having a size of 16,578 bytes is hereby incorporated by reference pursuant to 37 C.F.R. § 1.52(e)(5).
BACKGROUND
Comparison of the genome of a subject and a reference genome (e.g., GRCh38.p4) will typically show differences (genetic variation) at about 0.01% of bases. Genetic variants in the germline can represent SNPs transferred through normal heredity or through germinal mutations. Variations can exist in homozygous or heterozygous form.
Certain pathological states, such as cancer, are characterized by genetic variations in the genomes of pathological cells as compared to the germline genome. These variations result from mutation in somatic cells and are referred to as somatic mutations. Polynucleotides harboring somatic mutations can be detected in cell-free DNA (cfDNA), where they are mixed with DNA from cells having the germline genome.
Where a large germline background is present in cfDNA, no computer-implemented process can differentiate germline variants from somatic mutations automatically. Instead, conventional systems rely on the expertise of an individual human expert or a consortium of experts (in either case called a Tumor Board) to distinguish somatic mutations from germline ones.
If noise and biases were absent, the germline variants would be those with an allelic fraction of 50% (in the case of heterozygous (het) loci) or 100% (in the case of homozygous (homo) loci). In practice, however, the existence of noise and biases in the system makes these crisp numbers fuzzy. In other words, the het or homo loci are not detected at exactly 50% or 100%, but instead fall between lower and upper confidence bounds for each of the het and homo categories. For example, a het locus could be in the range of 40% to 60%, while a homo locus could be in the range of 98% to 100%.
SUMMARY
Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
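The fuzzy heterozygous and homozygous ranges given as an example in the Background (40% to 60% and 98% to 100%) can be expressed as a small lookup. This is only a sketch: the function name, the inclusive treatment of the range endpoints, and the `None` return for AF values outside both ranges are assumptions, and the ranges themselves are the illustrative figures quoted above rather than calibrated bounds.

```python
from typing import Optional

# Example confidence bounds quoted in the Background (fractions, not percent).
HET_RANGE = (0.40, 0.60)    # expected 50% for heterozygous germline loci
HOMO_RANGE = (0.98, 1.00)   # expected 100% for homozygous germline loci


def germline_zygosity(af: float) -> Optional[str]:
    """Return the germline category whose example range contains `af`.

    Endpoints are treated as inclusive (an assumption). Returns None when
    the AF falls outside both ranges, i.e. it is not consistent with either
    germline category under these example bounds.
    """
    if not 0.0 <= af <= 1.0:
        raise ValueError("allele fraction must be between 0 and 1")
    if HET_RANGE[0] <= af <= HET_RANGE[1]:
        return "heterozygous"
    if HOMO_RANGE[0] <= af <= HOMO_RANGE[1]:
        return "homozygous"
    return None
```

Under these example bounds, an observed AF of 47% is still consistent with a heterozygous germline locus even though it is not exactly 50%, whereas an AF of 3% matches neither category.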
In one aspect, the present disclosure provides a method for identifying somatic origin of each of a plurality of genomic loci in cell-free DNA (cfDNA) from a subject, said method comprising: receiving sequencing information from said cfDNA from said subject, said sequencing information comprising cfDNA sequencing reads from said plurality of genomic loci; determining quantitative allele fraction (AF) measures for each of said plurality of genomic loci based on said cfDNA sequencing reads; determining a standard deviation (STDEV) for each of said AF measures; providing a STDEV threshold and an AF threshold; determining whether each of said AF measures has a STDEV above or below said STDEV threshold; determining whether each of said AF measures is above or below said AF threshold; and classifying each locus with a STDEV below said STDEV threshold and an AF measure below said AF threshold as being of somatic origin.
In one aspect, the present disclosure provides a method for identifying germline origin of each of a plurality of genomic loci in cell-free DNA (cfDNA) from a subject, said method comprising: receiving sequencing information from said cfDNA from said subject, said sequencing information comprising cfDNA sequencing reads from said plurality of genomic loci; determining quantitative allele fraction (AF) measures for each of said plurality of genomic loci based on said cfDNA sequencing reads; determining a standard deviation (STDEV) for each of said AF measures; providing a STDEV threshold and an AF threshold; determining whether each of said AF measures has a STDEV above or below said STDEV threshold; determining whether each of said AF measures is above or below said AF threshold; and classifying each locus with a STDEV below said STDEV threshold and an AF measure above said AF threshold as being of germline origin. In some embodiments, an AF measure for a genomic