US-12624394-B2 - Methods and systems for genetic analysis


Abstract

This disclosure provides systems and methods for sample processing and data analysis. Sample processing may include nucleic acid sample processing and subsequent sequencing. Some or all of a nucleic acid sample may be sequenced to provide sequence information, which may be stored or otherwise maintained in an electronic storage location. The sequence information may be analyzed with the aid of a computer processor, and the analyzed sequence information may be stored in an electronic storage location that may include a pool or collection of sequence information and analyzed sequence information generated from the nucleic acid sample. Methods and systems of the present disclosure can be used, for example, for the analysis of a nucleic acid sample, for producing one or more libraries, and for producing biomedical reports. Methods and systems of the disclosure can aid in the diagnosis, monitoring, treatment, and prevention of one or more diseases and conditions.

Inventors

Gabor T. Bartha
Gemma Chandratillake
Richard Chen
Sarah Garcia
Hugo Yu Kor Lam
Mark R. Pratt
John West

Assignees

Personalis, Inc.

Dates

Publication Date: 2026-05-12
Application Date: 2025-06-26

Claims (20)

1. A method for analyzing nucleic acid samples obtained from a subject, comprising:
(a) generating a first subset of nucleic acid molecules from a first nucleic acid sample obtained from a first sample from a subject suffering from a cancer at a first time point;
(b) conducting a first sequencing assay on the first subset of nucleic acid molecules to yield a first result comprising a first set of nucleic acid sequencing information, wherein: (i) the first sequencing assay comprises whole genome sequencing by synthesis that produces a first set of nucleic acid sequencing information comprising a first set of sequence reads, wherein the first set of sequence reads comprises single nucleotide polymorphisms (SNPs) with heterozygous allelic forms, and (ii) the first sequencing assay generates at least 2,000,000 reads per run;
(c) conducting a second sequencing assay on a second subset of nucleic acid molecules from a second nucleic acid sample obtained from the subject at the first time point or a second time point to yield a second result comprising a second set of nucleic acid sequencing information, wherein: (i) the second sequencing assay comprises sequencing by synthesis that produces a second set of nucleic acid sequencing information comprising a second set of sequence reads, and (ii) the second sequencing assay generates at least 2,000,000 reads per run;
(d) combining, with the aid of a computer processor, the first set of sequence reads and the second set of sequence reads to generate a combined result;
(e) based on the combined result, identifying a plurality of nucleic acid regions comprising variants;
(f) producing, with the aid of a computer processor, a plurality of pulldown probes, wherein: (i) the plurality of pulldown probes comprises 10 or more pulldown probes with different sequences, (ii) individual instances of the plurality of pulldown probes hybridize to individual instances of the plurality of nucleic acid regions comprising variants of step (e), (iii) individual instances of the plurality of pulldown probes each comprise a label, and (iv) the label comprises biotin or a magnetic particle;
(g) generating a third subset of nucleic acid molecules from a third nucleic acid sample obtained from the subject at a third time point, wherein the generating comprises: (i) hybridizing at least part of the third nucleic acid sample with the plurality of pulldown probes, (ii) separating pulldown probe-hybridized nucleic acid molecules from pulldown probe-free nucleic acid molecules, and (iii) conducting one or more elution reactions on the pulldown probe-hybridized nucleic acid molecules;
(h) conducting a third sequencing assay on the third subset of nucleic acid molecules from the third nucleic acid sample to yield a third result comprising a third set of nucleic acid sequencing information, wherein: (i) the third sequencing assay comprises sequencing by synthesis, and (ii) the third sequencing assay generates at least 2,000,000 reads per run; and
(i) generating a biomedical report that includes biomedical information of the subject, wherein the biomedical information is indicative of the combined result or the third result and is predictive, prognostic, or diagnostic for a cancer.
2. The method of claim 1, further comprising, prior to the conducting of step (c), generating the second set of nucleic acid molecules from the second nucleic acid sample of the subject by contacting at least part of the second nucleic acid sample with a second plurality of pulldown probes, wherein: (i) the second plurality of pulldown probes comprises 10 or more pulldown probes with different sequences, (ii) the second plurality of pulldown probes hybridizes to a genomic region feature comprising polymorphisms, and (iii) individual instances of the second plurality of pulldown probes each comprise between about 10 to about 500 nucleotides.
3. The method of claim 2, wherein generating the second set of nucleic acid molecules further comprises: (A) hybridizing the at least part of the second nucleic acid sample with the second plurality of pulldown probes; and (B) separating pulldown probe-hybridized nucleic acid molecules from pulldown probe-free nucleic acid molecules.
4. The method of claim 3, wherein generating the second set of nucleic acid molecules further comprises: (C) conducting one or more elution reactions on the pulldown probe-hybridized nucleic acid molecules.
5. The method of claim 1, wherein individual instances of the plurality of pulldown probes each comprise between about 10 to about 500 nucleotides.
6. The method of claim 1, wherein the method further comprises, prior to the conducting of step (b), amplifying the first subset of nucleic acid molecules to generate a first set of amplified nucleic acid molecules, and the first sequencing assay is performed on the first set of amplified nucleic acid molecules.
7. The method of claim 1, wherein the method further comprises, during the conducting of step (b), amplifying the first subset of nucleic acid molecules.
8. The method of claim 1, further comprising: (i) prior to the conducting of step (b), amplifying the first subset of nucleic acid molecules to generate a first set of amplified nucleic acid molecules; and (ii) during the conducting of step (b), amplifying the first set of amplified nucleic acid molecules.
9. The method of claim 1, wherein the method further comprises, prior to the conducting of step (c), amplifying the second subset of nucleic acid molecules to generate a second set of amplified nucleic acid molecules, and the second sequencing assay is performed on the second set of amplified nucleic acid molecules.
10. The method of claim 1, wherein the method further comprises, during the conducting of step (c), amplifying the second subset of nucleic acid molecules.
11. The method of claim 1, further comprising: (i) prior to the conducting of step (c), amplifying the second subset of nucleic acid molecules to generate a second set of amplified nucleic acid molecules; and (ii) during the conducting of step (c), amplifying the second set of amplified nucleic acid molecules.
12. The method of claim 1, wherein the method further comprises, prior to the conducting of step (h), amplifying the third subset of nucleic acid molecules to generate a third set of amplified nucleic acid molecules, and the third sequencing assay is performed on the third set of amplified nucleic acid molecules.
13. The method of claim 1, wherein the method further comprises, during the conducting of step (h), amplifying the third subset of nucleic acid molecules.
14. The method of claim 1, further comprising: (i) prior to the conducting of step (h), amplifying the third subset of nucleic acid molecules to generate a third set of amplified nucleic acid molecules; and (ii) during the conducting of step (h), amplifying the third set of amplified nucleic acid molecules.
15. The method of claim 1, wherein the first sample from the subject suffering from cancer comprises a tumor sample.
16. The method of claim 1, wherein the second subset of nucleic acid molecules is isolated from a sample comprising a body fluid or a tissue sample.
17. The method of claim 16, wherein the second subset of nucleic acid molecules is isolated from a body fluid, wherein the body fluid comprises blood, plasma, or a blood fraction.
18. The method of claim 16, wherein the second subset of nucleic acid molecules is isolated from a tissue sample, wherein the tissue sample comprises a benign tissue sample.
19. The method of claim 1, wherein the third subset of nucleic acid molecules is isolated from a sample comprising blood, plasma, or a blood fraction.
20. The method of claim 1, wherein the first subset of nucleic acid molecules, the second subset of nucleic acid molecules, and/or the third subset of nucleic acid molecules comprises DNA, RNA, DNA/RNA hybrids, or cDNA derived from RNA.
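
Illustrative sketch (not part of the claims): the computational portion of claim 1, steps (d) through (f), might be approximated as follows. This is a minimal sketch under several assumptions that do not come from the patent text: variants are taken as already called from each read set, so merging call lists stands in for combining the reads themselves; positions are 0-based; each probe is simply the reverse complement of a reference window centered on a variant; and the names `Variant`, `Probe`, `combine_calls`, `design_probes`, and `has_enough_probes` are hypothetical. The biotin label, the minimum of 10 distinct probes, and the 10 to 500 nucleotide length range are taken from claims 1 and 5. Practical probe design would also weigh melting temperature, specificity, and repeat content, none of which is modeled here.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int  # 0-based reference coordinate (assumption of this sketch)
    ref: str
    alt: str


@dataclass(frozen=True)
class Probe:
    chrom: str
    start: int     # 0-based start of the targeted reference window
    sequence: str  # probe sequence, written 5'->3'
    label: str     # claim 1(f)(iv) names biotin or a magnetic particle


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def combine_calls(first, second):
    """Step (d), simplified: order-preserving union of two call sets."""
    return list(dict.fromkeys([*first, *second]))


def design_probes(reference, variants, probe_len=120, label="biotin"):
    """Steps (e)-(f), simplified: one probe per distinct variant window."""
    if not 10 <= probe_len <= 500:  # length range recited in claim 5
        raise ValueError("probe_len must be between 10 and 500 nucleotides")
    probes = {}
    for v in variants:
        contig = reference[v.chrom]
        if len(contig) < probe_len:
            continue  # contig too short to hold a full-length probe
        # Center the window on the variant, clamped to the contig ends.
        start = max(0, min(v.pos - probe_len // 2, len(contig) - probe_len))
        seq = reverse_complement(contig[start:start + probe_len])
        # Keep only the first probe for any given sequence.
        probes.setdefault(seq, Probe(v.chrom, start, seq, label))
    return list(probes.values())


def has_enough_probes(probes, minimum=10):
    """Claim 1(f)(i) requires 10 or more probes with different sequences."""
    return len({p.sequence for p in probes}) >= minimum


# Toy data: a 200-base reference and two small, overlapping call sets.
reference = {"chr1": "A" * 50 + "C" * 50 + "G" * 50 + "T" * 50}
tumor_calls = [Variant("chr1", 25, "A", "G")]
normal_calls = [Variant("chr1", 25, "A", "G"), Variant("chr1", 75, "C", "T")]
probes = design_probes(
    reference, combine_calls(tumor_calls, normal_calls), probe_len=20
)
```

With this toy input the two call sets share one variant, so two probes result, which is below the 10-probe minimum and `has_enough_probes(probes)` returns `False`. A real panel would draw on many more variant regions.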

Description

CROSS-REFERENCE

This application is a continuation application of U.S. patent application Ser. No. 18/824,319, filed Sep. 4, 2024, which is a continuation application of Ser. No. 18/626,998, filed Apr. 4, 2024, which is a continuation application of U.S. patent application Ser. No. 18/178,764, filed Mar. 6, 2023, now U.S. Pat. No. 11,976,326, which is a continuation application of U.S. patent application Ser. No. 18/058,376, filed Nov. 23, 2022, now U.S. Pat. No. 11,649,499, which is a continuation application of U.S. patent application Ser. No. 17/744,205, filed May 13, 2022, now U.S. Pat. No. 11,591,653, which is a continuation application of U.S. patent application Ser. No. 17/507,578, filed Oct. 21, 2021, now U.S. Pat. No. 11,365,446, which is a divisional application of U.S. patent application Ser. No. 17/080,474, filed Oct. 26, 2020, now U.S. Pat. No. 11,155,867, which is a continuation application of U.S. patent application Ser. No. 16/816,135, filed Mar. 11, 2020, which is a continuation application of U.S. patent application Ser. No. 16/526,928, filed Jul. 30, 2019, which is a continuation application of U.S. patent application Ser. No. 15/996,215, filed Jun. 1, 2018, now U.S. Pat. No. 10,415,091, which is a continuation application of U.S. patent application Ser. No. 14/810,337, filed Jul. 27, 2015, now U.S. Pat. No. 10,266,890, which is a divisional application of U.S. patent application Ser. No. 14/141,990, filed Dec. 27, 2013, now U.S. Pat. No. 9,128,861, which claims priority to U.S. Provisional Application No. 61/753,828, filed Jan. 17, 2013, each of which is incorporated herein by reference in its entirety.

BACKGROUND

Current methods for whole genome and/or exome sequencing may be costly and fail to capture many biomedically important variants.
For example, commercially available exome enrichment kits (e.g., Illumina's TruSeq exome enrichment and Agilent's SureSelect exome enrichment) may fail to target biomedically interesting non-exomic and exomic regions. Often, whole genome and/or exome sequencing using standard sequencing methods performs poorly in regions having very high GC content (>70%). Furthermore, whole genome and/or exome sequencing also fails to provide adequate and/or cost-effective sequencing of repetitive elements in the genome. The methods disclosed herein provide specialized sequencing protocols or technologies to address these issues.

SUMMARY

Provided herein is a method for analyzing a nucleic acid sample, comprising (a) producing two or more subsets of nucleic acid molecules from a nucleic acid sample, wherein (i) the two or more subsets comprise a first subset of nucleic acid molecules and a second subset of nucleic acid molecules, and (ii) the first subset of nucleic acid molecules differs from the second subset of nucleic acid molecules by one or more features selected from genomic regions, mean GC content, mean molecular size, subset preparation method, or combination thereof; (b) conducting one or more assays on at least two of the two or more subsets of nucleic acid molecules, wherein (i) a first assay, comprising a first sequencing reaction, is conducted on the first subset of the two or more subsets to produce a first result, and (ii) a second assay is conducted on the second subset of the two or more subsets to produce a second result; and (c) combining, with the aid of a computer processor, the first result and second result, thereby analyzing the nucleic acid sample.
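
Illustrative sketch (not part of the disclosure): the method above separates a sample into subsets that differ by a feature such as mean GC content, assays each subset, and combines the results by computer. The Python below models that flow in silico, as a minimal sketch under stated assumptions. The 70% threshold is borrowed from the Background; everything else is hypothetical, including the function names `gc_fraction`, `split_by_gc`, `mean_gc`, and `combine_results`, the treatment of fragments as plain strings, and the representation of an assay result as per-region read depth. In practice the subsets would be produced by laboratory preparation rather than by sorting sequences in software.

```python
from collections import Counter


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def split_by_gc(fragments, threshold=0.70):
    """Partition fragments into (high-GC, remaining) subsets."""
    high = [f for f in fragments if gc_fraction(f) > threshold]
    rest = [f for f in fragments if gc_fraction(f) <= threshold]
    return high, rest


def mean_gc(fragments) -> float:
    """Mean GC fraction across a subset (0.0 for an empty subset)."""
    if not fragments:
        return 0.0
    return sum(gc_fraction(f) for f in fragments) / len(fragments)


def combine_results(first, second):
    """Step (c), simplified: sum per-region read depth from two assays."""
    return dict(Counter(first) + Counter(second))


# Toy fragments: one fully GC, one AT-rich, one at exactly 70% GC.
fragments = ["GGGCGCGCCG", "ATATGCAT", "GGGGGGGAAA"]
high_gc, rest = split_by_gc(fragments)

# Hypothetical per-region read depths from two separate assays.
combined = combine_results(
    {"chr1:100-200": 12},
    {"chr1:100-200": 30, "chr2:5-50": 8},
)
```

Here the fragment at exactly 70% GC stays in the second subset because the comparison is strict, matching the ">70%" wording of the Background. Summing depths is only one possible way to combine two results.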
Also provided herein is a method for analyzing a nucleic acid sample, comprising (a) producing two or more subsets of nucleic acid molecules from a nucleic acid sample, wherein the two or more subsets differ by one or more features selected from genomic regions, mean GC content, mean molecular size, subset preparation method, or combination thereof; (b) combining at least two of the two or more subsets of nucleic acid molecules to produce a first combined pool of nucleic acid molecules; and (c) conducting one or more assays on the first combined pool of nucleic acid molecules, wherein at least one of the one or more assays comprises a sequencing reaction.

Disclosed herein is a method for analyzing a nucleic acid sample, comprising (a) producing two or more subsets of nucleic acid molecules from a nucleic acid sample, wherein producing the two or more subsets of nucleic acid molecules comprises enriching the two or more subsets of nucleic acid molecules for two or more different genomic regions; (b) conducting a first assay on a first subset of nucleic acid molecules among the two or more subsets of nucleic acid molecules to produce a first result, wherein the first assay comprises a first sequencing reaction; (c) conducting a second assay on at least a second subset of nucleic acid molecules among the two or more subsets of nucleic acid molecules to produce a second result; and (d) combining, with the aid of a computer processor, the first result with the second result, thereby analyzing the nucleic acid