US-20260125742-A1 - METHODS AND SYSTEMS FOR ANALYZING NUCLEIC ACID MOLECULES


Abstract

The disclosure provides methods for processing nucleic acid populations containing different forms (e.g., RNA and DNA, single-stranded or double-stranded) and/or extents of modification (e.g., cytosine methylation, association with proteins). These methods accommodate multiple forms and/or modifications of nucleic acid in a sample, such that sequence information can be obtained for multiple forms. The methods also preserve the identity of multiple forms or modified states through processing and analysis, such that analysis of sequence can be combined with epigenetic analysis.

Inventors

Andrew Kennedy

Assignees

GUARDANT HEALTH, INC.

Dates

Publication Date: 2026-05-07
Application Date: 2026-01-05

Claims (20)

1. A method of monitoring residual disease or recurrence of disease in a subject, wherein the disease is cancer, wherein the method comprises: (a) providing a sample comprising at least 10 ng of cell-free DNA (cfDNA); (b) splitting the sample into first and second aliquots; (c) assaying cfDNA of the first aliquot to obtain sequence data of the first aliquot, irrespective of methylation state of the cfDNA, wherein the sequence data of the first aliquot is analyzed for SNVs, indels and/or gene fusions and assaying cfDNA of the second aliquot to obtain sequence data comprising information on the methylation state of the cfDNA, wherein adapters comprising one or more molecular tags are attached to the cfDNA of the first aliquot and/or cfDNA of the second aliquot, wherein nucleic acids derived from the first aliquot and/or the second aliquot are subject to target capture, in which molecules having target sequences are captured for subsequent analysis; and (d) analyzing the sequence data from the first and second aliquots to monitor residual disease or recurrence of disease.
2. The method of claim 1, wherein the nucleic acids derived from the first aliquot and the second aliquot are subject to target capture, in which molecules having target sequences are captured for subsequent analysis.
3. The method of claim 1, wherein the target capture uses a bait set comprising oligonucleotide baits labelled with a capture moiety.
4. The method of claim 3, wherein the capture moiety is biotin.
5. The method of claim 3, wherein the bait set has a higher relative concentration for more specifically desired sequences of interest.
6. The method of claim 1, wherein the method comprises adding a sample tag to the nucleic acids derived from the first aliquot and the nucleic acids derived from the second aliquot.
7. The method of claim 6, wherein the nucleic acids derived from the first aliquot and the second aliquot are combined prior to a sequencing step.
8. The method of claim 1, wherein the nucleic acids derived from the first aliquot and/or the second aliquot are sequenced to a depth of 1,000-50,000 reads per locus.
9. The method of claim 1, wherein the cfDNA is from a bodily fluid sample.
10. The method of claim 9, wherein the bodily fluid sample is blood, serum, or plasma.
11. The method of claim 1, wherein the assaying the cfDNA of the second aliquot comprises bisulfite sequencing.
12. The method of claim 1, wherein the sequence data from the second aliquot is from at least 50,000 sequencing reactions.
13. The method of claim 1, wherein the sequence data from the second aliquot includes sequence coverage of at least 20 different genes.
14. The method of claim 1, wherein the sequence data from the second aliquot includes sequence coverage of at least 200 different genes.
15. The method of claim 1, wherein the sequence data from the first and/or second aliquot indicates the presence of a germline variant.
16. The method of claim 1, wherein the sequence data from the first and/or second aliquot provides for sequence coverage of the genome which is less than 5%, 10% or 15%.
17. The method of claim 1, wherein the method further comprises analysing the sequencing data to select a therapy.
18. The method of claim 1, wherein the cancer is blood cancer, brain cancer, lung cancer, skin cancer, nose cancer, throat cancer, liver cancer, bone cancer, lymphoma, pancreatic cancer, skin cancer, bowel cancer, rectal cancer, thyroid cancer, bladder cancer, kidney cancer, mouth cancer, stomach cancer, a solid state tumor, a heterogeneous tumor, and/or a homogenous tumor.
19. The method of claim 1, wherein the cancer is colorectal cancer.
20. The method of claim 1, wherein the subject is in remission.
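
Illustrative note (not part of the claims): claims 6 and 7 recite adding a sample tag to the nucleic acids derived from each aliquot and combining them before sequencing. One way the pooled reads might later be separated again is sketched below in Python. This is a minimal sketch under stated assumptions: the tag sequences, the 4-base tag length, the tag position at the start of each read, and the names `SAMPLE_TAGS`, `TAG_LENGTH` and `demultiplex` are all hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict

# Hypothetical sample tags. The sequences and the 4-base length are
# assumptions made for illustration only.
SAMPLE_TAGS = {
    "ACGT": "first_aliquot",   # sequenced irrespective of methylation state
    "TGCA": "second_aliquot",  # sequenced for methylation-state information
}
TAG_LENGTH = 4


def demultiplex(pooled_reads):
    """Group pooled reads by the sample tag assumed to start each read."""
    by_aliquot = defaultdict(list)
    for read in pooled_reads:
        tag, insert = read[:TAG_LENGTH], read[TAG_LENGTH:]
        # Reads with an unrecognized tag are kept under "unassigned"
        # rather than silently discarded.
        aliquot = SAMPLE_TAGS.get(tag, "unassigned")
        by_aliquot[aliquot].append(insert)
    return dict(by_aliquot)


# Toy pooled reads: tag followed by a short made-up insert sequence.
pooled = ["ACGTGGATCC", "TGCATTAGGC", "ACGTCCGGAA", "NNNNAAAAAA"]
result = demultiplex(pooled)
```

In this toy example each read's origin is recovered from its tag alone, which is what would allow the two aliquots to share a single sequencing run while still being analyzed separately afterwards.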

Description

REFERENCE TO RELATED PATENT APPLICATIONS

This application is a continuation application of U.S. Utility application Ser. No. 19/025,566, filed Jan. 16, 2025, which is a continuation application of U.S. Utility application Ser. No. 18/770,271, filed Jul. 11, 2024, now issued as U.S. Pat. No. 12,312,634, which is a continuation application of U.S. Utility application Ser. No. 18/625,882, filed Apr. 3, 2024, now issued as U.S. Pat. No. 12,428,670, which is a continuation application of U.S. Utility application Ser. No. 18/061,898, filed Dec. 5, 2022, now issued as U.S. Pat. No. 11,952,616, which is a continuation application of U.S. Utility application Ser. No. 16/450,918, filed Jun. 24, 2019, now issued as U.S. Pat. No. 11,519,019, which is a continuation of International Patent Application No. PCT/US2017/068329, filed Dec. 22, 2017, which claims the benefit of the priority dates of U.S. Provisional Patent Application Nos. 62/438,240, filed Dec. 22, 2016; 62/512,936, filed May 31, 2017 and 62/550,540, filed Aug. 25, 2017, all of which are incorporated by reference herein in their entirety.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been filed electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on Jun. 22, 2023, is named GH0024US-CON42534-756_302_SL.xml and is 2,912 bytes in size.

BACKGROUND

Cancer is a major cause of disease worldwide. Each year, tens of millions of people are diagnosed with cancer around the world, and more than half eventually die from it. In many countries, cancer ranks as the second most common cause of death, after cardiovascular diseases. Early detection is associated with improved outcomes for many cancers. Cancer can be caused by the accumulation of genetic variations within an individual's normal cells, at least some of which result in improperly regulated cell division. Such variations commonly include copy number variations (CNVs), single nucleotide variations (SNVs), gene fusions, and insertions and/or deletions (indels). Epigenetic variations include 5-methylation of cytosine (5-methylcytosine) and association of DNA with chromatin and transcription factors.

Cancers are often detected by biopsies of tumors followed by analysis of cells, markers or DNA extracted from cells. More recently, it has been proposed that cancers can also be detected from cell-free nucleic acids in body fluids, such as blood or urine. Such tests have the advantage that they are noninvasive and can be performed without identifying suspected cancer cells in a biopsy. However, such tests are complicated by the fact that the amount of nucleic acids in body fluids is very low and the nucleic acids that are present are heterogeneous in form (e.g., RNA and DNA, single-stranded and double-stranded, and various states of post-replication modification and association with proteins, such as histones). It is desirable to increase the sensitivity of liquid biopsy assays while reducing the loss of circulating nucleic acid (original material) or data in the process.

SUMMARY

The disclosure provides methods, compositions and systems for analyzing a nucleic acid population comprising at least two forms of nucleic acid selected from double-stranded DNA, single-stranded DNA and single-stranded RNA.
In some embodiments, the method comprises (a) linking at least one of the forms of nucleic acid with at least one tag nucleic acid to distinguish the forms from one another; (b) amplifying the forms of nucleic acid, at least one of which is linked to at least one nucleic acid tag, wherein the nucleic acids and linked nucleic acid tag, if present, are amplified, to produce amplified nucleic acids, of which those amplified from the at least one form are tagged; (c) assaying sequence data of the amplified nucleic acids, at least some of which are tagged; and (d) decoding tag nucleic acid molecules of the amplified nucleic acids to reveal the forms of nucleic acids in the population providing an original template for the amplified nucleic acids linked to the tag nucleic acid molecules for which sequence data has been assayed. In some embodiments, the method further comprises enriching for at least one of the forms relative to one or more of the other forms. In some embodiments, at least 70% of the molecules of each form of nucleic acid in the population are amplified in step (b). In some embodiments, at least three forms of nucleic acid are present in the population and at least two of the forms are linked to different tag nucleic acid forms distinguishing each of the three forms from one another. In some embodiments, each of the at least three forms of nucleic acid in the population is linked to a different tag. In some embodiments, each molecule of the same form is linked to a tag comprising the same identifying information (e.g., a tag with the same sequence or comprising the same sequence). In some embodiments molecules of the same form are linked to different types