WO-2026093508-A1 - METHODS FOR NUCLEIC ACID ANALYSIS
Abstract
The invention provides methods for generating a DNA library from single cells or single nuclei. The methods comprise compartmentalising single cells or single nuclei, lysing the cells or nuclei to release genomic DNA, fragmenting the genomic DNA, introducing first and second hairpin polynucleotides to the DNA fragments, and extending a hairpin polynucleotide along the DNA strand to generate a library of barcoded DNA. The invention also provides methods for DNA sequencing and methods for mapping the location of modified cytosine residues, as well as kits for use with the methods.
Inventors
- GOLDER, Paula
- CHEN, Jinfeng
- VALENT, Iris
- CIAU-UTIZ, Romualdo
- GOSAL, Walraj
- TAIPALE, Minna
- KOKKO-GONZALES, Paula
- SHI, Chenfu
- MONAHAN, Jack
- VIVIAN, Julia
Assignees
- BIOMODAL LIMITED
Dates
- Publication Date: 20260507
- Application Date: 20251031
- Priority Date: 20241101
Claims (20)
- 1. A method for generating a DNA library, the method comprising the steps of: (a) providing a plurality of compartments, each compartment comprising a single cell, or a nucleus from a single cell; (b) lysing the compartmentalised cells or nuclei to release genomic DNA into each compartment; (c) fragmenting the genomic DNA to produce double-stranded DNA fragments, and introducing a first hairpin polynucleotide to the 5’-end of each DNA strand in the fragments and a second hairpin polynucleotide to the 3’-end of each DNA strand in the fragments, wherein at least one of the first and second hairpin polynucleotides comprises a barcode sequence that differs between each compartment, thereby producing uniquely labelled DNA in each compartment; (d) cleaving the first hairpin polynucleotide that is attached to the 5’-end of each DNA strand; and (e) allowing the second hairpin polynucleotide at the 3’-end of each DNA strand to self-anneal, and extending the second hairpin polynucleotide along the DNA strand, to generate a library of barcoded DNA strands each having complementary regions covalently linked by the second hairpin polynucleotide.
- 2. The method of claim 1, wherein each first hairpin polynucleotide comprises a first barcode sequence, wherein the first barcode sequence is different between each compartment, and wherein step (d) comprises selectively cleaving the first hairpin polynucleotide at a site 5’- to the first barcode sequence.
- 3. The method of claim 2, comprising pooling the DNA fragments from two or more compartments after introducing the first hairpin polynucleotide.
- 4. The method of claim 3, wherein the pooling is performed prior to introducing the second hairpin polynucleotide.
- 5. The method of any one of claims 2 to 4, wherein each second hairpin polynucleotide comprises a second barcode sequence.
- 6. The method of any one of claims 2 to 5, wherein the first hairpin polynucleotide comprises a non-canonical DNA nucleotide 5’- to the first barcode sequence.
- 7. The method of claim 6, wherein the non-canonical DNA nucleotide is a deoxyuridine residue.
- 8. The method of claim 6 or claim 7, wherein step (d) comprises contacting the DNA with a DNA glycosylase and optionally an endonuclease.
- 9. The method of any one of claims 1 to 8, wherein step (b) comprises contacting the cells or nuclei or lysate thereof with proteinase K.
- 10. The method of any one of claims 1 to 9, wherein step (c) comprises introducing the first hairpin polynucleotide to each DNA strand in a fragment and extending the 3’-end of the complementary DNA strand in said fragment to introduce the second hairpin polynucleotide to the complementary DNA strand.
- 11. The method of claim 10, wherein extending the 3’-ends in step (c) is performed with a strand displacing polymerase to displace a base-paired region in the first hairpin polynucleotide of the complementary DNA strand.
- 12. The method of claim 10 or claim 11, wherein step (c) comprises extending the 3’-end of the complementary DNA strand to produce blunt-ended DNA fragments.
- 13. The method of any one of claims 1 to 12, wherein step (c) comprises cleaving the genomic DNA with a complex comprising a transposase bound to a barcoded hairpin polynucleotide, to give double-stranded DNA fragments wherein the first hairpin polynucleotide is covalently linked to the 5’-end of each DNA strand.
- 14. The method of claim 13, wherein the transposase is a Tn5 transposase, such as a wild-type Tn5 transposase or a variant thereof.
- 15. The method of claim 13 or claim 14, wherein step (c) comprises loading a transposase with the hairpin primer to produce the complex.
- 16. The method of any one of claims 1 to 15, wherein step (c) comprises ligating a barcoded hairpin polynucleotide to the 5’-end of the DNA fragments, to give double-stranded DNA fragments wherein the first hairpin polynucleotide is covalently linked to the 5’-end of each DNA strand.
- 17. The method of any one of claims 1 to 16, wherein the plurality of compartments is a plurality of wells in a multi-well plate, such as a 96-well plate or a 384-well plate.
- 18. The method of any of claims 1 to 17, wherein step (e) comprises denaturing the DNA fragments prior to self-annealing of the second hairpin polynucleotide.
- 19. A method for sequencing DNA from a sample single cell, or the nucleus thereof, the method comprising the steps of: (i) providing a plurality of single cells comprising the sample single cell, or a plurality of nuclei comprising the nucleus of the sample single cell; (ii) distributing each single cell or single nucleus into respective compartments to provide a plurality of compartmentalised single cells or single nuclei; (iii) generating a DNA library from the compartmentalised cells or nuclei by a method according to any one of claims 1 to 18, wherein DNA from the sample single cell is uniquely labelled with one or more barcode sequences; and (iv) sequencing the library, and identifying the DNA from the sample single cell by the one or more barcode sequences.
- 20. A method for mapping the location of a modified cytosine residue from a sample single cell, the method comprising the steps of: (i) providing a plurality of single cells comprising the sample single cell, or a plurality of nuclei comprising the nucleus of the sample single cell; (ii) distributing each single cell or single nucleus into respective compartments to provide a plurality of compartmentalised single cells or single nuclei; (iii) generating a DNA library from the compartmentalised cells or nuclei by a method according to any one of claims 1 to 18, wherein DNA from the sample single cell is uniquely labelled with one or more barcode sequences; (iv) deaminating cytosine residues in the DNA library to form a treated DNA library; and (v) sequencing the treated library, and identifying the DNA from the sample single cell by the one or more barcode sequences.
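The final steps of claims 19 and 20 (identifying DNA from a cell by its barcode, then locating modified cytosines in the treated library) can be pictured with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the claims: the barcode is taken to sit at the start of each read with a fixed length of 8, matching is exact, the function names `demultiplex` and `call_modified_cytosines` are hypothetical, and the conversion model is the simplified one in which an unmodified C reads as T after deamination while a modified C still reads as C.

```python
from collections import defaultdict

BARCODE_LEN = 8  # assumed barcode length; the claims do not specify one


def demultiplex(reads, barcode_to_cell, barcode_len=BARCODE_LEN):
    """Group reads by cell using an exact match on the leading barcode."""
    by_cell = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        cell = barcode_to_cell.get(barcode)
        if cell is not None:  # reads with an unknown barcode are dropped
            by_cell[cell].append(insert)
    return dict(by_cell)


def call_modified_cytosines(genetic_seq, converted_seq):
    """Return 0-based positions where a genetic C still reads as C after conversion."""
    if len(genetic_seq) != len(converted_seq):
        raise ValueError("sequences must have equal length")
    return [
        i
        for i, (g, c) in enumerate(zip(genetic_seq, converted_seq))
        if g == "C" and c == "C"
    ]


# Toy data: two known barcodes and one read carrying an unknown barcode.
barcode_to_cell = {"AAAACCCC": "cell_1", "GGGGTTTT": "cell_2"}
reads = [
    "AAAACCCC" + "ATCGTTGA",
    "GGGGTTTT" + "ATTGTCGA",
    "ACGTACGT" + "ATCGTCGA",  # unknown barcode, discarded
]
genetic_seq = "ATCGTCGA"  # assumed known genetic sequence of the locus

demuxed = demultiplex(reads, barcode_to_cell)
calls = {
    cell: [call_modified_cytosines(genetic_seq, insert) for insert in inserts]
    for cell, inserts in demuxed.items()
}
```

In this toy input, each cell retains a C at a different position of the same locus, so the per-cell calls differ. A practical pipeline would also need error-tolerant barcode matching and read alignment, which this sketch omits.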
Description
METHODS FOR NUCLEIC ACID ANALYSIS
Related Application
The present case is related to, and claims the benefit of, GB 2416167.1 filed on 01 November 2024 (01.11.2024), the contents of which are hereby incorporated by reference in their entirety.
Technical Field
This invention relates to methods for generating a DNA library from single cells. The invention also provides a method for sequencing DNA from a single cell, and methods for mapping the location of modified cytosine residues from a single cell.
Background
Information encoded in nucleic acids is fundamental to the biology of living systems. There are multiple dimensions of information stored within DNA. Genetic sequencing of the DNA bases G, C, T and A has been transformed by high-throughput sequencing approaches in the past two decades. Epigenetic information in DNA provides insights into dynamic changes in biology that are closely associated with transcriptional programs (He et al., 2022) and cell fate (Mazid et al., 2022). The combination of genetic and epigenetic information provides a more comprehensive view of biology.
More recently, 5-hydroxymethylcytosine (5hmC) has emerged as an important base modification that can provide information that goes beyond 5-methylcytosine (5mC) and genetics (Sprujit et al., 2013, Mellen et al., 2017). Hitherto, researchers have accessed either genetic or epigenetic information, without resolving 5mC from 5hmC.
Commonly used sequencing approaches do not capture full information from both genetics and epigenetics. Next-generation sequencing directly captures the canonical bases G, C, T and A in its readout (Bentley et al., 2008). A number of base-conversion chemistries have been developed to help differentiate unmodified C from its epigenetic variants, 5mC or 5hmC.
These include bisulfite-based approaches such as whole-genome bisulfite sequencing (WGBS) (Frommer et al., 1992) and bisulfite-free approaches such as enzymatic-methyl sequencing (EM-seq) (Vaisvila et al., 2021) and TET-assisted pyridine borane sequencing (Liu et al., 2019). An important shortfall of all such methods is that conversion of either the C base, or one of its epigenetic derivatives, to a U (read as T) compromises the direct detection of genetic C-to-T changes, which are the most common mutations in the mammalian genome (Cagan et al., 2022) and in cancer (Alexandrov et al., 2020). Furthermore, the ambiguity caused by C-to-T conversions in the sequenced reads being mapped against either C or T in the reference genome increases false-positive matches in the search space, consequently making computational alignment and mapping of converted reads slower, more expensive and less accurate (Xi et al., 2009). Also, these existing methods cannot distinguish 5mC from 5hmC in a single workflow.
Methods to distinguish 5hmC from 5mC by exclusively converting only one base have been developed, for example oxidative bisulfite sequencing (Booth et al., 2013), TET-assisted pyridine borane sequencing-beta (Liu et al., 2021), TET-assisted bisulfite sequencing (Yu et al., 2012) and APOBEC-coupled epigenetic sequencing (Schutsky et al., 2018), or by selectively copying 5mC across strands of DNA (WO 2013/090588; Kawasaki et al., 2017). However, some of these can involve separate, parallel workflows and sequencing to yield full information, which may increase sample requirement, cost and time taken and/or yield data that lack phased information. Combining separate datasets is fraught with difficulties that lead to additive measurement error and coverage gaps across workflows.
There is a need for methods to detect epigenetic modifications in nucleic acids, particularly for single cell applications.
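The C-to-T ambiguity described above can be made concrete with a small Python sketch. It is an illustrative model only, assuming the bisulfite-style case in which every unmodified C reads as T and every modified C still reads as C. The function name `convert`, the sequences and the modified position are hypothetical and chosen only to show the effect.

```python
def convert(seq, modified_positions=frozenset()):
    """Simulate a conversion readout: unmodified C reads as T, modified C stays C."""
    return "".join(
        "T" if base == "C" and i not in modified_positions else base
        for i, base in enumerate(seq)
    )


reference = "ACGTCGA"  # unmodified C at position 1, modified C at position 4
variant = "ATGTCGA"    # same locus carrying a genetic C-to-T change at position 1

read_ref = convert(reference, modified_positions={4})
read_var = convert(variant, modified_positions={4})

# Both templates give the same converted read, so the read alone cannot tell
# an unmodified C from a genuine C-to-T mutation at position 1.
ambiguous = read_ref == read_var
```

Under this model the two converted reads are identical, which is why a single converted strand cannot separate the genetic signal from the epigenetic one.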
Accordingly, the present inventors have developed new methods for generating DNA libraries from single cells that allow for the detection of epigenetic modifications with high sensitivity.
Summary of the Invention
At its most general, the present invention relates to a method of generating a DNA library from a single cell by compartmentalising single cells or nuclei from single cells, fragmenting the DNA, and attaching a barcode to each DNA fragment, as well as a 3’-end hairpin polynucleotide. The method comprises self-annealing of the 3’-end hairpin polynucleotide, which is then extended along the DNA fragment, to provide a library of barcoded, hairpin-tagged DNA fragments comprising an original strand and a copy strand. The barcode can be used to identify DNA from each cell. The methods of the invention are thus useful in single-cell sequencing. The methods are particularly useful for identifying both genetic and epigenetic features from single-cell DNA, since epigenetic modifications are preserved in the original DNA strand, while the copy strand allows the genetic sequence to be accurately mapped. The decoding of bases across an original strand and a copy strand provides a simultaneous readout of genetic and epigenetic bases with high accuracy.
In a first aspect, the invention p