WO-2026093510-A1 - METHODS FOR NUCLEIC ACID ANALYSIS AND FOR GENERATING A LIBRARY OF TAGGED DNA

Abstract

The invention provides methods for generating a library of tagged DNA. The methods comprise cleaving a double-stranded DNA with a complex comprising a transposase bound to a hairpin primer, extending the 3'-ends of each strand in the cleaved DNA fragments such that a complementary hairpin primer is introduced to the DNA strands, allowing the complementary hairpin primer to self-anneal, and extending the complementary hairpin primer to generate the library of tagged DNA. The invention also provides a method of mapping the location of a modified cytosine residue in a double-stranded DNA sample, as well as a DNA library, an isolated transposase complex, and a kit of parts.

Inventors

GOLDER, Paula
CHEN, Jinfeng
VALENT, Iris
CIAU-UTIZ, Romualdo
GOSAL, Walraj
TAIPALE, Minna
KOKKO-GONZALES, Paula
SHI, Chenfu
MONAHAN, Jack
VIVIAN, Julia

Assignees

BIOMODAL LIMITED

Dates

Publication Date: 20260507
Application Date: 20251031
Priority Date: 20241101

Claims (20)

1. A method for generating a library of tagged DNA, the method comprising the steps of: (i) cleaving a double-stranded DNA with a complex comprising a transposase bound to a hairpin primer, to give double-stranded DNA fragments where the hairpin primer is covalently linked to the 5’-end of each strand in each double-stranded DNA fragment; (ii) extending the 3’-ends of each strand in the double-stranded DNA fragments from step (i) to produce DNA fragments comprising a complementary hairpin primer at the 3’-end of each DNA strand; (iii) removing all or a portion of the hairpin primer from the 5’-end of each DNA strand; (iv) allowing the complementary hairpin primer to self-anneal; and (v) extending the complementary hairpin primer to generate a library of tagged DNA having complementary regions covalently linked at an end by the complementary hairpin primer.
2. The method of claim 1, wherein extending the 3’-ends in step (ii) comprises displacing a base-paired region in the hairpin primer of the complementary strand.
3. The method of claim 1 or claim 2, wherein extending the 3’-ends in step (ii) is performed with a strand displacing polymerase to displace a base-paired region in the hairpin primer of the complementary strand.
4. The method of any one of claims 1 to 3, wherein the hairpin primer comprises a cleavage site for removing all or a portion of the hairpin primer from the 5’-end of each strand.
5. The method of any one of claims 1 to 4, wherein the cleavage site is a non-canonical DNA nucleotide, such as a deoxyuridine residue.
6. The method of claim 5, wherein step (iii) comprises contacting the DNA strands with a glycosylase and optionally an endonuclease.
7. The method of claim 6, wherein the hairpin primer comprises a deoxyuridine residue, and wherein step (iii) comprises contacting the DNA strands with a uracil glycosylase and optionally an endonuclease.
8. The method of any one of claims 4 to 7, wherein the hairpin primer comprises a barcode sequence.
9. The method of any one of claims 4 to 8, wherein the cleavage site is positioned 3’- to the barcode sequence.
10. The method of any one of claims 4 to 8, wherein the cleavage site is positioned 5’- to the barcode sequence.
11. The method of any one of claims 4 to 10, wherein the hairpin primer comprises two or more barcode sequences.
12. The method of any one of claims 1 to 11, wherein the transposase is a Tn5 transposase, such as a wild-type Tn5 transposase or a variant thereof.
13. The method of any one of claims 1 to 12, wherein step (i) comprises loading a transposase with the hairpin primer to produce the complex.
14. The method of any one of claims 1 to 13, wherein the double-stranded DNA is genomic DNA.
15. The method of any one of claims 1 to 14, wherein the DNA fragments produced in step (ii) are blunt-ended DNA fragments.
16. The method of any of claims 1 to 15, wherein step (iv) comprises denaturing the DNA fragments prior to self-annealing, such as denaturing complementary regions between the hairpin primer and the complementary hairpin primer.
17. A method for generating a sequencing library, the method comprising the steps of: (a) generating a library of tagged DNA by a method according to any one of claims 1 to 16; and (b) ligating a sequencing adapter to the free ends of the tagged DNA.
18. A method of mapping the location of one or more modified cytosine residues in a double-stranded DNA sample, comprising: (a) generating a DNA library by a method according to any one of claims 1 to 16; (b) deaminating cytosine residues in the DNA library to form a treated DNA library; and (c) sequencing the treated DNA library.
19. The method of claim 18, comprising protecting one or more modified cytosine residues from deamination.
20. The method of claim 18 or claim 19, comprising: identifying a cytosine:guanine base pair in the complementary regions of the DNA in the treated library as the location of a modified cytosine residue; and/or identifying a uracil:guanine or a thymine:guanine base pair in the complementary regions of the DNA in the treated library as the location of a cytosine residue.
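
The identification rule recited in claim 20 can be illustrated with a minimal sketch. It is not part of the claims and is not the applicant's software. The function names classify_pair and map_cytosines are hypothetical, and the sketch assumes that the two complementary regions of a tagged DNA molecule have already been sequenced and oriented so that position i of one region pairs with position i of the other. Only the two rules stated in claim 20 are applied; every other pairing is reported as "other".

```python
def classify_pair(template_base: str, paired_base: str) -> str:
    """Apply the two rules of claim 20 to a single base pair.

    Assumption: template_base comes from the original-strand region and
    paired_base from the complementary region, after deamination.
    """
    t = template_base.upper()
    p = paired_base.upper()
    if p == "G" and t == "C":
        # Cytosine:guanine pair survived deamination -> modified cytosine.
        return "modified_C"
    if p == "G" and t in ("U", "T"):
        # Uracil:guanine or thymine:guanine pair -> unmodified cytosine.
        return "unmodified_C"
    return "other"


def map_cytosines(template_region: str, paired_region: str) -> list:
    """Return (position, call) tuples for every position claim 20 can call."""
    if len(template_region) != len(paired_region):
        raise ValueError("regions must be the same length and pre-aligned")
    calls = []
    for i, (t, p) in enumerate(zip(template_region, paired_region)):
        call = classify_pair(t, p)
        if call != "other":
            calls.append((i, call))
    return calls


# Hypothetical five-base example: position 1 is a protected (modified) C,
# position 2 is a deaminated C read as T, position 3 is a genuine T:A pair.
print(map_cytosines("ACTTG", "TGGAC"))
```

In this toy input the genuine thymine at position 3 pairs with adenine and is therefore not called as a cytosine, which is the distinction the covalently linked complementary regions are meant to preserve.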

Description

METHODS FOR NUCLEIC ACID ANALYSIS

Related Application

This present case is related to, and claims the benefit of, GB 2416165.5 filed on 01 November 2024 (01.11.2024), the contents of which are hereby incorporated by reference in their entirety.

Technical Field

This invention relates to methods for generating DNA libraries using a transposase, and methods of mapping the location of modified cytosine residues in a DNA sample using the DNA libraries. The invention also provides a DNA library, and also a transposase complex and a kit.

Background

Information encoded in nucleic acids is fundamental to the biology of living systems. There are multiple dimensions of information stored within DNA. Genetic sequencing of the DNA bases, G, C, T and A, has been transformed by high-throughput sequencing approaches in the past two decades. Epigenetic information in DNA provides insights into dynamic changes in biology that are closely associated with transcriptional programs (He et al., 2022) and cell fate (Mazid et al., 2022). The combination of genetic and epigenetic information provides a more comprehensive view of biology. More recently, 5-hydroxymethylcytosine (5hmC) has emerged as an important base modification that can provide information that goes beyond 5-methylcytosine (5mC) and genetics (Sprujit et al., 2013, Mellen et al., 2017). Hitherto, researchers have accessed either genetic or epigenetic information, without resolving 5mC from 5hmC.

Commonly used sequencing approaches do not capture full information from both genetics and epigenetics. Next-generation sequencing directly captures the canonical bases G, C, T and A in its readout (Bentley et al., 2008). A number of base-conversion chemistries have been developed to help differentiate unmodified C from its epigenetic variants, 5mC or 5hmC.
These include bisulfite-based approaches such as whole-genome bisulfite sequencing (WGBS) (Frommer et al., 1992) and bisulfite-free approaches such as enzymatic-methyl sequencing (EM-seq) (Vaisvila et al., 2021) and TET-assisted pyridine borane sequencing (Liu et al., 2019). An important shortfall of all such methods is that conversion of either the C base, or one of its epigenetic derivatives, to a U (read as T) compromises the direct detection of genetic C-to-T changes, which is the most common mutation in the mammalian genome (Cagan et al., 2022) and in cancer (Alexandrov et al., 2020). Furthermore, the ambiguity caused by C-to-T conversions in the sequenced reads being mapped against either C or T in the reference genome increases false-positive matches in the search space, consequently making computational alignment and mapping of converted reads slower, more expensive and less accurate (Xi et al., 2009). Also, these existing methods cannot distinguish 5mC from 5hmC in a single workflow.

Methods to distinguish 5hmC from 5mC by exclusively converting only one base have been developed, for example oxidative bisulfite sequencing (Booth et al., 2013), TET-assisted pyridine borane sequencing-beta (Liu et al., 2021), Tet-assisted bisulfite sequencing (Yu et al., 2012) and APOBEC-coupled epigenetic sequencing (Schutsky et al., 2018), or by selectively copying 5mC across strands of DNA (WO 2013/090588; Kawasaki et al., 2017). However, some of these can involve separate, parallel workflows and sequencing to yield full information, which may increase sample requirement, cost and time taken and/or yield data that lack phased information. Combining separate datasets is fraught with difficulties that lead to additive measurement error and coverage gaps across workflows.

There is a need for methods to detect epigenetic modifications in nucleic acids with high accuracy.
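
The shortfall of base-conversion chemistries described above, namely that a genetic C-to-T change becomes indistinguishable from a converted unmodified C, can be illustrated with a toy sketch. The function name deaminate and the six-base sequences are hypothetical. The sketch assumes complete conversion of every unprotected C, models a single strand only, and is not a model of any particular chemistry named in the text.

```python
def deaminate(seq: str, protected: frozenset = frozenset()) -> str:
    """Toy read-out: every unprotected C is read as T after conversion.

    `protected` holds the indices of cytosines assumed to resist conversion
    (for example, modified cytosines in a bisulfite-style workflow).
    """
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq.upper())
    )


reference = "ACGTCA"  # hypothetical fragment with an unmodified C at index 1
variant = "ATGTCA"    # same fragment carrying a genetic C-to-T change at index 1

converted_reference = deaminate(reference)
converted_variant = deaminate(variant)

# Both fragments give the same converted read, so the genetic change at
# index 1 cannot be told apart from an unmodified C on this strand alone.
print(converted_reference, converted_variant)
print(converted_reference == converted_variant)
```

Under these assumptions a single converted strand carries no information that separates the two cases, which is the ambiguity the passage attributes to conversion-based methods.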
Accordingly, the present inventors have developed new methods for generating DNA libraries in high yield that allow for the detection of epigenetic modifications with high sensitivity.

Summary of the Invention

At its most general, the present invention relates to a method for generating a library of tagged DNA strands from a DNA sample, the tagged DNA strands each having a template strand region and a complementary strand region, which regions are covalently linked together by a hairpin tag. Libraries generated in this way are useful for simultaneously decoding the genetic and the epigenetic sequences from a DNA sample. Any modified nucleobases that are present in the template strand are preserved in the tagged DNA, and the resultant libraries can be used to differentiate modified nucleobases from canonical nucleobases in the template strand with high accuracy.

The method includes fragmenting a piece of DNA and covalently linking a hairpin primer to the 5’-end of each DNA fragment using a transposase enzyme. The 3’-ends of the resultant fragments are extended, using the hairpin primer that is covalently linked to the complementary strand within the same fragment as a template. This generates DNA strands each having a 3’-end hairpin primer, with the primer be