EP-4735625-A1 - USING FLOWCELL SPATIAL COORDINATES TO LINK READS FOR IMPROVED GENOME ANALYSIS

Abstract

Methods are provided for assigning nucleic acid sequence reads to target polynucleotides, including providing a substrate having transposome complexes immobilized thereon, wherein the transposome complexes comprise a transposase and a first polynucleotide comprising an end sequence and a first tag; contacting the transposome complexes with target polynucleotides under conditions to fragment the target polynucleotides; amplifying the fragmented target polynucleotides to form a plurality of nucleic acid clusters on the substrate; obtaining location information for the plurality of nucleic acid clusters on the substrate; determining the nucleic acid sequence reads of the fragmented nucleic acids in each of the nucleic acid clusters; and assigning the nucleic acid sequence reads to the target polynucleotides using the obtained location information.
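The abstract says reads are assigned to target polynucleotides using the cluster location information, but it does not say how. The sketch below assumes a simple single-linkage rule. Any two clusters whose Euclidean distance on the substrate is at most a chosen radius are treated as deriving from the same target polynucleotide, and such links are merged transitively with a union-find structure. The `Cluster` record, the `group_by_proximity` function, and the `radius` parameter are hypothetical. Sorting on the x coordinate is only an efficiency device for skipping distant pairs.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical record for one sequenced cluster."""
    cluster_id: str
    x: float   # first spatial coordinate on the substrate (units assumed)
    y: float   # second spatial coordinate on the substrate
    read: str  # base calls for the cluster


def group_by_proximity(clusters, radius):
    """Map each cluster_id to a group label; clusters within `radius` of one
    another (directly or through a chain of neighbours) share a label."""
    parent = list(range(len(clusters)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sorting by x lets the inner loop stop once the x-gap alone exceeds radius.
    order = sorted(range(len(clusters)), key=lambda k: clusters[k].x)
    for pos, a in enumerate(order):
        for b in order[pos + 1:]:
            if clusters[b].x - clusters[a].x > radius:
                break
            dist = math.hypot(clusters[b].x - clusters[a].x,
                              clusters[b].y - clusters[a].y)
            if dist <= radius:
                parent[find(b)] = find(a)  # merge the two groups

    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    labels, result = {}, {}
    for k, c in enumerate(clusters):
        root = find(k)
        labels.setdefault(root, len(labels))
        result[c.cluster_id] = labels[root]
    return result


clusters = [Cluster("c1", 0.0, 0.0, "ACGT"), Cluster("c2", 1.0, 0.0, "GGTA"),
            Cluster("c3", 50.0, 50.0, "TTAG"), Cluster("c4", 2.0, 0.5, "CCAT")]
print(group_by_proximity(clusters, radius=1.5))
# → {'c1': 0, 'c2': 0, 'c3': 1, 'c4': 0}  (c4 joins c1 through its neighbour c2)
```

In practice a suitable radius would depend on factors such as substrate pitch and loading density (compare claim 9). Single linkage can also chain unrelated molecules together at high cluster density, so a scored approach like the one in claims 4 to 8 may be preferable.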

Inventors

WEIR, Jacqueline
ANDREWS, Daniel James
TIAN, YUAN
GORMLEY, NIALL ANTHONY
ZANARELLO, FABIO

Assignees

Illumina, Inc.

Dates

Publication Date: 20260506
Application Date: 20240625

Claims (20)

WHAT IS CLAIMED IS: 1. A method for assigning nucleic acid sequence reads to target polynucleotides comprising: providing transposome complexes, wherein the transposome complexes comprise a transposase and a first polynucleotide comprising an end sequence and a first tag; contacting the transposome complexes with target polynucleotides under conditions to fragment the target polynucleotides; amplifying the fragmented target polynucleotides to form a plurality of nucleic acid clusters on a substrate; obtaining location information for the plurality of nucleic acid clusters on the substrate; determining the nucleic acid sequence reads of the fragmented nucleic acids in each of the nucleic acid clusters; and assigning the nucleic acid sequence reads to the target polynucleotides using the obtained location information.
2. The method of claim 1, wherein a length of the target polynucleotides is greater than a length of the fragments.
3. The method of claim 1, wherein assigning the nucleic acid sequence reads comprises determining the distance between each of the clusters and using the determined distance to assign reads to a specific target polynucleotide.
4. The method of claim 3, wherein assigning the nucleic acid sequence reads comprises determining a likelihood score that at least a first and a second cluster on the substrate derive from the same target polynucleotide.
5. The method of claim 4, further comprising increasing the likelihood score for the first cluster when the spatial distance between at least the first and second clusters is below a threshold value.
6. The method of claim 4, further comprising increasing the likelihood score for the first cluster when a genomic distance between at least the first and second clusters is below a threshold value.
7. The method of claim 5, further comprising increasing the likelihood score for the first cluster when a genomic distance between at least the first and second clusters is below the threshold value.
8. The method of claim 4, further comprising increasing the likelihood score for the first cluster when the spatial distance and a genomic distance between at least the first and second clusters are below a threshold value.
9. The method of any one of claims 5-8, wherein the likelihood score is influenced by a pitch of the substrate, size of the substrate, a pattern of the substrate, temperature, loading density, fragment directionality, or a combination thereof.
10. The method of claim 9, further comprising determining whether the target nucleic acid has a variant when the spatial distance between at least the first and second clusters is below a threshold value and when the genomic distance is above the genomic distance threshold.
11. The method of claim 10, wherein the spatial distance threshold from a cluster forms a pattern of an ellipse or a circle around the cluster.
12. The method of claim 4, wherein the likelihood score of the first cluster is 0.
13. The method of claim 4, wherein the likelihood score of the second cluster is above 30.
14. The method of claim 4, wherein the likelihood score of one or more other clusters is above 30.
15. The method of any one of claims 1-14, wherein the location information comprises a first spatial coordinate and a second spatial coordinate in a Cartesian coordinate system.
16. The method of claim 12, further comprising sorting the plurality of the nucleic acid clusters by their spatial coordinates.
17. The method of claim 1, wherein said transposome complexes comprise a second polynucleotide comprising a region complementary to the transposon end sequence.
18. The method of claim 1, wherein the transposome complexes are present on the substrate at a density of at least 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, or 10¹⁰ or more complexes per mm².
19. The method of claim 1, wherein said transposome complexes comprise a hyperactive Tn5 transposase.
20. The method of claim 1, wherein the substrate comprises microparticles.
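Claims 4 to 11 describe a likelihood score that increases when two clusters are close on the substrate and when their reads are close on the genome. Pairs that are spatially close but genomically distant are examined for a variant. The sketch below shows one possible pairwise scorer. It assumes reads have already been aligned to a reference and that the spatial threshold is an axis-aligned ellipse, which becomes a circle when both semi-axes are equal. The `MappedCluster` record, the function names, the additive weights, and the choice of strict comparisons are hypothetical. The score is a simple additive heuristic, not a calibrated likelihood, because the claims do not specify how the score is computed.

```python
import math
from dataclasses import dataclass


@dataclass
class MappedCluster:
    """Hypothetical record: cluster position plus the read's alignment."""
    x: float    # first spatial coordinate on the substrate
    y: float    # second spatial coordinate on the substrate
    chrom: str  # reference sequence the read aligned to
    pos: int    # aligned position on that reference


def spatially_close(a, b, semi_x, semi_y):
    """True if b lies strictly inside an axis-aligned ellipse centred on a
    (a circle when semi_x == semi_y), cf. claim 11."""
    return ((b.x - a.x) / semi_x) ** 2 + ((b.y - a.y) / semi_y) ** 2 < 1.0


def genomic_distance(a, b):
    """Distance along the reference; infinite for different reference sequences."""
    if a.chrom != b.chrom:
        return math.inf
    return abs(a.pos - b.pos)


def pair_score(a, b, semi_x, semi_y, genomic_threshold,
               spatial_weight=1.0, genomic_weight=1.0):
    """Return (score, possible_variant) for the hypothesis that clusters a and b
    derive from the same target polynucleotide.

    The score starts at 0 and is increased when the pair is spatially close
    (cf. claim 5) and when the genomic distance is below the threshold
    (cf. claim 6). The weights are hypothetical placeholders.
    """
    close_in_space = spatially_close(a, b, semi_x, semi_y)
    gdist = genomic_distance(a, b)
    score = 0.0
    if close_in_space:
        score += spatial_weight
    if gdist < genomic_threshold:
        score += genomic_weight
    # cf. claim 10: spatially close but genomically distant pairs are
    # flagged so the target can be checked for a variant.
    possible_variant = close_in_space and gdist > genomic_threshold
    return score, possible_variant


a = MappedCluster(0.0, 0.0, "chr1", 1000)
c = MappedCluster(3.0, 1.0, "chr5", 200)   # near a on the flowcell, other chromosome
print(pair_score(a, c, semi_x=5.0, semi_y=2.0, genomic_threshold=10000))
# → (1.0, True)
```

Claim 9 lists factors such as substrate pitch, pattern, loading density, and fragment directionality that may influence the score. In this sketch they could enter only through the chosen semi-axes, genomic threshold, and weights.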

Description

ILLINC.801WO / IP-2629-PCT PATENT

USING FLOWCELL SPATIAL COORDINATES TO LINK READS FOR IMPROVED GENOME ANALYSIS

INCORPORATION BY REFERENCE TO ANY PRIORITY APPLICATIONS

[0001] This application claims priority to U.S. Provisional Application No. 63/511,593, filed June 30, 2023, the content of which is incorporated by reference in its entirety.

BACKGROUND

[0002] Traditional nucleic acid sequencing methods, and several types of next-generation sequencing methods, use a shotgun approach to sequence large genomic DNA fragments, called template genomic sequences. Specifically, template genomic sequences are first fragmented in solution into smaller pieces that are amenable to next-generation sequencing on a flowcell. One difficulty of this approach is that, by the time the smaller fragments have been read, knowledge of their connectivity and proximity to each other in the original template genomic sequence is lost. The process of ordering the sequence fragments to reconstruct the original template genomic sequence is generally referred to as "assembly." Assembly can be computationally intensive and time-consuming. In addition, sequencing and assembly errors can become a problem, depending on the sequencing methodology used and the quality of the genomic DNA samples under evaluation.

[0003] Moreover, many genomes of interest contain more than one version of each chromosome. For example, the human genome is diploid, having two sets of chromosomes, one inherited from each parent. Some organisms have polyploid genomes with more than two sets of chromosomes. Examples of polyploid organisms include animals, such as salmon, and many plant species, such as wheat, apple, oat, and sugar cane. When diploid and polyploid genomes are fragmented and sequenced by typical shotgun methods, phasing information, that is, which fragments came from which set of chromosomes, is lost. This phasing information can be difficult or impossible to reconstruct using typical shotgun methods.

[0004] Somewhat similar, yet often more complex, difficulties can arise when mixed samples are evaluated. Mixed samples can contain nucleic acid molecules, such as chromosomes, mRNA transcripts, or plasmids, from two or more organisms. Mixed samples containing multiple organisms are often referred to as metagenomic samples. Other examples of mixed samples are different cells or tissues that, although derived from the same organism, have different characteristics. Examples include cancerous tissues, which may comprise a mixture of healthy and cancerous cells; tissues that may comprise pre-cancerous and cancerous cells; and tissues that may comprise two or more different types of cancerous cells. Indeed, a sample may contain a variety of different types of cancer cells, as is the case for cancer samples that exhibit mosaicism. Another example of different cells derived from a single organism is a mixture of maternal and fetal cells obtained from a pregnant female (e.g., from blood or tissue). When mixed nucleic acid samples are fragmented and sequenced by typical shotgun methods, information about which fragments came from which cell, organism, or other source is lost. This origin information can be difficult or impossible to reconstruct using typical shotgun methods.

SUMMARY

[0005] The methods disclosed herein each have several aspects, no single one of which is solely responsible for their desirable attributes. Without limiting the scope of the claims, some prominent features will now be discussed briefly. Numerous other embodiments are also contemplated, including embodiments that have fewer, additional, and/or different components, steps, features, objects, benefits, and advantages. The components, aspects, and steps may also be arranged and ordered differently. After considering this discussion, and particularly after reading the section entitled “Detailed Description”, one will understand how the features of the devices and methods disclosed herein provide advantages over other known devices and methods.

[0006] In some embodiments, a method for assigning nucleic acid sequence reads to target polynucleotides is provided, the method including providing transposome complexes, wherein the transposome complexes include a transposase and a first polynucleotide including an end sequence and a first tag; contacting the transposome complexes with target polynucleotides under conditions to fragment the target polynucleotides; amplifying the fragmented target polynucleotides to form a plurality of nucleic acid clusters on a substrate; obtaining location information for the plurality of nucleic acid clusters on the substrate; determining the nucleic acid sequence reads of the fragmented nucleic acids in each of the nucleic acid clusters; and assigning the nucleic acid sequence reads to the target polynucleotides using the obtained l