JP-7856276-B2 - Nucleic acid sequencing by emergence

Inventors

Mir, Kalim

Assignees

XGenomes Corporation

Dates

Publication Date: 2026-05-11
Application Date: 2018-11-29
Priority Date: 2017-11-29

Claims (20)

A method for sequencing a nucleic acid, the method comprising: (a) immobilizing the nucleic acid on a test substrate in a double-stranded, linearized, extended form, thereby forming an immobilized extended double-stranded nucleic acid; (b) denaturing the immobilized extended double-stranded nucleic acid into single-stranded form on the test substrate, thereby obtaining an immobilized first strand and an immobilized second strand of the nucleic acid, wherein each base of the immobilized second strand lies adjacent to the corresponding complementary base of the immobilized first strand; (c) exposing the immobilized first strand and the immobilized second strand to a respective oligonucleotide probe in a set of oligonucleotide probes, wherein each oligonucleotide probe in the set of oligonucleotide probes has a predetermined sequence and length and includes a label selected from the group consisting of dyes, fluorescent nanoparticles, light-scattering particles, and FRET partners, and wherein the exposure (c) occurs under conditions that cause each individual probe of the respective oligonucleotide probe to bind to a portion of the immobilized first strand or the immobilized second strand that is complementary to the respective oligonucleotide probe and to form a respective heteroduplex, thereby giving rise to a respective occurrence of optical activity; (d) measuring, using a two-dimensional imager, the location on the test substrate and the duration of each occurrence of optical activity during the exposure (c); (e) repeating the exposure (c) and the measurement (d) for each oligonucleotide probe in the set of oligonucleotide probes, thereby obtaining a plurality of sets of positions on the test substrate, wherein each set of positions on the test substrate corresponds to one oligonucleotide probe in the set of oligonucleotide probes; and (f) determining the sequence of at least a portion of the nucleic acid from the plurality of sets of positions on the test substrate by compiling the oligonucleotide probe sequences represented by the plurality of sets of positions on the test substrate.
The method according to claim 1, wherein the exposure (c) occurs under conditions that cause each individual probe of the respective pool of the respective oligonucleotide probe to bind transiently and reversibly to a portion of the immobilized first strand or the immobilized second strand that is complementary to the individual probe and to form the respective heteroduplex, thereby giving rise to the occurrence of optical activity.
The method according to claim 1, wherein the exposure (c) occurs under conditions that cause each individual probe of the respective pool of the respective oligonucleotide probe to bind repeatedly, transiently, and reversibly to a portion of the immobilized first strand or the immobilized second strand that is complementary to the individual probe and to form the respective heteroduplex, thereby repeatedly giving rise to the occurrence of optical activity.
The method according to claim 1, wherein, in the exposure (c), each oligonucleotide probe in the set of oligonucleotide probes is conjugated to a label.
The method according to claim 1, wherein the exposure (c) occurs in the presence of a first label in the form of a dye, each oligonucleotide probe in the set of oligonucleotide probes is bound to a second label, and the first label causes the second label to fluoresce when the first label and the second label are in close proximity to each other.
The method according to claim 1, wherein one or more oligonucleotide probes in the set of oligonucleotide probes are exposed to the immobilized first strand and the immobilized second strand during exposure (c).
The method according to claim 1, wherein different oligonucleotide probes in the set of oligonucleotide probes exposed to the immobilized first strand and the immobilized second strand during the exposure (c) are associated with different labels.
The method according to claim 1, wherein the exposure (c) is performed on the first oligonucleotide probe in the set of oligonucleotide probes at a first temperature, and the repetition (e) of the exposure (c) and the measurement (d) is performed on the first oligonucleotide probe at a second temperature.
The method according to claim 1, wherein the exposure (c) is performed on a first oligonucleotide probe in the set of oligonucleotide probes at a first temperature, and the repetition (e) of the exposure (c) and the measurement (d) includes performing the exposure (c) and the measurement (d) on the first oligonucleotide probe at each of a plurality of different temperatures, the method further comprising constructing a melt curve for the first oligonucleotide probe using the locations and durations of optical activity measured by the measurement (d) at the first temperature and at each of the plurality of different temperatures.
The method according to claim 1, wherein the set of oligonucleotide probes comprises a plurality of subsets of the oligonucleotide probes, and the repetition (e) of exposure (c) and measurement (d) is performed for each respective subset of oligonucleotide probes in the plurality of subsets of oligonucleotide probes.
The method according to claim 1, wherein measuring the location on the test substrate includes identifying each occurrence of optical activity in an image frame of data obtained by the two-dimensional imager and fitting it with a Gaussian function or a Fourier transform to identify its center, and the center of each occurrence of optical activity is taken as the position of that occurrence on the test substrate.
The method according to claim 1, wherein each occurrence of optical activity persists across a plurality of image frames measured by the two-dimensional imager, measuring the location on the test substrate includes identifying each occurrence of optical activity across the plurality of image frames and fitting it with a Gaussian function or a Fourier transform to identify its center across the plurality of image frames, and the center of each occurrence of optical activity across the plurality of image frames is taken as the position of that occurrence on the test substrate.
The method according to claim 1, wherein measuring the location on the test substrate includes inputting an image frame of data measured by the two-dimensional imager into a trained convolutional neural network, the image frame of data includes a plurality of occurrences of optical activity, each occurrence of optical activity in the plurality of occurrences of optical activity corresponds to an individual probe bound to a portion of the immobilized first strand or the immobilized second strand, and, in response to the input, the trained convolutional neural network identifies the location on the test substrate of one or more occurrences of optical activity in the plurality of occurrences of optical activity.
The method according to claim 11 or 12, wherein the measurement resolves the center of each occurrence of optical activity to a position on the test substrate with a positional accuracy of at least 20 nm.
The method according to any one of claims 1 to 14, wherein the measurement (d) of the location on the test substrate and the duration of each occurrence of optical activity measures more than 5,000 photons at the location.
The method according to any one of claims 1 to 15, wherein each occurrence of optical activity exceeds the background observed on the test substrate by more than a predetermined number of standard deviations.
The method according to claim 16, wherein the predetermined number of standard deviations is greater than 3.
The method according to any one of claims 1 to 17, wherein each oligonucleotide probe in the set of oligonucleotide probes contains a unique N-mer sequence, where N is an integer in the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, and all unique N-mer sequences of length N are represented in the set of oligonucleotide probes.
The method according to claim 18, wherein the unique N-mer sequence includes one or more nucleotide positions occupied by degenerate nucleotides.
The method according to any one of claims 1 to 19, wherein the test substrate is washed before each repetition of the exposure (c) and the measurement (d), thereby removing each oligonucleotide probe from the test substrate before the test substrate is exposed to another oligonucleotide probe in the set of oligonucleotide probes.
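
By way of illustration only, the compiling of step (f) with a complete N-mer probe set (claim 18) might be sketched as follows. This is a minimal sketch under strong simplifying assumptions, not the claimed implementation: positions are idealized as noise-free, 0-based base offsets along one immobilized strand (real measurements are substrate coordinates with finite positional accuracy), each probe is identified by the target-site sequence it reports rather than by its own complementary sequence, and the function names `all_nmers`, `simulate_position_sets`, and `compile_sequence` are hypothetical.

```python
from itertools import product


def all_nmers(n, alphabet="ACGT"):
    """Return every unique N-mer over the alphabet (4**n sequences for DNA)."""
    return ["".join(bases) for bases in product(alphabet, repeat=n)]


def simulate_position_sets(target, sites):
    """Hypothetical stand-in for steps (c) to (e).

    For each site sequence, list the 0-based start positions at which that
    sequence occurs in `target`. Assumption: each site is written as the
    target sequence it reports, not as the complementary probe sequence.
    """
    position_sets = {}
    for site in sites:
        last_start = len(target) - len(site)
        position_sets[site] = [
            start
            for start in range(last_start + 1)
            if target.startswith(site, start)
        ]
    return position_sets


def compile_sequence(position_sets):
    """Sketch of step (f): overlay every located site to rebuild the sequence.

    Positions not covered by any site are reported as "N". Conflicting base
    calls at the same position raise ValueError.
    """
    calls = {}
    for site, positions in position_sets.items():
        for start in positions:
            for offset, base in enumerate(site):
                index = start + offset
                if calls.setdefault(index, base) != base:
                    raise ValueError(f"conflicting base calls at position {index}")
    if not calls:
        return ""
    return "".join(calls.get(i, "N") for i in range(max(calls) + 1))


if __name__ == "__main__":
    target = "GATTACAGATTC"
    probes = all_nmers(3)
    sets = simulate_position_sets(target, probes)
    print(len(probes))             # 64
    print(sets["GAT"])             # [0, 7]
    print(compile_sequence(sets))  # GATTACAGATTC
```

In this idealized setting, no single probe carries the sequence; it emerges only when the position sets of all probes are overlaid. Handling measurement noise, mismatched binding, and the two complementary strands is outside the scope of the sketch.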

Description

Cross-reference to related applications

This application claims priority to U.S. Patent Application No. 62/591,850, titled “Sequencing by Emergence,” filed November 29, 2017, which is incorporated herein by reference.

This disclosure generally relates to systems and methods for sequencing nucleic acids via the transient binding of probes to one or more polynucleotides.

Background

DNA sequencing was initially made possible by gel electrophoresis-based methods, namely dideoxy chain termination (e.g., Sanger et al., Proc. Natl. Acad. Sci. 74:5463–5467, 1977) and chemical degradation (e.g., Maxam et al., Proc. Natl. Acad. Sci. 74:560–564, 1977). Both of these methods were time-consuming and expensive. Nevertheless, the former, at a cost of hundreds of millions of dollars over more than a decade, led to the first sequencing of the human genome.

As the dream of personalized medicine draws closer to reality, there is a growing demand for inexpensive, large-scale methods for sequencing individual human genomes (Mir, Sequencing Genomes: From Individuals to Populations, Briefings in Functional Genomics and Proteomics, 8:367–378, 2009). Several sequencing methods that avoid gel electrophoresis (and are consequently less expensive) have been developed as "next-generation sequencing." One such method, which uses reversible terminators (as implemented by Illumina Inc.), is the leading one. The detection methods used in the most advanced form of Sanger sequencing and in the currently leading Illumina technology rely on fluorescence. Other possible means of detecting single-nucleotide additions include detection of proton release (e.g., via field-effect transistors), ion currents through nanopores, and electron microscopy.
Illumina chemistry involves the cyclic addition of nucleotides using reversible terminators (Canard et al.; Metzker et al., Nucleic Acids Research 22:4259–4267, 1994) that bear fluorescent labels (Bentley et al., Nature 456:53–59, 2008). Illumina sequencing starts with clonal amplification of single genomic molecules and requires substantial sample pretreatment to convert the target genome into a library, which is then clonally amplified as clusters.

Two methods subsequently emerged that circumvent the requirement for amplification before sequencing. Both perform fluorescent sequencing by synthesis (SbS) on single molecules of DNA. The first, by HelicosBio (now SeqLL), performs stepwise SbS with reversible termination (Harris et al., Science 320:106–9, 2008). The second, SMRT sequencing by Pacific Biosciences, places labels on the terminal phosphate, which is a natural leaving group in the nucleotide incorporation reaction, allowing sequencing to proceed continuously without reagent exchange. One drawback of this approach is its low throughput, as the detector must remain fixed on a single field of view (e.g., Levene et al., Science 299:682–686, 2003 and Eid et al., Science 323:133–8, 2009). A somewhat similar approach to Pacific Biosciences sequencing is a method under development by Genia (now part of Roche), which detects SbS via nanopores rather than optically.

The most commonly used sequencing methods have limited read lengths, which increases both the cost of sequencing and the difficulty of assembling the resulting reads. Sanger sequencing yields reads in the range of 1,000 bases (e.g., Kchouk et al., Biol. Med. 9:395, 2017). Roche 454 sequencing and Ion Torrent both produce reads of several hundred bases. Illumina sequencing, which started with reads of approximately 25 bases, now typically produces 150–300 base pair reads.
However, because each base in the read requires a fresh supply of reagents, sequencing 250 bases rather than 25 takes 10 times longer and consumes 10 times more of the costly reagents. In recent years, the standard read length for Illumina instruments has been reduced to approximately 150 base pairs, likely because longer reads are more susceptible to phasing (in which molecules within a cluster fall out of synchronization), which introduces errors.

The longest read lengths achievable with commercially available systems are obtained by Oxford Nanopore Technologies (ONT) nanopore strand sequencing and Pacific Biosciences (PacBio) sequencing (e.g., Kchouk et al., Biol. Med. 9:395, 2017). The latter routinely yields reads averaging approximately 10,000 base pairs, while the former can, though very rarely, produce reads of several hundred kilobase pairs (e.g., Laver et al., Biomol. Det. Quant. 3:1–8, 2015). While such long reads are desirable for alignment, they come at the cost of accuracy. Because their accuracy is often extremely low, these methods cannot be used as standalone sequencing technologies for most human sequencing applications, although they can supplement Illumina sequencing. Furthermore, the processi