US-12620098-B2 - Creating a template of nucleic acid site locations on a flow cell
Abstract
Methods and systems for analysis of image data generated from various reference points. Particularly, the methods and systems provided are useful for real time analysis of image and sequence data generated during DNA sequencing methodologies.
Inventors
- Francisco Jose Garcia
- Klaus Maisinger
- Stephen Tanner
- John A. Moon
- Tobias Mann
- Michael Lawrence Parkinson
- Anthony James Cox
- Haifang H. Ge
Assignees
- ILLUMINA, INC.
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-07-08
Claims (20)
- 1. A method for creating a template of nucleic acid site locations on a flow cell, comprising: obtaining fluorescent emission signals having different emission spectra from nucleic acid sites on a flow cell, wherein the different emission spectra are detected in different detection channels; identifying candidate site locations on the flow cell from the fluorescent emission signals; processing intensities of the fluorescent emission signals to determine a plurality of parameters for the candidate site locations, wherein at least one parameter of the plurality of parameters is indicative of relative intensities of the fluorescent emission signals detected at the candidate site locations; and generating a template of locations of the nucleic acid sites on the flow cell based on the at least one parameter of the plurality of parameters.
- 2. The method of claim 1, further comprising determining base calls for nucleic acids bound to the flow cell at the nucleic acid sites according to the template.
- 3. The method of claim 2, wherein the base calls are determined over a plurality of cycles, each cycle of the plurality of cycles comprising delivery of different types of nucleotides to the nucleic acid sites.
- 4. The method of claim 3, wherein the different types of nucleotides are delivered simultaneously to the nucleic acid sites.
- 5. The method of claim 3, further comprising determining phasing parameters based on intensities of additional fluorescent emission signals detected during the plurality of cycles; and generating corrected intensities based on the phasing parameters.
- 6. The method of claim 3, further comprising determining cross-talk parameters based on the intensities of the additional fluorescent emission signals detected during the plurality of cycles; and generating corrected intensities based on the cross-talk parameters.
- 7. The method of claim 6, wherein each cross-talk parameter of the cross-talk parameters comprises an angle.
- 8. The method of claim 1, further comprising registering one or more images to the template, wherein the one or more images comprise fluorescent emission signals indicative of nucleotides present at the nucleic acid sites.
- 9. The method of claim 8, wherein registering the one or more images to the template comprises aligning the template and the one or more images based on an offset.
- 10. The method of claim 8, wherein registering the one or more images to the template comprises aligning the template and the one or more images using an affine transformation.
- 11. The method of claim 1, wherein the processing comprises: determining a first parameter of the plurality of parameters based on a first function comprising the intensities of the fluorescent emission signals, wherein the intensities of the fluorescent emission signals are detected at a first candidate site location; and determining a second parameter of the plurality of parameters based on a second function comprising the intensities of the fluorescent emission signals, wherein the intensities of the fluorescent emission signals are detected at a second candidate site location.
- 12. The method of claim 11, wherein the first function comprises a first ratio, and the second function comprises a second ratio.
- 13. The method of claim 12, wherein the first ratio comprises a highest intensity fluorescent emission signal and a second highest intensity fluorescent emission signal of the fluorescent emission signals detected at the first candidate site location; and wherein the second ratio comprises a highest intensity fluorescent emission signal and a second highest intensity fluorescent emission signal of the fluorescent emission signals detected at the second candidate site location.
- 14. The method of claim 11, further comprising determining that the first candidate site location and the second candidate site location are within a defined distance.
- 15. The method of claim 14, wherein the defined distance is measured in pixels or sub-pixels.
- 16. The method of claim 14, wherein the defined distance is a radius.
- 17. The method of claim 14, further comprising adjusting the defined distance based on a density of the nucleic acid sites.
- 18. The method of claim 14, further comprising excluding from the template the first candidate site location or the second candidate site location when the first candidate site location and the second candidate site location are within the defined distance.
- 19. The method of claim 13, further comprising including in the template the first candidate site location or the second candidate site location based on comparing the first parameter and the second parameter.
- 20. The method of claim 1, wherein the relative intensities comprise refined signal intensities.
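As an illustration only, the selection logic recited in claims 11 through 19 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed or patented implementation: the names `Candidate`, `intensity_ratio`, and `build_template` are hypothetical, each candidate is assumed to carry at least two channel intensities, and the rule that the candidate with the higher ratio is retained is an assumption (the claims state only that the parameters are compared).

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate site: pixel coordinates plus one intensity per channel."""
    x: float
    y: float
    intensities: tuple


def intensity_ratio(intensities):
    """Ratio of the highest to the second highest channel intensity."""
    top, second = sorted(intensities, reverse=True)[:2]
    return top / second if second > 0 else float("inf")


def build_template(candidates, radius):
    """Keep one candidate per neighborhood of the given radius (in pixels).

    Assumption: when two candidates lie within `radius` of each other,
    the one with the higher intensity ratio is retained.
    """
    ranked = sorted(candidates, key=lambda c: intensity_ratio(c.intensities), reverse=True)
    kept = []
    for cand in ranked:
        # Exclude the candidate if any already-kept site is within the radius.
        if all(hypot(cand.x - k.x, cand.y - k.y) > radius for k in kept):
            kept.append(cand)
    return [(c.x, c.y) for c in kept]


candidates = [
    Candidate(10.0, 10.0, (900.0, 100.0, 80.0, 60.0)),  # ratio 9.0
    Candidate(10.8, 10.3, (500.0, 400.0, 90.0, 70.0)),  # ratio 1.25, close to the first
    Candidate(30.0, 12.0, (120.0, 700.0, 60.0, 50.0)),  # isolated site
]
print(build_template(candidates, radius=1.5))  # [(10.0, 10.0), (30.0, 12.0)]
```

In this sketch the second candidate is dropped because it lies within 1.5 pixels of a candidate with a higher ratio. A denser flow cell would presumably call for a smaller `radius`, in the spirit of claim 17.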
Description
RELATED APPLICATIONS
This application is a continuation of U.S. application Ser. No. 19/013,507, filed Jan. 8, 2025, which is a continuation of U.S. application Ser. No. 18/303,436, filed Apr. 19, 2023, now U.S. Pat. No. 12,223,651 issued Feb. 11, 2025, which is a continuation of U.S. application Ser. No. 17/445,994 filed Aug. 26, 2021, now U.S. Pat. No. 11,676,275 issued Jun. 13, 2023, which is a continuation of U.S. application Ser. No. 17/157,622 filed Jan. 25, 2021, now U.S. Pat. No. 11,605,165 issued Mar. 14, 2023, which is a continuation of U.S. application Ser. No. 16/378,894 filed Apr. 9, 2019, which is a continuation of U.S. application Ser. No. 15/354,540 filed Nov. 17, 2016, now U.S. Pat. No. 10,304,189 issued May 28, 2019, which is a continuation of U.S. application Ser. No. 14/608,471 filed Jan. 29, 2015, now U.S. Pat. No. 9,530,207 issued Dec. 27, 2016, which is a continuation of U.S. application Ser. No. 13/006,206 filed Jan. 13, 2011, now U.S. Pat. No. 8,965,076 issued Feb. 24, 2015, which claims the benefit of U.S. Provisional Application No. 61/294,811 filed on Jan. 13, 2010 and U.S. Provisional Application No. 61/321,029 filed on Apr. 5, 2010, each of which is hereby incorporated by reference in its entirety.
BACKGROUND
Field of the Invention
Embodiments disclosed herein relate to methods and systems for analysis of image data generated at multiple reference points, and particularly to image and sequence data generated during DNA sequencing.
Description of the Related Art
The analysis of image data presents a number of challenges, especially with respect to comparing images of an item or structure that are captured from different points of reference. One field that exemplifies many of these challenges is that of nucleic acid sequence analysis.
The detection of specific nucleic acid sequences present in a biological sample has a wide variety of applications, such as identifying and classifying microorganisms, diagnosing infectious diseases, detecting and characterizing genetic abnormalities, identifying genetic changes associated with cancer, studying genetic susceptibility to disease, and measuring response to various types of treatment. A valuable technique for detecting specific nucleic acid sequences in a biological sample is nucleic acid sequencing. Nucleic acid sequencing methodology has evolved significantly from the chemical degradation methods used by Maxam and Gilbert and the strand elongation methods used by Sanger. Today, there are a number of different processes being employed to elucidate nucleic acid sequence. A particularly popular sequencing process is sequencing-by-synthesis. One reason for its popularity is that this technique can be easily applied to massively parallel sequencing projects. For example, using an automated platform, it is possible to carry out hundreds of thousands of sequencing reactions simultaneously. Sequencing-by-synthesis differs from the classic dideoxy sequencing approach in that, instead of generating a large number of sequences and then characterizing them at a later step, real time monitoring of the incorporation of each base into a growing chain is employed. Although this approach might be viewed as slow in the context of an individual sequencing reaction, it can be used for generating large amounts of sequence information in each sequencing cycle when hundreds of thousands to millions of reactions are performed in parallel. Despite these advantages, the vast size and quantity of sequence information obtained through such methods can limit the speed and quality of analysis of sequence data. Thus, there is a need for methods and systems which improve the speed and accuracy of analysis of nucleic acid sequencing data. 
SUMMARY
The present technology relates to methods and systems for analysis of image data. In particular exemplary embodiments, the technology relates to methods and systems for analysis of image data generated during nucleic acid sequencing. In some embodiments, such methods and systems include data acquisition and/or storage functions. In some embodiments of the present invention, such methods and systems permit the analysis of image data from sequencing processes with improved speed and accuracy. In some embodiments of the technology described herein, methods of performing image analysis are provided that allow image analysis to occur while storing large amounts of image data. The methods can include performing image analysis in the background of a process that preferentially acquires image data. Such methods can be performed by a single processor capable of time-division multiplexing or other multithreading process. In other embodiments, such methods are implemented using multiple processes that may or may not overlap temporally, for example, by utilizing two or more separate processors. An advantage that may be realized by such methods is a reduction in data storage requirements since analyzed data typically requires less stor