WO-2026096864-A1 - LIBRARIES, METHODS, AND KITS FOR INDEX COLOR BALANCING NUCLEIC ACIDS
Abstract
The invention provides compositions that include artificial nucleic acids that are useful for color balancing detected signals during a sequencing reaction, and methods of using the same such as methods of sequencing nucleic acids and methods of preparing a color balanced nucleic acid sample for improving accuracy of acquired nucleic acid sequencing data.
Inventors
- HUANG, Xiaoyu
- LEONARD, JACK, T.
- MELLOR, JOSEPH, C.
Assignees
- SEQWELL, INC.
Dates
- Publication Date
- 20260507
- Application Date
- 20251031
- Priority Date
- 20241101
Claims (20)
- 1. A plurality of artificial nucleic acid molecules, wherein each of the artificial nucleic acid molecules comprises from its 5’ end to its 3’ end the following: (a) an attachment region for linking the artificial nucleic acid molecule to a surface; (b) a first calibration region having a nucleotide sequence in a repeating pattern of XY across the length of the calibration region, wherein X represents one of two nucleobases selected from adenine (A), thymine (T), guanine (G), or cytosine (C), and Y represents one of three nucleobases consisting of the same two nucleobases of X and a third nucleobase, wherein the third nucleobase is a different nucleobase than the two nucleobases; and (c) a control region, wherein the first calibration region and the control region both have a different nucleotide sequence in each artificial nucleic acid molecule.
- 2. A plurality of artificial nucleic acid molecules, wherein each of the artificial nucleic acid molecules comprises from its 5’ end to its 3’ end the following: (a) an attachment region for linking the artificial nucleic acid molecule to a surface; and (b) a calibration region having a nucleotide sequence in a repeating pattern of XY across the length of the calibration region, wherein X represents one of two nucleobases selected from A, T, G, or C, and Y represents one of three nucleobases consisting of the same two nucleobases of X and a third nucleobase, wherein the third nucleobase is a different nucleobase than the two nucleobases; wherein the first calibration region of each artificial nucleic acid molecule has a different nucleotide sequence.
- 3. The plurality of artificial nucleic acid molecules of claim 2, wherein each of the artificial nucleic acid molecules further comprises a control region at its 3’ end and the control region has a different nucleotide sequence in each artificial nucleic acid molecule.
- 4. The plurality of artificial nucleic acid molecules of any one of claims 1-3, wherein the attachment region comprises a first priming region.
- 5. The plurality of artificial nucleic acid molecules of claim 4, wherein the first priming region comprises a nucleotide sequence of 5’-AATGATACGGCGACCACCGAGATCTACAC-3’ (SEQ ID NO: 1) or 5’-CAAGCAGAAGACGGCATACGAGAT-3’ (SEQ ID NO: 2).
- 6. The plurality of artificial nucleic acid molecules of any one of claims 1-5, wherein each artificial nucleic acid molecule further comprises a second priming region 3’ of the control region.
- 7. The plurality of artificial nucleic acid molecules of claim 6, wherein the second priming region comprises a nucleotide sequence of the reverse complement of 5’-AATGATACGGCGACCACCGAGATCTACAC-3’ (SEQ ID NO: 1) or 5’-CAAGCAGAAGACGGCATACGAGAT-3’ (SEQ ID NO: 2).
- 8. The plurality of artificial nucleic acid molecules of any one of claims 1 and 3-7, wherein each artificial nucleic acid molecule further comprises a first read sequence between the first calibration region and the control region.
- 9. The plurality of artificial nucleic acid molecules of claim 8, wherein the read sequence comprises 5’-TCGTCGGCAGCGTC-3’ (SEQ ID NO: 3) or 5’-GTCTCGTGGGCTCGG-3’ (SEQ ID NO: 4).
- 10. The plurality of artificial nucleic acid molecules of any one of claims 1-9, wherein the first calibration region is 5 to 20 nucleotides in length.
- 11. The plurality of artificial nucleic acid molecules of any one of claims 1-10, wherein the attachment region is 5 to 20 nucleotides in length.
- 12. The plurality of artificial nucleic acid molecules of any one of claims 1 and 3-10, wherein each artificial nucleic acid molecule further comprises a second calibration region 3’ of the control region, wherein the second calibration region has a nucleotide sequence in a repeating pattern of Y’X across the length of the second calibration region, wherein Y’ represents one of three nucleobases consisting of the same two nucleobases of X and a third nucleobase complementary to the third nucleobase of Y, wherein the second calibration region has a different nucleotide sequence in each artificial nucleic acid molecule.
- 13. The plurality of artificial nucleic acid molecules of claim 12, wherein the second calibration region is 5 to 20 nucleotides in length.
- 14. The plurality of artificial nucleic acid molecules of claim 12 or 13, further comprising a second read sequence between the control region and the second calibration region.
- 15. The plurality of artificial nucleic acid molecules of claim 14, wherein the read sequence comprises the reverse complement of 5’-TCGTCGGCAGCGTC-3’ (SEQ ID NO: 3) or 5’-GTCTCGTGGGCTCGG-3’ (SEQ ID NO: 4).
- 16. The plurality of artificial nucleic acid molecules of any one of claims 1-15, wherein the artificial nucleic acid molecules are 100 to 1,000 nucleotides in length.
- 17. The plurality of artificial nucleic acid molecules of any one of claims 1 and 3-16, wherein the control region is a portion of an artificial sequence or a portion of a genome of an organism.
- 18. The plurality of artificial nucleic acid molecules of claim 17, wherein the organism is PhiX174.
- 19. The plurality of artificial nucleic acid molecules of any one of claims 9-18, wherein the first calibration region is 10 nucleotides in length.
- 20. The plurality of artificial nucleic acid molecules of any one of claims 1-19, wherein the surface is a flow cell for nucleic acid sequencing.
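Claims 1 and 2 define the calibration region only by which bases may occupy alternating positions. As a minimal sketch, assuming a hypothetical choice of X = {A, C} and a third base G (so Y = {A, C, G}), the Python below checks a set of calibration regions for the XY pattern, the 5 to 20 nucleotide length of claim 10, and the requirement that each molecule carry a different calibration sequence. The function names, base sets, and example sequences are illustrative assumptions and do not come from the application.

```python
# Hypothetical checker for the calibration-region constraints of claims 1, 2 and 10.
# Base sets and example sequences are illustrative assumptions, not from the source.


def follows_xy_pattern(seq: str, x_bases: set, y_bases: set) -> bool:
    """Return True if positions 1, 3, 5, ... use x_bases and positions 2, 4, 6, ... use y_bases."""
    for i, base in enumerate(seq.upper()):
        allowed = x_bases if i % 2 == 0 else y_bases  # 0-based even index = X slot
        if base not in allowed:
            return False
    return True


def check_calibration_regions(regions, x_bases, y_bases, min_len=5, max_len=20):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    third = set(y_bases) - set(x_bases)
    # Y must be the two X bases plus exactly one additional (third) base.
    if len(set(x_bases)) != 2 or not set(x_bases) <= set(y_bases) or len(third) != 1:
        problems.append("Y must be the two X bases plus exactly one third base")
    normalized = [seq.upper() for seq in regions]
    for idx, seq in enumerate(normalized):
        if not min_len <= len(seq) <= max_len:
            problems.append(f"region {idx}: length {len(seq)} outside {min_len}-{max_len}")
        if not follows_xy_pattern(seq, x_bases, y_bases):
            problems.append(f"region {idx}: breaks the XY pattern")
    # Each artificial molecule must carry a different calibration sequence.
    if len(set(normalized)) != len(normalized):
        problems.append("calibration regions are not unique across molecules")
    return problems


X = {"A", "C"}
Y = {"A", "C", "G"}
print(check_calibration_regions(["AGCGAGCGAG", "CACACACACA", "AGAGAGAGAG"], X, Y))  # []
print(check_calibration_regions(["AGTGAGCGAG"], X, Y))  # ['region 0: breaks the XY pattern']
```

With these base sets the fourth base, T, can never appear in a region that passes the check; that follows directly from the claim language, since Y is limited to the two X bases plus one third base.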
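Claims 4 to 9, 12, and 14 to 16 fix the 5’ to 3’ order of the regions and name SEQ ID NOs: 1 to 4 as priming and read sequences. The sketch below assembles one molecule in that order under stated assumptions: it uses SEQ ID NO: 1 as the attachment/first priming region, SEQ ID NO: 3 as the first read sequence, and the reverse complements of SEQ ID NO: 4 and SEQ ID NO: 2 as the second read sequence and second priming region, and it places the optional second calibration region before the second priming region. These choices, the function names, and the placeholder control and calibration sequences are hypothetical; the claims permit other combinations.

```python
# Hypothetical assembly of one artificial molecule in the 5'->3' order of the claims.
# The choice among SEQ ID NOs and the placement of the second calibration region
# before the second priming region are illustrative assumptions.

SEQ_ID_NO_1 = "AATGATACGGCGACCACCGAGATCTACAC"  # priming region option (claim 5)
SEQ_ID_NO_2 = "CAAGCAGAAGACGGCATACGAGAT"       # priming region option (claim 5)
SEQ_ID_NO_3 = "TCGTCGGCAGCGTC"                 # read sequence option (claim 9)
SEQ_ID_NO_4 = "GTCTCGTGGGCTCGG"                # read sequence option (claim 9)

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the order (5'->3' of the other strand)."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_molecule(calibration_1: str, control: str, calibration_2: str = "") -> str:
    """Concatenate the regions 5'->3' in one assumed order consistent with the claims."""
    parts = [
        SEQ_ID_NO_1,                      # attachment region / first priming region (claims 4-5)
        calibration_1,                    # first calibration region (claim 1)
        SEQ_ID_NO_3,                      # first read sequence (claims 8-9)
        control,                          # control region (claim 1)
        reverse_complement(SEQ_ID_NO_4),  # second read sequence (claims 14-15)
        calibration_2,                    # optional second calibration region (claim 12)
        reverse_complement(SEQ_ID_NO_2),  # second priming region (claims 6-7)
    ]
    return "".join(parts)


def within_claim_16_length(molecule: str) -> bool:
    """Claim 16 range: 100 to 1,000 nucleotides."""
    return 100 <= len(molecule) <= 1000


placeholder_control = "ACGT" * 15  # hypothetical 60-nt stand-in for a control region
mol = assemble_molecule("CACACACACA", placeholder_control, "ACACACACAC")
print(len(mol), within_claim_16_length(mol))  # 162 True
```

The fixed regions contribute 29 + 14 + 15 + 24 = 82 nucleotides here, so with two 10-nucleotide calibration regions the control region sets most of the remaining length budget under claim 16.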
Description
PATENT ATTORNEY DOCKET NO. 51178-020W02
LIBRARIES, METHODS, AND KITS FOR INDEX COLOR BALANCING NUCLEIC ACIDS
SEQUENCE LISTING
The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on October 29, 2025, is named “51178-020W02_Sequence_Listing_10_29_25” and is 4,425 bytes in size.
FIELD OF THE INVENTION
The present invention relates generally to compositions including artificial nucleic acids, such as color-balancing artificial nucleic acids that serve as calibration and validation controls for methods of sequencing; to methods of preparing and sequencing nucleic acid samples that include the artificial nucleic acids; and to kits including the same.
BACKGROUND
Nucleic acid sequencing methods (e.g., DNA sequencing methods) rely on a variety of means for detecting signals, including changes in conductivity, emission of fluorescence or radiation, and differences in mass. Sequencing-by-extension (SBE) involves template-dependent incorporation of nucleotides into a growing DNA chain by a DNA polymerase. Typical sequencing chemistries rely on reversibly and fluorescently labeled nucleotides to identify the terminal base. To identify the terminal base (i.e., A, C, G, or T) present on the DNA, one chemistry requires four different fluorophores, one for each base. Another version requires only three different fluorophores, one for each base except one, which is known as the “dark base” because it carries no fluorophore. Using fewer fluorophores reduces the cost of the optical systems used to excite and detect them, and two fluorophores may also be used for detection. However, using fewer fluorophores, particularly in a two-channel detection system in which only two fluorophores are used, poses significant challenges for accurate sequencing.
In such systems, two dyes are used to distinguish four bases, with a binary code assigned to each nucleotide: one fluorophore corresponds to a first base, the second fluorophore corresponds to a second base, both fluorophores correspond to a third base, and no signal (i.e., a dark signal) corresponds to a fourth base. One key challenge in this approach is that the dark signal of the fourth base, which corresponds to the absence of a fluorescent signal, may be misinterpreted as a weak fluorophore signal if the detection system is not adequately calibrated. This challenge is compounded over the course of a sequencing run, as signal intensities may shift due to photobleaching or to detection fluctuations such as laser power drift or changes in the flow cell of the sequencing platform. Such challenges can reduce the accuracy and depth of the sequencing data and are particularly problematic in homopolymeric regions or low-complexity sequences, where the lack of signal contrast can lead to base-calling errors. Thus, there is a need for improved calibration of sequencing detection systems during data acquisition to improve sequencing accuracy and coverage.
SUMMARY OF THE INVENTION
The present invention provides compositions and methods of use thereof, e.g., for nucleic acid library preparation and sequencing.
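The two-dye binary code described in the Background can be made concrete with a short sketch. The base-to-channel assignment below (C on the first channel only, T on the second only, A on both, G dark) is a hypothetical example: the application does not say which base receives which code, and real instruments may differ. The sketch computes, for each cycle, the fraction of molecules in a pool that emit in each channel and flags cycles where either fraction falls below an assumed threshold of 0.25. A cycle in which every molecule reads dark in a channel is one way to picture the calibration problem the Background describes, since that channel then has no signal to compare against.

```python
# Hypothetical two-channel code: (first-channel signal, second-channel signal).
# The base assignment and the 0.25 threshold are illustrative assumptions.
TWO_CHANNEL_CODE = {
    "C": (1, 0),  # first fluorophore only
    "T": (0, 1),  # second fluorophore only
    "A": (1, 1),  # both fluorophores
    "G": (0, 0),  # no signal: the dark base
}


def channel_fractions(seqs, code=TWO_CHANNEL_CODE):
    """For each cycle, return (fraction lit in channel 1, fraction lit in channel 2)."""
    n_cycles = min(len(s) for s in seqs)  # only cycles covered by every sequence
    fractions = []
    for cycle in range(n_cycles):
        signals = [code[s[cycle].upper()] for s in seqs]
        ch1 = sum(sig[0] for sig in signals) / len(seqs)
        ch2 = sum(sig[1] for sig in signals) / len(seqs)
        fractions.append((ch1, ch2))
    return fractions


def unbalanced_cycles(seqs, min_fraction=0.25, code=TWO_CHANNEL_CODE):
    """Return 0-based cycles where either channel falls below min_fraction."""
    return [
        cycle
        for cycle, (ch1, ch2) in enumerate(channel_fractions(seqs, code))
        if ch1 < min_fraction or ch2 < min_fraction
    ]


print(unbalanced_cycles(["CTAG", "TCGA"]))  # []
print(unbalanced_cycles(["GCAT", "GCAT"]))  # [0, 1, 3]
```

In the first pool every cycle lights both channels for half the molecules; in the second, identical sequences leave one or both channels empty at three of four cycles. The mapping and threshold would need to be replaced with the values of the actual platform.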
In one aspect, the invention features a plurality of artificial nucleic acid molecules, wherein each of the artificial nucleic acid molecules includes from its 5’ end to its 3’ end the following: (a) an attachment region for linking the artificial nucleic acid molecule to a surface; (b) a first calibration region having a nucleotide sequence in a repeating pattern of XY across the length of the calibration region, wherein X represents one of two nucleobases selected from adenine (A), thymine (T), guanine (G), or cytosine (C), and Y represents one of three nucleobases consisting of the same two nucleobases of X and a third nucleobase, wherein the third nucleobase is a different nucleobase than the two nucleobases; and (c) a control region, wherein the first calibration region and the control region both have a different nucleotide sequence in each artificial nucleic acid molecule. In a second aspect, the invention features a plurality of artificial nucleic acid molecules, wherein each of the artificial nucleic acid molecules includes from its 5’ end to its 3’ end the following: (a) an attachment region for linking the artificial nucleic acid molecule to a surface; and (b) a calibration region having a nucleotide sequence in a repeating pattern of XY across the length of the calibration region, wherein X represents one of two nucleobases selected from A, T, G, or C, and Y represents one of three nucleobases consisting of the same two nucleobases of X and a third nucleobase, wherein the third nucleobase is a different nucleobase than the two nucleobases; wherein the first calibration region of each artificial nucleic acid molecule has a different nucleotide sequence. In some embodiments, each of the artifici