KR-20260066804-A - COMPOSITIONS AND METHODS FOR NUCLEIC ACID-BASED DATA STORAGE


Abstract

The present invention provides systems and methods for storing digital information in nucleic acid molecules. Digital information may be received as a symbol string, wherein each symbol in the symbol string has a symbol value and a symbol position within the symbol string. A first identifier nucleic acid molecule may be formed by depositing M selected component nucleic acid molecules into a compartment and physically assembling the M selected component nucleic acid molecules, wherein the M selected component nucleic acid molecules are selected from a set of distinct component nucleic acid molecules separated into M different layers. A plurality of identifier nucleic acid molecules may be formed, each corresponding to a respective symbol position. The identifier nucleic acid molecules may be collected in a pool having a powder, liquid, or solid form.
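
By way of illustration only, the layered selection described above may be sketched in Python. The sketch assumes a hypothetical library of M = 3 layers with placeholder component names, a mixed-radix mapping from symbol position to one component per layer, and a presence/absence convention in which an identifier is formed only for positions whose symbol value is 1. None of these specifics is stated in the abstract.

```python
# Hypothetical component library: M = 3 layers of placeholder names that
# stand in for distinct component nucleic acid molecules (an assumption).
LAYERS = [
    ["A0", "A1"],        # layer 1
    ["B0", "B1", "B2"],  # layer 2
    ["C0", "C1"],        # layer 3
]


def combination_space(layers=LAYERS):
    """Number of distinct identifiers: the product of the layer sizes."""
    total = 1
    for layer in layers:
        total *= len(layer)
    return total


def identifier_for_position(position, layers=LAYERS):
    """Pick one component per layer by reading `position` as mixed-radix digits."""
    if not 0 <= position < combination_space(layers):
        raise ValueError("position is outside the combination space")
    chosen = []
    # Peel digits off from the last layer so that layer 1 is most significant.
    for layer in reversed(layers):
        position, digit = divmod(position, len(layer))
        chosen.append(layer[digit])
    return tuple(reversed(chosen))


def encode(symbols, layers=LAYERS):
    """Assumed convention: form an identifier only where the symbol value is 1."""
    return {
        identifier_for_position(i, layers)
        for i, value in enumerate(symbols)
        if value == 1
    }


if __name__ == "__main__":
    print(combination_space())             # 12
    print(identifier_for_position(5))      # ('A0', 'B2', 'C1')
    print(sorted(encode([1, 0, 0, 0, 0, 1])))
```

With layer sizes 2, 3, and 2, the seven components in this sketch address twelve symbol positions, which is the sense in which a small component set yields a larger set of identifiers.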

Inventors

Roquet, Nathaniel
Park, Hyunjun
Bhatia, Swapnil P.
Leake, Devin

Assignees

Catalog Technologies, Inc.

Dates

Publication Date: 20260512
Application Date: 20190516
Priority Date: 20180516

Claims (1)

A method for storing digital information in nucleic acid molecules, the method comprising:
(a) receiving digital information as a symbol string, wherein each symbol in the symbol string has a symbol value and a symbol position in the symbol string;
(b) forming a first identifier nucleic acid molecule by:
(1) selecting, from a set of distinct component nucleic acid molecules separated into M different layers, one component nucleic acid molecule from each of the M layers;
(2) depositing the M selected component nucleic acid molecules into a compartment; and
(3) physically assembling the M selected component nucleic acid molecules of (2) to form a first identifier nucleic acid molecule having first and second terminal molecules and a third molecule located between the first terminal molecule and the second terminal molecule, such that the component nucleic acid molecules of the first and second layers correspond to the first and second terminal molecules of the identifier nucleic acid molecule, and the component nucleic acid molecule of the third layer corresponds to the third molecule of the identifier nucleic acid molecule, thereby defining a physical order of the M layers within the first identifier nucleic acid molecule;
(c) forming a plurality of additional identifier nucleic acid molecules, each (1) having first and second terminal molecules and a third molecule located between the first and second terminal molecules, and (2) corresponding to a respective symbol position, wherein at least one of the first terminal molecule, the second terminal molecule, and the third molecule of at least one additional identifier nucleic acid molecule is identical to a target molecule of the first identifier nucleic acid molecule in (b), so that a probe can select at least two identifier nucleic acid molecules corresponding to respective symbols having consecutive symbol positions within the symbol string; and
(d) collecting the identifier nucleic acid molecules of (b) and (c) in a pool having a powder, liquid, or solid form.
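
The relationship in step (c) between a shared component and consecutive symbol positions may be illustrated with a minimal Python sketch. It assumes a hypothetical three-layer library with placeholder names and assumes that symbol positions are assigned to identifiers in lexicographic order of their layer choices. It models only the logical layer order, not the physical order of terminal and internal molecules, and the function `probe` is a stand-in for physical selection rather than a description of any actual probe chemistry.

```python
from itertools import product

# Hypothetical library: one list of placeholder component names per layer.
LAYERS = [["A0", "A1"], ["B0", "B1", "B2"], ["C0", "C1"]]

# Assumed ordering: the i-th tuple in lexicographic order encodes position i.
IDENTIFIERS = list(product(*LAYERS))
POSITION_OF = {ident: i for i, ident in enumerate(IDENTIFIERS)}


def probe(pool, layer_index, component):
    """Return the sorted symbol positions of the identifiers in `pool`
    that carry `component` in the layer at `layer_index`."""
    return sorted(
        POSITION_OF[ident] for ident in pool if ident[layer_index] == component
    )


# A pool holding every identifier, for demonstration purposes only.
pool = set(IDENTIFIERS)

if __name__ == "__main__":
    print(probe(pool, 0, "A1"))  # [6, 7, 8, 9, 10, 11]
    print(probe(pool, 2, "C0"))  # [0, 2, 4, 6, 8, 10]
```

Under this assumed ordering, selecting on a component of the most significant layer returns a block of consecutive symbol positions, whereas selecting on a component of the least significant layer returns positions spread across the symbol string.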

Description

Compositions and Methods for Nucleic Acid-Based Data Storage

Cross-reference to related applications

This application claims priority to and the benefit of U.S. Provisional Patent Application No. 62/672,495, filed May 16, 2018, titled “Composition and Method for Nucleic Acid-Based Data Storage.” The entire contents of the aforementioned application are incorporated herein by reference.

Background

Nucleic acid digital data storage is a stable approach for encoding and storing information for long periods, with data stored at higher densities than in magnetic tape or hard drive storage systems. Furthermore, digital data stored in nucleic acid molecules that are kept in cold, dry conditions can be retrieved 60,000 years later or longer.

To access digital data stored in nucleic acid molecules, the nucleic acid molecules can be sequenced. Nucleic acid digital data storage can therefore be an ideal method for storing data that is not accessed frequently but that comprises a large volume of information to be stored or archived for long periods.

Current methods rely on encoding digital information (e.g., binary code) into base-by-base nucleic acid sequences, such that the base-to-base relationship in the sequence translates directly into the digital information (e.g., binary code). Sequencing digital data stored in base-by-base sequences that are read out as bit streams or bytes of digitally encoded information can be error-prone, and such data can be costly to encode because de novo base-by-base nucleic acid synthesis can be expensive. New methods for performing nucleic acid digital data storage may provide approaches for encoding and retrieving data that are less costly and easier to implement commercially.

Brief description of the drawings

The novel features of the present invention are set forth with particularity in the appended claims.
The features and advantages of the present invention will be better understood by reference to the following detailed description, which sets forth illustrative embodiments in which the principles of the invention are utilized, and to the accompanying drawings (also referred to herein as “Drawings” and “Figures”):
FIG. 1 schematically illustrates an overview of a process for encoding, writing, accessing, querying, reading, and decoding digital information stored in nucleic acid sequences;
FIGS. 2A and 2B schematically illustrate an exemplary method of encoding digital data, referred to as “data at address,” using objects or identifiers (e.g., nucleic acid molecules); FIG. 2A illustrates generating an identifier by combining a rank object (or address object) with a byte-value object (or data object); FIG. 2B illustrates an embodiment of the data at address method in which the rank objects and byte-value objects are themselves combinatorial concatenations of other objects;
FIGS. 3A and 3B schematically illustrate exemplary methods of encoding digital information using objects or identifiers (e.g., nucleic acid sequences); FIG. 3A illustrates encoding digital information using a rank object as an identifier; FIG. 3B illustrates an embodiment of the encoding method in which the address object is itself a combinatorial concatenation of other objects;
FIG. 4 shows a contour plot, in logarithmic space, of the relationship between the combinatorial space of possible identifiers (C, x-axis) and the average number of identifiers (k, y-axis) that can be constructed to store information of a given size (contour lines);
FIG. 5 schematically illustrates an overview of a method for writing information to nucleic acid sequences (e.g., deoxyribonucleic acid);
FIGS. 6A and 6B illustrate an exemplary method, referred to as the “product scheme,” for constructing identifiers (e.g., nucleic acid molecules) by combinatorially assembling distinct components (e.g., nucleic acid sequences); FIG. 6A illustrates the architecture of identifiers constructed using the product scheme; FIG. 6B illustrates an example of the combinatorial space of identifiers that can be constructed using the product scheme;
FIG. 7 schematically illustrates the construction of identifiers (e.g., nucleic acid molecules) from components (e.g., nucleic acid sequences) using overlap extension polymerase chain reaction;
FIG. 8 schematically illustrates the construction of identifiers (e.g., nucleic acid molecules) from components (e.g., nucleic acid sequences) using sticky end ligation;
FIG. 9 schematically illustrates the construction of identifiers (e.g., nucleic acid molecules) from components (e.g., nucleic acid sequences) using recombinase assembly;
FIGS. 10A and 10B illustrate template-directed ligation; FIG. 10A schematically illustrates the construction of identifiers (e.g., nucleic acid molecules) from components (e.g., nucleic acid sequences) using template-directed ligation; FIG. 10B shows a histogram of the copy numbers (abundances) of 256 distinct nucleic acid sequences, each assembled combinatorially from six nucleic acid sequences (e.g., components) in a single pooled template-directed ligation reaction;