US-12618845-B2 - Methods and compositions for protein sequencing
Abstract
Aspects of the application provide methods of identifying and sequencing proteins, polypeptides, and amino acids, and compositions useful for the same. In some aspects, the application provides methods of obtaining data during a degradation process of a polypeptide, and outputting a sequence representative of the polypeptide. In some aspects, the application provides amino acid recognition molecules comprising a shielding element that enhances photostability in polypeptide sequencing reactions.
Inventors
- Brian Reed
- Jeremy Lackey
- Haidong Huang
Assignees
- Quantum-Si Incorporated
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2024-11-21
Claims (20)
- 1. A method of sequencing a polypeptide, the method comprising: a) contacting a single polypeptide molecule with a composition comprising one or more terminal amino acid recognition molecules; b) detecting signal pulses indicative of association of the one or more terminal amino acid recognition molecules with a terminus of the single polypeptide molecule; c) contacting the single polypeptide molecule with a composition comprising one or more cleaving reagents; and d) repeating (a)-(c) one or more times, wherein the detected signal pulses form a series of signal pulses that is indicative of a series of amino acids exposed at the terminus over time as a result of terminal amino acid cleavage by the one or more cleaving reagents, and wherein the single polypeptide molecule is immobilized to a surface through a linkage group comprising an oligonucleotide.
- 2. The method of claim 1, wherein association of the one or more terminal amino acid recognition molecules with each type of amino acid exposed at the terminus produces a characteristic pattern in the series of signal pulses that is different from other types of amino acids exposed at the terminus.
- 3. The method of claim 2, wherein the characteristic pattern comprises a portion of the series of signal pulses.
- 4. The method of claim 2, wherein a signal pulse of the characteristic pattern corresponds to an individual association event between a terminal amino acid recognition molecule and an amino acid exposed at the terminus.
- 5. The method of claim 4, wherein the characteristic pattern is indicative of the amino acid exposed at the terminus of the single polypeptide molecule and an amino acid at a contiguous position.
- 6. The method of claim 2, wherein signal pulses of the characteristic pattern comprise a mean pulse duration of between about 10 milliseconds and about 100 milliseconds or between about 100 milliseconds and about 500 milliseconds.
- 7. The method of claim 1, wherein at least one of the one or more terminal amino acid recognition molecules comprises a degradation pathway protein, a peptidase, an antibody, an aminotransferase, a tRNA synthetase, or an SH2 domain-containing protein or fragment thereof.
- 8. The method of claim 1, wherein at least one of the one or more terminal amino acid recognition molecules comprises a detectable label.
- 9. The method of claim 8, wherein the detectable label is a luminescent label.
- 10. The method of claim 1, wherein sequencing comprises identifying at least a portion of all types of successive amino acids exposed at the terminus of the single polypeptide molecule while the single polypeptide molecule is being degraded by the one or more cleaving reagents.
- 11. The method of claim 1, wherein sequencing comprises identifying that an amino acid of the single polypeptide molecule comprises a post-translational modification.
- 12. The method of claim 11, wherein the post-translational modification is selected from acetylation, ADP-ribosylation, caspase cleavage, citrullination, formylation, N-linked glycosylation, O-linked glycosylation, hydroxylation, methylation, myristoylation, neddylation, nitration, oxidation, palmitoylation, phosphorylation, prenylation, S-nitrosylation, sulfation, sumoylation, and ubiquitination.
- 13. The method of claim 11, wherein the amino acid of the single polypeptide molecule comprises phospho-tyrosine or phospho-serine.
- 14. The method of claim 1, wherein the linkage group comprises a biotin molecule and an avidin protein.
- 15. The method of claim 14, wherein the avidin protein comprises streptavidin, traptavidin, tamavidin, bradavidin, or xenavidin.
- 16. The method of claim 15, wherein the avidin protein comprises streptavidin.
- 17. The method of claim 1, wherein the surface comprises a surface of a substrate.
- 18. The method of claim 17, wherein the substrate comprises an array of sample wells.
- 19. The method of claim 18, wherein the single polypeptide molecule is immobilized within a sample well of the array.
- 20. The method of claim 18, wherein the substrate comprises a plurality of polypeptide sequencing reactions, each polypeptide sequencing reaction occurring in an individual sample well of the array.
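To make the cycle recited in claim 1 concrete, the following Python sketch simulates steps (a) through (d) on a toy polypeptide. It is a minimal illustration under stated assumptions, not an implementation of the claimed method. The per-residue mean pulse durations in `HYPOTHETICAL_MEAN_PULSE_MS` are invented placeholders, pulse durations are drawn from an exponential distribution, each cleavage step is assumed to remove exactly one terminal residue, and surface immobilization is not modeled. The helper `call_residue` (also hypothetical) assigns each cycle to the residue type whose placeholder mean is closest to the observed mean pulse duration, loosely mirroring the idea in claim 2 that each amino acid type yields a characteristic pulse pattern.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical mean pulse durations (ms) per terminal residue type.
# Illustrative placeholders only, not measured recognizer kinetics.
HYPOTHETICAL_MEAN_PULSE_MS: Dict[str, float] = {"F": 40.0, "L": 150.0, "Y": 300.0}


@dataclass
class CycleRecord:
    exposed: str             # residue exposed at the terminus during this cycle
    pulses_ms: List[float]   # simulated pulse durations detected in this cycle


def simulate_sequencing(polypeptide: str, pulses_per_cycle: int = 20,
                        seed: int = 0) -> List[CycleRecord]:
    """Toy model of claim 1: detect pulses at the terminus, cleave, repeat."""
    rng = random.Random(seed)
    remaining = list(polypeptide)
    records: List[CycleRecord] = []
    while remaining:  # (d) repeat while residues remain to be exposed
        terminal = remaining[0]  # residue currently exposed at the terminus
        mean = HYPOTHETICAL_MEAN_PULSE_MS.get(terminal)
        # (a)-(b): each recognizer association event is recorded as one pulse;
        # residues with no recognizer in this toy model produce no pulses.
        if mean is None:
            pulses: List[float] = []
        else:
            pulses = [rng.expovariate(1.0 / mean) for _ in range(pulses_per_cycle)]
        records.append(CycleRecord(terminal, pulses))
        remaining.pop(0)  # (c): idealized cleavage of exactly one terminal residue
    return records


def call_residue(pulses_ms: List[float]) -> Optional[str]:
    """Nearest-mean call on pulse duration; None if no pulses were detected."""
    if not pulses_ms:
        return None
    observed = sum(pulses_ms) / len(pulses_ms)
    return min(HYPOTHETICAL_MEAN_PULSE_MS,
               key=lambda aa: abs(HYPOTHETICAL_MEAN_PULSE_MS[aa] - observed))
```

With many pulses per cycle, for example `simulate_sequencing("FLYG", pulses_per_cycle=1000)`, the per-cycle calls would be expected to recover F, L, and Y in order, while G, which has no recognizer in this toy model, produces no pulses and is left uncalled. The placeholder means (40, 150, and 300 ms) were chosen only so that they fall within the ranges recited in claim 6.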
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 18/598,736, filed Mar. 7, 2024, which is a continuation of U.S. patent application Ser. No. 16/708,989, filed Dec. 10, 2019, now issued as U.S. Pat. No. 11,959,920, which is a continuation of U.S. patent application Ser. No. 16/686,028, filed Nov. 15, 2019, which claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 62/907,507, filed Sep. 27, 2019, and U.S. Provisional Patent Application No. 62/768,076, filed Nov. 15, 2018, each of which is hereby incorporated by reference in its entirety.
REFERENCE TO AN ELECTRONIC SEQUENCE LISTING
The contents of the electronic sequence listing (R070870042US13-SEQ-JIB.xml; Size: 209,037 bytes; and Date of Creation: Nov. 18, 2024) are herein incorporated by reference in their entirety.
BACKGROUND
Proteomics has emerged as an important and necessary complement to genomics and transcriptomics in the study of biological systems. The proteomic analysis of an individual organism can provide insights into cellular processes and response patterns, which can lead to improved diagnostic and therapeutic strategies. The complexity surrounding protein structure, composition, and modification presents challenges in determining large-scale protein sequencing information for a biological sample.
SUMMARY
In some aspects, the application provides methods and compositions for determining amino acid sequence information from polypeptides (e.g., for sequencing one or more polypeptides). In some embodiments, amino acid sequence information can be determined for single polypeptide molecules. In some embodiments, the relative position of two or more amino acids in a polypeptide is determined, for example for a single polypeptide molecule.
In some embodiments, one or more amino acids of a polypeptide are labeled (e.g., directly or indirectly) and the relative positions of the labeled amino acids in the polypeptide are determined. In some aspects, the application provides methods comprising obtaining data during a degradation process of a polypeptide. In some embodiments, the methods further comprise analyzing the data to determine portions of the data corresponding to amino acids that are sequentially exposed at a terminus of the polypeptide during the degradation process. In some embodiments, the methods further comprise outputting an amino acid sequence representative of the polypeptide. In some embodiments, the data is indicative of amino acid identity at the terminus of the polypeptide during the degradation process. In some embodiments, the data is indicative of a signal produced by one or more amino acid recognition molecules binding to different types of terminal amino acids at the terminus during the degradation process. In some embodiments, the data is indicative of a luminescent signal generated during the degradation process. In some embodiments, the data is indicative of an electrical signal generated during the degradation process. In some embodiments, analyzing the data further comprises detecting a series of cleavage events and determining the portions of the data between successive cleavage events. In some embodiments, analyzing the data further comprises determining a type of amino acid for each of the individual portions. In some embodiments, each of the individual portions comprises a pulse pattern (e.g., a characteristic pattern), and analyzing the data further comprises determining a type of amino acid for one or more of the portions based on its respective pulse pattern.
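The analysis described above, splitting the data at detected cleavage events and then characterizing each resulting portion by its pulse pattern, can be sketched in Python as follows. This is an illustrative sketch only, not the application's implementation. It assumes that cleavage events have already been detected and are supplied as sample indices, that the signal is a uniformly sampled trace with period `dt_ms`, and that a pulse is a maximal run of samples above a fixed threshold. The names `split_at_cleavages`, `portion_features`, and `PortionFeatures` are hypothetical. The descriptors it computes (pulse durations, interpulse durations, and the fraction of the portion spent above threshold) are the kind of inputs a downstream step could use to assign an amino acid type to each portion.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PortionFeatures:
    pulse_durations: List[float]       # ms above threshold for each pulse
    interpulse_durations: List[float]  # ms from the end of one pulse to the next start
    fraction_above: float              # time above threshold / portion duration


def split_at_cleavages(trace: Sequence[float],
                       cleavage_idx: Sequence[int]) -> List[List[float]]:
    """Return the sub-traces lying between successive cleavage events."""
    bounds = [0, *sorted(cleavage_idx), len(trace)]
    return [list(trace[a:b]) for a, b in zip(bounds, bounds[1:]) if b > a]


def portion_features(portion: Sequence[float], threshold: float,
                     dt_ms: float) -> PortionFeatures:
    """Treat each maximal run of samples above `threshold` as one pulse."""
    runs = []  # (start, end) sample indices of each pulse, end exclusive
    start = None
    for i, value in enumerate(portion):
        if value > threshold and start is None:
            start = i  # pulse begins
        elif value <= threshold and start is not None:
            runs.append((start, i))  # pulse ends
            start = None
    if start is not None:  # pulse still open at the end of the portion
        runs.append((start, len(portion)))
    pulses = [(end - begin) * dt_ms for begin, end in runs]
    gaps = [(next_start - prev_end) * dt_ms
            for (_, prev_end), (next_start, _) in zip(runs, runs[1:])]
    total_ms = len(portion) * dt_ms
    above_ms = sum(pulses)
    return PortionFeatures(pulses, gaps, above_ms / total_ms if total_ms else 0.0)
```

For example, with the trace `[0, 5, 5, 0, 0, 5, 0, 0, 0, 0, 9, 9, 9, 0]`, a cleavage event at index 8, `threshold=1.0`, and `dt_ms=10.0`, the first portion contains pulses of 20 ms and 10 ms separated by a 20 ms gap (37.5% of the portion above threshold), and the second portion contains a single 30 ms pulse (50% above threshold). A real pipeline would also need a method for detecting the cleavage events themselves, which this sketch leaves out.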
In some embodiments, determining the type of amino acid further comprises identifying an amount of time within a portion when the data is above a threshold value and comparing the amount of time to a duration of time for the portion. In some embodiments, determining the type of amino acid further comprises identifying at least one pulse duration for each of the one or more portions. In some embodiments, determining the type of amino acid further comprises identifying at least one interpulse duration for each of the one or more portions. In some embodiments, the amino acid sequence includes a series of amino acids corresponding to the portions. In some aspects, the application provides systems comprising at least one hardware processor, and at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform a method in accordance with the application. In some aspects, the application provides at least one non-transitory computer-readable storage medium storing processor-executable instructions that, when executed by at least one hardware processor, cause the at least one hardware processor to perform a method in accordance with the application. In some aspects, the application provides methods of polypeptide sequencing. In some embod