US-20260126437-A1 - SYSTEMS FOR CHARACTERIZING POLYPEPTIDES


Abstract

Systems for identifying a protein within a sample are provided herein. A panel of antibodies is acquired, none of which is specific for a single protein or family of proteins, and the binding properties of the antibodies in the panel are determined. The protein is then iteratively exposed to the panel of antibodies, and the set of antibodies that bind the protein is determined. The identity of the protein is determined using one or more deconvolution methods that match the set of antibodies to a protein sequence based on the known binding properties of the antibodies.
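The deconvolution step can be illustrated with a minimal sketch. It assumes, hypothetically, that the known binding properties are expressed as a per-reagent binding probability for each candidate protein, that reagents bind independently of one another, and that every candidate is equally likely beforehand. The names `identify_protein` and `log_likelihood` and the example candidates are illustrative only. Under these assumptions, the candidate whose binding probabilities best explain the observed bind/no-bind pattern is the maximum-likelihood match. This is one simple deconvolution approach, not necessarily the one used in the disclosure.

```python
import math


def log_likelihood(pattern, binding_probs, eps=1e-6):
    """Log-probability of a bind (1) / no-bind (0) pattern for one candidate,
    assuming each reagent binds independently with the given probability."""
    total = 0.0
    for observed, p in zip(pattern, binding_probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += math.log(p) if observed else math.log(1.0 - p)
    return total


def identify_protein(pattern, known_binding):
    """Return (best candidate, its posterior probability) for one address.

    known_binding maps a candidate protein name to its per-reagent binding
    probabilities, in the same reagent order as `pattern`. A uniform prior
    over candidates is assumed.
    """
    scores = {name: log_likelihood(pattern, probs)
              for name, probs in known_binding.items()}
    best = max(scores, key=scores.get)
    # Normalize with the log-sum-exp trick to turn scores into posteriors.
    top = max(scores.values())
    norm = sum(math.exp(s - top) for s in scores.values())
    return best, math.exp(scores[best] - top) / norm


# Hypothetical candidates and binding probabilities for four reagents.
known = {
    "protein_A": [0.9, 0.1, 0.8, 0.2],
    "protein_B": [0.1, 0.9, 0.7, 0.3],
    "protein_C": [0.5, 0.5, 0.1, 0.9],
}
best, posterior = identify_protein([1, 0, 1, 0], known)
print(best, round(posterior, 3))  # protein_A 0.986
```

Working in log space keeps the per-reagent products from underflowing when a pattern spans many reagent cycles.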

Inventors

Parag Mallick

Assignees

NAUTILUS SUBSIDIARY, INC.

Dates

Publication Date: 20260507
Application Date: 20251222

Claims (20)

1 . A computer program product, comprising one or more non-transitory computer-readable media having computer program instructions stored therein, the computer program instructions being configured such that, when executed by one or more computing devices, the computer program instructions cause the one or more computing devices to: receive pattern data indicative of iterative applications of different affinity reagents binding or not binding to protein molecules from a sample that are spatially separated on an array, wherein the array includes at least 1,000,000 protein molecules; receive known binding characteristics of each of the different affinity reagents binding to protein molecules that may be in the sample; and identify the 1,000,000 protein molecules based on (i) the known binding characteristics and (ii) the pattern data.
2 . The computer program product of claim 1 , wherein the protein molecules are spatially separated at unique locations of a pre-determined grid and optically resolvable from each other.
3 . The computer program product of claim 1 , further comprising: quantifying abundances of the protein molecules based on (i) the known binding characteristics and (ii) the pattern data.
4 . The computer program product of claim 1 , further comprising: quantifying abundances of the protein molecules based on the identifications of the 1,000,000 protein molecules.
5 . The computer program product of claim 1 , wherein the protein molecules are intact protein molecules.
6 . The computer program product of claim 1 , wherein identifying the 1,000,000 protein molecules includes determining probable identities for the 1,000,000 protein molecules on the array.
7 . The computer program product of claim 1 , wherein the identifying is based on a machine learning algorithm.
8 . The computer program product of claim 1 , wherein the identifying is based on an expectation maximization algorithm.
9 . The computer program product of claim 1 , wherein identities of the 1,000,000 protein molecules on the array are unknown.
10 . The computer program product of claim 1 , wherein the 1,000,000 protein molecules are at unique coordinates of a pre-determined grid of the array, and the pattern data indicates binding or not binding at each of the unique coordinates for each of the different affinity reagents.
11 . The computer program product of claim 1 , wherein identifying the 1,000,000 protein molecules includes identifying post translational modifications of the 1,000,000 protein molecules.
12 . A computer program product, comprising one or more non-transitory computer-readable media having computer program instructions stored therein, the computer program instructions being configured such that, when executed by one or more computing devices, the computer program instructions cause the one or more computing devices to: receive pattern data indicative of iterative applications of different affinity reagents binding or not binding to different intact protein molecules from a sample that are spatially separated on an array, wherein the array includes at least 1,000,000 intact protein molecules; receive known binding characteristics of each of the different affinity reagents binding to intact protein molecules that may be in the sample; and quantify abundances of the intact protein molecules on the array based on (i) the known binding characteristics and (ii) the pattern data.
13 . The computer program product of claim 12 , wherein the 1,000,000 intact protein molecules are spatially separated at unique locations of a pre-determined grid and optically resolvable from each other.
14 . The computer program product of claim 12 , wherein quantifying abundances of the intact protein molecules is based on identifying the 1,000,000 intact protein molecules based on (i) the known binding characteristics and (ii) the pattern data.
15 . The computer program product of claim 12 , wherein quantifying the abundances includes determining probable identities for the 1,000,000 intact protein molecules on the array.
16 . The computer program product of claim 12 , wherein the quantifying is based on a machine learning algorithm.
17 . The computer program product of claim 12 , wherein the quantifying is based on an expectation maximization algorithm.
18 . The computer program product of claim 12 , wherein identities of the 1,000,000 intact protein molecules on the array are unknown.
19 . The computer program product of claim 12 , wherein the 1,000,000 intact protein molecules are at unique coordinates of a pre-determined grid of the array, and the pattern data indicates binding or not binding at each of the unique coordinates for each of the different affinity reagents.
20 . The computer program product of claim 12 , wherein quantifying the 1,000,000 intact protein molecules includes identifying post translational modifications of the 1,000,000 intact protein molecules.
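Claims 8 and 17 recite an expectation maximization algorithm. The following is a minimal sketch of how EM could estimate relative abundances from per-address binding patterns. It rests on hypothetical assumptions: per-reagent binding probabilities are known for each candidate, reagents bind independently, and each address holds one molecule drawn from an unknown mixture of candidates. The E-step computes how responsible each candidate is for each address's pattern, and the M-step sets each candidate's abundance to its average responsibility. Function names and data are illustrative. Likelihoods are multiplied directly for brevity; with hundreds of reagent cycles a log-space version would be needed to avoid underflow.

```python
def pattern_likelihood(pattern, binding_probs, eps=1e-6):
    """P(pattern | candidate), assuming reagents bind independently."""
    like = 1.0
    for observed, p in zip(pattern, binding_probs):
        p = min(max(p, eps), 1.0 - eps)  # keep every factor strictly positive
        like *= p if observed else 1.0 - p
    return like


def estimate_abundances(patterns, known_binding, iterations=100):
    """Estimate the fraction of array addresses occupied by each candidate."""
    names = list(known_binding)
    # Likelihood of every address's pattern under every candidate, computed once.
    lik = [[pattern_likelihood(pat, known_binding[n]) for n in names]
           for pat in patterns]
    weights = [1.0 / len(names)] * len(names)  # start from uniform abundances
    for _ in range(iterations):
        totals = [0.0] * len(names)
        for row in lik:
            # E-step: posterior responsibility of each candidate for this address.
            joint = [w * lk for w, lk in zip(weights, row)]
            z = sum(joint)
            for k, value in enumerate(joint):
                totals[k] += value / z
        # M-step: each abundance becomes the mean responsibility over addresses.
        weights = [t / len(patterns) for t in totals]
    return dict(zip(names, weights))


# Hypothetical example: three addresses look like "A", one looks like "B".
known = {"A": [0.95, 0.05, 0.9], "B": [0.05, 0.95, 0.1]}
patterns = [[1, 0, 1]] * 3 + [[0, 1, 0]]
est = estimate_abundances(patterns, known)
print({name: round(frac, 2) for name, frac in est.items()})  # {'A': 0.75, 'B': 0.25}
```

Multiplying each estimated fraction by the number of occupied addresses would give an estimated molecule count per candidate.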

Description

CROSS-REFERENCE

This application is a continuation of U.S. application Ser. No. 19/332,786, filed Sep. 18, 2025, which is a continuation of U.S. application Ser. No. 17/933,051, filed Sep. 16, 2022, which is a continuation of U.S. application Ser. No. 17/534,405, filed Nov. 23, 2021, now U.S. Pat. No. 11,448,647, which is a continuation of U.S. application Ser. No. 17/191,632, filed Mar. 3, 2021, now U.S. Pat. No. 11,579,144, which is a continuation application of U.S. application Ser. No. 17/153,877, filed Jan. 20, 2021, now U.S. Pat. No. 11,754,559, which is a continuation of U.S. application Ser. No. 16/659,132, filed Oct. 21, 2019, now U.S. Pat. No. 10,948,488, which is a continuation application of U.S. application Ser. No. 16/426,917, filed May 30, 2019, now U.S. Pat. No. 10,473,654, which is a continuation of International Patent Application No. PCT/US2017/064322, filed on Dec. 1, 2017, which claims priority to U.S. Provisional Application No. 62/429,063, filed Dec. 1, 2016, and U.S. Provisional Application No. 62/500,455, filed May 2, 2017, each of which applications is incorporated herein by reference in its entirety.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. The XML copy, created on Dec. 18, 2025, is named NBIOT003C14SeqListing.xml and is 19,842 bytes in size.

BACKGROUND OF THE INVENTION

Current techniques for protein identification typically rely upon either the binding and subsequent readout of highly specific and sensitive antibodies or upon peptide-read data (typically on the order of 12-30 amino acids long) from a mass spectrometer.

SUMMARY OF THE INVENTION

The present disclosure provides methods and systems for assaying proteins. In some embodiments, the present disclosure provides approaches in which the identities of proteins in a mixture, i.e., their sequences, are inferred from a series of measurements that may be highly incomplete and/or are not specific to a particular protein. Methods and systems described herein may also be used to characterize and/or identify biopolymers, including proteins. Additionally, methods and systems described herein may be used to identify proteins more quickly than techniques for protein identification that rely upon data from a mass spectrometer. In some examples, methods and systems described herein may be used to identify at least 400 different proteins, with at least 50% accuracy, at least 10% more quickly than techniques for protein identification that rely upon data from a mass spectrometer. In some examples, methods and systems described herein may be used to identify at least 1000 different proteins, with at least 50% accuracy, at least 10% more quickly than techniques for protein identification that rely upon data from a mass spectrometer.

An aspect of the invention provides a method of determining protein characteristics. The method comprises obtaining a substrate with portions of one or more proteins conjugated to the substrate such that each individual protein portion has a unique, resolvable, spatial address. In some cases, each individual protein portion may have a unique, optically resolvable, spatial address. The method further comprises applying a fluid containing a first through nth set of one or more affinity reagents to the substrate. In some embodiments, the affinity reagents may contain or be coupled to an identifiable tag.
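The iterative application of the first through nth affinity reagent sets to proteins at unique spatial addresses can be modeled in a few lines. This sketch assumes, hypothetically, that each reagent binds any protein containing a short epitope string, so a single reagent can produce signals at several different proteins, and that each cycle's observation is simply the set of addresses showing a signal. The grid coordinates, sequences, epitopes, and function names are invented for illustration, and the simulation is noise-free.

```python
from typing import Dict, List, Set, Tuple

Address = Tuple[int, int]  # hypothetical (row, column) coordinate on the substrate


def simulate_cycles(array: Dict[Address, str],
                    reagent_epitopes: List[str]) -> List[Set[Address]]:
    """Return, for each reagent cycle, the set of addresses showing a signal.

    Assumes (hypothetically) that a reagent binds any protein containing its
    epitope as a substring, so one reagent may bind many different proteins.
    """
    return [{addr for addr, seq in array.items() if epitope in seq}
            for epitope in reagent_epitopes]


def build_binding_patterns(addresses: List[Address],
                           cycles: List[Set[Address]]) -> Dict[Address, Tuple[int, ...]]:
    """Record 1 where a signal was observed at an address in cycle k, else 0."""
    return {addr: tuple(1 if addr in hits else 0 for hits in cycles)
            for addr in addresses}


# Hypothetical three-address array probed with three non-specific reagents.
array = {(0, 0): "MKTAYIAK", (0, 1): "MAKTLLG", (1, 0): "GGSAYIAK"}
cycles = simulate_cycles(array, ["KT", "AYI", "LLG"])
patterns = build_binding_patterns(sorted(array), cycles)
print(patterns)
# {(0, 0): (1, 1, 0), (0, 1): (1, 0, 1), (1, 0): (0, 1, 0)}
```

Although the reagent for "KT" binds two different proteins, the combination of signals across cycles still gives each address a distinct pattern, which is what a downstream deconvolution step would use.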
After each application of the first through nth set of one or more affinity reagents to the substrate, the method comprises performing the following steps: observing the affinity reagent or identifiable tag; identifying one or more unique spatial addresses of the substrate having one or more observed signals; and determining that each portion of the one or more proteins having an identified unique spatial address contains the one or more epitopes associated with the one or more observed signals. In some instances, each of the conjugated portions of the one or more proteins is associated with a unique spatial address on the substrate. In some instances, each affinity reagent of the first through nth set of one or more affinity reagents is not specific to an individual protein or protein family. In some instances, the binding epitope of the affinity reagent is not known or is not specific to an individual protein or protein family. In some cases, the methods of this disclosure may also be used with a substrate which has multiple proteins bound in a single location, wherein at least about 50%, 60%, 70%, 80%, 90%, or more than 90% of the proteins at a single location comprise a common amino acid sequence. In some cases, the methods of this disclosure may also be used with a substrate which has multiple proteins bound in a single location, wherein at least about 50%, 60%, 70%, 80%, 90%, or more than 90% of the proteins at a single location comprise at least 95% amino acid sequence identity. In some embo