JP-2026076191-A - Methods and systems for protein identification

Abstract

[Problem] To provide a method and system for accurate and efficient identification and quantification of proteins. [Solution] A method for iteratively identifying candidate proteins in a sample of unknown proteins is disclosed, comprising the steps of: receiving binding measurement information for each of a plurality of affinity reagent probes against the unknown proteins, wherein each affinity reagent probe is configured to selectively bind to one or more candidate proteins; comparing at least a portion of the binding measurement information with a database containing a plurality of protein sequences, wherein each protein sequence corresponds to one candidate protein; and iteratively generating the probability that each of the one or more candidate proteins is present in the sample, based on the comparison of the binding measurement information with the database containing the plurality of protein sequences. [Selection Diagram] Figure 1

Inventors

Sujal M. Patel
Parag Mallick
Jarrett D. Egertson

Assignees

Nautilus Subsidiary, Inc.

Dates

Publication Date: 20260511
Application Date: 20251226
Priority Date: 20171023

Claims (20)

A computer-based method for iteratively identifying candidate proteins in a sample of unknown proteins, comprising the following steps: (a) receiving, by a computer, binding measurement information for each of a plurality of affinity reagent probes against the unknown proteins in the sample, wherein each affinity reagent probe is configured to selectively bind to one or more candidate proteins among a plurality of candidate proteins; (b) comparing, by the computer, the binding measurement information with a database containing a plurality of protein sequences, wherein each protein sequence corresponds to one candidate protein among the plurality of candidate proteins; and (c) iteratively generating, by the computer, the probability that each of the one or more candidate proteins among the plurality of candidate proteins is present in the sample, based on the comparison of the binding measurement information with the database containing the plurality of protein sequences.
The method according to claim 1, wherein the step of generating the plurality of probabilities further comprises iteratively receiving additional binding measurement information for each of a plurality of additional affinity reagent probes, wherein each of the additional affinity reagent probes is configured to selectively bind to one or more candidate proteins among the plurality of candidate proteins.
The method according to claim 1, further comprising the step of generating, for each of the one or more candidate proteins, a confidence level that the candidate protein matches one of the unknown proteins in the sample.
The method according to claim 1, wherein the step of generating the probability comprises taking into account a detector error rate associated with the binding measurement information.
The method according to claim 4, wherein the detector error rate is obtained from the specifications of one or more detectors used to obtain the binding measurement information.
The method according to claim 4, wherein the error rate of the detector is set to the estimated error rate of the detector.
The method according to claim 6, wherein the estimated error rate of the detector is set by the user of the computer.
The method according to claim 6, wherein the estimated error rate of the detector is approximately 0.001.
The method according to claim 1, wherein the step of iteratively generating the plurality of probabilities further comprises removing one or more candidate proteins among the plurality of candidate proteins from subsequent iterations, thereby reducing the number of iterations required to carry out the iterative generation of the probabilities.
The method according to claim 9, wherein the removal of one or more candidate proteins is based at least on predetermined criteria for the binding measurement related to the candidate proteins.
The method according to claim 10, wherein the predetermined criteria comprise the one or more candidate proteins having a binding measurement below a predetermined threshold for a first plurality of affinity reagent probes.
The method according to claim 1, wherein each of the probabilities is normalized with respect to the length of the candidate protein.
The method according to claim 1, wherein each of the probabilities is normalized with respect to the sum of the probabilities of the plurality of candidate proteins.
The method according to claim 1, wherein the plurality of affinity reagent probes comprises 50 or fewer affinity reagent probes.
The method according to claim 1, wherein the plurality of affinity reagent probes comprises 100 or fewer affinity reagent probes.
The method according to claim 1, wherein the plurality of affinity reagent probes comprises 500 or fewer affinity reagent probes.
The method according to claim 1, wherein the plurality of affinity reagent probes comprises more than 500 affinity reagent probes.
The method according to claim 1, wherein the probabilities are generated iteratively until a predetermined condition is met.
The method according to claim 18, wherein the predetermined condition includes generating each of a plurality of probabilities with at least 90% confidence.
The method according to claim 19, wherein the predetermined condition includes generating each of the plurality of probabilities with at least 95% confidence.
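
The candidate-removal step (claims 9 to 11) and the stopping condition (claims 18 to 20) can be illustrated with a minimal sketch. It is not taken from the specification: the function names, the score dictionary, and the threshold value are hypothetical, and two readings are assumed for illustration only, namely that a candidate is removed when its binding measurement is below the threshold for every probe in the first set, and that the predetermined condition is met once the highest candidate probability reaches the stated confidence.

```python
from typing import Dict, List


def prune_candidates(
    first_batch_scores: Dict[str, List[float]], threshold: float
) -> List[str]:
    """Return the candidates to keep for later iterations.

    Assumption: a candidate is dropped only if its binding measurement is
    below the threshold for every probe in the first set of probes.
    """
    return [
        name
        for name, scores in first_batch_scores.items()
        if any(score >= threshold for score in scores)
    ]


def condition_met(probabilities: Dict[str, float], confidence: float = 0.90) -> bool:
    """Assumed stopping rule: the best candidate reaches the confidence level."""
    return max(probabilities.values()) >= confidence


if __name__ == "__main__":
    # Hypothetical scores for three candidates against two first-round probes.
    scores = {"P1": [0.05, 0.02], "P2": [0.80, 0.10], "P3": [0.00, 0.30]}
    print(prune_candidates(scores, threshold=0.25))
    print(condition_met({"P2": 0.93, "P3": 0.07}, confidence=0.90))
```

Dropping weak candidates early shrinks the set that every later iteration has to rescore, which is the efficiency gain claim 9 describes.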

Description

Cross-reference

This application claims priority to U.S. Provisional Patent Application No. 62/575,976, filed October 23, 2017, which is incorporated herein by reference in its entirety.

Background

Current techniques for protein identification typically rely on either the binding of highly specific and sensitive affinity reagents (such as antibodies) followed by readout of the binding information, or peptide readout data from mass spectrometers (typically peptides around 12–30 amino acids in length). Such techniques can be applied to unknown proteins in a sample to determine the presence, absence, or quantity of candidate proteins based on an analysis of the binding measurements of highly specific and sensitive affinity reagents to the protein of interest.

Summary

This specification recognizes the need for improved identification and quantification of proteins in samples of unknown proteins. The methods and systems provided herein can significantly reduce or eliminate errors in identifying proteins in samples, thereby improving the quantification of such proteins. Such methods and systems can achieve accurate and efficient identification of candidate proteins in samples of unknown proteins. Such identification may be based on iterative calculations using binding measurement information from affinity reagent probes configured to selectively bind to one or more candidate proteins. In some embodiments, samples of unknown proteins may be repeatedly exposed to individual affinity reagent probes, pooled affinity reagent probes, or combinations of individual and pooled affinity reagent probes. Identification may involve inferring a confidence level for the presence of each of the one or more candidate proteins in the sample.

In one aspect, this specification discloses a computer-based method for iteratively identifying candidate proteins in a sample of unknown proteins, the method comprising the steps of: (a) receiving, by the computer, binding measurement information for each of a plurality of affinity reagent probes against the unknown proteins in the sample, each affinity reagent probe being configured to selectively bind to one or more candidate proteins among a plurality of candidate proteins; (b) comparing at least a portion of the binding measurement information with a database comprising a plurality of protein sequences, each protein sequence corresponding to one of the plurality of candidate proteins; and (c) iteratively generating, by the computer, for each of the one or more candidate proteins, the probability that the candidate protein is present in the sample, based on the comparison of at least a portion of the binding measurement information with the database comprising the plurality of protein sequences. In some embodiments, generating the plurality of probabilities further includes iteratively receiving additional binding measurement information for each of a plurality of additional affinity reagent probes, each of which is configured to selectively bind to one or more candidate proteins among the plurality of candidate proteins. In some embodiments, the method further includes generating, for each of the one or more candidate proteins, a confidence level that the candidate protein matches one of the unknown proteins in the sample. In some embodiments, generating the probability includes taking into account a detector error rate associated with the binding measurement information.
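
Steps (a) to (c) can be illustrated with a minimal sketch of one possible probability update. It is an assumption-laden illustration, not the method of this specification: each binding measurement is assumed to be a binary bound/not-bound outcome, a probe is assumed to bind whenever its short epitope occurs in the candidate sequence, the update is assumed to be a simple Bayesian reweighting from a uniform prior, and all names (`expected_binding`, `update_probabilities`, `identify`) and sequences are hypothetical. Only the detector error rate of about 0.001 comes from the text.

```python
from typing import Dict, List, Tuple

DETECTOR_ERROR_RATE = 0.001  # "about 0.001" in the text; all else is assumed


def expected_binding(sequence: str, epitope: str) -> bool:
    """Assumed binding model: the probe binds if its epitope occurs in the sequence."""
    return epitope in sequence


def update_probabilities(
    probabilities: Dict[str, float],
    database: Dict[str, str],
    epitope: str,
    observed_bound: bool,
    error_rate: float = DETECTOR_ERROR_RATE,
) -> Dict[str, float]:
    """One iteration: reweight each candidate by how well it explains one measurement."""
    unnormalized = {}
    for name, prior in probabilities.items():
        agrees = expected_binding(database[name], epitope) == observed_bound
        # A disagreement is only explained by a detector error.
        likelihood = (1.0 - error_rate) if agrees else error_rate
        unnormalized[name] = prior * likelihood
    # Normalize by the sum over candidates (requires error_rate > 0).
    total = sum(unnormalized.values())
    return {name: value / total for name, value in unnormalized.items()}


def identify(
    database: Dict[str, str], measurements: List[Tuple[str, bool]]
) -> Dict[str, float]:
    """Start from a uniform prior and fold in one probe measurement per iteration."""
    probabilities = {name: 1.0 / len(database) for name in database}
    for epitope, observed_bound in measurements:
        probabilities = update_probabilities(
            probabilities, database, epitope, observed_bound
        )
    return probabilities


if __name__ == "__main__":
    # Hypothetical sequence database and probe readouts for one unknown protein.
    database = {"P1": "MKTAYIAKQR", "P2": "MKLVWDEAKQ", "P3": "GSHMWDETAY"}
    measurements = [("AKQ", True), ("WDE", True), ("TAY", False)]
    for name, p in identify(database, measurements).items():
        print(name, p)
```

In this toy run only P2 is consistent with all three readouts, so its probability ends up close to 1 while the others are penalized by the error rate for each mismatch. Because each probe only narrows the candidate set, additional probe measurements can be folded in one at a time, which matches the iterative framing above.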
In some embodiments, the detector error rate is obtained from the specifications of one or more detectors used to obtain the binding measurement information. In some embodiments, the detector error rate is set to an estimated error rate of the detector. In some embodiments, the estimated error rate of the detector is set by the user of the computer. In some embodiments, the estimated error rate of the detector is about 0.001. Such error rates are described elsewhere in this specification. The error rate may include physical errors in the detector. Alternatively, such an error rate may arise from a failure of the probe to "land" on a protein, for example, if the probe gets stuck in the system and is not properly washed away, or if the probe binds to a protein that was not expected based on the probe's previous qualification and testing. Thus, the detector error rate may include one or more of the following: the physical error rate of the detector, the off-target binding rate, or the error rate due to stuck probes. In some embodiments, iteratively generating the plurality of probabilities further includes removing one or more candidate proteins among the plurality of candidate proteins from subsequent iterations, thereby reducing the number of iterations required to carry out the iterative generation of the probabilities. In some embodiment