WO-2026096894-A1 - METHODS AND SYSTEMS FOR MASS-SPECTROMETRY-BASED DETECTION OF NON-CANONICAL PEPTIDES

Abstract

Presented herein are technologies for identifying non-canonical polypeptide targets within a sample via mass-spectrometry. Among other things, non-canonical target detection technologies leverage machine-learning classifiers along with unique input feature design to eliminate and/or mitigate false positive identifications that can plague attempts to detect small quantities of non-canonical polypeptide targets in biological samples. Accordingly, methods and systems of the present disclosure address significant shortcomings of previous mass-spectrometry data analysis techniques that rendered them unsuitable for detecting targets from the "dark" proteome. In doing so, technologies of the present disclosure facilitate target identification for immunotherapies, including treatments for cancer and/or infectious diseases.

Inventors

  • ROONEY, Michael Steven
  • PATASKAR, Abhijeet

Assignees

  • BioNTech SE

Dates

Publication Date
20260507
Application Date
20251031
Priority Date
20241101

Claims (20)

  1. A method for detecting non-canonical peptides within a biological sample via mass-spectrometry, the method comprising:
     (a) obtaining, by a processor of a computing device, mass spectrometry data for the biological sample, said mass spectrometry data comprising one or more sample spectrum (spectra);
     (b) identifying, by the processor, based on the mass spectrometry data, a plurality of candidate peptides, each candidate peptide having a corresponding candidate mass spectrum determined, by the processor, to match at least a portion of the one or more sample spectrum (spectra) of the mass spectrometry data, and wherein at least a portion of the plurality of candidate peptides are non-canonical peptides;
     (c) determining, by the processor, for each candidate peptide, values for one or more quality features, wherein, for a particular candidate peptide, the one or more quality features measure a quality with which the candidate mass spectrum corresponding to the particular candidate peptide matches the portion of the one or more sample spectrum (spectra);
     (d) determining, by the processor, for each of the plurality of candidate peptides, a corresponding prediction value using a machine learning model, wherein, for a particular candidate peptide, the corresponding prediction value (i) measures a predicted likelihood of, and/or (ii) classifies, the particular candidate peptide being present in the biological sample as determined by the machine learning model based on a set of input feature values comprising (i) the values for the one or more quality features determined for the particular candidate peptide and (ii) a value of a peptide source feature that indicates whether the particular candidate peptide is a non-canonical peptide;
     (e) selecting, by the processor, a subset of the plurality of candidate peptides for inclusion in a final set of detected peptides, based on their corresponding prediction values; and
     (f) storing and/or providing, by the processor, the final set of detected peptides.
  2. The method of claim 1, wherein the biological sample is a cell sample.
  3. The method of claim 1, wherein the biological sample is a tissue sample.
  4. The method of any one of the preceding claims, wherein the biological sample comprises cancer cells.
  5. The method of any one of the preceding claims, wherein the biological sample is an organoid sample.
  6. The method of any one of the preceding claims, wherein the biological sample is a sample obtained from a subject having been diagnosed with cancer.
  7. The method of any one of the preceding claims, wherein the biological sample comprises cells infected with an infectious agent.
  8. The method of claim 7, wherein the biological sample is a sample obtained from a subject having been infected with an infectious agent.
  9. The method of claim 8, wherein the infectious agent is a virus.
  10. The method of any one of the preceding claims, wherein the mass spectrometry data is or has been obtained using a purified version of the biological sample obtained following one or more sample preparation steps.
  11. The method of claim 10, wherein the one or more sample preparation steps comprise isolation of MHC-bound peptides from the biological sample.
  12. The method of claim 10 or 11, wherein the one or more sample preparation steps comprise a protease digestion step.
  13. The method of any one of the preceding claims, wherein the mass spectrometry data is tandem mass spectrometry data.
  14. The method of any one of the preceding claims, wherein the mass spectrometry data comprises a plurality of sample spectra, each sample spectrum generated via an MS/MS scan associated with a particular selected precursor ion of a particular survey scan.
  15. The method of any one of the preceding claims, comprising generating the mass spectrometry data using a tandem mass spectrometer.
  16. The method of any one of the preceding claims, wherein step (b) comprises identifying, by the processor, for each of at least a portion of the one or more sample spectrum (spectra), a matching candidate peptide.
  17. The method of claim 16, comprising, for a given sample spectrum: selecting, by the processor, a plurality of prospective candidate peptides from one or more target databases; determining, for each of the plurality of prospective candidate peptides, a corresponding candidate mass spectrum; determining, by the processor, for each of the plurality of prospective candidate peptides, one or more corresponding spectral similarity scores, wherein, for a particular prospective candidate peptide, the one or more corresponding spectral similarity scores are determined based on the corresponding candidate mass spectrum and the given sample spectrum; and selecting, by the processor, one or more of the prospective candidate peptides as matching candidate peptides.
  18. The method of claim 17, wherein the one or more corresponding spectral similarity scores comprise a cross-correlation score determined based on a cross-correlation between (i) the prospective candidate peptide’s corresponding mass spectrum and (ii) the given sample spectrum.
  19. The method of claim 17 or 18, wherein the one or more target databases comprise one or more sequence database(s).
  20. The method of claim 19, wherein the one or more target databases comprise a canonical human proteome sequence.
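
As a rough illustration of steps (c) through (e) of claim 1, together with the spectral similarity scoring of claims 17 and 18, the following Python sketch scores toy candidates and selects those above a threshold. Everything in it is an assumption made for illustration: the `Candidate` class, the function names, the binned toy spectra, the placeholder sequences, the logistic form of the model, and the weights and threshold. The claims do not specify any of these, and the cosine score is a simplified stand-in for a cross-correlation score, not the scoring used by any particular search engine.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    sequence: str         # placeholder peptide sequence (hypothetical)
    match_quality: float  # one quality feature, here a spectral similarity score
    non_canonical: bool   # peptide source feature


def cosine_similarity(sample, theoretical):
    """Normalized zero-offset cross-correlation of two binned intensity vectors."""
    dot = sum(s * t for s, t in zip(sample, theoretical))
    norm = math.sqrt(sum(s * s for s in sample)) * math.sqrt(
        sum(t * t for t in theoretical)
    )
    return dot / norm if norm else 0.0


def prediction_value(candidate, weights=(4.0, -2.0), bias=-2.0):
    """Logistic score over [match_quality, source flag]; the weights are made up."""
    features = (candidate.match_quality, 1.0 if candidate.non_canonical else 0.0)
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def select_detected(candidates, threshold=0.6):
    """Keep the candidates whose prediction value reaches the threshold."""
    return [c for c in candidates if prediction_value(c) >= threshold]


# Toy binned spectra: the first two candidates match the sample perfectly,
# the third shares no peaks with it.
sample_spectrum = [0.0, 1.0, 0.0, 2.0]
candidates = [
    Candidate("PEPTIDEK", cosine_similarity(sample_spectrum, [0.0, 1.0, 0.0, 2.0]), False),
    Candidate("PEPTIDER", cosine_similarity(sample_spectrum, [0.0, 1.0, 0.0, 2.0]), True),
    Candidate("PEPTIDEA", cosine_similarity(sample_spectrum, [1.0, 0.0, 0.0, 0.0]), False),
]
detected = [c.sequence for c in select_detected(candidates)]
print(detected)  # ['PEPTIDEK']
```

With these invented weights, the canonical and non-canonical candidates have the same match quality, yet only the canonical one passes the threshold. This shows one way a peptide source feature could let a model hold non-canonical candidates to a stricter standard; a trained model might weight the feature quite differently.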

Description

Attorney Docket No.: 2013237-1511

METHODS AND SYSTEMS FOR MASS-SPECTROMETRY-BASED DETECTION OF NON-CANONICAL PEPTIDES

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to and benefit of U.S. Provisional Patent Application No. 63/715,436 filed November 1, 2024, the disclosure of which is incorporated by reference herein in its entirety.

SEQUENCE LISTING

[0002] The instant application contains a Sequence Listing, which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. The XML copy, created October 31, 2025, is named 2013237-1511.xml, and is 29,392 bytes in size.

BACKGROUND

[0003] To first order, proteins are produced via transcription and translation of protein-coding genes, each into a single, canonical protein. However, a variety of mechanisms may give rise to non-canonical proteins, which, in turn, are increasingly understood to account for a large, but “dark”, fraction of the proteome. For example, non-canonical proteins may be encoded by regions in DNA that were not previously considered to have the ability to express or, at the translation level, may be produced via alternative translation and splicing events. This dark proteome may be relevant to diagnosis and treatment of diseases. Accordingly, improved technologies for accurately identifying non-canonical proteins and peptides are needed.

SUMMARY

[0004] Presented herein are technologies for identifying non-canonical polypeptide targets within a sample via mass-spectrometry. Among other things, non-canonical target detection technologies leverage machine-learning classifiers along with unique input feature design to eliminate and/or mitigate false positive identifications that can plague attempts to detect small quantities of non-canonical polypeptide targets in biological samples.
Accordingly, methods and systems of the present disclosure address significant shortcomings of previous mass-spectrometry data analysis techniques that rendered them unsuitable for detecting targets from the “dark” proteome. In doing so, technologies of the present disclosure facilitate target identification for immunotherapies, including treatments for cancer and/or infectious diseases.

[0005] In some aspects, the present disclosure provides methods for detecting non-canonical peptides within a biological sample via mass-spectrometry, said provided methods including:
(a) obtaining, by a processor of a computing device, mass spectrometry data for the biological sample, said mass spectrometry data including one or more sample spectrum (spectra);
(b) identifying, by the processor, based on the mass spectrometry data, a plurality of candidate peptides, each candidate peptide having a corresponding candidate mass spectrum determined, by the processor, to match at least a portion of the one or more sample spectrum (spectra) of the mass spectrometry data, and wherein at least a portion of the plurality of candidate peptides are non-canonical peptides;
(c) determining, by the processor, for each candidate peptide, values for one or more quality features, wherein, for a particular candidate peptide, the one or more quality features measure a quality [e.g., accuracy, explanatory power (e.g., percentage accounted for), likelihood of being correct] with which the candidate mass spectrum corresponding to the particular candidate peptide matches the portion of the one or more sample spectrum (spectra);
(d) determining, by the processor, for each of the plurality of candidate peptides, a corresponding prediction value using a machine learning model, wherein, for a particular candidate peptide, the corresponding prediction value (i) measures a predicted likelihood of, and/or (ii) classifies, the particular candidate peptide being present in the biological sample as determined by the machine learning model based on a set of input feature values including (i) the values for the one or more quality features determined for the particular candidate peptide and (ii) a value of a peptide source feature that indicates whether the particular candidate peptide is a non-canonical peptide;
(e) selecting, by the processor, a subset of the plurality of candidate peptides for inclusion in a final set of detected peptides, based on their corresponding prediction values; and
(f) storing and/or providing (e.g., for display and/or further processing), by the processor, the final set of detected peptides.

[0006] In some embodiments, a biological sample is a cell sample (e.g., a solution of cells; e.g., a dissociated cell sample).

[0007] In some embodiments, a biological sample is a tissue sample (e.g., a tissue biopsy).

[0008] In some embodiments, a biological sample includes cancer cells.

[0009] In some embodiments, a biological sample is an organoid sample.

[0010] In some embodiments, a biological sam