EP-4740009-A1 - ANALYSING LIQUID CHROMATOGRAPHY ELECTROSPRAY IONISATION MASS SPECTROMETRY (LC-ESI-MS) DATA

Abstract

A computer system for analysing liquid chromatography electrospray ionisation mass spectrometry (LC-ESI-MS) data. The computer system is configured to implement a raw data processor that extracts a mass spectrum based on data at each retention time and a base peak signal intensity derived from the peak apex and the baseline, from a base peak chromatogram derived from raw LC-ESI-MS data. The system also implements a fingerprinting system performing background correction for detected peaks, producing an m/z value by subtracting mass over charge (m/z) values of the baseline from an original m/z value and, if the resulting m/z value is positive, restoring the original m/z value. The fingerprinting system generates a compact data representation comprising, for each detected peak, the retention time, original m/z values and intensities, and performs chemical fingerprinting for the raw LC-ESI-MS data based on the compact data representation.
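
The background correction step can be sketched in Python under one possible reading. The abstract speaks of subtracting the baseline's m/z values, and this sketch assumes that means subtracting the baseline intensity recorded at each m/z from the peak intensity at the same m/z, then keeping the original entry wherever the difference is positive. The `Spectrum` type and the function name `background_correct` are hypothetical and do not come from the patent.

```python
from typing import Dict

# Hypothetical representation: m/z value -> signal intensity.
Spectrum = Dict[float, float]


def background_correct(peak: Spectrum, baseline: Spectrum) -> Spectrum:
    """Keep original peak entries that still stand out after baseline subtraction.

    Assumption: m/z values are matched exactly. Real instrument data would
    need a matching tolerance or binning step, which is omitted here.
    """
    corrected: Spectrum = {}
    for mz, intensity in peak.items():
        residual = intensity - baseline.get(mz, 0.0)
        if residual > 0:
            # Positive residual: restore the original (unsubtracted) value.
            corrected[mz] = intensity
    return corrected
```

With a peak spectrum `{100.0: 50.0, 200.0: 5.0, 300.0: 10.0}` and a baseline `{200.0: 8.0, 300.0: 10.0}`, only the entry at m/z 100.0 survives, and it keeps its original intensity of 50.0.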

Inventors

  • TAY, Wei Peng, Dillon
  • LIM, Yee Hwee

Assignees

  • Agency for Science, Technology and Research

Dates

Publication Date
20260513
Application Date
20240705

Claims (20)

  1. A computer system for analysing liquid chromatography electrospray ionisation mass spectrometry (LC-ESI-MS) data, comprising: memory; and at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the computer system to: implement a raw data processor that: obtains raw LC-ESI-MS data from a source and extracts a base peak chromatogram (BPC) from the raw data; detects, from the BPC, a peak for each signal intensity above a first predetermined threshold and, for each detected peak, a retention time, a peak apex and a baseline; and extracts a mass spectrum based on data at each retention time and a base peak signal intensity derived from the peak apex and the baseline; and implement a fingerprinting system that: performs background correction by, for each detected peak, producing a resulting m/z value by subtracting mass over charge (m/z) values of the baseline from an original m/z value and, if the resulting m/z value is positive, restoring the original m/z value; and generates a compact data representation comprising, for each detected peak, the retention time, original m/z values and intensities; and performs chemical fingerprinting for the raw LC-ESI-MS data based on the compact data representation.
  2. The computer system of claim 1, wherein the raw data processor identifies, for each detected peak, the peak apex by determining if the peak comprises: an intensity plateau, the peak apex comprising a midpoint of the detected peak; or a gradient sign change to negative, the peak apex corresponding to a point at which the gradient sign changes to negative.
  3. The computer system of claim 1 or 2, wherein the raw data processor detects, for each detected peak, the baseline by sampling a nearest baseline timepoint prior to the detected peak and a nearest baseline timepoint after the detected peak.
  4. The computer system of claim 3, wherein the m/z values of the baseline comprise the m/z values of the nearest baseline timepoints.
  5. The computer system of any one of claims 1 to 4, wherein the raw data processor: detects background noise from the BPC based on an intensity of less than a second predetermined threshold; and averages out the background noise.
  6. The computer system of any one of claims 1 to 5, comprising: identifying, as a base peak, a said peak of highest signal intensity; determining if one or more molecular ion peaks exist in the mass spectrum, with detectable isotopes in the mass spectrum; and selecting from the one or more molecular ion peaks, if any, a said molecular ion peak with highest signal intensity, wherein the compact data representation comprises the mass spectrum represented as a series of retention times, m/z values above the first predetermined threshold, base peak m/z, the base peak and molecular ion peak, if any, with highest signal intensity.
  7. The computer system of any one of claims 1 to 6, further comprising a query engine, the query engine comparing a first mass spectrum of a first sample corresponding to first raw LC-ESI-MS data, and a second mass spectrum of a second sample corresponding to second raw LC-ESI-MS data, the query engine determining if the first raw LC-ESI-MS data and the second raw LC-ESI-MS data are for similar compounds by comparing m/z of respective base peaks, and retention times of the base peaks.
  8. The computer system of claim 7, wherein the query engine is further configured to filter the mass spectrum based on a specific peak intensity or specific m/z value.
  9. The computer system of claim 7 or 8, wherein the query engine is further configured to filter the mass spectrum based on a peak intensity range or m/z value range.
  10. A method for analysing liquid chromatography electrospray ionisation mass spectrometry (LC-ESI-MS) data, comprising: using a raw data processor to: obtain raw LC-ESI-MS data from a source and extract a base peak chromatogram (BPC) from the raw data; detect, from the BPC, a peak for each signal intensity above a first predetermined threshold and, for each detected peak, a retention time, a peak apex and a baseline; and extract a mass spectrum based on data at each retention time and a base peak signal intensity derived from the peak apex and the baseline; and using a fingerprinting system to: perform background correction by, for each detected peak, producing a resulting m/z value by subtracting mass over charge (m/z) values of the baseline from an original m/z value and, if the resulting m/z value is positive, restoring the original m/z value; generate a compact data representation comprising, for each detected peak, the retention time, original m/z values and intensities; and perform chemical fingerprinting for the raw LC-ESI-MS data based on the compact data representation.
  11. The method of claim 10, wherein identifying, for each detected peak, the peak apex comprises determining if the peak comprises: an intensity plateau, the peak apex comprising a midpoint of the detected peak; or a gradient sign change to negative, the peak apex corresponding to a point at which the gradient sign changes to negative.
  12. The method of claim 10 or 11, wherein detecting, for each detected peak, the baseline comprises sampling a nearest baseline timepoint prior to the detected peak and a nearest baseline timepoint after the detected peak.
  13. The method of claim 12, wherein the m/z values of the baseline comprise the m/z values of the nearest baseline timepoints.
  14. The method of any one of claims 10 to 13, further comprising using the raw data processor to: detect background noise from the BPC based on an intensity of less than a second predetermined threshold; and average out the background noise.
  15. The method of any one of claims 11 to 16, further comprising: identifying, as a base peak, a said peak of highest signal intensity; determining if one or more molecular ion peaks exist in the mass spectrum, with detectable isotopes in the mass spectrum; and selecting from the one or more molecular ion peaks, if any, a said molecular ion peak with highest signal intensity, wherein generating the compact data representation comprises representing the mass spectrum as a series of retention times, m/z values above the first predetermined threshold, base peak m/z, the base peak and molecular ion peak, if any, with highest signal intensity.
  16. The method of any one of claims 10 to 15, further comprising implementing a query engine to compare a first mass spectrum of a first sample corresponding to first raw LC-ESI-MS data, and a second mass spectrum of a second sample corresponding to second raw LC-ESI-MS data, the query engine determining if the first raw LC-ESI-MS data and the second raw LC-ESI-MS data are for similar compounds by comparing respective compact data representations corresponding to the first mass spectrum and second mass spectrum.
  17. The method of claim 16, further comprising causing the query engine to filter the first mass spectrum based on a specific peak intensity or specific m/z value.
  18. The method of claim 16 or 17, further comprising causing the query engine to filter the first mass spectrum based on a peak intensity range or m/z value range.
  19. A method of building a database, comprising: for each of a plurality of base peak chromatograms (BPCs), at a data processor: obtaining the BPC either directly or from LC-ESI-MS data; detecting a peak for each signal intensity above a first predetermined threshold and, for each detected peak, a retention time, a peak apex and a baseline; and extracting a mass spectrum based on data at each retention time and a base peak signal intensity derived from the peak apex and the baseline; and at a fingerprinting system: performing background correction by, for each detected peak, producing a resulting m/z value by subtracting mass over charge (m/z) values of the baseline from an original m/z value and, if the resulting m/z value is positive, restoring the original m/z value; and generating a compact data representation comprising features, each feature being a said detected peak or, for each detected peak, the retention time, original m/z values and intensities; storing the compact data representation in the database; and providing a query interface for: receiving a query comprising, for a new BPC or new LC-ESI-MS data, a new compact data representation for the new BPC or new LC-ESI-MS data; performing chemical fingerprinting by comparing one or more features of the new compact data representation to the features of the compact data representations in the database; and returning one or more of the compact data representations from the database, based on the chemical fingerprinting.
  20. A database built in accordance with the method of claim 19.
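
The peak detection, apex identification and baseline sampling recited in claims 1 to 3 (and 10 to 12) can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the BPC is taken to be a plain list of intensities indexed by timepoint, a peak is a contiguous run of points above the first threshold, a baseline timepoint is any point at or below that threshold, and the plateau "midpoint" is read as the middle index of the flat run. All function names are hypothetical.

```python
from typing import List, Optional, Tuple


def detect_regions(bpc: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs for runs above the threshold."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for k, value in enumerate(bpc):
        if value > threshold and start is None:
            start = k
        elif value <= threshold and start is not None:
            regions.append((start, k - 1))
            start = None
    if start is not None:
        regions.append((start, len(bpc) - 1))
    return regions


def find_apex(bpc: List[float], start: int, end: int) -> int:
    """Locate the apex inside the inclusive region [start, end]."""
    i = start
    while i < end:
        if bpc[i + 1] < bpc[i]:
            return i  # gradient sign changes to negative at this point
        if bpc[i + 1] == bpc[i]:
            j = i + 1
            while j < end and bpc[j + 1] == bpc[j]:
                j += 1
            # Flat run spans i..j; treat it as the apex if the signal then falls.
            if j == end or bpc[j + 1] < bpc[j]:
                return (i + j) // 2  # assumed reading of "midpoint"
            i = j
        else:
            i += 1
    return end


def nearest_baseline(
    bpc: List[float], start: int, end: int, threshold: float
) -> Tuple[Optional[int], Optional[int]]:
    """Return the nearest at-or-below-threshold indices before and after a peak."""
    before = next((k for k in range(start - 1, -1, -1) if bpc[k] <= threshold), None)
    after = next((k for k in range(end + 1, len(bpc)) if bpc[k] <= threshold), None)
    return before, after
```

For a trace such as `[1, 2, 10, 30, 30, 12, 2, 1, 15, 40, 9, 1]` with a threshold of 5, the sketch finds two regions. The first has a two-point plateau and the second a single sharp maximum, so both branches of the apex rule are exercised.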

Description

ANALYSING LIQUID CHROMATOGRAPHY ELECTROSPRAY IONISATION MASS SPECTROMETRY (LC-ESI-MS) DATA

Technical Field

The present invention relates, in general terms, to a system, and method implemented by that system, for analysing liquid chromatography electrospray ionisation mass spectrometry (LC-ESI-MS) data.

Background

High-resolution tandem mass spectrometry is a powerful data-rich analytical technique that provides high-dimensional data often complicated by background noise. Analysis of these complex datasets presents a challenging, time-consuming task that often bottlenecks project progress (e.g. in untargeted metabolomic discovery studies). Current commercial LC-ESI-MS analytical software solutions act directly on the raw data which (1) takes up a large amount of storage space at ~200 MB per file and (2) is extremely time-consuming (e.g. can take 6-8 hours to perform a single query) due to the general nature of the program. Peak integration is also complicated by background noise and poor resolution (e.g. peak shoulders) resulting in erroneous fold change comparisons between samples. This bottleneck in data analysis prevents efficient extraction of scientific insights that could serve as building blocks for other applications.

It would be desirable to overcome at least one of the above-described problems by providing an automated data analysis workflow for rapid processing of LC-ESI-MS data, or at least to provide a useful alternative.

Summary

In view of the above problems, the solution presented herein provides a high throughput data analysis workflow for automated processing and analysis of LC-ESI-MS data. The workflow utilizes a series of processes for (1) adjustable rule-based peak picking, (2) extraction of key features which, for some embodiments, reduces the file size by 10,000x, (3) automated annotation of chemical fingerprints and (4) creation of a dynamic summary for easy data exploration.
In some embodiments, a 50x acceleration in data analysis turnover (from hours to minutes) is achieved, enabling faster query execution and easy data exploration.

The present invention provides a computer system for analysing liquid chromatography electrospray ionisation mass spectrometry (LC-ESI-MS) data, comprising: memory; and at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the computer system to: implement a raw data processor that: obtains raw LC-ESI-MS data from a source and extracts a base peak chromatogram (BPC) from the raw data; detects, from the BPC, a peak for each signal intensity above a first predetermined threshold and, for each detected peak, a retention time, a peak apex and a baseline; and extracts a mass spectrum based on data at each retention time and a base peak signal intensity derived from the peak apex and the baseline; and implement a fingerprinting system that: performs background correction by, for each detected peak, producing a resulting m/z value by subtracting mass over charge (m/z) values of the baseline from an original m/z value and, if the resulting m/z value is positive, restoring the original m/z value; generates a compact data representation comprising, for each detected peak, the retention time, original m/z values and intensities; and performs chemical fingerprinting for the raw LC-ESI-MS data based on the compact data representation.
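
The compact data representation, and the base-peak comparison the claims attribute to the query engine, can be sketched as follows. This is a minimal illustration in Python: the `PeakRecord` class, the JSON serialisation, and the tolerance values `mz_tol` and `rt_tol` are all assumptions introduced here, since the source states only which quantities are stored and compared.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class PeakRecord:
    """One detected peak: retention time plus original m/z values and intensities."""

    retention_time: float
    mz: List[float]
    intensity: List[float]

    @property
    def base_peak_mz(self) -> float:
        # The base peak is the m/z entry with the highest signal intensity.
        top = max(range(len(self.intensity)), key=self.intensity.__getitem__)
        return self.mz[top]


def to_compact(records: List[PeakRecord]) -> str:
    """Serialise records to a compact JSON string (hypothetical storage format)."""
    return json.dumps([asdict(r) for r in records], separators=(",", ":"))


def similar(a: PeakRecord, b: PeakRecord, mz_tol: float = 0.01, rt_tol: float = 0.1) -> bool:
    """Compare base peak m/z and retention time within assumed tolerances."""
    return (
        abs(a.base_peak_mz - b.base_peak_mz) <= mz_tol
        and abs(a.retention_time - b.retention_time) <= rt_tol
    )
```

Only the per-peak features are stored, so a query compares small records rather than re-reading raw files. The tolerances would need to be chosen for the instrument and chromatographic method in use.
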
Also disclosed is a method for analysing liquid chromatography electrospray ionisation mass spectrometry (LC-ESI-MS) data, comprising: using a raw data processor to: obtain raw LC-ESI-MS data from a source and extract a base peak chromatogram (BPC) from the raw data; detect, from the BPC, a peak for each signal intensity above a first predetermined threshold and, for each detected peak, a retention time, a peak apex and a baseline; and extract a mass spectrum based on data at each retention time and a base peak signal intensity derived from the peak apex and the baseline; and using a fingerprinting system to: perform background correction by, for each detected peak, producing a resulting m/z value by subtracting mass over charge (m/z) values of the baseline from an original m/z value and, if the resulting m/z value is positive, restoring the original m/z value; generate a compact data representation comprising, for each detected peak, the retention time, original m/z values and intensities; and perform chemical fingerprinting for the raw LC-ESI-MS data based on the compact data representation.

Also disclosed is a method of building a database, comprising: for each of a plurality of base peak chromatograms (BPCs), at a data processor: obtaining the BPC either directly or from LC-ESI-MS data; detecting a peak for each signal intensity above a first predetermined threshold and, for each detected peak, a retention time, a peak apex and a baseline; and extracting a mass spectrum based on data at each retention time and a base peak signal intensity derived from the