CN-121981080-A - Multi-spectrogram data format conversion method, computer system and storage medium

Abstract

The application provides a multi-spectrogram data format conversion method, a computer system and a storage medium. The method comprises: step S1, abstracting spectrogram data formats of various types based on Protobuf definitions and establishing a spectrogram data structure; step S2, extracting core data and metadata of a spectrogram data format according to the format specification of that format; and step S3, mapping the core data and the metadata into the spectrogram data structure and serializing it for storage to obtain a unified target spectrogram data stream. The application takes efficient storage optimization and confidentiality of spectrogram data as its core development goals, and provides a solution to pain points in spectrogram data management such as format heterogeneity, storage redundancy and low resolution. A unified spectrogram data structure is constructed based on the Protobuf protocol and, combined with a plug-in parsing architecture, enables efficient parsing and standardized conversion of multiple types of spectrogram data formats, such as chromatographic, mass spectrometry and spectroscopic formats.

Inventors

  • ZHU LING
  • WEI MIN
  • Xiong Ningjing
  • LIU YUWEI
  • HOU YIMIN
  • LI RAN
  • HE YUNLU
  • PAN XI
  • CHEN LIN
  • SONG XUYAN

Assignees

  • 湖北中烟工业有限责任公司

Dates

Publication Date
20260505
Application Date
20260123

Claims (10)

  1. A multi-spectrogram data format conversion method, characterized in that the method comprises: step S1, abstracting spectrogram data formats of all types based on Protobuf definitions and establishing a spectrogram data structure; step S2, extracting core data and metadata of the spectrogram data format according to the format specification of the spectrogram data format; and step S3, mapping the core data and the metadata into the spectrogram data structure and performing serialized storage to obtain a unified target spectrogram data stream.
  2. The multi-spectrogram data format conversion method according to claim 1, wherein the spectrogram data format comprises a chromatographic data format, a mass spectrometry data format, and a spectroscopic data format.
  3. The multi-spectrogram data format conversion method according to claim 1, wherein the step S1 comprises: step S11, abstracting the spectrogram data format into mandatory dimensions and optional dimensions to form a general spectrogram data dimension list; step S12, splitting the Protobuf message structure based on the general spectrogram data dimension list to obtain a Protobuf hierarchical structure; and step S13, constructing the spectrogram data structure according to the Protobuf hierarchical structure.
  4. The multi-spectrogram data format conversion method according to claim 1, wherein the step S2 comprises: step S21, determining a parsing library corresponding to the spectrogram data format according to the format specification of the spectrogram data format; step S22, taking, according to the parsing library, the data in the spectrogram data format that describes the core service signals as the core data; and step S23, taking, according to the parsing library, the data in the spectrogram data format that provides auxiliary description as the metadata.
  5. The multi-spectrogram data format conversion method according to claim 4, wherein the step S21 comprises: step S211, identifying a source format type according to the format specification of the spectrogram data format, wherein the source format type comprises CDF, mzXML, mzML, JDX/DX and CSV; and step S212, based on the identified source format type, establishing a one-to-one or one-to-many mapping between formats and parsing libraries, and determining the parsing library by matching.
  6. The multi-spectrogram data format conversion method according to claim 1, wherein the step S3 comprises: step S31, mapping the core data and the metadata to the spectrogram data structure; step S32, performing binary serialization on the spectrogram data structure according to the Protobuf encoding rules to generate a binary byte stream; and step S33, packaging and storing the binary byte stream as the target spectrogram data stream.
  7. The multi-spectrogram data format conversion method according to claim 1, further comprising: step S4, deserializing the target spectrogram data stream to obtain spectrogram data.
  8. A computer system comprising a memory, a processor and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the multi-spectrogram data format conversion method of any one of claims 1 to 7.
  9. A computer readable storage medium having stored thereon a computer program/instructions which, when executed by a processor, implement the steps of the multi-spectrogram data format conversion method of any one of claims 1 to 7.
  10. A computer program product comprising a computer program/instructions which, when executed by a processor, implement the steps of the multi-spectrogram data format conversion method of any one of claims 1 to 7.
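
As an illustration only, the format identification and parsing library matching of steps S211 and S212 might be sketched in Python as follows. The application does not disclose how the source format type is identified, so identification by file extension is an assumption made here for brevity. The names `identify_format`, `register_parser`, `select_parser` and `parse_csv`, and the two-column CSV layout, are likewise hypothetical and not taken from the application.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Hypothetical parser signature: raw bytes in, (core data, metadata) out.
CoreData = List[Tuple[float, float]]
Parser = Callable[[bytes], Tuple[CoreData, Dict[str, str]]]

# Assumption: the source format type is inferred from the file extension.
EXTENSION_TO_FORMAT = {
    ".cdf": "CDF",
    ".mzxml": "mzXML",
    ".mzml": "mzML",
    ".jdx": "JDX/DX",
    ".dx": "JDX/DX",
    ".csv": "CSV",
}

# One format may map to one or several parsing libraries (one-to-many).
PARSER_REGISTRY: Dict[str, List[Parser]] = {}


def identify_format(filename: str) -> str:
    """Step S211 (sketch): identify the source format type."""
    suffix = Path(filename).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"unsupported source format: {filename}")
    return EXTENSION_TO_FORMAT[suffix]


def register_parser(fmt: str, parser: Parser) -> None:
    """Add a parser to the format-to-parser mapping."""
    PARSER_REGISTRY.setdefault(fmt, []).append(parser)


def select_parser(filename: str) -> Parser:
    """Step S212 (sketch): match the identified format to a registered parser."""
    fmt = identify_format(filename)
    candidates = PARSER_REGISTRY.get(fmt, [])
    if not candidates:
        raise LookupError(f"no parser registered for {fmt}")
    return candidates[0]  # first registered parser wins in this sketch


def parse_csv(raw: bytes) -> Tuple[CoreData, Dict[str, str]]:
    """Toy CSV parser: every non-empty line is 'x,y'; no metadata is carried."""
    points = []
    for line in raw.decode("utf-8").splitlines():
        if line.strip():
            x, y = line.split(",")
            points.append((float(x), float(y)))
    return points, {}


register_parser("CSV", parse_csv)
```

Under these assumptions, `select_parser("run01.csv")` returns `parse_csv`, while a recognized format with no registered parser raises `LookupError`. Keeping the mapping in a registry is one way to realize a plug-in style architecture, since support for a new format only requires another `register_parser` call.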

Description

Multi-spectrogram data format conversion method, computer system and storage medium

Technical Field

The application belongs to the field of spectrogram data storage, and particularly relates to a multi-spectrogram data format conversion method, a computer system and a storage medium.

Background

In analytical testing fields such as mass spectrometry, chromatography and spectroscopy, different instrument manufacturers and standards organizations have defined a variety of data storage formats. The mainstream types include mzXML and mzML in mass spectrometry, AIA and CDF in chromatography, and JDX/DX and CSV in spectroscopy. These formats differ significantly in data structure, storage logic and field definitions, and the raw data files generally occupy considerable storage space. This places a heavy load on data storage equipment, and the files are also easy to read without authorization and offer poor confidentiality, which poses several challenges for the secure and long-term standardized management of scientific research data:

(1) Limited tool compatibility. Existing analysis software is developed for specific formats. For example, mzML files must be opened with a dedicated mass spectrometry analysis tool, and JDX files must be parsed with dedicated spectroscopy software. Unified collection and management of multi-format data is therefore difficult, which greatly increases the operational complexity and time cost for research and testing personnel.

(2) Prominent storage redundancy. When files in different formats are stored repeatedly, data structure information cannot be shared or reused, so storage resources are wasted. For large spectroscopy files containing millions of data points, the space consumed by redundant storage is even more serious, further raising storage management costs.

(3) Insufficient data reliability in cross-scenario transfer. When data moves across platforms and programming languages, format conversion has to pass through several intermediate tools, and key data is easily lost or degraded in the process, for example parent ion m/z information in mass spectrometry or retention time precision in chromatography. This directly affects the accuracy of subsequent data interpretation and of scientific research conclusions.

Disclosure of Invention

In view of the foregoing, an object of the present application is to provide a multi-spectrogram data format conversion method, a computer system and a storage medium that solve the above problems.

To solve the above technical problems, the application adopts the following technical scheme:

The application provides a multi-spectrogram data format conversion method, comprising: step S1, abstracting spectrogram data formats of various types based on Protobuf definitions and establishing a spectrogram data structure; step S2, extracting core data and metadata of the spectrogram data format according to the format specification of the spectrogram data format; and step S3, mapping the core data and the metadata into the spectrogram data structure and performing serialized storage to obtain a unified target spectrogram data stream.

Further, the spectrogram data format includes a chromatographic data format, a mass spectrometry data format, and a spectroscopic data format.
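
As a minimal sketch of steps S1 to S3 described above, the following Python code maps core data and metadata into a single structure and serializes it using the Protobuf wire format. The `Spectrogram` structure, its field names and its field numbers are assumptions made for illustration; the application does not disclose its actual message definitions. To stay self-contained, the sketch hand-encodes only the small part of the wire format it needs (varints and length-delimited fields). In practice, classes generated by `protoc` with their `SerializeToString()` and `ParseFromString()` methods would normally be used instead.

```python
import struct
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class Spectrogram:
    """Hypothetical unified structure; names and field numbers are assumptions."""
    source_format: str = ""                                  # field 1 (metadata)
    x: List[float] = field(default_factory=list)             # field 2 (core data)
    y: List[float] = field(default_factory=list)             # field 3 (core data)
    metadata: Dict[str, str] = field(default_factory=dict)   # field 4 (metadata)


def encode_varint(value: int) -> bytes:
    """Base-128 varint, least significant 7-bit group first."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more groups follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, next position) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def length_delimited(number: int, payload: bytes) -> bytes:
    """Tag with wire type 2, then the payload length, then the payload."""
    return encode_varint((number << 3) | 2) + encode_varint(len(payload)) + payload


def serialize(s: Spectrogram) -> bytes:
    """Encode the structure as a binary byte stream.

    Simplification: empty fields are still written, whereas proto3
    encoders normally omit them.
    """
    out = length_delimited(1, s.source_format.encode("utf-8"))
    out += length_delimited(2, struct.pack(f"<{len(s.x)}d", *s.x))  # packed doubles
    out += length_delimited(3, struct.pack(f"<{len(s.y)}d", *s.y))
    for key, value in s.metadata.items():  # map entries: key = 1, value = 2
        entry = length_delimited(1, key.encode("utf-8"))
        entry += length_delimited(2, value.encode("utf-8"))
        out += length_delimited(4, entry)
    return out


def read_fields(buf: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (field number, payload); assumes only length-delimited fields."""
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        length, pos = decode_varint(buf, pos)
        yield tag >> 3, buf[pos:pos + length]
        pos += length


def deserialize(buf: bytes) -> Spectrogram:
    """Rebuild the structure from the byte stream."""
    s = Spectrogram()
    for number, payload in read_fields(buf):
        if number == 1:
            s.source_format = payload.decode("utf-8")
        elif number == 2:
            s.x = list(struct.unpack(f"<{len(payload) // 8}d", payload))
        elif number == 3:
            s.y = list(struct.unpack(f"<{len(payload) // 8}d", payload))
        elif number == 4:
            entry = dict(read_fields(payload))
            s.metadata[entry[1].decode("utf-8")] = entry[2].decode("utf-8")
    return s
```

With this sketch, `deserialize(serialize(s))` returns a structure equal to `s`. Storing the signal arrays as packed 8-byte doubles rather than decimal text is one reason a binary encoding of this kind can be more compact than text formats such as CSV or XML-based files.
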
Further, the step S1 comprises: step S11, abstracting the spectrogram data format into mandatory dimensions and optional dimensions to form a general spectrogram data dimension list; step S12, splitting the Protobuf message structure based on the general spectrogram data dimension list to obtain a Protobuf hierarchical structure; and step S13, constructing the spectrogram data structure according to the Protobuf hierarchical structure.

Further, the step S2 comprises: step S21, determining a parsing library corresponding to the spectrogram data format according to the format specification of the spectrogram data format; step S22, taking, according to the parsing library, the data in the spectrogram data format that describes the core service signals as the core data; and step S23, taking, according to the parsing library, the data in the spectrogram data format that provides auxiliary description as the metadata.

Further, the step S21 comprises: step S211, identifying a source format type according to the format specification of the spectrogram data format, wherein the source format type comprises CDF, mzXML, mzML, JDX/DX and CSV; and step S212, based on the identified source format type, establishing a one-to-one or one-to-many mapping between formats and parsing libraries, and determining the parsing library by matching. Fu