CN-122024934-A - Perfluoro compound identification method and apparatus based on machine learning

Abstract

The application discloses a machine-learning-based method and device for identifying perfluoro compounds, relates to the technical field of material identification, and mainly aims to solve the poor accuracy of existing perfluoro compound identification. The method comprises: obtaining mass spectrum data of an object to be detected and extracting mass spectrum feature information from the mass spectrum data; applying a trained perfluoro compound prediction model to the mass spectrum feature information to obtain perfluoro compound reference features, wherein the prediction model is trained on mass spectrum feature samples constructed from three modalities of feature data extracted from mass spectrum samples; determining structure identification information based on the perfluoro compound reference features; and labeling the structure identification information based on a preset compound classification library to obtain a perfluoro compound identification result.

Inventors

  • FU JIANJIE
  • HUANG KAI
  • YE ZHIHONG
  • JI FENG
  • YANG XIAOCHUN
  • LI XIAODONG
  • JIANG GUIBIN

Assignees

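  • Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences (国科大杭州高等研究院)
  • Shimadzu (China) Co., Ltd. (岛津企业管理(中国)有限公司)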

Dates

Publication Date
2026-05-12
Application Date
2026-04-10

Claims (10)

  1. A method for identifying a perfluoro compound based on machine learning, comprising: acquiring mass spectrum data of an object to be detected, and extracting mass spectrum feature information from the mass spectrum data; predicting on the mass spectrum feature information with a trained perfluoro compound prediction model to obtain a perfluoro compound reference feature, wherein the perfluoro compound prediction model is trained on a mass spectrum feature sample, and the mass spectrum feature sample is constructed from three modalities of feature data extracted from a mass spectrum sample; and determining structure identification information based on the perfluoro compound reference feature, and labeling the structure identification information based on a preset compound classification library to obtain a perfluoro compound identification result.
  2. The method of claim 1, further comprising, before predicting on the mass spectrum feature information with the trained perfluoro compound prediction model to obtain the perfluoro compound reference feature: acquiring mass spectrum samples of perfluoro compound samples, and classifying the mass spectrum samples based on methyl substructure information; extracting three modalities of feature data from the classified data set according to precursor-ion-level features, secondary mass spectrum fragment intensity features, and secondary mass spectrum fragment rule features, and constructing the mass spectrum feature sample; and training a constructed multi-modal neural network model on the mass spectrum feature sample to obtain the perfluoro compound prediction model.
  3. The method of claim 1, wherein determining the structure identification information based on the perfluoro compound reference feature comprises: determining a first reference structure of a reference molecule based on precursor ion mass-to-charge ratio information and an isotope pattern of the perfluoro compound reference feature; determining theoretical fragments corresponding to the first reference structure using a theoretical fragment calculation function, and determining a second reference structure based on secondary mass spectrum information of the perfluoro compound reference feature and the theoretical fragments; and matching the precursor ion mass-to-charge ratio information and the secondary mass spectrum information of the perfluoro compound reference feature against a preset secondary mass spectrum database to determine the structure identification information.
  4. The method of claim 3, wherein determining the second reference structure based on the secondary mass spectrum information of the perfluoro compound reference feature and the theoretical fragments comprises: when the mass-to-charge ratio deviations of at least two theoretical fragments are smaller than a preset difference value and the intensity sum ratio of the theoretical fragments is greater than a preset intensity ratio, determining that the theoretical fragments match the secondary mass spectrum information of the perfluoro compound reference feature, and generating the second reference structure.
  5. The method of claim 3, wherein matching the precursor ion mass-to-charge ratio information and the secondary mass spectrum information of the perfluoro compound reference feature against the preset secondary mass spectrum database to determine the structure identification information comprises: matching the precursor ion mass-to-charge ratio information against the preset secondary mass spectrum database, wherein the preset secondary mass spectrum database contains compound information corresponding to a target compound; when a first matching deviation is smaller than a first preset deviation threshold, matching the secondary mass spectrum fragment ions corresponding to the secondary mass spectrum information against fragment ions in the preset secondary mass spectrum database; and when a second matching deviation is smaller than a second preset deviation threshold and a matching sequence similarity is greater than a preset similarity threshold, determining the structure identification information based on the compound information corresponding to the target compound.
  6. The method of claim 1, wherein labeling the structure identification information based on the preset compound classification library to obtain the perfluoro compound identification result comprises: searching the preset compound classification library for source information matching the structure identification information, and labeling the structure identification information with the source information found; searching the preset compound classification library for hazard information matching the structure identification information, and labeling the structure identification information with the hazard information found; and generating the perfluoro compound identification result based on the labeled structure identification information.
  7. The method of any one of claims 1-6, wherein extracting the mass spectrum feature information from the mass spectrum data comprises: generating, with a preset peak-picking algorithm, mass spectrum peak features for precursor ion mass-to-charge ratios from the primary mass spectrum information of the mass spectrum data, and assigning the precursor ion mass-to-charge ratio, retention time, isotope pattern, and secondary mass spectrum information to the mass spectrum peak features to obtain the mass spectrum feature information.
  8. A perfluoro compound identification apparatus based on machine learning, comprising: an acquisition module configured to acquire mass spectrum data of an object to be detected and to extract mass spectrum feature information from the mass spectrum data; a prediction module configured to predict on the mass spectrum feature information with a trained perfluoro compound prediction model to obtain a perfluoro compound reference feature, wherein the perfluoro compound prediction model is trained on a mass spectrum feature sample, and the mass spectrum feature sample is constructed from three modalities of feature data extracted from a mass spectrum sample; and a determining module configured to determine structure identification information based on the perfluoro compound reference feature, and to label the structure identification information based on a preset compound classification library to obtain a perfluoro compound identification result.
  9. A computer-readable storage medium having stored thereon a computer program or instructions which, when executed by a processor, perform the steps of the method of claim 1.
  10. A computer device comprising a memory, a processor, and a computer program stored on the memory, characterized in that the processor executes the computer program to carry out the steps of the method of claim 1.
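
For illustration only, the fragment-matching condition of claim 4 can be sketched as below. This is not the applicant's implementation. The function name `fragments_match`, the default threshold values, and the reading of "intensity sum ratio" as the summed intensity of matched peaks divided by the total secondary-spectrum intensity are all assumptions made for this sketch.

```python
from typing import Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity) of one secondary-spectrum peak


def fragments_match(
    theoretical_mz: Sequence[float],
    ms2_peaks: Sequence[Peak],
    max_mz_dev: float = 0.005,         # hypothetical "preset difference value" (Da)
    min_intensity_ratio: float = 0.3,  # hypothetical "preset intensity ratio"
) -> bool:
    """Return True when the theoretical fragments are taken to match the spectrum.

    Assumed reading of claim 4:
      * at least two theoretical fragments lie within max_mz_dev of an observed peak;
      * the matched peaks carry more than min_intensity_ratio of the total intensity.
    """
    total_intensity = sum(intensity for _, intensity in ms2_peaks)
    if total_intensity <= 0:
        return False  # empty or all-zero spectrum cannot match

    matched_fragments = 0
    matched_peak_indices = set()
    for fragment_mz in theoretical_mz:
        hits = [
            idx
            for idx, (peak_mz, _) in enumerate(ms2_peaks)
            if abs(peak_mz - fragment_mz) < max_mz_dev
        ]
        if hits:
            matched_fragments += 1
            matched_peak_indices.update(hits)  # count each observed peak once

    matched_intensity = sum(ms2_peaks[idx][1] for idx in matched_peak_indices)
    ratio = matched_intensity / total_intensity
    return matched_fragments >= 2 and ratio > min_intensity_ratio
```

Under these assumptions, a spectrum with only one fragment inside the tolerance never matches, however intense that peak is, and two matching fragments still fail if together they explain too little of the signal.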

Description

Perfluoro compound identification method and apparatus based on machine learning

Technical Field

The application relates to the technical field of material identification, in particular to a perfluoro compound identification method and device based on machine learning.

Background

Per- and polyfluoroalkyl substances (PFAS, referred to here as perfluoro compounds) are a class of synthetic organofluorine compounds containing multiple C-F bonds. Because of their excellent surface activity, water and oil repellency, and chemical stability, they are widely used in industrial manufacturing, food packaging, medicine, and pesticides. These same properties, however, give perfluoro compounds very high environmental persistence, bioaccumulation, and potential toxicity, and they are therefore called "forever chemicals". As perfluoro compounds have been detected in water, soil, sediment, organisms, and human blood, their environmental distribution and potential health risks have become prominent, and analytical methods that can efficiently identify unknown or novel perfluoro compounds are urgently needed to support environmental monitoring and risk assessment. At present, perfluoro compounds are identified by non-target screening methods that rely on high-resolution mass spectrometry, with screening performed by analyzing mass defect, neutral loss, and characteristic fragment information.
However, under the new definition of perfluoro compounds released by the Organisation for Economic Co-operation and Development (OECD) in 2021, all compounds containing any trifluoromethyl or difluoromethylene group fall within the category of perfluoro compounds. The structural types of these compounds have therefore become more diverse, which further highlights the limited structural discrimination of existing non-target screening methods when mass spectrum fragment formation is uncertain. A perfluoro compound identification method based on machine learning is therefore needed to solve these problems.

Disclosure of Invention

In view of the above, the present application provides a method and a device for identifying a perfluoro compound based on machine learning, which mainly aim to solve the poor accuracy of existing perfluoro compound identification.

According to one aspect of the present application, there is provided a perfluoro compound identification method based on machine learning, comprising: acquiring mass spectrum data of an object to be detected, and extracting mass spectrum feature information from the mass spectrum data; predicting on the mass spectrum feature information with a trained perfluoro compound prediction model to obtain a perfluoro compound reference feature, wherein the perfluoro compound prediction model is trained on a mass spectrum feature sample, and the mass spectrum feature sample is constructed from three modalities of feature data extracted from a mass spectrum sample; and determining structure identification information based on the perfluoro compound reference feature, and labeling the structure identification information based on a preset compound classification library to obtain a perfluoro compound identification result.
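
As a rough illustration of the final labeling step, the sketch below treats the preset compound classification library as a simple in-memory mapping and attaches the source and hazard information described in claim 6. The library layout, the `LabeledIdentification` type, the `label_structure` function, and the placeholder entries are hypothetical; the application does not specify how the library is stored or queried.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed layout: structure identifier -> {"sources": [...], "hazards": [...]}
ClassificationLibrary = Dict[str, Dict[str, List[str]]]


@dataclass
class LabeledIdentification:
    """Structure identification information plus the labels found for it."""
    structure_id: str
    sources: List[str] = field(default_factory=list)
    hazards: List[str] = field(default_factory=list)


def label_structure(
    structure_id: str, library: ClassificationLibrary
) -> LabeledIdentification:
    """Look up source and hazard information and attach it as labels.

    A structure that is absent from the library is returned with empty labels.
    """
    entry = library.get(structure_id, {})
    return LabeledIdentification(
        structure_id=structure_id,
        sources=list(entry.get("sources", [])),
        hazards=list(entry.get("hazards", [])),
    )


if __name__ == "__main__":
    # Placeholder entries only; these are not real classification data.
    demo_library: ClassificationLibrary = {
        "structure-A": {
            "sources": ["example source category"],
            "hazards": ["example hazard class"],
        }
    }
    print(label_structure("structure-A", demo_library))
    print(label_structure("structure-B", demo_library))
```

Returning empty labels for an unknown structure is one possible design choice; a real system might instead flag such results for manual review.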
Further, before predicting on the mass spectrum feature information with the trained perfluoro compound prediction model to obtain the perfluoro compound reference feature, the method further comprises: acquiring mass spectrum samples of perfluoro compound samples, and classifying the mass spectrum samples based on methyl substructure information; extracting three modalities of feature data from the classified data set according to precursor-ion-level features, secondary mass spectrum fragment intensity features, and secondary mass spectrum fragment rule features, and constructing the mass spectrum feature sample; and training a constructed multi-modal neural network model on the mass spectrum feature sample to obtain the perfluoro compound prediction model.

Further, determining the structure identification information based on the perfluoro compound reference feature includes: determining a first reference structure of a reference molecule based on precursor ion mass-to-charge ratio information and an isotope pattern of the perfluoro compound reference feature; determining theoretical fragments corresponding to the first reference structure using a theoretical fragment calculation function, and determining a second reference structure based on secondary mass spectrum information of the perfluoro compound reference feature and the theoretical fragments; and matching the precursor ion mass-to-charge ratio information and the secondary mass spectrum information of the perfluoro compound r