CN-121994856-A - Nuclear magnetic resonance data-based compound structure analysis method and system

Abstract

The invention discloses a compound structure analysis method and system based on nuclear magnetic resonance (NMR) data. The method comprises: receiving raw NMR data on a compound to be analyzed, acquired by an NMR spectrometer; preprocessing the raw data to convert it from the time domain to the frequency domain, suppress noise interference, and output a peak position list; constructing a solvent peak template matched to the peak shape of the actual spectrum in the NMR data; defining candidate regions in the peak position list according to a configured search interval; matching the signal of each candidate region against the signal of the solvent peak template to identify the solvent peak; removing the solvent peak; calculating the offset between the actual chemical shift and the theoretical value; applying the offset as a global chemical shift correction to obtain a chemical shift list; and performing structural analysis of the compound based on the chemical shift list. The invention addresses the poor accuracy of compound structure analysis in the prior art.

Inventors

  • LI XIAONONG
  • ZHANG CHEN
  • YUAN BIN
  • LIU YANFANG
  • HAN YANG
  • ZHANG GUIZHEN
  • WANG CHENG
  • Zhu Daimin
  • Tu Penghuang

Assignees

  • 赣江中药创新中心

Dates

Publication Date
2026-05-08
Application Date
2026-01-21

Claims (10)

  1. A method for analyzing a compound structure based on nuclear magnetic resonance (NMR) data, the method comprising: receiving raw NMR data on a compound to be analyzed, acquired by an NMR spectrometer, and preprocessing the raw NMR data to convert it from the time domain to the frequency domain, suppress noise interference, and output a peak position list; constructing a solvent peak template matched to the peak shape of the actual spectrum in the NMR data, defining candidate regions in the peak position list according to a configured search interval, and matching the signal of each candidate region against the signal of the solvent peak template to obtain the solvent peak; and removing the solvent peak, calculating the offset between the actual chemical shift and the theoretical value, applying the offset as a global chemical shift correction to obtain a chemical shift list, and performing structural analysis of the compound to be analyzed based on the chemical shift list.
  2. The method of claim 1, wherein the step of preprocessing the raw NMR data to convert it from the time domain to the frequency domain, suppress noise interference, and output a peak position list comprises: converting the raw NMR data into a frequency-domain spectrum by Fourier transformation, and then applying an automatic phase correction algorithm and a baseline correction algorithm to remove baseline drift; and performing peak identification on the frequency-domain spectrum by combining second-derivative change-point detection with a local maximum search, and outputting the peak position list.
  3. The method for analyzing a compound structure based on NMR data according to claim 1, wherein the step of constructing a solvent peak template matched to the peak shape of the actual spectrum in the NMR data comprises: extracting actual spectrum data within a target chemical shift range from the NMR data, and calculating a global half-width as the peak shape parameter; and generating the solvent peak template from the theoretical coupling constant of the solvent combined with a Lorentzian function, or obtaining the solvent peak template as a statistical average of historically measured solvent spectra.
  4. The method for analyzing a compound structure based on NMR data according to claim 3, wherein the expression for generating the solvent peak template from the theoretical coupling constant of the solvent combined with the Lorentzian function is: [formulas not reproduced]; wherein the first symbol represents the intensity ratio of each peak, L represents the previously calculated half-width, x_0 is the center position of the current peak, and J is the coupling constant corresponding to the solvent peak.
  5. The method of claim 4, wherein the step of matching the signal of each candidate region against the signal of the solvent peak template to obtain the solvent peak comprises: calculating the cosine similarity between the signal of each candidate region and the signal of the solvent peak template, and obtaining the solvent peak according to the similarity; the cosine similarity being calculated as: $\cos(A, B) = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2}\,\sqrt{\sum_i B_i^2}}$; wherein $A_i$ represents the value at each point of the solvent peak template, and $B_i$ represents the value at each point of the candidate region selected in each experiment; and, for a methanol solvent, selecting the top three candidate peaks whose similarity exceeds a threshold, and further verifying whether each candidate exhibits the quintet splitting characteristic and satisfies the integral intensity constraint.
  6. The method for analyzing a compound structure based on NMR data according to claim 1, wherein the step of performing structural analysis of the compound to be analyzed based on the chemical shift list comprises: calculating a difference index between the 13C peak list in the chemical shift list and the 13C peak list of each compound in a constructed spectral shift list library, finding the target compound with the minimum difference according to the difference index, and performing structural analysis based on the target compound; or performing structural analysis of the compound to be analyzed using a pre-trained structure generation model based on the chemical shift list, wherein the structure generation model is trained using a contrastive learning framework or a graph neural network; or inputting the chemical shift list into both the structure generation model and the spectral shift list library retrieval algorithm, outputting their respective results, and combining them by a weighted score based on preset weights to obtain the final structural analysis result.
  7. The method for analyzing a compound structure based on NMR data according to claim 1, wherein the difference index is calculated by the formula: [formula not reproduced]; wherein the first two symbols denote the numbers of 13C peaks in the chemical shift list and for the compound in the spectral shift list library, respectively; a further symbol denotes the number of peaks matched within the threshold; another denotes the experimentally measured chemical shift of the i-th 13C; and the last denotes the predicted chemical shift of the j-th 13C of the compound in the spectral shift list library.
  8. A compound structure analysis system based on NMR data, the system comprising: an acquisition module for receiving raw NMR data on a compound to be analyzed, acquired by an NMR spectrometer, and preprocessing the raw NMR data to convert it from the time domain to the frequency domain, suppress noise interference, and output a peak position list; a construction module for constructing a solvent peak template matched to the peak shape of the actual spectrum in the NMR data, defining candidate regions in the peak position list according to a configured search interval, and matching the signal of each candidate region against the signal of the solvent peak template to obtain the solvent peak; and an analysis module for removing the solvent peak, calculating the offset between the actual chemical shift and the theoretical value, applying the offset as a global chemical shift correction to obtain a chemical shift list, and performing structural analysis of the compound to be analyzed based on the chemical shift list.
  9. A readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
  10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1 to 7 when executing the program.
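
The solvent peak matching and global correction recited in claims 3 to 5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claimed template formulas are not reproduced in this text, so the template below is simply an assumed sum of equally spaced Lorentzian lines. All function names are hypothetical, `width` is taken to be the full width at half maximum, and the numeric values (a 1:2:3:2:1 quintet, a 3.31 ppm reference, a 0.00425 ppm coupling) are illustrative assumptions for residual methanol in a 1H spectrum. The quintet and integral-intensity verification of claim 5 is omitted.

```python
import numpy as np

def lorentzian(x, x0, width):
    """Unit-height Lorentzian centred at x0; width is the full width at half maximum."""
    hw = width / 2.0
    return hw**2 / ((x - x0) ** 2 + hw**2)

def multiplet_template(x, x0, j, width, ratios):
    """Assumed template: Lorentzian lines spaced by j around x0, weighted by ratios."""
    n = len(ratios)
    centers = x0 + (np.arange(n) - (n - 1) / 2.0) * j
    return sum(r * lorentzian(x, c, width) for r, c in zip(ratios, centers))

def cosine_similarity(a, b):
    """Cosine of the angle between two equally sampled signals."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_solvent_peak(ppm, spectrum, candidates, j, width, ratios, window):
    """Return (center, score) of the candidate region that best matches the template."""
    best = (None, -1.0)
    for c in candidates:
        region = np.abs(ppm - c) <= window  # candidate region around this peak
        template = multiplet_template(ppm[region], c, j, width, ratios)
        score = cosine_similarity(spectrum[region], template)
        if score > best[1]:
            best = (c, score)
    return best

# Synthetic demo (all values are illustrative assumptions, not measured data).
ppm = np.linspace(3.0, 3.6, 6001)
QUINTET = [1, 2, 3, 2, 1]
J_PPM, WIDTH, THEORY = 0.00425, 0.002, 3.31
spectrum = (multiplet_template(ppm, 3.335, J_PPM, WIDTH, QUINTET)
            + 5 * lorentzian(ppm, 3.45, WIDTH))  # solvent quintet plus one singlet

peak_list = [3.335, 3.45]  # peak positions inside the configured search interval
solvent, score = find_solvent_peak(ppm, spectrum, peak_list,
                                   J_PPM, WIDTH, QUINTET, window=0.02)
offset = THEORY - solvent  # theoretical minus observed solvent shift
# Remove the solvent peak, then apply the offset to every remaining peak.
shift_list = [p + offset for p in peak_list if p != solvent]
```

In this synthetic case the quintet region matches the template almost exactly, while the singlet region scores noticeably lower, so the offset is derived from the quintet and applied to the remaining peak.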

Description

Nuclear magnetic resonance data-based compound structure analysis method and system

Technical Field

The invention relates to the technical field of nuclear magnetic resonance (NMR) data processing, and in particular to a compound structure analysis method and system based on NMR data.

Background

Existing NMR data processing relies mainly on manual operation by experts or on semi-automated software tools. For structural analysis, the traditional approach relies on expert experience in comparing against spectral databases (such as micro-spectra and PubChem) or on rule-based systems (such as ACD/Labs); it is inefficient and strongly influenced by subjective factors. In recent years, some studies have attempted to classify spectra using machine learning, but most are limited to a single spectrum type and lack cross-modal matching capability (e.g., combining the 13C NMR spectrum with the HSQC spectrum).

The main defects of the prior art are as follows. 1) Solvent correction is insufficiently precise: existing automatic methods correct poorly for compounds in certain solvents (such as deuterated methanol), which causes errors in chemical shift extraction. For example, when the DP5 algorithm is used to process Asiatic moonseed samples, solvent peaks are mislabeled or missed. 2) Structural analysis depends heavily on manual work: an expert must compare multidimensional spectra by hand, which is time-consuming and poorly reproducible. Current compound structure analysis methods therefore suffer from poor accuracy.

Disclosure of Invention

In view of the above, the present invention aims to provide a compound structure analysis method and system based on NMR data, in order to solve the problem of poor accuracy of compound structure analysis methods in the prior art.
In one aspect, the invention provides a method for analyzing a compound structure based on NMR data, the method comprising: receiving raw NMR data on a compound to be analyzed, acquired by an NMR spectrometer, and preprocessing the raw NMR data to convert it from the time domain to the frequency domain, suppress noise interference, and output a peak position list; constructing a solvent peak template matched to the peak shape of the actual spectrum in the NMR data, defining candidate regions in the peak position list according to a configured search interval, and matching the signal of each candidate region against the signal of the solvent peak template to obtain the solvent peak; and removing the solvent peak, calculating the offset between the actual chemical shift and the theoretical value, applying the offset as a global chemical shift correction to obtain a chemical shift list, and performing structural analysis of the compound to be analyzed based on the chemical shift list.

Further, in the above method, the step of preprocessing the raw NMR data to convert it from the time domain to the frequency domain, suppress noise interference, and output a peak position list comprises: converting the raw NMR data into a frequency-domain spectrum by Fourier transformation, and then applying an automatic phase correction algorithm and a baseline correction algorithm to remove baseline drift; and performing peak identification on the frequency-domain spectrum by combining second-derivative change-point detection with a local maximum search, and outputting the peak position list.
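
The preprocessing step described above can be sketched roughly as follows, using NumPy only. This is an illustrative assumption rather than the patented procedure: the function names are hypothetical, the automatic phase and baseline corrections are omitted (the synthetic signal needs neither), and the "second-derivative" criterion is simplified to requiring negative curvature at a local maximum above a height threshold.

```python
import numpy as np

def fid_to_spectrum(fid):
    """Fourier-transform a complex time-domain signal (FID) into a real spectrum."""
    spectrum = np.fft.fftshift(np.fft.fft(fid))  # zero frequency moved to the centre
    return spectrum.real

def pick_peaks(spectrum, min_height):
    """Indices that are local maxima with negative second derivative and enough height."""
    d2 = np.gradient(np.gradient(spectrum))  # numerical second derivative
    i = np.arange(1, len(spectrum) - 1)      # interior points only
    is_local_max = (spectrum[i] > spectrum[i - 1]) & (spectrum[i] >= spectrum[i + 1])
    is_concave = d2[i] < 0
    is_tall = spectrum[i] >= min_height      # crude noise suppression
    return i[is_local_max & is_concave & is_tall]

# Synthetic FID: two exponentially decaying resonances placed on exact FFT bins.
n = 1024
t = np.arange(n)
decay = np.exp(-t / 50.0)
fid = (np.exp(2j * np.pi * 100 / n * t)
       + 0.5 * np.exp(-2j * np.pi * 200 / n * t)) * decay

peaks = pick_peaks(fid_to_spectrum(fid), min_height=10.0)
```

With real data the indices in `peaks` would still need to be converted to ppm using the spectrometer frequency and spectral width, which this sketch leaves out.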
Further, in the above method, the step of constructing a solvent peak template matched to the peak shape of the actual spectrum in the NMR data comprises: extracting actual spectrum data within a target chemical shift range from the NMR data, and calculating a global half-width as the peak shape parameter; and generating the solvent peak template from the theoretical coupling constant of the solvent combined with a Lorentzian function, or obtaining the solvent peak template as a statistical average of historically measured solvent spectra.

Further, in the above method, the expression for generating the solvent peak template from the theoretical coupling constant of the solvent combined with the Lorentzian function is: [formulas not reproduced]; wherein the first symbol represents the intensity ratio of the peak, L represents the half-width calculated i