
CN-122024902-A - Chemical evolution prediction method based on spectroscopy inversion neural network algorithm

Abstract

The invention discloses a chemical evolution prediction method based on a spectroscopy inversion neural network algorithm, in the technical field of inversion prediction. The method first constructs a chemical and material database that provides data support for subsequent prediction. It then generates molecular descriptors that fuse spectral features with microstructure and physical property information, overcoming the limited information content and poor representativeness of traditional molecular descriptors. Taking these descriptors as input, it constructs a spectrum structure-activity relationship prediction model that combines shared feature extraction with multi-branch dedicated prediction, so that the spectrum-structure, structure-effect and spectrum-effect associations are learned simultaneously and accurately. Finally, after noise reduction and standardization preprocessing of the target spectral data, model inversion directly outputs chemical structure change parameters, molecular interaction rules and physical property evolution trend data, avoiding the fragmented, multi-step workflow and repeated experimental verification of traditional methods.

Inventors

  • CHEN XIANG
  • LI CHENG
  • ZHOU JUNLEI

Assignees

  • 北京机数小来智能科技有限公司 (Beijing Jishu Xiaolai Intelligent Technology Co., Ltd.)

Dates

Publication Date
2026-05-12
Application Date
2026-02-02

Claims (10)

  1. A chemical evolution prediction method based on a spectroscopy inversion neural network algorithm, characterized by comprising the following steps: constructing a chemical and material database; generating molecular descriptors of the microstructure and physical properties of chemicals based on the quantum chemical Raman spectrum theoretical data and the experimental spectroscopy data in the chemical and material database; taking the molecular descriptors as input features and, in combination with the chemical and material database, constructing a spectrum structure-activity relationship prediction model through neural network training; acquiring target spectroscopy measurement data of a chemical to be predicted, performing noise reduction and standardization preprocessing on the target spectroscopy measurement data, and inputting the preprocessed data into the spectrum structure-activity relationship prediction model; and performing inversion analysis on the preprocessed target spectroscopy measurement data through the spectrum structure-activity relationship prediction model, and outputting chemical structure change parameters, molecular interaction rules and physical property evolution trend data of the chemical to be predicted.
  2. The chemical evolution prediction method based on a spectroscopy inversion neural network algorithm according to claim 1, wherein constructing the chemical and material database comprises: determining a data acquisition range and acquiring multi-source data; applying a machine intelligent reading algorithm based on natural language processing to recognize keywords in the open literature data among the multi-source data, and building unstructured intermediate data containing entity association information; defining a structured data field system based on a chemical domain ontology library, and mapping the unstructured intermediate data onto this field system to generate standardized structured literature data; converting the quantum chemical Raman spectrum theoretical data among the multi-source data with a unified data format protocol to form a standard theoretical spectrogram data set; removing outliers from the experimental measurement data among the multi-source data with the 3σ rule, filling missing data by linear interpolation, unifying units according to internationally accepted unit standards, assigning a unique substance identification code to each standardized experimental record, and mapping each experimental record to its corresponding substance to obtain a standardized experimental measurement data set; constructing a chemical knowledge graph containing element conservation rules, reaction condition rationality rules, spectral feature-substance structure correspondence rules and physical property association constraint rules; establishing, based on the unique substance identification codes and substance association codes, spectrum-structure-effect association relationships among the structured literature data, the standard theoretical spectrogram data set and the standardized experimental measurement data set; and storing the structured literature data, the standard theoretical spectrogram data set, the standardized experimental measurement data set and the spectrum-structure-effect association relationships among them in the chemical knowledge graph to form the chemical and material database.
  3. The chemical evolution prediction method based on a spectroscopy inversion neural network algorithm according to claim 2, wherein the element conservation rule, the reaction condition rationality rule, the spectral feature-substance structure correspondence rule and the physical property association constraint rule are established as follows: based on stoichiometric principles and using the unique substance identification code as the association key, defining quantitative criteria for conservation of the number of atoms of each element between reactants and products, defining an element type identification threshold and an allowable atom-count error range, establishing one-to-one check logic between the reactant element composition set and the product element composition set, and outputting a structured element conservation rule; integrating the condition parameters in the structured literature data and the standardized experimental measurement data, dividing the valid intervals of the condition parameters for different reaction types based on chemical thermodynamics and kinetics, defining the logical constraints among the parameters, and outputting a structured reaction condition rationality rule; based on the characteristic spectrum data in the standard theoretical spectrogram data set and the standardized experimental measurement data, associating the structural parameters of the corresponding substances, establishing a mapping model between spectral feature parameters and substance structural parameters, defining the characteristic peak intervals and intensity thresholds corresponding to different structural units, and outputting a structured spectral feature-substance structure correspondence rule; and, based on the physical property test data in the standardized experimental measurement data and the physical property gradient patterns of similar substances, defining the association constraints among physical property parameters and the constraints between physical properties and structural parameters, and outputting a structured physical property association constraint rule.
  4. The chemical evolution prediction method based on a spectroscopy inversion neural network algorithm according to claim 1, wherein generating the molecular descriptors of chemical microstructure and physical properties based on the quantum chemical Raman spectrum theoretical data and the experimental spectroscopy data in the chemical and material database comprises: screening a target substance set from the chemical and material database using the unique substance identification code as the retrieval key; based on the spectrum-structure-effect association relationships in the chemical and material database, linking the microstructure parameters and physical property data of each target substance through its unique identification code to form four-dimensional association data groups of quantum chemical Raman spectrum theoretical data, experimental spectroscopy data, microstructure parameters and physical property data; standardizing the quantum chemical Raman spectrum theoretical data and the experimental spectroscopy data, checking their consistency against the spectral feature-substance structure correspondence rule in the chemical knowledge graph, calculating spectrogram similarity, and retaining the four-dimensional association data groups whose similarity exceeds a set threshold to form a valid data set; extracting local and global spectroscopy features from the standardized spectroscopy data in the valid data set and fusing them into initial spectroscopy feature vectors of uniform dimension; encoding the microstructure parameters into microstructure quantization vectors, and classifying and quantizing the physical property data into physical property quantization vectors; and constructing a spectroscopy pre-training model, training it on the initial spectroscopy feature vectors, the microstructure quantization vectors and the physical property quantization vectors, and outputting the molecular descriptors.
  5. The chemical evolution prediction method based on a spectroscopy inversion neural network algorithm according to claim 4, wherein constructing the spectroscopy pre-training model, training it on the initial spectroscopy feature vectors, the microstructure quantization vectors and the physical property quantization vectors, and outputting the molecular descriptors comprises: constructing a feature encoding branch and a supervision constraint branch of the spectroscopy pre-training model, wherein the feature encoding branch takes the initial spectroscopy feature vector as input, performs feature mapping and dimension compression through a multi-layer perceptron, and outputs an intermediate feature vector; taking the initial spectroscopy feature vectors as input and the corresponding microstructure quantization vectors and physical property quantization vectors as supervision labels, training the spectroscopy pre-training model iteratively, and stopping when the joint loss converges to a preset threshold and the validation accuracy exceeds a set accuracy threshold, to obtain a converged spectroscopy pre-training model; extracting the output of the intermediate feature vector layer of the feature encoding branch of the converged model as initial molecular descriptors; based on the spectrum-structure-effect association relationships in the chemical and material database, calculating the association confidence between each initial molecular descriptor and the microstructure parameters and physical property data of the corresponding substance, and retaining the feature dimensions whose association confidence exceeds a set confidence threshold; and applying principal component analysis to the retained feature dimensions to remove feature redundancy, forming standardized vectors, namely the molecular descriptors.
  6. The chemical evolution prediction method based on a spectroscopy inversion neural network algorithm according to claim 1, wherein constructing the spectrum structure-activity relationship prediction model through neural network training, with the molecular descriptors as input features and in combination with the chemical and material database, comprises: acquiring initial fused feature vectors and preprocessing them into a standardized input feature matrix; constructing a spectrum structure-activity relationship prediction model to be trained that comprises a shared feature layer and a branch prediction layer, wherein the shared feature layer takes the standardized input feature matrix as input, uses a ReLU activation function combined with an attention mechanism, and outputs common feature vectors; the branch prediction layer comprises a spectrum-structure association prediction branch, a structure-effect association prediction branch and a spectrum-effect association prediction branch, each taking the common feature vectors as input, wherein the spectrum-structure branch outputs predicted microstructure parameters, the structure-effect branch outputs predicted physical property data, and the spectrum-effect branch outputs values of continuous physical property parameters and ordinal labels of graded physical property parameters; and the losses of the three branch tasks are combined with a chemical rule constraint penalty term into a joint loss function; using the adaptive moment estimation (Adam) algorithm as the optimizer, with an initial learning rate of 1e-4, a learning rate decay strategy, a batch size of 32 and a maximum of 1000 training epochs; optimizing the hyperparameters by grid search, taking the overall performance index on the validation set as the optimization target, and selecting the best hyperparameter combination; and inputting the test set into the optimized model, calculating the performance indices of the three association prediction tasks separately, and completing training if the indices meet the standard, otherwise re-optimizing the model.
  7. The chemical evolution prediction method based on a spectroscopy inversion neural network algorithm according to claim 6, wherein acquiring the initial fused feature vectors and preprocessing them into the standardized input feature matrix comprises: selecting a target training substance set from the chemical and material database using the unique substance identification code as the retrieval key; based on the spectrum-structure-effect association relationships in the chemical and material database, aligning the data through the unique substance identification codes to obtain training data pairs of molecular descriptors and corresponding labels; invoking the rule set of the chemical knowledge graph to check the rationality of the training data pairs, eliminating abnormal pairs that violate the rules, and dividing the data into a training set, a validation set and a test set in set proportions; extracting the molecular descriptors in the training set and fusing them with the complementary features of the corresponding substances in the chemical and material database to form the initial fused feature vectors; and applying Z-score standardization to eliminate differences in units and scale, then retaining effective features by mutual information to form the standardized input feature matrix.
  8. The chemical evolution prediction method based on a spectroscopy inversion neural network algorithm according to claim 1, wherein acquiring the target spectroscopy measurement data of the chemical to be predicted, performing noise reduction and standardization preprocessing on it, and inputting it into the spectrum structure-activity relationship prediction model comprises: applying an adaptive wavelet noise reduction algorithm to the raw target spectroscopy measurement data to remove instrument noise and environmental interference while retaining characteristic peak information, and applying the 3σ criterion to identify and reject abnormal peak positions and abnormal intensities in the raw data, retaining the valid features that conform to the distribution of spectroscopy data; converting the preprocessed valid features into a standard numerical matrix according to the unified data format protocol for experimental spectroscopy data in the chemical and material database, to obtain standardized target spectroscopy data; and invoking the spectral feature-substance structure correspondence rule in the chemical knowledge graph to perform a basic validity check on the standardized target spectroscopy data, removing invalid data, and inputting the result into the spectrum structure-activity relationship prediction model.
  9. The chemical evolution prediction method based on a spectroscopy inversion neural network algorithm according to claim 1, wherein performing inversion analysis on the preprocessed target spectroscopy measurement data through the spectrum structure-activity relationship prediction model and outputting the chemical structure change parameters, molecular interaction rules and physical property evolution trend data of the chemical to be predicted comprises: obtaining a standardized input feature vector from the preprocessed target spectroscopy measurement data, and processing it with the spectrum structure-activity relationship prediction model to obtain a predicted microstructure parameter value, a predicted physical property value and an auxiliary predicted physical property value of the chemical to be predicted; performing multi-level cross-checking on these three predicted values to obtain a checked and corrected predicted microstructure parameter value and a unified predicted physical property value; extracting basic indices from the checked and corrected predicted microstructure parameter value in combination with known basic information of the chemical to be predicted; matching the structural evolution patterns of similar chemicals through the unique substance identification code, establishing a correspondence model between spectral feature changes and microstructure changes in combination with the feature changes of the standardized target spectroscopy data, and deducing, from the extracted basic indices, the chemical structure change parameters of the chemical to be predicted during the target reaction; analyzing, based on the checked and corrected predicted microstructure parameter value and the molecular interaction energy calculation logic in the quantum chemical Raman spectrum theoretical data, the intramolecular chemical bond strengths and intermolecular force types of the chemical to be predicted; relating these to the variation trend of the checked and corrected unified predicted physical property value, deducing quantitative rules linking molecular interaction strength to physical property parameters in combination with the output chemical structure change parameters, and outputting molecular interaction rule data; constructing an initial evolution model of the physical property parameters with the checked and corrected unified predicted physical property value as reference and in combination with known reaction condition parameters, and correcting the initial evolution model by integrating the chemical structure change parameters and the molecular interaction rule data; and predicting, through the corrected evolution model, the evolution direction, evolution rate and final stable value of the physical properties of the chemical to be predicted in different reaction stages, and outputting complete physical property evolution trend data based on the historical verification errors of the auxiliary predicted physical property value and the unified predicted physical property value.
  10. The chemical evolution prediction method based on a spectroscopy inversion neural network algorithm according to claim 9, wherein obtaining the checked and corrected predicted microstructure parameter value and unified predicted physical property value comprises: verifying the rationality of the atomic composition of the initial predicted microstructure parameter value with the element conservation rule, verifying the match between the initial microstructure parameters and the standardized target spectroscopy data with the spectral feature-substance structure correspondence rule, and verifying the suitability of the two physical property predicted values for the known reaction system with the reaction condition rationality rule; calculating the relative error between the initial predicted physical property value and the auxiliary predicted physical property value, and checking the correlation between the initial predicted microstructure parameter value and the two physical property predicted values; if the rule compliance rate of a branch's prediction is below 90%, correcting that branch's prediction using the association reasoning capability of the chemical knowledge graph and the spectrum-structure-effect association data of similar substances in the database; if the relative error between the initial predicted physical property value and the auxiliary predicted physical property value is greater than or equal to 10%, taking the initial predicted physical property value as the basis, fusing the trend information of the auxiliary predicted value with the spectrum-effect association features in the molecular descriptors from spectroscopy pre-training, and correcting by a weighted fusion algorithm to obtain the unified predicted physical property value; if the corrected microstructure parameters and the unified predicted physical property value still violate the physical property association constraint rule, backtracking and adjusting the initial predicted microstructure parameter value until all data satisfy the rules; and outputting the checked and corrected predicted microstructure parameter value and unified predicted physical property value.
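Claim 2 cleans the experimental measurement data with the 3σ rule and fills gaps by linear interpolation. The following is a minimal NumPy sketch of those two steps, assuming each measurement series is a 1-D array in acquisition order with NaN marking missing values; the function name `clean_measurements` and the use of the population standard deviation are illustrative choices, not details from the patent.

```python
import numpy as np


def clean_measurements(values, n_sigma=3.0):
    """Reject 3-sigma outliers, then fill all gaps by linear interpolation.

    values: 1-D sequence of measurements in acquisition order; NaN marks a
    missing entry. Returns the cleaned array and a boolean outlier mask.
    """
    x = np.asarray(values, dtype=float).copy()
    valid = ~np.isnan(x)
    mean, std = x[valid].mean(), x[valid].std()  # population std (ddof=0)
    # 3-sigma rule: flag points more than n_sigma standard deviations from the mean
    outlier = valid & (np.abs(x - mean) > n_sigma * std)
    x[outlier] = np.nan
    # Linear interpolation over the sample index for missing and rejected points
    idx = np.arange(x.size)
    gap = np.isnan(x)
    x[gap] = np.interp(idx[gap], idx[~gap], x[~gap])
    return x, outlier
```

Because the outlier itself contributes to the mean and standard deviation, the 3σ rule in this form only catches points that are extreme relative to the whole series; a robust variant would estimate the spread with the median absolute deviation instead.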
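The element conservation rule in claim 3 compares the number of atoms of each element in reactants and products, within an allowed error range, after checking that both sides contain the same element types. A minimal sketch, assuming compositions are already given as element-count dictionaries (formula parsing and the element type identification threshold are omitted); `check_element_conservation` and `max_atom_error` are hypothetical names.

```python
from collections import Counter


def element_totals(species):
    """Total atoms per element over (stoichiometric coefficient, composition) pairs."""
    totals = Counter()
    for coefficient, composition in species:
        for element, count in composition.items():
            totals[element] += coefficient * count
    return totals


def check_element_conservation(reactants, products, max_atom_error=0):
    """Return (passed, differences), differences being product minus reactant atoms.

    The check first requires the two element sets to match one-to-one, then
    requires every per-element difference to lie within max_atom_error.
    """
    r, p = element_totals(reactants), element_totals(products)
    elements = set(r) | set(p)
    diffs = {e: p[e] - r[e] for e in elements}  # Counter returns 0 for absent keys
    same_elements = set(r) == set(p)
    within_error = all(abs(d) <= max_atom_error for d in diffs.values())
    return same_elements and within_error, diffs
```

For example, `2 H2 + O2 -> 2 H2O` is written as `[(2, {"H": 2}), (1, {"O": 2})]` against `[(2, {"H": 2, "O": 1})]` and passes with all differences zero.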
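Claim 4 keeps only association groups whose standardized theoretical and experimental spectra are sufficiently similar, and fuses local and global spectral features into a vector of uniform dimension. The patent names neither the similarity measure nor the feature definitions. This sketch assumes both spectra are sampled on a shared wavenumber grid, uses the cosine similarity of z-scored spectra (equivalently, the Pearson correlation), and uses window maxima as local features and mean, standard deviation and relative peak position as global features; all function names and the 0.9 threshold are hypothetical.

```python
import numpy as np


def standardize(spectrum):
    """Z-score a spectrum so that intensity scale and baseline offset drop out."""
    s = np.asarray(spectrum, dtype=float)
    return (s - s.mean()) / s.std()


def spectral_similarity(theoretical, experimental):
    """Cosine similarity of two standardized spectra sampled on the same grid.

    For z-scored vectors this equals the Pearson correlation of the raw spectra.
    """
    a, b = standardize(theoretical), standardize(experimental)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def filter_groups(groups, threshold=0.9):
    """Keep association groups whose theory/experiment similarity exceeds threshold."""
    return [g for g in groups
            if spectral_similarity(g["theoretical"], g["experimental"]) > threshold]


def fused_features(spectrum, n_windows=4):
    """Concatenate local features (maximum of each window of the standardized
    spectrum) with global features (raw mean, raw std, relative peak position)."""
    raw = np.asarray(spectrum, dtype=float)
    s = standardize(raw)
    local = [w.max() for w in np.array_split(s, n_windows)]
    global_ = [raw.mean(), raw.std(), np.argmax(raw) / (raw.size - 1)]
    return np.array(local + global_)
```

Since scaling and offsetting a spectrum leave its z-score unchanged, an experimental spectrum that differs from the theoretical one only in gain and baseline scores a similarity of 1.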
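Claim 5 takes the intermediate-layer output of the converged pre-training encoder as initial descriptors, keeps the dimensions whose association confidence exceeds a threshold, and removes redundancy with principal component analysis. The sketch below covers only that post-processing (the pre-training network is not shown) and assumes that association confidence is measured as the absolute Pearson correlation with one reference property; the function name, the 0.3 threshold and the number of components are illustrative.

```python
import numpy as np


def select_and_reduce(hidden, target, conf_threshold=0.3, n_components=2):
    """Filter encoder dimensions by association confidence, then apply PCA.

    hidden: (n_samples, n_dims) intermediate-layer outputs (initial descriptors).
    target: (n_samples,) reference property for the association check.
    Association confidence is approximated by |Pearson r| (an assumption).
    """
    H = np.asarray(hidden, dtype=float)
    y = np.asarray(target, dtype=float)
    Hc = H - H.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Hc, axis=0) * np.linalg.norm(yc)
    # Constant columns get denominator inf, hence confidence 0
    conf = np.abs(Hc.T @ yc) / np.where(denom == 0, np.inf, denom)
    keep = conf > conf_threshold
    X = Hc[:, keep]
    # PCA through the SVD of the centred matrix; rows of Vt are principal axes
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(n_components, Vt.shape[0])
    scores = X @ Vt[:k].T
    # Scale each component to unit variance to obtain standardized descriptors
    return scores / scores.std(axis=0), keep
```

The sketch assumes at least one dimension survives the confidence filter; with none retained, the SVD of an empty matrix would fail.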
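Claim 6 describes a shared ReLU layer with attention feeding three prediction branches, trained on a joint loss that adds a chemical-rule penalty. The forward pass and joint loss can be sketched in NumPy as follows. The attention form (feature-wise softmax gating), the cumulative-logit treatment of the ordinal labels, the non-negativity penalty and all loss weights are assumptions, since the claim does not define them; only the optimizer settings in `CONFIG` come from the claim, and the Adam training loop itself is left to a deep learning framework.

```python
import numpy as np

# Training settings stated in claim 6; the decay schedule is not specified.
CONFIG = {"optimizer": "adam", "initial_lr": 1e-4, "batch_size": 32, "max_epochs": 1000}


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def init_params(n_in, n_hidden, n_struct, n_prop, n_cont, n_levels, seed=0):
    """Random weights for the shared layer, attention and three branch heads."""
    rng = np.random.default_rng(seed)

    def w(a, b):
        return rng.normal(scale=1.0 / np.sqrt(a), size=(a, b))

    return {"W1": w(n_in, n_hidden), "Wa": w(n_hidden, n_hidden),
            "Ws": w(n_hidden, n_struct), "Wp": w(n_hidden, n_prop),
            "Wc": w(n_hidden, n_cont), "Wo": w(n_hidden, 1),
            "cuts": np.arange(n_levels - 1, dtype=float)}  # ordinal cut-points


def forward(params, X):
    h = relu(X @ params["W1"])                      # shared feature layer (ReLU)
    attn = softmax(h @ params["Wa"], axis=1)        # feature-wise attention weights
    common = h * attn * h.shape[1]                  # re-weighted common features
    return {"structure": common @ params["Ws"],     # spectrum-structure branch
            "property": common @ params["Wp"],      # structure-effect branch
            "continuous": common @ params["Wc"],    # spectrum-effect: continuous
            # spectrum-effect: cumulative logits for P(level > k), k = 0..n_levels-2
            "ordinal_logits": common @ params["Wo"] - params["cuts"]}


def joint_loss(out, y, weights=(1.0, 1.0, 1.0), penalty_weight=0.1):
    def mse(a, b):
        return float(np.mean((a - b) ** 2))

    k = np.arange(out["ordinal_logits"].shape[1])
    targets = (y["level"][:, None] > k).astype(float)
    p = 1.0 / (1.0 + np.exp(-out["ordinal_logits"]))
    eps = 1e-12
    bce = float(-np.mean(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps)))
    spectrum_effect = mse(out["continuous"], y["continuous"]) + bce
    # Hypothetical chemical-rule penalty: microstructure outputs must be non-negative
    penalty = float(np.mean(relu(-out["structure"])))
    w1, w2, w3 = weights
    return (w1 * mse(out["structure"], y["structure"])
            + w2 * mse(out["property"], y["property"])
            + w3 * spectrum_effect + penalty_weight * penalty)
```

Multiplying the attention weights by the hidden width keeps the re-weighted features on the same scale as `h`, because softmax weights sum to one and so average `1 / n_hidden`.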
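Claim 7 splits the checked data pairs by proportion, applies Z-score standardization and retains effective features by mutual information. A minimal sketch, assuming a 70/15/15 split, a histogram (plug-in) estimate of mutual information with 8 bins and a 0.05-nat retention threshold; none of these values is given in the claim, and the function names are hypothetical. The mean and standard deviation are fitted on the training set and returned so that the same transform can be applied to the validation and test sets.

```python
import numpy as np


def split_indices(n, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(ratios[0] * n))
    n_val = int(round(ratios[1] * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(x; y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    outer = pxy.sum(axis=1, keepdims=True) @ pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / outer[nz])))


def build_input_matrix(X_train, y_train, mi_threshold=0.05, bins=8):
    """Z-score the training features, then keep columns whose MI with y exceeds the threshold."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)   # constant columns stay at zero
    Z = (X_train - mu) / sigma
    mi = np.array([mutual_information(Z[:, j], y_train, bins) for j in range(Z.shape[1])])
    keep = mi > mi_threshold
    return Z[:, keep], keep, (mu, sigma)
```

The plug-in estimator is biased upward for small samples (roughly `(bins - 1) ** 2 / (2 * n)` nats for independent variables), so the retention threshold should be chosen with the training-set size in mind.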
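Claim 10 gates its corrections on two numeric tests: a rule compliance rate below 90%, and a relative error of 10% or more between the initial and auxiliary property predictions. A minimal sketch of those gates, assuming the relative error is taken with respect to the initial (main) prediction and that the weighted fusion is a fixed convex combination; the weight 0.3, the function names and the behaviour below the 10% gate are hypothetical, and the spectrum-effect descriptor features that the claim also fuses are omitted.

```python
def relative_error(main, auxiliary):
    """|main - auxiliary| / |main|, using the main-branch value as reference."""
    return abs(main - auxiliary) / abs(main) if main != 0 else float("inf")


def needs_rule_correction(rule_checks, min_pass_rate=0.90):
    """True when fewer than 90% of a branch's rule checks pass."""
    return sum(rule_checks) / len(rule_checks) < min_pass_rate


def unify_property(main, auxiliary, max_rel_error=0.10, aux_weight=0.3):
    """Unified property value from the main and auxiliary branch predictions.

    Below the 10% gate the main prediction is kept unchanged (an assumption:
    the claim only describes the >= 10% case). At or above the gate a weighted
    fusion anchored on the main prediction is used; aux_weight is hypothetical.
    """
    if relative_error(main, auxiliary) < max_rel_error:
        return main
    return (1.0 - aux_weight) * main + aux_weight * auxiliary
```

For instance, predictions of 10.0 and 10.5 differ by 5% and the main value is kept, whereas 10.0 and 12.0 differ by 20% and fuse to about 10.6.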

Description

Chemical evolution prediction method based on spectroscopy inversion neural network algorithm

Technical Field

The invention relates to the technical field of inversion prediction, in particular to a chemical evolution prediction method based on a spectroscopy inversion neural network algorithm.

Background

In chemical evolution prediction, the spectral data, structural parameters and physical properties involved in a chemical process are high-dimensional and strongly correlated, while experimental data are sparse and poorly reproducible. The prior art has the following shortcomings:

First, chemical data are scattered across multiple sources such as literature, theoretical calculations and experimental measurements, so data collection, format conversion and association matching must be carried out as separate manual steps, and experiments such as structural analysis and physical property testing must also be performed independently. The resulting fragmented workflow greatly increases time cost.

Second, the prior art cannot deduce chemical structure changes and evolution patterns directly from spectroscopy measurement data; structural hypotheses are verified through repeated hypothesis-experiment-correction cycles. Moreover, a single model can predict only one kind of parameter (for example, only structure or only physical properties), so several independent models must be built and the consistency of their results cross-checked by experiment, which makes experimental verification repetitive and cumbersome.
Therefore, an integrated method that combines associated storage of multi-source data, intelligent spectral inversion modeling and synchronous multi-dimensional prediction is needed to overcome the fragmented workflow and the dependence on repeated experimental verification in the prior art, and to improve the efficiency and accuracy of chemical evolution prediction.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a chemical evolution prediction method based on a spectroscopy inversion neural network algorithm, which solves the problems of a fragmented workflow and dependence on repeated experimental verification.

To achieve this purpose, the chemical evolution prediction method based on the spectroscopy inversion neural network algorithm comprises the following steps: constructing a chemical and material database; generating molecular descriptors of the microstructure and physical properties of chemicals based on the quantum chemical Raman spectrum theoretical data and the experimental spectroscopy data in the chemical and material database; taking the molecular descriptors as input features and, in combination with the chemical and material database, constructing a spectrum structure-activity relationship prediction model through neural network training; acquiring target spectroscopy measurement data of a chemical to be predicted, performing noise reduction and standardization preprocessing on the target spectroscopy measurement data, and inputting the preprocessed data into the spectrum structure-activity relationship prediction model; and performing inversion analysis on the preprocessed target spectroscopy measurement data through the spectrum structure-activity relationship prediction model, and outputting chemical structure change parameters, molecular interaction rules and physical property evolution trend data of the chemical to be predicted.
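The noise-reduction step is specified in claim 8 as an adaptive wavelet noise reduction algorithm, without naming the wavelet or the threshold rule. As one common data-driven choice, the sketch below substitutes multi-level Haar soft thresholding with the universal threshold (VisuShrink), estimating the noise level from the finest detail coefficients. The function names and the three decomposition levels are assumptions, and the 3σ rejection of abnormal peaks and the format conversion are not shown.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out


def wavelet_denoise(signal, levels=3):
    """Multi-level Haar soft-threshold denoising with a data-driven threshold.

    Noise sigma is estimated from the finest detail band (median absolute
    value / 0.6745) and the universal threshold sigma * sqrt(2 ln N) is applied
    to every detail band; the coarsest approximation is kept unchanged.
    """
    x = np.asarray(signal, dtype=float)
    block = 2 ** levels
    # Pad with edge values so the length is divisible by 2 ** levels
    padded = np.pad(x, (0, (-x.size) % block), mode="edge")
    approx, details = padded, []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    sigma = np.median(np.abs(details[0])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(padded.size))
    details = [np.sign(d) * np.maximum(np.abs(d) - thr, 0.0) for d in details]
    for d in reversed(details):            # rebuild from the coarsest level up
        approx = haar_idwt(approx, d)
    return approx[:x.size]
```

Smoother wavelets (for example Daubechies or symlet families, as provided by PyWavelets) usually preserve peak shapes better than Haar; Haar is used here only because it can be written in a few self-contained lines.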
The invention has the following beneficial effects: the method first constructs a chemical and material database that provides data support for subsequent prediction, then generates molecular descriptors that fuse spectral features with microstructure and physical property information, overcoming the limited information content and poor representativeness of traditional molecular descriptors. Taking these descriptors as input, it constructs a spectrum structure-activity relationship prediction model with shared feature extraction and multi-branch dedicated prediction, so that the spectrum-structure, structure-effect and spectrum-effect associations are learned simultaneously and accurately. Finally, after noise reduction and standardization preprocessing of the target spectral data, model inversion directly outputs chemical structure change parameters, molecular interaction rules and physical property evolution trend data, avoiding the fragmented, multi-step workflow and repeated experimental verification of traditional methods. Of course, a product implementing the invention need not achieve all of the above advantages at the same time.

Drawings

FIG. 1 is a flow chart of a chemical evolution p