CN-121983160-A - Ferulic acid synthesis yield prediction and process optimization method based on machine learning
Abstract
The invention provides a machine learning-based method for predicting ferulic acid synthesis yield and optimizing the synthesis process. The method fuses Morgan molecular fingerprints with physical process parameters (temperature, time, and the like) to construct a unified feature matrix, and uses Bayesian optimization to automatically find the optimal configuration of 10 algorithms, eliminating the subjectivity of manual tuning. A predict-verify-feedback data flow enables rapid optimization from small-sample data toward the global optimum. Interpretability analysis reveals the synergistic interactions among variables, rather than a linear superposition of single-variable effects, so that ferulic acid process optimization shifts from an experience-driven to a data-driven paradigm.
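As a rough illustration of the feature fusion described above, the following Python sketch standardizes the process parameters and appends a fingerprint bit vector to each sample. The dimensions (8 parameters, 1024 bits, 1032 columns) are those stated in claims 6 and 7. Everything else is an assumption made for illustration: the function names are hypothetical, the parameter values are invented, and the fingerprint bits are placeholders supplied by the caller (in practice they would come from a cheminformatics toolkit such as RDKit). Only the Z-score step is shown.

```python
from statistics import mean, pstdev

N_PARAMS = 8   # six raw-material amounts + temperature + time (claim 6)
N_BITS = 1024  # Morgan fingerprint length (claim 7)


def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Constant columns have zero standard deviation; divide by 1.0 instead.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def build_feature_matrix(param_rows, fingerprint_rows):
    """Concatenate standardized parameters with fingerprint bits, row by row."""
    if len(param_rows) != len(fingerprint_rows):
        raise ValueError("one fingerprint is required per sample")
    scaled = zscore_columns(param_rows)
    return [list(p) + list(fp) for p, fp in zip(scaled, fingerprint_rows)]


# Toy data: three samples with invented parameter values (hypothetical units).
params = [
    [1.0, 1.2, 0.1, 0.1, 10.0, 0.5, 80.0, 2.0],
    [1.0, 1.5, 0.2, 0.1, 10.0, 0.5, 90.0, 3.0],
    [1.0, 1.8, 0.3, 0.1, 10.0, 0.5, 100.0, 4.0],
]
# Placeholder bit vectors standing in for real Morgan fingerprints.
fingerprints = [[(i + j) % 2 for j in range(N_BITS)] for i in range(3)]

X = build_feature_matrix(params, fingerprints)
print(len(X), len(X[0]))  # 3 samples, 8 + 1024 = 1032 columns
```

Keeping the fingerprint as a caller-supplied input leaves open how the single 1024-bit vector is derived from the six raw materials, which the text does not specify.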
Inventors
- BAI GUANGHAI
- ZHOU YUHUI
- CHEN RAN
- FENG JIAN
- YUE PENGHUI
- ZHANG TAO
- LIN ZHENYANG
Assignees
- Xi'an University of Science and Technology (西安科技大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20260408
Claims (10)
- 1. A machine learning-based method for ferulic acid synthesis yield prediction and process optimization, characterized by comprising the following steps: S1, collecting ferulic acid synthesis experimental data, constructing an initial yield dataset, preprocessing the experimental data, and extracting features to obtain a unified feature matrix; S2, based on the unified feature matrix, training a plurality of machine learning algorithms on the data, performing parameter tuning through a Bayesian optimization framework, comprehensively evaluating model performance using multi-dimensional evaluation metrics, and selecting the model with the best fitting accuracy and generalization ability; S3, predicting candidate formulations in the design space with the optimal prediction model, generating high-yield schemes, verifying them experimentally, and feeding the new data back to the model for iterative retraining; S4, analyzing the contribution weight of each reaction condition to the yield and the interactions among variables using the SHAP algorithm and partial dependence plots, to obtain the optimal experimental parameter interval.
- 2. The machine learning-based ferulic acid synthesis yield prediction and process optimization method of claim 1, wherein the initial yield dataset in step S1 includes the reaction raw materials and process parameters as input features and the synthesis yield as the target variable.
- 3. The machine learning-based ferulic acid synthesis yield prediction and process optimization method of claim 2, wherein step S1 (collecting ferulic acid synthesis experimental data, constructing an initial yield dataset, preprocessing the experimental data, and extracting features to obtain a unified feature matrix) specifically comprises the following steps: S101, systematically collecting experimental sample data covering raw material ratios and process conditions, and taking the target performance index as the supervision signal; S102, eliminating the dimensional differences between physical parameters through standardization, while converting molecular structures into a numerical representation using a cheminformatics tool; S103, integrating the physical parameters and chemical features to construct a unified multimodal feature matrix.
- 4. The machine learning-based ferulic acid synthesis yield prediction and process optimization method of claim 1, wherein step S2 (training a plurality of machine learning algorithms on the data based on the unified feature matrix, performing parameter tuning through a Bayesian optimization framework, comprehensively evaluating model performance using multi-dimensional evaluation metrics, and selecting the model with the best fitting accuracy and generalization ability) specifically comprises the following steps: S201, establishing a baseline model performance evaluation framework by comparing the prediction performance of a plurality of typical regression algorithms side by side; S202, applying a Bayesian optimization strategy to automatically optimize the hyperparameters of each algorithm; S203, screening out the optimal prediction model using the coefficient of determination R², the mean absolute error (MAE), and the root mean square error (RMSE).
- 5. The machine learning-based ferulic acid synthesis yield prediction and process optimization method of claim 3, wherein step S3 (predicting candidate formulations in the design space with the optimal prediction model, generating high-yield schemes, verifying them experimentally, and feeding the new data back to the model for iterative retraining) specifically comprises: S301, screening a preset candidate process space for formulations with high-yield potential using the optimal prediction model; S302, verifying them experimentally and feeding the actual synthesis data back to the database; S303, converging quickly to a high-efficiency process interval within a limited number of experiments.
- 6. The machine learning-based ferulic acid synthesis yield prediction and process optimization method of claim 3, wherein in step S1, 80 ferulic acid synthesis experimental samples are collected, and 4 data points are added in each subsequent active learning cycle; each sample is described by 8 input features, namely 6 reaction raw materials and 2 process parameters, and the target variable is the yield; wherein the reaction raw materials are vanillin, malonic acid, piperazine, aniline, toluene, and potassium carbonate, and the process parameters are reaction temperature and reaction time.
- 7. The machine learning-based ferulic acid synthesis yield prediction and process optimization method of claim 6, wherein step S102 (eliminating the dimensional differences between physical parameters through standardization, while converting molecular structures into a numerical representation using a cheminformatics tool) comprises the following steps: Z-score standardization is applied to the 8 experimental parameters, and the experimental parameters are mapped to the [0,1] interval; the RDKit toolkit is used to convert the raw material SMILES strings into a 1024-bit Morgan molecular fingerprint, which is combined with the standardized reaction features to construct an initial feature matrix containing 1032 dimensions.
- 8. The machine learning-based ferulic acid synthesis yield prediction and process optimization method of claim 4, wherein step S2 (training a plurality of machine learning algorithms on the data based on the unified feature matrix, performing parameter tuning through a Bayesian optimization framework, comprehensively evaluating model performance using multi-dimensional evaluation metrics, and selecting the model with the best fitting accuracy and generalization ability) specifically comprises the following steps: evaluating and comparing gradient boosting decision tree, random forest, extreme gradient boosting (XGBoost), support vector regression, adaptive boosting (AdaBoost), categorical boosting (CatBoost), K-nearest neighbor, linear regression, decision tree, and ridge regression models; using the Optuna Bayesian optimization framework, with minimization of the 5-fold cross-validation mean absolute error (MAE) as the objective, to automatically search a preset parameter space for the optimal configuration over 50 search iterations; and selecting the random forest as the optimal yield prediction model.
- 9. A computer readable storage medium for storing a computer program, wherein the computer program when executed by a processor implements the steps of the machine learning based ferulic acid synthesis yield prediction and process optimization method of any of claims 1-8.
- 10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the machine learning-based ferulic acid synthesis yield prediction and process optimization method of any of claims 1-8.
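The evaluation logic recited in claims 4 and 8 (R², MAE, and RMSE as metrics, with the 5-fold cross-validation MAE as the selection objective) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the two candidate models, the synthetic data, and all function names are hypothetical, and the Optuna hyperparameter search is not reproduced.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n_samples, k=5):
    """Yield (train, test) index lists; fold f holds every k-th sample."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


def cv_mae(fit, X, y, k=5):
    """Average MAE over k folds; fit(X, y) must return a predict function."""
    scores = []
    for train, test in kfold_indices(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(X[i]) for i in test]
        scores.append(mae([y[i] for i in test], preds))
    return sum(scores) / len(scores)


def fit_mean(X, y):
    """Hypothetical baseline: always predict the training mean."""
    avg = sum(y) / len(y)
    return lambda x: avg


def fit_1nn(X, y):
    """Hypothetical 1-nearest-neighbour regressor (squared Euclidean distance)."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
        return y[dists.index(min(dists))]
    return predict


# Synthetic data (invented): yield rises linearly with a single variable.
X = [[float(i)] for i in range(20)]
y = [50.0 + 2.0 * i for i in range(20)]

candidates = {"mean baseline": fit_mean, "1-nearest neighbour": fit_1nn}
scores = {name: cv_mae(fit, X, y) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # lowest cross-validated MAE wins
print(best, scores)
```

In a real pipeline each candidate's `fit` function would wrap one of the ten listed regressors, with its hyperparameters proposed by the Bayesian optimizer, and `cv_mae` would serve as the value to minimize.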
Description
Ferulic acid synthesis yield prediction and process optimization method based on machine learning
Technical Field
The invention belongs to the interdisciplinary field of cheminformatics and machine learning, and relates to a machine learning-based method for ferulic acid synthesis yield prediction and process optimization.
Background
Ferulic acid (FA) is a phenolic compound widely present in plant cell walls. Owing to its antioxidant, anti-inflammatory, antibacterial, and anticancer activities, it has important applications in medicine, food, and cosmetics. For example, FA is a precursor for natural preservatives and vanillin synthesis and is in great demand in the food industry. FA is also increasingly favored in high-end sun-protection and whitening cosmetics because it inhibits melanin formation, resists photoaging, and promotes wound healing. Given this broad industrial demand and range of biological activities, developing an efficient route to ferulic acid is of real practical significance.
Currently, ferulic acid is obtained mainly by natural extraction, biosynthesis, and chemical synthesis. Natural extraction is limited by long cycles, low yield, and high cost. Biosynthesis is potentially green and efficient but is currently difficult to apply at industrial scale. In contrast, chemical synthesis (such as the Knoevenagel condensation, Perkin, and Wittig-Horner reactions) has become the main route for mass production thanks to its short production cycle and controllable cost.
However, conventional chemical process optimization depends heavily on time-consuming trial-and-error experiments, and the complex nonlinear interactions between process parameters are difficult to resolve. Development efficiency is therefore low, and more advanced optimization strategies are urgently needed. Although machine learning-driven methods are attracting increasing interest in interdisciplinary research, their application to optimizing the synthesis of fine chemicals such as ferulic acid has not been reported.
The synthesis of ferulic acid involves multiple variables, including the raw material ratio (vanillin, malonic acid), catalyst type and dosage (such as piperazine and aniline), solvent (toluene), reaction temperature, and reaction time. These factors are coupled in complex nonlinear ways, and traditional statistical methods or simple univariate analysis struggle to characterize interactions in such a multidimensional space. Traditional process optimization relies mainly on the experimenter's experience and on trial and error: many repeated experiments are often needed to find the optimal yield, so the research and development cycle is long, large amounts of chemical reagents are consumed, and production costs are high. To address the multivariable coupling challenges of complex reaction systems such as the Knoevenagel condensation, a machine learning-based intelligent optimization paradigm for ferulic acid synthesis is therefore needed.
Disclosure of Invention
In view of the above technical problems in the prior art, the invention provides a machine learning-based method for ferulic acid synthesis yield prediction and process optimization. According to a first aspect of an embodiment of the present invention, a machine learning-based ferulic acid synthesis yield prediction and process optimization method is provided.
Specifically, the method comprises the following steps: S1, collecting ferulic acid synthesis experimental data, constructing an initial yield dataset, preprocessing the experimental data, and extracting features to obtain a unified feature matrix; S2, based on the unified feature matrix, training a plurality of machine learning algorithms on the data, performing parameter tuning through a Bayesian optimization framework, comprehensively evaluating model performance using multi-dimensional evaluation metrics, and selecting the model with the best fitting accuracy and generalization ability; S3, predicting candidate formulations in the design space with the optimal prediction model, generating high-yield schemes, verifying them experimentally, and feeding the new data back to the model for iterative retraining; S4, analyzing the contribution weight of each reaction condition to the yield and the interactions among variables using the SHAP algorithm and partial dependence plots, to obtain the optimal experimental parameter interval. On the basis of the above scheme, the initial yield dataset in step S1 includes as input features the reaction starting materials and process parameters, with the synthes