CN-121983178-A - Machine learning-based biomass-based bifunctional catalyst screening method


Abstract

The invention provides a machine-learning-based method for screening biomass-based bifunctional catalysts. The method comprises the following steps. First, multi-source catalyst data are obtained, then cleaned, partitioned and standardized. Second, feature engineering that fuses reaction-mechanism knowledge with data-driven analysis is performed to construct a core feature set containing specific derived features. Third, a high-performance catalyst performance prediction model is trained on the core feature set within a machine learning framework that combines transfer learning, meta-learning and Bayesian optimization. Finally, the model is used to screen candidate catalysts and output the results. By combining multi-source data integration, mechanism-data fusion feature engineering and a multi-algorithm framework adapted to small samples, the method achieves efficient and accurate screening of Ni-Ca bifunctional catalysts in small-sample scenarios.

Inventors

PENG NANA
Zhen Jiazhe
WANG QIANG
GAI CHAO

Assignees

Beijing Forestry University

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-26

Claims (10)

1. A machine-learning-based method for screening biomass-based bifunctional catalysts, characterized by comprising the following steps: obtaining multi-source catalyst data comprising the component formulation, preparation parameters, structural characteristics and performance indexes of biomass-based catalysts, and cleaning, partitioning and standardizing the data to form a training data set; extracting and screening, from the preprocessed data, basic features related to catalyst performance, generating, based on the reaction mechanism of the bifunctional catalyst, derived features that strengthen the correlation between the features and the performance indexes, and constructing a core feature set for model training, wherein the basic features comprise at least feedstock characteristic features, preparation process features, structural characteristic features and performance-related features; adopting a machine learning framework that integrates transfer learning, meta-learning and Bayesian optimization, and using the core feature set to train the model and optimize its hyperparameters so as to construct a catalyst performance prediction model; and inputting feature data of candidate catalysts into the catalyst performance prediction model to obtain performance prediction results, screening out target catalysts according to the prediction results, and outputting the screening results.
2. The method of claim 1, wherein the data cleaning comprises: identifying outliers with the interquartile range (IQR) method and verifying them against domain knowledge of catalysts; and filling missing values with a k-nearest neighbors (KNN) algorithm, which matches neighboring samples to each sample with missing data based on feature similarity and computes the missing value as a weighted average of those neighbors.
3. The method of claim 2, wherein the data partitioning comprises dynamically selecting a partitioning strategy for the cleaned data according to the sample size, specifically: when the total sample size is smaller than a predetermined threshold, the training set and the test set are partitioned by leave-one-out; and when the total sample size is greater than or equal to the predetermined threshold, the training set and the test set are partitioned randomly according to a preset ratio.
4. The method of claim 1, wherein extracting and screening the basic features and derived features associated with catalyst performance from the preprocessed data and constructing the core feature set for model training comprises at least applying a combined mechanism-oriented and data-driven strategy to the preprocessed data, specifically: first, based on the catalyst reaction mechanism, retaining, from among the basic features and derived features, preset features directly related to catalytic activity and selectivity; and second, using SHAP value analysis to select, from the remaining features, those whose contribution exceeds a preset threshold, the two groups of features together forming the core feature set.
5. The method according to claim 1, further comprising generating derived features, specifically: generating a Ni⁰ site density/CaO content ratio based on the synergistic effect of Ni⁰ and CaO; generating a specific surface area-activation temperature correlation value based on the correlation between the preparation process and the structural characteristics; and generating a C/H ratio-ash content interaction term based on the correlation between feedstock characteristics and catalytic performance.
6. The method of claim 1, wherein adopting the machine learning framework that integrates transfer learning, meta-learning and Bayesian optimization to train the model and optimize its hyperparameters with the core feature set, so as to construct the catalyst performance prediction model, comprises a transfer learning training process, specifically: selecting an XGBoost regression model as the base model and pre-training it on catalyst data that cover the feature-performance association rules of various metal oxide catalysts; and fixing the core structure parameters of the pre-trained model, taking the core feature set as input, and continuing to train the model by adjusting the learning rate and the number of newly added trees, wherein the core structure parameters comprise the learning rate and the number of newly added trees.
7. The method of claim 6, wherein adopting the machine learning framework that integrates transfer learning, meta-learning and Bayesian optimization to train the model and optimize its hyperparameters with the core feature set, so as to construct the catalyst performance prediction model, comprises a meta-learning training process, specifically: constructing a catalyst task generator that generates a plurality of simulated small-sample catalyst tasks, each task comprising a support set used for rapid fine-tuning of the model and a query set used for evaluation; and training by alternating an inner loop and an outer loop to update the meta-parameters, wherein the inner loop rapidly fine-tunes the model parameters on the support set and computes the loss, and the outer loop computes the meta-loss on the query set and updates the meta-parameters.
8. The method of claim 7, wherein the fine-tuning comprises dividing real small-sample catalyst data into a support set and a query set, and adapting the model to the real task through inner-loop fine-tuning that starts from the meta-parameters obtained by meta-training.
9. The method of claim 6, wherein adopting the machine learning framework that integrates transfer learning, meta-learning and Bayesian optimization to train the model and optimize its hyperparameters with the core feature set, so as to construct the catalyst performance prediction model, further comprises using a Bayesian optimization algorithm, with model prediction accuracy as the optimization objective, to automatically optimize the hyperparameters of the transfer learning training process and/or the meta-learning training process.
10. The method according to claim 9, wherein using the Bayesian optimization algorithm, with model prediction accuracy as the optimization objective, to automatically optimize the hyperparameters of the transfer learning training process and/or the meta-learning training process specifically comprises: taking the coefficient of determination R² of the prediction model as the core optimization objective and the forward-prediction root-mean-square error (RMSE) as a constraint; constructing a Gaussian-process-based probabilistic model of hyperparameters versus performance; adopting the expected improvement criterion as the sampling strategy and searching iteratively within a preset hyperparameter space to obtain the optimal hyperparameter combination; and automatically configuring the obtained optimal hyperparameter combination into the corresponding training process.
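The data cleaning in claim 2 can be illustrated with a minimal pure-Python sketch. It assumes Tukey's fences with the conventional multiplier k = 1.5 for the interquartile range test, inverse-distance weights for the KNN weighted average, and features already on comparable scales (for example, standardized), because the distance is unweighted. The function names and defaults are illustrative and do not come from the patent. The check of flagged outliers against catalyst domain knowledge remains a manual step.

```python
import math
import statistics


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Needs at least two values. Flagged values are only candidates; the claim
    also requires checking them against catalyst domain knowledge.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def knn_impute(rows, k=3, eps=1e-9):
    """Fill None entries with an inverse-distance-weighted mean of the k nearest rows.

    Distances are the RMS difference over features observed in both rows,
    so features should already be on comparable scales. Neighbours are
    always taken from the original (unfilled) rows.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            neighbours = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                neighbours.append((dist, other[j]))
            neighbours.sort(key=lambda t: t[0])
            nearest = neighbours[:k]
            if not nearest:
                continue  # no usable neighbour: leave the gap for manual review
            weights = [1.0 / (d + eps) for d, _ in nearest]
            filled[i][j] = sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
    return filled
```

For example, `iqr_outlier_flags([1.0, 1.1, 0.9, 1.05, 0.95, 5.0])` flags only the last value. With the default `statistics.quantiles` method ("exclusive"), small samples give fairly wide fences, which makes the test conservative.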
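The sample-size-dependent partitioning in claim 3 can be sketched as follows. The threshold of 30 samples is an assumption chosen to match the "fewer than 30" sample sizes mentioned in the Background. The 20% test ratio and the fixed seed are illustrative defaults, since the patent specifies neither value.

```python
import random


def split_indices(n_samples, threshold=30, test_ratio=0.2, seed=0):
    """Return a list of (train_indices, test_indices) pairs.

    n_samples < threshold  -> leave-one-out: one split per held-out sample.
    n_samples >= threshold -> a single random split with the preset test ratio.
    """
    indices = list(range(n_samples))
    if n_samples < threshold:
        return [([i for i in indices if i != held_out], [held_out]) for held_out in indices]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(indices)
    n_test = max(1, round(n_samples * test_ratio))
    return [(sorted(indices[n_test:]), sorted(indices[:n_test]))]
```

Leave-one-out requires one model fit per sample. That cost is affordable at fewer than 30 samples, and every scarce sample is used for testing exactly once.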
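The two-stage feature selection in claim 4 can be sketched as below. The sketch assumes that the mean absolute SHAP value of every feature has already been computed from a fitted model, for example with the `shap` package. It further assumes that a feature's "contribution" means its share of the total mean |SHAP|. That interpretation, the 5% threshold and the feature names are all hypothetical.

```python
def select_core_features(all_features, mechanism_features, mean_abs_shap, min_share=0.05):
    """Two-stage core-feature selection: mechanism first, then SHAP contribution.

    all_features:       ordered feature names (basic + derived)
    mechanism_features: names retained unconditionally because the reaction
                        mechanism links them to activity/selectivity
    mean_abs_shap:      feature name -> mean |SHAP value| from a fitted model
    min_share:          minimum share of total mean |SHAP| for the other features
    """
    # Stage 1: retain the mechanism-preset features.
    core = [f for f in all_features if f in mechanism_features]
    # Stage 2: from the remaining features, keep those above the contribution threshold.
    total = sum(mean_abs_shap.get(f, 0.0) for f in all_features) or 1.0
    for f in all_features:
        if f not in mechanism_features and mean_abs_shap.get(f, 0.0) / total > min_share:
            core.append(f)
    return core
```

Mechanism-preset features are kept even when their SHAP share is small, so prior chemical knowledge can override a noisy small-sample importance estimate.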
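Claim 5 names three derived features but gives no formulas for them. The sketch below assumes the simplest readings: a plain ratio for the Ni⁰ site density/CaO content term, and products for the "correlation value" and the "interaction term". The dictionary keys are hypothetical, and units are assumed to be consistent across samples. The outputs would normally be standardized together with the other features.

```python
def derive_features(sample, eps=1e-9):
    """Compute three mechanism-inspired derived features for one catalyst sample.

    `sample` is a dict of (hypothetically named) basic features.
    """
    return {
        # Ni0-CaO synergy: Ni0 site density relative to CaO content (eps avoids /0).
        "ni0_density_per_cao": sample["ni0_site_density"] / (sample["cao_content"] + eps),
        # Preparation-structure coupling, assumed here to be a simple product.
        "ssa_x_activation_temp": sample["specific_surface_area"] * sample["activation_temperature"],
        # Feedstock-performance coupling: C/H ratio times ash content.
        "ch_ratio_x_ash": sample["c_h_ratio"] * sample["ash_content"],
    }
```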
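Claim 6 uses XGBoost. To keep the example dependency-free, the sketch below substitutes a tiny pure-Python gradient-boosted ensemble of regression stumps; it is not XGBoost itself. It shows only the transfer pattern. Trees pre-trained on a larger source data set stay frozen, and boosting then continues on the small target set with a new learning rate and a chosen number of added trees. With the real library, a comparable continuation is usually done by passing the pre-trained booster to `xgboost.train` through its `xgb_model` argument. The single feature and the synthetic source/target data in the test are assumptions.

```python
import math


def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature: (threshold, left_value, right_value)."""
    pairs = sorted(zip(xs, residuals))
    mean = sum(residuals) / len(residuals)
    best_sse = sum((r - mean) ** 2 for r in residuals)
    best = (pairs[0][0], mean, mean)  # fallback: constant prediction everywhere
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical feature values
        left = [r for _, r in pairs[:k]]
        right = [r for _, r in pairs[k:]]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if sse < best_sse:
            best_sse = sse
            best = ((pairs[k - 1][0] + pairs[k][0]) / 2.0, lv, rv)
    return best


class StumpBooster:
    """Additive model: prediction = sum of lr * stump(x) over all fitted trees."""

    def __init__(self):
        self.trees = []  # each entry: (learning_rate, threshold, left_value, right_value)

    def predict(self, x):
        return sum(lr * (lv if x <= t else rv) for lr, t, lv, rv in self.trees)

    def boost(self, xs, ys, n_new_trees, learning_rate):
        """Append n_new_trees stumps fitted to current residuals; earlier trees stay frozen."""
        for _ in range(n_new_trees):
            residuals = [y - self.predict(x) for x, y in zip(xs, ys)]
            self.trees.append((learning_rate, *fit_stump(xs, residuals)))
        return self


def rmse(model, xs, ys):
    """Root-mean-square error of the model on (xs, ys)."""
    return math.sqrt(sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))
```

Each added stump is a least-squares fit to the current residuals. As a result, the training error on the target set cannot increase as trees are appended, and the smaller target learning rate limits how far the few target samples can pull the pre-trained model.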
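The inner/outer loop of claims 7 and 8 can be illustrated with first-order MAML. This is one common meta-learning algorithm; the patent does not name the specific algorithm it uses. To stay self-contained, the sketch meta-learns the initial weights of a one-feature linear model y = w*x + c instead of an XGBoost model. The task generator, which draws random linear feature-performance relations, is a hypothetical stand-in for the claimed catalyst task generator. The learning rates, step counts and task sizes are illustrative.

```python
import random


def mse_and_grad(params, xs, ys):
    """Mean-squared error of y_hat = w*x + c and its gradient with respect to (w, c)."""
    w, c = params
    residuals = [w * x + c - y for x, y in zip(xs, ys)]
    n = len(xs)
    loss = sum(r * r for r in residuals) / n
    grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
    grad_c = 2.0 * sum(residuals) / n
    return loss, (grad_w, grad_c)


def make_task(rng, n_support=5, n_query=10):
    """Hypothetical task generator: one random linear feature-performance relation."""
    slope, intercept = rng.uniform(1.5, 2.5), rng.uniform(-0.5, 0.5)

    def sample(n):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return xs, [slope * x + intercept for x in xs]

    return sample(n_support), sample(n_query)


def adapt(params, support, inner_lr=0.1, inner_steps=5):
    """Inner loop: a few gradient steps on the support set, starting from params."""
    w, c = params
    for _ in range(inner_steps):
        _, (gw, gc) = mse_and_grad((w, c), *support)
        w, c = w - inner_lr * gw, c - inner_lr * gc
    return w, c


def meta_train(n_iters=500, meta_lr=0.05, seed=0):
    """Outer loop (first-order MAML): update meta-parameters with query-set gradients."""
    rng = random.Random(seed)
    w, c = 0.0, 0.0  # meta-parameters = shared initialization for all tasks
    for _ in range(n_iters):
        support, query = make_task(rng)
        adapted = adapt((w, c), support)
        _, (gw, gc) = mse_and_grad(adapted, *query)  # first-order approximation of meta-gradient
        w, c = w - meta_lr * gw, c - meta_lr * gc
    return w, c
```

For the fine-tuning of claim 8, real small-sample catalyst data would be split into a support set and a query set. Calling `adapt(meta_params, support)` then yields the task-specific model, whose error on the query set is reported.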
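The hyperparameter search of claims 9 and 10 is sketched below as a small, dependency-free Bayesian optimization loop over a single hyperparameter. A Gaussian process with a squared-exponential kernel serves as the hyperparameter-performance model, expected improvement chooses the next point, and R² is maximized. The claim does not specify how the RMSE constraint is enforced, so the sketch assumes a fixed penalty that is subtracted from R² whenever RMSE exceeds `max_rmse`. The kernel length scale, the candidate grid and the toy `evaluate` function (standing in for training and cross-validating the model) are all hypothetical.

```python
import math


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel on a scalar (normalized) hyperparameter."""
    return math.exp(-((a - b) ** 2) / (2.0 * length_scale ** 2))


def cholesky(K):
    """Return lower-triangular L with L L^T = K for a symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward_sub(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i, row in enumerate(L):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def gp_fit(xs, ys, jitter=1e-5):
    """Fit a zero-mean GP to centred targets; returns what gp_predict needs."""
    mean_y = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    return list(xs), L, forward_sub(L, [y - mean_y for y in ys]), mean_y


def gp_predict(model, x):
    """Posterior mean and standard deviation at x."""
    xs, L, z_y, mean_y = model
    v = forward_sub(L, [rbf(a, x) for a in xs])
    mu = mean_y + sum(vi * zi for vi, zi in zip(v, z_y))
    var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # prior variance rbf(x, x) = 1
    return mu, math.sqrt(var)


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the incumbent `best` (maximization)."""
    improvement = mu - best - xi
    if sigma < 1e-12:
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + sigma * pdf


def bayes_opt(evaluate, candidates, max_rmse, n_init=3, n_iter=10, penalty=1.0):
    """Maximize R^2 over candidate values; evaluate(x) must return (r2, rmse)."""
    def score(x):
        r2, rmse = evaluate(x)
        return r2 if rmse <= max_rmse else r2 - penalty  # assumed constraint handling

    step = (len(candidates) - 1) / max(n_init - 1, 1)
    xs = [candidates[round(i * step)] for i in range(n_init)]  # evenly spaced start
    ys = [score(x) for x in xs]
    for _ in range(n_iter):
        pool = [c for c in candidates if c not in xs]
        if not pool:
            break
        model, incumbent = gp_fit(xs, ys), max(ys)
        x_next = max(pool, key=lambda c: expected_improvement(*gp_predict(model, c), incumbent))
        xs.append(x_next)
        ys.append(score(x_next))
    best = max(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best]
```

To search several hyperparameters jointly, the scalar kernel would be replaced by one over vectors. The discrete candidate grid can also be replaced by a continuous optimizer of the acquisition function.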

Description

Technical Field

The invention relates to the interdisciplinary field of catalyst screening and machine learning, and in particular provides a machine-learning-based method for screening biomass-based bifunctional catalysts.

Background

Biomass-based bifunctional catalysts (e.g., Ni-Ca catalysts) have broad application prospects in biomass resource utilization owing to their excellent catalytic activity and stability. Traditional catalyst screening relies mainly on experimental trial and error. Large numbers of orthogonal experiments are needed to optimize the catalyst formulation and preparation parameters, which makes the process slow, costly and inefficient. With the development of machine learning, data-driven catalyst screening methods have emerged. These methods build prediction models that estimate catalyst performance rapidly, thereby reducing the number of experiments.
However, existing data-driven catalyst screening techniques still have the following shortcomings:
(1) Poor adaptability to small samples. Preparing and testing biomass-based Ni-Ca catalysts is difficult and costly, so the available sample size is usually small (fewer than 30). Traditional machine learning models tend to overfit such small data sets and generalize poorly, so their prediction accuracy cannot be guaranteed.
(2) Reliance on a single algorithm. The prior art uses a single machine learning algorithm, such as a neural network or a random forest, and does not fully combine the catalyst reaction mechanism with the characteristics of small-sample learning. This makes it difficult to balance the model's fitting and generalization capabilities.
(3) Poor interactive experience. The interfaces of existing screening systems are mostly designed for industrial production scenarios and are not optimized for researchers' working habits. They lack visual process monitoring, parameter configuration and result visualization functions, and they are difficult to operate.
(4) Weak data integration. Multi-source data, such as laboratory data and literature data, are difficult to integrate effectively, so data utilization is low, which further limits model performance.
Therefore, the catalyst research and development field urgently needs a screening method for biomass-based Ni-Ca bifunctional catalysts that is adapted to small-sample scenarios, integrates the advantages of multiple algorithms, offers a user-friendly interactive experience and supports multi-source data integration.
Disclosure of Invention

The invention is proposed to overcome the above drawbacks, that is, to solve or at least partially solve the prior-art problems of poor small-sample adaptability, reliance on a single algorithm, poor interactive experience and weak data integration. The invention provides a machine-learning-based method for screening biomass-based bifunctional catalysts, comprising the following steps: obtaining multi-source catalyst data comprising the component formulation, preparation parameters, structural characteristics and performance indexes of biomass-based catalysts, and cleaning, partitioning and standardizing the data to form a training data set; extracting and screening, from the preprocessed data, basic features related to catalyst performance, generating, based on the reaction mechanism of the bifunctional catalyst, derived features that strengthen the correlation between the features and the performance indexes, and constructing a core feature set for model training, wherein the basic features comprise at least feedstock characteristic features, preparation process features, structural characteristic features and performance-related features; adopting a machine learning framework that integrates transfer learning, meta-learning and Bayesian optimization, and using the core feature set to train the model and optimize its hyperparameters so as to construct a catalyst performance prediction model; and inputting feature data of candidate catalysts into the catalyst performance prediction model to obtain performance prediction results, screening out target catalysts according to the prediction results, and outputting the screening results.
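The final screening step (predict, then select) can be sketched as below. The sketch assumes the trained model is available as a plain `predict` callable that returns a single performance score. It also assumes that screening means keeping candidates whose predicted score meets a minimum and returning the best `top_k` of them. The threshold, `top_k` and the candidate and feature names are hypothetical, since the patent does not specify the screening rule.

```python
def screen_candidates(candidates, predict, min_score, top_k=3):
    """Rank candidate catalysts by predicted performance.

    candidates: mapping of candidate name -> feature dict
    predict:    callable returning a predicted performance score for a feature dict
    Returns up to top_k (name, score) pairs with score >= min_score, best first.
    """
    scored = [(name, predict(features)) for name, features in candidates.items()]
    kept = [(name, score) for name, score in scored if score >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]
```

Returning the scores alongside the names allows the screening result to be exported or visualized without calling the model again.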
Preferably, the data cleaning comprises identifying outliers with the interquartile range method and verifying them against domain knowledge of catalysts, and filling missing values with a k-nearest neighbors algorithm. This algorithm matches neighboring samples to each sample with missing data based on feature similarity and computes the missing value as a weighted average. Preferably, the data partitioning comprises t