CN-122024930-A - Multi-algorithm fused toluene catalytic combustion catalyst rapid screening method
Abstract
The invention discloses a multi-algorithm fused toluene catalytic combustion catalyst rapid screening method, which relates to the field of catalytic materials. The method comprises the following steps: collecting literature on single metal-metal oxide catalysts, then extracting and processing the data to form a standardized dataset; constructing a multi-dimensional feature system and determining the target variable; selecting 11 models, splitting the dataset, and tuning the parameters; selecting the optimal model according to error indexes; mining key features and association rules; and inputting the parameters of the catalyst to be screened into the model for prediction and judging the result against combined conditions, wherein a catalyst whose predicted toluene conversion rate is greater than or equal to 75% and whose key features meet the thresholds is judged to be a high-performance candidate.
Inventors
- HUANG HAIBAO
- Yi Zihua
- HE PEIZE
Assignees
- Xinjiang University (新疆大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260202
Claims (10)
- 1. A multi-algorithm fused toluene catalytic combustion catalyst rapid screening method, characterized by comprising the following specific steps: S100, dataset construction: collecting literature on single metal-metal oxide catalysts, extracting the component composition, preparation parameters, test conditions, and toluene conversion rate data of the single metal-metal oxide catalysts, and forming a standardized dataset through structuring, standardization, and missing-value imputation; S200, construction of a multi-dimensional feature system: constructing a multi-dimensional input feature set based on the standardized dataset, setting the toluene conversion rate as the target variable, and forming a feature-target variable dataset; S300, multi-algorithm model training and optimization: taking the feature-target variable dataset as input, selecting 11 supervised regression models, splitting the data into a training set and a test set at a ratio of 8:2, standardizing the numerical features, designing a hyperparameter search range for each model, and optimizing the parameters of all models by combining grid search with 10-fold cross-validation; S400, optimal model screening and feature mining: comparing the training-set, test-set, and cross-validation performance of the 11 models, using the cross-validated mean absolute error and root mean square error as indexes, and selecting the optimal model; S500, rapid catalyst screening: arranging and standardizing the feature parameters of the catalyst to be screened according to the feature system, inputting them into the saved optimal model to predict the toluene catalytic combustion conversion rate, and grading the performance of the model prediction by combining the physical constraints of the catalytic reaction with the mined key feature thresholds, wherein the catalyst is judged to be a high-performance candidate catalyst when its predicted toluene conversion rate is greater than or equal to 75% and its key features fall within the preset threshold ranges.
- 2. The multi-algorithm fused toluene catalytic combustion catalyst rapid screening method according to claim 1, wherein in the S100 dataset construction, the literature on single metal-metal oxide catalysts is collected from the Web of Science Core Collection database and from general literature in the field of toluene catalytic oxidation, using the search keywords "toluene", "metal oxide", and "catalytic oxidation"; the collected data comprise the component composition, preparation parameters, test conditions, and toluene conversion rate, wherein the component composition comprises the metal oxide support type, the supported metal type, and the supported metal mole fraction; the preparation parameters comprise the catalyst synthesis method, calcination temperature, calcination time, and calcination atmosphere; the test conditions comprise the reaction temperature, catalyst mass, reaction space velocity, and toluene concentration; and the toluene conversion rate is the specific conversion percentage measured under the corresponding test conditions.
- 3. The multi-algorithm fused toluene catalytic combustion catalyst rapid screening method according to claim 1, wherein in the S200 construction of the multi-dimensional feature system, the multi-dimensional input feature set is constructed on the basis of the standardized dataset by integrating multi-source tools and databases and by following the principles of availability, uniqueness of characterization, and physical completeness, wherein: element physicochemical features are extracted with the Matminer and Magpie toolkits, covering dimensional, physical-property, and electronic-property features, and multi-component systems are handled by mass-weighted averaging; valence electronic structure features are constructed with the ElectronicStructure module of Pymatgen and comprise orbital filling features, electron energy level statistical features, and encoding features; thermodynamic stability features are obtained from the FToxid database and comprise the standard enthalpy of formation, standard Gibbs free energy of formation, decomposition temperature, lattice energy, phase transition temperature, and oxygen vacancy formation energy of the metals and metal oxides; experimental preparation and test parameter features are extracted manually from the standardized dataset, with categorical variables One-hot encoded, and cover the synthesis method, the metal oxide preparation method, material proportioning, post-treatment parameters, and test-condition parameters; and catalyst structure features are extracted from characterization data reported in the literature and comprise the pore diameter, average specific surface area, total pore volume, and average surface area of the metal oxide.
- 4. The multi-algorithm fused toluene catalytic combustion catalyst rapid screening method according to claim 1, wherein in the S200 construction of the multi-dimensional feature system, the toluene conversion rate is chosen as the target variable on the basis that it is the core index for evaluating toluene catalytic combustion performance and that the specific conversion percentages under the corresponding test conditions are recorded in the standardized dataset, so that the data are complete and available; the feature-target variable dataset is formed by associating the multi-dimensional input features with the target variable sample by sample, such that the element physicochemical features, valence electronic structure features, thermodynamic stability features, experimental preparation and test parameter features, and catalyst structure features of each sample correspond strictly to the toluene conversion rate measured under the same test conditions; the data format is unified to numerical type, abnormal samples whose features cannot be matched to the target variable are removed, and a structured feature-target variable dataset is finally formed, wherein the multi-dimensional input features span the element physicochemical, valence electronic structure, thermodynamic stability, experimental preparation and test parameter, and catalyst structure dimensions.
- 5. The multi-algorithm fused toluene catalytic combustion catalyst rapid screening method according to claim 1, wherein in the S300 multi-algorithm model training and optimization, the 11 selected supervised regression models are random forest regression, decision tree regression, extremely randomized trees (extra trees) regression, gradient boosting regression, least absolute shrinkage and selection operator (LASSO) regression, partial least squares regression, support vector regression, ridge regression, elastic net regression, extreme gradient boosting (XGBoost) regression, and Bayesian ridge regression.
- 6. The multi-algorithm fused toluene catalytic combustion catalyst rapid screening method, characterized in that in the S300 multi-algorithm model training and optimization, the parameters are optimized by combining grid search with 10-fold cross-validation: following the 10-fold cross-validation rule, the training set is divided into 10 equal, mutually exclusive subsets, of which 9 serve as training subsets and 1 serves as the validation subset in each round, and 10 rounds of training and validation are run in rotation so that every subset is used for validation once; then, based on the algorithmic characteristics of each of the 11 supervised regression models and the requirements of the catalytic performance prediction scenario, a reasonable hyperparameter search range is designed for each model, the hyperparameters covering the key dimensions of model complexity, regularization strength, and kernel function parameters; grid search is then started, each model is trained on every preset hyperparameter combination within the 10-fold cross-validation procedure, and the cross-validated mean absolute error and root mean square error of each hyperparameter combination are recorded; after all hyperparameter combinations have been traversed, the combination with the smallest cross-validated mean absolute error and root mean square error is selected as the optimal parameters; and finally the model is retrained on the complete training set using the optimal parameters.
- 7. The multi-algorithm fused toluene catalytic combustion catalyst rapid screening method according to claim 1, wherein in the S400 optimal model screening, the cross-validated mean absolute error (MAE) is expressed as $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$ and the root mean square error (RMSE) is expressed as $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$, wherein $n$ is the number of samples in the training set, the test set, or the cross-validation subset, respectively, $y_i$ is the true value of the toluene conversion rate, and $\hat{y}_i$ is the predicted value of the toluene conversion rate; in the comparison, the MAE and RMSE of the 11 supervised regression models are calculated on the training set, on the test set, and in cross-validation, and the model with the smallest cross-validated MAE and RMSE is selected as the optimal model.
- 8. The multi-algorithm fused toluene catalytic combustion catalyst rapid screening method according to claim 1, wherein in the S400 optimal model screening, the feature mining combines feature importance ranking, partial dependence plots, and the SHapley Additive exPlanations (SHAP) interpretation model; the criterion for selecting a key feature is that the partial dependence plots confirm a clear monotonic or nonlinear association between the feature and the toluene conversion rate; and the association rules between the key features and the toluene conversion rate are characterized by quantifying SHAP values, which determine how the direction and strength of a single feature's contribution to the toluene conversion rate change with the feature value.
- 9. The multi-algorithm fused toluene catalytic combustion catalyst rapid screening method according to claim 1, wherein in the S500 rapid catalyst screening, the optimal model is a fusion of the selected optimal supervised regression model with the physical constraints of the catalytic reaction, the physical constraints comprising the thermodynamic stability boundary of the catalyst, a reasonable range for the oxygen vacancy formation energy, and the rate-limiting conditions of the reaction kinetics.
- 10. The multi-algorithm fused toluene catalytic combustion catalyst rapid screening method according to claim 1, wherein in the S500 rapid catalyst screening, the key feature thresholds are determined based on the SHAP value analysis results and the catalytic reaction mechanism, and specifically comprise: an oxygen vacancy formation energy of 2-4 eV, a catalyst specific surface area of at least 50 m²/g, a reaction space velocity of at most 20000, a calcination temperature of 300-600 ℃, a supported metal mole fraction of 0.5-3%, a standard Gibbs free energy of formation of the metal oxide support of at most -200 kJ/mol, and an average catalyst pore diameter of 2-20 nm.
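The decision rule of step S500, as stated in claims 1 and 10, can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the feature names, the helper functions `violated_features` and `is_high_performance`, and the choice to treat a missing feature as a violation are all assumptions made for illustration. The numerical ranges are those listed in claim 10, the 75% cutoff is from claim 1, and the space velocity is left without a unit because the source text does not give one.

```python
from typing import Dict, List, Optional, Tuple

# Cutoff from claim 1: predicted toluene conversion rate, in percent.
CONVERSION_CUTOFF = 75.0

# Ranges from claim 10 as (lower bound, upper bound); None means unbounded.
# The dictionary keys are hypothetical feature names chosen for this sketch.
KEY_FEATURE_RANGES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "oxygen_vacancy_formation_energy_eV": (2.0, 4.0),
    "specific_surface_area_m2_per_g": (50.0, None),
    "space_velocity": (None, 20000.0),  # unit not stated in the source text
    "calcination_temperature_C": (300.0, 600.0),
    "supported_metal_mole_fraction_pct": (0.5, 3.0),
    "support_gibbs_formation_kJ_per_mol": (None, -200.0),
    "average_pore_diameter_nm": (2.0, 20.0),
}


def violated_features(features: Dict[str, float]) -> List[str]:
    """Return the key features that are missing or outside their range."""
    bad = []
    for name, (low, high) in KEY_FEATURE_RANGES.items():
        value = features.get(name)
        if value is None:
            bad.append(name)  # assumption: a missing value fails the check
        elif low is not None and value < low:
            bad.append(name)
        elif high is not None and value > high:
            bad.append(name)
    return bad


def is_high_performance(predicted_conversion: float,
                        features: Dict[str, float]) -> bool:
    """Apply both conditions: conversion >= 75% and all key features in range."""
    if predicted_conversion < CONVERSION_CUTOFF:
        return False
    return not violated_features(features)


# Hypothetical candidate; in practice the conversion would come from the
# saved optimal regression model rather than being typed in by hand.
candidate = {
    "oxygen_vacancy_formation_energy_eV": 3.1,
    "specific_surface_area_m2_per_g": 82.0,
    "space_velocity": 15000.0,
    "calcination_temperature_C": 450.0,
    "supported_metal_mole_fraction_pct": 1.0,
    "support_gibbs_formation_kJ_per_mol": -520.0,
    "average_pore_diameter_nm": 8.5,
}
print(is_high_performance(82.0, candidate))
```

Keeping the range check separate from the final decision lets a caller report which thresholds a rejected catalyst fails, rather than only a pass/fail flag.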
Description
Multi-algorithm fused toluene catalytic combustion catalyst rapid screening method
Technical Field
The invention relates to the field of catalytic materials, in particular to a multi-algorithm fused toluene catalytic combustion catalyst rapid screening method.
Background
Toluene is a common volatile organic pollutant in industrial production and daily life, originating largely from industrial emissions in sectors such as chemicals, coatings, and printing. If it is discharged directly without effective treatment, it pollutes the atmosphere and may endanger human health, so the efficient purification of toluene is an important research direction in environmental protection. Catalytic combustion offers high purification efficiency, low energy consumption, and no secondary pollution, and has become a key technology for treating toluene waste gas. The catalyst is the core component of this technology, and its performance directly determines the efficiency and stability of toluene catalytic combustion. Single metal-metal oxide catalysts have attracted wide attention in toluene catalytic combustion research because of their relatively low preparation cost and highly tunable catalytic activity. As this research has deepened, a large body of literature data on single metal-metal oxide catalysts has accumulated, covering component composition, preparation parameters, test conditions, toluene conversion rate, and other data, which provides a foundation for data-driven catalyst screening. How to establish an efficient and accurate screening method based on these data has therefore become important for advancing toluene catalytic combustion technology.
Traditional screening of toluene catalytic combustion catalysts centers on experimental trial and error. Researchers must repeatedly prepare catalyst samples with different compositions and parameters based on experience and test their toluene conversion performance in a large number of experiments. The whole process consumes large amounts of reagents, equipment resources, and labor, and the long development cycle makes it difficult to respond quickly to practical demand for high-performance catalysts. Traditional screening methods also tend to consider too few of the factors that influence catalyst performance: they usually focus only on the component composition or on a single preparation parameter and ignore the combined effect of multi-dimensional features such as element physicochemical properties, valence electronic structure, thermodynamic stability, and catalyst microstructure, so the intrinsic correlation between each factor and the toluene conversion rate cannot be captured systematically. In addition, existing model-based screening methods mostly adopt a single regression algorithm, lack systematic optimization of model parameters, and do not sufficiently compare and validate the performance of multiple models. As a result, the accuracy and stability of their predictions are too limited to guide the screening of high-performance catalysts reliably, which further restricts improvements in catalyst research and development efficiency.
Disclosure of Invention
The invention aims to remedy the defects of the prior art by providing a multi-algorithm fused toluene catalytic combustion catalyst rapid screening method. The method constructs a standardized dataset from collected literature, builds a multi-dimensional feature system, selects 11 supervised regression models for training and optimization, selects the optimal model using indexes such as the mean absolute error, and mines key features and association rules. Finally, the feature parameters of the catalyst to be screened are input into the optimal model to predict the conversion rate, and physical constraints and key feature thresholds are combined to judge whether the catalyst is a high-performance candidate, thereby improving screening efficiency and accuracy. To solve the above technical problems, the invention provides a multi-algorithm fused toluene catalytic combustion catalyst rapid screening method, which comprises the following specific steps: S100, dataset construction: collecting literature on single metal-metal oxide catalysts, extracting the component composition, preparation parameters, test conditions, and toluene conversion rate data of the single metal-metal oxide catalysts, and interpolating through struct