CN-122024909-A - Multifunctional catalyst performance prediction method based on high-throughput calculation and machine learning
Abstract
The invention discloses a multifunctional catalyst performance prediction method based on high-throughput calculation and machine learning, which relates to the technical field of catalytic material prediction. The method comprises the following steps: acquiring basic structure information of a target catalyst and calculating the corresponding catalytic performance data and related characteristic data; establishing a data set by pairing the basic structure information and the catalytic performance data; importing the data set into a plurality of machine learning models for training and performing hyperparameter tuning using multi-objective Bayesian optimization; based on the screened optimal model, screening key features by SHAP value ranking and importing them into the interpretable machine learning model SISSO for multi-task training; and, based on SISSO, obtaining through symbolic regression an explicit mathematical relation between a feature combination and the catalytic performance as a descriptor formula. The method enables rapid prediction of multiple aspects of catalyst performance, thereby greatly improving the efficiency of catalyst research and development.
Inventors
- WANG TIANSHUAI
- CUI KAI
- ZHANG QIUYU
Assignees
- Northwestern Polytechnical University (西北工业大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260328
Claims (10)
- 1. A method for predicting the performance of a multifunctional catalyst based on high-throughput calculation and machine learning, characterized by comprising the following steps: S1, data set construction: acquiring basic structure information of a target catalyst from a public database, calculating the corresponding catalytic performance data and related characteristic data of the target catalyst, and pairing the catalytic performance data with the related characteristic data to construct the data set; S2, machine learning modeling and evaluation: dividing the data set into a training set and a test set, importing them into a plurality of machine learning models for training, performing hyperparameter tuning on the plurality of machine learning models using multi-objective Bayesian optimization to search for an optimal parameter combination, and evaluating the model fitting effect; S3, interpretable machine learning: based on the screened optimal model, screening the key features affecting catalyst performance by SHAP value ranking, and importing the key features, after de-duplication, into the interpretable machine learning model SISSO for multi-task training; S4, descriptor construction and application: based on the interpretable machine learning model SISSO, searching the feature space for an explicit mathematical relation between a feature combination and the catalytic performance as a descriptor formula, simultaneously predicting a plurality of key performances of the target catalyst with a single descriptor formula, establishing from the descriptor formula a curve of performance versus features, finding the features corresponding to the optimal solution, and searching for multifunctional catalysts with potentially excellent performance according to those features.
- 2. The method for predicting the performance of a multifunctional catalyst based on high-throughput calculation and machine learning according to claim 1, wherein calculating the catalytic performance data and the related characteristic data of the target catalyst in S1 is specifically as follows: obtaining basic structure information of the target catalyst from a public database, constructing a catalyst surface model, obtaining adsorption configurations using first-principles calculations, searching for the optimal adsorption configuration, obtaining the adsorption energy, searching for the reaction path, and obtaining the reaction energy barrier; and calculating the related characteristic data of the target catalyst, namely extracting catalytic active center features of the target catalyst from the public database, the catalytic active center features comprising physical characteristics, and obtaining electronic structure characteristics by calculation.
- 3. The method for predicting the performance of a multifunctional catalyst based on high-throughput calculation and machine learning according to claim 2, wherein the physical characteristics include electronegativity, atomic mass, atomic radius, Fermi level, first ionization energy, electron affinity, and valence electron number; and the electronic structure characteristics comprise the spin-up and spin-down electron occupation numbers of the outermost orbital of the atom, the band center of the outermost orbital, and bond length.
- 4. The method for predicting the performance of a multifunctional catalyst based on high-throughput calculation and machine learning according to claim 2, wherein establishing the data set in S1 is specifically as follows: preprocessing the related characteristic data by analyzing the correlation among features using the Pearson correlation coefficient and removing features with highly redundant information, then standardizing the original data into dimensionless values without order-of-magnitude differences; and pairing the catalytic performance data with the processed related characteristic data to establish the data set.
- 5. The method for predicting the performance of a multifunctional catalyst based on high-throughput calculation and machine learning according to claim 1, wherein the machine learning modeling in S2 is specifically as follows: the machine learning model adopts at least one of random forest, gradient boosting, adaptive boosting, automatic relevance determination regression, and support vector regression; and when hyperparameter tuning is performed on the model using multi-objective Bayesian optimization, at least two of maximizing prediction accuracy, maximizing robustness, maximizing sparsity, minimizing prediction uncertainty, and minimizing model complexity are selected as optimization objectives.
- 6. The method for predicting the performance of a multifunctional catalyst based on high-throughput calculation and machine learning according to claim 5, wherein when hyperparameter tuning is performed on the model using multi-objective Bayesian optimization, a physicochemical principle is incorporated into the optimization process as a constraint, so that the model predictions conform to the basic laws of electrochemical reactions, the physicochemical constraint being selected from the Sabatier principle or the Brønsted-Evans-Polanyi relationship.
- 7. The method for predicting the performance of a multifunctional catalyst based on high-throughput calculation and machine learning according to claim 1, wherein the key features affecting catalyst performance in S3 are obtained as follows: for each catalytic performance, extracting, according to SHAP value ranking, the ten features with the greatest influence on that catalytic performance in the corresponding optimal machine learning model.
- 8. The method for predicting the performance of a multifunctional catalyst based on high-throughput calculation and machine learning according to claim 7, wherein the multi-task training in the interpretable machine learning model SISSO in S3 is as follows: pooling the ten most influential features for each catalytic performance, removing duplicates, taking the remaining features as input data and the corresponding catalytic performances as output data to construct a new data set, and inputting the data set into the interpretable machine learning model SISSO for multi-task training.
- 9. The method for predicting the performance of a multifunctional catalyst based on high-throughput calculation and machine learning according to claim 1, wherein the construction and application of the descriptor in S4 is as follows: when training with the interpretable machine learning model SISSO, the SISSO parameters include the number of tasks, the number of features, the set of mathematical operators, the descriptor dimension, the feature complexity, and the set of feature dimensions, and are used to determine whether the selected descriptor formula has a definite mathematical expression and conforms to a physicochemically intuitive structure-activity relationship; evaluating the descriptor formulas and selecting the optimal mathematical expression as the final descriptor; and establishing an activity trend graph from the descriptor formula, finding the physical characteristics corresponding to the local or global maxima of the curve, and searching for the actual materials that correspond to those characteristics.
- 10. A multifunctional catalyst performance prediction system based on high-throughput calculation and machine learning, for use in the method of any one of claims 1-9, comprising: a model construction unit for constructing a catalyst model, acquiring characteristic data, and, with the catalytic performance data obtained by computational simulation as output data and the screened characteristic data as input data, training machine learning association models for different catalytic demands; a verification and evaluation unit for optimizing and evaluating the models by K-fold cross-validation and screening out the machine learning model with the optimal prediction performance together with its feature importance ranking; and a descriptor generation unit for pooling the top-ranked features of the optimal models, removing duplicates, inputting the remaining features into the interpretable machine learning model SISSO, obtaining mathematical expression models between the features and the catalytic performance under different dimensions and feature complexities, evaluating the accuracy of the mathematical expression models, selecting the most accurate model as the final descriptor, and predicting the catalytic performance of other catalysts using the descriptor.
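The data-handling steps named in claims 4, 7 and 8 (Pearson-based redundancy filtering, standardization, and pooling the top-ranked features of each task with duplicates removed) can be illustrated with a minimal standard-library sketch. It is not the applicants' implementation: the function names, the 0.95 correlation threshold, the toy feature values, and the importance scores (standing in for mean absolute SHAP values) are all assumptions made for illustration, and the claims specify the top ten features whereas the toy example uses k=2.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither column is constant

def drop_redundant(features, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is below
    the threshold (0.95 is an assumed value, not taken from the patent)."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept

def standardize(column):
    """Z-score: subtract the mean and divide by the population standard
    deviation, giving dimensionless values of comparable magnitude."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]  # assumes std > 0

def merge_top_features(importance_by_task, k=10):
    """Pool the k most important features of each task, dropping duplicates
    while preserving first-seen order."""
    merged = []
    for scores in importance_by_task.values():
        for name in sorted(scores, key=scores.get, reverse=True)[:k]:
            if name not in merged:
                merged.append(name)
    return merged

# Toy, made-up feature columns (one value per hypothetical catalyst).
features = {
    "electronegativity": [1.0, 2.0, 3.0, 4.0],
    "atomic_radius": [2.0, 4.0, 6.0, 8.1],      # almost proportional: redundant
    "d_band_center": [-1.0, -3.0, -2.0, -2.5],
}
# Toy importance scores per predicted property (placeholders for mean |SHAP|).
importance_by_task = {
    "adsorption_energy": {"electronegativity": 0.5, "d_band_center": 0.3, "bond_length": 0.1},
    "energy_barrier": {"d_band_center": 0.6, "valence_electrons": 0.2, "electronegativity": 0.1},
}

kept = drop_redundant(features)
scaled = {name: standardize(column) for name, column in kept.items()}
selected = merge_top_features(importance_by_task, k=2)
print(list(scaled), selected)
```

In this sketch the filter is greedy, so which member of a correlated pair survives depends on feature order; a real workflow might instead keep the feature that correlates more strongly with the target.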
Description
Multifunctional catalyst performance prediction method based on high-throughput calculation and machine learning
Technical Field
The invention belongs to the technical field of catalytic material prediction, and particularly relates to a multifunctional catalyst performance prediction method based on high-throughput calculation and machine learning.
Background
Catalysis is a cornerstone of the modern chemical industry; with the aid of catalysts, raw materials can be converted step by step into high-value-added products. Different catalysts differ significantly in catalytic activity, which directly determines the kinetic rate and the extent of the reaction. A suitable catalyst lowers the conditions required for a chemical reaction to occur, allowing product formation and energy conversion under relatively mild conditions. For example, room-temperature sodium-sulfur batteries (RT-SSB) are considered to be the most promising large-scale energy storage technology because of their high theoretical energy density (1274 Wh kg⁻¹), abundant natural resources, and low cost. However, their practical application still faces many challenges, chiefly the "shuttle effect" caused by polysulfide dissolution and the high energy barriers associated with the liquid-solid (Na₂S₄ → Na₂S₂) and solid-solid (Na₂S₂ → Na₂S) transformations during discharge, which lead to slow overall reaction kinetics. Introducing a catalyst has proven to be an effective strategy for solving these problems. However, the development of traditional catalysts relies mainly on experience-driven "trial and error", i.e. screening materials by repeated synthesis, characterization, and performance testing.
This approach is not only time-consuming and resource-intensive but also highly dependent on the personal experience of the researcher, so the rational design of catalysts progresses slowly and is often limited by the chemist's intuition. In recent years, as theoretical calculation methods have matured and computing power has improved markedly, researchers have been able to predict catalytic performance and explore reaction mechanisms by means of first-principles calculations and other methods. However, the complexity of actual reaction conditions places higher demands on the catalyst, and single-function catalysts can hardly meet practical needs. In practical reaction systems, pursuing a high reaction rate alone is often not enough to meet application requirements; the catalyst must also have several functional characteristics to adapt to complex reaction environments. For example, for an intermediate that dissolves easily during the reaction, the catalyst needs a specific adsorption capacity to confine it to the reaction interface, preventing the loss of active material and keeping the reaction going. For a specific reaction, therefore, its characteristics must be considered comprehensively and a catalyst with multiple functions sought, which makes the development of "multifunctional catalysts" a necessary choice. Such a catalyst must cooperatively handle successive intermediate adsorption and conversion processes, so the rate-determining step in the reaction network shifts easily and the reaction mechanism is difficult to define precisely. This complexity makes it difficult to accurately screen catalysts that meet application requirements in multiple respects by relying on first-principles calculations alone.
Therefore, there is a need to develop a new method capable of accurately predicting multifunctional catalysts. In this context, machine learning methods are becoming an important tool for accelerating catalyst design because they can learn automatically from data and optimize decisions. Thanks to the continuous evolution of algorithmic models and the marked increase in computing power, current machine learning algorithms can efficiently mine patterns from large volumes of experimental data and theoretical calculation results, and predict unknown results through the constructed data model. However, existing machine learning algorithms (e.g., random forests and neural networks) are generally typical black-box models. Although such a model can achieve high prediction accuracy by training on massive data, its internal representation is extremely complex and highly abstract, and it is difficult to establish an explicit physical mapping between the model's input variables and output results. In other words, such black-box models often lack explicit interpretability, and users cannot intuitively learn from them the key feature factors affecting the prediction results or their specific weight contributions, so that intrinsic rules