
CN-122021311-A - Theory and experiment-based substance iterative exploration method and system


Abstract

The invention discloses an iterative substance exploration method and system based on theory and experiment, in the technical field of experimental scheme generation. The method forms a complete closed loop with five stages:

- A simulation model is built from initial parameters and optimization instructions and used for prediction.
- Recommended experimental parameters are screened under experimental feasibility constraints.
- Resources are scheduled to execute the experiments.
- The resulting data are collected, aligned, and analyzed.
- The resulting optimization strategy is fed back to update the model.

This loop bridges the disconnect between theory and experiment and lets the two work together dynamically. Accurate model predictions and parameter recommendations reduce blind experimentation and improve the focus and success rate of experiments. Feeding experimental data back into the model continuously improves prediction accuracy. As a result, the exploration cycle of the material science space is shortened, and resource consumption and cost during research and development are significantly reduced.

Inventors

  • Jiang Jun
  • Zhu Shuoying
  • Wang Chunlong
  • Li Hailin

Assignees

  • University of Science and Technology of China (中国科学技术大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-30

Claims (10)

  1. An iterative substance exploration method based on theory and experiment, characterized by comprising the following steps:
     - constructing a material science simulation model based on initial parameters and optimization instructions, predicting material properties and structures, outputting a prediction result, and receiving an optimization strategy to update the parameters and simulation logic of the model;
     - screening and ranking the prediction results with a multi-objective optimization algorithm under experimental feasibility constraints to generate recommended experimental parameters;
     - scheduling experimental resources, executing experiments, and collecting experimental process data based on the recommended experimental parameters; and
     - performing data alignment, error tracing, and feature mining on the experimental process data and the prediction results to generate the optimization strategy.
  2. The method according to claim 1, wherein:
     - the initial parameters comprise material structure data and experimental condition parameters;
     - the material structure data are converted into feature vectors by a feature engineering module;
     - the experimental condition parameters are standardized and used as supplementary features;
     - the optimization instructions serve as hyperparameter constraints for training the material science simulation model; and
     - the material science simulation model is composed of a plurality of base algorithms.
  3. The method according to claim 2, wherein, when the material science simulation model operates in a parallel input mode, constructing the model based on the initial parameters and optimization instructions, predicting material properties and structures, and outputting the prediction result comprises:
     - preprocessing the feature vectors and supplementary features into an input matrix the model can interpret;
     - tuning parameters according to the hyperparameter search strategy in the optimization instructions to generate a set of candidate models;
     - evaluating each candidate model by cross-validation and outputting its predicted property values and structural stability scores;
     - ranking the candidate models by prediction accuracy on a preset metric and assigning fusion weights according to that accuracy;
     - if the prediction results of the candidate models differ, preferentially adopting the results that pass a structural rationality check, or merging and outputting the consistent parts, to obtain the final prediction result; and
     - if they do not differ, fusing the predicted property values of the candidate models by a weighted average using the assigned fusion weights to obtain the final prediction result.
  4. The method according to claim 2, wherein, when the material science simulation model operates in a serial input mode, constructing the model based on the initial parameters and optimization instructions, predicting material properties and structures, and outputting the prediction result comprises:
     - feeding the raw material structure data into a feature extraction model, which outputs high-dimensional molecular embeddings as intermediate features and passes them to a performance prediction model;
     - the performance prediction model receiving these embeddings together with the experimental condition parameters, outputting a preliminary prediction of the key properties of the substance, and marking regions of anomalous performance;
     - a structure optimization model, guided by those anomalous regions, making targeted adjustments to the material structure parameters of the preliminary prediction and outputting an optimized prediction result and a predicted performance improvement; and
     - feeding the output of the structure optimization model back to the performance prediction model and updating the feature extraction logic, forming an iterative optimization loop.
  5. The method according to claim 1, wherein the optimization strategy comprises:
     - the number of initial explorations and iterations;
     - an acquisition function;
     - a definition of the model hyperparameter search space; and
     - cross-validation and early-stopping mechanisms.

     Receiving the optimization strategy to update the parameters and simulation logic of the material science simulation model comprises:
     - dynamically adjusting the parameter search distribution according to historical trial results using an Optuna-based TPE sampler;
     - tuning parameters separately for each model type;
     - computing a validation score for each parameter combination with an objective function that targets a preset metric;
     - when a new trial's score exceeds the historical best, updating the hpo_trained_model parameters and replacing the current model instance;
     - slicing and screening the feature space of the reaction components to extract key features;
     - fixing the reactants while setting a dynamic search range, and generating a feature matrix matched to the model input through SearchSpaceLoader; and
     - automatically switching model branches according to the type of molecular features, thereby completing the update of the parameters and simulation logic of the material science simulation model.
  6. The method according to claim 1, wherein the experimental feasibility constraints comprise physicochemical condition constraints, material property constraints, experimental operation constraints, and safety and ethics constraints.
  7. The method according to claim 1, wherein screening and ranking the prediction results with the multi-objective optimization algorithm to generate recommended experimental parameters comprises:
     - using the model's predicted values to quantify the targets, and taking them together with experimental primitive constraint targets and safety and feasibility targets as multiple optimization objectives;
     - rejecting prediction results that fail experimental feasibility based on expected_range and material compatibility;
     - for any two remaining prediction results, treating the second as dominated if the first has a yield no lower and a cost no higher than the second;
     - retaining all non-dominated solutions as a Pareto front of candidate experimental schemes;
     - fusing the multiple objectives of each candidate scheme by weighting and ranking the candidates by composite score, where, if the target weights conflict, the threshold of the primary performance target is satisfied first and the candidates are then sorted in ascending order of cost;
     - retaining the top 10 candidate schemes by composite score; and
     - extracting parameters from the candidate schemes, resolving the substances and conditions corresponding to dynamic_indices, and combining fixed_components with the screened dynamic parameters to generate recommended experimental parameters comprising substance components, operating conditions, and auxiliary information.
  8. The method according to claim 1, wherein scheduling experimental resources, executing experiments, and collecting experimental process data based on the recommended experimental parameters comprises:
     - retrieving stock from the physical inventory according to the substance components in the recommended experimental parameters, and verifying whether the hardware meets the operating conditions;
     - allocating resources preferentially to high-priority experiments according to the composite-score ranking of the recommended experimental parameters;
     - running conflict-free experiments in parallel on multiple threads through an n_jobs: -1 configuration;
     - if an experimental resource is temporarily unavailable, automatically substituting an alternative substance subject to feature_groups-based compatibility constraints;
     - triggering the experimental equipment through the back-end interface of run_full_workflow.py, executing the experiment according to a preset program, and monitoring the equipment status in real time;
     - continuously collecting sensor data during the reaction and recording time-series curves;
     - stopping automatically when the preset reaction time is reached, then separating and purifying the product;
     - characterizing the product properties and determining the product structure by mass spectrometry or X-ray diffraction; and
     - associating all data with the recommended parameter ID and storing them in experimental record files under the output/ directory.
  9. The method according to claim 1, wherein performing data alignment, error tracing, and feature mining on the experimental process data and the prediction results to generate the optimization strategy comprises:
     - aligning the molecular features used in prediction with the SMILES parsing results of the substances actually used in the experiment;
     - for operating-condition features, mapping the predicted expected_range to the experimentally measured values and removing outliers that fall outside the range;
     - aligning dynamic process data by timestamp so that the predicted reaction end-point performance is matched with the value measured when the experiment terminates;
     - calculating and classifying the deviation between predicted and measured values, and using the feature importance map generated by the feature_importance: true configuration to analyze features with anomalous contributions in high-error samples (for example, if the unimol features of a class of catalysts carry excessive weight in prediction but show large experimental deviations, those features may not match the actual reaction mechanism);
     - determining, with reference to abs_result, whether errors are caused by missing or redundant features;
     - for experimental parameters outside expected_range, locating sensitive conditions through feature_group_analysis and checking whether the model's attention weights on key experimental parameters are consistent with their actual sensitivity;
     - calculating prediction residuals from the error-tracing results and adding them to the model as new features, to capture information not covered by the original features;
     - generating a feature correlation matrix with the multi-feature-effect-analysis: true configuration to identify synergistic effects;
     - performing correlation analysis between solvent parameters and reaction yield to generate new combined features;
     - extracting time-series features from the experimental process data and adding them to the model through the custom_features configuration, in place of the static reaction-time parameter;
     - if a particular model performs poorly on high-error samples, adjusting the models_to_run configuration, stacking models with the best_enstable function, and reducing single-model bias through dynamic weight adjustment;
     - for error-sensitive parameter ranges, narrowing the search space in bayesian_optimization and increasing the sampling density in those intervals;
     - based on the synergistic effects found by feature mining, fixing some parameters in reaction_components, supplementing the missing features identified by error tracing, and updating the per_smiles_col_generators configuration; and
     - reducing the dimensionality of redundant features and outputting the optimization strategy.
  10. An iterative substance exploration system based on theory and experiment, comprising:
     - a prediction result output module, configured to construct a material science simulation model based on initial parameters and optimization instructions, predict material properties and structures, output a prediction result, and receive an optimization strategy to update the parameters and simulation logic of the model;
     - an experimental parameter output module, configured to screen and rank the prediction results with a multi-objective optimization algorithm under experimental feasibility constraints to generate recommended experimental parameters;
     - a data acquisition module, configured to schedule experimental resources, execute experiments, and collect experimental process data based on the recommended experimental parameters; and
     - an optimization strategy generation module, configured to perform data alignment, error tracing, and feature mining on the experimental process data and the prediction results to generate the optimization strategy.
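Claim 3's parallel-mode fusion can be illustrated with a minimal sketch. It rests on several assumptions:

- Each candidate model has already been cross-validated and reports a single positive error figure.
- Fusion weights are proportional to inverse cross-validation error. The claim only says weights follow prediction accuracy.
- "Differing" results means a spread larger than a hypothetical `tolerance`.

Only the first disagreement option (prefer structurally plausible results) is modeled. The class and function names are illustrative, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class CandidatePrediction:
    name: str           # hypothetical model identifier
    cv_error: float     # cross-validated error, lower is better, assumed > 0
    value: float        # predicted property value
    structure_ok: bool  # outcome of a structural-rationality check


def fusion_weights(cands):
    """Weights proportional to inverse CV error, normalised to sum to 1."""
    inv = [1.0 / c.cv_error for c in cands]
    total = sum(inv)
    return [w / total for w in inv]


def fuse(cands, tolerance=0.1):
    """Fuse candidate predictions into one final value.

    If the candidates disagree by more than `tolerance`, only those passing
    the structural-rationality check are kept (when any pass); the kept
    candidates are then combined by a weighted average.
    """
    values = [c.value for c in cands]
    if max(values) - min(values) > tolerance:
        plausible = [c for c in cands if c.structure_ok]
        if plausible:
            cands = plausible
    weights = fusion_weights(cands)
    return sum(w * c.value for w, c in zip(weights, cands))
```

Inverse-error weighting is only one reasonable reading of "distributing fusion weights according to the precision". A softmax over negative errors, or weights from a stacking regressor, would fit the same wording.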
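The "replace the current model when a new trial beats the historical best" rule in claim 5 can be sketched without any third-party dependency. The sketch makes these assumptions:

- Uniform random sampling stands in for the Optuna TPE sampler named in the claim.
- The per-model-type search spaces (`SEARCH_SPACES`) and the toy objective are hypothetical.

With Optuna installed, the loop would typically be replaced by `optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())` followed by `study.optimize(objective, n_trials=...)`, with the winner read from `study.best_params`.

```python
import random

# Hypothetical per-model-type search spaces (inclusive integer ranges),
# illustrating "adjusting parameters for model type differentiation".
SEARCH_SPACES = {
    "random_forest": {"n_estimators": (50, 500), "max_depth": (2, 20)},
    "gradient_boosting": {"n_estimators": (50, 500), "max_depth": (2, 8)},
}


def sample_params(model_type, rng):
    """Draw one parameter combination from this model type's own space."""
    space = SEARCH_SPACES[model_type]
    return {name: rng.randint(lo, hi) for name, (lo, hi) in space.items()}


def search(model_type, objective, n_trials=50, seed=0):
    """Sample combinations and keep the best one seen so far.

    Uniform random sampling stands in for TPE, which would instead bias
    later draws toward regions that scored well in earlier trials.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    history = []
    for _ in range(n_trials):
        params = sample_params(model_type, rng)
        score = objective(model_type, params)  # validation score, higher is better
        history.append((params, score))
        if score > best_score:  # new trial beats the historical optimum
            best_params, best_score = params, score
    return best_params, best_score, history


def toy_objective(model_type, params):
    """Hypothetical validation score peaking at max_depth=5, n_estimators=200."""
    return -abs(params["max_depth"] - 5) - abs(params["n_estimators"] - 200) / 100
```

For example, `search("gradient_boosting", toy_objective, n_trials=30)` returns the best-scoring combination together with the full trial history. The history is the kind of record a TPE sampler would use to reshape its search distribution.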
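The screening pipeline of claim 7 (feasibility filter, Pareto front on yield and cost, weighted composite score, top-k) can be sketched as follows. The sketch makes these assumptions:

- Feasibility is reduced to one temperature window standing in for expected_range, plus a precomputed compatibility flag.
- The weights `w_yield`/`w_cost` and the primary-performance threshold `min_yield` are hypothetical.
- It uses standard Pareto dominance: no worse on both objectives and strictly better on at least one. Under the claim's literal wording, two identical candidates would eliminate each other.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cid: str
    yield_pred: float   # predicted yield (maximise)
    cost: float         # estimated cost (minimise)
    temperature: float  # one operating condition checked against a window
    compatible: bool    # material-compatibility flag, assumed precomputed


def feasible(c, temp_range):
    """Reject incompatible candidates or those outside the allowed window."""
    lo, hi = temp_range
    return c.compatible and lo <= c.temperature <= hi


def dominates(a, b):
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a.yield_pred >= b.yield_pred and a.cost <= b.cost
    better = a.yield_pred > b.yield_pred or a.cost < b.cost
    return no_worse and better


def pareto_front(cands):
    """Keep every candidate that no other candidate dominates."""
    return [c for c in cands if not any(dominates(o, c) for o in cands if o is not c)]


def recommend(cands, temp_range, w_yield=0.7, w_cost=0.3, min_yield=0.0, top_k=10):
    """Feasibility filter, Pareto front, threshold, weighted ranking, top_k."""
    front = pareto_front([c for c in cands if feasible(c, temp_range)])
    # The primary-performance threshold takes precedence over the weighted score.
    front = [c for c in front if c.yield_pred >= min_yield]
    if not front:
        return []
    max_cost = max(c.cost for c in front) or 1.0  # normalise cost to [0, 1]

    def score(c):
        return w_yield * c.yield_pred - w_cost * (c.cost / max_cost)

    # Highest composite score first; ties broken by lower cost.
    return sorted(front, key=lambda c: (-score(c), c.cost))[:top_k]
```

With `top_k=10` the last step mirrors the claim's "retain 10 candidates". Each retained `Candidate` would then be expanded into substance components and operating conditions via the claim's dynamic_indices and fixed_components, which are not modeled here.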
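The resource-allocation part of claim 8 can be sketched as a greedy scheduler. It works as follows:

- Experiments are served in descending composite-score order.
- Each material is reserved from inventory.
- When the requested material is short, an alternative from a hypothetical `substitutes` map is tried. The map stands in for the feature_groups compatibility constraint.

Parallel execution (n_jobs: -1), equipment triggering, and data capture are outside the sketch. Material names in the usage note are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    exp_id: str
    score: float     # composite score from the screening step
    materials: dict  # material name -> required amount


def schedule(experiments, stock, substitutes):
    """Greedily reserve stock for experiments, highest score first.

    Returns (scheduled, deferred, remaining_stock). Each scheduled entry is
    (exp_id, [(requested, used, amount), ...]). An experiment whose
    materials cannot all be reserved is deferred and reserves nothing.
    """
    stock = dict(stock)  # never mutate the caller's inventory
    scheduled, deferred = [], []
    for exp in sorted(experiments, key=lambda e: e.score, reverse=True):
        tentative = dict(stock)
        plan = []
        for name, amount in exp.materials.items():
            options = [name] + substitutes.get(name, [])
            choice = next((m for m in options if tentative.get(m, 0) >= amount), None)
            if choice is None:
                break  # neither the material nor any substitute is available
            tentative[choice] -= amount
            plan.append((name, choice, amount))
        else:
            # Every material was resolved: commit the reservation.
            stock = tentative
            scheduled.append((exp.exp_id, plan))
            continue
        deferred.append(exp.exp_id)
    return scheduled, deferred, stock
```

Committing reservations only after all of an experiment's materials resolve avoids leaving half-allocated stock behind when a lower-priority experiment cannot run. For instance, with `reagent_A` short, a second experiment falls back to `reagent_A_alt`.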
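The timestamp alignment, out-of-range removal, and residual steps of claim 9 can be sketched for a single scalar property. The sketch makes these assumptions:

- Predictions and measurements are lists of `(timestamp, value)` pairs.
- Matching uses the nearest measured timestamp.
- `valid_range` plays the role of expected_range.
- The residual tolerance `tol` is hypothetical.

SMILES alignment, feature-importance analysis, and model updates are not modeled.

```python
from bisect import bisect_left


def nearest_index(times, t):
    """Index of the entry in sorted `times` closest to t (earlier wins ties)."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    return i if times[i] - t < t - times[i - 1] else i - 1


def align_and_residuals(pred, measured, valid_range, tol=0.05):
    """Match each predicted (time, value) to the nearest valid measurement.

    Measured values outside `valid_range` are discarded as outliers first.
    Returns rows of (time, predicted, measured, residual, label), where
    residual = measured - predicted.
    """
    lo, hi = valid_range
    kept = sorted((t, v) for t, v in measured if lo <= v <= hi)
    times = [t for t, _ in kept]
    rows = []
    if not kept:
        return rows
    for t, p in pred:
        _, mv = kept[nearest_index(times, t)]
        residual = mv - p
        if abs(residual) <= tol:
            label = "ok"
        elif residual > 0:
            label = "under_predicted"  # measurement above the prediction
        else:
            label = "over_predicted"   # measurement below the prediction
        rows.append((t, p, mv, residual, label))
    return rows
```

The residual column is the quantity the claim proposes adding back to the model as a new feature. The labels give a first, coarse version of "classifying the deviation".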
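The way claim 10's four modules feed one another can be shown with a toy closed loop over a one-dimensional parameter. The mapping between modules and code is as follows:

- Prediction output module: a "theory" prior, corrected by the residual of the nearest tested point.
- Experimental parameter output module: picks the best predicted untested candidate.
- Data acquisition module: a lambda standing in for the real experiment.
- Optimization strategy module: records the residual and feeds it back into prediction.

Every internal rule here is a hypothetical simplification chosen only to make the feedback visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ClosedLoop:
    """Toy wiring of the four modules; all internal rules are hypothetical."""
    prior: Callable[[float], float]       # theory-side model before feedback
    experiment: Callable[[float], float]  # stands in for the automated lab
    candidates: List[float]               # discretised parameter space
    residuals: Dict[float, float] = field(default_factory=dict)
    log: List[Tuple[float, float, float]] = field(default_factory=list)

    def predict(self, x: float) -> float:
        """Prediction module: prior plus the nearest tested point's residual."""
        base = self.prior(x)
        if not self.residuals:
            return base
        nearest = min(self.residuals, key=lambda t: abs(t - x))
        return base + self.residuals[nearest]

    def recommend(self) -> Optional[float]:
        """Parameter module: best predicted untested candidate (first on ties)."""
        pool = [x for x in self.candidates if x not in self.residuals]
        return max(pool, key=self.predict) if pool else None

    def run(self, rounds: int) -> List[Tuple[float, float, float]]:
        for _ in range(rounds):
            x = self.recommend()
            if x is None:
                break
            y = self.experiment(x)        # data-acquisition module
            residual = y - self.prior(x)  # strategy module: trace the error
            self.residuals[x] = residual  # feed it back into predict()
            self.log.append((x, y, residual))
        return self.log


loop = ClosedLoop(
    prior=lambda x: -(x - 2.0) ** 2,       # theory expects an optimum at x = 2
    experiment=lambda x: -(x - 3.0) ** 2,  # the toy "real" optimum is at x = 3
    candidates=[0.0, 1.0, 2.0, 3.0, 4.0],
)
history = loop.run(rounds=3)
```

In this toy run the loop tests x = 2, 1, and then 3. The measured residuals pull the recommendation away from the theory's optimum toward the experimental one, which is the theory-experiment feedback the claims describe, in miniature.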

Description

Theory and experiment-based substance iterative exploration method and system

Technical Field

The invention relates to the technical field of experimental scheme generation, and in particular to an iterative substance exploration method and system based on theory and experiment.

Background

In iterative substance exploration, the structure-activity relationship between a substance's structure and its performance is complex, and experimental conditions (temperature, pressure, component ratios, and so on) involve many influencing factors. The idealized environment assumed by theoretical prediction does not match the dynamic disturbances of real experiments. Nor do the data dimensions match: theory supplies computed features, while experiments supply measured data. The prior art therefore has the following shortcomings.

First, theoretical models in the prior art are mostly built on fixed parameters or a single algorithm, and their parameters and simulation logic are not adjusted dynamically using experimentally measured data. The deviation between theoretical predictions and actual experimental performance is therefore large, and experimental schemes cannot be guided accurately. Theory and experiment each produce one-way outputs without effective linkage.

Second, the prior art lacks a closed-loop mechanism of theoretical prediction, experimental verification, data feedback, and model optimization. Experimental process data are not systematically fed back to update the theoretical model. The selection of experimental parameters also does not fully combine the performance trends from theoretical prediction with practical feasibility constraints. Theory thus becomes detached from experimental reality, experiments lack scientific guidance, and blind trial and error increases.
Finally, error analysis and feature mining are performed in isolation in the prior art. Deviations between theoretical predictions and experimental results are not traced systematically, so the key factors behind inaccurate predictions (such as missing features or an unreasonable parameter search range) are hard to locate. Consequently, the model and experimental scheme cannot be optimized iteratively. What is needed is a substance iterative exploration method built on a closed loop of theoretical model, recommendation algorithm, experiment scheduling, and result analysis. Such a method would overcome the disconnect between theory and experiment and the lack of closed-loop iteration, improve prediction accuracy, reduce blind experiments, and shorten the exploration cycle of the material science space.

Disclosure of Invention

In view of these shortcomings, the invention provides an iterative substance exploration method and system based on theory and experiment that resolves the disconnect between theory and experiment and the lack of closed-loop iteration. To achieve this aim, the invention adopts the following technical scheme. The iterative substance exploration method based on theory and experiment comprises the following steps.

A material science simulation model is constructed based on the initial parameters and optimization instructions. The model predicts material properties and structures, outputs a prediction result, and receives an optimization strategy that updates its parameters and simulation logic.

The prediction results are screened and ranked by a multi-objective optimization algorithm under experimental feasibility constraints to generate recommended experimental parameters.
Experimental resources are scheduled, experiments are performed, and experimental process data are collected based on the recommended experimental parameters.

Data alignment, error tracing, and feature mining are performed on the experimental process data and the prediction results to generate the optimization strategy.

Further, the initial parameters include material structure data and experimental condition parameters. The material structure data are converted into feature vectors by a feature engineering module. The experimental condition parameters are standardized and used as supplementary features. The optimization instructions serve as hyperparameter constraints for training the material science simulation model, which is composed of a plurality of base algorithms.

Further, the material science simulation model may operate in a parallel input mode. In that mode, it is constructed from the initial parameters and optimization instructions, predicts material properties and structures, and outputs the prediction result as follows. The feature vectors and supplementary features are preprocessed to generate an input matrix that the model can interpret.
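The preprocessing step described here can be sketched in a few lines: standardize each experimental-condition parameter, then append it to the structure feature vectors. The sketch makes these assumptions:

- Standardization is a z-score with population standard deviation. The text does not name the method.
- A constant column maps to 0.0.
- Structure features arrive as precomputed per-sample vectors, for example fingerprints.
- Condition names such as `temperature_C` are hypothetical.

```python
from statistics import mean, pstdev


def standardize(columns):
    """Z-score each column; a constant column becomes all zeros."""
    out = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return out


def build_input_matrix(structure_features, conditions):
    """Append standardized experimental conditions to structure features.

    structure_features: one feature vector per sample.
    conditions: condition name -> list of raw values, one per sample.
    Returns (rows, condition_names); columns follow sorted condition names.
    """
    names = sorted(conditions)
    std_cols = standardize([conditions[n] for n in names])
    rows = [
        list(feats) + [col[i] for col in std_cols]
        for i, feats in enumerate(structure_features)
    ]
    return rows, names
```

Sorting the condition names fixes the column order, so matrices built in different iterations of the loop remain comparable.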