CN-122003717-A - Modeling of molecular structure generation
Abstract
Methods, computer program products, and computer systems for generative modeling of molecular structures for chemical applications. The method comprises providing labeled training data for training a generative model on a defined feature space, wherein the labeled training data comprises a representation of molecular structures and attribute values for each molecular structure, and wherein the generative model outputs generated candidate molecular structures having target attributes. The method includes receiving an evaluation of the generated candidate molecular structure output from the generative model, the evaluation providing a feature representation of the candidate with an evaluation label. The method generates or updates decision boundary rules based on the evaluation, and applies the decision boundary rules to update the labeled training data.
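The feedback loop summarized in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `LabeledMolecule`, `Evaluation`, `featurize`, `learn_rules`, and `apply_rules` are hypothetical, the feature space is a toy one built from SMILES substring checks, and the rule-learning heuristic (reject any feature seen only in rejected candidates) is an assumption chosen for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LabeledMolecule:
    smiles: str             # representation of the molecular structure
    attribute: float        # attribute value for that structure
    accepted: bool = True   # label that decision boundary rules may revise


@dataclass
class Evaluation:
    features: Dict[str, bool]   # feature representation of a candidate
    label: str                  # evaluation label: "accept" or "reject"


# A rule is a predicate over a feature representation; True means "reject".
Rule = Callable[[Dict[str, bool]], bool]


def featurize(smiles: str) -> Dict[str, bool]:
    """Toy feature space (assumption): substring checks, not real chemistry."""
    return {"has_nitro": "N(=O)=O" in smiles, "has_ring": "1" in smiles}


def learn_rules(evaluations: List[Evaluation]) -> List[Rule]:
    """Create one rule per feature that occurs only in rejected candidates."""
    rules: List[Rule] = []
    for name in sorted({n for e in evaluations for n in e.features}):
        in_rejected = any(e.features.get(name, False)
                          for e in evaluations if e.label == "reject")
        in_accepted = any(e.features.get(name, False)
                          for e in evaluations if e.label == "accept")
        if in_rejected and not in_accepted:
            # Bind the feature name now so each rule checks its own feature.
            rules.append(lambda f, name=name: bool(f.get(name, False)))
    return rules


def apply_rules(data: List[LabeledMolecule], rules: List[Rule]) -> List[LabeledMolecule]:
    """Return relabeled copies of the training data; the input is not mutated."""
    return [
        LabeledMolecule(m.smiles, m.attribute,
                        accepted=not any(rule(featurize(m.smiles)) for rule in rules))
        for m in data
    ]


training = [
    LabeledMolecule("CCO", 0.3),
    LabeledMolecule("CCN(=O)=O", 0.9),
    LabeledMolecule("c1ccccc1O", 0.5),
]
evaluations = [
    Evaluation(featurize("CN(=O)=O"), "reject"),
    Evaluation(featurize("c1ccccc1"), "accept"),
]
rules = learn_rules(evaluations)
updated = apply_rules(training, rules)
print([(m.smiles, m.accepted) for m in updated])
```

In this toy run, the single learned rule rejects nitro-containing structures, so only the second training molecule is relabeled as not accepted. A real system would replace `featurize` with a proper cheminformatics feature extractor and would retrain the generative model on the updated data.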
Inventors
- JENNINGS ED MICHAEL
- KISHIMOTO AKIHIRO
- O. ALCA
- D. Zubariv
- TAKEDA MASASHI
- E. Daley
Assignees
- International Business Machines Corporation
Dates
- Publication Date
- 20260508
- Application Date
- 20241003
- Priority Date
- 20231011
Claims (20)
- 1. A computer-implemented method for generative modeling of molecular structures for chemical applications, the method comprising: providing labeled training data for training a generative model on a defined feature space, wherein the labeled training data comprises a representation of molecular structures and attribute values for each molecular structure, and wherein the generative model outputs generated candidate molecular structures having target attributes; receiving an evaluation of the generated candidate molecular structure output from the generative model, wherein the evaluation provides a feature representation of the candidate with an evaluation label; generating or updating decision boundary rules based on the evaluation; and applying the decision boundary rules to update the labeled training data.
- 2. The method according to claim 1, comprising: modifying a generation algorithm of the generative model with structural constraints representing the decision boundary rules.
- 3. The method of claim 1 or claim 2, comprising: passing features of the candidate feature representations with evaluation labels to the generative model as user-specified features to update the generative model.
- 4. The method according to any one of claims 1 to 3, comprising: modifying the training data to represent learned decision boundaries using chemical similarity metrics in the feature space.
- 5. The method of any of the preceding claims, wherein generating a decision boundary rule comprises preparing feature values for conditions used to construct the decision boundary rule, comprising: preparing class functions for each class in an ontology graph, wherein the class functions check whether the molecule in question belongs to a class; and preparing a list of elements, wherein each element consists of a molecule, a label, and a set of values calculated by the class functions.
- 6. The method of any preceding claim, wherein receiving an evaluation of the generated molecular structure output comprises receiving, from a subject matter expert, an evaluation using an ontology feature representation to provide an evaluation label of the candidate representation.
- 7. The method of any one of claims 1 to 5, wherein receiving an evaluation of the generated molecular structure output comprises measuring a predicted attribute value relative to a test attribute value to provide an evaluation label of a candidate representation in the form of a predicted attribute drift label.
- 8. The method of claim 7, wherein the test attribute values are obtained from real or simulated experimental data.
- 9. The method of any preceding claim, wherein receiving an evaluation of the generated molecular structure output comprises receiving an evaluation made using a previously generated decision boundary rule.
- 10. A system for generative modeling of molecular structures for chemical applications, comprising: a processor and a memory configured to provide computer program instructions to the processor to perform the functions of: a training input component for providing labeled training data for training a generative model on a defined feature space, wherein the labeled training data comprises a representation of molecular structures and attribute values for each molecular structure, and wherein the generative model outputs generated candidate molecular structures having target attributes; an evaluation component for receiving an evaluation of the generated candidate molecular structure output from the generative model, wherein the evaluation provides a feature representation of the candidate with an evaluation label; a decision boundary rule component for generating or updating decision boundary rules based on the evaluation; and a training data updating component for applying the decision boundary rules to update the labeled training data.
- 11. The system of claim 10, comprising: a model constraint input component for modifying a generation algorithm of the generative model with structural constraints representing the decision boundary rules.
- 12. The system of claim 10 or claim 11, comprising: a model feature updating component for passing features of the candidate feature representation with the evaluation label to the generative model as user-specified features to update the generative model.
- 13. The system of any of claims 10 to 12, wherein the training data updating component is configured to modify the training data to represent learned decision boundaries using chemical similarity metrics in the feature space.
- 14. The system of any of claims 10 to 13, wherein the decision boundary rule component comprises a feature value preparation component for preparing feature values for conditions used to construct the decision boundary rule, comprising: preparing class functions for each class in an ontology graph, wherein the class functions check whether the molecule in question belongs to a class; and preparing a list of elements, wherein each element consists of a molecule, a label, and a set of values calculated by the class functions.
- 15. The system of any of claims 10 to 14, wherein the evaluation component receives, from a subject matter expert, an evaluation of the generated molecular structure output using an ontology feature representation to provide an evaluation label of the candidate representation.
- 16. The system of any of claims 10 to 14, wherein the evaluation component receives an evaluation of the generated molecular structure output in the form of a predicted attribute drift label obtained by measuring a predicted attribute value relative to a test attribute value.
- 17. The system of any of claims 10 to 16, wherein the evaluation component receives an evaluation of the generated molecular structure output made using previously generated decision boundary rules when available.
- 18. The system of any of claims 10 to 17, comprising a user interface for interaction between a user and the modeling system to provide an evaluation label.
- 19. The system of any one of claims 10 to 18, wherein the system is incorporated into a molecular discovery accelerator platform comprising a generative model.
- 20. A computer program stored on a computer readable medium and loadable into the internal memory of a digital computer, comprising software code portions, for performing the method steps of any of claims 1 to 9, when said program is run on a computer.
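The feature-value preparation of claims 5 and 14 and the drift evaluation of claims 7 and 16 can be sketched as follows. This is an illustrative reading only: the ontology classes, the SMILES substring tests standing in for class functions, the interpretation of each element's first member as the molecule itself, and the absolute-difference `tolerance` in `drift_label` are all assumptions not specified in the claims.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple

# Hypothetical ontology classes, each paired with a class function that checks
# whether a molecule (a SMILES string) belongs to the class. Substring tests
# are a toy stand-in for real substructure matching.
CLASS_FUNCTIONS: Dict[str, Callable[[str], bool]] = {
    "aromatic": lambda s: "c1" in s,
    "halide": lambda s: any(x in s for x in ("Cl", "Br", "F", "I")),
    "nitrile": lambda s: "C#N" in s,
}


class Element(NamedTuple):
    molecule: str               # the molecule in question
    label: str                  # its evaluation label
    values: Tuple[bool, ...]    # one value per class function, sorted by class name


def prepare_elements(evaluated: List[Tuple[str, str]]) -> List[Element]:
    """Build the list of (molecule, label, class-function values) elements."""
    names = sorted(CLASS_FUNCTIONS)
    return [
        Element(smiles, label, tuple(CLASS_FUNCTIONS[n](smiles) for n in names))
        for smiles, label in evaluated
    ]


def drift_label(predicted: float, tested: float, tolerance: float = 0.1) -> str:
    """Assumed drift test: flag a candidate whose predicted attribute value
    differs from the test value by more than `tolerance` (absolute)."""
    return "drift" if abs(predicted - tested) > tolerance else "no_drift"


elements = prepare_elements([("CC#N", "accept"), ("Clc1ccccc1", "reject")])
for element in elements:
    print(element)
print(drift_label(1.0, 1.5))
```

The resulting element list is the kind of table from which conditions for a decision boundary rule could be derived, for example by fitting a decision tree over the class-function values. The test attribute value passed to `drift_label` would come from real or simulated experimental data, as stated in claim 8.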
Description
Modeling of molecular structure generation
Technical Field
The present invention relates to generative modeling of molecular structures, and more particularly to generative modeling with learned decision boundaries for chemical applications.
Background
The discovery of new molecules has become an important topic in many industrial fields involving chemistry, such as the pharmaceutical, chemical, food, and materials industries. Because of the large number of molecules that may exist in the real world, effectively discovering new molecules that meet specific targets (e.g., nitrogen fixation, specific toxicity to semiconductors) is a great challenge. In fact, trial-and-error experiments, in which possible new structures are designed, synthesized, and evaluated for effectiveness, are not sustainable in view of the manpower required.
Artificial intelligence-based material discovery platforms aim to accelerate material discovery by providing chemists with artificial intelligence techniques embedded into workflows such as molecular structure generation. These methods have a critical expert-in-the-loop element for evaluating the quality of the generated molecular structures. To date, no method has generated an explicit representation of this human knowledge in the form of decision boundaries.
The acceptance rate of candidates produced by generative models is currently low. Given that the expected number of outputs for future activities is about 10^6, the experiments needed to verify these outputs, together with the manual input required for each output, would be very expensive.
Disclosure of Invention
According to one aspect of the invention, a computer-implemented method for generative modeling of molecular structures for chemical applications is provided, the method comprising: providing labeled training data for training a generative model on a defined feature space, wherein the labeled training data comprises a representation of molecular structures and attribute values for each molecular structure, and wherein the generative model outputs generated candidate molecular structures having target attributes; receiving an evaluation of the generated candidate molecular structure output from the generative model, wherein the evaluation provides a feature representation of the candidate with an evaluation label; generating or updating decision boundary rules based on the evaluation; and applying the decision boundary rules to update the labeled training data.
According to another aspect of the invention, there is provided a system for generative modeling of molecular structures for chemical applications, comprising a processor and a memory configured to provide computer program instructions to the processor to perform the functions of: a training input component for providing labeled training data for training a generative model on a defined feature space, wherein the labeled training data comprises a representation of molecular structures and attribute values for each molecular structure, and wherein the generative model outputs generated candidate molecular structures having target attributes; an evaluation component for receiving an evaluation of the generated candidate molecular structure output from the generative model, wherein the evaluation provides a feature representation of the candidate with an evaluation label; a decision boundary rule component for generating or updating decision boundary rules based on the evaluation; and a training data updating component for applying the decision boundary rules to update the labeled training data.
According to a further aspect of the invention, there is provided a computer program product for generative modeling of molecular structures for chemical applications, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions being executable by a processor to cause the processor to: provide labeled training data for training a generative model on a defined feature space, wherein the labeled training data comprises a representation of molecular structures and attribute values for each molecular structure, and wherein the generative model outputs generated candidate molecular structures having target attributes; receive an evaluation of the generated candidate molecular structure output from the generative model, wherein the evaluation provides a feature representation of the candidate with an evaluation label; generate or update decision boundary rules based on the evaluation; and apply the decision boundary rules to update the labeled training data.
The computer readable storage medium may be a non-transitory computer readable storage medium, and the computer readable program code may be executed by a processing circuit.
Drawings
Embodiments of the present invention will now be described, by way of example only, with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart