EP-4742251-A1 - DEVICE AND METHOD FOR MEASURING RELIABILITY OF MOLECULAR STRUCTURE PREDICTION MODEL
Abstract
A system, a computer program, a device, and a method for measuring confidence of a molecular structure prediction model. The method includes obtaining a first molecular structure image, obtaining a first molecular structure graph using the molecular structure prediction model, performing image rendering on the first molecular structure image based on the first molecular structure graph, and determining confidence of the first molecular structure graph based on the image rendering result and the first molecular structure graph.
Inventors
- KIM, Jiye
- JO, Yeonsik
- LEE, Soonyoung
Assignees
- LG Management Development Institute Co., Ltd.
Dates
- Publication Date
- 20260513
- Application Date
- 20250124
Claims (15)
- A system for measuring the confidence of a molecular structure prediction model, comprising: a memory storing one or more instructions; and at least one processor configured to execute the one or more instructions stored in the memory, wherein: the at least one processor, by executing the one or more instructions, obtains a first molecular structure image; obtains a first molecular structure graph determined using the molecular structure prediction model; performs image rendering on the first molecular structure image based on the first molecular structure graph; and determines the confidence of the first molecular structure graph based on the image rendering result and the first molecular structure graph.
- The system of claim 1, wherein: the at least one processor identifies at least one of a first component and a second component based on the first molecular structure graph; identifies a first portion corresponding to the first component in the first molecular structure image; identifies a second portion corresponding to the second component in the first molecular structure image; and performs the image rendering by distinguishing the first portion and the second portion using different markings; and each of the first component and the second component includes one of a first atom, a second atom, a first bond, and a second bond.
- The system of claim 1, wherein the molecular structure prediction model includes a first learning model trained to extract a chemical table file graph with a molecular structural formula image as input.
- The system of claim 1, wherein the at least one processor outputs the confidence using a second learning model with the image rendering result and the first molecular structure graph as input.
- The system of claim 4, wherein the second learning model includes: an image backbone model configured to extract a feature of the image rendering result; a graph backbone model configured to extract a feature of the first molecular structure graph; a feature concatenation unit configured to concatenate the feature of the image rendering result and the feature of the first molecular structure graph; and a linear layer model configured to determine the confidence with an output of the feature concatenation unit as input.
- The system of claim 4, wherein the second learning model is trained to output a first value when the image rendering result matches the first molecular structure graph and to output a second value when the image rendering result does not match the first molecular structure graph.
- The system of claim 1, wherein the graph of the molecular structure with the confidence equal to or greater than a predetermined level is stored in a database.
- A method for measuring the confidence of a molecular structure prediction model, performed by at least one processor, comprising: obtaining a first molecular structure image; obtaining a first molecular structure graph using the molecular structure prediction model; performing image rendering on the first molecular structure image based on the first molecular structure graph; and determining the confidence of the first molecular structure graph based on the image rendering result and the first molecular structure graph.
- The method of claim 8, wherein: the performing of the image rendering on the first molecular structure image includes: identifying at least one of a first component and a second component based on the first molecular structure graph; identifying a first portion corresponding to the first component in the first molecular structure image; identifying a second portion corresponding to the second component in the first molecular structure image; and performing the image rendering by distinguishing the first portion and the second portion using different markings; and each of the first component and the second component includes one of a first atom, a second atom, a first bond, and a second bond.
- The method of claim 8, wherein the molecular structure prediction model includes a first learning model trained to extract a chemical table file graph with a molecular structural formula image as input.
- The method of claim 8, wherein the determining of the confidence of the first molecular structure graph includes outputting the confidence of the first molecular structure graph using a second learning model with the image rendering result and the first molecular structure graph as input.
- The method of claim 11, wherein the second learning model includes: an image backbone model configured to extract a feature of the image rendering result; a graph backbone model configured to extract a feature of the first molecular structure graph; a feature concatenation unit configured to concatenate the feature of the image rendering result and the feature of the first molecular structure graph; and a linear layer model configured to determine the confidence with an output of the feature concatenation unit as input.
- The method of claim 11, wherein the second learning model is trained to output a first value when the image rendering result matches the first molecular structure graph and to output a second value when the image rendering result does not match the first molecular structure graph.
- The method of claim 8, wherein the graph of the molecular structure with the confidence equal to or greater than a predetermined value is stored in a database.
- A program stored on a computer-readable recording medium to execute the method of any one of claims 8 to 14 on a computer.
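The second learning model recited in claims 5 and 12, together with the storage condition of claims 7 and 14, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the claims: the two "backbones" are hand-written feature functions rather than trained networks, the rendering result is represented as rows of characters in which `A` marks atoms and `B` marks bonds, the sigmoid that maps the linear layer output to a score between 0 and 1 is a choice of this sketch, and the names `image_features`, `graph_features`, `confidence`, and `keep_for_database` are hypothetical.

```python
import math


def image_features(rendered_rows):
    """Stand-in for the image backbone: share of cells carrying each marking."""
    cells = "".join(rendered_rows)
    total = max(len(cells), 1)
    return [cells.count("A") / total, cells.count("B") / total]


def graph_features(num_atoms, bonds):
    """Stand-in for the graph backbone: atom count, bond count, mean degree."""
    mean_degree = 2 * len(bonds) / num_atoms if num_atoms else 0.0
    return [float(num_atoms), float(len(bonds)), mean_degree]


def confidence(rendered_rows, num_atoms, bonds, weights, bias):
    """Concatenate both feature vectors, then apply one linear layer."""
    # Feature concatenation unit: image features followed by graph features.
    features = image_features(rendered_rows) + graph_features(num_atoms, bonds)
    if len(weights) != len(features):
        raise ValueError("weights must match the concatenated feature length")
    # Linear layer: weighted sum plus bias.
    logit = sum(w * f for w, f in zip(weights, features)) + bias
    # Assumption of this sketch: a numerically stable sigmoid maps to (0, 1).
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def keep_for_database(score, threshold=0.9):
    """Store only graphs whose confidence reaches the predetermined level."""
    return score >= threshold


# Untrained (all-zero) weights give a neutral score of 0.5.
rows = [".......", ".ABBBA.", "......."]
score = confidence(rows, num_atoms=2, bonds=[(0, 1)], weights=[0.0] * 5, bias=0.0)
print(score, keep_for_database(score))
```

In a trained model the weights would be fitted so that the output approaches the first value when the rendering result matches the graph and the second value when it does not. The threshold of 0.9 is an arbitrary placeholder for the predetermined level.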
Description
[Technical Field]
Embodiments of the invention relate generally to a device and a method for measuring the confidence of a molecular structure prediction model and, more specifically, to providing, for convenience, the confidence of the result when a molecular structure prediction model outputs a predicted molecular structure.
[Background Art]
A structural formula is a graphical representation of a chemical structure or a molecular structure and may show how atoms are arranged in three-dimensional space. The structural formula may indicate the chemical bonds of a molecule explicitly or implicitly. In particular, unlike a molecular formula, which uses a limited number of symbols and can provide only a limited description, the structural formula may provide geometric information about the molecular structure. For example, it can distinguish isomers, which have the same molecular formula but different atomic structures or arrangements.
In documents such as papers and patents, structural formulas are often provided as images. However, unlike text, images are difficult to search, which makes it difficult to find documents that include a given structural formula. Accordingly, various methods for searching images such as structural formulas are being developed.
Models that extract structural formulas by analyzing images are mainly used to build academic databases, and incorrect data that enters such a database through erroneous predictions is a critical drawback for research. Accordingly, there is a need for a method that provides confidence information about predicted structural formulas, so that it can be determined which predicted structural formulas are reliable enough to be stored in a database.
[Disclosure]
[Technical Problem]
Embodiments of the invention provide a method and a device in which a model that predicts a molecular structure from an image outputs a confidence score together with the predicted molecular structure.
[Technical Solution]
One embodiment of the present disclosure may provide a device and a method for measuring the confidence of a molecular structure prediction model.
According to one or more embodiments of the invention, a system for measuring the confidence of the molecular structure prediction model is provided. The system includes a memory storing one or more instructions, and at least one processor configured to execute the one or more instructions stored in the memory. The at least one processor, by executing the one or more instructions, may obtain a first molecular structure image, obtain a first molecular structure graph determined using the molecular structure prediction model, perform image rendering on the first molecular structure image based on the first molecular structure graph, and determine the confidence of the first molecular structure graph based on the image rendering result and the first molecular structure graph.
In one embodiment, the at least one processor may identify at least one of a first component and a second component based on the first molecular structure graph, identify a first portion corresponding to the first component in the first molecular structure image, identify a second portion corresponding to the second component in the first molecular structure image, and perform the image rendering by distinguishing the first portion and the second portion using different markings. Each of the first component and the second component may include one of a first atom, a second atom, a first bond, and a second bond.
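The rendering step described above, in which portions of the image that correspond to graph components are distinguished with different markings, can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the image is a small character grid, the predicted graph is assumed to carry a pixel coordinate for every atom, bonds are marked along a straight line between their two atoms, and the names `Atom`, `Bond`, and `render_overlay` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    symbol: str
    x: int  # column in the image (assumed to be predicted with the graph)
    y: int  # row in the image


@dataclass(frozen=True)
class Bond:
    begin: int  # index of the first atom in the atom list
    end: int    # index of the second atom in the atom list
    order: int = 1


def render_overlay(image, atoms, bonds, atom_mark="A", bond_mark="B"):
    """Return a copy of `image` with bond and atom portions marked differently."""
    canvas = [list(row) for row in image]
    height, width = len(canvas), len(canvas[0])

    def put(x, y, mark):
        # Ignore coordinates that fall outside the image.
        if 0 <= y < height and 0 <= x < width:
            canvas[y][x] = mark

    # Mark bonds first so that atom markings stay visible at the end points.
    for bond in bonds:
        a, b = atoms[bond.begin], atoms[bond.end]
        steps = max(abs(b.x - a.x), abs(b.y - a.y), 1)
        for i in range(steps + 1):
            t = i / steps
            put(round(a.x + t * (b.x - a.x)), round(a.y + t * (b.y - a.y)), bond_mark)
    for atom in atoms:
        put(atom.x, atom.y, atom_mark)
    return ["".join(row) for row in canvas]


# Two atoms joined by one bond on a blank 7 x 3 image.
blank = ["......."] * 3
overlay = render_overlay(blank, [Atom("C", 1, 1), Atom("O", 5, 1)], [Bond(0, 1)])
print("\n".join(overlay))
# .......
# .ABBBA.
# .......
```

If the predicted graph is wrong, for example a bond that does not exist in the drawing, the markings land on empty regions of the image, which is the kind of mismatch a downstream confidence model can pick up.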
In one embodiment, the molecular structure prediction model may include a first learning model, wherein the first learning model may be trained to extract a chemical table file graph with a molecular structural formula image as input.
In one embodiment, the at least one processor may output the confidence using a second learning model with the image rendering result and the first molecular structure graph as input.
In one embodiment, the second learning model may include an image backbone model configured to extract a feature of the image rendering result, a graph backbone model configured to extract a feature of the first molecular structure graph, a feature concatenation unit configured to concatenate the feature of the image rendering result and the feature of the first molecular structure graph, and a linear layer model configured to determine the confidence with an output of the feature concatenation unit as input.
In one embodiment, the second learning model may be trained to output a first value when the image rendering result matches the first molecular structure graph and to output a second value when the image rendering result does not match the first molecular structure graph.
In one embodiment, the graph of the molecular structure with the confidence equal to or greater than a predetermined level may be stored in a database.
According to yet another embodiment of the in