
CN-116071554-B - Chemical structure identification method and system


Abstract

The invention relates to a chemical structure identification method and system. The method comprises: acquiring a chemical structure image; determining a target value according to a preset image calculation rule and the chemical structure image; determining model parameters according to a preset parameter comparison table and the target value; determining a preprocessed image according to an image preprocessing model, the chemical structure image and the model parameters; determining a target image according to an image recognition model and the preprocessed image; and determining target chemical molecule data according to a chemical molecule construction rule and the target image. The invention makes it possible to convert chemical molecule images in electronic documents into a machine-readable format.

Inventors

  • XU YOUJUN
  • LI HEMIN
  • ZHANG JIANHANG
  • PEI JIANFENG
  • YANG HUA
  • ZHOU JIAHAN
  • HAN NINGSHENG
  • ZHU JINTAO

Assignees

  • 北京英飞智药科技有限公司
  • 苏州四季唤鱼生物科技有限公司

Dates

Publication Date
20260512
Application Date
20230221

Claims (5)

  1. A chemical structure identification method, comprising: acquiring a chemical structure image; determining a target value according to a preset image calculation rule and the chemical structure image, wherein the chemical structure image comprises a plurality of pixel points, each pixel point corresponds to a pixel value, and determining the target value comprises: acquiring, in each row and each column of pixel points, the target pixel points whose pixel values equal a preset value; determining, from the target pixel points, the maximum number of consecutive target pixel points in each row and each column; removing outliers from the maximum numbers according to an outlier removal rule to determine a target number sequence; and calculating the median of the target number sequence, the median being the target value; determining model parameters according to a preset parameter comparison table and the target value; determining a preprocessed image according to an image preprocessing model, the chemical structure image and the model parameters, which comprises: obtaining an image erosion model and an image dilation model; determining a target erosion model according to the model parameters and the image erosion model; determining a target dilation model according to the model parameters and the image dilation model; determining an eroded image according to the chemical structure image and the target erosion model; and determining the preprocessed image according to the eroded image and the target dilation model, wherein the image dilation model and the image erosion model each comprise a kernel-size parameter, different values of the kernel-size parameter produce different image processing effects, the model parameters corresponding to the target value are retrieved, and the kernel-size parameters of the image dilation model and the image erosion model are set to the values given by the model parameters; determining a target image according to an image recognition model and the preprocessed image, wherein the target image comprises element labels, chemical bond labels and a hypertext image; and determining target chemical molecule data according to chemical molecule construction rules and the target image.
  2. The method of claim 1, wherein acquiring the chemical structure image comprises: acquiring an electronic document; determining a document image according to a picture conversion rule and the electronic document; and segmenting the document image according to a semantic segmentation model to determine the chemical structure image.
  3. The method of claim 1, wherein determining the target chemical molecule data according to the chemical molecule construction rules and the target image comprises: identifying the target image according to the image recognition model to determine the hypertext image; determining a hypertext character string according to an OCR character recognition model and the hypertext image; and combining the element labels, the chemical bond labels and the hypertext character string according to a preset combination rule to form the target chemical molecule data.
  4. The method of claim 2, further comprising, prior to acquiring the chemical structure image, training the semantic segmentation model by: acquiring a training data set, wherein the training data set comprises a plurality of training images, and the training images comprise text labels and chemical molecule position labels; and inputting the training data set into a preset segmentation model to determine the semantic segmentation model.
  5. A chemical structure identification system, comprising: an image acquisition module (201) for acquiring a chemical structure image; an image calculation module (202) for determining a target value according to a preset image calculation rule and the chemical structure image, wherein the chemical structure image comprises a plurality of pixel points, each pixel point corresponds to a pixel value, and determining the target value comprises: acquiring, in each row and each column of pixel points, the target pixel points whose pixel values equal a preset value; determining, from the target pixel points, the maximum number of consecutive target pixel points in each row and each column; removing outliers from the maximum numbers according to an outlier removal rule to determine a target number sequence; and calculating the median of the target number sequence, the median being the target value; a parameter determining module (203) for determining model parameters according to a preset parameter comparison table and the target value; an image preprocessing module (204) for determining a preprocessed image according to an image preprocessing model, the chemical structure image and the model parameters, which comprises: obtaining an image erosion model and an image dilation model; determining a target erosion model according to the model parameters and the image erosion model; determining a target dilation model according to the model parameters and the image dilation model; determining an eroded image according to the chemical structure image and the target erosion model; and determining the preprocessed image according to the eroded image and the target dilation model, wherein the image dilation model and the image erosion model each comprise a kernel-size parameter, different values of the kernel-size parameter produce different image processing effects, the model parameters corresponding to the target value are retrieved, and the kernel-size parameters of the image dilation model and the image erosion model are set to the values given by the model parameters; an image recognition module (205) for determining a target image according to an image recognition model and the preprocessed image, the target image comprising an element label and a chemical bond label; and a molecule construction module (206) for determining target chemical molecule data according to chemical molecule construction rules and the target image.
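
The target-value calculation recited in claims 1 and 5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claims do not say which outlier removal rule or preset pixel value is used, so the Tukey-fence rule in `drop_outliers`, the default `target=0` (black ink), the decision to skip rows and columns that contain no target pixels, and all function names are assumptions made for illustration.

```python
from statistics import median, quantiles


def longest_run(values, target):
    """Length of the longest run of consecutive `target` entries."""
    best = current = 0
    for v in values:
        current = current + 1 if v == target else 0
        if current > best:
            best = current
    return best


def drop_outliers(counts, k=1.5):
    """ASSUMED outlier rule (Tukey fences); the claims leave the rule open."""
    if len(counts) < 4:
        return list(counts)  # too few values to estimate quartiles
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [c for c in counts if low <= c <= high]


def target_value(image, target=0):
    """Median of the per-row and per-column longest runs, outliers removed.

    `image` is a list of rows of pixel values; `target` is the preset
    pixel value (0 is an assumption standing for black ink).
    """
    row_runs = [longest_run(row, target) for row in image]
    col_runs = [longest_run(col, target) for col in zip(*image)]
    # Assumption: rows/columns without any target pixel are ignored.
    counts = [c for c in row_runs + col_runs if c > 0]
    if not counts:
        raise ValueError("image contains no target pixels")
    return median(drop_outliers(counts))


# Usage: an 8x8 white image with a vertical stroke two pixels wide.
stroke = [[0 if x in (3, 4) else 255 for x in range(8)] for _ in range(8)]
print(target_value(stroke))
```

In this toy image every row crosses the stroke with a run of 2, while the two stroke columns have runs of 8; the assumed outlier rule discards the two long runs, so the median reflects the stroke width rather than its length.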

Description

Chemical structure identification method and system

Technical Field

The present application relates to the field of image processing technologies, and in particular to a chemical structure identification method and system.

Background

At present, the fields of chemistry and drug discovery have a large body of literature, such as journals and patents, much of which contains numerous pictures of chemical molecules. When researchers retrieve related material, they have to read paper copies or electronic documents in order to analyze and study the relevant chemical molecules. However, current literature management systems cannot convert the chemical molecule images in electronic documents into a machine-readable format for storage, so electronic documents cannot be screened and searched by chemical molecule, which reduces the efficiency of scientific research work to a certain extent.

Disclosure of Invention

In order to solve the problem that chemical molecule images in electronic documents cannot be converted into a machine-readable format for storage, the application provides a chemical structure identification method and system.

In a first aspect of the application, a chemical structure identification method is provided.
The method comprises the following steps: acquiring a chemical structure image; determining a target value according to a preset image calculation rule and the chemical structure image; determining model parameters according to a preset parameter comparison table and the target value; determining a preprocessed image according to an image preprocessing model, the chemical structure image and the model parameters; determining a target image according to an image recognition model and the preprocessed image, wherein the target image comprises an element label, a chemical bond label and a hypertext image; and determining target chemical molecule data according to chemical molecule construction rules and the target image.

In this technical scheme, the chemical structure image is analyzed according to the preset image calculation rule to determine the target value; the model parameters corresponding to the target value are determined from the parameter comparison table; the chemical structure image is preprocessed according to the image preprocessing model and the model parameters to determine the preprocessed image; the target image is determined according to the image recognition model and the preprocessed image; and the target chemical molecule data is then determined according to the chemical molecule construction rules and the target image. The chemical structure image is thereby converted into target chemical molecule data in a computer-readable format, which solves the problem that chemical molecule images in electronic documents cannot be converted into a machine-readable format for storage, shortens the time researchers spend searching by chemical molecule, and improves the efficiency of scientific research work to a certain extent.
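
The parameter lookup and preprocessing steps might be sketched as follows. The claims state that preprocessing applies erosion and then dilation, with the kernel size of both operations set from the model parameters selected by the target value. Everything concrete below is an assumption for illustration: the contents of `PARAMETER_TABLE`, the `DEFAULT_KERNEL` fallback, the use of odd square kernels, the convention that 1 marks a foreground pixel, and the pure-Python `erode` and `dilate` helpers, which stand in for whatever image library a real implementation would use.

```python
# HYPOTHETICAL parameter comparison table: (upper bound on target value,
# kernel size). The patent does not publish the actual table.
PARAMETER_TABLE = [(2, 1), (5, 3)]
DEFAULT_KERNEL = 5  # assumed fallback for larger target values


def kernel_size_for(target_value):
    """Look up the kernel size for a target value in the assumed table."""
    for upper_bound, kernel in PARAMETER_TABLE:
        if target_value <= upper_bound:
            return kernel
    return DEFAULT_KERNEL


def _morph(image, k, combine):
    """Apply `combine` (all or any) over each k x k neighbourhood.

    `image` is a list of rows of 0/1 values, k is assumed odd, and
    neighbours outside the image are ignored.
    """
    h, w, r = len(image), len(image[0]), k // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [image[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            row.append(1 if combine(window) else 0)
        out.append(row)
    return out


def erode(image, k):
    """Keep a pixel only if its whole neighbourhood is foreground."""
    return _morph(image, k, all)


def dilate(image, k):
    """Set a pixel if any pixel in its neighbourhood is foreground."""
    return _morph(image, k, any)


def preprocess(image, target_value):
    """Erosion followed by dilation with the looked-up kernel size."""
    k = kernel_size_for(target_value)
    return dilate(erode(image, k), k)


# Usage: a 3x3 block of foreground plus one isolated speck at the top right.
noisy = [
    [0, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
cleaned = preprocess(noisy, target_value=3)
for row in cleaned:
    print(row)
```

With the assumed table, a target value of 3 selects a 3x3 kernel: erosion removes the speck and shrinks the block to its centre pixel, and dilation restores the block. Tying the kernel size to the measured target value is what lets the same two operations adapt to images drawn with different line widths.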
In one possible implementation, acquiring the chemical structure image includes: acquiring an electronic document; determining a document image according to a picture conversion rule and the electronic document; and segmenting the document image according to a semantic segmentation model to determine the chemical structure image.

In one possible implementation, determining the target value according to the preset image calculation rule and the chemical structure image includes the following, where the chemical structure image comprises a plurality of pixel points and each pixel point corresponds to a pixel value: acquiring, in each row and each column of pixel points, the target pixel points whose pixel values equal a preset value; determining, from the target pixel points, the maximum number of consecutive target pixel points in each row and each column; removing outliers from the maximum numbers according to an outlier removal rule to determine a target number sequence; and calculating the median of the target number sequence, the median being the target value.

In one possible implementation, determining the preprocessed image according to the image preprocessing model, the chemical structure image and the model parameters includes: acquiring an image erosion model and an image dilation model; determining a target erosion model according to the model parameters and the image erosion model; determining a target dilation model according to the model parameters and the image dilation model; determining an eroded image according to the chemical structure image and the target erosion model; and determining a preprocessed image ac