CN-121983282-A - Method and system for discovering diabetes specific potential drugs based on interpretable semantic reasoning

Abstract

The invention provides a method and a system for discovering diabetes-specific potential drugs based on interpretable semantic reasoning, and relates to the technical field of semantic reasoning for drug discovery. The method comprises: extracting entity relations from clinical guidelines, electronic medical records and medical literature; constructing a diabetes-specific semantic knowledge graph comprising entities such as diseases, complications, drugs, targets and pathways; establishing a disease course stage label set and a disease course sequence template library; assigning disease course stage labels to the relevant entities to form a labeled knowledge graph; generating multi-hop candidate reasoning paths on the graph with the diabetes disease entity as the endpoint; calculating a path contribution degree from disease course sequence consistency and evidence fields; and aggregating the consistent paths of the same drug to obtain a candidate score. Candidate drugs are thereby ranked and output together with an interpretable reasoning chain annotated with disease course stage labels, which improves the clinical consistency and interpretability of potential drug discovery results.
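
The graph-construction half of the abstract can be pictured with a minimal sketch. Everything named here is an assumption made for illustration: the entity names (`drug_A`, `target_T`, `pathway_P`), the relation names, the two stage labels, and the helpers `merge_triplets` and `label_stages` do not come from the patent. The sketch merges duplicate triplets while keeping their evidence sources, then gives each target or pathway entity the stage label of the disease or complication entity it is associated with.

```python
from collections import defaultdict

# Toy extraction output (hypothetical): (head, relation, tail, evidence source).
RAW = [
    ("drug_A", "acts_on", "target_T", "literature"),
    ("drug_A", "acts_on", "target_T", "guideline"),  # same triplet, second source
    ("target_T", "associated_with", "diabetes", "literature"),
    ("pathway_P", "associated_with", "nephropathy", "emr"),
    ("diabetes", "evolves_to", "nephropathy", "guideline"),
]

# Hypothetical entity types and seed stage labels for disease/complication entities.
ENTITY_TYPE = {
    "drug_A": "drug",
    "target_T": "target",
    "pathway_P": "pathway",
    "diabetes": "disease",
    "nephropathy": "complication",
}
SEED_STAGE = {"diabetes": "pathogenesis", "nephropathy": "complication"}


def merge_triplets(raw):
    """De-duplicate triplets, keeping the set of evidence sources for each one."""
    evidence = defaultdict(set)
    for head, relation, tail, source in raw:
        evidence[(head, relation, tail)].add(source)
    return dict(evidence)


def label_stages(triplets, entity_type, seed_stage):
    """Copy a seed entity's stage label to directly associated targets/pathways."""
    stage = dict(seed_stage)
    for head, _relation, tail in triplets:
        for labeled, other in ((head, tail), (tail, head)):
            if (
                labeled in seed_stage
                and other not in stage
                and entity_type.get(other) in {"target", "pathway"}
            ):
                stage[other] = seed_stage[labeled]
    return stage


TRIPLETS = merge_triplets(RAW)
STAGE = label_stages(TRIPLETS, ENTITY_TYPE, SEED_STAGE)
```

Drug entities are left unlabeled in this sketch, since the text only assigns stage labels to disease, complication, target and pathway entities. Assigning an adjacent stage instead of the same stage, which the claims also allow, is not modeled here.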

Inventors

  • YANG SHUO
  • ZHOU ZEKUN
  • LI QIN
  • ZHOU WENJIA

Assignees

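  • Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences (中国中医科学院中医药信息研究所)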

Dates

Publication Date
20260505
Application Date
20260123

Claims (10)

  1. A method for discovering diabetes-specific potential drugs based on interpretable semantic reasoning, characterized by comprising the following steps: performing entity relation extraction on clinical guidelines, electronic medical records and medical literature to form a triplet set containing a drug entity, a target entity, a pathway entity, a diabetes disease entity and a complication entity, and constructing a diabetes-specific semantic knowledge graph based on the triplet set; constructing a disease course stage label set and a disease course sequence template library, wherein the disease course stage label set comprises a pathogenesis stage label and a complication stage label; assigning, according to the disease course sequence template library, disease course stage labels to the diabetes disease entity, the complication entity, and the target entity and the pathway entity that are associated with the diabetes disease entity or the complication entity in the triplet set, so as to obtain a labeled knowledge graph; determining a candidate drug entity set in the labeled knowledge graph, and searching from each candidate drug entity in the candidate drug entity set, with the diabetes disease entity as the reasoning endpoint, to obtain a multi-hop candidate reasoning path set; parsing each multi-hop candidate reasoning path in the multi-hop candidate reasoning path set to obtain a disease course stage sequence, matching the disease course stage sequence against the disease course sequence template library, removing paths that do not satisfy a preset consistency condition to obtain a consistent path set, and calculating the path contribution degree of each consistent path in the consistent path set based on a relation reliability parameter and disease course sequence consistency; and aggregating the path contribution degrees corresponding to the same candidate drug entity to obtain a candidate score, outputting a candidate drug ranking result according to the candidate scores, and outputting at least one consistent path with the highest contribution, together with the disease course stage labels of the entities in that path, as an interpretable semantic reasoning chain.
  2. The method for discovering diabetes-specific potential drugs based on interpretable semantic reasoning according to claim 1, wherein performing entity relation extraction on clinical guidelines, electronic medical records and medical literature to form a triplet set containing a drug entity, a target entity, a pathway entity, a diabetes disease entity and a complication entity, and constructing a diabetes-specific semantic knowledge graph based on the triplet set, comprises: constructing a diabetes-specific entity dictionary and a relation type set, wherein the diabetes-specific entity dictionary covers the standard names and synonyms of the drug entity, the target entity, the pathway entity, the diabetes disease entity and the complication entity; performing entity recognition and entity normalization on the clinical guidelines, the electronic medical records and the medical literature based on the diabetes-specific entity dictionary to obtain an entity set; and performing relation extraction on the entity set based on the relation type set to generate entity pairs and relations, and combining the entity pairs and the relations to form the triplet set.
  3. The method for discovering diabetes-specific potential drugs based on interpretable semantic reasoning according to claim 1, wherein forming the triplet set comprises: each triplet carrying a relation type identifier that comprises at least one of a treatment relation, a targeting relation, a pathway participation relation and a complication evolution relation; assigning an evidence field to each triplet in the triplet set, wherein the evidence field indicates that the triplet is derived from at least one of the clinical guidelines, the electronic medical records and the medical literature; and reconciling duplicate triplets according to the evidence field to obtain a de-duplicated triplet set.
  4. The method for discovering diabetes-specific potential drugs based on interpretable semantic reasoning according to claim 1, wherein constructing a disease course stage label set and a disease course sequence template library comprises: constructing a label item set for the pathogenesis stage label and a label item set for the complication stage label, and forming the disease course stage label set from the two label item sets; constructing a stage sequence template set, wherein each stage sequence template is a template sequence formed by disease course stage labels in order; configuring an allowable precedence constraint rule for each stage sequence template in the stage sequence template set, wherein the allowable precedence constraint rule limits the precedence relations between adjacent stages and across stages in the template sequence; and combining the stage sequence template set and the allowable precedence constraint rules into the disease course sequence template library.
  5. The method for discovering diabetes-specific potential drugs based on interpretable semantic reasoning according to claim 1, wherein assigning, according to the disease course sequence template library, disease course stage labels to the diabetes disease entity, the complication entity, and the target entity and the pathway entity that are associated with the diabetes disease entity or the complication entity in the triplet set, so as to obtain a labeled knowledge graph, comprises: assigning the corresponding disease course stage labels to the diabetes disease entity and the complication entity according to the disease course stage label set; searching the triplet set for the target entity and the pathway entity associated with the diabetes disease entity or the complication entity, and assigning to each such target entity or pathway entity either the disease course stage label of the diabetes disease entity or complication entity with which it is associated in the triplet set, or an adjacent stage label that conforms to the allowable precedence constraint rule; and combining the entities that have been assigned disease course stage labels with the triplet set to construct the labeled knowledge graph.
  6. The method for discovering diabetes-specific potential drugs based on interpretable semantic reasoning according to claim 1, wherein determining a candidate drug entity set in the labeled knowledge graph, and searching from each candidate drug entity in the candidate drug entity set, with the diabetes disease entity as the reasoning endpoint, to obtain a multi-hop candidate reasoning path set, comprises: performing reachability retrieval in the labeled knowledge graph with the diabetes disease entity as the target node to obtain the drug entities connected to the diabetes disease entity, and forming the obtained drug entities into the candidate drug entity set; performing a path search for each candidate drug entity in the candidate drug entity set, with the candidate drug entity as the starting point and the diabetes disease entity as the reasoning endpoint, to obtain the multi-hop candidate reasoning path set formed by sequentially connecting related entities; and performing path de-duplication and cyclic path removal on the multi-hop candidate reasoning path set to obtain the multi-hop candidate reasoning path set used for subsequent analysis.
  7. The method for discovering diabetes-specific potential drugs based on interpretable semantic reasoning according to claim 4, wherein parsing each multi-hop candidate reasoning path in the multi-hop candidate reasoning path set to obtain a disease course stage sequence, matching the disease course stage sequence against the disease course sequence template library, and removing paths that do not satisfy a preset consistency condition to obtain a consistent path set, comprises: reading the disease course stage labels of the entities along each multi-hop candidate reasoning path; arranging the read disease course stage labels in path order to form the disease course stage sequence; matching the disease course stage sequence against the stage sequence template set to obtain a matching result; setting the preset consistency condition such that the disease course stage sequence satisfies the allowable precedence constraint rule, with identical disease course stage labels permitted to appear consecutively in the sequence; and judging, according to the matching result, whether the preset consistency condition is satisfied, and forming the paths that satisfy it into the consistent path set.
  8. The method for discovering diabetes-specific potential drugs based on interpretable semantic reasoning according to claim 1, wherein calculating the path contribution degree of each consistent path in the consistent path set based on a relation reliability parameter and disease course sequence consistency, the relation reliability parameter characterizing the reliability of the relations in the path, comprises: determining the relation reliability parameter of each relation in the consistent path based on the evidence fields of the triplets in the triplet set that the consistent path contains; determining the disease course sequence consistency of the consistent path based on the result of matching its disease course stage sequence against the allowable precedence constraint rule; and calculating the path contribution degree of the consistent path from the relation reliability parameter and the disease course sequence consistency according to a preset path contribution degree calculation rule.
  9. The method for discovering diabetes-specific potential drugs based on interpretable semantic reasoning according to claim 1, wherein aggregating the path contribution degrees corresponding to the same candidate drug entity to obtain a candidate score, outputting a candidate drug ranking result according to the candidate scores, and outputting at least one consistent path with the highest contribution, together with the disease course stage labels of the entities in that path, as an interpretable semantic reasoning chain, comprises: aggregating the path contribution degrees in the consistent path set corresponding to the same candidate drug entity to obtain the candidate score of that candidate drug entity; sorting the candidate drug entity set by candidate score to obtain the candidate drug ranking result; and, for each candidate drug entity, selecting from its consistent path set at least one consistent path with the highest path contribution degree, and outputting the disease course stage label of each entity in that path to form the interpretable semantic reasoning chain.
  10. A system for discovering diabetes-specific potential drugs based on interpretable semantic reasoning, comprising: an entity relation extraction and knowledge graph construction unit, used for performing entity relation extraction on clinical guidelines, electronic medical records and medical literature to form a triplet set containing a drug entity, a target entity, a pathway entity, a diabetes disease entity and a complication entity, and constructing a diabetes-specific semantic knowledge graph based on the triplet set; a disease course semantic modeling unit, used for constructing a disease course stage label set and a disease course sequence template library, wherein the disease course stage label set comprises a pathogenesis stage label and a complication stage label; a disease course label assignment unit, used for assigning, according to the disease course sequence template library, disease course stage labels to the diabetes disease entity, the complication entity, and the target entity and the pathway entity that are associated with the diabetes disease entity or the complication entity in the triplet set, so as to obtain a labeled knowledge graph; a candidate drug path generation unit, used for determining a candidate drug entity set in the labeled knowledge graph, and searching from each candidate drug entity in the candidate drug entity set, with the diabetes disease entity as the reasoning endpoint, to obtain a multi-hop candidate reasoning path set; a disease course consistency screening and path contribution calculation unit, used for parsing each multi-hop candidate reasoning path in the multi-hop candidate reasoning path set to obtain a disease course stage sequence, matching the disease course stage sequence against the disease course sequence template library, removing paths that do not satisfy a preset consistency condition to obtain a consistent path set, and calculating the path contribution degree of each consistent path in the consistent path set based on a relation reliability parameter and disease course sequence consistency; and a candidate score and interpretable chain output unit, used for aggregating the path contribution degrees corresponding to the same candidate drug entity to obtain a candidate score, outputting a candidate drug ranking result according to the candidate scores, and outputting at least one consistent path with the highest contribution, together with the disease course stage labels of the entities in that path, as an interpretable semantic reasoning chain.
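
The reasoning steps in claims 6 to 9 (path search, consistency screening, path contribution, aggregation) can be sketched on a toy graph. This is an illustration under stated assumptions, not the patented implementation: the entity names, the reliability values, the single allowed stage transition in `ALLOWED`, the product rule for path contribution and the sum rule for aggregation are all hypothetical, since the claims only refer to "preset" rules. Consistency is treated here as a pass/fail filter.

```python
from collections import defaultdict
from math import prod

# Toy labeled graph (hypothetical): edge -> relation reliability in [0, 1].
# Edges are listed in the direction the search traverses them.
EDGES = {
    ("drug_A", "target_T"): 0.8,
    ("target_T", "pathway_P"): 0.8,
    ("pathway_P", "diabetes"): 1.0,
    ("drug_A", "nephropathy"): 0.6,
    ("nephropathy", "diabetes"): 1.0,
    ("drug_B", "pathway_P"): 0.6,
}
# Hypothetical stage labels; drug entities carry none.
STAGE = {
    "target_T": "pathogenesis",
    "pathway_P": "pathogenesis",
    "diabetes": "pathogenesis",
    "nephropathy": "complication",
}
# Assumed precedence rule: pathogenesis may be followed by complication, not the reverse.
ALLOWED = {("pathogenesis", "complication")}
DRUGS = ["drug_A", "drug_B"]


def find_paths(start, goal, max_hops=4):
    """Depth-first search for cycle-free paths of at most max_hops edges."""
    adjacency = defaultdict(list)
    for head, tail in EDGES:
        adjacency[head].append(tail)
    stack, found = [[start]], []
    while stack:
        path = stack.pop()
        if path[-1] == goal:
            found.append(path)
            continue
        if len(path) > max_hops:  # already max_hops edges long, do not extend
            continue
        for nxt in adjacency[path[-1]]:
            if nxt not in path:  # reject cyclic paths
                stack.append(path + [nxt])
    return found


def is_consistent(path):
    """Adjacent stage labels must be equal (repeats allowed) or an allowed transition."""
    stages = [STAGE[node] for node in path if node in STAGE]
    return all(a == b or (a, b) in ALLOWED for a, b in zip(stages, stages[1:]))


def contribution(path):
    """Assumed rule: product of the relation reliabilities along the path."""
    return prod(EDGES[edge] for edge in zip(path, path[1:]))


def rank_drugs(goal="diabetes"):
    """Score each drug by summing its consistent paths; keep the best path as the chain."""
    scores, chains = {}, {}
    for drug in DRUGS:
        paths = [p for p in find_paths(drug, goal) if is_consistent(p)]
        if not paths:
            continue
        scores[drug] = sum(contribution(p) for p in paths)
        best = max(paths, key=contribution)
        chains[drug] = [(node, STAGE.get(node)) for node in best]
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores, chains
```

Calling `rank_drugs()` returns the drug ranking, the per-drug scores, and for each drug its highest-contribution path annotated with stage labels. In this toy graph the path `drug_A → nephropathy → diabetes` is screened out because it moves from a complication stage back to a pathogenesis stage, which the assumed rule does not allow.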

Description

Method and system for discovering diabetes specific potential drugs based on interpretable semantic reasoning

Technical Field

The invention relates to the technical field of semantic reasoning for drug discovery, and in particular to a method and a system for discovering diabetes-specific potential drugs based on interpretable semantic reasoning.

Background

In recent years, medical knowledge graphs have been used to represent, in structured form, the entities and relations found in clinical guidelines, expert knowledge, electronic medical records and medical literature, and to provide semantic retrieval and reasoning capabilities. For example, Shaohui et al., in "blood sugar management knowledge graph construction and application research of diabetes patients", extracted semantic entities and relations from clinical guidelines, expert experience and hospital electronic medical records, built a diabetes knowledge graph in a graph database, and verified its application. She Yajuan et al., in "diabetes electronic medical record entities and relation annotation corpus construction", established entity and relation classification systems for diabetes electronic medical records and produced an annotated corpus, providing a data basis for disease-specific entity relation extraction and subsequent graph construction. Han Pu et al., in "multi-mode knowledge graph construction method research for Chinese electronic medical records", further proposed a construction method that organizes the multimodal data of Chinese electronic medical records, reflecting continued progress in disease-specific knowledge organization and graph-based representation.

In the area of potential drug discovery, one trend is to integrate multi-source biomedical data into a knowledge graph and to achieve drug repositioning with evidence output through relational reasoning over the graph. For example, Zhang, An Xinyu and Liu Chunhe, in "drug knowledge discovery based on multi-source semantic knowledge map", carried out drug knowledge discovery research on a multi-source semantic knowledge graph with drug repositioning as the empirical case, illustrating that the knowledge graph is becoming an important form of data organization for drug repositioning. In addition, Hou Mengwei et al., in "knowledge map research review and application thereof in medical field", pointed out that knowledge graphs combined with big data and deep learning technologies are driving applications such as intelligent medical semantic retrieval, question answering and clinical decision support, so that interpretable semantic reasoning and knowledge utilization are becoming important directions of development.

Conventional knowledge-graph-based drug discovery schemes often use link prediction or multi-hop paths as the reasoning carrier, and their explanations usually remain at the level of associated paths on the graph. Diabetes, however, is a chronic metabolic disease with staged characteristics such as pathogenesis, metabolic abnormality and complication evolution (see the 2025 review "the health management and knowledge graph application of diabetes mellitus" on the application of knowledge graphs in the health management of diabetes patients). If a reasoning path only satisfies topological reachability and lacks constraints from disease-specific course stage semantics and stage order, explanation chains with undefined stages or an inconsistent stage order easily arise, making the explanation difficult to review and reuse under disease-specific diagnosis and treatment logic. Hou Mengwei et al. also pointed out that medical knowledge graphs still have common problems in efficiency, constraints and scalability, which can further amplify the difficulty of putting disease-specific reasoning into practice.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention aims to provide a method and a system for discovering diabetes-specific potential drugs based on interpretable semantic reasoning, which explicitly introduce a semantic reasoning mechanism based on disease course stages, so that potential drug discovery results have clinical consistency and interpretability in the disease-specific semantic space.

To achieve the above object, the invention provides the following solution: a method for discovering diabetes-specific potential drugs based on interpretable semantic reasoning, comprising the following steps: performing entity relation extraction on clinical guidelines, electronic medical records and medical literature to form a triplet set containing a drug entity, a target entity, a pathway entity, a diabetes disease entity and a complication entity, and constructing a diabetes-specific semantic knowledge graph based on the triplet