CN-122021825-A - Medical knowledge graph construction method and device
Abstract
Embodiments of this specification provide a method and a device for constructing a medical knowledge graph. In the method, pages of an electronic file of a first medical document are input into a large model, which identifies entity data for the entities in those pages. The identified entity data is merged into the constructed knowledge graph through semantic similarity matching; the merging takes each entity as a node and its knowledge points as attributes of that node, and constructs an untyped connecting edge between the entity and each associated entity. Once the entity data corresponding to all pages of the electronic file has been merged into the constructed knowledge graph, the medical knowledge graph of the first medical document is obtained. The entity data comprises an entity name, an entity type, an associated entity, and a knowledge point related to the entity. A knowledge point text is the original sentence, or a slightly modified version, of text content in the page, and it carries complete and rich information on the association relation between associated entities. Private data must be protected throughout the construction process.
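As a hedged illustration of the merge described in this abstract, the Python sketch below keeps each entity as a node, stores knowledge point texts as node attributes, and records untyped (non-type) edges to associated entities. The names `EntityData`, `Graph`, `similarity`, `find_match`, and `merge_entity`, the 0.85 threshold, and the creation of placeholder nodes for associated entities that are not yet in the graph are all assumptions made for illustration and do not come from the specification. In particular, `similarity` uses a character-level ratio from Python's `difflib` purely as a stand-in; it is not a semantic measure, and a real system would presumably substitute something like embedding similarity.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class EntityData:
    """One entity as a large model might return it (field names are assumptions)."""
    name: str
    entity_type: str
    associated: list = field(default_factory=list)        # names of associated entities
    knowledge_points: list = field(default_factory=list)  # knowledge point texts


@dataclass
class Graph:
    # node name -> {"type": str or None, "knowledge_points": list of str}
    nodes: dict = field(default_factory=dict)
    # untyped, undirected edges stored as frozensets of two node names
    edges: set = field(default_factory=set)


def similarity(a: str, b: str) -> float:
    """Stand-in only: character-level ratio, NOT a semantic similarity measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_match(graph: Graph, name: str, threshold: float) -> Optional[str]:
    """Return the most similar existing node name, or None if below the threshold."""
    best = max(graph.nodes, key=lambda n: similarity(n, name), default=None)
    if best is not None and similarity(best, name) >= threshold:
        return best
    return None


def merge_entity(graph: Graph, entity: EntityData, threshold: float = 0.85) -> str:
    """Merge one identified entity into the graph and return its node name."""
    target = find_match(graph, entity.name, threshold)
    if target is None:
        # No similar entity exists: add the entity as a new node.
        target = entity.name
        graph.nodes[target] = {"type": entity.entity_type, "knowledge_points": []}
    node = graph.nodes[target]
    if node["type"] is None:
        # Fill in the type of a placeholder node created earlier.
        node["type"] = entity.entity_type
    for text in entity.knowledge_points:
        if text not in node["knowledge_points"]:
            node["knowledge_points"].append(text)
    for other in entity.associated:
        other_key = find_match(graph, other, threshold)
        if other_key is None:
            # Assumption: create a placeholder node for an unseen associated entity.
            other_key = other
            graph.nodes[other_key] = {"type": None, "knowledge_points": []}
        if other_key != target:
            # The edge carries no relation type; the relation is described
            # by the knowledge point texts stored on the nodes.
            graph.edges.add(frozenset((target, other_key)))
    return target
```

Calling `merge_entity` once per identified entity, page by page, would grow the graph incrementally; because the edges are untyped, the details of each relation stay in the knowledge point texts rather than in edge labels.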
Inventors
- CHEN XIUYUAN
- LIU JINGNAN
- XIAO HANSONG
- LIN GUIHU
- SUN ZEWEN
- DENG JIAMING
- LI PEIHAO
- CHEN KEZHONG
- ZHEN SHUAI
- LIU JUNWEI
- CHEN ZHE
Assignees
- 支付宝(杭州)数字服务技术有限公司
- 北京大学人民医院
Dates
- Publication Date
- 20260512
- Application Date
- 20251014
Claims (13)
- 1. A method for constructing a medical knowledge graph, comprising: inputting a plurality of pages in an electronic file of a first medical document into a large model, and identifying, through the large model, entity data of a plurality of entities in the plurality of pages, wherein each page comprises text content, and each piece of entity data comprises an entity name, an entity type, an associated entity having an association relation with the entity, and a knowledge point related to the entity; performing semantic similarity matching between the identified entities and the entities in the constructed knowledge graph; and merging the entity data of the plurality of identified entities into the constructed knowledge graph according to the semantic similarity matching result, wherein the merging comprises taking the entities as nodes and the knowledge points as attributes of the nodes, and constructing a connecting edge between each entity and its associated entities.
- 2. The method of claim 1, wherein the step of inputting the plurality of pages in the electronic file of the first medical document into the large model comprises: inputting the plurality of pages and the constructed knowledge graph into the large model, and identifying, through the large model, the entity data of the plurality of entities in the plurality of pages using the entities in the constructed knowledge graph as references.
- 3. The method of claim 2, wherein the step of inputting the plurality of pages and the constructed knowledge graph into the large model comprises: inputting into the large model the plurality of pages together with the entity names and the association relations between entities in the constructed knowledge graph.
- 4. The method of claim 1, wherein the step of inputting the plurality of pages in the electronic file of the first medical document into the large model comprises: inputting the pages contained in a current identification unit into the large model, wherein all pages contained in the electronic file are split into a plurality of identification units; and wherein, after the entity data of the entities identified from the current identification unit is merged into the constructed knowledge graph, the method further comprises: taking the next identification unit as the current identification unit, and returning to the step of inputting the pages contained in the current identification unit into the large model.
- 5. The method of claim 1, wherein the step of performing semantic similarity matching between the identified entities and the entities in the constructed knowledge graph comprises: performing semantic similarity matching between the entity names of the identified entities and the entity names of the entities in the constructed knowledge graph.
- 6. The method of claim 1, wherein the step of merging the entity data of the plurality of identified entities into the constructed knowledge graph comprises: when a first entity among the plurality of identified entities and a second entity in the constructed knowledge graph are determined to be similar entities, merging the associated entities and knowledge points in the entity data of the first entity into the node data of the second entity; and when a third entity among the plurality of identified entities is determined not to be similar to any entity in the constructed knowledge graph, adding the third entity directly to the constructed knowledge graph as a new node.
- 7. The method of claim 6, wherein the step of adding the third entity directly to the constructed knowledge graph as a new node comprises: determining, according to the knowledge point texts and corresponding knowledge point numbers stored in a shared knowledge point pool, a knowledge point number corresponding to the knowledge point text of the third entity, and determining that the knowledge point text of the third entity is stored in the shared knowledge point pool; and adding the determined knowledge point number to the attributes of the new node corresponding to the third entity in the constructed knowledge graph.
- 8. The method of claim 7, wherein the step of determining the knowledge point number corresponding to the knowledge point text of the third entity and determining that the knowledge point text of the third entity is stored in the shared knowledge point pool comprises: when a knowledge point text identical to the knowledge point text of the third entity exists in the shared knowledge point pool, taking the number of that knowledge point text in the shared knowledge point pool as the number of the knowledge point text of the third entity; and when no knowledge point text identical to the knowledge point text of the third entity exists in the shared knowledge point pool, generating a number for the knowledge point text of the third entity based on the existing knowledge point numbers in the shared knowledge point pool, and storing the number together with the knowledge point text of the third entity in the shared knowledge point pool.
- 9. The method of claim 1, further comprising, once the medical knowledge graph corresponding to the first medical document is obtained: querying the medical knowledge graph according to a query subgraph to obtain a query result; and generating a required question and/or answer based on the knowledge points corresponding to the entities and associated entities in the query result.
- 10. The method of claim 1, further comprising, once the medical knowledge graph of the first medical document is obtained: storing the medical knowledge graph in a storage space different from that of a second medical document; when a search is needed, obtaining a first result based on the medical knowledge graph of the first medical document and a second result based on the medical knowledge graph of the second medical document; and taking the first result, the second result, and their respective provenance information as search results, wherein the provenance information comprises the names of the corresponding medical documents.
- 11. A medical knowledge graph construction apparatus, comprising: an entity identification module configured to input a plurality of pages in an electronic file of a first medical document into a large model and to identify, through the large model, entity data of a plurality of entities in the plurality of pages, wherein each page comprises text content, and each piece of entity data comprises an entity name, an entity type, an associated entity having an association relation with the entity, and a knowledge point related to the entity; a data matching module configured to perform semantic similarity matching between the identified entities and the entities in the constructed knowledge graph; and a data merging module configured to merge the entity data of the plurality of identified entities into the constructed knowledge graph according to the semantic similarity matching result, wherein the merging comprises taking the entities as nodes and the knowledge points as attributes of the nodes, and constructing a connecting edge between each entity and its associated entities.
- 12. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-10.
- 13. A computing device comprising a memory and a processor, the memory having executable code stored therein, wherein the processor, when executing the executable code, implements the method of any of claims 1-10.
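The page splitting of claim 4 and the shared knowledge point pool of claims 7 and 8 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `split_into_units`, `KnowledgePointPool`, `number_for`, and `text_for` are hypothetical, "the same knowledge point text" is taken to mean exact string equality, and a new number is generated as the largest existing number plus one (starting from 1), which is only one possible reading of "based on the existing knowledge point numbers".

```python
def split_into_units(pages: list, unit_size: int) -> list:
    """Split all pages of a file into consecutive identification units.

    Fixed-size units are an assumption; the claims only require that the
    pages be split into a plurality of identification units.
    """
    if unit_size < 1:
        raise ValueError("unit_size must be at least 1")
    return [pages[i:i + unit_size] for i in range(0, len(pages), unit_size)]


class KnowledgePointPool:
    """Shared pool mapping knowledge point texts to numbers (hypothetical design)."""

    def __init__(self) -> None:
        self._number_by_text = {}  # knowledge point text -> number
        self._text_by_number = {}  # number -> knowledge point text

    def number_for(self, text: str) -> int:
        """Return the number for a text, storing the text first if it is new."""
        existing = self._number_by_text.get(text)
        if existing is not None:
            # An identical text is already in the pool: reuse its number.
            return existing
        # Assumption: the next number is the largest existing number plus one.
        number = max(self._text_by_number, default=0) + 1
        self._number_by_text[text] = number
        self._text_by_number[number] = text
        return number

    def text_for(self, number: int) -> str:
        """Look up the stored knowledge point text for a number."""
        return self._text_by_number[number]
```

With a pool like this, a node's attributes only need to hold knowledge point numbers, so a sentence that relates two entities is stored once and referenced from both nodes. Each identification unit returned by `split_into_units` would be passed to the large model in turn, with the graph updated before the next unit is processed.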
Description
Medical knowledge graph construction method and device
The present application is a divisional application of patent application No. 2025114682447, filed on October 14, 2025, which relates to a method and a device for constructing a medical knowledge graph.
Technical Field
One or more embodiments of the present disclosure relate to the field of medical applications of artificial intelligence, and in particular to a method and apparatus for constructing a medical knowledge graph.
Background
With the rapid development of artificial intelligence (AI) technology, its application in the medical field is increasingly widespread, but the hallucination problem inherent in AI systems means the technology cannot be applied directly to medical guidance. AI hallucination is the phenomenon in which an AI system generates information that appears reasonable but is actually erroneous or false; it is an inherent limitation of current AI technology. In medical applications, AI hallucinations may lead to erroneous diagnostic advice or treatment regimens. To ensure the reliability and safety of AI medical applications, a sound evaluation system needs to be established to verify the accuracy and credibility of the AI system.
A medical knowledge graph is a data model that represents knowledge with a graph structure. It consists of entities (nodes) and relations (edges) and can describe the concepts and entities of the medical domain and their interrelationships. An evaluation set can be constructed quickly from a medical knowledge graph, meeting the need for large-scale, comprehensive evaluation of AI medical applications. The medical knowledge graph contains private data, which needs to be protected.
How accurately the medical knowledge graph reflects the semantic knowledge in the medical literature largely determines the quality of the evaluation set, and therefore affects the performance of AI medical applications. An improved solution is thus desired that can construct a knowledge graph which more accurately describes the medical knowledge in the original medical literature.
Disclosure of Invention
One or more embodiments of the present disclosure describe a method and apparatus for constructing a medical knowledge graph, so as to construct a knowledge graph that restores medical knowledge more accurately. The specific technical scheme is as follows.
In a first aspect, an embodiment provides a method for constructing a medical knowledge graph, including:
inputting a plurality of pages in an electronic file of a first medical document into a large model, and identifying, through the large model, entity data of a plurality of entities in the plurality of pages, wherein each page comprises text content; each piece of entity data comprises an entity name, an entity type, an associated entity having an association relation with the entity, and a knowledge point related to the entity; the entity type comprises only a disease entity type and a treatment method entity type; the knowledge point comprises a first knowledge point text, which is determined based on the corresponding text content and describes association relation information between the entity and the associated entity; and the association relation information comprises the type of the association relation and at least one of a condition, a reason, and a manner of establishment of the association relation;
performing semantic similarity matching between the identified entities and the entities in the constructed knowledge graph;
merging the entity data of the plurality of identified entities into the constructed knowledge graph according to the semantic similarity matching result, wherein the merging comprises taking the entities as nodes and the knowledge points as attributes of the nodes, and constructing an untyped connecting edge between each entity and its associated entities; and
when the entity data corresponding to all pages of the electronic file has been merged into the constructed knowledge graph, taking the constructed knowledge graph as the medical knowledge graph of the first medical document.
In one implementation, the step of inputting pages into the large model includes: inputting the plurality of pages and the constructed knowledge graph into the large model, and identifying, through the large model, the entity data of the plurality of entities in the plurality of pages using the entities in the constructed knowledge graph as references.
In one implementation, the step of inputting pages and the constructed knowledge graph into the large model includes: inputting into the large model the plurality of pages together with the entity names and the association relations between entities in the constructed knowledge graph.
In one implementation, the step of inputting pages into the large model includes: inputting a plurality of pages contained in a current identification unit into the