CN-122021855-A - Knowledge graph construction method and system based on Xin'an medical school warm disease theory
Abstract
The invention discloses a knowledge graph construction method and system based on Xin'an medical school warm disease theory, relating to the technical field of traditional Chinese medicine knowledge graphs. Xin'an warm disease medical books and medical cases are collected, and disease description segments and prescription segments are aligned by medical case number and term-standardized to construct an aligned text data set. The co-occurrence frequency of disease terms and prescription terms is counted through a sliding window, and a disease-prescription candidate association edge set is generated by conditional probability calculation. A pathogenesis stage sequence is extracted from the disease description segments in medical case time order, and a pathogenesis stage transformation candidate sequence is established. Drug addition-subtraction difference vectors between adjacent prescription segments are calculated and matched against the pathogenesis stage sequence to generate implicit syndrome transformation evidence data. Finally, an implicit syndrome chain candidate knowledge graph is constructed, and link consistency constraints are applied to form an explicit knowledge graph of implicit syndrome chains. The invention improves the accuracy and interpretability of traditional Chinese medicine knowledge graphs in the clinical application of warm disease theory.
Inventors
- LIU LIUQING
- GUO JINCHEN
- Yang Qinjun
- ZHANG XIAOJUN
- SHI XIAOYU
- Chen Yuzhuang
Assignees
- Anhui University of Chinese Medicine (安徽中医药大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260414
Claims (10)
- 1. A knowledge graph construction method based on Xin'an medical school warm disease theory, characterized by comprising the following steps: S1, acquiring electronic texts of Xin'an warm disease medical books and medical cases, aligning disease description segments and prescription segments by medical case number, and performing term standardization to obtain an aligned text data set; S2, based on the aligned text data set, counting the co-occurrence frequency of disease terms and prescription terms and calculating the conditional probability as the edge weight to obtain a disease-prescription candidate association edge set; S3, based on the aligned text data set, extracting a pathogenesis stage sequence from the disease description segments in medical case time order, and generating a pathogenesis stage transformation candidate sequence; S4, based on the aligned text data set, calculating the matching degree between the drug addition-subtraction difference vectors of adjacent prescription segments and the pathogenesis stage transformation candidate sequence to obtain implicit syndrome transformation evidence data; S5, fusing the disease-prescription candidate association edge set, the pathogenesis stage transformation candidate sequence and the implicit syndrome transformation evidence data to construct a directed weighted graph structure, obtaining an implicit syndrome chain candidate knowledge graph; and S6, applying link consistency constraints to the implicit syndrome chain candidate knowledge graph, and outputting an explicit knowledge graph of implicit syndrome chains.
- 2. The knowledge graph construction method based on Xin'an medical school warm disease theory according to claim 1, wherein S1 specifically comprises: acquiring electronic texts of Xin'an warm disease medical books and medical cases, parsing the medical case numbers, and extracting the disease description text segment and the prescription text segment corresponding to each medical case number; establishing a correspondence between the disease description text segment and the prescription text segment according to the medical case number to form a disease description segment and a prescription segment; and performing word segmentation, part-of-speech tagging and term normalization on the disease description segment and the prescription segment, and mapping the disease terms and prescription terms to unified term identifiers to obtain the aligned text data set.
- 3. The knowledge graph construction method based on Xin'an medical school warm disease theory according to claim 2, wherein S2 specifically comprises: based on the aligned text data set, reading, for each disease description segment and prescription segment, the disease term sequence and prescription term sequence corresponding to the unified term identifiers; traversing the disease terms in the disease term sequence with a preset sliding window, counting within the sliding window the co-occurrence frequency of each disease term with the terms of the prescription term sequence, and counting the occurrence frequency of each disease term; and calculating the conditional probability from the co-occurrence frequency and the occurrence frequency as the edge weight, and generating the disease-prescription candidate association edge set.
- 4. The knowledge graph construction method based on Xin'an medical school warm disease theory according to claim 3, wherein S3 specifically comprises: based on the aligned text data set, arranging the disease description segments in medical case number order to form a disease description segment sequence; based on the disease description segment sequence, sequentially extracting the pathogenesis stage terms in each disease description segment according to the unified term identifiers to form a pathogenesis stage sequence; comparing adjacent pathogenesis stage terms in the pathogenesis stage sequence to determine the positions at which the pathogenesis stage term changes; and dividing the pathogenesis stage sequence into continuous pathogenesis stage term segments at those change positions, recording the order of adjacent continuous pathogenesis stage term segments, and generating the pathogenesis stage transformation candidate sequence.
- 5. The knowledge graph construction method based on Xin'an medical school warm disease theory according to claim 4, wherein S4 specifically comprises: based on the aligned text data set, extracting the prescription terms in each prescription segment according to the unified term identifiers, and arranging the prescription segments in medical case number order to form a prescription term sequence; comparing the prescription terms of adjacent prescription segments in the prescription term sequence, and calculating the drug addition-subtraction difference vector corresponding to each prescription term difference; and calculating the matching degree between the drug addition-subtraction difference vectors and the pathogenesis stage transformation candidate sequence, and generating the implicit syndrome transformation evidence data.
- 6. The knowledge graph construction method based on Xin'an medical school warm disease theory according to claim 5, wherein the implicit syndrome transformation evidence data comprises all effectively matched drug addition-subtraction difference vectors, the corresponding pathogenesis stage term change positions, and the matching degrees.
- 7. The knowledge graph construction method based on Xin'an medical school warm disease theory according to claim 6, wherein S5 specifically comprises: based on the disease-prescription candidate association edge set, taking the disease terms as syndrome nodes and the prescription terms as prescription nodes; based on the pathogenesis stage transformation candidate sequence, taking the pathogenesis stage terms as pathogenesis stage nodes; and determining the connections between nodes and the corresponding edge weights according to the implicit syndrome transformation evidence data, constructing a directed weighted graph structure comprising the syndrome nodes, the prescription nodes and the pathogenesis stage nodes, and generating the implicit syndrome chain candidate knowledge graph.
- 8. The knowledge graph construction method based on Xin'an medical school warm disease theory according to claim 5, wherein the link consistency constraints comprise a pathogenesis hierarchy unidirectional constraint, a medical case time sequence consistency constraint and a prescription difference consistency constraint.
- 9. The knowledge graph construction method based on Xin'an medical school warm disease theory according to claim 8, wherein S6 specifically comprises: determining the pathogenesis hierarchy unidirectional constraint between pathogenesis stage nodes according to the order of the pathogenesis stage terms in the pathogenesis stage transformation candidate sequence; determining the medical case time sequence consistency constraint according to the medical case number order of the disease description segment sequence and the prescription term sequence; determining the prescription difference consistency constraint according to the implicit syndrome transformation evidence data; and applying the pathogenesis hierarchy unidirectional constraint, the medical case time sequence consistency constraint and the prescription difference consistency constraint to the implicit syndrome chain candidate knowledge graph to generate the explicit knowledge graph of implicit syndrome chains.
- 10. A knowledge graph construction system based on Xin'an medical school warm disease theory, for implementing the knowledge graph construction method based on Xin'an medical school warm disease theory according to any one of claims 1 to 9, comprising: a text normalization module, used for acquiring electronic texts of Xin'an warm disease medical books and medical cases, aligning disease description segments and prescription segments by medical case number, and performing term standardization to obtain an aligned text data set; an association calculation module, used for counting the co-occurrence frequency of disease terms and prescription terms based on the aligned text data set and calculating the conditional probability as the edge weight to obtain a disease-prescription candidate association edge set; a pathogenesis sequence module, used for extracting a pathogenesis stage sequence from the disease description segments in medical case time order based on the aligned text data set, and generating a pathogenesis stage transformation candidate sequence; a difference matching module, used for calculating, based on the aligned text data set, the matching degree between the drug addition-subtraction difference vectors of adjacent prescription segments and the pathogenesis stage transformation candidate sequence to obtain implicit syndrome transformation evidence data; a graph construction module, used for fusing the disease-prescription candidate association edge set, the pathogenesis stage transformation candidate sequence and the implicit syndrome transformation evidence data, and constructing a directed weighted graph structure to obtain an implicit syndrome chain candidate knowledge graph; and a constraint optimization module, used for applying link consistency constraints to the implicit syndrome chain candidate knowledge graph and outputting an explicit knowledge graph of implicit syndrome chains.
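As an illustration only, the co-occurrence counting and conditional probability weighting recited in claim 3 might be sketched in Python as follows. The claims do not fix the window semantics, so this sketch assumes that each window over a case's disease term sequence co-occurs with every term of the prescription aligned to that case, and that the edge weight is the share of windows containing a disease term in which the prescription term also appears. The function name `candidate_edges`, the parameters `window` and `min_prob`, and the identifiers `d1`, `p1` and so on are hypothetical and do not come from the patent.

```python
from collections import Counter


def candidate_edges(records, window=3, min_prob=0.0):
    """Estimate P(prescription term | disease term) as candidate edge weights.

    records: list of (disease_term_ids, prescription_term_ids) pairs, one per
    aligned medical case. The window semantics are an assumption, not the
    patent's definition.
    """
    term_windows = Counter()   # number of windows containing each disease term
    pair_windows = Counter()   # number of those windows paired with each prescription term
    for disease_terms, prescription_terms in records:
        prescription_set = set(prescription_terms)
        # A sequence shorter than the window still yields one window.
        n_windows = max(1, len(disease_terms) - window + 1)
        for start in range(n_windows):
            for d in set(disease_terms[start:start + window]):
                term_windows[d] += 1
                for p in prescription_set:
                    pair_windows[(d, p)] += 1
    edges = {}
    for (d, p), n in pair_windows.items():
        prob = n / term_windows[d]  # conditional probability used as edge weight
        if prob >= min_prob:
            edges[(d, p)] = prob
    return edges
```

Calling `candidate_edges(records, window=2)` on a list of aligned cases returns a dictionary keyed by (disease term, prescription term) pairs. Raising `min_prob` prunes weak candidate edges, which is a design choice of this sketch rather than a step stated in the claims.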
Description
Knowledge graph construction method and system based on Xin'an medical school warm disease theory
Technical Field
The invention relates to the technical field of traditional Chinese medicine knowledge graphs, in particular to a knowledge graph construction method and system based on Xin'an medical school warm disease theory.
Background
As an important branch of traditional Chinese medicine, Xin'an medical school warm disease theory records a large body of clinical diagnosis and treatment experience centered on medical cases, usually presented as disease descriptions with corresponding prescriptions, and involves complex associations among disease terms, prescription terms and pathogenesis stages. In the prior art, however, research on Xin'an warm disease theory mostly remains at the stage of simple text collation or manual classification and summarization. It lacks deep mining of the implicit associations among syndromes, prescriptions and pathogenesis stages, so the internal rules of syndrome evolution are difficult to reveal from large amounts of medical case data, which restricts the application of warm disease theory in clinical practice.
To solve the above problems, a technical solution is now provided.
Disclosure of Invention
To overcome the above drawbacks of the prior art, embodiments of the present invention provide a knowledge graph construction method and system based on Xin'an medical school warm disease theory, to solve the problems set forth in the Background above.
To achieve the above purpose, the present invention provides the following technical solution: a knowledge graph construction method based on Xin'an medical school warm disease theory, comprising the following steps:
S1, acquiring electronic texts of Xin'an warm disease medical books and medical cases, aligning disease description segments and prescription segments by medical case number, and performing term standardization to obtain an aligned text data set;
S2, based on the aligned text data set, counting the co-occurrence frequency of disease terms and prescription terms and calculating the conditional probability as the edge weight to obtain a disease-prescription candidate association edge set;
S3, based on the aligned text data set, extracting a pathogenesis stage sequence from the disease description segments in medical case time order, and generating a pathogenesis stage transformation candidate sequence;
S4, based on the aligned text data set, calculating the matching degree between the drug addition-subtraction difference vectors of adjacent prescription segments and the pathogenesis stage transformation candidate sequence to obtain implicit syndrome transformation evidence data;
S5, fusing the disease-prescription candidate association edge set, the pathogenesis stage transformation candidate sequence and the implicit syndrome transformation evidence data to construct a directed weighted graph structure, obtaining an implicit syndrome chain candidate knowledge graph; and
S6, applying link consistency constraints to the implicit syndrome chain candidate knowledge graph, and outputting an explicit knowledge graph of implicit syndrome chains.
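Steps S3 and S4 can be illustrated with a short Python sketch, offered under stated assumptions rather than as the patented implementation. It detects positions where adjacent pathogenesis stage terms differ, builds a drug addition-subtraction difference vector (+1 for an added term, -1 for a removed term) between the prescriptions on either side of each change, and scores the match. The text does not define the matching degree, so the score used here (number of changed drugs divided by the number of distinct drugs in both prescriptions) is a placeholder assumption. All function names and the identifiers `s1`, `p1` and so on are hypothetical.

```python
def stage_transitions(stages):
    """Return (position, from_stage, to_stage) wherever adjacent stage terms differ."""
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(stages, stages[1:]))
        if a != b
    ]


def diff_vector(prev_rx, next_rx):
    """Drug addition-subtraction difference: +1 for added terms, -1 for removed terms."""
    prev_set, next_set = set(prev_rx), set(next_rx)
    vec = {term: 1 for term in next_set - prev_set}
    vec.update({term: -1 for term in prev_set - next_set})
    return vec


def transformation_evidence(stages, prescriptions):
    """Pair each stage change with the prescription change at the same position.

    stages[i] and prescriptions[i] belong to the i-th medical case in
    case-number order. The matching degree below is an assumed placeholder.
    """
    evidence = []
    for pos, src, dst in stage_transitions(stages):
        prev_rx, next_rx = prescriptions[pos - 1], prescriptions[pos]
        vec = diff_vector(prev_rx, next_rx)
        union = set(prev_rx) | set(next_rx)
        score = len(vec) / len(union) if union else 0.0
        if score > 0:  # keep only stage changes accompanied by a prescription change
            evidence.append(
                {"position": pos, "from": src, "to": dst, "diff": vec, "match": score}
            )
    return evidence
```

Each returned record holds a difference vector, the stage change position and a matching degree, which could then serve as a weighted directed edge between pathogenesis stage nodes in step S5.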
In a preferred embodiment, S1 specifically comprises: acquiring electronic texts of Xin'an warm disease medical books and medical cases, parsing the medical case numbers, and extracting the disease description text segment and the prescription text segment corresponding to each medical case number; establishing a correspondence between the disease description text segment and the prescription text segment according to the medical case number to form a disease description segment and a prescription segment; and performing word segmentation, part-of-speech tagging and term normalization on the disease description segment and the prescription segment, and mapping the disease terms and prescription terms to unified term identifiers to obtain the aligned text data set.
In a preferred embodiment, S2 specifically comprises: based on the aligned text data set, reading, for each disease description segment and prescription segment, the disease term sequence and prescription term sequence corresponding to the unified term identifiers; traversing the disease terms in the disease term sequence with a preset sliding window, counting within the sliding window the co-occurrence frequency of each disease term with the terms of the prescription term sequence, and counting the occurrence frequency of each disease term; and calculating the conditional probability from the co-occurrence frequency and the occurrence frequency as the edge weight, and generating the disease-prescription candidate association edge set.
In a preferred embodiment, S3 is specifically: based on the aligned t