CN-122019794-A - Medical literature citation link tracking method and system
Abstract
The invention discloses a medical literature citation link tracking method and system in the field of medical health information processing. It addresses four problems that hamper clinical evidence-based decision-making and drug safety monitoring: the large scale of citation networks, heterogeneous data, incomplete metadata parsing, and low tracking efficiency. The method comprises the following steps:
- collecting and normalizing heterogeneous medical literature;
- labeling evidence-based medicine evidence grades;
- parsing reference lists and completing missing DOI/PMID metadata;
- building a citation topology network in a graph database;
- clustering nodes with the MeSH tree hierarchy and extracting PICO elements;
- calculating citation strength by integrating citation frequency, position weight, citation context motive, evidence grade, and PICO similarity;
- screening key paths and supporting upstream tracing and downstream diffusion tracking;
- dynamically monitoring retraction events and adverse drug reaction signals;
- issuing real-time early warnings through streaming incremental updates.

The invention enables visual tracking and risk assessment of citation relationships among medical documents. It provides efficient, accurate evidence-tracing support for clinical evidence-based decisions and drug safety assessment.
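The citation strength calculation summarized above, and detailed in claim 3, can be sketched as a product of the named factors. This is a minimal illustration, not the patented formula. The source does not state how the factors are combined. The following are therefore all hypothetical choices:
- the multiplicative form;
- the exponential half-life decay;
- the `log1p` impact scaling;
- every weight table (`POSITION_WEIGHTS`, `MOTIVE_COEFFICIENTS`, `EVIDENCE_WEIGHTS`).

```python
import math
from dataclasses import dataclass
from typing import List

# Hypothetical weight tables; the patent names the categories but not the values.
POSITION_WEIGHTS = {"introduction": 0.6, "methods": 1.0, "results": 1.2,
                    "discussion": 0.9, "conclusion": 0.8}
MOTIVE_COEFFICIENTS = {"support": 1.2, "neutral": 1.0, "question": 0.7}
EVIDENCE_WEIGHTS = {"systematic_review": 1.5, "rct": 1.3, "cohort": 1.1,
                    "case_control": 1.0, "case_report": 0.7}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two PICO vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


@dataclass
class Citation:
    mention_sections: List[str]   # section of each in-text mention
    motive: str                   # "support", "neutral" or "question"
    evidence_grade: str           # grade labeled on the cited document
    citing_pico: List[float]      # PICO vector of the citing document
    cited_pico: List[float]       # PICO vector of the cited document
    cited_age_years: float        # years since the cited document was published
    impact_index: float           # academic impact index of the cited document


def citation_strength(c: Citation, half_life_years: float = 5.0) -> float:
    # Frequency and position: every in-text mention adds its section weight.
    base = sum(POSITION_WEIGHTS.get(s, 0.5) for s in c.mention_sections)
    motive = MOTIVE_COEFFICIENTS[c.motive]
    evidence = EVIDENCE_WEIGHTS[c.evidence_grade]
    # Clinical relevance: for non-negative PICO vectors this maps similarity
    # into [0.5, 1.0], so an unrelated PICO profile halves the score.
    relevance = 0.5 + 0.5 * cosine(c.citing_pico, c.cited_pico)
    impact = 1.0 + math.log1p(c.impact_index)
    # Citation half-life: strength halves every `half_life_years`.
    decay = 0.5 ** (c.cited_age_years / half_life_years)
    return base * motive * evidence * relevance * impact * decay


fresh = Citation(["methods", "results"], "support", "rct",
                 [1, 0, 1, 0], [1, 0, 1, 0], 0.0, 0.0)
print(round(citation_strength(fresh), 3))  # 3.432
```

A multiplicative form makes each factor act as a scaling coefficient, matching the claim's "adjustment coefficient" wording. A weighted sum would be an equally plausible reading.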
Inventors
- CHEN JIALEI
- YANG MEIYAN
- ZHANG MINGWEI
- ZHANG YEMING
- GUO CHENYAN
- HUANG SHIQI
- WANG ZHENGYANG
Assignees
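- The First Affiliated Hospital of Fujian Medical University (福建医科大学附属第一医院)
- Fuqing Fufu Wenzhang Digital Technology Co., Ltd. (福清黼黻文章数字科技有限公司)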
Dates
- Publication Date
- 20260512
- Application Date
- 20260416
Claims (8)
- 1. A medical literature citation link tracking method, comprising the following steps: Step 1, collecting heterogeneous medical literature data, normalizing it, and labeling evidence-based medicine evidence grades; Step 2, parsing reference lists with a structured parsing engine to extract citation metadata, complete missing identifiers, and extract PICO elements; Step 3, constructing a topological network of citation relationships among medical documents, wherein node attributes comprise subject classification codes, PICO element vectors, disease codes, and drug codes mapped on the tree hierarchy of the Medical Subject Headings (MeSH) vocabulary, and clustering the nodes using the MeSH tree hierarchy; Step 4, calculating a citation strength score by integrating citation frequency, position weight, citation motive, an evidence-grade weight factor, and PICO element similarity as a clinical relevance adjustment coefficient; Step 5, performing link tracking based on the citation strength score, and constructing and visualizing a clinical evidence chain tracking model; and Step 6, dynamically monitoring abnormal links, issuing early warnings, and assessing the impact of retractions and drug safety risks.
- 2. The medical literature citation link tracking method according to claim 1, wherein the in-depth parsing of the normalized literature data in Step 2 comprises: integrating a deep-learning sequence labeling algorithm, namely a bidirectional long short-term memory network combined with a conditional random field (BiLSTM-CRF), to automatically identify the author, title, journal name, volume, issue, and page fields in a medical citation string; for records from which a DOI or PMID cannot be extracted directly, calling the search interface of an external authoritative index database to perform fuzzy matching with the identified key citation fields, verifying the similarity between the match result and the original citation string, and back-filling the missing DOI or PMID when the similarity exceeds a preset threshold; when a full-text document is a scanned image or its text cannot be extracted directly, extracting the text by optical character recognition (OCR) and identifying the logical structure of the document with a rule-based paragraph segmentation algorithm, so as to locate the reference list and extract PICO elements; mapping disease names, drug names, and surgical procedure names appearing in the literature to standard Concept Unique Identifiers (CUIs) through a medical term normalization module based on the Unified Medical Language System (UMLS), so as to eliminate entity ambiguity caused by synonyms and abbreviations; and, for non-English medical literature, calling an integrated machine translation engine to translate key fields uniformly into a target language before metadata extraction, thereby linking medical citations across languages and ensuring the integrity of evidence links among clinical research documents in different languages.
- 3. The method according to claim 1, wherein calculating the comprehensive citation strength score in Step 4 comprises: counting the total number of times the target citation is mentioned in the text to generate a frequency parameter; dividing the text into introduction, methods, results, discussion, and conclusion sections according to the general logical structure of a medical paper, and assigning each section a preset position weight coefficient; performing stance polarity analysis on the sentence containing the citation using a preset medical citation stance dictionary, determining whether the citation is supporting, neutral, or questioning, and assigning a corresponding motive adjustment coefficient; assigning an evidence-grade weight factor according to the evidence-based medicine evidence grade labeled on the cited document, and calculating the PICO element similarity between the citing document and the cited document as a clinical relevance adjustment coefficient; introducing a citation half-life parameter to account for the timeliness of medical knowledge, so that the citation strength of older documents is progressively attenuated; and combining the frequency parameter, the position weight coefficients, the motive adjustment coefficient, the evidence-grade weight factor, and the clinical relevance adjustment coefficient, together with the academic impact index and a time decay factor of the cited document, to generate a quantified comprehensive citation strength score.
- 4. The method according to claim 1, wherein constructing the multidimensional clinical evidence chain tracking model in Step 5 comprises: providing a bidirectional tracking mode comprising an upstream tracing mode for tracing knowledge sources and a downstream diffusion mode for tracking research evolution and clinical translation; when constructing the tracking model, using a heuristic search algorithm that preferentially expands branch paths whose citation strength score exceeds a preset score threshold; introducing a time window parameter so that, according to a user instruction, only the evolution of clinical evidence within a specific period is displayed; giving the clinical evidence chain tracking topology graph multi-level interaction, so that clicking a node shows the metadata, abstract, evidence-based medicine evidence grade, PICO elements, and clinical evidence contribution index of the document in the link; when a citation link is detected to span different medical disciplines, automatically marking a cross-disciplinary intersection based on the level distance to the nearest common ancestor node in the MeSH tree hierarchy, analyzing how the intersection promotes discipline convergence, extracting key experimental conclusions and clinical evidence, and labeling the disease types, drug categories, or diagnostic and therapeutic techniques involved; and, when literature nodes in the tracking path report opposite outcomes for the same clinical question, labeling them as disputed clinical evidence nodes and extracting their sample sizes, study design types, and outcome measures for comparison by clinical decision makers.
- 5. A medical literature citation link tracking system serving clinical evidence-based decision support and drug safety signal monitoring, comprising: a clinical literature collection and evidence grade labeling module, configured to access a plurality of medical literature databases through a distributed crawler architecture or an application programming interface, the databases comprising at least one of a biomedical literature index database, a clinical trial registry, or a drug regulatory literature database, to acquire a heterogeneous medical literature data set containing at least one of clinical research reports, drug trial records, disease diagnosis and treatment literature, or adverse drug reaction reports, and to normalize the data set by deduplication, field normalization, and format conversion, so as to eliminate format differences among platforms and unify all time fields as standard timestamps; a PICO element extraction and citation metadata parsing module, configured to extract PICO elements, comprising patient population characteristics, intervention, comparison, and clinical outcome measures, from the abstract of each document based on the structured abstract format of medical literature, to call a structured parsing engine to parse the normalized literature data, locate the reference section with a pattern matching algorithm, and extract the DOI, PMID, clinical trial registration number, publication year, authors, title, journal name, volume, issue, and pages of each reference so as to construct a structured citation metadata object, and, when the DOI or PMID is missing, to perform fuzzy matching and verification against an external authoritative index database interface using key citation fields so as to complete the missing identifier; a medical evidence network construction module, configured to map the structured citation metadata objects into a graph database, with each document as a node and each citation relationship as a directed edge, to build a citation topology network covering the full sample set, and to assign each node a globally unique identifier, wherein the node attributes comprise the discipline classification code, the evidence-based medicine evidence grade, the PICO element vector, and the disease codes and drug codes associated with the document, mapped on the MeSH tree hierarchy; an evidence-based medicine evidence weight assessment module, configured to extract the distribution of each citation in the text, analyze paragraph labels or coordinate information to determine the section in which the citation appears, assign differentiated position weight coefficients to citations in different sections according to a preset position weight table, identify the citation motive from the medical semantic features of the sentence containing the citation, superimpose the evidence-grade weight factor and the PICO element similarity, as a clinical relevance adjustment coefficient, on a base citation score, and calculate a comprehensive citation strength score for each directed edge to quantify the degree of clinical evidence support; a clinical evidence chain tracking and visualization module, configured to receive a tracking request for a clinical evidence-based query or drug safety tracking, to trace upstream recursively or track downstream diffusion along the directed edges starting from a target document node, to identify evidence propagation paths that cross disease lineages or pharmacological categories using the broader-narrower relationships of the MeSH tree hierarchy during path traversal, to screen key evidence paths whose citation strength score exceeds a preset threshold, to construct a multidimensional clinical evidence chain tracking model, and to render the topology graph dynamically with a layout algorithm, with node size reflecting document influence and edge thickness reflecting citation strength, providing the rendered result to clinicians, drug regulators, or clinical guideline developers; and a retraction impact assessment and drug safety signal early warning module, configured to maintain the citation topology network dynamically with an incremental update strategy, to monitor node states in each link in real time against preset medical citation logic rules, to trigger a link reconstruction mechanism and send an early warning to the user terminal when a state change of a core node is detected, to assess, when a document retraction event is detected, the potential impact of the retraction on downstream clinical research conclusions and related clinical guidelines based on the matching between the MeSH tree hierarchy and the PICO elements, and to generate a clinical risk assessment report listing the affected disease types, the drug names involved, and the relevant guideline numbers, and, when a newly added document citing an adverse drug reaction is detected, to trigger a drug safety signal early warning and notify clinical and drug regulatory users.
- 6. The medical literature citation link tracking system according to claim 5, wherein the clinical literature collection and evidence grade labeling module, while acquiring data, extracts fingerprints from each document's title and author list with a hash algorithm and removes duplicate records by comparing the fingerprints; the PICO element extraction and citation metadata parsing module has an author disambiguation function that builds an author feature vector space, jointly analyzes each author's affiliation, collaborator network, consistency of research field, and publication history to decide whether records refer to the same author entity, and thereby resolves node confusion caused by different authors sharing a name or one author using several names; the medical evidence network construction module stores disease- and drug-dimension citation relationships with a property graph model, in which each directed edge records the citation direction and stores the hierarchical depth of the citation and its physical coordinates in the text; and the graph database uses sharding, distributing document nodes of different medical disciplines to different storage nodes according to the top-level discipline classification of the MeSH tree hierarchy, and computes and stores the topological distance between nodes as each directed edge is created.
- 7. The system according to claim 5, wherein the retraction impact assessment and drug safety signal early warning module integrates an anomaly identification algorithm based on link topology features, which identifies risks of academic misconduct or clinical data fabrication in medicine and generates an anomaly diagnosis report by analyzing whether the citation links contain circular citations, excessive self-citation, or an abnormal surge in citation counts within a specific period; the module supports user-defined early warning rules, sets specific monitoring thresholds according to user requirements, and pushes an analysis report to the user when the cited strength of a core document falls by a preset proportion within a preset period; and the module contains a state monitor that periodically synchronizes the retraction or correction status of documents from external databases and, when a core node in a link is detected to have been retracted, automatically identifies the position of that node in all citation links, applies risk labels to the affected downstream documents, and assesses, based on the PICO elements and MeSH codes of the retracted document, the potential impact of the related clinical conclusions on current clinical guidelines or drug regimens, so as to generate a guideline-level risk propagation report.
- 8. The medical literature citation link tracking system according to claim 5, further comprising a clinical evidence-based decision support interface configured to receive a disease name, drug name, or clinical question entered by a user, automatically map the input to the corresponding MeSH subject headings and a PICO element framework, locate the core document links in the relevant field, extract the sample size, study design type, efficacy measures, adverse reaction rate, and evidence-based medicine evidence grade from the documents, and output an evidence summary ordered from the highest to the lowest evidence grade, providing tiered data support for clinical pathway development or research topic selection and justification; wherein the evidence-based medicine evidence weight assessment module uses an attention mechanism to capture the core verbs and adjectives in the citing sentence when calculating the motive adjustment coefficient, and adjusts the weight by identifying words that carry a medical academic stance; the clinical evidence chain tracking and visualization module uses a progressive loading strategy when rendering large networks, displaying the core links first and loading secondary branches dynamically as the user interacts, so that interaction remains smooth with large data volumes; and the clinical literature collection and evidence grade labeling module uses a distributed message queue for task scheduling when processing massive heterogeneous data, pushing the data acquired by all collection nodes to a unified cleaning center for normalization mapping.
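Claim 2 back-fills a missing DOI or PMID through three operations: fuzzy matching, similarity verification, and a threshold test. These can be illustrated with Python's standard `difflib.SequenceMatcher`. The candidate list stands in for results returned by an external index search, which this sketch does not call. The field names, the 0.9 threshold, and the DOI/PMID values are hypothetical placeholders.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def normalize(text: str) -> str:
    """Lower-case, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def complete_identifier(raw_citation: str, candidates: List[Dict[str, str]],
                        threshold: float = 0.9) -> Optional[Tuple[str, str]]:
    """Return (doi, pmid) of the best-matching candidate, or None if no
    candidate's similarity to the raw citation string exceeds `threshold`."""
    target = normalize(raw_citation)
    best, best_score = None, 0.0
    for cand in candidates:
        # Rebuild a citation-like string from the candidate's key fields.
        formatted = normalize(" ".join([cand["authors"], cand["title"],
                                        cand["journal"], cand["year"]]))
        score = SequenceMatcher(None, target, formatted).ratio()
        if score > best_score:
            best, best_score = cand, score
    if best is not None and best_score > threshold:
        return best["doi"], best["pmid"]
    return None


index_hits = [{"authors": "Smith J, Lee K",
               "title": "Hypothetical trial of drug X in heart failure",
               "journal": "J Hypothetical Med", "year": "2020",
               "doi": "10.0000/hypothetical.001", "pmid": "00000001"}]
raw = "Smith J, Lee K. Hypothetical trial of drug X in heart failure. J Hypothetical Med. 2020."
print(complete_identifier(raw, index_hits))  # ('10.0000/hypothetical.001', '00000001')
```

The strict `>` comparison follows the claim's wording ("exceeds a preset threshold").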
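The dictionary-based stance polarity analysis in claim 3 might look like the following. The cue-word sets are a tiny hypothetical stand-in for the preset medical citation stance dictionary. The majority-count rule is an assumption. The attention-based refinement in claim 8 is not modeled.

```python
import re

# Hypothetical cue words; a real stance dictionary would be far larger.
SUPPORT_CUES = {"consistent", "confirmed", "confirms", "supports", "corroborated", "agreement"}
QUESTION_CUES = {"contrary", "contradicts", "failed", "inconsistent", "questioned", "unlike"}


def citation_stance(sentence: str) -> str:
    """Classify the sentence containing a citation as support, question or neutral."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    support = len(tokens & SUPPORT_CUES)
    question = len(tokens & QUESTION_CUES)
    if support > question:
        return "support"
    if question > support:
        return "question"
    return "neutral"


print(citation_stance("Our findings are consistent with those of [12]."))  # support
print(citation_stance("Unlike [7], the trial failed to show a benefit."))  # question
print(citation_stance("Dosing followed the protocol described in [3]."))   # neutral
```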
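Claim 4 describes bidirectional tracking with heuristic expansion of strong branches. This can be sketched as a best-first search over a citation graph. Edges are keyed `(citing, cited)`:
- upstream tracing follows edges toward the cited sources;
- downstream diffusion follows them in reverse.

Ordering the frontier by citation strength is one possible heuristic and is assumed here. The `threshold` and `max_depth` values are hypothetical parameters. The claim's time-window filter is omitted.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (citing document, cited document)


def trace(edges: Dict[Edge, float], start: str, direction: str = "up",
          threshold: float = 0.5, max_depth: int = 3) -> List[Tuple[str, str, float]]:
    """Return traversed edges (from, to, strength) in best-first order."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    adjacency = defaultdict(list)
    for (citing, cited), strength in edges.items():
        if direction == "up":       # trace knowledge sources
            adjacency[citing].append((cited, strength))
        else:                       # follow later citing work
            adjacency[cited].append((citing, strength))

    visited = {start}
    result = []
    # Max-heap on strength via negated keys: (-strength, depth, source, target).
    frontier = [(-s, 1, start, nbr) for nbr, s in adjacency[start] if s > threshold]
    heapq.heapify(frontier)
    while frontier:
        neg_s, depth, src, node = heapq.heappop(frontier)
        if node in visited:
            continue
        visited.add(node)
        result.append((src, node, -neg_s))
        if depth < max_depth:
            for nbr, s in adjacency[node]:
                if s > threshold and nbr not in visited:
                    heapq.heappush(frontier, (-s, depth + 1, node, nbr))
    return result


edges = {("C", "B"): 0.9, ("C", "A"): 0.4, ("B", "A"): 0.8, ("D", "C"): 0.7}
print(trace(edges, "C", "up"))    # [('C', 'B', 0.9), ('B', 'A', 0.8)]
print(trace(edges, "C", "down"))  # [('C', 'D', 0.7)]
```

The weak direct edge C to A (0.4) is never expanded. A is still reached through the strong path via B, which is the intended effect of pruning by score.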
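Claim 4 marks cross-discipline intersections by the level distance through the nearest common ancestor. That distance can be computed directly from MeSH tree numbers, whose dotted segments encode the path from the top-level category. The tree numbers below are illustrative strings in that format, not looked-up descriptors. The cutoff `max_distance=4` is a hypothetical parameter.

```python
from typing import List


def tree_path(tree_number: str) -> List[str]:
    """Ancestors of a MeSH tree number, e.g. 'C14.280.647' ->
    ['C', 'C14', 'C14.280', 'C14.280.647'] (category letter first)."""
    parts = tree_number.split(".")
    return [parts[0][0]] + [".".join(parts[: i + 1]) for i in range(len(parts))]


def lca_distance(a: str, b: str) -> int:
    """Tree edges from `a` up to the nearest common ancestor and down to `b`.
    Codes in different top-level categories meet at a virtual root."""
    pa, pb = tree_path(a), tree_path(b)
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def is_cross_discipline(a: str, b: str, max_distance: int = 4) -> bool:
    """Flag a citation whose endpoints sit far apart in the MeSH tree."""
    return lca_distance(a, b) > max_distance


print(lca_distance("C14.280.647", "C14.280.484"))         # 2
print(lca_distance("C14.280.647", "D27.505"))             # 7
print(is_cross_discipline("C14.280.647", "D27.505"))      # True
```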
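The retraction impact step in claims 5 and 7 starts by finding every downstream document that depends on the retracted one, directly or transitively. A breadth-first search over a hypothetical `cited_by` mapping (cited document to the documents citing it) gives each affected document its hop distance. The PICO/MeSH-based assessment of guideline impact is not modeled here, and the document IDs are placeholders.

```python
from collections import deque
from typing import Dict, List, Optional


def retraction_impact(cited_by: Dict[str, List[str]], retracted: str,
                      max_hops: Optional[int] = None) -> Dict[str, int]:
    """Map each downstream document to its hop distance from `retracted`."""
    hops: Dict[str, int] = {}
    seen = {retracted}
    queue = deque([(retracted, 0)])
    while queue:
        node, dist = queue.popleft()
        if max_hops is not None and dist >= max_hops:
            continue  # do not expand beyond the hop limit
        for citer in cited_by.get(node, []):
            if citer not in seen:
                seen.add(citer)
                hops[citer] = dist + 1
                queue.append((citer, dist + 1))
    return hops


cited_by = {"R": ["A", "B"], "A": ["GUIDELINE-1"], "B": ["A"]}
print(retraction_impact(cited_by, "R"))  # {'A': 1, 'B': 1, 'GUIDELINE-1': 2}
```

Breadth-first order gives each document its shortest dependency distance. This is a natural basis for grading risk labels, with closer documents marked as more exposed.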
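Claim 6 covers hash-fingerprint deduplication and sharding by top-level MeSH category. A minimal sketch follows. The claim only says "a hash algorithm", so SHA-256 over a normalized title and ordered author list is an assumed choice. The letter-modulo shard mapping is also a hypothetical rule.

```python
import hashlib
import re
from typing import Dict, List


def _norm(text: str) -> str:
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def fingerprint(title: str, authors: List[str]) -> str:
    """SHA-256 fingerprint of the normalized title and ordered author list."""
    key = _norm(title) + "|" + ";".join(_norm(a) for a in authors)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def deduplicate(records: List[Dict]) -> List[Dict]:
    """Keep the first record for each fingerprint."""
    seen, unique = set(), []
    for rec in records:
        fp = fingerprint(rec["title"], rec["authors"])
        if fp not in seen:
            seen.add(fp)
            unique.append(rec)
    return unique


def shard_for(mesh_tree_number: str, n_shards: int) -> int:
    """Place a node by its top-level MeSH category letter (hypothetical mapping)."""
    return (ord(mesh_tree_number[0].upper()) - ord("A")) % n_shards


records = [
    {"title": "Aspirin and Stroke Risk.", "authors": ["Wang Z", "Chen J"]},
    {"title": "aspirin and stroke risk", "authors": ["Wang Z.", "Chen J"]},
    {"title": "Statins after myocardial infarction", "authors": ["Li X"]},
]
print(len(deduplicate(records)))                              # 2
print(shard_for("C14.280.647", 4), shard_for("D27.505", 4))  # 2 3
```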
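Two of the anomaly signals in claim 7 lend themselves to a short sketch:
- circular citations, found here by depth-first search with node coloring;
- excessive self-citation, measured as the share of a paper's references with at least one author in common.

The data are placeholders, and any flagging threshold would be a user-defined rule. Detecting abnormal surges in citation counts is not shown. The recursive DFS is adequate for a sketch but would need an iterative rewrite for very deep graphs.

```python
from typing import Dict, List, Optional, Sequence


def find_citation_cycle(refs: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one citation cycle (e.g. ['A', 'B', 'A']) or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[str, int] = {}
    path: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in refs.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:                      # back edge: cycle found
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(refs):
        if color.get(start, WHITE) == WHITE:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None


def self_citation_ratio(paper: str, refs: Dict[str, List[str]],
                        authors_of: Dict[str, Sequence[str]]) -> float:
    """Share of a paper's references that share at least one author with it."""
    cited = refs.get(paper, [])
    if not cited:
        return 0.0
    own = set(authors_of.get(paper, ()))
    shared = sum(1 for r in cited if own & set(authors_of.get(r, ())))
    return shared / len(cited)


print(find_citation_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # ['A', 'B', 'C', 'A']
authors = {"P": ["Li"], "R1": ["Li"], "R2": ["Wu"], "R3": ["Li", "Wu"], "R4": ["Zhao"]}
print(self_citation_ratio("P", {"P": ["R1", "R2", "R3", "R4"]}, authors))  # 0.5
```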
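Claim 7's user-defined rule fires when a core document's cited strength falls by a preset proportion within a preset period. It could be checked as follows. The series layout and the comparison against the peak inside the window are assumptions about how "falls by a preset proportion" is measured.

```python
from typing import List, Tuple


def strength_drop_alert(history: List[Tuple[float, float]], window: float,
                        drop_ratio: float) -> bool:
    """history: (timestamp, citation strength) pairs sorted by time.
    Alert when the latest strength is at least `drop_ratio` below the peak
    observed within the last `window` time units."""
    if not history:
        return False
    t_now, s_now = history[-1]
    peak = max(s for t, s in history if t_now - t <= window)
    return peak > 0 and (peak - s_now) / peak >= drop_ratio


history = [(0, 1.0), (10, 0.9), (20, 0.5)]
print(strength_drop_alert(history, window=15, drop_ratio=0.4))  # True
print(strength_drop_alert(history, window=15, drop_ratio=0.5))  # False
```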
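The evidence summary in claim 8 is ordered from the highest to the lowest evidence grade. The sort can be sketched with a rank table. The grade hierarchy below is a common evidence-pyramid ordering assumed for illustration. Breaking ties by larger sample size is a hypothetical addition.

```python
from typing import Dict, List

# Hypothetical evidence hierarchy, strongest first.
GRADE_ORDER = ["systematic_review", "rct", "cohort", "case_control",
               "case_series", "case_report", "expert_opinion"]
GRADE_RANK = {grade: i for i, grade in enumerate(GRADE_ORDER)}


def evidence_summary(docs: List[Dict]) -> List[Dict]:
    """Sort documents from highest to lowest evidence grade; within a grade,
    larger studies first. Unknown grades go last."""
    return sorted(docs, key=lambda d: (GRADE_RANK.get(d["grade"], len(GRADE_ORDER)),
                                       -d.get("sample_size", 0)))


docs = [
    {"id": "d1", "grade": "case_report", "sample_size": 1},
    {"id": "d2", "grade": "rct", "sample_size": 300},
    {"id": "d3", "grade": "systematic_review", "sample_size": 5000},
    {"id": "d4", "grade": "rct", "sample_size": 1200},
]
print([d["id"] for d in evidence_summary(docs)])  # ['d3', 'd4', 'd2', 'd1']
```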
Description
Medical literature citation link tracking method and system
Technical Field
The invention belongs to the technical field of medical health information processing, and in particular relates to a medical literature citation link tracking method and system.
Background
In current medical research, deepening academic inquiry and the rapid development of medical technology have led to explosive growth in the volume of medical literature, forming a global citation network with a complex structure. Citation relationships among medical documents record the evolution of scientific discovery. They are also an important basis for evaluating research value, tracing knowledge sources, and safeguarding academic integrity. Linking and mining massive literature data can reveal the frontiers of a discipline and assist clinical evidence-based decision-making. This is valuable in scenarios such as drug efficacy evaluation, clinical guideline updates, and adverse reaction signal detection. Medical literature citation link tracking methods and systems are a core technical means of supporting academic evaluation and knowledge tracing. Their basic principle is to build a link model that reflects the logical lineage among documents by accurately identifying and associating citation metadata. The aim is to extend analysis from a single document node to a multidimensional citation network, giving researchers a clear, visual, time-aware map of knowledge propagation paths.
Traditional medical literature tracking techniques have clear shortcomings. First, traditional schemes focus on specific negative scenarios such as retraction or misleading publications. Their tracking logic depends heavily on the time point of a specific retraction, so their scope is too narrow to cover ordinary, widespread medical citation behavior systematically. Second, some techniques rely on content analysis based on semantic similarity, that is, on the diffusion of influence at the content level rather than on explicit citation relationships in the strict sense. They also lack deep structured parsing of key metadata such as reference lists, digital object identifiers (DOIs), and medical literature index numbers (PMIDs). As a result, their tracking results mix in many related but uncited documents, which seriously compromises the accuracy and domain specificity of citation links. Third, when traditional systems process massive, dynamically updated literature data, they lack an efficient link reconstruction mechanism and multi-level visual expression, so they cannot reflect the full evolution of medical knowledge accurately and in real time. In addition, the prior art generally lacks awareness of evidence-based medicine evidence levels. It cannot distinguish the different value of systematic reviews and case reports in knowledge propagation. Nor can it use PICO element matching to assess the clinical relevance of citation relationships, so its tracking results are difficult to apply directly to clinical evidence-based decisions.
Because of these problems, existing literature tracking methods cannot meet medical research's pressing need for high-quality, high-precision, full-chain knowledge tracing. An intelligent tracking scheme that integrates evidence-based medicine methodology with deep structured processing is therefore urgently needed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a medical literature citation link tracking method and system. These address the large scale of medical citation networks, heterogeneous data sources, incomplete parsing of citation metadata, and insufficient link tracking efficiency. The rapid growth of medical literature output causes several technical problems: complex citation network structures, traditional tracking techniques that depend on specific negative scenarios, insufficient capability for deep structured parsing of metadata, and inefficient link reconstruction. To address them, the method and system build a medical literature knowledge graph containing multidimensional metadata and combine it with a high-precision structured parsing engine and a citation strength evaluation model. This achieves accurate identification, deep association, and multi-level visual tracking of medical literature citation links, providing systematic evidence chain support for clinical evidence-based decision-making, medicin