CN-121980004-A - AI-driven intelligent retrieval, analysis and traceability system for academic information in the medical field

Abstract

The invention relates to an AI-driven system for intelligent retrieval, analysis and traceability of academic information in the medical field, and belongs to the technical field of intelligent processing of medical academic information. The system comprises an academic tracking collaboration unit, an information format extraction unit, a depth interpretation rasterization unit and an academic traceability verification unit. The academic tracking collaboration unit builds an information priority quantization model based on a multi-source medical information priority ordering algorithm. The information format extraction unit extracts the data format characteristics of multi-source medical information through a dynamic format adaptation iteration algorithm and a hierarchical deviation early warning mechanism. The depth interpretation rasterization unit classifies the information by medical sub-field and performs multi-dimensional analysis using an academic granularity weighted aggregation algorithm, and outputs academic granularity analysis results. The academic traceability verification unit receives full-link data and generates a verification report through multi-source comparison, intelligent outlier identification and a traceability link visualization mechanism. By deeply integrating AI technology with the characteristics of the medical field, the system improves the retrieval efficiency, analysis precision and traceability credibility of medical academic information, and provides efficient data support for medical research.

Inventors

CHEN HONGYI
LU LE
ZHENG SISI

Assignees

上海医望网络科技有限公司

Dates

Publication Date: 20260505
Application Date: 20251216

Claims (12)

1. An AI-driven system for intelligent retrieval, analysis and traceability of academic information in the medical field, characterized by comprising an academic tracking collaboration unit, an information format extraction unit, a depth interpretation rasterization unit and an academic traceability verification unit; the academic tracking collaboration unit acquires the release characteristics of medical information sources based on a multi-source medical information priority ordering algorithm and constructs an information priority quantization model; the information format extraction unit establishes a hierarchical deviation early warning mechanism based on a dynamic format adaptation iterative algorithm, monitors format deviation values and assigns them weights, generates a rule iteration scheme when a deviation value reaches a preset field matching degree threshold, and performs a rule validity check against a preset historical medical academic data sample library to generate the data format characteristics of the multi-source medical information; the depth interpretation rasterization unit classifies the multi-source medical information by medical sub-field using an academic granularity weighted aggregation algorithm, starts the corresponding multi-dimensional rasterization analysis for each sub-field, calculates, in the weighted aggregation step, the historical analysis accuracy and term coverage rate of a large language model for the current sub-field to generate a professional degree weight, introduces a medical evidence grade ordering mechanism when handling contradictory conclusions, and outputs academic granularity analysis results; and the academic traceability verification unit receives the full-link data, performs multi-source comparison on the academic granularity analysis results based on an AI-driven medical academic traceability verification algorithm, and generates an academic traceability verification report comprising a traceability link map, core data verification results and outliers.
2. The system of claim 1, wherein the process by which the academic tracking collaboration unit generates a preliminary screening dataset specifically comprises: associating a preset medical term lexicon with an authoritative terminology database of the medical field; when the authoritative terminology database issues new terms or revises existing term expressions, triggering a lexicon synchronization instruction, importing the updated content into the preset medical term lexicon and completing field mapping; constructing a medical semantic similarity matching model based on a pre-trained medical language model; inputting a medical corpus comprising disease alias correspondences, mapping rules between generic and trade drug names, and cases of synonymous expressions of clinical indicators; and generating, through iterative training, concept association logic specific to the medical field, so as to generate the preliminary screening dataset.
3. The system of claim 1, wherein the information priority quantization model is specifically constructed as follows: in the initial construction stage, a base value is set for the impact factor weight in light of the differences in professional attributes of journals across medical sub-fields; a medical information timeliness grading logic is introduced for the release time attenuation coefficient; and, for the matching degree with the user's field of interest, a user field portrait is constructed from multidimensional data by collecting the user's click, bookmark and download behaviors on screening results, associating the topic clustering results of the user's historical search keywords and the research direction labels of the user's institution, and calculating the fit between the user's field and the literature topic through semantic matching.
4. The system according to claim 1, wherein the hierarchical deviation early warning mechanism is implemented by: first sorting out the functional attributes of the fields in medical academic information; classifying fields related to verifying the authenticity of academic data as core fields and auxiliary descriptive information as common fields; setting weights according to the degree to which each field influences academic analysis; and, in the subsequent format monitoring process, issuing real-time early warnings for format deviations of core fields and summarized early warnings, at a preset period, for deviations of common fields.
5. The system according to claim 1, wherein the information format extraction unit performs the rule validity check against the preset historical medical academic data sample library by: first selecting, from the library, document samples covering different publication periods, different academic directions and different journal tiers; applying the iterated format extraction rules to the document samples; calculating the extraction accuracy and field integrity achieved by the format extraction rules for each field; and, if the extraction accuracy and field integrity do not meet the preset academic analysis requirements, returning to adjust the iteration scheme.
6. The system according to claim 1, wherein the depth interpretation rasterization unit classifies the multi-source medical information by medical sub-field by: calling a medical subject headings list and matching keywords in a document against the standard terms in the list to preliminarily determine the primary medical field to which the document belongs; running a keyword clustering algorithm on the research direction descriptions, experimental method keywords and research object information in the full text of the document to further divide the primary medical field into secondary sub-fields; and associating the academic research database of each secondary sub-field, extracting its research directions and common experimental methods, and generating a sub-field feature list.
7. The system according to claim 1, wherein the depth interpretation rasterization unit calculates the analysis capability of the large language model for the current sub-field by: selecting, from document analysis verification reports issued by academic communities, document analysis cases that relate to the current sub-field and have been peer-reviewed; comparing the large language model's historical outputs for those cases with the standard conclusions in the verification reports to generate the historical analysis accuracy; constructing a specialized term library for the current sub-field; extracting the large language model's analysis output text for documents of the current sub-field and computing the ratio of the number of specialized terms contained in the text to the total number of terms in the specialized term library to generate the term coverage; and combining the historical analysis accuracy and the term coverage to generate the professional degree weight of the large language model in the current sub-field.
8. The system according to claim 1, wherein the medical evidence ranking mechanism introduced by the depth interpretation rasterization unit is implemented by first identifying a study type in a document, generating an evidence type by extracting a study design description in the document, and then assigning corresponding ranking attributes to different types of evidence based on the medical evidence ranking mechanism.
9. The system of claim 1, wherein the depth interpretation rasterization unit outputs the academic granularity analysis result by dividing it into a base layer and a core layer according to the importance of the information, wherein the base layer information is obtained by directly extracting the title, author names, affiliations, publication time and journal name from the format-extracted document data; the core layer information is extracted by the multi-dimensional rasterization analysis mechanism, which breaks down the experimental design, statistical analysis data and clinical application recommendations in the document; and the core layer information is labeled with a credibility rating according to factors including the academic influence of the source journal, the compliance of the research methods and the integrity of the data.
10. The system of claim 1, wherein the academic traceability verification unit manages all-link data by recording operation process data including an operation execution body, operation instruction content and operation execution time stamp in real time when the unit executes an operation, and then establishing an associated index for each link data according to a data flow sequence, wherein the associated index includes a unique identifier of the previous link data, a generation basis of the current link data and a time node of data transfer.
11. The system of claim 1, wherein the academic traceability verification unit integrates a traceability link visualization mechanism, which specifically comprises: constructing a dynamic medical academic traceability knowledge graph based on the full-link data; converting unit operation data into graph nodes and associated edges, wherein the graph nodes comprise data acquisition source identifiers, screening rule IDs, format extraction parameter sets and model call records, and the associated edges mark data flow directions and conversion relations; and distinguishing academic elements from operation process data using distinct visual identifiers.
12. The system of claim 1, wherein the process by which the academic traceability verification unit finds outliers specifically comprises: in the multi-source comparison step, dividing the comparison dimensions by academic information type to generate data outliers; for conclusion-statement information, comparing the conclusion tendencies of different documents on the same research question through semantic similarity analysis to generate conclusion-contradiction outliers; for associated-information content, checking the consistency of the information across platforms to generate associated-information outliers; calculating an outlier confidence for each outlier through an outlier confidence evaluation mechanism; and including the outliers whose confidence reaches a preset threshold in the academic traceability verification report.
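
The quantities named in claim 7 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The claim does not say how conclusions are compared, how terms are detected in text, or how the two scores are combined, so the exact-match comparison, the substring term matching, the `alpha` blending parameter and all names below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisCase:
    """One peer-reviewed document analysis case (hypothetical structure)."""
    model_conclusion: str      # the model's historical output for the case
    standard_conclusion: str   # the conclusion stated in the verification report


def historical_accuracy(cases):
    """Share of cases whose model conclusion matches the standard conclusion.

    Assumption: a case-insensitive exact match stands in for whatever
    comparison the real system performs.
    """
    if not cases:
        return 0.0
    hits = sum(
        c.model_conclusion.strip().lower() == c.standard_conclusion.strip().lower()
        for c in cases
    )
    return hits / len(cases)


def term_coverage(output_text, term_library):
    """Ratio of library terms found in the output text to all library terms.

    Assumption: plain substring matching is used to detect a term.
    """
    if not term_library:
        return 0.0
    text = output_text.lower()
    found = {term for term in term_library if term.lower() in text}
    return len(found) / len(term_library)


def professional_weight(accuracy, coverage, alpha=0.5):
    """Blend accuracy and coverage into one weight.

    Assumption: a convex combination; the claim only says the two
    values are combined.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * accuracy + (1.0 - alpha) * coverage
```

With a four-term library such as `{"stent", "restenosis", "angioplasty", "thrombosis"}`, an output text that mentions only the first two terms has a coverage of 0.5. Combined with an accuracy of 0.75 under the default `alpha`, the resulting weight is 0.625.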
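
The tiered warning behavior of claim 4 can be sketched in the same hedged way: core-field deviations are reported immediately, and common-field deviations are queued for a periodic summary. The field names echo the examples given in the description (clinical trial registration number, DOI, statistical method, author correspondence address). The weight values, the `threshold` and the `DeviationMonitor` class are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass, field

# Core fields per the description's examples; the identifiers are illustrative.
CORE_FIELDS = {"trial_registration_number", "doi", "statistical_method"}


@dataclass
class DeviationMonitor:
    weights: dict      # field name -> weight (assumed values)
    threshold: float   # assumed cut-off on the weighted deviation
    pending: list = field(default_factory=list)

    def report(self, field_name, deviation):
        """Return an immediate warning for a core field, otherwise queue it."""
        weighted = self.weights.get(field_name, 1.0) * deviation
        if weighted < self.threshold:
            return None  # deviation too small to warn about
        if field_name in CORE_FIELDS:
            return f"REAL-TIME WARNING: {field_name} deviation {weighted:.2f}"
        self.pending.append((field_name, weighted))
        return None

    def flush_summary(self):
        """Called once per preset period: return and clear queued warnings."""
        summary, self.pending = self.pending, []
        return summary
```

In this sketch a scheduler (not shown) would call `flush_summary()` once per preset period. Whether the real system applies a threshold before warning is not stated in claim 4, so that check is an assumption.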

Description

AI-driven intelligent retrieval, analysis and traceability system for academic information in the medical field
Technical Field
The invention belongs to the technical field of intelligent processing of medical academic information, and particularly relates to an AI-driven intelligent retrieval, analysis and traceability system for academic information in the medical field.
Background
With the globalization and interdisciplinary integration of medical research, academic information in the medical field is growing explosively. Its sources span multiple channels, including Chinese and foreign-language core journals, clinical trial registration platforms, medical conference proceedings and research reports of scientific research institutions, and the data formats differ markedly, which makes it a serious challenge for medical researchers to obtain trustworthy information efficiently.
Existing retrieval systems mostly rely on keyword matching or simple field filtering and are not sufficiently adapted to the characteristics of the medical field. For example, they cannot effectively recognize the correspondence between disease aliases (such as 'myocardial infarction' and 'acute myocardial infarction') or between generic and trade drug names, so the proportion of redundant information is high. Meanwhile, information is mostly prioritized by journal impact factor alone, without taking into account the differences in timeliness across medical sub-fields (for example, infectious disease research needs recent literature, whereas chronic disease research must also consider long-standing classic results) or the match with the user's research direction, so high-value information is buried.
Existing format extraction tools mostly use fixed rule templates. When a medical information source (such as a journal website or a database) updates its layout, the extraction rules must be reconfigured manually, so the response lags considerably. These tools also do not distinguish the core fields in medical academic information (such as the clinical trial registration number, DOI and statistical method) from common auxiliary fields (such as the author's correspondence address), which leads to omissions or high error rates in core field extraction and directly undermines the reliability of subsequent analysis.
Existing analysis systems mostly stop at topic classification and keyword frequency statistics when processing medical academic information, and do not refine the analysis dimensions by medical sub-field (such as oncology targeted therapy or cardiovascular interventional techniques). In model application, they process medical data directly with a general-purpose large language model, without optimizing term coverage and analysis accuracy for the sub-field, and, when faced with contradictory conclusions from multiple models, produce output that can hardly meet the need of medical research for a full-chain interpretation spanning experimental design, statistical data and clinical significance.
Existing systems store only the final analysis results or part of the intermediate data and do not establish full-link data association across acquisition, screening, format extraction and analysis, so the original data source and the model parameter adjustment records behind a given conclusion cannot be traced. Outlier identification mostly relies on checks along a single data dimension (such as format error detection) and does not combine medical common knowledge (such as the reasonable range of clinical indicators) with multi-source cross comparison (such as the consistency of a clinical trial's information across platforms), so the authenticity of academic information is difficult to verify and the risk of citing erroneous data in medical research increases.
These problems force medical researchers to spend a large amount of time screening information, checking data authenticity and reconciling contradictory conclusions, which seriously eats into core research time and constrains research efficiency and the speed of translating results. Therefore, developing an academic information processing system that integrates AI technology, is deeply adapted to the characteristics of the medical field, and offers accurate retrieval, intelligent format adaptation, professional in-depth analysis and full-link traceability has become an urgent need in current medical research informatization.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides an AI-driven intelligent retrieval, analysis and traceability system for academic information in the medical field.
The aim of the invention can be achieved by the following technical scheme: The medical field academic information intelligent retrieval analysis and tracing system based on AI driving is cha