
CN-121980007-A - Multi-modal evidence-based medical data extraction method and system for systematic reviews

CN121980007A

Abstract

The invention relates to the technical fields of natural language processing, multi-modal question answering, medical literature understanding, and systematic reviews, and in particular to a multi-modal evidence-based medical data extraction method and system for systematic reviews, comprising the following steps: (1) constructing a multi-modal evidence-based medical dataset, comprising data collection, QA pair generation, QA pair verification, and dataset division; and (2) extracting and reasoning over multi-modal medical document data, comprising input reception, document reconstruction, multi-modal reasoning, and answer output. The system implementing the method comprises a data acquisition module, a dataset construction module, a document reconstruction module, a MoE reasoning module, a training module, and a result output module. The medical document data extraction method is efficient and accurate, has strong practicality and a wide range of applications, and has broad application prospects.

Inventors

  • ZHANG XIAOBO
  • FENG RUI
  • WANG YINGWEN
  • WANG QING
  • JI CHANGKAI
  • FU WEIJIA
  • WANG LIBO

Assignees

  • Children's Hospital of Fudan University (复旦大学附属儿科医院)

Dates

Publication Date
2026-05-05
Application Date
2025-12-30

Claims (10)

  1. A multi-modal evidence-based medical data extraction method for systematic reviews, characterized by comprising the following steps:
Step (1) Multi-modal evidence-based medical dataset construction: a. Extracting expert-verified structured research characteristics from Cochrane systematic reviews, the structured research characteristics including study type, inclusion and exclusion criteria of study subjects, demographic information, interventions, and clinical outcomes; retrieving the studies included in each systematic review from PubMed Central (PMC) by their PMCID, and extracting multi-modal data content from documents with full-text access, the multi-modal content comprising text, tables, and image-format tables, wherein structured HTML tables are converted to CSV files and image-format tables retain their original image format; b. QA pair generation: converting the structured research characteristics extracted in step (1) a into at least one multiple-choice QA pair using the GPT-4o model, each QA pair comprising 1 correct option and 3 plausible distractor options; the QA pairs are divided into evidence-supported QA (ESQA), whose answers have explicit textual or tabular evidence in the multi-modal content, and unreported-information QA (UIQA), whose answers are "not reported" or "not specified"; c. QA pair verification: applying a dual-LLM verification mechanism in which a LLaMA model and a Qwen-2.5 model independently answer each generated QA pair with the structured research characteristics as context, and retaining as high-confidence QA pairs those for which both models' answers agree and are correct; d. Dividing the verified QA pairs and the corresponding multi-modal medical document data into a training set and a test set at a 7:3 ratio to form the multi-modal evidence-based medical dataset EviMMQA;
Step (2) Multi-modal evidence-based medical data extraction and reasoning: a. Receiving the multi-modal medical document data to be processed and a user query, the user query being an evidence-extraction question relevant to systematic review; b. Document reconstruction: processing the multi-modal medical document data through a query-aware document reconstruction module, extracting the textual content of tables and charts (CSV tables use their original content directly; image-format items are extracted via OCR), splitting plain text into chunks according to a maximum-token limit, and treating each table and chart as an independent unit associated with its caption and related text; c. Multi-modal reasoning: processing the reconstructed document and user query through a mixture-of-experts (MoE) reasoning module, converting text fragments and the query into text token embeddings, encoding tables and table images into visual tokens mapped into the text-token representation space, and concatenating them into an input sequence fed to an LLM comprising N Transformer layers, each Transformer layer comprising a multi-head self-attention (MSA) module, a MoE module, and a layer normalization (LN) module; the MoE module comprises 4 expert MLPs, a trainable router activates the Top-2 experts, and their outputs are combined by weighted summation; d. Answer output: outputting the evidence-extraction result corresponding to the user query after processing by the Transformer layers; for UIQA-type queries, outputting "not reported" or the corresponding option.
  2. The multi-modal evidence-based medical data extraction method of claim 1, wherein the parameters of the GPT-4o model in step (1) b are set to temperature = 0, top-p = 0.95, and maximum token number = 1024, and the QA pairs are classified by content into Method, Participant, Intervention, Outcome, and Context categories.
  3. The method of claim 1, wherein the shared text encoder in step (2) b is selected from the group consisting of PubMedBERT, Sentence-BERT, text-embedding-ada-002, and the LLaVA embedding layer, and the preset template comprises the text blocks, table/chart images, captions, and extracted text, forming a multi-modal triplet representation of "image-caption-extracted text".
  4. The multi-modal evidence-based medical data extraction method of claim 1, wherein the training strategy of the MoE reasoning module in step (2) c is two-stage training: in the first stage, the pretrained LLM and the multi-modal projector are fine-tuned to align the text and visual modalities; in the second stage, the fine-tuned MLP layer weights are copied to initialize the MoE experts, the LLM and the visual encoder are frozen, and the MoE experts and the router are jointly trained.
  5. The multi-modal evidence-based medical data extraction method of claim 1, wherein the loss function of the MoE module in step (2) c is

        L_total = L_CLM + λ · L_MoE

     wherein L_CLM is the causal language modeling loss, L_MoE is the MoE load-balancing loss, λ = 0.1, and L_MoE is computed as

        L_MoE = (1/L) · Σ_{l=1..L} M · Σ_{e=1..M} f_e^(l) · P_e^(l),   P_e^(l) = (1/T) · Σ_{t=1..T} g_{t,e}^(l)

     where L is the number of MoE layers, M is the number of experts, T is the sequence length, f_e^(l) is the fraction of tokens in layer l routed to expert e, and g_{t,e}^(l) is the gating score assigned by the t-th token to the e-th expert.
  6. The multi-modal evidence-based medical data extraction method of claim 1, wherein the OCR tool in step (2) is PaddleOCR, the visual encoder in step (2) is a pretrained SigLIP model, the multi-modal projector is a two-layer MLP, and the MoE layers within the Transformer layers adopt a sparse placement strategy, alternating with standard dense layers.
  7. A multi-modal evidence-based medical data extraction system for systematic reviews, characterized by comprising a data acquisition module, a dataset construction module, a document reconstruction module, a MoE reasoning module, a training module, and a result output module, the modules being sequentially communicatively connected: (1) the data acquisition module is used for extracting structured research characteristics from Cochrane systematic reviews, retrieving the included studies from PubMed Central (PMC) by their PMCID, obtaining the corresponding document text, extracting multi-modal data content such as text, tables, and table images, and converting and storing it by format; (2) the dataset construction module is used for invoking the GPT-4o model to generate QA pairs, screening the QA pairs through a dual-LLM verification mechanism and manual review, and dividing the QA pairs and the multi-modal full-text document data into a training set and a test set to form the multi-modal evidence-based medical dataset; (3) the document reconstruction module is used for receiving the multi-modal evidence-based medical data to be processed and the user query, performing text extraction and chunking, relevance retrieval, and structural integration, and outputting a reconstructed document; (4) the MoE reasoning module is used for receiving the reconstructed document and the user query, performing modal alignment, sequence construction, and MoE reasoning, and outputting a preliminary evidence-extraction result; (5) the training module is used for training the MoE reasoning module with a two-stage training strategy, optimizing the modal alignment and the expert-routing accuracy; (6) the result output module is used for format-normalizing the preliminary result of the MoE reasoning module and outputting the final evidence-extraction result corresponding to the user query.
  8. The multi-modal evidence-based medical data extraction system of claim 7, further comprising a storage module for storing the full text of the medical documents, the multi-modal evidence-based medical dataset, the trained model parameters, the reconstructed documents, and the data extraction results, wherein the storage module uses a distributed storage architecture and supports classified storage of text, CSV files, and image-format data.
  9. The multi-modal evidence-based medical data extraction system of claim 7, wherein the hardware environment of the system includes a GPU cluster for model training, with the 0.5B-parameter model trained on four NVIDIA RTX 3090 GPUs and the 7B-parameter model trained on four NVIDIA RTX H GPUs, and computing nodes for inference deployment supporting parallel computation over context windows of at least 6144 tokens; the software environment of the system includes the LLaMA-Factory toolkit, the LMDeploy toolkit, the DeepSpeed ZeRO-2 optimization framework, and the PaddleOCR tool.
  10. The multi-modal evidence-based medical data extraction system of claim 7, wherein the MoE reasoning module comprises N = 24 Transformer layers, the MoE module comprises 4 expert MLPs, the hidden layer dimension of each expert MLP is consistent with the FFN layer of the LLM, and the training parameters of the training module are: LoRA fine-tuning learning rate 2×10^-5, cosine decay scheduling, batch size = 1, and maximum token number per text block = 128.
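The Top-2 routing over 4 expert MLPs (claims 1 and 10) and the load-balancing loss (claim 5) can be sketched as follows. This is a minimal NumPy illustration, not the patent's implementation: the dimensions, initialization, ReLU activation, and the `load_balancing_loss` helper are illustrative assumptions; only the expert count, Top-2 activation, weighted summation, and λ = 0.1 come from the claims.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Sparse MoE layer: 4 expert MLPs, a trainable router, Top-2 activation."""
    def __init__(self, d_model=64, d_ffn=256, n_experts=4, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.n_experts, self.top_k = n_experts, top_k
        # Router: linear map from token representation to expert logits.
        self.w_router = rng.standard_normal((d_model, n_experts)) * 0.02
        # Each expert is a 2-layer MLP (hidden size matching the FFN, per claim 10).
        self.experts = [
            (rng.standard_normal((d_model, d_ffn)) * 0.02,
             rng.standard_normal((d_ffn, d_model)) * 0.02)
            for _ in range(n_experts)
        ]

    def __call__(self, tokens):
        # tokens: (T, d_model); gates[t, e] is token t's gating score for expert e.
        gates = softmax(tokens @ self.w_router)            # (T, E)
        topk = np.argsort(gates, axis=1)[:, -self.top_k:]  # Top-2 experts per token
        out = np.zeros_like(tokens)
        for t in range(tokens.shape[0]):
            w = gates[t, topk[t]]
            w = w / w.sum()  # renormalize the two selected gate weights
            for e, we in zip(topk[t], w):
                w1, w2 = self.experts[e]
                out[t] += we * (np.maximum(tokens[t] @ w1, 0.0) @ w2)
        return out, gates, topk

def load_balancing_loss(gates, topk, n_experts):
    """Standard auxiliary loss of the form M * sum_e f_e * P_e, where f_e is the
    fraction of routing slots assigned to expert e and P_e its mean gating score."""
    f = np.bincount(topk.ravel(), minlength=n_experts) / topk.size
    P = gates.mean(axis=0)
    return n_experts * float((f * P).sum())

moe = MoELayer()
x = np.random.default_rng(1).standard_normal((10, 64))
y, gates, topk = moe(x)
l_moe = load_balancing_loss(gates, topk, moe.n_experts)
# Per claim 5, the total loss would be L_CLM + 0.1 * l_moe.
```

A perfectly balanced router gives f_e = P_e = 1/M for every expert, making the auxiliary loss equal to 1; routing collapse onto one expert pushes it toward M, which is why minimizing it spreads tokens across experts.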
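The document reconstruction step (claim 1, step (2) b) splits plain text into chunks under a maximum-token limit while keeping each table as an independent unit associated with its caption; claim 10 sets the per-chunk limit to 128 tokens. A minimal sketch, with the caveat that it approximates tokens by whitespace-separated words and the dictionary schema is an illustrative assumption, not the patent's data format:

```python
def chunk_document(text_blocks, tables, max_tokens=128):
    """Split plain text into chunks of at most max_tokens (whitespace) tokens;
    keep each table as an independent unit bundled with its caption."""
    chunks = []
    for block in text_blocks:
        words = block.split()
        for i in range(0, len(words), max_tokens):
            chunks.append({"type": "text",
                           "content": " ".join(words[i:i + max_tokens])})
    for tbl in tables:
        chunks.append({
            "type": "table",
            "caption": tbl.get("caption", ""),  # title/caption kept with the unit
            "content": tbl["csv"],              # CSV tables keep original content
        })
    return chunks

text = " ".join(f"word{i}" for i in range(300))
chunks = chunk_document([text], [{"caption": "Table 1", "csv": "a,b\n1,2"}])
```

In the full pipeline, image-format tables would first pass through OCR (PaddleOCR per claim 6) before entering the table list.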

Description

Multi-modal evidence-based medical data extraction method and system for systematic reviews

Technical Field

The invention relates to the technical fields of natural language processing, multi-modal question answering, medical literature understanding, and systematic reviews, and in particular to a multi-modal evidence-based medical data extraction method and implementation system serving the production of literature systematic reviews, applicable to automated data extraction scenarios for systematic reviews in evidence-based medicine.

Background

Systematic reviews (SRs) provide high-level evidence for clinical decision-making by rigorously integrating the results of multiple clinical studies. However, with the exponential growth of medical literature, producing systematic reviews has become significantly harder; the data extraction step, i.e., collecting key information such as study type, population characteristics, and clinical outcomes, is time-consuming, labor-intensive, and error-prone, and has become the bottleneck restricting the efficiency and quality of systematic review production. Existing evidence-based medical data extraction techniques have three core defects: (1) limited datasets, which mostly rely on document abstracts, adopt rigid annotation frameworks such as PICO (Population, Intervention, Comparison, Outcome), ignore multi-modal content such as tables and charts, and are difficult to match to the information needs of real systematic reviews; (2) low processing efficiency, since medical documents are long, structurally complex, and highly heterogeneous in multi-modal content, so that direct processing by existing models incurs large computational cost and makes retrieval difficult; and (3) insufficient reasoning reliability, as models are prone to hallucination when faced with queries about unreported information, seriously affecting the accuracy of systematic review conclusions.
Therefore, there is a strong need for an evidence-based medical data extraction method and system that covers multi-modal content and supports diverse question types, combining computational efficiency with reasoning accuracy, and meeting the automated data extraction requirements of real systematic review scenarios.

Disclosure of the Invention

The technical problem to be solved by the invention is to overcome the defects of the prior art by providing a multi-modal evidence-based medical data extraction method for systematic reviews, comprising the following steps: Step (1) Multi-modal evidence-based medical dataset construction: a. retrieving the corresponding original full-text medical documents in PubMed Central (PMC) via the study IDs cited in systematic reviews, screening for documents with full-text access, and extracting the full-text multi-modal content, the multi-modal content comprising text, tables, and image-format tables, wherein structured HTML tables are converted to CSV files and image-format tables retain their original image format; b. QA pair generation: converting the structured research characteristics extracted in step (1) into at least one multiple-choice QA pair using the GPT-4o model, each QA pair comprising 1 correct option and 3 plausible distractor options; the QA pairs are divided into evidence-supported QA (ESQA), whose answers have explicit textual or tabular evidence in the multi-modal content, and unreported-information QA (UIQA), whose answers are "not reported" or "not specified"; c. QA pair verification: applying a dual-LLM verification mechanism in which a LLaMA model and a Qwen-2.5 model independently answer each generated QA pair with the structured research characteristics as context, and retaining as high-confidence QA pairs those for which both models' answers agree and are correct; d.
Dividing the verified QA pairs and the corresponding multi-modal full-text documents into a training set and a test set at a 7:3 ratio to form the multi-modal evidence-based medical dataset EviMMQA; Step (2) Multi-modal evidence-based medical data extraction and reasoning: a. receiving the multi-modal medical document data to be processed and a user query, the user query being an evidence-extraction question relevant to systematic review; b. Document reconstruction: processing the multi-modal medical document data through a query-aware document reconstruction module, extracting the textual content of tables and charts (CSV tables use their original content directly; image-format items are extracted via OCR), splitting plain text into chunks according to a maximum-token limit, and treating each table and chart as an independent unit associated with its caption and related text; c. Multi-modal reasoning: processing the reconstructed document and user query through a mixture-of-experts (MoE) reasoning module, converting text fragments and the query into text token embeddings,