CN-121009174-B - Intelligent medical question-answering system and method based on mixed retrieval and lightweight reordering
Abstract
The application provides an intelligent medical question-answering system and method based on mixed retrieval and lightweight reordering, applied in the technical field of medical data processing. Five types of core entities are extracted with a preset Chinese medical NER model, and their associations are modeled through relation extraction to form a structured knowledge base. The user's Chinese query is analyzed to extract medical entities and identify four types of core intent, and the query is converted into a semantic vector. Medical documents are encoded with PubMedBERT and stored as vectors; when a query is received, BM25 retrieval and vector retrieval are executed in parallel and their results are fused with the RRF algorithm to generate candidate documents. A relevance-scoring dataset is generated from a prompt template and used to train a lightweight model that reorders the candidates and outputs an evidence set. A reduced context is generated through a retrieval-sorting-compression pipeline guided by the query semantics, with FlashAttention used to optimize the attention computation. An optimized U-Net segments the medical image to generate a structured report, the multimodal information is integrated, and LLM reasoning generates an accurate answer that accounts for both the image and medical knowledge.
Inventors
- LU ZHIHAO
- WANG FENGYUAN
- LI YUHAO
- Geng Shiqin
- HAO ZHE
Assignees
- 北京肿瘤医院(北京大学肿瘤医院) (Beijing Cancer Hospital / Peking University Cancer Hospital)
Dates
- Publication Date
- 20260512
- Application Date
- 20250813
Claims (8)
- 1. An intelligent medical question-answering method based on mixed retrieval and lightweight reordering, characterized by comprising the following steps: acquiring multi-source authoritative Chinese medical data, query text entered by a Chinese-speaking medical user, Chinese medical knowledge base documents, and medical images from Chinese clinical settings uploaded by the user; extracting five types of core Chinese medical entities with a preset Chinese medical NER model, and modeling entity associations through relation extraction to construct a Chinese medical knowledge graph containing standard names and Chinese alias attributes, forming a structured knowledge base hub; extracting medical entities from the user's Chinese query and identifying four types of core intent, such as condition consultation, and converting the Chinese query into a semantic vector with a fine-tuned Chinese medical Sentence-BERT model, so that semantically equivalent but differently worded expressions in the Chinese medical domain are mapped onto each other; encoding Chinese medical documents with PubMedBERT to generate and store vectors, executing BM25 keyword retrieval and vector semantic retrieval in parallel when a Chinese query is received, and fusing the results with the RRF algorithm to generate a preliminary candidate document list, thereby balancing Chinese keyword accuracy and semantic relevance; wherein a PubMedBERT-based basic framework for text encoding and storage is constructed to generate an initial encoding-and-storage framework, the preprocessed text data are input into the initial encoding-and-storage framework, entities and their associations are extracted and modeled with an entity recognition algorithm and a relation extraction algorithm to generate intermediate knowledge data, the intermediate knowledge data are expanded with attributes and integrated to generate the Chinese medical knowledge graph, and the graph is stored in a dedicated database to generate the structured knowledge base hub; and wherein, when a Chinese medical user query is received, BM25 sparse retrieval and approximate-nearest-neighbor semantic retrieval over a vector database are executed in parallel, and the preliminary candidate document list is generated with the Reciprocal Rank Fusion algorithm; generating a Chinese medical question-answer relevance scoring dataset based on a structured prompt template, training a lightweight model to score and sort the candidate documents, and outputting an evidence set for the Chinese medical domain, thereby improving evidence accuracy; generating a reduced context through a retrieval-sorting-compression pipeline guided by the semantics of the user's Chinese query, and integrating FlashAttention to optimize the attention computation and reduce memory usage when processing long Chinese texts; and segmenting lesions in the medical images with an optimized U-Net model, generating a structured Chinese visual report, integrating the report, the user's Chinese question, and the retrieved knowledge into a multimodal context, and generating, through LLM reasoning, answer information that accounts for both image features and Chinese medical knowledge.
- 2. The method of claim 1, wherein extracting five types of core Chinese medical entities based on a preset Chinese medical NER model, modeling entity associations through relation extraction, and constructing a Chinese medical knowledge graph containing standard names and Chinese alias attributes to form a structured knowledge base hub comprises: integrating and preprocessing the multi-source authoritative Chinese medical data to generate preprocessed text data; constructing an entity extraction framework based on the preset Chinese medical NER model to generate an entity recognition model, wherein the preset Chinese medical NER model extracts five types of core Chinese medical entities from the preprocessed text: disease, symptom, drug, examination, and treatment-plan entities; inputting the preprocessed text data into the entity recognition model and generating entity association data in combination with a relation extraction model, wherein relation extraction defines the semantic relations between entities by modeling the medical-logic associations among them; constructing an initial Chinese medical knowledge graph with entities as nodes and inter-entity relations as edges to generate a basic graph structure, wherein each node contains the core information of its entity and each edge is labeled with the specific association type between entities; and expanding and optimizing the attributes of the basic graph structure to generate the structured knowledge base hub, wherein the standard names, definitions, and Chinese alias attributes of the nodes are expanded and erroneous associations in the graph are corrected, improving the completeness and accuracy of the knowledge graph and supporting subsequent semantic disambiguation and answer generation.
- 3. The method of claim 1, wherein extracting medical entities from the user's Chinese query and identifying four types of core intent, such as condition consultation, and converting the Chinese query into a semantic vector with a fine-tuned Chinese medical Sentence-BERT model to achieve cross-expression semantic equivalence mapping in the Chinese medical domain comprises: preprocessing the Chinese query text entered by the user to generate standardized query text, wherein the preprocessing comprises word segmentation, stop-word removal, and special-symbol cleaning, ensuring that the query text is normalized and laying the foundation for subsequent entity extraction and intent recognition; constructing a medical entity extraction framework based on the NER model to generate an entity recognition tool that extracts medical entities from the standardized query text and accurately captures the core medical elements of the query; inputting the standardized query text into an intent classifier to generate a core intent label, wherein the classifier identifies four types of core intent from the text semantics, namely condition consultation, medication consultation, concept query, and treatment-plan query, thereby clarifying the specific purpose of the user's query; constructing a semantic vector conversion framework based on the fine-tuned Chinese medical Sentence-BERT model to generate a vector conversion tool, which takes the complete query or an 'entity + intent' reconstructed statement as input and converts it into a high-dimensional dense semantic vector, giving the semantics a mathematical representation; and achieving cross-expression semantic equivalence mapping through vector-space computation to generate semantic association results, wherein computing the cosine similarity between different query vectors ensures that semantically equivalent Chinese medical queries are understood consistently and improves the system's handling of diverse expressions.
- 4. The method of claim 1, wherein generating a reduced context through a retrieval-sorting-compression pipeline guided by the semantics of the user's Chinese query, and integrating FlashAttention to optimize the attention computation and reduce memory usage when processing long Chinese texts, comprises: performing semantic association analysis and preprocessing on the user's Chinese query text and the retrieved long Chinese text data to generate association-preprocessed data, wherein the processing comprises semantic alignment of the texts, screening of redundant information, and sorting of the retrieval results by relevance; constructing a reduced-context generation framework based on the retrieval-sorting-compression pipeline to generate an initial processing framework comprising a retrieval module for matching relevant texts, a sorting module for ordering them by relevance, and a compression module for extracting core information; inputting the association-preprocessed data into the initial processing framework and prioritizing text fragments under the guidance of the user's query semantics and the lightweight reordering model to generate an intermediate reduced text, wherein the prioritization uses a relevance scoring mechanism and an information density index so as to retain high-value content; and applying FlashAttention to optimize the attention computation over the intermediate reduced text to generate a target reduced context, wherein GPU memory access optimization and compute-kernel fusion reduce memory usage when processing long Chinese texts and improve processing efficiency.
- 5. The method of claim 4, wherein segmenting lesions in the medical images with an optimized U-Net model, generating a structured Chinese visual report, integrating the report, the user's Chinese question, and the retrieved knowledge into a multimodal context, and generating, through LLM reasoning, answer information that accounts for both image features and Chinese medical knowledge comprises: preprocessing the medical image data and associated text information to generate preprocessed image data and a structured text template, wherein the preprocessing comprises lesion-region enhancement and standardized resizing of the images, and format unification and field specification of the visual report template; constructing an optimized medical image segmentation model based on the U-Net architecture to generate an initial segmentation model comprising an encoder, a decoder, and skip connections, wherein the encoder extracts lesion features from the image, the decoder localizes the lesion regions, and the skip connections fuse features from different levels; inputting the preprocessed image data into the initial segmentation model and iteratively training the model parameters with the backpropagation algorithm and a loss function to generate a trained segmentation model, wherein a Dice loss function and a cross-entropy loss function are used during training to improve segmentation precision on lesion regions; applying the trained segmentation model to the medical images to generate the structured Chinese visual report; integrating the structured Chinese visual report, the user's Chinese question, and the retrieved medical knowledge to generate multimodal context data, wherein the information is semantically aligned and deduplicated during integration; and inputting the multimodal context data into the LLM and generating answer information using an attention mechanism and a generative decoding strategy.
- 6. An intelligent medical question-answering device based on hybrid retrieval and lightweight reordering for implementing the method of claim 1.
- 7. An electronic device, comprising: a first processor; and a memory for storing executable instructions of the first processor; wherein the first processor is configured to perform the intelligent medical question-answering method based on hybrid retrieval and lightweight reordering according to any one of claims 1-5 by executing the executable instructions.
- 8. A computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a second processor, implements the intelligent medical question-answering method based on hybrid retrieval and lightweight reordering according to any one of claims 1-5.
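The knowledge-graph structure described in claim 2 (typed entity nodes carrying standard names and Chinese alias attributes, joined by labeled relation edges) can be illustrated with a minimal Python sketch. The class names, the relation label `has_symptom`, and the in-memory alias index are hypothetical; the patent does not disclose a storage schema, and a real system would use the dedicated graph database it mentions.

```python
from dataclasses import dataclass, field
from typing import Optional

# The five core entity types named in claim 2.
ENTITY_TYPES = {"disease", "symptom", "drug", "examination", "treatment"}


@dataclass
class EntityNode:
    standard_name: str                              # canonical (standard) name
    entity_type: str                                # one of ENTITY_TYPES
    definition: str = ""
    aliases: set = field(default_factory=set)       # Chinese alias attribute


class MedicalKnowledgeGraph:
    """Hypothetical in-memory graph: typed nodes plus labeled relation edges."""

    def __init__(self) -> None:
        self.nodes = {}          # standard name -> EntityNode
        self.edges = []          # (head, relation, tail) using standard names
        self._alias_index = {}   # any mention (name or alias) -> standard name

    def add_node(self, node: EntityNode) -> None:
        if node.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {node.entity_type}")
        self.nodes[node.standard_name] = node
        for mention in {node.standard_name, *node.aliases}:
            self._alias_index[mention] = node.standard_name

    def normalize(self, mention: str) -> Optional[str]:
        """Map a standard name or alias to the standard name (None if unknown)."""
        return self._alias_index.get(mention)

    def add_edge(self, head: str, relation: str, tail: str) -> None:
        # Resolve aliases so that edges always connect canonical nodes.
        h, t = self.normalize(head), self.normalize(tail)
        if h is None or t is None:
            raise KeyError("both endpoints must already exist as nodes")
        self.edges.append((h, relation, t))

    def neighbors(self, mention: str, relation: Optional[str] = None) -> list:
        name = self.normalize(mention)
        return [t for h, r, t in self.edges
                if h == name and (relation is None or r == relation)]


kg = MedicalKnowledgeGraph()
# "Acute myocardial infarction" with two colloquial Chinese aliases.
kg.add_node(EntityNode("急性心肌梗死", "disease", aliases={"心梗", "急性心梗"}))
kg.add_node(EntityNode("胸痛", "symptom"))              # "chest pain"
kg.add_edge("心梗", "has_symptom", "胸痛")               # edge added via an alias
print(kg.normalize("心梗"))                    # 急性心肌梗死
print(kg.neighbors("急性心梗", "has_symptom"))  # ['胸痛']
```

Resolving every mention through the alias index before touching edges is what lets colloquial and formal names land on the same node, which is the disambiguation support claim 2 attributes to the alias attributes.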
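Claim 3 maps differently worded but equivalent queries onto each other by comparing the cosine similarity of their Sentence-BERT vectors. The sketch below shows only that comparison step. The toy four-dimensional vectors stand in for real model embeddings, and the 0.85 threshold is an assumed value, not one given in the patent.

```python
import math


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two dense vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Assumed threshold; the patent does not specify one.
EQUIVALENCE_THRESHOLD = 0.85


def semantically_equivalent(u, v, threshold=EQUIVALENCE_THRESHOLD) -> bool:
    return cosine_similarity(u, v) >= threshold


# Toy vectors standing in for fine-tuned Sentence-BERT embeddings.
formal = [0.90, 0.10, 0.30, 0.00]      # e.g. "心肌梗死的症状有哪些" (formal wording)
colloquial = [0.85, 0.15, 0.35, 0.05]  # e.g. "心梗会有什么表现" (colloquial wording)
unrelated = [0.00, 0.90, 0.10, 0.40]   # e.g. an unrelated medication question

print(semantically_equivalent(formal, colloquial))  # True
print(semantically_equivalent(formal, unrelated))   # False
```

In practice the threshold would be tuned on labeled query pairs; too low a value merges distinct questions, too high a value fails to unify paraphrases.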
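Claim 1 builds the reordering model's training data from a structured prompt template. The sketch below shows one possible shape of that data-generation loop: format a prompt, send it to an LLM annotator, parse a numeric relevance score, and keep well-formed (question, document, score) triples. The template wording, the 0-3 scale, the `Score:` reply format, and the canned `fake_llm` stand-in are all hypothetical; the patent does not publish them.

```python
import re
from dataclasses import dataclass

# Hypothetical prompt template; the patent does not publish its wording.
PROMPT_TEMPLATE = (
    "You are annotating medical search results.\n"
    "Question: {question}\n"
    "Document: {document}\n"
    "Rate how well the document answers the question on a 0-3 scale.\n"
    "Reply exactly as: Score: <0-3>"
)


@dataclass
class RelevanceExample:
    question: str
    document: str
    score: int


def build_prompt(question: str, document: str) -> str:
    return PROMPT_TEMPLATE.format(question=question, document=document)


def parse_score(reply: str, max_score: int = 3):
    """Extract the integer after 'Score:'; return None if missing or out of range."""
    match = re.search(r"Score:\s*(\d+)", reply)
    if match is None:
        return None
    score = int(match.group(1))
    return score if 0 <= score <= max_score else None


def build_dataset(pairs, annotate):
    """annotate: any callable that sends a prompt to an LLM and returns its reply."""
    dataset = []
    for question, document in pairs:
        score = parse_score(annotate(build_prompt(question, document)))
        if score is not None:  # drop malformed or out-of-range replies
            dataset.append(RelevanceExample(question, document, score))
    return dataset


def fake_llm(prompt: str) -> str:
    # Canned stand-in for a real LLM call, used only to make the sketch runnable.
    return "Score: 3" if "eGFR" in prompt else "Score: 0"


question = "Can patients with kidney disease take metformin?"
pairs = [
    (question, "Metformin use must be reviewed in patients with reduced eGFR."),
    (question, "The hospital cafeteria opens at noon."),
]
dataset = build_dataset(pairs, fake_llm)
print([ex.score for ex in dataset])  # [3, 0]
```

Discarding unparseable replies rather than guessing a label keeps annotation noise out of the small reordering model's training set.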
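Claim 4's compression step keeps high-value fragments according to a relevance score and an information density index. As a hedged sketch, the code below substitutes simple query-term overlap for the lightweight reordering model's relevance score, uses the distinct-token ratio as a density proxy, and greedily fills an assumed token budget. The weighting `alpha`, the whitespace tokenizer, and all function names are illustrative assumptions; Chinese text would need a proper word segmenter.

```python
def tokenize(text: str) -> list:
    # Whitespace tokenizer used only for illustration; Chinese text would
    # need a real word segmenter.
    return text.lower().split()


def relevance(query_terms: set, fragment_terms: list) -> float:
    """Fraction of query terms found in the fragment (stand-in for a reranker score)."""
    if not query_terms:
        return 0.0
    return len(query_terms & set(fragment_terms)) / len(query_terms)


def information_density(fragment_terms: list) -> float:
    """Share of distinct tokens: a crude proxy that penalizes repetitive fragments."""
    if not fragment_terms:
        return 0.0
    return len(set(fragment_terms)) / len(fragment_terms)


def compress_context(query: str, fragments: list, token_budget: int,
                     alpha: float = 0.7) -> list:
    query_terms = set(tokenize(query))
    scored = []
    for idx, fragment in enumerate(fragments):
        terms = tokenize(fragment)
        rel = relevance(query_terms, terms)
        if rel == 0.0:
            continue  # discard fragments unrelated to the query
        score = alpha * rel + (1 - alpha) * information_density(terms)
        scored.append((score, idx, len(terms)))
    # Greedily keep the highest-scoring fragments that still fit the budget.
    kept, used = [], 0
    for score, idx, n_tokens in sorted(scored, key=lambda item: (-item[0], item[1])):
        if used + n_tokens <= token_budget:
            kept.append(idx)
            used += n_tokens
    # Restore document order so the reduced context stays readable.
    return [fragments[i] for i in sorted(kept)]


query = "metformin dose renal impairment"
fragments = [
    "metformin dose adjustment in renal impairment",
    "the hospital cafeteria opens at noon",
    "metformin is a first line drug for type 2 diabetes",
    "renal function renal function renal function",
]
print(compress_context(query, fragments, token_budget=16))
```

With a 16-token budget the first and third fragments survive; the repetitive fourth fragment scores lowest because of its low density, and the cafeteria sentence is dropped outright for having no query overlap.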
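FlashAttention, referenced in claim 4, avoids materializing the full attention-score matrix by processing keys and values in blocks while maintaining a running softmax maximum and normalizer. The pure-Python sketch below illustrates only that online-softmax tiling idea for a single query vector and checks it against naive attention; it does not model the GPU memory hierarchy or the fused kernels that give the real library its speed, and the function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q, keys, values):
    """Reference softmax(q K^T / sqrt(d)) V that materializes every score at once."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * dot(q, k) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def tiled_attention(q, keys, values, block_size=2):
    """Same result, computed block by block with a running max and normalizer
    (online softmax), so only one block of scores exists at a time."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        scores = [scale * dot(q, k) for k in keys[start:start + block_size]]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial sums to the new maximum (exp(-inf) == 0.0 on the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [x * correction for x in acc]
        for s, v in zip(scores, values[start:start + block_size]):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [x + w * vj for x, vj in zip(acc, v)]
        running_max = new_max
    return [x / running_sum for x in acc]


q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
reference = naive_attention(q, keys, values)
for block in (1, 2, 3):
    tiled = tiled_attention(q, keys, values, block_size=block)
    assert all(abs(a - b) < 1e-9 for a, b in zip(reference, tiled))
print(reference)
```

Because the per-block memory is independent of sequence length, applying this kind of kernel to the already-compressed context compounds the savings of the compression step.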
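Claim 5 trains the U-Net with a Dice loss and a cross-entropy loss. A minimal pure-Python sketch for binary lesion masks, flattened to per-pixel probabilities, is shown below; the equal 0.5 weighting between the two terms is an assumption, since the patent does not say how they are combined.

```python
import math


def dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss: 1 - 2|P∩G| / (|P| + |G|), smoothed by eps."""
    intersection = sum(p * g for p, g in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean per-pixel binary cross-entropy, with probabilities clamped away from 0 and 1."""
    total = 0.0
    for p, g in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= g * math.log(p) + (1 - g) * math.log(1 - p)
    return total / len(probs)


def segmentation_loss(probs, targets, dice_weight=0.5):
    # The 0.5/0.5 weighting is an assumption; the patent only names both losses.
    return (dice_weight * dice_loss(probs, targets)
            + (1 - dice_weight) * binary_cross_entropy(probs, targets))


# A flattened 2x2 lesion mask: the first two pixels belong to the lesion.
mask = [1, 1, 0, 0]
good = [0.95, 0.90, 0.05, 0.10]
poor = [0.40, 0.30, 0.60, 0.70]
print(round(segmentation_loss(good, mask), 3), round(segmentation_loss(poor, mask), 3))  # 0.077 0.855
```

The Dice term measures region overlap and is insensitive to how many background pixels there are, which helps with small lesions; the cross-entropy term supplies smooth per-pixel gradients.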
Description
Intelligent medical question-answering system and method based on mixed retrieval and lightweight reordering
Technical Field
The invention relates to the technical field of medical data processing, in particular to an intelligent medical question-answering system and method based on mixed retrieval and lightweight reordering.
Background
In recent years, artificial intelligence technology, particularly large language models (LLMs), has been increasingly applied in the medical and health field. Technical frameworks represented by Retrieval-Augmented Generation (RAG) combine an external knowledge base with the generative capability of an LLM, offering a new solution for scenarios such as clinical decision support and medical knowledge question answering. However, the prior art still faces many challenges when applied to the highly specialized and rigorous Chinese medical domain. First, natural language processing (NLP) of Chinese medical text has inherent difficulties, such as word segmentation ambiguity, emerging new terms, colloquial expressions, and a large gap between lay wording and professional terminology. Second, the rapid iteration and massive growth of medical knowledge place high demands on building, maintaining, and querying a knowledge base in real time. More critically, the effectiveness of the standard RAG framework depends heavily on the quality of the retrieval phase. If the retrieved document snippets have a low signal-to-noise ratio, i.e., contain irrelevant, outdated, or even erroneous information, this "toxic" context will severely mislead the LLM into producing plausible-looking but false "hallucinated" content. In the high-risk medical domain, such errors may have serious consequences.
In addition, processing ultra-long texts containing information such as complex disease-course descriptions and multiple examination reports is a common technical bottleneck for current LLMs, bringing high computational cost and information forgetting ("lost in the middle"). Meanwhile, clinical diagnosis and treatment depend not only on text but also heavily on the interpretation of medical images (such as CT and MRI), yet existing question-answering systems generally lack the capability to process and understand visual information. These problems limit the in-depth application of such systems in real clinical scenarios. Therefore, how to optimize the retrieval and information-processing flow of the RAG framework for Chinese medical scenarios, improve the relevance and accuracy of retrieved content, suppress model hallucination, and process long text efficiently is a technical problem to be solved in the art. It should be noted that the information disclosed in the above background section is only intended to enhance understanding of the background of the present disclosure and may therefore include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The application aims to provide an intelligent medical question-answering system and method based on mixed retrieval and lightweight reordering that overcome, at least to some extent, the problems of the prior art. Rather than relying on a single retrieval mode, the application designs a precise pipeline: BM25 and vector retrieval provide complementary recall, the Reciprocal Rank Fusion (RRF) algorithm balances keyword precision against semantic generality, and a dedicated lightweight reordering model performs fine-grained scoring and ordering of the candidate evidence.
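The hybrid recall described above (BM25 plus vector retrieval fused with RRF) can be sketched in a few lines of Python. This is an illustrative sketch, not the applicant's implementation: the BM25 parameters `k1 = 1.5` and `b = 0.75` and the RRF constant `k = 60` are common defaults rather than values from the patent, documents are tokenized by whitespace, and the vector-retrieval ranking is supplied as a hypothetical precomputed list instead of coming from PubMedBERT embeddings.

```python
import math
from collections import Counter


def bm25_rank(query_terms, docs, k1=1.5, b=0.75):
    """Rank document ids by Okapi BM25 score (highest first)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(terms) for terms in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in tokenized.values() for term in set(terms))
    scores = {}
    for doc_id, terms in tokenized.items():
        tf = Counter(terms)
        score = 0.0
        for q in query_terms:
            if tf[q] == 0:
                continue
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(terms) / avgdl)
            score += idf * tf[q] * (k1 + 1) / (tf[q] + length_norm)
        scores[doc_id] = score
    return sorted(scores, key=lambda d: (-scores[d], d))


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


docs = {
    "d1": "metformin dose adjustment renal impairment",
    "d2": "insulin therapy type 1 diabetes",
    "d3": "metformin lactic acidosis risk",
}
sparse = bm25_rank(["metformin", "renal"], docs)  # keyword recall
dense = ["d3", "d2", "d1"]  # hypothetical ANN ranking from the vector store
print(sparse)                                    # ['d1', 'd3', 'd2']
print(reciprocal_rank_fusion([sparse, dense]))   # ['d3', 'd1', 'd2']
```

Because RRF uses only rank positions, BM25 scores and cosine similarities never have to be calibrated onto a common scale. In this toy run, `d3`, second for BM25 but first for the vector retriever, ends up first in the fused list, which would then be passed to the lightweight reordering model.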
This architecture maximizes the signal-to-noise ratio of the context fed into the LLM and is a key safeguard against model hallucination. Another innovation of the application is the way the reordering model is trained. The strong comprehension ability of a large language model (such as GPT-4) is leveraged to generate large volumes of high-quality (question, document, relevance score) annotation data through prompt engineering, and these data are then used to train a dedicated reordering model with few parameters and fast inference. This approach balances annotation quality against training efficiency and achieves accurate reordering at low cost with high performance. To address long-text processing, the application uses query-aware intelligent compression to precisely extract, from massive texts, the core information fragments most relevant to the question, significantly reducing the context size. A hardware-aware efficient attention algorithm such as FlashAttention is then applied to the compressed, reduced context. This fundamentally avoids the heavy overhead of the LLM processing redundant information, and the high-efficiency analysis of