CN-117033568-B - Medical data index interpretation method, device, storage medium and equipment


Abstract

The method first extracts medical concepts and entity information from a target question text to be interpreted, which is input by a target user. Based on the extracted medical concepts and entity information, it then constructs a candidate subgraph corresponding to the target question text by using a preset rule engine and a weak classifier. Next, it constructs a template text for prompting interpretation from the candidate subgraph, and determines the index interpretation result corresponding to the target question text according to that template text. Because the medical concepts and entity information are extracted first, the candidate subgraph explicitly establishes the dependency relationships among multiple groups of indexes, and the auxiliary reasoning results of the preset rule engine and the weak classifier are fused into the interpretation, the difficulty of index interpretation for a large model is greatly reduced and the interpretation accuracy of medical data indexes is improved.

Inventors

HE ZHIYANG
LIU QUAN
HU GUOPING
DU QIANYUN
CHU HUI
HU JIAXUE
ZHAO JINGHE
LU XIAOLIANG
LIU CONG
WEI SI
WANG SHIJIN

Assignees

讯飞医疗科技股份有限公司

Dates

Publication Date: 20260508
Application Date: 20230505

Claims (8)

1. A medical data index interpretation method, comprising: acquiring a target question text to be interpreted, which is input by a target user, and extracting medical concepts and entity information from the target question text; linking the medical concepts and entity information in the target question text to a knowledge graph by using a preset entity linking technology and, taking each candidate entity in the knowledge graph as a center, returning triple information within N hops of the candidate entity to construct an initial candidate subgraph corresponding to the target question text, wherein N is a positive integer greater than 0; performing entity disambiguation on the initial candidate subgraph, and performing numerical normalization on the initial candidate subgraph, to obtain a preprocessed candidate subgraph; based on the standard range and the actual value in the knowledge graph, normalizing and comparing the standard range and the actual value in the preprocessed candidate subgraph by using a preset rule engine, to obtain a comparison result; inputting the comparison result as auxiliary information into a preset weak classifier to classify the numerical values in the preprocessed candidate subgraph, to obtain a classification result; fusing the classification result with the preprocessed candidate subgraph, introducing virtual nodes into the preprocessed candidate subgraph, and using edges to represent the relative magnitude of the numerical values and the disease probability, to form a fused candidate subgraph that serves as the candidate subgraph corresponding to the target question text; constructing a template text for prompting interpretation according to the candidate subgraph; and determining an index interpretation result corresponding to the target question text according to the template text for prompting interpretation.
2. The method of claim 1, wherein extracting the medical concepts and entity information from the target question text comprises: constructing, from the target question text, an information extraction template text that takes preset medical concepts and entities as slots; and inputting the information extraction template text into a pre-constructed large language model (LLM) to predict the medical concepts and entity information in the target question text; wherein the LLM is obtained by training on a large-scale language dataset in an autoregressive generation manner to learn language rules and patterns.
3. The method of claim 1, wherein N has a value of 2.
4. The method of claim 1, wherein performing entity disambiguation on the initial candidate subgraph comprises: constructing a template text for entity disambiguation by using the initial candidate subgraph; and inputting the template text into a pre-constructed large language model (LLM) to perform entity disambiguation on the initial candidate subgraph.
5. The method of claim 1, wherein constructing the template text for prompting interpretation according to the candidate subgraph comprises: performing path compression according to the candidate subgraph, the comparison result and the classification result, and constructing the template text for prompting interpretation according to the compression result.
6. A medical data index interpretation apparatus, comprising: an extraction unit, configured to acquire a target question text to be interpreted, which is input by a target user, and to extract medical concepts and entity information from the target question text; a first construction unit, configured to construct a candidate subgraph corresponding to the target question text by using a preset rule engine and a weak classifier, based on the medical concepts and entity information in the target question text; a second construction unit, configured to construct a template text for prompting interpretation according to the candidate subgraph; and a determining unit, configured to determine an index interpretation result corresponding to the target question text according to the template text for prompting interpretation; wherein the first construction unit comprises: a first construction subunit, configured to link the medical concepts and entity information in the target question text to a knowledge graph by using a preset entity linking technology and, taking each candidate entity in the knowledge graph as a center, to return triple information within N hops of the candidate entity to construct an initial candidate subgraph corresponding to the target question text, wherein N is a positive integer greater than 0; a processing subunit, configured to perform entity disambiguation on the initial candidate subgraph and numerical normalization on the initial candidate subgraph, to obtain a preprocessed candidate subgraph; a comparison subunit, configured to normalize and compare, by using a preset rule engine, the standard range and the actual value in the preprocessed candidate subgraph based on the standard range and the actual value in the knowledge graph, to obtain a comparison result; a classification subunit, configured to input the comparison result as auxiliary information into a preset weak classifier to classify the numerical values in the preprocessed candidate subgraph, to obtain a classification result; and a composition subunit, configured to fuse the classification result with the preprocessed candidate subgraph, introduce virtual nodes into the preprocessed candidate subgraph, and use edges to represent the relative magnitude of the numerical values and the disease probability, to form a fused candidate subgraph that serves as the candidate subgraph corresponding to the target question text.
7. A medical data index interpretation device, comprising a processor, a memory and a system bus; the processor and the memory are connected through the system bus; the memory is configured to store one or more programs, the one or more programs comprising instructions which, when executed by the processor, cause the processor to perform the method of any one of claims 1-5.
8. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions which, when run on a terminal device, cause the terminal device to perform the method of any one of claims 1-5.
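
The N-hop retrieval recited in claims 1 and 3 can be illustrated with a minimal sketch. The toy triples, the relation names, the function name `n_hop_subgraph`, and the choice to treat edges as undirected are all assumptions made for illustration; the claims do not specify the knowledge graph's storage format or traversal direction.

```python
from collections import deque

# Hypothetical toy knowledge graph stored as (head, relation, tail) triples.
TRIPLES = [
    ("ALT", "measured_in", "liver function panel"),
    ("ALT", "elevated_indicates", "hepatitis"),
    ("hepatitis", "treated_by", "antiviral therapy"),
    ("antiviral therapy", "monitored_by", "viral load test"),
]


def n_hop_subgraph(triples, center, n=2):
    """Collect the triples lying within n hops of `center`.

    Assumption: edges are followed in both directions.
    """
    adjacency = {}
    for triple in triples:
        head, _, tail = triple
        adjacency.setdefault(head, []).append((tail, triple))
        adjacency.setdefault(tail, []).append((head, triple))

    depth = {center: 0}      # hop distance of every visited entity
    queue = deque([center])  # breadth-first frontier
    found = []
    while queue:
        node = queue.popleft()
        if depth[node] == n:
            continue  # edges leaving this node would exceed n hops
        for neighbor, triple in adjacency.get(node, []):
            if triple not in found:
                found.append(triple)
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return found


# With N = 2 (the value in claim 3), the fourth triple is three hops away
# from "ALT" and is therefore left out of the initial candidate subgraph.
subgraph = n_hop_subgraph(TRIPLES, "ALT", n=2)
```

A breadth-first search is used here because it visits entities in order of hop distance, so the cut-off at N needs only a depth check.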

Description

Medical data index interpretation method, device, storage medium and equipment

Technical Field

The present application relates to the field of medical technologies, and in particular to a method, an apparatus, a storage medium, and a device for interpreting medical data indexes.

Background

As fast-paced lifestyles become increasingly common, people are paying more attention to their health. A core application in medical scenarios is the interpretation of, and reasoning over, medical data indexes. The traditional approach fuses a rule engine with a specific classification model. For a vertically subdivided field it can interpret indexes in a specific intent scenario, but it has difficulty drawing on common knowledge to handle user queries in general scenarios, and its scenario extensibility is also greatly limited.

At present, the mainstream medical data index interpretation method uses a generative general-purpose large model, typified by the Chat Generative Pre-trained Transformer (ChatGPT). Its basic principle is to rely on pre-training corpora to fuse knowledge across subjects, unify different natural language processing (NLP) tasks into GPT tasks, and generate an end-to-end interpretation result in an autoregressive paradigm. However, end-to-end autoregressive generation easily produces text that appears grammatically correct but whose knowledge is inaccurate, so the interpretation result for the medical data index is not accurate enough.

Disclosure of Invention

The main object of the embodiments of the present application is to provide a medical data index interpretation method, apparatus, storage medium and device that can effectively improve the interpretation accuracy of medical data indexes and thereby improve user experience.
An embodiment of the present application provides a medical data index interpretation method, comprising: acquiring a target question text to be interpreted, which is input by a target user, and extracting medical concepts and entity information from the target question text; constructing a candidate subgraph corresponding to the target question text by using a preset rule engine and a weak classifier, based on the medical concepts and entity information in the target question text; constructing a template text for prompting interpretation according to the candidate subgraph; and determining an index interpretation result corresponding to the target question text according to the template text for prompting interpretation.

In a possible implementation, constructing the candidate subgraph corresponding to the target question text by using the preset rule engine and the weak classifier, based on the medical concepts and entity information in the target question text, includes: constructing an initial candidate subgraph corresponding to the target question text by using the medical concepts and entity information in the target question text; performing entity disambiguation on the initial candidate subgraph, and performing numerical normalization on the initial candidate subgraph, to obtain a preprocessed candidate subgraph; and classifying the numerical values in the preprocessed candidate subgraph by using the preset rule engine and the weak classifier, and fusing the resulting classification result with the preprocessed candidate subgraph to obtain a fused candidate subgraph that serves as the candidate subgraph corresponding to the target question text.
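
The comparison, classification and fusion steps described above can be sketched as follows. Everything concrete here is an assumption for illustration: the function names, the relation labels, the numeric values (which are not clinical reference ranges), and in particular the hand-written heuristic standing in for the weak classifier, whose form the text does not specify.

```python
def compare_with_range(value, low, high):
    """Rule-engine step: compare an actual value with its standard range."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def weak_classify(value, low, high, comparison):
    """Stand-in weak classifier (an assumed heuristic, not the patent's model).

    Takes the rule-engine comparison as auxiliary input and returns a rough
    probability of abnormality that grows with the distance from the range.
    """
    if comparison == "normal":
        return 0.0
    span = high - low
    excess = (low - value) if comparison == "low" else (value - high)
    return min(1.0, excess / span)


def fuse(subgraph, indicator, disease, value, low, high):
    """Add a virtual node whose edges carry the comparison and the probability."""
    comparison = compare_with_range(value, low, high)
    probability = weak_classify(value, low, high, comparison)
    virtual_node = f"{indicator}#finding"  # hypothetical naming scheme
    extra_edges = [
        (indicator, f"relative_magnitude:{comparison}", virtual_node),
        (virtual_node, f"disease_probability:{probability:.2f}", disease),
    ]
    return subgraph + extra_edges


# Illustrative values only: an indicator reading of 55 against a range of 10 to 40.
preprocessed = [("ALT", "elevated_indicates", "hepatitis")]
fused = fuse(preprocessed, "ALT", "hepatitis", value=55, low=10, high=40)
```

Attaching the results to a virtual node keeps the original triples unchanged, so the fused subgraph can still be traversed like the preprocessed one when the prompt template is built.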
In a possible implementation, extracting the medical concepts and entity information from the target question text includes: constructing, from the target question text, an information extraction template text that takes preset medical concepts and entities as slots; and inputting the information extraction template text into a pre-constructed large language model (LLM) to predict the medical concepts and entity information in the target question text; wherein the LLM is obtained by training on a large-scale language dataset in an autoregressive generation manner to learn language rules and patterns.

In a possible implementation, constructing the initial candidate subgraph corresponding to the target question text by using the medical concepts and entity information in the target question text includes: taking each candidate entity in the knowledge graph as a center, returning triple information within N hops of the candidate entity to construct the initial candidate subgraph corresponding to the target question text, wherein N is a positive integer greater than 0.

In one possible implementation, the value of N is 2.

In a possible implementation manner, the per