CN-121979990-A - Retrieval-enhanced scientific question-answering method and system based on deep evidence reordering

CN121979990A

Abstract

The invention belongs to the fields of natural language processing and knowledge retrieval and relates to a retrieval-enhanced scientific question-answering method and system based on deep evidence reordering. The method comprises: parsing an input scientific question into a structured question-intent representation; scoring candidate evidence passages for logical relevance according to the question-intent representation; screening high-credibility evidence passages from the candidates using a logical-relevance score threshold according to the scoring result; and generating a question-oriented summary based on the screened high-credibility evidence passages. The invention addresses problems of the retrieval-augmented generation framework in existing scientific-literature question-answering systems, including interference from text that is semantically similar but logically irrelevant, the lack of reasoning ability in the reordering stage, and insufficient reliability of answer evidence.

Inventors

  • LONG QINGQING
  • ZHU HENGSHU
  • ZHOU YUANCHUN
  • CHEN HAOTIAN

Assignees

  • 中国科学院计算机网络信息中心 (Computer Network Information Center, Chinese Academy of Sciences)

Dates

Publication Date
2026-05-05
Application Date
2026-01-23

Claims (10)

  1. A retrieval-enhanced scientific question-answering method based on deep evidence reordering, characterized by comprising the following steps: parsing the input scientific question into a structured question-intent representation; scoring candidate evidence passages for logical relevance according to the question-intent representation; screening high-credibility evidence passages from the candidates using a logical-relevance score threshold according to the scoring result; and generating a question-oriented summary based on the screened high-credibility evidence passages.
  2. The method of claim 1, wherein parsing the input scientific question into a structured question-intent representation comprises: submitting the input question to a large language model for intent recognition to generate a structured question-intent representation comprising four key elements: the scientific topic, indicating the subject or research direction of the question; the core entity type, identifying the key scientific objects involved in the question; the question intent, indicating the cognitive task the user wishes to perform; and the expected answer type, specifying the type of information the answer should provide.
  3. The method of claim 2, wherein scoring candidate evidence passages for logical relevance according to the question-intent representation comprises scoring each candidate evidence passage in the candidate set against the structured question-intent representation by: computing a logical relevance score for each candidate evidence passage with a large language model, the score reflecting the passage's direct evidential value for the scientific question, with a higher score indicating stronger logical relevance; pairing each candidate passage with its logical relevance score in a set; and sorting the candidate passages in the set by logical relevance score from high to low to obtain a reordered candidate passage sequence.
  4. The method of claim 3, wherein screening high-credibility evidence passages from the candidates using a logical-relevance score threshold comprises: setting a logical-relevance score threshold, and selecting from the reordered candidate passage sequence the passages whose scores exceed the threshold as the final retained set of high-credibility passages, which contains the candidate passages most directly related to and logically supportive of the scientific question.
  5. The method of claim 4, wherein generating a question-oriented summary based on the screened high-credibility evidence passages comprises invoking a large language model for each passage in the high-credibility set to generate a summary oriented to the scientific question.
  6. The method according to claim 5, characterized in that the generated summary satisfies the following principles: key-information extraction, i.e., extracting from the candidate evidence passages the core information that directly answers the scientific question; context preservation, i.e., retaining original scientific terms and entity names; and conciseness, i.e., generating logically clear and concise evidence text.
  7. A retrieval-enhanced scientific question-answering system based on deep evidence reordering, comprising: an intent recognition module for parsing the input scientific question into a structured question-intent representation; a relevance evaluation module for scoring candidate evidence passages for logical relevance according to the question-intent representation; an evidence screening module for screening high-credibility evidence passages from the candidates using a logical-relevance score threshold according to the scoring result; and an evidence summarization module for generating a question-oriented summary based on the screened high-credibility evidence passages.
  8. The system of claim 7, wherein the evidence summarization module comprises: a key-information extraction unit for extracting from the candidate evidence passages the core information that directly answers the scientific question; a term-preservation unit for retaining original scientific terms and entity names; and a summary generation unit for generating logically clear and concise evidence text.
  9. A computer device comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any one of claims 1-6.
  10. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a computer, implements the method of any one of claims 1-6.
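
The four claimed steps (intent parsing, logical-relevance scoring, threshold screening, summarization) can be sketched as a minimal pipeline. This is purely illustrative: the patent's large-language-model calls are replaced by a hypothetical keyword-overlap scorer, and all names (`QuestionIntent`, `score_relevance`, `rerank_and_screen`) and the 0.5 threshold are assumptions, not part of the patent.

```python
import re
from dataclasses import dataclass


@dataclass
class QuestionIntent:
    """Structured intent per claim 2: topic, entity type, cognitive task, answer type."""
    topic: str
    entity_type: str
    cognitive_task: str
    answer_type: str


def tokens(text: str) -> set:
    """Lowercased word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def score_relevance(intent: QuestionIntent, passage: str) -> float:
    # Hypothetical stand-in for the LLM scorer of claim 3: the fraction of
    # intent keywords that appear in the passage.
    keys = tokens(f"{intent.topic} {intent.entity_type} {intent.cognitive_task}")
    return len(keys & tokens(passage)) / len(keys) if keys else 0.0


def rerank_and_screen(intent: QuestionIntent, passages, threshold=0.5):
    # Claims 3-4: score every candidate, sort high-to-low, then keep only
    # passages scoring above the threshold as the high-credibility set.
    scored = sorted(((p, score_relevance(intent, p)) for p in passages),
                    key=lambda x: x[1], reverse=True)
    return [(p, s) for p, s in scored if s > threshold]


intent = QuestionIntent("CRISPR off-target effects", "gene editing",
                        "analyze", "mechanism")
candidates = [
    "History of gene patents",
    "Analysis of off-target effects in CRISPR gene editing",
]
kept = rerank_and_screen(intent, candidates)
```

Claims 5-6 would then pass each retained passage back to a language model for key-information extraction and summarization, a step omitted here.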

Description

Retrieval-enhanced scientific question-answering method and system based on deep evidence reordering

Technical Field

The invention relates to the fields of natural language processing and knowledge retrieval, and in particular to a retrieval-enhanced scientific question-answering method and system based on deep evidence reordering, belonging to the application of retrieval-augmented generation (RAG) technology to scientific question answering.

Background

Scientific question answering is an important means of promoting research innovation and knowledge discovery and is widely applied in fields such as biomedicine, materials science, and chemical engineering. For example, molecular biology researchers studying CRISPR gene-editing techniques need to systematically search the relevant literature to analyze off-target effects, and during global public health events clinicians need to quickly extract reliable evidence from a vast body of newly published studies to support evidence-based medical decisions. With the dramatic increase in the number of scientific publications, researchers face growing challenges in literature screening and knowledge acquisition. Conventional information retrieval systems rely heavily on keyword matching or static semantic embedding models, which tend to return only documents that are lexically or superficially semantically related, and it is difficult to guarantee their logical relevance or fact-supporting ability. In recent years, large language models have made remarkable progress in natural language understanding and reasoning, but they still suffer from hallucination, outdated knowledge, and insufficient factual grounding. To overcome these drawbacks, retrieval-augmented generation techniques have been widely adopted.
RAG combines external retrieval with a language model so that the generated result can be traced and verified against factual documents, offering important advantages in answer accuracy, interpretability, and domain adaptability. Although the RAG framework effectively alleviates these problems of language models, its performance is still limited by the context quality of the retrieval stage. Since initial search results often contain many semantically close but logically unrelated passages, and may even contain noisy or outdated information, this "pseudo-relevant" content can significantly reduce the reliability of the final answer. Researchers have therefore introduced a reordering (reranking) module that re-evaluates the relevance of candidate passages with models such as cross-encoders to improve retrieval accuracy. Typical reordering models include BGE, BCE, and Jina, which compute query-document relevance scores by encoding the two sequences and significantly improve ranking quality in most scenarios. However, existing reordering models still rely primarily on vector-similarity measures at the word or sentence level and have difficulty identifying text that is semantically similar but logically unrelated. This "semantic illusion" phenomenon is particularly prominent in scientific question answering, where two documents may be close in semantic space yet one does not logically answer the research question. Experiments show that such semantic misjudgment leads the retriever to return a large amount of superficially related but logically inconsistent content, inducing the language model to generate wrong or evidence-lacking answers in the generation stage and weakening the robustness and credibility of the RAG system.
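
The "semantic illusion" failure mode described above can be illustrated with a toy bag-of-words cosine similarity, used here as a crude stand-in for the word/sentence-level vector measures the passage criticizes; the query and passages are invented examples, not from the patent. A passage that merely shares vocabulary with the question, without answering it, still scores nearly as high as genuine evidence.

```python
import math
import re
from collections import Counter


def cosine_bow(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words counts (toy surrogate for
    embedding-based similarity; ignores word order and logic entirely)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    num = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    den = (math.sqrt(sum(v * v for v in va.values()))
           * math.sqrt(sum(v * v for v in vb.values())))
    return num / den if den else 0.0


query = "Does CRISPR gene editing cause off-target effects in human cells?"
# Logically supportive evidence:
support = "CRISPR gene editing can cause off-target effects in human cells."
# Semantically similar but logically irrelevant distractor:
distractor = ("Human cells and gene editing: does CRISPR target "
              "gene effects off the record?")
```

Both passages score high on surface similarity, which is exactly why the method resorts to LLM-based logical-relevance scoring rather than vector similarity alone.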
To probe this problem, researchers have attempted to use large language models to generate "distractor passages" for evaluating the noise robustness of current embedding models. Experimental results show that the noise-robustness scores and context-discrimination rates of mainstream embedding models (such as Qwen-Embedding-8B, E5-Mistral-7B, and BGE-M3) fall clearly below the ideal level when facing distractor samples that are semantically similar but logically irrelevant. This indicates that existing embedding-based RAG retrieval and reordering mechanisms still have obvious deficiencies in logical discrimination. A new reordering mechanism is therefore needed that combines the reasoning capability of large language models with multidimensional semantic understanding, going beyond traditional vector-similarity measures to dynamically evaluate the logical consistency and evidential reliability of retrieved candidate texts, thereby improving the robustness and reliability of RAG systems in scientific domains.

Disclosure of Invention

In order to solve the problems that the existing scientific literature question-answering system has similar semantics but has no logic