CN-121998104-A - Multi-hop reasoning method and device based on masked knowledge activation
Abstract
The application provides a multi-hop reasoning method and device based on masked knowledge activation, in the technical field of artificial intelligence. The method comprises: determining a candidate key-phrase set for the current round of query from initial query information by using a natural language processing library, and detecting the masked key phrase in the candidate key-phrase set; retrieving a candidate document set from a retrieval database with a dense retriever based on retrieval conditions, wherein the retrieval conditions comprise the current-round question corresponding to the current round of query and the masked key phrase; determining a target document from the candidate document set; generating a next-round question corresponding to the next round of query with a large language model based on the masked key phrase, the current-round question and the target document; and determining a final answer based on the initial query information, the next-round question and a single-hop termination condition. By locking onto the target document and using it to fill in the information missing behind the masked key phrase when generating the next-round question, the application improves the accuracy of retrieval-augmented generation in complex scenarios.
Inventors
- MA SHAONAN
- WU YONGWEI
- WU HAO
- Nishikata
- MA HUIPENG
Assignees
- 启元实验室
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-04-09
Claims (10)
- 1. A multi-hop reasoning method based on masked knowledge activation, comprising: determining a candidate key-phrase set for the current round of query from initial query information of a user by using a preset natural language processing library, and detecting a masked key phrase in the candidate key-phrase set; retrieving a candidate document set from a retrieval database by using a dense retriever obtained through fine-grained contrastive learning training, based on retrieval conditions, wherein the retrieval conditions comprise a current-round question corresponding to the current round of query and the masked key phrase; determining a target document from the candidate document set, and generating a next-round question corresponding to a next round of query by using a large language model based on the masked key phrase, the current-round question and the target document; and determining a final answer based on the initial query information, the next-round question and a preset single-hop termination condition.
- 2. The method of claim 1, wherein the natural language processing library comprises a spaCy framework pre-configured for key-phrase masking; and wherein determining the candidate key-phrase set for the current round of query from the initial query information of the user by using the preset natural language processing library, and detecting the masked key phrase in the candidate key-phrase set, comprises: inputting the initial query information into the spaCy framework to obtain clean key phrases with stop words removed; inputting the clean key phrases into the spaCy framework to output the candidate key-phrase set; and detecting the masked key phrase in the candidate key-phrase set.
- 3. The method of claim 2, wherein detecting the masked key phrase in the candidate key-phrase set comprises: measuring the influence of each candidate key phrase in the candidate key-phrase set on the output distribution of the large language model by using a Gaussian perturbation mechanism, so as to determine a similarity; and determining the candidate key phrase with the highest similarity as the masked key phrase.
- 4. The method of claim 3, wherein measuring the influence of each candidate key phrase in the candidate key-phrase set on the output distribution of the large language model using the Gaussian perturbation mechanism to determine the similarity comprises: constructing a binary mask corresponding to the candidate key-phrase set; injecting Gaussian noise at the token embedding positions indicated by the binary mask to determine a perturbed input embedding representation; inputting the original input embedding representation and the perturbed input embedding representation into the large language model respectively, to determine an original output distribution and a perturbed output distribution; average-pooling the original output distribution and the perturbed output distribution over the time dimension to determine an original vector and a perturbed vector; and calculating the cosine similarity between the original vector and the perturbed vector, and determining the candidate key phrase with the maximum cosine similarity as the masked key phrase.
- 5. The method of claim 1, wherein determining the final answer based on the initial query information, the next-round question and the preset single-hop termination condition comprises: judging whether the next-round question satisfies the single-hop termination condition; when the next-round question does not satisfy the single-hop termination condition, taking the next-round question as the new current-round question corresponding to a new current round of query and performing a reasoning iteration; when the next-round question satisfies the single-hop termination condition, terminating the iteration after executing a last round of retrieval; and acquiring the target documents obtained during the iteration together with the initial query information, and inputting them into the large language model to generate the final answer.
- 6. The method of claim 5, wherein judging whether the next-round question satisfies the single-hop termination condition comprises: inputting the next-round question into the large language model so that the large language model judges whether the next-round question is a single-hop query; if the judgment result is a single-hop query, determining that the single-hop termination condition is satisfied; if the judgment result is a non-single-hop query and a preset maximum number of iterations has not been reached, determining that the single-hop termination condition is not satisfied; and if the preset maximum number of iterations has been reached, determining that the single-hop termination condition is forcibly satisfied.
- 7. The method of claim 1, further comprising: constructing a training sample set, wherein the training sample set comprises positive sample documents, semi-positive sample documents and negative sample documents, a positive sample document being related to both the question and the masked key phrase, a semi-positive sample document being related to the question but not directly related to the masked key phrase, and a negative sample document being related to neither the question nor the masked key phrase; and training a retriever on the training sample set with a fine-grained contrastive learning loss function to generate the dense retriever, wherein the fine-grained contrastive learning loss function includes a first loss term for maximizing the ratio of the retrieval score of a positive sample document to the sum of the retrieval scores of the semi-positive and negative sample documents, a second loss term for maximizing the ratio of the sum of the retrieval scores of the positive and semi-positive sample documents to the sum of the retrieval scores of the negative sample documents, and a total loss that is a weighted sum of the first loss term and the second loss term.
- 8. The method of claim 1, wherein retrieving the candidate document set from the retrieval database, based on the retrieval conditions, by using the dense retriever trained through fine-grained contrastive learning comprises: splicing or fusing the current-round question and the masked key phrase to generate a retrieval input vector; and calculating, with the dense retriever, the similarity between the retrieval input vector and each of a plurality of document vectors in the retrieval database, so as to determine a preset number of documents with the highest similarity as the candidate document set.
- 9. The method of claim 1, wherein determining the target document from the candidate document set, and generating the next-round question corresponding to the next round of query by using the large language model based on the masked key phrase, the current-round question and the target document, comprises: inputting the current-round question paired with each candidate document in the candidate document set into the large language model to output a relevance judgment for each candidate document with respect to the current-round question, wherein the judgment result is a probability value of yes or no; determining the candidate document with the highest probability value as the target document; and completing the missing information in the current round of query based on the target document and the masked key phrase, to generate the next-round question.
- 10. A multi-hop reasoning device based on masked knowledge activation, comprising: a phrase detection module, configured to determine a candidate key-phrase set for the current round of query from initial query information of a user by using a preset natural language processing library, and to detect a masked key phrase in the candidate key-phrase set; a document set retrieval module, configured to retrieve a candidate document set from a retrieval database by using a dense retriever obtained through fine-grained contrastive learning training, based on retrieval conditions, wherein the retrieval conditions comprise a current-round question corresponding to the current round of query and the masked key phrase; a question generation module, configured to determine a target document from the candidate document set, and to generate a next-round question corresponding to a next round of query by using a large language model based on the masked key phrase, the current-round question and the target document; and an answer generation module, configured to determine a final answer based on the initial query information, the next-round question and a preset single-hop termination condition.
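The key-phrase extraction of claims 1-2 can be illustrated with a minimal sketch. The patent delegates the work to a pre-configured spaCy pipeline; here a pure-Python stand-in removes stop words and treats the remaining contiguous token runs as candidate key phrases. The stop-word list and the grouping heuristic are assumptions made purely for illustration.

```python
# Illustrative stand-in for the spaCy-based key-phrase extraction of
# claims 1-2: remove stop words, then treat the remaining contiguous
# runs of tokens as candidate key phrases. The stop-word list and the
# grouping heuristic are assumptions; the patent uses a pre-configured
# spaCy pipeline instead.
STOP_WORDS = {"the", "of", "a", "an", "is", "what", "in", "by", "was"}

def candidate_key_phrases(query: str) -> list[str]:
    tokens = query.lower().replace("?", "").split()
    phrases, current = [], []
    for tok in tokens:
        if tok in STOP_WORDS:
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(tok)
    if current:
        phrases.append(" ".join(current))
    return phrases

print(candidate_key_phrases(
    "What is the famous bridge in the birthplace of the composer?"))
# → ['famous bridge', 'birthplace', 'composer']
```

Each extracted phrase then becomes a candidate for the masking detection of claims 3-4.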
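The Gaussian-perturbation detector of claims 3-4 can be sketched as follows. The real method perturbs token embeddings of the actual LLM and compares its output distributions; here a deterministic tanh layer stands in for the LLM forward pass, which is an assumption made purely for illustration.

```python
import math, random

def mean_pool(rows):
    """Average-pool a sequence of vectors over the time dimension (claim 4)."""
    dim = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def model(rows):  # stand-in for the LLM forward pass (assumption)
    return [[math.tanh(x) for x in row] for row in rows]

def masked_key_phrase(embeddings, phrase_spans, sigma=1.0, seed=0):
    """phrase_spans: {phrase: [token indices]} -- the binary mask of claim 4."""
    rng = random.Random(seed)
    original = mean_pool(model(embeddings))
    best_phrase, best_sim = None, -2.0
    for phrase, span in phrase_spans.items():
        perturbed = [row[:] for row in embeddings]
        for i in span:  # inject Gaussian noise at the masked positions
            perturbed[i] = [x + rng.gauss(0.0, sigma) for x in perturbed[i]]
        sim = cosine(original, mean_pool(model(perturbed)))
        if sim > best_sim:  # highest similarity = output barely moved
            best_phrase, best_sim = phrase, sim
    return best_phrase
```

The intuition behind claim 3: a phrase whose perturbation leaves the output distribution nearly unchanged contributed little to the model's generation, which is exactly the signature of a masked (overlooked) condition.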
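One plausible instantiation of the fine-grained contrastive loss of claim 7 is sketched below. The claim only constrains which score ratios each term maximizes; the softmax/log functional form and the equal weights are assumptions.

```python
import math

def fine_grained_loss(pos, semi, neg, w1=0.5, w2=0.5):
    """pos: retrieval score of the positive document;
    semi, neg: lists of scores for semi-positive / negative documents.
    The log-softmax form and the weights w1, w2 are assumptions."""
    e_pos = math.exp(pos)
    e_semi = sum(math.exp(s) for s in semi)
    e_neg = sum(math.exp(s) for s in neg)
    # First term: positive vs. (semi-positive + negative) documents
    l1 = -math.log(e_pos / (e_pos + e_semi + e_neg))
    # Second term: (positive + semi-positive) vs. negative documents
    l2 = -math.log((e_pos + e_semi) / (e_pos + e_semi + e_neg))
    return w1 * l1 + w2 * l2
```

Minimizing this loss raises the positive document's score relative to the others, while the second term keeps semi-positive documents (relevant to the question but not to the masked key phrase) ranked above pure negatives.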
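The retrieval step of claim 8 can be sketched as follows. The question and the masked key phrase are fused into one query vector (here by averaging their encoder vectors, an assumed fusion; the claim also allows splicing), then scored against each document vector by cosine similarity, and the top-k documents form the candidate set.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def retrieve(question_vec, phrase_vec, doc_vecs, k=2):
    """doc_vecs: {doc_id: vector}. Averaging as the fusion is an assumption."""
    query = [(q + p) / 2.0 for q, p in zip(question_vec, phrase_vec)]
    scored = sorted(doc_vecs.items(),
                    key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]
```

A production system would encode question and phrase with the trained dense retriever and use an approximate-nearest-neighbor index instead of exhaustive scoring, but the scoring logic is the same.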
Description
Multi-hop reasoning method and device based on masked knowledge activation

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a multi-hop reasoning method and device based on masked knowledge activation.

Background

With the rapid development of large language models (LLMs), their application in question-answering systems, decision support, content generation and other fields has become increasingly wide. However, an LLM generates mainly from the parametric memories of its training data, and when faced with knowledge-intensive, time-sensitive or long-tail-entity questions it is prone to "hallucination", i.e. generating content that lacks a factual basis or is inconsistent with the real world. To solve this problem, retrieval-augmented generation (RAG) techniques have been developed. Before generating an answer, RAG retrieves from an external knowledge base and inputs the retrieved evidence segments together with the user question into the LLM, thereby improving the accuracy and verifiability of the generated content. In a simple single-hop question-answering scenario, traditional single-round RAG (i.e., retrieving with the original question once and generating the answer directly) usually performs well. However, in complex multi-hop reasoning scenarios, a question often implies multiple reasoning steps, and the complete answer chain must be assembled across multiple documents, for example "the name of the famous bridge in the birthplace of the composer of a given work". Single-round retrieval can hardly cover all the intermediate evidence, so the industry has gradually turned to multi-round RAG schemes.
Multi-round RAG adopts an iterative closed loop of query generation, retrieval and regeneration: the model first generates the next round's sub-question or query based on the current information, retrieves new documents with that query, updates its reasoning state with the new documents, and loops until the final answer is obtained. In the iterative process of the related art, a phenomenon of "knowledge masking" easily occurs: when a question contains multiple parallel or combined conditions, the model, when generating the next round's query or reasoning text, is often drawn to the more salient, more frequent or more "familiar" condition, and thus ignores another condition that is equally key but relatively inconspicuous. The ignored condition is often precisely the "key" that advances the next hop of retrieval and reasoning. Once masking occurs, the model may generate content related to the dominant condition but unrelated to the true reasoning chain, and then retrieve irrelevant documents, which further mislead the model into reinforcing the wrong direction in the next generation round, forming a sustained "query drift, noisy retrieval, misled generation" chain reaction. In multi-hop reasoning, the multiple conditions of a question typically have strong coupling relationships (e.g., "the famous bridge in the birthplace of the composer of a given work"), and ignoring any one condition may break the reasoning chain. On this basis, once a certain round's query deviates from the facts or omits a key condition, the subsequent retrieval is misdirected, errors accumulate and amplify over the iterations, and a high-confidence but wrong answer is finally produced, so the current reasoning methods have low accuracy.
Disclosure of Invention

The application aims to provide a multi-hop reasoning method, a device, an electronic apparatus and a storage medium based on masked knowledge activation. According to one aspect of the application, a multi-hop reasoning method based on masked knowledge activation is provided, comprising: determining a candidate key-phrase set for the current round of query from initial query information of a user by using a preset natural language processing library, and detecting the masked key phrase in the candidate key-phrase set; retrieving a candidate document set from a retrieval database by using a dense retriever obtained through fine-grained contrastive learning training, based on retrieval conditions, wherein the retrieval conditions comprise the current-round question corresponding to the current round of query and the masked key phrase; determining a target document from the candidate document set, and generating a next-round question corresponding to the next round of query by using a large language model based on the masked key phrase, the current-round question and the target document; and determining a final answer based on the initial query information, the next-round question and a preset single-hop termination condition.
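The iterative loop summarized above (and detailed in claims 5-6) can be sketched at a high level. The helpers `llm_is_single_hop`, `retrieve_target_doc`, `generate_next_question` and `generate_answer` are hypothetical stand-ins for the LLM and dense-retriever calls, not names from the patent.

```python
# Sketch of the multi-hop control loop of claims 5-6: iterate until the
# next-round question is judged single-hop (or a maximum iteration count
# is reached), perform one last retrieval, then answer from the gathered
# target documents plus the initial query. All callables are assumed
# stand-ins for the LLM / retriever components.
def multi_hop_answer(initial_query, llm_is_single_hop, retrieve_target_doc,
                     generate_next_question, generate_answer, max_iters=5):
    question = initial_query
    evidence = []
    for _ in range(max_iters):
        doc = retrieve_target_doc(question)        # lock onto the target document
        evidence.append(doc)
        next_question = generate_next_question(question, doc)
        if llm_is_single_hop(next_question):       # single-hop termination condition
            evidence.append(retrieve_target_doc(next_question))  # last retrieval round
            break
        question = next_question                   # new current-round question
    return generate_answer(initial_query, evidence)
```

With the `max_iters` cap playing the role of the forced termination of claim 6, the loop cannot run indefinitely even if the LLM never judges a question to be single-hop.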