CN-122019708-A - Retrieval-augmented generation method and device, electronic equipment and storage medium
Abstract
The application relates to a retrieval-augmented generation method, a device, an electronic device and a storage medium. The method first generates an initial answer to a question text from a retrieved document set and performs sentence-level verification on the initial answer, filtering out parts with obvious hallucinations or incorrect citations to obtain a reliable statement set. It then evaluates each document in the document set against the question text to determine a supporting evidence set, so that evidence is extracted only from the useful documents, which ensures that the extracted evidence genuinely comes from the original text and is highly relevant to the question text. Finally, it generates a final answer to the question text from the reliable statement set and the supporting evidence set, so that each sentence in the final answer has a traceable citation, which ensures the reliability and citation quality of the final answer.
Inventors
- LIANG YANJUN
Assignees
- 北京奇艺世纪科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260115
Claims (10)
- 1. A retrieval-augmented generation method, the method comprising: retrieving a document set from a knowledge base based on a question text; generating an initial answer corresponding to the question text according to the document set, and performing sentence-level verification on the initial answer to obtain a reliable statement set; evaluating each document in the document set according to the question text, and determining a supporting evidence set in the document set; and generating a final answer corresponding to the question text according to the reliable statement set and the supporting evidence set.
- 2. The method of claim 1, wherein generating an initial answer corresponding to the question text according to the document set comprises: inputting the question text and each document in the document set into a large language model, wherein each document carries a document number assigned during retrieval; and acquiring the initial answer, carrying the document numbers, output by the large language model.
- 3. The method of claim 1, wherein performing sentence-level verification on the initial answer to obtain a reliable statement set comprises: splitting the initial answer at sentence level to obtain a plurality of sentence-level statements, wherein each sentence-level statement carries at least one document number; inputting the question text, the sentence-level statement and a target document corresponding to the document number carried by the sentence-level statement into a large language model to obtain a judgment result indicating whether the sentence-level statement is supported by the target document; and determining the reliable statement set from all sentence-level statements whose judgment result is supported.
- 4. The method of claim 3, further comprising rejecting all sentence-level statements whose judgment result is not supported.
- 5. The method of claim 1, wherein evaluating each document in the document set according to the question text and determining a supporting evidence set in the document set comprises: for each document in the document set, constructing a first input question according to the question text and the documents in the document set; inputting the first input question into a large language model to evaluate the document usefulness of the document, so as to obtain a first evaluation result; determining a first evidence set from the document set according to the first evaluation result; for each document in the first evidence set, constructing a second input question according to the question text and the documents in the first evidence set; inputting the second input question into a large language model to evaluate the evidence usefulness of the document, so as to obtain a second evaluation result; and determining the supporting evidence set from the first evidence set according to the second evaluation result.
- 6. The method of claim 1, wherein generating a final answer corresponding to the question text according to the reliable statement set and the supporting evidence set comprises: merging the statements in the reliable statement set whose similarity is greater than a preset threshold to obtain a target statement set; and generating a final answer corresponding to the question text according to the target statement set and the supporting evidence set, wherein each factual statement sentence in the final answer corresponds to a document number assigned during retrieval.
- 7. The method of claim 6, wherein after generating a final answer corresponding to the question text according to the reliable statement set and the supporting evidence set, the method further comprises: establishing a mapping relation between the document numbers and document meta-information; displaying the final answer and the document number corresponding to each factual statement sentence; acquiring a viewing request for a document number; and displaying the target document meta-information corresponding to the viewing request according to the viewing request and the mapping relation.
- 8. A retrieval-augmented generation device, the device comprising: a preliminary retrieval module, used for retrieving a document set from a knowledge base based on a question text; a sentence-level verification module, used for generating an initial answer corresponding to the question text according to the document set, and performing sentence-level verification on the initial answer to obtain a reliable statement set; an evidence evaluation module, used for evaluating each document in the document set according to the question text and determining a supporting evidence set in the document set; and an answer generation module, used for generating a final answer corresponding to the question text according to the reliable statement set and the supporting evidence set.
- 9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing the retrieval-augmented generation method according to any one of claims 1 to 7 when executing the program stored in the memory.
- 10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the retrieval-augmented generation method according to any one of claims 1 to 7.
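The sentence-level verification of claims 3 and 4 and the two-stage evidence screening of claim 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the large language model calls are replaced by caller-supplied functions (`supports`, `is_useful`, `has_evidence`, all hypothetical names), citations are assumed to appear as `[n]` before the sentence-ending punctuation, and a statement citing several documents is kept only if every cited document supports it, which is one possible reading of the claim.

```python
import re
from typing import Callable, Dict, List

# Hypothetical LLM-backed judge: (question, statement, document) -> supported?
Judge = Callable[[str, str, str], bool]
# Hypothetical LLM-backed screen: (question, document) -> passes this stage?
Screen = Callable[[str, str], bool]


def split_statements(answer: str) -> List[str]:
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]


def verify_statements(
    question: str, answer: str, documents: Dict[int, str], supports: Judge
) -> List[str]:
    """Keep only statements that every cited document supports."""
    reliable = []
    for statement in split_statements(answer):
        numbers = [int(n) for n in re.findall(r"\[(\d+)\]", statement)]
        targets = [documents[n] for n in numbers if n in documents]
        # A statement without a valid citation cannot be checked, so it is rejected.
        if targets and all(supports(question, statement, d) for d in targets):
            reliable.append(statement)
    return reliable


def select_supporting_evidence(
    question: str, documents: Dict[int, str], is_useful: Screen, has_evidence: Screen
) -> Dict[int, str]:
    """Stage 1 screens for document usefulness; stage 2 screens the survivors
    for evidence usefulness. Document numbers are preserved throughout."""
    first_evidence = {n: d for n, d in documents.items() if is_useful(question, d)}
    return {n: d for n, d in first_evidence.items() if has_evidence(question, d)}
```

In practice each callable would wrap a prompt to a large language model and parse its yes/no verdict; keeping them as parameters lets the filtering logic be tested with simple stubs.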
Description
Retrieval-augmented generation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of retrieval-augmented generation technologies, and in particular to a retrieval-augmented generation method, a retrieval-augmented generation device, an electronic device, and a storage medium.
Background
In the question-answering scenario of an artificial intelligence (AI) assistant for film and television production, creators often need to query historical materials, professional documents, scripts and the like, and they expect the system to mark its information sources clearly when answering so that they can verify, supplement and explore further. For example, when a screenwriter queries "architectural style features of the Song dynasty", the system should indicate which descriptions come from the Yingzao Fashi and which come from a particular academic paper or an existing documentary script, and the sources should be traceable with a click.
The current mainstream industry practice is a large language model framework based on retrieval-augmented generation (RAG): internal documents (scripts, historical materials, academic papers, encyclopedia entries and the like) are first vectorized and indexed, several relevant document fragments are then retrieved according to the user's question, and finally the fragments and the question are input together into the large language model, which generates a natural-language answer. However, when the retrieved documents contain noise, key information is missing, or the question itself falls outside the scope of the knowledge base, the large language model still forces out an answer and fabricates plot background or historical details, so the output answer is inaccurate.
Disclosure of Invention
The application provides a retrieval-augmented generation method, a device, electronic equipment and a storage medium, which are used for solving the technical problem of how to ensure the reliability of the output results.
In a first aspect, the present application provides a retrieval-augmented generation method, the method including: retrieving a document set from a knowledge base based on a question text; generating an initial answer corresponding to the question text according to the document set, and performing sentence-level verification on the initial answer to obtain a reliable statement set; evaluating each document in the document set according to the question text, and determining a supporting evidence set in the document set; and generating a final answer corresponding to the question text according to the reliable statement set and the supporting evidence set.
Optionally, generating an initial answer corresponding to the question text according to the document set includes: inputting the question text and each document in the document set into a large language model, wherein each document carries a document number assigned during retrieval; and acquiring the initial answer, carrying the document numbers, output by the large language model.
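As an illustration of the optional step above, the following Python sketch shows one way the retrieved documents could be numbered in the prompt so that the model's initial answer carries document numbers. It is a sketch under assumptions rather than the method as filed: the prompt wording, the `[n]` citation format, and the names `build_numbered_prompt`, `generate_initial_answer` and `cited_numbers` are hypothetical, and the large language model is represented by any callable that maps a prompt string to a completion string.

```python
import re
from typing import Callable, List

# Hypothetical stand-in for a large language model: prompt in, completion out.
LLM = Callable[[str], str]


def build_numbered_prompt(question: str, documents: List[str]) -> str:
    """Prefix each retrieved document with its number so it can be cited as [n]."""
    numbered = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    return (
        "Answer the question using only the documents below. "
        "Cite the supporting document number in every sentence, e.g. [1].\n\n"
        f"Documents:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


def generate_initial_answer(question: str, documents: List[str], llm: LLM) -> str:
    """Return the model's initial answer, expected to carry document numbers."""
    return llm(build_numbered_prompt(question, documents))


def cited_numbers(text: str) -> List[int]:
    """Extract the document numbers cited in a piece of answer text."""
    return [int(n) for n in re.findall(r"\[(\d+)\]", text)]
```

Because the numbers in the prompt follow the retrieval order, each number extracted by `cited_numbers` can be mapped back to the retrieved document it refers to in later verification steps.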
Optionally, performing sentence-level verification on the initial answer to obtain a reliable statement set includes: splitting the initial answer at sentence level to obtain a plurality of sentence-level statements, wherein each sentence-level statement carries at least one document number; inputting the question text, the sentence-level statement and a target document corresponding to the document number carried by the sentence-level statement into a large language model to obtain a judgment result indicating whether the sentence-level statement is supported by the target document; and determining the reliable statement set from all sentence-level statements whose judgment result is supported.
Optionally, the method further includes rejecting all sentence-level statements whose judgment result is not supported.
Optionally, evaluating each document in the document set according to the question text and determining a supporting evidence set in the document set includes: for each document in the document set, constructing a first input question according to the question text and the documents in the document set; inputting the first input question into a large language model to evaluate the document usefulness of the document, so as to obtain a first evaluation result; determining a first evidence set from the document set according to the first evaluation result; for each document in the first evidence set, constructing a second input question according to the question text and the documents in the first evidence set; inputting the second input question into a large language model to evaluate the evidence usefulness of the document, so as to obtain a second evaluation result; and determining the supporting evidence set from the first evidence set according to the second evaluation resul