CN-121998091-A - Large model enhanced search generation method, device, equipment and medium based on real-time fact verification

Abstract

The embodiments of the application provide a large model enhanced search generation method, device, equipment and medium based on real-time fact verification, relating to the technical field of search generation. The method comprises: in response to a user query, starting a text generation process that generates a reply text stream unit by unit; monitoring the generated text content in real time; when a trigger condition is met, interrupting text generation and extracting the content to be verified; obtaining a verification evidence set related to the content to be verified; verifying the content to be verified against the verification evidence set to obtain a corresponding verification result; generating a correction instruction based on the verification result; and adjusting the text content generated by the text generation process based on the correction instruction. By constructing a dynamic closed-loop flow of generation, monitoring, interruption, checking and feedback, the embodiments can intercept and correct factual errors in real time during text generation, thereby significantly improving the accuracy and reliability of the generated content.
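
The closed loop summarized in the abstract can be illustrated with a minimal Python sketch. This is not the applicant's implementation: `Verdict`, `generate_with_verification`, `toy_trigger` and `toy_verify` are hypothetical names, the unit stream is a plain list standing in for a language model, and the feedback step simply overwrites the last unit rather than re-prompting a model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Verdict:
    supported: bool   # False when the evidence contradicts the claim
    correction: str   # replacement text used when supported is False


def generate_with_verification(
    units: Iterable[str],
    should_check: Callable[[List[str]], bool],
    verify: Callable[[str], Verdict],
) -> List[str]:
    """Consume generated units one by one, pausing to verify when triggered."""
    output: List[str] = []
    for unit in units:                     # generation, unit by unit
        output.append(unit)
        if not should_check(output):       # monitoring
            continue
        claim = " ".join(output)           # interruption: extract content to verify
        verdict = verify(claim)            # checking against evidence
        if not verdict.supported:          # feedback: apply the correction
            output[-1] = verdict.correction
    return output


# Toy stand-ins for the trigger and the verifier (assumptions for this demo).
def toy_trigger(out: List[str]) -> bool:
    # Pretend a capitalized, non-initial word is a newly seen named entity.
    return len(out) > 1 and out[-1][:1].isupper()


def toy_verify(claim: str) -> Verdict:
    # Pretend the evidence set says Paris is in France.
    if claim.endswith("Germany"):
        return Verdict(False, "France")
    return Verdict(True, "")


result = generate_with_verification(
    ["Paris", "is", "in", "Germany"], toy_trigger, toy_verify
)
print(" ".join(result))  # prints: Paris is in France
```

In a real system the correction would be fed back to the model (as a prefix constraint or by rolling back and regenerating) instead of being written directly into the output.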

Inventors

  • HAN YAXIN
  • CHEN YONG
  • ZENG WENJIA
  • CHEN XINYUE

Assignees

  • 零犀(北京)科技有限公司

Dates

Publication Date
20260508
Application Date
20260123

Claims (13)

  1. A large model enhanced search generation method based on real-time fact verification, characterized by comprising the following steps: in response to a user query, initiating a text generation process to generate a reply text stream unit by unit; during the text generation process, monitoring the generated text content in real time; when the text content is detected to meet a preset trigger condition, interrupting the text generation process and extracting the content to be verified from the text content generated so far; acquiring a verification evidence set related to the content to be verified; verifying the content to be verified based on the verification evidence set to obtain a corresponding verification result; generating a correction instruction based on the verification result; and adjusting the text content generated by the text generation process based on the correction instruction.
  2. The large model enhanced search generation method based on real-time fact verification according to claim 1, wherein the preset trigger condition includes at least one of the following: a new named entity is identified in the generated text content; a complete proposition conforming to a preset semantic structure is identified, based on semantic role labeling, as having been formed; the entropy of the probability distribution output by the text generation process when generating the current unit exceeds a preset threshold.
  3. The large model enhanced search generation method based on real-time fact verification according to claim 2, wherein identifying, based on semantic role labeling, that a complete proposition conforming to a preset semantic structure has been formed comprises: performing syntactic analysis and semantic role labeling on the generated text content, and identifying a semantic frame centered on a predicate; and when the semantic frame contains the core arguments of the predicate, determining that a complete proposition conforming to the preset semantic structure has been formed.
  4. The large model enhanced search generation method based on real-time fact verification according to claim 1, wherein acquiring the verification evidence set related to the content to be verified comprises: constructing a query instruction based on atomic facts extracted from the current text content; and retrieving relevant document fragments from an external knowledge base based on the query instruction to form the verification evidence set.
  5. The large model enhanced search generation method based on real-time fact verification according to claim 1, wherein verifying the content to be verified based on the verification evidence set to obtain a corresponding verification result comprises: comparing the content to be verified with each piece of evidence in the verification evidence set to detect whether a contradiction exists; if a contradiction exists, evaluating the credibility of each piece of contradictory evidence based on the source authority and timeliness of the evidence; and determining the verification result corresponding to the content to be verified based on the credibility of each piece of contradictory evidence.
  6. The large model enhanced search generation method based on real-time fact verification according to claim 5, wherein comparing the content to be verified with each piece of evidence in the verification evidence set to detect whether a contradiction exists comprises: calculating the semantic vector similarity between the content to be verified and each piece of evidence, and determining that a contradiction exists when the similarity is lower than a preset threshold.
  7. The large model enhanced search generation method based on real-time fact verification according to claim 5, wherein evaluating the credibility of each piece of contradictory evidence based on the source authority and timeliness of the evidence comprises: determining an authority score of the evidence source according to a preset authority mapping dictionary, determining a timeliness score of the evidence according to its publication time, and computing a weighted sum of the authority score and the timeliness score to obtain the credibility of the contradictory evidence.
  8. The large model enhanced search generation method based on real-time fact verification according to claim 5, wherein determining the verification result corresponding to the content to be verified based on the credibility of each piece of contradictory evidence comprises: acquiring the initial confidence of the text generation model when generating the content to be verified; calibrating and comparing the initial confidence against the credibility of the contradictory evidence; determining, based on the comparison result, the tendency to trust either the external evidence or the originally generated content; and determining the verification result corresponding to the content to be verified based on that trust tendency.
  9. The large model enhanced search generation method based on real-time fact verification according to claim 1, wherein adjusting the text content generated by the text generation process based on the correction instruction comprises at least one of the following: injecting the correction instruction as a prefix constraint into the subsequent generation context of the text generation model to adjust the subsequently generated text content; and rolling the text generation process back to a target node corresponding to the content to be verified, and restarting the text generation process with the correction instruction added to the prompt, so as to correct the generated text content.
  10. The large model enhanced search generation method based on real-time fact verification according to claim 1, wherein initiating a text generation process to generate a reply text stream unit by unit in response to a user query comprises: in response to the user query, performing an initial retrieval from a preset knowledge base to obtain an initial reference document set; and initiating the text generation process to generate the reply text stream unit by unit based on the user query and the initial reference document set.
  11. A large model enhanced search generation device based on real-time fact verification, comprising: a text generation initiation module for initiating a text generation process to generate a reply text stream unit by unit in response to a user query; a content real-time monitoring module for monitoring the generated text content in real time during the text generation process; a fact verification triggering module for interrupting the text generation process when the text content is detected to meet a preset trigger condition, and extracting the content to be verified from the text content generated so far; a verification evidence collection module for obtaining a verification evidence set related to the content to be verified; an evidence comparison and verification module for verifying the content to be verified based on the verification evidence set to obtain a corresponding verification result; a correction instruction generation module for generating a correction instruction based on the verification result; and a text generation adjustment module for adjusting the text content generated by the text generation process based on the correction instruction.
  12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the large model enhanced search generation method based on real-time fact verification of any one of claims 1-10.
  13. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, performs the large model enhanced search generation method based on real-time fact verification of any one of claims 1-10.
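
The verification steps of claims 5 to 8 can be sketched in Python as follows. Everything here is an illustrative assumption rather than the claimed implementation: the `AUTHORITY` dictionary values, the exponential half-life used for timeliness, the weights `w_auth` and `w_time`, and the similarity threshold are all hypothetical, and the calibration step of claim 8 is reduced to a direct comparison. Following claim 6 literally, low semantic-vector similarity is treated as a contradiction.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence


@dataclass
class Evidence:
    source: str               # key into the authority mapping dictionary
    published: date           # information release time
    vector: Sequence[float]   # semantic vector of the evidence text


# Hypothetical authority mapping dictionary (claim 7).
AUTHORITY = {"government": 1.0, "news": 0.7, "forum": 0.3}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def timeliness(published: date, today: date, half_life_days: float = 365.0) -> float:
    """Score in (0, 1] that halves every half_life_days (an assumed decay)."""
    age = max((today - published).days, 0)
    return 0.5 ** (age / half_life_days)


def credibility(ev: Evidence, today: date,
                w_auth: float = 0.6, w_time: float = 0.4) -> float:
    """Weighted sum of authority and timeliness scores (claim 7)."""
    authority = AUTHORITY.get(ev.source, 0.1)  # unknown sources score low
    return w_auth * authority + w_time * timeliness(ev.published, today)


def verify(claim_vector: Sequence[float], evidence: List[Evidence],
           model_confidence: float, today: date,
           sim_threshold: float = 0.5) -> str:
    """Return "correct" if contradictory evidence outweighs the model, else "keep"."""
    # Claim 6: similarity below the threshold counts as a contradiction.
    contradicting = [ev for ev in evidence
                     if cosine_similarity(claim_vector, ev.vector) < sim_threshold]
    if not contradicting:
        return "keep"
    # Claim 8 (simplified): compare model confidence with evidence credibility.
    strongest = max(credibility(ev, today) for ev in contradicting)
    return "correct" if strongest > model_confidence else "keep"
```

With these assumed weights, a fresh government source that contradicts the claim has credibility 1.0 and overrides a model confidence of 0.8, while a one-year-old forum post scores 0.38 and does not.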

Description

Large model enhanced search generation method, device, equipment and medium based on real-time fact verification

Technical Field

The application relates to the technical field of search generation, and in particular to a large model enhanced search generation method, device, equipment and medium based on real-time fact verification.

Background

Retrieval-augmented generation (the technique this application calls enhanced search generation) aims to improve the factual accuracy of a large language model's output by retrieving external knowledge before generation. The current mainstream approach uses a serial, static "retrieve, then generate" pipeline: relevant documents are retrieved from a knowledge base according to the query and then fed to the model as a fixed context to generate the final reply.

This conventional approach has notable drawbacks. First, fact verification depends entirely on the initial retrieval result: if the retrieved information is incomplete or outdated, the model may generate hallucinated content from the faulty context, and there is no way to intervene in real time during generation. Second, it is difficult to trace a specific factual statement in the output back to a precise evidence source in the retrieved documents, so interpretability is insufficient. Finally, when generating long text, the system cannot immediately detect and correct deviations and fabrications that arise spontaneously in later paragraphs.

In view of the above, a solution is needed that improves the accuracy and reliability of retrieval-generated content.

Disclosure of Invention

The embodiments of the application aim to provide a large model enhanced search generation method, device, equipment and medium based on real-time fact verification, so as to improve the accuracy and reliability of retrieval-generated content.
In a first aspect, an embodiment of the present application provides a large model enhanced search generation method based on real-time fact verification, including: in response to a user query, initiating a text generation process to generate a reply text stream unit by unit; during the text generation process, monitoring the generated text content in real time; when the text content is detected to meet a preset trigger condition, interrupting the text generation process and extracting the content to be verified from the text content generated so far; acquiring a verification evidence set related to the content to be verified; verifying the content to be verified based on the verification evidence set to obtain a corresponding verification result; generating a correction instruction based on the verification result; and adjusting the text content generated by the text generation process based on the correction instruction.

In this embodiment, by constructing a dynamic closed-loop flow of generation, monitoring, interruption, checking and feedback, factual errors can be intercepted and corrected in real time during text generation, so that the accuracy and reliability of the generated content are significantly improved.

In some embodiments, the preset trigger condition includes at least one of the following: a new named entity is identified in the generated text content; a complete proposition conforming to a preset semantic structure is identified, based on semantic role labeling, as having been formed; the entropy of the probability distribution output by the text generation process when generating the current unit exceeds a preset threshold.

In this embodiment, setting diversified trigger conditions based on entities, complete propositions or uncertainty allows the key factual points that need checking to be captured accurately, so that the verification flow is started efficiently without sacrificing fluency.
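
The uncertainty-based trigger condition can be illustrated with a short Python sketch. The function names and the threshold value of 1.0 nat are assumptions made for illustration; the source only states that the entropy of the output distribution is compared with a preset threshold.

```python
import math
from typing import Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy in nats of a next-unit probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_trigger(probs: Sequence[float], threshold: float = 1.0) -> bool:
    """True when the model is uncertain enough to warrant a fact check."""
    return shannon_entropy(probs) > threshold


# A peaked distribution (confident model) does not trigger a check,
# while a flat one (uncertain model) does.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
print(entropy_trigger(confident), entropy_trigger(uncertain))  # prints: False True
```

The uniform distribution over four candidates has entropy ln 4, about 1.386 nats, which exceeds the assumed threshold; in practice the threshold would need tuning against the vocabulary size and the desired checking frequency.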
In some embodiments, identifying, based on semantic role labeling, that a complete proposition conforming to a preset semantic structure has been formed includes: performing syntactic analysis and semantic role labeling on the generated text content, and identifying a semantic frame centered on a predicate; and when the semantic frame contains the core arguments of the predicate, determining that a complete proposition conforming to the preset semantic structure has been formed.

In this embodiment, triggering verification only when the core arguments in the semantic frame are complete ensures that complete, verifiable propositions are checked and avoids wasted checks on incomplete expressions, which further improves the efficiency of the whole system.

In some embodiments, acquiring the verification evidence set related to the content to be verified includes: constructing a query instruction based on atomic facts extracted from the current text content; and retrieving relevant document fragments from an external knowledge base based on the query inst