
CN-121981252-A - Hallucination detection method based on a large language model


Abstract

The disclosure provides a hallucination detection method based on a large language model (LLM), relates to the technical field of artificial intelligence, and in particular to the technical fields of large language models, deep learning, and the like. The LLM-based training data construction method comprises: obtaining initial data, wherein the initial data comprises a reference answer; injecting hallucinations into the reference answer using an LLM to obtain a hallucination answer; performing hallucination detection on candidate answers to obtain a detection result, wherein the candidate answers comprise the reference answer and/or the hallucination answer; screening the candidate answers to obtain a target answer; and constructing training data based on the target answer and the detection result of the target answer. The present disclosure can improve the quality of the training data used for the hallucination detection model.

Inventors

  • TAN ZHENDONG
  • LI SHUPENG
  • LU WEIPENG
  • WU JIANMIN

Assignees

  • 北京百度网讯科技有限公司 (Beijing Baidu Netcom Science and Technology Co., Ltd.)

Dates

Publication Date
2026-05-05
Application Date
2025-12-12

Claims (13)

  1. A training data construction method based on a large language model (LLM), comprising the following steps: acquiring initial data, wherein the initial data comprises a reference answer; injecting hallucinations into the reference answer using an LLM to obtain a hallucination answer; performing hallucination detection on candidate answers to obtain a detection result, wherein the candidate answers comprise the reference answer and/or the hallucination answer; screening the candidate answers to obtain a target answer; and constructing training data based on the target answer and the detection result of the target answer.
  2. The method of claim 1, wherein the initial data further comprises a question and a context corresponding to the reference answer, and wherein employing the LLM to inject hallucinations into the reference answer to obtain the hallucination answer comprises: sending first prompt information to the LLM, wherein the first prompt information comprises the question, the context, the reference answer, and preset error description information, so that the LLM rewrites the reference answer according to the error description information, the question, and the context to obtain the hallucination answer.
  3. The method of claim 1, wherein performing the hallucination detection on the candidate answers to obtain the detection result comprises: sending second prompt information to the LLM, wherein the second prompt information comprises a candidate answer and a hallucination judging standard, so that the LLM obtains the detection result of the candidate answer according to the hallucination judging standard.
  4. The method of claim 1, wherein screening the candidate answers to obtain the target answer comprises: if a candidate answer is a hallucination answer, sending third prompt information to the LLM, wherein the third prompt information comprises preset error description information, the detection result of the candidate answer, and a manner of determining target parameters, so that the LLM processes the error description information and the detection result according to the determination manner to obtain the target parameters corresponding to the hallucination answer; determining a quality score for the hallucination answer based on the target parameters; and taking the hallucination answer as a target answer in response to the quality score being greater than a preset threshold.
  5. The method of claim 4, wherein the target parameters comprise a true-positive count, a predicted count, and a reference count, wherein the reference count is the total number of errors in the error description information, the predicted count is the total number of errors in the detection result, and the true-positive count is the number of errors contained in both the detection result and the error description information; and wherein determining the quality score for the hallucination answer based on the target parameters comprises: determining a precision based on the true-positive count and the predicted count; determining a recall based on the true-positive count and the reference count; and determining the quality score based on the precision and the recall.
  6. An LLM-based hallucination detection model training method, comprising the following steps: acquiring training data; and training a hallucination detection model using the training data, wherein the training data is constructed using the method of any one of claims 1-5.
  7. An LLM-based hallucination detection method, comprising: acquiring content to be detected; and performing hallucination detection on the content to be detected using a hallucination detection model, wherein the hallucination detection model is trained using the method of claim 6.
  8. An LLM-based training data construction apparatus, comprising: an acquisition module configured to acquire initial data, wherein the initial data comprises a reference answer; a generation module configured to inject hallucinations into the reference answer using an LLM to obtain a hallucination answer; a detection module configured to perform hallucination detection on candidate answers to obtain a detection result, wherein the candidate answers comprise the reference answer and/or the hallucination answer; a screening module configured to screen the candidate answers to obtain a target answer; and a construction module configured to construct training data based on the target answer and the detection result of the target answer.
  9. An LLM-based hallucination detection model training apparatus, comprising: an acquisition module configured to acquire training data; and a training module configured to train a hallucination detection model using the training data, wherein the training data is constructed using the method of any one of claims 1-5.
  10. An LLM-based hallucination detection apparatus, comprising: an acquisition module configured to acquire content to be detected; and a detection module configured to perform hallucination detection on the content to be detected using a hallucination detection model, wherein the hallucination detection model is trained using the method of claim 6.
  11. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
  12. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
  13. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-7.
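
The quality score of claim 5 amounts to a precision/recall computation over the injected and detected errors. A minimal sketch follows; the function name, the set-based error representation, and the use of the F1 harmonic mean to combine precision and recall are illustrative assumptions, since the claim only states that the score is based on the two rates:

```python
def quality_score(injected_errors: set, detected_errors: set) -> float:
    """Score a hallucination answer by comparing the errors found by detection
    against the errors that were deliberately injected (claim 5).

    injected_errors: errors in the preset error description (the reference count)
    detected_errors: errors reported in the detection result (the predicted count)
    """
    if not injected_errors or not detected_errors:
        return 0.0
    hits = len(injected_errors & detected_errors)  # errors present in both sets
    precision = hits / len(detected_errors)        # true positives / predicted count
    recall = hits / len(injected_errors)           # true positives / reference count
    if precision + recall == 0.0:
        return 0.0
    # The claim only fixes "based on precision and recall"; the F1 harmonic
    # mean used here is one plausible combination, not the patented formula.
    return 2 * precision * recall / (precision + recall)
```

Under this reading, a hallucination answer whose injected errors are faithfully recovered by the detector scores high and is retained as a target answer; one whose detection result diverges from the injected errors scores low and is screened out.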

Description

Hallucination detection method based on a large language model

Technical Field

The disclosure relates to the technical field of artificial intelligence, in particular to the technical fields of large models, deep learning, and the like, and specifically to a hallucination detection method based on a large language model.

Background

The hallucination problem of large language models (Large Language Model, LLM) refers to cases where the content generated by an LLM deviates from a given context, is difficult to verify as authentic, or contradicts or even conflicts with the context. To ensure the quality of LLM-generated content, a hallucination detection model can be used to identify whether the generated content contains hallucinations. The hallucination detection model is trained based on training data.

Disclosure of Invention

The present disclosure provides a hallucination detection method based on a large language model, together with related methods and products. According to one aspect of the disclosure, an LLM-based training data construction method is provided, comprising: obtaining initial data, wherein the initial data comprises a reference answer; injecting hallucinations into the reference answer using an LLM to obtain a hallucination answer; performing hallucination detection on candidate answers to obtain detection results, wherein the candidate answers comprise the reference answer and/or the hallucination answer; screening the candidate answers to obtain target answers; and constructing training data based on the target answers and their detection results. According to another aspect of the present disclosure, a training method for an LLM-based hallucination detection model is provided, comprising acquiring training data and training the hallucination detection model using the training data, wherein the training data is constructed using any one of the above methods.
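
The construction flow described above can be sketched end to end. Everything below the claim structure is an assumption for illustration: the `llm` callable standing in for a prompt-driven model client, the prompt wording, the `score` callable (a quality scorer in the sense of claim 5), and the 0.7 threshold are all hypothetical, not taken from the disclosure:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Sample:
    question: str
    context: str
    reference_answer: str

def build_training_data(
    samples: List[Sample],
    llm: Callable[[str], str],      # hypothetical LLM client: prompt -> completion
    score: Callable[[str], float],  # quality score of a detection result (claim 5)
    threshold: float = 0.7,         # screening threshold; value is illustrative
) -> List[Tuple[str, str]]:
    """Sketch of the disclosed pipeline: inject a hallucination into the
    reference answer, detect hallucinations in both candidates, screen, and
    pair each retained target answer with its detection result."""
    training_data = []
    for s in samples:
        # First prompt (cf. claim 2): rewrite the reference answer so that it
        # contains the errors given in the preset error description.
        hallucinated = llm(
            "Rewrite the answer so it contains the described errors.\n"
            f"Question: {s.question}\nContext: {s.context}\n"
            f"Answer: {s.reference_answer}"
        )
        for candidate in (s.reference_answer, hallucinated):
            # Second prompt (cf. claim 3): judge the candidate against a
            # hallucination judging standard to obtain a detection result.
            detection = llm(f"Detect hallucinations in this answer: {candidate}")
            # Screening (cf. claim 4): hallucination answers must clear the
            # quality threshold; reference answers are kept as-is in this sketch.
            if candidate == s.reference_answer or score(detection) > threshold:
                training_data.append((candidate, detection))
    return training_data
```

Each retained pair couples a target answer with its detection result, which is the (input, label) shape a hallucination detection model can be fine-tuned on.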
According to another aspect of the disclosure, an LLM-based hallucination detection method is provided, comprising obtaining content to be detected and performing hallucination detection on the content to be detected using a hallucination detection model, wherein the hallucination detection model is trained using any one of the methods described above. According to another aspect of the disclosure, an LLM-based training data construction apparatus is provided, comprising: an acquisition module configured to acquire initial data, wherein the initial data comprises a reference answer; a generation module configured to inject hallucinations into the reference answer using an LLM to obtain a hallucination answer; a detection module configured to perform hallucination detection on candidate answers to obtain a detection result, wherein the candidate answers comprise the reference answer and/or the hallucination answer; a screening module configured to screen the candidate answers to obtain a target answer; and a construction module configured to construct training data based on the target answer and the detection result of the target answer. According to another aspect of the disclosure, an LLM-based hallucination detection model training apparatus is provided, comprising an acquisition module configured to acquire training data and a training module configured to train the hallucination detection model using the training data, wherein the training data is constructed using any one of the above methods. According to another aspect of the disclosure, an LLM-based hallucination detection apparatus is provided, comprising an acquisition module configured to acquire content to be detected and a detection module configured to perform hallucination detection on the content to be detected using a hallucination detection model, wherein the hallucination detection model is trained using any one of the methods described above.
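
At inference time, the detection method and apparatus above reduce to running the trained detector over incoming content. A minimal sketch, where the class name, the `model` interface (text in, hallucination probability out), and the 0.5 decision threshold are illustrative assumptions rather than details fixed by the disclosure:

```python
from typing import Callable

class HallucinationDetector:
    """Illustrative wrapper; the disclosure does not fix a model interface."""

    def __init__(self, model: Callable[[str], float]):
        # `model` stands in for the trained hallucination detection model,
        # e.g. a fine-tuned classifier mapping text to a probability.
        self.model = model

    def detect(self, content: str, threshold: float = 0.5) -> bool:
        """Return True if the content is judged to contain a hallucination."""
        return self.model(content) >= threshold
```

In use, the acquisition module would feed the content to be detected into `detect`, and downstream logic would act on the boolean (or a richer detection result) it returns.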
According to another aspect of the present disclosure, an electronic device is provided, comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the above aspects. According to another aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, storing computer instructions for causing a computer to perform the method of any one of the above aspects. According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method of any one of the above aspects. According to the embodiments of the present disclosure, the quality of the training data for the hallucination detection model can be improved. It should be understood that this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to limit the scope of the disclosure. Other feature