
CN-121996753-A - Training method and device for large language model, information processing method and device, electronic equipment, storage medium and program product

CN 121996753 A

Abstract

The invention relates to a training method and device for a large language model, an information processing method and device, an electronic device, a storage medium, and a program product, belonging to the technical field of artificial intelligence, and reduces hallucinated replies from the large language model. The training method comprises: acquiring a target data set, wherein target data in the target data set comprises a preset question and a ground-truth reply to the preset question; acquiring auxiliary information for the preset question in the target data; inputting a first prompt, the preset question in the target data, and the auxiliary information of the preset question into the large language model to generate a reply to the preset question, wherein the first prompt instructs the large language model to output a negative reply when the answer to the preset question cannot be determined; verifying the reply of the large language model against the ground-truth reply of the preset question; and fine-tuning the large language model according to the verification result.
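The training pipeline summarized in the abstract can be sketched as follows. This is a minimal illustration only: every function name (`retrieve_auxiliary_info`, `llm_generate`, `verify`, `train_step`), the prompt text, and the exact-match check are invented placeholders, not part of the claimed method.

```python
# Illustrative sketch of the claimed training loop: retrieve auxiliary
# information, prompt the model with a refusal instruction, generate a
# reply, and verify it against the ground-truth reply.

FIRST_PROMPT = (
    "Answer the question using the auxiliary information. "
    "If the answer cannot be determined, reply: 'I cannot determine the answer.'"
)

def retrieve_auxiliary_info(question):
    # Placeholder: a search-engine lookup would go here (see claims 5-6).
    return "auxiliary context for: " + question

def llm_generate(prompt, question, aux):
    # Placeholder for the large language model's reply.
    return "model reply to: " + question

def verify(reply, ground_truth):
    # Simplest possible verification: exact match with the ground truth.
    # A real system would use a semantic comparison instead.
    return reply.strip() == ground_truth.strip()

def train_step(target_data):
    # target_data: list of (preset question, ground-truth reply) pairs.
    results = []
    for question, ground_truth in target_data:
        aux = retrieve_auxiliary_info(question)
        reply = llm_generate(FIRST_PROMPT, question, aux)
        results.append((question, reply, verify(reply, ground_truth)))
    return results

sample = [("What color is the truck?", "The truck is white.")]
print(train_step(sample))
```

The verification results collected by `train_step` would then drive the fine-tuning step described in the claims.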

Inventors

  • Request for anonymity

Assignees

  • Moore Threads Intelligent Technology (Beijing) Co., Ltd. (摩尔线程智能科技(北京)股份有限公司)

Dates

Publication Date
2026-05-08
Application Date
2025-12-26

Claims (14)

  1. A method for training a large language model, comprising: acquiring a target data set, wherein target data in the target data set comprises a preset question and a ground-truth reply to the preset question; acquiring auxiliary information of the preset question in the target data; inputting a first prompt, the preset question in the target data, and the auxiliary information of the preset question into a large language model to generate a reply to the preset question, wherein the first prompt instructs the large language model to output a negative reply when the answer to the preset question cannot be determined; and verifying the reply of the large language model against the ground-truth reply of the preset question, and fine-tuning the large language model according to the verification result.
  2. The method of claim 1, wherein fine-tuning the large language model according to the verification result comprises: in response to the verification result indicating that the reply of the large language model is correct, generating at least one similar reply resembling the ground-truth reply of the preset question; and fine-tuning the large language model according to the verification result and the at least one similar reply.
  3. The method of claim 2, wherein generating at least one similar reply resembling the ground-truth reply of the preset question comprises: inputting a second prompt, the preset question, and the ground-truth reply of the preset question into a second large language model to generate at least one similar reply, wherein the second prompt instructs the second large language model to generate, from the preset question and its ground-truth reply, at least one reply similar to the ground-truth reply.
  4. The method of claim 3, further comprising, after generating the at least one similar reply: verifying the at least one similar reply against the ground-truth reply of the preset question; and wherein fine-tuning the large language model according to the verification result and the at least one similar reply comprises: fine-tuning the large language model by a low-rank adaptation (LoRA) fine-tuning method according to the verification result of the at least one similar reply and the verification result of the reply of the large language model.
  5. The method of claim 1, wherein, in the case that the preset question is in image-and-text form, acquiring the auxiliary information of the preset question in the target data comprises: retrieving, through a search engine, at least one image similar to an image in the preset question; and extracting text information from any one of the at least one similar image, and taking the text information as the auxiliary information of the preset question.
  6. The method of claim 1, wherein, in the case that the preset question is in text form, acquiring the auxiliary information of the preset question in the target data comprises: inputting the text-form preset question into a search engine to obtain at least one uniform resource locator related to the preset question; and extracting resource information from any one of the at least one uniform resource locator, and taking the resource information as the auxiliary information of the preset question.
  7. The method of any one of claims 1 to 6, wherein the auxiliary information provides the large language model, during the training phase, with objective external references associated with the preset question, to help the large language model judge whether its internal knowledge is reliable, thereby suppressing hallucinated replies when the internal knowledge of the large language model is insufficient to answer the question.
  8. An information processing method, comprising: acquiring a question input by a user; processing the question through a large language model to obtain a reply, wherein the large language model is trained by the training method of any one of claims 1 to 7; and returning the reply to the user.
  9. The method of claim 8, further comprising: obtaining, by the large language model, a plurality of replies from the question input multiple times by the user; ranking the plurality of replies by relevance using the large language model; and returning one or more top-ranked replies according to a set strategy.
  10. A training device for a large language model, comprising: a first acquisition unit, configured to acquire a target data set, wherein target data in the target data set comprises a preset question and a ground-truth reply to the preset question; a second acquisition unit, configured to acquire auxiliary information of the preset question in the target data; a first generation unit, configured to input a first prompt, the preset question in the target data, and the auxiliary information of the preset question into a large language model to generate a reply to the preset question, wherein the first prompt instructs the large language model to output a negative reply when the answer to the preset question cannot be determined; and a verification unit, configured to verify the reply of the large language model against the ground-truth reply of the preset question and to fine-tune the large language model according to the verification result.
  11. An information processing apparatus, comprising: a third acquisition unit, configured to acquire a question input by a user; a processing unit, configured to process the question through a large language model to obtain a reply, wherein the large language model is trained by the training method of any one of claims 1 to 7; and a return unit, configured to return the reply to the user.
  12. A computer-readable storage medium, comprising a stored program, wherein the program, when run, controls a device on which the storage medium resides to perform the training method of the large language model of any one of claims 1 to 7, or the information processing method of claim 8 or 9.
  13. An electronic device, comprising: a memory storing an executable program; and a processor configured to run the program, wherein the program, when run, performs the training method of the large language model of any one of claims 1 to 7, or the information processing method of claim 8 or 9.
  14. A computer program product comprising a stored computer program which, when executed by a processor, implements the training method of the large language model of any one of claims 1 to 7, or the information processing method of claim 8 or 9.
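As a rough illustration of claims 2 to 4, the sketch below stubs out the second large language model that paraphrases the ground-truth reply, and a verification step that filters those paraphrases before fine-tuning. Every name here (`second_llm`, `verify_similar`, `SECOND_PROMPT`) and the containment-based check are invented for illustration and are not specified by the patent.

```python
# Sketch of claims 2-4: when the model's reply verifies as correct, a
# second LLM generates "similar replies" (paraphrases) of the ground-truth
# reply, which are themselves verified before being used for fine-tuning.

SECOND_PROMPT = (
    "Given the question and its ground-truth reply, produce paraphrases "
    "that preserve the meaning of the ground-truth reply."
)

def second_llm(prompt, question, ground_truth, n=2):
    # Stub: a real system would call a second large language model here.
    return [f"Paraphrase {i + 1}: {ground_truth}" for i in range(n)]

def verify_similar(similar, ground_truth):
    # Illustrative check: keep only paraphrases that still contain the
    # ground-truth reply verbatim; a real verifier would be semantic.
    return [s for s in similar if ground_truth in s]

question = "What color is the truck?"
truth = "The truck is white."
candidates = second_llm(SECOND_PROMPT, question, truth)
kept = verify_similar(candidates, truth)
print(kept)
```

The verified paraphrases (`kept`) would then join the model's own verified reply as additional fine-tuning data, per claim 4.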

Description

Training method and device for large language model, information processing method and device, electronic equipment, storage medium and program product

Technical Field

The present invention relates to the field of artificial intelligence, and in particular to a training method and apparatus for a large language model, an information processing method and apparatus, an electronic device, a storage medium, and a program product.

Background

In recent years, as the parameter counts and training-data scale of large language models such as GPT-4, PaLM-E, and Gemini have grown exponentially, the overall quality of question-answering systems has improved remarkably, and a user can obtain coherent and authoritative answers simply by asking questions in natural language. However, under the "whatever is generated is plausible" paradigm, these large language models are prone to hallucinated replies, that is, replies whose content does not match real data or deviates from the user's instructions. Moreover, as the instruction-following, in-context learning, and logical-reasoning capabilities of large language models are carried over into multimodal question-answering systems, the hallucination phenomenon inherent to large language models is not eliminated in multimodal question-answering; instead it presents new features, such as visual hallucination. Visual hallucination refers to a large language model confidently describing objects, attributes, or relationships not present in the image, for example describing a "white truck" as a "red car". Therefore, how to reduce hallucinated replies from large language models is a technical problem that currently needs to be solved.
Disclosure of Invention

In view of the foregoing, embodiments of the present invention provide a training method and apparatus for a large language model, an information processing method and apparatus, an electronic device, a storage medium, and a program product, so as to reduce hallucinated replies from the large language model.

In a first aspect, an embodiment of the present invention provides a training method for a large language model, comprising: acquiring a target data set, wherein target data in the target data set comprises a preset question and a ground-truth reply to the preset question; acquiring auxiliary information of the preset question in the target data; inputting a first prompt, the preset question in the target data, and the auxiliary information of the preset question into a large language model to generate a reply to the preset question, wherein the first prompt instructs the large language model to output a negative reply when the answer to the preset question cannot be determined; and verifying the reply of the large language model against the ground-truth reply of the preset question, and fine-tuning the large language model according to the verification result.

In a further refinement of the above method, fine-tuning the large language model by a low-rank adaptation (LoRA) fine-tuning method according to the verification result comprises: in response to the verification result indicating that the reply of the large language model is correct, generating at least one similar reply resembling the ground-truth reply of the preset question; and fine-tuning the large language model by the low-rank adaptation fine-tuning method according to the verification result and the at least one similar reply.
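The low-rank adaptation referred to above can be illustrated numerically: the frozen weight matrix W is augmented by the product B·A of two small matrices, so only A and B need to be trained. The toy sizes and values below, and the helper names, are chosen purely for illustration; no ML framework is involved.

```python
# Minimal numeric illustration of the low-rank adaptation (LoRA) idea:
# effective weight = W + B @ A, where A and B have low rank.

def matmul(X, Y):
    # Plain-Python matrix product of X (m x k) and Y (k x n).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_forward(W, A, B, x):
    # Apply (W + B @ A) to vector x; rank of the update = rows of A.
    delta = matmul(B, A)
    W_eff = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]
    return [sum(w + 0.0 if False else w * xi for w, xi in zip(row, x))
            for row in W_eff]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 weight (identity, for clarity)
A = [[0.5, 0.5]]               # rank-1 factors: A is 1x2, B is 2x1
B = [[1.0], [1.0]]
print(lora_forward(W, A, B, [1.0, 1.0]))  # -> [2.0, 2.0]
```

During fine-tuning only A and B would receive gradient updates, which is what makes this method far cheaper than updating the full weight matrix of a large language model.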
In a further refinement of the above method, generating at least one similar reply resembling the ground-truth reply of the preset question comprises: inputting a second prompt, the preset question, and the ground-truth reply of the preset question into a second large language model to generate at least one similar reply, wherein the second prompt instructs the second large language model to generate, from the preset question and its ground-truth reply, at least one reply similar to the ground-truth reply.

In a further refinement of the above method, after generating the at least one similar reply, the method further comprises: verifying the at least one similar reply against the ground-truth reply of the preset question; and fine-tuning the large language model by the low-rank adaptation fine-tuning method according to the verification result and the at least one similar reply comprises: fine-tuning the large language model by the low-rank adaptation fine-tuning method according to the verification result of the at least one similar reply and the verification result of the reply of the large language model.

In a further refinement of the above method, in the case that the preset question is in image-and-text form, obt