
CN-122021865-A - Large language model training system and method based on summarization and thinking-back

CN122021865A

Abstract

The invention discloses a large language model training system and method based on summarization and thinking-back, belonging to the technical fields of Natural Language Processing (NLP) and artificial intelligence. The method comprises: step 1, generating reasoning paths and reasoning answers with a strong language model and, by feeding them back into the model together with a staged summarization template, obtaining a training set in the summarization and thinking-back modes; step 2, performing gradient-feedback updates on the parameters of a large language model in an iterative loop until a large language model with iteratively updated parameters is obtained; step 3, using the reasoning summaries and thinking-back acquired by the large language model with iteratively updated parameters as a task memory library, searching and matching it against the user's online reasoning task to generate a task prompt, and guiding the large language model through its reasoning.
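As a rough illustration only (the patent itself contains no code, and every function and variable name below is hypothetical), the first two phases of the abstract — generating a thinking-back training set with a strong model, then looping supervised updates until an ending condition holds — can be sketched in Python:

```python
# Hypothetical sketch of phases 1 and 2 from the abstract; the stand-in
# functions are placeholders, not the patent's actual models.

def strong_model(question, template=None, path=None):
    """Stand-in for the strong language model."""
    if template is None:
        # First pass: produce a reasoning path and a reasoning answer.
        return f"reasoning path for {question!r}", f"answer({question})"
    # Second pass: apply the staged summarization template to the cleaned
    # path to yield a summary and a thinking-back entry.
    return (template.format(part="summary", path=path),
            template.format(part="thinking-back", path=path))

def build_training_set(dataset, template):
    """Phase 1: generate paths/answers, then re-input them with the template."""
    records = []
    for question, label in dataset:
        path, answer = strong_model(question)
        summary, thinking_back = strong_model(question, template, path)
        records.append({"question": question, "label": label, "path": path,
                        "answer": answer, "summary": summary,
                        "thinking_back": thinking_back})
    return records

def train_iteratively(records, round_threshold=3):
    """Phase 2: loop 'gradient' updates until the ending condition holds
    (round threshold reached, or every answer matches its label)."""
    rounds = 0
    while rounds < round_threshold:
        rounds += 1  # stands in for one supervised update of the current-round model
        if all(r["answer"] == r["label"] for r in records):
            break
    return rounds

template = "{part} derived from {path}"
data = [("2+2", "answer(2+2)"), ("3*3", "answer(3*3)")]
records = build_training_set(data, template)
rounds = train_iteratively(records)
```

The early-exit check plays the role of the claimed ending condition (iteration count reaching the round threshold, or the reasoning result matching the ground-truth label).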

Inventors

  • LI KAN
  • WANG XINGLIN

Assignees

  • Beijing Institute of Technology (北京理工大学)

Dates

Publication Date
2026-05-12
Application Date
2025-12-01

Claims (5)

  1. A large language model training method based on summarization and thinking-back, characterized by comprising the following steps: Step 1, generating reasoning paths and reasoning answers with a strong language model, and obtaining a training set in the summarization and thinking-back modes by feeding them back into the model together with a staged summarization template; step 1.1, obtaining reasoning paths and reasoning answers for an open-source dataset with ground-truth labels by using a strong language model; step 1.2, cleaning the data of the reasoning paths and reasoning answers; step 1.3, feeding the cleaned reasoning paths, the reasoning answers and the staged summarization template back into the strong language model to obtain a training set with the summarization and thinking-back modes; Step 2, performing gradient-feedback updates on the parameters of the large language model in an iterative loop until a large language model with iteratively updated parameters is obtained; step 2.1, performing supervised training of the current-round large language model on the training set with the summarization and thinking-back modes; step 2.2, setting a reasoning ending condition and executing step 2.1 in an iterative loop until, when the reasoning ending condition is met, the large language model with iteratively updated parameters is obtained; Step 3, using the reasoning summaries and thinking-back acquired by the large language model with iteratively updated parameters as a task memory library, searching and matching it against the user's online reasoning task to generate a task prompt, and guiding the large language model through its reasoning; step 3.1, storing the reasoning summaries and thinking-back obtained by the large language model after the iterative parameter update in a task memory library; step 3.2, when the user performs online reasoning for a task, performing a keyword-matching search in the task memory library according to the prompt entered by the user to obtain the reasoning summary and thinking-back corresponding to the task; and step 3.3, constructing a task prompt from the reasoning summary and thinking-back corresponding to the task, and guiding the large language model through its reasoning.
  2. The large language model training method based on summarization and thinking-back according to claim 1, wherein step 1.3 is implemented by: step 1.3.1, generating a staged summarization template from summarization and thinking-back prompt words based on the task description and optional additional information; and step 1.3.2, feeding the reasoning paths, the reasoning answers and the staged summarization template back into the strong language model to obtain a training set with the summarization and thinking-back modes.
  3. The large language model training method based on summarization and thinking-back according to claim 1, wherein step 2.1 is implemented by: step 2.1.1, inputting the task description, the staged summarization template and a reference reasoning example into the current-round large language model, and obtaining in sequence the reasoning process, reasoning result, reasoning summary and reasoning thinking-back of the current large language model; step 2.1.2, verifying the correctness of the reasoning result with a discrimination model to obtain a verification result; and step 2.1.3, propagating the verification result back to the current-round large language model through the gradient to obtain the large language model with currently updated parameters.
  4. The large language model training method based on summarization and thinking-back according to claim 1, wherein step 2.2 is implemented by: step 2.2.1, setting a round threshold, and taking as the reasoning ending condition that the number of iterations is not less than the round threshold or that the reasoning result generated by the large language model is consistent with the ground-truth label; and step 2.2.2, obtaining the large language model with iteratively updated parameters when the reasoning ending condition is met.
  5. A large language model training system based on summarization and thinking-back for implementing the method of claim 1, comprising a data input module, a training module and a memory module. The data input module is used for receiving the reasoning task requirements input by a user, including the task description, reference reasoning examples, task constraints and reasoning budget information. The training module consists of a scheduling module, an execution module and a verification module, is used for performing gradient-feedback updates on the parameters of the large language model, and its outputs serve as input to the memory module. The scheduling module is used for analyzing the task requirements and generating the staged summarization template, controlling the reasoning iteration process, performing flow control and termination judgment, deciding whether to enter the next round of reasoning or end the task according to the state of the input task, the current feedback signal and the preset stop condition, and exchanging data with the execution module and the verification module. The execution module is used for executing the corresponding reasoning steps according to the instructions issued by the scheduling module, and returning the results to the scheduling module or passing them directly to the verification module as its input. The verification module is used for externally verifying the reasoning results; the verification modes include unit tests, fact-retrieval consistency checks and symbolic solving. The memory module is used for storing the summary, thinking-back, reasoning result and feedback information of each round of reasoning, and for abstracting reusable reasoning patterns from the reasoning results as reasoning primitives to be stored in the cross-task memory library, which can be used for information backtracking within a task and for experience transfer across different tasks.
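The task-memory lookup of claim 1, steps 3.1–3.3 (store summaries and thinking-back, retrieve them by keyword matching against the user's prompt, and splice them into a task prompt) might be sketched as follows; this is a minimal illustration, and the class, method and field names are all hypothetical rather than taken from the patent:

```python
# Hypothetical sketch of the task memory library of steps 3.1-3.3.

class TaskMemory:
    def __init__(self):
        # Each entry stores keywords plus the reasoning summary and
        # thinking-back produced by the trained model (step 3.1).
        self.entries = []

    def store(self, keywords, summary, thinking_back):
        self.entries.append({"keywords": set(keywords),
                             "summary": summary,
                             "thinking_back": thinking_back})

    def match(self, prompt):
        """Step 3.2: keyword-matching search — return the entry sharing
        the most words with the user's prompt, or None if nothing overlaps."""
        words = set(prompt.lower().split())
        best = max(self.entries,
                   key=lambda e: len(e["keywords"] & words),
                   default=None)
        if best is not None and best["keywords"] & words:
            return best
        return None

def build_task_prompt(user_prompt, memory):
    """Step 3.3: construct the task prompt from the retrieved summary and
    thinking-back; fall back to the raw prompt on a miss."""
    hit = memory.match(user_prompt)
    if hit is None:
        return user_prompt
    return (f"{user_prompt}\n\n"
            f"Prior summary: {hit['summary']}\n"
            f"Prior thinking-back: {hit['thinking_back']}")

mem = TaskMemory()
mem.store(["triangle", "geometry"],
          "sum of interior angles is 180 degrees",
          "check the units before the final step")
prompt = build_task_prompt("solve this triangle puzzle", mem)
```

A simple set-intersection score stands in here for whatever matching the patent's memory module actually uses; the fallback on a miss keeps online reasoning usable for unfamiliar tasks.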

Description

Large language model training system and method based on summarization and thinking-back

Technical Field

The invention relates to a large language model training system and method based on summarization and thinking-back, belongs to the technical fields of Natural Language Processing (NLP) and artificial intelligence, and is applied to generating reasoning results with high accuracy, high stability and strong generalization capability in a multi-round reasoning process.

Background

Most current large language models adopt a single-pass generation or a fixed chain-of-thought reasoning mode when executing complex reasoning tasks, and lack the capability of self-assessment and correction during reasoning. Under multi-round reasoning, cross-domain knowledge integration or information uncertainty, the model is prone to erroneous conclusions, and it is difficult to self-correct effectively once the reasoning deviates. In addition, the prior art lacks abstraction and cross-task reuse of reasoning patterns, which results in insufficient generalization capability on new tasks and makes it difficult to achieve high-quality initial reasoning performance in unfamiliar fields. Therefore, how to improve the reasoning generalization ability of a large language model in a supervised training scenario has become a problem to be solved.

Disclosure of Invention

The invention aims to solve the technical problem of improving the reasoning generalization capability of a large language model in a supervised training scenario, and provides a large language model training system and method based on summarization and thinking-back. By introducing a staged summarization and thinking-back mechanism into the reasoning process and combining it with external verification and training optimization, the invention realizes closed-loop control of reasoning and improves the accuracy, stability and generalization capability of reasoning.
The invention is realized by the following technical scheme. The invention discloses a large language model training method based on summarization and thinking-back, which comprises the following steps:

Step 1, generating reasoning paths and reasoning answers with a strong language model, and obtaining a training set in the summarization and thinking-back modes by feeding them back into the model together with a staged summarization template.

Step 1.1, obtaining reasoning paths and reasoning answers for an open-source dataset with ground-truth labels by using a strong language model.

Step 1.2, cleaning the data of the reasoning paths and reasoning answers.

Step 1.3, feeding the cleaned reasoning paths, the reasoning answers and the staged summarization template back into the strong language model to obtain a training set with the summarization and thinking-back modes.

Step 1.3.1, generating a staged summarization template from summarization and thinking-back prompt words based on the task description and optional additional information.

Step 1.3.2, feeding the reasoning paths, the reasoning answers and the staged summarization template back into the strong language model to obtain a training set with the summarization and thinking-back modes.

Step 2, performing gradient-feedback updates on the parameters of the large language model in an iterative loop until a large language model with iteratively updated parameters is obtained.

Step 2.1, performing supervised training of the current-round large language model on the training set with the summarization and thinking-back modes.

Step 2.1.1, inputting the task description, the staged summarization template and a reference reasoning example into the current-round large language model, and obtaining in sequence the reasoning process, reasoning result, reasoning summary and reasoning thinking-back of the current large language model.

Step 2.1.2, verifying the correctness of the reasoning result with a discrimination model to obtain a verification result.

Step 2.1.3, propagating the verification result back to the current-round large language model through the gradient to obtain the large language model with currently updated parameters.

Step 2.2, setting a reasoning ending condition and executing step 2.1 in an iterative loop until, when the reasoning ending condition is met, the large language model with iteratively updated parameters is obtained.

Step 2.2.1, setting a round threshold, and taking as the reasoning ending condition that the number of iterations is not less than the round threshold or that the reasoning result generated by the large language model is consistent with the ground-truth label.

Step 2.2.2, obtaining the large language model with iteratively updated parameters when the reasoning ending condition is met.

Step 3, searching and matching the task memory library against the user's online reasoning task by using the reasoning summaries and thinking-back acquired by the large language model with up