CN-121998012-A - Model training method, model reasoning method and related device
Abstract
Embodiments of this application provide a model training method, a model reasoning method and a related device, which are used for reasoning on input data through a trained combined model to obtain a reasoning result; the small language models in the trained combined model interact based on a cross attention mechanism, so that the quality of the reasoning result generated by the model is improved. The method comprises the steps of: obtaining a first intermediate state; obtaining a second intermediate state; performing a cross attention operation on the first intermediate state and the second intermediate state based on a cross attention mechanism to obtain a first operation result; analyzing the first operation result and the second intermediate state through a second small language model to obtain a first reasoning result; training parameters of a combined model according to the first reasoning result to obtain a trained combined model; and reasoning with the trained combined model according to an input text to obtain question-answer data of the input text.
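As a concrete illustration of the interaction the abstract describes, the following is a minimal, hedged sketch of a single cross attention operation between the intermediate states of two small language models. The tensor shapes, projection weights, and single-head form are illustrative assumptions, not the patent's implementation.

```python
# Minimal sketch of the cross attention operation described above (illustrative
# only; shapes, weight names, and the single-head form are assumptions).
import math
import torch

def cross_attention(h_first, h_second, w_q, w_k, w_v):
    """h_first:  (len1, d) intermediate state of the first small language model.
    h_second: (len2, d) intermediate state of the second small language model.
    Returns a (len2, d) operation result: for each position of the second state,
    a mix of the first state weighted by how important each first-state position
    is for generating that second-state position."""
    q = h_second @ w_q                       # queries from the second model's state
    k = h_first @ w_k                        # keys and values from the first model's state
    v = h_first @ w_v
    scores = (q @ k.transpose(-2, -1)) / math.sqrt(q.shape[-1])
    weights = torch.softmax(scores, dim=-1)  # importance of the first state, per position
    return weights @ v                       # the "first operation result"

# Toy usage with random states and projections:
d = 64
h1, h2 = torch.randn(10, d), torch.randn(7, d)
wq, wk, wv = (torch.randn(d, d) for _ in range(3))
op_result = cross_attention(h1, h2, wq, wk, wv)  # shape (7, 64)
```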
Inventors
- WANG YIMING
- LIU YANG
- XIAO AN
- WANG YUNHE
Assignees
- HUAWEI TECHNOLOGIES CO., LTD. (华为技术有限公司)
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-11-08
Claims (20)
- 1. A method of model training, comprising: acquiring a first intermediate state, wherein the first intermediate state is an output result obtained by a first intermediate layer according to first input data, the first intermediate layer is an intermediate layer of a first small language model in a combined model, and the combined model comprises at least two small language models; acquiring a second intermediate state, wherein the second intermediate state is an output result obtained by a second intermediate layer according to second input data, the second intermediate layer is an intermediate layer of a second small language model in the combined model, and the second input data is obtained according to the first input data; performing, based on a cross attention mechanism, a cross attention operation on the first intermediate state and the second intermediate state to obtain a first operation result, wherein the first operation result is used for indicating the degree of importance of the first intermediate state for generating the second intermediate state; analyzing the first operation result and the second intermediate state through the second small language model to obtain a first reasoning result; and training the parameters of the combined model according to the first reasoning result to obtain a trained combined model, wherein the trained combined model is used for reasoning according to an input text to obtain question-answer data of the input text. (An illustrative code sketch of this training step follows the claims.)
- 2. The method of claim 1, wherein the combined model further comprises a third small language model, and the method further comprises: acquiring an updated second intermediate state and a third intermediate state, wherein the updated second intermediate state is obtained by the second intermediate layer updating the second intermediate state according to the first operation result, the third intermediate state is an output result obtained by a third intermediate layer according to third input data, the third intermediate layer is an intermediate layer of the third small language model, and the third input data is obtained according to the second input data; performing a cross attention operation on the updated second intermediate state and the third intermediate state to obtain a second operation result, wherein the second operation result is used for indicating the degree of importance of the updated second intermediate state for generating the third intermediate state; and analyzing the second operation result and the third intermediate state through the third small language model to obtain a second reasoning result.
- 3. The method of claim 2, wherein training the parameters of the combined model according to the first reasoning result to obtain a trained combined model comprises: training the parameters of the combined model according to the first reasoning result and the second reasoning result to obtain the trained combined model.
- 4. The method of claim 3, wherein training the parameters of the combined model according to the first reasoning result and the second reasoning result to obtain the trained combined model comprises: training the parameters of the combined model according to the error between the first reasoning result and a first actual result and the error between the second reasoning result and a second actual result, to obtain the trained combined model.
- 5. The method according to any one of claims 1 to 4, wherein, before the acquiring of the second intermediate state, the method further comprises: reasoning the first input data through the first small language model to obtain a third reasoning result; and taking the third reasoning result and the first input data as the second input data.
- 6. The method according to any one of claims 1 to 5, wherein analyzing the first operation result and the second intermediate state through the second small language model to obtain a first reasoning result comprises: updating the second intermediate state through the second intermediate layer of the second small language model according to the first operation result to obtain an updated second intermediate state; and analyzing the updated second intermediate state through an output layer of the second small language model to obtain the first reasoning result.
- 7. A method of model reasoning, comprising: acquiring an input text; and reasoning the input text through a trained combined model to obtain question-answer pair data, wherein the question-answer pair data comprises a question and an answer to the question, and the trained combined model comprises at least two small language models.
- 8. The method of claim 7, wherein the at least two small language models comprise a first small language model and a second small language model, and the reasoning the input text through the trained combined model to obtain question-answer pair data comprises: reasoning the input text through the first small language model to obtain the question; and reasoning the question and the input text through the second small language model to obtain the answer to the question.
- 9. The method of claim 8, wherein, before the reasoning of the question and the input text through the second small language model to obtain the answer to the question, the method further comprises: obtaining a first intermediate result and a second intermediate result, wherein the first intermediate result is an output result obtained by a first intermediate layer according to the input text, the first intermediate layer is an intermediate layer of the first small language model, the second intermediate result is an output result obtained by a second intermediate layer according to the question and the input text, and the second intermediate layer is an intermediate layer of the second small language model; and performing a cross attention operation on the first intermediate result and the second intermediate result based on a cross attention mechanism to obtain an intermediate operation result, wherein the intermediate operation result is used for indicating the degree of importance of the first intermediate result for generating the second intermediate result.
- 10. The method of claim 9, wherein the reasoning of the question and the input text through the second small language model to obtain the answer to the question comprises: updating the second intermediate result through the intermediate layer of the second small language model according to the intermediate operation result to obtain an updated second intermediate result; and analyzing the updated second intermediate result through an output layer of the second small language model to obtain the answer to the question.
- 11. The method according to any one of claims 7 to 10, wherein the trained combined model is obtained by training the combined model using the method according to any one of claims 1 to 6.
- 12. A model training apparatus, comprising: an obtaining module, configured to obtain a first intermediate state, wherein the first intermediate state is an output result obtained by a first intermediate layer according to first input data, the first intermediate layer is an intermediate layer of a first small language model in a combined model, and the combined model comprises at least two small language models; the obtaining module being further configured to obtain a second intermediate state, where the second intermediate state is an output result obtained by a second intermediate layer according to second input data, the second intermediate layer is an intermediate layer of a second small language model in the combined model, and the second input data is obtained according to the first input data; an operation module, configured to perform a cross attention operation on the first intermediate state and the second intermediate state based on a cross attention mechanism to obtain a first operation result, wherein the first operation result is used for indicating the degree of importance of the first intermediate state for generating the second intermediate state; an analysis module, configured to analyze the first operation result and the second intermediate state through the second small language model to obtain a first reasoning result; and a training module, configured to train the parameters of the combined model according to the first reasoning result to obtain a trained combined model.
- 13. The apparatus of claim 12, wherein the combined model further comprises a third small language model; the obtaining module is further configured to obtain an updated second intermediate state and a third intermediate state, where the updated second intermediate state is obtained by the second intermediate layer updating the second intermediate state according to the first operation result, the third intermediate state is an output result obtained by a third intermediate layer according to third input data, the third intermediate layer is an intermediate layer of the third small language model, and the third input data is obtained according to the second input data; the operation module is further configured to perform a cross attention operation on the updated second intermediate state and the third intermediate state to obtain a second operation result, where the second operation result is used for indicating the degree of importance of the updated second intermediate state for generating the third intermediate state; and the analysis module is further configured to analyze the second operation result and the third intermediate state through the third small language model to obtain a second reasoning result.
- 14. The apparatus according to claim 13, wherein the training module is specifically configured to: train the parameters of the combined model according to the first reasoning result and the second reasoning result to obtain the trained combined model.
- 15. The apparatus according to claim 14, wherein the training module is specifically configured to: train the parameters of the combined model according to the error between the first reasoning result and a first actual result and the error between the second reasoning result and a second actual result, to obtain the trained combined model.
- 16. The apparatus according to any one of claims 12 to 15, further comprising: a reasoning module, configured to reason the first input data through the first small language model to obtain a third reasoning result before the second intermediate state is obtained; and a determining module, configured to take the third reasoning result and the first input data as the second input data.
- 17. The apparatus according to any one of claims 12 to 16, wherein the analysis module is specifically configured to: update the second intermediate state through the second intermediate layer of the second small language model according to the first operation result to obtain an updated second intermediate state; and analyze the updated second intermediate state through an output layer of the second small language model to obtain the first reasoning result.
- 18. A model reasoning apparatus, comprising: an acquisition module, configured to acquire an input text; and a reasoning module, configured to reason the input text through a trained combined model to obtain question-answer pair data, wherein the question-answer pair data comprises a question and an answer to the question, and the trained combined model comprises at least two small language models.
- 19. The apparatus of claim 18, wherein the at least two small language models comprise a first small language model and a second small language model, and the reasoning module is specifically configured to: reason the input text through the first small language model to obtain the question; and reason the question and the input text through the second small language model to obtain the answer to the question.
- 20. The apparatus of claim 19, wherein the acquisition module is further configured to, before the question and the input text are reasoned through the second small language model to obtain the answer to the question: obtain a first intermediate result and a second intermediate result, wherein the first intermediate result is an output result obtained by a first intermediate layer according to the input text, the first intermediate layer is an intermediate layer of the first small language model, the second intermediate result is an output result obtained by a second intermediate layer according to the question and the input text, and the second intermediate layer is an intermediate layer of the second small language model; and the apparatus further comprises an operation module, configured to perform a cross attention operation on the first intermediate result and the second intermediate result based on a cross attention mechanism to obtain an intermediate operation result, wherein the intermediate operation result is used for indicating the degree of importance of the first intermediate result for generating the second intermediate result.
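The sketch referenced in claim 1 follows. It walks through the claim's five steps for a two-model combined model, as a hedged illustration under assumed architectures: TinyLM, model_a, model_b, cross_attn, and the residual "analysis" step are stand-ins chosen for brevity, not the patent's actual design.

```python
# Hedged sketch of the training step of claim 1 (architectures and names are
# illustrative assumptions, not the patented implementation).
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Stand-in small language model: embedding -> one intermediate layer -> output layer."""
    def __init__(self, vocab: int = 1000, d: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.intermediate = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.output = nn.Linear(d, vocab)

model_a = TinyLM()  # first small language model
model_b = TinyLM()  # second small language model
cross_attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
opt = torch.optim.AdamW(
    [*model_a.parameters(), *model_b.parameters(), *cross_attn.parameters()], lr=1e-4)

def train_step(first_input, second_input, target):
    # Steps 1-2: obtain the first and second intermediate states.
    h1 = model_a.intermediate(model_a.embed(first_input))
    h2 = model_b.intermediate(model_b.embed(second_input))
    # Step 3: cross attention; queries come from h2, so the result reflects
    # the importance of h1 for generating h2 (the "first operation result").
    op_result, _ = cross_attn(query=h2, key=h1, value=h1)
    # Step 4: the second model "analyzes" the operation result with h2; a
    # residual update before its output layer is one simple reading of this.
    logits = model_b.output(h2 + op_result)          # first reasoning result
    # Step 5: train the combined model's parameters on the reasoning error.
    loss = nn.functional.cross_entropy(logits.flatten(0, 1), target.flatten())
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

With the third model of claims 2 to 4, the same loop would add a second cross-attention stage and combine the two reasoning-error losses, as sketched at the end of this section.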
Description
Model training method, model reasoning method and related device

Technical Field

The application relates to the field of artificial intelligence, and in particular to a model training method, a model reasoning method and a related device.

Background

To improve the performance of large language models (Large Language Model, LLM) in a particular domain, a large language model can be trained using domain-specific data; however, collecting domain-specific question-answer pair (Question-Answer pairs, Q&A) data requires a significant amount of manpower and time. With the rapid development of large language models, question-answer pair data in a specific field can be generated by a large language model, which improves the efficiency of generating question-answer pairs in that field. Currently, to ensure the quality of the generated question-answer pair data, a large-scale language model is generally adopted for data generation; however, using a large-scale language model to generate data occupies a large amount of computing resources and incurs a large cost. Therefore, how to reduce the consumption of computing resources while ensuring the quality of the generated data is an urgent issue to be resolved.

Disclosure of Invention

The embodiments of the application provide a model training method, a model reasoning method and a related device, which are used for reasoning on input data through a trained combined model to obtain a reasoning result; the small language models in the trained combined model interact based on a cross attention mechanism, so that the quality of the reasoning result generated by the model is improved.

A first aspect provides a model training method, which comprises: obtaining a first intermediate state, wherein the first intermediate state is an output result obtained by a first intermediate layer according to first input data, the first intermediate layer is an intermediate layer of a first small language model in a combined model, and the combined model comprises at least two small language models; and obtaining a second intermediate state, wherein the second intermediate state is an output result obtained by a second intermediate layer according to second input data, the second intermediate layer is an intermediate layer of a second small language model in the combined model, and the second input data is obtained according to the first input data. When a plurality of small language models in the combined model perform step-by-step reasoning, the small language models can interact with one another: after the first intermediate state of the first small language model and the second intermediate state of the second small language model are obtained, a cross attention operation can be performed on the first intermediate state and the second intermediate state based on a cross attention mechanism to obtain a first operation result, the first operation result being used for representing the degree of importance of the first intermediate state for generating the second intermediate state; the first operation result and the second intermediate state are analyzed through the second small language model to obtain a first reasoning result; and the parameters of the combined model are trained according to the first reasoning result to obtain a trained combined model, the trained combined model being used for reasoning according to an input text to obtain question-answer data of the input text.
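To make the question-answer data flow concrete, here is a hedged sketch of the inference path of claims 7 to 10, reusing TinyLM, model_a, model_b, and cross_attn from the sketch after the claims; the greedy generate helper and the token-id interface are assumptions chosen for brevity, not the patent's implementation.

```python
# Hedged sketch of the inference flow (claims 7-10), reusing TinyLM, model_a,
# model_b, and cross_attn from the training sketch; greedy decoding over token
# ids stands in for real text generation.
import torch

@torch.no_grad()
def generate(model, ids, steps=16, context_state=None):
    """Greedy decoding. If context_state (the other model's intermediate result)
    is given, the intermediate result is updated with the cross attention output."""
    for _ in range(steps):
        h = model.intermediate(model.embed(ids))
        if context_state is not None:
            op, _ = cross_attn(h, context_state, context_state)
            h = h + op                                    # updated intermediate result
        next_id = model.output(h[:, -1]).argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
    return ids

def make_qa(input_ids):
    """input_ids: (1, len) token ids of the input text; returns a question-answer pair."""
    question_ids = generate(model_a, input_ids)                # first model: the question
    h_first = model_a.intermediate(model_a.embed(input_ids))   # first intermediate result
    prompt = torch.cat([question_ids, input_ids], dim=1)       # question + input text
    answer_ids = generate(model_b, prompt, context_state=h_first)  # second model: the answer
    return question_ids, answer_ids
```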
In the embodiments of the application, the trained combined model can be used for reasoning according to an input text to obtain question-answer data of the input text, wherein the trained combined model comprises at least two small language models. During reasoning, the combined model can perform step-by-step reasoning with the plurality of small language models it contains, and a later model (the second small language model) can interact with an earlier model (the first small language model): through the cross attention mechanism, the second small language model obtains the degree of importance of the context information in the first small language model for the result the second small language model is generating, so that during reasoning it can combine the relevant context information and generate its reasoning result according to the importance of that context information, thereby improving the quality of the generated reasoning result. Therefore, when the trained combined model is adopted to reason over the input text, fewer computing resources are consumed than with a large language model, while a high-quality reasoning result is still generated. In a possible implementation manner, the combined model further comprises a third small language model, and the method further comprises the steps of obtaining an updated second intermediate state and a third intermediate state, wherein the updated second intermediate state is obtained by the second intermediate layer updating the second intermediate state according to the first operation result.
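The three-model extension just described can be sketched the same way; again this is an assumed arrangement continuing the stand-ins above (model_c and cross_attn_bc are hypothetical names), with both reasoning results available for the combined loss of claims 3 and 4.

```python
# Hedged sketch of the three-model cascade (claim 2), continuing the stand-ins
# above; model_c and cross_attn_bc are assumed names.
import torch.nn as nn

model_c = TinyLM()                                           # third small language model
cross_attn_bc = nn.MultiheadAttention(64, 4, batch_first=True)

def cascade_step(first_input, second_input, third_input):
    h1 = model_a.intermediate(model_a.embed(first_input))    # first intermediate state
    h2 = model_b.intermediate(model_b.embed(second_input))   # second intermediate state
    op_ab, _ = cross_attn(h2, h1, h1)                        # first operation result
    h2_updated = h2 + op_ab                                  # updated second intermediate state
    h3 = model_c.intermediate(model_c.embed(third_input))    # third intermediate state
    op_bc, _ = cross_attn_bc(h3, h2_updated, h2_updated)     # second operation result
    logits_b = model_b.output(h2_updated)                    # first reasoning result
    logits_c = model_c.output(h3 + op_bc)                    # second reasoning result
    return logits_b, logits_c   # both errors enter the training loss (claims 3-4)
```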