CN-119167982-B - Task processing method, system, device, medium and computer program product
Abstract
The application discloses a task processing method, system, device, medium, and computer program product in the technical field of knowledge distillation. The method comprises: sending a task processing thinking chain generation request to a second device based on a task training set corresponding to at least one task, and receiving the task processing thinking chain that the second device generates from a large model and the task training set; determining, based on the task data, first target indication information and second target indication information for multi-target joint training of a small model; training a first target of the small model based on the task data, task processing result labels, and the first target indication information, and iteratively training a second target of the small model based on the task data, the task processing thinking chain, and the second target indication information until a preset joint training end condition is met; and obtaining a trained small model so as to perform task processing with it. The application thereby enables knowledge distillation for the small model from a large model at the remote end.
Inventors
- CHEN WEIJING
- MA GUOQIANG
- WU YU
- WEI WENBIN
- FAN TAO
- FAN LIXIN
- YANG QIANG
Assignees
- WEBANK CO., LTD. (深圳前海微众银行股份有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20240830
Claims (11)
- 1. A task processing method, applied to a first device, where the first device is connected to a second device, a small model to be trained is deployed in the first device, and a large model is deployed in the second device, the method comprising: sending a task processing thinking chain generation request to the second device based on a task training set corresponding to at least one task, and receiving the task processing thinking chain fed back by the second device, wherein the task training set comprises task data and task processing result labels, the task processing thinking chain is used for explaining the thinking chain logic by which the task data are processed to generate the task processing result labels, and the second device outputs the task processing thinking chain based on the large model and the task training set; determining first target indication information and second target indication information for multi-target joint training of the small model based on the task data, wherein the first target indication information is used for indicating processing of the task data, and the second target indication information is used for indicating explanation of the thinking chain logic for processing the task data; and training a first target of the small model based on the task data, the task processing result labels, and the first target indication information, and iteratively training a second target of the small model based on the task data, the task processing thinking chain, and the second target indication information, until a preset joint training end condition is met, so as to obtain a trained small model and perform task processing according to the trained small model.
- 2. The method of claim 1, wherein determining the first target indication information and the second target indication information for the multi-target joint training of the small model based on the task data comprises: acquiring a preset task processing prefix, and splicing the preset task processing prefix with the task data to obtain the first target indication information; and acquiring a preset thinking chain generation prefix, and splicing the preset thinking chain generation prefix with the task data to obtain the second target indication information.
- 3. The method of claim 1, wherein training the first target of the small model based on the task data, the task processing result labels, and the first target indication information comprises: determining, according to the first target indication information, a first prediction result output by the small model for the task data; calculating a first prediction loss from the first prediction result and the task processing result labels; and optimizing the small model according to the first prediction loss so as to train the first target of the small model.
- 4. The task processing method of claim 1, wherein iteratively training the second target of the small model based on the task data, the task processing thinking chain, and the second target indication information comprises: determining, according to the second target indication information, a second prediction result output by the small model for the task data; calculating a second prediction loss from the second prediction result and the task processing thinking chain; and optimizing the small model according to the second prediction loss so as to train the second target of the small model.
- 5. The task processing method of claim 1, wherein training the first target of the small model based on the task data, the task processing result labels, and the first target indication information, and iteratively training the second target of the small model based on the task data, the task processing thinking chain, and the second target indication information, further comprises: in each round of iterative training, determining a first prediction loss based on the task data, the task processing result labels, and the first target indication information; determining a second prediction loss based on the task data, the task processing thinking chain, and the second target indication information; and computing a weighted sum of the first prediction loss and the second prediction loss to obtain a total prediction loss, and optimizing the small model according to the total prediction loss so as to jointly train the first target and the second target of the small model.
- 6. The method of claim 1, wherein the task processing thinking chain generation request comprises a decision task processing thinking chain generation request and a question-answering task processing thinking chain generation request, and sending the task processing thinking chain generation request to the second device based on the task training set corresponding to the at least one task comprises: when the task is a decision task, generating a decision task processing thinking chain generation request according to the task training set corresponding to the decision task, and sending the decision task processing thinking chain generation request to the second device; and/or, when the task is a question-answering task, generating a question-answering task processing thinking chain generation request according to the task training set corresponding to the question-answering task, and sending the question-answering task processing thinking chain generation request to the second device.
- 7. A task processing method, applied to a second device, where the second device is connected to a first device, a small model to be trained is deployed in the first device, and a large model is deployed in the second device, the method comprising: receiving a task processing thinking chain generation request sent by the first device based on a task training set corresponding to at least one task, wherein the task training set comprises task data and task processing result labels, and the task processing thinking chain is used for explaining the thinking chain logic by which the task data are processed to generate the task processing result labels; and, based on the task processing thinking chain generation request, inputting the task data in the task training set into the large model, outputting a task processing thinking chain, and sending the task processing thinking chain to the first device, whereby the first device determines first target indication information and second target indication information for multi-target joint training of the small model based on the task data, trains a first target of the small model based on the task data, the task processing result labels, and the first target indication information, and iteratively trains a second target of the small model based on the task data, the task processing thinking chain, and the second target indication information, until a preset joint training end condition is met, so as to obtain a trained small model and perform task processing according to the trained small model, wherein the first target indication information is used for indicating processing of the task data, and the second target indication information is used for indicating explanation of the thinking chain logic for processing the task data.
- 8. A task processing system, comprising a first device and a second device connected with each other, wherein a small model to be trained is deployed in the first device and a large model is deployed in the second device; the first device is configured to send a task processing thinking chain generation request to the second device based on a task training set corresponding to at least one task, and to receive a task processing thinking chain fed back by the second device, wherein the task training set comprises task data and task processing result labels, and the task processing thinking chain is used for explaining the thinking chain logic by which the task data are processed to generate the task processing result labels; the second device is configured to receive the task processing thinking chain generation request sent by the first device, input the task data in the task training set into the large model based on the request, output a task processing thinking chain, and send the task processing thinking chain to the first device; the first device is further configured to determine, based on the task data, first target indication information and second target indication information for multi-target joint training of the small model, wherein the first target indication information is used for indicating processing of the task data, and the second target indication information is used for indicating explanation of the thinking chain logic for processing the task data; and the first device is further configured to train a first target of the small model based on the task data, the task processing result labels, and the first target indication information, and to iteratively train a second target of the small model based on the task data, the task processing thinking chain, and the second target indication information, until a preset joint training end condition is met, so as to obtain a trained small model and perform task processing according to the trained small model.
- 9. A task processing device, characterized in that the device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the task processing method according to any one of claims 1 to 7.
- 10. A medium, characterized in that the medium is a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the task processing method according to any one of claims 1 to 7.
- 11. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a processor, implements the steps of the task processing method according to any one of claims 1 to 7.
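The two-device exchange that claims 1 and 7 describe, in which the first device requests thinking chains that the second device produces with its large model, can be sketched as follows. This is a minimal illustration under stated assumptions: all class and method names (`FirstDevice`, `SecondDevice`, `generate_thinking_chain`, `TrainingExample`) are hypothetical, not terms from the patent, and the large model is replaced by a canned stub that merely formats a plausible explanation.

```python
# Hypothetical sketch of the two-device protocol in claims 1 and 7.
# The second device (holding the large model) serves thinking-chain
# explanations; the first device (holding the small model) requests them.
from dataclasses import dataclass


@dataclass
class TrainingExample:
    task_data: str     # e.g. a question, or a statement to judge
    result_label: str  # the expected task processing result


class SecondDevice:
    """Remote side: runs the large model; its weights never leave it."""

    def generate_thinking_chain(self, example: TrainingExample) -> str:
        # Stand-in for prompting the large model to explain the
        # reasoning that turns task_data into result_label.
        return (f"To answer '{example.task_data}', note that the "
                f"evidence supports '{example.result_label}'.")


class FirstDevice:
    """Local side: owns the small model and the task training set."""

    def __init__(self, remote: SecondDevice):
        self.remote = remote

    def collect_thinking_chains(self, train_set):
        # Claim 1, first step: one thinking chain request per example.
        return [self.remote.generate_thinking_chain(ex) for ex in train_set]


train_set = [TrainingExample("Is 7 prime?", "yes")]
chains = FirstDevice(SecondDevice()).collect_thinking_chains(train_set)
print(len(chains))  # 1: one thinking chain per training example
```

In a real deployment the call to `generate_thinking_chain` would be a network request, which is what lets a small-model owner distill from a large model they cannot host locally.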
Description
Task processing method, system, device, medium and computer program product

Technical Field

The present application relates to the field of knowledge distillation technology, and in particular, to a task processing method, system, device, medium, and computer program product.

Background

In the age of large language models such as GPT-4 (Generative Pre-trained Transformer 4) and GLM-4, large models with huge parameter counts are widely used, yet knowledge distillation between large models and small models remains difficult to carry out. Traditional knowledge distillation algorithms require the large model and the small model to be deployed locally and loaded simultaneously for training, but many small model owners cannot run large-scale models locally because of their own hardware limitations. As a result, an owner who possesses only a small model cannot perform knowledge distillation from a large model, and therefore cannot improve the model performance of the small model in this way. The foregoing is provided merely to facilitate understanding of the technical solutions of the present application and is not an admission that it constitutes prior art.

Disclosure of Invention

The application mainly aims to provide a task processing method, system, device, medium, and computer program product, so as to solve the technical problem of how to perform knowledge distillation for a small model from a large model when the holder possesses only the small model.
To achieve the above object, the present application provides a task processing method applied to a first device, where the first device is connected to a second device, a small model to be trained is deployed in the first device, and a large model is deployed in the second device. The method includes: sending a task processing thinking chain generation request to the second device based on a task training set corresponding to at least one task, and receiving the task processing thinking chain fed back by the second device, wherein the task training set comprises task data and task processing result labels, the task processing thinking chain is used for explaining the thinking chain logic by which the task data are processed to generate the task processing result labels, and the second device outputs the task processing thinking chain based on the large model and the task training set; determining first target indication information and second target indication information for multi-target joint training of the small model based on the task data, wherein the first target indication information is used for indicating processing of the task data, and the second target indication information is used for indicating explanation of the thinking chain logic for processing the task data; and training a first target of the small model based on the task data, the task processing result labels, and the first target indication information, and iteratively training a second target of the small model based on the task data, the task processing thinking chain, and the second target indication information, until a preset joint training end condition is met, so as to obtain a trained small model and perform task processing according to the trained small model.
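The multi-target joint training loop described above can be sketched as a toy loop: in each round, a task-prediction loss and a thinking-chain-generation loss are computed and combined as a weighted sum before one optimization step, and training stops when a preset end condition holds. Everything here is an assumption made for illustration: `TinySmallModel` is a lookup-table caricature of a language model (its `step` "memorizes" one example per round instead of doing gradient descent), the 0.5/0.5 loss weights are not from the patent, and zero total loss stands in for the unspecified preset joint training end condition.

```python
# Toy sketch of the multi-target joint training of the small model.
# Not the patent's implementation: losses are illustrative 0/1 scalars.
class TinySmallModel:
    """Stand-in small model: a lookup table it gradually memorizes."""

    def __init__(self):
        self.memory = {}

    def __call__(self, prompt):
        return self.memory.get(prompt, "")

    def step(self, pairs):
        # Caricature of one optimizer step: fix one wrong example.
        for prompt, target in pairs:
            if self.memory.get(prompt) != target:
                self.memory[prompt] = target
                return


def joint_train(model, examples, w1=0.5, w2=0.5, max_rounds=100):
    """examples: (task_data, label, thinking_chain, first_ind, second_ind)."""
    total = 0.0
    for _ in range(max_rounds):
        total = 0.0
        pairs = []
        for data, label, chain, ind1, ind2 in examples:
            l1 = 0.0 if model(ind1) == label else 1.0   # first target loss
            l2 = 0.0 if model(ind2) == chain else 1.0   # second target loss
            total += w1 * l1 + w2 * l2                  # weighted sum
            pairs += [(ind1, label), (ind2, chain)]
        if total == 0.0:          # preset end condition (assumed here)
            break
        model.step(pairs)
    return model, total


examples = [("Is 7 prime?", "yes",
             "7 has no divisors other than 1 and 7, so yes.",
             "Task: Is 7 prime?", "Explain: Is 7 prime?")]
model, loss = joint_train(TinySmallModel(), examples)
print(loss)  # 0.0 once both targets are learned
```

The design point the sketch preserves is that one model is optimized against two supervision signals over the same task data: the labels it must predict and the large model's thinking chains it must reproduce.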
In an embodiment, determining the first target indication information and the second target indication information for the multi-target joint training of the small model based on the task data comprises: acquiring a preset task processing prefix, and splicing the preset task processing prefix with the task data to obtain the first target indication information; and acquiring a preset thinking chain generation prefix, and splicing the preset thinking chain generation prefix with the task data to obtain the second target indication information. In an embodiment, training the first target of the small model based on the task data, the task processing result labels, and the first target indication information includes: determining, according to the first target indication information, a first prediction result output by the small model for the task data; calculating a first prediction loss from the first prediction result and the task processing result labels; and optimizing the small model according to the first prediction loss so as to train the first target of the small model. In an embodiment, iteratively training the second target of the small model based on the task data, the task processing thinking chain, and the second target indication information comprises: determining, according to the second target indication information, a second prediction result output by the small model for the task data
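The prefix-splicing embodiment above reduces to simple string concatenation. The concrete prefix strings below are assumptions for illustration only; the patent requires merely that a preset task processing prefix and a preset thinking chain generation prefix each be spliced with the task data.

```python
# Minimal sketch of the prefix-splicing embodiment. The prefix wording
# is assumed, not specified by the patent.
TASK_PREFIX = "predict: "    # preset task processing prefix (assumed)
CHAIN_PREFIX = "explain: "   # preset thinking chain generation prefix (assumed)


def build_indications(task_data: str):
    first = TASK_PREFIX + task_data    # first target indication information
    second = CHAIN_PREFIX + task_data  # second target indication information
    return first, second


first, second = build_indications("Is the claim supported by the passage?")
print(first)   # predict: Is the claim supported by the passage?
print(second)  # explain: Is the claim supported by the passage?
```

Because the two objectives share the same task data and differ only in the spliced prefix, the prefix alone tells the small model whether to emit a task result or a thinking chain.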