CN-122021795-A - Training method and device for large language model, storage medium and electronic equipment

CN122021795A

Abstract

The application discloses a training method and device for a large language model, a storage medium, and electronic equipment. The method comprises: receiving, through a labeling platform, first interaction data collected using a buried-point (event-tracking) strategy; visually displaying the first interaction data on the labeling platform; receiving, through the labeling platform, a labeling result of the first interaction data from a labeling object to obtain labeled data; and retraining a large language model based on the labeled data to obtain an adjusted large language model. The application addresses the problem in the related art that periodic retraining of a large language model leads to a poor tuning effect.

Inventors

  • TIAN LIN

Assignees

  • Industrial and Commercial Bank of China Limited (中国工商银行股份有限公司)

Dates

Publication Date
2026-05-12
Application Date
2026-01-30

Claims (10)

  1. A method for training a large language model, comprising: receiving, through a labeling platform, first interaction data collected using a buried-point (event-tracking) strategy, wherein the labeling platform is used for labeling the first interaction data, the first interaction data comprises first dialogue data, second dialogue data and adoption data, the first dialogue data comprises dialogue data of a target object conducting a business consultation with customer service of a financial institution, the second dialogue data comprises dialogue data between a large language model and the customer service, the adoption data comprises whether the customer service replies to the target object using data in the second dialogue as well as target reply content, and the target reply content comprises the reply content with which the customer service replies to the target object; visually displaying the first interaction data on the labeling platform, and receiving, through the labeling platform, a labeling result of the first interaction data from a labeling object to obtain labeled data, wherein the labeling object labels training samples online; and retraining the large language model based on the labeled data to obtain an adjusted large language model.
  2. The training method of claim 1, wherein collecting the first interaction data using the buried-point strategy comprises: collecting, with a first buried point, the consultation content of the target object's consultation with the customer service of the financial institution to obtain the first dialogue data; collecting, with a second buried point, N search words generated by the large language model and M reply contents generated by the large language model to obtain the second dialogue data, wherein a search word is generated by the large language model from the consultation content and is used to generate reply content, and N and M are positive integers; collecting, with a third buried point, whether the customer service replies to the target object using the data in the second dialogue, together with the target reply content, to obtain the adoption data; and obtaining the first interaction data based on the first dialogue data, the second dialogue data and the adoption data.
  3. The training method of claim 1, wherein retraining the large language model based on the labeled data comprises: obtaining a parameter adjustment strategy, wherein the parameter adjustment strategy comprises a learning rate, and the learning rate is the step length of each parameter update of the large language model; and retraining the large language model with the parameter adjustment strategy based on the labeled data.
  4. The training method of claim 2, further comprising, after collecting the N search words generated by the large language model with the second buried point: visually displaying the N search words, and detecting whether the customer service selects any search word; in a case where the customer service clicks any one of the search words, inputting that search word into the large language model to generate the M reply contents; and in a case where the customer service inputs a target search word to the large language model, generating the M reply contents based on the target search word.
  5. The training method of claim 4, further comprising, after generating the M reply contents based on the target search word in the case where the customer service inputs a target search word to the large language model: visually displaying the M reply contents, and detecting whether the customer service selects any reply content; in a case where the customer service selects a reply content, determining that reply content as the target reply content; and in a case where the customer service selects none of the M reply contents, determining the reply content submitted to the target object as the target reply content.
  6. The training method of claim 1, further comprising, after retraining the large language model based on the labeled data to obtain the adjusted large language model: collecting second interaction data, wherein the second interaction data comprises data of interaction between the customer service and a customer using the adjusted large language model; calculating the similarity between the second interaction data and the labeled data; and in a case where the similarity between the second interaction data and the labeled data is greater than a preset threshold, obtaining feedback information from the customer service on the second interaction data, wherein the feedback information indicates whether the labeled data is accurate.
  7. The training method of claim 1, wherein receiving, through the labeling platform, the labeling result of the first interaction data from the labeling object to obtain the labeled data comprises: determining a labeling strategy, wherein under the labeling strategy the labeling object clicks the first interaction data on a target interface of the labeling platform, and the target interface is used for displaying the first interaction data and receiving the clicks of the labeling object on the first interaction data; and receiving, through the labeling platform and based on the labeling strategy, the labeling result of the first interaction data from the labeling object to obtain the labeled data.
  8. A training device for a large language model, comprising: an acquisition unit, configured to receive, through a labeling platform, first interaction data collected using a buried-point strategy, wherein the labeling platform is used for labeling the first interaction data, the first interaction data comprises first dialogue data, second dialogue data and adoption data, the first dialogue data comprises dialogue data of a target object conducting a business consultation with customer service of a financial institution, the second dialogue data comprises dialogue data between a large language model and the customer service, the adoption data comprises whether the customer service replies to the target object using data in the second dialogue as well as target reply content, and the target reply content comprises the reply content with which the customer service replies to the target object; a processing unit, configured to visually display the first interaction data on the labeling platform and to receive, through the labeling platform, a labeling result of the first interaction data from a labeling object to obtain labeled data, wherein the labeling object labels training samples online; and a training unit, configured to retrain the large language model based on the labeled data to obtain an adjusted large language model.
  9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored executable program, wherein the executable program, when run, controls a device in which the computer-readable storage medium is located to perform the training method of the large language model according to any one of claims 1 to 7.
  10. A computer program product comprising computer instructions which, when executed by a processor, implement the steps of the training method of the large language model according to any one of claims 1 to 7.
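
As an illustration (not part of the claims), the three "buried points" of claim 2 are instrumentation hooks that log the consultation, the model's draft output, and the agent's final reply. The sketch below is a minimal, hypothetical implementation; all class and method names are illustrative and do not appear in the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InteractionRecord:
    """One 'first interaction data' record, per claim 1 (illustrative)."""
    first_dialogue: List[str] = field(default_factory=list)   # customer <-> agent consultation
    second_dialogue: List[str] = field(default_factory=list)  # LLM search words / draft replies
    adopted: Optional[bool] = None                            # did the agent use an LLM draft?
    final_reply: Optional[str] = None                         # reply actually sent to the customer

class BuriedPointCollector:
    """Three hooks standing in for the first, second, and third buried points."""

    def __init__(self) -> None:
        self.record = InteractionRecord()

    def on_consultation(self, utterance: str) -> None:
        # First buried point: the customer's consultation content.
        self.record.first_dialogue.append(utterance)

    def on_llm_output(self, text: str) -> None:
        # Second buried point: search words and draft replies from the model.
        self.record.second_dialogue.append(text)

    def on_agent_reply(self, reply: str) -> None:
        # Third buried point: adoption data — was an LLM draft used verbatim?
        self.record.adopted = reply in self.record.second_dialogue
        self.record.final_reply = reply

collector = BuriedPointCollector()
collector.on_consultation("How do I reset my card PIN?")
collector.on_llm_output("You can reset your PIN at any branch or in the app.")
collector.on_agent_reply("You can reset your PIN at any branch or in the app.")
print(collector.record.adopted)  # True: the agent adopted the LLM draft
```

The assembled record (dialogue data plus adoption data) is what the labeling platform would display for online annotation.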

Description

Training method and device for large language model, storage medium and electronic equipment

Technical Field

The application relates to the field of artificial intelligence, and in particular to a training method and device for a large language model, a storage medium and electronic equipment.

Background

Currently, the tuning of a typical large language model (for example, one that, in a financial institution, provides customer-service agents with reference reply content for customer consultations) relies mainly on offline labeling and periodic offline retraining. The tuning process usually requires a large amount of computing resources and time, and offline labeling has the following problems: the labeling is not timely, so newly emerging data characteristics and changes are not reflected promptly, which hurts the model's adaptability to new situations; and manual labeling inevitably introduces errors, which can be amplified in subsequent training and reduce the model's accuracy. In addition, the whole process is cumbersome and time-consuming, requiring a long period from data collection to the completion of model training, which is unfavorable for rapid iterative optimization of model capability. For the problem in the related art that periodic retraining of a large language model leads to a poor tuning effect, no effective solution has yet been proposed.

Disclosure of Invention

The application mainly aims to provide a training method and device for a large language model, a storage medium and electronic equipment, so as to solve the problem in the related art that periodic retraining of the large language model leads to a poor tuning effect.
To achieve the above object, according to one aspect of the present application, a training method of a large language model is provided. The method comprises: receiving, through a labeling platform, first interaction data collected using a buried-point strategy, wherein the labeling platform is used for labeling the first interaction data, the first interaction data comprises first dialogue data, second dialogue data and adoption data, the first dialogue data comprises dialogue data of a target object conducting a business consultation with customer service of a financial institution, the second dialogue data comprises dialogue data between a large language model and the customer service, the adoption data comprises whether the customer service replies to the target object using data in the second dialogue as well as target reply content, and the target reply content comprises the reply content with which the customer service replies to the target object; visually displaying the first interaction data on the labeling platform, and receiving, through the labeling platform, a labeling result of the first interaction data from a labeling object, wherein the labeling object labels training samples online; and retraining the large language model based on the labeled data to obtain an adjusted large language model.
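The "parameter adjustment strategy" described above treats the learning rate as the step length of each parameter update. As a toy stand-in for full LLM retraining, the following sketch shows one such update step; the function name and values are illustrative, not from the patent.

```python
# Minimal sketch of a learning-rate-controlled parameter update (SGD step).
# This is an illustration only; real LLM retraining would use a deep
# learning framework, but the role of the learning rate is the same.

def sgd_step(params, grads, learning_rate):
    """One update: new_param = param - learning_rate * gradient."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

params = [0.5, -1.2]
grads = [0.1, -0.4]   # gradients computed from the labeled data
params = sgd_step(params, grads, learning_rate=0.01)
print(params)
```

A larger learning rate takes bigger steps per update (faster but less stable); a smaller one takes smaller steps, which is the trade-off the parameter adjustment strategy tunes.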
Further, collecting the first interaction data using the buried-point strategy comprises: collecting, with a first buried point, the consultation content of the target object's consultation with the customer service of the financial institution; collecting, with a second buried point, N search words generated by the large language model and M reply contents generated by the large language model to obtain the second dialogue data, wherein a search word is generated by the large language model from the consultation content and is used to generate reply content, and N and M are positive integers; collecting, with a third buried point, whether the customer service replies to the target object using the data in the second dialogue, together with the target reply content, to obtain the adoption data; and obtaining the first interaction data based on the first dialogue data, the second dialogue data and the adoption data.

Further, retraining the large language model based on the labeled data comprises obtaining a parameter adjustment strategy, wherein the parameter adjustment strategy comprises a learning rate and the learning rate is the step length of each parameter update of the large language model, and retraining the large language model with the parameter adjustment strategy based on the labeled data.

Further, after the second buried point is used to collect the N search words generated by the large language model, the method further comprises visually displaying the N search words, detecting whether the customer service selects any search word, and inputting the search word into the large language model to generate M reply contents under the condit
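Claim 6 above compares new ("second") interaction data against the labeled data and, when the similarity exceeds a preset threshold, asks the customer service for feedback on accuracy. The patent does not fix a similarity metric; the sketch below uses Jaccard similarity over word sets purely as one simple, hypothetical choice, with illustrative names throughout.

```python
# Hedged sketch of the claim-6 quality check, assuming Jaccard similarity
# over lowercase word sets as the metric (an assumption, not the patent's).

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two strings, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def needs_feedback(second_interaction: str, labeled: str,
                   threshold: float = 0.8) -> bool:
    """True when the new interaction is close enough to the labeled data
    that the agent should be asked whether the label is accurate."""
    return jaccard(second_interaction, labeled) > threshold

print(needs_feedback("please reset your pin in the app",
                     "please reset your pin in the app today"))  # True
```

In practice an embedding-based cosine similarity would likely replace the word-set comparison, but the thresholding logic would be the same.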