CN-121998093-A - Large language model training method and device, electronic equipment and readable storage medium

CN121998093A

Abstract

The invention discloses a large language model training method and device, an electronic device, and a readable storage medium, relating to the technical field of artificial intelligence. The method determines a regularization loss from the singular values of a first low-rank matrix or a second low-rank matrix corresponding to a weight matrix of a target layer in a large language model, determines the total loss of the current batch of training data from the task loss and the regularization loss, updates the parameters of the first low-rank matrix and the second low-rank matrix according to the total loss, and, when the large language model meets a preset training ending condition, updates the target layer according to the first low-rank matrix and the second low-rank matrix. The method addresses the prior-art problem that the complexity of the model cannot be reduced, which prevents the model from attending to the main features of a sample's specific domain.

Inventors

  • Feng Xing
  • Liu Zhiqiang
  • Su Jingjie
  • Liu Zihao
  • Cai Yongwei
  • Ran Decai

Assignees

  • 中化现代农业有限公司 (Sinochem Modern Agriculture Co., Ltd.)

Dates

Publication Date
2026-05-08
Application Date
2026-01-28

Claims (10)

  1. A method for training a large language model, the method comprising: determining a task loss according to a prediction result and a real label corresponding to the current batch of training data of the large language model; determining a regularization loss according to singular values of a first low-rank matrix or a second low-rank matrix corresponding to a weight matrix of a target layer in the large language model; determining the total loss of the current batch of training data according to the task loss and the regularization loss; updating parameters of the first low-rank matrix and the second low-rank matrix according to the total loss; and, when the large language model meets a preset training ending condition, updating the target layer according to the first low-rank matrix and the second low-rank matrix.
  2. The method of claim 1, wherein determining the regularization loss based on singular values of a low-rank matrix of a target layer in the large language model comprises: performing singular value decomposition on the first low-rank matrix and/or the second low-rank matrix currently being trained on the batch to obtain the singular values; and performing L2 regularization on the singular values to obtain the regularization loss.
  3. The method of claim 2, wherein determining the total loss of the training data of the current batch based on the task loss and the regularization loss comprises: computing a weighted sum of the task loss and the regularization loss to obtain the total loss of the training data of the current batch.
  4. The method according to claim 1, wherein updating the parameters of the first low-rank matrix and the second low-rank matrix according to the total loss comprises: determining gradients of the parameters of the first low-rank matrix and the second low-rank matrix according to the total loss; and updating the parameters of the first low-rank matrix and the second low-rank matrix through a gradient descent algorithm.
  5. The method of claim 4, wherein a learning rate of the first low-rank matrix and a learning rate of the second low-rank matrix are set to different values.
  6. The method of claim 1, wherein updating the target layer according to the first low-rank matrix and the second low-rank matrix comprises: determining an incremental matrix from the first low-rank matrix and the second low-rank matrix; and determining the final weight of the target layer according to the incremental matrix and the original weight of the target layer.
  7. The method of claim 6, wherein determining the incremental matrix from the first low-rank matrix and the second low-rank matrix comprises: ΔW = A·B, wherein ΔW is the incremental matrix, A is the first low-rank matrix, and B is the second low-rank matrix; and determining the final weight of the target layer according to the incremental matrix and the original weight of the target layer comprises: W_final = W_0 + ΔW, wherein W_final is the final weight and W_0 is the original weight.
  8. A large language model training apparatus, comprising: a first determining module, configured to determine a task loss according to the prediction result and real label corresponding to the current batch of training data of the large language model; a second determining module, configured to determine a regularization loss according to singular values of a first low-rank matrix or a second low-rank matrix corresponding to a weight matrix of a target layer in the large language model; a third determining module, configured to determine, according to the task loss and the regularization loss, the total loss of the training data of the current batch; a first updating module, configured to update parameters of the first low-rank matrix and the second low-rank matrix according to the total loss; and a second updating module, configured to update the target layer according to the first low-rank matrix and the second low-rank matrix when the large language model meets a preset training ending condition.
  9. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the large language model training method of any one of claims 1 to 7.
  10. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the large language model training method of any one of claims 1 to 7.
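Claims 6 and 7 describe folding the trained low-rank factors back into the target layer's original weight. The following is a minimal sketch, assuming (as in standard low-rank adaptation) that the incremental matrix is the product of the first and second low-rank matrices; the shapes and values are hypothetical:

```python
import numpy as np

def merge_target_layer(W0: np.ndarray, A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Fold the trained low-rank update into the original weight.

    delta_W = A @ B   (first low-rank matrix times second low-rank matrix)
    W_final = W0 + delta_W
    """
    delta_W = A @ B          # incremental matrix, rank at most r
    return W0 + delta_W      # final weight of the target layer

# Toy shapes: a d x k weight matrix and rank-r factors (hypothetical sizes).
rng = np.random.default_rng(0)
d, k, r = 6, 4, 2
W0 = rng.normal(size=(d, k))        # original weight of the target layer
A = rng.normal(size=(d, r))         # first low-rank matrix
B = rng.normal(size=(r, k))         # second low-rank matrix
W_final = merge_target_layer(W0, A, B)
```

A practical consequence of this merge step: after training, inference uses W_final directly, so the adapted layer adds no latency compared with the original layer.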

Description

Large language model training method and device, electronic equipment and readable storage medium

Technical Field

The present invention relates to the field of artificial intelligence, and in particular to a method and apparatus for training a large language model, an electronic device, and a readable storage medium.

Background

Large language models (e.g., the GPT series, BERT, etc.) perform well on natural language processing tasks, but full-parameter fine-tuning requires a large amount of computational resources. Furthermore, once training is completed, excessive model complexity can prevent a model from attending to the main characteristics of samples from a specific domain; for example, a text processing model applied to texts from the agricultural domain may fail to attend to the main characteristics of agricultural texts. It can be seen that in the prior art the complexity of the model cannot be reduced, so the model cannot attend to the main features of a sample's specific domain rather than to noise. The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms prior art already known to a person of ordinary skill in the art.

Disclosure of Invention

The invention aims to solve the prior-art problem that the complexity of a model cannot be reduced and the model cannot attend to the main characteristics of a sample's specific domain, and provides a large language model training method, a large language model training apparatus, an electronic device, and a readable storage medium.
A first aspect of the invention provides a training method for a large language model, comprising: determining a task loss according to the prediction results and real labels corresponding to the current batch of training data of the large language model; determining a regularization loss according to the singular values of a first low-rank matrix or a second low-rank matrix corresponding to a weight matrix of a target layer in the large language model; determining the total loss of the current batch of training data according to the task loss and the regularization loss; updating the parameters of the first low-rank matrix and the second low-rank matrix according to the total loss; and, when the large language model meets a preset training ending condition, updating the target layer according to the first low-rank matrix and the second low-rank matrix.

In one embodiment of the invention, determining the regularization loss according to the singular values of the low-rank matrix of the target layer comprises: performing singular value decomposition on the first low-rank matrix or the second low-rank matrix currently being trained on the batch to obtain the singular values, and performing L2 regularization on the singular values to obtain the regularization loss.

In one embodiment of the invention, determining the total loss of the current batch of training data according to the task loss and the regularization loss comprises computing a weighted sum of the task loss and the regularization loss to obtain the total loss of the current batch of training data.
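The steps above can be sketched end-to-end on a toy problem. This is an illustrative reconstruction, not the patented implementation: the target matrix T, the regularization weight lam, and both learning rates are hypothetical, and the code exploits the fact that the L2 penalty on a matrix's singular values equals its squared Frobenius norm, which gives a closed-form gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 16, 16, 4
A = 0.01 * rng.normal(size=(d, r))   # first low-rank matrix (small random init)
B = np.zeros((r, k))                  # second low-rank matrix (zero init)
lr_A, lr_B = 1e-2, 5e-3               # different learning rates per factor (hypothetical values)
lam = 0.01                            # hypothetical weight of the regularization loss
T = rng.normal(size=(d, k))           # toy target increment standing in for the real task

def batch_losses(A, B):
    task_loss = 0.5 * np.sum((A @ B - T) ** 2)      # toy task loss for one batch
    sing_vals = np.linalg.svd(A, compute_uv=False)  # SVD of the first low-rank matrix
    reg_loss = np.sum(sing_vals ** 2)               # L2 penalty on singular values = ||A||_F^2
    return task_loss + lam * reg_loss               # total loss as a weighted sum

initial_total = batch_losses(A, B)
for _ in range(500):
    E = A @ B - T                     # gradient of the toy task loss w.r.t. (A @ B)
    grad_A = E @ B.T + 2 * lam * A    # d(reg)/dA = 2A, since sum(sigma_i^2) = ||A||_F^2
    grad_B = A.T @ E
    A -= lr_A * grad_A                # gradient descent with separate learning rates
    B -= lr_B * grad_B
final_total = batch_losses(A, B)
```

The zero initialization of the second factor makes the initial increment A @ B exactly zero, so training starts from the unmodified layer; the regularization term then discourages large singular values in the trained factor.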
In one embodiment of the invention, updating the parameters according to the total loss comprises determining the gradients of the parameters of the first low-rank matrix and the second low-rank matrix according to the total loss, and updating the parameters of the first low-rank matrix and the second low-rank matrix through a gradient descent algorithm. In an embodiment of the present invention, the learning rate of the first low-rank matrix and the learning rate of the second low-rank matrix are set to different values.

In one embodiment of the invention, updating the target layer according to the first low-rank matrix and the second low-rank matrix comprises determining an incremental matrix according to the first low-rank matrix and the second low-rank matrix, and determining the final weight of the target layer according to the incremental matrix and the original weight of the target layer.

In an embodiment of the present invention, determining the incremental matrix according to the first low-rank matrix and the second low-rank matrix comprises: ΔW = A·B, wherein ΔW is the incremental matrix, A is the first low-rank matrix, and B is the second low-rank matrix; and determining the final weight of the target layer according to the incremental matrix and the original weight of the target layer comprises: W_final = W_0 + ΔW, wherein W_final is the final weight and W_0 is the original weight.

A second aspect of the invention provides a large language model training apparatus, which comprises a first determining module, a second determining module, a th