CN-122019942-A - Fine tuning method and device for model, storage medium and electronic equipment
Abstract
The application discloses a method and apparatus for fine-tuning a model, a storage medium, and an electronic device. A weight matrix of a sparse model is first acquired; an adapter matrix and a sparse mask matrix are then generated from the weight matrix, where the adapter matrix adapts the model to a new task through parameter adjustment and the sparse mask matrix preserves the model's sparse structure. Multiple rounds of training are performed on these matrices; during training, the adapter matrix parameters are adjusted according to the gradient while the weight matrix parameters remain unchanged, finally yielding a target sparse model adapted to the target task, optimizing video-memory usage, and reducing training cost. The application addresses the technical problem in the related art that model fine-tuning inflates the number of model parameters, lowering fine-tuning efficiency.
Inventors
- ZHANG JING
- HU YUXUAN
- ZHAO ZHE
- YAN ZHENXIANG
Assignees
- Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技(深圳)有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20241112
Claims (13)
- 1. A method for fine-tuning a model, comprising: acquiring a weight matrix used by a sparse model; generating an adapter matrix based on the weight matrix, wherein the adapter matrix is used for fine-tuning the sparse model to adapt it to a target task; sharing the memory of the weight matrix with the adapter matrix, keeping the weight matrix unchanged, regenerating a sparse mask matrix from the weight matrix in each round of training, maintaining the sparsity of the sparse model based on the sparse mask matrix, and performing model training with the weight matrix and the adapter matrix until an adapter matrix satisfying a condition is obtained; and merging the adapter matrix satisfying the condition into the sparse model to obtain a fine-tuned target sparse model, wherein the target sparse model represents a sparse model adapted to the target task.
- 2. The method of claim 1, wherein regenerating a sparse mask matrix from the weight matrix comprises: acquiring the positions of non-zero elements in the weight matrix; and generating the sparse mask matrix based on the number of rows and columns of the weight matrix and the positions of the non-zero elements, wherein the sparse mask matrix has the same size as the weight matrix.
- 3. The method of claim 2, wherein generating the sparse mask matrix based on the number of rows and columns of the weight matrix and the positions of the non-zero elements comprises: evaluating a preset conditional expression over the weight matrix, using its number of rows and columns, to determine a Boolean matrix, wherein the positions of the non-zero elements are represented by the Boolean matrix; and performing a floating-point conversion on the Boolean matrix to obtain the sparse mask matrix, wherein the sparse mask matrix is in the form of a floating-point matrix.
- 4. The method of claim 1, wherein generating an adapter matrix based on the weight matrix comprises: acquiring a rank r corresponding to the adapter matrix; and decomposing the weight matrix based on the rank to obtain a first adapter matrix and a second adapter matrix, wherein the weight matrix is of size N×M, the first adapter matrix is of size N×r, the second adapter matrix is of size r×M, r, M, and N are all positive integers, and r is smaller than M or N.
- 5. The method of claim 1, wherein merging the adapter matrix satisfying the condition into the sparse model to obtain the fine-tuned target sparse model comprises: acquiring training samples associated with the sparse model; generating a target weight matrix from the weight matrix, the adapter matrix, and the sparse mask matrix, wherein the target weight matrix is computed in the memory corresponding to the weight matrix; determining a model output value by forward propagation based on the training samples and the target weight matrix; performing back propagation based on the model output value and a preset loss function to determine a target gradient; and adjusting the parameters of the adapter matrix using the target gradient until the model output value meets a preset loss condition, thereby determining the target sparse model.
- 6. The method of claim 5, wherein determining the model output value by forward propagation based on the training samples, the weight matrix, the adapter matrix, and the sparse mask matrix comprises: performing matrix multiplication of a first adapter matrix and a second adapter matrix to determine a first intermediate weight matrix, wherein the first intermediate weight matrix is stored in the memory corresponding to the weight matrix, and the adapter matrix comprises the first adapter matrix and the second adapter matrix; multiplying the first intermediate weight matrix and the sparse mask matrix element-wise to determine a second intermediate weight matrix, wherein the second intermediate weight matrix is stored in the memory corresponding to the weight matrix; determining a third intermediate weight matrix as the matrix sum of the second intermediate weight matrix and the weight matrix, wherein the third intermediate weight matrix is stored in the memory corresponding to the weight matrix; and determining the model output value based on the third intermediate weight matrix and a model input value determined from the training samples, wherein the model output value and the sparse mask matrix are stored in a target video memory, the target video memory comprises video memory allocated separately for the model output value and the sparse mask matrix, and the sparse mask matrix is allowed to be released after use.
- 7. The method of claim 6, further comprising: generating a target computation graph based on the weight matrix, the adapter matrix, and the sparse mask matrix, wherein the target computation graph indicates the computation flow in each round of training; and recording parameters related to the model input value in the target computation graph in a first video memory, and releasing them after the current round of training is finished, wherein the target video memory comprises the first video memory.
- 8. The method of claim 5, wherein back-propagating based on the model output value and the preset loss function to determine the target gradient comprises: regenerating the sparse mask matrix based on the weight matrix after the sparse mask matrix has been released; determining a first gradient based on the model output value, the preset loss function, and the target weight matrix, wherein the first gradient corresponds to the model output value and the target gradient comprises the first gradient; determining a second gradient based on the model output value, the model input value, and the sparse mask matrix, wherein the second gradient corresponds to the target weight matrix and the target gradient comprises the second gradient; and determining a third gradient and a fourth gradient based on the second gradient, the first adapter matrix, and the second adapter matrix, wherein the third gradient corresponds to the first adapter matrix, the fourth gradient corresponds to the second adapter matrix, and the target gradient comprises the third gradient and the fourth gradient.
- 9. The method of claim 5, wherein after back-propagating based on the model output value and the preset loss function to determine the target gradient, the method further comprises: adjusting the parameters of the adapter matrix using the target gradient in response to the target gradient being determined; and restoring the target weight matrix to the weight matrix, performing the next round of training until the model output value meets the preset loss condition, and determining the target sparse model.
- 10. A device for fine-tuning a model, comprising: an acquisition module for acquiring a weight matrix used by a sparse model; a first generation module for generating an adapter matrix based on the weight matrix, wherein the adapter matrix is used for fine-tuning the sparse model to adapt it to a target task; a second generation module for sharing the memory of the weight matrix with the adapter matrix, keeping the weight matrix unchanged, regenerating a sparse mask matrix from the weight matrix in each round of training, maintaining the sparsity of the sparse model based on the sparse mask matrix, and performing model training with the weight matrix and the adapter matrix until an adapter matrix satisfying a condition is obtained; and a determination module for merging the adapter matrix satisfying the condition into the sparse model to obtain a fine-tuned target sparse model, wherein the target sparse model represents a sparse model adapted to the target task.
- 11. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein the computer program is executable by an electronic device to perform the method of any one of claims 1 to 9.
- 12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
- 13. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 9 by means of the computer program.
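The claims above describe one full training round: regenerate the mask from the frozen sparse weights (claims 2–3), form the effective weight W + mask ⊙ (A·B) (claim 6), update only the rank-r adapters (claims 1, 4–5), and finally merge them back. The following is a minimal PyTorch sketch of that flow, with toy shapes and illustrative names; it does not reproduce the patent's in-place memory sharing or video-memory management, only the arithmetic:

```python
import torch

def regenerate_mask(weight: torch.Tensor) -> torch.Tensor:
    # Claims 2-3: Boolean matrix of non-zero positions, converted to float.
    return (weight != 0).to(weight.dtype)

torch.manual_seed(0)
N, M, r = 8, 6, 2                            # toy sizes; claim 4 requires r < min(N, M)
weight = torch.randn(N, M)
weight[torch.rand(N, M) < 0.5] = 0.0         # make the frozen model sparse
weight.requires_grad_(False)                 # weight matrix stays unchanged (claim 1)

a = torch.zeros(N, r, requires_grad=True)    # first adapter matrix, N x r
b = torch.randn(r, M, requires_grad=True)    # second adapter matrix, r x M
opt = torch.optim.SGD([a, b], lr=1e-2)

x = torch.randn(4, N)                        # toy training batch
target = torch.randn(4, M)
for _ in range(3):                           # mask regenerated each round
    mask = regenerate_mask(weight)
    w_eff = weight + mask * (a @ b)          # claim 6: W + mask ⊙ (A @ B)
    loss = torch.nn.functional.mse_loss(x @ w_eff, target)
    opt.zero_grad()
    loss.backward()                          # gradients reach A and B only
    opt.step()

# Claim 1: merge the trained adapters into the sparse model.
merged = (weight + mask * (a @ b)).detach()
```

Because the mask zeroes the low-rank update wherever the frozen weight is zero, every zero entry of the original weight matrix is still zero in `merged`, which is the sparsity-preservation property the claims rely on.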
Description
Fine tuning method and device for model, storage medium and electronic equipment

Technical Field

The present application relates to the field of computers, and in particular to a method and apparatus for fine-tuning a model, a storage medium, and an electronic device.

Background

Large language models contain a great many weight matrices, so training or fine-tuning them typically requires substantial computational resources. Existing work has therefore proposed methods to reduce this overhead, for example LoRA (Low-Rank Adaptation of Large Language Models), which freezes the initial weight matrix and inserts trainable low-rank matrices. However, applying LoRA to a sparse large language model degrades the sparse model into a dense one, multiplying the number of model parameters and making fine-tuning efficiency hard to guarantee. No effective solution to this problem has yet been proposed.

Disclosure of Invention

The embodiments of the application provide a method and apparatus for fine-tuning a model, a storage medium, and an electronic device, which at least solve the technical problem in the related art that model fine-tuning inflates the number of model parameters and lowers fine-tuning efficiency.
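To illustrate the problem described in the background: merging a plain LoRA update adds a dense rank-r matrix to the sparse weights, filling in the zero entries, whereas masking the update by the non-zero pattern preserves them. A small PyTorch sketch with toy sizes and illustrative names (not the patented implementation):

```python
import torch

torch.manual_seed(0)
n, m, r = 6, 4, 2
w = torch.randn(n, m)
w[torch.rand(n, m) < 0.7] = 0.0           # a sparse weight matrix

a = torch.randn(n, r)                      # plain LoRA adapters
b = torch.randn(r, m)
merged_plain = w + a @ b                   # dense update: sparsity is lost

mask = (w != 0).float()                    # mask of non-zero positions
merged_masked = w + mask * (a @ b)         # masked update: zeros stay zero

sparsity_before = (w == 0).float().mean().item()
sparsity_plain = (merged_plain == 0).float().mean().item()
# merged_masked keeps exactly the zero pattern of w; merged_plain does not.
```

The dense rank-r product `a @ b` has (almost surely) no exact zeros, so the plain merge collapses the model's sparsity, which is the degradation the application sets out to avoid.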
According to one aspect of the embodiments of the application, a method for fine-tuning a model is provided, comprising: acquiring a weight matrix used by a sparse model; generating an adapter matrix based on the weight matrix, wherein the adapter matrix is used for fine-tuning the sparse model to adapt it to a target task; sharing the memory of the weight matrix with the adapter matrix, keeping the weight matrix unchanged, regenerating a sparse mask matrix from the weight matrix in each round of training, maintaining the sparsity of the sparse model based on the sparse mask matrix, and performing model training with the weight matrix and the adapter matrix until an adapter matrix satisfying a condition is obtained; and merging the adapter matrix satisfying the condition into the sparse model to obtain a fine-tuned target sparse model, which represents the sparse model adapted to the target task.

According to another aspect of the embodiments of the application, a device for fine-tuning a model is provided, comprising an acquisition module, a first generation module, a second generation module, and a determination module. The acquisition module acquires a weight matrix used by a sparse model. The first generation module generates an adapter matrix based on the weight matrix, the adapter matrix being used for fine-tuning the sparse model to adapt it to a target task. The second generation module shares the memory of the weight matrix with the adapter matrix, keeps the weight matrix unchanged, regenerates a sparse mask matrix from the weight matrix in each round of training, maintains the sparsity of the sparse model based on the sparse mask matrix, and performs model training with the weight matrix and the adapter matrix until an adapter matrix satisfying the condition is obtained. The determination module merges the adapter matrix satisfying the condition into the sparse model to obtain a fine-tuned target sparse model, which represents the sparse model adapted to the target task.

Optionally, the device regenerates the sparse mask matrix from the weight matrix by acquiring the positions of non-zero elements in the weight matrix and generating the sparse mask matrix based on the number of rows and columns of the weight matrix and those positions, the sparse mask matrix having the same size as the weight matrix. Optionally, the device generates the sparse mask matrix based on the number of rows and columns of the weight matrix and the positions of the non-zero elements by evaluating a preset conditional expression over the weight matrix, using its number of rows and columns, to determine a Boolean matrix representing the positions of the non-zero elements, and performing a floating-point conversion on the Boolean matrix to obtain the sparse mask matrix in the form of a floating-point matrix. Optionally, the device generates the adapter matrix based on the weight matrix by acquiring a rank r corresponding to the adapter matrix and decomposing the weight matrix based on the rank to obtain a first adapter matrix and a second adapter matrix, wherein the size of the weight matrix is N×M, the size of the first adapter matrix is N×r, the size of the second adapter matrix is r×M, r, M