CN-122021794-A - Model fine-tuning method, apparatus, device, storage medium and product
Abstract
The application discloses a model fine-tuning method, apparatus, device, storage medium, and product. The method comprises: acquiring and storing intermediate feature data sent by each terminal, where the intermediate feature data is computed from sample data by a terminal-side sub-model; training a server-side multiplexed model on the stored intermediate feature data and updating the LoRA adapter parameters corresponding to each terminal to obtain target LoRA adapter parameters, with the terminal-side sub-model kept frozen throughout training of the server-side multiplexed model; and fine-tuning the server-side multiplexed model according to the target LoRA adapter parameters to obtain a model fine-tuning result. Because the terminal-side sub-model is frozen, terminals avoid backpropagation and activation caching, which markedly reduces their memory footprint and computation; storing the intermediate feature data on the server further reduces terminal computation cost, communication cost, and fine-tuning time.
Inventors
- WU WEN
- CHEN XIAOPEI
- LI JING
Assignees
- Peng Cheng Laboratory (鹏城实验室)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-01-27
Claims (10)
- 1. A model fine-tuning method, comprising the steps of: acquiring and storing intermediate feature data sent by each terminal, wherein the intermediate feature data is computed from sample data by a terminal-side sub-model; training a server-side multiplexed model based on the stored intermediate feature data and updating the LoRA adapter parameters corresponding to the terminal to obtain target LoRA adapter parameters, wherein the terminal-side sub-model is kept frozen during training of the server-side multiplexed model; and fine-tuning the server-side multiplexed model according to the target LoRA adapter parameters to obtain a model fine-tuning result (see Sketch 1 following the claims).
- 2. The model fine-tuning method according to claim 1, wherein fine-tuning the server-side multiplexed model according to the target LoRA adapter parameters to obtain a model fine-tuning result comprises: aggregating the target LoRA adapter parameters of all terminals to obtain a global adapter for each neural network layer of the server-side multiplexed model; and determining a model fine-tuning result based on the global adapters and the server-side multiplexed model, wherein the model fine-tuning result comprises the fine-tuned server-side multiplexed model (see Sketch 2 following the claims).
- 3. The model fine-tuning method according to claim 2, wherein aggregating the target LoRA adapter parameters of all terminals to obtain a global adapter for each neural network layer of the server-side multiplexed model comprises: determining the LoRA adapter parameters corresponding to each neural network layer of the server-side multiplexed model from the target LoRA adapter parameters of each terminal; acquiring the volume of data each terminal used to generate its intermediate feature data; and performing weighted aggregation of the LoRA adapter parameters corresponding to each neural network layer based on those data volumes, to obtain the global adapter corresponding to each neural network layer of the server-side multiplexed model (see Sketch 3 following the claims).
- 4. The model fine-tuning method according to any one of claims 1 to 3, wherein training the server-side multiplexed model based on the stored intermediate feature data and updating the LoRA adapter parameters corresponding to the terminal to obtain target LoRA adapter parameters comprises: acquiring the split layer corresponding to the intermediate feature data; and feeding the intermediate feature data into the server-side multiplexed model at that split layer for forward propagation, then performing backpropagation based on a loss function, wherein the backbone parameters of the server-side multiplexed model are not updated during training, and the LoRA adapter parameters corresponding to the terminal are updated to obtain the target LoRA adapter parameters (see Sketch 4 following the claims).
- 5. The model fine-tuning method according to any one of claims 1 to 3, wherein, before acquiring and storing the intermediate feature data sent by each terminal, the method further comprises: acquiring the available computing-power and memory-capacity parameters of each terminal; determining, from the available computing-power and memory-capacity parameters, the number of layers of the server-side multiplexed model that each terminal can host; and transmitting to each terminal a terminal-side model with the corresponding number of layers (see Sketch 5 following the claims).
- 6. The model fine-tuning method according to any one of claims 1 to 3, wherein training the server-side multiplexed model based on the intermediate feature data comprises: acquiring a first hidden dimension of the server-side multiplexed model; acquiring a second hidden dimension of the intermediate feature data; and training the server-side multiplexed model on the intermediate feature data only when the first hidden dimension and the second hidden dimension match (see Sketch 6 following the claims).
- 7. A model fine-tuning apparatus, characterized in that it comprises: an acquisition module, configured to acquire and store intermediate feature data sent by each terminal, wherein the intermediate feature data is computed from sample data by a terminal-side sub-model; a training module, configured to train a server-side multiplexed model based on the stored intermediate feature data and update the LoRA adapter parameters corresponding to the terminal to obtain target LoRA adapter parameters, the terminal-side sub-model being kept frozen during training of the server-side multiplexed model; and a fine-tuning module, configured to fine-tune the server-side multiplexed model according to the target LoRA adapter parameters to obtain a model fine-tuning result.
- 8. A model fine-tuning device, characterized in that the device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the model fine-tuning method according to any one of claims 1 to 6.
- 9. A storage medium, characterized in that the storage medium is a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the model fine-tuning method according to any one of claims 1 to 6.
- 10. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the steps of the model fine-tuning method according to any one of claims 1 to 6.
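Sketch 1, illustrating claim 1. The following is a minimal PyTorch sketch of the claimed flow, not the patent's implementation: a frozen terminal-side sub-model produces intermediate features once, the server caches them, and training updates only LoRA adapter parameters. All names, sizes, and the classification task are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
HIDDEN = 64

# Terminal side: a frozen sub-model that only runs forward passes, so the
# terminal needs no backpropagation and no activation caching (claim 1).
terminal_submodel = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.ReLU())
for p in terminal_submodel.parameters():
    p.requires_grad = False

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank LoRA adapter."""
    def __init__(self, dim, rank=4):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        for p in self.base.parameters():
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, dim) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(dim, rank))         # up-projection

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

server_model = nn.Sequential(LoRALinear(HIDDEN), LoRALinear(HIDDEN),
                             nn.Linear(HIDDEN, 10))
for p in server_model[-1].parameters():
    p.requires_grad = False  # the head belongs to the frozen backbone here

# Step 1: the terminal computes intermediate features once and uploads them;
# the server stores them, so later epochs need no re-transmission.
with torch.no_grad():
    samples = torch.randn(32, HIDDEN)
    intermediate = terminal_submodel(samples)
labels = torch.randint(0, 10, (32,))
feature_store = [(intermediate, labels)]  # server-side cache of features

# Step 2: the server trains, updating only this terminal's LoRA parameters.
lora_params = [p for m in server_model.modules()
               if isinstance(m, LoRALinear) for p in (m.A, m.B)]
opt = torch.optim.Adam(lora_params, lr=1e-3)
for feats, y in feature_store:
    opt.zero_grad()
    loss = nn.functional.cross_entropy(server_model(feats), y)
    loss.backward()
    opt.step()
```

Because the stored features were produced under `torch.no_grad()`, the backward pass stops at the server side, matching the abstract's point that terminals never backpropagate.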
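Sketch 2, illustrating claim 2. Once a global adapter (B_g, A_g) exists for a layer, the fine-tuned model can be obtained by folding the adapter into the frozen weight. The patent does not specify a merge rule or scaling; this sketch assumes the common LoRA convention W' = W + (alpha/r)·B_g·A_g.

```python
import torch

def merge_global_adapter(W, B_g, A_g, alpha=16, rank=4):
    """Fold a global LoRA adapter into a frozen base weight.

    W:   (out, in) frozen backbone weight of one layer
    B_g: (out, rank), A_g: (rank, in) aggregated global adapter
    Returns the fine-tuned weight W' = W + (alpha/rank) * B_g @ A_g.
    """
    return W + (alpha / rank) * (B_g @ A_g)

# Illustrative shapes only.
W = torch.randn(64, 64)
B_g, A_g = torch.zeros(64, 4), torch.randn(4, 64)
W_prime = merge_global_adapter(W, B_g, A_g)
assert torch.allclose(W_prime, W)  # a zero adapter leaves the backbone unchanged
```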
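Sketch 3, illustrating claim 3. The weighted aggregation is FedAvg-style: each terminal's adapter is weighted by the amount of data it used to generate its intermediate features. The per-layer dictionary layout and names are assumptions.

```python
import torch

def aggregate_lora(per_terminal, data_volumes):
    """Data-volume-weighted aggregation of LoRA adapters (claim 3).

    per_terminal: list over terminals; each entry maps layer name ->
                  (B, A) adapter tensors for that layer.
    data_volumes: sample counts used to produce each terminal's
                  intermediate features; used as aggregation weights.
    Returns layer name -> global (B, A) adapter.
    """
    total = float(sum(data_volumes))
    weights = [n / total for n in data_volumes]
    global_adapters = {}
    for layer in per_terminal[0]:
        B = sum(w * t[layer][0] for w, t in zip(weights, per_terminal))
        A = sum(w * t[layer][1] for w, t in zip(weights, per_terminal))
        global_adapters[layer] = (B, A)
    return global_adapters

# Two terminals, one layer; ranks and dims purely illustrative.
t1 = {"layer0": (torch.ones(8, 2), torch.ones(2, 8))}
t2 = {"layer0": (torch.zeros(8, 2), torch.zeros(2, 8))}
agg = aggregate_lora([t1, t2], data_volumes=[300, 100])
print(agg["layer0"][0][0, 0])  # 0.75 = 300/400 * 1 + 100/400 * 0
```

Note that averaging the B and A factors separately, as sketched here, only approximates averaging the products B·A; the claim does not say which variant is intended.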
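Sketch 4, illustrating claim 4. Stored features enter the multiplexed model at the terminal's split layer; the forward and backward passes traverse the frozen backbone, but only the terminal's LoRA parameters receive updates. Model shape and split point are illustrative.

```python
import torch
import torch.nn as nn

HIDDEN, LAYERS, RANK = 64, 6, 4

# A toy server-side multiplexed model: a frozen stack of layers. Terminals
# with different split points inject their features at different depths.
layers = nn.ModuleList([nn.Linear(HIDDEN, HIDDEN) for _ in range(LAYERS)])
head = nn.Linear(HIDDEN, 10)
for p in list(layers.parameters()) + list(head.parameters()):
    p.requires_grad = False  # backbone parameters are never updated (claim 4)

# Hypothetical per-terminal LoRA factors, one pair per backbone layer.
lora_A = nn.ParameterList([nn.Parameter(torch.randn(RANK, HIDDEN) * 0.01)
                           for _ in range(LAYERS)])
lora_B = nn.ParameterList([nn.Parameter(torch.zeros(HIDDEN, RANK))
                           for _ in range(LAYERS)])

def forward_from(split_layer, feats):
    """Forward propagation starting at the terminal's split layer."""
    h = feats
    for i in range(split_layer, LAYERS):
        h = torch.relu(layers[i](h) + h @ lora_A[i].T @ lora_B[i].T)
    return head(h)

opt = torch.optim.Adam(list(lora_A) + list(lora_B), lr=1e-3)
feats = torch.randn(16, HIDDEN)                    # stored intermediate features
y = torch.randint(0, 10, (16,))
logits = forward_from(split_layer=2, feats=feats)  # terminal hosted layers 0-1
loss = nn.functional.cross_entropy(logits, y)
opt.zero_grad()
loss.backward()                                    # gradients reach only LoRA params
opt.step()
```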
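Sketch 5, illustrating claim 5. The claim derives a per-terminal layer count from available computing power and memory without giving a formula; this sketch assumes a fixed per-layer cost and takes the most restrictive budget, which is one plausible reading.

```python
def layers_for_terminal(flops_available, mem_available_mb,
                        flops_per_layer, mem_per_layer_mb, max_layers):
    """Pick how many leading model layers a terminal can host (claim 5).

    The per-layer costs and the min-of-budgets rule are assumptions for
    illustration; the patent only says the layer count is derived from
    available computing power and memory capacity.
    """
    by_compute = int(flops_available // flops_per_layer)
    by_memory = int(mem_available_mb // mem_per_layer_mb)
    return max(1, min(by_compute, by_memory, max_layers))

# A weak phone-class device vs. a desktop-class device.
print(layers_for_terminal(2e9, 512, flops_per_layer=1e9,
                          mem_per_layer_mb=300, max_layers=6))   # 1
print(layers_for_terminal(2e10, 8192, flops_per_layer=1e9,
                          mem_per_layer_mb=300, max_layers=6))   # 6
```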
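Sketch 6, illustrating claim 6. The hidden-dimension comparison is a compatibility gate: features are used for training only when their last dimension matches the multiplexed model's hidden size. Names are illustrative.

```python
import torch

def check_hidden_dims(model_hidden_dim, features):
    """Gate from claim 6: train only when the feature dimension matches."""
    feature_hidden_dim = features.shape[-1]
    if feature_hidden_dim != model_hidden_dim:
        raise ValueError(
            f"feature hidden dim {feature_hidden_dim} does not match "
            f"model hidden dim {model_hidden_dim}; skipping this batch")
    return features

feats = torch.randn(16, 128, 64)    # (batch, sequence, hidden)
check_hidden_dims(64, feats)        # ok
# check_hidden_dims(128, feats)     # would raise: dimensions inconsistent
```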
Description
Model fine-tuning method, apparatus, device, storage medium and product

Technical Field

The present application relates to the field of model training technologies, and in particular to a model fine-tuning method, apparatus, device, storage medium, and product.

Background

Existing large-model distributed fine-tuning schemes based on split federated learning address the inconsistent placement of terminal-side sub-model split points, caused by device heterogeneity, by maintaining on the server a separate server-side sub-model for each device. Because fine-tuning proceeds in parallel, the server must simultaneously hold and compute over a large amount of intermediate activation and gradient information from different devices; memory consumption therefore grows rapidly, computational pressure increases markedly, and the server readily becomes a performance bottleneck.

Disclosure of Invention

The main purpose of the present application is to provide a model fine-tuning method, apparatus, device, storage medium, and product, aiming to solve the technical problems of server-side memory bottleneck, excessive communication overhead, and overlong fine-tuning latency that arise when existing split federated learning handles collaborative training of large models.

To achieve the above object, the present application provides a model fine-tuning method, which comprises: acquiring and storing intermediate feature data sent by each terminal, wherein the intermediate feature data is computed from sample data by a terminal-side sub-model; training a server-side multiplexed model based on the stored intermediate feature data and updating the LoRA adapter parameters corresponding to the terminal to obtain target LoRA adapter parameters, wherein the terminal-side sub-model is kept frozen during training of the server-side multiplexed model; and fine-tuning the server-side multiplexed model according to the target LoRA adapter parameters to obtain a model fine-tuning result.

Optionally, fine-tuning the server-side multiplexed model according to the target LoRA adapter parameters to obtain a model fine-tuning result comprises: aggregating the target LoRA adapter parameters of all terminals to obtain a global adapter for each neural network layer of the server-side multiplexed model; and determining a model fine-tuning result based on the global adapters and the server-side multiplexed model, wherein the model fine-tuning result comprises the fine-tuned server-side multiplexed model.

Optionally, aggregating the target LoRA adapter parameters of all terminals to obtain a global adapter for each neural network layer of the server-side multiplexed model comprises: determining the LoRA adapter parameters corresponding to each neural network layer of the server-side multiplexed model from the target LoRA adapter parameters of each terminal; acquiring the volume of data each terminal used to generate its intermediate feature data; and performing weighted aggregation of the LoRA adapter parameters corresponding to each neural network layer based on those data volumes, to obtain the global adapter corresponding to each neural network layer.
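The memory argument in the Background above can be made concrete. Under a conventional split-federated design the server keeps a dense sub-model per device, while the multiplexed design keeps one frozen backbone plus small per-terminal adapters; the rough parameter-count comparison below uses illustrative sizes, not figures from the patent.

```python
def params_per_device_submodels(n_devices, layers, hidden):
    # Conventional split FL: one dense sub-model copy per device
    # (counting only the hidden x hidden weight per layer).
    return n_devices * layers * hidden * hidden

def params_multiplexed(n_devices, layers, hidden, rank):
    # One shared frozen backbone + per-terminal low-rank (B, A) adapters.
    backbone = layers * hidden * hidden
    adapters = n_devices * layers * 2 * rank * hidden
    return backbone + adapters

n, L, d, r = 100, 24, 4096, 8
print(f"per-device sub-models: {params_per_device_submodels(n, L, d)/1e9:.1f}B params")
print(f"multiplexed + LoRA:    {params_multiplexed(n, L, d, r)/1e9:.1f}B params")
# ~40.3B vs ~0.6B under these assumed sizes
```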
Optionally, training the server-side multiplexed model based on the stored intermediate feature data and updating the LoRA adapter parameters corresponding to the terminal to obtain target LoRA adapter parameters comprises: acquiring the split layer corresponding to the intermediate feature data; and feeding the intermediate feature data into the server-side multiplexed model at that split layer for forward propagation, then performing backpropagation based on a loss function, wherein the backbone parameters of the server-side multiplexed model are not updated during training, and the LoRA adapter parameters corresponding to the terminal are updated to obtain the target LoRA adapter parameters.

Optionally, before acquiring and storing the intermediate feature data sent by each terminal, the method further comprises: acquiring the available computing-power and memory-capacity parameters of each terminal; determining, from these parameters, the number of layers of the server-side multiplexed model that each terminal can host; and transmitting to each terminal a terminal-side model with the corresponding number of layers.

Optionally, training the server-side multiplexed model based on the intermediate feature data comprises: acquiring a first hidden dimension of the server-side multiplexed model; acquiring a second hidden dimension of the intermediate feature data; and training the server-side multiplexed model on the intermediate feature data only when the first hidden dimension and the second hidden dimension match.