CN-117744145-B - Model federated fine-tuning method, text classification method, device, medium, and apparatus

CN 117744145 B

Abstract

The present disclosure relates to a model federated fine-tuning method, a text classification method, a device, a medium, and an apparatus. A pre-trained target model comprises a first model deployed at a server and a second model deployed at each of at least one client. The fine-tuning method, applied to a target client among the at least one client, comprises: performing word segmentation on a text sample; generating, through the second model, an embedded vector corresponding to each word segment; determining, among the word segments, target word segments that have classification utility for the text category marked by a classification label; applying noise perturbation processing to the embedded vectors of the word segments other than the target word segments to obtain perturbation vectors; and fine-tuning the model parameters of the first model in cooperation with the server and the other clients according to the perturbation vectors, the embedded vectors corresponding to the target word segments, and the classification label. The method effectively improves the usability of the target model on the classification task while strengthening the protection of the client's data privacy.

Inventors

  • Shen Xicong
  • Liu Yang
  • Liu Huiqi
  • Hong Jue
  • Duan Bing
  • Wu Ye
  • Wu Di

Assignees

  • Beijing Zitiao Network Technology Co., Ltd. (北京字跳网络技术有限公司)
  • Douyin Vision Co., Ltd. (抖音视界有限公司)

Dates

Publication Date
2026-05-05
Application Date
2023-12-21

Claims (14)

  1. A model federated fine-tuning method, wherein a pre-trained target model includes a first model deployed at a server and a second model deployed at each of at least one client, the method being applied to a target client of the at least one client and comprising: acquiring a text sample and a classification label corresponding to the text sample; performing word segmentation on the text sample to obtain a plurality of word segments, and generating embedded vectors corresponding to the word segments through the second model deployed at the target client; determining, from the plurality of word segments, target word segments having classification utility for the text category marked by the classification label; performing noise perturbation processing on the embedded vectors corresponding to the word segments other than the target word segments to obtain perturbation vectors; and fine-tuning the model parameters of the first model in cooperation with the server and the other clients of the at least one client according to the perturbation vectors, the embedded vectors corresponding to the target word segments, and the classification label; wherein determining, from the plurality of word segments, the target word segments having classification utility for the text category marked by the classification label comprises: acquiring classification-effective words corresponding to the text category marked by the classification label; and determining, as the target word segments, the word segments that belong to the classification-effective words.
  2. The method of claim 1, wherein the classification-effective words are the K reference words, in a vocabulary used for word segmentation of the text sample, of highest utility importance to the text category marked by the classification label, K being greater than 1.
  3. The method of claim 2, wherein the classification-effective words are determined by: for each reference word in the vocabulary, acquiring the frequency of occurrence of the reference word in the text samples under each preset category; determining the utility importance of the reference word to the text category marked by the classification label according to those frequencies; and determining, as the classification-effective words, the K reference words in the vocabulary with the highest utility importance to the text category marked by the classification label.
  4. The method of claim 3, wherein determining the utility importance of the reference word to the text category marked by the classification label according to the frequency of occurrence of the reference word in the text samples under each preset category comprises: for each category, among the plurality of preset categories, other than the text category marked by the classification label, determining the ratio of the frequency of the reference word in the text samples under the marked text category to the frequency of the reference word in the text samples under that other category; and determining the utility importance of the reference word to the marked text category according to the ratios corresponding to the other categories.
  5. The method of claim 1, wherein performing noise perturbation processing on the embedded vectors corresponding to the word segments other than the target word segments to obtain the perturbation vectors comprises: performing noise perturbation processing on the embedded vectors corresponding to the word segments other than the target word segments; for each embedded vector obtained after the noise perturbation processing, determining whether the embedded vector is among the reference embedded vectors corresponding to the reference words in a vocabulary, the vocabulary being used for word segmentation of the text sample; and if the reference embedded vectors corresponding to the reference words in the vocabulary do not contain the embedded vector, determining the reference embedded vector closest to the embedded vector as the perturbation vector.
  6. The method of claim 1, wherein fine-tuning the model parameters of the first model in cooperation with the server and the other clients of the at least one client according to the perturbation vectors, the embedded vectors corresponding to the target word segments, and the classification label comprises: sending the perturbation vectors and the embedded vectors corresponding to the target word segments to the server, so that the server generates a classification prediction result of the text sample through the first model from those vectors and returns the classification prediction result to the target client; and determining first gradient information corresponding to an output layer of the first model according to the classification prediction result of the text sample and the classification label, and sending the first gradient information to the server so that the server updates the model parameters according to the first gradient information and second gradient information, corresponding to the output layer, sent by the other clients.
  7. The method according to any one of claims 1-6, wherein the second model of the target client comprises at least an embedding block comprising a plurality of embedding layers.
  8. The method of claim 7, wherein the second model of the target client comprises the embedding block and at least one encoding module.
  9. The method of any one of claims 1-6, wherein the text sample and the classification label are both local to the target client.
  10. A text classification method, comprising: acquiring a text to be classified; and obtaining a classification prediction result of the text to be classified through a pre-fine-tuned target model, wherein the target model is fine-tuned according to the model federated fine-tuning method of any one of claims 1-9.
  11. A model federated fine-tuning device, wherein a pre-trained target model includes a first model deployed at a server and a second model deployed at each of at least one client, the device being applied to a target client of the at least one client and comprising: a first acquisition module for acquiring a text sample and a classification label corresponding to the text sample; a local calculation module for segmenting the text sample to obtain a plurality of word segments and generating embedded vectors corresponding to the word segments through the second model deployed at the target client; a target word segment determining module for determining, from the plurality of word segments, target word segments having classification utility for the text category marked by the classification label; a perturbation module for performing noise perturbation processing on the embedded vectors corresponding to the word segments other than the target word segments to obtain perturbation vectors; and a fine-tuning module for fine-tuning the model parameters of the first model in cooperation with the server and the other clients of the at least one client according to the perturbation vectors, the embedded vectors corresponding to the target word segments, and the classification label; wherein the target word segment determining module comprises: an obtaining sub-module for obtaining the classification-effective words corresponding to the text category marked by the classification label; and a first determining sub-module for determining, as the target word segments, the word segments that belong to the classification-effective words.
  12. A text classification device, comprising: a second acquisition module for acquiring a text to be classified; and a classification module for obtaining a classification prediction result of the text to be classified through a pre-fine-tuned target model, wherein the target model is fine-tuned according to the model federated fine-tuning method of any one of claims 1-9.
  13. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processing device, carries out the steps of the method according to any one of claims 1-10.
  14. An electronic device, comprising: a storage device having a computer program stored thereon; and a processing device for executing the computer program in the storage device to carry out the steps of the method according to any one of claims 1-10.
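The selection of classification-effective words in claims 3 and 4 can be sketched as follows. This is a minimal illustration, not the patented implementation: the claims do not fix how the per-category frequency ratios are combined into one utility-importance score, so taking the minimum ratio is an assumed choice here, and add-one smoothing is assumed to keep the ratios defined; all function and variable names are illustrative.

```python
from collections import Counter

def classification_effective_words(samples_by_category, target_category, k):
    """Select the K reference words of highest utility importance for
    `target_category`, per the frequency-ratio scheme of claims 3-4.

    samples_by_category: dict mapping each preset category to a list of
    tokenised text samples (lists of word segments).
    """
    # Frequency of each reference word in the text samples of each category
    freq = {cat: Counter(word for sample in texts for word in sample)
            for cat, texts in samples_by_category.items()}
    vocab = set().union(*freq.values())
    others = [cat for cat in samples_by_category if cat != target_category]

    def importance(word):
        target_f = freq[target_category][word] + 1  # add-one smoothing (assumed)
        # ratio of target-category frequency to each other category's frequency;
        # combining ratios via min is one possible choice the claims leave open
        return min(target_f / (freq[cat][word] + 1) for cat in others)

    return sorted(vocab, key=importance, reverse=True)[:k]
```

A word that is frequent under the labeled category but rare everywhere else gets a high ratio against every other category and is therefore selected first.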

Description

Model federated fine-tuning method, text classification method, device, medium, and apparatus

Technical Field

The disclosure relates to the field of privacy protection, and in particular to a model federated fine-tuning method, a text classification method, a device, a medium, and an apparatus.

Background

In recent years, pre-trained language models (PLMs), represented by Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT) models, have exhibited strong text-learning capabilities and are widely used in fields such as finance, law, and health care. To improve the usability of a pre-trained language model in downstream applications, a common approach is to fine-tune the model on datasets related to the downstream task. However, many users cannot independently obtain and fine-tune a pre-trained language model due to resource or technology limitations, which has given rise to a new business scenario combining language models (LMs) with the Model-as-a-Service (MaaS) ecosystem. In MaaS, a server with sufficient computing resources and technical reserves provides rich pre-trained models, service resources, and core functions, while clients use their own private datasets to fine-tune, deploy, and invoke models through a one-stop MaaS platform, thereby customizing language models that meet their specific needs. While this solution provides efficient, customizable LM services for clients, it also carries the risk of disclosing the server's model privacy and the clients' data privacy.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description.
This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. In a first aspect, the present disclosure provides a model federated fine-tuning method, where a pre-trained target model includes a first model deployed at a server and a second model deployed at each of at least one client, the method being applied to a target client of the at least one client and including: acquiring a text sample and a classification label corresponding to the text sample; performing word segmentation on the text sample to obtain a plurality of word segments, and generating embedded vectors corresponding to the word segments through the second model deployed at the target client; determining, from the plurality of word segments, target word segments having classification utility for the text category marked by the classification label; performing noise perturbation processing on the embedded vectors corresponding to the word segments other than the target word segments to obtain perturbation vectors; and fine-tuning the model parameters of the first model in cooperation with the server and the other clients of the at least one client according to the perturbation vectors, the embedded vectors corresponding to the target word segments, and the classification label. In a second aspect, the present disclosure provides a text classification method, comprising: acquiring a text to be classified; and obtaining a classification prediction result of the text to be classified through a pre-fine-tuned target model, wherein the target model is fine-tuned according to the model federated fine-tuning method provided in the first aspect of the disclosure.
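The perturbation step described above (and detailed in claim 5) can be sketched as follows: each non-target embedding is noise-perturbed and then mapped back to the nearest reference embedding in the vocabulary. This is a minimal sketch under stated assumptions: the disclosure does not fix the noise distribution, so Gaussian noise is assumed, and all names, the noise scale, and the Euclidean distance metric are illustrative choices.

```python
import math
import random

def perturb_embeddings(embeds, target_mask, vocab_embeds, noise_scale=0.1, seed=0):
    """Noise-perturb every non-target token embedding, then snap each
    perturbed vector to its nearest reference embedding in the vocabulary.

    embeds:       list of embedding vectors, one per word segment
    target_mask:  True for target (classification-useful) word segments
    vocab_embeds: reference embeddings for the vocabulary's reference words
    """
    rng = random.Random(seed)
    out = []
    for vec, is_target in zip(embeds, target_mask):
        if is_target:
            out.append(list(vec))  # target embeddings pass through unperturbed
            continue
        # assumed Gaussian noise; the disclosure leaves the distribution open
        noisy = [x + rng.gauss(0.0, noise_scale) for x in vec]
        # snap to the closest reference embedding (Euclidean distance assumed)
        nearest = min(vocab_embeds, key=lambda ref: math.dist(ref, noisy))
        out.append(list(nearest))
    return out
```

Because every perturbation vector is itself a valid vocabulary embedding, the server cannot tell a perturbed token apart from an ordinary one, which is what gives the client its plausible deniability.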
In a third aspect, the present disclosure provides a model federated fine-tuning device, where a pre-trained target model includes a first model deployed at a server and a second model deployed at each of at least one client, the device being applied to a target client of the at least one client and comprising: a first acquisition module for acquiring a text sample and a classification label corresponding to the text sample; a local calculation module for segmenting the text sample to obtain a plurality of word segments and generating embedded vectors corresponding to the word segments through the second model deployed at the target client; a target word segment determining module for determining, from the plurality of word segments, target word segments having classification utility for the text category marked by the classification label; a perturbation module for performing noise perturbation processing on the embedded vectors corresponding to the word segments other than the target word segments to obtain perturbation vectors; and a fine-tuning module for fine-tuning the model parameters of the first model in cooperation with the server and the other clients of the at least one client according to the perturbation vectors, the embedded vectors corresponding to the target word segments, and the classification label.
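The cooperative fine-tuning exchange of claim 6 can be sketched as one split-learning round: the server computes the classification prediction from the vectors it receives, and the client returns only the gradient with respect to the output layer. This is an illustrative sketch, not the patented protocol: a single linear output layer stands in for the server-side first model, mean pooling, softmax cross-entropy, and the learning rate are assumed, and every name is hypothetical.

```python
import math
import random

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

def server_forward(weights, pooled):
    # server side: logits of the classification prediction result
    # logits[j] = sum_i pooled[i] * weights[i][j]
    return [sum(pooled[i] * weights[i][j] for i in range(len(pooled)))
            for j in range(len(weights[0]))]

def client_output_gradient(logits, label):
    # client side: softmax cross-entropy gradient w.r.t. the output logits,
    # i.e. the "first gradient information" sent back to the server
    grad = softmax(logits)
    grad[label] -= 1.0
    return grad

def server_update(weights, pooled, grad, lr=0.1):
    # server side: dL/dW[i][j] = pooled[i] * grad[j]
    for i in range(len(weights)):
        for j in range(len(weights[i])):
            weights[i][j] -= lr * pooled[i] * grad[j]

# one cooperative fine-tuning round with random placeholder data
rng = random.Random(0)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(4)]
pooled = [rng.uniform(-1, 1) for _ in range(4)]    # pooled perturbed/target vectors
logits = server_forward(weights, pooled)           # server -> client: prediction
grad = client_output_gradient(logits, label=2)     # client -> server: gradient
server_update(weights, pooled, grad)               # server updates its parameters
```

The label never leaves the client: only logits travel down and only an output-layer gradient travels up, matching the privacy boundary the claim draws between the two parties.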