CN-121562738-B - Federated learning method, system and device
Abstract
The application provides a federated learning method, system, and device, in which each client trains a local shared model and a personalized model body: the personalized model body captures features specific to the client's private data set, while the shared model carries global knowledge. The shared model and the personalized model body, which have the same hierarchical structure, are fused layer by layer using adaptively updated fusion weights, and the personalized model head is then further trained with the fusion network's parameters frozen. The central server aggregates the training gradients of the local shared models to obtain a global model. By introducing a selective parameter-sharing and dynamic-fusion mechanism, the method efficiently coordinates the robust transfer of global knowledge with local adaptive optimization while preserving privacy, effectively alleviating problems such as model divergence, performance degradation, and poor scalability.
Inventors
- LIU HEPING
- LIU SONGBAI
- LI GENGHUI
- MA LIJIA
- LIN QIUZHEN
Assignees
- Shenzhen University (深圳大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260123
Claims (9)
- 1. A federated learning method, based on a federated learning system comprising a plurality of clients and a central server, characterized in that the federated learning method comprises: training a shared model and a personalized model body of the client with a private data set, wherein the personalized model body captures features specific to the private data set, the shared model carries global knowledge, and the shared model and the personalized model body have the same hierarchical structure; fusing the shared model and the personalized model body layer by layer, according to fusion weights assigned to each layer, to obtain a fusion network, the fusion weights forming a learnable vector: F_k^m = α_k^m · P_k^m + β_k^m · S_k^m, where k denotes the client index, P_k^m and S_k^m denote the m-th layer of the personalized model body and of the shared model of the k-th client, α_k^m and β_k^m denote the fusion weights, and F_k^m denotes the m-th layer of the fusion network, the fusion weights being adaptively updated by gradient descent; training a personalized model head with the private data set while the fusion network parameters are frozen, wherein the input of the personalized model head is connected to the output of the fusion network and the output dimension of the personalized model head is the dimension of the classification task; and the central server obtaining a global model from the training gradients of the shared models.
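The layer-wise fusion in claim 1 can be sketched as follows. This is a minimal toy implementation: the function names, the representation of a layer as a flat list of parameters, and the softmax normalization of the two fusion weights are illustrative assumptions, not details taken from the claim.

```python
import math

def softmax2(a, b):
    """Normalize two learnable logits into fusion weights that sum to 1."""
    ea, eb = math.exp(a), math.exp(b)
    return ea / (ea + eb), eb / (ea + eb)

def fuse_layer(personal_layer, shared_layer, w_p, w_s):
    """Fuse the m-th layers of the personalized body and the shared model:
    F_k^m = w_p * P_k^m + w_s * S_k^m, element-wise over the parameters."""
    return [w_p * p + w_s * s for p, s in zip(personal_layer, shared_layer)]

# toy parameters for one layer of client k
P_km = [1.0, 2.0, 3.0]   # personalized model body, layer m
S_km = [0.5, 0.5, 0.5]   # shared model, layer m

logit_p, logit_s = 0.0, 0.0           # learnable fusion logits (updated by SGD)
w_p, w_s = softmax2(logit_p, logit_s)
F_km = fuse_layer(P_km, S_km, w_p, w_s)
print(F_km)  # equal logits -> element-wise average: [0.75, 1.25, 1.75]
```

In a full system the two logits per layer would be optimized by gradient descent together with the rest of the local training, which is what "adaptively updated" refers to in the claim.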
- 2. The federated learning method according to claim 1, further comprising: the central server generating, with a generator, pseudo data that simulates the private data sets; the central server performing a distillation-training update of the global model with the pseudo data; the central server sending the updated global model to the clients; and the clients updating their shared models with the updated global model.
- 3. The federated learning method according to claim 2, wherein the generator is represented as G(·; θ) and generates pseudo data x̃ = G(z, y; θ), where θ denotes the parameters of the generator, z is a Gaussian noise vector, and y is a label randomly sampled from a uniform distribution.
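The sampling in claim 3 can be sketched with the standard library. The affine form of `generator` below is a placeholder standing in for the patented architecture; only the sampling of z (Gaussian) and y (uniform over classes) follows the claim.

```python
import random

def sample_inputs(noise_dim, num_classes, rng):
    """Draw a Gaussian noise vector z and a label y from a uniform distribution."""
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    y = rng.randrange(num_classes)
    return z, y

def generator(z, y, theta):
    """Placeholder for G(z, y; theta): a label-conditioned affine map."""
    return [theta * zi + y for zi in z]

rng = random.Random(0)
z, y = sample_inputs(4, 10, rng)
x_pseudo = generator(z, y, theta=0.1)
print(len(x_pseudo), y)
```

A real generator would be a trained neural network; the point of the sketch is the input interface (z, y; θ) described in the claim.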
- 4. The federated learning method according to claim 2, wherein the loss function of the generator comprises an authenticity loss term characterizing that the pseudo data is correctly recognized by the clients, a diversity loss term characterizing that the pseudo data covers multiple feature distributions, and a validity loss term characterizing that the pseudo data transfers useful knowledge between the global model and the local models.
- 5. The federated learning method according to claim 2, wherein the central server performs the distillation-training update of the global model with the pseudo data by using the shared model of each client as a teacher model, the global model as a student model, and the pseudo data as a medium, so that the knowledge of the clients' shared models is integrated and transferred to the global model.
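The ensemble distillation of claim 5 can be sketched as follows, with pseudo data as the transfer medium. Averaging the teacher probabilities across clients and using a squared-error surrogate for the distillation loss are simplifying assumptions; the claim does not fix these choices.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_teacher(client_logits):
    """Average the shared-model (teacher) probabilities across clients."""
    probs = [softmax(l) for l in client_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]

def distill_loss(student_logits, teacher_probs):
    """Squared error between student probabilities and the teacher ensemble."""
    sp = softmax(student_logits)
    return sum((a - b) ** 2 for a, b in zip(sp, teacher_probs))

# teacher logits from two clients' shared models on one pseudo sample
teachers = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
target = ensemble_teacher(teachers)
loss = distill_loss([1.0, 1.0, 0.0], target)
print(round(loss, 4))
```

Minimizing this loss over batches of pseudo data pulls the global (student) model toward the integrated knowledge of the clients' shared (teacher) models.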
- 6. The federated learning method according to claim 4, wherein the distillation-training update further incorporates a sample-level weighting mechanism reflecting how well each client has mastered different samples.
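One plausible realization of the sample-level weighting in claim 6 is to weight each pseudo sample by the teacher's confidence on it. Using the maximum predicted class probability as the confidence measure is an assumption for illustration; the claim only requires that the weights reflect how well each client handles different samples.

```python
def sample_weights(teacher_probs_per_sample):
    """Weight each sample by the teacher's maximum class probability,
    normalized so the weights sum to 1."""
    conf = [max(p) for p in teacher_probs_per_sample]
    total = sum(conf)
    return [c / total for c in conf]

batch = [[0.9, 0.05, 0.05],   # teacher is confident on this sample
         [0.4, 0.35, 0.25]]   # teacher is uncertain here
w = sample_weights(batch)
print(w[0] > w[1])  # confident samples get larger weight
```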
- 7. The federated learning method according to claim 1, wherein the personalized model head is a lightweight multi-layer perceptron.
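A lightweight multi-layer perceptron head as in claim 7, written as a plain forward pass. The single hidden layer, its width, and the ReLU activation are illustrative choices; the claim only specifies that the head is a lightweight MLP whose output dimension matches the classification task.

```python
import random

def mlp_head(x, w1, b1, w2, b2):
    """Two-layer perceptron: a hidden ReLU layer followed by a linear
    output layer whose dimension equals the number of classes."""
    h = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return [sum(wi * hi for wi, hi in zip(row, h)) + b
            for row, b in zip(w2, b2)]

rng = random.Random(0)
in_dim, hidden, n_classes = 4, 8, 3
w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden)]
b1 = [0.0] * hidden
w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(n_classes)]
b2 = [0.0] * n_classes

out = mlp_head([0.1, 0.2, 0.3, 0.4], w1, b1, w2, b2)
print(len(out))  # one logit per class
```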
- 8. A federated learning system comprising a plurality of clients and a central server, characterized in that: each client is configured to train a local shared model and a personalized model body with a private data set, wherein the personalized model body captures features specific to the private data set, the shared model carries global knowledge, and the shared model and the personalized model body have the same hierarchical structure; to fuse the shared model and the personalized model body layer by layer into a fusion network, according to fusion weights assigned to each layer: F_k^m = α_k^m · P_k^m + β_k^m · S_k^m, where k denotes the client index, P_k^m and S_k^m denote the m-th layer of the personalized model body and of the shared model of the k-th client, α_k^m and β_k^m denote the fusion weights, and F_k^m denotes the m-th layer of the fusion network; and to train a personalized model head with the private data set while the fusion network parameters are frozen, wherein the input of the personalized model head is connected to the output of the fusion network and the output dimension of the personalized model head is the dimension of the classification task; and the central server is configured to obtain a global model from the training gradients of the shared models.
- 9. An electronic device comprising a processor and a memory coupled to the processor, the memory being configured to store a computer program, and the processor being configured to execute the computer program stored in the memory, so as to cause the electronic device to perform the federated learning method according to any one of claims 1-7.
Description
Federated learning method, system and device

Technical Field

The present application relates to the field of data processing technologies, and in particular to a federated learning method, system, and device.

Background

With the wide adoption of electronic products such as smartphones and Internet-of-Things devices, the volume of data generated and processed locally by terminal devices has grown exponentially, greatly advancing the application of deep learning in edge scenarios. However, conventional centralized machine learning requires the raw data of every terminal to be uploaded to a central server for model training; constrained by factors such as network bandwidth, transmission latency, and user privacy protection, this approach has gradually exposed clear limitations in practice. For example, in a smart-home environment, Internet-of-Things devices such as cameras, temperature-and-humidity sensors, and voice assistants continuously collect multimodal data including images, audio, and environmental parameters. Uploading all raw data centrally would not only consume enormous network resources but could also create serious privacy risks through data leakage. To overcome these drawbacks of centralized learning, federated learning (FL) was proposed as a distributed collaborative training framework: it allows clients to build a global model jointly through local training and parameter aggregation without sharing their raw data. The federated averaging algorithm (FedAvg), the most representative baseline, aggregates the local model parameters uploaded by clients by simple weighted averaging, achieving knowledge sharing while protecting privacy.
However, the performance of this algorithm degrades significantly on the non-independent and identically distributed (non-IID) data that is common in practice. Because data distributions differ markedly across clients, for example through individual differences in writing habits, voice characteristics, or environmental conditions, the optimization direction of a local model structurally conflicts with the global objective, leading to model-parameter divergence, slow convergence, and degraded final performance. In the consumer-electronics field in particular, terminal devices are diverse and user behavior is highly personalized, making the data-heterogeneity problem even more prominent and severely restricting the practicality and stability of federated learning. To alleviate these problems, prior research has focused on two directions. On one hand, generalized federated learning (GFL) tries to improve the generalization of the global model across clients by improving the aggregation mechanism or introducing regularization constraints, knowledge distillation, and similar techniques, so that the global model remains relatively stable in non-IID environments. On the other hand, personalized federated learning (PFL) deploys customized local models to capture user-specific data characteristics and meet terminal devices' demand for personalized services. Typical PFL methods adopt a shared-personalized dual structure: part of the parameters are globally shared for cross-client knowledge transfer, while local proprietary parameters are introduced to achieve personalized adaptation.
However, while these approaches alleviate the challenges of data heterogeneity to some extent, several drawbacks remain difficult to overcome. The global model trained by GFL methods often performs poorly on some clients and struggles to meet users' actual demand for a personalized experience, while PFL methods often sacrifice the generality of the global model in pursuit of local performance, significantly reducing its adaptability to newly added clients or unseen tasks. In addition, existing PFL frameworks are insufficiently designed in their mechanisms for fusing shared knowledge with personalized information, making it difficult to balance common feature extraction against the retention of personalized features, which further limits their scalability and deployment efficiency in dynamic and diverse consumer scenarios. In summary, when coping with data heterogeneity, existing federated learning techniques fail to strike an effective balance between the generalization capability of the global model and the individua