CN-121998135-A - Personalized federated learning method based on a conditional separation network and weight adaptation
Abstract
The invention relates to the technical field of personalized federated learning, and in particular to a personalized federated learning method based on a conditional separation network and weight adaptation, which comprises the following steps: each client trains a model on its local data and computes a prototype for each label via prototype learning, so as to constrain the distance between local and global features; a conditional separation network decouples the features output by the feature extractor into global features and local features, which are fed into a global classifier and a local classifier respectively; the server computes aggregation weights from each client's Fisher information matrix and the cosine similarity between the client's parameter-update direction and the global update direction, performs weighted aggregation of the client parameters with these weights, and transmits the aggregated parameters back to the clients. The method effectively addresses the model performance degradation and client drift caused by data heterogeneity, improves the performance of the personalized models, and mitigates catastrophic forgetting on the clients through feature decoupling.
Inventors
- TANG FEI
- LI XIMING
- HUANG HUI
- XIAO YUNPENG
- WANG PING
- HUANG YAWEN
- XIA CHENYAN
Assignees
- 重庆邮电大学 (Chongqing University of Posts and Telecommunications)
Dates
- Publication Date
- 20260508
- Application Date
- 20260210
Claims (10)
- 1. A personalized federated learning method based on a conditional separation network and weight adaptation, wherein, for a local image classification task, a client extracts image features with a local feature extractor and classifies the images with a local classifier, and wherein the parameter update process of each client and the server comprises the following steps: S1, the server distributes global model parameters to each client, the global model parameters covering a feature extractor, a conditional separation network, and a classifier; S2, each client trains its model on local data and computes a prototype for each label via prototype learning, so as to constrain the distance between local and global features; S3, the conditional separation network decouples the features output by the feature extractor into global features and local features, which are fed into the global classifier and the local classifier respectively; S4, the client computes a loss function from the local label prototypes, the global label prototypes, and the outputs of the global and local classifiers, updates its parameters accordingly, and uploads the updated parameters to the server; S5, the server computes, from the model parameters uploaded by each client, the Fisher information matrix and the cosine similarity between the client's parameter-update direction and the global update direction, derives aggregation weights, and performs weighted aggregation of the client parameters with these weights to obtain updated global model parameters; S6, the server transmits the updated global model parameters to the clients.
- 2. The method of claim 1, wherein step S1 comprises: the server first randomly samples a batch of clients from all clients as a client subset and distributes the global feature extractor, conditional separation network, and classifier parameters to each client in the subset; and each client maintains the following components: a global feature extractor, a local feature extractor, a global classifier, a local classifier, and a conditional separation network.
- 3. The personalized federated learning method based on a conditional separation network and weight adaptation according to claim 2, wherein the process of a client training the model on local data comprises: extracting features with the global feature extractor and, during training, computing the prototype of each label as the mean of the features belonging to that label; and constraining the local feature representation from deviating from the global feature distribution by minimizing the distance between the local prototypes and the global prototypes.
- 4. The personalized federated learning method based on a conditional separation network and weight adaptation of claim 2, wherein the conditional separation network implements feature decoupling as follows: context vectors are generated from the parameters of the global classifier and the local classifier; the global and local context vectors are concatenated, and a bottleneck network generates feature scaling factors and bias coefficients; and an affine transformation is applied to the original features with the scaling factors and bias coefficients to produce the global and local features, namely: z_l = γ_l ⊙ z + β_l; z_g = γ_g ⊙ z + β_g; wherein z denotes the features extracted from the local data by the local feature extractor; z_l is the local feature carrying personalized information; z_g is the feature carrying global information; γ_l and γ_g are the scaling factors mapped to the local and global features, respectively; and β_l and β_g are the bias coefficients mapped to the local and global features, respectively.
- 5. The personalized federated learning method based on a conditional separation network and weight adaptation according to claim 1, wherein the calculation of the aggregation weights in step S5 comprises: computing the trace of each client's Fisher information matrix as a measure of that client's contribution; computing the cosine similarity between the client's parameter-update direction and the global average update direction as a measure of their consistency; and generating the aggregation weights from the trace and the cosine similarity through a nonlinear mapping function.
- 6. The personalized federated learning method based on a conditional separation network and weight adaptation of claim 5, wherein the aggregation weights are expressed as: w_i = ψ(tr(F_i), s_i) / Σ_{j∈S_t} ψ(tr(F_j), s_j); wherein w_i is the aggregation weight of the i-th client; tr(F_i) is the trace of the Fisher information matrix of the i-th client; s_i is the cosine similarity between the parameter-update direction of the i-th client and the global average update direction; S_t is the set of clients selected for training in the current round; and ψ(·) is a nonlinear mapping function.
- 7. The personalized federated learning method based on a conditional separation network and weight adaptation according to claim 2, wherein the client receives the global model parameters issued by the server and uses them to update its global feature extractor, local feature extractor, global classifier, and conditional separation network.
- 8. The personalized federated learning method based on a conditional separation network and weight adaptation according to claim 1, wherein the client computing the loss function from the local label prototypes, the global label prototypes, and the classification results of the global and local classifiers comprises: L_i = E_{(x_i, y_i) ∼ D_i}[ ℓ_CE(ŷ_i, y_i) ] + L_PA; wherein L_i is the total loss function of client i; D_i is the data distribution of client i; x_i is a data sample of client i and y_i is its label; ℓ_CE is the cross-entropy loss function; ŷ_i is the combined classification result of the global classifier and the local classifier; and L_PA is the alignment loss between the global prototypes and the local prototypes.
- 9. The personalized federated learning method based on a conditional separation network and weight adaptation according to claim 8, wherein the alignment loss L_PA between the global prototypes and the local prototypes is the mean squared error, minimized during training, between the global prototype and the local prototype of each category.
- 10. The personalized federated learning method based on a conditional separation network and weight adaptation according to claim 9, wherein the process of updating the prototype of label j at a local client during a training round comprises: client i uses its local feature extractor f_i to extract features from the current batch of data and computes the local prototype of category j within the batch as: P̄_{i,j} = (1 / N_{i,j}) Σ_k 1(y_{i,k} = j) f_i(x_{i,k}); wherein N_{i,j} is the number of samples of client i that carry label j; 1(·) is the indicator function, which equals 1 if the condition in brackets is satisfied and 0 otherwise; y_{i,k} is the label of the k-th sample of client i, and x_{i,k} is the corresponding sample; and f_i(x_{i,k}) is the feature vector extracted from sample x_{i,k} by the feature extractor of client i; the prototype of the current batch is then fused with the prototype P_{i,j} of the previous round by weighted combination to smoothly update the local prototype, namely: P_{i,j} ← α P_{i,j} + (1 − α) P̄_{i,j}; wherein α is the smoothing coefficient.
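The affine decoupling of claim 4 can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's exact network: the bottleneck shapes, the ReLU nonlinearity, the weight names `W1_g`/`W2_g`/`W1_l`/`W2_l`, and the way context vectors are built from classifier parameters are all assumptions.

```python
import numpy as np

def bottleneck(ctx, W1, W2):
    # Bottleneck MLP (assumed form): compress the spliced context vector,
    # then expand it into per-dimension scaling factors and bias coefficients.
    h = np.maximum(W1 @ ctx, 0.0)   # ReLU on the compressed representation
    out = W2 @ h
    d = out.shape[0] // 2
    return out[:d], out[d:]         # (gamma, beta)

def conditional_separation(z, ctx_global, ctx_local, params):
    """Decouple feature z into global and local parts via affine transforms."""
    ctx = np.concatenate([ctx_global, ctx_local])   # spliced context vectors
    gamma_g, beta_g = bottleneck(ctx, params["W1_g"], params["W2_g"])
    gamma_l, beta_l = bottleneck(ctx, params["W1_l"], params["W2_l"])
    z_g = gamma_g * z + beta_g      # feature carrying global information
    z_l = gamma_l * z + beta_l      # feature carrying personalized information
    return z_g, z_l
```

In practice the context vectors would be derived from the global and local classifier parameters (for instance, a pooled summary of their weight matrices), which is what makes the separation "conditional" on the two classifiers.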
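The server-side weighting of claims 5 and 6 can be sketched as follows. The patent does not fix the form of ψ or how trace and similarity are combined, so this sketch assumes ψ is the exponential applied to their product, normalized over the selected client set:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two flattened parameter-update vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def aggregation_weights(deltas, fisher_traces, psi=np.exp):
    """Weights from Fisher-trace contribution and update-direction consistency.

    deltas: array of shape (num_clients, num_params), one update per client.
    fisher_traces: per-client traces tr(F_i) of the Fisher information matrix.
    psi: nonlinear mapping (assumed exponential; the patent leaves it abstract).
    """
    global_dir = np.mean(deltas, axis=0)              # global average update direction
    sims = np.array([cosine(d, global_dir) for d in deltas])
    scores = psi(np.asarray(fisher_traces) * sims)    # combine trace and similarity
    return scores / scores.sum()                      # normalize over selected set S_t

def weighted_aggregate(client_params, weights):
    # Step S5: weighted aggregation of client parameters into the global model.
    return np.sum(weights[:, None] * client_params, axis=0)
```

With this choice of ψ, clients whose updates both carry more Fisher information and point closer to the global direction receive larger weights, which matches the consistency argument in claim 5.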
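The per-label prototype computation and smoothed update of claim 10 can be sketched as below; the dictionary layout and the default smoothing coefficient are illustrative choices, not the patent's:

```python
import numpy as np

def batch_prototypes(features, labels, num_classes):
    """Mean feature vector per label in the current batch."""
    protos = {}
    for j in range(num_classes):
        mask = labels == j          # indicator 1(y_{i,k} = j)
        if mask.any():
            protos[j] = features[mask].mean(axis=0)
    return protos

def smooth_update(old_protos, new_protos, alpha=0.9):
    """Fuse previous-round prototypes with current-batch prototypes (EMA)."""
    out = dict(old_protos)
    for j, p in new_protos.items():
        if j in out:
            out[j] = alpha * out[j] + (1 - alpha) * p
        else:
            out[j] = p              # first time this label appears locally
    return out
```

The smoothed local prototypes are what the client aligns against the global prototypes via the L_PA loss of claims 8 and 9.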
Description
Personalized federated learning method based on a conditional separation network and weight adaptation

Technical Field

The invention relates to the technical field of personalized federated learning, is applicable to scenarios with non-independent and identically distributed data, and in particular relates to a personalized federated learning method based on a conditional separation network and weight adaptation.

Background

Federated Learning (FL), an emerging distributed machine learning paradigm, allows multiple clients to cooperatively train a model without sharing their local data, effectively addressing the problems of data silos and privacy protection. The classical algorithm FedAvg builds a global model by aggregating the model updates of all clients, laying the foundational framework for federated learning. However, in practical applications, the data distributions of the clients often exhibit significant heterogeneity, i.e., they are non-independent and identically distributed (Non-IID). This statistical heterogeneity causes problems such as slow model convergence and reduced accuracy, making it difficult for a single global model to achieve optimal performance on all clients and severely restricting the application of federated learning in real-world scenarios. In edge-device scenarios, the data of each client shows pronounced statistical heterogeneity: on the one hand, data distributions differ greatly (for example, the case types of different hospitals in medical terminals, or the motion-data distributions of different users on smart watches), i.e., the Non-IID characteristic; on the other hand, data scales are unbalanced (for example, some industrial sensors collect only a small number of samples while the sample size of core nodes reaches hundreds of thousands).
This heterogeneity typically causes a 15%-30% drop in accuracy when a traditional federated learning global model is deployed at a local client; for example, FedAvg achieves 22.6% lower test accuracy than centralized training in a Non-IID setting on the CIFAR-10 dataset, which cannot meet practical application requirements. To address the challenges posed by data heterogeneity, Personalized Federated Learning (PFL) has emerged, whose core goal is to customize a proprietary high-performance model for each client within the federated learning framework. Existing PFL methods fall into two categories: global model training algorithms and local model learning algorithms. Global model training algorithms aim to improve model quality at the source, including data-based methods (e.g., data augmentation, client selection) and model-based methods (e.g., collaborative gaming, transfer learning). Local model learning algorithms focus on client-side personalized architectures and strategies, mainly comprising architecture-based methods (e.g., parameter decoupling, knowledge distillation) and similarity-based methods (e.g., multi-task learning, model interpolation, group clustering). The parameter decoupling approach, which splits the model into a shared part and a personalized part, is a widely adopted idea. For example, FedPer shares the base feature extractor and personalizes the classifier head, FedRoD maintains both a global classifier and a personalized classifier for each client, and FURL further subdivides the model parameters into federated and private parameters.
However, extensive analysis of existing studies shows that, despite the remarkable progress made by the above methods, they have significant limitations when dealing with highly complex and dynamic heterogeneous environments: (1) Existing Personalized Federated Learning (PFL) methods (such as FedPer and FURL), although achieving partial personalization through a shared feature extractor paired with a personalized classifier, adopt a static, hard module-division strategy: the features output by the feature extractor are treated as a coupled mixture of global and local information, the information flow cannot be adjusted dynamically according to the input sample, and the flexibility to adapt to real scenarios is extremely poor. For example, in a practical medical image diagnosis scenario, when FedPer processes skin cancer images, the features include global general characteristics such as skin contour as well as local personalized characteristics such as lesion size and shape (different hospitals use different imaging equipment and annotation standards); yet the static framework forces all features through the shared extractor, so the local classifier is interfered with by irrelevant global information, the overfitting risk increases, and misdiagnosis or missed diagnosis may occur in practice; likewise, in an intelligent recommendation scene, st