CN-122021806-A - Federated learning method based on importance gradient updating and adaptive differential privacy control

CN122021806A

Abstract

The invention provides a federated learning method based on importance gradient updating and adaptive differential privacy control. A client initializes its model from the global initial model weights and training parameters broadcast by a central server; the client then trains a local model on its local training data set, computes the local model gradient, and evaluates the importance of each parameter dimension to obtain a gradient statistical distribution vector. From this vector the client forms a sparse gradient, adaptively adjusts the differential privacy noise intensity according to the gradient strength metrics collected during training, injects the resulting noise into the sparse gradient, and updates the local model. The central server aggregates the updated model parameters from all clients to obtain new global model parameters, and the process repeats to train the global model iteratively. While satisfying the differential privacy constraint, the invention effectively reduces interference from ineffective noise and improves the convergence speed and prediction accuracy of the model.

Inventors

  • CHEN ZHUO
  • SHAN FANGFANG
  • ZHANG SHUQIN
  • FAN LULU
  • MAO YIFAN
  • LIU YUHANG
  • WANG JIAJIE
  • ZHANG XIAOYU
  • HAN YABO

Assignees

  • Zhongyuan University of Technology (中原工学院)

Dates

Publication Date
2026-05-12
Application Date
2026-01-28

Claims (10)

  1. A federated learning method based on importance gradient updating and adaptive differential privacy control, characterized by comprising the following steps: S1, a client initializes its model according to the global initial model weights and training parameters broadcast by a central server, and the local training data set and test set of each client are formed; S2, the client trains the local model on the local training data set, computes the local model gradient, and evaluates the importance of each parameter dimension of the local model gradient to obtain a gradient statistical distribution vector; S3, the client selects the key gradient components of the local model gradient according to the gradient statistical distribution vector to form a sparse gradient; S4, according to the gradient strength metrics collected during the training of step S2, the differential privacy noise intensity is adaptively adjusted to obtain a noised gradient, the local model is updated after the noise is injected into the sparse gradient, and the central server aggregates the updated model parameters from all clients to update the global model parameters; and S5, steps S2 to S4 are repeated to realize iterative training of the global model; based on the updated global model parameters, the central server evaluates the performance of the model on the test set, and training ends when the global model reaches the preset maximum number of training rounds.
  2. The federated learning method based on importance gradient updating and adaptive differential privacy control according to claim 1, wherein the CIFAR-10 data set, which has the same number of images in each category, is used: the images of the CIFAR-10 training set are randomly divided among the local clients in a non-independent and identically distributed (non-IID) manner to form the local training data set of each client; the images of the CIFAR-10 test set are retained at the central server, are not involved in training, and are used only to test the updated global model after each round of global aggregation is completed, with the classification accuracy computed to evaluate the performance of the model; the clients jointly train a global model under the coordination of the central server, and each client's optimizer uses stochastic gradient descent; when the federated learning task is created, the central server broadcasts the global initial model weights, the global initial learning rate, and the maximum number of local training rounds to all clients; each client then initializes its local model from the received global initial model weights and global initial learning rate to obtain the initial local model weights and the local model learning rate.
  3. The federated learning method based on importance gradient updating and adaptive differential privacy control according to claim 1 or 2, wherein, in each round of global training, each client participating in the training receives the global model parameters issued by the central server and takes them as the initial parameters for local model training; based on its pre-partitioned local training data set, the client trains the local model with a stochastic gradient descent optimizer and, after the prescribed number of local iterations, updates and uploads its local model parameters; the central server receives the local model parameters of all clients, performs a weighted average to obtain the global model parameters of the next round, and sends them to each client; during local training, each client evaluates the importance of each parameter dimension based on the local model gradients, which reflects the relative importance of the different parameter dimensions in model training, and at the same time collects the gradient strength metric.
  4. The federated learning method based on importance gradient updating and adaptive differential privacy control according to claim 3, wherein the method of computing the local model gradient is: all trainable parameters of the global model are flattened into a d-dimensional vector to obtain the parameter vector w = (w_1, ..., w_d), wherein each component w_i is a trainable weight of the global model; given the loss function L(w) of the local model, the gradient vector is g = ∇L(w), wherein each gradient component represents the local sensitivity of the loss function to the corresponding component, evaluated at the local model parameters of the current local iteration of the current round of global training; the client performs L2-norm clipping on the gradient vector, obtaining the clipped local model gradient on the local training data set as g_clip = g / max(1, ||g||_2 / C), wherein C is the gradient clipping threshold and ||g||_2 denotes the L2 norm of the vector g.
  5. The federated learning method based on importance gradient updating and adaptive differential privacy control according to claim 4, wherein the method of obtaining the gradient statistical distribution vector is: for each parameter dimension i of the local model gradient, the corresponding statistic is computed as the importance metric s_i = |g_i|, i = 1, ..., d, wherein i is the index of the gradient parameter dimension, |·| is the absolute value function, |g_i| is the magnitude of the gradient in the i-th dimension of the local model gradient, and d is the overall dimension of the model parameters; each client sorts the statistics of the d parameter dimensions by value to form the gradient statistical distribution vector s = sort_desc(s_1, ..., s_d), wherein sort_desc denotes the descending sorting operation; the gradient strength metric of the local model gradients is collected at the same time.
  6. The federated learning method based on importance gradient updating and adaptive differential privacy control according to claim 5, wherein the method of forming the sparse gradient is: each client constructs a gradient selection mask m = (m_1, ..., m_d) from the gradient statistical distribution vector, wherein the gradient selection mask variable m_i = 1 if s_i ≥ τ and m_i = 0 otherwise, wherein τ is the threshold parameter controlling the gradient retention ratio in the current round of global training; each client then screens the clipped local model gradient with the gradient selection mask to form the importance-guided sparse gradient m ⊙ g_clip, wherein ⊙ denotes element-wise multiplication and the unselected gradient components are zeroed out.
  7. The federated learning method based on importance gradient updating and adaptive differential privacy control according to claim 6, wherein, according to a preset sparsity ratio, the value at the position of the sorted gradient statistical distribution vector determined by that ratio is taken as the threshold τ.
  8. The federated learning method based on importance gradient updating and adaptive differential privacy control according to any one of claims 4-7, wherein the method of adaptively adjusting the differential privacy noise intensity to obtain the noised gradient is as follows: in each round of global training, based on the gradient strength metrics obtained in step S2, the central server averages the gradient strength metrics of the clients participating in the current round of training to obtain the average gradient strength metric, the averaging being taken over the number of clients participating in the current round of training; based on the average gradient strength and a scaling parameter that controls the noise adjustment scale, the adaptive noise adjustment factor is computed by applying the hyperbolic tangent function tanh to the scaled average gradient strength; based on the adaptive noise adjustment factor, Gaussian noise is injected into the sparse gradient to obtain the noised gradient, wherein the noise follows a multi-dimensional Gaussian distribution with zero mean and a covariance determined by the adaptive noise adjustment factor and the square of the gradient clipping threshold C, times the d-dimensional identity matrix I_d.
  9. The federated learning method based on importance gradient updating and adaptive differential privacy control according to claim 7, wherein the local model parameters are updated with the local model learning rate; after the local model update, all clients participating in the current round of training upload their local model parameters to the central server, and the central server performs a weighted average according to each client's local training data set to obtain the global model parameters of the next round: w_global = Σ_k (n_k / N) · w_k, wherein n_k is the number of samples in the local data set owned by client k, N is the total number of samples of the clients participating in training, and w_k is the local model parameters uploaded by client k; the central server transmits the updated global model parameters to all clients and enters the next training round.
  10. The federated learning method based on importance gradient updating and adaptive differential privacy control according to claim 9, wherein the scaling parameter is divided into three regimes according to its degree of influence on the adaptive noise adjustment factor: when the scaling parameter is small, the influence of the average gradient strength metric on the noise adjustment factor is weakened, the adaptive noise strength changes smoothly, the noise decays slowly, and strong privacy protection is maintained throughout training; when the scaling parameter is moderate, the average gradient strength metric and the noise adjustment factor reach a balanced relation, achieving a compromise between privacy protection strength and model convergence performance; and when the scaling parameter is large, the influence of the average gradient strength metric on the noise adjustment factor is strengthened, the noise intensity decays faster as training progresses, which benefits fine optimization of the model in the later stages.
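The clipping and importance-guided sparsification steps of claims 4-7 can be sketched as follows. This is a minimal illustrative sketch, not the patent's exact formulation: the function names, the test values, and the use of `|g_i|` as the importance statistic with a top-fraction threshold are assumptions reconstructed from the claim text.

```python
# Illustrative sketch of claims 4-7: L2-norm clipping, then importance-guided
# sparsification by keeping the top fraction of gradient components.
import numpy as np

def clip_gradient(g, C):
    """L2-norm clipping: g_clip = g / max(1, ||g||_2 / C)."""
    norm = np.linalg.norm(g)
    return g / max(1.0, norm / C)

def importance_sparsify(g, sparsity_ratio):
    """Keep only the components whose magnitude |g_i| is among the top
    sparsity_ratio fraction; zero out the rest via an element-wise mask."""
    importance = np.abs(g)                       # per-dimension importance s_i
    order = np.sort(importance)[::-1]            # descending "distribution vector"
    k = max(1, int(np.ceil(sparsity_ratio * g.size)))
    threshold = order[k - 1]                     # value at the k-th sorted position
    mask = (importance >= threshold).astype(g.dtype)
    return mask * g                              # element-wise multiplication

g = np.array([0.05, -2.0, 0.01, 1.5, -0.3])
g_clipped = clip_gradient(g, C=1.0)              # norm of g exceeds 1, so it is rescaled
sparse = importance_sparsify(g_clipped, sparsity_ratio=0.4)  # keeps 2 of 5 components
```

In this toy run the two largest-magnitude components (originally -2.0 and 1.5) survive and the near-zero components are zeroed, which is the behavior the claims describe.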
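Claim 8's adaptive noise mechanism could look like the sketch below. The claim text names only the ingredients (average gradient strength metric, a scaling parameter, the hyperbolic tangent, Gaussian noise scaled by the clipping threshold), so the exact functional form here, the `sigma` multiplier, and the function names are assumptions.

```python
# Hedged sketch of claim 8: adaptive Gaussian noise injection. The tanh form
# of the adjustment factor and the presence of a base multiplier sigma are
# assumptions; the claim specifies only the ingredients, not the formula.
import numpy as np

def adaptive_noise_factor(strength_metrics, beta):
    """Average the participating clients' gradient-strength metrics, then
    squash with tanh; beta is the scaling parameter of claim 8/10."""
    avg_strength = float(np.mean(strength_metrics))
    return float(np.tanh(beta * avg_strength))

def add_gaussian_noise(sparse_grad, C, sigma, factor, rng):
    """Inject zero-mean Gaussian noise whose standard deviation scales with
    the clipping threshold C and the adaptive factor."""
    std = sigma * C * factor
    return sparse_grad + rng.normal(0.0, std, size=sparse_grad.shape)

rng = np.random.default_rng(0)
factor = adaptive_noise_factor([0.8, 1.2, 1.0], beta=0.5)  # tanh(0.5) ≈ 0.46
noisy = add_gaussian_noise(np.array([0.0, 1.0, 0.0]), C=1.0, sigma=1.0,
                           factor=factor, rng=rng)
```

Because tanh is monotone, a shrinking average gradient strength late in training shrinks the factor and hence the noise, which matches the behavior claim 10 attributes to larger scaling parameters.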
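The server-side aggregation of claim 9 is a dataset-size-weighted average in the style of FedAvg; a minimal sketch (variable names assumed):

```python
# Sketch of claim 9: weighted aggregation of client parameters, each client
# weighted by its local dataset size n_k / N.
import numpy as np

def weighted_aggregate(client_params, client_sizes):
    """Return sum_k (n_k / N) * w_k over participating clients."""
    total = sum(client_sizes)
    return sum((n / total) * p for n, p in zip(client_sizes, client_params))

params = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
new_global = weighted_aggregate(params, client_sizes=[100, 300])
# weights are 0.25 and 0.75, giving [2.5, 3.5]
```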

Description

Federated learning method based on importance gradient updating and adaptive differential privacy control

Technical Field

The invention relates to the technical field of distributed machine learning and privacy-preserving computation, in particular to a federated learning method based on importance gradient updating and adaptive differential privacy control.

Background

With the rapid development of artificial intelligence technology and the Internet of Things, a great amount of high-value data is continuously generated during operation by user-side devices such as sensors, wearable devices, intelligent terminals, and intelligent transportation systems. In conventional machine learning and deep learning frameworks, model training typically relies on uploading the data sets scattered across terminals to a central server for unified processing and modeling. However, this centralized approach not only incurs high communication cost but also easily causes serious privacy-disclosure risks during data transmission and centralized storage, and it struggles to meet increasingly strict data security and privacy compliance requirements. Federated learning (FL), as a novel distributed machine learning paradigm, provides an effective approach to the problems of data silos and privacy protection by enabling multi-party collaborative modeling without sharing the original data. In the federated learning framework, a central server coordinates the training process; each client trains a model only locally with its private data and uploads model update information to the server, realizing a distributed learning mode in which the data never leaves its domain. This mode has seen preliminary application in scenarios such as input-method prediction and recommendation systems.
However, although federated learning has certain advantages in privacy protection, gradient or model parameter information is inevitably exposed during model updating, and an attacker may recover a user's sensitive data from the model updates through attacks such as gradient inversion and membership inference. To further enhance privacy-preserving capability, differential privacy (DP) was introduced into the federated learning framework: strict privacy guarantees are provided by injecting random noise during model updates to limit the influence of individual samples on the model output. Most existing differentially private federated learning methods adopt a fixed gradient clipping threshold and a fixed noise intensity, injecting noise uniformly across all model parameter dimensions. However, during actual training the model gradients often exhibit markedly non-uniform distributions, with a large number of gradient components concentrated near zero and only a small number of key gradients contributing most to model convergence. Clipping and noising all parameter dimensions simultaneously therefore injects a large amount of noise into parameters that contribute little to the model update, wasting the privacy budget and significantly reducing the convergence speed and final accuracy of the model. In the prior art, several invention patents have explored the privacy protection problem in federated learning.
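The point about non-uniform gradient distributions can be made concrete with a toy experiment (not from the patent; the dimensions, scales, and noise level are illustrative assumptions): when most components are near zero, uniform per-dimension noise overwhelms them while the few large components still carry the signal.

```python
# Toy illustration: uniform Gaussian noise on a gradient where only a few
# components are large. Signal-to-noise is fine on the important components
# but collapses on the near-zero ones, where the noise is effectively wasted.
import numpy as np

rng = np.random.default_rng(1)
d = 1000
grad = np.zeros(d)
grad[:10] = rng.normal(0, 1.0, 10)        # a handful of "important" components
grad[10:] = rng.normal(0, 0.001, d - 10)  # the vast majority are near zero

noise = rng.normal(0, 0.1, d)             # uniform noise on every dimension

snr_important = np.linalg.norm(grad[:10]) / np.linalg.norm(noise[:10])
snr_rest = np.linalg.norm(grad[10:]) / np.linalg.norm(noise[10:])
# snr_rest is far below 1: the noise spent on near-zero dimensions perturbs
# components that barely contribute to convergence.
```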
The invention patent with application publication number CN113591145A discloses a federated learning global model training method based on differential privacy and quantization, which reduces the risk of privacy leakage by quantizing model updates and injecting noise; however, it introduces noise uniformly across all model parameter dimensions and does not consider the difference in gradient importance across dimensions, easily degrading model utility. The invention patent with application publication number CN111091199A discloses a federated learning method based on differential privacy, which satisfies the differential privacy constraint by introducing random noise of fixed strength during model updating; however, its noise parameters remain unchanged throughout training, making it difficult to adapt to the dynamic changes of gradient magnitude in different stages of model training and thereby affecting model convergence. The invention patent with application publication number CN112232528A provides a federated learning model training method that realizes privacy protection by uniformly clipping and perturbing model parameters; however, it does not distinguish the contribution of different gradient components to the model update, so the problems of unevenly allocated privacy budget and excessive perturbation of key gradients remain. In addition, the patent of the invention with the a