CN-121981205-A - Sparse federated learning method and system based on shuffle differential privacy
Abstract
The application relates to the technical field of federated learning, in particular to a sparse federated learning method and system based on shuffle differential privacy. The method comprises the following steps: S1, a central server module initializes a global model; S2, each selected client module trains a local model based on the global model; S3, each selected client module selects K parameter values carrying important information from the clipped parameter vector and performs filling processing with Gaussian noise or Laplace noise; S4, each selected client module converts the parameter vector into a secret parameter vector and transmits it to a shuffler module, and the shuffler module randomly permutes all secret parameter vectors; S5, the central server module recovers the parameter vectors, aggregates all of them, updates the global model, and returns to S1 until the global model converges. The application can protect the parameter vector and reduce its dimension.
Inventors
- TIAN JIAO
- ZHANG KUO
Assignees
- Xinjiang University (新疆大学)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-01-09
Claims (8)
- 1. A sparse federated learning method based on shuffle differential privacy, characterized by comprising the following steps: S1, a central server module initializes a global model, and the central server module randomly selects a plurality of client modules from all the client modules; S2, the central server module issues the global model to each selected client module, and each selected client module trains to obtain a local model based on the global model, using its own training data set; S3, after training to obtain the local model, each selected client module clips the parameter vector, selects K parameter values carrying important information from the parameter vector to form an intermediate parameter vector, and fills the intermediate parameter vector with Gaussian noise or Laplace noise to obtain a new parameter vector; S4, each selected client module converts the new parameter vector into a secret parameter vector and transmits the secret parameter vector to a shuffler module, and the shuffler module randomly permutes all secret parameter vectors and sends a secret parameter vector list to the central server module; S5, the central server module recovers the corresponding parameter vector from each secret parameter vector, aggregates all the parameter vectors, and updates the global model; before the global model converges, the method returns to S1.
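Steps S1–S5 of claim 1 describe one communication round. The loop can be sketched as a minimal, hypothetical simulation; local training is stubbed as a small random step, and the noise addition, clipping, and secret conversion of the full protocol are omitted here, so all names and the toy update are illustrative assumptions rather than the patent's method:

```python
import random
import numpy as np

rng = np.random.default_rng(0)

def local_train(global_model: np.ndarray) -> np.ndarray:
    # Stub for S2: a client's local update (here just a small random step).
    return global_model + rng.normal(0.0, 0.01, size=global_model.shape)

def federated_round(global_model: np.ndarray,
                    num_clients: int = 10, selected: int = 4) -> np.ndarray:
    # S1: the server randomly selects a subset of the client modules.
    chosen = random.sample(range(num_clients), selected)
    # S2-S4: each selected client trains and submits a parameter vector;
    # the shuffler module randomly permutes the submissions, so the
    # server cannot link a received vector back to a particular client.
    submissions = [local_train(global_model) for _ in chosen]
    random.shuffle(submissions)  # shuffler module
    # S5: the server aggregates (plain average here) and updates the model.
    return np.mean(submissions, axis=0)

model = np.zeros(8)
for _ in range(3):  # the real protocol repeats until convergence
    model = federated_round(model)
```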
- 2. The method of claim 1, wherein selecting K parameter values carrying important information from the parameter vector comprises: calculating the absolute value of each parameter value in the parameter vector; sorting all the absolute values from large to small to obtain an absolute-value sequence; selecting the first K absolute values from the sequence; and taking the K parameter values in the parameter vector corresponding to those K absolute values as the K parameter values carrying important information.
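The selection in claim 2, combined with the noise filling of step S3, can be sketched with NumPy as follows. The noise scale is an arbitrary placeholder, not the patent's calibrated differential-privacy value, and the choice of filling the non-selected positions (rather than appending noise) is an assumption:

```python
import numpy as np

def sparsify(params: np.ndarray, k: int, noise_scale: float = 0.1,
             rng=np.random.default_rng(0)) -> np.ndarray:
    # Sort absolute values from large to small and keep the first K indices.
    top_idx = np.argsort(np.abs(params))[::-1][:k]
    # Fill every position with Gaussian noise, then restore the K
    # parameter values carrying important information.
    out = rng.normal(0.0, noise_scale, size=params.shape)
    out[top_idx] = params[top_idx]
    return out

v = np.array([0.05, -2.0, 0.3, 1.1, -0.01])
s = sparsify(v, k=2)
# The two largest-magnitude entries (-2.0 and 1.1) survive unchanged.
```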
- 3. The method according to claim 2, characterized in that K is obtained by: S21, calculating the mean and variance of the parameter vector, which obeys a Gaussian distribution, and obtaining a threshold through a formula [formula not reproduced in the translation] parameterized by the mean, the variance, and the sampling proportion of the client module; S22, obtaining a candidate value through a formula [formula not reproduced] parameterized by the total number of parameters of the local model and the sparsification ratio, and repeatedly performing the correction process three times to obtain K.
- 4. The method of claim 3, wherein the correction process comprises: in the case that [condition not reproduced in the translation] holds, enlarging [the value] to 1.5 times; and in the case that [the complementary condition, not reproduced] holds, reducing it to 0.5 times.
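The formulas in claims 3–4 are omitted from the translation, so the sketch below is a speculative reconstruction under stated assumptions: the threshold is taken as a Gaussian quantile of the parameter distribution, the target is K ≈ d·s for total parameter count d and sparsification ratio s, and the correction multiplies the threshold by 1.5 or 0.5 over three iterations. Every name and both formulas are guesses, not the patent's equations:

```python
import numpy as np
from statistics import NormalDist

def dynamic_k(params: np.ndarray, sample_ratio: float, sparsity: float) -> int:
    d = params.size
    target = int(d * sparsity)  # assumed target from claim 3, S22
    mu, sigma = float(params.mean()), float(params.std())
    # Assumed threshold: a Gaussian quantile of the parameter distribution,
    # scaled by the client-module sampling proportion (claim 3, S21).
    t = abs(mu + sigma * NormalDist().inv_cdf(1.0 - sparsity * sample_ratio))
    for _ in range(3):  # correction process repeated three times (claim 4)
        k = int(np.sum(np.abs(params) >= t))
        if k > target:
            t *= 1.5  # too many survivors: enlarge the threshold
        elif k < target:
            t *= 0.5  # too few survivors: reduce the threshold
    return int(np.sum(np.abs(params) >= t))
```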
- 5. The method according to claim 1, characterized in that converting the new parameter vector into a secret parameter vector comprises the following steps: S41, splitting the new parameter vector into a plurality of data groups, assigning each data group an ID value increasing sequentially by 1 with the minimum ID value being 1, and acquiring preset data; S42, for the data group corresponding to the minimum ID value, performing secret processing on the preset data to obtain secret processing result data, and performing secret processing on the data group and the secret processing result data to obtain a secret data group; S43, for each data group corresponding to an ID value between the minimum ID value and the maximum ID value, performing secret processing on the preset data and the data group corresponding to the previous ID value to obtain transition data, performing secret processing on the transition data to obtain secret processing result data, and then performing secret processing on the data group and the secret processing result data to obtain a secret data group; S44, for the data group corresponding to the maximum ID value, performing secret processing on the preset data and the data group corresponding to the previous ID value to obtain transition data, performing secret processing on the transition data to obtain secret processing result data, and, in the case that the data length of the secret processing result data is larger than that of the data group, performing secret processing on the data group and the matching-length part of the secret processing result data to obtain a secret data group; S45, concatenating the secret data groups in order of increasing ID value to obtain the secret parameter vector.
- 6. The method of claim 5, wherein recovering the corresponding parameter vector from each secret parameter vector comprises the following steps: S51, splitting the secret parameter vector into a plurality of secret data groups, assigning each secret data group an ID value increasing sequentially by 1 with the minimum ID value being 1, and acquiring the preset data; S52, for the secret data group corresponding to the minimum ID value, performing secret processing on the preset data to obtain secret processing result data, and performing secret processing on the secret data group and the secret processing result data to obtain a recovered data group; S53, for each secret data group corresponding to an ID value between the minimum ID value and the maximum ID value, performing secret processing on the preset data and the recovered data group corresponding to the previous ID value to obtain transition data, performing secret processing on the transition data to obtain secret processing result data, and then performing secret processing on the secret data group and the secret processing result data to obtain a recovered data group; S54, for the secret data group corresponding to the maximum ID value, performing secret processing on the preset data and the recovered data group corresponding to the previous ID value to obtain transition data, performing secret processing on the transition data to obtain secret processing result data, and, in the case that the data length of the secret processing result data is larger than that of the secret data group, performing secret processing on the secret data group and the matching-length part of the secret processing result data to obtain a recovered data group; S55, concatenating the recovered data groups in order of increasing ID value to obtain the parameter vector.
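Claims 5–6 describe a chained, invertible "secret processing" over fixed-size data groups, but the patent does not name the primitive. The sketch below assumes one concrete instantiation: a SHA-256 digest of the preset data chained with the previous group serves as a keystream that is XORed with each group. Recovery mirrors the forward pass, chaining on the recovered groups as in claim 6. The block size, hash choice, and XOR construction are illustrative assumptions:

```python
import hashlib

BLOCK = 32  # assumed group size in bytes (one SHA-256 digest)

def _result_data(preset: bytes, prev_group: bytes) -> bytes:
    # "Secret processing": hash the preset data chained with the previous
    # data group (empty for the group with the minimum ID value).
    return hashlib.sha256(preset + prev_group).digest()

def to_secret(vector: bytes, preset: bytes) -> bytes:
    groups = [vector[i:i + BLOCK] for i in range(0, len(vector), BLOCK)]
    secret, prev = [], b""
    for g in groups:
        # Use only part of the result data when it is longer than the
        # (final, shorter) data group, as in S44.
        ks = _result_data(preset, prev)[:len(g)]
        secret.append(bytes(a ^ b for a, b in zip(g, ks)))
        prev = g
    return b"".join(secret)  # S45: concatenate in increasing-ID order

def from_secret(secret: bytes, preset: bytes) -> bytes:
    groups = [secret[i:i + BLOCK] for i in range(0, len(secret), BLOCK)]
    plain, prev = [], b""
    for c in groups:
        ks = _result_data(preset, prev)[:len(c)]
        p = bytes(a ^ b for a, b in zip(c, ks))
        plain.append(p)
        prev = p  # claim 6 chains on the *recovered* data group
    return b"".join(plain)
```

Because XOR is its own inverse and the keystream for each group depends only on the preset data and the previous plaintext group, the server can recover the vector group by group without any per-client state.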
- 7. The method of claim 1, wherein aggregating all of the parameter vectors and updating the global model comprises: first calculating the parameter momentum of the global model through a formula [formula not reproduced in the translation], parameterized by a preset momentum coefficient, the round number, the parameter momentum of the previous round of the global model (initialized to 0), and the average parameter vector computed from all parameter vectors of the current round; and then updating the global model according to a formula [formula not reproduced], parameterized by the parameter vector of the current round of the global model, the parameter vector of the previous round, and the initial parameter vector of the global model.
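Claim 7's formulas are omitted from the translation. A common server-side momentum rule that is consistent with the quantities the claim does describe (a preset momentum coefficient, a previous-round momentum initialized to 0, and the round's average parameter vector) is a FedAvgM-style update, shown here as an assumed reconstruction rather than the patent's actual equations:

```python
import numpy as np

def momentum_update(w_prev: np.ndarray, w_avg: np.ndarray,
                    m_prev: np.ndarray, beta: float = 0.9):
    # Assumed rule: momentum accumulates the round-over-round change of
    # the averaged parameters, then the global model takes a momentum step.
    m = beta * m_prev + (w_avg - w_prev)
    w = w_prev + m
    return w, m

w = np.zeros(4)                           # initial global parameter vector
m = np.zeros(4)                           # parameter momentum, initialized to 0
w_avg = np.array([1.0, -1.0, 0.5, 0.0])   # round's average parameter vector
w, m = momentum_update(w, w_avg, m)
```

In the first round (momentum zero) the rule coincides with plain averaging; in later rounds the accumulated momentum smooths the noise that sparsification and differential-privacy padding inject into each round's average.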
- 8. A sparse federated learning system based on shuffle differential privacy, to which the sparse federated learning method based on shuffle differential privacy of any one of claims 1-7 is applied, comprising the following modules: a central server module, used for initializing the global model, randomly selecting a plurality of client modules from all the client modules, issuing the global model to each selected client module, recovering the corresponding parameter vector from each secret parameter vector, aggregating all the parameter vectors, and updating the global model; a shuffler module, used for randomly permuting all the received secret parameter vectors and sending a secret parameter vector list to the central server module; and a client module, used for training on its own training data set, based on the received global model, to obtain a local model, clipping the parameter vector after training, selecting K parameter values carrying important information from the parameter vector to form an intermediate parameter vector, filling the intermediate parameter vector with Gaussian noise or Laplace noise to obtain a new parameter vector, and converting the new parameter vector into a secret parameter vector to be transmitted to the shuffler module.
Description
Sparse federated learning method and system based on shuffle differential privacy
Technical Field
The application relates to the technical field of federated learning, in particular to a sparse federated learning method and system based on shuffle differential privacy.
Background
In the medical industry, the medical levels of hospitals in different areas differ. In training a disease-risk prediction model, village and town hospitals can optimize a local prediction model by means of the high-quality case data of top-tier (Grade III-A) hospitals through federated learning, on the premise of protecting patient privacy. Likewise, in the field of automatic driving, vehicles of different brands can share driving-data gradients through federated learning while protecting user privacy, optimizing a shared automatic-driving model.
With the further development of the technology, a problem has emerged: a parameter vector sent by a client can be intercepted and the private information reverse-engineered from it. Shuffle differential privacy can enhance privacy protection. Meanwhile, as models grow more complex, the data volume of the model parameters increases rapidly and the communication load rises; gradient sparsification can reduce the dimension of the parameter vector. However, introducing gradient sparsification and differential-privacy noise can damage the convergence of the federated learning model and destabilize the updating of the global model.
Disclosure of Invention
The application provides a dynamic threshold selection method to replace the top-k algorithm in the federated learning framework, provides a secret mechanism to convert a parameter vector into a secret parameter vector, and further provides an aggregation mechanism that introduces momentum.
The application provides a sparse federated learning method based on shuffle differential privacy, which comprises the following steps: S1, a central server module initializes a global model, and the central server module randomly selects a plurality of client modules from all the client modules; S2, the central server module issues the global model to each selected client module, and each selected client module trains to obtain a local model based on the global model, using its own training data set; S3, after training to obtain the local model, each selected client module clips the parameter vector, selects K parameter values carrying important information from the parameter vector to form an intermediate parameter vector, and fills the intermediate parameter vector with Gaussian noise or Laplace noise to obtain a new parameter vector; S4, each selected client module converts the new parameter vector into a secret parameter vector and transmits the secret parameter vector to a shuffler module, and the shuffler module randomly permutes all secret parameter vectors and sends a secret parameter vector list to the central server module; S5, the central server module recovers the corresponding parameter vector from each secret parameter vector, aggregates all the parameter vectors, and updates the global model; before the global model converges, the method returns to S1.
The application also provides a sparse federated learning system based on shuffle differential privacy, which comprises the following modules: a central server module, used for initializing the global model, randomly selecting a plurality of client modules from all the client modules, issuing the global model to each selected client module, recovering the corresponding parameter vector from each secret parameter vector, aggregating all the parameter vectors, and updating the global model; a shuffler module, used for randomly permuting all the received secret parameter vectors and sending a secret parameter vector list to the central server module; and a client module, used for training on its own training data set, based on the received global model, to obtain a local model, clipping the parameter vector after training, selecting K parameter values carrying important information from the parameter vector to form an intermediate parameter vector, filling the intermediate parameter vector with Gaussian noise or Laplace noise to obtain a new parameter vector, and converting the new parameter vector into a secret parameter vector to be transmitted to the shuffler module. Compared with the prior art, the application has the following beneficial effects. In the technical scheme provided by the application, first, the central server module initializes a global model and randomly selects a plurality of client modules from all the client modules. Next, the central server module issues the global model to each selected client module, and each selected client module trains to obtain a local model based on the global model.