CN-121638387-B - Federal damping type forgetting method and device, electronic equipment and storage medium
Abstract
The invention belongs to the technical field of federated learning, and in particular relates to a federal damping type forgetting method and device, an electronic device, and a storage medium. The method comprises: obtaining a pre-trained global model, the pre-trained global model being obtained by training an initial global model with the clients' local data sets under a federated learning method; constructing a forgetting class set and a reserved class set from the selected target forgetting classes; computing the Fisher information matrices of all parameters of the pre-trained global model on the forgetting class set and on the reserved class set, respectively; screening the model parameters to be weakened and weakening them; updating the pre-trained global model with the weakened parameters; and post-training the updated model on the image samples of the reserved class set to obtain the target global model. The invention solves two problems of current federated pruning type forgetting learning: the performance of the model on the reserved classes is greatly damaged, and the performance of the model recovers too slowly.
Inventors
- Gao Longxiang
- Zhu Xunxiang
- Qu Youyang
- Gu Shujun
- Ge Shuxin
- Wang Changwei
- Zhou Wanlei
Assignees
- Qilu University of Technology (Shandong Academy of Sciences)
- Shandong Computer Science Center (National Supercomputer Center in Jinan)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-02-05
Claims (8)
- 1. A federal damping type forgetting method applied to computer vision, speech processing, natural language processing, and large language models, characterized in that the method comprises: S1, acquiring a pre-trained global model, the pre-trained global model being obtained by training a given initial global model with the clients' local data sets under a federated learning method, each local data set being partitioned from a given training data set and comprising image samples of several different classes, with the image samples of the same class in each client's local data set grouped into a data subset of that class on the client; S2, randomly selecting one or more classes of the training data set as target forgetting classes and taking the remaining classes as reserved classes, grouping the data subsets of the target forgetting classes across all clients into a forgetting class set and the data subsets of the reserved classes into a reserved class set, computing the Fisher information matrices of all parameters of the pre-trained global model on the forgetting class set and on the reserved class set respectively, screening the model parameters to be weakened, weakening them, and updating the pre-trained global model with the weakened parameters; S3, post-training the updated pre-trained global model on the image samples in the reserved class set of the local data sets to obtain the target global model; in step S2, computing the Fisher information matrices of all parameters of the pre-trained global model on the forgetting class set and on the reserved class set respectively specifically includes: computing, for each client, the Fisher information matrices of all parameters of the pre-trained global model on every data subset of the client's local data set, these matrices comprising the Fisher information matrices of the data subsets of the target forgetting classes and those of the data subsets of the reserved classes; aggregating, over all clients, the Fisher information matrices of the data subsets of the same class to obtain, for all parameters of the pre-trained global model, a per-class global Fisher information matrix, covering both the target forgetting classes and the reserved classes; performing inter-class aggregation on the global Fisher information matrices of the target forgetting classes to obtain a first target importance matrix, and performing inter-class aggregation on the global Fisher information matrices of the reserved classes to obtain a second target importance matrix; step S2 further comprises screening the model parameters to be weakened from the first target importance matrix and the second target importance matrix according to formula (6): Θ_w = { θ_i | [F]_i > α·[R]_i } (6), where Θ_w denotes the screened set of model parameters to be weakened; [F]_i denotes the i-th component of the first target importance matrix, corresponding to the target forgetting classes; [R]_i denotes the i-th component of the second target importance matrix, corresponding to the reserved classes; the two components measure the importance of parameter θ_i of the pre-trained global model to the target forgetting classes and to the reserved classes, respectively; and α is the preset screening parameter; and/or, weakening the screened model parameters with a weakening coefficient according to formulas (7) and (8): β_i = min( λ·[R]_i / [F]_i , 1 ) (7), θ_i' = β_i·θ_i (8), where β_i denotes the weakening coefficient of parameter θ_i of the pre-trained global model; λ is a hyperparameter controlling the degree of weakening; θ_i is a parameter of the pre-trained global model; and θ_i' is that parameter after the weakening operation is performed.
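The screening and weakening steps of claim 1 (formulas (6)-(8)) can be sketched as follows. This is a minimal illustration, assuming the Foster-style selective dampening the description cites; the function name and the default values of `alpha` and `lam` are illustrative, not taken from the patent.

```python
import numpy as np

def damp_parameters(theta, imp_forget, imp_retain, alpha=1.0, lam=1.0):
    """Sketch of the screening and weakening in claim 1.

    - Screening (formula (6)): select the parameters whose importance to
      the forgetting classes exceeds alpha times their importance to the
      reserved classes.
    - Weakening (formulas (7)-(8)): scale each selected parameter by
      beta = min(lam * imp_retain / imp_forget, 1), so parameters that
      matter mostly to the forgetting classes are damped the hardest.
    """
    selected = imp_forget > alpha * imp_retain
    beta = np.minimum(lam * imp_retain / np.maximum(imp_forget, 1e-12), 1.0)
    theta_new = np.where(selected, beta * theta, theta)
    return theta_new, selected
```

For example, with parameters `[1.0, 1.0]`, forgetting importances `[4.0, 0.1]`, and retention importances `[1.0, 1.0]`, only the first parameter is screened, and it is scaled by beta = 0.25 while the second is left untouched.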
- 2. The federal damping type forgetting method according to claim 1, wherein in S2 the Fisher information matrices of all parameters of the pre-trained global model on every data subset of each client's local data set are computed according to formula (2): [F_{k,c}]_i = (1/J_{k,c}) Σ_{j=1..J_{k,c}} ( ∂/∂θ_i [ (1/|B_{k,c,j}|) Σ_{x∈B_{k,c,j}} ℓ(θ, x) ] )² (2), where [F_{k,c}]_i denotes the i-th component of the Fisher information matrix of client k for class c, and measures the importance of parameter θ_i of the pre-trained global model to client k's data subset of class c; ℓ(θ, x) denotes the loss of the pre-trained global model on a single image sample x; B_{k,c,j} is the j-th batch of client k's data subset for class c, and |B_{k,c,j}| is the number of image samples it contains; the inner sum is the total loss of all data in the current batch, which divided by |B_{k,c,j}| gives the average loss of the current batch; ∂/∂θ_i of that average loss is its first partial derivative with respect to the model parameter θ_i; the square denotes an element-wise (size-adapted) squaring operation; J_{k,c} is the total number of batches of client k for the data subset of class c; the results of all batches of client k's data subset of class c are summed and divided by the total number of batches to obtain the average.
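The per-client, per-class Fisher computation of claim 2 amounts to averaging, over the client's batches for one class, the element-wise square of the gradient of the batch-mean loss. A minimal numpy sketch, using an illustrative analytic loss gradient in place of the real model's autograd (the quadratic loss is an assumption for the demo only):

```python
import numpy as np

def client_class_fisher(theta, batches, grad_loss):
    """Diagonal Fisher estimate for one client's data subset of one class
    (sketch of formula (2)): for each batch, take the gradient of the
    batch-mean loss w.r.t. the parameters, square it element-wise, then
    average over the total number of batches."""
    fisher = np.zeros_like(theta)
    for batch in batches:                    # j-th batch B_j
        grads = [grad_loss(theta, x) for x in batch]
        g = np.mean(grads, axis=0)           # gradient of the batch-mean loss
        fisher += g ** 2                     # element-wise square
    return fisher / len(batches)             # average over all batches

# Illustrative quadratic loss l(theta, x) = 0.5 * ||theta - x||^2,
# whose gradient is simply theta - x.
def toy_grad(theta, x):
    return theta - x
```

In practice `grad_loss` would be supplied by the training framework's automatic differentiation rather than an analytic formula.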
- 3. The federal damping type forgetting method according to claim 1, wherein in S2 the Fisher information matrices of the data subsets of the same class in all clients are aggregated to obtain the global Fisher information matrices of all parameters of the pre-trained global model, specifically according to formula (3): F_c = Σ_{k=1..K} ( n_{k,c} / Σ_{k'=1..K} n_{k',c} ) F_{k,c} (3), where F_c denotes the global Fisher information matrix of class c obtained by aggregation; K denotes the total number of clients; n_{k,c} denotes the number of image samples of class c in the local data set uploaded by client k; and F_{k,c} is the Fisher information matrix of class c uploaded by client k.
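The server-side aggregation of claim 3 is a sample-count-weighted average of the per-client Fisher matrices for one class; a short sketch (names are illustrative):

```python
import numpy as np

def aggregate_class_fisher(client_fishers, client_counts):
    """Server-side aggregation for one class c (sketch of formula (3)):
    weight each client's uploaded Fisher matrix by its share of the
    class-c samples, then sum over all K clients."""
    total = float(sum(client_counts))
    return sum((n / total) * F for F, n in zip(client_fishers, client_counts))
```

For example, two clients uploading Fisher matrices `[2, 0]` and `[0, 2]` with 1 and 3 class-c samples respectively aggregate to `0.25*[2,0] + 0.75*[0,2] = [0.5, 1.5]`.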
- 4. The federal damping type forgetting method according to claim 1, wherein in S2 the inter-class aggregation of the global Fisher information matrices of the target forgetting classes is performed specifically according to formula (4): [F]_i = max_{j=1..m} [F_{f_j}]_i (4), where [F]_i denotes the i-th component of the first target importance matrix corresponding to the target forgetting classes, and measures the importance of parameter θ_i of the pre-trained global model to the target forgetting classes; [F_{f_j}]_i denotes the i-th component of the aggregated global Fisher information matrix of the j-th target forgetting class, with j ranging from 1 to m; that is, the maximum is selected from the i-th components of the global Fisher information matrices of the m target forgetting classes, and these maxima constitute the first target importance matrix.
- 5. The federal damping type forgetting method according to claim 1, wherein in S2 the inter-class aggregation of the global Fisher information matrices of the reserved classes is performed specifically according to formula (5): [R]_i = Σ_{c∈C_r} ( N_c / Σ_{c'∈C_r} N_{c'} ) [F_c]_i (5), where [R]_i denotes the i-th component of the second target importance matrix corresponding to the reserved classes; F_c denotes the aggregated global Fisher information matrix of class c; N_c denotes the total number of image samples of class c across all clients, i.e. the sum over clients of the per-client sample counts of class c; and C_r denotes the set of reserved classes, over which the calculated results are accumulated and summed.
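The two inter-class aggregations of claims 4 and 5 can be sketched together: an element-wise maximum over the forgetting classes, and, under the assumption that the accumulation over the reserved classes is weighted by per-class sample counts (consistent with the weighting in claim 3), a weighted sum over the reserved classes. Function names are illustrative.

```python
import numpy as np

def forget_importance(forget_class_fishers):
    """First target importance matrix (sketch of formula (4)): for each
    parameter, take the maximum over the m forgetting classes' global
    Fisher components."""
    return np.max(np.stack(forget_class_fishers), axis=0)

def retain_importance(retain_class_fishers, class_counts):
    """Second target importance matrix (sketch of formula (5), assuming a
    sample-count-weighted accumulation over the reserved classes)."""
    total = float(sum(class_counts))
    return sum((n / total) * F for F, n in zip(retain_class_fishers, class_counts))
```

The max in formula (4) keeps a parameter marked as important if it matters to any forgetting class, while the weighted sum in formula (5) reflects how much the parameter matters to the reserved data as a whole.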
- 6. An apparatus implementing the federal damping type forgetting method according to any one of claims 1 to 5, the apparatus comprising: a model acquisition module for acquiring a pre-trained global model, the pre-trained global model being obtained by training a given initial global model with the clients' local data sets under a federated learning method, each local data set being partitioned from a given training data set and comprising image samples of several different classes, with the image samples of the same class in each client's local data set grouped into a data subset of the corresponding class on the client; a model parameter weakening module for randomly selecting one or more classes of the training data set as target forgetting classes and taking the remaining classes as reserved classes, grouping the data subsets of the target forgetting classes across all clients into a forgetting class set and the data subsets of the reserved classes into a reserved class set, computing the Fisher information matrices of all parameters of the pre-trained global model on the forgetting class set and on the reserved class set respectively, screening the model parameters to be weakened, weakening them, and updating the pre-trained global model with the weakened parameters; and a model post-training module for post-training the updated pre-trained global model on the image samples in the reserved class set of the local data sets to obtain the target global model.
- 7. An electronic device, comprising: at least one processor; and a memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the federal damping type forgetting method of any one of claims 1 to 5.
- 8. A machine-readable storage medium having stored thereon executable instructions that, when executed, cause the machine to perform the federal damping type forgetting method of any one of claims 1 to 5.
Description
Federal damping type forgetting method and device, electronic equipment and storage medium

Technical Field

The invention belongs to the technical field of federated learning, and in particular relates to a federal damping type forgetting method and device, an electronic device, and a storage medium.

Background

Federated learning is a distributed machine learning technique for privacy protection. It involves several clients and a server; the clients help the server train a global model by sharing valuable information (e.g., gradients or model parameters trained on their local data sets) without sharing the local private data themselves, whereas traditional centralized machine learning gathers all private data sets on a server for training. Although federated learning protects data privacy in this way, the model trained on the server may still leak information about the local data; that is, the model may indirectly memorize users' private data. Because of privacy requirements and the right to be forgotten, some sample information needs to be removed from the model, such as the samples of a corresponding class or certain backdoor samples. The most straightforward and thorough approach is to retrain the model from scratch; although this guarantees that the data is completely deleted, it is impractical due to excessive communication and computation costs. Machine forgetting techniques have therefore been developed, which mainly study how to erase the effect of the target data from the model while avoiding the excessive overhead of retraining from scratch. The core goal of machine forgetting is to remove the knowledge of specific data from the trained model without retraining it from scratch, while minimizing the impact on the overall performance of the model.
In the federated environment, there are three main categories of forgetting, namely class forgetting, client forgetting, and sample-level forgetting; current research focuses mainly on client forgetting and sample forgetting and neglects class forgetting. In the field of federated class forgetting, researchers have proposed a variety of technical paths. For example, Wang et al. use channel pruning techniques to forget the data of the corresponding class, and Zhao et al. adjust model weights through a momentum decay mechanism to achieve forgetting of clients or classes. Yasser H. Khalil et al. propose a method that achieves federated forgetting by weight reversal. Furthermore, Chinese patent document CN119849601A discloses a federated class forgetting learning method based on noise adversarial training, which comprises four steps: noise adversarial training, calculating the influence degree of the target data, forgetting training, and retraining. The noise adversarial training destroys the model's classification ability by training noise data through error maximization and knowledge distillation; the influence degree of the target data is calculated from the gradient of the target data on the global model; the forgetting training merges the noise data into the remaining data set to participate in training and adjusts parameters with the assistance of the influence degree; and the retraining performs a normal federated learning process on the remaining data set. It follows that certain research progress has been made in this field in recent years, focusing mainly on improving the efficiency of the forgetting algorithm and the effectiveness of data forgetting.
However, after the existing channel pruning type forgetting algorithm is executed, the performance of the model is greatly degraded and a large number of communication rounds are needed for recovery; even when the fine-tuning after forgetting some classes is trained for as many communication rounds as pre-training, the model's performance on the reserved classes cannot be recovered. Based on this, the invention adopts a "soft modification" technical approach, namely damping according to the importance of the parameters. This choice is inspired by researchers such as Foster, who successfully applied the damping method in centralized learning; the method does not change the structure of the model but weakens the weights of specific parameters, so that the original knowledge of the model can be preserved to the greatest extent, laying a solid foundation for quick recovery afterwards.

Disclosure of Invention

The invention aims to overcome at least one defect of the prior art and provides a federal damping type forgetting method to solve the problems that, in existing federated pruning type forgetting learning, the performance of the model on the reserved classes is greatly damaged and the performance of the model recovers slowly. The invention also discloses a device implementing the federal damping type forgetting method. The technical terms r