KR-20260065176-A - Backdoor attack method for improving the security of federated learning
Abstract
The present invention may include: a) a step of identifying vulnerable layers by applying layer-by-layer gradient masking in a client model of federated learning; b) a step of injecting a backdoor into gradient coordinates whose values change relatively little, by applying top-k% gradient masking to the vulnerable layers; and c) a step of injecting a backdoor that bypasses the defense mechanism by using Projected Gradient Descent (PGD) to constrain the update values within the limits imposed by the defense mechanism.
Inventors
- 서창호
- 정수용
- 김현일
- 최민영
Assignees
- 국립공주대학교 산학협력단 (Kongju National University Industry-Academic Cooperation Foundation)
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-11-01
Claims (7)
- A backdoor attack method comprising: a) a step of identifying vulnerable layers by applying layer-by-layer gradient masking to a client model of federated learning; b) a step of injecting a backdoor into gradient coordinates whose values change relatively little, by applying top-k% gradient masking to the vulnerable layers; and c) a step of injecting a backdoor that bypasses a defense mechanism by using Projected Gradient Descent (PGD) to constrain the update values within the limits of the defense mechanism.
- The backdoor attack method of claim 1, wherein step a) masks the gradient coordinates of all layers.
- The backdoor attack method of claim 2, wherein the vulnerable layers identified in step a) are the ih (input-hidden) layer and the hh (hidden-hidden) layer of an LSTM (Long Short-Term Memory) model, and the backdoor is injected into both the ih layer and the hh layer.
- The backdoor attack method of claim 2, wherein the vulnerable layer identified in step a) is the mlp.c_fc layer of a GPT-2 (Generative Pre-trained Transformer 2) model, and the backdoor is injected into the mlp.c_fc layer.
- The backdoor attack method of claim 1, wherein step b) sets the top-k% gradient masking ratio relatively large for the ih layer and relatively small for the hh layer among the vulnerable layers of the LSTM (Long Short-Term Memory) model.
- The backdoor attack method of claim 1, wherein step b) uses a relatively small top-k% gradient masking ratio for the mlp.c_fc layer, the vulnerable layer of the GPT-2 (Generative Pre-trained Transformer 2) model.
- The backdoor attack method of claim 1, wherein step c) uses Projected Gradient Descent (PGD) with an ℓ2-norm constraint to evade the Norm Clipping defense technique.
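By way of illustration only, the three claimed steps map onto the following minimal PyTorch-style sketch. This is not the patented implementation: the stability heuristic in identify_vulnerable_layers, all function names, and the parameters num_layers, k_percent, and radius are illustrative assumptions.

```python
import torch

def identify_vulnerable_layers(grad_history, num_layers=2):
    """Step a): layer-by-layer gradient masking -- rank layers by how
    little their gradient coordinates vary across rounds and treat the
    most stable ones as vulnerable carriers (illustrative heuristic)."""
    stability = {name: torch.stack(grads).std(dim=0).mean().item()
                 for name, grads in grad_history.items()}
    return sorted(stability, key=stability.get)[:num_layers]

def topk_mask(grad, k_percent):
    """Step b): keep only the top-k% largest-magnitude gradient
    coordinates of a vulnerable layer; the backdoor is injected there."""
    flat = grad.abs().flatten()
    k = max(1, int(flat.numel() * k_percent / 100.0))
    threshold = flat.topk(k).values.min()
    return (grad.abs() >= threshold).float()

def pgd_project(update, reference, radius):
    """Step c): PGD-style l2 projection -- pull the malicious update
    back inside an l2-ball around the benign reference so that a
    Norm Clipping defense does not flag or rescale it."""
    delta = update - reference
    norm = delta.norm(p=2)
    if norm > radius:
        delta = delta * (radius / norm)
    return reference + delta
```

In the claimed method, masking with topk_mask confines the backdoor perturbation to coordinates that benign training rarely moves, which is what makes the injection both covert and persistent across rounds.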
Description
The present invention relates to a backdoor attack method for improving federated learning security, and more specifically, to a covert and sustainable backdoor attack method.

With the recent application of artificial intelligence in various fields, diverse learning algorithms have been developed, and technologies for network-based learning methods, such as federated learning, have been proposed. Federated learning is a type of distributed data learning strategy; while decentralization enhances security, protection against malicious network attacks on each client and on the global server is still required. To enhance the security of the system, various forms of attack can be assumed and defense mechanisms established against them. To prevent attacks from malicious clients in federated learning, the global server applies security technology during the aggregation process. For example, Korean registered patent No. 10-2522053 (registered April 11, 2023, "Custom distributed learning method and system robust against adversarial attacks") introduced a reverse aggregation technique to defend against adversarial attacks. In addition, Korean published patent No. 10-2023-0056499 (published April 27, 2023, "Federated learning mediation method and apparatus for preventing data infringement") introduces a mediation device that blocks the direct network connection between a data owner and a data analyst in a federated learning system. Although such security technologies are being applied, security must be strengthened further to prepare for a wider variety of malicious attacks. In particular, backdoor attacks are a well-known class of malicious attack, but existing backdoor attacks have limitations: their effectiveness decays rapidly over a few rounds, or they are easily detected. Because these limitations make backdoor attacks difficult to exploit effectively, there is a need to develop more covert and sustainable backdoor attack techniques, and in turn to develop defense mechanisms against them.

The concept of federated learning can be explained in more detail as follows. In a federated learning task, the global model is trained over T (a positive integer) rounds, and in each round a sampled subset of k clients participates out of a total of K (a positive integer) participants (clients). In each round t, the selected clients receive the current global model and create local models by training for multiple iterations on their respective local datasets. Clients send only update values representing the difference between their local models and the global model to the global server; the server aggregates the received updates and creates a new global model as shown in Equation 1 below:

$$w^{t+1} = w^{t} + \frac{\eta}{k} \sum_{i=1}^{k} \Delta w_{i}^{t+1} \qquad \text{(Equation 1)}$$

Here, for each client $k$, the update is defined as $\Delta w_{k}^{t+1} = w_{k}^{t+1} - w^{t}$ and represents the gradient of client $k$. In this scenario, the clients jointly solve the distributed optimization problem of Equation 2 below:

$$\min_{w} F(w) = \sum_{k=1}^{K} \frac{n_k}{n} F_k(w) \qquad \text{(Equation 2)}$$

Here, the local objective function $F_k(w)$ is defined by Equation 3:

$$F_k(w) = \frac{1}{n_k} \sum_{(x, y) \in P_k} \ell(w; x, y) \qquad \text{(Equation 3)}$$

Here, $\ell$ is the task loss function (e.g., cross-entropy), $\eta$ is the learning rate, $n_k$ is the size of client $k$'s dataset, and $P_k$ is the training dataset of client $k$. In a typical federated learning scenario, each normal client trains a local model using its own data and then sends the updated values to a parameter server, as in the sketch below.
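As a concrete reference for Equation 1, the following minimal sketch shows the server-side aggregation step in Python; fedavg_aggregate, the state-dict representation, and the server learning rate eta are illustrative assumptions rather than elements of the patent.

```python
import copy

def fedavg_aggregate(global_state, client_states, eta=1.0):
    """Server side (Equation 1): the new global model is the old one
    plus the average of the client deltas, scaled by the server
    learning rate eta. Assumes floating-point state tensors."""
    k = len(client_states)
    new_state = copy.deepcopy(global_state)
    for name in new_state:
        delta = sum(cs[name] - global_state[name]
                    for cs in client_states) / k
        new_state[name] = new_state[name] + eta * delta
    return new_state
```

Each entry of client_states is the state dictionary a client returns after local training, so cs[name] - global_state[name] is exactly the update value Δw described above.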
The learning process of the b-th normal client can be expressed as Equation 4:

$$w_{b}^{t+1} = w_{b}^{t} - \eta \, g_{b} \qquad \text{(Equation 4)}$$

Here, the gradient $g_b$ can be expressed by Equation 5 below:

$$g_{b} = \frac{1}{n_b} \sum_{(x, y) \in P_b} \nabla \ell(w_{b}^{t}; x, y) \qquad \text{(Equation 5)}$$

Here, $n_b$ is the dataset size of the b-th normal client, $P_b$ is the dataset of the b-th normal client, and $w_{b}^{t}$ is the normal model in the t-th iteration.

On the other hand, the attacker's primary goal is to inject a backdoor that induces malicious behavior or manipulates specific outputs when a specific trigger phrase is given, without affecting the model's overall performance during training. That is, the learning process of the m-th malicious client can be expressed as Equation 6 below:

$$\tilde{w}_{m}^{t+1} = \tilde{w}_{m}^{t} - \eta \, g_{m} \qquad \text{(Equation 6)}$$

Here, the gradient $g_m$ can be expressed by Equation 7:

$$g_{m} = \frac{1}{n_m} \sum_{(x, y) \in P_m} \nabla \ell(\tilde{w}_{m}^{t}; x, y) \qquad \text{(Equation 7)}$$

Here, $n_m$ is the data size of the m-th malicious client, $\tilde{w}_{m}^{t}$ is the model injected with a backdoor in the t-th iteration, and $P_m$ is the (poisoned) dataset of the m-th malicious client.

However, as the threat of backdoor attacks in federated learning environments has become recognized, various defense mechanisms have been proposed to protect the aggregation process, so attackers must implement covert backdoors that can evade the filtering mechanisms present in federated learning systems. Representative defense mechanisms such as Norm Clipping and FLAME are applied in federated learning environments; Norm Clipping techniques using the $\ell_2$-norm can filter backdoor attacks by clipping update values that exceed the norm limit. This mechanism is highly effective in restricting backdoor functions and also has the advantage of low computational overhead.
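To make the benign/malicious contrast of Equations 4-7 and the Norm Clipping defense concrete, here is a hedged sketch under the same assumptions as above; the clip bound and every identifier are illustrative, and the poisoned loader is whatever implements the attacker's trigger-to-target mapping.

```python
import torch

def client_update(state, model, loader, loss_fn, eta):
    """One local training pass (Equations 4-7): start from the received
    weights and run SGD. A benign client passes its clean dataset P_b;
    a malicious client passes its poisoned dataset P_m, in which inputs
    containing the trigger phrase are relabeled to the target output."""
    model.load_state_dict(state)
    optimizer = torch.optim.SGD(model.parameters(), lr=eta)
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    return model.state_dict()

def norm_clip(update, clip_bound):
    """Server-side Norm Clipping: rescale any client update whose
    l2-norm exceeds clip_bound back onto the l2-ball.
    Assumes floating-point tensors."""
    flat = torch.cat([v.flatten() for v in update.values()])
    norm = flat.norm(p=2).item()
    scale = clip_bound / max(norm, clip_bound)  # scale <= 1.0
    return {name: v * scale for name, v in update.items()}
```

An attacker following claim 7 pre-projects its own update into the same ℓ2-ball (cf. pgd_project in the sketch after the claims), so the server's clipping step leaves the malicious update unchanged and the defense is bypassed.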