CN-121984776-A - Back door attack defense method based on interpretable feature graph and symbol fine granularity information
Abstract
A backdoor attack defense method based on interpretable feature maps and fine-grained sign information belongs to the field of artificial intelligence and machine learning. It comprises two detection modules: Interpretable Feature Detection (IFD) and Dynamic Main Parameter difference Detection (DMPD). IFD uses interpretability techniques to capture the global difference of gradient information; its defensive effect does not degrade as the model iterates, and screening with IFD yields a first subset of clients. Because IFD captures only the global variability of gradient information and ignores the importance of local gradient information, it may be bypassed on its own. DMPD therefore uses gradient sign information, which evaluates the models submitted by the clients at a finer granularity and is not affected by iteration; screening with DMPD yields a second subset. Finally, the intersection of the two subsets is taken as the final set of accepted clients, whose gradient updates are norm-clipped and then aggregated into a new global model. The invention provides strong defensive capability against backdoor attacks under federated learning.
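The final aggregation step described in the abstract (intersection of the two screened subsets, norm clipping, averaging into a new global model) can be summarized in a minimal sketch. The function name `flid_aggregate`, the data layout, and the clipping bound are illustrative assumptions, not taken from the patent:

```python
import torch

def flid_aggregate(global_model, client_updates, ifd_subset, dmpd_subset,
                   clip_norm=1.0):
    """Aggregate only clients accepted by both IFD and DMPD, with norm clipping.

    client_updates: dict mapping client id -> flattened update tensor.
    ifd_subset / dmpd_subset: sets of client ids accepted by each module.
    """
    accepted = ifd_subset & dmpd_subset      # intersection of the two subsets
    clipped = []
    for cid in accepted:
        u = client_updates[cid]
        # norm clipping: scale the update down if it exceeds clip_norm
        scale = min(1.0, clip_norm / (u.norm().item() + 1e-12))
        clipped.append(u * scale)
    agg = torch.stack(clipped).mean(dim=0)   # FedAvg-style mean of clipped updates
    flat = torch.nn.utils.parameters_to_vector(global_model.parameters()).detach()
    torch.nn.utils.vector_to_parameters(flat + agg, global_model.parameters())
    return global_model
```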
Inventors
- WANG CHANGXING
- WANG XIUJUAN
- XU LIYA
- LI JIAYING
Assignees
- Beijing University of Technology (北京工业大学)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-03-12
Claims (1)
- 1. A backdoor attack defense method, FLID, based on interpretable feature maps and fine-grained sign information, characterized in that it comprises:

Step 1, Interpretable Feature Detection (IFD). For a classification task containing $L$ labels, the server generates $N$ uniform noise samples, with $N/L$ noise samples per label. Because the server cannot see the backdoor label of a malicious client, distributing the same number of samples to each label has negligible influence on the screening result, while the interpretable saliency map of a malicious model on normal labels differs from that of a benign model. Assume that in each round of training the server randomly selects $P$ clients to participate, among which $M$ are malicious and $P-M$ are benign, and let $S_i^t$ denote the saliency map of the $i$-th client in the $t$-th round. In each round the server issues the global model $G^t$ to every client; since the global model is unpolluted in the initial condition, the cosine similarity between the saliency map generated by each client and the saliency map generated by the global model can be calculated to judge whether the current model is polluted. The formula is defined as follows:

$$\cos_i^t = \frac{S_i^t \cdot S_g^t}{\|S_i^t\|\,\|S_g^t\|}, \quad i = 1, \dots, P$$

where $S_i^t$ is the saliency map calculated by the $i$-th client on all noise samples, containing $N$ data; $S_g^t$ is the saliency map calculated by the server's global model on all noise samples, also containing $N$ data; and $\cos_i^t$ is the cosine similarity between the saliency map of the $i$-th client and the saliency map of the server. A threshold $\tau_1$ then screens out the subset $C_1^t = \{\, i \mid \cos_i^t > \tau_1 \,\}$.

Step 2, Dynamic Main Parameter difference Detection (DMPD). IFD screens malicious clients using only global gradient information and is easily bypassed by carefully designed attack strategies: for example, in a strong backdoor attack, as the model parameters continuously grow, the attacker's malicious parameters occupy an extremely small proportion, and capturing only global gradient information often ignores the important role of local gradient information in detecting malicious clients. The sign direction permits a finer evaluation of the models submitted by the clients; in particular, when the model is attacked, the sign direction of a malicious update differs from that of a benign update even when the malicious client's difference under traditional metrics is small. The sign direction of the gradient update is therefore taken as the second stage of screening malicious models.

DMPD is accordingly proposed to check the difference of the gradient-update signs of each client at unified core parameter positions of the global model. Because the global model changes dynamically during federated training, in the $t$-th round a temporary global model $\tilde{G}^t$ is first aggregated by FedAvg. Since FedAvg is an unguarded method, $\tilde{G}^t$ will be contaminated if any malicious client is present. To obtain the main parameter positions of $\tilde{G}^t$, the noise data generated by the IFD module is back-propagated through $\tilde{G}^t$ to obtain the gradient update $\tilde{g}^t$; this back-propagation is not intended for training but only to find the positions of maximum gradient change. Because $\tilde{G}^t$ is contaminated, these positions include both parameters that serve the backdoor task and normal parameters, and no data privacy is violated. The main parameter positions of $\tilde{G}^t$, denoted $A^t$, are thereby acquired, and the calculation of the differences between subsequent clients unfolds around $A^t$.

Definition 1 ($\mathrm{TopK}$). For a vector $v \in \mathbb{R}^d$, where $\mathbb{R}$ denotes the real number domain and $v$ is composed of $d$ real numbers ($v$ being the flattened gradient update $\tilde{g}^t$), define

$$A^t = \mathrm{TopK}(|v|, k), \quad 1 \le k \le d,$$

the set of the positions of the $k$ entries of $v$ with the largest magnitudes, obtained by sorting $|v|$ and taking the first $k$ positions; $k$ cannot exceed the vector length $d$ and is not less than 1.

$A^t$ represents the parameter area of interest of the current global model. $A^t$ is dynamic in every round, which accords with the non-independent and non-identically distributed characteristic of federated learning, and it unifies the parameter positions among clients so that malicious clients are easily screened. To quantify the differences among different clients over the positions $A^t$, the Main Parameter Sign difference Value (MPSV) is introduced. To better extract the difference of each client on $A^t$, a mask $m^t$ is first obtained from $A^t$:

$$m^t[q] = \begin{cases} 1, & q \in A^t \\ 0, & q \notin A^t \end{cases}$$

where $m^t$ is the mask generated on the basis of $A^t$: if the $q$-th position is in $A^t$ the entry is 1, otherwise 0, so that $m^t$ is composed of 1s and 0s. Let $g_i^t$ be the flattened gradient update of the $i$-th client in the $t$-th round; $\mathrm{sign}(\cdot)$, which obtains the sign of a parameter, takes the values $-1$, $0$ and $1$; $\mathrm{XOR}(\cdot,\cdot)$ is an exclusive-or function that is 1 if the signs of the two parameters are different and 0 otherwise; and $\odot$ is the Hadamard product. Then

$$\mathrm{MPSV}_{i,j}^t = \sum_{q} \big( m^t \odot \mathrm{XOR}(\mathrm{sign}(g_i^t), \mathrm{sign}(g_j^t)) \big)[q]$$

calculates the gap between two clients by adding up the number of parameters with different signs; $\mathrm{MPSV}_{i,j}^t$ represents the main parameter sign difference value of the $i$-th client and the $j$-th client. To obtain the total difference between each client and the other clients, the following is defined:

$$V_i^t = \sum_{j \ne i} \mathrm{MPSV}_{i,j}^t$$

where $V_i^t$ represents the total sign difference between client $i$ and the other clients. Finally $V^t = \{V_1^t, \dots, V_P^t\}$ is obtained, composed of $P$ real numbers because $P$ clients participate in each round of training. DMPD thus calculates MPSV through the main parameter signs to distinguish malicious clients from benign clients effectively.

After the MPSV values are obtained, they must be anomaly-filtered. A median-based Z_score (MZ_score) is used for filtering; compared with the traditional Z_score, the MZ_score can handle non-normally distributed data and accurately reject abnormal values by means of the median, reducing the influence of extreme values.

Definition 2 ($\mathrm{MZ\_score}$). For a set $V = \{v_1, \dots, v_P\}$, let $\mathrm{med}(V)$ be the median of $V$ and $\mathrm{MAD}(V)$ its median absolute deviation; for any $v_i$, its MZ_score is

$$\mathrm{MZ}_i = \frac{0.6745\,\big(v_i - \mathrm{med}(V)\big)}{\mathrm{MAD}(V)}.$$

DMPD processes the MPSV values by the MZ_score and sets a threshold $\tau_2$, deleting the clients exceeding the threshold to obtain the set $C_2^t = \{\, i \mid |\mathrm{MZ}_i| \le \tau_2 \,\}$. Because the backdoor tasks of the malicious clients are consistent with one another, their MPSV values are calculated smaller, so their MZ_score will exhibit an anomaly.
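As a concrete illustration of Step 1, the sketch below computes a plain input-gradient saliency map on uniform noise samples and screens clients by cosine similarity against the global model's map. The patent does not fix a particular saliency technique or threshold; `saliency_map`, `ifd_screen`, and `tau1` are assumptions made here for illustration:

```python
import torch
import torch.nn.functional as F

def saliency_map(model, noise_x, noise_y):
    # Vanilla input-gradient saliency: d(score of assigned label)/d(input).
    # One common interpretability choice; the patent leaves the method open.
    x = noise_x.clone().requires_grad_(True)
    logits = model(x)
    score = logits.gather(1, noise_y.view(-1, 1)).sum()
    score.backward()
    return x.grad.detach().flatten()

def ifd_screen(global_model, client_models, noise_x, noise_y, tau1=0.5):
    # Subset C1: clients whose saliency map stays cosine-similar to the
    # (initially unpolluted) global model's saliency map.
    s_g = saliency_map(global_model, noise_x, noise_y)
    accepted = set()
    for cid, model in client_models.items():
        s_i = saliency_map(model, noise_x, noise_y)
        if F.cosine_similarity(s_i, s_g, dim=0).item() > tau1:
            accepted.add(cid)
    return accepted

# Uniform noise samples, the same number for each of L labels, e.g. for a
# 10-class 28x28 grayscale task with N = 120:
# noise_x = torch.rand(120, 1, 28, 28)
# noise_y = torch.arange(120) % 10
```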
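A minimal sketch of Step 2 under the same assumptions: the top-$k$ positions of the temporary global model's gradient play the role of $A^t$, pairwise sign disagreements on those positions give the MPSV totals $V_i^t$, and a median-based modified Z-score filters outliers. The cutoff `tau2 = 3.5` is the conventional modified Z-score threshold, an assumption here, since the patent leaves $\tau_2$ open:

```python
import torch

def dmpd_screen(temp_global_grad, client_updates, k, tau2=3.5):
    """DMPD sketch: sign differences at top-k 'main parameter' positions,
    then median-based Z-score (MZ_score) filtering."""
    # Main parameter positions A^t: top-k magnitudes of the temporary
    # global model's gradient on the IFD noise data.
    topk = torch.topk(temp_global_grad.abs(), k).indices
    ids = list(client_updates)
    signs = {cid: torch.sign(client_updates[cid][topk]) for cid in ids}
    # V_i^t: total count of positions in A^t where client i's signs
    # disagree with every other client j (the XOR in MPSV).
    totals = {
        i: sum((signs[i] != signs[j]).sum().item() for j in ids if j != i)
        for i in ids
    }
    vals = torch.tensor([float(totals[i]) for i in ids])
    med = vals.median()
    mad = (vals - med).abs().median()
    # Modified Z-score; 0.6745 scales MAD to be comparable to a std dev.
    mz = 0.6745 * (vals - med) / (mad + 1e-12)
    return {cid for cid, z in zip(ids, mz) if abs(z.item()) <= tau2}
```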
Description
Back door attack defense method based on interpretable feature graph and symbol fine granularity information

Technical Field

The invention belongs to the field of artificial intelligence and machine learning, and particularly relates to defending against backdoor attacks mounted by an attacker under a federated learning framework.

Background

Today, technologies such as big data and cloud computing are developing vigorously, and artificial intelligence technologies such as machine learning and deep learning have spread to many industries. Large language models (LLMs) are now developing rapidly, a trend that started with OpenAI's release of ChatGPT in 2022. LLMs have broad application capabilities such as knowledge question answering and text generation, and from pre-training to fine-tuning they rely on massive data. Abundant value is contained in this massive data, which is vital to advancing the development of artificial intelligence.

However, the proliferation of data volume also causes problems. First, uploading massive data increases communication costs for users, and the central server finds it difficult to store such data. Second, third-party platforms cannot guarantee data privacy, which increases the risk of data leakage and makes data holders reluctant to share their data. Finally, the data held by an owner may contain sensitive information that cannot be disclosed. Model training cannot proceed when data holders are unwilling or unable to disclose their data; as more and more data is kept secret, the problem of data islands arises, and data islands and security have become the main bottlenecks restricting the development of artificial intelligence.

To solve the problems of data islands and security, McMahan et al. first proposed federated learning, a novel distributed machine learning paradigm that allows multiple clients to jointly train a model over multiple iterations while protecting each client's data privacy. In each iteration, every client trains a local model on its local data and then uploads a gradient update to the server; the server receives all gradient updates, aggregates a new global model, and issues the new global model to the clients for the next iteration. Because federated learning can protect data privacy, it has become a basic framework for building machine learning models in fields such as transportation, medical care, the Internet of Things, and word prediction.
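For reference, the unguarded FedAvg iteration described above can be sketched as follows; `local_train` is a hypothetical helper standing in for each client's local optimization loop:

```python
import copy
import torch

def fedavg_round(global_model, client_loaders, local_train, lr=0.01):
    """One plain FedAvg round: each client trains locally on its own data,
    uploads its update, and the server averages the updates."""
    global_vec = torch.nn.utils.parameters_to_vector(
        global_model.parameters()).detach()
    updates = []
    for loader in client_loaders:
        local = copy.deepcopy(global_model)    # server issues the global model
        local_train(local, loader, lr)         # client trains on local data
        local_vec = torch.nn.utils.parameters_to_vector(
            local.parameters()).detach()
        updates.append(local_vec - global_vec) # client uploads its update
    new_vec = global_vec + torch.stack(updates).mean(dim=0)
    torch.nn.utils.vector_to_parameters(new_vec, global_model.parameters())
    return global_model
```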
However, although federated learning protects client data privacy, it is extremely vulnerable to attacks, especially backdoor attacks, because the server has no insight into the gradient updates submitted by the clients. By manipulating malicious clients to embed backdoor triggers in their local data and then injecting malicious updates into the global model, an attacker causes the contaminated global model to identify any sample carrying the embedded trigger as the backdoor label set by the attacker, while behaving normally on other samples. Compared with untargeted attacks that merely reduce the performance of the global model, backdoor attacks are more difficult to defend against, because the malicious model differs only slightly from a benign model and the attacker can manipulate local model parameters to further raise the difficulty of defense; identifying and defending against backdoor attacks is therefore particularly important.

Today, academia and industry have proposed various defense methods against federated learning backdoor attacks. These methods use different information in gradient updates to detect and defend against backdoors and can be roughly classified into the following categories. Score-based backdoor defense methods utilize specific metrics (e.g., cosine similarity, gradient norms, variances, sign information) to assign a different weight to each client and reduce the impact of malicious gradients on the model; they obtain good results against simple backdoor attacks, where malicious gradients and benign gradients can be well distinguished, but their defensive effect is poor against attacks carefully designed by attackers, who can generate malicious gradients very similar to benign ones. Differential-privacy-based backdoor defense methods weaken the relationship between model parameters and backdoor triggers by adding Gaussian noise in the training or aggregation stage; they can remain effective even against more hidden backdoor attacks, but adding noise to the model reduces its performance, and meanwhile model training is prone to collapse when the amplitude of the noise is too large.