CN-122020658-A - Neural network backdoor defense method, device and hardware based on consistency learning
Abstract
The invention provides a neural network backdoor defense method, device and hardware based on consistency learning. A clean data set and a suspicious backdoor model are acquired, a detoxification data set for guiding consistency learning is constructed from the clean data set, and a consistency learning objective function is constructed accordingly. After the detoxification data set is input into the suspicious backdoor model, the parameters of the suspicious backdoor model are adjusted based on the consistency learning objective function so as to strengthen learning of the content of the samples themselves and reduce attention to backdoor trigger information, thereby suppressing backdoor triggering and realizing backdoor defense, while the classification performance on normal samples is maintained through the optimization objective. The method removes the backdoor from the suspicious backdoor model directly, is simple and efficient, greatly reduces the time cost of fine-tuning, and maintains high image classification performance.
Inventors
- Mei Jianping
- Yu Miaoqi
Assignees
- Zhejiang University of Technology (浙江工业大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-30
Claims (10)
- 1. A neural network backdoor defense method based on consistency learning, characterized in that the method acquires a clean data set and a suspicious backdoor model, constructs a detoxification data set for guiding consistency learning based on the clean data set, and constructs a consistency learning objective function; after the detoxification data set is input into the suspicious backdoor model, parameters of the suspicious backdoor model are adjusted based on the consistency learning objective function so as to strengthen learning of the content of the sample data itself and reduce attention to backdoor trigger information, thereby suppressing backdoor triggering, realizing backdoor defense, and maintaining classification performance on normal samples based on the optimization objective.
- 2. The neural network backdoor defense method based on consistency learning according to claim 1, characterized by comprising the steps of: selecting all samples of N categories from the clean data set based on a proxy model; superimposing the same disturbance on all samples of each category; passing all samples of the same category through the proxy model with frozen weights and calculating a cross entropy loss; back-propagating the gradient of this loss to optimize the disturbance so that it carries the characteristic information of that category; and extracting, in turn, the disturbance corresponding to each of the remaining categories (a code sketch of this procedure is given after the claims).
- 3. The neural network backdoor defense method based on consistency learning according to claim 2, wherein each disturbance is independently added to each sample in the clean data set, yielding N times as many new samples as the original clean data set and thereby constructing the detoxification data set.
- 4. The neural network backdoor defense method based on consistency learning according to claim 2, wherein the mean and variance of each class in the clean data set are calculated by using the proxy model, recorded and stored.
- 5. The neural network backdoor defense method based on consistency learning according to claim 4, wherein the consistency learning objective function includes a disturbance output invariance loss, a clean sample output consistency loss, and a distribution consistency loss.
- 6. The neural network backdoor defense method based on consistency learning according to claim 5, wherein the detoxification data set is input into the suspicious backdoor model, the suspicious backdoor model learns samples carrying different disturbances whose category labels are consistent with those of the original samples, and a cross entropy loss is calculated to obtain the disturbance output invariance loss of the suspicious backdoor model on the detoxification data set; the suspicious backdoor model is copied, samples of the clean data set are fed respectively into the suspicious backdoor model and its copy, and an MSE loss is calculated between the probability distributions finally output by the two to obtain the clean sample output consistency loss; and the distribution consistency loss is calculated between the final output statistics of samples from the suspicious backdoor model and the stored mean and variance of the class to which the current sample belongs.
- 7. The neural network backdoor defense method based on consistency learning according to claim 1, wherein the samples in the clean data set and in the constructed detoxification data set are image samples.
- 8. A neural network backdoor defense device based on consistency learning, characterized by comprising: a data and model acquisition module for acquiring a clean data set and a suspicious backdoor model; a consistency constraint construction module for constructing a detoxification data set guiding consistency learning based on the clean data set; and a model adjustment module for inputting the detoxification data set and the clean data set into the suspicious backdoor model and adjusting the parameters of the suspicious backdoor model based on the constructed consistency learning objective function, so as to defend against triggering of the backdoor while maintaining classification performance.
- 9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the neural network backdoor defense method based on consistency learning as set forth in any one of claims 1 to 7 when executing the program.
- 10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements a neural network backdoor defense method based on consistency learning according to one of claims 1 to 7.
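Claims 2 and 3 above describe how the detoxification data set is built: one disturbance per category is optimized through a weight-frozen proxy model with a cross entropy loss, and every resulting disturbance is then added to every clean sample while keeping the original labels. The patent provides no reference implementation; the PyTorch-style sketch below is only an illustration of that procedure under assumed settings (image size, optimizer, disturbance bound), and every identifier in it, such as extract_class_perturbation or build_detox_dataset, is invented for this example.

```python
import torch
import torch.nn.functional as F

def extract_class_perturbation(proxy_model, class_loader, class_label,
                               steps=100, lr=0.1, epsilon=8 / 255):
    """Optimize one shared disturbance for a single class (claim 2).

    The proxy model is frozen; only the disturbance receives gradients,
    so after optimization it carries characteristic information of the class.
    """
    proxy_model.eval()
    for p in proxy_model.parameters():
        p.requires_grad_(False)                     # freeze the proxy weights

    # one disturbance shared by all samples of this class (assumed 32x32 RGB images)
    delta = torch.zeros(1, 3, 32, 32, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        for x, _ in class_loader:                   # all samples of the class
            y = torch.full((x.size(0),), class_label, dtype=torch.long)
            logits = proxy_model(x + delta)         # same disturbance on every sample
            loss = F.cross_entropy(logits, y)       # cross entropy loss (claim 2)
            optimizer.zero_grad()
            loss.backward()                         # gradient back propagation
            optimizer.step()
            with torch.no_grad():                   # keep the disturbance bounded
                delta.clamp_(-epsilon, epsilon)
    return delta.detach()

def build_detox_dataset(clean_samples, perturbations):
    """Claim 3: add each of the N class disturbances to each clean sample,
    keeping the original label, so the detox set is N times larger."""
    detox = []
    for x, y in clean_samples:
        for delta in perturbations:                 # one entry per class disturbance
            detox.append(((x + delta.squeeze(0)).clamp(0, 1), y))
    return detox
```

The disturbance bound, number of steps and learning rate are placeholders; the patent does not specify them, so they would have to be chosen for the data set at hand.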
Description
Neural network backdoor defense method, device and hardware based on consistency learning

Technical Field

The invention relates to the technical field of electric digital data processing, and in particular to a neural network backdoor defense method, device and hardware based on consistency learning in the field of model security.

Background

Image classification is a classical problem in computer vision, the aim of which is to assign different images to different categories. In recent years, deep neural networks have achieved strong results in visual classification and have become the modeling tool of choice for many machine learning tasks in computer vision; in particular, large-scale neural networks trained in a supervised manner achieve generalization ability clearly superior to traditional models on image classification tasks. However, research on model backdoor attacks has also emerged one after another, establishing a basic conclusion: any model contains at least one backdoor vulnerability that can be triggered artificially by some means, such as a specific trigger, causing the model to misclassify. Model security is therefore becoming increasingly important, since misclassification in critical areas can have serious consequences.

Model attacks fall into two main categories. The first is the adversarial sample attack, which does not need to poison the model or the data, but only exploits the erroneous cognition, i.e., the backdoor, that the model naturally forms while learning the data, to trigger misrecognition. The second is the backdoor attack, in which an attacker implants a backdoor in links such as data collection and model training, typically through data poisoning, module insertion, and the like. Some of these attack methods, however, have considerable limitations in real scenarios: adversarial sample attacks require a large amount of original training data to accurately locate the backdoor of the model, and in backdoor attacks that modify the model architecture the problem is easily discovered by the user. Therefore, the most widely used approach in real scenarios is data poisoning, namely applying some disturbance to part of the user's data.

Backdoor attacks based on data poisoning involve certain difficulties for defense. First, the diversity of triggers: these include triggers based on meaningless patches (such as 3x3 black-and-white blocks), which are simple to apply and highly aggressive but easily noticed by humans, and feature-based triggers, whose characteristics are the opposite. Second, the difficulty of sample detection: some highly efficient backdoor attack methods achieve a strong attack effect while poisoning only a very small fraction of the data, which poses a great challenge for sample detection. Finally, the complexity of backdoor removal: the currently proposed defense methods mainly fall into three modes, sample detection, robust training, and unlearning, but for the variety of attack methods an effective defense that removes the backdoor is still lacking.
Disclosure of Invention

In order to solve the above technical problems, the invention provides a neural network backdoor defense method, device and hardware based on consistency learning. The technical scheme adopted by the invention is a neural network backdoor defense method based on consistency learning: the method acquires a clean data set and a suspicious backdoor model, and constructs a detoxification data set for guiding consistency learning based on the clean data set, where the clean data set generally refers to a data set that has not been polluted and the suspicious backdoor model refers to a target model suspected of having a backdoor trigger implanted; a consistency learning objective function is constructed, and after the detoxification data set is input into the suspicious backdoor model, the parameters of the suspicious backdoor model are adjusted based on the consistency learning objective function so as to strengthen learning of the content of the sample data itself and reduce attention to backdoor trigger information, thereby suppressing backdoor triggering, realizing backdoor defense, and maintaining classification performance on normal samples based on the optimization objective.

Preferably, all samples of N categories are selected from the clean data set based on a proxy model, the same disturbance is superimposed on all samples of each category, all samples of the same category are passed through the proxy model with frozen weights, the cross entropy loss is calculated, and the gradient is back-propagated
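The disclosure above and claims 4 to 6 describe the consistency learning objective used when fine-tuning the suspicious backdoor model: a disturbance output invariance loss (cross entropy on detoxification samples with their original labels), a clean sample output consistency loss (MSE between the output probability distributions of the suspicious model and a frozen copy of it on clean samples), and a distribution consistency loss against the per-class mean and variance stored from the proxy model. The exact form of the third loss and the weighting of the terms are not given in the text, so the PyTorch-style sketch below is a hedged illustration under those assumptions; identifiers such as make_frozen_copy, consistency_finetune_step, lambda1 and class_stats are invented for this example.

```python
import copy
import torch
import torch.nn.functional as F

def make_frozen_copy(model):
    """Claim 6: copy the suspicious model and freeze it as the consistency reference."""
    frozen = copy.deepcopy(model).eval()
    for p in frozen.parameters():
        p.requires_grad_(False)
    return frozen

def consistency_finetune_step(model, frozen_copy, detox_batch, clean_batch,
                              class_stats, optimizer, lambda1=1.0, lambda2=1.0):
    """One fine-tuning step combining the three losses of claims 5-6.

    class_stats maps each label to a stored (mean, variance) pair computed with
    the proxy model on the clean data set (claim 4); the text leaves open whether
    these are feature or output statistics, outputs are assumed here.
    """
    x_d, y_d = detox_batch            # disturbed samples keep their original labels
    x_c, y_c = clean_batch

    # 1) disturbance output invariance loss: predict the original label
    #    despite the superimposed class disturbance
    loss_inv = F.cross_entropy(model(x_d), y_d)

    # 2) clean sample output consistency loss: keep the probability distribution
    #    on clean samples close to that of the frozen copy (MSE, claim 6)
    out_c = model(x_c)
    p_new = F.softmax(out_c, dim=1)
    with torch.no_grad():
        p_old = F.softmax(frozen_copy(x_c), dim=1)
    loss_clean = F.mse_loss(p_new, p_old)

    # 3) distribution consistency loss: one simple choice is a variance-scaled
    #    squared distance between each output and its stored class mean
    mean_t = torch.stack([class_stats[int(y)][0] for y in y_c])
    var_t = torch.stack([class_stats[int(y)][1] for y in y_c])
    loss_dist = (((out_c - mean_t) ** 2) / (var_t + 1e-6)).mean()

    loss = loss_inv + lambda1 * loss_clean + lambda2 * loss_dist
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In such a run only the parameters of model would be updated by the optimizer, while frozen_copy and the stored class statistics stay fixed, so the backdoor behaviour is suppressed without drifting away from the original predictions on clean samples.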