CN-121999264-A - Method and apparatus for evaluating robustness of diffusion-based purification models

Abstract

Embodiments of the present disclosure provide a method and apparatus for evaluating the robustness of diffusion-based purification models. The method comprises: performing random purification on a noise image a first number of times through a purification model to obtain a first number of purification results, wherein the noise image is obtained by superimposing adversarial noise on a sample image; determining a first number of classification results from the purification results through a classifier; determining a loss value and a gradient for each random purification process based on the difference between each classification result and the actual class of the sample image; determining a representative random purification process based on the loss values and gradients; performing gradient backpropagation through the representative random purification process to determine a target gradient; and attacking the purification model based on the target gradient to determine the robustness of the purification model. The embodiments improve the accuracy of evaluating the robustness of diffusion-based purification models.

Inventors

XIANG YANG
WANG JIEBO
FU XIAOWEN
HAN YUXUAN

Assignees

The Hong Kong University of Science and Technology (香港科技大学)

Dates

Publication Date: 20260508
Application Date: 20251009
Priority Date: 20241105

Claims (10)

1. A method of evaluating robustness of a diffusion-based purification model, comprising: performing random purification on a noise image a first number of times through the purification model to obtain a first number of purification results, wherein the noise image is obtained by superimposing adversarial noise on a sample image; determining a first number of classification results from the first number of purification results by a classifier; determining a loss value and a gradient in each random purification process based on the difference between each classification result and the actual class of the sample image; determining a representative random purification process based on the loss value and the gradient in each random purification process; performing gradient backpropagation based on the representative random purification process to determine a target gradient; and attacking the purification model based on the target gradient to determine the robustness of the purification model.
2. The method of claim 1, wherein the determining a representative random purification process based on the loss value and the gradient in each random purification process comprises: determining, based on the loss value and the gradient in each random purification process, shared adversarial noise that affects the largest number of random purification processes; and determining the representative random purification process based on the shared adversarial noise.
3. The method of claim 2, wherein the determining the representative random purification process based on the shared adversarial noise comprises: estimating the loss that the shared adversarial noise causes in each random purification process; mapping the loss caused in each random purification process to a probability that the random purification process shares a vulnerability; and determining the representative random purification process by maximizing the probability.
4. The method of claim 1, wherein the performing gradient backpropagation based on the representative random purification process to determine a target gradient comprises: determining a probability distribution of misclassification by maximizing the loss value in each random purification process; performing a weighted summation of the gradients in each random purification process based on the probability distribution to determine an aggregated gradient; and performing gradient backpropagation based on the representative random purification process and the aggregated gradient to determine the target gradient.
5. The method of claim 4, wherein the performing gradient backpropagation based on the representative random purification process and the aggregated gradient to determine the target gradient comprises: purifying the noise image through the representative random purification process to obtain a purification result; and performing gradient backpropagation on the inner product of the purification result and the aggregated gradient to determine the target gradient.
6. The method of claim 1, wherein the attacking the purification model based on the target gradient to determine the robustness of the purification model comprises: attacking the purification model based on the target gradient, and determining the robustness under the attack and updated adversarial noise; repeatedly executing the following attack steps until the number of repetitions reaches a second number: superimposing the updated adversarial noise on the sample image to obtain an updated noise image; performing random purification on the updated noise image a first number of times through the purification model; determining a first number of classification results through the classifier; determining a loss value and a gradient in each random purification process based on the difference between each classification result and the actual class of the sample image; determining a representative random purification process based on the loss value and the gradient in each random purification process; performing gradient backpropagation based on the representative random purification process to determine a target gradient; and attacking the purification model based on the target gradient, and determining the robustness under the attack and the updated adversarial noise; and determining the worst value among the obtained robustness values as the robustness of the purification model.
7. The method of claim 6, wherein the performing gradient backpropagation based on the representative random purification process to determine a target gradient comprises: performing gradient backpropagation based on the representative random purification process to determine an intermediate gradient; and weighting the intermediate gradient based on the number of repetitions to determine the target gradient.
8. An apparatus for evaluating robustness of a diffusion-based purification model, comprising: a purification unit configured to perform random purification on a noise image a first number of times through the purification model to obtain a first number of purification results, wherein the noise image is obtained by superimposing adversarial noise on a sample image; a classification unit configured to determine a first number of classification results from the first number of purification results by a classifier; a calculation unit configured to determine a loss value and a gradient in each random purification process based on the difference between each classification result and the actual class of the sample image; a selection unit configured to determine a representative random purification process based on the loss value and the gradient in each random purification process; a propagation unit configured to perform gradient backpropagation based on the representative random purification process to determine a target gradient; and an attack unit configured to attack the purification model based on the target gradient to determine the robustness of the purification model.
9. An electronic device, comprising: one or more processors; and a storage device having one or more computer programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
10. A computer readable medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-7.
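The weighted aggregation and inner-product backpropagation recited in claims 4 and 5 can be illustrated with a minimal sketch. It rests on assumptions that are not in the source: a softmax over the per-run losses stands in for the "probability distribution of misclassification", and the representative purification process is modelled as a toy linear map so that the gradient of the inner product has a closed form. All function and variable names are hypothetical.

```python
import math

def softmax(values, temperature=1.0):
    """Numerically stable softmax; larger losses receive larger weights."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_gradients(losses, output_grads, temperature=1.0):
    """Weighted sum of per-run gradients (each taken w.r.t. a purified output).

    Assumption: the weights are a softmax of the losses; the source only says
    the distribution is obtained by maximizing the loss values.
    """
    weights = softmax(losses, temperature)
    dim = len(output_grads[0])
    return [sum(w * g[k] for w, g in zip(weights, output_grads)) for k in range(dim)]

def grafted_gradient(jacobian, aggregated):
    """Gradient of <phi(x), aggregated> w.r.t. x for a toy linear purifier.

    For phi(x) = J x + b this gradient is J^T aggregated. A real diffusion
    purifier would obtain it by backpropagating the inner product instead.
    """
    rows, cols = len(jacobian), len(jacobian[0])
    return [sum(jacobian[r][c] * aggregated[r] for r in range(rows)) for c in range(cols)]

# Three hypothetical purification runs: loss and gradient w.r.t. purified output.
losses = [0.2, 1.5, 1.4]
output_grads = [[1.0, -1.0], [0.5, 0.5], [0.4, 0.6]]
aggregated = aggregate_gradients(losses, output_grads)

# Jacobian of the hypothetical representative purification run.
representative_jacobian = [[0.9, 0.0], [0.1, 0.8]]
target_gradient = grafted_gradient(representative_jacobian, aggregated)
print(target_gradient)
```

In this sketch only one purification run is differentiated, however many runs contribute to the aggregated gradient; that is the apparent purpose of backpropagating through a single representative process.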

Description

Method and apparatus for evaluating robustness of diffusion-based purification models

Technical Field

Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and apparatus for evaluating the robustness of diffusion-based purification models.

Background

Diffusion-based purification has shown impressive robustness as an adversarial defense. However, there is a concern that this apparent robustness stems from insufficient evaluation. Existing studies have shown that attacks based on Expectation over Transformation (EOT) face a gradient dilemma caused by global gradient averaging, which leads to ineffective evaluation. Furthermore, a single evaluation (1-evaluation) underestimates the resubmission risk in stochastic defenses. To solve these problems, the present application proposes an efficient attack method, DIFFHAMMER. The method bypasses the gradient dilemma by selectively attacking vulnerable purification processes, integrates N evaluations (N-evaluation) into the attack loop, and uses gradient grafting to achieve comprehensive and efficient evaluation. Experiments show that DIFFHAMMER achieves effective results within 10-30 iterations and outperforms other methods. Once the gradient dilemma is alleviated and the resubmission risk is carefully examined, the reliability of diffusion-based purification becomes questionable.

Disclosure of Invention

Embodiments of the present disclosure propose methods and apparatus for evaluating the robustness of diffusion-based purification models.
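The two concerns raised in the Background can be made concrete with a small numeric sketch. It is illustrative only and makes assumptions beyond the source: the gradient dilemma is reduced to two runs with exactly opposing gradients, and the resubmission risk is modelled as the chance that at least one of N independent submissions succeeds. The function names are hypothetical.

```python
def average_gradient(grads):
    """Global (EOT-style) average of per-run gradients."""
    dim = len(grads[0])
    return [sum(g[k] for g in grads) / len(grads) for k in range(dim)]

def resubmission_risk(p_single, n_attempts):
    """Probability that at least one of n independent submissions succeeds.

    Assumption: each submission succeeds independently with probability p_single.
    """
    return 1.0 - (1.0 - p_single) ** n_attempts

# Two purification runs whose gradients point in opposite directions:
# the averaged attack direction cancels out.
opposing = [[1.0, -2.0], [-1.0, 2.0]]
print(average_gradient(opposing))

# The same adversarial example looks far riskier when it may be resubmitted.
print(resubmission_risk(0.1, 1), resubmission_risk(0.1, 10))
```

Under these assumptions, an adversarial example that succeeds in only one of ten single evaluations succeeds at least once in roughly two out of three ten-attempt evaluations, which is why a single evaluation can understate the risk.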
In a first aspect, embodiments of the present disclosure provide a method of evaluating robustness of a diffusion-based purification model, including: performing a first number of random purifications on a noise image through the purification model to obtain a first number of purification results, wherein the noise image is obtained by superimposing adversarial noise on a sample image; determining a first number of classification results from the first number of purification results by a classifier; determining a loss value and a gradient in each random purification process based on the difference between each classification result and the actual class of the sample image; determining a representative random purification process based on the loss value and the gradient in each random purification process; performing gradient backpropagation based on the representative random purification process to determine a target gradient; and attacking the purification model based on the target gradient to determine the robustness of the purification model.
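The steps of the first aspect can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions, not the disclosed implementation: the purifier is a random rescaling, the classifier is a two-class logistic model, the representative process is simply the run with the largest loss, and the noise update is a sign-gradient step clipped to a budget `eps`. All names and parameters are hypothetical.

```python
import math
import random

def purify(x_adv, rng):
    """Toy stochastic purifier: one random scale factor per run (assumption)."""
    scale = rng.uniform(0.5, 1.5)
    return [scale * v for v in x_adv], scale

def loss_and_grad(x_adv, label, weights, rng):
    """Cross-entropy loss of a toy logistic classifier and its gradient w.r.t. x_adv."""
    purified, scale = purify(x_adv, rng)
    logit = sum(w * z for w, z in zip(weights, purified))
    prob = 1.0 / (1.0 + math.exp(-logit))
    prob = min(max(prob, 1e-12), 1.0 - 1e-12)  # keep the logarithm finite
    loss = -math.log(prob) if label == 1 else -math.log(1.0 - prob)
    grad = [(prob - label) * scale * w for w in weights]  # chain rule through purify
    correct = (prob >= 0.5) == (label == 1)
    return loss, grad, correct

def attack_step(sample, delta, label, weights, n_runs, eps, step, rng):
    """One iteration: N random purifications, selection, and noise update."""
    x_adv = [s + d for s, d in zip(sample, delta)]
    runs = [loss_and_grad(x_adv, label, weights, rng) for _ in range(n_runs)]
    robustness = sum(1 for _, _, ok in runs if ok) / n_runs
    # Stand-in selection rule (assumption): the run with the largest loss.
    _, target_grad, _ = max(runs, key=lambda r: r[0])
    new_delta = []
    for d, g in zip(delta, target_grad):
        sign = (g > 0) - (g < 0)
        new_delta.append(min(max(d + step * sign, -eps), eps))
    return new_delta, robustness

def evaluate(sample, label, weights, n_runs=8, n_iters=10, eps=1.0, step=0.1, seed=0):
    """Repeat the attack and report the worst robustness observed."""
    rng = random.Random(seed)
    delta = [0.0] * len(sample)
    history = []
    for _ in range(n_iters):
        delta, robustness = attack_step(sample, delta, label, weights,
                                        n_runs, eps, step, rng)
        history.append(robustness)
    return min(history), delta

worst, final_delta = evaluate([1.0, 0.5], 1, [1.0, 1.0])
print(worst, final_delta)
```

Here robustness is the fraction of the N purification runs that are still classified correctly, and the reported value is the minimum over iterations, mirroring the "worst value" wording of claim 6.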
In a second aspect, embodiments of the present disclosure provide an apparatus for evaluating robustness of a diffusion-based purification model, including: a purification unit configured to perform a first number of random purifications on a noise image through the purification model to obtain a first number of purification results, wherein the noise image is obtained by superimposing adversarial noise on a sample image; a classification unit configured to determine a first number of classification results from the first number of purification results by a classifier; a calculation unit configured to determine a loss value and a gradient in each random purification process based on the difference between each classification result and the actual class of the sample image; a selection unit configured to determine a representative random purification process based on the loss value and the gradient in each random purification process; a propagation unit configured to perform gradient backpropagation based on the representative random purification process to determine a target gradient; and an attack unit configured to attack the purification model based on the target gradient to determine the robustness of the purification model.
In a third aspect, embodiments of the present disclosure provide an electronic device comprising one or more processors and a storage device having one or more computer programs stored thereon which, when executed by the one or more processors, cause the one or more processors to implement the method according to any implementation of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer readable medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method according to any implementation of the first aspect.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program which, when executed by a processor, implements the method according to any implementation of the first aspect. The embodiments of the present disclosure provide a method and apparatus for evaluating the robustness of diffusion-based purification models that propose DIFFHAMMER, a selective attack method based on the Expectation-Maximization (EM) algorithm. The algor