CN-116797871-B - AdvDrop-based adversarial sample generation method

CN116797871B

Abstract

The invention discloses an AdvDrop-based adversarial sample generation method in the technical field of machine learning security. An image is fed into two branches, one operating in the spatial domain and one in the frequency domain. For the frequency-domain attack AdvDrop, the input image is first divided into N×N blocks, and each block is converted to the frequency domain using the discrete cosine transform (DCT). A quantization matrix M is introduced to suppress specific frequencies of the transformed image; a tangent function is introduced in the quantization process to gradually approximate the quantization function, and the quantization matrix M is then adjusted more precisely through the new quantization function. The image is converted from the frequency domain back to the spatial domain by the inverse discrete cosine transform (IDCT). A module that fuses the spatial-domain attack and the frequency-domain attack then iteratively updates the adversarial perturbation using gradients from both domains. This improves the quality of the generated adversarial samples, reduces the difference between the distribution of the adversarial samples and that of real samples, and raises the attack success rate.
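As an illustration of the frequency-domain branch summarized above, the following Python sketch splits a grayscale image into blocks, applies an orthonormal DCT to each block, quantizes the coefficients, and applies the IDCT. It is a minimal sketch, not the patented implementation: the function names `dct_matrix` and `blockwise_quantize`, the default block size of 8, and the use of hard rounding with a fixed quantization table `q` are assumptions made for illustration.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix C (C @ C.T is the identity)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index, one per row
    i = np.arange(n).reshape(1, -1)  # sample index, one per column
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)       # the DC row uses a different scale factor
    return c


def blockwise_quantize(image: np.ndarray, q, block: int = 8) -> np.ndarray:
    """Block DCT, quantize/dequantize with q, then block IDCT.

    `q` is a scalar or a (block, block) table (hypothetical parameter);
    larger entries discard more of the corresponding frequency.
    """
    h, w = image.shape
    assert h % block == 0 and w % block == 0, "image must tile evenly"
    c = dct_matrix(block)
    out = np.empty((h, w), dtype=np.float64)
    for r in range(0, h, block):
        for s in range(0, w, block):
            tile = image[r:r + block, s:s + block].astype(np.float64)
            coeffs = c @ tile @ c.T              # 2-D DCT of the tile
            coeffs = np.round(coeffs / q) * q    # hard quantization
            out[r:r + block, s:s + block] = c.T @ coeffs @ c  # 2-D IDCT
    return out
```

With a very small `q` the round trip reproduces the input almost exactly, while larger entries in `q` drop the corresponding frequencies.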

Inventors

  • LING JIE
  • CHEN JINHUI
  • LUO YU

Assignees

  • Guangdong University of Technology (广东工业大学)

Dates

Publication Date
20260508
Application Date
20230530

Claims (9)

  1. An AdvDrop-based adversarial sample generation method, comprising the steps of: S1, acquiring an original image; S2, inputting the original image into two different branches, wherein one branch uses PGD to attack the spatial domain of the original image to obtain a preliminary adversarial sample, and the other branch uses AdvDrop to attack the frequency domain of the original image to obtain a first image; S3, combining the first image obtained in step S2 into the preliminary adversarial sample obtained in step S2, and generating a final adversarial sample by updating the perturbation with gradients from the different domains; wherein in step S2, the other branch using AdvDrop to attack the frequency domain of the original image to obtain the first image specifically comprises: S2.1, dividing the input image into N×N blocks, and converting the original image from the spatial domain to the frequency domain by applying the DCT to each block; S2.2, obtaining the maximum value of the loss function in the frequency domain by adjusting the input original image; S2.3, calculating an adversarial loss P_{n+1} for each block separated in the frequency domain; S2.4, introducing a quantization matrix M to perform a quantization operation; S2.5, introducing a differentiable quantization function M_diff, which gradually approximates the quantization function by introducing a tangent function in the quantization process, so as to adjust the quantization matrix M more precisely; S2.6, fusing the quantization matrix M obtained in step S2.5 with the adversarial loss P_{n+1} obtained in step S2.3; and S2.7, converting the N×N blocks whose image frequencies have been modified in the frequency domain back to the spatial domain using the IDCT.
  2. The AdvDrop-based adversarial sample generation method according to claim 1, wherein in step S2, PGD is used to attack the spatial domain of the original image to obtain a preliminary adversarial sample, which is input into a classification model for classification, specifically: where x is the image information, y is its label, θ is the parameter of the classification model, L is the loss function value, x^t is the adversarial sample after t iterations of the FGSM algorithm, x^{t+1} is the adversarial sample after t+1 iterations of the FGSM algorithm, the sign function sign(·) extracts the gradient direction, the parameter β is the magnitude of the pixel update in each iteration, Π_{a+S} denotes a loop over a+S, and a set of allowed perturbations S is introduced for each image pixel a; the loss function is maximized in the PGD (projected gradient descent) spatial-domain attack algorithm as follows: arg max L(X_adv, y), s.t. ||X_adv - X_init||_p < ε, where X_init is the original image, y is the label, X_adv is the adversarial sample, and ε is the perturbation bound under the Lp norm.
  3. The AdvDrop-based adversarial sample generation method according to claim 1, wherein in step S2.2, the maximum value of the loss function in the frequency domain is obtained by adjusting the input original image as follows: arg max L(D′(F(D(X_adv))), θ, y), s.t. ||D(X_adv) - D(X)||_p < ε, where D(·) is the DCT operation, F(·) represents modifying the image frequency, D′(·) is the IDCT operation, θ is the parameter of the classification model, y is the label, X is the original image, ε is the perturbation bound under the Lp norm, and X_adv is the adversarial sample.
  4. The AdvDrop-based adversarial sample generation method according to claim 1, wherein in step S2.3, the adversarial loss P_{n+1} for each block separated in the frequency domain is calculated as: where ω is the step size of each iteration, D(·) is the DCT operation, D′(·) is the IDCT operation, F(·) represents modifying the image frequency, θ is the parameter of the classification model, y is the label, P_n is the adversarial loss at the n-th update step, and X_n^adv is the adversarial sample after n iterations.
  5. The AdvDrop-based adversarial sample generation method according to claim 1, wherein in step S2.4, the quantization matrix M is introduced to perform the quantization operation, the quantization operation being: where Δ represents the quantization step size, the quantized value is limited to a valid range, M is the quantization matrix, and x is the image information.
  6. The AdvDrop-based adversarial sample generation method according to claim 1, wherein in step S2.5, the differentiable quantization function M_diff is introduced, and the quantization matrix M is adjusted more precisely by gradually approximating the quantization function through a tangent function introduced in the quantization process; the differentiable quantization function M_diff is specifically: φ(·) is defined as follows: where α is an adjustable parameter; the quantization matrix M is updated with the sign of the gradient returned by back-propagation, expressed as: where ε represents the perturbation bound under the Lp norm, M_init is the initial quantization matrix, M is the quantization matrix, L(x′, y) is the loss, M′ is the updated quantization matrix, and x is the image information.
  7. The AdvDrop-based adversarial sample generation method according to claim 1, wherein in step S2.6, the quantization matrix M obtained in step S2.5 is fused with the adversarial loss P_{n+1} obtained in step S2.3, the fusion process being: where ⊙ denotes the Hadamard product, X_n^adv is the adversarial sample after n iterations, D(·) is the DCT operation, F(·) modifies the image frequency, M is the quantization matrix, and P_{n+1} is the adversarial loss.
  8. The AdvDrop-based adversarial sample generation method according to claim 1, wherein in step S2.7, the IDCT is applied to convert the N×N blocks whose image frequencies have been modified in the frequency domain back to the spatial domain, the IDCT expression being: where D(x)[u, v] is the frequency-domain representation of the input image x after the discrete cosine transform, that is, the coefficient at position (u, v) in the frequency domain; x[k, m] represents the coordinates of the image in the spatial domain after transformation from the frequency domain; C(u) and C(v) are scaling coefficients; i and j are loop variables ranging from 0 to N-1; and N is the size of each block.
  9. The AdvDrop-based adversarial sample generation method according to claim 1, wherein in step S3, the first image obtained in step S2 is combined into the preliminary adversarial sample obtained in step S2, and the perturbation is iteratively updated using gradients from the different domains to finally generate the adversarial sample, comprising the steps of: letting Ω_S and Ω_F denote the attack on the spatial domain of the original image using PGD and the attack on the frequency domain of the image using AdvDrop, respectively; updating the gradient according to the adversarial loss in the frequency domain, the attack Ω_F on the frequency domain of the image using AdvDrop being calculated by: where η′ is the frequency value after the attack, η is the original image frequency, γ_f is the step size in the frequency domain, θ is the parameter of the classification model, y is the label, and L is the loss function value; S10.2, updating the gradient according to the adversarial loss in the spatial domain, the attack Ω_S on the spatial domain of the image using PGD being calculated by: where η″ is the pixel value, γ_s is the step size in the spatial domain, θ is the parameter of the classification model, and y is the label of the input image, the pixel value η″ then being calculated from the adversarial loss in the spatial domain; and S10.3, after each iteration, switching the order of the attack on the frequency domain of the image using AdvDrop, alternating the order between the frequency domain and the spatial domain according to the adversarial loss, so as to generate an adversarial input sample that causes the classification model to produce an incorrect classification result, thereby generating the final adversarial sample.
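Steps S2.4 and S2.5 can be illustrated with a short Python sketch. It assumes that the "tangent function" refers to the hyperbolic tangent, as in commonly used soft-rounding surrogates, and that the quantization matrix is updated by a sign-gradient step clipped to within ε of its initial value. The names `soft_round`, `update_quant_matrix`, `alpha`, and `step` are hypothetical, and the exact formulas of the claims are not reproduced here.

```python
import numpy as np


def soft_round(x, alpha: float = 5.0) -> np.ndarray:
    """Differentiable surrogate for round-to-nearest (assumed tanh form).

    Small alpha is close to the identity; large alpha approaches hard rounding.
    Integer inputs are mapped to themselves for any alpha > 0.
    """
    x = np.asarray(x, dtype=np.float64)
    base = np.floor(x)
    frac = x - base - 0.5  # offset from the midpoint, in [-0.5, 0.5)
    return base + 0.5 + 0.5 * np.tanh(alpha * frac) / np.tanh(alpha / 2.0)


def update_quant_matrix(m, grad, m_init, eps: float, step: float = 1.0):
    """One sign-gradient ascent step on M, kept within eps of m_init.

    `grad` stands for the gradient of the loss with respect to M returned by
    back-propagation; computing it is outside the scope of this sketch.
    """
    m_new = m + step * np.sign(grad)
    return np.clip(m_new, m_init - eps, m_init + eps)
```

A larger `alpha` makes `soft_round` behave more like ordinary rounding while keeping a nonzero gradient everywhere, which is what allows the quantization matrix to be adjusted by back-propagation.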

Description

AdvDrop-based adversarial sample generation method

Technical Field

The invention relates to the field of artificial intelligence security, and in particular to an AdvDrop-based adversarial sample generation method.

Background

With the development of deep learning and generative adversarial networks (GANs), the security of deep neural networks has become a research focus in artificial intelligence security. Although deep neural networks perform well on most classification tasks, they are very fragile when facing adversarial samples. Adversarial samples are formed by deliberately adding small perturbations to samples in a data set; they can induce a machine learning model to misclassify and thus threaten the security of the model.

In 2013, Szegedy et al. first revealed the vulnerability of deep neural networks and proposed the concept of adversarial samples, after which adversarial attacks became a formal research field within deep learning security. Subsequent researchers proposed the Fast Gradient Sign Method (FGSM), based on the hypothesis that CNNs behave linearly in high-dimensional space; it causes the model to misjudge by adding a perturbation in the direction of the gradient of the loss function, that is, opposite to gradient descent. Later studies improved on FGSM in various ways. One category introduces iteration: the Basic Iterative Method (BIM) adds an iterative process to FGSM and is therefore also called I-FGSM.
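The single-step and iterative sign-gradient attacks mentioned above can be sketched in Python on a toy model. This is only an illustration under stated assumptions: the victim is a hypothetical logistic-regression classifier with weights `w` and bias `b` (so the input gradient has a closed form), and the names `loss_and_grad`, `fgsm`, and `i_fgsm` are not taken from the source.

```python
import numpy as np


def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))        # predicted P(y = 1)
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return loss, (p - y) * w                       # d loss / d x


def fgsm(x, y, w, b, eps):
    """Single step of size eps along the sign of the loss gradient."""
    _, g = loss_and_grad(x, y, w, b)
    return x + eps * np.sign(g)


def i_fgsm(x, y, w, b, eps, step, iters):
    """Iterative variant: small steps, gradient recomputed after each step,
    result kept inside the L-infinity ball of radius eps around x."""
    x_adv = x.copy()
    for _ in range(iters):
        _, g = loss_and_grad(x_adv, y, w, b)
        x_adv = x_adv + step * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv
```

For a linear model such as this one the two variants end at the same point; the iterative form matters when the gradient direction changes along the path, as it does for deep networks.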
I-FGSM takes multiple perturbation steps along the direction of gradient ascent and recomputes the gradient direction after each step. Compared with FGSM, it can generate adversarial samples closer to the optimal solution, at the cost of more computation. Another category introduces momentum: the momentum iterative method (MI-FGSM) integrates a momentum term into I-FGSM, accumulating a velocity vector along the gradient direction of the loss function across iterations. Optimization-based attacks form a further line of work, of which the C&W attack of Carlini and Wagner is representative.

However, the above methods are all perturbation attacks in the spatial domain: they generate adversarial samples by changing the pixels of the original image. Although the resulting adversarial samples mislead the model at a high rate, the difference from the original is easily observed by human eyes. Many works have therefore tried to understand how adversarial attacks take effect by studying the adversarial noise itself. Researchers found that, on the same network, the prediction accuracy for an adversarial sample after JPEG compression is higher than for the uncompressed adversarial sample. The JPEG algorithm used for compression goes through five stages: YUV conversion, chroma sampling, discrete cosine transform (DCT), quantization, and coding. Chroma sampling can be regarded as downsampling the picture; quantization discards part of the frequency-domain data produced by the DCT; the remaining stages are lossless. The JPEG algorithm can therefore be regarded as compression that discards high-frequency information, and the JPEG transformation of an adversarial image can likewise be interpreted as discarding its high-frequency information.

Wang et al. studied the influence of high-frequency information on DCNNs more systematically. They first showed that a trained DCNN depends strongly on high-frequency information: even when given only the high-frequency part of an image, which human eyes cannot recognize, it can classify with high confidence, and its classification performance drops sharply, in some cases to failure, without that information. By dividing the image into frequency bands and comparing the statistical characteristics of each band with those of the original sample, Duan et al. found the following. The low-frequency component (low-frequency signal) of an adversarial sample represents the regions of the image where brightness or gray value changes slowly, namely the large flat regions; it describes the main part of the image and is mainly a comprehensive measure of the intensity of the whole image. The high-frequency component (high-frequency signal) of an adversarial sample corresponds to the parts of the image that change sharply, namely the edges (contours) or the noise and detail; it is mainly a measure of the edges and contours of the image.

Traditional attack methods generate samples slowly and require a large amount of computation, and the generation process needs the structural information, parameters, and other details of the target model, which limits their applicability. At the same time, there are the problems of low quality of the generated adversarial samples, low succes