US-12620209-B2 - Method and system for generating image adversarial examples based on an acoustic wave

US12620209B2

Abstract

The disclosure provides a method and a system for generating image adversarial examples based on an acoustic wave. The method includes: acquiring an image containing a target object or a target scene; generating simulated image examples for the acquired image, wherein the simulated image examples have adversarial effects on a deep learning algorithm in a target machine vision system; optimizing the generated simulated image examples to obtain an optimal adversarial example and corresponding adversarial parameters; and injecting the adversarial parameters into an inertial sensor of the target machine vision system by means of an acoustic wave, such that the adversarial parameters are used as sensor readings that cause an image stabilization module in the target machine vision system to operate, generating particular blur patterns in the captured real-world image and thereby producing an image adversarial example in the physical world.

Inventors

  • Xiaoyu JI
  • Wenyuan Xu
  • Yushi Cheng
  • Yuepeng Zhang
  • Kai Wang
  • Chen Yan

Assignees

  • ZHEJIANG UNIVERSITY

Dates

Publication Date
2026-05-05
Application Date
2022-03-23
Priority Date
2020-10-20

Claims (8)

  1. A method for generating image adversarial examples based on an acoustic wave, comprising: acquiring an image containing a target object or a target scene; generating simulated image examples for the acquired image by using an acoustic wave-based adversarial example simulation model, wherein the simulated image examples have adversarial effects on a deep learning algorithm in a target machine vision system; optimizing the generated simulated image examples by using an adversarial example optimization method to obtain optimal adversarial examples and corresponding adversarial parameters; and injecting the adversarial parameters into an inertial sensor of the target machine vision system by means of an acoustic wave using an inertial sensor reading injection method, such that the adversarial parameters are used as sensor readings that cause an image stabilization module in the target machine vision system to operate to generate particular blurry patterns in a generated real-world image, so as to generate image adversarial examples in a physical world.
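The four claimed steps can be sketched end-to-end as follows. This is a minimal, hypothetical skeleton: `simulate_blur`, `adversarial_loss`, and the exhaustive search are simplified stand-ins for the patent's simulation model, optimization objective, and Bayesian optimization, and the acoustic injection step is omitted.

```python
import numpy as np

def simulate_blur(image, params):
    # Stand-in for the acoustic wave-based adversarial example
    # simulation model: a horizontal shift-and-average approximates
    # the linear motion blur induced by a false a_x reading.
    shift = int(abs(params["a_x"]))
    if shift == 0:
        return image.copy()
    frames = [np.roll(image, s, axis=1) for s in range(shift + 1)]
    return np.mean(frames, axis=0)

def adversarial_loss(image, params):
    # Stand-in for the optimization objective: reward strong blur
    # (a proxy for adversarial effect) while penalizing parameter cost.
    blurred = simulate_blur(image, params)
    distortion = np.linalg.norm(blurred - image)
    return -distortion + 0.01 * abs(params["a_x"])

def optimize(image, candidates):
    # Stand-in for the adversarial example optimization step; the
    # patent names Bayesian optimization, here exhaustive search.
    return min(candidates, key=lambda p: adversarial_loss(image, p))

image = np.zeros((8, 8))
image[:, 4] = 1.0                              # toy one-edge image
candidates = [{"a_x": a} for a in range(0, 5)]
best = optimize(image, candidates)             # optimal adversarial parameters
```

In the real method, `best` would then be injected into the inertial sensor acoustically (claim 4) so the image stabilization module reproduces the chosen blur in the physical world.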
  2. The method for generating image adversarial examples based on an acoustic wave according to claim 1, wherein the generating simulated image examples for the acquired image by using an acoustic wave-based adversarial example simulation model comprises constructing the acoustic wave-based adversarial example simulation model through:

     (1) false camera motion modeling: it is assumed that the false readings of the inertial sensor caused by an acoustic attack are $M_f = \{a_x, a_y, a_z, \omega_r, \omega_p, \omega_y\}$, where $a_x, a_y, a_z$ are false acceleration readings at the x, y, z axes of an accelerometer, respectively, and $\omega_r, \omega_p, \omega_y$ are false angular velocity readings at the roll, pitch, yaw axes of a gyroscope, respectively; it is assumed that the image stabilization module performs a complete compensation, so the false camera motion caused by the acoustic attack is $M_c = \{-a_x, -a_y, -a_z, -\omega_r, -\omega_p, -\omega_y\}$; the acoustic wave-based adversarial example simulation model is constructed from four of the six dimensions: the three dimensions of the x, y, z axes of the accelerometer and the one dimension of the roll axis of the gyroscope;

     (2) pixel motion modeling: the false camera motion causes a different imaging position of the target object or the target scene, resulting in pixel motion in the output image; wherein: with respect to the x axis of the accelerometer, for any pixel in the image, the false camera motion $-a_x$ causes a pixel displacement of $\frac{f}{2u} a_x T^2$ in the opposite direction during the imaging process, where $f$ is the focal length of the camera, $u$ is the object distance of the target object or the target scene, and $T$ is the exposure time of the camera; with respect to the y axis of the accelerometer, the false camera motion $-a_y$ causes a pixel displacement of $\frac{f}{2u} a_y T^2$ in the opposite direction during the imaging process; with respect to the z axis of the accelerometer, the false camera motion $-a_z$ causes a pixel displacement of $\frac{r_o}{2u} a_z T^2$ in the direction away from the center of the image during the imaging process, where $r_o$ is the distance between the pixel and the center of the image; with respect to the roll axis of the gyroscope, the false camera motion $-\omega_r$ causes a pixel displacement of $\omega_r T r_c$ in the opposite direction during the imaging process, where $r_c$ is the distance between the pixel and the center of angular velocity rotation;

     (3) image blur modeling: pixel motion during the imaging process causes image blur, wherein false camera motion along the x and y axes of the accelerometer causes linear pixel motion, resulting in linear image blur; false camera motion along the z axis of the accelerometer causes radial pixel motion, resulting in radial image blur; and false camera motion about the roll axis of the gyroscope causes rotary pixel motion, resulting in rotary image blur; a unified image blur model is constructed for the above blurs as follows:

$$B(i,j) = \frac{1}{n+1} \sum_{k=-n}^{0} X\big(i'(k), j'(k)\big)$$

$$[i'(k), j'(k)]^T = [u(k), v(k)]^T + [i, j]^T$$

$$[u(k), v(k)]^T = \begin{bmatrix} \cos\alpha & \cos\!\left(\tfrac{k}{n}\beta + \gamma\right) & \cos\delta \\ \sin\alpha & \sin\!\left(\tfrac{k}{n}\beta + \gamma\right) & \sin\delta \end{bmatrix} \begin{bmatrix} \dfrac{k f \,|a_x + a_y|\, T^2}{2nu} \\ r_c \\ \dfrac{k \, a_z T^2 r_o}{2nu} \end{bmatrix}$$

$$\alpha = \arccos\!\left(\frac{a_x \cdot a_y}{|a_x|\,|a_y|}\right), \quad \beta = \omega_r T, \quad \gamma = \arctan\!\left(\frac{j - c_1}{i - c_0}\right), \quad \delta = \arctan\!\left(\frac{j - o_1}{i - o_0}\right)$$

$$r_c = \big\| (i,j), (c_0, c_1) \big\|_2, \quad r_o = \big\| (i,j), (o_0, o_1) \big\|_2$$

where $X$ is the original image, $B$ is the blurred image, $(i,j)$ is the coordinate of a pixel, $B(i,j)$ is the pixel with coordinates $(i,j)$ in the blurred image, $n$ is the number of discrete points, $(c_0, c_1)$ is the coordinate of the rotation center, and $(o_0, o_1)$ is the coordinate of the image center; and

the simulated image examples are obtained under respective adversarial parameters based on the false camera motion modeling, the pixel motion modeling, and the image blur modeling.
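The unified blur model above can be rendered numerically as a sketch. Interpolation, boundary handling, and the rotary term (taken here as the displacement of the pixel rotated about the rotation center by $\frac{k}{n}\omega_r T$, so that it vanishes at $k = 0$) are assumptions, since the claim does not fix them; nearest-pixel sampling with clamping is used for simplicity.

```python
import numpy as np

def unified_blur(X, a_x, a_y, a_z, w_r, f, u, T, n, img_center, rot_center):
    # Sketch of B(i,j) = 1/(n+1) * sum_{k=-n}^{0} X(i'(k), j'(k)):
    # linear term follows f*a*T^2/(2u), radial term follows
    # r_o*a_z*T^2/(2u), rotary term rotates about the rotation center.
    H, W = X.shape
    o0, o1 = img_center
    c0, c1 = rot_center
    B = np.zeros_like(X, dtype=float)
    lin = np.hypot(a_x, a_y)        # magnitude of x/y false motion
    alpha = np.arctan2(a_y, a_x)    # direction of x/y false motion
    beta = w_r * T                  # total rotation angle
    for i in range(H):
        for j in range(W):
            gamma = np.arctan2(j - c1, i - c0)  # angle about rotation center
            delta = np.arctan2(j - o1, i - o0)  # radial angle from image center
            r_c = np.hypot(i - c0, j - c1)
            r_o = np.hypot(i - o0, j - o1)
            acc = 0.0
            for k in range(-n, 1):
                d_lin = k * f * lin * T**2 / (2 * n * u)
                d_rad = k * a_z * T**2 * r_o / (2 * n * u)
                ang = (k / n) * beta + gamma
                di = np.cos(alpha) * d_lin + (np.cos(ang) - np.cos(gamma)) * r_c + np.cos(delta) * d_rad
                dj = np.sin(alpha) * d_lin + (np.sin(ang) - np.sin(gamma)) * r_c + np.sin(delta) * d_rad
                ii = int(np.clip(np.rint(i + di), 0, H - 1))  # nearest-pixel sampling
                jj = int(np.clip(np.rint(j + dj), 0, W - 1))
                acc += X[ii, jj]
            B[i, j] = acc / (n + 1)
    return B
```

With all false readings at zero, every displacement vanishes and the model returns the original image; increasing any false reading spreads each output pixel over a motion trajectory, producing the corresponding blur pattern.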
  3. The method for generating image adversarial examples based on an acoustic wave according to claim 2, wherein the optimizing the generated simulated image examples by using an adversarial example optimization method comprises:

     (1) designing optimized functions: different optimized functions are designed for different types of adversarial image examples; three types of adversarial image examples with different effects are taken into consideration: the first type is an adversarial image example with a hiding effect, which makes the deep learning algorithm unable to identify the target object; the second type is an adversarial image example with a creating effect, which creates a target object in the current image that is detectable by the deep learning algorithm; and the third type is an adversarial image example with a changing effect, which causes the deep learning algorithm to detect the target object as another object;

     for the adversarial image example with a hiding effect, the optimization functions are:

$$\min_{a_x, a_y, a_z, \omega_r} \; w_1 S_p^B S_p^C + w_2 \|B - X\|_p \quad \text{s.t.} \quad |a_x + a_y + a_z| < \varepsilon_1, \; |\omega_r| < \varepsilon_2$$

where $p$ is the number of the target object, $S_p^B$ is the detection confidence of the area of the target object output by the deep learning algorithm, $S_p^C$ is the detection confidence of the category of the target object output by the deep learning algorithm, $w_1$ and $w_2$ are weight values balancing the effectiveness of the adversarial image examples against the example generation cost, and $\varepsilon_1$ and $\varepsilon_2$ are upper limits of the influence of acoustic waves on the readings of the accelerometer and the gyroscope, respectively;

     for the adversarial image example with a creating effect, the optimization functions are:

$$\min_{a_x, a_y, a_z, \omega_r} \; -w_3 \frac{S_o^B S_o^C\big|_{C_o = T}}{\sum_{p=1}^{m} U_{op}} + w_4 \|B - X\|_p \quad \text{s.t.} \quad |a_x + a_y + a_z| < \varepsilon_1, \; |\omega_r| < \varepsilon_2$$

where $o$ is the number of the target object to be created, $C_o = T$ is the category of the target object to be created, $S_o^B$ is the detection confidence of the area of the target object to be created output by the deep learning algorithm, $S_o^C$ is the detection confidence of the category of the target object to be created output by the deep learning algorithm, $p$ is the number of an existing object in the image, $m$ is the number of existing objects in the image, $U_{op}$ is the intersection ratio between the area of the object $o$ to be created and the area of the existing object $p$, $w_3$ and $w_4$ are weight values balancing the effectiveness of the adversarial image examples against the example generation cost, and $\varepsilon_1$ and $\varepsilon_2$ are upper limits of the influence of acoustic waves on the readings of the accelerometer and the gyroscope, respectively;

     for the adversarial image example with a changing effect, the optimization functions are:

$$\min_{a_x, a_y, a_z, \omega_r} \; -w_5 \, U_{pp'} \, S_p^{B'} S_p^{C'}\big|_{C_{p'} = T} + w_6 \|B - X\|_p \quad \text{s.t.} \quad |a_x + a_y + a_z| < \varepsilon_1, \; |\omega_r| < \varepsilon_2$$

where $p$ is the number of the target object, $S_p^{B'}$ is the detection confidence of the modified area of the target object output by the deep learning algorithm, $S_p^{C'}$ is the detection confidence of the modified category of the target object output by the deep learning algorithm, $C_{p'} = T$ is the modified category of the target object, $U_{pp'}$ is the intersection ratio of the area of the target object $p$ before modification and the area of the target object $p'$ after modification, $w_5$ and $w_6$ are weight values balancing the effectiveness of the adversarial image examples against the example generation cost, and $\varepsilon_1$ and $\varepsilon_2$ are upper limits of the influence of acoustic waves on the readings of the accelerometer and the gyroscope, respectively;

     (2) solving the optimized functions: the optimized functions are solved by using a Bayesian optimization method to obtain the optimal adversarial parameters.
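The constrained minimization can be sketched with a toy hiding-effect objective. The detector confidence surrogate `hiding_loss` and the random-search solver are illustrative stand-ins, not the patent's implementation (the claim names Bayesian optimization); the constraints $|a_x + a_y + a_z| < \varepsilon_1$ and $|\omega_r| < \varepsilon_2$ are enforced as stated.

```python
import numpy as np

rng = np.random.default_rng(0)

def hiding_loss(a_x, a_y, a_z, w_r, w1=1.0, w2=0.001):
    # Toy surrogate for w1*S_p^B*S_p^C + w2*||B - X||_p: detector
    # confidence decays with blur strength, distortion grows with it.
    blur = abs(a_x) + abs(a_y) + abs(a_z) + 10.0 * abs(w_r)
    confidence = np.exp(-0.1 * blur)   # S_p^B * S_p^C surrogate
    distortion = blur                  # ||B - X||_p surrogate
    return w1 * confidence + w2 * distortion

def solve(eps1=30.0, eps2=1.0, n_trials=2000):
    # Random search under |a_x + a_y + a_z| < eps1 and |w_r| < eps2,
    # standing in for the Bayesian optimization named in the claim.
    best, best_loss = None, np.inf
    for _ in range(n_trials):
        a = rng.uniform(-eps1 / 3, eps1 / 3, size=3)
        w_r = rng.uniform(-eps2, eps2)
        if abs(a.sum()) >= eps1 or abs(w_r) >= eps2:
            continue  # reject candidates violating the constraints
        loss = hiding_loss(*a, w_r)
        if loss < best_loss:
            best, best_loss = (tuple(a), w_r), loss
    return best, best_loss

params, loss = solve()
```

A Bayesian optimizer would replace the uniform sampling with a surrogate-model-guided proposal loop, but the constraint handling and objective shape are the same.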
  4. The method for generating image adversarial examples based on an acoustic wave according to claim 1, wherein the inertial sensor reading injection method comprises: determining a resonance frequency of the inertial sensor in the target machine vision system by frequency scanning; adjusting the frequency of the acoustic wave to introduce a direct current (DC) component into an analog-to-digital converter so as to stabilize an output of the sensor; and performing amplitude modulation to shape the waveform output from the sensor such that the sensor readings approximate the adversarial parameters.
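The injection signal itself can be sketched as an amplitude-modulated tone at the sensor's resonance frequency: the carrier excites the MEMS structure, and the modulation envelope shapes the false readings toward the adversarial parameters. All parameter names and values here are illustrative assumptions, not figures from the patent.

```python
import numpy as np

def injection_waveform(f_res, f_target, duration, fs, depth=0.8):
    # Sketch of the claimed injection steps: drive the sensor at the
    # resonance frequency f_res (found beforehand by frequency
    # scanning), then amplitude-modulate the carrier so the
    # demodulated/aliased sensor output approximates the desired
    # false-reading waveform at f_target.
    t = np.arange(int(duration * fs)) / fs
    carrier = np.sin(2 * np.pi * f_res * t)                    # resonant carrier
    envelope = 1.0 + depth * np.sin(2 * np.pi * f_target * t)  # AM envelope
    return t, envelope * carrier

# Hypothetical values: ~19.4 kHz MEMS resonance, 5 Hz false-reading
# waveform, 0.1 s burst sampled at 96 kHz.
t, sig = injection_waveform(f_res=19_400, f_target=5.0, duration=0.1, fs=96_000)
```

In a real attack chain this waveform would be played through a speaker aimed at the sensor; the stabilized DC component and the envelope together determine the sensor readings that the image stabilization module then "compensates".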
  5. A system for generating image adversarial examples based on an acoustic wave, comprising: an acoustic wave-based adversary simulation module, an adversarial example optimization module, and a sensor reading injection module; wherein the acoustic wave-based adversary simulation module is configured for false camera motion modeling, pixel motion modeling, and image blur modeling; the adversarial example optimization module is configured for the design of optimized functions and the solution of optimized functions; and the sensor reading injection module is configured for resonance frequency determination, false reading stabilization, and false reading shaping; the system utilizes the acoustic wave-based adversary simulation module, the adversarial example optimization module, and the sensor reading injection module to implement the method for generating image adversarial examples based on an acoustic wave according to claim 1.
  6. A system for generating image adversarial examples based on an acoustic wave, comprising: a memory for storing instructions; and a processor that executes the instructions stored in the memory to perform the method for generating image adversarial examples based on an acoustic wave according to claim 1.
  7. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method for generating image adversarial examples based on an acoustic wave according to claim 1.
  8. A system for generating image adversarial examples based on an acoustic wave, comprising: means for acquiring an image containing a target object or a target scene; means for generating simulated image examples for the acquired image by using an acoustic wave-based adversarial example simulation model, wherein the simulated image examples have adversarial effects on a deep learning algorithm in a target machine vision system; means for optimizing the generated simulated image examples by using an adversarial example optimization method to obtain optimal adversarial examples and corresponding adversarial parameters; and means for injecting the adversarial parameters into an inertial sensor of the target machine vision system by means of an acoustic wave using an inertial sensor reading injection method, such that the adversarial parameters are used as sensor readings that cause an image stabilization module in the target machine vision system to operate to generate particular blurry patterns in a generated real-world image, so as to generate an image adversarial example in a physical world.

Description

The present application is a continuation of International Application No. PCT/CN2021/124791, filed on Oct. 19, 2021, which claims priority to Chinese patent application No. 202011124293.6, filed on Oct. 20, 2020, both of which are incorporated herein by reference in their entireties.

TECHNICAL FIELD

The present application relates to the field of artificial intelligence, and particularly to a method and system for generating image adversarial examples based on an acoustic wave.

BACKGROUND

With the continuous development of artificial intelligence technologies, machine vision is widely used in intelligent systems such as intelligent robots and self-driving cars. Machine vision uses a camera to capture information about the surrounding environment of an intelligent system, and uses a deep learning algorithm to detect and recognize objects contained in the captured images, so as to perceive the environment. Since the results of perception by machine vision are usually used as an information source for subsequent decision-making of the intelligent system, the security of these results is very important. In recent years, research on image adversarial examples has been increasing. An image adversarial example is an example that can interfere with the results of perception by machine vision. Research on image adversarial examples has important guiding significance for ensuring the security of machine or intelligent systems. At present, research on image adversarial examples mainly focuses on the digital domain, that is, pixel values in a digital image are directly modified to construct an image adversarial example. Although image adversarial examples constructed by this method generally have good adversarial effects, they are difficult to apply in a practical system.
In addition, there is currently a method for constructing an image adversarial example in the physical domain, but since it requires that the appearance of a target object be modified or that light be injected into a camera, it has poor concealment.

SUMMARY

According to a first aspect, there is provided a method for generating image adversarial examples based on an acoustic wave. The method includes: acquiring an image containing a target object or a target scene; generating simulated image examples for the acquired image by using an acoustic wave-based adversarial example simulation model, wherein the simulated image examples have adversarial effects on a deep learning algorithm in a target machine vision system; optimizing the generated simulated image examples by using an adversarial example optimization method to obtain optimal adversarial examples and corresponding adversarial parameters; and injecting the adversarial parameters into an inertial sensor of the target machine vision system by means of an acoustic wave using an inertial sensor reading injection method, such that the adversarial parameters are used as sensor readings that cause an image stabilization module in the target machine vision system to operate to generate particular blurry patterns in a generated real-world image, so as to generate image adversarial examples in the physical world.
In some embodiments, the acoustic wave-based adversarial example simulation model is constructed by the following three steps: (1) false camera motion modeling: it is assumed that the false readings of the inertial sensor caused by an acoustic attack are $M_f = \{a_x, a_y, a_z, \omega_r, \omega_p, \omega_y\}$, where $a_x, a_y, a_z$ are false acceleration readings at the x, y, z axes of an accelerometer, respectively, and $\omega_r, \omega_p, \omega_y$ are false angular velocity readings at the roll, pitch, yaw axes of a gyroscope, respectively; it is assumed that the image stabilization module performs a complete compensation, so the false camera motion caused by the acoustic attack is $M_c = \{-a_x, -a_y, -a_z, -\omega_r, -\omega_p, -\omega_y\}$, wherein the acoustic wave-based adversarial example simulation model is constructed from four of the six dimensions: the three dimensions of the x, y, z axes of the accelerometer and the one dimension of the roll axis of the gyroscope; (2) pixel motion modeling: the false camera motion causes a different imaging position of the target object or the target scene, resulting in pixel motion in the output image; wherein: with respect to the dimension of the x axis of the accelerometer, for any pixel in the image, the false camera motion $-a_x$ causes a pixel displacement of $\frac{f}{2u} a_x T^2$ in the opposite direction during the imaging process, where $f$ is the focal length of the camera, $u$ is the object distance of the target object or the target scene, and $T$ is the exposure time of the camera; with respect to the dimension of the y axis of the accelerometer, for any pixel in the image, the false camera motion $-a_y$ causes a pixel displacement of $\frac{f}{2u} a_y T^2$ in the opposite direction during the imaging process; with respect to the dimension of the z axis of th