CN-115019050-B - Image processing method, device, equipment and storage medium


Abstract

The application discloses an image processing method, apparatus, device and storage medium, and belongs to the technical field of artificial intelligence. The method comprises: acquiring an original image, and performing feature encoding processing on the original image to obtain a first feature map; acquiring a second feature map and a third feature map of the original image according to the first feature map, wherein the second feature map refers to the image perturbation to be superimposed on the original image, the positions on the third feature map have different feature values, and each feature value represents the importance of the image feature at the corresponding position; generating a noise image according to the second feature map and the third feature map; and superimposing the original image and the noise image to obtain a first adversarial sample. The application can generate high-quality adversarial samples and thereby achieve a good attack effect.

Inventors

Lu Shaohao
HU YI
YAN KE
DU JUNLONG
ZHU CHENG
GUO XIAOWEI

Assignees

Tencent Technology (Shenzhen) Co., Ltd. (腾讯科技（深圳）有限公司)

Dates

Publication Date: 20260505
Application Date: 20210305

Claims (14)

1. An image processing method, the method comprising: acquiring an original image, and performing feature encoding processing on the original image to obtain a first feature map; inputting the first feature map into a first feature decoder of an adversarial attack network to perform first feature decoding processing to obtain an original noise feature map, and acquiring a second feature map of the original image according to the original noise feature map, wherein the second feature map refers to the image perturbation to be superimposed on the original image; inputting the first feature map into a second feature decoder of the adversarial attack network to perform second feature decoding processing to obtain a third feature map of the original image, and normalizing the image feature values at all positions on the third feature map, wherein the size of the third feature map is consistent with that of the original image, the positions on the third feature map have different feature values, and each feature value represents the importance of the image feature at the corresponding position; generating a noise image according to the second feature map and the third feature map; and superimposing the original image and the noise image to obtain a first adversarial sample.
2. The method according to claim 1, wherein performing feature encoding processing on the original image to obtain a first feature map comprises: inputting the original image into a feature encoder of the adversarial attack network for feature encoding processing to obtain the first feature map, wherein the size of the first feature map is smaller than that of the original image; the feature encoder comprises a convolution layer and residual blocks, wherein the residual blocks follow the convolution layer in connection order, each residual block comprises an identity mapping and at least two convolution layers, and the identity mapping of each residual block points from the input end of that residual block to its output end.
3. The method according to claim 1, wherein acquiring a second feature map of the original image according to the original noise feature map comprises: performing suppression processing on the noise feature values at all positions on the original noise feature map to obtain the second feature map, wherein the size of the second feature map is consistent with the size of the original image.
4. The method according to claim 3, wherein performing suppression processing on the noise feature values at all positions on the original noise feature map comprises: comparing the noise feature value at each position on the original noise feature map with a target threshold; and for any position on the original noise feature map, in response to the noise feature value at that position being larger than the target threshold, replacing the noise feature value at that position with the target threshold.
5. The method according to claim 1, wherein generating a noise image according to the second feature map and the third feature map comprises: performing position-wise multiplication on the second feature map and the third feature map to obtain the noise image.
6. The method according to any one of claims 1 to 5, wherein the adversarial attack network further comprises an image recognition model, the method further comprising: inputting the first adversarial sample into the image recognition model to obtain an image recognition result output by the image recognition model.
7. The method according to claim 6, wherein the training process of the adversarial attack network comprises: acquiring a second adversarial sample of a sample image included in a training dataset; inputting the sample image and the second adversarial sample together into the image recognition model for feature encoding processing to obtain feature data of the sample image and feature data of the second adversarial sample; constructing a first loss function and a second loss function, respectively, based on the feature data of the sample image and the feature data of the second adversarial sample; acquiring a third feature map of the sample image, wherein the positions on the third feature map of the sample image have different feature values, and each feature value represents the importance of the image feature at the corresponding position; constructing a third loss function based on the third feature map of the sample image; and performing end-to-end training based on the first loss function, the second loss function and the third loss function to obtain the adversarial attack network.
8. The method according to claim 7, wherein constructing a first loss function and a second loss function, respectively, based on the feature data of the sample image and the feature data of the second adversarial sample comprises: separating the feature angle of the sample image from the feature data of the sample image; separating the feature angle of the second adversarial sample from the feature data of the second adversarial sample; and constructing the first loss function based on the feature angle of the sample image and the feature angle of the second adversarial sample, wherein the optimization objective of the first loss function is to enlarge the feature angle between the sample image and the second adversarial sample.
9. The method according to claim 7, wherein constructing a first loss function and a second loss function, respectively, based on the feature data of the sample image and the feature data of the second adversarial sample comprises: separating the feature modulus of the sample image from the feature data of the sample image; separating the feature modulus of the second adversarial sample from the feature data of the second adversarial sample; and constructing the second loss function based on the feature modulus of the sample image and the feature modulus of the second adversarial sample, wherein the optimization objective of the second loss function is to reduce the difference between the feature moduli of the sample image and the second adversarial sample.
10. The method according to claim 7, wherein performing end-to-end training based on the first loss function, the second loss function and the third loss function to obtain the adversarial attack network comprises: obtaining a first sum value of the second loss function and the third loss function, and obtaining a product value of a target constant and the first sum value; and performing end-to-end training with a second sum value of the first sum value and the product value as the final loss function to obtain the adversarial attack network.
11. The method according to claim 6, wherein the first feature decoder and the second feature decoder of the adversarial attack network are identical in structure.
12. An image processing apparatus, characterized in that the apparatus comprises: an encoding module configured to acquire an original image and perform feature encoding processing on the original image to obtain a first feature map; a decoding module configured to acquire a second feature map and a third feature map of the original image according to the first feature map, wherein the second feature map refers to the image perturbation to be superimposed on the original image, the positions on the third feature map have different feature values, and each feature value represents the importance of the image feature at the corresponding position; a first processing module configured to generate a noise image according to the second feature map and the third feature map; and a second processing module configured to superimpose the original image and the noise image to obtain a first adversarial sample; wherein the decoding module comprises a first decoding unit and a second decoding unit; the first decoding unit is configured to input the first feature map into a first feature decoder of an adversarial attack network to perform first feature decoding processing to obtain an original noise feature map, and to acquire the second feature map of the original image according to the original noise feature map, wherein the first feature decoder comprises a deconvolution layer and a convolution layer; and the second decoding unit is configured to input the first feature map into a second feature decoder of the adversarial attack network to perform second feature decoding processing to obtain the third feature map of the original image, and to normalize the image feature values at all positions on the third feature map, wherein the size of the third feature map is consistent with that of the original image, the second feature decoder comprises a deconvolution layer and a convolution layer, and, for the second feature decoder, the convolution layer follows the deconvolution layer in connection order.
13. A computer device, characterized in that it comprises a processor and a memory in which at least one program code is stored, which is loaded and executed by the processor to implement the image processing method according to any of claims 1 to 11.
14. A computer readable storage medium, characterized in that at least one program code is stored in the storage medium, which is loaded and executed by a processor to implement the image processing method according to any one of claims 1 to 11.
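
The loss construction in claims 8 to 10 can be illustrated with a minimal sketch. It is not the patented implementation. The function names are hypothetical, the first loss is assumed to be the cosine of the feature angle (so that minimizing it enlarges the angle), the second loss is assumed to be the absolute difference of the feature moduli, and the third loss is passed in as a plain number because the claims do not state its form. The wording of claim 10 is ambiguous about which terms enter the final sum; the sketch assumes the first loss plus the target constant times the sum of the second and third losses.

```python
import math

def feature_modulus(v):
    """Modulus (Euclidean length) of a feature vector."""
    return math.sqrt(sum(x * x for x in v))

def feature_angle(u, v):
    """Angle in radians between two non-zero feature vectors."""
    cos = sum(a * b for a, b in zip(u, v)) / (feature_modulus(u) * feature_modulus(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))

def angle_loss(u, v):
    """Assumed first loss: cos(angle); it decreases as the angle grows."""
    return math.cos(feature_angle(u, v))

def modulus_loss(u, v):
    """Assumed second loss: absolute difference of the two moduli."""
    return abs(feature_modulus(u) - feature_modulus(v))

def final_loss(first, second, third, target_constant):
    """Assumed reading of claim 10: first + constant * (second + third)."""
    return first + target_constant * (second + third)
```

Separating angle and modulus in this way lets the attack rotate the feature vector of the adversarial sample away from that of the sample image while keeping its length close to the original.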

Description

Image processing method, device, equipment and storage medium

Technical Field

The present application relates to the field of artificial intelligence, and in particular to an image processing method, apparatus, device, and storage medium.

Background

Methods that exploit weaknesses of deep learning to defeat the recognition capability of an image recognition model are collectively called adversarial attacks: after noise that is difficult for the human eye to perceive is added to an image, the recognition task of a deep-learning-based image recognition model fails. In other words, the objective of an adversarial attack is to add a perturbation to the original image that is hard for the human eye to perceive, so that the recognition result output by the model is completely inconsistent with the actual class of the original image. An image to which noise has been added and which looks identical to the original image to the human eye is called an adversarial sample.

The related art uses search-based or optimization-based methods to mount adversarial attacks. Such methods involve multiple forward passes and gradient computations when generating an adversarial sample, in order to search a given search space for a perturbation that makes the recognition task of the image recognition model fail. As a result, generating a single adversarial sample takes a long time, and this time cost is unacceptable in scenarios with a large number of pictures, so timeliness is poor.

To solve this problem, methods based on generative adversarial networks have been proposed. However, training a generative adversarial network is a game between a single generator and a discriminator, which may make the generated perturbation unstable and, in turn, make the attack unstable.
Based on the above, an effective attack cannot currently be achieved, and how to perform image processing so as to generate a high-quality adversarial sample has become a problem to be solved by those skilled in the art.

Disclosure of Invention

The embodiments of the application provide an image processing method, apparatus, device and storage medium, which can generate a high-quality adversarial sample and thereby achieve a good attack effect. The technical scheme is as follows:

In one aspect, there is provided an image processing method, the method including:

acquiring an original image, and performing feature encoding processing on the original image to obtain a first feature map;

acquiring a second feature map and a third feature map of the original image according to the first feature map, wherein the second feature map refers to the image perturbation to be superimposed on the original image, the positions on the third feature map have different feature values, and each feature value represents the importance of the image feature at the corresponding position;

generating a noise image according to the second feature map and the third feature map; and

superimposing the original image and the noise image to obtain a first adversarial sample.
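
The last two steps of the method, together with the suppression, normalization and position-wise multiplication recited in the claims, can be sketched as follows. This is a minimal illustration on nested lists, not the patented implementation. The encoder and the two decoders are omitted, so the raw noise map and raw importance map are taken as given inputs. All function names are hypothetical, and min-max scaling is an assumption, since the source only says the third feature map is normalized.

```python
def suppress(noise_map, threshold):
    """Replace every noise value larger than the threshold with the threshold."""
    return [[min(v, threshold) for v in row] for row in noise_map]

def normalize(importance_map):
    """Min-max scale values to [0, 1] (assumed form of normalization)."""
    flat = [v for row in importance_map for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant map
    return [[(v - lo) / span for v in row] for row in importance_map]

def multiply(a, b):
    """Position-wise product of two maps of equal size."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def superimpose(image, noise):
    """Position-wise sum of the original image and the noise image."""
    return [[x + n for x, n in zip(ri, rn)] for ri, rn in zip(image, noise)]

def adversarial_sample(image, raw_noise, raw_importance, threshold):
    second_map = suppress(raw_noise, threshold)   # perturbation to add
    third_map = normalize(raw_importance)         # per-position importance
    noise_image = multiply(second_map, third_map)
    return superimpose(image, noise_image)
```

Because the importance values lie in [0, 1] after scaling, positions judged unimportant receive little or no perturbation, while the threshold caps the perturbation anywhere on the image. Note that the suppression as claimed only caps values above the threshold; values below it are left unchanged.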
In another aspect, there is provided an image processing apparatus including:

an encoding module configured to acquire an original image and perform feature encoding processing on the original image to obtain a first feature map;

a decoding module configured to acquire a second feature map and a third feature map of the original image according to the first feature map, wherein the second feature map refers to the image perturbation to be superimposed on the original image, the positions on the third feature map have different feature values, and each feature value represents the importance of the image feature at the corresponding position;

a first processing module configured to generate a noise image according to the second feature map and the third feature map; and

a second processing module configured to superimpose the original image and the noise image to obtain a first adversarial sample.

In some embodiments, the encoding module is configured to: input the original image into a feature encoder of an adversarial attack network to perform feature encoding processing to obtain the first feature map, wherein the size of the first feature map is smaller than that of the original image; the feature encoder comprises a convolution layer and residual blocks, wherein the residual blocks follow the convolution layer in connection order, each residual block comprises an identity mapping and at least two convolution layers, and the identity mapping of each residual block points from the input end of that residual block to its output end.

In some embodiments, the decoding module includes a first decoding unit configured to: inputting