CN-121998027-A - Black-box adversarial attack method based on proxy-function optimization and feature-probability diffusion
Abstract
The invention belongs to the technical field of black-box adversarial attacks, and discloses a black-box adversarial attack method based on proxy-function optimization and feature-probability diffusion. It addresses the problems that traditional adversarial attacks cannot perturb machine-learning-based network intrusion detection systems and that black-box adversarial-example attacks require many queries. A method that combines adversarial examples, reinforcement learning, and a diffusion model is designed to address high computation cost, limited universality of adversarial examples, and related problems. An indirect optimization framework is realized, and a trainable neural network is introduced to generate the adversarial perturbation. By back-propagating the proxy loss function J, the parameters of the neural network can be updated indirectly. This mechanism converts a non-differentiable black-box optimization problem into a trainable end-to-end neural-network optimization problem, thereby realizing indirect optimization of the adversarial perturbation.
Inventors
- Ye Xiaoming
- Kong Tenglong
- Ou Lujin
- Li Xin
Assignees
- Chengdu University of Information Technology (成都信息工程大学)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-21
Claims (10)
- 1. A black-box adversarial attack method based on proxy-function optimization and feature-probability diffusion, characterized in that the method generates adversarial perturbations with a low query count, without access to the internal parameters of the attacked model, through the synergy of a reinforcement-learning perturbation-generation mechanism driven by a proxy loss function and a feature-probability diffusion constraint mechanism, and comprises the following steps: constructing an attack environment based only on black-box output feedback, and mapping an original sample to an initial adversarial state; in the black-box attack environment, constructing an attack objective function that jointly comprises a confidence constraint and a sample-similarity constraint, and introducing a time variable so that the objective function updates itself and the number of queries required per unit of perturbation is reduced; introducing a proxy loss function that converts the gradient-free feedback of the black-box attack objective function into an optimizable gradient signal; and, driven by the proxy loss function, iteratively updating the perturbation-generation process through a reinforcement-learning algorithm, so that the perturbation vector gradually approaches the misclassification boundary while satisfying the similarity constraint, thereby completing the black-box adversarial attack.
- 2. The black-box adversarial attack method of claim 1, wherein the attack environment is constructed by defining a state space and an action space, wherein the state space is characterized by the current adversarial example, the action space is characterized by perturbation parameters output by a perturbation-generation function, and the initial state is the initial adversarial state mapped from the original sample.
- 3. The black-box adversarial attack method of claim 1, wherein the attack objective function is constructed by jointly formulating a classification-confidence constraint on the adversarial example and a similarity constraint with the input sample, and the weights of the objective function are dynamically adjusted by introducing a time variable, so that the early optimization stage focuses on misclassification exploration and the later optimization stage focuses on perturbation convergence.
- 4. The black-box adversarial attack method of claim 1, wherein the proxy loss function is configured to map changes in the classification result of the black-box model to a continuous, differentiable optimization signal, and serves as the unified optimization basis for the policy network and the value network in the reinforcement-learning algorithm.
- 5. A perturbation-generation method for black-box adversarial attacks, characterized in that iterative optimization of the perturbation vector is realized through a perturbation-generation function based on probabilistic modeling and a policy-update mechanism guided by a proxy loss function, comprising: initializing a policy network and a value network based on a state space and an action space; encoding the current adversarial state with the policy network and outputting the probability parameters of the perturbation distribution; sampling a perturbation vector according to the probability parameters, and applying the sampled perturbation to the current adversarial state to obtain the next state; and, after the state sequence reaches a terminal state, updating the parameters of the policy network and the value network based on the proxy loss function, so that adaptive optimization of the perturbation-generation function is realized without relying on gradients of the attacked model.
- 6. The perturbation-generation method of claim 5, wherein the perturbation-generation function models the perturbation vector with a probability distribution, and the policy network outputs a mean vector and a standard-deviation vector matching the perturbation dimensionality, constructing a probability distribution whose feature dimensions are mutually independent.
- 7. The perturbation-generation method of claim 5, wherein the policy network and the value network are jointly optimized by the same proxy loss function, so that the perturbation sampling distribution gradually diffuses toward regions of high misclassification probability as black-box feedback accumulates.
- 8. A universal adversarial perturbation generation method based on feature-probability diffusion, characterized in that the feature distributions of multiple perturbation samples are jointly learned through a conditional diffusion modeling mechanism to generate a transferable universal adversarial perturbation, the method comprising: constructing a perturbation-sample data set, and introducing a time variable and a noise variable into a diffusion model; adding noise to the perturbation samples over multiple steps through a forward diffusion process, so that their distribution gradually approaches the standard normal distribution; training a denoising neural network in the reverse diffusion process to predict and remove the noise step by step, so as to recover a universal perturbation that fuses multiple perturbation characteristics; and embedding attack-condition constraints in the denoising process, so that the generated universal perturbation satisfies a preset misclassification requirement on the black-box model.
- 9. The universal adversarial perturbation generation method of claim 8, wherein the attack condition constrains the misclassification rate of the universal perturbation on the black-box model, and when the misclassification rate does not reach the preset requirement, the denoising neural network is guided to converge toward stronger perturbations by strengthening the condition constraint.
- 10. The universal adversarial perturbation generation method of claim 8, wherein the generated universal perturbation is superimposed on the original sample to form an adversarial example, and the classification result of the black-box model is used as a conditional feedback signal to realize closed-loop optimization of the universal perturbation generation process.
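The time-weighted attack objective described in claims 1 and 3 can be sketched as follows. This is a minimal illustration, not the patented formula: the linear weight `w = t/T`, the name `attack_objective`, and the linear combination of the two constraints are all assumptions chosen for clarity.

```python
def attack_objective(confidence, similarity, t, T):
    """Hypothetical time-weighted black-box attack objective (sketch).

    confidence : attacked model's confidence in the true class (lower is better)
    similarity : similarity between adversarial and original sample in [0, 1]
    t, T       : current step and total query budget

    The time-varying weight shifts the objective from misclassification
    exploration (early) to perturbation convergence (late), as claim 3
    describes.  Lower values of the objective are better for the attacker.
    """
    w = t / T  # illustrative schedule: 0 at the start, 1 at the end
    return (1.0 - w) * confidence + w * (1.0 - similarity)
```

Early in the budget the confidence term dominates (misclassification exploration); at the end the similarity term dominates (perturbation convergence), matching the two-stage behavior claim 3 describes.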
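Claims 5 and 6 describe a policy network that outputs per-dimension mean and standard-deviation vectors from which the perturbation is sampled. A minimal sketch of that sampling step, assuming a diagonal Gaussian and an L-infinity clipping budget (the `eps` clipping and all names here are illustrative assumptions, not taken from the patent):

```python
import math
import random

def sample_perturbation(mu, sigma, eps, rng):
    """Draw one perturbation vector from a diagonal Gaussian whose
    parameters stand in for the policy network's two output heads
    (claim 6), then clip each dimension to the assumed L-inf budget eps."""
    delta = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    return [max(-eps, min(eps, d)) for d in delta]

def gaussian_log_prob(delta, mu, sigma):
    """Log-density of a perturbation under the same diagonal Gaussian.
    A policy-gradient update (the proxy-loss-driven update of claim 5)
    would weight this quantity by the black-box reward signal."""
    return sum(-0.5 * math.log(2.0 * math.pi * s * s)
               - (d - m) ** 2 / (2.0 * s * s)
               for d, m, s in zip(delta, mu, sigma))
```

Because the log-probability is differentiable in `mu` and `sigma`, the non-differentiable black-box feedback can still drive gradient updates of the policy parameters, which is the role claim 4 assigns to the proxy loss function.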
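The forward noising process of claim 8 can be sketched with the standard DDPM closed form; the linear beta schedule and its endpoint values below are conventional defaults assumed for illustration, not values from the patent.

```python
import math
import random

def alpha_bar(t, T, beta_min=1e-4, beta_max=0.02):
    """Cumulative signal coefficient of a linear beta schedule.
    As t approaches T this product approaches 0, so x_t approaches the
    standard normal distribution, as the forward process in claim 8
    requires."""
    prod = 1.0
    for i in range(t + 1):
        beta_i = beta_min + (beta_max - beta_min) * i / (T - 1)
        prod *= 1.0 - beta_i
    return prod

def forward_diffuse(x0, t, T, rng):
    """Sample x_t ~ q(x_t | x_0) in one closed-form step: scale the
    perturbation sample x0 and mix in independent Gaussian noise."""
    ab = alpha_bar(t, T)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for x in x0]
```

The reverse process of claim 8 would train a denoising network to invert these steps, with the attack condition of claims 9 and 10 injected as a conditioning input during denoising.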
Description
Black-box adversarial attack method based on proxy-function optimization and feature-probability diffusion
Technical Field
The invention belongs to the technical field of black-box adversarial attacks, and particularly relates to a black-box adversarial attack method based on proxy-function optimization and feature-probability diffusion.
Background
In recent years, artificial intelligence (AI) has evolved rapidly and has become an important pillar of the digital economy. As the degree of integration and automation increases, more and more enterprises adopt AI tools to improve working efficiency. Despite the many benefits, deep learning suffers from drawbacks of the algorithms or models themselves, such as poor interpretability and unpredictability. Thus, the potential AI security problems caused by defects of the AI system itself, including adversarial example attacks, model backdoor attacks, and data-poisoning attacks, may pose a serious threat to AI systems. In the age of network communication, network traffic is an important means of communication. An attacker can access a host through malicious traffic to perform malicious activities, causing significant economic loss. Conventional malicious-traffic attacks carefully construct packet contents to bypass an intrusion detector by exploiting code vulnerabilities present in the detector. This attack approach is called an evasion attack; however, conventional evasion attacks largely fail against today's deep-learning-based intrusion detection systems. Current mainstream attacks fall into three modes: attacks based on feature space, attacks based on packet space, and attacks based on flow space.
Attacks based on packet space and flow space can directly modify the contents of network traffic packets, but they cannot guarantee that the malicious traffic retains its communication and malicious functions after modification. Moreover, because flow- or packet-based intrusion detection systems must process millions of packets and terabytes of data streams, their cost is high and their detection efficiency is low, so the intrusion detection systems commercially deployed today are mostly feature-space based. Adversarial example attacks are classified into white-box and black-box attacks according to the attack environment. White-box attacks require full knowledge of the target classifier, lack practicality, and can only serve as theoretical studies. Black-box attack algorithms are classified into attacks based on transfer learning and attacks based on queries. The core of transfer-learning-based attacks is to construct a substitute model with a decision boundary similar to that of the target model, attack the substitute model, and then attack the target model through the transferability of the adversarial examples; such attacks rely on similarity between models, cannot solve cross-architecture transfer, and require training data with the same distribution as the target model, which is difficult to obtain. Query-based black-box attack algorithms suffer from many queries, high computational complexity, and dependence on gradient estimation; under a low query budget they converge slowly and transfer poorly.
Through the above analysis, the problems and defects in the prior art are as follows: (1) In practical deployments, most intrusion detection systems are feature-space based, and traditional evasion attacks are inefficient and easy to detect. (2) Adversarial example attacks based on packets and data streams must process large volumes of data, and since most current commercial detection systems operate on feature space, attacks on packets and streams are similar in nature to traditional evasion attacks and are easily identified by feature-space-based intrusion detection systems. (3) Transfer-learning-based attacks depend on a surrogate model that requires training data of the same distribution, which is difficult to acquire. Query-based black-box attacks require too many queries, and most are computationally complex and slow to converge because of gradient-estimation algorithms.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a black-box adversarial attack method based on proxy-function optimization and feature-probability diffusion. The invention is realized in such a way that the black-box adversarial attack method based on proxy-function optimization and feature-probability diffusion