CN-122019258-A - Neural network fault tolerance method based on model vulnerability awareness
Abstract
The invention discloses a neural network fault tolerance method based on model vulnerability awareness. The method first deploys a pre-trained quantized neural network model and sets the experimental hyperparameters; it then feeds the original data set into a position-sensitive search algorithm and outputs an iteratively reinforced pre-trained quantized model. Finally, the reinforced model is deployed, and a key-position weight protection mechanism is applied during inference to achieve fault-tolerant protection of the neural network. The method has low fault-tolerance overhead and fine protection granularity, avoiding the heavy resource consumption of protecting the model as a whole. Its iterative reinforcement strategy adapts to structural changes in the model, ensures that each reinforcement targets the currently weakest position, and achieves the best fault-tolerant effect at minimal cost. It is particularly suitable for deep neural network models containing residual blocks.
Inventors
- Zhan Jinyu
- Yan Wenjie
- Pan Liaolei
- Jiang Wei
- Li Tianyuan
- Quan Boran
- Zhang Zihan
- Cai Peizhi
- Dou Jiaxiang
- Wang Sen
Assignees
- University of Electronic Science and Technology of China (电子科技大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-02-24
Claims (3)
- 1. A neural network fault tolerance method based on model vulnerability awareness, comprising the following specific steps: S1, deploying a pre-trained quantized neural network model and setting the experimental hyperparameters; the pre-trained quantized neural network model comprises the model layers, the original data set is divided into a training data set and a validation data set according to actual conditions, and the experimental hyperparameters comprise a target loss threshold τ, an error injection rate p, and a maximum number of reinforcement iterations T_max; S2, based on the model of step S1, inputting the original data set, executing a position-sensitive search algorithm, and outputting an iteratively reinforced pre-trained quantized neural network model; specifically, the original data set is normalized and input into the pre-trained quantized model, vulnerability analysis is first performed to obtain a vulnerable weight set, simulated bit-flip attacks are applied to the vulnerable weights in an iterative manner to identify the key feature layers, the model is reinforced until the fault-tolerance performance requirement is met, and the reinforced pre-trained quantized model is output; wherein vulnerability analysis injects errors into the key weight positions of each model layer and identifies the weight positions most sensitive to errors, i.e. the most fragile layer, and model reinforcement iteratively reinforces the key feature layers corresponding to that most fragile layer; S3, based on step S2, deploying and applying the reinforced pre-trained quantized neural network model, and adopting a key-position weight protection mechanism during inference to realize fault-tolerant protection of the neural network; the key-position weight protection mechanism uses the redundant feature layers established in the reinforcement stage to detect and mitigate potential bit-flip attacks at inference time: when a bit-flip error occurs at a key weight position, the entropy values of the redundant feature layer and the original key feature layer are compared, the correct model weights are judged and selected, and the weights with the smaller entropy value are output, thereby realizing fault-tolerant protection of the neural network.
- 2. The neural network fault tolerance method based on model vulnerability awareness of claim 1, wherein the step S2 specifically comprises the following steps: S21, initializing the pre-trained quantized neural network model: an iteration counter t = 0 is initialized, the original pre-trained quantized model M_0 is evaluated on the validation data set to obtain the baseline loss L_0, and the target loss threshold τ is set as τ = L_0 × (1 + δ), wherein δ is a tolerance expressed as a percentage of the baseline loss; S22, based on step S21, performing vulnerability analysis and simulated bit-flip attacks to identify the most fragile layer: first, the training data set is input into the pre-trained quantized model and a sensitivity score is computed for every weight of every model layer, i.e. the gradient is obtained by back propagation and the sensitivity score of each weight is the magnitude of the product of its gradient and its value, s_i = |w_i × ∂L/∂w_i|, wherein s_i is the sensitivity score of the i-th weight, w_i is the value of the i-th weight, w_i ∈ W_l, W_l is the weight set of the l-th model layer, and ∂L/∂w_i is the gradient of the loss with respect to the i-th weight; the sensitivity scores are then sorted in descending order, and with N_l denoting the number of weights of model layer l, k = ⌈p × N_l⌉ is computed from the error injection rate p and the k weights with the highest sensitivity scores form the vulnerable weight set W_v; a simulated bit-flip attack is then performed on each model layer to identify the most vulnerable layer: for the l-th model layer, errors are injected only into the weights of its vulnerable weight set W_v, the target weights are bit-flipped while all other layers are kept unchanged, w'_i = BitFlip(w_i), wherein w'_i is the flipped weight and BitFlip(·) is the bit-flip operation producing the erroneous perturbation; the loss L_l of the perturbed model on the validation set is then evaluated, the loss variation ΔL_l = L_l − L_0 is computed, and the model is restored; finally, the layer exhibiting the most significant performance degradation is identified as the most fragile layer, l* = argmax_l ΔL_l; S23, based on the most fragile layer identified in step S22, iteratively reinforcing the key feature layers corresponding to the most fragile layer and outputting the reinforced pre-trained quantized neural network model: first, the mapping between the key weights of the most fragile layer and the feature maps is tracked, the most fragile layer l* is identified, and its vulnerable weight set W_v is obtained together with the set F of key feature layers in which those weights reside; the key weights of the most fragile layer are the weights most sensitive to errors in that layer, fully connected layers and convolutional layers comprise several feature layers, and the key feature layers are the one or more feature layers of model layer l* that contain the weights of W_v; the identified key feature layers are then reinforced by creating a copy f' of each key feature layer f in F, aggregating it with the original feature layer weights via an additional shortcut connection, and updating the model state; with the initial model state denoted M_0, the model after the t-th iteration is M_t = M_{t−1} ∪ F'_t, wherein F is the key feature layer set of M_{t−1} and F'_t is the reinforced feature layer set added at iteration t; after M_t is obtained, the fault tolerance of the reinforced model is evaluated, i.e. after the t-th iteration errors are injected into M_t and the perturbed loss L(M_t) is assessed; finally, whether the fault-tolerance performance requirement is met is judged: if L(M_t) ≤ τ, the iteration terminates and the current model M_t is output; otherwise the iteration counter is updated, t = t + 1, and if t ≤ T_max the method returns to step S22 for the next iteration, while if t > T_max the iteration ends and the current model is output; wherein T_max is the predefined upper bound on the number of iterations, and the objective function finds the minimum number of iterations t* = min { t : L(M_t) ≤ τ, 0 ≤ t ≤ T_max }, wherein L(M_t) is the loss value of M_t obtained after injecting the erroneous perturbations.
- 3. The neural network fault tolerance method based on model vulnerability awareness according to claim 1, wherein in the step S3 the key-position weight protection mechanism is specifically as follows: based on the reinforced pre-trained quantized neural network model obtained in step S2, the weight values of the reinforced key feature layer set F' and of the original key feature layer set F are discretized and their information entropies computed; the corresponding weight sets are denoted W' and W, and the total number of weights of a set is denoted N; the weight values are discretized into K intervals, B_j denoting the j-th interval, and the probability distribution is computed as p_j = n_j / N, wherein n_j is the number of weights falling into the j-th interval, p_j is the proportion of the weights in the j-th interval to the total number of weights, and N is the total weight count of the key feature layer set; the information entropy is then computed as H = −Σ_{j=1}^{K} p_j log2 p_j; the entropies H(F') of the reinforced key feature layer set and H(F) of the original key feature layer set are computed respectively, and the entropy difference is defined and compared as ΔH = H(F') − H(F); finally, the corresponding operation is executed according to the entropy difference judgment, i.e. whether ΔH equals 0 determines whether an attack has occurred: when ΔH = 0, no error is considered to have occurred and the output of the original key feature layer F is adopted directly; when ΔH ≠ 0, the feature map is considered damaged by a bit-flip attack, the redundancy mechanism is adopted to mitigate the error, and the weights with the smaller entropy value are output.
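The iterative position-sensitive search described in the claims (steps S21–S23) can be sketched as the following loop. This is a minimal illustration, not the patented implementation: `analyze`, `reinforce`, and `eval_loss` are hypothetical callables standing in for vulnerability analysis, key-feature-layer duplication, and perturbed-loss evaluation, and the demo loss model is purely synthetic.

```python
def iterative_reinforcement(model, analyze, reinforce, eval_loss, tau, t_max):
    """Harden the currently most fragile layer on each iteration until the
    perturbed loss falls to the target threshold tau or t_max is reached."""
    t = 0
    for t in range(1, t_max + 1):
        weakest = analyze(model)            # S22: identify the most fragile layer
        model = reinforce(model, weakest)   # S23: duplicate its key feature layer
        if eval_loss(model) <= tau:         # fault-tolerance requirement met
            break
    return model, t

# Toy demo: each reinforcement adds one redundant copy; the perturbed loss
# is modeled (purely for illustration) as 1 / (1 + number of copies).
model = {"copies": 0}
analyze = lambda m: "layer0"
def reinforce(m, layer):
    return {"copies": m["copies"] + 1}
eval_loss = lambda m: 1.0 / (1 + m["copies"])

hardened, t = iterative_reinforcement(model, analyze, reinforce, eval_loss,
                                      tau=0.3, t_max=5)
```

With these toy stand-ins the loop stops as soon as the modeled loss drops below the threshold, mirroring the termination condition L(M_t) ≤ τ of claim 2.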
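The sensitivity scoring of step S22 ranks each weight by the magnitude of its gradient–weight product and keeps the top fraction given by the error injection rate p. A minimal NumPy sketch (function and variable names are illustrative, not from the patent):

```python
import numpy as np

def vulnerable_weight_set(weights, grads, p):
    """Score each weight as s_i = |w_i * dL/dw_i| and return the indices of
    the top ceil(p * N) weights, i.e. a layer's vulnerable weight set."""
    scores = np.abs(weights * grads)
    k = max(1, int(np.ceil(p * weights.size)))
    return np.argsort(scores)[::-1][:k]   # indices sorted by descending score

# Example: a layer with 4 weights, keeping the top 50% by sensitivity.
w = np.array([0.5, -2.0, 0.1, 1.0])
g = np.array([0.2, 0.5, 3.0, 0.01])      # gradients of the loss w.r.t. w
idx = vulnerable_weight_set(w, g, p=0.5)
```

Here the scores are [0.1, 1.0, 0.3, 0.01], so the two most sensitive positions (indices 1 and 2) form the vulnerable set that the simulated bit-flip attack would target.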
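The key-position weight protection of claim 3 can be illustrated as follows: discretize the weights of the original and redundant feature layers into K intervals, compute the Shannon entropy of each resulting distribution, and output the copy with the smaller entropy when the entropy difference is nonzero. This sketch assumes, as the claim implies, that a bit flip raises the entropy of the affected layer; the bin count and names are illustrative.

```python
import numpy as np

def weight_entropy(w, k=16):
    """Discretize weights into k intervals and return the Shannon entropy
    H = -sum(p_j * log2(p_j)) of the resulting distribution."""
    counts, _ = np.histogram(w, bins=k)
    p = counts / counts.sum()
    p = p[p > 0]                          # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def protected_output(original, redundant, k=16):
    """Delta-H == 0: no attack detected, pass the original layer through;
    otherwise output whichever copy has the smaller entropy."""
    h_o, h_r = weight_entropy(original, k), weight_entropy(redundant, k)
    if h_r - h_o == 0.0:
        return original
    return original if h_o < h_r else redundant

# A clean layer and a copy corrupted by a simulated bit flip.
clean = np.full(8, 0.1)
corrupted = clean.copy()
corrupted[3] = 100.0                      # one flipped bit, large deviation
out = protected_output(clean, corrupted)
```

The corrupted copy spreads its weights over two histogram bins and so has higher entropy; the mechanism therefore selects the clean layer.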
Description
Neural network fault tolerance method based on model vulnerability awareness Technical Field The invention belongs to the technical field of neural network security, and particularly relates to a neural network fault tolerance method based on model vulnerability sensing. Background Bit flipping generally refers to unexpected flipping of binary bits of weight parameters, bias terms, and intermediate activation values in a neural network, which may be driven by cosmic ray radiation, electromagnetic environment interference, aging of hardware devices, and the like, thereby generating abnormal value deviations. Especially in deep neural network architecture, the bias has a multi-level amplification effect, namely, after abnormal output of a single neuron is subjected to linear transformation and nonlinear activation of a subsequent network layer, numerical distribution balance of the whole network can be quickly destroyed, and network function complete failure is caused again. Such as misjudging road conditions by an automatic driving system, outputting wrong instructions by an industrial controller, and the like. Residual blocks are currently widely used in mainstream neural networks. Residual connection effectively relieves the problem of gradient disappearance or gradient explosion which plagues deep network training by establishing shortcut connection which bypasses one or more convolution layers or full connection layers, promotes the propagation of gradient signals in the whole network, and enables the model to learn identity mapping or residual functions more easily. However, due to the inherent nature of the residual connection, errors occurring in the single layer can have a more severe impact on subsequent results. Thus, the residual connection is extremely vulnerable to bit flipping attacks. 
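The amplification described above starts from a single flipped bit. As a hedged illustration (assuming int8 two's-complement quantized weights, which the "quantized" model of the claims suggests but does not specify), flipping the most significant bit turns a small weight into a large negative one:

```python
def flip_bit(w, k):
    """Flip bit k (0..7) of an int8 two's-complement weight w in -128..127."""
    u = (w & 0xFF) ^ (1 << k)   # view as an unsigned byte, flip bit k
    return u - 256 if u >= 128 else u

small = flip_bit(3, 0)   # low bit: 3 -> 2, a tiny deviation
large = flip_bit(3, 7)   # sign/MSB: 3 -> -125, a huge deviation
```

Flipping the same bit twice restores the original value, which is why the simulated attack of the claims can "restore the model" after each injection.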
At the same time, as the number of residual blocks increases, this gap becomes more pronounced: for example, the accuracy of a ResNet containing more residual blocks drops by a greater magnitude than that of ResNet-34, indicating that the vulnerability of residual connections to bit-flip attacks is positively correlated with the size of the residual structure. Currently there are mainly the following ways to deal with such error attacks: (1) Error detection and correction (EDC) mainly uses coding theory and logic circuit design techniques, adding redundant check bits, such as parity codes, Hamming codes, and cyclic redundancy check (CRC) codes, during data storage or transmission to monitor and repair bit-flip errors in real time. (2) Leaky ReLU improves the negative input interval of the standard ReLU function: instead of completely truncating negative inputs, it assigns them a very small fixed slope, i.e. when the input is less than 0 the output is the product of the input and the slope (for example, for an input of -5 and a slope of 0.01 the output is -5 × 0.01 = -0.05), and when the input is greater than or equal to 0 the output equals the input. This design alleviates the dead-neuron problem caused by the standard ReLU outputting 0 for negative inputs over long periods, so that neurons in the negative input interval can still receive weak gradient updates, improving the stability of network training.
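The Leaky ReLU behavior described in (2) amounts to the following one-line sketch:

```python
def leaky_relu(x, slope=0.01):
    """Standard ReLU truncates negatives to 0; Leaky ReLU keeps a small slope."""
    return x if x >= 0 else slope * x

y_neg = leaky_relu(-5.0)   # -5 * 0.01, the worked example from the text
y_pos = leaky_relu(2.0)    # positive inputs pass through unchanged
```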
(3) Hardware redundancy design mainly uses multi-copy backup and voting logic, deploying several identical hardware computing modules, such as triple modular redundancy (TMR) and dual modular redundancy (DMR), that execute the same neural network computation simultaneously. If a module produces an abnormal output due to a bit flip during computation, the correct result is selected by majority voting, while the abnormal module is flagged and a fault recovery mechanism (such as restarting or switching to a standby module) is triggered. In summary, some of the above techniques sacrifice computational efficiency to improve fault tolerance: hardware redundancy requires multi-module parallel computation and increases latency, complex EDC coding rules prolong data processing time, and parameter regularization adds computation during training, slowing model training. Triple modular redundancy suits scenarios with high reliability requirements but cannot meet the low-power requirements of edge devices, and a general fault-tolerant scheme covering multiple scenarios is lacking. Therefore, a method is needed that ranks the fragility of each model layer, strengthens the most fragile weight positions, and greatly reduces the overhead in time and space. Disclosure of Invention In order to solve the above technical problems, the invention provides a neural network fault tolerance method based on model vulnerability awareness.