CN-121509116-B - Unmanned aerial vehicle network intrusion detection multi-classification method based on knowledge distillation
Abstract
The invention provides a multi-classification method for unmanned aerial vehicle network intrusion detection based on knowledge distillation. The method first obtains a data set consisting of multiple groups of unmanned aerial vehicle network traffic data containing normal samples and several types of attack samples, extracts communication features from the data set, and constructs a directed multi-attribute graph G enhanced by the application-layer protocol to obtain a training graph and a test graph. A teacher model is trained on the training graph, and the trained teacher model performs forward inference on the training graph and the test graph to obtain a training embedding vector and a test embedding vector. The training embedding vector is then taken as the input of a student model, and knowledge distillation is realized by jointly optimizing a hard-label loss function and a soft-label loss function to obtain a trained student model. The trained student model performs inference with the test embedding vector as input and outputs a prediction label. The method obtains effective representations even without labeled sample types, and markedly reduces inference cost while preserving detection performance.
Inventors
- LI XUAN
- ZENG QIAN
- ZHOU TIANQING
- XIA HAIBIN
- LI GUANGHUI
Assignees
- 华东交通大学 (East China Jiaotong University)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-13
Claims (8)
- 1. The unmanned aerial vehicle network intrusion detection multi-classification method based on knowledge distillation is characterized by comprising the following steps: Step S1, acquiring a data set consisting of multiple groups of unmanned aerial vehicle network traffic data including normal samples and multiple types of attack samples; Step S2, extracting communication features from the data set, generating a corresponding edge feature vector for each piece of traffic data, assigning an auxiliary identifier and a sample type label, dividing the data set into a training set and a test set, modeling the traffic data in the training set and the test set as graphs by constructing a directed multi-attribute graph G enhanced by the application-layer protocol, and converting the interaction relations between communication entities into a graph structure to obtain a training graph and a test graph; Step S3, building a neural network model comprising a teacher model and a student model, wherein the teacher model is based on the edge-level unsupervised graph learning method DGI and uses the graph neural network E-GraphSAGE as an encoder for message passing and feature aggregation, and the student model is a multi-layer perceptron (MLP) formed by alternately cascading N fully connected layers and N-1 groups of optional normalization nonlinear units, where N ≥ 1; Step S4, training the teacher model with the training graph, and performing forward inference on the training graph and the test graph with the trained teacher model to obtain a training embedding vector and a test embedding vector respectively; the Step S4 comprises the following steps: Step S41, taking the training graph as input and treating it as the positive sample, creating a corrupted graph through a corruption function to serve as the negative sample, inputting the positive sample and the negative sample into the encoder, which outputs positive sample embeddings and negative sample embeddings; averaging all positive sample embeddings with a Readout function and compressing the average to the range (0, 1) through a Sigmoid activation function to generate a global graph summary S containing the semantics of the whole graph; Step S42, computing positive and negative sample scores with a bilinear scoring method through a discriminator D, namely obtaining the positive sample score as the dot-product similarity between the positive sample embeddings and the global graph summary S, and the negative sample score as the dot-product similarity between the negative sample embeddings and the global graph summary S; then feeding the positive sample scores with labels of value 1 and the negative sample scores with labels of value 0 into a binary cross-entropy loss function, which measures how closely the positive scores approach 1 and the negative scores approach 0, forming the discrimination loss; Step S43, back-propagating the discrimination loss and synchronously updating the weights of the discriminator D and the parameters of the encoder, completing the training and parameter optimization of the teacher model; Step S44, performing forward inference on the training graph and the test graph to obtain the training embedding vector and the test embedding vector respectively, transforming the training embedding vector through a linear mapping layer to obtain the teacher model's training logits, and using the training embedding vector for training the student model and the test embedding vector for inference by the student model; Step S5, in the training stage of the student model, the student model takes the training embedding vector output by the teacher model as input and is jointly optimized through a hard-label loss function and a soft-label loss function to obtain the trained student model; Step S6, performing inference with the trained student model, inputting the test embedding vector generated by the teacher model into the trained student model, outputting a test prediction label from the student model, comparing the test prediction label with the real labels in the test graph, and computing the index values for evaluating the network intrusion detection performance of the unmanned aerial vehicle.
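The contrastive mechanics of Steps S41-S43 (positive/negative embeddings, Readout with Sigmoid, dot-product scoring, binary cross-entropy) can be sketched in minimal pure Python. This is an illustrative stand-in, not the patent's implementation: the E-GraphSAGE encoder is replaced by an identity function, and all names and feature values are invented.

```python
# Illustrative sketch of the DGI-style objective in Steps S41-S43.
# `embed_edges` stands in for the E-GraphSAGE encoder (identity here).
import math, random

random.seed(0)

def embed_edges(edge_feats):
    # Encoder stand-in: identity, so the contrastive mechanics are visible
    # without a GNN library.
    return edge_feats

def corrupt(edge_feats):
    # Corruption function: shuffle features across edges to build negatives.
    shuffled = edge_feats[:]
    random.shuffle(shuffled)
    return shuffled

def readout(embeddings):
    # Readout: mean of all positive embeddings, squashed by a Sigmoid to
    # produce the global graph summary S with components in (0, 1).
    dim = len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]
    return [1.0 / (1.0 + math.exp(-m)) for m in mean]

def score(h, s):
    # Discriminator D: dot-product similarity (bilinear with identity weight).
    return sum(a * b for a, b in zip(h, s))

def bce(raw_score, label):
    # Binary cross-entropy on the sigmoid-squashed discriminator score.
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

pos = embed_edges([[1.2, -0.3], [0.8, 0.1], [1.0, -0.2]])  # invented features
neg = embed_edges(corrupt(pos))
S = readout(pos)

# Positive scores are pushed toward label 1, negative scores toward label 0.
loss = sum(bce(score(h, S), 1) for h in pos) + sum(bce(score(h, S), 0) for h in neg)
loss /= (len(pos) + len(neg))
print(round(loss, 4))
```

In a full implementation this discrimination loss would be back-propagated through both the discriminator and the encoder, as Step S43 describes.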
- 2. The method according to claim 1, wherein said Step S2 comprises the following steps: Step S21, first performing character serialization on the communication feature fields in the unmanned aerial vehicle network traffic data to ensure consistency when the field features serve as node identifiers of the attribute graph, and simultaneously replacing and filling infinite and missing values in the data set, replacing infinite values with 0 and filling missing values with 0, to ensure the stability of subsequent feature processing; wherein the communication features include a source address, a destination address, a source port, and a destination port; Step S22, grouping the traffic data in the data set by sample type label and randomly downsampling to reduce computing requirements, forming a training set and a test set divided at a ratio of 7:3; Step S23, converting the field features in the data set into numeric sequences through target encoding, scaling the numeric sequences with a normalization method, merging the multidimensional features of each piece of traffic data into an edge feature vector h, and storing the edge feature vector h in an attribute tag to obtain an original edge set; wherein the field features include an address protocol field, an application-layer protocol, a TCP (Transmission Control Protocol) flag accumulation value, a client TCP flag accumulation value, a server TCP flag accumulation value, a combined ICMP type and code value, an ICMP type value, a DNS query transaction identifier, a DNS query type, and an FTP client command return code; Step S24, before constructing the attribute graph, pairing source address nodes of original edges that share the same application-layer protocol value, adding auxiliary edges according to a preset rule for the potential communication relationships missing between pairs of source address nodes with no original edge, and obtaining an auxiliary edge set as a structural supplement to the original edge set, so as to enhance the structural integrity and semantic relevance of the unmanned aerial vehicle network traffic attribute graph; Step S25, assigning to each piece of traffic data in the data set an auxiliary identifier with value 0 or 1, where 0 indicates that the traffic data belongs to a normal sample and 1 indicates that it belongs to an attack sample; merging the original edge set and the auxiliary edge set, taking the source address and the destination address as the start and end points of the attribute graph, attaching the edge feature vector h, the sample type label, and the auxiliary identifier to each edge as edge attributes, modeling the attribute graph with the multigraph data structure of the graph computing library NetworkX, which allows multiple parallel edges between the same pair of nodes, to obtain an undirected attribute graph, converting the undirected attribute graph into a directed attribute graph with the to_directed function to reflect the directionality of communication, and finally calling the deep graph learning library DGL to convert the unmanned aerial vehicle network traffic data into a graph object that the library can process, obtaining the directed multi-attribute graph G enhanced by the application-layer protocol and thereby the training graph and the test graph; then assigning to the nodes in the directed multi-attribute graph G all-ones feature vectors whose dimension matches the edge feature dimension, so as to meet the input requirements of the encoder in the subsequent neural network model.
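The graph modeling of Step S25 can be sketched with the NetworkX API that the claim names (MultiGraph, to_directed). The flow records and feature values below are invented for illustration; the DGL conversion step is omitted.

```python
# Illustrative sketch of Step S25: flows -> undirected multigraph with edge
# attributes -> directed multigraph -> all-ones node features.
import networkx as nx

flows = [
    # (source address, destination address, edge feature vector h, label, aux id)
    ("10.0.0.1", "10.0.0.2", [0.1, 0.9], "normal", 0),
    ("10.0.0.1", "10.0.0.2", [0.7, 0.2], "dos",    1),  # parallel edge, same pair
    ("10.0.0.3", "10.0.0.2", [0.4, 0.4], "normal", 0),
]

# MultiGraph allows multiple parallel edges between the same pair of nodes.
g = nx.MultiGraph()
for src, dst, h, label, aux in flows:
    g.add_edge(src, dst, h=h, label=label, aux=aux)

# to_directed reflects communication directionality (each undirected edge
# becomes two directed edges).
dg = g.to_directed()

# Nodes get all-ones feature vectors matching the edge feature dimension,
# as required by the encoder input in Step S3.
feat_dim = len(flows[0][2])
for n in dg.nodes:
    dg.nodes[n]["feat"] = [1.0] * feat_dim

print(dg.number_of_nodes(), dg.number_of_edges())
```

In the patent's pipeline this directed multigraph would then be handed to DGL (e.g. via its NetworkX interop) to become the trainable graph object.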
- 3. The method according to claim 2, wherein in Step S3 the structure of the optional normalization nonlinear units in the student model is determined by a parameter norm_type: when norm_type=batch, every optional normalization nonlinear unit consists of a batch normalization operation, a ReLU activation function, and a regularization operation; when norm_type=layer, every unit consists of a layer normalization operation, a ReLU activation function, and a regularization operation; and when norm_type=none, every unit consists of a ReLU activation function and a regularization operation; in the student model, the first fully connected layer performs a linear transformation from the input dimension to the hidden dimension, the middle fully connected layers and normalization nonlinear units perform dimension-preserving linear transformations from the hidden dimension to the hidden dimension, and the last fully connected layer performs a linear transformation from the hidden dimension to the output dimension; after each fully connected layer except the last performs its linear transformation, the optional normalization layer, ReLU activation function, and regularization operation of an optional normalization nonlinear unit are executed in sequence, forming a normalization-to-nonlinearity-to-regularization order; the last layer performs only the linear transformation of the fully connected layer; the optional normalization layer, determined by the parameter norm_type, performs batch normalization, layer normalization, or no normalization.
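The layer layout of claim 3 can be made concrete with a small spec builder. This is a hypothetical sketch that lists the operation sequence as strings rather than instantiating a deep learning framework; `build_student_spec` and its argument names are invented.

```python
# Hypothetical sketch of the student-MLP layout from claim 3: N fully
# connected layers alternating with N-1 optional normalization nonlinear
# units, whose content depends on norm_type.
def build_student_spec(in_dim, hidden_dim, out_dim, n_layers, norm_type="batch"):
    norm = {"batch": ["BatchNorm"], "layer": ["LayerNorm"], "none": []}[norm_type]
    unit = norm + ["ReLU", "Dropout"]  # one optional normalization nonlinear unit
    if n_layers == 1:
        # Degenerate case N=1: a single linear map, no units.
        return [f"Linear({in_dim}->{out_dim})"]
    spec = [f"Linear({in_dim}->{hidden_dim})"]       # first layer: input -> hidden
    for _ in range(n_layers - 2):                    # middle layers keep hidden dim
        spec += unit + [f"Linear({hidden_dim}->{hidden_dim})"]
    spec += unit + [f"Linear({hidden_dim}->{out_dim})"]  # last layer: linear only
    return spec

print(build_student_spec(64, 128, 5, 3, norm_type="layer"))
```

Note the last fully connected layer is deliberately not followed by a unit, matching the claim's "the last layer performs only the linear transformation".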
- 4. The method according to claim 3, wherein said Step S5 comprises the following steps: the training embedding vector output by the teacher model serves as the input of the student model and is passed through the student model for training, which outputs the student model's training logits; the hard-label loss is computed from the student model's training logits and the real labels on the training graph; meanwhile, the probability distribution obtained by applying a Softmax activation function to the student model's training logits and the probability distribution obtained by applying the Softmax activation function to the teacher model's training logits are used to compute the soft-label loss, which measures the difference between the student model's output distribution and the teacher model's projected soft-label distribution and thereby constrains the student model; the parameters of the student model are updated through back-propagation with an Adam optimizer, and the student model is finally optimized through the combination of the hard-label loss function and the soft-label loss function, yielding the trained student model.
- 5. The method according to claim 4, wherein Step S6 comprises the following steps: taking the test embedding vector generated by the trained teacher model as input, performing inference with the trained student model, outputting the student model's test logits, converting the logits into a test probability distribution through a Softmax activation function, outputting the sample type label with the maximum probability as the test prediction label, comparing the real labels on the test graph with the converted test probability distribution, and evaluating the multi-classification performance of the model through accuracy, recall, and F1 score.
- 6. The method of claim 4, wherein when the student model is jointly optimized by the hard-label loss function and the soft-label loss function, the overall loss function total_loss is calculated by the following formula:
total_loss = α · loss_cls + (1 − α) · T² · loss_kd;
wherein loss_cls is the hard-label loss of knowledge distillation and adopts a cross-entropy loss function, loss_kd is the soft-label loss of knowledge distillation and adopts a KL divergence loss, α is the weight coefficient of the loss function, and T denotes the distillation temperature, used to balance the importance of the different loss terms; the cross-entropy loss function is computed as:
loss_cls = − Σ_{i=1}^{n} y_i · log( e^{Z_i} / Σ_{j=1}^{n} e^{Z_j} );
where i denotes the class of the sample, n is the total number of classes, Z_i is the unnormalized prediction score of class i, i.e. the raw score output by the network for class i before the Softmax activation function, j likewise indexes classes and Z_j denotes the unnormalized prediction score of class j, y_i is the hard-label indicator variable of the sample for class i, log(e^{Z_i} / Σ_j e^{Z_j}) is the log-probability that the model predicts class i, and e^{Z_i} / Σ_j e^{Z_j} is the Softmax activation function, which converts the unnormalized prediction scores into a probability distribution; the KL divergence loss is computed as:
loss_kd = Σ_{i=1}^{n} Q_i · log(Q_i / P_i);
wherein i denotes the class of the sample, n is the total number of classes, Q is the target distribution of the teacher model, and P is the prediction distribution of the student model; the teacher model's prediction probability for class i is:
Q_i = e^{t_i / T} / Σ_{j=1}^{n} e^{c_j / T};
wherein i and j denote classes, n is the total number of classes, t_i denotes the teacher model's raw score on class i, c_j denotes the teacher model's raw score on class j, and T denotes the distillation temperature; e^{·} denotes the exponential function with base e, e^{t_i / T} is the unnormalized weight obtained by scaling the raw score t_i by the temperature T and applying the exponential function, and e^{c_j / T} is the unnormalized weight obtained by scaling the raw score c_j by the temperature T and applying the exponential function; the student model's prediction probability for class i is:
P_i = e^{s_i / T} / Σ_{j=1}^{n} e^{s_j / T};
where i and j denote classes, n is the total number of classes, s_i denotes the student model's raw score on class i, s_j denotes the student model's raw score on class j, and T denotes the distillation temperature.
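The role of the distillation temperature T in the Q_i and P_i formulas of claim 6 can be checked numerically: dividing the raw scores by a larger T before the Softmax flattens the distribution, exposing more of the teacher's relative preferences over non-target classes. The scores below are invented.

```python
# Numeric check of temperature-scaled Softmax (claim 6): larger T flattens
# the teacher distribution Q. Raw scores are invented for illustration.
import math

def softened(scores, T):
    exps = [math.exp(s / T) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

teacher_scores = [4.0, 1.0, 0.0]
for T in (1.0, 4.0):
    Q = softened(teacher_scores, T)
    print(T, [round(q, 3) for q in Q])
```

At T = 1 the top class dominates; at T = 4 probability mass shifts toward the other classes while the ranking is preserved, which is the "soft label" signal the student distills.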
- 7. An unmanned aerial vehicle network intrusion detection device comprising one or more processors and a memory coupled to the one or more processors, the memory storing computer program code comprising computer instructions that the one or more processors invoke to implement the knowledge distillation-based unmanned aerial vehicle network intrusion detection multi-classification method of any one of claims 1-6.
- 8. A computer-readable storage medium storing instructions which, when executed by a processor, implement the knowledge distillation-based unmanned aerial vehicle network intrusion detection multi-classification method of any one of claims 1 to 6.
Description
Unmanned aerial vehicle network intrusion detection multi-classification method based on knowledge distillation
Technical Field
The application relates to the technical field of unmanned aerial vehicle network intrusion detection methods, in particular to an unmanned aerial vehicle network intrusion detection multi-classification method based on knowledge distillation.
Background
With the rise of the low-altitude economy, the application scale of unmanned aerial vehicles in fields such as urban logistics, environmental monitoring, emergency rescue, and public safety continues to expand, and the unmanned aerial vehicle communication network is the core support for low-altitude airspace operation. A low-altitude-economy unmanned aerial vehicle network is generally composed of aerial unmanned aerial vehicles, ground base stations, edge nodes, and a background control center; it has a complex communication link structure and, owing to its open operating environment, exhibits high mobility and frequent interaction. In such network environments, communication links and nodes are highly exposed attack targets, vulnerable to denial-of-service attacks, spoofing attacks, traffic hijacking, and other malicious intrusions. Once an attack occurs, it may cause unmanned aerial vehicle network service interruption, task interruption, data leakage, loss of aircraft control, and even a serious threat to public safety. Therefore, intrusion detection for the unmanned aerial vehicle network is not only a necessary means of guaranteeing communication security but also a foundational technology supporting the healthy development of the low-altitude economy. Existing unmanned aerial vehicle network intrusion detection methods mainly comprise traditional machine learning methods based on feature engineering and detection methods based on deep learning. The traditional machine learning methods based on feature engineering achieve anomaly recognition by manually extracting traffic statistical features or system behavior indicators and combining them with classification models such as support vector machines, random forests, and K-nearest neighbors. The detection methods based on deep learning adopt models such as autoencoders, LSTM, CNN, and GNN; they can automatically extract high-dimensional features and capture nonlinear attack patterns, and are particularly strong at modeling spatio-temporal dependence and graph structure information. However, the existing methods have the following disadvantages: (1) Existing deep learning models can achieve high detection accuracy in conventional network environments, but their parameter counts and floating-point operation volumes are huge, significantly exceeding the computing capability of unmanned aerial vehicle onboard embedded platforms. Such models often depend on external GPU or cloud computing resources, so unmanned aerial vehicle terminals with limited power, storage, and payload can hardly deploy them directly, and the unmanned aerial vehicle network's requirements for lightweight, low-latency inference cannot be met. (2) Most existing methods abstract network traffic or host behavior into independent feature vectors, lacking the ability to model inter-node structural dependence and association patterns; they struggle to reflect the global and local spatial features of an unmanned aerial vehicle network, including point-to-point interaction, multi-node cooperation, and link dependence, which limits model accuracy and generalization. (3) Existing models often rely on large-scale, high-quality labeled data for training, but in a real unmanned aerial vehicle network, attack samples are scarce and hard to label in time; real attack behaviors are concealed, diverse, and dynamically evolving, and cannot be fully covered by a small number of labels, so existing models underperform against emerging threats and are prone to missed detections and poor adaptability to unknown attacks.
Disclosure of Invention
The invention aims to provide an unmanned aerial vehicle network intrusion detection multi-classification method based on knowledge distillation, which uses the knowledge distillation technique together with an edge-level unsupervised graph learning method to guarantee intrusion detection accuracy while reducing the computational complexity of the inference stage and improving the model's running efficiency for real-time detection in an unmanned aerial vehicle network environment. The invention provides a knowledge distillation-based unmanned aerial vehicle network intrusion detection multi-classification method, which comprises the following steps: Step