CN-116975714-B - Improved method of an ALBERT model applied to encrypted traffic classification
Abstract
The invention discloses an improved method of an ALBERT model applied to encrypted traffic classification, belonging to the field of network encrypted traffic classification. First, network flow packets are captured and, after preprocessing, converted into the standard input format required by the ALBERT model. The Self-Attention layer and the FFN layer of the ALBERT model are then improved to obtain an improved ALBERT model. Using the standard ALBERT inputs, the improved model is pre-trained without supervision by the MLM (masked language modeling) method; after the unsupervised pre-training is completed, the model is fine-tuned with class-label data, the mask shielding layer is removed, and the final ALBERT model is saved. The invention effectively reduces resource consumption while maintaining stable accuracy, and has broad application prospects.
Inventors
- Niu Songpeng
- Dong Jianwu
- Suo Shuai
- Cai Ronghua
- Liu Gang
Assignees
- 北京赛思信安技术股份有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2023-07-27
Claims (3)
- 1. An improved method of an ALBERT model applied to encrypted traffic classification, characterized by comprising the following specific steps (illustrative sketches of the preprocessing and of the mask shielding layer follow the claims):
  Step one: capture network flow packets and, after preprocessing, convert them into the standard input format required by the ALBERT model, to be used as the training set.
  The preprocessing of a network flow packet comprises the following steps: first, parse the packet and extract the SSL handshake data and the transmission data; then remove the Ethernet header and the IP and TCP port information from the SSL handshake data and the transmission data, convert each byte value into its two-character hexadecimal representation, and generate two character strings, denoted A and B respectively; finally, divide A and B from left to right into segments of 4 characters each to generate two token sequences, combine them, and check whether the length of the combined sequence meets the standard ALBERT input length s; when the length is less than s, pad with zeros at the end.
  Writing the two token sequences as A = a_1 a_2 ... a_p and B = b_1 b_2 ... b_q, the converted ALBERT standard input is [CLS] a_1 ... a_p [SEP] b_1 ... b_q [SEP], where [CLS] and [SEP] are the ALBERT classification and separator tokens.
  Step two: improve the Self-Attention layer and the FFN layer in the ALBERT model to obtain an improved ALBERT model.
  The Self-Attention layer is the layer of mutual self-attention among tokens; the improvement is to add a mask shielding layer to the attention matrix, where the mask shielding layer shields the SSL handshake part or the file data transmission part so that the parts other than the shielded part attend to the global tokens.
  The improvement to the FFN layer is to reduce the number of output nodes of one layer of the original FFN and then add a new neural network, where the new network takes the outputs of the reduced layer as its input, its number of output nodes equals the number of nodes removed, and finally the input nodes and the output nodes of the new network are spliced together to form the same output size as the original FFN.
  Step three: perform unsupervised pre-training of the improved ALBERT model on the training set using the MLM method, where MLM refers to masked language modeling; the specific process of the unsupervised pre-training is as follows:
  Step 301: input the training-set data in the standard input format into the improved ALBERT model;
  Step 302: randomly replace some tokens in the model input with arbitrary tokens with 10% probability, taking the original tokens as the prediction labels;
  Step 303: perform the feedforward pass of the MLM method, and compute the cross-entropy between the model output at each replaced position and the corresponding prediction label;
  Step 304: optimize the model parameters through back-propagation using the difference between the model output and the prediction labels, and judge whether the training stop condition is met; if not, return to step 302 and continue training until it is met; if so, the unsupervised pre-training of the model is complete; the training stop condition is: after all training data have been input into ALBERT in sequence, one round of model training is completed and a counter is incremented by 1; when the counter exceeds a preset number of rounds, the stop condition is reached and training terminates.
  Step four: after the unsupervised pre-training is completed, fine-tune the model with the class-label data, then remove the mask shielding layer and save the final ALBERT model.
- 2. The improved method of an ALBERT model applied to encrypted traffic classification according to claim 1, wherein in the improvement of the FFN layer in step two, the reduction ratio of the network output nodes varies with the task.
- 3. The improved method of an ALBERT model applied to encrypted traffic classification according to claim 1, wherein in step four the specific process of fine-tuning the pre-trained ALBERT model is (see the fine-tuning sketch at the end of the description):
  Step 401: input the model standard input obtained in step one, together with the class-label data, into the pre-trained ALBERT model;
  Step 402: extract the output corresponding to [CLS] through the feedforward pass, obtain an output of the same size as the class labels through one layer of trainable fully connected neural network, and compute the cross-entropy between this output and the true class-label data;
  Step 403: optimize the model parameters through back-propagation using the difference between the model output and the true class-label data, and judge whether the stop condition is met; if not, return to step 402 until it is met; if so, execute step 404; the stop condition is judged by comparing whether the training counter exceeds a preset number of rounds;
  Step 404: remove the mask shielding layer added to the Self-Attention layer, cut off the input tensor of the masked part, and save the model.
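The following is a minimal sketch of the step-one preprocessing, assuming the SSL handshake and transmission payloads have already been extracted and stripped of Ethernet/IP/TCP header fields. The helper names, the input length s = 128, and the "0000" padding token are illustrative assumptions, not the patent's exact implementation.

```python
def to_hex_tokens(payload: bytes) -> list[str]:
    """Convert each byte to two hex characters, then group the hex string
    into 4-character tokens from left to right."""
    hex_str = payload.hex()                      # two hex chars per byte
    return [hex_str[i:i + 4] for i in range(0, len(hex_str), 4)]

def build_albert_input(handshake: bytes, transfer: bytes, s: int = 128) -> list[str]:
    """Combine the two token sequences into the [CLS] A [SEP] B [SEP] form,
    zero-padded or tail-truncated to the standard input length s."""
    a, b = to_hex_tokens(handshake), to_hex_tokens(transfer)
    # When too long, trim tokens from the tail of the longer sequence first.
    while len(a) + len(b) + 3 > s:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    tokens += ["0000"] * (s - len(tokens))       # zero-fill up to length s
    return tokens

if __name__ == "__main__":
    demo = build_albert_input(b"\x16\x03\x01\x02", b"\x17\x03\x03\x00\x40", s=16)
    print(demo)
```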
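And a hedged sketch of the step-two mask shielding layer: one way to realize it is an additive mask on the attention score matrix so that the shielded segment (the SSL handshake part or the file-transfer part) contributes no keys, leaving the remaining tokens to attend over the rest of the sequence. Single-head attention, the tensor shapes, and the boolean-mask convention are simplifying assumptions; the patent's ALBERT internals are not reproduced here.

```python
import torch

def masked_self_attention(q, k, v, shield: torch.Tensor):
    """q, k, v: (seq, d) tensors; shield: (seq,) bool, True = position shielded.
    Shielded key columns are set to -inf before the softmax, so no token
    attends to the shielded segment."""
    d = q.size(-1)
    scores = q @ k.t() / d ** 0.5                     # (seq, seq) attention matrix
    scores = scores.masked_fill(shield.unsqueeze(0), float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

seq, d = 8, 16
q = k = v = torch.randn(seq, d)
shield = torch.zeros(seq, dtype=torch.bool)
shield[1:4] = True                                    # e.g. shield the handshake span
out = masked_self_attention(q, k, v, shield)
print(out.shape)                                      # torch.Size([8, 16])
```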
Description
Improved method of an ALBERT model applied to encrypted traffic classification

Technical Field
The invention belongs to the field of network encrypted traffic classification, and particularly relates to an improved method of an ALBERT model applied to encrypted traffic classification.

Background
Network traffic classification is a technique for identifying classes of network traffic. Traffic encryption is increasingly required to protect the interests of internet users on public networks. However, it also gives lawbreakers the opportunity to implant malware, so the need for network traffic classification keeps growing as part of building a healthy network environment. Network encrypted traffic classification includes traditional machine-learning methods and deep-learning-based methods. Traditional traffic classification can no longer meet demand because of the complexity of dynamic network environments. Deep learning achieves better results, but its large resource consumption makes it difficult to apply on smartphones or ordinary computers.

Disclosure of the Invention
The invention aims to reduce resource consumption without reducing model accuracy, so that deployment is easier to realize, and provides an improved method of an ALBERT model applied to encrypted traffic classification, which selects an ALBERT with fewer layers to handle the encrypted traffic classification problem and is used to identify malicious traffic data. The method comprises the following specific steps.

Step one: capture network flow packets and, after preprocessing, convert them into the standard input format required by the ALBERT model. The specific preprocessing process is as follows: first, parse a network flow packet and extract the SSL handshake data and the transmission data; then remove the Ethernet header and the IP and TCP port information from the SSL handshake data and the transmission data, convert each byte value into its two-character hexadecimal representation, and generate two character strings, denoted A and B respectively; finally, divide A and B from left to right into segments of 4 characters each to generate two token sequences, combine them, and check whether the length of the combined sequence meets the standard ALBERT input length s; when the length is less than s, pad with zeros at the end, and when it is greater than s, progressively remove tokens from the tail of the longer of A and B until the condition is met. Writing the two token sequences as A = a_1 a_2 ... a_p and B = b_1 b_2 ... b_q, the converted ALBERT standard input is [CLS] a_1 ... a_p [SEP] b_1 ... b_q [SEP].

Step two: improve the Self-Attention layer and the FFN layer in the ALBERT model to obtain an improved ALBERT model. The Self-Attention layer is the layer of mutual self-attention among tokens, and the improvement is to add a mask shielding layer to the attention matrix. The mask shielding layer shields the SSL handshake part or the file data transmission part, so that the parts other than the shielded part attend to the global tokens. The improvement to the FFN layer is to reduce the number of output nodes of one layer of the original FFN and then add a new neural network.
The new neural network takes the outputs of the reduced layer as its input, its number of output nodes equals the number of nodes removed, and finally the input and output nodes of the new network are spliced together to form the same output size as the original FFN. The reduction ratio of the network output nodes varies with the task.

Step three: perform unsupervised pre-training of the improved ALBERT model using the Masked Language Modeling (MLM) method. The specific process of the unsupervised pre-training is as follows.
Step 301: input the data in the standard input format obtained in step one into the improved ALBERT model.
Step 302: randomly replace some tokens in the model input with arbitrary tokens with 10% probability, taking the original tokens as the prediction labels.
Step 303: perform the feedforward pass of the MLM method, and compute the cross-entropy between the model output at each replaced position and the corresponding prediction label.
Step 304: optimize the model parameters through back-propagation using the difference between the model output and the prediction labels, and judge whether the training stop condition is met; if not, return to step 302 and continue training until it is met; if so, the unsupervised pre-training of the model is complete. The training stop condition is: after all training data have been input into ALBERT in sequence, one round of model training is completed and a counter is incremented by 1; when the counter exceeds a preset number of rounds, the stop condition is reached and training terminates.

Illustrative sketches of the modified FFN layer, the MLM pre-training step, and the fine-tuning head follow.
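Below is a sketch of the modified FFN layer under one reading of the claims: one sub-layer's output width is cut, a small added layer regenerates the removed nodes from the narrower output, and the two are spliced back to the original width. The class name, the dimensions, and the default reduction ratio are assumptions (claim 2 leaves the ratio task-dependent).

```python
import torch
import torch.nn as nn

class BottleneckFFNLayer(nn.Module):
    """One FFN sub-layer with its output width reduced, plus a small patch
    network that maps the narrow output back to the removed nodes; the two
    are concatenated to match the original output size."""

    def __init__(self, in_dim: int, out_dim: int, reduction_ratio: float = 0.25):
        super().__init__()
        reduced = int(out_dim * reduction_ratio)             # nodes removed
        self.narrow = nn.Linear(in_dim, out_dim - reduced)   # reduced layer
        self.patch = nn.Linear(out_dim - reduced, reduced)   # regenerates removed nodes

    def forward(self, x):
        narrow_out = self.narrow(x)
        # Splice the narrow output with the patch output: same size as original FFN.
        return torch.cat([narrow_out, self.patch(narrow_out)], dim=-1)

layer = BottleneckFFNLayer(in_dim=312, out_dim=1248)   # illustrative ALBERT-like sizes
x = torch.randn(2, 64, 312)
print(layer(x).shape)                                  # torch.Size([2, 64, 1248])
```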
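A minimal sketch of the step-three pre-training objective: with 10% probability a token is replaced by a random token, and cross-entropy is computed only at the replaced positions against the original tokens. The `model` here is any network mapping (batch, seq) token ids to (batch, seq, vocab) logits; the toy stand-in below and all sizes are assumptions in place of the improved ALBERT.

```python
import torch
import torch.nn as nn

def mlm_step(model, tokens, vocab_size, optimizer, replace_prob=0.10):
    """One MLM training step: corrupt 10% of tokens, predict the originals."""
    corrupted = tokens.clone()
    replaced = torch.rand_like(tokens, dtype=torch.float) < replace_prob
    random_ids = torch.randint(0, vocab_size, tokens.shape)
    corrupted[replaced] = random_ids[replaced]

    logits = model(corrupted)                           # (batch, seq, vocab)
    loss = nn.functional.cross_entropy(
        logits[replaced], tokens[replaced]              # original tokens are the labels
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy stand-in encoder: 16^4 hex tokens plus specials is an illustrative vocabulary.
vocab = 65536 + 3
model = nn.Sequential(nn.Embedding(vocab, 64), nn.Linear(64, vocab))
opt = torch.optim.Adam(model.parameters())
batch = torch.randint(0, vocab, (4, 32))
print(mlm_step(model, batch, vocab, opt))
```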
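Finally, a sketch of the claim-3 fine-tuning head: the encoder output at the [CLS] position is passed through a single trainable fully connected layer sized to the number of traffic classes, and cross-entropy against the true labels drives the update. The encoder stand-in and all dimensions are placeholders, not the patent's configuration.

```python
import torch
import torch.nn as nn

class TrafficClassifier(nn.Module):
    """Pre-trained encoder plus one fully connected classification layer."""

    def __init__(self, encoder: nn.Module, hidden: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden, num_classes)      # single trainable FC layer

    def forward(self, tokens):
        h = self.encoder(tokens)                        # (batch, seq, hidden)
        return self.head(h[:, 0])                       # output at the [CLS] position

hidden, num_classes, vocab = 64, 5, 1000
clf = TrafficClassifier(nn.Embedding(vocab, hidden), hidden, num_classes)
tokens = torch.randint(0, vocab, (8, 32))
labels = torch.randint(0, num_classes, (8,))
loss = nn.functional.cross_entropy(clf(tokens), labels)
loss.backward()
print(loss.item())
```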