CN-122027250-A - APT attack detection method, device and system based on lightweight deep learning framework

Abstract

The application provides an APT attack detection method, device and system based on a lightweight deep learning framework. In the method, each node builds a local sample data set from its own local network traffic data, trains its local classification model on that data set, and uploads the trained model parameters to a federated server. The federated server aggregates the model parameters of all nodes, optimizes the resulting global model parameters, and sends the optimized global model parameters to each node. Each node updates its local classification model with the received global model parameters and repeats the model training step on the updated model until the model converges. Each node then classifies the traffic data to be detected based on the converged global model parameters to obtain an APT attack detection result. The technical scheme reduces model complexity and communication load while maintaining detection accuracy, and improves the practicability and stability of the model in distributed, resource-limited environments.
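The training loop summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the one-parameter model, the `local_train` update, the sample-count-weighted average in `aggregate` (a FedAvg-style rule) and the tolerance-based stopping test are all hypothetical, because the abstract does not specify them. The server-side optimization of the global parameters is omitted.

```python
from typing import Dict, List

Params = Dict[str, float]


def local_train(params: Params, samples: List[float], lr: float = 0.5) -> Params:
    """Toy stand-in for local training: one gradient step toward the local mean."""
    mean = sum(samples) / len(samples)
    return {"w": params["w"] - lr * (params["w"] - mean)}


def aggregate(updates: List[Params], sizes: List[int]) -> Params:
    """Assumed aggregation rule: average parameters weighted by local sample count."""
    total = sum(sizes)
    return {k: sum(u[k] * n for u, n in zip(updates, sizes)) / total
            for k in updates[0]}


def federated_fit(node_data: List[List[float]], tol: float = 1e-6,
                  max_rounds: int = 100) -> Params:
    """Repeat local training and aggregation until the global parameters settle."""
    global_params: Params = {"w": 0.0}
    for _ in range(max_rounds):
        # Each node starts from the current global parameters and trains locally.
        updates = [local_train(global_params, data) for data in node_data]
        # The server aggregates the uploaded parameters into new global parameters.
        new_params = aggregate(updates, [len(data) for data in node_data])
        delta = max(abs(new_params[k] - global_params[k]) for k in new_params)
        global_params = new_params  # broadcast back to the nodes
        if delta < tol:  # assumed convergence test
            break
    return global_params


# Two hypothetical nodes with different amounts of local data.
node_data = [[1.0, 2.0, 3.0], [10.0]]
params = federated_fit(node_data)
```

In this toy setting the global parameter approaches the sample-weighted mean of all local data, although no node ever uploads its raw samples.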

Inventors

Fu Xingbing
Lou Supeng
Huang Butian
Wang Dong
Ning Jianting

Assignees

Hangzhou Dianzi University (杭州电子科技大学)

Dates

Publication Date: 20260512
Application Date: 20260128

Claims (10)

1. An APT attack detection method based on a lightweight deep learning framework, the method comprising: Step S1, each participating node taking part in federated learning constructs a local sample data set based on its local network traffic data; Step S2, each participating node trains a local classification model based on the local sample data set and uploads the trained model parameters to a federated server, wherein the local classification model comprises an MSCNN network, a Transformer network and a classifier that are sequentially cascaded, wherein: the MSCNN network comprises multiple parallel one-dimensional depthwise separable convolution branches, each branch being configured with convolution kernels of a different time scale so as to respectively capture the instantaneous behavior features, persistent attack features and long-term local behavior features of APT attacks in the network traffic data, and the output feature vectors of the branches are concatenated and fused to output multi-scale fusion features; the Transformer network comprises a plurality of parameter-sharing Transformer layers, each layer reusing the same Transformer block instance to extract long-range dependencies among different stages of an APT attack from the multi-scale fusion features and to output a global feature vector; and the classifier classifies the global feature vector and outputs a prediction result of the APT attack class; Step S3, the federated server aggregates the model parameters from all participating nodes to obtain global model parameters, optimizes the global model parameters using a global validation data set, and sends the optimized global model parameters to all participating nodes; Step S4, each participating node updates the parameters of its local classification model based on the received global model parameters and repeats Step S2 on the updated local classification model until the global model parameters converge; and Step S5, each participating node classifies the traffic data to be detected based on the converged global model parameters to obtain an APT attack detection result.
2. The method of claim 1, wherein each one-dimensional depthwise separable convolution branch is configured to first perform a channel-by-channel (depthwise) convolution and then perform a point-by-point (pointwise) convolution on the result of the channel-by-channel convolution.
3. The method of claim 1, wherein a batch normalization layer is further configured after each one-dimensional depthwise separable convolution branch of the MSCNN network to normalize the output feature vector of that branch.
4. The method of claim 1, wherein after the output feature vectors of the branches are concatenated and fused, the MSCNN network is further configured to apply adaptive pooling to the fused feature vectors to output the multi-scale fusion features.
5. The method of claim 1, wherein the Transformer block instance employs a pre-normalization structure, and the sequence features output by the Transformer layers are globally average-pooled.
6. The method of claim 1, wherein the Transformer block instance applies a self-attention mechanism to the multi-scale fusion features to extract the long-range dependencies among different stages of an APT attack in the multi-scale fusion features.
7. An APT attack detection device based on a lightweight deep learning framework, the device comprising: a data set construction unit, configured to construct a local sample data set based on the local network traffic data of each participating node; a model training unit, configured to train a local classification model based on the local sample data set and upload the trained model parameters to a federated server, wherein the local classification model comprises an MSCNN network, a Transformer network and a classifier that are sequentially cascaded, wherein: the MSCNN network comprises multiple parallel one-dimensional depthwise separable convolution branches, each branch being configured with convolution kernels of a different time scale so as to respectively capture the instantaneous behavior features, persistent attack features and long-term local behavior features of APT attacks in the network traffic data, and the output feature vectors of the branches are concatenated and fused to output multi-scale fusion features; the Transformer network comprises a plurality of parameter-sharing Transformer layers, each layer reusing the same Transformer block instance to extract long-range dependencies among different stages of an APT attack from the multi-scale fusion features and to output a global feature vector; and the classifier classifies the global feature vector and outputs a prediction result of the APT attack class; a model aggregation unit, configured to aggregate the model parameters from all participating nodes to obtain global model parameters, optimize the global model parameters using a global validation data set, and send the optimized global model parameters to all participating nodes; the model training unit being further configured to update the parameters of the local classification model based on the received global model parameters and to repeat training on the updated local classification model until the global model parameters converge; and an attack detection unit, configured to classify the traffic data to be detected based on the converged global model parameters to obtain an APT attack detection result.
8. An APT attack detection system based on a lightweight deep learning framework, characterized by comprising a plurality of participating nodes and a federated server; each participating node is configured to construct a local sample data set based on its local network traffic data, train a local classification model based on the local sample data set, and upload the trained model parameters to the federated server, wherein the local classification model comprises an MSCNN network, a Transformer network and a classifier that are sequentially cascaded, wherein: the MSCNN network comprises multiple parallel one-dimensional depthwise separable convolution branches, each branch being configured with convolution kernels of a different time scale so as to respectively capture the instantaneous behavior features, persistent attack features and long-term local behavior features of APT attacks in the network traffic data, and the output feature vectors of the branches are concatenated and fused to output multi-scale fusion features; the Transformer network comprises a plurality of parameter-sharing Transformer layers, each layer reusing the same Transformer block instance to extract long-range dependencies among different stages of an APT attack from the multi-scale fusion features and to output a global feature vector; and the classifier classifies the global feature vector and outputs a prediction result of the APT attack class; the federated server is configured to aggregate the model parameters from all participating nodes to obtain global model parameters, optimize the global model parameters using a global validation data set, and send the optimized global model parameters to all participating nodes; each participating node is further configured to update the parameters of its local classification model based on the received global model parameters, repeat training on the updated local classification model until the global model parameters converge, and classify the traffic data to be detected based on the converged global model parameters to obtain an APT attack detection result.
9. An electronic device, comprising: a processor; and a computer readable storage medium having stored therein computer program instructions which, when executed by the processor, cause the processor to perform the method of any one of claims 1 to 6.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 6.
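The multi-branch depthwise separable convolution recited in claims 1 and 2 can be illustrated with a small pure-Python sketch. It is an assumption-laden illustration rather than the claimed implementation: the kernel sizes (3, 5, 7), the averaging weights, the zero "same" padding and the helper names are all hypothetical, and batch normalization, adaptive pooling and the Transformer stage are left out.

```python
from typing import List, Sequence

Signal = List[List[float]]  # layout: [channel][time step]


def depthwise_conv1d(x: Signal, kernels: List[List[float]]) -> Signal:
    """Channel-by-channel step: channel c is filtered only by kernels[c].

    Uses zero padding so the output keeps the input length (odd kernel sizes assumed).
    """
    out = []
    for channel, kernel in zip(x, kernels):
        pad = len(kernel) // 2
        padded = [0.0] * pad + channel + [0.0] * pad
        out.append([
            sum(k * padded[t + j] for j, k in enumerate(kernel))
            for t in range(len(channel))
        ])
    return out


def pointwise_conv1d(x: Signal, weights: List[List[float]]) -> Signal:
    """Point-by-point step: mixes channels at each time step; weights[o][c]."""
    steps = len(x[0])
    return [
        [sum(w * x[c][t] for c, w in enumerate(row)) for t in range(steps)]
        for row in weights
    ]


def multi_scale_features(x: Signal, kernel_sizes: Sequence[int] = (3, 5, 7),
                         out_channels: int = 2) -> Signal:
    """Run one branch per (hypothetical) kernel size and concatenate the outputs."""
    fused: Signal = []
    for size in kernel_sizes:
        # Placeholder weights; in a trained model these would be learned.
        depthwise = [[1.0 / size] * size for _ in x]
        pointwise = [[1.0 / len(x)] * len(x) for _ in range(out_channels)]
        branch = pointwise_conv1d(depthwise_conv1d(x, depthwise), pointwise)
        fused.extend(branch)  # concatenation along the channel axis
    return fused


# Two input channels, eight time steps of hypothetical traffic features.
x = [[1.0] * 8, [1.0] * 8]
features = multi_scale_features(x)
```

Short kernels respond to brief events while longer kernels average over wider windows, which is the intuition behind assigning a different time scale to each branch.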

Description

APT attack detection method, device and system based on lightweight deep learning framework

Technical Field

The invention relates to the technical field of network security, and in particular to an APT attack detection method, device and system based on a lightweight deep learning framework.

Background

An advanced persistent threat (APT) is a highly dangerous and complex form of network attack, typically characterized by strong concealment, long duration, and multi-stage evolution of the attack process. Compared with traditional network attacks, an APT attack is usually carried out progressively and remains hidden in the target system for a long time by deliberately evading existing security detection mechanisms, posing a serious security threat to critical infrastructure and industrial environments.

APT attacks are typically persistent, progressive, concealed, targeted, multi-stage and highly organized. In an actual attack, the attacker often lurks in the compromised system for a long time and gradually achieves the attack objective through sophisticated evasion techniques and a staged attack flow. The attack behavior is strongly correlated along the time dimension, while the data distribution is highly imbalanced. Both factors significantly increase the difficulty of APT attack detection and place higher demands on a detection model's ability to extract local behavior features and to model long-term dependencies across attack stages.

In recent years, deep learning has made remarkable progress in the field of intrusion detection, achieving good attack recognition performance by automatically learning complex temporal and spatial features in network traffic data.
However, most existing deep-learning-based APT detection methods rely on centralized training, which is often difficult to deploy in real industrial scenarios and cross-organization collaborative environments, mainly because of data privacy regulations, data sensitivity, and organizational management policies. Meanwhile, APT-related attack data is scarce and highly confidential, which further restricts centralized data aggregation and the improvement of model generalization.

Federated learning is a privacy-preserving distributed learning paradigm that enables collaborative training among multiple participants without sharing raw data, thereby alleviating, to a certain extent, the conflict between data silos and privacy leakage. This property gives it considerable application value for collaborative APT detection in distributed organizational environments. However, directly applying a deep learning model within a federated learning framework still faces several technical challenges. First, frequent exchange of model parameters during federated learning incurs substantial communication overhead and reduces the overall operating efficiency of the system. Second, complex models generally demand considerable computing and storage resources on the local client, which limits their practical deployability in resource-limited edge environments.

Disclosure of Invention

In view of the above, the application provides an APT attack detection method, device and system based on a lightweight deep learning framework, so as to reduce model complexity and communication load while maintaining detection accuracy, and to improve the practicability and stability of the model in distributed, resource-limited environments.
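A rough parameter count shows why the two design choices named in the claims, depthwise separable convolutions and parameter-shared Transformer layers, would shrink the model that each node uploads per round. The formulas are the standard ones for these layer types with bias terms ignored. The channel counts, kernel size, per-layer size and layer count below are hypothetical values chosen for illustration and do not come from the application.

```python
def standard_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard 1-D convolution (bias ignored)."""
    return c_in * c_out * k


def depthwise_separable_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    """Depthwise weights (one kernel per input channel) plus pointwise weights."""
    return c_in * k + c_in * c_out


def stacked_layer_params(per_layer: int, num_layers: int, shared: bool) -> int:
    """Total weights of a layer stack; sharing stores a single copy."""
    return per_layer if shared else per_layer * num_layers


# Hypothetical sizes: 64 input channels, 64 output channels, kernel size 7.
standard = standard_conv1d_params(64, 64, 7)                # 28672
separable = depthwise_separable_conv1d_params(64, 64, 7)    # 4544

# Hypothetical Transformer stack: 4 layers of 50,000 weights each.
unshared = stacked_layer_params(50_000, 4, shared=False)    # 200000
shared = stacked_layer_params(50_000, 4, shared=True)       # 50000
```

Because every federated round transmits the full parameter set in both directions, a smaller parameter count translates directly into less traffic per round.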
Specifically, the application is realized by the following technical scheme: according to a first aspect of embodiments of the present specification, there is provided an APT attack detection method based on a lightweight deep learning framework, the method including: Step S1, each participating node taking part in federated learning constructs a local sample data set based on its local network traffic data; Step S2, each participating node trains a local classification model based on the local sample data set and uploads the trained model parameters to a federated server, wherein the local classification model comprises an MSCNN network, a Transformer network and a classifier that are sequentially cascaded, wherein: the MSCNN network comprises multiple parallel one-dimensional depthwise separable convolution branches, each branch being configured with convolution kernels of a different time scale so as to respectively capture the instantaneous behavior features, persistent attack features and long-term local behavior features of APT attacks in the network traffic data, and outp