CN-122027262-A - DoH traffic data enhancement method based on a diffusion model and TTP analysis

Abstract

The invention discloses a DoH traffic data enhancement method based on a diffusion model and TTP analysis, and belongs to the technical field at the intersection of network security and artificial intelligence. The method comprises: analyzing DoH network attacks based on the ATT&CK matrix and extracting TTP features, which include plaintext features, statistical features and flow features; performing joint probability modeling of the DoH traffic features and the TTP features with a diffusion model; in the training stage, learning the joint distribution of the traffic features and the TTP features from known real DoH traffic data; and, in the generation stage, driving the diffusion model to generate high-quality enhanced traffic data that conforms to the feature distribution, using as input only the TTP features extracted from a small amount of new malicious traffic. The method addresses the problems of data scarcity, sample imbalance and insufficient model generalization in DoH malicious traffic detection, and improves the ability of a detection model to identify small-sample and novel malicious traffic.
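
For orientation, a diffusion model conditioned on TTP features is commonly written in the standard DDPM form below. The text above does not state these equations, so everything here is an assumption made for illustration: x_0 denotes a traffic feature segment, c the TTP feature vector of its session, beta_t a noise schedule, and epsilon_theta a learned denoising network.

```latex
% Forward (noising) process, with \alpha_t = 1 - \beta_t and \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t)\, I\right)

% Training objective: predict the injected noise given the TTP condition c
\mathcal{L}(\theta) = \mathbb{E}_{x_0,\, c,\, t,\, \epsilon \sim \mathcal{N}(0, I)}
  \left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t,\ c\right) \right\|^2

% Generation: start from x_T \sim \mathcal{N}(0, I) and denoise step by step, with z \sim \mathcal{N}(0, I)
x_{t-1} = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t, c) \right) + \sigma_t z
```

Under this reading, the generation stage needs only c (the TTP features of the new malicious traffic) and Gaussian noise, which matches the statement that only TTP features are supplied as input.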

Inventors

  • XIA CHUNHE
  • GUAN XIAOWEI
  • WANG TIANBO
  • ZHOU WEIDONG
  • WANG HAIQUAN

Assignees

  • Beihang University (北京航空航天大学)

Dates

Publication Date
2026-05-12
Application Date
2026-02-03

Claims (9)

  1. A DoH traffic data enhancement method based on a diffusion model and TTP analysis, characterized by comprising the following steps: step one, performing TTP analysis, based on the ATT&CK matrix, of network attack behavior over the DoH protocol, and extracting a set of TTP features; step two, extracting traffic features from known DoH traffic data and associating them with the TTP features extracted in step one; step three, constructing and training a diffusion model conditioned on the TTP features, and learning the conditional probability distribution of the traffic features from the known DoH traffic data and the corresponding TTP features; and step four, in the data enhancement stage, extracting the TTP features of the target small-sample DoH malicious traffic, inputting them into the trained diffusion model, and generating enhanced traffic data that conforms to the TTP feature distribution.
  2. The DoH traffic data enhancement method based on a diffusion model and TTP analysis according to claim 1, wherein extracting the TTP features in step one comprises: screening, from the ATT&CK matrix, the tactics and techniques related to DoH network attacks according to the attack principles of the DoH protocol; using security conference materials and publicly disclosed attack analysis reports to supplement and verify the details of the screened tactics and techniques; determining the corresponding observable data sources from the ATT&CK framework according to the supplemented tactics and techniques; screening and defining TTP features from those data sources based on the DoH protocol communication mechanism and the network attack process; and verifying the validity of the defined TTP features, and finalizing them, by combining related research papers with actual DoH attack traffic data.
  3. The DoH traffic data enhancement method based on a diffusion model and TTP analysis according to claim 1 or 2, wherein the TTP features comprise at least one of the following: plaintext features, including response time statistics and cipher suite information extracted from the TLS handshake or the TCP header; statistical features, including packet length, traffic volume, duration, frequency and entropy features obtained from overall statistics of the traffic session; and flow features, including a relative encoding that characterizes the temporal position of a packet within a session.
  4. The DoH traffic data enhancement method based on a diffusion model and TTP analysis according to claim 1, wherein extracting and processing the traffic features in step two comprises: dividing a complete DoH session into a plurality of consecutive traffic clusters according to a packet time-interval threshold; and slicing the feature vector formed by the traffic cluster sequence with a fixed-size sliding window to obtain traffic feature segments of fixed dimension.
  5. The DoH traffic data enhancement method based on a diffusion model and TTP analysis according to claim 1, wherein the training process of the diffusion model in step three comprises: sampling, from a known real DoH traffic data set, a traffic feature segment and the TTP features of the session to which it belongs; inputting the TTP features and random noise into the diffusion model as conditions, and constructing training samples through the forward diffusion process; training the model to learn the reverse diffusion process so as to predict and remove the noise, the optimization objective being to make the generated data distribution approach the true conditional distribution; and training with a weighted loss function that combines the large-sample data set with the small-sample validation set, using an early-stopping strategy to prevent overfitting.
  6. The DoH traffic data enhancement method based on a diffusion model and TTP analysis according to claim 1, wherein the data generation process in step four comprises: extracting the TTP features of the target small-sample DoH malicious traffic to be enhanced; inputting the extracted TTP features, together with random noise sampled from a standard normal distribution, into the trained diffusion model; and having the model execute the reverse diffusion process, gradually removing the noise and outputting high-quality DoH traffic feature data that matches the input TTP features.
  7. A DoH traffic data enhancement system based on a diffusion model and TTP analysis, comprising: a TTP feature analysis and extraction module, configured to execute the TTP feature extraction procedure set forth in claim 1 or 2 and to output a TTP feature set; a traffic feature processing module, configured to extract and normalize traffic features from raw traffic data, specifically by performing the method of claim 4; a conditional diffusion model training module, configured to construct and train a generative model conditioned on the TTP features, specifically by performing the method of claim 5; and a small-sample traffic enhancement generation module, configured to receive the TTP features of target traffic and generate enhanced data, specifically by performing the method of claim 6.
  8. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 6 when executing the program.
  9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 6.
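
The session segmentation recited in claim 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the (timestamp, length) packet layout, the function names, the 0.5-second gap threshold, the three per-cluster features and the window size of 2 are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical packet layout: (timestamp in seconds, packet length in bytes).
Packet = Tuple[float, int]


def split_into_clusters(packets: Sequence[Packet], gap_threshold: float) -> List[List[Packet]]:
    """Split a time-ordered session into clusters wherever the
    inter-packet gap exceeds gap_threshold."""
    clusters: List[List[Packet]] = []
    for pkt in packets:
        if clusters and pkt[0] - clusters[-1][-1][0] <= gap_threshold:
            clusters[-1].append(pkt)   # small gap: same cluster
        else:
            clusters.append([pkt])     # first packet or large gap: new cluster
    return clusters


def cluster_features(cluster: Sequence[Packet]) -> List[float]:
    """Summarise one cluster as [packet count, total bytes, duration].
    The feature choice is an assumption made for this sketch."""
    count = float(len(cluster))
    total_bytes = float(sum(length for _, length in cluster))
    duration = cluster[-1][0] - cluster[0][0]
    return [count, total_bytes, duration]


def sliding_windows(rows: Sequence[List[float]], window: int, stride: int = 1) -> List[List[float]]:
    """Slice the per-cluster feature sequence into flattened segments
    that all have the same dimension (window * features per cluster)."""
    segments: List[List[float]] = []
    for start in range(0, len(rows) - window + 1, stride):
        segments.append([v for row in rows[start:start + window] for v in row])
    return segments


# Toy session with three long pauses, so four clusters are expected.
session = [(0.00, 120), (0.02, 340), (0.03, 90),
           (1.50, 200), (1.52, 410),
           (4.00, 75), (4.01, 130),
           (7.20, 60)]

clusters = split_into_clusters(session, gap_threshold=0.5)
rows = [cluster_features(c) for c in clusters]
segments = sliding_windows(rows, window=2)
print(len(clusters), len(segments), len(segments[0]))  # prints: 4 3 6
```

Each resulting segment has a fixed length regardless of how long the session is, which is what lets segments of different sessions be fed to one model. Sessions with fewer clusters than the window size yield no segments in this sketch; how such sessions are handled in practice is not specified in the claims.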

Description

DoH traffic data enhancement method based on a diffusion model and TTP analysis

Technical Field

The invention relates to the technical field of network security and artificial intelligence, and in particular to a DoH traffic data enhancement method based on a diffusion model and TTP analysis.

Background

The Domain Name System (DNS) is a hierarchical distributed database system responsible for converting human-readable domain names into machine-friendly IP addresses. It is one of the most important infrastructures on the Internet, and almost all network activity depends on it. However, the DNS protocol transmits data between the user and the server in plaintext, so any third party can obtain the user's data directly from DNS messages. This creates an opportunity for DNS-based network attacks such as man-in-the-middle attacks, disclosure of private data and DNS hijacking.

Data enhancement (data augmentation) techniques generate new data samples from existing data, or change the distribution characteristics of the original data. They can alleviate the shortage of training data faced by machine learning models, can shift the distribution of the training set back toward the real distribution when data drift occurs, and, when a new type of malicious traffic appears, can generate a large amount of data matching its characteristics at low time and labor cost, thereby providing data support for traffic classification and detection tasks.

TTP (Tactics, Techniques, and Procedures) analysis is an important methodology in the field of network security. Tactics are high-level descriptions of an attacker's behavior and strategy, covering the series of actions by which the attacker achieves a specific goal.
Techniques are the non-specific guidelines and intermediate methods used to carry out tactical actions; they describe how an attack goal is achieved by various means. Procedures are the sequences of operations that carry out an attack tactic using a specific technique, detailing the activities the attacker performs in order to reach the target.

As network attacks continue to evolve and grow more complex, traditional security defenses find it increasingly difficult to cope with network threats, and security solutions such as firewalls and antivirus software alone cannot comprehensively resist them. Against this background, around 2014 the network security field first systematically proposed the TTP analysis method, which emphasizes defense based on the attacker's behavior patterns (tactics and techniques) rather than on individual attack traces (procedures).

The ATT&CK matrix is a knowledge base of tactics and techniques built by the US-based MITRE Corporation from observations of real network attacks. It aims to describe and classify the tactics, techniques and sub-techniques that attackers use. In recent years, as the importance of network security has grown, the ATT&CK framework has become widely known in the network security industry. The knowledge base is continually updated with contributions from the research community, which has made it a foundation for APT threat intelligence. Today, MITRE ATT&CK is a complex network security framework covering 14 tactics, more than 190 techniques and more than 380 sub-techniques used by attackers in the course of an attack (MITRE. ATT&CK [EB/OL]. (2025). https://attack.mitre.org).
For each technique that a network attack may use, the ATT&CK matrix records the malware that uses the technique, the threat groups that developed the malware and the corresponding real-world attack cases, and it provides mitigations for reducing the harm of the technique as well as key indicators for detecting it.

The DoH (DNS over HTTPS) protocol is an advanced DNS encryption technology that has been widely adopted by mainstream browsers, operating systems and public DNS service providers, and it is gradually becoming an important means of improving network communication security and privacy protection. However, because DoH encrypts what was previously plaintext, detection methods for DNS tunneling attacks may fail. DoH encrypts DNS queries over HTTPS, which improves user privacy and security, but malware also exploits this encryption to build covert tunnels, to carry out Command-and-Control (C&C) communication over DoH, and to mount other attacks. Existing machine-learning-based methods for detecting malicious DoH traffic face the problems of difficult data acquisition,