CN-121984693-A - Network attack tracing evidence obtaining method


Abstract

The invention relates to the technical field of computer network security, and in particular to a network attack tracing evidence obtaining method. The method fuses data from heterogeneous sources such as malware, network traffic, host logs, and threat intelligence; extracts deep features from the multi-modal data using parallel specialized artificial intelligence models such as a convolutional neural network (CNN) and a recurrent neural network (RNN); constructs a heterogeneous attribution knowledge graph from the extracted entities and relations; weights the edges of the graph according to the credibility and uniqueness of the evidence in order to resist deceptive tactics; and analyzes the knowledge graph with a graph neural network, which learns the high-order complex associations among the entities in the graph to automatically infer the most likely source of the attack. The invention significantly improves the degree of automation, the accuracy, and the deception resistance of network attack tracing.

Inventors

ZHU SANLI
XU KANG
XU JUNJUN
YU YANG
TAO KAI

Assignees

Nanjing University of Posts and Telecommunications (南京邮电大学)

Dates

Publication Date: 20260505
Application Date: 20251210

Claims (8)

1. A network attack tracing evidence obtaining method, characterized by comprising the following steps: data collection, namely collecting raw data related to a network attack event from multiple data sources, wherein the data sources comprise malware binary files, network traffic data, host audit logs, and unstructured threat intelligence text; feature extraction, namely automatically extracting multi-dimensional feature vectors from the raw data using a plurality of parallel feature extraction models, wherein the feature extraction models comprise a code feature extraction model for processing the malware binary files and a semantic feature extraction model for processing the threat intelligence text; graph construction, namely representing the entities extracted in the feature extraction step as nodes in a graph, and constructing edges connecting the nodes according to the inherent associations among the entities, so as to form a heterogeneous weighted attribution knowledge graph, wherein the weight of an edge represents the confidence of the association; and attribution analysis, namely inputting the attribution knowledge graph into a pre-trained graph neural network model, which analyzes the high-order associations and community structure in the graph and outputs a probability or classification result attributing the attack event to a specific threat actor.
2. The network attack tracing evidence obtaining method according to claim 1, wherein in the feature extraction step, the code feature extraction model uses a convolutional neural network to process a grayscale image representation of a malware sample to extract static visual features, and uses a recurrent neural network to process its API call sequence to extract dynamic behavior features.
3. The network attack tracing evidence obtaining method according to claim 2, wherein using the convolutional neural network to process the grayscale image representation of the malware sample to extract static visual features specifically comprises: disassembling the malware binary file, obtaining its relevant code information, and converting the code information into a grayscale image representation; inputting the grayscale image into the convolutional neural network, sliding the convolution kernels of the convolutional layers across the image to detect local features, and combining low-level features into high-level features through multiple layers of convolution; downsampling the feature maps output by the convolutional layers through pooling layers to reduce their dimensions; and converting the convolved and pooled feature maps into a one-dimensional vector through a fully connected layer to obtain the static visual feature vector of the malware sample.
4. The network attack tracing evidence obtaining method according to claim 2, wherein using the recurrent neural network to process the API call sequence to extract dynamic behavior features specifically comprises: dynamically running the malware sample in a sandbox environment and recording the complete API call sequence generated during execution; normalizing the acquired API call sequence and mapping each API function to a unique integer identifier; inputting the preprocessed API call sequence into an embedding layer, which converts the integer identifier of each API function into a low-dimensional dense vector representation and captures the semantic associations among the API functions; inputting the embedded vector sequence into the recurrent neural network, which models the temporal dependencies of the API calls through its memory mechanism and captures both short-range call dependencies and long-range behavior patterns in the sequence; and applying a time-step attention mechanism to select, from the hidden states of the recurrent neural network at all time steps, the key features that contribute most to malicious behavior classification, and aggregating them into a feature vector that represents the overall dynamic behavior of the malware.
5. The network attack tracing evidence obtaining method according to claim 1, wherein in the feature extraction step, the semantic feature extraction model uses a natural language processing model to automatically extract attack tactics, techniques, and procedures (TTPs), indicators of compromise (IOCs), and mentioned threat organization entities from the threat intelligence text.
6. The network attack tracing evidence obtaining method according to claim 1, wherein in the graph construction step, the weight of an edge is calculated by jointly considering the reliability of the evidence source, the uniqueness of the feature, and the association type, and features that are easy to forge are given lower weights.
7. The network attack tracing evidence obtaining method according to claim 1, wherein in the feature extraction step, static analysis and dynamic analysis are performed on the malware sample simultaneously, and the two types of features are integrated together into a multi-modal feature set.
8. The network attack tracing evidence obtaining method according to claim 1, wherein in the attribution analysis step, the graph neural network model learns the patterns of historically attributed attack events in order to identify derivation relationships or tactical evolution between novel attack activities and known threat organizations.
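
As an illustration only, the edge weighting of claim 6 and the graph construction step of claim 1 can be sketched as follows. The claims name three factors (source reliability, feature uniqueness, association type) but do not give a formula or any values, so the product rule, the factor tables, and all names below (SOURCE_RELIABILITY, RELATION_FACTOR, edge_weight, build_graph) are assumptions made for this sketch and are not taken from the patent.

```python
from dataclasses import dataclass

# Hypothetical factor tables: the claims name the factors but give no values.
SOURCE_RELIABILITY = {"sandbox": 0.9, "host_log": 0.8, "threat_report": 0.6}
RELATION_FACTOR = {"shares_code_with": 1.0, "communicates_with": 0.8, "mentioned_with": 0.5}


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    relation: str
    weight: float  # confidence of the association, in [0, 1]


def edge_weight(source: str, uniqueness: float, relation: str) -> float:
    """Combine the three factors into one confidence (assumed rule: their product)."""
    if not 0.0 <= uniqueness <= 1.0:
        raise ValueError("uniqueness must lie in [0, 1]")
    return SOURCE_RELIABILITY[source] * uniqueness * RELATION_FACTOR[relation]


def build_graph(observations):
    """Build an adjacency list {node: [Edge, ...]} from observation tuples.

    Each observation is (src, dst, relation, evidence_source, uniqueness).
    """
    graph = {}
    for src, dst, relation, source, uniqueness in observations:
        weight = edge_weight(source, uniqueness, relation)
        graph.setdefault(src, []).append(Edge(src, dst, relation, weight))
        graph.setdefault(dst, [])  # make sure the target node exists
    return graph


if __name__ == "__main__":
    graph = build_graph([
        # A rare custom routine seen in a sandbox run: hard to forge, high uniqueness.
        ("sample_A", "actor_X", "shares_code_with", "sandbox", 0.9),
        # A language string cited in a report: easy to forge, low uniqueness.
        ("sample_A", "actor_Y", "mentioned_with", "threat_report", 0.1),
    ])
    for edge in graph["sample_A"]:
        print(edge.dst, edge.relation, round(edge.weight, 3))
```

Under these assumed values, the easily forged evidence contributes a far smaller weight than the distinctive one, which is the behavior claim 6 asks for. A weighted sum or a learned weighting would satisfy the claim language equally well.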

Description

Network attack tracing evidence obtaining method

Technical Field

The invention relates to the technical field of computer network security, and in particular to a network attack tracing evidence obtaining method.

Background

Signature-based detection has long been the cornerstone of network security defense. It identifies known threats by comparing network traffic or files against a database of "signatures" of known malware or attack behavior. Its advantages are high accuracy and high speed in detecting known threats. However, it cannot detect unknown, emerging, or variant threats, the so-called "zero-day attacks". An attacker can easily change the form of malicious code using polymorphic or metamorphic techniques, thereby bypassing signature matching. In addition, fileless malware runs entirely in memory, which also defeats traditional methods based on file scanning. An approach that relies on known patterns struggles against increasingly complex and customized attacks.

To make up for the shortcomings of signature-based detection, the industry has developed a series of more advanced analysis techniques. For example, anomaly-based detection attempts to identify deviations from routine activity by establishing a baseline of "normal" behavior, but this approach often has a high false positive rate. At the same time, "point solutions" for particular data types have also been developed. For example, a malware binary file is visualized as a grayscale image and classified using a convolutional neural network (CNN) that extracts its static structure and texture features.

However, these advanced analysis techniques tend to be isolated and fragmented. A CNN model can identify a malware family, but it knows nothing about the software's communication behavior on the network or about the attacker described in threat intelligence reports.
An RNN model may discover a suspicious sequence of API calls, but cannot directly relate it to the static code features of the malware. This "information island" phenomenon prevents analysts from forming a complete and unified understanding of an attack event. The essence of network attack tracing is precisely to connect these scattered clues from different dimensions and modalities. The prior art lacks a systematic approach that efficiently fuses these independent, specialized analysis results and places them within a unified framework for correlation analysis.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a network attack tracing evidence obtaining method that attributes and traces the source of a network attack event quickly and accurately while effectively resisting deceptive tactics.

A network attack tracing evidence obtaining method comprises the following steps:

Data collection, namely collecting raw data related to a network attack event from multiple data sources, wherein the data sources comprise malware binary files, network traffic data, host audit logs, and unstructured threat intelligence text. By constructing a comprehensive evidence base, this step overcomes the "information island" problem of traditional single-source analysis. When some data sources are missing or have been damaged by an attacker, fusing multi-source data preserves the robustness and fault tolerance of the analysis through the complementary information of the other sources. More importantly, it provides the necessary data basis for subsequently identifying and countering deceptive tactics such as "false flags", because it is difficult for an attacker to perfectly forge traces in all data dimensions simultaneously.
Feature extraction, namely automatically extracting multi-dimensional feature vectors from the raw data using a plurality of parallel feature extraction models, wherein the feature extraction models comprise a code feature extraction model for processing the malware binary files and a semantic feature extraction model for processing the threat intelligence text. Preferably, static analysis and dynamic analysis are performed on the malware sample simultaneously, and the two types of features are integrated together into the multi-modal feature set. Preferably, the code feature extraction model uses a convolutional neural network to process a grayscale image representation of the malware sample to extract static visual features, and uses a recurrent neural network to process its API call sequence to extract dynamic behavior features. Preferably, using the convolutional neural network to process the grayscale image representation of the malware sample to extract static visual features specifically comprises: disassembling the malware binary file, obtaining relevant code information of the malware binary