CN-116846636-B - Tracing graph-oriented host intrusion detection method, system and storage medium

Abstract

The invention discloses a traceability graph-oriented host intrusion detection method, system and storage medium, and belongs to the field of network security. The method comprises: S1, collecting traceability data of a host to be tested to construct a traceability graph representing user behaviors; S2, mapping the nodes in the traceability graph to roles by constructing a node feature matrix whose feature vectors represent the attribute features, structural features and interaction relations of the nodes in the traceability graph, and mapping nodes with similar feature vectors to the same role; S3, performing an attributed temporal random walk with attention that jointly considers the node attributes of the traceability graph, the temporal order of its edges, and the attention parameters between different roles; and S4, converting the resulting attributed temporal random walk sequences into embedding vectors to extract the features of the traceability graph and perform intrusion anomaly detection. The invention can perform deep representation learning on traceability data, reduce the training workload of the detection model, and improve the accuracy and efficiency of intrusion detection.

Inventors

XIE YULAI
Dai Shuangbiao
FENG DAN

Assignees

Huazhong University of Science and Technology (华中科技大学)

Dates

Publication Date: 20260505
Application Date: 20230704

Claims (10)

1. A traceability graph-oriented host intrusion detection method, characterized by comprising the following steps: S1, collecting traceability data of a host to be tested to construct a traceability graph representing user behaviors; S2, mapping the nodes in the traceability graph to roles: constructing a node feature matrix whose feature vectors represent the attribute features, structural features and interaction relations of the nodes in the traceability graph, and mapping the nodes with similar feature vectors in the node feature matrix to the same role; S3, performing an attributed temporal random walk with attention, and generating an attributed temporal random walk sequence of length L starting from a current node $v_i$: $\phi(x_{i_1}), \phi(x_{i_2}), \ldots, \phi(x_{i_L})$, where $x_i$ is the feature vector of node $v_i$; $\phi$ represents the function mapping a node to a role, wherein $(v_{i_t}, v_{i_{t+1}}) \in E_T$ and $\mathcal{T}(v_{i_t}, v_{i_{t+1}}) < \mathcal{T}(v_{i_{t+1}}, v_{i_{t+2}})$; $(v_{i_t}, v_{i_{t+1}})$ is the edge connecting node $v_{i_t}$ with the next node $v_{i_{t+1}}$; $E_T$ is the set of all edges in the traceability graph; $t$ is the sequence number in the walk sequence; $\mathcal{T}(v_{i_t}, v_{i_{t+1}}) < \mathcal{T}(v_{i_{t+1}}, v_{i_{t+2}})$ represents that the creation time of the edge between nodes $v_{i_t}$ and $v_{i_{t+1}}$ is earlier than the creation time of the edge between nodes $v_{i_{t+1}}$ and $v_{i_{t+2}}$; the current node moves to neighbor nodes corresponding to different roles with different probabilities, wherein the probabilities are the attention parameters between the roles and reflect the importance between the roles; S4, converting the attributed temporal random walk sequence into embedding vectors to extract the features of the traceability graph, and performing intrusion anomaly detection.
2. The method according to claim 1, wherein in S3, the attention parameters between the roles are characterized by a role attention matrix M, and the role attention matrix M is obtained as follows: S301, setting the transition probabilities as the current role attention matrix M; S302, performing the attributed temporal random walk with attention using the current role attention matrix M to obtain the embedding vector $e_i$ of the current node $v_i$ and the embedding vectors $e_N$ corresponding to all neighbor nodes of the current node; S303, updating the current role attention matrix M using the embedding vectors $e_i$ and $e_N$: $M_{w_i w_j} = \operatorname{softmax}(\ldots)$, $e_{w_j} = \operatorname{mean}(\{e_j \mid e_j \in e_N \text{ and } v_j \in N_{w_j}\})$, wherein softmax represents the normalization operator; $w_i$ represents the role corresponding to the current node $v_i$, and $w_j$ represents an element of the role set corresponding to all neighbor nodes of the current node $v_i$; $M_{w_i w_j}$ represents the importance of role $w_j$ to role $w_i$; $e_i$ is the embedding vector corresponding to the current node $v_i$; $N_{w_j}$ represents the set of nodes belonging to role $w_j$ among the neighbor nodes, the mean function being used to aggregate the embedding vectors corresponding to roles of the same type in the sequence; $W_N$ represents the role set corresponding to all neighbor nodes of the current node $v_i$; S304, repeating S302 and S303 until the role attention matrix M is stable, thereby obtaining the required role attention matrix M.
3. The method according to claim 2, wherein in S304, the distance between the two role attention matrices obtained in two adjacent iterations is calculated to obtain the number of roles whose transition probability change exceeds a set first threshold, and if that number is smaller than a set second threshold, the role attention matrix M is considered stable.
4. The method according to any one of claims 1-3, wherein in S4, the attributed temporal random walk sequence is input to a SkipGram model to calculate the embedding vector of each node.
5. The method of claim 4, wherein in S4, the embedding vectors are input into a pre-trained intrusion detection model for intrusion anomaly detection.
6. The method according to any one of claims 1-3, wherein the function $\phi$ mapping nodes to roles is a binary operator or a k-means clustering function.
7. The method of claim 1, wherein in S1, collecting the traceability data of the host to be tested further comprises filtering out traceability data irrelevant to intrusion behavior and removing nodes with identical attribute features.
8. The method of claim 1, wherein the nodes in the traceability graph characterize data objects of the host to be tested, the data objects including processes, files, sockets, and pipes.
9. A traceability graph-oriented host intrusion detection system for performing the method of any one of claims 1-8, the system comprising: a traceability graph construction module, used for collecting traceability data of a host to be tested to construct a traceability graph representing user behaviors; a role mapping module, used for mapping the nodes in the traceability graph to roles by constructing a node feature matrix whose feature vectors represent the attribute features, structural features and interaction relations of the nodes in the traceability graph, and mapping the nodes with similar feature vectors in the node feature matrix to the same role; a random walk module, used for performing an attributed temporal random walk with attention and generating an attributed temporal random walk sequence of length L starting from a current node $v_i$: $\phi(x_{i_1}), \phi(x_{i_2}), \ldots, \phi(x_{i_L})$, where $x_i$ is the feature vector of node $v_i$; $\phi$ represents the function mapping a node to a role, wherein $(v_{i_t}, v_{i_{t+1}}) \in E_T$ and $\mathcal{T}(v_{i_t}, v_{i_{t+1}}) < \mathcal{T}(v_{i_{t+1}}, v_{i_{t+2}})$; $(v_{i_t}, v_{i_{t+1}})$ is the edge connecting node $v_{i_t}$ with the next node $v_{i_{t+1}}$; $E_T$ is the set of all edges in the traceability graph; $t$ is the sequence number in the walk sequence; $\mathcal{T}(v_{i_t}, v_{i_{t+1}}) < \mathcal{T}(v_{i_{t+1}}, v_{i_{t+2}})$ represents that the creation time of the edge between nodes $v_{i_t}$ and $v_{i_{t+1}}$ is earlier than the creation time of the edge between nodes $v_{i_{t+1}}$ and $v_{i_{t+2}}$; the current node moves to neighbor nodes corresponding to different roles with different probabilities, wherein the probabilities are the attention parameters between the roles and reflect the importance between the roles; and an anomaly detection module, used for converting the attributed temporal random walk sequence into embedding vectors to extract the features of the traceability graph and perform intrusion anomaly detection.
10. A computer-readable storage medium, comprising a stored computer program which, when executed by a processor, controls the device in which the computer-readable storage medium is located to perform the traceability graph-oriented host intrusion detection method according to any one of claims 1 to 8.
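
The role attention update of claims 2 and 3 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the source text does not preserve the argument of the softmax, so a dot product between the current node's embedding and the mean embedding of each neighbor role is assumed here; the row distance used in the stability check (sum of absolute differences) is likewise an assumption, and the function names attention_row and is_stable are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a dict {role: score}."""
    peak = max(scores.values())
    exps = {role: math.exp(s - peak) for role, s in scores.items()}
    total = sum(exps.values())
    return {role: v / total for role, v in exps.items()}


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def attention_row(e_i, neighbors):
    """Attention of the current node's role toward each neighbor role.

    e_i: embedding of the current node.
    neighbors: list of (role, embedding) pairs for its neighbor nodes.
    """
    by_role = {}
    for role, emb in neighbors:
        by_role.setdefault(role, []).append(emb)
    # ASSUMPTION: the score is the dot product of e_i with the mean
    # embedding of each role; the source does not preserve this term.
    scores = {
        role: sum(a * b for a, b in zip(e_i, mean_vector(embs)))
        for role, embs in by_role.items()
    }
    return softmax(scores)


def is_stable(m_old, m_new, first_threshold, second_threshold):
    """Stable if fewer than second_threshold roles changed noticeably."""
    changed = 0
    for role, new_row in m_new.items():
        old_row = m_old.get(role, {})
        keys = set(old_row) | set(new_row)
        # ASSUMPTION: per-role distance is the sum of absolute differences.
        dist = sum(abs(new_row.get(k, 0.0) - old_row.get(k, 0.0)) for k in keys)
        if dist > first_threshold:
            changed += 1
    return changed < second_threshold


row = attention_row(
    [1.0, 0.0],
    [("file", [1.0, 0.0]), ("file", [3.0, 0.0]), ("socket", [0.0, 5.0])],
)
m_old = {"process": {"file": 0.5, "socket": 0.5}}
m_new = {"process": row}
```

In this toy input the two file neighbors are averaged before scoring, so the process role attends more strongly to the file role than to the socket role, and the change from the uniform starting matrix is large enough that the matrix is not yet reported as stable.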

Description

Tracing graph-oriented host intrusion detection method, system and storage medium

Technical Field

The invention belongs to the technical field of network security, and particularly relates to a traceability graph-oriented host intrusion detection method, system and storage medium.

Background

Intrusion detection is one of the core technologies in the field of network security. It determines whether a system or network exhibits abnormal behavior that violates security policies, or is under attack, by analyzing information (e.g., network traffic and host logs) collected from computer systems and computer networks. Intrusion detection is an active protection technology and is of great significance for the security of networks and systems. Conventional host intrusion detection methods typically analyze and identify host intrusions using system calls or logs as data sources; however, because of the inherent defects of these data sources, such methods are easily bypassed by attackers, resulting in low detection accuracy. Traceability-based host intrusion detection uses traceability data as its data source. Traceability data describes system data objects (processes, files, sockets and pipes) and the complex dependency relations among them, provides a complete structured view of the events occurring on a system or network, and is presented as a directed acyclic graph (the traceability graph), thereby fundamentally enhancing the accuracy and robustness of detection.

Conventional traceability-based host intrusion detection methods use a general-purpose graph embedding model (such as DeepWalk, node2vec or GraphSAGE) or a graph kernel algorithm (such as Weisfeiler-Lehman) to perform representation learning on traceability graph data, obtain embedding vectors, and perform intrusion detection on the data features those vectors represent. Such methods perform only shallow representation learning on the traceability graph, so the data features obtained are limited. As attackers' methods become increasingly diverse and complex, the embedding vectors obtained from a general-purpose model or algorithm characterize the features of traceability data too narrowly.

Disclosure of Invention

In view of the defects of the prior art and the need for improvement, the invention provides a traceability graph-oriented host intrusion detection method, system and storage medium, which aim to improve the accuracy and efficiency of intrusion detection based on traceability graph data.

In order to achieve the above object, according to a first aspect of the present invention, there is provided a traceability graph-oriented host intrusion detection method, including: S1, collecting traceability data of a host to be tested to construct a traceability graph representing user behaviors; S2, mapping the nodes in the traceability graph to roles: constructing a node feature matrix whose feature vectors represent the attribute features, structural features and interaction relations of the nodes in the traceability graph, and mapping the nodes with similar feature vectors in the node feature matrix to the same role; S3, performing an attributed temporal random walk with attention, and generating an attributed temporal random walk sequence of length L starting from a current node $v_i$: $\phi(x_{i_1}), \phi(x_{i_2}), \ldots, \phi(x_{i_L})$, where $x_i$ is the feature vector of node $v_i$ and $\phi(x)$ represents the function mapping a node to a role, wherein $(v_{i_t}, v_{i_{t+1}}) \in E_T$ and $\mathcal{T}(v_{i_t}, v_{i_{t+1}}) < \mathcal{T}(v_{i_{t+1}}, v_{i_{t+2}})$; $(v_{i_t}, v_{i_{t+1}})$ is the edge connecting node $v_{i_t}$ with the next node $v_{i_{t+1}}$; $E_T$ is the set of edges of the traceability graph; $t$ is the sequence number in the walk sequence; $\mathcal{T}(v_{i_t}, v_{i_{t+1}}) < \mathcal{T}(v_{i_{t+1}}, v_{i_{t+2}})$ represents that the creation time of the edge between nodes $v_{i_t}$ and $v_{i_{t+1}}$ is earlier than the creation time of the edge between nodes $v_{i_{t+1}}$ and $v_{i_{t+2}}$; the current node moves to neighbor nodes corresponding to different roles with different probabilities, wherein the probabilities are the attention parameters between the roles and reflect the importance between the roles; S4, converting the attributed temporal random walk sequence into embedding vectors to extract the features of the traceability graph, and performing intrusion anomaly detection. 
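
As an illustration of step S3, the following Python sketch walks a small traceability graph while enforcing the temporal constraint on edges and sampling the next node in proportion to the attention between roles. It is a sketch under stated assumptions rather than the method as implemented by the inventors: the function name attention_temporal_walk, the edge representation as (source, target, creation time) triples, the dictionary form of the role attention matrix, and the uniform fallback when all attention weights are zero are all hypothetical choices made for the example.

```python
import random


def attention_temporal_walk(edges, roles, attention, start, length, rng=None):
    """Walk of at most `length` nodes starting at `start`.

    edges: list of (source, target, creation_time) triples.
    roles: dict mapping each node to its role.
    attention: dict {role: {neighbor_role: weight}}.
    """
    rng = rng or random.Random()
    out_edges = {}
    for src, dst, created in edges:
        out_edges.setdefault(src, []).append((dst, created))

    walk = [start]
    current, last_created = start, float("-inf")
    while len(walk) < length:
        # Temporal constraint: the next edge must have been created
        # later than the edge that was just traversed.
        candidates = [
            (dst, created)
            for dst, created in out_edges.get(current, [])
            if created > last_created
        ]
        if not candidates:
            break
        row = attention.get(roles[current], {})
        weights = [row.get(roles[dst], 0.0) for dst, _ in candidates]
        if sum(weights) <= 0.0:
            # ASSUMPTION: fall back to uniform sampling when no
            # attention weight is available for any candidate.
            weights = [1.0] * len(candidates)
        current, last_created = rng.choices(candidates, weights=weights, k=1)[0]
        walk.append(current)
    return walk


edges = [("p1", "f1", 1), ("f1", "p2", 2), ("p2", "s1", 3), ("f1", "p3", 0)]
roles = {"p1": "process", "p2": "process", "p3": "process",
         "f1": "file", "s1": "socket"}
attention = {"process": {"file": 1.0, "socket": 1.0}, "file": {"process": 1.0}}

walk = attention_temporal_walk(edges, roles, attention, "p1", 4, random.Random(0))
role_sequence = [roles[node] for node in walk]
```

The edge from f1 to p3 is never taken in this example because it was created before the edge that led into f1. The role sequence, not the node sequence, is what would then be passed to the embedding step in S4.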
Further, in S3, the attention parameters between the roles are represented by a role attention matrix M, and the role attention matrix M is obtained as follows: S301, setting the transition probabilities as the current role attention matrix M; S302, performing the attributed temporal random walk with attention using the current role attention matrix M to obtain the embedding vector $e_i$ of the current node $v_i$ and the embedding vectors $e_N$ corresponding to all neighbor nodes of the current node; S303, updating the current role attention matrix M using the embedding vectors $e_i$ and $e_N$: $M_{w_i w_j} = \operatorname{softmax}(\ldots)$, $e_{w_j} = \operatorname{mean}(\{e_j \mid e_j \in e_N \text{ and } v_j \in N_{w_j}\})$, where softmax represents the normalization operator, $w_i$ represents a