CN-121984777-A - Multi-source attack detection method for computing power cluster safety protection

Abstract

The invention discloses a multi-source attack detection method for computing power cluster security protection, belonging to the technical field of host security attack detection. The method comprises inputting the operating system audit log of a computing node in the computing power cluster to be inspected into a pre-trained time-series anomaly detection model to calculate an anomaly probability score for the audit log, and judging an attack when the anomaly probability score exceeds a threshold and judging the log safe otherwise. The pre-trained time-series anomaly detection model is obtained by: collecting the operating system audit log of each computing node; parsing the events in the logs into standardized five-tuple events and building a global causal graph from them; screening active entity nodes in the global causal graph and performing multi-dimensional feature encoding to generate a feature sequence; and inputting the feature sequence into a pre-constructed time-series anomaly detection model to obtain the trained model. The invention solves the problem that existing detection techniques struggle to reconstruct a complete attack chain that spans multiple hosts.

Inventors

XIE GUOHUA
XU XIAOLONG
ZHAO JUAN
WU YUANYUAN

Assignees

Nanjing University of Posts and Telecommunications (南京邮电大学)

Dates

Publication Date: 20260505
Application Date: 20260313

Claims (10)

1. A multi-source attack detection method for computing power cluster security protection, characterized by comprising the following steps: parsing a plurality of events from an operating system audit log of a computing node in a computing power cluster to be inspected, and constructing a feature sequence for each event; inputting the feature sequence into a pre-trained time-series anomaly detection model, calculating an anomaly probability score of the feature sequence with the time-series anomaly detection model, and, when the anomaly probability score exceeds a confidence threshold, judging the event corresponding to the feature sequence to be an attack event and generating an alarm entity set, and otherwise judging the event to be a safe event; wherein the pre-trained time-series anomaly detection model is obtained through the following steps: collecting an operating system audit log of each computing node in the computing power cluster, the operating system audit log comprising a plurality of events, and parsing each event into a standardized five-tuple event; obtaining a plurality of scattered local causal graphs based on the standardized five-tuple events, and associating the plurality of scattered local causal graphs to obtain a global causal graph; aggregating neighborhood features of entity nodes in the global causal graph through an inductive graph neural network to generate topological entity features; screening active entity nodes in the global causal graph, and extracting events in the spatio-temporal neighborhood of the active entity nodes using an overlapping sliding window mechanism to obtain an event sequence; performing multi-dimensional feature encoding on the event sequence to generate a feature sequence, wherein the multi-dimensional feature encoding comprises encoding with the topological entity features; and training a pre-constructed time-series anomaly detection model with the feature sequence of each event to obtain the trained time-series anomaly detection model.
2. The multi-source attack detection method for computing power cluster security protection according to claim 1, wherein the pre-constructed time-series anomaly detection model comprises an input module, a sparse self-attention mechanism module, a self-attention distillation mechanism module and an output module connected in sequence; the input module inputs the feature sequence to the sparse self-attention mechanism module; the sparse self-attention mechanism module performs weighted aggregation on the feature sequence to obtain an attention-weighted feature sequence and inputs the attention-weighted feature sequence to the self-attention distillation mechanism module; and the self-attention distillation mechanism module extracts high-dimensional features from the attention-weighted feature sequence and outputs the high-dimensional features.
3. The multi-source attack detection method for computing power cluster security protection according to claim 2, wherein the sparse self-attention mechanism is computed by the following formula: $\mathcal{A}(X) = \mathrm{Softmax}\left(\frac{QK^{\top}}{\sqrt{d}} \odot M\right)V$, with $Q = XW_Q$, $K = XW_K$, $V = XW_V$, in which $X$ represents the feature sequence, $\mathcal{A}(X)$ represents the sparse self-attention of the feature sequence $X$, $W_Q$, $W_K$ and $W_V$ are the learnable weight matrices of the queries, keys and values respectively, $\mathrm{Softmax}$ represents the normalization function, $K^{\top}$ represents the transpose of the key matrix, $d$ represents the dimension of the feature sequence, and $M$ is the sparse mask; the self-attention distillation mechanism is computed by the following formula: $X_{l+1} = \mathrm{MaxPool}\left(f_{\mathrm{act}}\left(\mathrm{Conv1d}\left([X_l]_{\mathcal{A}}\right)\right)\right)$, in which $l$ represents the layer index, $X_{l+1}$ represents the feature sequence of layer $l+1$ produced by the distillation operation, $\mathrm{MaxPool}$ represents the max pooling operation, $f_{\mathrm{act}}$ represents the activation function, $\mathrm{Conv1d}$ represents a one-dimensional convolution, and $[X_l]_{\mathcal{A}}$ represents the attention-weighted feature sequence of layer $l$.
4. The multi-source attack detection method for computing power cluster security protection according to claim 2, wherein the anomaly probability score is calculated by the following formula: $p = \sigma(Wc + b)$, in which $p$ represents the anomaly probability score, $\sigma$ represents the activation function, $c$ represents the context vector generated by global average pooling of the high-dimensional features, $W$ represents the weight matrix, and $b$ represents the bias parameter.
5. The multi-source attack detection method for computing power cluster security protection according to claim 1, wherein the expression of the five-tuple event is as follows: $e = (s, o, a, t, h)$, in which $e$ represents a five-tuple event, $s$ represents the source entity of the event, $o$ represents the target entity, $a$ represents the type of interaction behavior, $t$ represents the timestamp at which the event occurred, and $h$ represents the identity of the host to which the event belongs.
6. The multi-source attack detection method for computing power cluster security protection according to claim 5, wherein the expression of the local causal graph is as follows: $G_i = (V_i, E_i, \phi, \psi, \tau)$, in which $i$ represents the sequence number of the computing node in the computing power cluster, $G_i$ represents the local causal graph of computing node $i$, $V_i$ represents the set of all entity nodes that appear on computing node $i$, $E_i$ represents the set of actions between the entity nodes that appear on computing node $i$, each action originating from an event in the operating system audit log, $\phi$ represents the entity node type mapping function, $\psi$ represents the action type mapping function, and $\tau$ represents the timestamp of each action.
7. The multi-source attack detection method for computing power cluster security protection according to claim 6, wherein the expression of the global causal graph is as follows: $G = \left(\bigcup_{i \in N} G_i\right) \cup E_{\mathrm{cross}}$, wherein $\cup$ represents a union, $G$ represents the global causal graph, $N$ represents the set of computing nodes, and $E_{\mathrm{cross}}$ represents the set composed of cross-computing-node edges $e_{\mathrm{cross}}$, each cross-computing-node edge $e_{\mathrm{cross}}$ being obtained by the following steps: given a five-tuple event $e_i$ on any computing node $i$ and a five-tuple event $e_j$ on any computing node $j$, if the matching condition is satisfied [condition formula missing in the source text], the five-tuple events $e_i$ and $e_j$ are considered to have a network interaction relationship across computing nodes, and a cross-computing-node edge $e_{\mathrm{cross}}$ is added between the two local causal graphs to which $e_i$ and $e_j$ belong, expressed as follows: $e_{\mathrm{cross}} = (v_i, v_j)$, wherein $v_i$ and $v_j$ represent entity nodes on computing node $i$ and computing node $j$, respectively.
8. The multi-source attack detection method for computing power cluster security protection according to claim 6, wherein the neighborhood features are calculated according to the following formula: $h_v^{(k)} = \sigma\left(W^{(k)} \cdot \mathrm{CONCAT}\left(h_v^{(k-1)}, \mathrm{AGG}_k\left(\{h_u^{(k-1)} : u \in \mathcal{N}(v)\}\right)\right)\right)$, in which $k$ represents the layer index of the inductive graph neural network, $v$ represents an entity node, $h_v^{(k)}$ is the neighborhood feature representation of entity node $v$ at layer $k$, $h_v^{(k-1)}$ is the neighborhood feature representation of entity node $v$ at layer $k-1$, $\sigma$ represents a non-linear activation function, $W^{(k)}$ represents the learnable weight matrix of layer $k$, $\mathrm{CONCAT}$ represents the concatenation operation, $\mathrm{AGG}_k$ represents the learnable aggregation function of layer $k$, $\mathcal{N}(v)$ represents the neighbor set of entity node $v$, $u$ represents a neighbor in the neighbor set of entity node $v$, and $h_u^{(k-1)}$ represents the feature of neighbor $u$ at layer $k-1$.
9. The multi-source attack detection method for computing power cluster security protection according to claim 8, wherein the active entity nodes are obtained according to the following formula: $V_{\mathrm{act}} = \{\, v \in V \mid d_{\mathrm{out}}(v) + d_{\mathrm{in}}(v) \geq \theta \,\}$, in which $V_{\mathrm{act}}$ represents the set of active entity nodes, $V$ represents the set of entity nodes to be screened, $d_{\mathrm{out}}(\cdot)$ is the out-degree function of an entity node, $d_{\mathrm{in}}(\cdot)$ is the in-degree function of an entity node, and $\theta$ is the entity node degree threshold.
10. The multi-source attack detection method for computing power cluster security protection according to claim 9, wherein the expression of the feature vector sequence is as follows: $X = \{x_1, x_2, \dots, x_n\}$ with $x_i = \left[\, h_{s_i} \,\Vert\, h_{o_i} \,\Vert\, \mathbf{a}_i \,\Vert\, \mathbf{t}_i \,\right]$, in which $i$ represents the sequence number of an event in the event sequence, with a value range of 1 to $n$, $n$ is the total number of events in the event sequence, $e_i$ represents the $i$-th event in the event sequence, $X$ represents the feature vector sequence resulting from multi-dimensional feature encoding of the event sequence, $h_{s_i}$ and $h_{o_i}$ respectively represent the topological entity features corresponding to the source entity $s_i$ and the target entity $o_i$ of the $i$-th event, $\mathbf{a}_i$ represents the binary vector of the action in the $i$-th event, and $\mathbf{t}_i$ represents the time feature vector corresponding to the $i$-th event; wherein the expression of the $i$-th event $e_i$ in the event sequence is as follows: $e_i = (s_i, o_i, a_i, t_i, m_i)$, in which $s_i$ represents the source entity of the $i$-th event, $o_i$ represents the target entity of the $i$-th event, $a_i$ represents the interaction behavior of the $i$-th event, $t_i$ represents the interaction timestamp of the $i$-th event, and $m_i$ represents the metadata of the $i$-th event.
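
The graph construction, active-entity screening and overlapping-window steps recited in the claims can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Event` field names, the `send`/`recv` pairing rule with a `max_skew` time tolerance used to create cross-node edges, the rule that an entity is active when its in-degree plus out-degree reaches the threshold, and the sample entity names are all hypothetical choices made for this example.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Standardized five-tuple event (field names are illustrative)."""
    src: str     # source entity
    dst: str     # target entity
    action: str  # interaction behavior type
    ts: float    # event timestamp
    host: str    # host the event belongs to


def local_graphs(events):
    """Group events by host: one edge list (local causal graph) per node."""
    graphs = defaultdict(list)
    for e in events:
        graphs[e.host].append(e)
    return dict(graphs)


def cross_node_edges(events, max_skew=1.0):
    """ASSUMED matching rule: a 'send' on one host and a 'recv' on another
    host that share the same socket entity within max_skew seconds."""
    sends = [e for e in events if e.action == "send"]
    recvs = [e for e in events if e.action == "recv"]
    edges = []
    for s in sends:
        for r in recvs:
            same_socket = s.dst == r.src
            in_order = 0.0 <= r.ts - s.ts <= max_skew
            if s.host != r.host and same_socket and in_order:
                edges.append(Event(s.src, r.dst, "cross_host", r.ts,
                                   f"{s.host}->{r.host}"))
    return edges


def active_entities(edges, threshold):
    """ASSUMED screening rule: in-degree + out-degree >= threshold."""
    out_deg, in_deg = Counter(), Counter()
    for e in edges:
        out_deg[e.src] += 1
        in_deg[e.dst] += 1
    nodes = set(out_deg) | set(in_deg)
    return {v for v in nodes if out_deg[v] + in_deg[v] >= threshold}


def sliding_windows(seq, size, stride):
    """Overlapping windows; stride < size makes consecutive windows overlap."""
    last_start = max(len(seq) - size, 0)
    return [seq[i:i + size] for i in range(0, last_start + 1, stride)]


# Hypothetical audit events on two hosts, linked by one SSH connection.
SOCK = "sock:10.0.0.1:40000-10.0.0.2:22"
events = [
    Event("h1:proc:bash", "h1:file:/etc/passwd", "read", 1.0, "h1"),
    Event("h1:proc:bash", "h1:proc:ssh", "fork", 2.0, "h1"),
    Event("h1:proc:ssh", SOCK, "send", 3.0, "h1"),
    Event(SOCK, "h2:proc:sshd", "recv", 3.2, "h2"),
    Event("h2:proc:sshd", "h2:proc:sh", "fork", 4.0, "h2"),
    Event("h2:proc:sh", "h2:file:/models/w.bin", "read", 5.0, "h2"),
]

# Global causal graph = all local edges plus the cross-node edges.
global_edges = events + cross_node_edges(events)
active = active_entities(global_edges, threshold=3)

# Events touching any active entity, ordered by time, then windowed.
neighborhood = sorted(
    (e for e in global_edges if e.src in active or e.dst in active),
    key=lambda e: e.ts,
)
windows = sliding_windows(neighborhood, size=3, stride=1)
print(sorted(active))
print(len(windows))
```

In this sketch each window is a short, time-ordered event sequence around an active entity; in the claimed method such sequences are then feature-encoded and passed to the time-series anomaly detection model, which is not reproduced here.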

Description

Multi-source attack detection method for computing power cluster safety protection

Technical Field

The invention relates to the technical field of host security attack detection, and in particular to a multi-source attack detection method for computing power cluster security protection.

Background

With the rapid development of the digital economy and artificial intelligence, computing power clusters are receiving increasing attention as the core infrastructure supporting high-performance computing, big data analysis and large-model training. Although existing protection technologies such as firewalls and intrusion detection systems (IDS) are continuously upgraded, anomaly detection remains under great pressure when a computing power cluster faces complex, advanced threats. This is especially true of advanced persistent threat attacks, which exploit the large number of computing nodes (also called host nodes) and the complex task scheduling in a computing power cluster, adopt a low-frequency, slow-paced attack strategy, and stay hidden in computing nodes for a long time while moving laterally to steal computing resources or sensitive model data, so that existing anomaly detection technology can hardly identify and block them.

In a computing power cluster scenario, anomaly detection faces more severe data challenges than in a stand-alone environment. Because the cluster comprises a large number of computing nodes, with high-frequency cooperative communication and data exchange among them, the volume of audit logs grows exponentially compared with that generated by a single-node operating system.
Moreover, computing tasks in a cluster are highly dynamic and bursty, so normal workload fluctuations and malicious attack behavior are easily confused in their characteristics. Traditional perimeter defenses therefore gradually become ineffective, and detection of computing power cluster security suffers from high false positive and false negative rates.

Existing anomaly detection technologies can be divided into the following classes according to their core mechanism.

Signature-based detection relies on a predefined library of malicious code signatures to match attack behavior. Although it responds quickly to known threats, the signature library increasingly lags behind as attack techniques iterate, and zero-day vulnerability attacks or variant attacks on a computing power cluster cannot be identified.

Statistical anomaly detection finds deviations by building a statistical model of normal behavior. In a computing power cluster, however, legitimate computing tasks are often accompanied by sudden peaks in resource usage, and this dynamic variation makes the statistical model highly prone to false alarms. Meanwhile, highly stealthy "parasitic" attacks are often submerged in massive benign computing operations and are difficult to distinguish by statistical features.

Fixed-rule detection systems fail once an attacker changes the lateral movement strategy, because attack chains in a cluster environment usually span multiple nodes with scattered steps, and manually written rules can hardly cover all possible attack path combinations.
In recent years, academia and industry have turned to provenance-based detection methods, which construct a causal graph from operating system audit logs and detect threats by analyzing the causal dependencies among entities in the graph. Compared with simple log analysis, causal-graph-based methods provide richer context information and help reconstruct the full picture of an attack. However, existing causal-graph-based detection techniques are mostly limited to single-host environments and lack an efficient association mechanism for cross-node communication in computing power clusters, so attack chains are broken where they cross nodes. In addition, for long-running computing tasks, traditional causal graph algorithms struggle to capture long-range temporal dependencies and are prone to "graph explosion", which lowers detection efficiency and makes it difficult to meet the computing power cluster's dual requirements of real-time performance and interpretability.

Disclosure of Invention

The invention aims to provide a multi-source attack detection method for computing power cluster safety protection, which associates the local causal graphs through designed cross-computing-node edges to obtain a global causal graph, and a constructed time-series anomaly detection model is used for realizing effective associatio