CN-122020643-A - Log data security behavior extraction method and device and electronic equipment
Abstract
The application relates to the technical field of graph machine learning, and in particular to a method, a device and electronic equipment for extracting security behavior from log data. The method comprises: obtaining log data of an operating system and generating a security behavior graph dataset containing behavior graphs and tag information; training an unwrapping module, an attention module, a reconstructor and a discriminator in an environment unwrapping heterogeneous graph neural network using the graph dataset, wherein the unwrapping module unwraps the behavior graph into a label-related sub-graph and an environment sub-graph, the attention module constructs the label-related sub-graph into a graph level representation, the reconstructor generates a label guide graph from the label information and the environment sub-graph, and the discriminator determines the joint distribution probability of the two sub-graphs according to the graph level representation and the label guide graph and determines security behavior characteristics of the log data according to that probability; and finally extracting the security behavior characteristics of the log data. This addresses the problem that semantic gaps readily arise when mining the root cause of an attack from log data, and that existing solutions struggle to achieve both interpretability and robustness.
Inventors
- ZHAO XIBIN
- NI ZHIBIN
Assignees
- Tsinghua University (清华大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260114
Claims (10)
- 1. A method for extracting security behavior from log data, characterized by comprising the following steps: acquiring log data of an operating system, and generating a graph dataset of the security behavior of the operating system according to the log data, wherein the graph dataset comprises a behavior graph and tag information; training an environment unwrapping heterogeneous graph neural network by using the graph dataset, wherein the environment unwrapping heterogeneous graph neural network comprises an unwrapping module, an attention module, a reconstructor and a discriminator, the unwrapping module unwraps the behavior graph into a label-related sub-graph and an environment sub-graph, the attention module constructs the label-related sub-graph into a graph level representation, the reconstructor generates a label guide graph from the label information and the environment sub-graph, and the discriminator determines the joint distribution probability of the label-related sub-graph and the environment sub-graph according to the graph level representation and the label guide graph and determines security behavior characteristics of the log data according to the joint distribution probability; and extracting the security behavior characteristics of the log data by using the trained environment unwrapping heterogeneous graph neural network.
- 2. The method of claim 1, wherein the unwrapping module comprises a first sub-graph extractor and a second sub-graph extractor, wherein the first sub-graph extractor extracts the label-related sub-graph from the behavior graph and the second sub-graph extractor extracts the environment sub-graph from the behavior graph.
- 3. The method for extracting security behavior from log data according to claim 2, wherein the extraction flows of the first sub-graph extractor and the second sub-graph extractor are the same, and the extraction flow includes: encoding the behavior graph as node features; generating a spliced vector according to the node features and the edge features of the behavior graph; mapping the spliced vector into an intermediate vector; and sampling random attention from the distribution, and generating a heterogeneous graph according to the random attention and the intermediate vector, wherein the heterogeneous graph comprises the label-related sub-graph and the environment sub-graph.
- 4. The method for extracting security behavior from log data according to claim 1, wherein the attention module rewrites the message passing process of the environment unwrapping heterogeneous graph neural network on the label-related sub-graph, weights the nodes of the label-related sub-graph according to the rewritten message passing process to obtain node features, and generates the graph level representation according to the node features and the importance values of the nodes.
- 5. The method of claim 4, wherein the expression of the rewritten message passing procedure is: the expression of the importance value of the node is: the expression of the graph level representation is: Wherein, the Representing a diagram level representation.
- 6. The method according to claim 1, wherein the reconstructor obtains node embedding vectors according to a message passing process of the environment unwrapping heterogeneous graph neural network on the environment sub-graph, reconstructs an adjacency matrix according to the node embedding vectors, and reconstructs the label information and the environment sub-graph into the label guide graph according to the adjacency matrix and a reconstruction loss.
- 7. The method for extracting the security behavior of the log data according to claim 6, wherein the message passing process of the environment unwrapping heterogeneous graph neural network on the environment subgraph is as follows: The expression of the adjacency matrix is: the expression of the reconstruction loss is: 。
- 8. The method of claim 1, wherein the discriminator and the main network of the environment unwrapping heterogeneous graph neural network are trained together, wherein during training, the main network updates the main network parameters with its own loss function and the discriminator updates the discriminator parameters with its own loss function, wherein the loss function of the main network is: the loss function of the discriminator is:
- 9. A device for extracting security behavior from log data, comprising: a generating module, used for acquiring log data of an operating system and generating a graph dataset of the security behavior of the operating system according to the log data, wherein the graph dataset comprises a behavior graph and tag information; a training module, used for training an environment unwrapping heterogeneous graph neural network by using the graph dataset, wherein the environment unwrapping heterogeneous graph neural network comprises an unwrapping module, an attention module, a reconstructor and a discriminator, the unwrapping module unwraps the behavior graph into a label-related sub-graph and an environment sub-graph, the attention module constructs the label-related sub-graph into a graph level representation, the reconstructor generates a label guide graph from the label information and the environment sub-graph, and the discriminator determines the joint distribution probability of the label-related sub-graph and the environment sub-graph according to the graph level representation and the label guide graph and determines security behavior characteristics of the log data according to the joint distribution probability; and an extraction module, used for extracting the security behavior characteristics of the log data by using the trained environment unwrapping heterogeneous graph neural network.
- 10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the method for extracting security behavior from log data according to any one of claims 1 to 8.
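As an illustration of the first step of claim 1 (and of the generating module of claim 9), the following minimal Python sketch groups audit log records into behavior graphs and attaches tag information to each graph. The record layout (session, subject, operation, object), the `BehaviorGraph` class, the `build_dataset` function and the sample records are all hypothetical; the application does not specify a log format or a graph container, so this only shows one way such a graph dataset might be organized.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical audit records (assumed layout, not from the application):
# (session id, subject, subject type, operation, object, object type)
RECORDS = [
    ("s1", "bash", "process", "exec", "/usr/bin/curl", "file"),
    ("s1", "curl", "process", "connect", "203.0.113.7:443", "socket"),
    ("s1", "curl", "process", "write", "/tmp/payload", "file"),
    ("s2", "sshd", "process", "read", "/etc/passwd", "file"),
]

# Hypothetical tag information, one label per behavior graph.
LABELS = {"s1": "download", "s2": "login"}


@dataclass
class BehaviorGraph:
    nodes: dict = field(default_factory=dict)  # entity name -> node type
    edges: list = field(default_factory=list)  # (source, operation, target)
    label: str = ""                            # tag information


def build_dataset(records, labels):
    """Group records by session into typed graphs and attach their labels."""
    graphs = defaultdict(BehaviorGraph)
    for session, subj, subj_type, op, obj, obj_type in records:
        graph = graphs[session]
        graph.nodes[subj] = subj_type   # heterogeneous node types
        graph.nodes[obj] = obj_type
        graph.edges.append((subj, op, obj))  # the operation acts as the edge type
    for session, graph in graphs.items():
        graph.label = labels[session]
    return dict(graphs)


dataset = build_dataset(RECORDS, LABELS)
```

Grouping by session is only one assumed way to delimit a behavior; any partitioning of the log that yields one labeled graph per behavior would fit the same structure.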
Description
Log data security behavior extraction method and device and electronic equipment
Technical Field
The present application relates to the field of graph machine learning technologies, and in particular to a method and an apparatus for extracting security behavior from log data, and an electronic device.
Background
Large enterprise systems face increasingly complex, globalized attacks. When responding to an attack, security analysts need to mine its root cause and scope of damage from large, complex audit logs, which raises a semantic gap problem and greatly increases the manual workload.
Behavior abstraction (BA) has proven to be an effective solution to this problem: it abstracts the audit log into multiple behavior graphs and identifies similar behavior graphs among them. With behavior abstraction, similar behavior patterns can be grouped into human-understandable behavior categories, so that an analyst need only examine a small number of representative threat-related behavior patterns rather than review all logs, which significantly reduces human effort.
Manual behavior abstraction methods provide interpretability through explicit pattern matching but generalize poorly. Learning-based methods almost completely ignore interpretability, are fragile in the face of noise and adversarial attacks, are easily perturbed into distorted representations, generalize insufficiently across different attack strategies, and are easily bypassed by evasion attacks.
As a result, semantic gaps readily arise when mining the root cause of an attack from log data, and existing solutions struggle to combine interpretability with robustness.
Disclosure of Invention
The application provides a method and a device for extracting security behavior from log data, and electronic equipment, to address the problems that semantic gaps readily arise when mining the root cause of an attack from log data and that existing solutions struggle to combine interpretability with robustness.
An embodiment of the first aspect of the application provides a method for extracting security behavior from log data, comprising the following steps: obtaining log data of an operating system, and generating a graph dataset of the security behavior of the operating system according to the log data; training an environment unwrapping heterogeneous graph neural network by using the graph dataset, wherein the environment unwrapping heterogeneous graph neural network comprises an unwrapping module, an attention module, a reconstructor and a discriminator, the unwrapping module unwraps the behavior graph into a label-related sub-graph and an environment sub-graph, the attention module constructs the label-related sub-graph into a graph level representation, the reconstructor generates a label guide graph from the label information and the environment sub-graph, and the discriminator determines the joint distribution probability of the label-related sub-graph and the environment sub-graph according to the graph level representation and the label guide graph and determines security behavior characteristics of the log data according to the joint distribution probability; and extracting the security behavior characteristics of the log data by using the trained environment unwrapping heterogeneous graph neural network.
According to one embodiment of the application, the unwrapping module comprises a first sub-graph extractor that extracts the label-related sub-graph from the behavior graph and a second sub-graph extractor that extracts the environment sub-graph from the behavior graph.
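The two sub-graph extractors can be pictured with a minimal Python sketch that follows the extraction flow recited in claim 3: splice node and edge features per edge, map the spliced vector to an intermediate value, sample random attention, and keep the edges that the attention selects. Several choices here are assumptions rather than details from the application: the intermediate vector is reduced to one scalar logit per edge, the random attention is drawn by adding logistic noise to that logit (the source text does not name the distribution), edges are kept when attention exceeds 0.5, and all function names, weights and toy features are hypothetical.

```python
import math
import random


def splice(node_feats, edge_feats, edges):
    """Concatenate source-node, target-node and edge features for every edge."""
    return [node_feats[s] + node_feats[t] + edge_feats[i]
            for i, (s, t) in enumerate(edges)]


def to_logits(spliced, weights, bias):
    """Map each spliced vector to an intermediate value (assumed: one scalar)."""
    return [sum(w * x for w, x in zip(weights, vec)) + bias for vec in spliced]


def sample_attention(logits, rng, temperature=1.0):
    """Sample attention in (0, 1); logistic noise is an assumed distribution."""
    attention = []
    for logit in logits:
        u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)  # keep the logs finite
        noise = math.log(u) - math.log(1.0 - u)
        attention.append(1.0 / (1.0 + math.exp(-(logit + noise) / temperature)))
    return attention


def extract_subgraph(node_feats, edge_feats, edges, weights, bias, rng,
                     threshold=0.5):
    """One extractor: return the kept edges and the attention of every edge."""
    logits = to_logits(splice(node_feats, edge_feats, edges), weights, bias)
    attention = sample_attention(logits, rng)
    kept = [edge for edge, a in zip(edges, attention) if a > threshold]
    return kept, attention


# Toy behavior graph with already-encoded node features (hypothetical values).
node_feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edge_feats = [[1.0], [0.0], [1.0]]
edges = [(0, 1), (1, 2), (0, 2)]

rng = random.Random(0)
label_weights = [rng.uniform(-1, 1) for _ in range(5)]  # first extractor
env_weights = [rng.uniform(-1, 1) for _ in range(5)]    # second extractor

label_edges, label_att = extract_subgraph(
    node_feats, edge_feats, edges, label_weights, 0.0, rng)
env_edges, env_att = extract_subgraph(
    node_feats, edge_feats, edges, env_weights, 0.0, rng)
```

Both extractors run the same flow and differ only in their parameters, which mirrors the statement that the two extraction flows are the same. In a trained network the weights would be learned, not drawn at random as in this toy setup.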
According to one embodiment of the application, the extraction flows of the first sub-graph extractor and the second sub-graph extractor are the same. The extraction flow comprises: encoding the behavior graph into node features; generating a spliced vector according to the node features and the edge features of the behavior graph; mapping the spliced vector into an intermediate vector; and sampling random attention from the distribution and generating a heterogeneous graph according to the randomly sampled attention and the intermediate vector, wherein the heterogeneous graph comprises the label-related sub-graph and the environment sub-graph.
According to one embodiment of the application, the attention module rewrites the message passing process of the environment unwrapping heterogeneous graph neural network on the label-related sub-graph, weights the nodes of the label-related sub-graph according to the rewritten message passing process to obtain node features, and generates a graph level representation according to the node features and the importance values of the nodes.
According to one embodiment of the applicati