CN-116702893-B - Embedding method, device and medium of network security knowledge graph

Abstract

The application discloses an embedding method, device and medium for a network security knowledge graph. The method comprises: constructing a training set for a knowledge graph embedding model; initializing the entities in the training set as embedding layer entity vectors of the knowledge graph embedding model, and hierarchically embedding spatial information and time information into the embedding layer entity vector; obtaining positive samples and performing a negative sampling operation on them to obtain negative samples; constructing a cross-entropy loss function and training the knowledge graph embedding model on the positive samples and the negative samples; calculating the average reciprocal rank (mean reciprocal rank, MRR) of the knowledge graph embedding model and selecting the knowledge graph embedding model with the highest average reciprocal rank as a target model; and performing test evaluation on the target model according to link prediction to determine the efficacy of the target model. In embodiments of the application, the time information and the spatial information of network security knowledge can be mapped to the embedding layer, so that network security knowledge with spatio-temporal characteristics is represented in the embedding layer.
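
The model-selection step described above can be illustrated with a short sketch. It assumes that each validation query yields the 1-based rank of the correct entity, and that one average reciprocal rank has already been computed for each saved model. The function names and the round-to-score mapping are hypothetical and are not taken from the application.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; ranks are 1-based positions."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / r for r in ranks) / len(ranks)


def select_target_model(score_by_round):
    """Return the training round whose saved model has the highest score.

    `score_by_round` maps a training round (hypothetical key) to the
    average reciprocal rank measured for the model saved at that round.
    """
    if not score_by_round:
        raise ValueError("no saved models to choose from")
    return max(score_by_round, key=score_by_round.get)


# Three queries whose correct entities were ranked 1st, 2nd and 4th.
score = mean_reciprocal_rank([1, 2, 4])  # (1 + 1/2 + 1/4) / 3

# Illustrative scores for models saved at rounds 10, 20 and 30.
best_round = select_target_model({10: 0.31, 20: 0.42, 30: 0.40})
```

A model that ranks the correct entity first for every query scores 1.0, so a higher value indicates better ranking quality.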

Inventors

  • GU ZHAOQUAN
  • JIA YAN
  • Zhao Angxiao
  • FANG BINXING
  • Li Runheng
  • XIE YUSHUN
  • LONG YU
  • Wei Songxuan
  • ZHOU KE

Assignees

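  • Peng Cheng Laboratory (鹏城实验室)
  • Sichuan Yilan Situation Technology Co., Ltd. (四川亿览态势科技有限公司)
  • Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China (电子科技大学(深圳)高等研究院)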

Dates

Publication Date
20260505
Application Date
20230509

Claims (10)

  1. An embedding method for a network security knowledge graph, characterized by comprising the following steps: constructing a training set for a knowledge graph embedding model, wherein each security knowledge sample in the training set comprises a head entity, a relation, a tail entity, spatial information representing a network address, and time information corresponding to the spatial information; initializing the entities in the training set as embedding layer entity vectors of the knowledge graph embedding model, and hierarchically embedding the spatial information and the time information into the embedding layer entity vector; acquiring a mini-batch of positive samples from the training set, and performing a negative sampling operation on the positive samples to obtain negative samples; constructing a cross-entropy loss function according to a scoring function of the knowledge graph embedding model, and training the knowledge graph embedding model on the positive samples and the negative samples for a preset total number of rounds H, wherein the knowledge graph embedding model obtained by the current training is saved once every g rounds of training, H and g are positive integers, and H is an integer multiple of g; calculating the average reciprocal rank (mean reciprocal rank, MRR) of each saved knowledge graph embedding model, and selecting the knowledge graph embedding model with the highest average reciprocal rank as a target model; and performing test evaluation on the target model according to link prediction to determine the efficacy of the target model; wherein the embedding layer entity vector comprises d dimensions, d being a positive integer, the spatial information comprises source address information and destination address information, and hierarchically embedding the spatial information and the time information into the embedding layer entity vector comprises: embedding the source address information in the first md dimensions of the embedding layer entity vector, embedding the destination address information in the last md dimensions of the embedding layer entity vector, and embedding the time information in the pd-th to qd-th dimensions of the embedding layer entity vector, wherein m, p and q are all greater than 0 and less than 1, md, pd and qd are positive integers, md is less than or equal to pd, and pd is less than or equal to qd; and wherein the embedding layer entity vector is expressed by a formula [not reproduced in this text] whose symbols denote, respectively: the embedding layer entity vector; the current dimension; the spatial information; the source address information; the destination address information; the time information; the entity vector at dimension n; the spatial activation function and the temporal activation function; and the spatial weight parameter and the temporal weight parameter.
  2. The embedding method according to claim 1, wherein hierarchically embedding the spatial information and the time information into the embedding layer entity vector comprises: selecting a first number of dimensions from the d dimensions of the embedding layer entity vector in which to embed the spatial information; and selecting a second number of dimensions from the d dimensions of the embedding layer entity vector in which to embed the time information; wherein the dimensions in which the spatial information is embedded are different from the dimensions in which the time information is embedded.
  3. The embedding method according to claim 2, wherein selecting a first number of dimensions from the d dimensions of the embedding layer entity vector in which to embed the spatial information comprises: selecting half of the first number of dimensions in which to embed the source address information; and embedding the destination address information in the remaining half of the first number of dimensions.
  4. The embedding method according to claim 1, wherein hierarchically embedding the spatial information and the time information into the embedding layer entity vector comprises: dividing the d dimensions of the embedding layer entity vector into four parts according to the parameters m, p and q, wherein the first part, dimensions [1, md], embeds the source address information; the second part, dimensions (md, pd], retains the static characteristics of those dimensions; the third part, dimensions (pd, qd), embeds the time information; and the fourth part, dimensions [qd, d], embeds the destination address information, with qd = (1 - m)d + 1.
  5. The embedding method according to claim 2, wherein the spatial information includes source address information and destination address information, and wherein, before hierarchically embedding the spatial information and the time information into the embedding layer entity vector, the embedding method further comprises: where the source address information and the destination address information are IPv4 addresses, dividing each of the source address and the destination address into four address division fields at the separators of the IP address; and dividing the time information into three time division fields according to a year, month, day time format.
  6. The embedding method according to claim 5, wherein selecting a first number of dimensions from the d dimensions of the embedding layer entity vector in which to embed the spatial information comprises: embedding the four address division fields respectively in the selected first number of dimensions; and adding together the embedded entity vectors that lie in the same dimensions and belong to the same IP address; and wherein selecting a second number of dimensions from the d dimensions of the embedding layer entity vector in which to embed the time information comprises: embedding the three time division fields respectively in the selected second number of dimensions; and adding together the embedded entity vectors that lie in the same dimensions and belong to the same time information.
  7. The embedding method according to claim 1, wherein constructing the training set for the knowledge graph embedding model comprises: numbering each security knowledge sample to form an entity-number dictionary and a relation-number dictionary; and wherein performing the negative sampling operation on the positive samples to obtain the negative samples comprises: randomly drawing entities from the entity-number dictionary to replace the head entities and the tail entities of the acquired mini-batch of positive samples, thereby obtaining two types of negative samples, wherein the number of negative samples of each type is determined by a preset ratio, the preset ratio being the ratio of the number of negative samples to the number of positive samples.
  8. The embedding method according to claim 1, wherein performing test evaluation on the target model according to link prediction to determine the efficacy of the target model comprises: constructing a test set and an entity set for the knowledge graph embedding model, wherein each security knowledge fact in the test set comprises a head entity, a relation, a tail entity, spatial information and time information, and the entity set comprises all entities in the knowledge graph; replacing the head entity of each security knowledge fact in the test set with each of the entities in the entity set to obtain a first candidate query dictionary, and replacing the tail entity of each security knowledge fact in the test set with each of the entities in the entity set to obtain a second candidate query dictionary; during testing, for each security knowledge fact in the test set, looking up the corresponding embedding vectors in the target model according to the first candidate query dictionary and the second candidate query dictionary; and calculating the average reciprocal rank (mean reciprocal rank, MRR) and the Hits@n metric of the embedded representation of the target model on the test set to determine the efficacy of the target model.
  9. An embedding device for a network security knowledge graph, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the embedding method according to any one of claims 1 to 8.
  10. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the embedding method according to any one of claims 1 to 8.
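
The dimension layout of claims 1 and 4 and the field splitting of claim 5 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the function names are hypothetical, indices are kept 1-based to match the claim wording, and the date is assumed to arrive as a `YYYY-MM-DD` string (the claims only specify a year, month, day format).

```python
def partition_dimensions(d, m, p):
    """Split dimensions 1..d into the four parts named in claim 4.

    Returns 1-based index ranges. qd is derived from m via
    qd = (1 - m) * d + 1, so the destination part has md dimensions.
    """
    if not (0 < m < 1 and 0 < p < 1):
        raise ValueError("m and p must lie strictly between 0 and 1")
    md, pd = round(m * d), round(p * d)
    if abs(md - m * d) > 1e-9 or abs(pd - p * d) > 1e-9:
        raise ValueError("m*d and p*d must be integers")
    qd = d - md + 1
    if not (1 <= md <= pd <= qd):
        raise ValueError("need md <= pd <= qd")
    return {
        "source": range(1, md + 1),       # [1, md]
        "static": range(md + 1, pd + 1),  # (md, pd]
        "time": range(pd + 1, qd),        # (pd, qd)
        "destination": range(qd, d + 1),  # [qd, d]
    }


def split_ipv4(address):
    """Split an IPv4 address into its four fields at the '.' separators."""
    fields = address.split(".")
    if len(fields) != 4 or not all(f.isdigit() and int(f) <= 255 for f in fields):
        raise ValueError(f"not an IPv4 address: {address!r}")
    return [int(f) for f in fields]


def split_date(date_text):
    """Split a date into year, month and day (assumed 'YYYY-MM-DD' input)."""
    year, month, day = date_text.split("-")
    return [int(year), int(month), int(day)]


# Illustrative values: d = 200, m = 0.25, p = 0.5 gives md = 50, pd = 100, qd = 151.
parts = partition_dimensions(200, 0.25, 0.5)
source_fields = split_ipv4("192.168.1.10")
time_fields = split_date("2023-05-09")
```

With these illustrative values the four parts cover dimensions 1 to 50, 51 to 100, 101 to 150 and 151 to 200, so every dimension belongs to exactly one part.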

Description

Embedding method, device and medium of network security knowledge graph

Technical Field

The present application relates to the field of knowledge graph application technologies, and in particular to an embedding method, device and medium for a network security knowledge graph.

Background

A knowledge graph is a semantic network with a graph structure, and has been widely adopted in recent years for its clear structure and semantic expressiveness. A general knowledge graph is composed of knowledge in the form of triples (head entity, relation, tail entity), written (h, r, t). Constructing a knowledge graph both reveals how different fields develop over time and provides a valuable reference for academic research and engineering applications.

The MDATA (Multi-dimensional Data Association and Intelligent Analysis) model uses an improved triple that associates time and space information with knowledge, which makes it suitable for a network security knowledge graph. For that knowledge to generate value, however, it must also be embedded into vectors so that it can be computed on.

Current mainstream temporal knowledge embedding methods mostly rely on graph neural networks: they learn the structure of the graph and the neighbor information of nodes at different times through a convolutional neural network and an attention mechanism, and update the nodes with a recurrent neural network to complete the embedding. Because such models have a large number of parameters and make no use of spatial information, they struggle to perceive and respond to changes in the network situation in a timely manner.

Disclosure of Invention

Embodiments of the application provide an embedding method, device and medium for a network security knowledge graph, which map the time information and spatial information of network security knowledge to an embedding layer, so that network security knowledge with spatio-temporal characteristics is represented in the embedding layer.

In a first aspect, an embodiment of the present application provides an embedding method for a network security knowledge graph, including: constructing a training set for a knowledge graph embedding model, wherein each security knowledge sample in the training set comprises a head entity, a relation, a tail entity, spatial information representing a network address, and time information corresponding to the spatial information; initializing the entities in the training set as embedding layer entity vectors of the knowledge graph embedding model, and hierarchically embedding the spatial information and the time information into the embedding layer entity vector; acquiring a mini-batch of positive samples from the training set, and performing a negative sampling operation on the positive samples to obtain negative samples; constructing a cross-entropy loss function according to a scoring function of the knowledge graph embedding model, and training the knowledge graph embedding model on the positive samples and the negative samples for a preset total number of rounds H, wherein the knowledge graph embedding model obtained by the current training is saved once every g rounds of training, H and g are positive integers, and H is an integer multiple of g; calculating the average reciprocal rank (mean reciprocal rank, MRR) of each saved knowledge graph embedding model, and selecting the knowledge graph embedding model with the highest average reciprocal rank as a target model; and performing test evaluation on the target model according to link prediction to determine the efficacy of the target model.
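
The sampling and checkpointing steps of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions rather than the application's implementation: a sample is assumed to be a tuple `(head, relation, tail, space, time)`, the preset ratio is applied per positive sample for each negative type, a replacement entity is required to differ from the one it replaces, and corrupted samples are not checked against known facts. All function names are hypothetical.

```python
import random


def checkpoint_rounds(total_rounds, save_every):
    """Rounds at which the model is saved: every g rounds up to H."""
    if total_rounds <= 0 or save_every <= 0 or total_rounds % save_every != 0:
        raise ValueError("H and g must be positive and H a multiple of g")
    return list(range(save_every, total_rounds + 1, save_every))


def corrupt(sample, position, entity_ids, rng):
    """Replace the entity at `position` (0 = head, 2 = tail) with another one."""
    replacement = sample[position]
    while replacement == sample[position]:
        replacement = rng.choice(entity_ids)
    corrupted = list(sample)
    corrupted[position] = replacement
    return tuple(corrupted)


def negative_sampling(positives, entity_ids, ratio, rng=None):
    """Build head-replaced and tail-replaced negatives for a mini-batch.

    `ratio` is the number of negatives of each type per positive sample
    (an assumed reading of the preset ratio).
    """
    rng = rng or random.Random()
    if len(set(entity_ids)) < 2:
        raise ValueError("need at least two distinct entities")
    head_negatives = [corrupt(s, 0, entity_ids, rng)
                      for s in positives for _ in range(ratio)]
    tail_negatives = [corrupt(s, 2, entity_ids, rng)
                      for s in positives for _ in range(ratio)]
    return head_negatives, tail_negatives


# Illustrative mini-batch with one positive sample and four known entities.
batch = [(0, 0, 1, ("10.0.0.1", "10.0.0.2"), "2023-05-09")]
heads, tails = negative_sampling(batch, [0, 1, 2, 3], ratio=2,
                                 rng=random.Random(0))
save_points = checkpoint_rounds(12, 4)  # models saved after rounds 4, 8, 12
```

The spatial and time fields are carried over unchanged into each negative sample, so only the replaced entity distinguishes a negative from its positive counterpart.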

In some embodiments, the embedding layer entity vector comprises d dimensions, d being a positive integer, and hierarchically embedding the spatial information and the time information into the embedding layer entity vector comprises: selecting a first number of dimensions from the d dimensions of the embedding layer entity vector in which to embed the spatial information; and selecting a second number of dimensions from the d dimensions of the embedding layer entity vector in which to embed the time information; wherein the dimensions in which the spatial information is embedded are different from the dimensions in which the time information is embedded.

In some embodiments, the spatial information includes source address information and destination address information, and selecting a first number of dimensions from the d dimensions of the embedding layer entity vector in which to embed the spatial information includes: selecting half of the first number of dimensions in which to embed the source address information; and embedding the destination address information in the remaining half of the first number of dimensions.

In some em