CN-115796147-B - Information association degree calculation method applied to network security threat information


Abstract

The invention discloses an information association degree calculation method applied to network security threat information. The method comprises threat intelligence graph construction, extraction of structured and semi-structured data based on the STIX format, article graph extraction, entity normalization, word-level comparison, entity association degree calculation, and a comprehensive scoring module. The association degree calculations are combined into a comprehensive score, and the association degree of two information articles is recorded in a database. The invention uncovers hidden associations between articles: it extracts the core content of each piece of information through knowledge extraction, uses the threat intelligence knowledge graph to infer potential associations among that core content, and from these calculates the association degree of two pieces of information. The method comprises normalization of the different names of an entity, word-level matching, entity association degree calculation, and a comprehensive scoring module. It calculates the information association degree along three dimensions: direct matching and scoring of triples, scoring of entity aliases identified through the graph, and the association degrees of malware and vulnerabilities.

Inventors

WU QIONG
FANG CHENG
ZHAI LIDONG
LV ZHI
ZHAO YAO
SUN PU

Assignees

中科大数据研究院

Dates

Publication Date: 20260508
Application Date: 20221207

Claims (8)

1. An information association degree calculation method applied to network security threat information, characterized by comprising the following steps: first, threat intelligence graph construction: extracting structured and semi-structured data, including the CVE, CPE, and ATT&CK datasets, based on the STIX format; second, article graph extraction: performing named entity recognition based on BERT-BiLSTM-CRF to obtain entities, and then performing relation extraction in a pipeline manner with an R-BERT model; third, entity normalization: in the threat intelligence knowledge graph, the entity types for which one entity has several different names are malicious organizations and malware; first, the malicious organization and malware entities in the article's triples are extracted; next, the entities of the corresponding class in the threat intelligence graph, together with their aliases, are traversed; finally, the matching entity is located and the corresponding entity in the article's triples is converted into the standard entity; based on the alias data in the threat intelligence graph, all aliases are converted into {alias: standard entity} mappings, all malicious organization aliases and the malicious organization entities extracted from the articles are converted into vectors, and the cosine similarity between each alias and each entity of the two groups of data is calculated; after the cosine similarity calculation is completed, the malicious organization alias with the highest cosine similarity to a malicious organization entity is found, and if that cosine similarity is greater than 0.9, the article's malicious organization entity is converted into the standard entity according to the key-value relation {alias: standard entity}; malware is normalized through the same processing flow, converting each extracted malware entity into the standard malware entity; the content extracted from the articles is thus normalized by means of cosine similarity and the threat intelligence knowledge graph, thereby obtaining potential associations between entities; fourth, word-level comparison: word-level comparison is carried out after keyword normalization is completed; the keywords of the two articles are matched one by one, and each time a group of keywords is matched, the score is increased by one point and that group of keywords is removed from the keyword sets; fifth, entity association degree calculation: after the entities that match exactly have been removed in the previous step, association degrees are calculated for the remaining entities to identify potential relations between pieces of information; in the entity association degree calculation, for malware, potential association between malware entities is discovered by calculating the commonality of the attack patterns the malware implements; after normalization is completed, the malware entities in the articles can be directly associated with the malware entities in the threat intelligence knowledge graph, and the malware association degree is then calculated from the attack patterns that the malware can implement according to the threat intelligence knowledge graph.
2. The information association degree calculation method according to claim 1, further comprising comprehensive scoring: total score = number of normalized entity matching pairs × 1 + malware association degree × 1 + number of vulnerability-associated infrastructure matching pairs × 0.1 + number of infrastructure provider matching pairs × 0.1 + malicious organization association degree × 1; the association degree calculations are thereby comprehensively scored, and the association degree of the two information articles is recorded in a database.
3. The information association degree calculation method according to claim 1, wherein, in the article graph extraction, named entity recognition based on BERT-BiLSTM-CRF uses a pre-trained BERT model to obtain the corresponding word vectors, then inputs the word vectors into a BiLSTM layer to further extract the contextual relations of the text, and finally obtains the classification result through a CRF layer.
4. The information association degree calculation method according to claim 3, wherein the input is a word sequence, the output is a predicted label for each word, and the entities are obtained by classification and extraction.
5. The information association degree calculation method according to claim 1, wherein, after the entities are obtained, relation extraction is performed in a pipeline manner with an R-BERT model, which applies the BERT model to relation classification; special markers are inserted before and after the positions of the target entities before the text is input into BERT for fine-tuning, so that the positions of the two target entities are identified and this information is passed to the BERT model; the positions of the two target entities are then located in the output word vectors of the BERT model, and those word vectors together with the sentence encoding are used as the input of a multi-layer neural network classifier, so that the semantic information of the sentence and of the two target entities is captured to better suit the relation classification task.
6. The information association degree calculation method according to claim 1, wherein, in the entity association degree calculation, for vulnerabilities, potential association between vulnerability entities is discovered by judging whether the vulnerabilities exist in the same infrastructure; the relations between a vulnerability and hardware, software, and operating systems can be extracted from the threat intelligence knowledge graph; the hardware, software, and operating systems related to the vulnerabilities in the two information articles are extracted from the threat intelligence knowledge graph and matched, and the score is increased by 0.1 point for each matched pair of identical hardware, software, or operating systems.
7. The information association degree calculation method according to claim 1, wherein, in the entity association degree calculation, for infrastructure, potential association between infrastructure entities is discovered by judging whether they are products of the same vendor; the provider relations associated with infrastructure can be extracted from the threat intelligence knowledge graph, potential relations between infrastructures are judged by a shared provider, and the score is increased by 0.1 point for each matched pair of vendors.
8. The information association degree calculation method according to claim 1, wherein, in the entity association degree calculation, for malicious organizations, potential association between malicious organizations is discovered through the attack means and malware they have in common; the relations among malicious organizations, attack means, and malware can be extracted from the threat intelligence knowledge graph, and the malicious organization association degree is calculated by matching the attack means and malware shared by the two malicious organizations.
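
The normalization, word-level comparison, and scoring steps recited in claims 1, 2, 6, and 7 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The claims do not say how names are turned into vectors, so character-bigram counts are assumed here; they do not define how the "commonality" of attack patterns is measured, so Jaccard overlap is assumed; and the scoring formula reads "matching log" in claim 2 as the number of matched pairs and the bare numbers as multiplicative weights. All function names and sample data are hypothetical.

```python
from collections import Counter
from math import sqrt

SIMILARITY_THRESHOLD = 0.9  # cutoff stated in claim 1


def bigram_vector(name):
    """ASSUMPTION: vectorize a name as counts of its character bigrams."""
    text = "".join(name.lower().split())
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def normalize(entity, alias_map):
    """Return the standard entity of the most similar alias, if above the cutoff."""
    if not alias_map:
        return entity
    vector = bigram_vector(entity)
    best = max(alias_map, key=lambda alias: cosine(vector, bigram_vector(alias)))
    if cosine(vector, bigram_vector(best)) > SIMILARITY_THRESHOLD:
        return alias_map[best]
    return entity  # nothing similar enough: keep the extracted name


def match_keywords(first, second):
    """Word-level comparison: one point per matched pair; matched items are removed."""
    left, right = Counter(first), Counter(second)
    matched = left & right
    points = sum(matched.values())
    return points, list((left - matched).elements()), list((right - matched).elements())


def overlap(a, b):
    """ASSUMPTION: measure the commonality of two sets as Jaccard overlap."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def total_score(matched_pairs, malware_assoc, vuln_infra_pairs, provider_pairs, org_assoc):
    """Weighted sum as read from claim 2 (weights 1, 1, 0.1, 0.1, 1)."""
    return (matched_pairs * 1 + malware_assoc * 1 + vuln_infra_pairs * 0.1
            + provider_pairs * 0.1 + org_assoc * 1)


# Hypothetical data, reusing the names from the Background example in the Description.
alias_map = {"Sofacy": "Sofacy", "APT 28": "Sofacy", "Fancy Bear": "Sofacy"}
article_a = [normalize(e, alias_map) for e in ["apt28", "CVE-2018-0001"]]
article_b = [normalize(e, alias_map) for e in ["Fancy Bear", "CVE-2020-1234"]]
points, rest_a, rest_b = match_keywords(article_a, article_b)
score = total_score(points, 0.0, 1, 0, 0.0)  # one shared software product assumed
```

In this sketch both articles resolve to the standard entity Sofacy, which earns one point in the word-level comparison, and the two CVE identifiers remain for the graph-based association step. The `overlap` helper stands in for the malware and malicious organization association degrees, whose exact formula the claims leave open.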

Description

Information association degree calculation method applied to network security threat information

Technical Field

The invention belongs to the technical field of network security, and particularly relates to an information association degree calculation method applied to network security threat information.

Background

At present, network security is a problem that governments, administrative authorities, public institutions, enterprises, non-profit organizations, and similar bodies must all face. In the face of growing network threats, protecting its software and hardware systems is a necessary measure for every organization. Before protecting its digital assets, an organization first needs to know what to protect, which is why network security threat information matters. However, given the vast amount of network security threat information, it is very important to obtain an overall view of a network security threat event. The information from a single source describes the event from only one angle and can hardly show the whole picture, so calculating the association degree of information in order to integrate associated information has become an essential part of analyzing network security events.

There are two mainstream approaches to integrating associated information. The first is manual integration: because network security threat information covers many varied topics, is huge in volume, and has strict timeliness requirements, manual review alone cannot meet the need. The second is machine-learning-based methods, including content-based recommendation and keyword-based approaches. These mostly judge relatedness by directly comparing text content, and therefore have difficulty capturing the special kinds of association found in network security threat information.
For example, a threat organization is called Sofacy, but the same organization is also known by several other names, such as APT 28, Fancy Bear, and Fantasy Bear. Different information articles may use different names, and a simple word-level or text-level association method can hardly handle the association of one entity whose names share no characters at all. Likewise, associations between article entities that are bridged by a third entity are difficult for conventional methods to handle. For example, two information articles mention the vulnerabilities CVE-2018-0001 and CVE-2020-1234 respectively; the two are literally unrelated, but the threat intelligence knowledge graph shows that both exist in the same software, an implicit association that conventional methods cannot identify.

Disclosure of Invention

To address the inability of the currently mainstream approaches to identify implicit associations, the invention provides an information association degree calculation method applied to network security threat information. The method uncovers implicit associations between articles: it extracts the core content of each piece of information through knowledge extraction and uses the threat intelligence knowledge graph to infer potential associations among that core content, from which it calculates the association degree of two pieces of information.

To solve the technical problem, the invention adopts the following scheme: an information association degree calculation method applied to network security threat information, comprising the following steps.
First, threat intelligence graph construction: based on the STIX format, structured and semi-structured data is extracted, including but not limited to the CVE, CPE, and ATT&CK datasets.
Secondly, article graph extraction: named entity recognition is carried out based on BERT-BiLSTM-CRF, and after the entities are obtained, relation extraction is carried out in a pipeline manner with an R-BERT model.
Thirdly, entity normalization: in the threat intelligence knowledge graph, the entity types for which one entity has several different names are malicious organizations and malware. First, the malicious organization and malware entities in the article's triples are extracted; next, the entities of the corresponding class in the threat intelligence graph, together with their aliases, are traversed; finally, the matching entity is located and the corresponding entity in the article's triples is converted into the standard entity. Based on the alias data in the threat intelligence graph, converting all aliases into {alias: standard entity} mappings, converting all malicious organization aliases and the malicious organization entities extracted from the articles into vectors, calculating the cosine similarity between each alias and each entity of the two groups of data, finding out the