US-12627697-B2 - Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program


Abstract

Provided is a cyber threat information processing method including acquiring webpage data based on link information, and analyzing tag structure information of the webpage data, converting data included in a tag area of the webpage data into tag feature data according to the tag structure information, and training an AI model using the converted tag feature data to acquire cyber threat information of the data included in the tag area.
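The tag-structure analysis and conversion steps summarized above can be illustrated with a minimal sketch. It assumes the webpage data has already been fetched as a string and uses Python's standard `html.parser` module. The class `TagFeatureExtractor`, the function `extract_tag_features`, and the feature fields (`path`, `depth`, `text`) are hypothetical names chosen for illustration and are not taken from the disclosure.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed
# onto the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagFeatureExtractor(HTMLParser):
    """Walks the tag structure and records the data found in each tag area."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open tags, outermost first
        self.features = []   # one record per non-empty text node

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.features.append({
                "path": "/".join(self.stack),  # position in the tag tree
                "depth": len(self.stack),      # nesting depth
                "text": text,                  # data inside the tag area
            })


def extract_tag_features(html):
    """Return a list of per-tag feature records for an HTML string."""
    parser = TagFeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.features


sample = "<html><body><p>Hello</p><script>eval(x)</script></body></html>"
features = extract_tag_features(sample)
for record in features:
    print(record)
```

Each record keeps the tag path separate from the content, so a later stage could vectorize the content while using the path as structural context. How the disclosed method actually encodes tag feature data is not specified in this section.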

Inventors

Ki Hong Kim
Sung Eun Park
Min Jun CHOI
Hyun Jong LEE

Assignees

SANDS LAB Inc.

Dates

Publication Date: 20260512
Application Date: 20230410
Priority Date: 20221227

Claims (9)

1. A cyber threat information processing method to provide a cyber intelligence service through Application Program Interface (API) interface, the cybersecurity threat information processing method comprising: acquiring webpage data based on link information, and analyzing tag structure information of the webpage data; converting hypertext markup language (HTML) data included in a tag area of the webpage data according to the tag structure information to extract tag feature data; detecting whether the HTML data in the webpage data is malicious on multiple layers using two or more detection techniques which include antivirus-based malicious pattern detections or signature-based malicious pattern detections; training an artificial intelligence (AI) model on the tag feature data to detect attack techniques of a malicious activity caused by the data included in the tag area, wherein multiple attack techniques are classified with multi-labeling by using a binary vector based on multi-labeling classification; acquiring cyber threat information on the malicious activity; and providing the cyber intelligence service through the API interface related to the webpage data with the attack techniques classified with the multi-labeling and an attack group of the malicious activity.
2. The cyber threat information processing method according to claim 1, wherein the tag structure information comprises a document object model (DOM) tree structure.
3. The cyber threat information processing method according to claim 1, wherein, when the HTML data included in the tag area of the webpage data is converted into the tag feature data, the cyber threat information processing method is applied to user-related data in a tag except for grammar included in the webpage data.
4. A cyber threat information processing apparatus to provide a cyber intelligence service through Application Program Interface (API) interface, the cyber threat information processing apparatus comprising: a database configured to store webpage data; and a processor, wherein the processor: acquires the webpage data based on link information, and analyzes tag structure information of the webpage data; converts hypertext markup language (HTML) data included in a tag area of the webpage data according to the tag structure information to extract tag feature data; detects whether the HTML data in the webpage data is malicious on multiple layers using two or more detection techniques which include antivirus-based malicious pattern detections or signature-based malicious pattern detections; trains an AI model on the tag feature data to detect attack techniques of a malicious activity caused by the data included in the tag area, wherein multiple attack techniques are classified with multi-labeling by using a binary vector based on multi-labeling classification; acquires cyber threat information on the malicious activity; and provides the cyber intelligence service through the API interface related to the webpage data with the attack techniques classified with the multi-labeling and an attack group of the malicious activity.
5. The cyber threat information processing apparatus according to claim 4, wherein the tag structure information comprises a DOM tree structure.
6. The cyber threat information processing apparatus according to claim 4, wherein, when the HTML data included in the tag area of the webpage data is converted into the tag feature data, the cyber threat information processing apparatus is applied to user-related data in a tag except for grammar included in the webpage data.
7. A non-transitory computer-readable storage medium storing a cyber threat information processing program to provide a cyber intelligence service through Application Program Interface (API) interface, the program executing computer instructions comprising: a module for acquiring webpage data based on link information, and analyzing tag structure information of the webpage data; a module for converting hypertext markup language (HTML) data included in a tag area of the webpage data into tag feature data according to the tag structure information to extract tag feature data; a module for detecting whether the HTML data in the webpage data is malicious on multiple layers using two or more detection techniques which include antivirus-based malicious pattern detections or signature-based malicious pattern detections; and a module for training an AI model on the tag feature data to detect attack techniques of a malicious activity caused by the data included in the tag area, wherein multiple attack techniques are classified with multi-labeling by using a binary vector based on multi-labeling classification, acquiring cyber threat information on the malicious, and providing the cyber intelligence service through the API interface related to the webpage data with the attack techniques classified with the multi-labeling and an attack group of the malicious activity.
8. The non-transitory computer-readable storage medium according to claim 7, wherein the tag structure information comprises a DOM tree structure.
9. The non-transitory computer-readable storage medium according to claim 7, wherein, when the HTML data included in the tag area of the webpage data is converted into the tag feature data, the cyber threat information processing program is applied to user-related data in a tag except for grammar included in the webpage data.
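The multi-labeling with a binary vector recited in the independent claims can be illustrated with a small sketch. It assumes a fixed, ordered list of attack-technique labels. The label names in `TECHNIQUES`, the helper names `encode_labels` and `decode_scores`, and the 0.5 threshold are hypothetical and are not drawn from the claims.

```python
# Hypothetical, ordered label set. The claims do not enumerate technique names.
TECHNIQUES = [
    "drive_by_download",
    "obfuscated_script",
    "credential_phishing",
    "hidden_iframe",
]


def encode_labels(active):
    """Encode a set of technique names as a binary vector.

    Position i is 1 when TECHNIQUES[i] applies to the sample, so one
    sample can carry several techniques at once.
    """
    unknown = set(active) - set(TECHNIQUES)
    if unknown:
        raise ValueError(f"unknown technique labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in TECHNIQUES]


def decode_scores(scores, threshold=0.5):
    """Turn per-label model scores back into technique names.

    Each label is thresholded independently, which is what separates
    multi-label classification from picking a single best class.
    """
    if len(scores) != len(TECHNIQUES):
        raise ValueError("score vector length must match TECHNIQUES")
    return [name for name, s in zip(TECHNIQUES, scores) if s >= threshold]


target = encode_labels({"obfuscated_script", "hidden_iframe"})
predicted = decode_scores([0.1, 0.9, 0.2, 0.7])
print(target)     # [0, 1, 0, 1]
print(predicted)  # ['obfuscated_script', 'hidden_iframe']
```

Because every position is decided on its own, a page that both hides an iframe and carries an obfuscated script receives both labels rather than being forced into one category.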

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit of Korean Patent Application No. 10-2022-0185541, filed on Dec. 27, 2022, which is hereby incorporated by reference as if fully set forth herein.
BACKGROUND OF THE INVENTION
Field of the Invention
The disclosed embodiments relate to a cyber threat information processing apparatus, a cyber threat information processing method, and a storage medium storing a cyber threat information processing program.
Discussion of the Related Art
Damage from cybersecurity threats, which are becoming steadily more sophisticated and center on new or variant malware, has been increasing. To reduce such damage and to respond at an early stage, countermeasure technology has advanced through multi-dimensional pattern composition, various types of complex analysis, and the like. However, cyberattacks continue to increase day by day rather than being contained within a controllable range. These cyberattacks now threaten finance, transportation, the environment, health, and other areas that directly affect people's lives, beyond the existing information and communication technology (ICT) infrastructure.
One of the basic techniques for detecting and responding to most existing cybersecurity threats is to build a database of patterns for cyberattacks or malware in advance and to apply appropriate monitoring technologies at the points through which data must flow. Existing technology has evolved around identifying and responding to a threat when a data flow or code matching a monitored pattern is detected. Such conventional technology can detect threats rapidly and accurately when a data flow or code matches a previously secured pattern. However, for a new or mutated threat whose pattern has not been secured, or which bypasses the pattern, detection is impossible or analysis takes a significantly long time.
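The limitation of pattern matching described above can be shown with a minimal sketch. It assumes, as a deliberate simplification, that each stored pattern is an exact SHA-256 hash of a known sample. Real products use richer signatures, and the names `SIGNATURE_DB`, `sha256_hex`, and `matches_known_pattern`, as well as the sample payload, are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# A previously analyzed sample whose pattern has been secured in advance.
known_sample = b"<script>document.write(unescape('%3Ciframe%3E'))</script>"

# Hypothetical pattern database: here just a set of exact hashes.
SIGNATURE_DB = {sha256_hex(known_sample)}


def matches_known_pattern(data: bytes) -> bool:
    """Fast, exact lookup against the stored patterns."""
    return sha256_hex(data) in SIGNATURE_DB


# A trivially mutated variant with the same behavior but different bytes.
variant = known_sample.replace(b"document.write", b"document['write']")

print(matches_known_pattern(known_sample))  # True
print(matches_known_pattern(variant))       # False
```

The lookup is fast and accurate for the stored sample, yet a small rewrite of the same script escapes it, which is the gap the background section attributes to pattern-based detection.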
Even where artificial intelligence (AI) analysis is used, the related art focuses on advancing techniques for detecting and analyzing the malware itself. However, there is no fundamental technology for countering cybersecurity threats, so this approach alone has difficulty addressing new malware or new variants of malware. For example, technology that only detects and analyzes previously discovered malware cannot handle decoy or fake information intended to deceive the detection or analysis system, which leads to confusion.
For mass-produced malware with enough data to learn from, characteristic information can be sufficiently secured, so it is possible to determine whether code is malicious and to identify the type of malware. However, advanced persistent threat (APT) attacks are produced in relatively small numbers and strike precisely. Because training data often does not match them and most of them are targeted, the existing technology has limitations even when it is advanced further.
In addition, the methods and expressions used to describe malware, attack code, or cyber threats have conventionally differed depending on the position or analytical perspective of the analyst. For example, because no worldwide standard exists for describing malware and attack activity, experts in the field have explained the same incident or the same malware differently, causing confusion. Even malware detection names have not been unified, so for the same malicious file it has been impossible to correctly identify the attack performed, or attacks have been categorized differently. As a result, identified attack techniques could not be described in a normalized and standardized manner.
A conventional malware detection and analysis method focuses on detecting the malware itself, and therefore cannot identify the attackers when malware that performs significantly similar malicious activity is produced by different attackers. Relatedly, because such a method focuses on individual cases, it is difficult to predict the types of cyber threat attacks likely to occur in the near future.
SUMMARY OF THE INVENTION
The present disclosure is to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the present disclosure is to provide a cyber threat information processing apparatus, a cyber threat information processing method, and a storage medium storing a cyber threat information processing program capable of detecting and addressing malware that does not exactly match data learned by AI, and of addressing variants of malware. Another aspect of the