US-12619722-B2 - Cyber threat information processing apparatus, cyber threat information processing method, and storage medium storing cyber threat information processing program


Abstract

A cyber threat information processing method, a cyber threat information processing apparatus, and a storage medium storing a cyber threat information processing program may analyze and process an executable file, perform clustering to generate one or more clusters, and determine similarity with a cluster of another user based on characteristic information of the executable file.

Inventors

Ki Hong Kim

Assignees

SANDS LAB Inc.

Dates

Publication Date: 2026-05-05
Application Date: 2023-02-06
Priority Date: 2022-02-09

Claims (11)

1. A cyber threat information processing method comprising: receiving a request for analysis of an executable file from a first user; extracting a set of assembly code for a function of the executable file according to the request, wherein the set of assembly code includes opcode which corresponds to the function and a piece of disassembled code which corresponds to an operand; converting the extracted set of assembly code and a unit of code extracted from a second file previously requested by the first user into a first hash value; converting the first hash value into first N-gram data, wherein the N is a natural number; performing ensemble machine learning based on the first N-gram data and evaluating a first similarity between the extracted set of assembly code and the unit of code extracted from the second file previously requested by the first user; generating a first cluster of code blocks for the first user based on the first similarity, wherein the first cluster of code blocks includes a portion of the extracted set of assembly code; evaluating a second similarity between the extracted set of assembly code for the first cluster of code blocks and a second cluster of code blocks for a second user, wherein the second cluster of code blocks includes a portion of assembly code extracted from a third file from the second user; and providing information related to the executable file to the second user when the second similarity is greater than a preset threshold value.
2. The cyber threat information processing method according to claim 1, wherein the extracted set of assembly code is extracted by disassembling the executable file to obtain disassembled code and reconstructing the disassembled code.
3. The cyber threat information processing method according to claim 1, wherein the first cluster of code blocks is generated when the first similarity is greater than or equal to a threshold value.
4. The cyber threat information processing method according to claim 1, wherein the evaluating of the second similarity comprises: converting the extracted set of assembly code and the second cluster of code blocks for the second user into a second hash value; converting the second hash value into second N-gram data; and performing ensemble machine learning on block-unit code of the second N-gram data.
5. The cyber threat information processing method according to claim 1, wherein the information related to the executable file includes the second similarity.
6. A cyber threat information processing apparatus comprising: a database configured to store one or more clusters for each user; and a processor configured to analyze and process an input executable file, wherein the processor is configured to: receive a request for analysis of an executable file from a first user; extract a set of assembly code for a function of the executable file according to the request, wherein the set of assembly code includes opcode which corresponds to the function and a piece of disassembled code which corresponds to an operand; convert the extracted set of assembly code and a unit of code extracted from a second file previously requested by the first user into a first hash value; convert the first hash value into first N-gram data, wherein the N is a natural number; perform ensemble machine learning based on the first N-gram data and evaluate a first similarity between the extracted set of assembly code and the unit of code extracted from the second file previously requested by the first user; generate a first cluster of code blocks for the first user based on the first similarity, wherein the first cluster of code blocks includes a portion of the extracted set of assembly code; evaluate a second similarity between the extracted set of assembly code for the first cluster of code blocks and a second cluster of code blocks for a second user, wherein the second cluster of code blocks includes a portion of assembly code extracted from a third file from the second user; and provide information related to the executable file to the second user when the second similarity is greater than a preset threshold value.
7. The cyber threat information processing apparatus according to claim 6, wherein the processor is configured to disassemble the executable file to obtain disassembled code, and reconstruct the disassembled code to extract the assembly code.
8. The cyber threat information processing apparatus according to claim 6, wherein the processor is configured to generate the first cluster of code blocks when the first similarity is greater than or equal to a threshold value.
9. The cyber threat information processing apparatus according to claim 6, wherein the processor is configured to: convert the extracted set of assembly code and the second cluster of code blocks for the second user into a second hash value; convert the second hash value into second N-gram data; and perform ensemble machine learning on block-unit code of the second N-gram data.
10. The cyber threat information processing apparatus according to claim 6, wherein the information related to the executable file includes the second similarity.
11. A non-transitory storage medium that stores a computer-readable program, the non-transitory storage medium storing one or more programs for processing cyber threat information, the one or more programs including instructions executed by one or more programs of a cyber threat information processing apparatus, and the one or more programs causing the cyber threat information processing apparatus to: receive a request for analysis of an executable file from a first user; extract a set of assembly code for a function of the executable file by analyzing the executable file according to the request, wherein the set of assembly code includes opcode which corresponds to the function and a piece of disassembled code which corresponds to an operand; convert the extracted set of assembly code and a unit of code extracted from a second file previously requested by the first user into a first hash value; convert the first hash value into first N-gram data, wherein the N is a natural number; perform ensemble machine learning based on the first N-gram data and evaluate a first similarity between the extracted set of assembly code and the unit of code extracted from the second file previously requested by the first user; generate a first cluster of code blocks for the first user based on the first similarity, wherein the first cluster of code blocks includes a portion of the extracted set of assembly code; evaluate a second similarity between the extracted set of assembly code for the first cluster of code blocks and a second cluster of code blocks of a second user, wherein the second cluster of code blocks includes a portion of assembly code extracted from a third file from the second user; and provide information related to the executable file to the second user when the second similarity is greater than a preset threshold value.
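
Illustrative sketch (not part of the claims or of the patented implementation): the flow recited in claims 1 and 3 (assembly code for a function, hash value, N-gram data, first similarity, per-user cluster, second similarity against another user's cluster, threshold check) might look roughly like the Python below. Several points are assumptions made only for illustration. Each instruction is hashed separately and N-grams are taken over the resulting token sequence. A Jaccard set overlap stands in for the ensemble machine learning step, which the claims do not specify. The function names, the greedy clustering rule, and the threshold values are all hypothetical.

```python
import hashlib
from typing import Dict, List, Set, Tuple

Block = List[str]            # assembly lines of one function, e.g. "mov ebp, esp"
NGram = Tuple[str, ...]


def hash_tokens(block: Block) -> List[str]:
    """Assumption: hash every normalized instruction (opcode plus operands) on its own."""
    tokens = []
    for line in block:
        normalized = " ".join(line.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        tokens.append(digest[:8])  # a short prefix is enough for a sketch
    return tokens


def to_ngrams(tokens: List[str], n: int = 2) -> Set[NGram]:
    """Turn the hash token sequence into a set of N-grams (N is a natural number)."""
    if n < 1:
        raise ValueError("n must be a natural number")
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(a: Block, b: Block, n: int = 2) -> float:
    """Jaccard overlap of N-gram sets; a stand-in for the ensemble model, not the claimed one."""
    grams_a = to_ngrams(hash_tokens(a), n)
    grams_b = to_ngrams(hash_tokens(b), n)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def add_to_clusters(clusters: List[List[Block]], block: Block, threshold: float) -> int:
    """Greedy rule (assumption): join the first cluster whose first member is similar enough."""
    for index, cluster in enumerate(clusters):
        if similarity(cluster[0], block) >= threshold:  # ">=" mirrors claim 3
            cluster.append(block)
            return index
    clusters.append([block])
    return len(clusters) - 1


def users_to_notify(block: Block,
                    clusters_by_user: Dict[str, List[List[Block]]],
                    owner: str,
                    threshold: float) -> List[str]:
    """Return other users holding a cluster member whose similarity exceeds the threshold."""
    notified = []
    for user, clusters in clusters_by_user.items():
        if user == owner:
            continue
        if any(similarity(block, member) > threshold  # ">" mirrors claim 1
               for cluster in clusters for member in cluster):
            notified.append(user)
    return notified
```

In this sketch, two function bodies that differ only in one call target share most of their N-grams and land in the same cluster, while an unrelated function starts a new cluster. A real system would replace `similarity` with whatever trained ensemble model it uses.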

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of Korean Patent Application No. 10-2022-0017168, filed on Feb. 9, 2022, which is hereby incorporated by reference as if fully set forth herein.

BACKGROUND OF THE INVENTION

Field of the Invention

The disclosed embodiments relate to a cyber threat information processing apparatus, a cyber threat information processing method, and a storage medium storing a cyber threat information processing program.

Discussion of the Related Art

Damage from cybersecurity threats, which are growing more sophisticated and center on new or variant malware, has been increasing. To reduce such damage and to respond at an early stage, countermeasure technology has been advancing through multi-dimensional pattern composition, various types of complex analysis, and the like. However, cyberattacks continue to increase day by day rather than being adequately contained. These cyberattacks reach beyond the existing information and communication technology (ICT) infrastructure and threaten finance, transportation, the environment, health, and other areas that directly affect people's lives.

One of the basic technologies for detecting and responding to most existing cybersecurity threats is to create a database of patterns for cyberattacks or malware in advance and to apply appropriate monitoring technologies where data flow is required. Existing technology has evolved around identifying and responding to a threat when a data flow or code matching a monitored pattern is detected. Such conventional technology has the advantage of rapid and accurate detection when a data flow or code matches a previously secured pattern. However, for a new or mutant threat whose pattern has not been secured or is bypassed, detection is impossible or analysis takes a significantly long time.
Even when artificial intelligence (AI) analysis is used, the related art focuses on advancing technology that detects and analyzes malware itself. However, there is no fundamental technology to counter cybersecurity threats, so this approach alone has difficulty addressing new malware or new variants of malware. For example, technology that only detects and analyzes previously discovered malware cannot cope with decoy or fake information designed to deceive a detection or analysis system, and confusion results. For mass-produced malware with enough data to learn from, characteristic information can be sufficiently secured, making it possible to determine whether code is malicious and what type of malware it is. However, advanced persistent threat (APT) attacks are produced in relatively small numbers and attack precisely. Because training data often does not match and targeted attacks make up the majority, existing technology has limitations even when it is advanced.

In addition, methods and expression techniques for describing malware, attack code, or cyber threats have conventionally differed depending on the position or analysis perspective of the analyst. For example, the way malware and attack activity are described has not been standardized worldwide, so even when the same incident or the same malware is detected, experts in the field explain it differently, which causes confusion. Even malware detection names have not been unified, so for the same malicious file it has been impossible to correctly identify the attack performed, or attacks have been organized differently. Therefore, identified attack techniques could not be described in a normalized and standardized manner.
A conventional malware detection and analysis method focuses on detecting malware itself. As a result, when different attackers produce malware that performs significantly similar malicious activity, the attackers cannot be identified. Relatedly, such a detection method, which focuses on individual cases, makes it difficult to predict the types of cyber threat attacks likely to occur in the near future.

SUMMARY OF THE INVENTION

The present disclosure is to address at least the above-mentioned problems and/or disadvantages and to provide at least the advantages described below. Accordingly, an aspect of the present disclosure is to provide a cyber threat information processing apparatus, a cyber threat information processing method, and a storage medium storing a cyber threat information processing program capable of detecting and addressing malware that does not exactly match data learned by AI, and of addressing variants of malware. Another aspect of the