KR-20260067000-A - Method for Detecting Malicious Elements in PDFs through AI-Based Multidimensional Analysis


Abstract

The present invention relates to a method for detecting malicious elements within a PDF through artificial intelligence-based multidimensional analysis. The method according to the present invention comprises: a) a step of, when a PDF file to be analyzed is input, verifying its integrity, damage, and encryption status through a handler module to prepare it in a state suitable for analysis; b) a step of extracting text, images, JavaScript, and URLs from the PDF file to be analyzed; c) a step of detecting malicious elements by performing signature pattern detection and personal information detection on the extracted text; d) a step of detecting malicious elements by analyzing pixel patterns and color distributions of the extracted images and extracting text through OCR; e) a step of performing malicious element detection, including deobfuscation and malicious behavior prediction, on the extracted JavaScript; f) a step of performing malicious element detection, including static analysis, dynamic analysis, phishing detection, signature pattern detection, drop file analysis, and malicious behavior prediction, on the extracted URLs; g) a step of detecting malicious elements by executing the PDF file to be analyzed in an isolated environment through a dynamic analysis module to induce potential malicious behavior; and h) a step of making a final determination of whether the PDF file to be analyzed is malicious based on the malicious element detection results of steps c) to g).
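
Steps a) and b) can be illustrated with a minimal Python sketch. It is not the implementation described in this publication: the names `HandlerReport`, `check_pdf`, and `extract_urls` are hypothetical, the byte-level checks are a simplified stand-in for what a handler module might verify, and the URL scan only sees URLs stored in uncompressed parts of the file.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: stops at whitespace, brackets, and quotes, so a PDF
# literal string such as "(http://example.com/a)" yields the bare URL.
URL_PATTERN = re.compile(rb'https?://[^\s<>()\[\]"]+')


@dataclass
class HandlerReport:
    ok: bool         # True when the file may proceed to extraction
    encrypted: bool  # True when an /Encrypt entry appears in the file
    reason: str      # Short explanation, usable as an error message


def check_pdf(data: bytes) -> HandlerReport:
    """Step a) sketch: check header, end-of-file marker, and encryption."""
    # Assumption: the "%PDF-" header must appear within the first 1024 bytes.
    if b"%PDF-" not in data[:1024]:
        return HandlerReport(False, False, "missing %PDF- header")
    # Assumption: the "%%EOF" marker must appear within the last 1024 bytes.
    if b"%%EOF" not in data[-1024:]:
        return HandlerReport(False, False, "missing %%EOF marker")
    # Crude encryption test: look for the /Encrypt key anywhere in the bytes.
    encrypted = b"/Encrypt" in data
    return HandlerReport(True, encrypted, "ok")


def extract_urls(data: bytes) -> list[str]:
    """Part of step b): collect distinct URLs visible in the raw bytes."""
    found = {m.group(0).decode("ascii", "replace")
             for m in URL_PATTERN.finditer(data)}
    return sorted(found)


if __name__ == "__main__":
    sample = (b"%PDF-1.7\n1 0 obj\n"
              b"<< /S /URI /URI (http://example.com/login) >>\nendobj\n"
              b"trailer\n<< /Root 1 0 R >>\n%%EOF\n")
    report = check_pdf(sample)
    if report.ok:
        print(extract_urls(sample))
    else:
        print("error:", report.reason)
```

A production handler would more likely rely on a full PDF parser so that compressed object streams, incremental updates, and encrypted content are handled properly; the sketch only shows where the pre-check sits relative to extraction.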

Inventors

김현목
김영중
박선홍

Assignees

(주)모니터랩

Dates

Publication Date: 20260512
Application Date: 20241105

Claims (10)

1. A method for detecting malicious elements within a PDF through artificial intelligence-based multidimensional analysis, executed on a computing device, the method comprising: a) a step of, when a PDF file to be analyzed is input, checking the integrity, damage, and encryption status of the PDF file to be analyzed through a handler module to prepare it for analysis; b) a step of extracting text, images, JavaScript, and URLs from the PDF file to be analyzed; c) a step of detecting malicious elements by performing signature pattern detection and personal information detection on the extracted text; d) a step of detecting malicious elements by analyzing pixel patterns and color distributions of the extracted images and extracting text through OCR; e) a step of performing malicious element detection, including deobfuscation and malicious behavior prediction, on the extracted JavaScript; f) a step of performing malicious element detection, including static analysis, dynamic analysis, phishing detection, signature pattern detection, drop file analysis, and malicious behavior prediction, on the extracted URLs; g) a step of detecting malicious elements by executing the PDF file to be analyzed in an isolated environment through a dynamic analysis module to induce potential malicious behavior; and h) a step of making a final determination of whether the PDF file to be analyzed is malicious based on the malicious element detection results of steps c) through g).
2. The method of claim 1, wherein the malicious behavior prediction for the JavaScript and the malicious behavior prediction for the URLs are performed using an AI prediction model.
3. The method of claim 2, wherein the AI prediction model pre-trained for predicting malicious behavior of the JavaScript outputs a predicted risk level for the extracted JavaScript, and the AI prediction model pre-trained for predicting malicious behavior of the URLs outputs a predicted risk level for the extracted URLs.
4. The method of claim 3, wherein the AI prediction model is at least one of a CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), BERT (Bidirectional Encoder Representations from Transformers), Transformer model, Autoencoder, Random Forest, Gradient Boosting, One-Class SVM (Support Vector Machine), GNN (Graph Neural Network), and DBN (Deep Belief Network).
5. The method of claim 4, wherein the malicious element detection results of steps c) through g) are collected, and the final determination of whether the PDF file to be analyzed is malicious is made based on an analysis of the correlation among the collected malicious element detection results.
6. The method of claim 5, wherein the PDF file to be analyzed is determined to be malicious when the collected malicious element detection results consistently indicate maliciousness, and is marked as a suspicious file when the detection results do not agree or are uncertain.
7. The method of claim 6, further comprising: a step of storing the final determination result; and a step of providing a final conclusion regarding whether the PDF file to be analyzed is malicious based on the stored result.
8. The method of claim 7, wherein, in step a), further analysis is stopped and an error is returned if the PDF file to be analyzed is damaged or has an invalid format.
9. A computer-readable recording medium storing a program for executing, on a computer, the method for detecting malicious elements within a PDF through artificial intelligence-based multidimensional analysis according to any one of claims 1 to 8.
10. A computing device comprising: a processor; and a memory storing instructions or programs executable by the processor, wherein the computing device executes the method for detecting malicious elements within a PDF through artificial intelligence-based multidimensional analysis according to any one of claims 1 to 8 when the instructions or programs are executed by the processor.
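
The final determination described in the claims (collecting the results of steps c) through g), then marking the file malicious when they agree and suspicious when they disagree or are uncertain) can be sketched as follows. This is an illustrative reading rather than the claimed implementation: `Verdict` and `final_verdict` are hypothetical names, the correlation analysis is reduced to a simple agreement check, and the `BENIGN` outcome for unanimously clean results is an assumption, since the claims name only the malicious and suspicious outcomes.

```python
from enum import Enum


class Verdict(Enum):
    MALICIOUS = "malicious"
    SUSPICIOUS = "suspicious"
    BENIGN = "benign"        # Assumption: not named in the claims
    UNCERTAIN = "uncertain"  # Used only as a per-analyzer result


def final_verdict(results: dict[str, Verdict]) -> Verdict:
    """Combine per-step results (steps c through g) into one verdict."""
    if not results:
        raise ValueError("no detection results collected")
    values = set(results.values())
    # Every analyzer agrees the element is malicious.
    if values == {Verdict.MALICIOUS}:
        return Verdict.MALICIOUS
    # Assumption: every analyzer agrees the element is clean.
    if values == {Verdict.BENIGN}:
        return Verdict.BENIGN
    # Results disagree, or at least one analyzer was uncertain.
    return Verdict.SUSPICIOUS


if __name__ == "__main__":
    collected = {
        "text": Verdict.BENIGN,
        "image": Verdict.BENIGN,
        "javascript": Verdict.MALICIOUS,
        "url": Verdict.UNCERTAIN,
        "dynamic": Verdict.MALICIOUS,
    }
    print(final_verdict(collected).value)
```

Keying the results by analyzer name keeps the stored record traceable, so a later conclusion based on the stored result can show which analysis step contributed to a suspicious verdict.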

Description

Method for Detecting Malicious Elements in PDFs through AI-Based Multidimensional Analysis
The present invention relates to a method for detecting malicious elements within a PDF, and more specifically, to a method for detecting malicious elements within a PDF through artificial intelligence-based multidimensional analysis.
In modern society, the PDF has established itself as one of the most widely used standard file formats for exchanging various types of documents. Recently, however, there has been a surge in cases where PDF files are exploited to distribute malware or carry out phishing attacks. Although these malicious PDF files look like ordinary files, elements embedded within them, such as JavaScript code, URL links, images, or text, can perform malicious actions on the system or steal sensitive information from users. For this reason, malicious PDF file detection technology is considered a critical element of cybersecurity.
Existing malicious PDF detection methods have primarily relied on signature-based detection. Signature-based detection stores predefined malicious code patterns in a database and determines maliciousness by checking whether those patterns are found in an input PDF file. While this approach is useful for detecting known malicious elements registered in the signature database, it is limited in detecting new types of malicious elements or modified malicious code. This is because PDF files have a flexible structure that allows attackers to easily alter their internal structure. Consequently, detecting malicious files through signatures alone is not an effective security measure.
Furthermore, various threat elements can be embedded within PDF files. For instance, a PDF file consists of multiple pages and objects (text, images, JavaScript code, URLs, etc.), and each object may independently contain malicious activity.
Existing single-factor analysis methods make it difficult to analyze these diverse elements comprehensively, and there is a high probability of missing threats embedded in specific parts of the PDF file. Because of these limitations, a growing number of cyber attacks bypass existing security systems by exploiting the ease with which PDF files can be modified. Therefore, a new technical approach is required that analyzes the various elements of malicious PDF files multidimensionally and overcomes the limitations of existing signature-based detection methods.
FIG. 1 is a configuration diagram of a computing device executing a method for detecting malicious elements in a PDF through artificial intelligence-based multidimensional analysis according to an embodiment of the present invention.
FIG. 2 is a flowchart provided to explain the operation of a method for detecting malicious elements in a PDF through artificial intelligence-based multidimensional analysis according to an embodiment of the present invention.
Hereinafter, with reference to the attached drawings, embodiments of the present invention will be described in detail so that those skilled in the art can easily implement the present invention.
The terms used in this specification are for describing embodiments and are not intended to limit the invention. In this specification, the singular form includes the plural form unless specifically stated otherwise in the text. The terms "comprises" and/or "comprising" used in this specification do not exclude the presence or addition of one or more other components in addition to the components mentioned. Throughout the specification, the same reference numerals refer to the same components, and "and/or" includes each of the mentioned components and all combinations of one or more. Although terms such as "first," "second," etc., are used to describe various components, these components are not limited by these terms.
These terms are used merely to distinguish one component from another. Therefore, the first component mentioned below may be the second component within the technical scope of the invention.
In this specification, the term "computing device" includes any device capable of performing computational processing and providing results to a user. For example, a computing device may include desktop PCs, notebook computers, and server computers, as well as smartphones, tablet PCs, cellular phones, PCS phones (Personal Communication Service phones), synchronous/asynchronous IMT-2000 (International Mobile Telecommunication-2000) mobile terminals, Palm PCs (Personal Computers), and Personal Digital Assistants (PDAs).
FIG. 1 is a configuration diagram of a computing device executing a method for detecting malicious elements in a PDF through artificial intelligence-based multidimensional analysis according to an embodiment of the present invention.
Referring to FIG. 1, the computing device (100) may include memory (110) and a processor (120). The memory (110) can store at least one instruction and/or program. Addit