JP-7855636-B2 - Detection of obfuscated malicious code within document files

Inventors

Chang, Benjamin
Satpathy, Ghanashyam

Assignees

Netskope, Inc.

Dates

Publication Date: 20260508
Application Date: 20240520
Priority Date: 20210224

Claims (20)

A system for detecting the presence of malicious code embedded in a document file, the system comprising a plurality of system components configured to perform: an action of receiving a document file containing one or more embedded items; an action of analyzing the document file to extract at least one of the embedded items; and an action of processing the document file to determine whether at least one of the embedded items contains malicious code, wherein the processing includes extracting, from at least one of the embedded items, obfuscation scoring features that indicate a known instance of malicious code obfuscated within at least one of the embedded items, and the processing further includes inputting the obfuscation scoring features into a trained machine learning model to determine the likelihood that at least one of the embedded items in the document file contains malicious code.
The system according to claim 1, further comprising a network interface connected to a network, wherein the receiving action occurs via the network interface.
The system according to claim 1, wherein the document file is a Microsoft Office document.
The system according to claim 1, wherein the document file is one of a Microsoft Word document, a Microsoft Excel document, or a Microsoft PowerPoint document.
The system according to claim 1, wherein the document file is one of a word processing document, a spreadsheet document, or a presentation document.
The system according to claim 1, wherein the embedded item includes at least one of one or more macros and/or one or more OLE objects.
The system according to claim 6, wherein the macro includes a VBA macro.
The system according to claim 1, wherein the obfuscation scoring feature includes a macro-related feature, and the macro-related feature includes the use of at least one of CreateObject, Shell, FileSystem, URLDownloadToFile, CallByName, or DetectSandbox.
The system according to claim 1, wherein the obfuscation scoring feature includes an object-related feature, the object-related feature includes the use of at least one Visual Basic for Applications (VBA) Object Linking and Embedding (OLE) feature, and the VBA OLE feature includes the use of at least one of Shell, FileSystem, URLDownloadToFile, CallByName, or DetectSandbox.
The system according to claim 1, further comprising testing for malicious code within a document file while isolating the document file within a sandbox.
A method implemented by computer hardware resources for detecting the presence of malicious code embedded in a document file, the method comprising: an action of receiving a document file containing one or more embedded items; an action of analyzing the document file to extract at least one of the embedded items; and an action of processing the document file to determine whether at least one of the embedded items contains malicious code, wherein the processing includes extracting, from at least one of the embedded items, obfuscation scoring features that indicate a known instance of malicious code obfuscated within at least one of the embedded items, and the processing further includes inputting the obfuscation scoring features into a trained machine learning model to determine the likelihood that at least one of the embedded items in the document file contains malicious code.
The method according to claim 11, wherein a network interface is connected to a network, and the receiving occurs via the network interface.
The method according to claim 11, wherein the document file is a Microsoft Office document.
The method according to claim 11, wherein the document file is one of a Microsoft Word document, a Microsoft Excel document, or a Microsoft PowerPoint document.
The method according to claim 11, wherein the document file is one of a word processing document, a spreadsheet document, or a presentation document.
The method according to claim 11, wherein the embedded item includes at least one of one or more macros and/or one or more OLE objects.
The method according to claim 16, wherein the macro includes a VBA macro.
The method according to claim 11, wherein the obfuscation scoring feature includes a macro-related feature, and the macro-related feature includes the use of at least one of CreateObject, Shell, FileSystem, URLDownloadToFile, CallByName, or DetectSandbox.
The method according to claim 11, wherein the obfuscation scoring feature includes an object-related feature, the object-related feature includes the use of at least one Visual Basic for Applications (VBA) Object Linking and Embedding (OLE) feature, and the VBA OLE feature includes the use of at least one of Shell, FileSystem, URLDownloadToFile, CallByName, or DetectSandbox.
The method according to claim 11, further comprising testing for malicious code within a document file while isolating the document file in a sandbox.
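
The feature-extraction and scoring steps recited in the claims can be illustrated with a minimal Python sketch. It assumes the macro source has already been extracted from the document file as plain text. The function names, the weights, and the bias are hypothetical and chosen by hand; they stand in for a trained machine learning model and are not the model or feature set of the disclosed system.

```python
import math
import re

# Function names listed in claims 8 and 18 as macro-related
# obfuscation scoring features.
KEYWORDS = (
    "CreateObject", "Shell", "FileSystem",
    "URLDownloadToFile", "CallByName", "DetectSandbox",
)


def extract_features(macro_source: str) -> dict:
    """Count case-insensitive whole-word uses of each keyword in macro text."""
    return {
        kw: len(re.findall(r"\b" + re.escape(kw) + r"\b", macro_source, re.IGNORECASE))
        for kw in KEYWORDS
    }


# HYPOTHETICAL parameters, chosen by hand for illustration only.
# A real system would learn them from labeled documents.
HYPOTHETICAL_WEIGHTS = {kw: 1.0 for kw in KEYWORDS}
HYPOTHETICAL_BIAS = -3.0


def malicious_likelihood(features: dict) -> float:
    """Map keyword counts to a value in (0, 1) with a logistic function."""
    z = HYPOTHETICAL_BIAS + sum(
        HYPOTHETICAL_WEIGHTS[kw] * features.get(kw, 0) for kw in KEYWORDS
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical macro text used only to exercise the sketch.
sample_macro = '''
Sub AutoOpen()
    Set sh = CreateObject("WScript.Shell")
    Shell "cmd.exe /c whoami"
End Sub
'''
features = extract_features(sample_macro)
score = malicious_likelihood(features)
print(features, round(score, 3))
```

Whole-word matching is a design choice of this sketch: an identifier such as FileSystemObject is not counted as a use of FileSystem, and a production extractor might reasonably decide otherwise.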

Description

Cross-references and incorporation by reference
This patent application is a continuation of U.S. Patent Application No. 17/184,478 and claims priority to U.S. Nonprovisional Patent Application No. 17/572,548 (Agent Reference Number NSKO-1038-2), filed on January 10, 2022, entitled "Detection Of Malicious Code that is Obfuscated Within A Document File." This patent application also claims priority to U.S. Nonprovisional Patent Application No. 17/184,478 (Agent Reference Number NSKO 1038-1), filed on February 24, 2021 and issued on January 11, 2022 as U.S. Patent No. 11,222,112, titled "Signatureless Detection of Malicious MS Office Documents Containing Advanced Threats in Macros," which is referenced herein. This patent application is also related to, and claims priority from, U.S. Nonprovisional Patent Application No. 17/184,502 (Agent Reference Number NSKO-1040-1), filed on February 24, 2021, titled "Signatureless Detection of Malicious MS Office Documents Containing Embedded OLE Objects." This patent application claims priority to all of the aforementioned documents, including all patent applications, publications, and patents referenced herein. All of the aforementioned documents are also incorporated herein by reference in their entirety for any and all purposes.
The disclosed technology relates to cybersecurity attacks and cloud-based security, and more specifically, to systems and methods for preventing malware attacks in which Microsoft Office documents act as a primary vector (method) for delivering malicious code in the form of macros and OLE objects. In addition, the disclosed technology relates to the detection of documents containing malicious macros and/or malicious OLE objects that do not have known signatures. In the context of the disclosed technology, "signatureless" refers to the detection of malicious macros and malicious OLE objects that do not have a previously established signature.
Furthermore, the disclosed technology uses machine learning and feature engineering to predict the presence of malicious macros and/or malicious OLE objects in MS Office documents and other document types, without requiring that the malicious code be previously known.
The subject matter discussed in this section should not be assumed to be prior art merely because it is mentioned in this section. Similarly, the problems mentioned in this section, or associated with the subject matter provided as background, should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which may themselves correspond to implementations of the claimed technology.
In the six months leading up to the COVID-19 pandemic, Microsoft Office files accounted for only 5% of the Trojans and downloaders detected on the Netskope security cloud platform. Following COVID-19, particularly when EMOTET became active again, that share increased to nearly 45%, while the share of portable executable files and other types decreased. Microsoft Excel files containing malicious links, VBA scripts, or PowerPoint scripts accounted for nearly three-quarters of the malicious Office documents detected, and these files often contained malicious macros. Over 90% of malicious Office documents were distributed through cloud applications, compared to 50% for all other malicious file types combined.
Malware authors are becoming increasingly sophisticated in finding ways to deliver malware payloads into secure networks using MS Office. Documents containing malware employ advanced obfuscation techniques to conceal the malicious code, which makes them difficult to detect, and they often go unnoticed until they have caused significant damage to the network.
To predict malicious content in MS Office files, it is necessary to detect obfuscated macros and OLE objects delivered through Microsoft Office document files that follow a VBA document object model, using feature engineering combined with machine learning. This problem is addressed by feature engineering performed by a network security system such as Netskope, which trains supervised machine learning algorithms to distinguish among legitimate, safe documents, suspicious documents that potentially contain malware, and malicious documents that undoubtedly contain malicious code. In this way, network devices can predict the presence of previously unknown malware in document files without relying on known signatures. Additionally, the detection of malicious Office files can occur in near real-time, which significantly improves network security while limiting the impact on system throughput by minimizing the latency added by network security processing.
In the drawings, similar reference characters generally refer to the same parts throughout the different drawings. Furthermore, the drawings are not necessarily to scale; instead, the focus is generally on illustrating the principles of the disclosed technology. In the f