US-20260129062-A1 - MACHINE LEARNING FOR PRIORITIZING TRAFFIC IN MULTI-PURPOSE INLINE CLOUD ANALYSIS (MICA) TO ENHANCE MALWARE DETECTION
Abstract
Techniques for machine learning for prioritizing traffic in multi-purpose inline cloud analysis (MICA) to enhance malware detection are disclosed. In some embodiments, a system, a process, and/or a computer program product for machine learning for prioritizing traffic in multi-purpose inline cloud analysis (MICA) to enhance malware detection includes processing a set of data for network security analysis to extract a file; determining that the file is to be offloaded to a cloud security entity for security processing based at least in part on a prefilter model that is implemented as a machine learning model; forwarding the file to the cloud security entity using a multi-purpose inline cloud analysis (MICA) channel; and performing an action in response to receiving a verdict from the cloud security entity.
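The flow summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: `ExtractedFile`, `handle_extracted_file`, the `prefilter` and `query_cloud` callables, the rule that a score at or above the threshold triggers offloading, and the action names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ExtractedFile:
    """A file extracted from inspected traffic (hypothetical structure)."""
    name: str
    features: Sequence[float]


def handle_extracted_file(
    file: ExtractedFile,
    prefilter: Callable[[Sequence[float]], float],
    threshold: float,
    query_cloud: Callable[[ExtractedFile], str],
) -> str:
    """Return the action taken for one extracted file.

    Assumption: the prefilter model yields a score, and a score at or
    above the threshold means the file is offloaded to the cloud.
    """
    score = prefilter(file.features)
    if score < threshold:
        # Not prioritized for cloud analysis; nothing is forwarded.
        return "not_offloaded"
    # Stand-in for forwarding over the MICA channel and awaiting a verdict.
    verdict = query_cloud(file)
    # Assumption: a malicious verdict leads to blocking, anything else to allowing.
    return "block" if verdict == "malicious" else "allow"
```

Passing the model and the cloud call in as plain callables keeps the sketch independent of any particular machine learning library or transport.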
Inventors
- Sheng Yang
- Curtis Leland Carmony
- Ali Islam
- Kashyap Tavarekere Ananthapadmanabha
- William Redington Hewlett, II
Assignees
- PALO ALTO NETWORKS, INC.
Dates
- Publication Date: 20260507
- Application Date: 20251219
Claims (20)
- 1. A system for generating a multi-purpose inline cloud analysis (MICA) prefilter model for execution on an inline security entity, comprising: one or more processors configured to: collect samples, wherein the samples include a plurality of files of a predetermined file type; extract features from one sample of the samples; determine a ground truth verdict and file size for the one sample of the samples; relabel the samples; split train, test, and validation (TTV) data based on traffic bytes; train the MICA prefilter model with the one sample of the samples, wherein the one sample is weighted; and determine a threshold value for the MICA prefilter model; and a memory coupled to the one or more processors and configured to provide the one or more processors with instructions.
- 2. The system of claim 1, wherein the predetermined file type includes a portable executable (PE) file.
- 3. The system of claim 1, wherein the determining of the ground truth verdict and the file size comprises to: check the one sample against an allow list for known good files.
- 4. The system of claim 1, wherein the MICA prefilter model implements a machine learning technique.
- 5. The system of claim 1, wherein the MICA prefilter model implements a machine learning technique, and wherein the machine learning technique includes one or more of the following: random forest, linear regression, support vector machine, naive Bayes, logistic regression, K-nearest neighbors, decision trees, gradient boosted decision trees, K-means clustering, hierarchical clustering, density-based spatial clustering of applications with noise (DBSCAN) clustering, and/or principal component analysis.
- 6. The system of claim 1, wherein the one or more processors are further configured to: determine a session count for the one sample.
- 7. The system of claim 6, wherein the determining of the threshold value comprises to: determine the threshold value based on the session count and the file size of the one sample.
- 8. A method for generating a multi-purpose inline cloud analysis (MICA) prefilter model for execution on an inline security entity, comprising: collecting samples, wherein the samples include a plurality of files of a predetermined file type; extracting features from one sample of the samples; determining a ground truth verdict and file size for the one sample of the samples; relabeling the samples; splitting train, test, and validation (TTV) data based on traffic bytes, wherein a session count for the one sample is received as input; training the MICA prefilter model with the one sample of the samples, wherein the one sample is weighted; and determining a threshold value for the MICA prefilter model.
- 9. The method of claim 8, wherein the predetermined file type includes a portable executable (PE) file.
- 10. The method of claim 8, wherein the determining of the ground truth verdict and the file size comprises: checking the one sample against an allow list for known good files.
- 11. The method of claim 8, wherein the MICA prefilter model implements a machine learning technique.
- 12. The method of claim 8, wherein the MICA prefilter model implements a machine learning technique, and wherein the machine learning technique includes one or more of the following: a random forest technique, a linear regression technique, a support vector machine technique, a naive Bayes technique, a logistic regression technique, a K-nearest neighbors technique, decision trees technique, gradient boosted decision trees technique, a K-means clustering technique, a hierarchical clustering technique, a density-based spatial clustering of applications with noise (DBSCAN) clustering, and/or a principal component analysis technique.
- 13. The method of claim 8, further comprising: determine a session count for the one sample.
- 14. The method of claim 13, wherein the determining of the threshold value comprises: determining the threshold value based on the session count and the file size of the one sample.
- 15. A system for generating a multi-purpose inline cloud analysis (MICA) prefilter model for execution on an inline security entity, comprising: one or more processors configured to: collect samples, wherein the samples include a plurality of files of a predetermined file type; means for extracting features from one sample of the samples; means for determining a ground truth verdict and file size for the one sample of the samples; means for relabeling the samples; means for splitting train, test, and validation (TTV) data based on traffic bytes, wherein a session count for the one sample is received as input; means for training the MICA prefilter model with the one sample of the samples, wherein the one sample is weighted; and means for determining a threshold value for the MICA prefilter model; and a memory coupled to the one or more processors and configured to provide the one or more processors with instructions.
- 16. The system of claim 15, wherein the predetermined file type includes a portable executable (PE) file.
- 17. The system of claim 15, wherein the MICA prefilter model implements a machine learning technique.
- 18. The system of claim 15, wherein the MICA prefilter model implements a machine learning technique, and wherein the machine learning technique includes one or more of the following: a random forest technique, a linear regression technique, a support vector machine technique, a naive Bayes technique, a logistic regression technique, a K-nearest neighbors technique, decision trees technique, gradient boosted decision trees technique, a K-means clustering technique, a hierarchical clustering technique, a density-based spatial clustering of applications with noise (DBSCAN) clustering, and/or a principal component analysis technique.
- 19. The system of claim 15, wherein the one or more processors are further configured to: means for determining a session count for the one sample.
- 20. The system of claim 15, wherein the means for determining of the threshold value comprises to: determine the threshold value based on the session count and the file size of the one sample.
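Two of the claimed steps, splitting train, test, and validation (TTV) data based on traffic bytes and determining a threshold from session count and file size, can be illustrated with a short Python sketch. The claims do not spell out either procedure, so everything below is an assumption made for illustration: traffic bytes are taken to be file size times session count, the split is a simple greedy assignment toward per-partition byte targets, and the threshold is the lowest score whose forwarded traffic stays within a hypothetical byte budget. Feature extraction, relabeling, and weighted training are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One collected file (hypothetical structure)."""
    name: str
    size_bytes: int   # file size
    sessions: int     # session count
    malicious: bool   # ground truth verdict

    @property
    def traffic_bytes(self) -> int:
        # Assumption: traffic bytes = file size x number of sessions carrying it.
        return self.size_bytes * self.sessions


def split_by_traffic_bytes(samples, fractions=(0.8, 0.1, 0.1)):
    """Greedily assign samples so each partition approaches its share of traffic bytes."""
    total = sum(s.traffic_bytes for s in samples)
    targets = [f * total for f in fractions]
    parts: list[list[Sample]] = [[] for _ in fractions]
    filled = [0] * len(fractions)
    # Place the largest contributors first, each into the partition with the biggest deficit.
    for s in sorted(samples, key=lambda x: x.traffic_bytes, reverse=True):
        i = max(range(len(fractions)), key=lambda k: targets[k] - filled[k])
        parts[i].append(s)
        filled[i] += s.traffic_bytes
    return parts


def pick_threshold(scored, byte_budget_fraction):
    """Return the lowest score threshold whose forwarded bytes fit the budget.

    `scored` is a list of (score, Sample) pairs. Returns infinity when even
    the highest-scoring samples exceed the budget.
    """
    total = sum(s.traffic_bytes for _, s in scored)
    budget = byte_budget_fraction * total
    best = float("inf")
    for t in sorted({score for score, _ in scored}, reverse=True):
        forwarded = sum(s.traffic_bytes for score, s in scored if score >= t)
        if forwarded > budget:
            break
        best = t
    return best


# Toy data, invented purely for the example.
demo = [
    Sample("a", 100, 1, True),
    Sample("b", 50, 4, True),
    Sample("c", 10, 10, False),
    Sample("d", 600, 1, False),
]
demo_scores = [0.9, 0.7, 0.4, 0.1]
train, test, validation = split_by_traffic_bytes(demo)
threshold = pick_threshold(list(zip(demo_scores, demo)), byte_budget_fraction=0.35)
```

Splitting on bytes rather than on sample counts keeps a handful of large, frequently seen files from dominating one partition, which is one plausible motivation for the claimed step.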
Description
CROSS REFERENCE TO OTHER APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 18/385,844, entitled MACHINE LEARNING FOR PRIORITIZING TRAFFIC IN MULTI-PURPOSE INLINE CLOUD ANALYSIS (MICA) TO ENHANCE MALWARE DETECTION, filed Oct. 31, 2023, which is incorporated herein by reference for all purposes.
BACKGROUND OF THE INVENTION
Nefarious individuals attempt to compromise computer systems in a variety of ways. As one example, such individuals may embed or otherwise include malicious software (“malware”) in email attachments and transmit or cause the malware to be transmitted to unsuspecting users. When executed, the malware compromises the victim's computer. Some types of malware will instruct a compromised computer to communicate with a remote host. For example, malware can turn a compromised computer into a “bot” in a “botnet,” receiving instructions from and/or reporting data to a command and control (C&C) server under the control of the nefarious individual.
One approach to mitigating the damage caused by malware is for a security company (or other appropriate entity) to attempt to identify malware and prevent it from reaching/executing on end user computers. Another approach is to try to prevent compromised computers from communicating with the C&C server. Unfortunately, malware authors are using increasingly sophisticated techniques to obfuscate the workings of their software. As one example, some types of malware use Domain Name System (DNS) queries to exfiltrate data. Accordingly, there exists an ongoing need for improved techniques to detect malware and prevent its harm.
Techniques for detecting malware may be performed locally by a firewall or via a cloud service.
BRIEF DESCRIPTION OF THE DRAWINGS
Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
FIG. 1 is a block diagram of an environment in which malicious traffic is detected or suspected in accordance with some embodiments.
FIG. 2A illustrates an embodiment of a data appliance.
FIG. 2B is a functional diagram of logical components of an embodiment of a data appliance.
FIG. 3 is a block diagram of an environment in which a security platform offloads services to a cloud system in accordance with some embodiments.
FIG. 4 illustrates a processing of data on a data plane of a security platform in accordance with some embodiments.
FIG. 5 is a block diagram of a high-level architecture for machine learning for prioritizing traffic in multi-purpose inline cloud analysis (MICA) to enhance malware detection in accordance with some embodiments.
FIG. 6 is a flow diagram of training the MICA prefilter model in accordance with some embodiments.
FIG. 7 is a flow diagram of applying the MICA prefilter model for forwarding decisions executed locally on a security platform in accordance with some embodiments.
FIGS. 8A-8B are tables of evaluation metrics of experiment results.
FIG. 9 is a flow diagram of a process for applying the MICA channel for offloading from an inline security entity to a cloud security entity in accordance with some embodiments.
FIG. 10 is a flow diagram of a process for machine learning for prioritizing traffic in multi-purpose inline cloud analysis (MICA) to enhance malware detection in accordance with some embodiments.
DETAILED DESCRIPTION
The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques.
In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions. A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of exa