US-20260129070-A1 - AUTOMATED GENERATION OF BEHAVIORAL SIGNATURES FOR MALICIOUS WEB CAMPAIGNS

Abstract

Techniques for automated generation of behavioral signatures for malicious web campaigns are disclosed. In some embodiments, a system/process/computer program product for automated generation of behavioral signatures for malicious web campaigns includes crawling a plurality of web sites associated with a malware campaign; determining discriminating repeating attributes (e.g., behavior related attributes, which can be determined using dynamic analysis, and static related attributes, which can be determined using static analysis) as malware campaign related footprint patterns, wherein the discriminating repeating attributes are not associated with benign web sites; and automatically generating a human-interpretable malware campaign signature based on the malware campaign related footprint patterns.
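The selection step described in the abstract can be illustrated with a minimal sketch. It assumes each crawled site has already been reduced to a set of attribute strings; the function name, the `min_sites` parameter, the attribute labels, and the example URLs are all hypothetical and do not come from the application.

```python
from collections import Counter


def discriminating_repeating_attributes(campaign_sites, benign_sites, min_sites=2):
    """Return attributes seen on at least `min_sites` campaign sites
    and on no benign site.

    Both arguments map a URL to the set of attributes observed there.
    The data layout and the threshold are illustrative assumptions.
    """
    # Every attribute observed on any benign site is excluded outright.
    benign_attrs = set().union(*benign_sites.values())

    # Count on how many distinct campaign sites each attribute appears.
    counts = Counter()
    for attrs in campaign_sites.values():
        counts.update(set(attrs))

    return {
        attr
        for attr, n in counts.items()
        if n >= min_sites and attr not in benign_attrs
    }


# Hypothetical attribute sets: "js:" marks a behavior related attribute
# (a browser API call), "html:" marks a static related attribute.
campaign = {
    "http://bad-a.example": {
        "js:history.pushState", "html:iframe[opacity=0]", "js:document.write",
    },
    "http://bad-b.example": {
        "js:history.pushState", "html:iframe[opacity=0]", "js:fetch",
    },
}
benign = {
    "http://good.example": {"js:document.write", "js:fetch"},
}

footprint = discriminating_repeating_attributes(campaign, benign)
print(sorted(footprint))
```

In this sketch an attribute survives only if it repeats across campaign sites and never appears on a benign site, which mirrors the "discriminating repeating" wording. How the application actually orders or thresholds attributes is not stated in this section.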

Inventors

  • William Russell Melicher
  • Oleksii Starov
  • Shresta Bellary Seetharam
  • Shaown Sarker

Assignees

  • PALO ALTO NETWORKS, INC.

Dates

Publication Date
20260507
Application Date
20250731

Claims (20)

  1. A system, comprising: a memory; and a processor coupled to the memory and configured to: crawl a plurality of web sites associated with a malware campaign for behavior related and static related attributes; determine behavior related and static related discriminating repeating attributes as malware campaign related footprint patterns, wherein the discriminating repeating attributes are associated with more than one of the crawled web sites and are not associated with benign web sites, wherein the behavior related discriminating repeating attributes are based on browser application programming interface (API) calls from dynamic execution, wherein the discriminating repeating attributes include behavior related attributes and static related attributes; and automatically generate a transparently human-interpretable malware campaign signature represented in plain text based on the malware campaign related footprint patterns.
  2. The system of claim 1, wherein the discriminating repeating attributes include behavior related attributes associated with dynamic content that affects the user's experience visiting the web site and static related attributes associated with static content that is visible to the user during the visit.
  3. The system of claim 1, wherein the discriminating repeating attributes include behavior related attributes associated with dynamic content that affects the user's experience when visiting the web site and determined using dynamic analysis and static related attributes associated with static content that is visible to the user during the visit and determined using static analysis.
  4. The system of claim 1, wherein a browser environment is instrumented for tracking dynamic behaviors, and wherein the browser environment is configured to crawl a plurality of uncategorized, labeled Uniform Resource Links (URLs) to generate malware campaign signatures.
  5. The system of claim 1, wherein the malware campaign is associated with a set of domains used by an attacker for a malicious activity including phishing, Uniform Resource Link (URL) delivered malware, and/or other malicious related activity.
  6. The system of claim 1, wherein the malware campaign is associated with a set of web sites used by an attacker for a malicious activity, and wherein the malicious activity includes phishing, web site delivered malware, and/or other malicious related activity.
  7. The system of claim 1, wherein the human-interpretable malware campaign signature detects that another web site is associated with the malware campaign even if the another web site includes content that is encrypted and/or obfuscated.
  8. The system of claim 1, wherein the automatically generated human-interpretable malware campaign signature identifies another malicious web site belonging to the malware campaign by applying the malware campaign signature on both labeled and unlabeled Uniform Resource Links (URLs) associated with the plurality of web sites.
  9. The system of claim 1, wherein the processor is further configured to periodically update the human-interpretable malware campaign signature for the malware campaign.
  10. The system of claim 1, wherein the processor is further configured to generate a new human-interpretable malware campaign signature for a new malware campaign.
  11. The system of claim 1, wherein the processor is further configured to distribute the malware campaign signature to a firewall, wherein the firewall is configured to apply the malware campaign signature based on monitored network traffic activity.
  12. A method, comprising: crawling a plurality of web sites associated with a malware campaign for behavior related and static related attributes; determining behavior related and static related discriminating repeating attributes as malware campaign related footprint patterns, wherein the discriminating repeating attributes are associated with more than one of the crawled web sites and are not associated with benign web sites, wherein the behavior related discriminating repeating attributes are based on browser application programming interface (API) calls from dynamic execution, wherein the discriminating repeating attributes include behavior related attributes and static related attributes; and automatically generating a transparently human-interpretable malware campaign signature represented in plain text based on the malware campaign related footprint patterns.
  13. The method of claim 12, wherein the discriminating repeating attributes include behavior related attributes determined using dynamic analysis and static related attributes determined using static analysis.
  14. The method of claim 12, wherein a browser environment is instrumented for tracking dynamic behaviors, and wherein the browser environment is configured to crawl a plurality of uncategorized, labeled Uniform Resource Links (URLs) to generate malware campaign signatures.
  15. The method of claim 12, wherein the malware campaign is associated with a set of domains used by an attacker for a malicious activity including phishing, Uniform Resource Link (URL) delivered malware, and/or other malicious related activity.
  16. The method of claim 12, wherein the malware campaign is associated with a set of web sites used by an attacker for a malicious activity, and wherein the malicious activity includes phishing, web site delivered malware, and/or other malicious related activity.
  17. The method of claim 12, wherein the automatically generated human-interpretable malware campaign signature identifies another malicious web site belonging to the malware campaign by applying the malware campaign signature on both labeled and unlabeled Uniform Resource Links (URLs) associated with the plurality of web sites.
  18. The method of claim 12, wherein the human-interpretable malware campaign signature detects that another web site is associated with the malware campaign even if the another web site includes content that is encrypted and/or obfuscated.
  19. A computer program product embodied in a non-transitory computer readable medium and comprising computer instructions for: crawling a plurality of web sites associated with a malware campaign for behavior related and static related attributes; determining behavior related and static related discriminating repeating attributes as malware campaign related footprint patterns, wherein the discriminating repeating attributes are associated with more than one of the crawled web sites and are not associated with benign web sites, wherein the behavior related discriminating repeating attributes are based on browser application programming interface (API) calls from dynamic execution, wherein the discriminating repeating attributes include behavior related attributes and static related attributes; and automatically generating a transparently human-interpretable malware campaign signature represented in plain text based on the malware campaign related footprint patterns.
  20. The computer program product of claim 19, wherein the malware campaign is associated with a set of domains used by an attacker for a malicious activity including phishing, Uniform Resource Link (URL) delivered malware, and/or other malicious related activity.
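The claims recite a signature "represented in plain text" that can be applied to other URLs. The sketch below shows one way such a signature could be rendered and matched. It assumes a signature is a conjunction of required attributes, which this section does not confirm. The `campaign:` / `require:` line format, the function names, and the attribute labels are hypothetical.

```python
def render_signature(name, attributes):
    """Render a signature as plain text, one required attribute per line.

    The "campaign:" / "require:" layout is an illustrative assumption,
    not the format used in the application.
    """
    lines = [f"campaign: {name}"]
    lines += [f"require: {attr}" for attr in sorted(attributes)]
    return "\n".join(lines)


def parse_signature(text):
    """Recover the set of required attributes from the plain-text form."""
    prefix = "require: "
    return {
        line[len(prefix):]
        for line in text.splitlines()
        if line.startswith(prefix)
    }


def matches(signature_text, site_attributes):
    """Return True when every required attribute was observed on the site.

    Treating the signature as a conjunction is an assumption; an empty
    signature never matches.
    """
    required = parse_signature(signature_text)
    return bool(required) and required <= set(site_attributes)


# Hypothetical footprint and observations for two unlabeled sites.
sig = render_signature(
    "demo-campaign", {"js:history.pushState", "html:iframe[opacity=0]"}
)
print(sig)

suspect = {"js:history.pushState", "html:iframe[opacity=0]", "js:fetch"}
unrelated = {"js:fetch"}
print(matches(sig, suspect), matches(sig, unrelated))
```

Because the rendered signature is plain text, an analyst can read the exact attributes that triggered a match, which is one reading of "human-interpretable" in the claims.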

Description

CROSS REFERENCE TO OTHER APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 18/104,058 entitled AUTOMATED GENERATION OF BEHAVIORAL SIGNATURES FOR MALICIOUS WEB CAMPAIGNS filed Jan. 31, 2023, which claims priority to U.S. Provisional Patent Application No. 63/305,967 entitled AUTOMATED GENERATION OF BEHAVIORAL SIGNATURES FOR MALICIOUS WEB CAMPAIGNS filed Feb. 2, 2022, each of which is incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

Malware is a general term commonly used to refer to malicious software (e.g., including a variety of hostile, intrusive, and/or otherwise unwanted software). Malware can be in the form of code, scripts, active content, and/or other software. Example uses of malware include disrupting computer and/or network operations, stealing proprietary information (e.g., confidential information, such as identity, financial, and/or intellectual property related information), and/or gaining access to private/proprietary computer systems and/or computer networks. Unfortunately, as techniques are developed to help detect and mitigate malware, nefarious authors find ways to circumvent such efforts. Accordingly, there is an ongoing need for improvements to techniques for identifying and mitigating malware.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

FIG. 1 illustrates an example of an environment in which malicious applications (“malware”) are detected and prevented from causing harm.

FIG. 2A illustrates an embodiment of a data appliance.

FIG. 2B is a functional diagram of logical components of an embodiment of a data appliance.

FIG. 3 illustrates an example of logical components that can be included in a system for analyzing samples.

FIG. 4A illustrates a logical signature and matching process in accordance with some embodiments.

FIG. 4B illustrates an example dynamic logical predicate from extended VisibleV8 in accordance with some embodiments.

FIG. 4C illustrates a predicate extraction pipeline from crawled web pages in accordance with some embodiments.

FIG. 4D illustrates Table 1 that includes extracted static and dynamic predicates during data collection in accordance with some embodiments.

FIG. 4E illustrates Table 2 that includes crawled URLs and collected predicates by verdict and Table 3 that includes collected predicates by type in accordance with some embodiments.

FIG. 4F illustrates an algorithm for ordering predicates to construct a set of repeating discriminative predicates in accordance with some embodiments.

FIG. 4G illustrates an algorithm for generating signatures from the set of discriminative repeating predicates in accordance with some embodiments.

FIG. 4H is a graph of the minimum URL count Threshold, Umin, in accordance with some embodiments.

FIG. 4I is a graph of the minimum predicate count Threshold, Pmin, in accordance with some embodiments.

FIG. 4J illustrates Table 4 that indicates the time taken to perform each stage of the signature generation and application in accordance with some embodiments.

FIG. 4K illustrates a Table 5 that includes a breakdown of predicate types in generated signatures in accordance with some embodiments.

FIG. 4L illustrates Table 6 that includes a breakdown of labeled predicates used for each process by percentile slices in accordance with some embodiments.

FIG. 4M illustrates a Table 7 that includes the top ten campaign signatures with the highest toxicity in accordance with some embodiments.

FIG. 4N illustrates a Table 8 that includes a breakdown of manual analysis of URLs not flagged by VirusTotal in accordance with some embodiments.

FIG. 4O illustrates a Table 9 that provides an impact of detected URLs from unlabeled data over enterprise customer request logs (e.g., since September 2021) in accordance with some embodiments.

FIG. 4P provides a Listing 1 that displays a shortened version of our generated signature that successfully identified a clickjacking campaign in accordance with some embodiments.

FIG. 4Q provides a Listing 2 that displays a shortened version of our generated signature that successfully identified a JavaScript (JS) malware campaign (e.g., manipulating browsing history) in accordance with some embodiments.

FIG. 4R illustrates a Table 10 that includes HTML tags and corresponding attributes for extraction of HTML URL and Domain predicates in accordance with some embodiments.

FIG. 5 is a screen diagram that shows an example of one of these URLs associated with a clickjacking campaign.

FIG. 6 is a flow diagram of a process for automated generation of behavioral signatures for malicious web campaigns in accordance with some embodiments.

FIG. 7 is another flow diagram of a process for automated generation of behavioral signatures for malicious web campaigns in accordance with some embodiments.

DETAILED DESCRIPTION

The invention can be implemented in numero