US-12621265-B2 - System and method for detecting dictionary-based DGA traffic

Abstract

A system and method for detecting dictionary-based DGA traffic is provided. A domain name system (DNS) stream is received. The DNS stream is classified using a per domain dictionary domain generation algorithm (DGA) classifier to generate candidate dictionary DGA domains with cluster information. The candidate dictionary DGA domains are filtered to generate a set of dictionary DGA domains. An action is performed based on a match with a monitored domain name of a monitored DNS request and a dictionary DGA domain of the set of dictionary DGA domains.
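
The flow described in the abstract can be sketched roughly as follows. This is an illustrative Python outline, not the patented implementation: the `DnsRequest` type, the function names, and the two callables standing in for the classifier and the filter are all hypothetical, and clustering is reduced here to grouping candidate domains by the requesting IP address.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, NamedTuple, Set


class DnsRequest(NamedTuple):
    """One request from the DNS stream (hypothetical field names)."""
    source_ip: str
    domain: str


def detect_dictionary_dga(
    stream: Iterable[DnsRequest],
    is_dictionary_dga: Callable[[str], bool],  # stand-in for the per domain classifier
    keep_candidate: Callable[[str], bool],     # stand-in for the candidate filter
) -> Dict[str, Set[str]]:
    """Return filtered dictionary DGA domains grouped by requesting IP."""
    # Classify each domain on its own and cluster the hits by source IP.
    candidates: Dict[str, Set[str]] = defaultdict(set)
    for request in stream:
        if is_dictionary_dga(request.domain):
            candidates[request.source_ip].add(request.domain)

    # Filter the candidates; clusters left empty are dropped.
    filtered: Dict[str, Set[str]] = {}
    for source_ip, domains in candidates.items():
        kept = {domain for domain in domains if keep_candidate(domain)}
        if kept:
            filtered[source_ip] = kept
    return filtered


def matches_dga(domain: str, clusters: Dict[str, Set[str]]) -> bool:
    """True if a monitored domain name appears in any filtered cluster."""
    return any(domain in domains for domains in clusters.values())
```

In this sketch a caller would supply a trained model's prediction function as `is_dictionary_dga` and take an action, such as raising an alert, when `matches_dga` returns true for a monitored request.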

Inventors

  • Janos Szurdi
  • Weihan Jiang
  • David Qianshan He

Assignees

  • PALO ALTO NETWORKS, INC.

Dates

Publication Date: 2026-05-05
Application Date: 2022-04-18

Claims (9)

  1. A system, comprising: a processor configured to: receive a domain name system (DNS) stream, wherein the DNS stream includes a plurality of DNS requests; classify the DNS stream using a per domain dictionary domain generation algorithm (DGA) classifier to generate candidate dictionary DGA domains with cluster information of a cluster of the candidate dictionary DGA domains, comprising: classifying, using the per domain dictionary DGA classifier, domains of the DNS stream into dictionary DGA generated domains or benign domains; and clustering the dictionary DGA generated domains to generate the candidate dictionary DGA domains with the cluster information of the cluster, wherein the cluster is determined based on an IP address of a firewall that originated a DNS request of the plurality of DNS requests; filter the candidate dictionary DGA domains to generate a set of dictionary DGA domains, comprising: determining whether an amount of time that a candidate dictionary DGA domain has been registered is greater than a predetermined amount of time threshold or is less than or equal to the predetermined amount of time threshold; in response to a determination that the amount of time that the candidate dictionary DGA domain has been registered is less than or equal to the predetermined amount of time threshold, including the candidate dictionary DGA domain in the set of dictionary DGA domains; and in response to a determination that the amount of time that the candidate dictionary DGA domain has been registered is greater than the predetermined amount of time threshold, omitting including the candidate dictionary DGA domain in the set of dictionary DGA domains; and perform an action based on a match with a domain name of a monitored DNS request and a dictionary DGA domain of the set of dictionary DGA domains; and a memory coupled to the processor and configured to provide the processor with instructions.
  2. The system of claim 1, wherein classify the DNS Stream comprises classifying the DNS stream using a per domain deep learning dictionary DGA classifier to generate the candidate dictionary DGA domains with cluster information.
  3. The system of claim 1, wherein classify the DNS Stream comprises classifying the DNS stream using a per domain machine learning dictionary DGA classifier to generate the candidate dictionary DGA domains with cluster information.
  4. The system of claim 1, wherein perform the action comprises generating an alert.
  5. The system of claim 1, wherein perform the action comprises blocking a response for the domain name.
  6. The system of claim 1, wherein perform the action comprises responding with a sinkhole IP address.
  7. The system of claim 1, wherein the cluster information includes a source IP address of the monitored DNS request.
  8. A method, comprising: receiving, using a processor, a domain name system (DNS) stream, wherein the DNS stream includes a plurality of DNS requests; classifying, using the processor, the DNS stream using a per domain dictionary domain generation algorithm (DGA) classifier to generate candidate dictionary DGA domains with cluster information of a cluster of the candidate dictionary DGA domains, comprising: classifying, using the per domain dictionary DGA classifier, domains of the DNS stream into dictionary DGA generated domains or benign domains; and clustering the dictionary DGA generated domains to generate the candidate dictionary DGA domains with the cluster information of the cluster, wherein the cluster is determined based on an IP address of a firewall that originated a DNS request of the plurality of DNS requests; filtering, using the processor, the candidate dictionary DGA domains to generate a set of dictionary DGA domains, comprising: determining whether an amount of time that a candidate dictionary DGA domain has been registered is greater than a predetermined amount of time threshold or is less than or equal to the predetermined amount of time threshold; in response to a determination that the amount of time that the candidate dictionary DGA domain has been registered is less than or equal to the predetermined amount of time threshold, including the candidate dictionary DGA domain in the set of dictionary DGA domains; and in response to a determination that the amount of time that the candidate dictionary DGA domain has been registered is greater than the predetermined amount of time threshold, omitting to include the candidate dictionary DGA domain in the set of dictionary DGA domains; and performing, using the processor, an action based on a match with a domain name of a monitored DNS request and a dictionary DGA domain of the set of dictionary DGA domains.
  9. A non-transitory computer program product embodied in a non-transitory computer readable medium and comprising computer instructions for: receiving a domain name system (DNS) stream, wherein the DNS stream includes a plurality of DNS requests; classifying the DNS stream using a per domain dictionary domain generation algorithm (DGA) classifier to generate candidate dictionary DGA domains with cluster information of a cluster of the candidate dictionary DGA domains, comprising: classifying, using the per domain dictionary DGA classifier, domains of the DNS stream into dictionary DGA generated domains or benign domains; and clustering the dictionary DGA generated domains to generate the candidate dictionary DGA domains with the cluster information of the cluster, wherein the cluster is determined based on an IP address of a firewall that originated a DNS request of the plurality of DNS requests; filtering the candidate dictionary DGA domains to generate a set of dictionary DGA domains, comprising: determining whether an amount of time that a candidate dictionary DGA domain has been registered is greater than a predetermined amount of time threshold or is less than or equal to the predetermined amount of time threshold; in response to a determination that the amount of time that the candidate dictionary DGA domain has been registered is less than or equal to the predetermined amount of time threshold, including the candidate dictionary DGA domain in the set of dictionary DGA domains; and in response to a determination that the amount of time that the candidate dictionary DGA domain has been registered is greater than the predetermined amount of time threshold, omitting to include the candidate dictionary DGA domain in the set of dictionary DGA domains; and performing an action based on a match with a domain name of a monitored DNS request and a dictionary DGA domain of the set of dictionary DGA domains.
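
The filtering step recited in the independent claims compares how long a candidate domain has been registered against a threshold. A minimal Python sketch of that comparison is given below for illustration only. The function name, the `registered_at` lookup table, and the handling of domains with no known registration date are assumptions that do not come from the claims, and timestamps are assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Mapping, Optional, Set


def filter_by_registration_age(
    candidates: Iterable[str],
    registered_at: Mapping[str, datetime],  # hypothetical lookup, e.g. from WHOIS data
    threshold: timedelta,
    now: Optional[datetime] = None,
) -> Set[str]:
    """Keep candidates that have been registered for no longer than `threshold`."""
    now = now or datetime.now(timezone.utc)
    kept: Set[str] = set()
    for domain in candidates:
        created = registered_at.get(domain)
        if created is None:
            # Assumption: the claims do not address a missing registration
            # date, so this sketch leaves such domains out.
            continue
        if now - created <= threshold:
            kept.add(domain)  # registered for at most the threshold: include
        # Otherwise the domain has been registered longer: omit it.
    return kept
```

A domain whose registration age equals the threshold exactly is included, which mirrors the "less than or equal to" wording of the claims. Any particular threshold value, such as 30 days, would be a deployment choice rather than something the claims specify.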

Description

BACKGROUND OF THE INVENTION

Malicious software (malware) generally refers to unwanted, hostile, or intrusive software that can be used to disrupt computer or network operations, collect private or sensitive information, or access private computer systems or networks. Malware can be in the form of executable code, scripts, active content, and other software. Example malware includes computer viruses, worms, Trojan horses, rootkits, keyloggers, spyware, adware, botnet command and control (C&C) related malware, and other unwanted, hostile, or intrusive software.
Security solutions (e.g., security devices or appliances, which can provide firewall solutions) can be used to safeguard against malware. For example, a firewall can identify and prevent the further spread of malware in a network.
A firewall generally protects networks from unauthorized access while permitting authorized communications to pass through the firewall. A firewall is typically implemented as a device or a set of devices, or software executed on a device, such as a computer or appliance, that provides a firewall function for network access. For example, firewalls can be integrated into operating systems of devices (e.g., computers, smart phones, tablets, or other types of network communication capable devices). Firewalls can also be integrated into or executed as software on servers, gateways, network/routing devices (e.g., network routers), or appliances (e.g., security appliances or other types of special purpose devices).
Firewalls typically deny or permit network transmission based on a set of rules. These sets of rules are often referred to as policies. For example, a firewall can filter inbound traffic by applying a set of rules or policies. A firewall can also filter outbound traffic by applying a set of rules or policies. Firewalls can also be capable of performing basic routing functions.
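
As a rough illustration of rule-based filtering, the Python sketch below evaluates a policy as an ordered list of rules. The `Rule` fields, the first-match-wins ordering, and the default-deny fallback are assumptions chosen for the example and are not taken from this description.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Rule:
    direction: str        # "inbound" or "outbound"
    network: str          # CIDR block compared with the remote address
    port: Optional[int]   # None matches any port
    permit: bool          # True permits the traffic, False denies it


def is_permitted(
    policy: Iterable[Rule],
    direction: str,
    remote_ip: str,
    port: int,
    default_permit: bool = False,
) -> bool:
    """Apply the first matching rule; fall back to the default otherwise."""
    address = ipaddress.ip_address(remote_ip)
    for rule in policy:
        if (
            rule.direction == direction
            and address in ipaddress.ip_network(rule.network)
            and rule.port in (None, port)
        ):
            return rule.permit
    return default_permit
```

With a policy that denies outbound traffic to `203.0.113.0/24` ahead of a rule permitting outbound port 53 to any address, a DNS query to `203.0.113.9` would be denied while the same query to another address would pass.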

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.
FIG. 1A is a functional block diagram illustrating an architecture for detecting dictionary-based DGA traffic in accordance with some embodiments.
FIG. 1B is an example of a list of domains generated by a compromised IP address.
FIG. 1C is another example of a list of domains generated by a compromised IP address.
FIG. 2 is an example of a workflow for detecting dictionary-based DGA traffic.
FIG. 3A is an example of a workflow for performing per domain deep learning dictionary DGA classification.
FIG. 3B is an example of a neural network.
FIG. 4 is an example of a workflow for performing per domain machine learning dictionary DGA classification.
FIG. 5 is another functional block diagram illustrating an architecture for detecting dictionary-based DGA traffic in accordance with some embodiments.
FIG. 6 is another example of a workflow for detecting dictionary-based DGA traffic.
FIG. 7 is a flow diagram illustrating an embodiment of a process for detecting dictionary-based DGA traffic.
FIG. 8 is a flow diagram illustrating an embodiment of a process for filtering a candidate dictionary DGA domain.
FIG. 9 is a flow diagram illustrating another embodiment of a process for detecting dictionary-based DGA traffic.
FIG. 10 is a flow diagram illustrating an embodiment of a process for segmenting a domain name.

DETAILED DESCRIPTION

The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques.
In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions. A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are