
US-12621313-B2 - Detection of malicious beaconing in virtual private networks


Abstract

A computer-implemented method includes accessing virtual private cloud flow logs of network traffic data originating from a virtual private cloud, generating filtered flow logs by filtering the virtual private cloud flow logs, extracting features based on a plurality of attributes from the filtered flow logs, training one or more machine learning models based on the features, applying the one or more machine learning models to the network traffic data to identify potential beacons, generating an alert notification that identifies the potential beacons, and communicating the alert notification to an alerting system.
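
The filtering and feature-extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the record fields (`src`, `dst`, `dst_port`, `start`, `end`, `bytes`, `packets`, `action`), the "accepted flows only" filter, and the grouping key are all assumptions made for illustration. The attributes computed (time between communications, communication duration, transferred data size, number of packets) follow those listed in claim 8.

```python
from collections import defaultdict
from statistics import mean, pstdev


def filter_flows(records):
    """Keep only accepted flows (an assumed filter; the source does not fix one)."""
    return [r for r in records if r["action"] == "ACCEPT"]


def extract_features(records):
    """Summarize each (source, destination, port) conversation.

    A low coefficient of variation in the gaps between flows suggests
    regular, beacon-like check-ins.
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["src"], r["dst"], r["dst_port"])].append(r)

    features = {}
    for key, flows in groups.items():
        flows.sort(key=lambda r: r["start"])
        gaps = [b["start"] - a["start"] for a, b in zip(flows, flows[1:])]
        if not gaps:
            continue  # a single flow carries no timing information
        gap_mean = mean(gaps)
        features[key] = {
            "count": len(flows),
            "mean_gap": gap_mean,
            "gap_cv": pstdev(gaps) / gap_mean if gap_mean else 0.0,
            "mean_duration": mean(r["end"] - r["start"] for r in flows),
            "mean_bytes": mean(r["bytes"] for r in flows),
            "mean_packets": mean(r["packets"] for r in flows),
        }
    return features
```

The resulting per-conversation feature dictionaries are the kind of input a downstream model could be trained on; which attributes matter in practice depends on the deployment.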

Inventors

  • Sandeep Chandana
  • Aditya Kumar
  • Ameya Mahesh Sanzgiri

Assignees

  • SNOWFLAKE INC.

Dates

Publication Date: 2026-05-05
Application Date: 2023-04-27

Claims (20)

  1. A computer-implemented method comprising: accessing virtual private cloud flow logs of network traffic data originating from a virtual private cloud; generating filtered flow logs by filtering the virtual private cloud flow logs; extracting features based on a plurality of attributes from the filtered flow logs; training one or more machine learning models based on the features, the training of the one or more machine learning models comprising at least one of: training a virtual private cloud machine learning model for each virtual private cloud of a plurality of virtual private clouds; training an account machine learning model for each account of the virtual private cloud; or training a resource machine learning model for each resource of an account of the virtual private cloud; applying the one or more machine learning models to the network traffic data to identify potential beacons; generating an alert notification that identifies the potential beacons; and communicating the alert notification to an alerting system.
  2. The computer-implemented method of claim 1, further comprising: scoring and ranking the potential beacons based on past occurrences and time decay weight.
  3. The computer-implemented method of claim 2, wherein the scoring and ranking of the potential beacons are based on one or more of: frequency, regularity, duration, size, content, encryption status, destination domain reputation, or source device behavior.
  4. The computer-implemented method of claim 1, further comprising: receiving user feedback in response to communicating the alert notification to the alerting system; and re-training the one or more machine learning models based on the user feedback.
  5. The computer-implemented method of claim 1, further comprising: receiving user feedback in response to communicating the alert notification to the alerting system; identifying a first machine learning model of the one or more machine learning models based on the alert notification and the user feedback; and re-training the first machine learning model based on the user feedback.
  6. The computer-implemented method of claim 1, further comprising: causing a display of the alert notification with a graphical user interface that allows a user to filter, sort, search, or export alerts.
  7. The computer-implemented method of claim 1, wherein the network traffic data identifies a combination of source IP addresses, destination IP addresses, ports, protocols, payloads, timestamps, and intervals.
  8. The computer-implemented method of claim 1, wherein the plurality of attributes includes a combination of a communication duration, time between communications, transferred data size, and a number of packets.
  9. The computer-implemented method of claim 1, wherein the one or more machine learning models comprises at least one of isolating anomalies from n-dimensional space, Macbeth vector search, enclosing inliers, or auto-encoder.
  10. The computer-implemented method of claim 1, wherein the one or more machine learning models are trained on labeled network traffic data that includes known examples of malicious and benign beacons.
  11. The computer-implemented method of claim 1, the one or more machine learning models include one or more of: classification models, clustering models, anomaly detection models, or regression models.
  12. The computer-implemented method of claim 1, wherein the virtual private cloud flow logs include a plurality of flow logs, each flow log corresponding to a virtual private cloud account.
  13. The computer-implemented method of claim 12, wherein the virtual private cloud account operates a plurality of services.
  14. The computer-implemented method of claim 13, wherein each machine learning models of the one or more machine learning models correspond to a service from the plurality of services.
  15. The computer-implemented method of claim 1, wherein the alert notification comprises information about a source device, a destination host, beacon characteristics, and recommended actions.
  16. A computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to perform operations comprising: accessing virtual private cloud flow logs of network traffic data originating from a virtual private cloud; generating filtered flow logs by filtering the virtual private cloud flow logs; extracting features based on a plurality of attributes from the filtered flow logs; training one or more machine learning models based on the features, the training of the one or more machine learning models comprising at least one of: training a virtual private cloud machine learning model for each virtual private cloud of a plurality of virtual private clouds; training an account machine learning model for each account of the virtual private cloud; or training a resource machine learning model for each resource of an account of the virtual private cloud; applying the one or more machine learning models to the network traffic data to identify potential beacons; generating an alert notification that identifies the potential beacons; and communicating the alert notification to an alerting system.
  17. The computing apparatus of claim 16, wherein the operations further comprise: scoring and ranking the potential beacons based on past occurrences and time decay weight.
  18. A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to perform operations comprising: accessing virtual private cloud flow logs of network traffic data originating from a virtual private cloud; generating filtered flow logs by filtering the virtual private cloud flow logs; extracting features based on a plurality of attributes from the filtered flow logs; training one or more machine learning models based on the features, the training of the one or more machine learning models comprising at least one of: training a virtual private cloud machine learning model for each virtual private cloud of a plurality of virtual private clouds; training an account machine learning model for each account of the virtual private cloud; or training a resource machine learning model for each resource of an account of the virtual private cloud; applying the one or more machine learning models to the network traffic data to identify potential beacons; generating an alert notification that identifies the potential beacons; and communicating the alert notification to an alerting system.
  19. The non-transitory computer-readable storage medium of claim 18, wherein the operations further comprise scoring and ranking the potential beacons based on past occurrences and time decay weight.
  20. The non-transitory computer-readable storage medium of claim 18, wherein the operations further comprise: receiving user feedback in response to communicating the alert notification to the alerting system; and re-training the one or more machine learning models based on the user feedback.
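
Claims 2, 17, and 19 recite scoring and ranking potential beacons based on past occurrences and a time decay weight. The claims do not specify a decay function, so the following is only a sketch under the assumption of exponential decay with a configurable half-life; the function names and the `half_life` parameter are hypothetical.

```python
import math


def decayed_score(occurrence_times, now, half_life=86400.0):
    """Sum past occurrences, each weighted by 0.5 ** (age / half_life).

    Times are in seconds; occurrences dated after `now` are ignored.
    """
    rate = math.log(2) / half_life
    return sum(math.exp(-rate * (now - t)) for t in occurrence_times if t <= now)


def rank_beacons(history, now, half_life=86400.0):
    """Return candidate identifiers ordered from highest to lowest score."""
    scored = [
        (decayed_score(times, now, half_life), key)
        for key, times in history.items()
    ]
    return [key for _, key in sorted(scored, reverse=True)]
```

With this weighting, a candidate seen twice just now outranks one seen once a half-life ago, so recent repeated activity rises to the top while stale detections fade.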

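The independent claims recite training a separate model per virtual private cloud, per account, or per resource. The sketch below shows only that grouping structure. The `ZScoreModel` class is a deliberately simple stand-in, not one of the model types named in claim 9, and the row field names are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


class ZScoreModel:
    """Stand-in anomaly model: distance from the training mean in std units."""

    def fit(self, values):
        self.mu = mean(values)
        self.sigma = pstdev(values) or 1.0  # avoid dividing by zero
        return self

    def score(self, value):
        return abs(value - self.mu) / self.sigma


def train_per_scope(rows, scope_field, feature):
    """Fit one model per distinct value of `scope_field`.

    `scope_field` might be a VPC, account, or resource identifier.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[scope_field]].append(row[feature])
    return {scope: ZScoreModel().fit(vals) for scope, vals in grouped.items()}
```

Calling `train_per_scope(rows, "account_id", "mean_gap")` would yield one model per account; swapping the scope field to a VPC or resource identifier gives the other two granularities.
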
Description

TECHNICAL FIELD

The present disclosure generally relates to special-purpose machines that detect malicious beaconing activities in virtual private networks, and to the technologies by which such special-purpose machines become improved compared to other special-purpose machines for detecting beaconing activities.

BACKGROUND

New security and networking paradigms such as firewalls and Virtual Private Cloud (VPC) prevent malware from directly communicating with attackers/adversaries. To overcome this, the attackers create a command-and-control server (C2 server) that can send commands to infected endpoints. When an endpoint is infected, it tries to establish an outbound connection to the attacker's C2 server over the internet. Usually, this connection will try to look like normal traffic by using HTTP, HTTPS, or DNS. The purpose of the connection is to notify the C2 server that a new infected endpoint is ready and waiting for instructions. The endpoint then pauses for some time before repeating the check-in process. This activity is referred to as malicious beaconing activity and is often a sign of more planned and widespread network attacks on organizational infrastructure.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced.

FIG. 1 illustrates an example computing environment that includes a network-based database system in communication with a cloud storage provider system, in accordance with some embodiments of the present disclosure.

FIG. 2 is a block diagram illustrating components of a compute service manager, in accordance with some embodiments of the present disclosure.

FIG. 3 is a block diagram illustrating components of an execution platform, in accordance with some embodiments of the present disclosure.

FIG. 4 is a block diagram illustrating storage of database tables in micro-partitions, according to some example embodiments.

FIG. 5 is a block diagram illustrating a malicious beacon detection system in accordance with one example embodiment.

FIG. 6 is a block diagram illustrating a data pipelines module in accordance with one example embodiment.

FIG. 7 is a block diagram illustrating a machine learning model structure in accordance with one embodiment.

FIG. 8 is a flow diagram illustrating a method for training a machine learning model in accordance with one example embodiment.

FIG. 9 is a flow diagram illustrating a method for providing alert data to a monitoring system in accordance with one example embodiment.

FIG. 10 is a table illustrating alert data in accordance with one example embodiment.

FIG. 11 is a block diagram showing a software architecture within which the present disclosure may be implemented, according to an example embodiment.

FIG. 12 is a diagrammatic representation of a machine in the form of a computer system within which a set of instructions may be executed for causing the machine to perform any one or more of the methodologies discussed herein, according to an example embodiment.

DETAILED DESCRIPTION

The description that follows describes systems, methods, techniques, instruction sequences, and computing machine program products that illustrate example embodiments of the present subject matter. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the present subject matter. It will be evident, however, to those skilled in the art, that embodiments of the present subject matter may be practiced without some or other of these specific details. Examples merely typify possible variations. Unless explicitly stated otherwise, structures (e.g., structural components, such as modules) are optional and may be combined or subdivided, and operations (e.g., in a procedure, algorithm, or other function) may vary in sequence or be combined or subdivided.

Databases are widely used for data storage and access in computing applications. Databases may include one or more tables that include or reference data that can be read, modified, or deleted using queries. Querying very large databases and/or tables might require scanning large amounts of data. Reducing the amount of data scanned is one of the main challenges of data organization and processing.

The term “micro-partition” is used herein to refer to a contiguous unit of storage that stores some or all of the data of a single table. In some example embodiments, each micro-partition stores between 50 and 500 MB of uncompressed data. Micro-partitions may be stored in a compressed or uncompressed form. Groups of rows in tables may be mapped into individual micro-partitions organized in a columnar fashion. In relational databases comprising rows and columns, all columns for the rows of a micro-partition are stored in the micro-par