US-12627633-B2 - Application traffic and runtime behavior learning and enforcement

Abstract

Systems and methods for learning behavioral activity correlations. A method includes intercepting a plurality of requests, wherein each of the plurality of requests is directed to a respective destination entity of a plurality of destination entities; creating a request queue by queueing the plurality of requests; inspecting contents of the plurality of requests; separately forwarding each intercepted request to its respective destination entity based on the request queue; monitoring runtime output of each of the plurality of destination entities, wherein the runtime output includes behavioral activities of the plurality of destination entities; and training a machine learning model based on the contents of the plurality of requests and the runtime output of each of the plurality of destination entities, wherein the machine learning model is trained to output request-output correlations between groups of requests and subsequent behavioral activities.

Inventors

Liron Levin
Isaac SCHNITZER
Ory Segal
Dima Stopel

Assignees

PALO ALTO NETWORKS, INC.

Dates

Publication Date: 2026-05-12
Application Date: 2020-07-28

Claims (20)

1. A method comprising: intercepting a first plurality of requests, wherein each of the first plurality of requests is directed to a corresponding one of a plurality of entities; buffering the first plurality of requests in a queue; based on forwarding requests in the first plurality of requests in the queue to respective entities in the plurality of entities, monitoring a first plurality of events at the respective entities in the plurality of entities; and correlating requests in the first plurality of requests with corresponding events in the first plurality of events; training a machine learning model on the correlated requests and corresponding events to learn first correlations that indicate events in the first plurality of events and corresponding requests in the first plurality of requests that caused those events in the first plurality of events; inputting a second plurality of requests and behavioral data for a second plurality of events at entities in the plurality of entities into the trained machine learning model to obtain second correlations as output, wherein the second correlations indicate events in the second plurality of events and corresponding requests in the second plurality of requests that caused those events in the second plurality of events; and determining at least one of allowed and forbidden ones of the second correlations.
2. The method of claim 1, further comprising: detecting one or more impermissible events at a first entity in the plurality of entities based on monitoring events at the plurality of entities; obtaining a second plurality of requests in a time period preceding the detected one or more impermissible events; inputting the second plurality of requests and the one or more impermissible events into the trained machine learning model to determine a subset of the second plurality of requests correlated with the one or more impermissible events; and indicating mitigation action at the first entity based, at least in part, on the subset of the second plurality of requests and the one or more impermissible events.
3. The method of claim 2, further comprising training a firewall model to detect permissible and impermissible events at each of the plurality of entities, wherein detecting the one or more impermissible events at the first entity comprises inputting metadata for events at the first entity into the firewall model.
4. The method of claim 3, wherein training the firewall model to detect permissible and impermissible events comprises training the firewall model according to one or more policies comprising one or more lists of permissible and impermissible request-to-event correlations.
5. The method of claim 1, wherein the events at the plurality of entities comprise at least one of filesystem events, process events, network events, and queries to databases.
6. A non-transitory computer readable medium having stored thereon program code comprising instructions to: intercept a first plurality of requests, wherein each of the first plurality of requests is directed to a corresponding one of a plurality of entities; buffer the first plurality of requests in a queue; based on forwarding requests in the first plurality of requests in the queue to respective entities in the plurality of entities, monitor a first plurality of events at the respective entities in the plurality of entities; and correlate requests in the first plurality of requests with corresponding events in the first plurality of events; train a machine learning model on the correlated requests and corresponding events to learn first correlations that indicate events in the first plurality of events and corresponding requests in the first plurality of requests that caused those events in the first plurality of events; input a second plurality of requests and behavioral data for a second plurality of events at entities in the plurality of entities into the trained machine learning model to obtain second correlations as output, wherein the second correlations indicate events in the second plurality of events and corresponding requests in the second plurality of requests that caused those events in the second plurality of events; and determine at least one of allowed and forbidden ones of the second correlations.
7. A system comprising: a processor; and a machine-readable medium having instructions stored thereon that are executable by the processor to cause the system to: intercept a first plurality of requests, wherein each of the first plurality of requests is directed to a corresponding one of a plurality of entities; buffer the first plurality of requests in a queue; based on forwarding requests in the first plurality of requests in the queue to respective entities in the plurality of entities, monitor a first plurality of events at the respective entities in the plurality of entities; and correlate requests in the first plurality of requests with corresponding events in the first plurality of events; train a machine learning model on the correlated requests and corresponding events to learn first correlations that indicate events in the first plurality of events and corresponding requests in the first plurality of requests that caused those events in the first plurality of events; input a second plurality of requests and behavioral data for a second plurality of events at entities in the plurality of entities into the trained machine learning model to obtain second correlations as output, wherein the second correlations indicate events in the second plurality of events and corresponding requests in the second plurality of requests that caused those events in the second plurality of events; and determine at least one of allowed and forbidden ones of the second correlations.
8. The system of claim 7, wherein the machine-readable medium further has instructions stored thereon executable by the processor to cause the system to: detect one or more impermissible events at a first entity in the plurality of entities based on monitoring events at the plurality of entities; obtain a second plurality of requests in a time period preceding the detected one or more impermissible events; input the second plurality of requests and the one or more impermissible events into the trained machine learning model to determine a subset of the second plurality of requests correlated with the one or more impermissible events; and indicate mitigation action at the first entity based, at least in part, on the subset of the second plurality of requests and the one or more impermissible events.
9. The system of claim 8, wherein the machine-readable medium further has instructions stored thereon executable by the processor to cause the system to train a firewall model to detect permissible and impermissible events at each of the plurality of entities, wherein the instructions executable by the processor to cause the system to detect the one or more impermissible events at the first entity comprise instructions to input metadata for events at the first entity into the firewall model.
10. The system of claim 9, wherein the instructions executable by the processor to cause the system to train the firewall model to detect permissible and impermissible events comprise instructions executable by the processor to cause the system to train the firewall model according to one or more policies comprising one or more lists of permissible and impermissible request-to-event correlations.
11. The computer readable medium of claim 6, wherein the program code further comprises instructions to: detect one or more impermissible events at a first entity in the plurality of entities based on monitoring events at the plurality of entities; obtain a second plurality of requests in a time period preceding the detected one or more impermissible events; input the second plurality of requests and the one or more impermissible events into the trained machine learning model to determine a subset of the second plurality of requests correlated with the one or more impermissible events; and indicate mitigation action at the first entity based, at least in part, on the subset of the second plurality of requests and the one or more impermissible events.
12. The computer readable medium of claim 11, wherein the program code further comprises instructions to train a firewall model to detect permissible and impermissible events at each of the plurality of entities, wherein the instructions to detect the one or more impermissible events at the first entity comprise instructions to input metadata for events at the first entity into the firewall model.
13. The computer readable medium of claim 12, wherein the instructions to train the firewall model to detect permissible and impermissible events comprise instructions to train the firewall model according to one or more policies comprising one or more lists of permissible and impermissible request-to-event correlations.
14. A method comprising: grouping metadata of subsets of a first plurality of requests directed to a protected entity, wherein the subsets of requests are differentiated by called process and wherein metadata of each request within a subset of requests indicates a same process; buffering subsets of requests grouped by called process in a queue; based on processing subsets of requests in the queue at the protected entity, correlating events in a first plurality of events at the protected entity with respect to each called process with corresponding subsets of the first plurality of requests; training a machine learning model on the correlated requests and corresponding events at the protected entity, wherein the machine learning model is trained to learn first correlations that indicate events in the first plurality of events and corresponding requests in a first plurality of requests that caused those events in the first plurality of events; inputting a second plurality of requests and behavioral data for a second plurality of events at the protected entity into the trained machine learning model to obtain second correlations as output, wherein the second correlations indicate events in the second plurality of events and corresponding requests in the second plurality of requests that caused those events in the second plurality of events; and determining at least one of allowed and forbidden ones of the second correlations.
15. The method of claim 14, further comprising: detecting one or more impermissible events at a protected entity based on monitoring events at the protected entity; obtaining a second plurality of requests directed to the protected entity in a time period preceding the detected one or more impermissible events; grouping metadata of subsets of the second plurality of requests, wherein the subsets of requests are differentiated by called process and metadata of each request within a subset of requests indicates a same process; inputting the second plurality of requests and the one or more impermissible events into the trained machine learning model to determine a subset of the second plurality of requests correlated with the one or more impermissible events; and indicating mitigation action at the protected entity based, at least in part, on the subset of the second plurality of requests and the one or more impermissible events.
16. The method of claim 15, further comprising training a firewall model to detect permissible and impermissible events at the protected entity, wherein detecting the one or more impermissible events at the protected entity comprises inputting metadata for events at the protected entity into the firewall model.
17. The method of claim 16, wherein training the firewall model to detect permissible and impermissible events comprises training the firewall model according to one or more policies comprising one or more lists of permissible and impermissible request-to-event correlations.
18. The method of claim 14, wherein the events at the protected entity comprise at least one of filesystem events, process events, network events, and queries to databases.
19. A non-transitory computer readable medium having stored thereon program code comprising instructions to: group metadata of subsets of a first plurality of requests directed to a protected entity, wherein the subsets of requests are differentiated by called process and wherein metadata of each request within a subset of requests indicates a same process; buffer subsets of requests grouped by called process in a queue; based on processing subsets of requests in the queue at the protected entity, correlate events in a first plurality of events at the protected entity with respect to each called process with corresponding subsets of the first plurality of requests; train a machine learning model on the correlated requests and corresponding events at the protected entity, wherein the machine learning model is trained to learn first correlations that indicate events in the first plurality of events and corresponding requests in the first plurality of requests that caused those events in the first plurality of events; input a second plurality of requests and behavioral data for a second plurality of events at the protected entity into the trained machine learning model to obtain second correlations as output, wherein the second correlations indicate events in the second plurality of events and corresponding requests in the second plurality of requests that caused those events in the second plurality of events; and determine at least one of allowed and forbidden ones of the second correlations.
20. The computer readable medium of claim 19, wherein the program code further comprises instructions to: detect one or more impermissible events at a protected entity based on monitoring events at the protected entity; obtain a second plurality of requests directed to the protected entity in a time period preceding the detected one or more impermissible events; group metadata of subsets of the second plurality of requests, wherein the subsets of requests are differentiated by called process and metadata of each request within a subset of requests indicates a same process; input the second plurality of requests and the one or more impermissible events into the trained machine learning model to determine a subset of the second plurality of requests correlated with the one or more impermissible events; and indicate mitigation action at the protected entity based, at least in part, on the subset of the second plurality of requests and the one or more impermissible events.
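
As an informal illustration only, and not part of the claims, the following Python sketch shows one way the enforcement-side steps recited above could look: selecting the requests received in a time period preceding a detected event, grouping them by called process, and checking a request-to-event correlation against policy lists. All names here (LoggedRequest, requests_before, group_by_process, classify, the process names, routes, and the 10-second window) are hypothetical. The trained machine learning model is not shown; the correlation being classified is simply assumed to be its output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggedRequest:
    timestamp: float  # hypothetical: arrival time in seconds
    process: str      # hypothetical: called process named in the request metadata
    route: str        # hypothetical: inspected request content


def requests_before(event_time, log, window_seconds):
    """Return the logged requests received in the window preceding an event."""
    start = event_time - window_seconds
    return [r for r in log if start <= r.timestamp <= event_time]


def group_by_process(requests):
    """Group requests so that every request in a group names the same process."""
    groups = {}
    for request in requests:
        groups.setdefault(request.process, []).append(request)
    return groups


def classify(correlation, allowed, forbidden):
    """Check a (route, event) correlation against policy lists."""
    if correlation in forbidden:
        return "forbidden"
    if correlation in allowed:
        return "allowed"
    return "unknown"


log = [
    LoggedRequest(10.0, "image-worker", "/upload"),
    LoggedRequest(95.0, "image-worker", "/upload"),
    LoggedRequest(97.5, "api-worker", "/users"),
    LoggedRequest(101.0, "image-worker", "/upload"),
]

# An impermissible event is assumed to have been detected at t = 100 s.
candidates = requests_before(100.0, log, window_seconds=10.0)
groups = group_by_process(candidates)

# Policy lists of permissible and impermissible request-to-event correlations.
allowed = {("/upload", "file_write")}
forbidden = {("/users", "process_spawn")}
verdict = classify(("/users", "process_spawn"), allowed, forbidden)
```

In this sketch the window keeps only the two requests received between 90 s and 100 s, and the correlation between the /users route and a process spawn is reported as forbidden. How a mitigation action is then indicated is left open here.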

Description

TECHNICAL FIELD

The present disclosure relates generally to application firewalls, and more specifically to learning and enforcement of application firewall policies.

BACKGROUND

One of the main challenges in cloud native environments is to evaluate firewall policies based on entity identifiers rather than information which is not entity-specific such as Internet Protocol (IP) addresses. Thus, techniques for accurately and efficiently evaluating application firewall policies are desirable.

Some application firewall solutions attempt to utilize machine learning to learn identities of entities in order to enable application firewall policy enforcement. To this end, some existing solutions attempt to associate incoming web requests with specific application runtime behaviors such as queries, process spawning, and the like. For example, existing solutions may learn that a specific route leads to a specific process being spawned, that a specific route causes a particular database query pattern (e.g., requesting that all users for the route follow a query of a specific table of a database), that a specific route causes a file modification on a disk (e.g., uploading images), and the like.

Existing solutions face challenges in correlating and associating application web events with unrelated runtime events on a live system. This is particularly challenging in web servers with high traffic load. As a result, existing solutions do not have runtime components and cannot flexibly adapt to changes in associations at runtime.

It would therefore be advantageous to provide a solution that would overcome the challenges noted above.

SUMMARY

A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to provide a basic understanding of such embodiments and does not wholly define the breadth of the disclosure.
This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodiments nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later. For convenience, the term “some embodiments” or “certain embodiments” may be used herein to refer to a single embodiment or multiple embodiments of the disclosure.

Certain embodiments disclosed herein include a method for learning behavioral activity correlations using traffic shaping. The method comprises: intercepting a plurality of requests, wherein each of the plurality of requests is directed to a respective destination entity of a plurality of destination entities; creating a request queue by queueing the plurality of requests; inspecting contents of the plurality of requests; separately forwarding each intercepted request to its respective destination entity based on the request queue; monitoring runtime output of each of the plurality of destination entities, wherein the runtime output includes behavioral activities of the plurality of destination entities; and training a machine learning model based on the contents of the plurality of requests and the runtime output of each of the plurality of destination entities, wherein the machine learning model is trained to output request-output correlations between groups of requests and subsequent behavioral activities.
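
The flow of the method summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the Request fields, the fake_entity stand-in for a destination entity and its runtime monitor, and the routes and activity names are all hypothetical, and a plain co-occurrence counter is swapped in where the disclosure trains a machine learning model.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    route: str        # hypothetical: the inspected request content
    destination: str  # hypothetical: name of the destination entity


def fake_entity(request):
    """Stand-in for a destination entity. Returns the behavioral activities
    a runtime monitor would observe while the request is handled."""
    behaviors = {
        "/upload": ["file_write"],
        "/users": ["db_query"],
        "/report": ["process_spawn", "db_query"],
    }
    return behaviors.get(request.route, [])


def learn_correlations(intercepted):
    """Queue the intercepted requests, forward them one at a time, and count
    which behavioral activities follow which request contents."""
    queue = deque(intercepted)      # the request queue
    counts = Counter()              # stand-in for the machine learning model
    while queue:
        request = queue.popleft()   # forward each request separately, in order
        for activity in fake_entity(request):  # monitor the runtime output
            counts[(request.route, activity)] += 1
    return counts


requests = [
    Request("/upload", "web-1"),
    Request("/users", "web-1"),
    Request("/report", "web-2"),
    Request("/upload", "web-2"),
]
correlations = learn_correlations(requests)
```

Forwarding one request at a time is what makes the attribution straightforward in this sketch: any activity observed while a request is being handled can be paired with that request alone, which would be harder to do if many requests were in flight at once.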
Certain embodiments disclosed herein also include a non-transitory computer readable medium having stored thereon instructions causing a processing circuitry to execute a process, the process comprising: intercepting a plurality of requests, wherein each of the plurality of requests is directed to a respective destination entity of a plurality of destination entities; creating a request queue by queueing the plurality of requests; inspecting contents of the plurality of requests; separately forwarding each intercepted request to its respective destination entity based on the request queue; monitoring runtime output of each of the plurality of destination entities, wherein the runtime output includes behavioral activities of the plurality of destination entities; and training a machine learning model based on the contents of the plurality of requests and the runtime output of each of the plurality of destination entities, wherein the machine learning model is trained to output request-output correlations between groups of requests and subsequent behavioral activities.

Certain embodiments disclosed herein also include a system for learning behavioral activity correlations using traffic shaping. The system comprises: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: intercept a plurality of requests, wherein each of the plurality of requests is directed to a respective destination entity of a plurality of destination entities; create a request queue by queue