US-12619720-B2 - Systems and methods for cybersecurity alert deduplication, grouping, and prioritization


Abstract

Systems and methods for alert deduplication. A method includes querying a software component associations database based on a plurality of software containers indicated by a plurality of alerts in order to identify a plurality of correlations between software containers among the plurality of software containers, wherein the software component associations database stores at least associations between configuration files of the plurality of software containers and build files used to build the plurality of software containers; identifying at least one set of duplicate alerts among the plurality of alerts based on the identified plurality of correlations, wherein each set of duplicate alerts includes at least two alerts of the plurality of alerts which indicate correlated software containers among the plurality of software containers; and deduplicating the plurality of alerts based on the identified at least one set of duplicate alerts in order to produce a deduplicated set of alerts.
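The flow summarized above can be illustrated with a minimal sketch. It is not the patented implementation. The `Alert` fields, the in-memory `ASSOCIATIONS` dictionary standing in for the software component associations database, the container and build file names, and the placeholder CVE identifiers are all hypothetical. The sketch treats two containers as correlated when they map to the same build file, and it additionally requires a matching CVE, a condition that appears only in the dependent claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    alert_id: str
    tool: str        # detection tool that raised the alert (hypothetical field)
    container: str   # container or configuration file named in the alert
    cve: str         # placeholder vulnerability identifier


# Hypothetical stand-in for the associations database:
# container/configuration file -> build file used to build it.
ASSOCIATIONS = {
    "registry/web:1.4": "services/web/Dockerfile",
    "prod/web-7f9c": "services/web/Dockerfile",
    "prod/db-12ab": "services/db/Dockerfile",
}


def deduplicate(alerts, associations):
    """Keep the first alert seen for each (build file, CVE) pair."""
    kept = {}
    for alert in alerts:
        # Containers missing from the database fall back to their own
        # identifier, so they are never merged with other containers.
        root = associations.get(alert.container, alert.container)
        kept.setdefault((root, alert.cve), alert)
    return list(kept.values())


alerts = [
    Alert("a1", "scanner-a", "registry/web:1.4", "CVE-EXAMPLE-1"),
    Alert("a2", "scanner-b", "prod/web-7f9c", "CVE-EXAMPLE-1"),
    Alert("a3", "scanner-b", "prod/db-12ab", "CVE-EXAMPLE-1"),
]
deduplicated = deduplicate(alerts, ASSOCIATIONS)
```

In this example, `a1` and `a2` come from different tools and name different containers, yet both containers trace back to the same build file, so only `a1` survives alongside `a3`.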

Inventors

Oren YONA
Eyal GOLOMBEK
Tomer Schwartz
Eshel YARON
Pavel RESNIANSKI

Assignees

Dazz, Inc.

Dates

Publication Date: 20260505
Application Date: 20250220

Claims (19)

1. A method for alert deduplication, comprising: querying a software component associations database based on a plurality of software containers indicated by a plurality of alerts in order to identify a plurality of correlations between software containers among the plurality of software containers, wherein the plurality of correlations is based on stored associations in the software component associations database, the stored associations are between a configuration file of each software container of the plurality of software containers and at least one build file used to build the software container of the plurality of software containers; identifying at least one set of duplicate alerts among the plurality of alerts based on the identified plurality of correlations, wherein each set of duplicate alerts includes at least two alerts of the plurality of alerts corresponding to correlated software containers among the plurality of software containers; and deduplicating the plurality of alerts based on the identified at least one set of duplicate alerts in order to produce a deduplicated set of alerts.
2. The method of claim 1, wherein deduplicating the plurality of alerts further comprises: removing at least one redundant alert such that the deduplicated set of alerts includes only one instance of each unique alert.
3. The method of claim 1, further comprising: mitigating at least one threat based on the deduplicated set of alerts.
4. The method of claim 1, wherein the at least two alerts of each set of duplicate alerts further indicate a same common vulnerability and exposure (CVE) among a plurality of predetermined CVEs.
5. The method of claim 1, wherein a first build file indicated by a first alert of each set of duplicate alerts is associated with a first configuration file indicated by a second alert of the set of duplicate alerts.
6. The method of claim 5, wherein the first build file indicated by the first alert of each set of duplicate alerts is used to build a container image corresponding to the first configuration file indicated by the second alert of the set of duplicate alerts.
7. The method of claim 1, wherein the at least two alerts of each set of duplicate alerts includes alerts from different detection tools.
8. The method of claim 1, further comprising: de-compiling the configuration file of each software container of the plurality of software containers in order to produce a plurality of de-compiled configuration files; identifying at least one candidate build file in each of the plurality of de-compiled configuration files; associating each de-compiled configuration file with one of the at least one candidate build file which meets at least one matching condition for the de-compiled configuration file; and populating the software component associations database based on the association.
9. The method of claim 8, wherein identifying the at least one candidate build file in each of the plurality of de-compiled configuration files further comprises: matching each de-compiled configuration file of the plurality of de-compiled configuration files to each of the at least one candidate build file for the de-compiled configuration file based on the at least one matching command between the de-compiled configuration file and each of the at least one candidate build file for the de-compiled configuration file.
10. A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process for software containers attribution, the process comprising: querying a software component associations database based on a plurality of software containers indicated by a plurality of alerts in order to identify a plurality of correlations between software containers among the plurality of software containers, wherein the plurality of correlations is based on stored associations in the software component associations database, the stored associations are between a configuration file of each software container of the plurality of software containers and at least one build file used to build the software container of the plurality of software containers; identifying at least one set of duplicate alerts among the plurality of alerts based on the identified plurality of correlations, wherein each set of duplicate alerts includes at least two alerts of the plurality of alerts corresponding to correlated software containers among the plurality of software containers; and deduplicating the plurality of alerts based on the identified at least one set of duplicate alerts in order to produce a deduplicated set of alerts.
11. A system for alert deduplication, comprising: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: query a software component associations database based on a plurality of software containers indicated by a plurality of alerts in order to identify a plurality of correlations between software containers among the plurality of software containers, wherein the plurality of correlations is based on stored associations in the software component associations database, the stored associations are between a configuration file of each software container of the plurality of software containers and at least one build file used to build the software container of the plurality of software containers; identify at least one set of duplicate alerts among the plurality of alerts based on the identified plurality of correlations, wherein each set of duplicate alerts includes at least two alerts of the plurality of alerts corresponding to correlated software containers among the plurality of software containers; and deduplicate the plurality of alerts based on the identified at least one set of duplicate alerts in order to produce a deduplicated set of alerts.
12. The system of claim 11, wherein the system is further configured to: remove at least one redundant alert such that the deduplicated set of alerts includes only one instance of each unique alert.
13. The system of claim 11, wherein the system is further configured to: mitigate at least one threat based on the deduplicated set of alerts.
14. The system of claim 11, wherein the at least two alerts of each set of duplicate alerts further indicate a same common vulnerability and exposure (CVE) among a plurality of predetermined CVEs.
15. The system of claim 11, wherein a first build file indicated by a first alert of each set of duplicate alerts is associated with a first configuration file indicated by a second alert of the set of duplicate alerts.
16. The system of claim 15, wherein the first build file indicated by the first alert of each set of duplicate alerts is used to build a container image corresponding to the first configuration file indicated by the second alert of the set of duplicate alerts.
17. The system of claim 11, wherein the at least two alerts of each set of duplicate alerts includes alerts from different detection tools.
18. The system of claim 11, wherein the system is further configured to: de-compile the configuration file of each software container of the plurality of software containers in order to produce a plurality of de-compiled configuration files; identify at least one candidate build file in each of the plurality of de-compiled configuration files; associate each de-compiled configuration file with one of the at least one candidate build file which meets at least one matching condition for the de-compiled configuration file; and populate the software component associations database based on the association.
19. The system of claim 18, wherein the system is further configured to: match each de-compiled configuration file of the plurality of de-compiled configuration files to each of the at least one candidate build file for the de-compiled configuration file based on the at least one matching command between the de-compiled configuration file and each of the at least one candidate build file for the de-compiled configuration file.
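The population of the associations database recited in the de-compiling claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed method itself. The claims do not define the matching condition, so the sketch assumes one: a candidate build file qualifies when it shares at least `min_matches` whitespace-normalized commands with the de-compiled configuration file, and the candidate with the most shared commands wins. The function names, the `min_matches` parameter, and the sample commands and paths are hypothetical.

```python
def normalize(command):
    """Collapse runs of whitespace so formatting differences do not block a match."""
    return " ".join(command.split())


def best_build_file(decompiled_commands, candidates, min_matches=1):
    """Return the candidate build file sharing the most commands, or None.

    `candidates` maps a build file path to its list of commands.
    """
    target = {normalize(c) for c in decompiled_commands}
    best_path, best_score = None, 0
    for path, commands in candidates.items():
        score = len(target & {normalize(c) for c in commands})
        # Assumed matching condition: enough shared commands, strictly best so far.
        if score >= min_matches and score > best_score:
            best_path, best_score = path, score
    return best_path


def populate(associations, config_id, decompiled_commands, candidates):
    """Record config file -> build file when a candidate meets the condition."""
    match = best_build_file(decompiled_commands, candidates)
    if match is not None:
        associations[config_id] = match
    return associations


candidates = {
    "a/Dockerfile": ["FROM python:3.11", "RUN pip install flask", "CMD python app.py"],
    "b/Dockerfile": ["FROM python:3.11", "RUN apt-get update"],
}
decompiled = ["FROM python:3.11", "RUN  pip install flask", "CMD python app.py"]
associations = populate({}, "registry/web:1.4", decompiled, candidates)
```

Here `a/Dockerfile` shares three commands with the de-compiled configuration file and `b/Dockerfile` shares one, so the configuration file is associated with `a/Dockerfile`.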

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 17/816,161 filed on Jul. 29, 2022, now pending, which is a continuation-in-part of U.S. patent application Ser. No. 17/656,914 filed on Mar. 29, 2022, now U.S. Pat. No. 12,204,651. The contents of the above-referenced applications are hereby incorporated by reference.

TECHNICAL FIELD

The present disclosure relates to cybersecurity alert deduplication and prioritization in virtualized execution environments.

BACKGROUND

Most virtualized execution environments deploy several cybersecurity detection tools to monitor for risks or abnormalities in different parts of the software development pipeline such as code, container repositories, production containers, and the like. These tools may generate alerts when vulnerable, abnormal, or otherwise potentially malicious behavior is detected. In many implementations, the different tools scan for alerts in different parts of the pipeline.

An alert is a collection of findings (e.g., risks or events) that, taken together, are significant from a cybersecurity perspective. Each alert may be realized as or may include text indicating the type of potential vulnerability, threat, the findings involved, relevant times, and the like.

Although these automated detection systems enable the identification of potential cyber threats that could not feasibly be detected manually, the detection tools in many virtualized execution environments (particularly large environments) collectively generate extremely large numbers of alerts. The result is that resolving the issues reflected by these alerts is an incredibly complex and labor-intensive task. Particular challenges for resolving alerts include the sheer number of alerts being generated as well as the need to prioritize those alerts in order to effectively mitigate the threats they may represent.
Some existing solutions attempt to automate deduplication and prioritization decisions, but these solutions face challenges in accurately identifying duplicate alerts, particularly when alerts are generated by different tools. Some solutions attempt to utilize attribution techniques in order to identify sources of cybersecurity events or look for matches between textual representations of the alerts (or certain fields or attributes in the alerts). However, these solutions are often only reactive, requiring manual attribution after a breach has already occurred. Further, these solutions may fail when two alerts include similar text even though they are not related to the same underlying issue or root cause, or when two alerts include significantly different text even though they are related to the same underlying issue or root cause. Additionally, existing manual attribution processes require significant amounts of time and labor. As a result, these solutions are not suitable for use in deduplication and prioritization where the goal is to manage alerts in real time in order to avoid a breach or other harm.

In the context of cybersecurity, attribution is the process by which security analysts collect evidence, build timelines, and try to piece that evidence together in the wake of a cyber-attack to determine what caused the breach. For example, attribution of detected malware can identify the type of resource utilized to run the malware (e.g., a software agent running a Linux® agent), the network resources that the malware communicated with, local resources that have been exploited, and so on. The attribution may lead not necessarily to a hacker who maliciously exploited the vulnerability but possibly to a programmer who accidentally caused the vulnerability. Attribution of software containers is a very complex problem due, in part, to the structure of containers and how they are formed.
A software container, such as one built by Docker®, is a standard unit of software that packages code and all its dependencies to allow applications to run from one computing environment to another. A software container includes a container image, which is a lightweight, standalone, executable package of software that includes all resources needed to run an application, including code, runtime, system tools, system libraries, and settings. The build file (e.g., Dockerfile) contains all the commands to assemble and create a container image. Using a build file, users can create an automated build that executes several command-line instructions in succession.

It would therefore be advantageous to provide a solution that would overcome the challenges noted above.

SUMMARY

A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to provide a basic understanding of such embodiments and does not wholly define the breadth of the disclosure. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodime