US-12627692-B2 - Automatic rule generation for malicious indicators based on historical data

US12627692B2

Abstract

Malicious indicators rule generation using historical data is provided. A method includes receiving, from threat detection engines of a plurality of vendor systems, a plurality of threat detection indications for a dataset. Each threat detection indication of the plurality of threat detection indications receives a vendor-specific tokenization based on historical data associated with the plurality of vendor systems. The method further includes identifying, from the plurality of threat detection indications, a lead detection from a first vendor system of the plurality of vendor systems and an accuracy detection from at least one second vendor system of the plurality of vendor systems. The lead detection and the accuracy detection have overlapping data from the dataset. The method further includes generating, by a processing device, a malicious behavior detection procedure based on the lead detection, the accuracy detection, and the vendor-specific tokenization being used to detect a malicious behavior in the dataset.
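The lead/accuracy selection summarized in the abstract can be sketched as follows. This is a minimal illustration only: the set-based representation of detections and the function name are assumptions for the sketch, not structures taken from the patent.

```python
# Hypothetical sketch of lead/accuracy detection selection: each vendor's
# detection is modeled as the set of dataset items it flagged. The "lead"
# detection covers the most items; the "accuracy" detection is the largest
# remaining detection from another vendor that overlaps the lead.

def select_lead_and_accuracy(detections):
    """detections: dict mapping vendor name -> set of flagged item ids."""
    # Rank detections by how much of the dataset they cover, largest first.
    ranked = sorted(detections.items(), key=lambda kv: len(kv[1]), reverse=True)
    lead_vendor, lead_items = ranked[0]
    # The accuracy detection must share (overlap) data with the lead detection.
    for vendor, items in ranked[1:]:
        if items & lead_items:
            return lead_vendor, vendor, lead_items & items
    return lead_vendor, None, set()
```

The overlapping item set returned here corresponds to the "overlapping data from the dataset" that the claims use as an input to the generated detection procedure.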

Inventors

  • Mihai Maganu
  • Andrei Stoian
  • Ernest Szocs
  • Paul Urian

Assignees

  • CROWDSTRIKE, INC.

Dates

Publication Date
2026-05-12
Application Date
2024-05-06

Claims (16)

  1. A method comprising: receiving, from threat detection engines of a plurality of vendor systems, a plurality of threat detection indications for a dataset, each threat detection indication of the plurality of threat detection indications receiving a vendor-specific tokenization based on historical data associated with the plurality of vendor systems; identifying, from the plurality of threat detection indications: a lead detection from a first vendor system of the plurality of vendor systems, wherein the lead detection comprises a threat detection indication from the plurality of threat detection indications having a largest amount of data from the dataset; and an accuracy detection from at least one second vendor system of the plurality of vendor systems, wherein the accuracy detection comprises another threat detection indication from the plurality of threat detection indications having a next largest amount of data from the dataset, and wherein the lead detection and the accuracy detection have overlapping data from the dataset; and generating, by a processing device, a malicious behavior detection procedure based on the lead detection, the accuracy detection, the overlapping data from the dataset, and the vendor-specific tokenization being used to detect a malicious behavior in the dataset.
  2. The method of claim 1, further comprising: tokenizing the plurality of threat detection indications into component parts including at least one of: a modifier, a threat type, a family, or a variant.
  3. The method of claim 2, wherein at least one of: the modifier corresponds to an operating system type, the threat type corresponds to a malware type, the family corresponds to a name of a threat, or the variant corresponds to a version or subversion of the family.
  4. The method of claim 1, wherein the vendor-specific tokenization for a first threat detection indication of the first vendor system is associated with a different tokenization scheme than implemented for a second threat detection indication of the at least one second vendor system.
  5. The method of claim 1, further comprising: selecting one or more additional accuracy detections from the plurality of threat detection indications, each of the one or more additional accuracy detections being from a different second vendor system than each other, and having overlapping data with the lead detection.
  6. The method of claim 5, wherein each of the accuracy detections includes a larger amount of data from the dataset than any other threat detection indication from the corresponding second vendor system.
  7. The method of claim 1, further comprising: detecting, from the plurality of threat detection indications, the malicious behavior in the dataset, the malicious behavior being within the overlapping data of the lead detection from the first vendor system and the accuracy detection from the at least one second vendor system.
  8. The method of claim 1, further comprising: applying, to a different set of data, the malicious behavior detection procedure generated for the dataset.
  9. A system comprising: a processing device; and a memory to store instructions that, when executed by the processing device, cause the processing device to: receive, from threat detection engines of a plurality of vendor systems, a plurality of threat detection indications for a dataset, each threat detection indication of the plurality of threat detection indications receiving a vendor-specific tokenization based on historical data associated with the plurality of vendor systems; identify, from the plurality of threat detection indications: a lead detection from a first vendor system of the plurality of vendor systems, wherein the lead detection comprises a threat detection indication from the plurality of threat detection indications having a largest amount of data from the dataset, and an accuracy detection from at least one second vendor system of the plurality of vendor systems, wherein the accuracy detection comprises another threat detection indication from the plurality of threat detection indications having a next largest amount of data from the dataset, and wherein the lead detection and the accuracy detection have overlapping data from the dataset; and generate, by the processing device, a malicious behavior detection procedure based on the lead detection, the accuracy detection, the overlapping data from the dataset, and the vendor-specific tokenization being used to detect a malicious behavior in the dataset.
  10. The system of claim 9, wherein the processing device is further to: tokenize the plurality of threat detection indications into component parts including at least one of: a modifier, a threat type, a family, or a variant.
  11. The system of claim 10, wherein at least one of: the modifier corresponds to an operating system type, the threat type corresponds to a malware type, the family corresponds to a name of a threat, or the variant corresponds to a version or subversion of the family.
  12. The system of claim 9, wherein the vendor-specific tokenization for a first threat detection indication of the first vendor system is associated with a different tokenization scheme than implemented for a second threat detection indication of the at least one second vendor system.
  13. The system of claim 9, wherein the processing device is further to: select one or more additional accuracy detections from the plurality of threat detection indications, each of the one or more additional accuracy detections being from a different second vendor system than each other, and having overlapping data with the lead detection.
  14. The system of claim 9, wherein the processing device is further to: detect, from the plurality of threat detection indications, the malicious behavior in the dataset, the malicious behavior being within the overlapping data of the lead detection from the first vendor system and the accuracy detection from the at least one second vendor system.
  15. The system of claim 9, wherein the processing device is further to: apply, to a different set of data, the malicious behavior detection procedure generated for the dataset.
  16. A non-transitory computer-readable storage medium including instructions that, when executed by a processing device, cause the processing device to: receive, from threat detection engines of a plurality of vendor systems, a plurality of threat detection indications for a dataset, each threat detection indication of the plurality of threat detection indications receiving a vendor-specific tokenization based on historical data associated with the plurality of vendor systems; identify, from the plurality of threat detection indications: a lead detection from a first vendor system of the plurality of vendor systems, wherein the lead detection comprises a threat detection indication from the plurality of threat detection indications having a largest amount of data from the dataset; and an accuracy detection from at least one second vendor system of the plurality of vendor systems, wherein the accuracy detection comprises another threat detection indication from the plurality of threat detection indications having a next largest amount of data from the dataset, and wherein the lead detection and the accuracy detection have overlapping data from the dataset; and generate, by the processing device, a malicious behavior detection procedure based on the lead detection, the accuracy detection, the overlapping data from the dataset, and the vendor-specific tokenization being used to detect a malicious behavior in the dataset.
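The tokenization recited in claims 2-3 (and 10-11) can be sketched as follows. The "Modifier:ThreatType/Family.Variant" layout used below is an assumed naming convention chosen for illustration; the patent does not specify any particular vendor format.

```python
import re

# Hypothetical sketch of claims 2-3: splitting a vendor's detection name into
# component parts (modifier, threat type, family, variant). The
# "Modifier:ThreatType/Family.Variant" layout is an assumption for this sketch.
NAME_RE = re.compile(
    r"^(?P<modifier>[^:]+):(?P<threat_type>[^/]+)/"
    r"(?P<family>[^.]+)\.(?P<variant>.+)$"
)

def tokenize_detection(name):
    """Return the component parts of a detection name, or None if it
    does not follow the assumed convention."""
    m = NAME_RE.match(name)
    return m.groupdict() if m else None
```

Under this assumed convention, a name such as "Win32:Trojan/Emotet.A" would yield an operating-system modifier (Win32), a malware type (Trojan), a family name (Emotet), and a variant (A); names that do not parse would need a vendor-specific scheme, consistent with claims 4 and 12.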

Description

TECHNICAL FIELD

Aspects of the present disclosure relate to threat detection in cyber environments, and more particularly to automatic rule generation for malicious indicators.

BACKGROUND

Being able to detect the presence of malicious actors in cyber environments is an important measure for safeguarding sensitive data and defending against potential cyberattacks. Detection of malicious actors also helps maintain trust and credibility among users of an application, as well as maintain the reputation and business continuity of an organization that provides the software and/or service. Malicious actions may be in the form of ransomware, trojans, spyware, viruses, keyloggers, bots, and any other type of action that leverages software in a malicious manner to exploit another digital environment. Even small security breaches or incidents can have far-reaching implications for an organization's overall financial health. Thus, it is in the interest of software providers to implement technology that identifies and protects against threats from malicious actors.

BRIEF DESCRIPTION OF THE DRAWINGS

The described embodiments and the advantages thereof may best be understood by reference to the following description taken in conjunction with the accompanying drawings. These drawings in no way limit any changes in form and detail that may be made to the described embodiments by one skilled in the art without departing from the spirit and scope of the described embodiments.

FIGS. 1A-1D are diagrams illustrating a plurality of threat detections indicated by a plurality of engines associated with a plurality of vendor systems. FIG. 2 is a diagram illustrating a next threat detection analysis that may be implemented when a lead threat detection analysis is not suitable for malicious indicator rule generation. FIG. 3 is a flow diagram of a method of automatic rule generation for malicious indicators. FIG. 4 is a component diagram of an example of a device architecture for automatic rule generation for malicious indicators. FIG. 5 is a block diagram of an example computing device that may perform one or more of the operations described herein, in accordance with some embodiments of the disclosure.

DETAILED DESCRIPTION

The presence of a malicious actor in a cyber environment has traditionally been detected through various detection engines. "Malicious actor", which may also be referred to as a threat actor or cyberthreat actor, can be an individual, group of individuals, application, code, etc., that intentionally causes harm to, or exploits vulnerabilities in, computer systems, digital devices, networks, software, and the like. Malicious actors oftentimes perpetrate cyberattacks via phishing techniques, ransomware, and other forms of malicious attacks, attempts, research, and/or unwanted actions/results.

Different security vendors commonly implement their own engine or set of engines to detect whether a malicious actor is present. For example, the different security vendors may identify certain behavior(s) relative to an application/code and map the certain behaviors to certain rules. Based on the mapping, the engine(s) can detect whether malicious or potentially unwanted actors/applications are attempting to compromise the application/code and may implement various procedures to thwart the intended actions of the malicious actor.

A problem that arises from different security vendors implementing different engines is that it becomes difficult to analyze the collective decision making of multiple vendors. Different security vendors may defend against cyberattacks from malicious actors using different schemes. For example, if the multiple vendors do not utilize a same naming convention for threat detection, two different detections by two different vendors could refer to the same threat, but by a different name. That is, the different engines of the different vendors may each generate their own (individualized) name for a detected threat based on the respective naming conventions used by each vendor, making it difficult for a third-party analyzer to determine that each vendor is referring to the same threat. In other cases, the multiple vendors may classify the same detected threat into different categories. For example, a same Indicator of Compromise may be assigned by a first vendor to a Trojan family, but to a Ransomware family by a second vendor. In the case of automatically generated names by one or more of the vendors, where the names are not represented in a conventionally human readable form, such as text that is not in a natural language form, an analysis of the collective decision making of the multiple vendors may lead to threat analysis fatigue and/or uncertain ways to cluster similar threats. Reconciling the problems described herein provides the benefit of significantly reducing the manual search time and reasoning required of security analysts. Technologies that address the above-described problems are either non-existent or use t