US-20260129077-A1 - DETECTING CYBER THREATS USING ARTIFICIAL INTELLIGENCE


Abstract

Approaches in accordance with various illustrative embodiments provide for the generation of synthetic communications for use in training and fine-tuning threat detection models for various categories of recipients. In at least one embodiment, guidelines can be determined for a category of recipient that can be used to generate multiple types of content using generative artificial intelligence (AI), as may include text, image, and file content. A training communication can be generated using these types of content, such as to generate an email message that corresponds to a potential spear phishing attack. The generated messages can be checked for quality, and any messages that are caught by existing filters can be deleted or regenerated so that only high-quality examples of spear phishing are provided as output. These training communications can be used to train a spear phishing detector for a specific category of recipient, in order to accurately flag and prevent access to actual spear phishing communications.
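The abstract describes a generate, filter, and regenerate loop. The Python sketch below illustrates one way such a loop could be structured, assuming each generative model and each existing filter can be treated as a plain function. The names `generate_training_set`, `toy_generate`, and `keyword_filter` are hypothetical, and the toy keyword filter merely stands in for the phishing and AI-generation detectors mentioned in the claims.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical signatures: a generator maps (guidelines, feedback) to message
# text, and a filter returns True when it "catches" (flags) a message.
Generator = Callable[[str, Optional[str]], str]
Filter = Callable[[str], bool]


@dataclass
class Candidate:
    text: str
    attempt: int  # which generation attempt produced the accepted message


def generate_training_set(
    guidelines: str,
    generate: Generator,
    filters: List[Filter],
    n_examples: int,
    max_attempts: int = 3,
) -> List[Candidate]:
    """Return up to n_examples messages that none of the filters catch."""
    accepted: List[Candidate] = []
    for _ in range(n_examples):
        feedback: Optional[str] = None
        for attempt in range(1, max_attempts + 1):
            text = generate(guidelines, feedback)
            caught = [getattr(f, "__name__", "filter") for f in filters if f(text)]
            if not caught:
                accepted.append(Candidate(text, attempt))
                break
            # Pass the failure back so the next attempt can avoid it;
            # after max_attempts the candidate is simply discarded.
            feedback = "caught by: " + ", ".join(caught)
    return accepted


# Toy stand-ins for a generative model and an existing phishing filter.
def toy_generate(guidelines: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "URGENT: click here to verify your account"
    return f"Hi, following up on the {guidelines} we discussed; revised numbers attached."


def keyword_filter(text: str) -> bool:
    return "click here" in text.lower()


examples = generate_training_set("Q3 budget review", toy_generate, [keyword_filter], n_examples=2)
for ex in examples:
    print(ex.attempt, ex.text)
```

Feeding information about a rejected message back to the generator loosely mirrors the regeneration recited in claim 2. A production system would likely compare a detector's probability against a minimum threshold rather than rely on a boolean flag.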

Inventors

Bartley Douglas Richardson
Shawn Davis
Gorkem Batmaz
Rachel Allen

Assignees

NVIDIA CORPORATION

Dates

Publication Date: 20260507
Application Date: 20251230

Claims (20)

1 . A system, comprising: a processor; and a memory storing instructions that, when read by the processor, cause the processor to: generate, using one or more generative models, a body of text corresponding to a spear phishing attempt and at least one additional type of content comprising an image or file attachment; and create a training communication, to be used to train a spear phishing detection model, by combining the body of text and metadata from the at least one additional type of content, wherein the training communication includes less than a full version of the image or file attachment.
2 . The system of claim 1 , wherein the processor is further to: create a second training communication; process the second training communication using at least one filtering criterion to determine that the second training communication does not represent a valid spear phishing attempt with at least a minimum probability; and provide information about the second training communication to at least a first generative model to produce a third training communication that represents a valid spear phishing attempt with a higher probability than the second training communication.
3 . The system of claim 1 , wherein the processor is further to receive indication of a type of recipient for which a training communication is to be generated, the training communication to represent a spear phishing attempt for the type of recipient.
4 . The system of claim 3 , wherein the body of text and the at least one additional type of content are generated based on at least the type of recipient.
5 . The system of claim 1 , wherein the processor is further to process the training communication using at least one filtering criterion to determine that the training communication represents a valid spear phishing attempt with at least a minimum probability, wherein at least one filtering criterion includes (1) detection of generation by an artificial intelligence (AI) generator or (2) detection as a phishing attempt.
6 . The system of claim 1 , wherein the processor is further to: train a spear phishing detection model using a training dataset including the training communication; provide a received communication as input to the spear phishing detection model; and receive, as output of the spear phishing detection model, a classification for the received communication.
7 . The system of claim 6 , wherein the classification includes a safe classification to be allowed, an unsafe classification to be blocked, or an indeterminable classification.
8 . The system of claim 7 , wherein the processor is further to: provide information for the received communication to a recipient indicating the indeterminable classification and one or more reasons for the indeterminable classification.
9 . The system of claim 8 , wherein the processor is further to: receive, in response to providing the information, feedback regarding whether the recipient considers the received communication to represent a spear phishing attempt; and provide the feedback to further train at least a first generative model to generate one or more additional bodies of text corresponding to one or more spear phishing attempts for a type of recipient.
10 . A method comprising: generating, using one or more generative models, a body of text corresponding to a spear phishing attempt and at least one additional type of content comprising an image or file attachment; and creating a training communication, to be used to train a spear phishing detection model, by combining the body of text and metadata from the at least one additional type of content, wherein the training communication includes less than a full version of the image or file attachment.
11 . The method of claim 10 , further comprising: creating a second training communication; processing the second training communication using at least one filtering criterion to determine that the second training communication does not represent a valid spear phishing attempt with at least a minimum probability; and providing information about the second training communication to at least a first generative model to produce a third training communication that represents a valid spear phishing attempt with a higher probability than the second training communication.
12 . The method of claim 10 , further comprising: training the spear phishing detection model using a training dataset including the training communication; providing a received communication as input to the spear phishing detection model; and receiving, as output of the spear phishing detection model, a classification for the received communication.
13 . The method of claim 12 , wherein the classification includes a safe classification to be allowed, an unsafe classification to be blocked, or an indeterminable classification.
14 . The method of claim 10 , further comprising processing the training communication using at least one filtering criterion to determine that the training communication represents a valid spear phishing attempt with at least a minimum probability, wherein at least one filtering criterion includes (1) detection of generation by an artificial intelligence (AI) generator or (2) detection as a phishing attempt.
15 . The method of claim 10 , further comprising: receiving, in response to providing the information, feedback regarding whether the recipient considers the received communication to represent a spear phishing attempt; and providing the feedback to further train at least a first generative model to generate one or more additional bodies of text corresponding to one or more spear phishing attempts for a type of recipient.
16 . One or more processors to create a training communication, to be used to train a spear phishing detection model, by combining a body of text and metadata from at least one additional type of content comprising an image or file attachment, wherein the training communication includes less than a full version of the image or file attachment, wherein the body of text and the at least one additional type of content are generated using one or more generative models.
17 . The one or more processors of claim 16 , further to receive indication of a type of recipient for which a training communication is to be generated, the training communication to represent a spear phishing attempt for the type of recipient, wherein the type of recipient corresponds to a role, position, title, responsibility, or specific individual.
18 . The one or more processors of claim 16 , further to: create a second training communication; process the second training communication using at least one filtering criterion to determine that the second training communication does not represent a valid spear phishing attempt with at least a minimum probability; and provide information about the second training communication to at least a first generative model to produce a third training communication that represents a valid spear phishing attempt with a higher probability than the second training communication.
19 . The one or more processors of claim 16 , further to: train a targeted cyber threat detection model using a training dataset including the training communication; provide a received communication as input to the targeted cyber threat detection model; and receive, as output of the targeted cyber threat detection model, a classification for the received communication.
20 . The one or more processors of claim 16 , further to: perform simulation operations; perform simulation operations to test or validate autonomous machine applications; render graphical output; perform deep learning operations; implement one or more actions using an edge device; generate or present virtual reality (VR) content; generate or present augmented reality (AR) content; generate or present mixed reality (MR) content; incorporate one or more Virtual Machines (VMs); implement one or more actions at least partially in a data center; perform hardware testing using simulation; generate synthetic data; generate collaborative content for 3D assets; or implement one or more actions at least partially using cloud computing resources.
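Claims 1, 7, and 16 describe a training communication that combines generated body text with metadata from a generated image or file attachment, rather than the full attachment, and a detector output that may be safe, unsafe, or indeterminable. The Python sketch below is one illustrative reading under stated assumptions: the particular metadata fields (file name, MIME type, size, SHA-256 digest), the hypothetical helper names, and the 0.2/0.8 score thresholds are choices made for this example and are not specified by the claims.

```python
import hashlib
import mimetypes
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TrainingCommunication:
    subject: str
    body: str
    # Metadata about each generated image or file; the raw bytes are never stored.
    attachments: List[Dict[str, object]]


def attachment_metadata(filename: str, content: bytes) -> Dict[str, object]:
    """Summarize a generated attachment without embedding its full contents."""
    mime_type, _ = mimetypes.guess_type(filename)
    return {
        "filename": filename,
        "mime_type": mime_type or "application/octet-stream",
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }


def build_training_communication(
    subject: str, body: str, generated_files: Dict[str, bytes]
) -> TrainingCommunication:
    """Combine generated body text with metadata from generated attachments."""
    metadata = [attachment_metadata(name, data) for name, data in generated_files.items()]
    return TrainingCommunication(subject, body, metadata)


def classify(score: float, low: float = 0.2, high: float = 0.8) -> str:
    """Map a detector's phishing score in [0, 1] to one of three labels.

    The thresholds are illustrative assumptions, not values from the claims.
    """
    if score >= high:
        return "unsafe"  # to be blocked
    if score <= low:
        return "safe"  # to be allowed
    return "indeterminable"  # e.g., delivered with a warning and reasons


msg = build_training_communication(
    "Revised Q3 figures",
    "Hi Dana, as discussed on Tuesday's call, the updated projections are attached.",
    {"q3_figures.pdf": b"%PDF-1.4 placeholder bytes"},
)
print(msg.attachments[0]["mime_type"], classify(0.5))
```

Storing a digest and descriptive fields instead of the attachment bytes is one way to satisfy "less than a full version" while still giving a detector signals such as file type and size.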

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation application and claims priority to U.S. patent application Ser. No. 18/185,578, filed on Mar. 17, 2023, which is incorporated by reference herein in its entirety.

BACKGROUND

Spear phishing is one of the largest and costliest forms of cyber threats, resulting in billions of dollars in costs to businesses and individuals each year. While there are many approaches that can successfully detect basic phishing attacks, these solutions are not sufficiently fine-tuned to accurately detect attacks that are tailored to specific individuals or types of users, such as spear phishing and whale phishing attacks, where significantly more effort is put into crafting communications that target specific individuals of high worth or importance. As an example, a phishing email might be directed to the CEO of a company and be carefully crafted to appear to be a legitimate email message from someone with whom the CEO may have previously interacted, involving subject matter that is relevant to the CEO within that context. Techniques for generating such targeted messages at scale are producing increasingly realistic-looking messages, particularly when leveraging technologies such as generative artificial intelligence (AI), which makes these messages both more difficult and more critical to detect.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments in accordance with the present disclosure will be described with reference to the drawings, in which:
FIGS. 1A and 1B illustrate components of an example system to generate training data and train one or more threat detection models in accordance with various embodiments;
FIG. 2 illustrates an example system for processing received communications using one or more trained threat detection models, which can be used in accordance with various embodiments;
FIG. 3 illustrates an example email message that has been modified to indicate a potential threat and include information indicating why the message was determined to correspond to a potential threat, which can be generated in accordance with various embodiments;
FIG. 4 illustrates an example process for generating training communications for use in training one or more threat detection models that can be performed in accordance with various embodiments;
FIG. 5 illustrates an example process for processing a received communication using one or more trained threat detection models, in accordance with at least one embodiment;
FIG. 6 illustrates an example networked computing environment in which aspects of various embodiments can be performed;
FIG. 7 illustrates an example data center system, according to at least one embodiment;
FIG. 8 is a block diagram illustrating a computer system, according to at least one embodiment;
FIG. 9 is a block diagram illustrating a computer system, according to at least one embodiment;
FIG. 10 illustrates a computer system, according to at least one embodiment;
FIG. 11 illustrates a computer system, according to at least one embodiment;
FIG. 12A illustrates a computer system, according to at least one embodiment;
FIG. 12B illustrates a computer system, according to at least one embodiment;
FIG. 13 illustrates exemplary integrated circuits and associated graphics processors, according to at least one embodiment;
FIGS. 14A-14B illustrate exemplary integrated circuits and associated graphics processors, according to at least one embodiment;
FIGS. 15A-15B illustrate additional exemplary graphics processor logic, according to at least one embodiment;
FIG. 16 illustrates a computer system, according to at least one embodiment;
FIG. 17A illustrates a parallel processor, according to at least one embodiment;
FIG. 17B illustrates a partition unit, according to at least one embodiment;
FIG. 18 illustrates a multi-graphics processing unit (GPU) system, according to at least one embodiment;
FIG. 19 illustrates a graphics processor, according to at least one embodiment; and
FIG. 20 illustrates at least portions of a graphics processor, according to one or more embodiments.

DETAILED DESCRIPTION

In the following description, various embodiments will be described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the embodiments. However, it will also be apparent to one skilled in the art that the embodiments may be practiced without the specific details. Furthermore, well-known features may be omitted or simplified in order not to obscure the embodiment being described. Approaches in accordance with various illustrative embodiments provide for the generation of training data to be used to train one or more threat detectors. In particular, example spear phishing email messages can be generated using multiple types of generated content that can be used to train and fine-tune a set of detection models for di