US-20260127273-A1 - RESULTS INSIGHTS

Abstract

The present disclosure relates to systems and methods for identifying cybersecurity risks. The systems and methods use hybrid embeddings to embed structured and unstructured data from security logs. The systems and methods use the hybrid embeddings to detect an anomaly in the security logs and thereby identify cybersecurity risks. The systems and methods then receive, from a generative artificial intelligence (GAI) model, a summary of the identified cybersecurity risk.
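
The abstract describes embedding structured and unstructured log fields together and then searching the combined vectors for anomalies. The following is a minimal Python sketch of that idea under several assumptions. The security transformer model (STM) is not specified here, so `stand_in_text_embedding` (a hashed bag of tokens) is a hypothetical placeholder for it. The column names and `HASH_DIM` are illustrative. `knn_anomaly_scores`, a nearest-neighbour distance score, is a simple substitute for the random forest analysis named in the claims, not the disclosed detector.

```python
import hashlib
import math
from typing import Dict, List, Sequence

Row = Dict[str, str]
HASH_DIM = 16  # hypothetical width of the stand-in text embedding


def fit_ordinal_encoder(rows: Sequence[Row], columns: Sequence[str]) -> Dict[str, Dict[str, int]]:
    """Give each distinct value of each structured column an integer code (first-seen order)."""
    codes: Dict[str, Dict[str, int]] = {c: {} for c in columns}
    for row in rows:
        for c in columns:
            codes[c].setdefault(row[c], len(codes[c]))
    return codes


def stand_in_text_embedding(text: str) -> List[float]:
    """Hypothetical placeholder for an STM embedding: L2-normalised hashed bag of tokens."""
    vec = [0.0] * HASH_DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % HASH_DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def hybrid_embeddings(rows: Sequence[Row], structured: Sequence[str],
                      unstructured: Sequence[str]) -> List[List[float]]:
    """Concatenate ordinal codes (structured columns) with text vectors (unstructured columns)."""
    codes = fit_ordinal_encoder(rows, structured)
    vectors = []
    for row in rows:
        vec = [float(codes[c][row[c]]) for c in structured]
        for c in unstructured:
            vec.extend(stand_in_text_embedding(row[c]))
        vectors.append(vec)
    return vectors


def knn_anomaly_scores(vectors: List[List[float]], k: int = 1) -> List[float]:
    """Mean Euclidean distance to the k nearest other rows; higher means more unusual."""
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(math.dist(v, w) for j, w in enumerate(vectors) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


if __name__ == "__main__":
    # Illustrative process-creation rows; the last one uses an unusual account and command line.
    logs = [
        {"EventID": "4688", "Account": "SYSTEM", "CommandLine": "svchost.exe -k netsvcs"},
        {"EventID": "4688", "Account": "SYSTEM", "CommandLine": "svchost.exe -k netsvcs"},
        {"EventID": "4688", "Account": "SYSTEM", "CommandLine": "svchost.exe -k localservice"},
        {"EventID": "4688", "Account": "SYSTEM", "CommandLine": "svchost.exe -k localservice"},
        {"EventID": "4688", "Account": "jdoe", "CommandLine": "powershell.exe -enc SQBFAFgA"},
    ]
    vectors = hybrid_embeddings(logs, ["EventID", "Account"], ["CommandLine"])
    scores = knn_anomaly_scores(vectors)
    most_unusual = max(range(len(logs)), key=scores.__getitem__)
    print(most_unusual, logs[most_unusual])
```

Ordinal codes suit tree-based models such as a random forest, which split on thresholds. In a distance-based score like the one above they impose an arbitrary ordering on categories, which is one reason to treat this sketch only as an illustration of the hybrid-embedding idea.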

Inventors

  • Yosef BRUSS
  • Noa NUTKEVITCH
  • Maayan Kislev
  • Sami AIT OUAHMANE
  • Vignesh NAYAK
  • Yingqi LIU
  • Shany Klein Antman
  • Abraham Starosta

Assignees

  • MICROSOFT TECHNOLOGY LICENSING, LLC

Dates

Publication Date
2026-05-07
Application Date
2024-12-20

Claims (20)

  1. A method comprising: generating hybrid embeddings from security logs in response to receiving an input; detecting, using the hybrid embeddings, an anomaly in the security logs; dynamically generating a prompt with instructions for providing a summary of the anomaly; providing, to a generative artificial intelligence model, the prompt with the instructions; receiving, from the generative artificial intelligence model, the summary of the anomaly; and providing an output summary of the anomaly to a security mitigation agent configured to perform a security improvement operation.
  2. The method of claim 1, wherein the hybrid embeddings include security transformer model (STM) embeddings of data generated by an STM model from unstructured columns in the security logs and ordinal encodings of data generated by an ordinal encoder from structured columns in the security logs.
  3. The method of claim 2, wherein the STM model is pretrained on security logs using a modified masked language modeling loss.
  4. The method of claim 1, wherein dynamically generating the prompt further includes: identifying important columns in the security logs relevant to the input and security analysis; calculating column summary statistics of the important columns; subsampling, using a security transformer model (STM), benign logs; and including the important columns, the column summary statistics, and the benign logs in the prompt.
  5. The method of claim 1, wherein the anomaly is detected by a random forest analysis of the hybrid embeddings.
  6. The method of claim 1, further comprising: filtering the security logs based on entropy; creating a filtered subset of the security logs; grouping the filtered subset of the security logs into clusters; performing a subsampling of each cluster of the clusters; and generating a second prompt with instructions to include information about each cluster in the summary.
  7. The method of claim 6, further comprising: automatically identifying a cluster of the clusters; and generating the hybrid embeddings.
  8. The method of claim 6, further comprising: automatically identifying a cluster of the clusters; and removing the cluster from the security logs where the anomaly is detected.
  9. The method of claim 1, further comprising: generating an anomaly score of the anomaly detected and the summary provided by the generative artificial intelligence model; and presenting, on a display, the anomaly score.
  10. The method of claim 1, further comprising: receiving an action to take in response to the anomaly and the summary; and preventing a cybersecurity risk by implementing the action.
  11. A device comprising: a memory to store data and instructions; and a processor operable to communicate with the memory, wherein the processor is operable to: generate hybrid embeddings from security logs in response to receiving an input; detect, using the hybrid embeddings, an anomaly in the security logs; dynamically generate a prompt with instructions for providing a summary of the anomaly; provide, to a generative artificial intelligence model, the prompt with the instructions; receive, from the generative artificial intelligence model, the summary of the anomaly; and provide an output summary of the anomaly to a security mitigation agent configured to perform a security improvement operation.
  12. The device of claim 11, wherein the hybrid embeddings include security transformer model (STM) embeddings of data generated by an STM model from unstructured columns in the security logs and ordinal encodings of data generated by an ordinal encoder from structured columns in the security logs.
  13. The device of claim 12, wherein the STM model is pretrained on security logs using a modified masked language modeling loss.
  14. The device of claim 11, wherein the processor is further operable to dynamically generate the prompt by: identifying important columns in the security logs relevant to the input and security analysis; calculating column summary statistics of the important columns; subsampling, using a security transformer model (STM), benign logs; and including the important columns, the column summary statistics, and the benign logs in the prompt.
  15. The device of claim 11, wherein the anomaly is detected by a random forest analysis of the hybrid embeddings.
  16. The device of claim 11, wherein the processor is further operable to: filter the security logs based on entropy; create a filtered subset of the security logs; group the filtered subset of the security logs into clusters; perform a subsampling of each cluster of the clusters; and generate a second prompt with instructions to include information about each cluster in the summary.
  17. The device of claim 16, wherein the processor is further operable to: automatically identify a cluster of the clusters; and generate the hybrid embeddings for the cluster.
  18. The device of claim 16, wherein the processor is further operable to: automatically identify a cluster of the clusters; and remove the cluster from the security logs where the anomaly is detected.
  19. The device of claim 11, wherein the processor is further operable to: generate an anomaly score of the anomaly detected and the summary provided by the generative artificial intelligence model; and present, on a display, the anomaly score.
  20. The device of claim 11, wherein the processor is further operable to: receive an action to take in response to the anomaly and the summary; and prevent a cybersecurity risk by implementing the action.
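
Claims 4 and 6 describe shaping the prompt before it reaches the generative model: selecting important columns, computing column summary statistics, filtering by entropy, clustering, and subsampling each cluster. The sketch below strings those steps together with the Python standard library. It is an illustration under stated assumptions, not the claimed implementation: entropy is applied per column (dropping near-constant and near-unique columns, with hypothetical thresholds `low` and `high_fraction`), the kept columns double as the "important columns", rows are clustered by exact match on those columns, subsampling is random rather than STM-guided, and the prompt wording is invented for illustration.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def column_entropy(rows: Sequence[Row], column: str) -> float:
    """Shannon entropy, in bits, of the value distribution in one column."""
    counts = Counter(row[column] for row in rows)
    total = len(rows)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def entropy_filter(rows: Sequence[Row], columns: Sequence[str],
                   low: float = 0.1, high_fraction: float = 0.9) -> List[str]:
    """Keep columns that are neither near-constant nor near-unique (hypothetical thresholds)."""
    max_bits = math.log2(len(rows)) if len(rows) > 1 else 0.0
    return [c for c in columns
            if low < column_entropy(rows, c) < high_fraction * max_bits]


def cluster_rows(rows: Sequence[Row], columns: Sequence[str]) -> Dict[Tuple[str, ...], List[Row]]:
    """Group rows sharing identical values on the kept columns (stand-in for real clustering)."""
    clusters: Dict[Tuple[str, ...], List[Row]] = defaultdict(list)
    for row in rows:
        clusters[tuple(row[c] for c in columns)].append(row)
    return dict(clusters)


def subsample(clusters: Dict[Tuple[str, ...], List[Row]], per_cluster: int = 2,
              seed: int = 0) -> Dict[Tuple[str, ...], List[Row]]:
    """Draw up to `per_cluster` random rows from each cluster (random, not STM-guided)."""
    rng = random.Random(seed)
    return {key: rng.sample(members, min(per_cluster, len(members)))
            for key, members in clusters.items()}


def column_summaries(rows: Sequence[Row], columns: Sequence[str],
                     top: int = 3) -> Dict[str, List[Tuple[str, int]]]:
    """Most common values and their counts for each kept column."""
    return {c: Counter(row[c] for row in rows).most_common(top) for c in columns}


def build_prompt(question: str, rows: Sequence[Row], columns: Sequence[str]) -> str:
    """Assemble a prompt from column statistics and per-cluster samples (hypothetical wording)."""
    kept = entropy_filter(rows, columns)
    clusters = cluster_rows(rows, kept)
    samples = subsample(clusters)
    lines = [
        f"Analyst question: {question}",
        "Summarize the anomalous activity and include one sentence about each cluster.",
        "Column summary statistics (value, count):",
    ]
    for column, stats in column_summaries(rows, kept).items():
        lines.append(f"- {column}: {stats}")
    for i, (key, members) in enumerate(samples.items()):
        # Only the kept (important) columns of each sampled row go into the prompt.
        trimmed = [{c: m[c] for c in kept} for m in members]
        lines.append(f"Cluster {i} ({len(clusters[key])} rows), sample: {trimmed}")
    return "\n".join(lines)


if __name__ == "__main__":
    logs = [
        {"EventTime": f"10:0{i}:00", "Computer": "HOST01",
         "Account": "alice" if i < 4 else "svc_backup",
         "Process": "outlook.exe" if i < 4 else "vssadmin.exe"}
        for i in range(6)
    ]
    print(build_prompt("What stands out in these results?", logs, list(logs[0])))
```

Dropping near-unique columns (such as timestamps or GUIDs) and near-constant columns (such as a single host name) keeps the prompt short while preserving the columns that actually distinguish groups of events.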

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to and the benefit of U.S. Provisional Patent Application No. 63/717,407, filed on Nov. 7, 2024, which is hereby incorporated by reference in its entirety.

BACKGROUND

When analysts query security logs during an incident investigation of a cybersecurity risk, or during proactive threat hunting, they frequently explore hundreds or thousands of results. During an incident investigation or proactive threat hunt, analysts perform multiple iterations that consume excessive machine and networking resources. Exploring these hundreds or thousands of security logs requires significant machine resources, effort, and time. Moreover, reviewing such a large number of security logs forces analysts to scroll through hundreds of records to identify anomalies, which distracts them from the big picture they are working on, whether that is an incident that occurred or a hypothesis about a cybersecurity risk.

BRIEF SUMMARY

This summary is provided to introduce a selection of concepts that are further described below in the detailed description. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in limiting the scope of the claimed subject matter.

Some implementations relate to a method. The method includes generating hybrid embeddings from security logs in response to receiving an input related to the security logs. The method includes detecting, using the hybrid embeddings, an anomaly in the security logs. The method includes dynamically generating a prompt with instructions for providing a summary of the anomaly. The method includes providing, to a generative artificial intelligence model, the prompt with the instructions. The method includes receiving, from the generative artificial intelligence model, the summary of the anomaly.
The method includes providing an output summary of the anomaly to a security mitigation agent configured to perform a security improvement operation.

Some implementations relate to a device. The device includes a memory to store data and instructions; and a processor operable to communicate with the memory, wherein the processor is operable to: generate hybrid embeddings from security logs in response to receiving an input; detect, using the hybrid embeddings, an anomaly in the security logs; dynamically generate a prompt with instructions for providing a summary of the anomaly; provide, to a generative artificial intelligence model, the prompt with the instructions; receive, from the generative artificial intelligence model, the summary of the anomaly; and provide an output summary of the anomaly to a security mitigation agent configured to perform a security improvement operation.

Some implementations relate to a computer-readable storage medium including instructions that, when executed by a processor, cause the processor to: generate hybrid embeddings from security logs in response to receiving an input; detect, using the hybrid embeddings, an anomaly in the security logs; dynamically generate a prompt with instructions for providing a summary of the anomaly; provide, to a generative artificial intelligence model, the prompt with the instructions; receive, from the generative artificial intelligence model, the summary of the anomaly; and provide an output summary of the anomaly to a security mitigation agent configured to perform a security improvement operation.

Additional features and advantages of embodiments of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of such embodiments. The features and advantages of such embodiments may be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims.
These and other features will become more fully apparent from the following description and appended claims, or may be learned by the practice of such embodiments as set forth hereinafter.

BRIEF DESCRIPTION OF DRAWINGS

In order to describe the manner in which the above-recited and other features of the disclosure can be obtained, a more particular description will be rendered by reference to specific implementations thereof which are illustrated in the appended drawings. For better understanding, the like elements have been designated by like reference numbers throughout the various accompanying figures. While some of the drawings may be schematic or exaggerated representations of concepts, at least some of the drawings may be drawn to scale. Understanding that the drawings depict some example implementations, the implementations will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 illustrates an example environment for identifying cybersecurity risks in accordan