
US-20260129068-A1 - TECHNIQUES FOR SECURING DATA VIA DATA LINEAGE


Abstract

A system and method for securing data. A method includes assigning identifiers to instances of data objects represented in data indicating movement of the data objects. The identifiers uniquely correspond to respective data objects. The data indicating movement of the data objects is transformed into a data structure having fields corresponding to data lineage parameters and the data lineage parameters include location, time, and the identifiers. Events represented in the transformed data are correlated based on shared attributes among the events, where the shared attributes include common locations and common unique identifiers of the data objects involved in the events. A data lineage is constructed based on the correlated events by linking between events based on the correlation and organizing the linked events with respect to time. A cybersecurity threat is detected based on the data lineage, and mitigated by blocking traffic with respect to the cybersecurity threat.
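The first two steps in the abstract are assigning identifiers to data-object instances and transforming movement data into a structure with location, time, and identifier fields. They can be sketched as follows. This is a minimal, hypothetical sketch, not the patent's implementation. The raw event keys (`host`, `path`, `ts`, `action`, `content`), the names `LineageRecord`, `assign_identifier`, and `transform`, the extra `action` field, and the use of an exact SHA-256 content digest to decide which instances belong to the same data object are all assumptions.

```python
# Hypothetical sketch: normalize raw movement events into lineage records.
# Assumed raw input keys: "host", "path", "ts", "action", "content".
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class LineageRecord:
    """Second-format record with one field per data lineage parameter."""
    object_id: str    # identifier shared by all instances of one data object
    location: str     # where the instance was observed (host:path)
    time: datetime    # when the movement event occurred
    action: str       # assumed extra field, e.g. "write" or "copy"


def assign_identifier(content: bytes, registry: Dict[str, str]) -> str:
    """Return the object ID for this content, issuing a new ID if unseen.

    Exact SHA-256 matching is an assumption: identical bytes map to one ID.
    """
    digest = hashlib.sha256(content).hexdigest()
    if digest not in registry:
        registry[digest] = f"obj-{len(registry) + 1}"
    return registry[digest]


def transform(raw_events: List[dict]) -> List[LineageRecord]:
    """Convert raw events (first format) into LineageRecords (second format)."""
    registry: Dict[str, str] = {}
    return [
        LineageRecord(
            object_id=assign_identifier(ev["content"], registry),
            location=f'{ev["host"]}:{ev["path"]}',
            time=datetime.fromisoformat(ev["ts"]),
            action=ev["action"],
        )
        for ev in raw_events
    ]


raw = [
    {"host": "db-1", "path": "/exports/q3.csv", "ts": "2024-10-01T09:00:00",
     "action": "write", "content": b"customer,balance\nalice,100\n"},
    {"host": "laptop-7", "path": "/tmp/q3.csv", "ts": "2024-10-01T09:05:00",
     "action": "copy", "content": b"customer,balance\nalice,100\n"},
    {"host": "db-1", "path": "/exports/hr.csv", "ts": "2024-10-01T09:02:00",
     "action": "write", "content": b"employee,salary\nbob,90\n"},
]
records = transform(raw)
for r in records:
    print(r.object_id, r.location, r.time.isoformat())
# obj-1 db-1:/exports/q3.csv 2024-10-01T09:00:00
# obj-1 laptop-7:/tmp/q3.csv 2024-10-01T09:05:00
# obj-2 db-1:/exports/hr.csv 2024-10-01T09:02:00
```

Both copies of the CSV receive the same identifier even though they sit at different locations. This shared identifier is what allows later steps to link the two events.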

Inventors

  • Hod Ahikam BIN NOON
  • Eran Yehuda BARAK
  • Yitai SCHWARTZ

Assignees

  • Tannin Inc.

Dates

Publication Date
2026-05-07
Application Date
2024-10-22

Claims (20)

  1. A method for securing data, comprising: assigning a plurality of identifiers to a plurality of instances of a plurality of data objects represented in data indicating movement of the plurality of data objects, wherein the identifier assigned to each instance uniquely corresponds to exactly one data object of the plurality of data objects; transforming the data indicating movement of the data objects from a first format to a second format, wherein the second format is a data structure having a plurality of fields corresponding to a plurality of data lineage parameters, wherein the plurality of data lineage parameters include location, time, and the plurality of identifiers; correlating a plurality of events represented in the transformed data, wherein the plurality of events is correlated based on shared attributes among events of the plurality of events, wherein the shared attributes at least include common locations and common unique identifiers of the data objects among the plurality of data objects involved in events among the plurality of events; constructing a data lineage based on the correlated plurality of events, wherein constructing the data lineage includes linking between events among the plurality of events based on the correlation and organizing the linked events with respect to time; detecting a cybersecurity threat based on the data lineage; and mitigating the cybersecurity threat by at least blocking traffic with respect to the cybersecurity threat.
  2. The method of claim 1, wherein detecting the cybersecurity threat further comprises: identifying a plurality of data flows based on the data lineage, wherein each data flow includes a movement of data, wherein the cybersecurity threat is mitigated based on the identified plurality of data flows.
  3. The method of claim 2, wherein the cybersecurity threat is detected within at least one data flow of the plurality of data flows, wherein the traffic is blocked for at least a portion of the at least one data flow in which the cybersecurity threat is detected.
  4. The method of claim 1, wherein detecting the cybersecurity threat further comprises analyzing the data indicating movement of the data objects with respect to normal behavior patterns.
  5. The method of claim 1, wherein detecting the cybersecurity threat further comprises performing data exfiltration monitoring in order to identify an amount of data being transferred outside of a computing environment that is above a threshold.
  6. The method of claim 1, further comprising: classifying the transformed data into at least one classification with respect to data sensitivity, wherein the cybersecurity threat is detected based on the at least one classification.
  7. The method of claim 1, wherein the second format is a format of a storage, further comprising: loading at least a portion of the transformed data into the storage based on data among the transformed data stored in fields of the plurality of fields corresponding to the plurality of data lineage parameters.
  8. The method of claim 7, wherein only the data among the transformed data stored in fields of the plurality of fields corresponding to the plurality of data lineage parameters is loaded into the storage.
  9. The method of claim 1, wherein the data lineage is a graph including a plurality of nodes and a plurality of edges between nodes among the plurality of nodes, wherein the plurality of nodes represent a plurality of components that interact with data stored in at least one computing environment, wherein the plurality of edges represent movement of data between components among the plurality of components represented by the nodes.
  10. The method of claim 1, wherein assigning the plurality of identifiers to the plurality of instances of the plurality of data objects further comprises: performing similarity hashing in order to determine whether instances among the plurality of instances match, wherein the plurality of identifiers is assigned based on the similarity hashing.
  11. A non-transitory computer-readable medium having stored thereon instructions for causing a processing circuitry to execute a process, the process comprising: assigning a plurality of identifiers to a plurality of instances of a plurality of data objects represented in data indicating movement of the plurality of data objects, wherein the identifier assigned to each instance uniquely corresponds to exactly one data object of the plurality of data objects; transforming the data indicating movement of the data objects from a first format to a second format, wherein the second format is a data structure having a plurality of fields corresponding to a plurality of data lineage parameters, wherein the plurality of data lineage parameters include location, time, and the plurality of identifiers; correlating a plurality of events represented in the transformed data, wherein the plurality of events is correlated based on shared attributes among events of the plurality of events, wherein the shared attributes at least include common locations and common unique identifiers of the data objects among the plurality of data objects involved in events among the plurality of events; constructing a data lineage based on the correlated plurality of events, wherein constructing the data lineage includes linking between events among the plurality of events based on the correlation and organizing the linked events with respect to time; detecting a cybersecurity threat based on the data lineage; and mitigating the cybersecurity threat by at least blocking traffic with respect to the cybersecurity threat.
  12. A system for securing data, comprising: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: assign a plurality of identifiers to a plurality of instances of a plurality of data objects represented in data indicating movement of the plurality of data objects, wherein the identifier assigned to each instance uniquely corresponds to exactly one data object of the plurality of data objects; transform the data indicating movement of the data objects from a first format to a second format, wherein the second format is a data structure having a plurality of fields corresponding to a plurality of data lineage parameters, wherein the plurality of data lineage parameters include location, time, and the plurality of identifiers; correlate a plurality of events represented in the transformed data, wherein the plurality of events is correlated based on shared attributes among events of the plurality of events, wherein the shared attributes at least include common locations and common unique identifiers of the data objects among the plurality of data objects involved in events among the plurality of events; construct a data lineage based on the correlated plurality of events, wherein constructing the data lineage includes linking between events among the plurality of events based on the correlation and organizing the linked events with respect to time; detect a cybersecurity threat based on the data lineage; and mitigate the cybersecurity threat by at least blocking traffic with respect to the cybersecurity threat.
  13. The system of claim 12, wherein the system is further configured to: identify a plurality of data flows based on the data lineage, wherein each data flow includes a movement of data, wherein the cybersecurity threat is mitigated based on the identified plurality of data flows.
  14. The system of claim 13, wherein the cybersecurity threat is detected within at least one data flow of the plurality of data flows, wherein the traffic is blocked for at least a portion of the at least one data flow in which the cybersecurity threat is detected.
  15. The system of claim 12, wherein detecting the cybersecurity threat further comprises analyzing the data indicating movement of the data objects with respect to normal behavior patterns.
  16. The system of claim 12, wherein detecting the cybersecurity threat further comprises performing data exfiltration monitoring in order to identify an amount of data being transferred outside of a computing environment that is above a threshold.
  17. The system of claim 12, wherein the system is further configured to: classify the transformed data into at least one classification with respect to data sensitivity, wherein the cybersecurity threat is detected based on the at least one classification.
  18. The system of claim 12, wherein the second format is a format of a storage, wherein the system is further configured to: load at least a portion of the transformed data into the storage based on data among the transformed data stored in fields of the plurality of fields corresponding to the plurality of data lineage parameters.
  19. The system of claim 18, wherein only the data among the transformed data stored in fields of the plurality of fields corresponding to the plurality of data lineage parameters is loaded into the storage.
  20. The system of claim 12, wherein the data lineage is a graph including a plurality of nodes and a plurality of edges between nodes among the plurality of nodes, wherein the plurality of nodes represent a plurality of components that interact with data stored in at least one computing environment, wherein the plurality of edges represent movement of data between components among the plurality of components represented by the nodes.
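Claim 10 recites similarity hashing for deciding whether instances match, but it does not name a specific algorithm. The sketch below uses SimHash over word 3-shingles as one possible instantiation. Two instances are treated as the same data object when their 64-bit fingerprints differ in at most `max_distance` bits. SimHash, the BLAKE2b per-shingle hash, the shingle size, the threshold, and the names `simhash`, `hamming`, and `assign_ids` are hypothetical choices, not taken from the source.

```python
# Hypothetical sketch of claim 10: SimHash-based matching of data instances.
# SimHash, 3-word shingles, BLAKE2b, and max_distance are all assumptions.
import hashlib
from typing import List, Tuple


def simhash(text: str, bits: int = 64) -> int:
    """Build a SimHash fingerprint from word 3-shingles of `text`.

    `bits` must be a multiple of 8 and at most 512 (BLAKE2b digest limits).
    """
    words = text.lower().split()
    shingles = [" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))]
    weights = [0] * bits
    for sh in shingles:
        digest = hashlib.blake2b(sh.encode(), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for b in range(bits):
            # Each shingle votes +1 or -1 on every bit position
            weights[b] += 1 if (h >> b) & 1 else -1
    # Bits with a positive vote total are set in the fingerprint
    return sum(1 << b for b in range(bits) if weights[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def assign_ids(instances: List[str], max_distance: int = 3) -> List[str]:
    """Give instances whose fingerprints nearly match the same identifier."""
    known: List[Tuple[int, str]] = []  # (fingerprint, object ID) per object
    ids: List[str] = []
    for text in instances:
        fp = simhash(text)
        match = next((oid for kfp, oid in known
                      if hamming(fp, kfp) <= max_distance), None)
        if match is None:
            match = f"obj-{len(known) + 1}"
            known.append((fp, match))
        ids.append(match)
    return ids


report = "quarterly revenue by customer region for the third quarter of the fiscal year"
edited = report.replace("third", "3rd")
unrelated = "employee salary bands and bonus targets for the engineering department"
print(assign_ids([report, report, edited, unrelated]))
```

Exact copies always share an identifier. Whether a lightly edited copy such as `edited` falls within `max_distance` depends on how many shingles changed, so under these assumptions the threshold would need tuning on real data.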

Description

TECHNICAL FIELD

The present disclosure relates generally to data leakage protection, and more specifically to securing data via data lineage.

BACKGROUND

In modern computing infrastructures, large amounts of data may be stored at any given time. Leaks or other improper access to such data may cause major problems for companies and for entities affected by any data leakage. As a result, techniques for securing data within a computing environment are desirable.

SUMMARY

A summary of several example embodiments of the disclosure follows. This summary is provided for the convenience of the reader to provide a basic understanding of such embodiments and does not wholly define the breadth of the disclosure. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodiments nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more embodiments in a simplified form as a prelude to the more detailed description that is presented later. For convenience, the term “some embodiments” or “certain embodiments” may be used herein to refer to a single embodiment or multiple embodiments of the disclosure. Certain embodiments disclosed herein include a method for securing data.
The method comprises: assigning a plurality of identifiers to a plurality of instances of a plurality of data objects represented in data indicating movement of the plurality of data objects, wherein the identifier assigned to each instance uniquely corresponds to exactly one data object of the plurality of data objects; transforming the data indicating movement of the data objects from a first format to a second format, wherein the second format is a data structure having a plurality of fields corresponding to a plurality of data lineage parameters, wherein the plurality of data lineage parameters include location, time, and the plurality of identifiers; correlating a plurality of events represented in the transformed data, wherein the plurality of events is correlated based on shared attributes among events of the plurality of events, wherein the shared attributes at least include common locations and common unique identifiers of the data objects among the plurality of data objects involved in events among the plurality of events; constructing a data lineage based on the correlated plurality of events, wherein constructing the data lineage includes linking between events among the plurality of events based on the correlation and organizing the linked events with respect to time; detecting a cybersecurity threat based on the data lineage; and mitigating the cybersecurity threat by at least blocking traffic with respect to the cybersecurity threat. 
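The correlation, lineage-construction, detection, and blocking steps of the method can be illustrated with a small sketch. It assumes events have already been normalized into records carrying an object identifier, a location, a time, a size, and an internal/external flag. Under that assumption, the sketch groups events by shared object identifier, orders each group by time, links consecutive locations into graph edges (compare claim 9), and flags external destinations whose inbound volume exceeds a byte threshold (compare claim 5). The `Event` fields, the function names, the per-destination threshold rule, and treating the returned edges as "the traffic to block" are hypothetical choices, not the patent's method.

```python
# Hypothetical sketch: correlate normalized events into a lineage graph and
# flag exfiltration. Field names, grouping rule, and threshold are assumptions.
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Event:
    object_id: str    # unique identifier of the data object
    location: str     # graph node; events sharing a location share a node
    time: int         # epoch seconds
    size_bytes: int
    external: bool    # True if the location is outside the environment


Edge = Tuple[str, str, str]  # (source location, destination location, object ID)


def build_lineage(events: List[Event]) -> Dict[str, List[Event]]:
    """Correlate events by shared object ID and order each chain by time."""
    chains: Dict[str, List[Event]] = defaultdict(list)
    for ev in events:
        chains[ev.object_id].append(ev)
    for chain in chains.values():
        chain.sort(key=lambda e: e.time)
    return dict(chains)


def lineage_edges(chains: Dict[str, List[Event]]) -> List[Edge]:
    """Link consecutive events of one object into location-to-location edges."""
    edges: List[Edge] = []
    for oid, chain in chains.items():
        for prev, nxt in zip(chain, chain[1:]):
            if prev.location != nxt.location:
                edges.append((prev.location, nxt.location, oid))
    return edges


def detect_exfiltration(chains: Dict[str, List[Event]],
                        threshold_bytes: int) -> Set[Edge]:
    """Return internal-to-external edges whose destination received more than
    threshold_bytes; in this sketch these are the flows to block."""
    volume: Dict[str, int] = defaultdict(int)
    outbound: List[Edge] = []
    for oid, chain in chains.items():
        for prev, nxt in zip(chain, chain[1:]):
            if nxt.external and not prev.external:
                volume[nxt.location] += nxt.size_bytes
                outbound.append((prev.location, nxt.location, oid))
    flagged = {dest for dest, total in volume.items() if total > threshold_bytes}
    return {edge for edge in outbound if edge[1] in flagged}


events = [
    Event("obj-1", "s3://unknown-bucket/q3.csv", 220, 5_000_000, True),
    Event("obj-1", "db-1:/exports/q3.csv", 100, 5_000_000, False),
    Event("obj-2", "db-1:/exports/hr.csv", 130, 1_000, False),
    Event("obj-1", "laptop-7:/tmp/q3.csv", 160, 5_000_000, False),
]
chains = build_lineage(events)
print(lineage_edges(chains))
print(detect_exfiltration(chains, threshold_bytes=1_000_000))
# {('laptop-7:/tmp/q3.csv', 's3://unknown-bucket/q3.csv', 'obj-1')}
```

Although the events arrive out of order, time ordering reconstructs the path db-1 to laptop-7 to the external bucket. Keying graph nodes by location is one simple way to reflect correlation on common locations; the source does not specify the graph encoding.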
Certain embodiments disclosed herein also include a non-transitory computer-readable medium having stored thereon instructions for causing a processing circuitry to execute a process, the process comprising: assigning a plurality of identifiers to a plurality of instances of a plurality of data objects represented in data indicating movement of the plurality of data objects, wherein the identifier assigned to each instance uniquely corresponds to exactly one data object of the plurality of data objects; transforming the data indicating movement of the data objects from a first format to a second format, wherein the second format is a data structure having a plurality of fields corresponding to a plurality of data lineage parameters, wherein the plurality of data lineage parameters include location, time, and the plurality of identifiers; correlating a plurality of events represented in the transformed data, wherein the plurality of events is correlated based on shared attributes among events of the plurality of events, wherein the shared attributes at least include common locations and common unique identifiers of the data objects among the plurality of data objects involved in events among the plurality of events; constructing a data lineage based on the correlated plurality of events, wherein constructing the data lineage includes linking between events among the plurality of events based on the correlation and organizing the linked events with respect to time; detecting a cybersecurity threat based on the data lineage; and mitigating the cybersecurity threat by at least blocking traffic with respect to the cybersecurity threat. Certain embodiments disclosed herein also include a system for securing data.
The system comprises: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: assign a plurality of identifiers to a plurality of instances of a plurality of data objects represented in data indicating movement of the plurality of data objects, wherein the identifier assigned to each instance uniquely corresponds to exactly one data object of the plurality of data objects; transform the data indica