US-12626003-B2 - Policy based compliance enforcement in a federated graph

Abstract

A method for implementing policy-based compliance enforcement in a federated graph data environment includes receiving a query from a query caller for data stored in a graph data storage system, the query including one or more components that are mapped to data entities of the graph data storage system using a schema. The schema expresses connections between data entities of the graph data storage system and includes data classification labels for the data entities. Each of the components of the query is examined to identify components to which a policy applies, the identification being based on the data classification labels. Upon identifying such components, a selection is made between multiple policy enforcement modes: reporting violation of the policy, denying query plans in violation of the policy, and generating a transformed query plan that complies with the policy. Depending upon the selection, the method generates the query plan together with a report that details violation of the policy, prevents execution of the query plan and generates the report, or generates a transformed query plan.
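The mapping and detection steps in the abstract can be illustrated with a minimal Python sketch. Everything in it is a hypothetical stand-in rather than the disclosed implementation: the `Label` values, the contents of `SCHEMA`, and the functions `map_components`, `transfer_permitted`, and `detect` are invented for illustration, the schema's connections between entities are omitted for brevity, and the policy is assumed to be a simple predicate that forbids moving data labeled `CONFIDENTIAL` or higher out of the region where it is stored.

```python
from dataclasses import dataclass
from enum import IntEnum


class Label(IntEnum):
    """Hypothetical data classification labels, ordered from least to most sensitive."""
    PUBLIC = 0
    GENERAL = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3


@dataclass(frozen=True)
class Entity:
    name: str
    store: str    # physical data store that holds the entity
    region: str   # geographical region of that store
    label: Label  # classification label carried by the schema


# Hypothetical schema: query component -> labeled data entity (edges omitted).
SCHEMA = {
    "user": Entity("User", "directory-store", "EU", Label.GENERAL),
    "user.mailbox": Entity("Mailbox", "mail-store", "EU", Label.HIGHLY_CONFIDENTIAL),
    "user.manager": Entity("User", "directory-store", "EU", Label.GENERAL),
}


def map_components(components):
    """Mapping step: resolve each query component to its schema entity."""
    return {c: SCHEMA[c] for c in components}


def transfer_permitted(label, source_region, dest_region):
    """Hypothetical policy predicate: cross-region transfer only below CONFIDENTIAL."""
    return source_region == dest_region or label < Label.CONFIDENTIAL


def detect(mapped, caller_region):
    """Detection step: list the components to which the policy applies."""
    return [c for c, e in mapped.items()
            if not transfer_permitted(e.label, e.region, caller_region)]
```

Mapping `user` and `user.mailbox` and calling `detect(mapped, caller_region="US")` returns `['user.mailbox']`, because only the mailbox entity is highly confidential and stored outside the caller's region; a caller in `"EU"` gets an empty list.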

Inventors

  • Anders Tungeland Gjerdrum
  • Iqra ALI
  • Theodoros Gkountouvas

Assignees

  • MICROSOFT TECHNOLOGY LICENSING, LLC

Dates

Publication Date
2026-05-12
Application Date
2024-04-30

Claims (17)

  1. A data processing system comprising: a processor; and a memory in communication with the processor, the memory comprising executable instructions that, when executed by the processor alone or in combination with other processors, cause the data processing system to perform functions of: receiving, by a query processing system, a query for data stored in a federated graph data storage system coupled to the query processing system, the query being received from a query caller via an intelligent query Application Programming Interface (API), the query including one or more components and the federated graph data storage system including a plurality of data stores, at least some of the plurality of data stores being disconnected; mapping, by a mapping engine, each of the one or more components of the query to data entities of the federated graph data storage system using a schema, the schema expressing connections between the data entities of the federated graph data storage system and the schema including data classification labels for one or more of the data entities of the federated graph data storage system; examining, by a detection engine, each of the one or more components of the query, to determine an identification of at least one of the one or more components to which a policy associated with the data stored in the federated graph data storage system applies, the identification being done at least based on the data classification labels associated with the data entities mapped to the one or more components; generating, by a planning engine, a query plan, wherein the query plan provides information on how the query is to be executed in the federated graph data storage system; selecting, by an enforcement engine, a transform mode from among a plurality of policy enforcement modes responsive to the identification; transforming, by the enforcement engine, the query plan to comply with the policy including at least one of pruning the data entities subject to the policy or implementing a declassifying function in the query plan that transforms the data entities to which the policy applies in a manner that the data classification labels of the data entities are changed to a lower level classification label that complies with the policy; and executing, by a graph query execution engine, the query plan to retrieve results compliant with the policy and responsive to the query.
  2. The data processing system of claim 1, wherein the executable instructions when executed by the processor alone or in combination with other processors, cause the data processing system to perform functions of: retrieving geographical location information for at least one of the query caller, the data processing system and one or more target data stores that store the data that is a target of the query.
  3. The data processing system of claim 2, wherein the intelligent query API provides a context used by the query to retrieve the geographical location information for the query caller and a target tenant.
  4. The data processing system of claim 2, wherein identifying the one or more components includes determining if the one or more components are subject to a regional restriction based on the policy.
  5. The data processing system of claim 1, wherein the policy is expressed in a policy predicate produced in code which defines which transfer operations are permitted on data items given varying data classification labels assigned to the data items.
  6. The data processing system of claim 1, wherein the intelligent query API is an interface that communicates with the federated graph data storage system via the data processing system to respond to queries that relate to interdependence of the data entities in the federated graph data storage system.
  7. The data processing system of claim 1, wherein the federated graph data storage system is subject to a plurality of policies, including one or more regional restrictions on transfer of the data between different geographical regions.
  8. A method for implementing policy-based compliance enforcement in a federated graph data environment, comprising: receiving, by a query processing system, a query for data stored in a federated graph data storage system coupled to the query processing system, the query being received from a query caller via an intelligent query Application Programming Interface (API), the query including one or more components and the federated graph data storage system including a plurality of data stores, at least some of the plurality of data stores being disconnected; mapping, by a mapping engine, each of the one or more components of the query to data entities of the federated graph data storage system using a schema, the schema expressing connections between the data entities of the federated graph data storage system and the schema including data classification labels for one or more of the data entities of the federated graph data storage system; examining, by a detection engine, each of the one or more components of the query, to determine an identification of at least one of the one or more components to which a policy associated with the data stored in the federated graph data storage system applies, the identification being done at least based on the data classification labels associated with the data entities mapped to the one or more components; generating, by a planning engine, a query plan, wherein the query plan provides information on how the query is to be executed in the federated graph data storage system; selecting, by an enforcement engine, a transform mode from among a plurality of policy enforcement modes responsive to the identification; transforming, by the enforcement engine, the query plan to comply with the policy including at least one of pruning the data entities subject to the policy or implementing a declassifying function in the query plan that transforms the data entities to which the policy applies in a manner that the data classification labels of the data entities are changed to a lower level classification label that complies with the policy; and executing, by a graph query execution engine, the query plan to retrieve results compliant with the policy and responsive to the query.
  9. The method of claim 8, wherein upon selecting an observe mode, the query plan and a report that details violation of the policy are generated.
  10. The method of claim 8, wherein upon selecting a restrict mode: either the query plan is not generated, or execution of the query plan is prevented; and a report that details violation of the policy is generated.
  11. The method of claim 8, wherein the intelligent query API is an interface that communicates with the federated graph data storage system via the data processing system to respond to queries that relate to interdependence of the data entities in the federated graph data storage system.
  12. The method of claim 8, wherein pruning the query plan includes removing the one or more components that are subject to the policy from the query plan.
  13. A non-transitory computer readable medium on which are stored instructions that when executed cause a programmable device to perform functions of: receiving, via a query processing system, a query for data stored in a federated graph data storage system coupled to the query processing system, the query being received from a query caller via an intelligent query Application Programming Interface (API), the query including one or more components and the federated graph data storage system including a plurality of data stores, at least some of the plurality of data stores being disconnected; mapping, by a mapping engine, each of the one or more components of the query to data entities of the federated graph data storage system using a schema, the schema expressing connections between the data entities of the federated graph data storage system and the schema including data classification labels for one or more of the data entities of the federated graph data storage system; examining, by a detection engine, each of the one or more components of the query, to determine an identification of at least one of the one or more components to which a policy associated with the data stored in the federated graph data storage system applies, the identification being done at least based on the data classification labels associated with the data entities mapped to the one or more components; generating, by a planning engine, a query plan, wherein the query plan provides information on how the query is to be executed in the federated graph data storage system; selecting, by an enforcement engine, a transform mode from among a plurality of policy enforcement modes responsive to the identification; transforming, by the enforcement engine, the query plan to comply with the policy including at least one of pruning the data entities subject to the policy or implementing a declassifying function in the query plan that transforms the data entities to which the policy applies in a manner that the data classification labels of the data entities are changed to a lower level classification label that complies with the policy; and executing, by a graph query execution engine, the query plan to retrieve results compliant with the policy and responsive to the query.
  14. The non-transitory computer readable medium of claim 13, wherein identifying the one or more components includes determining if the one or more components are subject to a regional restriction based on the policy.
  15. The non-transitory computer readable medium of claim 13, wherein the policy is expressed in a policy predicate produced in code which defines which transfer operations are permitted on data items given varying data classification labels assigned to the data items.
  16. The non-transitory computer readable medium of claim 13, wherein the intelligent query API is an interface that communicates with the federated graph data storage system via the data processing system to respond to queries that relate to interdependence of the data entities in the federated graph data storage system.
  17. The non-transitory computer readable medium of claim 13, wherein pruning the query plan includes removing the one or more components that are subject to the policy from the query plan.
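The three enforcement modes named in claims 1, 9, and 10, together with the pruning and declassifying transforms of claims 1 and 12, can be sketched as follows. This is an assumption-laden illustration, not the claimed system: the plan is modeled as a flat list of hypothetical `Step` records, the `enforce` function, the `declassifiable` set, and the `"hash"` declassifier name are invented, and restrict mode is modeled as withholding the plan (returning `None`) rather than blocking its execution later.

```python
from dataclasses import dataclass, replace
from enum import Enum, IntEnum
from typing import List, Optional


class Label(IntEnum):
    """Hypothetical classification labels, ordered by sensitivity."""
    PUBLIC = 0
    GENERAL = 1
    CONFIDENTIAL = 2


class Mode(Enum):
    OBSERVE = "observe"      # keep the plan, report the violation
    RESTRICT = "restrict"    # withhold the plan, report the violation
    TRANSFORM = "transform"  # rewrite the plan so that it complies


@dataclass(frozen=True)
class Step:
    """One hypothetical query plan step that reads a single component."""
    component: str
    label: Label
    declassifier: Optional[str] = None  # hypothetical declassifying function name


@dataclass
class Outcome:
    plan: Optional[List[Step]]  # None means the plan must not be executed
    report: List[str]


def enforce(plan, violating, mode, declassifiable=frozenset(),
            compliant_label=Label.GENERAL):
    """Apply the selected enforcement mode to a plan.

    `violating` is the set of components flagged by detection; `declassifiable`
    is the subset a declassifying function can lower to `compliant_label`.
    Every other flagged component is pruned in transform mode.
    """
    report = [f"policy violation: {s.component}"
              for s in plan if s.component in violating]
    if mode is Mode.OBSERVE:
        return Outcome(plan, report)
    if mode is Mode.RESTRICT:
        return Outcome(None if report else plan, report)
    transformed = []
    for step in plan:
        if step.component not in violating:
            transformed.append(step)
        elif step.component in declassifiable:
            # Wrap the step in a declassifying function that lowers its label.
            transformed.append(replace(step, label=compliant_label,
                                       declassifier="hash"))
        # Any other flagged step is pruned (simply not appended).
    return Outcome(transformed, [])
```

For example, with a plan that reads `user.displayName`, `user.email`, and `user.mailbox`, flagging the last two and marking only `user.email` as declassifiable yields a transformed plan that keeps the display name, keeps the email step behind the declassifier at `GENERAL`, and prunes the mailbox step. Observe mode returns the original plan plus two report entries; restrict mode returns no plan.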

Description

BACKGROUND

Large software companies often use a graph data storage infrastructure to store various types of data for and/or about their customers. Graph data may be stored across many different physical storage systems that are often connected only semantically, and the graph environment provides a common data infrastructure through which many different types of data can be retrieved. The stored data may include confidential, private, or otherwise sensitive customer data and may reside in a variety of geographical locations. Customers and services access the graph data by submitting queries to the graph storage environment, often through a graph query application programming interface (API). Upon receiving a graph query through the API, a graph query execution framework determines what data is requested, identifies which of the physical storage systems are candidates for retrieving that data, and then generates an optimized query plan that takes constraints such as cost, latency, and reliability into consideration. However, when a query requires access to data that is stored in different storage systems and/or is subject to different policies and regulations, determining how the data in the various storage systems is related, and whether access to that data should be granted, is a complex and resource-intensive process. Hence, there is a need for improved systems and methods of ensuring compliance with policies in a federated graph environment.
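The background states that the execution framework weighs constraints such as cost, latency, and reliability when choosing among candidate storage systems, but it does not say how those constraints are combined. The sketch below assumes a weighted sum with illustrative weights; `Candidate`, `score`, and `plan_query` are hypothetical names, and the planner simply picks one store per requested item independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate storage system for one requested data item."""
    store: str
    cost: float         # relative cost units (assumed)
    latency_ms: float   # expected latency in milliseconds
    reliability: float  # probability of success, in [0, 1]


def score(c, w_cost=1.0, w_latency=0.01, w_unreliability=100.0):
    """Weighted-sum score; lower is better. The weights are illustrative assumptions."""
    return (w_cost * c.cost
            + w_latency * c.latency_ms
            + w_unreliability * (1.0 - c.reliability))


def plan_query(requested, candidates):
    """Choose the best-scoring candidate store for each requested data item."""
    return {item: min(candidates[item], key=score).store for item in requested}
```

With these weights, a primary store at cost 2.0, 40 ms, and 99.9% reliability scores about 2.5 and is chosen over a cheaper replica at cost 1.0, 120 ms, and 99% reliability, which scores about 3.2.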
SUMMARY

In one general aspect, the instant disclosure describes a data processing system having a processor and a memory in communication with the processor, where the memory comprises executable instructions that, when executed by the processor, cause the data processing system to perform multiple functions. These functions include receiving a query for data stored in a federated graph data storage system, where the query is received from a query caller via an intelligent query Application Programming Interface (API), the query includes one or more components, and the federated graph data storage system includes a plurality of data stores, at least some of which are disconnected. The functions also include mapping each of the components of the query to one or more data entities of the graph data storage system using a schema, the schema expressing connections between data entities of the graph data storage system and including data classification labels for the data entities. Each of the components of the query is then examined, via a detection engine, to identify a component to which a policy associated with data stored in the graph data storage system applies, the identification being done at least based on the data classification labels associated with the data entities mapped to the components. Upon identifying the component, a selection is made between multiple policy enforcement modes, which include at least one of reporting violation of the policy, denying query plans in violation of the policy, and generating a transformed query plan that complies with the policy. Depending upon the selection, at least one of the following is performed: generating the query plan and a report that details violation of the policy; preventing execution of the query plan and generating the report that details violation of the policy; or generating the transformed query plan.
The query plan, the transformed query plan, or the report is then provided as an output. Generating the transformed query plan includes at least one of pruning the data entity subject to the policy from the query plan and implementing a declassifying function in the query plan that transforms the data entity to which the policy applies such that its classification label changes to a lower-level classification label that complies with the policy. In another general aspect, the instant disclosure describes a method for implementing policy-based compliance enforcement in a federated graph data environment. The method includes receiving a query for data stored in a federated graph data storage system, the query being received from a query caller via an intelligent query API, the query including one or more components, and the federated graph data storage system including a plurality of data stores, at least some of the plurality of data stores being disconnected. Each of the components of the query is then mapped to one or more data entities of the graph data storage system using a schema, the schema expressing connections between data entities of the graph data storage system and including data classification labels for one or more of the data entities of the graph data storage system. Each of the components of the query ar