US-12626006-B1 - Dynamic privacy preserving data linkage in distributed computing systems
Abstract
Devices and techniques are generally described for dynamic privacy preserving data linkages. In some examples, a first data structure may store non-personal data and a second data structure may store personally identifiable information (PII). The second data structure may store first identifier data identifying first PII. A first computing service may generate first transient identifier data associated with a first time-to-live (TTL) value. The first computing service may store the first transient identifier data and the first TTL value in a third data structure in association with the first identifier data. The first computing service may send the first transient identifier data to a first computer processing system. In some examples, the first computer processing system may not be privileged to directly access the second data structure.
Inventors
- Kevan Ahlquist
- Andrew Tyler Compton
- Michael Curtis Lindahl
- Sandeep Kumar Proddaturi
- Manish Jyoti
- Srinivas R. Mudireddy
- Sergey Slovetskiy
Assignees
- AMAZON TECHNOLOGIES, INC.
Dates
- Publication Date: 20260512
- Application Date: 20230620
Claims (20)
- 1. A computer-implemented method comprising: identifying a first data structure storing non-personal data; identifying a second data structure storing personally identifiable information (PII), wherein the second data structure comprises first identifier data identifying first PII and second identifier data identifying second PII; generating, by a mapping authority, first transient identifier data associated with a first time-to-live (TTL) value indicating an expiration of the first transient identifier data; storing, by the mapping authority, the first transient identifier data and the first TTL value in a relational database in association with the first identifier data, wherein the first transient identifier data is transient as linking data with respect to the first identifier data; and sending, by the mapping authority, the first transient identifier data to a first computer processing system, wherein the first computer processing system is not privileged to directly access the second data structure.
- 2. The computer-implemented method of claim 1, further comprising: receiving, by the mapping authority from the first computer processing system, a request for PII, the request comprising the first transient identifier data; determining, by the mapping authority, that the request is received prior to expiration of the first TTL value; determining, by the mapping authority based at least in part on the request being received prior to expiration of the first TTL value, the first identifier data; sending, by the mapping authority to a second computing system that controls access to the second data structure, the first identifier data; receiving, by the mapping authority from the second computing system, the first PII; and sending, by the mapping authority to the first computer processing system, the first PII.
- 3. The computer-implemented method of claim 1, further comprising: receiving, by the mapping authority from the first computer processing system, a request for PII, the request comprising the first transient identifier data; determining, by the mapping authority, that the request is received after expiration of the first TTL value; and sending, by the mapping authority, a response to the first computer processing system, wherein the response denies access to the PII.
- 4. The computer-implemented method of claim 1, further comprising: receiving, by the mapping authority from a second computing system that controls access to the second data structure, a first deletion request, the first deletion request comprising the first identifier data and indicating that PII associated with the first identifier data has been deleted; and deleting, by the mapping authority in response to the first deletion request, an entry in the relational database that comprises the first transient identifier data, the first TTL value, and the first identifier data.
- 5. A method comprising: identifying a first data structure storing non-personal data; identifying a second data structure storing personally identifiable information (PII), wherein the second data structure comprises first identifier data identifying first PII and second identifier data identifying second PII; generating, by a first computing service, first transient identifier data associated with a first time-to-live (TTL) value indicating an expiration of the first transient identifier data; storing, by the first computing service, the first transient identifier data and the first TTL value in a third data structure in association with the first identifier data wherein the first transient identifier data is transient as linking data with respect to the first identifier data; and sending, by the first computing service, the first transient identifier data to a first computer processing system, wherein the first computer processing system is not privileged to directly access the second data structure.
- 6. The method of claim 5, further comprising: receiving, by the first computing service from the first computer processing system, a request for PII, the request comprising the first transient identifier data; determining that the request is received prior to expiration of the first TTL value; determining, based at least in part on the request being received prior to expiration of the first TTL value, the first identifier data; sending, by the first computing service to a second computing system that controls access to the second data structure, the first identifier data; receiving, by the first computing service from the second computing system, the first PII; and sending, by the first computing service to the first computer processing system, the first PII.
- 7. The method of claim 5, further comprising: receiving, by the first computing service from the first computer processing system, a request for PII, the request comprising the first transient identifier data; determining, by the first computing service, that the request is received after expiration of the first TTL value; and sending, by the first computing service, a response to the first computer processing system, wherein the response denies access to the PII.
- 8. The method of claim 5, further comprising: determining, by the first computing service, that the first TTL has expired; and deleting, by the first computing service, the first transient identifier data and the first identifier data from the third data structure.
- 9. The method of claim 5, further comprising: receiving, by the first computing service from a second computing system that controls access to the second data structure, a first deletion request, the first deletion request comprising the first identifier data and indicating that PII associated with the first identifier data has been deleted from the second data structure; and deleting, by the first computing service in response to the first deletion request, the first transient identifier data and the first identifier data from the third data structure.
- 10. The method of claim 5, further comprising: receiving, by the first computing service from a second computing system that controls access to the second data structure, first data comprising a first notification that third PII has been stored in the second data structure and third identifier data that identifies the third PII; determining a retention policy associated with the third PII; storing, by the first computing service, second transient identifier data in the third data structure in association with the third identifier data; determining, by the first computing service, a second TTL value corresponding to the retention policy; and storing, by the first computing service, the second TTL value in the third data structure in association with the third identifier data and the second transient identifier data.
- 11. The method of claim 5, wherein the first computing service is effective to synchronize a data retention and deletion policy across a distributed computing architecture.
- 12. The method of claim 5, further comprising: receiving, by the first computing service from the first computer processing system, a request for PII, the request comprising the first transient identifier data and first policy tag data; determining that the request is received prior to expiration of the first TTL value; determining a first computer-implemented policy corresponding to the first policy tag data; determining that the request complies with the first computer-implemented policy; determining, based at least in part on the request complying with the first computer-implemented policy, the first identifier data; sending, by the first computing service to a second computing system that controls access to the second data structure, the first identifier data; receiving, by the first computing service from the second computing system, the first PII; and sending, by the first computing service to the first computer processing system, the first PII.
- 13. A system comprising: at least one processor; and non-transitory computer-readable memory storing instructions that, when executed by the at least one processor, are effective to cause the at least one processor to: identify a first data structure storing non-personal data; identify a second data structure storing personally identifiable information (PII), wherein the second data structure comprises first identifier data identifying first PII and second identifier data identifying second PII; generate first transient identifier data associated with a first time-to-live (TTL) value indicating an expiration of the first transient identifier data; store the first transient identifier data and the first TTL value in a third data structure in association with the first identifier data, wherein the first transient identifier data is transient as linking data with respect to the first identifier data; and send the first transient identifier data to a first computer processing system, wherein the first computer processing system is not privileged to directly access the second data structure.
- 14. The system of claim 13, the non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to cause the at least one processor to: receive, from the first computer processing system, a request for PII, the request comprising the first transient identifier data; determine that the request is received prior to expiration of the first TTL value; determine, based at least in part on the request being received prior to expiration of the first TTL value, the first identifier data; send, to a second computing system that controls access to the second data structure, the first identifier data; receive, from the second computing system, the first PII; and send, to the first computer processing system, the first PII.
- 15. The system of claim 13, the non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to cause the at least one processor to: receive, from the first computer processing system, a request for PII, the request comprising the first transient identifier data; determine that the request is received after expiration of the first TTL value; and send a response to the first computer processing system, wherein the response denies access to the PII.
- 16. The system of claim 13, the non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to cause the at least one processor to: determine that the first TTL has expired; and delete the first transient identifier data and the first identifier data from the third data structure.
- 17. The system of claim 13, the non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to cause the at least one processor to: receive, from a second computing system that controls access to the second data structure, a first deletion request, the first deletion request comprising the first identifier data and indicating that PII associated with the first identifier data has been deleted from the second data structure; and delete, in response to the first deletion request, the first transient identifier data and the first identifier data from the third data structure.
- 18. The system of claim 13, the non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to cause the at least one processor to: receive, from a second computing system that controls access to the second data structure, first data comprising a first notification that third PII has been stored in the second data structure and third identifier data that identifies the third PII; determine a retention policy associated with the third PII; store second transient identifier data in the third data structure in association with the third identifier data; determine a second TTL value corresponding to the retention policy; and store the second TTL value in the third data structure in association with the third identifier data and the second transient identifier data.
- 19. The system of claim 18, the non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to synchronize a data retention and deletion policy across a distributed computing architecture.
- 20. The system of claim 13, the non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to cause the at least one processor to: receive, from the first computer processing system, a request for PII, the request comprising the first transient identifier data and first policy tag data; determine that the request is received prior to expiration of the first TTL value; determine a first computer-implemented policy corresponding to the first policy tag data; determine that the request complies with the first computer-implemented policy; determine, based at least in part on the request complying with the first computer-implemented policy, the first identifier data; send, to a second computing system that controls access to the second data structure, the first identifier data; receive, from the second computing system, the first PII; and send, to the first computer processing system, the first PII.
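The flow recited in the claims above (issuing a transient identifier with a TTL, resolving it before expiration, denying it afterward, and purging mappings on a deletion request) may be illustrated by the following minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the class name `MappingAuthority`, its method names, the in-memory dictionary standing in for the relational database or third data structure, and the `fetch_pii` callback standing in for the second computing system are all hypothetical.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Mapping:
    identifier: str    # key into the PII store (the "first identifier data")
    expires_at: float  # absolute expiry time derived from the TTL value


class MappingAuthority:
    """Hypothetical mapping authority; a dict stands in for the mapping table."""

    def __init__(self,
                 fetch_pii: Callable[[str], Optional[str]],
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch_pii = fetch_pii  # assumed stand-in for the PII-controlling system
        self._clock = clock          # injectable so expiry can be tested
        self._table: Dict[str, Mapping] = {}

    def issue(self, identifier: str, ttl_seconds: float) -> str:
        """Generate a random transient identifier and store it with its expiry."""
        transient_id = secrets.token_urlsafe(16)
        self._table[transient_id] = Mapping(identifier, self._clock() + ttl_seconds)
        return transient_id

    def resolve(self, transient_id: str) -> Optional[str]:
        """Return PII if the request arrives before expiry; otherwise deny (None)."""
        entry = self._table.get(transient_id)
        if entry is None:
            return None
        if self._clock() >= entry.expires_at:
            # Expired: remove the linkage so it can no longer be followed.
            del self._table[transient_id]
            return None
        return self._fetch_pii(entry.identifier)

    def on_pii_deleted(self, identifier: str) -> int:
        """Handle a deletion request by purging every mapping for the identifier."""
        stale = [t for t, m in self._table.items() if m.identifier == identifier]
        for t in stale:
            del self._table[t]
        return len(stale)
```

In this sketch the requesting system only ever holds the transient identifier, so once the entry expires or is purged, the link between its records and the PII store can no longer be resolved. A policy-tag check such as the one recited in claims 12 and 20 could be added as a further condition in `resolve`, and a TTL derived from a retention policy could be passed as `ttl_seconds`.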
Description
BACKGROUND
In order to comply with various government regulations and best practices, stewards of data are required to maintain strict control over the usage, distribution, handling, and retention of personal data related to individuals. In various examples, this includes instituting capabilities to retrieve and present all personal data on demand, delete all personal data on demand, and adhere to complicated time- and rules-based retention and deletion schedules for personal data.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a diagram illustrating an example of unlinked non-personal data and personal data in two example data structures, in accordance with various aspects of the present disclosure.
FIG. 2 is a diagram illustrating data linking between personal and non-personal data through a linked dataset.
FIG. 3 is a diagram illustrating an example of the de-linking of personal and non-personal data.
FIG. 4 is a diagram illustrating de-linking and anonymization of records of non-personal and personal data.
FIG. 5 is a block diagram illustrating an example system for dynamically generating privacy-preserving data linkages in distributed computing systems, in accordance with various aspects of the present disclosure.
FIG. 6 illustrates an example of time-dependent transient identifiers that may be used in accordance with various aspects of the present disclosure.
FIG. 7 is a timing diagram illustrating generation of a privacy-preserving data linkage for personal data, in accordance with various aspects of the present disclosure.
FIG. 8 is another timing diagram illustrating generation of a mapping between transient identifier data and non-personal data, in accordance with various aspects of the present disclosure.
FIG. 9 is another timing diagram illustrating a retrieval of personal data using transient identifier data, in accordance with various aspects of the present disclosure.
FIG. 10 is another timing diagram illustrating retrieval of personal data using transient identifier data and usage-based access control data, in accordance with various aspects of the present disclosure.
FIG. 11 depicts another example of unlinked non-personal data and personal data in two example data structures, in accordance with various aspects of the present disclosure.
FIGS. 12A-12B depict linking of non-personal data and personal data using transient identifier data, in accordance with various aspects of the present disclosure.
FIGS. 13A-13B depict de-linking of non-personal data and personal data based on expiration of transient identifier data, in accordance with various aspects of the present disclosure.
FIGS. 14A-14B depict full anonymization after user personal data deletion, in accordance with various aspects of the present disclosure.
FIG. 15 depicts a state diagram illustrating an example life cycle for a personal information record, in accordance with various aspects of the present disclosure.
FIG. 16 depicts an example encoding structure for a privacy-preserving deletion and retention policy, in accordance with various aspects of the present disclosure.
FIG. 17 depicts example mapping and access control processing based on a privacy deletion and retention policy, in accordance with various aspects of the present disclosure.
FIG. 18 depicts example privacy events processing based on a privacy deletion and retention policy, in accordance with various aspects of the present disclosure.
FIG. 19 depicts a centralized dynamic privacy preserving system for a distributed computing system, in accordance with various aspects of the present disclosure.
FIG. 20 is a block diagram showing an example architecture of a computing device that may be used in accordance with various embodiments described herein.
DETAILED DESCRIPTION
In the following description, reference is made to the accompanying drawings that illustrate several examples of the present invention. It is understood that other examples may be utilized and various operational changes may be made without departing from the scope of the present disclosure. The following detailed description is not to be taken in a limiting sense, and the scope of the embodiments of the present invention is defined only by the claims of the issued patent.
Storage and/or use of data related to a particular person or entity (e.g., personally identifiable information) may be required to comply with regulations, privacy policies, and/or legal requirements of the relevant jurisdictions. In many cases, users may be provided with the option of opting out of storage and/or usage of personal data and/or may select particular types of personal data that may be stored while preventing aggregation and storage of other types of personal data. Additionally, aggregation, storage, and/or use of personal data may be compliant with privacy controls, even if not legally subject to them. For example, storage and/or use of personal data may be subject to acts and regulations, such as the Health Insurance Portab