EP-4004790-B1 - MULTI-COUNTRY DATA PIPELINE THAT PROTECTS PERSONALLY IDENTIFYING INFORMATION


Inventors

WOESSNER, Leo
DEYOUNG, Jeffrey
SAXENA, Ritu
REIMERS, Chadwick

Dates

Publication Date: 2026-05-06
Application Date: 2020-07-23

Claims (15)

1. A multi-country data pipeline (300) configured to protect Personally Identifying Information, PII, for each user in a plurality of users, comprising:
physically located in a first country (401):
  a first application (406) configured to: receive entered data (700) from a user (400), wherein the entered data and the user are physically located in the first country (401) and the entered data comprises non-personal data (800) and the PII (701), receive a schema (404), from a third country (403), containing a privacy policy for the first country (401), identify the non-personal data (800) and the PII (701) in the entered data (700) using the schema (404), and transmit anonymized data (411) through the multi-country data pipeline (300) from the first country (401) to an analytics function (409) in a second country (402),
  a deidentification system (704) configured to: generate, using a one-way hash (702), an Identification, ID, tag (703) for the PII (701) in the entered data (700), and create the anonymized data (411) by replacing the PII (701), in the entered data (700), with the ID tag (703) for the PII,
  an identity data store (410) configured to: store the ID tag (703) and the PII (701) in a first database, and return the PII (701) associated with the ID tag (703) when the ID tag (703) is received,
  a second application (407) configured to: upon authenticating the user (400), transmit the ID tag (703) to a reidentification system (706), combine the PII (701) received from the reidentification system (706) with anonymized results (412) to create identified results (705), and perform an action for the user based on the identified results (705),
  the reidentification system (706) configured to: receive the ID tag (703) from the second application (407), transmit the ID tag (703) to the identity data store (410), receive from the identity data store (410) the PII (701) associated with the ID tag (703), and transmit the PII (701) to the second application (407);
physically located in the second country (402):
  the analytics function (409) configured to: generate results based on the anonymized data (411), create the anonymized results (412) by adding the ID tag (703) to the results, wherein the anonymized results contain no PII (701), and transmit the anonymized results (412) through the multi-country data pipeline (300) from the second country (402) to the second application (407) in the first country (401); and
physically located in the third country (403):
  a PII Schema Service (413) comprising a plurality of schemas (405) stored in a second database, wherein each schema in the plurality of schemas identifies a privacy policy (408) for a different country or region.
2. The multi-country data pipeline (300) of claim 1, wherein the schema (404) identifies a plurality of PII fields in the entered data (700) based on the privacy policy (408) for the first country (401).
3. The multi-country data pipeline (300) of claim 1, wherein the first country (401), the second country (402) and the third country (403) are three different countries.
4. The multi-country data pipeline (300) of claim 1, wherein the PII (701) entered by the user (400) never leaves the first country (401).
5. The multi-country data pipeline (300) of claim 1, wherein the first application (406) is a different application from the second application (407).
6. The multi-country data pipeline (300) of claim 1, wherein the first application (406) is the same application as the second application (407).
7. The multi-country data pipeline (300) of claim 1, wherein the plurality of schemas (405) includes the schema (404) and the plurality of schemas are all stored in the third country (403).
8. The multi-country data pipeline (300) of claim 1, wherein the multi-country data pipeline further comprises: a plurality of publisher methods (301), wherein the plurality of publisher methods comprises a Java published software development kit (304) and a REST API (306), a data ingestion unit (302) configured to: i) receive and archive data from the plurality of publisher methods (301), ii) tag the data with a producer, a message-type, a version and a timestamp (309), and iii) validate the data is in conformance with a schema containing a privacy policy for a first country, and a web services unit (303) configured to provide the data to a plurality of different consumer services.
9. A method for a multi-country data pipeline (300) configured to protect Personally Identifying Information, PII, for each user in a plurality of users, the method comprising:
receiving, by a first application (406) physically located in a first country (401), entered data (700) from a user (400), wherein the entered data and the user are physically located in the first country and the entered data comprises non-personal data (800) and PII (701);
storing, in a second database physically located in a third country (403), a PII Schema Service (413) comprising a plurality of schemas (405), wherein each schema in the plurality of schemas identifies a privacy policy (408) for a different country or region;
receiving, by the first application (406), a schema (404), from the third country (403), containing a privacy policy (408) for the first country (401);
identifying, by the first application (406), the non-personal data (800) and the PII (701) in the entered data (700) using the schema (404);
generating, by a deidentification system (704) physically located in the first country (401), using a one-way hash (702), an Identification, ID, tag (703) for the PII (701) in the entered data (700);
storing, by an identity data store (410) physically located in the first country (401), the ID tag (703) and the PII (701) in a first database;
creating, by the deidentification system (704), anonymized data (411) by replacing the PII (701), in the entered data (700), with the ID tag (703) for the PII;
transmitting, by the first application (406), the anonymized data (411) through the multi-country data pipeline (300) from the first country (401) to an analytics function (409) in a second country (402);
generating, by the analytics function (409) physically located in the second country (402), results based on the anonymized data (411);
creating, by the analytics function (409), anonymized results (412) by adding the ID tag (703) to the results, wherein the anonymized results contain no PII (701);
transmitting, by the analytics function, the anonymized results (412) through the multi-country data pipeline (300) from the second country (402) to a second application (407) physically located in the first country (401); and
upon authenticating the user (400), transmitting, by the second application (407) physically located in the first country (401), the ID tag (703) to a reidentification system (706) physically located in the first country (401);
receiving, by the reidentification system (706), the ID tag (703) from the second application (407);
transmitting, by the reidentification system (706), the ID tag (703) to the identity data store (410);
returning, by the identity data store (410), the PII (701) associated with the ID tag (703) when the ID tag (703) is received;
receiving, by the reidentification system (706), from the identity data store (410) the PII (701) associated with the ID tag (703);
transmitting, by the reidentification system (706), the PII (701) to the second application (407);
combining, by the second application (407), the PII (701) received from the reidentification system (706) with anonymized results (412) to create identified results (705); and
performing, by the second application (407), an action for the user based on the identified results (705).
10. The method of claim 9, wherein the schema (404) identifies a plurality of PII fields in the entered data (700) based on the privacy policy (408) for the first country (401).
11. The method of claim 9, wherein the first country (401), the second country (402) and the third country (403) are three different countries.
12. The method of claim 9, wherein the PII (701) entered by the user (400) never leaves the first country (401).
13. The method of claim 9, wherein the first application (406) is a different application from the second application (407).
14. The method of claim 9, wherein the first application (406) is the same application as the second application (407).
15. The method of claim 9, wherein the plurality of schemas (405) includes the schema (404) and the plurality of schemas are all stored in the third country (403).
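The data flow recited in claims 1 and 9 can be sketched as a single-process Python example. This is a minimal illustration under stated assumptions, not the patented implementation: SCHEMA_SERVICE, HASH_KEY, IdentityDataStore, identify, deidentify, analytics, reidentify and second_application are hypothetical names; the country keys, the course and grade fields, and the pass threshold are illustrative; an in-memory dict stands in for the first database; and the "one-way hash (702)" is assumed to be keyed HMAC-SHA256, which the claims do not specify. Country boundaries are not modeled; each function stands for the component the claims place in that country.

```python
import hashlib
import hmac
import json

# Hypothetical stand-in for the PII Schema Service (413) in the third country:
# one schema per country or region, each listing the fields its policy treats as PII.
SCHEMA_SERVICE = {
    "country-1": {"pii_fields": {"name", "email"}},
    "region-2": {"pii_fields": {"name", "email", "postal_code"}},
}

# Assumption: a secret key held only in the first country. The claims require a
# "one-way hash"; keyed HMAC-SHA256 is one possible choice, not the patent's.
HASH_KEY = b"kept-in-first-country"


class IdentityDataStore:
    """First-country store (410): maps each ID tag to the PII it replaced."""

    def __init__(self):
        self._db = {}

    def put(self, id_tag, pii):
        self._db[id_tag] = pii

    def get(self, id_tag):
        return self._db[id_tag]


def identify(entered_data, schema):
    """Split entered data into (PII, non-personal data) using the schema."""
    pii_fields = schema["pii_fields"]
    pii = {k: v for k, v in entered_data.items() if k in pii_fields}
    non_personal = {k: v for k, v in entered_data.items() if k not in pii_fields}
    return pii, non_personal


def deidentify(entered_data, schema, store):
    """Replace the PII with an ID tag derived from it by a keyed one-way hash."""
    pii, non_personal = identify(entered_data, schema)
    canonical = json.dumps(pii, sort_keys=True).encode("utf-8")
    id_tag = hmac.new(HASH_KEY, canonical, hashlib.sha256).hexdigest()
    store.put(id_tag, pii)
    return {**non_personal, "id_tag": id_tag}


def analytics(anonymized):
    """Second-country analytics: sees non-personal data and the ID tag only."""
    results = {"course": anonymized["course"], "passed": anonymized["grade"] >= 60}
    return {**results, "id_tag": anonymized["id_tag"]}


def reidentify(id_tag, store):
    """First-country reidentification system (706): ID tag in, PII out."""
    return store.get(id_tag)


def second_application(anonymized_results, store, user_authenticated):
    """Merge PII back into the results, but only for an authenticated user."""
    if not user_authenticated:
        raise PermissionError("authenticate the user before reidentification")
    pii = reidentify(anonymized_results["id_tag"], store)
    results = {k: v for k, v in anonymized_results.items() if k != "id_tag"}
    return {**results, **pii}


if __name__ == "__main__":
    store = IdentityDataStore()
    schema = SCHEMA_SERVICE["country-1"]  # fetched from the third country
    entered = {"name": "Ada", "email": "ada@example.com", "course": "MATH101", "grade": 92}
    anonymized = deidentify(entered, schema, store)
    assert schema["pii_fields"].isdisjoint(anonymized)  # no PII field leaves
    identified = second_application(analytics(anonymized), store, user_authenticated=True)
    print(identified)
```

Because the tag is deterministic, the same PII always yields the same ID tag, so second-country results can be matched back to the user in the first country without the PII itself being transmitted. Keying the hash is a design choice in this sketch: an unkeyed hash of low-entropy values such as e-mail addresses can be reversed by hashing candidate values and comparing.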

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Application No. 16/522,511, filed July 25, 2019.

FIELD OF THE INVENTION

This disclosure relates to a multi-country message streaming platform built on a validated data pipeline, whereby personally identifying information (PII) never leaves the country of origin.

BACKGROUND

US 2015/0150144 A1 describes a privacy server for protecting personally identifiable information by substituting a token or an identifier for the private information. The privacy server recognizes that a communication includes private information and intercepts the communication. The privacy server replaces the private information with a random or pseudo-random token or identifier. The privacy server maintains the private information in a local database and associates the private information for a particular person with the token or identifier for that person.

US 2016/0147945 A1 describes a system and method for providing a secure check of patient records.

SUMMARY OF THE INVENTION

The invention is set out in the appended set of claims. The present invention provides systems and methods comprising one or more server hardware computing devices or client hardware computing devices, communicatively coupled to a network, and each comprising at least one processor executing specific computer-executable instructions within a memory.
A message streaming platform comprises a plurality of publisher methods, wherein the plurality of publisher methods comprises a Java published software development kit and a REST API; a data ingestion unit configured to i) receive and archive data from the plurality of publisher methods, ii) tag the data with a producer, message-type, version and timestamp, iii) validate that the data is in conformance with a predetermined schema, and iv) tag the data with an error message if the data is not in conformance with the predetermined schema; and a web services unit configured to provide the data to a plurality of different consumer services.

The invention, hereafter referred to as the message streaming platform, is an enterprise message streaming platform built around a validated data pipeline. The message streaming platform may serve as a data backbone for any corporation that needs to receive, store and/or use data. The client-side and producer-side software development kits may enable messages to be published and routed to private queues based on message type (for example, a student joining a course, or the final course grade for a student).

The invention preferably has one or more of the following capabilities: it creates a common service for publishing and conveying user activity and business events; supports loose coupling between Producers and Consumers; hides the underlying infrastructure from Producers and Consumers; provides a low barrier to adoption; is performant, highly scalable, highly available, and highly reliable; supports 'at least once' delivery; provides a managed data archive; and backs up and validates conveyed messages using published schemas. Prior systems were difficult to maintain and support, were unable to scale, and often had stability issues. In contrast, the invention may reduce individual component complexity, support independent scaling of features, and support deployment flexibility.
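The data ingestion unit described above (tag, validate, tag errors, archive) and routing to private queues by message type can be sketched as follows. This is an illustrative sketch only: SCHEMA_REGISTRY, IngestionUnit, the producer name "lms", and the message types and field types are hypothetical; the Java SDK and REST API publisher methods are not modeled; in-memory lists stand in for the archive and the queues; and archiving after tagging, and routing only messages that pass validation, are assumptions the text does not state.

```python
import time
from collections import defaultdict

# Hypothetical registry: message type -> version -> {field name: expected type}.
SCHEMA_REGISTRY = {
    "course.joined": {1: {"student": str, "course": str}},
    "course.final_grade": {1: {"student": str, "course": str, "grade": int}},
}


class IngestionUnit:
    """Illustrative data ingestion unit: tag, validate, archive, route."""

    def __init__(self, clock=time.time):
        self.archive = []                # every received message, valid or not
        self.queues = defaultdict(list)  # one private queue per message type
        self._clock = clock

    def ingest(self, producer, message_type, version, payload):
        # Tag the data with producer, message type, version and timestamp.
        envelope = {
            "producer": producer,
            "message_type": message_type,
            "version": version,
            "timestamp": self._clock(),
            "payload": payload,
        }
        # Validate against the registered schema; tag errors instead of dropping.
        error = self._validate(message_type, version, payload)
        if error is not None:
            envelope["error"] = error
        self.archive.append(envelope)
        # Assumption: only conforming messages are routed to consumer queues.
        if error is None:
            self.queues[message_type].append(envelope)
        return envelope

    @staticmethod
    def _validate(message_type, version, payload):
        """Return an error message, or None if the payload conforms."""
        schema = SCHEMA_REGISTRY.get(message_type, {}).get(version)
        if schema is None:
            return f"no schema for {message_type} v{version}"
        for field, expected_type in schema.items():
            if field not in payload:
                return f"missing field: {field}"
            if not isinstance(payload[field], expected_type):
                return f"field {field} should be {expected_type.__name__}"
        return None


if __name__ == "__main__":
    unit = IngestionUnit(clock=lambda: 0.0)
    unit.ingest("lms", "course.final_grade", 1,
                {"student": "s-1", "course": "MATH101", "grade": 92})
    bad = unit.ingest("lms", "course.final_grade", 1,
                      {"student": "s-1", "course": "MATH101"})
    print(bad["error"])  # missing field: grade
    print(len(unit.archive), len(unit.queues["course.final_grade"]))  # 2 1
```

Tagging a non-conforming message with an error, rather than discarding it, keeps the archive complete, which fits the text's goal of a managed archive in which every published message can be tracked.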
The invention may be a backbone for various business-critical applications, supporting the exchange of information between systems through messages. The invention may be an enterprise-level data streaming platform that distributes corporate domain state changes and other messages across various producers and consumers. The invention may be designed for performance, scalability, message flow transparency, and guaranteed message delivery. Messages may be archived as well as published and routed to private queues based on message type and routing tags. The invention may be used by many different corporate services that use and/or produce data.

The invention may have the following advantages and features:
Schema registry and promotion: a simple interface to publish new schemas and retrieve existing schemas.
Producer software development kits (SDKs): a streamlined publishing interface and very low latency between internal components.
Consumer SDK: near-real-time data pull from the invention, easy consumer implementation, and rapid message delivery.
Published API: a REST API to publish events and activities, with simple authentication that supports both internal and external systems.
Status API and tracking UI: a REST API to retrieve the status of published events and activities; an easy-to-use API allows customers to efficiently track messages from the time they are published through the time of their archival.
Data storage system: organized data storage in sequence file format, inexpensive long-term storage, archival of all messages, and long-term analytics.
Subscription management of APIs and UI tools.

In addition, the invention may have one or more of th