US-12619777-B2 - Obfuscation of personally identifiable information

Abstract

A method includes defining rules for protecting sensitive data, each of the rules including a reference URL and a reference JSON data mapping to an item of the sensitive data in a JSON payload. A proxy receives, from an application executing on a host computer, a query including a URL for data hosted by a server, and forwards the URL to the server. The proxy receives, from the server, a response to the forwarded URL, the response including a set of values stored in respective mappings. The URL and the mappings in the response are compared to the rules. Upon detecting a match between a given rule and a combination including the URL and a given mapping in the response, the proxy anonymizes the value stored at the given mapping in the response, and forwards the response, including the anonymized value, to the software application.
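The matching step summarized above can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that a rule is a plain (reference URL, JSON data mapping) tuple, that a mapping is written as a tuple of keys with every array position normalized to the hypothetical marker "[]", that URLs are compared by exact equality, and that anonymizing means replacing the value with a fixed mask string. The names iter_mappings and anonymize_response and the example URL are invented for illustration.

```python
import copy

def iter_mappings(node, prefix=()):
    """Yield (mapping, container, key) for every leaf value in a JSON payload.

    Assumption: array positions are normalized to "[]" so that one rule
    covers every element of an array.
    """
    if isinstance(node, dict):
        items = [(k, k) for k in node]
    elif isinstance(node, list):
        items = [(i, "[]") for i in range(len(node))]
    else:
        return
    for key, label in items:
        child = node[key]
        mapping = prefix + (label,)
        if isinstance(child, (dict, list)):
            yield from iter_mappings(child, mapping)
        else:
            yield mapping, node, key

def anonymize_response(url, payload, rules, mask="REDACTED"):
    """Return a copy of payload with every rule-matched value masked.

    A match requires BOTH the URL and the data mapping to equal those of a
    rule, mirroring the "response pair" comparison described in the text.
    """
    result = copy.deepcopy(payload)  # leave the server's response untouched
    for mapping, container, key in iter_mappings(result):
        if (url, mapping) in rules:
            container[key] = mask
    return result

# Usage with made-up data: only the two listed mappings are masked.
rules = {
    ("https://api.example.com/patients/1", ("patient", "name")),
    ("https://api.example.com/patients/1", ("contacts", "[]", "email")),
}
payload = {
    "patient": {"name": "Jane Roe", "id": 7},
    "contacts": [{"email": "a@x.org"}, {"email": "b@x.org"}],
}
safe = anonymize_response("https://api.example.com/patients/1", payload, rules)
```

Keeping the rules in a set of pairs makes the "both parts must match" requirement a single membership test. A real proxy would also need URL normalization and a choice of anonymization operation per rule, neither of which is modeled here.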

Inventors

  • Eran Meshulam Bachar
  • Avraham Levi
  • Ofir Rabanian
  • Ben STERENSON
  • Gonen Tiberg
  • Aaron Bar Hakim
  • Gilad Avidan
  • Yehonatan Ernest Friedman

Assignees

  • Demostack, Inc.

Dates

Publication Date: 2026-05-05
Application Date: 2023-07-13

Claims (19)

  1. A method for processing data, comprising: defining a set of rules for protecting sensitive data, each of the rules comprising a reference Uniform Resource Locator (URL) and a reference JavaScript Object Notation (JSON) data mapping to an item of the sensitive data in a JSON payload; receiving by a proxy, from a software application executing on a host computer, a query comprising a URL for data hosted by a server; forwarding the received URL from the proxy to the server; receiving at the proxy, from the server, a response to the forwarded URL, the response comprising a set of values stored in respective JSON data mappings; comparing the received URL and the JSON data mappings in the response to the set of rules, wherein comparing comprises generating a response pair comprising the received URL and a given response data mapping, and comparing the response pair to the rules to detect matches, wherein a match requires both the received URL matching a rule URL and the given response data mapping matching a rule data mapping; and upon detecting a match: anonymizing, by the proxy, the value stored at the given JSON data mapping in the response, and forwarding the response, including the anonymized value, to the software application.
  2. The method according to claim 1, wherein the sensitive data comprises Personal Identifiable Information (PII).
  3. The method according to claim 2, wherein the PII comprises Protected Health Information (PHI).
  4. The method according to claim 1, wherein the software application comprises a demonstration application for a target application that manages the data on the server.
  5. The method according to claim 1, wherein the query comprises a Hypertext Transfer Protocol (HTTP) request, and wherein the response comprises an HTTP response comprising the JSON payload comprising the values stored in the respective JSON data mappings.
  6. The method according to claim 1, wherein the sensitive data comprises a first dataset, and further comprising storing the updated response to a second dataset, wherein the second dataset comprises a sensitive data-free version of the first dataset.
  7. The method according to claim 6, and further comprising subsequent to storing the updated response to the second dataset, receiving, by the proxy from the software application, an additional query comprising the URL for the data hosted by a server, retrieving the requested data from the second dataset, and conveying, to the software application in response to the additional query, the data retrieved from the second dataset.
  8. The method according to claim 1, wherein the query comprises a production query, wherein the received URL comprises a production URL, wherein the response comprises a production response, wherein the JSON data mapping in the response comprises a production JSON data mapping, wherein the values comprise production values, and wherein defining a given rule comprises conveying, prior to receiving the production query, a reference query comprising a given reference URL, receiving from the server, a reference response to the forwarded given reference URL, the reference response comprising a set of reference values stored in respective reference JSON data mappings, identifying a given reference value comprising sensitive data, and storing the reference URL and the reference JSON data mapping for the identified given reference value to the given rule.
  9. The method according to claim 8, wherein detecting the match comprises detecting a match between the production URL and the given reference URL in the given rule, and detecting a match between the production JSON data mapping in the response and the reference JSON data mapping in the given rule.
  10. The method according to claim 8, wherein defining the given rule further comprises defining an anonymization operation, and storing the anonymization operation to the given rule.
  11. The method according to claim 10, wherein anonymizing the value stored at the given production JSON data mapping in the production response comprises the performing the anonymization operation in the given rule on the production value stored at the given production JSON data mapping in the production response.
  12. The method according to claim 8, wherein identifying the reference value comprising sensitive data comprises identifying a format of the reference value, comparing the identified format to a list of specified formats, and detecting a match between the identified format and a given specified format.
  13. The method according to claim 8, wherein the reference JSON data mapping comprises a first reference JSON data mapping, and further comprising detecting an additional instance of the given reference value in the reference response, identifying a second reference JSON data mapping for the additional instance the given reference value, and storing the given reference URL and the second reference JSON data mapping to an additional rule.
  14. The method according to claim 8, wherein identifying the reference value comprising sensitive data comprises comparing the reference values to a list of keywords, and detecting a match between the reference value and a given keyword.
  15. The method according to claim 8, wherein the reference values in the reference response comprise respective keys, and wherein identifying the reference value comprising sensitive data comprises comparing the key corresponding to the reference value to a list of keywords, and detecting a match between the corresponding key and a given keyword.
  16. The method according to claim 8, wherein identifying the reference value comprising sensitive data comprises comparing the given reference URL to a list of keywords, and detecting a match between the corresponding key and a given keyword.
  17. The method according to claim 1, wherein a given reference URL comprises one or more wildcard characters.
  18. An apparatus for processing data, comprising: a memory configured to store a proxy; and one or more processors configured: to define, in the memory, a set of rules for protecting sensitive data, each of the rules comprising a reference Uniform Resource Locator (URL) and a reference JavaScript Object Notation (JSON) data mapping to an item of the sensitive data in a JSON payload, to receive by a proxy, from a software application executing on a host computer, a query comprising a URL for data hosted by a server, to forward the received URL from the proxy to the server, to receive at the proxy, from the server, a response to the forwarded URL, the response comprising a set of values stored in respective JSON data mappings, to compare the received URL and the JSON data mappings in the response to the set of rules, wherein comparing comprises generating a response pair comprising the received URL and a given response data mapping, and comparing the response pair to the rules to detect matches, wherein a match requires both the received URL matching a rule URL and the given response data mapping matching a rule data mapping, and upon detecting a match: to anonymize, by the proxy, the value stored at the given JSON data mapping in the response, and to forward the response, including the anonymized value, to the software application.
  19. A computer software product for demonstrating a target application, comprising a non-transitory computer-readable medium, in which program instructions are stored, which instructions, when read by a computer, cause the computer: to define a set of rules for protecting sensitive data, each of the rules comprising a reference Uniform Resource Locator (URL) and a reference JavaScript Object Notation (JSON) data mapping to an item of the sensitive data in a JSON payload; to receive by a proxy, from a software application executing on a host computer, a query comprising a URL for data hosted by a server; to forward the received URL from the proxy to the server; to receive at the proxy, from the server, a response to the forwarded URL, the response comprising a set of values stored in respective JSON data mappings; to compare the received URL and the JSON data mappings in the response to the set of rules, wherein comparing comprises generating a response pair comprising the received URL and a given response data mapping, and comparing the response pair to the rules to detect matches, wherein a match requires both the received URL matching a rule URL and the given response data mapping matching a rule data mapping; and upon detecting a match: to anonymize, by the proxy, the value stored at the given JSON data mapping in the response, and to forward the response, including the anonymized value, to the software application.
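
The rule-definition steps recited in claims 8, 12, 13 and 15 (inspect a reference response, flag values whose format or key suggests sensitive data, and record a rule for every mapping where a flagged value appears) might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the two regular expressions standing in for the "list of specified formats", the keyword tuple, the "[]" marker used for array positions, the restriction to non-empty string values, and the names leaves, looks_sensitive and define_rules. The claims do not prescribe any of these.

```python
import re

# Hypothetical "list of specified formats" (claim 12).
FORMATS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}
# Hypothetical keyword list compared against JSON keys (claim 15).
KEY_KEYWORDS = ("name", "email", "phone", "ssn", "address")

def leaves(node, prefix=(), key=None):
    """Yield (mapping, nearest_key, value) for each leaf of a JSON payload."""
    if isinstance(node, dict):
        for k, v in node.items():
            yield from leaves(v, prefix + (k,), k)
    elif isinstance(node, list):
        for v in node:
            # Assumption: all array positions share the marker "[]".
            yield from leaves(v, prefix + ("[]",), key)
    else:
        yield prefix, key, node

def looks_sensitive(key, value):
    """Flag a value by its format, or by a keyword found in its key."""
    if any(p.fullmatch(value) for p in FORMATS.values()):
        return True
    return key is not None and any(w in key.lower() for w in KEY_KEYWORDS)

def define_rules(reference_url, reference_payload):
    """Return a set of (reference URL, reference JSON data mapping) rules."""
    found = [(m, k, v) for m, k, v in leaves(reference_payload)
             if isinstance(v, str) and v]  # sketch handles strings only
    sensitive = {v for _, k, v in found if looks_sensitive(k, v)}
    # Claim 13: every additional instance of a flagged value, at any
    # mapping in the same reference response, also receives a rule.
    return {(reference_url, m) for m, _, v in found if v in sensitive}

# Usage with made-up data.
reference = {
    "user": {"full_name": "Jane Roe", "id": 7, "note": "ok"},
    "owner": "Jane Roe",
    "tags": ["a@b.co", "x"],
}
learned = define_rules("https://app.example.com/api/user", reference)
```

In this sketch the "owner" mapping is flagged only because its value repeats one already identified under "full_name", which is the behaviour claim 13 describes. Normalizing array positions has a cost: the single rule for "tags" would later mask every element of that array, including ones that are not sensitive.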

Description

FIELD OF THE INVENTION

The present invention relates generally to data security, and specifically to dynamically identifying and obfuscating personal identifiable information when executing a web-based application.

BACKGROUND OF THE INVENTION

Personal Identifiable Information (PII) refers to any data that can be used to identify a specific individual. This can include a person's name, address, phone number, social security number, email address, date of birth, and more. PII is often collected by organizations for various purposes, such as for employment, healthcare, or financial transactions.

One example of PII is Protected Health Information (PHI), which includes information such as medical records, lab reports, hospital bills and any information relating to an individual's past, present, or future physical or mental health. In other words, PHI is a subset of PII.

The collection and use of PII can also pose significant privacy and security risks if not handled appropriately. As such, it is important for individuals and organizations to take appropriate measures to protect PII and ensure its safe handling, storage, and disposal.

PII regulations are laws and guidelines that aim to protect the privacy and security of personal information. These regulations typically require organizations to implement specific measures to ensure the proper handling, storage, and disposal of PII. Some common PII regulations include the General Data Protection Regulation (GDPR) in the European Union, the Health Insurance Portability and Accountability Act (HIPAA) in the United States, and the Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada.

Failure to comply with PII regulations can result in significant penalties and legal consequences. As such, it is important for organizations to understand and comply with the relevant regulations in their jurisdiction.

The description above is presented as a general overview of related art in this field and should not be construed as an admission that any of the information it contains constitutes prior art against the present patent application.

SUMMARY OF THE INVENTION

There is provided, in accordance with an embodiment of the present invention, a method for processing data, including defining a set of rules for protecting sensitive data, each of the rules including a reference Uniform Resource Locator (URL) and a reference JavaScript Object Notation (JSON) data mapping to an item of the sensitive data in a JSON payload, receiving by a proxy, from a software application executing on a host computer, a query including a URL for data hosted by a server, forwarding the received URL from the proxy to the server, receiving at the proxy, from the server, a response to the forwarded URL, the response including a set of values stored in respective JSON data mappings, comparing the received URL and the JSON data mappings in the response to the set of rules, and upon detecting a match between a given rule and a combination including the received URL and a given JSON data mapping in the response, anonymizing, by the proxy, the value stored at the given JSON data mapping in the response, and forwarding the response, including the anonymized value, to the software application.

In one embodiment, the sensitive data includes Personal Identifiable Information (PII).

In some embodiments, the PII includes Protected Health Information (PHI).

In another embodiment, the software application includes a demonstration application for a target application that manages the data on the server.

In an additional embodiment, the query includes a Hypertext Transfer Protocol (HTTP) request, and the response includes an HTTP response including the JSON payload including the values stored in the respective JSON data mappings.

In a further embodiment, the sensitive data includes a first dataset, and the method further includes storing the updated response to a second dataset, wherein the second dataset includes a sensitive data-free version of the first dataset.

In some embodiments, the method further includes, subsequent to storing the updated response to the second dataset, receiving, by the proxy from the software application, an additional query including the URL for the data hosted by a server, retrieving the requested data from the second dataset, and conveying, to the software application in response to the additional query, the data retrieved from the second dataset.

In a supplemental embodiment, the query includes a production query, wherein the received URL includes a production URL, wherein the response includes a production response, wherein the JSON data mapping in the response includes a production JSON data mapping, wherein the values include production values, and wherein defining a given rule includes conveying, prior to receiving the production query, a reference query including a given reference URL, receiving from the server, a reference response to the forw