EP-4587949-B1 - PRIVACY PRESERVING NETWORK MEASUREMENTS USING PARTIAL IDENTIFIERS AND CONSISTENT BUCKET SIZING

Inventors

  • GANDHI, Darshak Kumarpal
  • Chauhan, Satvik

Dates

Publication Date: 2026-05-13
Application Date: 2024-10-24

Claims (15)

  1. A computer-implemented method (200), comprising: sending (208), by a network measurement system (120) and to a client device (110), data indicating a size for a partial identifier for an application (112) of the client device; receiving (214), by the network measurement system and from the client device, (i) a partial masked identifier generated by masking a complete identifier for the client device or a user of the client device and removing a portion of a resulting complete masked identifier based on the size and (ii) a first encrypted identifier generated by encrypting the complete masked identifier using an encryption key of the client device; identifying (216), by the network measurement system and from a set of network data elements (122) that each include network data and a corresponding complete masked identifier, a subset of the network data elements for which the partial masked identifier matches a portion of the complete masked identifier of the network data element; sending (220), to the client device, a doubly encrypted identifier generated by encrypting the first encrypted identifier using a first encryption key of the network measurement system and, for each network data element in the subset of network data elements, a partial match element comprising (i) a second encrypted identifier generated by encrypting the complete masked identifier of the network data element using a second encryption key of the network measurement system and (ii) encrypted network data generated by encrypting the network data of the network data element; and receiving, by the network measurement system and from the client device, (i) the encrypted network data of a given partial match element for which a decrypted version of the doubly encrypted identifier matches the second encrypted identifier of the given partial match element or (ii) the network data of the given partial match element generated by the client device by decrypting the encrypted network data of the given partial match element.
  2. The computer-implemented method of claim 1, wherein the first encryption key of the network measurement system is the same as the second encryption key of the network measurement system, or wherein the first encryption key of the network measurement system is different from the second encryption key of the network measurement system.
  3. The computer-implemented method of any preceding claim, further comprising determining the size for the partial identifier for the application based on a number of users of the application, optionally further comprising estimating the number of users based on a number of events reported to the network measurement system for the application.
  4. The computer-implemented method of any preceding claim, wherein the network data for each network data element comprises data indicating one or more digital components interacted with by a user or client device identified by the corresponding complete masked identifier of the network data element.
  5. The computer-implemented method of any preceding claim, wherein the client device sends the partial masked identifier and the first encrypted identifier in response to detecting a conversion event, optionally wherein the conversion event comprises the download of the application to the client device.
  6. The computer-implemented method of any preceding claim, further comprising updating a network data measurement for the application based on the given partial match element.
  7. The computer-implemented method of any preceding claim, wherein the second encrypted identifier comprises a hash value generated by applying a cryptographic hash function to the encrypted complete masked identifier of the network data element, and/or wherein the encrypted network data of each network data element comprises doubly encrypted network data encrypted using a third encryption key of the network measurement system and another encryption key.
  8. A computer-implemented method (200), comprising: receiving, by a client device, data indicating a size for a partial identifier for an application of the client device; generating (210), by the client device, (i) a partial masked identifier by masking a complete identifier for the client device or a user of the client device and removing a portion of a resulting complete masked identifier based on the size and (ii) a first encrypted identifier by encrypting the complete masked identifier using an encryption key of the client device; sending (212), by the client device, the partial masked identifier and the first encrypted identifier to a network measurement system; receiving, by the client device and from the network measurement system, (i) a doubly encrypted identifier generated by encrypting the first encrypted identifier using a first encryption key of the network measurement system and, for each of a plurality of network data elements stored by the network measurement system, the plurality of network data elements being a subset of network data elements for which the partial masked identifier matches a portion of a complete masked identifier of the corresponding network data element, a partial match element comprising (i) a second encrypted identifier generated by encrypting a complete masked identifier of the network data element using a second encryption key of the network measurement system and (ii) encrypted network data generated by encrypting the network data of the network data element; decrypting (222) the doubly encrypted identifier using a decryption key of the client device to obtain a third encrypted identifier; identifying (224) a given partial match element for which the second encrypted identifier matches the third encrypted identifier; and sending (226) the encrypted network data of the given partial match element to the network measurement system.
  9. The computer-implemented method of claim 8, wherein the first encryption key of the network measurement system is the same as the second encryption key of the network measurement system, or wherein the first encryption key of the network measurement system is different from the second encryption key of the network measurement system.
  10. The computer-implemented method of any one of claims 8 to 9, wherein the size of the partial identifier for the application is based on a number of users of the application, optionally wherein the number of users is estimated based on a number of events reported to the network measurement system for the application.
  11. The computer-implemented method of any one of claims 8 to 10, wherein the network data for each network data element comprises data indicating one or more digital components interacted with by a user or client device identified by the corresponding complete masked identifier of the network data element.
  12. The computer-implemented method of any one of claims 8 to 11, wherein the client device sends the partial masked identifier and the first encrypted identifier in response to detecting a conversion event, optionally wherein the conversion event comprises the download of the application to the client device.
  13. The computer-implemented method of any one of claims 8 to 12, wherein the second encrypted identifier comprises a hash value generated by applying a cryptographic hash function to the encrypted complete masked identifier of the network data element, and/or wherein the encrypted network data of each network data element comprises doubly encrypted network data encrypted using a third encryption key of the network measurement system and another encryption key.
  14. A system (300) comprising: one or more computers; and one or more storage devices (330) storing instructions that when executed by the one or more computers, cause the one or more computers to perform the operations of the respective method of any one of claims 1-13.
  15. One or more computer-readable storage media storing instructions that, when executed by one or more computers, cause the one or more computers to perform the operations of the respective method of any one of claims 1-13.
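
The exchange recited in claims 1 and 8 can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the claims: the commutative cipher (modular exponentiation over a small prime `P`), the use of one measurement-system key for both the first and second encryption (the "same key" alternative of claims 2 and 9), the toy stream cipher standing in for the network-data encryption, and all function names. Only alternative (i) of the final step, in which the client returns the still-encrypted network data, is shown.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime modulus (assumption); far too small for real use
ID_BITS = 256   # SHA-256 output length in bits


def keygen():
    """Return (e, d) with e*d = 1 mod (P-1), so (x^e)^d = x mod P."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        try:
            return e, pow(e, -1, P - 1)
        except ValueError:  # e shares a factor with P-1; draw again
            continue


def mask(identifier: str) -> bytes:
    """Complete masked identifier: SHA-256 hash of the cleartext identifier."""
    return hashlib.sha256(identifier.encode()).digest()


def prefix(masked: bytes, n_bits: int) -> int:
    """Partial masked identifier: the first n_bits bits of the masked identifier."""
    return int.from_bytes(masked, "big") >> (ID_BITS - n_bits)


def to_group(masked: bytes) -> int:
    """Map a masked identifier into 1..P-1 so it can be exponentiated."""
    return int.from_bytes(masked, "big") % (P - 1) + 1


def seal(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher (placeholder) for encrypting network data."""
    nonce = secrets.token_bytes(16)
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def unseal(key: bytes, blob: bytes) -> bytes:
    """Inverse of seal(): regenerate the keystream and XOR it off."""
    nonce, body = blob[:16], blob[16:]
    stream = hashlib.shake_256(key + nonce).digest(len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def run_protocol(client_id, store, n_bits):
    """store: list of (complete masked identifier, network data) pairs."""
    c_enc, c_dec = keygen()             # client key pair
    s_enc, _ = keygen()                 # measurement-system identifier key
    data_key = secrets.token_bytes(32)  # measurement-system data key

    # Client: partial masked identifier plus first encrypted identifier.
    masked = mask(client_id)
    partial = prefix(masked, n_bits)
    first_enc = pow(to_group(masked), c_enc, P)

    # Measurement system: select the matching subset, then encrypt.
    subset = [(m, d) for m, d in store if prefix(m, n_bits) == partial]
    doubly_enc = pow(first_enc, s_enc, P)
    elements = [(pow(to_group(m), s_enc, P), seal(data_key, d)) for m, d in subset]

    # Client: remove its own layer and look for an exact match.
    third_enc = pow(doubly_enc, c_dec, P)
    returned = next((blob for second, blob in elements if second == third_enc), None)

    # Measurement system: decrypt whatever the client sent back.
    return None if returned is None else unseal(data_key, returned)


store = [(mask(f"device-{i}"), f"component-{i}".encode()) for i in range(200)]
print(run_protocol("device-42", store, 4))  # b'component-42'
```

In this sketch the measurement system sees only a 4-bit prefix and an identifier encrypted under the client's key, while the client sees only identifiers encrypted under the measurement system's key, so neither side handles the other's cleartext identifiers. A larger `n_bits` shrinks the subset that must be encrypted and transmitted, at the cost of revealing more of the masked identifier.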

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Application No. 63/545,524, filed on October 24, 2023.

TECHNICAL FIELD

This specification generally relates to privacy-preserving data processing and cryptography.

BACKGROUND

Private set membership is a cryptographic protocol that allows entities to query whether an identifier is a member of a set of identifiers in a privacy preserving way such that the computer storing the set of identifiers does not learn the results of the query and the entities do not learn the details of the set of identifiers stored by the computer.

In US 2022/027497 A1, a computer-implemented method is disclosed and comprises: based on user identity information that forms a target of a computer database search strategy, generating a first partial hash of the user identity information, the first partial hash comprising a plurality of characters, generating and transmitting a first query to a server computer, the first query comprising a subset of characters of the plurality of characters of the first partial hash.

In US 2022/147650 A1, a method is disclosed including receiving encrypted identifiers and encrypted values, performing a concealing operation on the encrypted identifiers to produce concealed encrypted identifiers, wherein the concealing operation conceals the encrypted identifiers from a first computing system and a second computing system but enables matching between the concealed encrypted identifiers, decrypting, by the second computing system, the concealed encrypted identifiers to produce concealed identifiers, and performing, by the second computing system, an aggregation operation.

In WO 2022/266071 A1, encrypted information retrieval can include generating a database that is partitioned into shards each having a shard identifier, and database entries in each shard that are partitioned into buckets having a bucket identifier. A batch of client-encrypted queries are received.
The batch of client-encrypted queries are processed using a set of server-encrypted data stored in a database. The processing includes grouping the client-encrypted queries according to shard identifiers of the client-encrypted queries, executing multiple queries in the group of client-encrypted queries for the shard together in a batch execution process, and generating multiple server-encrypted results to the multiple queries in the group of client-encrypted queries.

SUMMARY

This specification describes methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for using cryptographic protocols to generate network measurements in privacy preserving ways. The invention is defined in the independent claims. Preferred embodiments are defined in the dependent claims.

A combination of private set membership and dynamically, or at least variably, sized partial identifiers can be used to obtain network data for generating network measurements (or other types of data) in ways that reduce the amount of data that is analyzed and sent over a network while still preserving user privacy, thereby improving the speed and efficiency in collecting the data and generating the network measurements without compromising privacy and data security.

To protect user privacy, an actual user identifier or device identifier may not be sent from a client device of the user to another system, e.g., to a network measurement system, in cleartext. Instead, the client device can send either a complete masked identifier or a partial masked identifier. The complete masked identifier can be a masked version of the complete identifier for the user or client device. For example, the complete masked identifier can be a hash value generated by applying a cryptographic hash function to the complete user identifier or the complete device identifier.
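
As a rough illustration of the masking and variable sizing described above, the following Python sketch hashes an identifier with SHA-256 and keeps only a leading run of bits. The sizing rule in `choose_size_bits`, the `target_bucket` parameter, and all function names are hypothetical. The specification states only that the size can be based on a number of users of the application, and keeping a prefix is just one way to remove a portion of the complete masked identifier.

```python
import hashlib
import math


def choose_size_bits(estimated_users: int, target_bucket: int = 1000) -> int:
    """Hypothetical rule: largest N such that the 2**N possible prefixes
    are each shared, on average, by at least target_bucket users."""
    if estimated_users <= target_bucket:
        return 0  # a single bucket already holds every user
    return math.floor(math.log2(estimated_users / target_bucket))


def complete_masked_identifier(identifier: str) -> bytes:
    """Mask the cleartext identifier with a cryptographic hash (SHA-256)."""
    return hashlib.sha256(identifier.encode("utf-8")).digest()


def partial_masked_identifier(masked: bytes, size_bits: int) -> str:
    """Keep the first size_bits bits of the masked identifier, as a bit string."""
    bits = "".join(f"{byte:08b}" for byte in masked)
    return bits[:size_bits]


size = choose_size_bits(1_000_000)  # 9 bits, i.e. 512 buckets
masked = complete_masked_identifier("example-device-id")
partial = partial_masked_identifier(masked, size)
print(size, len(masked) * 8, len(partial))  # 9 256 9
```

Under this assumed rule, an application with about one million users gets a 9-bit partial identifier, so a query narrows the stored elements to roughly 1/512 of the set while each prefix is still shared by about two thousand users.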
The partial masked identifier can be a partial version of the complete masked identifier, e.g., with fewer characters or bits than the complete masked identifier. For example, the complete masked identifier can be 256 bits if a SHA-256 hash function is used or 512 bits if a SHA-512 hash function is used. In these examples, the partial masked identifier would be less than 256 bits if a SHA-256 hash function is used or less than 512 bits if a SHA-512 hash function is used. For example, the partial identifier can be the first N bits, where N is a positive integer. Other hash functions can also be used.

In some implementations, a client device can send the partial masked identifier to a network measurement system in response to detecting an event. The client device can also send a first encrypted identifier to the network measurement system. The first encrypted identifier can be an encrypted version of the complete masked identifier (rather than the partial masked identifier) using an encryption key of the client device. The complete masked identifier is encrypted to prevent the network measuremen