Search

US-12621272-B2 - Systems and methods for locally private non-interactive communications

US12621272B2US 12621272 B2US12621272 B2US 12621272B2US-12621272-B2

Abstract

A computer-implemented method for encoding data for communications with improved privacy includes obtaining, by a computing system comprising one or more computing devices, input data including one or more input data points. The method can include constructing, by the computing system, a net tree including potential representatives of the one or more input data points, the potential representatives arranged in a plurality of levels, the net tree including a hierarchical data structure including a plurality of hierarchically organized nodes. The method can include determining, by the computing system, a representative of each of the one or more input data points from the potential representatives of the net tree, the representative including one of the plurality of hierarchically organized nodes. The method can include encoding, by the computing system, the representative of each of the one or more input data points for communication.

Inventors

  • Badih Ghazi
  • Shanmugasundaram Ravikumar
  • Alisa Chang
  • Pasin Manurangsi

Assignees

  • GOOGLE LLC

Dates

Publication Date
20260505
Application Date
20211220

Claims (13)

  1. 1 . A computer-implemented method for encoding data for communications with improved privacy, the method comprising: obtaining, by a computing system comprising one or more computing devices, input data comprising one or more input data points; constructing, by the computing system, a net tree comprising potential representatives of the one or more input data points, the potential representatives arranged in a plurality of levels, the net tree comprising a hierarchical data structure comprising a plurality of hierarchically organized nodes, wherein constructing, by the computing system, the net tree comprising potential representatives of the one or more input data points comprises: determining, by the computing system, the expansion threshold; identifying, by the computing system, a number of one or more highest-ranking nodes in the net tree at a first level in the net tree, the number being equal to the expansion threshold; and expanding, by the computing system, the one or more highest-ranking nodes at a second level in the net tree; wherein the expansion threshold is based at least in part on an optimal transport cost between the one or more input data points and the potential representatives or on a lower bound on a minimum cost between the one or more input data points and the potential representatives; determining, by the computing system, a representative of each of the one or more input data points from the potential representatives of the net tree, the representative comprising one of the plurality of hierarchically organized nodes; and encoding, by the computing system, the representative of each of the one or more input data points for communication.
  2. 2 . The computer-implemented method of claim 1 , further comprising: prior to determining a representative of each of the one or more input data points from the potential representatives of the net tree, projecting the one or more input data points to a random subspace; and subsequent to projecting the one or more input data points to the random subspace, scaling the projected input data points to a subspace having reduced dimensionality.
  3. 3 . The computer-implemented method of claim 1 , wherein the communication comprises a local model of differential privacy.
  4. 4 . The computer-implemented method of claim 1 , wherein the communication comprises a shuffled model of differential privacy.
  5. 5 . The computer-implemented method of claim 1 wherein the net tree comprises a plurality of efficiently decodable nets.
  6. 6 . The computer-implemented method of claim 1 , wherein encoding, by the computing system, the representative of each of the one or more input data points for communication comprises encoding, by the computing system, the representative by a generalized bucketized vector summation encoder model.
  7. 7 . The computer-implemented method of claim 6 , wherein the generalized bucketized vector summation encoder model comprises a vector encoding of a dot product of a shared uniform random component and a potential representative.
  8. 8 . The computer-implemented method of claim 1 , wherein encoding, by the computing system, the representative of each of the one or more input data points for communication comprises encoding, by the computing system, the representative by a generalized histogram encoder model.
  9. 9 . The computer-implemented method of claim 8 , wherein the generalized histogram encoder model produces an output based on a shared uniform random component, wherein the output is positive with probability e{circumflex over ( )}ε/(e{circumflex over ( )}ε+1) and negative with probability 1/(e{circumflex over ( )}ε+1), where ε is a hyperparameter of differential privacy.
  10. 10 . The computer-implemented method of claim 1 , wherein the representative of an input data point of the one or more input data points comprises a closest potential representative to the input data point.
  11. 11 . The computer-implemented method of claim 10 , wherein the closest potential representative to the input data point comprises a potential representative having a smallest Euclidean distance to the input data point relative to each of the other potential representatives in the net tree.
  12. 12 . A computer-implemented method for clustering input data points with differential privacy guarantees and reduced approximation ratio, the computer-implemented method comprising: obtaining, by a computing system comprising one or more computing devices, input data comprising one or more input data points; constructing, by the computing system, a net tree comprising potential representatives of the one or more input data points, the potential representatives arranged in a plurality of levels, the net tree comprising a hierarchical data structure comprising a plurality of hierarchically organized nodes and a plurality of mappings between the plurality of hierarchically organized nodes, wherein constructing, by the computing system, the net tree comprising potential representatives of the one or more input data points comprises: determining, by the computing system, the expansion threshold; identifying, by the computing system, a number of one or more highest-ranking nodes in the net tree at a first level in the net tree, the number being equal to the expansion threshold; and expanding, by the computing system, the one or more highest-ranking nodes at a second level in the net tree; wherein the expansion threshold is based at least in part on an optimal transport cost between the one or more input data points and the potential representatives or on a lower bound on a minimum cost between the one or more input data points and the potential representatives; and determining, by the computing system, a representative of each of the one or more input data points from the potential representatives of the net tree, the representative comprising one of the plurality of hierarchically organized nodes.
  13. 13 . The computer-implemented method of claim 12 , further comprising: prior to determining a representative of each of the one or more input data points from the potential representatives of the net tree, projecting the one or more input data points to a random subspace; and subsequent to projecting the one or more input data points to the random subspace, scaling the projected input data points to a subspace having reduced dimensionality.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS This application is based upon and claims the right of priority under 35 U.S.C. § 371 to International Application No. PCT/US2021/064371 filed on Dec. 20, 2021, which claims priority to and the benefit of U.S. Provisional Patent Application No. 63/168,533, filed Mar. 31, 2021. Applicant claims priority to and the benefit of each of such applications and incorporates all such applications herein by reference in its entirety. FIELD The present disclosure relates generally to systems and methods for locally private non-interactive communications. More particularly, the present disclosure relates to differentially private k-means clustering in the one-round, non-interactive local model. BACKGROUND Clustering, such as k-means clustering, relates to grouping or clustering a set of dimensional input points into clusters based on distance from the points to a cluster center. In k-means clustering, the points are clustered based on Euclidean distance from the points to their respective cluster center, with the goal of assigning points to candidate centers to minimize the total cost across all points, and potentially subject to other constraints. Differential privacy has emerged as a popular definition of privacy, providing strong guarantees and mathematical rigor. Differential privacy provides that slight changes in input sets are not traceable at the output. Two predominant models of differential privacy have emerged: the central model, in which a trusted central curator encodes data to be differentially private; and distributed models such as the local model, in which there is no central curator, and instead outputs from each client are expected to be differentially private. SUMMARY Aspects and advantages of embodiments of the present disclosure will be set forth in part in the following description, or can be learned from the description, or can be learned through practice of the embodiments. One example aspect of the present disclosure is directed to a computer-implemented method for encoding data for communications with improved privacy. The method can include obtaining, by a computing system comprising one or more computing devices, input data including one or more input data points. The method can include constructing, by the computing system, a net tree including potential representatives of the one or more input data points, the potential representatives arranged in a plurality of levels, the net tree including a hierarchical data structure including a plurality of hierarchically organized nodes. The method can include determining, by the computing system, a representative of each of the one or more input data points from the potential representatives of the net tree, the representative including one of the plurality of hierarchically organized nodes. The method can include encoding, by the computing system, the representative of each of the one or more input data points for communication. Another example aspect of the present disclosure is directed to a computer-implemented method for decoding data encoded by a net tree based encoding algorithm. The method can include obtaining, by a computing system can include one or more computing devices, encoded input data including encoded histogram data. The method can include determining, by the computing system, a decoded frequency oracle based at least in part on the encoded histogram data. The method can include constructing, by the computing system, a net tree based at least in part on the decoded frequency oracle, the net tree including a plurality of leaves. The method can include performing, by the computing system, a k-means approximation algorithm on the net tree to partition the plurality of leaves according to respective closest centers into a plurality of partitions. Another example aspect of the present disclosure is directed to a computer-implemented method for clustering input data points with differential privacy guarantees and reduced approximation ratio. The method includes obtaining, by a computing system including one or more computing devices, input data including one or more input data points. The method includes constructing, by the computing system, a net tree including potential representatives of the one or more input data points, the potential representatives arranged in a plurality of levels, the net tree including a hierarchical data structure including a plurality of hierarchically organized nodes and a plurality of mappings between the plurality of hierarchically organized nodes. The method can include determining, by the computing system, a representative of each of the one or more input data points from the potential representatives of the net tree, the representative including one of the plurality of hierarchically organized nodes. Other aspects of the present disclosure are directed to various systems, apparatuses, non-transitory computer-readable media, user interfaces, and electronic devic