
US-20260127506-A1 - Hierarchical Gradient Averaging For Enforcing Subject Level Privacy

US 20260127506 A1

Abstract

Hierarchical gradient averaging is performed as part of training a machine learning model to enforce subject level privacy. A sample of data items from a training data set is identified and respective gradients for the data items are determined. The gradients are then clipped. Each subject's clipped gradients in the sample are averaged. A noise value is added to a sum of the averaged gradients of each of the subjects in the sample. An average gradient for the entire sample is determined from the averaged gradients of the individual subjects with the added noise value. This average gradient for the entire sample is used for determining machine learning model updates.
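The averaging procedure described above can be sketched as follows. The function name, the NumPy representation of gradients, and the zero-mean Gaussian noise source are illustrative assumptions for this sketch; the document itself does not recite any implementation:

```python
import numpy as np

def subject_level_dp_gradient(per_item_grads, subject_ids, clip_norm, noise_std, rng):
    """One hierarchical-averaging step: clip each per-item gradient to
    clip_norm, average the clipped gradients within each subject, add
    noise to the sum of the subject averages, and divide by the number
    of data items in the sample (not the number of subjects)."""
    clipped = []
    for g in per_item_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping threshold.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    clipped = np.stack(clipped)
    subject_ids = np.asarray(subject_ids)
    # Average each subject's clipped gradients.
    subject_avgs = [clipped[subject_ids == s].mean(axis=0)
                    for s in np.unique(subject_ids)]
    # Add noise to the sum of the per-subject averages.
    noisy_sum = np.sum(subject_avgs, axis=0) + rng.normal(0.0, noise_std,
                                                          size=clipped.shape[1])
    # Sample average gradient: noisy sum divided by the sample size.
    return noisy_sum / len(per_item_grads)
```

Averaging within a subject first bounds each subject's total contribution to the sample gradient, which is what makes the added noise calibrate to subject-level rather than item-level privacy.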

Inventors

  • Virendra J. Marathe
  • Pallika Haridas KANANI

Assignees

  • ORACLE INTERNATIONAL CORPORATION

Dates

Publication Date
2026-05-07
Application Date
2025-12-23

Claims (20)

  1. A system, comprising: at least one processor; a memory, comprising program instructions that when executed by the at least one processor cause the at least one processor to implement a machine learning system, the machine learning system configured to: train a machine learning model using gradient descent on a data set comprising a plurality of subjects, wherein individual ones of the plurality of subjects comprise one or more data items, and wherein to train the machine learning model, the machine learning system is configured to: identify a sample of data items from the data set; determine respective gradients for individual data items in the sample of data items; clip the respective gradients for the individual data items in the sample of data items according to a threshold; average the clipped gradients of individual ones of the subjects with the individual data items in the sample of data items; add a noise value to a sum of the averaged gradients for the individual ones of the subjects; and determine a sample average gradient for the sample of data items from the sum of the averaged gradients with the added noise value divided by a number of data items in the sample of data items.
  2. The system of claim 1, wherein the identification of the sample of data items, the determination of the respective gradients, the clip of the respective gradients, the average of the clipped gradients, the addition of the noise value, and the determination of the sample average gradient for the sample of data items is performed as part of one training round, and wherein a number of other training rounds in addition to the one training round are performed as determined according to a privacy budget.
  3. The system of claim 1, wherein the noise is Gaussian noise determined for the machine learning system.
  4. The system of claim 1, wherein the sample is one of a plurality of mini-batches taken from the data set as part of the training, and wherein the identification of the sample of data items, the determination of the respective gradients, the clip of the respective gradients, the average of the clipped gradients, the addition of the noise value, and the determination of the sample average gradient for the sample of data items are performed for other ones of the plurality of mini-batches.
  5. The system of claim 1, wherein the machine learning model is a non-federated machine learning model.
  6. The system of claim 1, wherein the machine learning system is a federated model user system, and the machine learning system is further configured to: receive the machine learning model from a federation server; and return parameter updates to the machine learning model determined from performing the training to the federation server.
  7. The system of claim 6, wherein the federated model user system is one of a plurality of federated model user systems that received the machine learning model from the federation server, and wherein the data set is one of a plurality of data sets respectively used at the plurality of federated model user systems, wherein at least one of the plurality of subjects has an associated data item at a different one of the plurality of data sets used at a different one of the plurality of federated model user systems.
  8. A computer-implemented method, comprising: training a machine learning model using gradient descent on a data set comprising a plurality of subjects, wherein individual ones of the plurality of subjects comprise one or more data items, and wherein the training comprises: identifying a sample of data items from the data set; determining respective gradients for individual data items in the sample of data items; clipping the respective gradients for the individual data items in the sample of data items according to a threshold; averaging the clipped gradients of individual ones of the subjects with the individual data items in the sample of data items; adding a noise value to a sum of the averaged gradients for the individual ones of the subjects; and determining a sample average gradient for the sample of data items from the sum of the averaged gradients with the added noise value divided by a number of data items in the sample of data items.
  9. The computer-implemented method of claim 8, wherein the identifying the sample of data items, the determining the respective gradients, the clipping the respective gradients, the averaging the clipped gradients, the adding the noise value, and the determining the sample average gradient for the sample of data items is performed as part of one training round, and wherein a number of other training rounds are performed in addition to the one training round as determined according to a privacy budget.
  10. The computer-implemented method of claim 8, wherein the noise is Gaussian noise determined for a machine learning system performing the training.
  11. The computer-implemented method of claim 8, wherein the sample is one of a plurality of mini-batches taken from the data set as part of the training, and wherein the identifying the sample of data items, the determining the respective gradients, the clipping the respective gradients, the averaging the clipped gradients, the adding the noise value, and the determining the sample average gradient for the sample of data items are performed for other ones of the plurality of mini-batches.
  12. The computer-implemented method of claim 8, wherein the computer-implemented method is performed by a federated model user system and wherein the method further comprises: receiving the machine learning model from a federation server; and returning parameter updates to the machine learning model determined from performing the training to the federation server.
  13. The computer-implemented method of claim 12, wherein the federated model user system is one of a plurality of federated model user systems that received the machine learning model from the federation server, and wherein the data set is one of a plurality of data sets respectively used at the plurality of federated model user systems, wherein at least one of the plurality of subjects has an associated data item at a different one of the plurality of data sets used at a different one of the plurality of federated model user systems.
  14. The computer-implemented method of claim 8, wherein the machine learning model is a non-federated machine learning model.
  15. One or more non-transitory, computer-readable storage media, storing program instructions that when executed on or across one or more computing devices, cause the one or more computing devices to implement: training a machine learning model using gradient descent on a data set comprising a plurality of subjects, wherein individual ones of the plurality of subjects comprise one or more data items, and wherein, in training the machine learning model, the program instructions cause the one or more computing devices to implement: identifying a sample of data items from the data set; determining respective gradients for individual data items in the sample of data items; clipping the respective gradients for the individual data items in the sample of data items according to a threshold; averaging the clipped gradients of individual ones of the subjects with the individual data items in the sample of data items; adding a noise value to a sum of the averaged gradients for the individual ones of the subjects; and determining a sample average gradient for the sample of data items from the sum of the averaged gradients with the added noise value divided by a number of data items in the sample of data items.
  16. The one or more non-transitory, computer-readable storage media of claim 15, wherein the identifying the sample of data items, the determining the respective gradients, the clipping the respective gradients, the averaging the clipped gradients, the adding the noise value, and the determining the sample average gradient for the sample of data items is performed as part of one training round, and wherein a number of other training rounds are performed in addition to the one training round as determined according to a privacy budget.
  17. The one or more non-transitory, computer-readable storage media of claim 15, wherein the noise is Gaussian noise determined for a machine learning system performing the training.
  18. The one or more non-transitory, computer-readable storage media of claim 15, wherein the sample is one of a plurality of mini-batches taken from the data set as part of the training, and wherein the identifying the sample of data items, the determining the respective gradients, the clipping the respective gradients, the averaging the clipped gradients, the adding the noise value, and the determining the sample average gradient for the sample of data items are performed for other ones of the plurality of mini-batches.
  19. The one or more non-transitory, computer-readable storage media of claim 15, wherein the one or more computing devices implement a federated model user system, and wherein the one or more non-transitory, computer-readable storage media store further instructions that when executed on or across the one or more computing devices, cause the one or more computing devices to further implement: receiving the machine learning model from a federation server; and returning parameter updates to the machine learning model determined from performing the training to the federation server.
  20. The one or more non-transitory, computer-readable storage media of claim 19, wherein the federated model user system is one of a plurality of federated model user systems that received the machine learning model from the federation server, and wherein the data set is one of a plurality of data sets respectively used at the plurality of federated model user systems, wherein at least one of the plurality of subjects has an associated data item at a different one of the plurality of data sets used at a different one of the plurality of federated model user systems.
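The federated workflow recited for the federated model user system (receive the model from a federation server, train locally, return parameter updates) can be sketched as follows. The function names, the `grad_fn` stand-in for the noisy hierarchical gradient step, and the plain SGD update are illustrative assumptions, not anything the claims specify:

```python
import numpy as np

def federated_client_round(global_params, local_batches, grad_fn, lr=0.1):
    """Hypothetical federated-client round: start from the server's
    parameters, apply one local gradient step per mini-batch, and return
    the parameter updates (delta) to send back to the federation server.
    grad_fn(params, batch) stands in for the subject-level noisy
    gradient computed by the hierarchical averaging step."""
    params = np.array(global_params, dtype=float)
    for batch in local_batches:
        params -= lr * grad_fn(params, batch)  # local gradient-descent step
    # Only the change relative to the received model is returned.
    return params - np.asarray(global_params, dtype=float)
```

Returning a delta rather than raw data is what lets a subject whose items are spread across several clients still be protected: each client's contribution already carries subject-level noise.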

Description

PRIORITY CLAIM

This application is a continuation of U.S. patent application Ser. No. 17/805,674, filed Jun. 6, 2022, which is hereby incorporated by reference herein in its entirety.

BACKGROUND

Machine learning models provide important decision-making features for various applications across a wide variety of fields. Given their ubiquity, greater importance has been placed on understanding the implications of machine learning model design and training data set choices on machine learning model performance. Systems and techniques that can provide greater adoption of machine learning models are, therefore, highly desirable.

SUMMARY

Techniques for hierarchical gradient averaging for enforcing subject-level privacy are described. Training data sets for a machine learning model may include data items associated with different subjects. To enforce subject-level privacy with respect to the different subjects, training of the machine learning model may include adjustments to the gradients determined as part of training the machine learning model, including added noise. A sample of data items from a training data set is identified and respective gradients for the data items are determined. The gradients are then clipped. Each subject's clipped gradients in the sample are averaged. A noise value is added to a sum of the averaged gradients of each of the subjects in the sample. An average gradient for the entire sample is determined from the averaged gradients of the individual subjects with the added noise value. This average gradient for the entire sample is used for determining machine learning model updates.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a logical block diagram illustrating subject-level privacy enforcement as part of a machine learning model training system, according to some embodiments.

FIG. 2 is a logical block diagram illustrating a federated machine learning system that implements hierarchical gradient averaging for enforcing subject-level privacy for training federated machine learning models, according to some embodiments.

FIG. 3 is a logical block diagram illustrating a non-federated machine learning system that implements hierarchical gradient averaging for enforcing subject-level privacy for training non-federated machine learning models, according to some embodiments.

FIG. 4 is a high-level flowchart illustrating techniques to implement hierarchical gradient averaging for enforcing subject-level privacy for training machine learning models, according to some embodiments.

FIG. 5 is a high-level flowchart illustrating techniques to implement averaging model parameters generated using hierarchical gradient averaging for enforcing subject-level privacy for training machine learning models, according to some embodiments.

FIG. 6 illustrates an example computing system, according to some embodiments.

While the disclosure is described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the disclosure is not limited to embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit the disclosure to the particular form disclosed, but on the contrary, the disclosure is to cover all modifications, equivalents and alternatives falling within the spirit and scope as defined by the appended claims. Any headings used herein are for organizational purposes only and are not meant to limit the scope of the description or the claims. As used herein, the word “may” is used in a permissive sense (e.g., meaning having the potential to) rather than the mandatory sense (e.g., meaning must). Similarly, the words “include”, “including”, and “includes” mean including, but not limited to.
Various units, circuits, or other components may be described as “configured to” perform a task or tasks. In such contexts, “configured to” is a broad recitation of structure generally meaning “having circuitry that” performs the task or tasks during operation. As such, the unit/circuit/component can be configured to perform the task even when the unit/circuit/component is not currently on. In general, the circuitry that forms the structure corresponding to “configured to” may include hardware circuits.

Similarly, various units/circuits/components may be described as performing a task or tasks, for convenience in the description. Such descriptions should be interpreted as including the phrase “configured to.” Reciting a unit/circuit/component that is configured to perform one or more tasks is expressly intended not to invoke 35 U.S.C. § 112(f) interpretation for that unit/circuit/component.

This specification includes references to “one embodiment” or “an embodiment.” The appearances of the phrases “in one embodiment” or “in an embodiment” do not necessarily refer to the same embodiment, although embodiments that include any combination of the features are generally contemplated, unless expressly disclaimed herein. Particular features, struc
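The training rounds described above (one noisy-gradient round per mini-batch, with the total number of rounds bounded by a privacy budget) can be sketched as follows. The function name and the use of a simple round cap in place of full privacy accounting are illustrative assumptions:

```python
import numpy as np

def train_with_privacy_budget(params, minibatches, dp_grad_fn, lr, max_rounds):
    """Outer training loop: repeat the noisy-gradient round for successive
    mini-batches, stopping once the privacy budget is exhausted.
    max_rounds stands in for whatever round count the budget permits;
    dp_grad_fn(params, batch) stands in for the hierarchical
    subject-level averaging step that produces the sample gradient."""
    params = np.array(params, dtype=float)
    for round_idx, batch in enumerate(minibatches):
        if round_idx >= max_rounds:  # privacy budget exhausted
            break
        params -= lr * dp_grad_fn(params, batch)  # one training round
    return params
```

Capping the number of rounds matters because each noisy round consumes part of the privacy budget; in a real system the cap would come from a privacy accountant rather than a fixed constant.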