JP-7857319-B2 - Bidirectional compression and privacy for efficient communication in federated learning
Inventors
- Matthias Reisser
- Aleksei Triastcyn
- Christos Louizos
Assignees
- Qualcomm, Incorporated
Dates
- Publication Date: 2026-05-12
- Application Date: 2022-05-31
- Priority Date: 2021-05-28
Claims (14)
- A computer-implemented method, comprising: receiving a global model from a federated learning server; determining an updated model based on the global model and local data; and sending the updated model to the federated learning server using relative entropy coding, wherein sending the updated model to the federated learning server using relative entropy coding comprises: determining a random seed, including receiving the random seed from the federated learning server; determining a first probability distribution based on the global model; determining a second probability distribution centered on the updated model; determining, based on a difference between the first probability distribution and the second probability distribution, a plurality of random samples from the first probability distribution according to the random seed, wherein the plurality of random samples are associated with a plurality of parameters of the global model; assigning a probability to each of the plurality of random samples based on a ratio of a likelihood of each random sample given the second probability distribution to a likelihood of each random sample given the first probability distribution; selecting one random sample from the plurality of random samples according to the probability of each of the plurality of random samples; determining an index associated with the selected random sample; and sending the index to the federated learning server. (A client-side sketch of these steps is provided after the claims.)
- The method according to claim 1, wherein the index is transmitted using log₂ K bits, where K is the number of the plurality of random samples from the first probability distribution.
- The method according to claim 1, wherein the plurality of random samples are associated with layers of the global model.
- The method according to claim 1, wherein the plurality of random samples are associated with a subset of the parameters of the global model.
- The method according to claim 1, further comprising clipping the updated model before determining the second probability distribution centered on the updated model, wherein the clipping is based on the standard deviation of the global model, and wherein the second probability distribution is based on the clipped updated model.
- The method according to claim 5, wherein clipping the updated model comprises clipping a norm of the updated model.
- The method according to claim 1, wherein determining the updated model based on the global model and local data comprises performing gradient descent on the global model using the local data.
- A computer-implemented method, comprising: sending a global model to a client device; determining a random seed, including receiving the random seed from the client device; receiving an updated model from the client device using relative entropy coding; and determining an updated global model based on the updated model from the client device, wherein receiving the updated model from the client device using relative entropy coding comprises: receiving an index from the client device; determining a sample from a probability distribution based on the global model, the random seed, and the index; and using the determined sample to determine the updated global model, wherein the determined sample is used to update the parameters of the updated global model. (A server-side sketch of these steps is provided after the claims.)
- The method according to claim 8, wherein the index is received using log₂ K bits, where K is the number of random samples determined from the probability distribution based on the global model.
- The method according to claim 8, wherein the determined sample is used to update a layer of the updated global model.
- A processing system comprising: a memory comprising computer-executable instructions; and one or more processors configured to execute the computer-executable instructions and cause the processing system to perform the method according to any one of claims 1 to 10.
- A processing system comprising means for performing the method according to any one of claims 1 to 10.
- A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by one or more processors of a processing system, cause the processing system to perform the method according to any one of claims 1 to 10.
- A computer program embodied on a computer-readable storage medium, comprising code for performing the method according to any one of claims 1 to 10.
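The following is a minimal, hypothetical NumPy sketch of the client-side relative entropy coding steps recited in claim 1. The function name `client_rec_update`, the parameters `sigma_p`, `sigma_q`, and `kl_budget`, and the choice of isotropic Gaussian distributions with a fixed bit budget (rather than a divergence-derived sample count) are illustrative assumptions, not taken from the patent; this is a sketch of the general technique, not the patented implementation.

```python
import numpy as np

def client_rec_update(global_params, local_update, seed,
                      kl_budget=8.0, sigma_p=1.0, sigma_q=0.1):
    """Hypothetical client-side sketch of relative-entropy-coded upload (claim 1).

    global_params : flat np.ndarray of the global model received from the server
    local_update  : flat np.ndarray of the locally updated model (e.g. after
                    gradient descent on local data; optionally norm-clipped based
                    on the standard deviation of the global model, cf. claims 5-6)
    seed          : random seed shared between client and server
    Returns the index of the selected sample; only this index (about log2(K) bits)
    needs to be transmitted to the server.
    """
    # First distribution p: based on (centered on) the global model.
    # Second distribution q: centered on the locally updated model.
    # Here K is fixed by a bit budget instead of the p/q divergence.
    K = int(2 ** np.ceil(kl_budget))

    # Draw K candidate samples from p using the shared seed, one value per
    # model parameter, so the server can regenerate exactly the same samples.
    rng = np.random.default_rng(seed)
    samples = global_params + sigma_p * rng.standard_normal((K, global_params.size))

    # Importance weights: ratio of likelihood under q to likelihood under p
    # (normalization constants cancel after the softmax below).
    log_q = -0.5 * np.sum((samples - local_update) ** 2, axis=1) / sigma_q ** 2
    log_p = -0.5 * np.sum((samples - global_params) ** 2, axis=1) / sigma_p ** 2
    log_w = log_q - log_p
    probs = np.exp(log_w - log_w.max())
    probs /= probs.sum()

    # Select one sample according to these probabilities and send only its index.
    index = rng.choice(K, p=probs)
    return int(index)
```

In use, the client would run local gradient descent to obtain `local_update`, call this function, and transmit only the returned index, which is why the upload cost is on the order of log₂ K bits regardless of model size.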
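A matching, equally hypothetical server-side sketch of the decoding steps recited in claim 8 follows. It assumes the same illustrative prior, seed handling, and `kl_budget` as the client sketch above; the aggregation comment is a generic FedAvg-style assumption, not a statement of the patented aggregation rule.

```python
import numpy as np

def server_rec_decode(global_params, index, seed, kl_budget=8.0, sigma_p=1.0):
    """Hypothetical server-side sketch of relative-entropy-coded decoding (claim 8).

    Regenerates the same K candidate samples the client drew (same global model,
    same shared seed, same prior, same draw order), then recovers the client's
    selected sample from the transmitted log2(K)-bit index alone.
    """
    K = int(2 ** np.ceil(kl_budget))
    rng = np.random.default_rng(seed)
    samples = global_params + sigma_p * rng.standard_normal((K, global_params.size))

    # The decoded sample stands in for the client's updated parameters.
    decoded = samples[index]

    # In a full round, decoded samples from many clients would be combined
    # (e.g. averaged) to form the updated global model.
    return decoded
```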
Description
Cross-reference to related applications: This application claims priority to PCT application PCT/US2022/072599, filed on 26 May 2022, and also claims the benefit of and priority to Greek patent application 20210100355, filed on 28 May 2021, the entire contents of each of which are incorporated herein by reference.

This disclosure relates to machine learning. Machine learning is generally the process of creating a trained model (e.g., an artificial neural network, tree, or other structure) that demonstrates a generalized fit to an a priori known set of training data. Applying the trained model to new data produces inferences, which can be used to gain insights into the new data. As the use of machine learning expands across various technological domains, sometimes referred to as artificial intelligence tasks, the need for more efficient processing of machine learning model data arises. For example, "edge processing" devices such as mobile devices, always-on devices, and Internet of Things (IoT) devices must balance the implementation of advanced machine learning capabilities with various interrelated design constraints, including packaging size, native computing power, power storage and usage, data communication capability and cost, memory size, and heat dissipation.

Federated learning is a distributed machine learning framework that enables a number of clients, such as edge processing devices, to collaboratively train a shared global model without transferring their local data to a remote server. Generally, a central server coordinates the federated learning process, and each participating client communicates only model parameter information with the central server while keeping its local data private. This distributed approach helps address the limited capabilities of client devices (because training remains feasible on such devices) and often mitigates data privacy concerns. A minimal sketch of the general round structure is given after this paragraph.

While federated learning generally limits the amount of model data in any single transmission between the server and a client (or vice versa), the iterative nature of federated learning still generates a significant amount of data traffic during training, which can be quite costly depending on the device and connection type. It is therefore generally desirable to reduce the size of the data exchanged between server and clients during federated learning. However, conventional methods for reducing this exchange, such as lossy compression of model data, result in inadequate models. Furthermore, conventional federated learning has been shown to lack adequate privacy protection.
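For concreteness, here is a minimal, illustrative sketch of one federated learning round as described above (a FedAvg-style average); it depicts the generic framework only, not the patented compression or privacy scheme, and the function name `federated_round`, the callback `local_step_fn`, and the parameter `lr` are assumptions introduced for illustration.

```python
import numpy as np

def federated_round(global_params, client_datasets, local_step_fn, lr=0.1):
    """Illustrative sketch of one federated learning round (FedAvg-style).

    Each client trains on its own data and returns only updated parameters;
    the raw local data never leaves the client.
    """
    client_updates = []
    for data in client_datasets:
        local_params = global_params.copy()
        # Local training, e.g. a few steps of gradient descent on local data.
        local_params = local_step_fn(local_params, data, lr)
        client_updates.append(local_params)

    # The server aggregates only parameters (here: a simple average) to form
    # the next global model, then repeats for the next round.
    return np.mean(client_updates, axis=0)
```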
- This figure is a diagram illustrating an exemplary federated learning architecture.
- This figure shows Algorithm 1, an example of a sender-side implementation of irreversible relative entropy coding.
- This figure shows an example algorithm for a receiver-side implementation of irreversible relative entropy coding.
- This figure is a schematic diagram showing relative entropy coding performed on federated learning updates.
- This figure shows an exemplary server-side algorithm for applying relative entropy coding to federated learning.
- This figure shows an exemplary client-side algorithm for applying relative entropy coding to federated learning.
- This figure shows an exemplary client-side algorithm for applying differentially private relative entropy coding to federated learning.
- This figure shows an exemplary server-side algorithm for applying differentially private relative entropy coding to federated learning.
- This figure is a schematic diagram showing how differentially private relative entropy coding is performed on federated learning updates.
- This figure shows an exemplary method for performing federated learning according to the embodiments described herein.
- This figure shows another exemplary method for performing federated learning according to the embodiments described herein.
- This figure shows an exemplary processing system that may be configured to perform the methods described herein.
- This figure shows an exemplary processing system that may be configured to perform the methods described herein.

For ease of understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the drawings. It is contemplated that elements and features of one embodiment may be beneficially incorporated into other embodiments without further recitation.

Aspects of this disclosure provide apparatuses, methods, processing systems, and non-transitory computer-readable media for bidirectional compression for efficient private communication in machine learning, and in particular in federated learning. The performance of modern neural-network-based machine learning models is closely tied to the amount of data on which they are trained. At the same time, industry, legislators, and consumers are becoming increasingly aware of the need to protect the privacy of the data that may be used to train such models.