
EP-4740142-A1 - CONTRASTIVE LEARNING IN A FEDERATED ENVIRONMENT


Abstract

Certain aspects are directed to the implementation of a contrastive learning model in a federated learning environment. In certain aspects, a wireless node may be configured to input a data point into a first neural network configured to output a feature of the data point. In certain aspects, the wireless node may be configured to input the feature into a second neural network and a third neural network, wherein the second neural network is configured to output a predicted client identifier based on the feature, wherein the predicted client identifier is indicative of a predicted source of the data point, and wherein the third neural network is associated with a contrastive loss computation of the data point.
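
As a concrete illustration of this arrangement, the following PyTorch sketch wires the three networks together. It is a minimal, hypothetical rendering: the layer sizes, the module names (encoder, client_head, projection), and the use of fully connected layers are assumptions made for illustration, not details taken from the disclosure.

    # Minimal sketch of the three-network arrangement (all sizes/names assumed).
    import torch
    import torch.nn as nn

    FEAT_DIM, PROJ_DIM, NUM_CLIENTS = 128, 64, 10   # assumed dimensions

    # First network: maps a data point to a feature.
    encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                            nn.Linear(256, FEAT_DIM))
    # Second network: maps the feature to a predicted client identifier.
    client_head = nn.Linear(FEAT_DIM, NUM_CLIENTS)
    # Third network: projection head whose output feeds the contrastive loss.
    projection = nn.Sequential(nn.Linear(FEAT_DIM, PROJ_DIM), nn.ReLU(),
                               nn.Linear(PROJ_DIM, PROJ_DIM))

    x = torch.randn(32, 784)               # a batch of data points
    feature = encoder(x)                   # feature of each data point (a tensor)
    client_logits = client_head(feature)   # predicted client identifier (logits)
    embedding = projection(feature)        # representation used in the contrastive loss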

Inventors

  • LOUIZOS, Christos
  • REISSER, Matthias
  • KORZHENKOV, Denis

Assignees

  • QUALCOMM INCORPORATED

Dates

Publication Date
2026-05-13
Application Date
2024-05-08

Claims (20)

  1. A method for training a federated contrastive learning model, comprising: inputting a data point into a first neural network configured to output a feature of the data point; inputting the feature into a second neural network and a third neural network, wherein the second neural network is configured to output a predicted client identifier based on the feature, wherein the predicted client identifier is indicative of a predicted source of the data point, and wherein the third neural network is associated with a contrastive loss computation of the data point; and outputting: the contrastive loss computation, and a client identifier loss computation based on comparing the predicted client identifier to an actual client identifier, wherein the actual client identifier is indicative of an actual source of the data point.
  2. The method of claim 1, wherein the federated contrastive learning model is an unsupervised learning model.
  3. The method of claim 1, wherein the feature is output as a tensor.
  4. The method of claim 1, wherein the federated contrastive learning model is a framework for contrastive learning of at least one of visual representations, audio, or text, and wherein the data point is a visual representation, an audio instance, or a text.
  5. The method of claim 1, wherein the method further comprises: inputting the feature into a fourth neural network configured to output a predicted label based on the feature, wherein the predicted label is indicative of a predicted label of the data point; and outputting a label loss computation based on comparing the predicted label to an actual label of the data point.
  6. The method of claim 5, wherein the data point is one of a group of multiple data points, wherein each data point in the group of multiple data points comprises the actual label.
  7. The method of claim 5, wherein the federated contrastive learning model is a supervised or a semi-supervised learning model, and wherein the contrastive loss computation is based at least in part on the actual label.
  8. An apparatus configured to train a federated contrastive learning model, comprising: a memory; and at least one processor coupled to the memory and configured to: input a data point into a first neural network configured to output a feature of the data point; input the feature into a second neural network and a third neural network, wherein the second neural network is configured to output a predicted client identifier based on the feature, wherein the predicted client identifier is indicative of a predicted source of the data point, and wherein the third neural network is associated with a contrastive loss computation of the data point; and output: the contrastive loss computation, and a client identifier loss computation based on comparing the predicted client identifier to an actual client identifier, wherein the actual client identifier is indicative of an actual source of the data point.
  9. The apparatus of claim 8, wherein the federated contrastive learning model is an unsupervised learning model.
  10. The apparatus of claim 8, wherein the feature is output as a tensor.
  11. The apparatus of claim 8, wherein the federated contrastive learning model is a framework for contrastive learning of at least one of visual representations, audio, or text, and wherein the data point is a visual representation, an audio instance, or a text.
  12. The apparatus of claim 8, wherein the at least one processor is further configured to: input the feature into a fourth neural network configured to output a predicted label based on the feature, wherein the predicted label is indicative of a predicted label of the data point; and output a label loss computation based on comparing the predicted label to an actual label of the data point.
  13. The apparatus of claim 12, wherein the data point is one of a group of multiple data points, wherein each data point in the group of multiple data points comprises the actual label.
  14. The apparatus of claim 12, wherein the federated contrastive learning model is a supervised or a semi-supervised learning model, and wherein the contrastive loss computation is based at least in part on the actual label.
  15. An apparatus configured to train a federated contrastive learning model, comprising: means for inputting a data point into a first neural network configured to output a feature of the data point; means for inputting the feature into a second neural network and a third neural network, wherein the second neural network is configured to output a predicted client identifier based on the feature, wherein the predicted client identifier is indicative of a predicted source of the data point, and wherein the third neural network is associated with a contrastive loss computation of the data point; and means for outputting: the contrastive loss computation, and a client identifier loss computation based on comparing the predicted client identifier to an actual client identifier, wherein the actual client identifier is indicative of an actual source of the data point.
  16. The apparatus of claim 15, wherein the federated contrastive learning model is an unsupervised learning model.
  17. The apparatus of claim 15, wherein the feature is output as a tensor.
  18. The apparatus of claim 15, wherein the federated contrastive learning model is a framework for contrastive learning of at least one of visual representations, audio, or text, and wherein the data point is a visual representation, an audio instance, or a text.
  19. The apparatus of claim 15, wherein the apparatus further comprises: means for inputting the feature into a fourth neural network configured to output a predicted label based on the feature, wherein the predicted label is indicative of a predicted label of the data point; and means for outputting a label loss computation based on comparing the predicted label to an actual label of the data point.
  20. The apparatus of claim 19, wherein the data point is one of a group of multiple data points, wherein each data point in the group of multiple data points comprises the actual label.
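
The independent claims above combine two (optionally three) loss terms: a contrastive loss from the third network's output and a client identifier loss comparing the predicted and actual client identifiers (claims 1, 8, and 15), plus an optional label loss from a fourth network (claims 5, 12, and 19). The sketch below is one hypothetical realization of these computations: the claims do not name a specific contrastive objective, so an NT-Xent (SimCLR-style) loss over two augmented views is assumed here, and both the identifier loss and the label loss are rendered as cross-entropy.

    import torch
    import torch.nn.functional as F

    def nt_xent(z1, z2, tau=0.5):
        # SimCLR-style contrastive loss over two views of the same batch
        # (an assumed form; the claims leave the objective unspecified).
        z = F.normalize(torch.cat([z1, z2]), dim=1)    # (2N, D) unit-norm embeddings
        sim = z @ z.t() / tau                          # pairwise cosine similarities
        sim.fill_diagonal_(float('-inf'))              # exclude self-similarity
        n = z1.size(0)
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])  # positive-pair indices
        return F.cross_entropy(sim, targets)

    def training_loss(emb1, emb2, client_logits, client_id,
                      label_logits=None, label=None):
        contrastive = nt_xent(emb1, emb2)                        # contrastive loss computation
        client_loss = F.cross_entropy(client_logits, client_id)  # predicted vs. actual client identifier
        total = contrastive + client_loss
        if label_logits is not None:                             # optional label head (claims 5, 12, 19)
            total = total + F.cross_entropy(label_logits, label)
        return total

In the unsupervised setting of claims 2, 9, and 16, only the first two terms apply; the label term corresponds to the supervised or semi-supervised variants of claims 7 and 14.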

Description

CONTRASTIVE LEARNING IN A FEDERATED ENVIRONMENT

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of Greek Patent Application Number 20230100540, titled “CONTRASTIVE LEARNING IN A FEDERATED ENVIRONMENT,” filed July 4, 2023, which is incorporated herein by reference in its entirety.

BACKGROUND

Technical Field

[0002] The present disclosure generally relates to machine learning, and more particularly, to contrastive learning in a federated environment.

Introduction

[0003] Machine learning may produce a trained model (e.g., an artificial neural network, a tree, or other structure) that represents a generalized fit to a set of training data known a priori. Applying the trained model to new data produces inferences, which may be used to gain insights into the new data. In some cases, applying the model to the new data is described as “running an inference” on the new data.

[0004] Machine learning models are seeing increased adoption across myriad domains, including for use in classification, detection, and recognition tasks. For example, machine learning models are being used to perform complex tasks on electronic devices based on sensor data provided by one or more sensors onboard such devices, such as automatically classifying features (e.g., faces) within images.

[0005] One machine learning approach is “centralized learning.” In centralized machine learning, an electronic communication device or “edge device” (e.g., a mobile phone, laptop, tablet, desktop computer, smart TV, client server, etc.) is communicatively connected to a central server and configured to upload its local data to the server. The central server typically performs all of the computational tasks necessary to train the model on that data. While centralized training is computationally efficient for the participating clients, who are freed from the computational burden, it also requires collecting the clients’ private data, which poses a privacy risk to the clients.

[0006] Another machine learning approach, “federated learning,” uses a decentralized training process. In some examples, multiple edge devices download a model (e.g., a pre-trained foundation model) from a central server. The edge devices then train the model on their private data to generate new configurations of the model. The edge devices summarize and encrypt their corresponding new model configurations and send them back to the server to be decrypted, averaged, and integrated into an updated model. Iteration after iteration, the collaborative training continues until the model is fully trained.

[0007] There exists a need for further improvements in machine learning technology. These improvements may be applicable to different types of machine learning models, including federated learning models.

SUMMARY

[0008] The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
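
As a minimal illustration of the federated averaging step described in paragraph [0006], the following hypothetical sketch averages the parameter tensors returned by the participating edge devices; the summarization, encryption, and client-selection details mentioned there are omitted.

    import torch

    def fedavg(client_state_dicts):
        # Average each parameter across the clients' locally trained models.
        avg = {}
        for name in client_state_dicts[0]:
            avg[name] = torch.stack(
                [sd[name].float() for sd in client_state_dicts]
            ).mean(dim=0)
        return avg

    # One round: each client trains locally on its private data, the server
    # integrates the average into an updated model, and the process repeats
    # until the model is fully trained.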
[0009] Aspects of the disclosure are directed to a method for training a federated contrastive learning model. In some examples, the method includes inputting a data point into a first neural network configured to output a feature of the data point. In some examples, the method includes inputting the feature into a second neural network and a third neural network, wherein the second neural network is configured to output a predicted client identifier based on the feature, wherein the predicted client identifier is indicative of a predicted source of the data point, and wherein the third neural network is associated with a contrastive loss computation of the data point. In some examples, the method includes outputting: the contrastive loss computation, and a client identifier loss computation based on comparing the predicted client identifier to an actual client identifier, wherein the actual client identifier is indicative of an actual source of the data point.

[0010] Aspects of the disclosure are directed to an apparatus configured to train a federated contrastive learning model. In some examples, the apparatus includes a memory and at least one processor coupled to the memory. In some examples, the memory includes instructions configured to cause the apparatus to input a data point into a first neural network configured to output a feature of the data point. In some examples, the memory includes instructions configured to cause the apparatus to input the feature into a second neural network and a third neural network, wherein the second neural network is configured to output a predicted client identifier based on the feature, wherein the