KR-20260068068-A - Quantization-Aware Federated Training to Handle Edge Device Hardware Capabilities


Abstract

A processor-implemented method for quantization-aware federated training includes quantizing a global model by a server. The global model is quantized at multiple different quantization levels for each of one or more subnetwork models to generate one or more quantized subnetwork models. The one or more subnetwork models are assigned to one or more of multiple devices according to device processing capabilities. The server distributes the quantized subnetwork models to one or more of the multiple devices. The server receives model updates from the one or more devices based on local data. Based on the model updates from each of the one or more devices, the server generates an updated global model according to an aggregation function.
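The server-side flow in the abstract (quantize the global model at several levels, then assign each device a level matching its hardware) can be sketched as follows. This is an illustrative sketch only: the uniform symmetric quantizer and every name (`quantize`, `make_subnetworks`, `assign`) are assumptions for illustration, not details taken from the patent.

```python
import numpy as np

def quantize(weights, bits):
    """Uniform symmetric quantization of a weight array to the given bit-width."""
    scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
    q = np.round(weights / scale).astype(np.int32)
    return q * scale  # dequantized ("fake-quantized") weights

def make_subnetworks(global_model, levels):
    """Quantize the global model at multiple quantization levels (bit-widths)."""
    return {bits: {name: quantize(w, bits) for name, w in global_model.items()}
            for bits in levels}

def assign(device_capabilities, levels):
    """Assign each device the highest quantization level its hardware supports."""
    return {dev: max(b for b in levels if b <= cap)
            for dev, cap in device_capabilities.items()}

global_model = {"layer1": np.random.randn(4, 4)}
levels = (4, 8)
subnets = make_subnetworks(global_model, levels)        # one subnetwork per level
assignment = assign({"phone": 8, "sensor": 4}, levels)  # capability-aware assignment
```

The server would then distribute `subnets[assignment[dev]]` to each device `dev`, matching the distribution step in the abstract.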

Inventors

  • Chen, An
  • Meiyuri, Vijaya Datta

Assignees

  • Qualcomm Incorporated

Dates

Publication Date
2026-05-13
Application Date
2024-07-29
Priority Date
2023-09-11

Claims (20)

  1. A processor-implemented method performed by one or more processors, comprising: quantizing, by a server, a global model, the global model being quantized at a plurality of different quantization levels for each of one or more subnetwork models to generate one or more quantized subnetwork models, and the one or more subnetwork models being assigned to one or more of a plurality of devices according to device processing capabilities; distributing, by the server, the quantized subnetwork models to at least one of the plurality of devices; receiving, by the server, a model update from the at least one device based on local data; and generating, by the server, an updated global model according to an aggregation function based on the model update from each of the at least one device.
  2. The processor-implemented method of claim 1, further comprising fine-tuning the updated global model using public data from a backup device.
  3. The processor-implemented method of claim 1, further comprising quantizing, by the server, the updated global model at the plurality of different quantization levels for each of the one or more subnetwork models to generate one or more quantized updated subnetwork models.
  4. The processor-implemented method of claim 1, further comprising applying a normalization process to the updated global model to reduce bias toward server data.
  5. The processor-implemented method of claim 1, in which the model update from the at least one device is a quantized model update based on quantization by the at least one device.
  6. A processor-implemented method performed by one or more processors, comprising: receiving, by a device, a quantized subnetwork model from a server, the quantized subnetwork model corresponding to a global model, the global model being quantized at a plurality of different quantization levels for each of one or more subnetwork models to generate one or more quantized subnetwork models, and the one or more subnetwork models being assigned to one or more of a plurality of devices according to device processing capabilities; generating, by the device, a model update for the quantized subnetwork model based on local data; and transmitting, by the device, the model update to the server, in which the server generates an updated global model based on the model update.
  7. The processor-implemented method of claim 6, in which the updated global model is generated using an aggregation function based on the model update from the device.
  8. The processor-implemented method of claim 6, further comprising quantizing, by the device, the model update to generate a quantized model update, in which the quantized model update is used to generate the updated global model.
  9. The processor-implemented method of claim 8, further comprising applying a first normalization process to the updated global model to reduce a first bias toward server data, or applying a second normalization process to the quantized model update to reduce a second bias toward device data.
  10. The processor-implemented method of claim 8, further comprising repeating the receiving, the generating, and the transmitting for a plurality of training rounds, in which the quantizing is performed on a subset of the training rounds based on the device processing capabilities.
  11. An apparatus, comprising: at least one memory; and at least one processor coupled to the at least one memory, the at least one processor configured to: quantize a global model by a server, the global model being quantized at a plurality of different quantization levels for each of one or more subnetwork models to generate one or more quantized subnetwork models, and the one or more subnetwork models being assigned to one or more of a plurality of devices according to device processing capabilities; distribute, by the server, the quantized subnetwork models to at least one of the plurality of devices; receive, by the server, a model update from the at least one device based on local data; and generate, by the server, an updated global model according to an aggregation function based on the model update from each of the at least one device.
  12. The apparatus of claim 11, in which the at least one processor is further configured to fine-tune the updated global model using public data from a backup device.
  13. The apparatus of claim 11, in which the at least one processor is further configured to quantize the updated global model at the plurality of different quantization levels for each of the one or more subnetwork models to generate one or more quantized updated subnetwork models.
  14. The apparatus of claim 11, in which the at least one processor is further configured to apply a normalization process to the updated global model to reduce bias toward server data.
  15. The apparatus of claim 11, in which the model update from the at least one device is a quantized model update based on quantization by the at least one device.
  16. An apparatus, comprising: at least one memory; and at least one processor coupled to the at least one memory, the at least one processor configured to: receive a quantized subnetwork model from a server, the quantized subnetwork model corresponding to a global model, the global model being quantized at a plurality of different quantization levels for each of one or more subnetwork models to generate one or more quantized subnetwork models, and the one or more subnetwork models being assigned to one or more of a plurality of devices according to device processing capabilities; generate a model update for the quantized subnetwork model based on local data; and transmit the model update to the server, in which the server generates an updated global model based on the model update.
  17. The apparatus of claim 16, in which the updated global model is generated using an aggregation function based on the model update from the apparatus.
  18. The apparatus of claim 16, in which the at least one processor is further configured to quantize the model update to generate a quantized model update, the quantized model update being used to generate the updated global model.
  19. The apparatus of claim 18, in which the at least one processor is further configured to apply a first normalization process to the updated global model to reduce a first bias toward server data, or to apply a second normalization process to the quantized model update to reduce a second bias toward device data.
  20. The apparatus of claim 18, in which the at least one processor is further configured to repeat the receiving, the generating, and the transmitting for a plurality of training rounds, the quantization being performed on a subset of the training rounds based on the device processing capabilities.
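The device-side round recited in the method claims (receive a quantized subnetwork, train on local data, optionally quantize the update before transmission) can be sketched as follows. All names here are assumptions for illustration, and a one-step linear model stands in for an arbitrary subnetwork; the patent does not prescribe this implementation.

```python
import numpy as np

def local_update(subnet_w, x, y, lr=0.1):
    """One gradient step for a linear model; returns the weight delta, not new weights."""
    grad = x.T @ (x @ subnet_w - y) / len(x)
    return -lr * grad

def quantize_update(delta, bits=8):
    """Uniform quantization of the update before uplink (as in claims 8 and 18)."""
    max_abs = float(np.max(np.abs(delta)))
    scale = max_abs / (2 ** (bits - 1) - 1) if max_abs > 0 else 1.0
    return np.round(delta / scale) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 3))                 # local data, never transmitted
y = x @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)                              # quantized subnetwork weights from the server
delta = local_update(w, x, y)                # model update computed on local data
q_delta = quantize_update(delta)             # quantized update sent to the server
```

Only `q_delta` would leave the device, which is the privacy property the description attributes to federated learning: weight updates are shared, raw data is not.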

Description

Quantization-Aware Federated Training to Handle Edge Device Hardware Capabilities

Cross-Reference to Related Applications

This application claims priority to U.S. Patent Application No. 18/465,034, filed on September 11, 2023, titled "QUANTIZATION-AWARE FEDERATED TRAINING TO ADDRESS EDGE DEVICES HARDWARE CAPABILITIES," the entire disclosure of which is expressly incorporated by reference.

Technical Field

Aspects of the present disclosure generally relate to neural networks, and more specifically to quantization-aware federated training for handling edge device hardware capabilities.

Federated learning is an approach for cooperatively training neural networks across multiple edge devices without collecting data in a central location. Because training is decentralized and raw data is not shared among edge devices, federated learning is beneficial for applications where privacy is a critical factor. Federated learning aims to address differential privacy, continual learning, and personalization by having edge (or end) devices perform training locally on collected data and transmit only weight updates rather than raw data. Although federated learning frameworks can address these fundamental problems, performing training on devices is difficult and can be burdensome in terms of memory and compute resources. As a result, some resource-constrained devices may be prevented from participating in the federated learning process. Such restrictions on participation can lead to model bias and reduced model performance.

The present disclosure is set forth in the independent claims. Some aspects of the present disclosure are described in the dependent claims.

In various aspects of the present disclosure, a processor-implemented method comprises quantizing a global model by a server. The global model is quantized at a plurality of different quantization levels for each of one or more subnetwork models to generate one or more quantized subnetwork models. The one or more subnetwork models are assigned to one or more of a plurality of devices according to device processing capabilities. The method also comprises distributing, by the server, the quantized subnetwork models to at least one of the plurality of devices. The method further comprises receiving, by the server, a model update from the at least one device based on local data. The method further comprises generating, by the server, an updated global model according to an aggregation function based on the model update from each of the at least one device.

Some aspects of the present disclosure relate to an apparatus having at least one memory and one or more processors coupled to the at least one memory. The processor(s) are configured to quantize a global model by a server. The global model is quantized at a plurality of different quantization levels for each of one or more subnetwork models to generate one or more quantized subnetwork models. The one or more subnetwork models are assigned to one or more of a plurality of devices according to device processing capabilities. The processor(s) are also configured to distribute the quantized subnetwork models to at least one of the plurality of devices. The processor(s) are further configured to receive a model update from the at least one device based on local data. The processor(s) are further configured to generate an updated global model according to an aggregation function based on the model update from each of the at least one device.

In various aspects of the present disclosure, a processor-implemented method comprises receiving, by a device, a quantized subnetwork model from a server. The quantized subnetwork model corresponds to a global model. The global model is quantized at a plurality of different quantization levels for each of the one or more subnetwork models to generate one or more quantized subnetwork models. The one or more subnetwork models are assigned to one or more of a plurality of devices according to device processing capabilities. The method also comprises generating, by the device, a model update for the quantized subnetwork model based on local data. The method further comprises transmitting, by the device, the model update to the server, and the server generates an updated global model based on the model update.

Some aspects of the present disclosure relate to an apparatus having at least one memory and one or more processors coupled to the at least one memory. The processor(s) are configured to receive a quantized subnetwork model from a server. The quantized subnetwork model corresponds to a global model. The global model is quantized at a plurality of different quantization levels for each of the one or more subnetwork models to generate one or more quantized subnetwork models
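The server's final step, generating the updated global model according to an aggregation function, can be sketched as a FedAvg-style weighted average of device updates. This particular aggregation function is an assumption for illustration; the patent leaves the aggregation function unspecified, and the sample-count weighting shown here is one common choice.

```python
import numpy as np

def aggregate(global_model, updates, num_samples):
    """Apply the sample-weighted average of device updates to the global model."""
    total = sum(num_samples)
    new_model = {}
    for name, w in global_model.items():
        # Weight each device's update by how much local data produced it.
        avg_delta = sum(n * u[name] for u, n in zip(updates, num_samples)) / total
        new_model[name] = w + avg_delta
    return new_model

global_model = {"w": np.zeros(2)}
updates = [{"w": np.array([1.0, 1.0])},   # update from a small-data device
           {"w": np.array([3.0, 3.0])}]   # update from a large-data device
new_model = aggregate(global_model, updates, num_samples=[10, 30])
# weighted average delta = (10*1 + 30*3) / 40 = 2.5
```

The result becomes the next round's global model, which the server would again quantize at the multiple levels and redistribute, matching the iterative rounds described in the claims.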