
CN-121766393-B - Method and system for data-free federated incremental learning for devices under resource constraints

CN121766393B

Abstract

A local model is trained on the incremental class data stream through a bias-correction mechanism to obtain trained local parameters, which are uploaded to the cloud; the cloud computes parameter differences to obtain the update gradient of each edge computing device, computes the federated average update direction and each gradient's degree of deviation from it, computes aggregation weights, and performs weighted fusion of the update gradients to obtain an updated global model. Knowledge distillation is then carried out on synthetic samples to transfer the knowledge of old tasks into the updated global model, and the final global model is issued to the edge. The invention thereby realizes federated incremental learning on resource-constrained edge devices through bias-corrected training, aggregation that jointly perceives deviation degree and information entropy, and attention-guided data synthesis and distillation.

Inventors

  • Gong Xinrong
  • Ke Zibin
  • Yang Kaixiang
  • Shi Yifan
  • Lin Qi
  • Zeng Haixin
  • Zeng Huanqiang
  • Zhu Jianqing
  • Chen Jing

Assignees

  • Huaqiao University (华侨大学)

Dates

Publication Date
2026-05-08
Application Date
2026-03-03

Claims (6)

  1. A data-free federated incremental learning method for devices under resource constraints, characterized by comprising the following steps: S1, each edge computing device obtains the incremental class data stream of the current task, computes its local class distribution from the data stream, and, under the constraint of zero historical-data storage, trains a local lightweight model on the incremental class data stream through a bias-correction mechanism to obtain trained local model parameters; wherein training the local lightweight model through the bias-correction mechanism based on the incremental class data stream to obtain the trained local model parameters comprises (a minimal sketch of this loss appears after the claims): computing from the incremental class data stream the local class distribution $\{p^t_{k,c}\}$ of each edge computing device, where $p^t_{k,c}$ denotes the frequency with which class $c$ of the $t$-th task occurs on the $k$-th edge computing device; inputting a sample $x$ of the incremental class data stream into the local lightweight model to obtain the original logit value $z_c$, and calibrating it with the class probability $p^t_{k,c}$ to obtain the adjusted logit value $\tilde{z}_c$: $\tilde{z}_c = z_c + \gamma \log(p^t_{k,c} + \epsilon)$, where $\gamma$ and $\epsilon$ are coefficients controlling the calibration intensity; computing the cross-entropy loss $\mathcal{L}_{ce}$ of the data stream based on the adjusted logit values; computing the empirical marginal distribution of the model over the data stream as $\bar{q}_c = \frac{1}{B}\sum_{i=1}^{B}\mathrm{softmax}(\tilde{z}^{(i)}/T)_c, \; c \in \mathcal{Y}_t$, where $B$ is the batch size, $T$ is the temperature coefficient, and $\mathcal{Y}_t$ denotes the class-label set of the $t$-th task; then computing the KL divergence between the empirical marginal distribution and a uniform prior distribution to obtain the prior-matching loss $\mathcal{L}_{pm}$; and finally forming a total loss from the cross-entropy loss $\mathcal{L}_{ce}$ and the prior-matching loss $\mathcal{L}_{pm}$, and updating the weight parameters of the local lightweight model by back-propagation until convergence to obtain the trained local model parameters; S2, the cloud server collects the local model parameters uploaded by each edge computing device and computes the difference between each set of model parameters and the current global model parameters to obtain the update gradient of each edge computing device; S3, the cloud server computes the federated average update direction based on the update gradients of the edge computing devices, computes aggregation weights based on each device's update gradient, its degree of deviation from the federated average update direction, and the information entropy of its local class distribution, and performs weighted aggregation of the update gradients with the aggregation weights to obtain an updated global model; S4, the cloud server constructs a generator containing an attention-guidance module and, without the generator touching any original data, generates synthetic samples of the old tasks by minimizing a diversity loss; S5, the cloud server transfers the knowledge of the old tasks to the updated global model through knowledge distillation based on the synthetic samples of the old tasks, obtains the final updated global model, and issues it to each edge computing device.
  2. The data-free federated incremental learning method for devices under resource constraints according to claim 1, wherein in S2, computing the difference between each set of model parameters and the current global model parameters to obtain the update gradient of each edge computing device specifically comprises: denoting the local model parameters uploaded by the $k$-th edge computing device as $\theta_k$ and the current global model parameters as $\theta_g$, computing the vector difference $\Delta_k = \theta_k - \theta_g$ between the local model parameters and the current global model parameters, and taking $\Delta_k$ as the update gradient of the $k$-th edge computing device (see the aggregation sketch after the claims).
  3. The data-free federated incremental learning method for devices under resource constraints according to claim 1, wherein in S3, obtaining the updated global model specifically comprises: computing the federated average update direction $\bar{\Delta}$ from the update gradients $\Delta_k$ of the edge computing devices: $\bar{\Delta} = \frac{1}{K}\sum_{k=1}^{K}\Delta_k$, where $K$ is the number of edge computing devices; computing the information entropy $H_k$ of the local class distribution of each edge computing device: $H_k = -\sum_{c \in \mathcal{Y}_t} p^t_{k,c}\log p^t_{k,c}$, where $p^t_{k,c}$ denotes the frequency with which class $c$ of the $t$-th task occurs on the $k$-th edge computing device, and $\mathcal{Y}_t$ denotes the class-label set of the $t$-th task; computing the distribution-awareness weight $w^{dist}_k$ from the information entropy: $w^{dist}_k = \frac{\exp(\beta H_k)}{\sum_{j}\exp(\beta H_j)}$, where $\beta$ is a coefficient controlling the weight-balancing strength; computing the cosine similarity $s_k = \cos(\Delta_k, \bar{\Delta})$ between each device's update gradient $\Delta_k$ and the federated average update direction $\bar{\Delta}$, taking the cosine similarity $s_k$ as the deviation degree, and computing the direction-consistency weight $w^{dir}_k = \sigma(\kappa s_k)$ through the Sigmoid function $\sigma(\cdot)$, where $\kappa$ is an adjusting coefficient; computing the final aggregation weight $w_k$ from the distribution-awareness weight and the direction-consistency weight: $w_k = \frac{w^{dist}_k \, w^{dir}_k}{\sum_j w^{dist}_j \, w^{dir}_j}$; performing the weighted summation of the update gradients with the final aggregation weights $w_k$, specifically: $\Delta_g = \sum_{k} w_k \Delta_k$; and finally adding the resulting global model update $\Delta_g$ to the current global model parameters to obtain the updated global model (see the aggregation sketch after the claims).
  4. The data-free federated incremental learning method for devices under resource constraints according to claim 1, wherein generating synthetic samples of the old tasks by minimizing the diversity loss in S4 comprises: taking the model corresponding to the current global model parameters as the teacher model and freezing the network weights of the teacher model; sampling random noise $z \sim \mathcal{N}(0, I)$ from a Gaussian distribution and inputting it into the generator $G$ to produce candidate samples $\hat{x} = G(z)$; inputting the candidate samples into the teacher model to obtain feature maps, and modulating the feature maps spatially, channel-wise, and class-wise through the attention-guidance module to obtain the feature representations $f$; computing the cross-entropy loss $\mathcal{L}_{ce}$ between the teacher model's predicted outputs on the candidate samples and their pseudo-labels; computing the statistical-alignment loss $\mathcal{L}_{bn}$ from the batch statistics of the candidate samples at each batch-normalization layer of the teacher model and the means and variances stored in the teacher model; and computing the diversity loss $\mathcal{L}_{div}$ from the feature representations $f$ as follows: $\mathcal{L}_{div} = \sum_{c} w_c \frac{1}{|S_c|(|S_c|-1)} \sum_{i \neq j} \cos(f^c_i, f^c_j)$, where $w_c$ is a weight inversely proportional to the frequency of class $c$; $f^c_i$ and $f^c_j$ denote the feature vectors, processed by the attention-guidance module, of the $i$-th and $j$-th synthetic samples of class $c$; $S_c$ denotes the set of synthetic samples of class $c$; and $\cos(\cdot,\cdot)$ denotes the cosine-similarity function; minimizing the cross-entropy loss $\mathcal{L}_{ce}$, the statistical-alignment loss $\mathcal{L}_{bn}$, and the diversity loss $\mathcal{L}_{div}$, and updating the weight parameters of the generator by back-propagation until convergence to obtain the trained generator; and inputting random noise into the trained generator and taking the generated samples as the synthetic samples of the old tasks (see the diversity-loss sketch after the claims).
  5. The data-free federated incremental learning method for devices under resource constraints according to claim 1, wherein in S5, transferring the knowledge of the old tasks to the updated global model through knowledge distillation based on the synthetic samples of the old tasks to obtain the final updated global model specifically comprises: taking the synthetic samples $\hat{x}$ of the old tasks, taking the model corresponding to the current global model parameters as the teacher model, and taking the updated global model as the student model; inputting the synthetic samples $\hat{x}$ into the teacher model and the student model respectively, and computing the Softmax functions controlled by the temperature coefficient $T$ to obtain the teacher's soft targets $q^T$ and the student's predictions $q^S$; computing the KL divergence between the teacher's soft targets $q^T$ and the student's predictions $q^S$ to obtain the distillation loss $\mathcal{L}_{kd}$; and minimizing the distillation loss $\mathcal{L}_{kd}$ and updating the weight parameters of the student model by back-propagation until convergence to obtain the final updated global model (see the distillation sketch after the claims).
  6. A data-free federated incremental learning system for devices under resource constraints, comprising: a local-model-parameter acquisition module, by which each edge computing device obtains the incremental class data stream of the current task, computes its local class distribution from the data stream, and, under the constraint of zero historical-data storage, trains a local lightweight model on the incremental class data stream through a bias-correction mechanism to obtain trained local model parameters; wherein training the local lightweight model through the bias-correction mechanism based on the incremental class data stream to obtain the trained local model parameters comprises: computing from the incremental class data stream the local class distribution $\{p^t_{k,c}\}$ of each edge computing device, where $p^t_{k,c}$ denotes the frequency with which class $c$ of the $t$-th task occurs on the $k$-th edge computing device; inputting a sample $x$ of the incremental class data stream into the local lightweight model to obtain the original logit value $z_c$, and calibrating it with the class probability $p^t_{k,c}$ to obtain the adjusted logit value $\tilde{z}_c$: $\tilde{z}_c = z_c + \gamma \log(p^t_{k,c} + \epsilon)$, where $\gamma$ and $\epsilon$ are coefficients controlling the calibration intensity; computing the cross-entropy loss $\mathcal{L}_{ce}$ of the data stream based on the adjusted logit values; computing the empirical marginal distribution of the model over the data stream as $\bar{q}_c = \frac{1}{B}\sum_{i=1}^{B}\mathrm{softmax}(\tilde{z}^{(i)}/T)_c, \; c \in \mathcal{Y}_t$, where $B$ is the batch size, $T$ is the temperature coefficient, and $\mathcal{Y}_t$ denotes the class-label set of the $t$-th task; then computing the KL divergence between the empirical marginal distribution and a uniform prior distribution to obtain the prior-matching loss $\mathcal{L}_{pm}$; and finally forming a total loss from the cross-entropy loss $\mathcal{L}_{ce}$ and the prior-matching loss $\mathcal{L}_{pm}$, and updating the weight parameters of the local lightweight model by back-propagation until convergence to obtain the trained local model parameters; a gradient-information update module, by which the cloud server collects the local model parameters uploaded by each edge computing device and computes the difference between each set of model parameters and the current global model parameters to obtain the update gradient of each edge computing device; a global-model update module, by which the cloud server computes the federated average update direction based on the update gradients of the edge computing devices, computes aggregation weights based on each device's update gradient, its degree of deviation from the federated average update direction, and the information entropy of its local class distribution, and performs weighted aggregation of the update gradients with the aggregation weights to obtain an updated global model; a synthetic-sample generation module, by which the cloud server constructs a generator containing an attention-guidance module and, without the generator touching any original data, generates synthetic samples of the old tasks by minimizing a diversity loss; and a model-issuing module, by which the cloud server transfers the knowledge of the old tasks to the updated global model through knowledge distillation based on the synthetic samples of the old tasks, obtains the final updated global model, and issues it to each edge computing device.
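
The bias-corrected local training of claim 1 combines a frequency-calibrated cross-entropy with a prior-matching term. Below is a minimal sketch in PyTorch, assuming the standard logit-adjustment form of the calibration; `alpha`, `eps`, `temperature`, and the combination coefficient `lam` are illustrative names standing in for the patent's coefficients, not identifiers from the source.

```python
import torch
import torch.nn.functional as F

def bias_corrected_loss(logits, labels, class_freq, alpha=1.0, eps=1e-8,
                        temperature=2.0, lam=0.1):
    """Sketch of the claim-1 local loss: logit calibration by local class
    frequency, cross-entropy on the calibrated logits, and a KL term
    pulling the batch's empirical marginal toward a uniform prior."""
    # Calibrate logits with the local class frequencies; alpha and eps
    # stand in for the two calibration-intensity coefficients.
    adjusted = logits + alpha * torch.log(class_freq + eps)
    ce = F.cross_entropy(adjusted, labels)
    # Empirical marginal: batch average of temperature-softened predictions.
    marginal = F.softmax(adjusted / temperature, dim=1).mean(dim=0)
    uniform = torch.full_like(marginal, 1.0 / marginal.numel())
    # KL(marginal || uniform) as the prior-matching loss.
    prior = F.kl_div(uniform.log(), marginal, reduction="sum")
    # lam is an assumed weighting coefficient for the total loss.
    return ce + lam * prior
```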
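Claims 2 and 3 together define the server-side aggregation: parameter differences as per-device update gradients, an entropy-based distribution-awareness weight, and a direction-consistency weight from the cosine similarity to the federated average direction. A minimal sketch follows, assuming all parameters are flattened into vectors; the softmax form of the entropy weight matches the reconstruction in claim 3 but is an assumption, as are the names `beta` and `kappa`.

```python
import torch
import torch.nn.functional as F

def aggregate(global_params, local_params, class_dists, beta=1.0, kappa=5.0):
    """Sketch of claims 2-3: entropy- and direction-aware weighted fusion
    of per-device updates into a new global model."""
    # Claim 2: update gradient = local parameters - current global parameters.
    deltas = [p - global_params for p in local_params]
    avg_dir = torch.stack(deltas).mean(dim=0)  # federated average update direction
    # Information entropy of each device's local class distribution; a
    # softmax over beta-scaled entropies gives the distribution weight.
    ent = torch.stack([-(d * (d + 1e-12).log()).sum() for d in class_dists])
    w_dist = torch.softmax(beta * ent, dim=0)
    # Deviation degree: cosine similarity between each update and the
    # average direction, squashed by a sigmoid (kappa adjusts sharpness).
    cos = torch.stack([F.cosine_similarity(d.flatten(), avg_dir.flatten(), dim=0)
                       for d in deltas])
    w_dir = torch.sigmoid(kappa * cos)
    w = w_dist * w_dir
    w = w / w.sum()  # normalized final aggregation weights
    update = sum(wi * di for wi, di in zip(w, deltas))  # weighted fusion
    return global_params + update  # updated global model parameters
```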
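The diversity loss of claim 4 penalizes pairwise cosine similarity among the attention-modulated features of synthetic samples of the same class, with inverse-frequency weights so that rare classes are diversified more strongly. A minimal sketch, assuming `features` holds the attention-guided feature vectors and `class_freq` is a tensor indexed by class id (both names are illustrative):

```python
import torch
import torch.nn.functional as F

def diversity_loss(features, labels, class_freq):
    """Sketch of the claim-4 diversity loss: mean pairwise cosine
    similarity within each class's synthetic samples, weighted
    inversely to class frequency."""
    loss = features.new_zeros(())
    for c in labels.unique():
        f = F.normalize(features[labels == c], dim=1)
        n = f.size(0)
        if n < 2:
            continue  # no pairs to compare for this class
        sim = f @ f.t()  # pairwise cosine similarities of unit-norm features
        mask = ~torch.eye(n, dtype=torch.bool, device=f.device)  # drop self-pairs
        w_c = 1.0 / (class_freq[c] + 1e-8)  # inverse-frequency weight
        loss = loss + w_c * sim[mask].mean()
    return loss
```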
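Claim 5's distillation step is the standard temperature-scaled KL divergence between the teacher's soft targets and the student's predictions on the synthetic samples. A minimal sketch (the T-squared scaling is conventional in distillation and not stated in the claim):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Sketch of the claim-5 distillation loss: KL divergence between the
    teacher's and the student's temperature-softened distributions."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_student = F.log_softmax(student_logits / T, dim=1)
    # KL(teacher || student); the T*T factor keeps gradient magnitudes
    # comparable across temperatures (standard practice).
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * T * T
```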

Description

Method and system for data-free federated incremental learning for devices under resource constraints

Technical Field

The invention relates to the technical field of transfer learning, and in particular to a method and a system for data-free federated incremental learning for devices under resource constraints.

Background

In Internet of Things (IoT) applications such as smart security and industrial monitoring, massive amounts of sensor data are typically processed by lightweight devices (e.g., smart cameras and embedded development boards) distributed at the network edge. Federated Learning (FL), as a privacy-preserving distributed training paradigm, is widely used in edge intelligence. However, real IoT edge environments face more serious challenges than traditional federated learning: 1. long-tailed and dynamic data distributions: the data streams collected by edge devices typically exhibit severe long-tailed distributions, and new object classes keep appearing over time; 2. strict resource constraints: the storage space of IoT edge devices is extremely small, and historical data of old tasks cannot be saved, which disables traditional incremental learning methods based on data replay; 3. heterogeneity among devices: sensor data distributions differ greatly across locations, causing the update directions of local models to conflict, so that a high-performance global model is difficult to aggregate. The prior art generally relies on a public dataset for assistance or requires devices to cache portions of historical data, which both violates the privacy-protection principle and exceeds the storage limits of edge devices. Therefore, an edge federated learning scheme is needed that requires no data caching and can accommodate long-tailed and dynamic environments.

Disclosure of Invention

To solve the above problems, the invention provides a data-free federated incremental learning method and system for devices under resource constraints, which, through bias-corrected local lightweight training, gradient aggregation weighted by information entropy and direction consistency, and attention-guided data synthesis with knowledge distillation, achieve continual learning of new tasks while effectively preventing forgetting of old-task knowledge, under the constraints of zero historical-data storage, no raw-data uploading, and low communication and computation overhead.
In one aspect, a data-free federated incremental learning method for devices under resource constraints includes: S1, each edge computing device obtains the incremental class data stream of the current task, computes its local class distribution from the data stream, and, under the constraint of zero historical-data storage, trains a local lightweight model on the incremental class data stream through a bias-correction mechanism to obtain trained local model parameters; S2, the cloud server collects the local model parameters uploaded by each edge computing device and computes the difference between each set of model parameters and the current global model parameters to obtain the update gradient of each edge computing device; S3, the cloud server computes the federated average update direction based on the update gradients of the edge computing devices, computes aggregation weights based on each device's update gradient, its degree of deviation from the federated average update direction, and the information entropy of its local class distribution, and performs weighted aggregation of the update gradients with the aggregation weights to obtain an updated global model; S4, the cloud server constructs a generator containing an attention-guidance module and, without the generator touching any original data, generates synthetic samples of the old tasks by minimizing a diversity loss; S5, the cloud server transfers the knowledge of the old tasks to the updated global model through knowledge distillation based on the synthetic samples of the old tasks, obtains the final updated global model, and issues it to each edge computing device. Further, in S1, training the local lightweight model on the incremental class data stream through the bias-correction mechanism to obtain the trained local model parameters includes: computing from the incremental class data stream the local class distribution $\{p^t_{k,c}\}$ of each edge computing device, where $p^t_{k,c}$ denotes the frequency with which class $c$ of the $t$-th task occurs on the $k$-th edge computing device; and inputting a sample $x$ of the incremental class data stream into the local lightweight model to obtain the original logit value