CN-116187431-B - Federated learning distillation method and device for a non-independent and identically distributed (non-IID) scenario

CN116187431B

Abstract

The invention relates to the technical field of artificial intelligence and provides a federated learning distillation method and device for a non-independent and identically distributed (non-IID) scenario. The method labels the unlabeled data of a target terminal using three similarities: the similarity between the target terminal's unlabeled data and its own initially labeled data, the first feature similarity between the target terminal's unlabeled data and the initially labeled data of other terminals, and the second feature similarity between the target terminal's initially labeled data and the initially labeled data of other terminals. This enriches the training samples of the initial teacher model and the base model, greatly improves their training efficiency, strengthens the generalization ability of the resulting target teacher model and student model, and improves the accuracy of the aggregated model obtained by federated learning. In addition, by combining knowledge distillation with federated learning, the student model can learn knowledge that is entirely absent from its own terminal: even when its local data carries no relevant labels, the relevant knowledge can still be learned through federated learning.

Inventors

  • Shen Chaofeng
  • Wu Yijun
  • Zhu Yanshu
  • Liang Qianneng

Assignees

  • 安徽科讯金服科技有限公司

Dates

Publication Date
2026-05-08
Application Date
2023-02-13

Claims (10)

  1. A federated learning distillation method for a non-IID scenario, applied to a target terminal, wherein the data and/or labels of the terminals under the target server to which the target terminal belongs are non-independent and identically distributed, the method comprising the following steps: determining initially labeled data and unlabeled data of the target terminal, and labeling the unlabeled data based on the similarity between the unlabeled data and the target terminal's initially labeled data to obtain first labeled data; performing label alignment between the target terminal and the other terminals under the target server, labeling the unlabeled data based on the first feature similarity between the unlabeled data and the other terminals' initially labeled data to obtain second labeled data, and determining third labeled data based on the second feature similarity between the target terminal's initially labeled data and the other terminals' initially labeled data; training an initial teacher model based on the target terminal's initially labeled data, the first labeled data, the second labeled data, the third labeled data, and the label alignment result to obtain a target teacher model, and performing label prediction on the target terminal's initially labeled data with the target teacher model to obtain soft labels for that data; locally distilling a base model based on the soft labels of the target terminal's initially labeled data, the first labeled data, the second labeled data, and the third labeled data to obtain a student model, and performing federated learning based on the student model; wherein the initially labeled data and the unlabeled data are obtained by the target terminal clustering its local data, the local data comprise pictures or private data, and the labels carried by the initially labeled data comprise object categories in the pictures or private-data categories.
  2. The federated learning distillation method for a non-IID scenario according to claim 1, wherein the first feature similarity is determined by the following steps: determining an initial feature extraction model, and extracting first feature vectors of the unlabeled data with the initial feature extraction model; applying differential privacy protection to a preset structure in the initial feature extraction model to obtain a target feature extraction model; sending the target feature extraction model to the other terminals, and receiving second feature vectors of the initially labeled data extracted by the other terminals with the target feature extraction model; and determining the similarity between the first feature vectors and the second feature vectors as the first feature similarity.
  3. The federated learning distillation method for a non-IID scenario according to claim 2, wherein the second feature similarity is determined by the following steps: extracting third feature vectors of the target terminal's initially labeled data with the initial feature extraction model; and determining the similarity between the third feature vectors and the second feature vectors as the second feature similarity.
  4. The federated learning distillation method for a non-IID scenario according to any one of claims 1-3, wherein performing federated learning based on the student model comprises: uploading the student model to the target server; and receiving an aggregated model obtained by the target server through federated averaging of the student models uploaded by the terminals, and cyclically performing local distillation with the aggregated model as the base model until the federated learning ends.
  5. The federated learning distillation method for a non-IID scenario according to any one of claims 1-3, wherein labeling the unlabeled data based on the similarity between the unlabeled data and the target terminal's initially labeled data to obtain first labeled data comprises: determining, in the target terminal's initially labeled data, the first similar data with the largest similarity to the unlabeled data, and labeling the unlabeled data with the label carried by the first similar data to obtain the first labeled data.
  6. The federated learning distillation method for a non-IID scenario according to any one of claims 1-3, wherein labeling the unlabeled data based on the first feature similarity between the unlabeled data and the other terminals' initially labeled data to obtain second labeled data comprises: determining, in the other terminals' initially labeled data, the second similar data with the largest first feature similarity to the unlabeled data, and labeling the unlabeled data with the label carried by the second similar data to obtain the second labeled data.
  7. The federated learning distillation method for a non-IID scenario according to any one of claims 1-3, wherein determining third labeled data based on the second feature similarity between the target terminal's initially labeled data and the other terminals' initially labeled data comprises: calculating the average of the labels carried by third similar data and fourth similar data, wherein the third similar data are the target terminal's initially labeled data whose second feature similarity exceeds a preset threshold, and the fourth similar data are the other terminals' initially labeled data whose second feature similarity exceeds the preset threshold; and taking the label average as the label of the third similar data to obtain the third labeled data.
  8. A federated learning distillation device for a non-IID scenario, applied to a target terminal, wherein the data and/or labels of the terminals under the target server to which the target terminal belongs are non-independent and identically distributed, the device comprising: a data aggregation module for determining initially labeled data and unlabeled data of the target terminal, and labeling the unlabeled data based on the similarity between the unlabeled data and the target terminal's initially labeled data to obtain first labeled data; a data labeling module for performing label alignment between the target terminal and the other terminals under the target server, labeling the unlabeled data based on the first feature similarity between the unlabeled data and the other terminals' initially labeled data to obtain second labeled data, and determining third labeled data based on the second feature similarity between the target terminal's initially labeled data and the other terminals' initially labeled data; a label prediction module for training an initial teacher model based on the target terminal's initially labeled data, the first labeled data, the second labeled data, the third labeled data, and the label alignment result to obtain a target teacher model, and performing label prediction on the target terminal's initially labeled data with the target teacher model to obtain soft labels for that data; and a federated distillation module for locally distilling a base model based on the soft labels, the first labeled data, the second labeled data, and the third labeled data to obtain a student model, and performing federated learning based on the student model; wherein the initially labeled data and the unlabeled data are obtained by the target terminal clustering its local data, the local data comprise pictures or private data, and the labels carried by the initially labeled data comprise object categories in the pictures or private-data categories.
  9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the federated learning distillation method for a non-IID scenario according to any one of claims 1-7.
  10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the federated learning distillation method for a non-IID scenario according to any one of claims 1-7.
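The similarity-based pseudo-labeling in claims 1 and 5-7 can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: the cosine metric, the function names, and the threshold default are all choices made here for concreteness.

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between rows of a and rows of b.
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T

def label_by_nearest(unlabeled, labeled, labels):
    # Claims 5/6: give each unlabeled sample the label carried by the
    # most similar labeled sample.
    sim = cosine_sim(unlabeled, labeled)
    return labels[np.argmax(sim, axis=1)]

def third_label_data(local_feats, local_labels, remote_feats,
                     remote_labels, thresh=0.8):
    # Claim 7: average the labels of local ("third similar") and remote
    # ("fourth similar") samples whose pairwise feature similarity
    # exceeds a preset threshold.
    sim = cosine_sim(local_feats, remote_feats)
    i, j = np.where(sim > thresh)
    if len(i) == 0:
        return None
    return (local_labels[i].mean() + remote_labels[j].mean()) / 2
```

In practice the feature vectors would come from the feature extraction models of claims 2-3; here they are plain numpy arrays.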

Description

Federated learning distillation method and device for a non-IID scenario

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a federated learning distillation method and device for a non-independent and identically distributed (non-IID) scenario.

Background

Federated learning (FL) is a model training paradigm in which each of many scattered terminal devices preliminarily trains, on its local data, a global model issued by a server; each terminal then uploads its preliminarily trained local model to the server, which aggregates the uploaded local models and issues the aggregated model back to the terminals. Federated learning thus keeps local data from being leaked, effectively protecting its privacy, while still fully exploiting massive, scattered local data for model training to obtain a local model with better fitting performance. Because federated learning allows participants to cooperatively train a model without sharing data, it protects local data privacy and breaks down data islands; it has therefore attracted wide attention and is widely applied in distributed training scenarios. In such scenarios, however, many conventional distributed machine learning algorithms assume a uniform data distribution, i.e., the data across terminal devices must be independent and identically distributed (IID).
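The upload-aggregate-redistribute loop described above (invoked as "federated average aggregation" in claim 4) is commonly implemented as FedAvg, a data-size-weighted parameter average on the server. A minimal server-side sketch, with all names illustrative:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    # Federated averaging: each client's parameters are weighted by its
    # share of the total training data, layer by layer.
    # client_weights: list (per client) of lists (per layer) of arrays.
    total = sum(client_sizes)
    num_layers = len(client_weights[0])
    return [
        sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in range(num_layers)
    ]
```

The aggregated parameters would then be issued back to the terminals as the next round's base model.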
In real life, however, the generation of local data cannot be controlled: local data arises independently on different terminal devices, so when many scattered terminals participate in federated learning, their local data, and even the labels that data carries, may be non-independent and identically distributed (non-IID). This significantly reduces model training efficiency in federated learning and weakens model generalization; moreover, after federated learning, the accuracy of the resulting aggregated model improves little or even degrades. It is therefore important to improve the model training efficiency of federated learning in non-IID scenarios, to improve model generalization, and to improve the accuracy of the aggregated model.

Disclosure of Invention

The invention provides a federated learning distillation method and device for a non-IID scenario to address the above defects of the prior art.
The invention provides a federated learning distillation method for a non-IID scenario, applied to a target terminal, wherein the data and/or labels of the terminals under the target server to which the target terminal belongs are non-independent and identically distributed, the method comprising the following steps: determining initially labeled data and unlabeled data of the target terminal, and labeling the unlabeled data based on the similarity between the unlabeled data and the target terminal's initially labeled data to obtain first labeled data; labeling the unlabeled data based on the first feature similarity between the unlabeled data and the other terminals' initially labeled data to obtain second labeled data, and determining third labeled data based on the second feature similarity between the target terminal's initially labeled data and the other terminals' initially labeled data; performing label alignment between the target terminal and the other terminals under the target server, training an initial teacher model based on the target terminal's initially labeled data, the first labeled data, the second labeled data, the third labeled data, and the label alignment result to obtain a target teacher model, and performing label prediction on the target terminal's initially labeled data with the target teacher model to obtain soft labels for that data; and locally distilling a base model based on the soft labels, the first labeled data, the second labeled data, and the third labeled data to obtain a student model, and performing federated learning based on the student model.
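The local distillation step, training the student (base) model against the teacher's soft labels alongside the hard (pseudo-)labels, can be sketched with the standard temperature-scaled distillation loss. The temperature, the weighting factor, and the names here are assumptions for illustration, not values fixed by the patent.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis.
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      T=2.0, alpha=0.5):
    # Soft-label term: cross-entropy between the temperature-softened
    # teacher distribution (the "soft labels") and the student's.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    soft = -(p_t * np.log(p_s + 1e-12)).sum(axis=-1).mean()
    # Hard-label term: ordinary cross-entropy on the (pseudo-)labels
    # produced by the first/second/third labeled data.
    p = softmax(student_logits)
    rows = np.arange(len(hard_labels))
    hard = -np.log(p[rows, hard_labels] + 1e-12).mean()
    # T*T rescaling keeps the soft-term gradient magnitude comparable.
    return alpha * soft * T * T + (1 - alpha) * hard
```

In training, this scalar would be minimized over the student's parameters each local round before uploading the student model.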
According to the federated learning distillation method for a non-IID scenario provided by the invention, the first feature similarity is determined by the following steps: determining an initial feature extraction model, and extracting first feature vectors of the unlabeled data with the initial feature extraction model; applying differential privacy protection to a preset structure in the initial feature extraction model to obtain a target feature extraction model; sending the target feature extraction model to the other terminals, whereupon second feature vectors of the initially labeled data, extracted by the other terminals with the target feature extraction model, are received.
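The differential-privacy protection of a preset structure, and the cross-terminal feature similarity computed afterwards, can be sketched as follows. The Gaussian-mechanism perturbation of one layer's weights and the cosine metric are illustrative assumptions; the patent does not fix either choice in this passage.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_noise_weights(weights, sigma=0.1, clip=1.0):
    # Gaussian-mechanism style protection of a chosen layer's weights:
    # clip the parameter norm, then add calibrated noise before the
    # model is shared with other terminals.
    norm = max(np.linalg.norm(weights), 1e-12)
    clipped = weights * min(1.0, clip / norm)
    return clipped + rng.normal(0.0, sigma * clip, size=weights.shape)

def first_feature_similarity(local_vec, remote_vec):
    # Cosine similarity between a local unlabeled-sample feature vector
    # and a remote labeled-sample feature vector returned by the other
    # terminal using the protected extraction model.
    return float(local_vec @ remote_vec /
                 (np.linalg.norm(local_vec) * np.linalg.norm(remote_vec)))
```

The noised extractor is what travels to the other terminals; only feature vectors, not raw data, come back, which is what keeps the local data private.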