
CN-121998132-A - Federated learning method, device, equipment, and medium for guaranteeing fairness of heterogeneous data

CN 121998132 A

Abstract

The application discloses a federated learning method, device, equipment, and medium for guaranteeing fairness of heterogeneous data. The method comprises the following steps. S1: in each round of federated learning, after each client uses its local data to train the weight parameters of its local model and update the local model, it calculates the fairness index of the local model and the accuracy of the local model's predictions. S2: the server calculates a global fairness index from the global data statistics aggregated from the clients and broadcasts it to each client. S3: each client updates the weight parameters of its local model according to the gap between its local fairness index and the global fairness index; the server then aggregates the weighted parameters and broadcasts the global-model weight parameters to each client for the next round of training, until the set number of training rounds is reached. The application can capture bias in the global data distribution and improve both data security and the objectivity of the server's weight-aggregation result.
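The per-client metrics of step S1 can be sketched as follows. Because the exact formulas are elided in this translation, the fairness index is assumed here to be an equal-opportunity gap (the difference in true-positive rates between the two sensitive-attribute groups), which matches the abstract's description but is not necessarily the claimed formula.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Accuracy as defined in claim 4, step S13:
    (TP + TN) / (TP + TN + FP + FN)."""
    return float((y_pred == y_true).mean())

def fairness_index(y_true, y_pred, sensitive):
    """Assumed form of the local fairness index: the gap in true-positive
    rates between the minority (A=0) and majority (A=1) groups of the
    sensitive attribute A, i.e. |TPR(A=0) - TPR(A=1)|."""
    def tpr(a):
        mask = (sensitive == a) & (y_true == 1)
        return float(y_pred[mask].mean()) if mask.any() else 0.0
    return abs(tpr(0) - tpr(1))
```

A client would evaluate both functions on its local data after each round of local training and report only the resulting scalars, not the data itself.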

Inventors

  • Tian Jianwei
  • Zhu Shi
  • Sun Yizhen
  • Tian Zheng
  • Zhu Hongyu
  • Luo Haokai
  • Yang Huanchi
  • Yin Yong
  • Xu Qing
  • Chen Yibo

Assignees

  • Information and Communication Branch of State Grid Hunan Electric Power Co., Ltd. (国网湖南省电力有限公司信息通信分公司)
  • State Grid Hunan Electric Power Co., Ltd. (国网湖南省电力有限公司)
  • State Grid Corporation of China (国家电网有限公司)

Dates

Publication Date
2026-05-08
Application Date
2026-04-09

Claims (10)

  1. A federated learning method for guaranteeing fairness of heterogeneous data, characterized by comprising the following steps: S1, in each round of federated learning, after each client uses its local data to train the weight parameters of its local model and update the local model, calculating the fairness index of the local model and the accuracy of the local model's predictions; S2, the server calculating a global fairness index from the global data statistics aggregated from the clients and broadcasting it to all clients; and S3, each client updating the weight parameters of its local model according to the gap between its local fairness index and the global fairness index, the server then aggregating the weighted parameters under a secure aggregation protocol and broadcasting the updated global-model weight parameters to all clients for the next round of training, until the set number of training rounds is reached.
  2. The federated learning method for guaranteeing fairness of heterogeneous data according to claim 1, further comprising the step of: S0, initializing the parameters of the global model and the number of training rounds at the server, setting the initial weight parameters of the local models at the respective clients, and the server aggregating global data statistics from the clients using a secure aggregation protocol.
  3. The federated learning method for guaranteeing fairness of heterogeneous data according to claim 2, wherein step S0 specifically includes the following steps: S01, global-model parameter initialization: randomly sampling from a Gaussian distribution to initialize the parameters of the global model, and setting the number of training rounds of the model; S02, initial weight distribution: the server distributes an initial weight to each client model k, where k is the client index and K is the total number of clients, the initial weights being allocated in proportion to each client's data amount; S03, data-statistics aggregation: the server aggregates global data statistics from the clients using a secure aggregation protocol, namely P(Y=1|A=1) and P(Y=1|A=0), representing the probability that the label value is positive in the majority and minority sensitive-attribute groups, respectively, across the clients' data sets, where Y denotes the actual value of the dependent variable, A denotes the unevenly distributed sensitive attribute, A=0 denotes the smaller-proportion group, A=1 denotes the larger-proportion group, and P denotes probability.
  4. The federated learning method for guaranteeing fairness of heterogeneous data according to claim 3, wherein step S1 specifically includes the following steps: S11, local-model update: in each round of federated learning, each client k trains its local model on its local data, updating the weight parameters to obtain an updated local model; S12, local fairness-index calculation: each client calculates the fairness index of its local model, where k denotes the client index and Ŷ denotes the model's prediction; the fairness index measures, among the samples of each sensitive-attribute group whose true label is positive, the gap between the probabilities of incorrect and correct prediction, i.e. the difference in true-positive rates of the model across sensitive-attribute groups; S13, accuracy calculation: each client calculates the accuracy of its local model as (TP + TN) / (TP + TN + FP + FN), where TP is the number of samples correctly predicted as positive, TN the number correctly predicted as negative, FP the number incorrectly predicted as positive, and FN the number incorrectly predicted as negative.
  5. The federated learning method for guaranteeing fairness of heterogeneous data according to claim 1, wherein step S2 specifically includes the following steps: S21, calculating the global fairness index from the data statistics aggregated from the clients, where Ŷ=1 denotes that the overall model prediction is positive and Y=1 denotes that the actual label value is positive; S22, broadcasting the calculated global fairness index to each client so that the clients can update the weights of their local models according to the broadcast result.
  6. The federated learning method for guaranteeing fairness of heterogeneous data according to claim 1, wherein step S3 specifically includes the following steps: S31, fairness-gap calculation: each client k calculates the gap between its local fairness index and the global fairness index, using the global accuracy and the local accuracy of client k as well; when the fairness index yields a valid result, the gap is the absolute difference between the local and global fairness indexes; otherwise, the gap is defined as the absolute difference between the local and global model accuracies; S32, weight update: each client's local-model weight parameter is updated according to its fairness gap, where a fairness budget parameter controls the influence of the fairness constraint on the weight update, and the weight parameters of the clients' local models are finally normalized over the K clients; S33, global-model update: the weighted client local models are aggregated using a secure aggregation protocol to update the weight parameters of the global model, and the updated global model is broadcast to all clients for the next round of training, until the set number of training rounds is reached.
  7. A federated learning device for guaranteeing fairness of heterogeneous data, comprising: a client fairness-metric calculation module, configured to, in each round of federated learning, after each client uses its local data to train the weight parameters of its local model and update the local model, calculate the fairness index of the local model and the accuracy of the local model's predictions; a server fairness-metric calculation module, configured to calculate a global fairness index from the global data statistics aggregated from the clients and broadcast it to all clients; and a client-model and global-model iterative update module, configured to have each client update the weight parameters of its local model according to the gap between its local fairness index and the global fairness index, and then broadcast the global-model weight parameters, aggregated and weighted under a secure aggregation protocol, to all clients for the next round of training, until the set number of training rounds is reached.
  8. The federated learning device for guaranteeing fairness of heterogeneous data according to claim 7, further comprising: a system-model initialization module, configured to initialize the parameters of the global model and the number of training rounds at the server, set the initial weight parameters of the local models at the respective clients, and have the server aggregate global data statistics from the clients using a secure aggregation protocol.
  9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the federated learning method for guaranteeing fairness of heterogeneous data according to any one of claims 1 to 6.
  10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the federated learning method for guaranteeing fairness of heterogeneous data according to any one of claims 1 to 6.
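The weight update and aggregation described in claim 6 (steps S31–S33) can be sketched as below. Because the claimed equations are elided in this translation, the exponential down-weighting of clients with large fairness gaps and the plain weighted average standing in for secure aggregation are assumptions, not the patented formulas.

```python
import numpy as np

def update_weights(weights, local_fair, global_fair, beta=1.0):
    """S31/S32: compute each client's fairness gap |F_k - F_global|,
    shrink its aggregation weight accordingly (exponential rule assumed;
    beta is the fairness budget parameter), then renormalise so the
    weights sum to 1 over the K clients."""
    gaps = np.abs(np.asarray(local_fair, dtype=float) - global_fair)
    w = np.asarray(weights, dtype=float) * np.exp(-beta * gaps)
    return w / w.sum()

def aggregate_global(client_params, weights):
    """S33: weighted average of client model parameters; a real
    deployment would compute this under a secure aggregation protocol
    so the server never sees individual updates in the clear."""
    return sum(w * p for w, p in zip(weights, client_params))
```

A client whose local fairness index is far from the global index thus receives a smaller aggregation weight, reducing its pull on the global model in the next round.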

Description

Federated learning method, device, equipment and medium for guaranteeing fairness of heterogeneous data

Technical Field

The application relates to the technical field of electric power, and in particular to a federated learning method, device, equipment, and medium for guaranteeing fairness of heterogeneous data.

Background

Currently, with the widespread use of federated learning techniques, model fairness and privacy protection are supported by policy and have attracted broad attention. How to achieve model fairness across different groups while protecting privacy has become a focal point, and a difficulty, for all participants in the artificial intelligence field. Federated learning can cooperatively train a machine learning model among multiple parties while keeping their local data private, and has been widely used in power, emergency, and other scenarios. However, participants in federated learning in the power industry often have different data distributions; a decrease in model fairness can lead to poor predictions for certain populations and open a large gap between the contributions of different participants to the model. A federated learning system that achieves group fairness across different populations (e.g., specific genders, ages, or occupations) is therefore indispensable, and is key to making federated learning algorithms trustworthy in the power industry. In recent years, several approaches to achieving group fairness have been proposed in centralized environments. However, these methods are all implemented at the server and rely on information extracted from each participant by the server, such as node performance and data distribution, which requires centralized access to sensitive information about each data point, reducing the security level of each participant's data.
They are therefore unsuitable for federated learning. Specifically, existing methods for guaranteeing group fairness in federated learning fall into client-side and server-side approaches. In client-side processing, each participant trains with a local debiasing algorithm so that its model adapts to its own data distribution and local fairness is achieved first; the server then aggregates the parties' model parameters or gradient information so that the whole federated learning system reaches a relatively fair state. However, the data of one client may mainly contain a certain group while the data of other clients contain other groups; in this case, local debiasing may fail to capture bias in the global data distribution. For example, power-industry data may differ considerably from emergency data. Server-side processing mostly modifies participant selection and model aggregation, designing various fairness mechanisms. For participant selection, a long-term fairness constraint can be introduced so that each client's probability of being selected is not below a certain threshold, guaranteeing that poorly performing clients have an equal opportunity to be selected; various client parameters (such as data amount, computing capacity, network performance, and data quality) can also be weighed comprehensively in the selection stage so that the clients participating in training are more varied, improving data diversity and model generalization. For model aggregation, the data quality of each client can be estimated to ensure that clients with high-quality data contribute more to the global model, thereby achieving aggregation fairness.
Both server-side approaches require the server to obtain additional information from each participant client, such as task performance, network performance, data-volume distribution, data-label distribution, and gradient distribution. This can result in private data being shared with the server, easily causing data-security problems, since such information may need to be provided separately by each party. In addition, the evaluation of data quality is somewhat subjective and can affect the server's weight-aggregation result.

Disclosure of Invention

The application provides a federated learning method for guaranteeing fairness of heterogeneous data, solving the technical problems that the prior art cannot capture bias in the global data distribution, easily causes data-security problems, and relies on subjective data-quality evaluation that affects the server's weight-aggregation result. The application is realized by the following scheme: The fed