CN-116205307-B - Federal learning data processing method and device, storage medium and electronic equipment


Abstract

A federal learning data processing method comprises: training a target model to be trained based on a first data set of a first participant to obtain a first model; training the target model jointly with a second participant among the federal learning members to obtain a second model; obtaining a model index of each model, giving a first index for the first model and a second index for the second model; determining, from the first index and the second index, the training contribution degree of the second data set of the second participant to the target model; and training the target model jointly with a target participant among the second participants, wherein the training contribution degree of the target participant is greater than a threshold value. Because only participants whose data sets contribute above the threshold take part in the final training, the training effect of the model is improved.

Inventors

MOU TONG
XU SHICHENG
HE GUANGYU
LIU SIHAN

Assignees

东软集团股份有限公司

Dates

Publication Date: 20260512
Application Date: 20221111

Claims (6)

1. A federal learning data processing method, characterized by being applied to a first participant, the first participant being any one of the federal learning members, the federal learning members being nodes in a blockchain network, the method comprising:
   training a target model to be trained based on a first data set of the first participant to obtain a first model;
   training the target model jointly with a second participant among the federal learning members to obtain a second model;
   obtaining model indexes of the first model and the second model respectively, to obtain a first index of the first model and a second index of the second model, wherein the model indexes comprise one or more of accuracy, recall rate, RP curve and RPC curve;
   determining the training contribution degree of a second data set of the second participant to the target model according to the first index and the second index, wherein a difference value between the second index and the first index is calculated and the difference value is used as the training contribution degree of the second data set to the target model;
   training the target model jointly with a target participant among the second participants, wherein the training contribution degree associated with the target participant is greater than a contribution degree threshold;
   saving the training contribution degree to a blockchain of the blockchain network;
   acquiring the training contribution degree of the first data set generated by the second participant;
   determining a feature dimension difference between the first data set and the second data set in the case that the training contribution degree of the first data set is smaller than a first threshold and the training contribution degree of the second data set is greater than a second threshold, wherein the second threshold is greater than the first threshold; and
   increasing or decreasing feature dimensions in the first data set based on the feature dimension difference.
2. The method of claim 1, wherein obtaining the model indexes of the first model and the second model respectively, to obtain the first index of the first model and the second index of the second model, comprises:
   acquiring the feature dimensions of the data sets of the federal learning members to obtain a dimension set;
   calculating a weight value of each feature dimension according to the dimension set;
   obtaining the model index of the first model in each feature dimension to obtain a third index;
   calculating the first index of the first model based on the third index of the first model in each feature dimension and the weight value of that feature dimension;
   obtaining the model index of the second model in each feature dimension to obtain a fourth index; and
   calculating the second index of the second model based on the fourth index of the second model in each feature dimension and the weight value of that feature dimension.
3. The method of claim 1, wherein, before training the target model to be trained based on the first data set of the first participant, the method further comprises:
   sending an authentication request to an authentication end of the federal learning, wherein the authentication request comprises identity information of the first participant and is used by the authentication end to authenticate the identity of the first participant; and
   receiving an authentication response sent by the authentication end, wherein the authentication response comprises a digital identity credential issued by the authentication end for the first participant in the case that the authentication is passed;
   and wherein training the target model jointly with a second participant among the federal learning members to obtain a second model comprises:
   sending a joint learning request to the second participant, the joint learning request including the digital identity credential and being used by the second participant to verify the identity of the first participant; and
   in the case that the identity verification is passed, training the target model jointly with the second participant among the federal learning members to obtain the second model.
4. A federal learning data processing apparatus, applied to a first participant, the first participant being any one of the federal learning members, the federal learning members being nodes in a blockchain network, the apparatus comprising:
   a first training module, configured to train a target model to be trained based on a first data set of the first participant to obtain a first model;
   a second training module, configured to train the target model jointly with a second participant among the federal learning members to obtain a second model;
   a first obtaining module, configured to obtain model indexes of the first model and the second model respectively, to obtain a first index of the first model and a second index of the second model, wherein the model indexes comprise one or more of accuracy, recall rate, RP curve and RPC curve;
   a first determining module, configured to determine the training contribution degree of a second data set of the second participant to the target model according to the first index and the second index, wherein a difference value between the second index and the first index is calculated and the difference value is used as the training contribution degree of the second data set to the target model;
   a model training module, configured to train the target model jointly with a target participant among the second participants, wherein the training contribution degree associated with the target participant is greater than a contribution degree threshold;
   a storage module, configured to save the training contribution degree to a blockchain of the blockchain network;
   a second obtaining module, configured to obtain the training contribution degree of the first data set generated by the second participant;
   a second determining module, configured to determine a feature dimension difference between the first data set and the second data set when the training contribution degree of the first data set is less than a first threshold and the training contribution degree of the second data set is greater than a second threshold, wherein the second threshold is greater than the first threshold; and
   a second execution module, configured to increase or decrease feature dimensions in the first data set based on the feature dimension difference.
5. A non-transitory computer readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 3.
6. An electronic device, comprising:
   a memory having a computer program stored thereon; and
   a processor for executing the computer program in the memory to carry out the steps of the method according to any one of claims 1 to 3.
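
As an illustration of the contribution calculation recited in claim 1, the following Python sketch computes the training contribution degree as the second index minus the first index and keeps the participants whose contribution degree exceeds the contribution degree threshold. It is a minimal sketch only: the function names, participant identifiers and numeric values are hypothetical and do not come from the claims, and the model index is assumed to be a single scalar such as accuracy.

```python
from typing import Dict, List


def contribution_degrees(first_index: float,
                         second_indexes: Dict[str, float]) -> Dict[str, float]:
    """Contribution of each second participant: second index minus first index."""
    return {pid: idx - first_index for pid, idx in second_indexes.items()}


def select_target_participants(contributions: Dict[str, float],
                               threshold: float) -> List[str]:
    """Keep participants whose contribution is strictly greater than the threshold."""
    return sorted(pid for pid, c in contributions.items() if c > threshold)


# Hypothetical values: accuracy of the locally trained first model, and of the
# second models obtained by joint training with participants B, C and D.
first_index = 0.80
second_indexes = {"B": 0.86, "C": 0.79, "D": 0.81}

contributions = contribution_degrees(first_index, second_indexes)
targets = select_target_participants(contributions, threshold=0.02)
print(targets)
```

With these made-up numbers only participant B clears the threshold, so only B would be retained for the final joint training. Participant C has a negative contribution degree, meaning joint training with C lowered the index relative to the first model.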

Description

Federal learning data processing method and device, storage medium and electronic equipment

Technical Field

The disclosure relates to the technical field of federal learning, and in particular to a federal learning data processing method, a federal learning data processing device, a storage medium and electronic equipment.

Background

Federal learning is a distributed machine learning technique that performs distributed model training among a plurality of data sources holding local data, so that a global model based on virtually fused data can be constructed without exchanging local sample data, thereby enabling shared computation over the data. Blockchain technology can provide an implementation basis for the distributed model training of federal learning, and combining the two greatly improves the availability of federal learning. In some scenarios, however, federal learning may be less effective.

Disclosure of Invention

The disclosure aims to provide a federal learning data processing method, a federal learning data processing device, a storage medium and electronic equipment, so as to solve the related technical problems.
To achieve the above object, according to a first aspect of embodiments of the present disclosure, there is provided a federal learning data processing method applied to a first participant, the first participant being any one of the federal learning members, the method including:
   training a target model to be trained based on a first data set of the first participant to obtain a first model;
   training the target model jointly with a second participant among the federal learning members to obtain a second model;
   obtaining model indexes of the first model and the second model respectively, to obtain a first index of the first model and a second index of the second model;
   determining the training contribution degree of the second data set of the second participant to the target model according to the first index and the second index; and
   training the target model jointly with a target participant among the second participants, the training contribution degree associated with the target participant being greater than a contribution degree threshold.
Optionally, the method further comprises:
   acquiring the training contribution degree of the first data set generated by the second participant;
   determining a feature dimension difference between the first data set and the second data set in the case that the training contribution degree of the first data set is smaller than a first threshold and the training contribution degree of the second data set is greater than a second threshold, wherein the second threshold is greater than the first threshold; and
   increasing or decreasing feature dimensions in the first data set based on the feature dimension difference.
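
The optional feature dimension step above can be sketched as follows. The disclosure does not state here how the feature dimension difference is represented or which dimensions are added or removed, so this sketch assumes that each data set's feature dimensions are a set of names and that the difference is the pair of set differences. All function names, dictionary keys, feature names and thresholds are hypothetical.

```python
from typing import Dict, Set


def feature_dimension_difference(first_dims: Set[str],
                                 second_dims: Set[str]) -> Dict[str, Set[str]]:
    """Dimensions that appear in only one of the two data sets (assumed form)."""
    return {
        "only_in_second": second_dims - first_dims,  # candidates to add to the first data set
        "only_in_first": first_dims - second_dims,   # candidates to remove from the first data set
    }


def plan_dimension_adjustment(first_dims: Set[str],
                              second_dims: Set[str],
                              first_contribution: float,
                              second_contribution: float,
                              first_threshold: float,
                              second_threshold: float) -> Dict[str, Set[str]]:
    """Return the difference only when the threshold condition of the method holds."""
    if second_threshold <= first_threshold:
        raise ValueError("the second threshold must be greater than the first threshold")
    if first_contribution < first_threshold and second_contribution > second_threshold:
        return feature_dimension_difference(first_dims, second_dims)
    # Condition not met: no adjustment is proposed.
    return {"only_in_second": set(), "only_in_first": set()}


# Hypothetical example: the first data set contributes little, the second a lot.
plan = plan_dimension_adjustment(
    first_dims={"age", "income", "zip"},
    second_dims={"age", "income", "credit_score"},
    first_contribution=0.01,
    second_contribution=0.08,
    first_threshold=0.02,
    second_threshold=0.05,
)
print(plan)
```

The sketch only proposes candidate dimensions; whether a given dimension is actually added to or dropped from the first data set is left to the first participant.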
Optionally, obtaining the model indexes of the first model and the second model respectively, to obtain the first index of the first model and the second index of the second model, includes:
   acquiring the feature dimensions of the data sets of the federal learning members to obtain a dimension set;
   calculating a weight value of each feature dimension according to the dimension set;
   obtaining the model index of the first model in each feature dimension to obtain a third index;
   calculating the first index of the first model based on the third index of the first model in each feature dimension and the weight value of that feature dimension;
   obtaining the model index of the second model in each feature dimension to obtain a fourth index; and
   calculating the second index of the second model based on the fourth index of the second model in each feature dimension and the weight value of that feature dimension.
Optionally, determining the training contribution degree of the second data set of the second participant to the target model according to the first index and the second index includes:
   calculating a difference value between the second index and the first index; and
   taking the difference value as the training contribution degree of the second data set to the target model.
Optionally, the model indexes include one or more of accuracy, recall rate, RP curve, and RPC curve.
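
The weighted index calculation above can be illustrated with a short Python sketch. The passage does not say how the weight of each feature dimension is derived from the dimension set, so the sketch assumes one possible rule: a dimension's weight is proportional to the number of members whose data sets contain it. The first and second indexes are then assumed to be weighted sums of the per-dimension third and fourth indexes. All names and numbers are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def dimension_weights(member_dimensions: Iterable[Set[str]]) -> Dict[str, float]:
    """Assumed rule: weight is proportional to how many members hold the dimension."""
    counts = Counter(d for dims in member_dimensions for d in dims)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}


def weighted_index(per_dimension_index: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Weighted sum of per-dimension indexes; a missing dimension counts as 0."""
    return sum(w * per_dimension_index.get(d, 0.0) for d, w in weights.items())


# Hypothetical dimension sets of two federal learning members.
weights = dimension_weights([{"age", "income"}, {"age", "region"}])

# Hypothetical per-dimension indexes of the first model (third index)
# and of the second model (fourth index).
third_index = {"age": 0.8, "income": 0.6, "region": 0.4}
fourth_index = {"age": 0.9, "income": 0.7, "region": 0.6}

first_index = weighted_index(third_index, weights)
second_index = weighted_index(fourth_index, weights)

# Training contribution degree of the second data set: the difference value.
contribution = second_index - first_index
print(round(first_index, 3), round(second_index, 3), round(contribution, 3))
```

Here "age" is held by both members and receives weight 0.5, while "income" and "region" each receive 0.25, so the weights sum to 1 and the weighted indexes stay on the same scale as the per-dimension indexes.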
Optionally, before the training of the target model to be trained based on the first data set of the first participant, the method further includes:
   sending an authentication request to an authentication end of the federal learning, wherein the authentication request comprises identity information of the first participant and is used by the authentication end to authenticate the identity of the first participant;
   receiving an authentication response sent by the authentication end, wherein the authentication response comprises a digital identity credential issued by the authentication end for the first participant in the case that the authentication is passed;
Training the target model b