CN-121690859-B - Private data secure sharing method based on federated learning


Abstract

The invention discloses a federated-learning-based method for securely sharing private data, in the field of private data security protection and sharing. It addresses two problems in existing cross-institution data collaboration: insufficient model reliability, and a lack of precise control over data exchange. The method applies homomorphic encryption to each participant's local data and, in combination with federated learning, builds a verifiable computing task over the ciphertext. Outsourced computing nodes perform forward and backward propagation on the ciphertext, zero-knowledge proofs are used to verify that this computation was carried out correctly, and gradient aggregation is completed in the ciphertext domain before the model is updated. After training, the output of the federated learning model is used to judge the sharing value of each local data sample, and that judgment drives ciphertext re-encryption and data exchange control. The result is secure, controllable multiparty data circulation that improves the efficiency and decision accuracy of cross-institution data collaboration while meeting privacy, security, and compliance requirements.

Inventors

LIU BIN
Hong Alan
Ge Jiangling
XIE QIFENG
Wan Nianbin
CAO LIMING

Assignees

Xiamen University of Technology (厦门理工学院)

Dates

Publication Date: 20260508
Application Date: 20260206

Claims (5)

1. A federated-learning-based method for securely sharing private data, characterized by comprising the following steps: S1, encrypting the local private data of each participant with an additive homomorphic encryption algorithm to generate homomorphic ciphertext data, and extracting the feature dimension and sample count of the private data to generate a data descriptor; S2, constructing, based on the data descriptor and a predefined federated learning model structure, a verifiable computing task description file containing a layer operation sequence and activation function types; S3, distributing the homomorphic ciphertext data and the verifiable computing task description file, as computing task sub-packages, to mutually isolated external computing nodes, each of which performs forward propagation and backward propagation in the ciphertext domain according to the description file to generate encrypted gradient update fragments; S4, after each external computing node completes its computation, generating a zero-knowledge proof for the ciphertext computation process; S5, verifying all encrypted gradient update fragments and their corresponding zero-knowledge proofs in parallel, and performing a homomorphic aggregation operation on the fragments that pass verification to generate a global encrypted gradient; S6, each participant decrypting the global encrypted gradient with its private key and updating its copy of the federated learning model based on the decrypted plaintext gradient; S7, performing ciphertext re-encryption on each participant's local homomorphic ciphertext data according to the output of the federated learning model copy, generating re-encrypted data packets for multiparty secure data exchange, and uploading the re-encrypted data packets to a shared data pool.
In step S2, constructing the verifiable computing task description file containing the layer operation sequence and activation function types based on the data descriptor and the predefined federated learning model structure specifically comprises: predefining the number of input-layer nodes and the parallel computation paths of the federated learning model, marking the hierarchical connections among the input layer, hidden layers, and output layer of the model structure, and mapping the feature dimension and sample count distribution data obtained by parsing the data descriptor onto the model structure; generating, layer by layer according to the hierarchical connections, a forward-propagation sequence of linear transformation operations and a sequence of nonlinear activation operations, while constructing the backward-propagation gradient return path in step, so as to form a continuous, executable ciphertext computation flow; converting the operation nodes and connections of each layer in the ciphertext computation flow into a standardized arithmetic circuit structure, and mapping each activation function operation to an equivalent polynomial computation sub-circuit, so as to form a verifiable computation circuit topology covering the entire forward and backward propagation process; and jointly encapsulating the arithmetic circuit topology and the sample distribution data to generate the verifiable computing task description file.
In step S3, generating the encrypted gradient update fragments specifically comprises: packaging the homomorphic ciphertext data corresponding to different batch identifiers, together with the verifiable computing task description file, into independent computing task sub-packages and sending them to different external computing nodes; each external computing node parsing the arithmetic circuit topology in its computing task sub-package and sequentially executing, in the homomorphic ciphertext domain, the forward propagation and backward propagation operations of the federated learning model, forming layer-by-layer recursive ciphertext intermediate results; and, based on the ciphertext intermediate results, deriving the parameter gradients in ciphertext along the backward propagation path and generating, layer by layer, encrypted gradient update fragments corresponding to the federated learning model structure.
In step S4, generating, after each external computing node completes its computation, the zero-knowledge proof for the current ciphertext computation process specifically comprises: after the homomorphic-ciphertext forward and backward propagation is complete, each external computing node mapping the layer-by-layer recursive ciphertext intermediate results and the encrypted gradient update fragments onto a corresponding arithmetic circuit instance computation trace, based on the arithmetic circuit topology in the verifiable computing task description file; generating, from that computation trace, a proof-generation input that satisfies the zero-knowledge constraints; and generating a zero-knowledge proof consistent with the arithmetic circuit execution path of the ciphertext computation process.
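Step S2 compiles each layer into an arithmetic circuit in which the activation function is replaced by a polynomial sub-circuit, and step S4 turns the executed circuit into a computation trace. The following is a minimal sketch of both ideas for a single neuron, under stated assumptions: the gate format and the names `Gate`, `compile_neuron`, `run_trace`, and `check_trace` are hypothetical; values are plaintext floats rather than ciphertexts or finite-field elements; and the sigmoid is replaced by its degree-3 Taylor expansion around 0, which is only accurate for small inputs. It does not produce a zero-knowledge proof. `check_trace` only shows the per-gate constraints that a proof system would attest to.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Gate:
    op: str          # "add", "mul", or "cmul" (multiply a wire by a public constant)
    a: int           # index of the first input wire
    b: int = 0       # index of the second input wire (unused by "cmul")
    c: float = 0.0   # public constant (used only by "cmul")

def compile_neuron(n_inputs, coeffs):
    """Compile y = P(sum_i w_i * x_i) into gates, with P given by coeffs [c0, c1, ...].
    Wire 0 holds the constant 1, wires 1..n hold x, wires n+1..2n hold w."""
    gates = []

    def emit(gate):
        gates.append(gate)
        return 2 * n_inputs + len(gates)   # index of the wire this gate writes

    # Linear transformation: z = sum_i w_i * x_i
    z = emit(Gate("mul", 1, n_inputs + 1))
    for i in range(1, n_inputs):
        prod = emit(Gate("mul", 1 + i, n_inputs + 1 + i))
        z = emit(Gate("add", z, prod))
    # Activation as a polynomial sub-circuit, evaluated by Horner's rule
    acc = emit(Gate("cmul", 0, c=coeffs[-1]))
    for ck in reversed(coeffs[:-1]):
        prod = emit(Gate("mul", acc, z))
        const = emit(Gate("cmul", 0, c=ck))
        acc = emit(Gate("add", prod, const))
    return gates, acc

def eval_gate(gate, wires):
    if gate.op == "add":
        return wires[gate.a] + wires[gate.b]
    if gate.op == "mul":
        return wires[gate.a] * wires[gate.b]
    return gate.c * wires[gate.a]          # "cmul"

def run_trace(gates, x, w):
    """Execute the circuit and record every wire value (the computation trace)."""
    wires = [1.0] + list(x) + list(w)
    for gate in gates:
        wires.append(eval_gate(gate, wires))
    return wires

def check_trace(gates, wires, n_inputs):
    """Check that every gate output in the trace equals its gate function."""
    base = 1 + 2 * n_inputs
    return all(math.isclose(wires[base + k], eval_gate(g, wires), abs_tol=1e-12)
               for k, g in enumerate(gates))

# Degree-3 Taylor expansion of the sigmoid around 0: 1/2 + z/4 - z^3/48
SIGMOID_POLY = [0.5, 0.25, 0.0, -1.0 / 48.0]
gates, out = compile_neuron(2, SIGMOID_POLY)
trace = run_trace(gates, x=[0.4, 0.2], w=[0.5, 1.0])
print(trace[out], check_trace(gates, trace, 2))
```

Evaluating the polynomial by Horner's rule costs one multiplication and one addition per coefficient, which keeps the sub-circuit, and any proof generated over it, small.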
2. The federated-learning-based method for securely sharing private data according to claim 1, wherein in S1, extracting the feature dimension and sample count of the private data to generate the data descriptor specifically comprises: performing field-level parsing of each participant's local private data and dividing it, according to a preset data structure template, into a feature field set and a sample index sequence, forming a structurally consistent plaintext feature matrix; and mapping the plaintext feature matrix to ciphertext element by element with an additive homomorphic encryption operator, generating homomorphic ciphertext data blocks that preserve the original addition relations, performing sample aggregation on the homomorphic ciphertext data blocks, counting the feature dimension of the aggregated samples and the sample count distribution data including batch identifiers, and encoding the statistics into a standardized data descriptor.
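Claim 2 can be illustrated with a toy additive homomorphic scheme. This sketch assumes Paillier encryption with the common generator choice g = n + 1, two fixed Mersenne primes that are far too small for real security, a hypothetical fixed-point scale, and hypothetical record fields (`patient_id`, `age`, `bmi`) and batch names; the patent itself does not name a specific scheme or descriptor format.

```python
import json
import math
import random

def paillier_keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier keys from two fixed Mersenne primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)                                 # valid for generator g = n + 1
    return n, (n, lam, mu)

def paillier_encrypt(n, m):
    """Enc(m) = (n + 1)^m * r^n mod n^2, using (n + 1)^m = 1 + m*n (mod n^2)."""
    n2 = n * n
    r = random.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def paillier_decrypt(priv, c):
    n, lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m                    # decode signed integers

SCALE = 1000  # hypothetical fixed-point scale: 27.1 -> 27100

def encrypt_and_describe(records, feature_fields, index_field, batch_size, n):
    """Split records into a sample index sequence and a feature matrix, encrypt the
    matrix element by element, and summarise it as a JSON data descriptor."""
    index_seq = [rec[index_field] for rec in records]
    matrix = [[rec[f] for f in feature_fields] for rec in records]
    cipher = [[paillier_encrypt(n, round(v * SCALE)) for v in row] for row in matrix]
    batches = {}                                         # batch identifier -> sample indices
    for pos, sample_id in enumerate(index_seq):
        batches.setdefault(f"batch-{pos // batch_size}", []).append(sample_id)
    descriptor = {
        "feature_dim": len(feature_fields),
        "sample_count": len(index_seq),
        "batch_distribution": {b: len(ids) for b, ids in batches.items()},
    }
    return cipher, batches, json.dumps(descriptor, sort_keys=True)

pub, priv = paillier_keygen()
records = [
    {"patient_id": "p01", "age": 54, "bmi": 27.1},
    {"patient_id": "p02", "age": 61, "bmi": 31.4},
    {"patient_id": "p03", "age": 47, "bmi": 22.8},
    {"patient_id": "p04", "age": 70, "bmi": 25.0},
    {"patient_id": "p05", "age": 39, "bmi": 29.6},
]
cipher, batches, descriptor = encrypt_and_describe(records, ["age", "bmi"], "patient_id", 2, pub)
print(descriptor)
```

Multiplying two ciphertexts modulo n² decrypts to the sum of their plaintexts, which is the "preserve the original addition relations" property the claim relies on.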
3. The federated-learning-based method for securely sharing private data according to claim 1, wherein in S5, verifying all the encrypted gradient update fragments and the corresponding zero-knowledge proofs in parallel, performing the homomorphic aggregation operation on the fragments that pass verification, and generating the global encrypted gradient specifically comprises: receiving the encrypted gradient update fragments and corresponding zero-knowledge proofs from the external computing nodes, and concurrently checking the validity of each zero-knowledge proof with the matching verification algorithm and verification key; selecting all encrypted gradient update fragments whose zero-knowledge proofs are valid to form a valid fragment set, and aligning and rearranging them according to the hierarchical structure and model parameters of the federated learning model; and, using the additive homomorphic encryption operator, adding together layer by layer all encrypted gradient values in the valid fragment set that belong to the same model parameter to obtain the global encrypted gradient component of every model parameter, and combining these components into the complete global encrypted gradient.
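For an additive scheme such as Paillier, the screening and aggregation in claim 3 reduce to multiplying, modulo n², the ciphertexts that belong to each parameter. In this sketch the zero-knowledge verifier is replaced by a caller-supplied `verify` function (a hypothetical stand-in that here just reads a flag), fragments are keyed by hypothetical `(layer, parameter)` pairs, and the toy key sizes are insecure.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Toy Paillier parameters (generator g = N + 1); far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def enc(m):
    return (1 + (m % N) * N) * pow(random.randrange(1, N), N, N2) % N2

def dec(c):
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m

def aggregate(fragments, verify):
    """fragments: list of (proof, {(layer, param): ciphertext}).
    verify: proof -> bool, a hypothetical stand-in for a real ZK verifier."""
    # Check all proofs concurrently.
    with ThreadPoolExecutor() as pool:
        verdicts = list(pool.map(verify, [proof for proof, _ in fragments]))
    # Keep only the fragments whose proof was accepted.
    valid = [grads for (_, grads), ok in zip(fragments, verdicts) if ok]
    # Align by (layer, parameter) and add homomorphically: Enc(a) * Enc(b) = Enc(a + b).
    global_grad = {}
    for grads in valid:
        for key, c in grads.items():
            global_grad[key] = global_grad[key] * c % N2 if key in global_grad else c
    return dict(sorted(global_grad.items())), len(valid)

fragments = [
    ({"accepted": True}, {("hidden", "w0"): enc(120), ("output", "b0"): enc(-30)}),
    ({"accepted": True}, {("hidden", "w0"): enc(-20), ("output", "b0"): enc(50)}),
    ({"accepted": False}, {("hidden", "w0"): enc(9999), ("output", "b0"): enc(9999)}),
]
global_grad, n_valid = aggregate(fragments, verify=lambda proof: proof["accepted"])
print(n_valid, {k: dec(c) for k, c in global_grad.items()})
```

The rejected third fragment never touches the sum, so a node that fails verification cannot skew the global gradient.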
4. The federated-learning-based method for securely sharing private data according to claim 1, wherein in S6, each participant decrypting the global encrypted gradient with its private key and updating the federated learning model copy based on the decrypted plaintext gradient specifically comprises: each participant performing homomorphic decryption on the global encrypted gradient with its locally stored private key to obtain a global gradient vector in plaintext; and mapping the plaintext global gradient vector, according to the predefined federated learning model structure and hierarchical connections, into model parameter adjustment amounts, and applying those adjustments to the participant's local copy of the federated learning model.
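The mapping in claim 4 from a flat plaintext gradient vector onto the model's layers might look like the following. The sketch starts after Paillier-style decryption has produced signed fixed-point integers and assumes a hypothetical scale, learning rate, and layer shapes, plus a plain gradient-descent update; the claim itself only states that the decrypted gradient is mapped to parameter adjustments.

```python
SCALE = 1000          # hypothetical fixed-point scale applied before encryption
LEARNING_RATE = 0.1   # hypothetical step size

def unflatten(vector, shapes):
    """Split a flat gradient vector into per-layer 2-D lists, following the
    (rows, cols) shape of each layer in model order."""
    layers, pos = {}, 0
    for name, (rows, cols) in shapes.items():
        chunk = vector[pos:pos + rows * cols]
        layers[name] = [chunk[r * cols:(r + 1) * cols] for r in range(rows)]
        pos += rows * cols
    if pos != len(vector):
        raise ValueError("gradient length does not match the model structure")
    return layers

def apply_global_gradient(model, decrypted, shapes):
    """Decode fixed-point values, map them onto the layers, and take one gradient step."""
    grad = unflatten([v / SCALE for v in decrypted], shapes)
    return {
        name: [[w - LEARNING_RATE * g for w, g in zip(w_row, g_row)]
               for w_row, g_row in zip(model[name], grad[name])]
        for name in shapes
    }

shapes = {"hidden": (2, 2), "output": (1, 2)}           # hypothetical model structure
model = {"hidden": [[0.5, -0.3], [0.1, 0.8]], "output": [[1.0, -1.0]]}
decrypted = [200, -100, 0, 50, 300, -300]               # decrypted global gradient
updated = apply_global_gradient(model, decrypted, shapes)
print(updated)
```

Raising an error on a length mismatch catches a global gradient that does not match the agreed model structure before any parameter is changed.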
5. The federated-learning-based method for securely sharing private data according to claim 1, wherein in S7, performing ciphertext re-encryption on each participant's local homomorphic ciphertext data according to the output of the federated learning model copy, generating the re-encrypted data packets for multiparty secure data exchange, and uploading them to the shared data pool specifically comprises: performing ciphertext inference on the local homomorphic ciphertext data with each participant's local federated learning model copy, outputting a prediction result label for each sample, and constructing a sample-level data sharing judgment sequence from the prediction result labels; establishing the same sharing policy rules for all participants, generating a corresponding set of key conversion parameters in combination with the data sharing judgment sequence, and binding and packaging the key conversion parameters with the sample index identifiers to form structured re-encryption control instructions; and invoking a re-encryption operator to perform key mapping transformation on each participant's local homomorphic ciphertext data according to the re-encryption control instructions, generating re-encrypted ciphertext data blocks that only the target participant can decrypt, packaging these blocks into re-encrypted data packets, and writing the packets into the shared data pool.
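The claims do not name the re-encryption operator. As a plainly swapped-in stand-in, this sketch uses the BBS98 ElGamal-style proxy re-encryption scheme over a tiny safe-prime group, together with a hypothetical shared policy (a score threshold) that turns per-sample predictions into the sharing judgment sequence and control instructions of claim 5. Assumptions: prediction scores are given as plaintext values, these ciphertexts are not the Paillier-style homomorphic ciphertexts used in the other steps, the helper names are hypothetical, and BBS98 re-key generation needs both secret keys, so it only fits settings where that is acceptable.

```python
import random

# Toy safe-prime group: P = 2*Q + 1 with Q prime; G = 4 generates the order-Q
# subgroup of squares mod P. Far too small for real use.
P, Q, G = 2039, 1019, 4
SHARE_THRESHOLD = 0.7   # hypothetical sharing rule, identical for all participants

def keygen():
    sk = random.randrange(1, Q)
    return sk, pow(G, sk, P)

def encrypt(pk, m):
    """BBS98 ciphertext (m * g^r, pk^r) for an integer 0 < m < P."""
    r = random.randrange(1, Q)
    return m * pow(G, r, P) % P, pow(pk, r, P)

def rekey(sk_from, sk_to):
    return sk_to * pow(sk_from, -1, Q) % Q   # key conversion parameter b / a mod Q

def reencrypt(rk, ct):
    c1, c2 = ct
    return c1, pow(c2, rk, P)                # g^(a*r) becomes g^(b*r)

def decrypt(sk, ct):
    c1, c2 = ct
    g_r = pow(c2, pow(sk, -1, Q), P)         # recover g^r
    return c1 * pow(g_r, -1, P) % P

def build_instructions(scores, sample_ids, rk, target):
    """Sharing judgment sequence -> structured re-encryption control instructions."""
    judgments = [s >= SHARE_THRESHOLD for s in scores]
    return [{"sample_id": sid, "target": target, "rk": rk}
            for sid, share in zip(sample_ids, judgments) if share]

def execute(instructions, local_cipher, shared_pool):
    for ins in instructions:
        ct = reencrypt(ins["rk"], local_cipher[ins["sample_id"]])
        shared_pool.append({"sample_id": ins["sample_id"], "target": ins["target"],
                            "ciphertext": ct})

a_sk, a_pk = keygen()   # data owner (participant A)
b_sk, _ = keygen()      # target participant B
local = {"s1": encrypt(a_pk, 123), "s2": encrypt(a_pk, 456), "s3": encrypt(a_pk, 789)}
instructions = build_instructions([0.91, 0.35, 0.78], ["s1", "s2", "s3"],
                                  rekey(a_sk, b_sk), "participant-B")
pool = []
execute(instructions, local, pool)
print([(e["sample_id"], decrypt(b_sk, e["ciphertext"])) for e in pool])
```

Only samples whose score passes the shared rule are re-encrypted, and the pool entries are decryptable by the target participant's key rather than the owner's.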

Description

Technical Field

The invention relates to the technical field of private data security protection and sharing, and in particular to a federated-learning-based method for securely sharing private data.

Background

With the deepening application of artificial intelligence and big data technologies in fields such as healthcare, financial risk control, government collaboration, and the industrial internet, cross-institution collaborative data analysis and joint modeling have become important means of improving prediction accuracy and decision-making. In real business scenarios, however, the data held by different participants is usually highly sensitive, involving personal privacy, trade secrets, or core business information. Sharing raw data directly carries compliance risks and security hazards, so data remains siloed for long periods, which severely limits cross-institution intelligent collaboration.
Taking joint medical diagnosis and treatment as an example, the patient data accumulated by different hospitals is clearly complementary for tasks such as disease risk assessment and postoperative complication prediction, but privacy protection regulations make it difficult to exchange raw medical records directly. Accurately determining, while preserving privacy, which patient samples have cross-hospital sharing value, and strictly limiting how those samples circulate, has therefore become a key obstacle to putting collaborative diagnosis and treatment into practice. Likewise, in government administration and industrial collaboration, different departments or enterprises hold large amounts of complementary data, but there is no technical mechanism that can accurately determine the feasibility, scope, and recipients of data sharing from the output of an intelligent model. Existing data exchange approaches rely mostly on manual rules or static policy configuration and struggle to adapt to complex, changing business requirements. There is thus a need for a private data secure sharing method that uses the output of a federated learning model as its decision basis: on top of a high-accuracy model trained jointly by multiple parties, it should intelligently evaluate and accurately judge the sharing value of each party's local data samples, promoting the efficient circulation and collaborative use of cross-institution data resources while guaranteeing data security and privacy compliance.

Disclosure of Invention

To overcome the above drawbacks of the prior art, an embodiment of the present invention provides a federated-learning-based method for securely sharing private data that solves the problems set forth in the Background.
To achieve the above purpose, the present invention provides the following technical solution: a federated-learning-based method for securely sharing private data, comprising the following steps: S1, encrypting the local private data of each participant with an additive homomorphic encryption algorithm to generate homomorphic ciphertext data, and extracting the feature dimension and sample count of the private data to generate a data descriptor; S2, constructing, based on the data descriptor and a predefined federated learning model structure, a verifiable computing task description file containing a layer operation sequence and activation function types; S3, distributing the homomorphic ciphertext data and the verifiable computing task description file, as computing task sub-packages, to mutually isolated external computing nodes, each of which performs forward and backward propagation in the ciphertext domain according to the description file to generate encrypted gradient update fragments; S4, after each external computing node completes its computation, generating a zero-knowledge proof for the ciphertext computation process; S5, verifying all encrypted gradient update fragments and their corresponding zero-knowledge proofs in parallel, and performing homomorphic aggregation on the fragments that pass verification to generate a global encrypted gradient; S6, each participant decrypting the global encrypted gradient with its private key and updating its federated learning model copy based on the decrypted plaintext gradient; and S7, performing ciphertext re-encryption on each participant's local homomorphic ciphertext data according to the output of the federated learning model copy, generating re-encrypted data packets for multiparty secure data exchange, and uploading them to a shared data pool.
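Under an additive scheme such as Paillier, the linear part of the ciphertext forward propagation in S3 can be computed from encrypted inputs and plaintext weights, because Enc(x)^w = Enc(w·x) and Enc(a)·Enc(b) = Enc(a + b) modulo n². This minimal sketch assumes toy Paillier keys, a hypothetical fixed-point scale of 100 for inputs and weights (so the bias is scaled by 100² = 10⁴), and plaintext weights available to the computing node. Paillier cannot multiply two ciphertexts, so evaluating the polynomial activation sub-circuit on ciphertexts would need additional machinery that the text does not specify.

```python
import math
import random

# Toy Paillier parameters (generator g = N + 1); far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def enc(m):
    return (1 + (m % N) * N) * pow(random.randrange(1, N), N, N2) % N2

def dec(c):
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m

SCALE = 100  # hypothetical fixed-point scale for inputs and weights

def encrypted_linear(enc_x, weights, bias):
    """Return Enc(W x + b) from Enc(x), plaintext integer weights W, and plaintext
    integer bias b (already at scale SCALE**2)."""
    out = []
    for row, b in zip(weights, bias):
        acc = enc(b)
        for c, w in zip(enc_x, row):
            acc = acc * pow(c, w % N, N2) % N2   # Enc(acc) * Enc(x)^w = Enc(acc + w*x)
        out.append(acc)
    return out

x = [1.5, -2.0]
W = [[0.5, 0.25], [-1.0, 0.75]]
bias = [0.1, -0.2]
enc_x = [enc(round(v * SCALE)) for v in x]                 # done by the data owner
W_int = [[round(w * SCALE) for w in row] for row in W]
b_int = [round(v * SCALE * SCALE) for v in bias]
enc_y = encrypted_linear(enc_x, W_int, b_int)              # done by the computing node
print([dec(c) / SCALE**2 for c in enc_y])                  # decrypted by the key holder
```

Raising a ciphertext to `w % N` rather than `w` handles negative weights, since exponents only matter modulo N for the plaintext part.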
As a further aspect of the present invention, in S1, extracting the feature dimension and the number of samples of