
CN-122001558-A - Training method for a vertical federated learning model based on P²FR-PSI and homomorphic encryption


Abstract

The application discloses a vertical federated learning (VFL) model training method based on P²FR-PSI and homomorphic encryption. The method comprises: constructing a cuckoo hash table from a sender's first user identifiers and their corresponding feature vectors; executing a private set intersection protocol, S³R, with a receiver based on the cuckoo hash table and predicate conditions to obtain a first secret-sharing vector; generating a first group of encryption masks from the first secret-sharing vector and the first user identifiers in the cuckoo hash table using a first private key; processing a second group of encryption masks sent by the receiver using the first private key to obtain a first processing result; obtaining a second processing result returned by the receiver and combining it with the first processing result to determine an intersection user set; and updating the sender's first model parameters based on the intersection user set and homomorphic encryption. The method makes data alignment more flexible, improves the accuracy of VFL model training, and broadens the applicability of VFL models in complex recommendation scenarios.
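
The alignment flow summarized above can be illustrated end to end with a small sketch. Every concrete choice in it is an assumption, not taken from the application: the S³R protocol is replaced by a trusted-dealer function, `ideal_s3r`, that merely outputs shares with the stated sum property (it is not a secure two-party protocol); the first and second preset values are taken to be 0 and 1; each encryption mask is modelled as H(id, share)^key in a toy group modulo 2^127 - 1; and all function names are hypothetical. The parameters are for illustration only and are not secure.

```python
import hashlib
import random

P = 2**127 - 1                      # Mersenne prime; toy group Z_P^* (assumed, not a vetted parameter set)
SHARE_MOD = 2**64                   # additive secret shares live in Z_{2^64} (assumed)
FIRST_PRESET, SECOND_PRESET = 0, 1  # hypothetical preset values: predicate holds / fails
NUM_HASHES = 3                      # number of cuckoo hash functions (assumed)


def candidate_bins(uid, num_bins):
    """Bins a user ID may occupy; both parties evaluate the same hash functions."""
    return [int.from_bytes(hashlib.sha256(f"{k}|{uid}".encode()).digest()[:8], "big") % num_bins
            for k in range(NUM_HASHES)]


def build_cuckoo_table(records, num_bins, rng, max_kicks=200):
    """Sender: place each (uid, features) record in one of its candidate bins."""
    table = [None] * num_bins
    for record in records:
        current = record
        for _ in range(max_kicks):
            free = [b for b in candidate_bins(current[0], num_bins) if table[b] is None]
            if free:
                table[free[0]], current = current, None
                break
            # Every candidate bin is taken: evict a random occupant and re-insert it.
            slot = rng.choice(candidate_bins(current[0], num_bins))
            table[slot], current = current, table[slot]
        if current is not None:
            raise RuntimeError("cuckoo insertion failed; enlarge the table")
    return table


def ideal_s3r(table, predicate, rng):
    """Trusted-dealer stand-in for the S3R output (NOT the secure protocol): per bin j,
    s1[j] + s2[j] == FIRST_PRESET (mod SHARE_MOD) iff bin j holds a record satisfying the predicate."""
    s1, s2 = [], []
    for entry in table:
        holds = entry is not None and predicate(entry[1])
        share = rng.randrange(SHARE_MOD)
        s1.append(share)  # sender's first secret-sharing vector
        s2.append(((FIRST_PRESET if holds else SECOND_PRESET) - share) % SHARE_MOD)  # receiver's
    return s1, s2


def h(uid, value):
    """Hash (user ID, share-derived value) into Z_P^* (toy hash-to-group)."""
    digest = hashlib.sha256(f"{uid}|{value}".encode()).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 2


def predicate_psi(sender_records, receiver_ids, predicate, num_bins=16, seed=0):
    """Simulates both parties in one process; returns the aligned user IDs."""
    rng = random.Random(seed)
    table = build_cuckoo_table(sender_records, num_bins, rng)  # sender side
    s1, s2 = ideal_s3r(table, predicate, rng)
    key_a = rng.randrange(2, P - 1)  # first private key (sender)
    key_b = rng.randrange(2, P - 1)  # second private key (receiver)
    # First group of masks (sender -> receiver): H(x_j, s1[j])^a for each occupied bin j.
    first_masks = {j: pow(h(e[0], s1[j]), key_a, P) for j, e in enumerate(table) if e is not None}
    # Second group (receiver -> sender): H(y, FIRST_PRESET - s2[j])^b for each candidate bin j of y.
    # FIRST_PRESET - s2[j] equals s1[j] exactly when bin j satisfies the predicate.
    second_masks = [pow(h(y, (FIRST_PRESET - s2[j]) % SHARE_MOD), key_b, P)
                    for y in receiver_ids for j in set(candidate_bins(y, num_bins))]
    # First processing result: the sender raises the receiver's masks to key_a.
    first_result = {pow(m, key_a, P) for m in second_masks}
    # Second processing result: the receiver raises the sender's masks to key_b.
    second_result = {j: pow(m, key_b, P) for j, m in first_masks.items()}
    # A bin is aligned when its doubly masked value appears in both results.
    return {table[j][0] for j, v in second_result.items() if v in first_result}


def income_or_young(features):
    # Threshold-comparison predicates combined with a logical OR.
    return features["income"] > 10000 or features["age"] < 25


records = [("u1", {"income": 12000, "age": 40}), ("u2", {"income": 8000, "age": 22}),
           ("u3", {"income": 5000, "age": 30}), ("u4", {"income": 15000, "age": 19})]
print(sorted(predicate_psi(records, ["u1", "u2", "u3", "u5"], income_or_young)))  # ['u1', 'u2']
```

Only u1 and u2 are aligned: u3 is held by both parties but fails the predicate, u4 passes but is absent from the receiver, and u5 exists only on the receiver side. The cuckoo table is what lets the receiver build masks for just NUM_HASHES candidate bins per identifier instead of one per sender entry.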

Inventors

  • MA JINHUA
  • XU SHENGMIN
  • ZHAO HUIZHONG
  • HE XINLEI

Assignees

  • 福建师范大学 (Fujian Normal University)

Dates

Publication Date
20260508
Application Date
20260312

Claims (4)

  1. A vertical federated learning model training method based on P²FR-PSI and homomorphic encryption, executed by a sender, comprising: constructing a cuckoo hash table from the sender's local first user identifiers and the feature vectors corresponding to the first user identifiers; executing a private set intersection protocol, S³R, with a receiver based on the cuckoo hash table and predicate conditions to obtain a first secret-sharing vector, wherein, when a feature vector in the cuckoo hash table satisfies the predicate conditions determined by the receiver, the sum of the first secret-sharing vector and a second secret-sharing vector obtained by the receiver is a first preset value, and otherwise the sum is a second preset value; generating a first group of encryption masks from the first secret-sharing vector and the first user identifiers in the cuckoo hash table using a first private key, and sending the first group of encryption masks to the receiver; receiving a second group of encryption masks from the receiver, processing the second group of encryption masks using the first private key to obtain a first processing result, and sending the first processing result to the receiver; obtaining a second processing result that the receiver returns after processing the first group of encryption masks, and determining an intersection user set from the second processing result and the first processing result; and updating the sender's local first model parameters based on the intersection user set and homomorphic encryption, wherein the first model parameters are the portion of the vertical federated learning model's parameters held by the sender.
  2. The method of claim 1, wherein updating the sender's local first model parameters based on the intersection user set and homomorphic encryption comprises: computing a first intermediate vector from the sender's local feature data for the users in the intersection user set and the sender's local first initial model parameters; encrypting the first intermediate vector with the homomorphic encryption public key to obtain a first encrypted intermediate vector, and sending the first encrypted intermediate vector to the receiver; computing the sender's first encrypted gradient in the ciphertext domain from the first encrypted intermediate vector and a second encrypted intermediate vector sent by the receiver; generating a first random mask, adding it to the first encrypted gradient to obtain a first masked gradient, and sending the first masked gradient to a coordinator; and receiving a first decrypted gradient from the coordinator and updating the first initial model parameters according to the first decrypted gradient, wherein the first decrypted gradient is obtained by the coordinator decrypting the first masked gradient.
  3. A vertical federated learning model training method based on P²FR-PSI and homomorphic encryption, executed by a receiver, comprising: executing a predicate private set intersection protocol, S³R, based on predicate conditions and a cuckoo hash table constructed by a sender, to obtain a second secret-sharing vector, wherein the cuckoo hash table contains first user identifiers and the feature vectors corresponding to them, and, when a feature vector satisfies the predicate conditions, the sum of the second secret-sharing vector and a first secret-sharing vector obtained by the sender is a first preset value, and otherwise the sum is a second preset value; generating a second group of encryption masks from the receiver's local second user identifiers and the second secret-sharing vector using a second private key, and sending the second group of encryption masks to the sender; receiving a first processing result that the sender returns after processing the second group of encryption masks, together with a first group of encryption masks sent by the sender; processing the first group of encryption masks using the second private key to obtain a second processing result, sending the second processing result to the sender, and determining an intersection user set from the second processing result and the first processing result; and updating the receiver's local second model parameters based on the intersection user set and homomorphic encryption, wherein the second model parameters are the portion of the vertical federated learning model's parameters held by the receiver.
  4. The method of claim 3, wherein updating the receiver's local second model parameters based on the intersection user set and homomorphic encryption comprises: computing a second intermediate vector from the receiver's local second initial model parameters and the receiver's local feature data and corresponding label data for the users in the intersection user set; encrypting the second intermediate vector with the homomorphic encryption public key to obtain a second encrypted intermediate vector, and sending the second encrypted intermediate vector to the sender; computing the receiver's second encrypted gradient in the ciphertext domain from the second encrypted intermediate vector and the first encrypted intermediate vector sent by the sender; generating a second random mask, adding it to the second encrypted gradient to obtain a second masked gradient, and sending the second masked gradient to the coordinator; and receiving a second decrypted gradient from the coordinator and updating the second initial model parameters according to the second decrypted gradient, wherein the second decrypted gradient is obtained by the coordinator decrypting the second masked gradient.
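
The masked-gradient exchange in claims 2 and 4 can be sketched as follows. This is a minimal illustration under assumptions the claims do not state: textbook Paillier is used as the homomorphic scheme (the claims only say "homomorphic encryption"); the model is linear with squared loss, so the residual d_i splits additively into the sender's intermediate value and the receiver's label-adjusted intermediate value; all data are small integers (real-valued features would need fixed-point encoding); and a single key object whose `decrypt` method stands in for the coordinator. The class name `ToyPaillier`, the helper names, the primes, and the data are hypothetical, and the key size is far too small to be secure.

```python
import math
import random


class ToyPaillier:
    """Textbook Paillier with g = n + 1. Tiny fixed primes: insecure, illustration only."""

    def __init__(self, p=10007, q=10009):
        self.n, self.n2 = p * q, (p * q) ** 2
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
        self.mu = pow(self.lam, -1, self.n)                      # private; held by the coordinator

    def encrypt(self, m, rng):
        r = rng.randrange(1, self.n)
        while math.gcd(r, self.n) != 1:
            r = rng.randrange(1, self.n)
        return (1 + (m % self.n) * self.n) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c):
        m = (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n
        return m - self.n if m > self.n // 2 else m  # map back to a signed integer

    def add(self, c1, c2):      # Enc(m1) * Enc(m2) = Enc(m1 + m2)
        return c1 * c2 % self.n2

    def mul_const(self, c, k):  # Enc(m) ** k = Enc(k * m)
        return pow(c, k % self.n, self.n2)


def dot(row, weights):
    return sum(x * w for x, w in zip(row, weights))


def masked_gradient(he, X_own, enc_residual, rng):
    """One party's gradient: computed on ciphertexts, masked, decrypted, then unmasked."""
    grads = []
    for j in range(len(X_own[0])):
        # Enc(g_j) = sum_i Enc(d_i) * x_ij, evaluated without decrypting anything.
        enc_g = he.encrypt(0, rng)
        for i, row in enumerate(X_own):
            enc_g = he.add(enc_g, he.mul_const(enc_residual[i], row[j]))
        mask = rng.randrange(he.n)                     # random mask, uniform mod n
        masked = he.add(enc_g, he.encrypt(mask, rng))  # masked gradient sent to the coordinator
        revealed = he.decrypt(masked)                  # coordinator sees only g_j + mask
        g = (revealed - mask) % he.n                   # the party strips its own mask
        grads.append(g - he.n if g > he.n // 2 else g)
    return grads


rng, he = random.Random(7), ToyPaillier()
X_A, theta_A = [[1, 2], [3, 1], [0, 4]], [2, -1]   # sender: features only
X_B, theta_B, y = [[5], [1], [2]], [1], [3, 4, 1]  # receiver: features and labels
enc_u_A = [he.encrypt(dot(r, theta_A), rng) for r in X_A]                 # first encrypted intermediate vector
enc_u_B = [he.encrypt(dot(r, theta_B) - t, rng) for r, t in zip(X_B, y)]  # second one, includes labels
enc_d = [he.add(a, b) for a, b in zip(enc_u_A, enc_u_B)]                  # Enc(residual d_i)
grad_A = masked_gradient(he, X_A, enc_d, rng)  # [8, -6]
grad_B = masked_gradient(he, X_B, enc_d, rng)  # [6]
lr = 0.01
theta_A = [w - lr * g / len(X_A) for w, g in zip(theta_A, grad_A)]
theta_B = [w - lr * g / len(X_B) for w, g in zip(theta_B, grad_B)]
```

Because each mask is drawn uniformly modulo n, the value g_j + R_j that the coordinator decrypts is itself uniformly distributed and reveals nothing about g_j; only the party that chose R_j can remove it. The last two lines then apply an ordinary gradient step, θ ← θ - η·g/N, on each side.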

Description

Vertical federated learning model training method based on P²FR-PSI and homomorphic encryption

Technical Field

The application relates to information security and machine learning, and in particular to a vertical federated learning model training method based on P²FR-PSI and homomorphic encryption.

Background

Recommendation systems have become a core component of modern digital services and are widely used in e-commerce, online advertising, social media platforms, and similar fields. By pushing personalized content based on user preferences and behavior data, they improve user experience and create commercial value. However, as these models rely increasingly on fine-grained personal information, the tension between personalization and privacy has become a core challenge. How to maintain personalized recommendation accuracy while keeping sensitive user data secure is therefore an open problem in the field.

Vertical federated learning (VFL) offers a solution to this problem. In VFL, multiple platforms (e.g., a bank and an e-commerce platform) that hold different feature spaces but share overlapping user populations can jointly train a global model without exposing their raw data. Through encrypted interaction, VFL makes cross-domain data usable without making it visible, breaking down data silos. This setting suits recommendation tasks particularly well, since different platforms hold complementary information about the same users.

A key prerequisite of VFL model training is entity alignment, i.e., securely identifying the set of users shared by the participating institutions. Existing alignment protocols are typically built on private set intersection (PSI), a technique that lets parties discover the elements they have in common without revealing any elements outside the intersection.
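
To make the exact-match behaviour of conventional PSI concrete, here is a minimal sketch of one common construction, Diffie-Hellman-style PSI based on commutative exponentiation. It is not necessarily the construction used by the existing alignment protocols the application refers to; the toy group modulo 2^127 - 1, the SHA-256 hash-to-group mapping, and the function names are assumptions for illustration, and the parameters are not secure.

```python
import hashlib
import random

P = 2**127 - 1  # toy prime modulus (assumed); deployments use standardized groups


def hash_to_group(uid):
    """Map a user ID into Z_P^* (toy hash-to-group)."""
    return int.from_bytes(hashlib.sha256(uid.encode()).digest(), "big") % (P - 2) + 2


def dh_psi(ids_a, ids_b, seed=0):
    """Exact-match PSI: H(x)^(ab) == H(y)^(ba) iff x == y (up to negligible collisions)."""
    rng = random.Random(seed)
    a, b = rng.randrange(2, P - 1), rng.randrange(2, P - 1)  # each party's secret exponent
    from_a = [pow(hash_to_group(x), a, P) for x in ids_a]      # A -> B: H(x)^a
    from_b = [pow(hash_to_group(y), b, P) for y in ids_b]      # B -> A: H(y)^b
    b_doubled = {pow(m, a, P) for m in from_b}                 # A computes H(y)^(ba)
    a_doubled = [pow(m, b, P) for m in from_a]                 # B computes H(x)^(ab), returns it to A
    return {x for x, v in zip(ids_a, a_doubled) if v in b_doubled}


print(sorted(dh_psi(["alice", "bob", "carol"], ["bob", "carol", "dave"])))  # ['bob', 'carol']
```

Matching depends only on equality of identifiers: "Bob" and "bob" do not match, and there is no way to restrict the result to users whose features satisfy a condition such as "income > 10000".
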
However, conventional PSI supports only exact equality matching. In practice, platforms often need to stratify and filter users whose features meet certain conditions, for example threshold conditions such as "income > 10000" or "age < 25" combined with a logical OR. Existing alignment protocols cannot meet these more flexible alignment requirements, which are common in recommendation systems; this lowers the accuracy of VFL model training and limits its applicability in complex recommendation scenarios.

Disclosure of Invention

In view of this, an embodiment of the application provides a vertical federated learning model training method based on P²FR-PSI and homomorphic encryption, executed by a sender and comprising the following steps: constructing a cuckoo hash table from the sender's first user identifiers and the feature vectors corresponding to them; executing a private set intersection protocol, S³R, with a receiver based on the cuckoo hash table and predicate conditions to obtain a first secret-sharing vector, wherein, when a feature vector in the cuckoo hash table satisfies the predicate conditions determined by the receiver, the sum of the first secret-sharing vector and a second secret-sharing vector obtained by the receiver is a first preset value, and otherwise the sum is a second preset value, the predicate conditions comprising at least one of a logical-combination predicate and a threshold-comparison predicate; generating a first group of encryption masks from the first secret-sharing vector and the first user identifiers in the cuckoo hash table using a first private key, and sending them to the receiver; receiving a second group of encryption masks from the receiver and processing them using the first private key to obtain a first processing result, which is sent to the receiver; obtaining a second processing result returned by the receiver; determining an intersection user set from the second processing result and the first processing result; and updating the sender's local first model parameters, which are part of the parameters the sender holds in the vertical federated learning model, based on the intersection user set and homomorphic encryption.

Optionally, updating the sender's local first model parameters based on the intersection user set and homomorphic encryption includes: computing a first intermediate vector from the sender's local feature data for the users in the intersection user set and the sender's first initial model parameters, encrypting the first intermediate vector with a homomorphic encryption public key to obtain a first encrypted intermediate vector, and sending the first encrypted intermediate vector to a re