CN-120930169-B - Federated-learning-based method and system for secure fusion of multi-source e-commerce platform data

Abstract

The invention discloses a federated-learning-based method and system for secure fusion of multi-source e-commerce platform data, in the technical field at the intersection of federated learning and data security. The method comprises: collecting and preprocessing user behavior data, transaction data and user attribute data to generate multi-modal feature vectors; calculating causal importance scores for the multi-modal feature vectors to generate local score vectors; dynamically adjusting the privacy protection intensity according to the local score vectors to generate encrypted feature vectors; extracting the dedicated weight vector of each platform from a fused feature matrix; calculating the relative contribution of the feature weight strengths to generate a contribution proportion table; and performing zero-knowledge proof verification on the contribution proportion table to generate a data fusion audit report. The invention enables fine-grained, dynamic control of privacy protection intensity, so that high-value features retain higher data utility while low-value features receive stronger privacy protection, achieving an optimal balance between privacy protection and data accuracy.
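
As a rough illustration of the dynamic adjustment described above, the sketch below maps causal importance scores to per-feature privacy budgets and the corresponding Laplace noise scales. The linear mapping, the bounds `eps_min` and `eps_max`, the unit sensitivity and all function names are assumptions made for this example; the abstract does not specify the mapping rule.

```python
def score_to_epsilon(scores, eps_min=0.1, eps_max=1.0):
    """Map causal importance scores to per-feature privacy budgets.

    Assumption: min-max scale the scores, then interpolate linearly
    between eps_min (lowest score) and eps_max (highest score).
    """
    lo, hi = min(scores), max(scores)
    span = hi - lo
    if span == 0:
        # All features are equally important: use the midpoint budget.
        return [(eps_min + eps_max) / 2 for _ in scores]
    return [eps_min + (s - lo) / span * (eps_max - eps_min) for s in scores]


def laplace_scales(epsilons, sensitivity=1.0):
    """Laplace scale b = sensitivity / epsilon for each feature."""
    return [sensitivity / e for e in epsilons]


# Hypothetical scores for three features, highest importance first.
scores = [0.9, 0.5, 0.1]
epsilons = score_to_epsilon(scores)
scales = laplace_scales(epsilons)
```

Under these assumed settings, the feature with the highest score receives the largest budget and therefore the smallest noise scale, which matches the stated goal of keeping high-value features useful while protecting low-value features more strongly.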

Inventors

  • ZHANG XUEFEI
  • ZHANG XINLIANG
  • ZHENG YING
  • ZHANG JINGJUAN

Assignees

  • China National Institute of Standardization (中国标准化研究院)

Dates

Publication Date
20260508
Application Date
20250902

Claims (10)

  1. A federated-learning-based method for secure fusion of multi-source e-commerce platform data, characterized by comprising the steps of: collecting and preprocessing user behavior data, transaction data and user attribute data to generate a multi-modal feature vector; calculating causal importance scores for the multi-modal feature vector to generate a local score vector, and dynamically adjusting the privacy protection intensity according to the local score vector to generate an encrypted feature vector; calculating a global average treatment effect (ATE) value for each feature from the encrypted feature vector, comparing the global ATE values with a causal significance threshold, and filtering the encrypted feature vector to obtain a high-value causal feature subset; performing encrypted aggregation on the high-value causal feature subset to generate a global feature statistical vector, and performing cross-platform feature distribution alignment through federated learning to generate a fused feature matrix; and extracting the dedicated weight vector of each platform from the fused feature matrix, calculating the relative contribution of the feature weight strengths to generate a contribution proportion table, and performing zero-knowledge proof verification on the contribution proportion table to generate a data fusion audit report.
  2. The federated-learning-based method for secure fusion of multi-source e-commerce platform data according to claim 1, wherein generating the multi-modal feature vector comprises: preprocessing by data cleaning and standardization; and performing heterogeneous information fusion on the preprocessed user behavior data, transaction data and user attribute data to generate the multi-modal feature vector.
  3. The federated-learning-based method for secure fusion of multi-source e-commerce platform data according to claim 1, wherein calculating the causal importance scores of the multi-modal feature vector to generate the local score vector comprises: performing modal separation on the multi-modal feature vector to generate a text feature sub-vector, an image feature sub-vector and a numerical feature sub-vector; calculating, from the text feature sub-vector, an intervention effect value of the text features on a target variable, performing gradient-weighted class activation mapping on the image feature sub-vector to extract visual causal saliency, and performing counterfactual reasoning on the numerical feature sub-vector to identify an average treatment effect; integrating the intervention effect value, the visual causal saliency and the average treatment effect to generate a multi-modal causal effect value set; and, based on the multi-modal causal effect value set, assigning weight coefficients according to feature modality type and calculating comprehensive causal importance scores to generate the local score vector.
  4. The federated-learning-based method for secure fusion of multi-source e-commerce platform data according to claim 1, wherein dynamically adjusting the privacy protection intensity according to the local score vector to generate the encrypted feature vector comprises: mapping the local score vector to a feature-level privacy budget table using a score-to-privacy-budget mapping rule; based on the feature-level privacy budget table, invoking a quantum true random number generator to generate Laplace noise distribution parameters and produce a privacy noise intensity vector; applying element-wise noise-weighted perturbation between corresponding elements of the privacy noise intensity vector and the multi-modal feature vector to generate an initial noise-injected feature vector; and adding an orthogonal random perturbation to the initial noise-injected feature vector and performing KL divergence detection to generate the encrypted feature vector.
  5. The federated-learning-based method for secure fusion of multi-source e-commerce platform data according to claim 1, wherein calculating the global ATE value of each feature from the encrypted feature vector, comparing the global ATE values with the causal significance threshold, and filtering the encrypted feature vector to obtain the high-value causal feature subset comprises: calculating, from the encrypted feature vector, the ATE value of each feature through a homomorphic encryption protocol to generate a local ATE vector; decrypting the local ATE vector and taking a weighted average to generate a global ATE value vector, comparing the global ATE value vector with the causal significance threshold feature by feature, and marking high-value features to generate a feature marking vector; and filtering the encrypted feature vector according to the feature marking vector to generate the high-value causal feature subset.
  6. The federated-learning-based method for secure fusion of multi-source e-commerce platform data according to claim 1, wherein performing encrypted aggregation on the high-value causal feature subset to generate the global feature statistical vector comprises: computing the arithmetic mean and standard deviation of each feature in the high-value causal feature subset to generate a local feature statistical vector; performing Paillier homomorphic encryption on the local feature statistical vector, using the coordination node, to generate an encrypted statistical ciphertext; uploading the encrypted statistical ciphertext to the coordination node and performing homomorphic accumulation and aggregation to generate an encrypted global statistical ciphertext; and decrypting, at the coordination node, the encrypted global statistical ciphertext with the private key to obtain a raw global statistical aggregate value, and standardizing the raw global statistical aggregate value to generate the global feature statistical vector.
  7. The federated-learning-based method for secure fusion of multi-source e-commerce platform data according to claim 1, wherein performing cross-platform feature distribution alignment through federated learning to generate the fused feature matrix comprises: standardizing and aligning the high-value causal feature subset based on the global feature statistical vector to generate an aligned local feature set; concatenating the aligned local feature set along the feature dimension into an augmented matrix and reducing it to a uniform dimension to generate a platform-level feature projection matrix; and receiving, at the coordination node, the platform-level feature projection matrices and performing convex-combination optimization of the feature space based on the entropy weight method to generate the fused feature matrix.
  8. The federated-learning-based method for secure fusion of multi-source e-commerce platform data according to claim 1, wherein extracting the dedicated weight vector of each platform from the fused feature matrix, calculating the relative contribution of the feature weight strengths, and generating the contribution proportion table comprises: parsing the fused feature matrix, extracting the dedicated weight vector of each e-commerce platform, and calculating its Frobenius norm to generate a feature weight strength set; and performing a relative contribution calculation on the feature weight strength set to generate a platform contribution proportion table.
  9. The federated-learning-based method for secure fusion of multi-source e-commerce platform data according to claim 1, wherein performing zero-knowledge proof verification on the contribution proportion table to generate the data fusion audit report comprises: performing zero-knowledge proof verification on the e-commerce platform contribution proportion table using a challenge seed issued by a supervisory node to obtain a zero-knowledge proof certificate; and merging the e-commerce platform contribution proportion table and the zero-knowledge proof certificate by timestamp to generate the data fusion audit report.
  10. A federated-learning-based system for secure fusion of multi-source e-commerce platform data, implementing the method of any one of claims 1-9, characterized by comprising a data acquisition module, a feature encryption module, a feature filtering module, a cross-platform fusion module and a joint audit module, wherein: the data acquisition module collects and preprocesses user behavior data, transaction data and user attribute data to generate a multi-modal feature vector; the feature encryption module calculates causal importance scores for the multi-modal feature vector to generate a local score vector, and dynamically adjusts the privacy protection intensity according to the local score vector to generate an encrypted feature vector; the feature filtering module calculates the global ATE value of each feature from the encrypted feature vector, compares the global ATE values with a causal significance threshold, and filters the encrypted feature vector to obtain a high-value causal feature subset; the cross-platform fusion module performs encrypted aggregation on the high-value causal feature subset to generate a global feature statistical vector, and performs cross-platform feature distribution alignment through federated learning to generate a fused feature matrix; and the joint audit module extracts the dedicated weight vector of each platform from the fused feature matrix, calculates the relative contribution of the feature weight strengths to generate a contribution proportion table, and performs zero-knowledge proof verification on the contribution proportion table to generate a data fusion audit report.
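
The contribution calculation of claim 8 can be sketched as follows. The claims name the Frobenius norm but do not give the formula for the relative contribution, so dividing each platform's norm by the sum of all norms is an assumption of this example, as are the platform names, the sample weights and the function names.

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries of a 2D list."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def contribution_table(platform_weights):
    """Return each platform's share of the total feature weight strength.

    platform_weights maps a platform name to its weight matrix (2D list).
    Assumption: share = platform norm / sum of all platform norms.
    """
    strengths = {name: frobenius_norm(w) for name, w in platform_weights.items()}
    total = sum(strengths.values())
    if total == 0:
        raise ValueError("all platform weights are zero; shares are undefined")
    return {name: s / total for name, s in strengths.items()}


# Hypothetical weights for three platforms.
table = contribution_table({
    "platform_a": [[3.0, 4.0]],   # norm 5
    "platform_b": [[6.0, 8.0]],   # norm 10
    "platform_c": [[0.0, 5.0]],   # norm 5
})
```

Because the shares are normalized, they sum to one, which makes the table straightforward to check before it is passed on to the verification step.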

Description

Federated-learning-based method and system for secure fusion of multi-source e-commerce platform data

Technical Field

The invention relates to the technical field at the intersection of federated learning and data security, and in particular to a federated-learning-based method and system for secure fusion of multi-source e-commerce platform data.

Background

In multi-source data fusion for e-commerce platforms, federated learning is widely used because it enables cross-platform collaboration while preserving data privacy. Existing schemes generally combine homomorphic encryption with a differential privacy mechanism: first, the user behavior, transaction and attribute data of each platform are standardized and preprocessed to generate feature vectors; then Laplace noise is added according to a preset static privacy budget to provide perturbation protection; finally, the encrypted features are aggregated by the federated averaging algorithm. Such schemes meet basic privacy compliance requirements and, because encrypted aggregation preserves the data distribution characteristics, provide technical support for cross-platform joint analysis.

However, existing methods still have two limitations. First, static privacy budget allocation cannot adjust the noise intensity according to feature-level causal importance, so the utility of high-value features is degraded by excessive noise injection, or low-sensitivity features are under-protected and create a risk of information leakage. Second, a causal-inference-based mechanism for measuring cross-platform contribution is lacking, so the relative contribution of each platform's feature weights is difficult to quantify accurately, which affects the fairness and interpretability of the fusion result.
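
For orientation, the baseline scheme described in the background can be sketched roughly as below: every feature receives Laplace noise from one fixed budget, and the platform vectors are then combined by a sample-count-weighted mean in the style of federated averaging. Encryption is left out, and the function names, the unit sensitivity and the sample-count weighting are assumptions of this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_static(features, epsilon, sensitivity=1.0, rng=None):
    """Add Laplace noise with one shared scale to every feature."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon  # same scale regardless of feature importance
    return [x + laplace_noise(scale, rng) for x in features]


def federated_average(vectors, sample_counts):
    """Sample-count-weighted mean of the per-platform vectors."""
    total = sum(sample_counts)
    dim = len(vectors[0])
    return [
        sum(v[j] * n for v, n in zip(vectors, sample_counts)) / total
        for j in range(dim)
    ]


# Hypothetical feature vectors from two platforms with 1 and 3 samples.
noisy_a = perturb_static([1.0, 2.0], epsilon=0.5, rng=random.Random(0))
averaged = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

The single `scale` in `perturb_static` is the point the background criticizes: it does not depend on how important a feature is.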

Disclosure of Invention

The present invention has been made in view of the above problems in the prior art. It therefore provides a federated-learning-based method for secure fusion of multi-source e-commerce platform data, which addresses static privacy protection, unbalanced feature utility and inaccurate cross-platform contribution measurement.

To solve these technical problems, the invention provides the following technical scheme. The federated-learning-based method for secure fusion of multi-source e-commerce platform data comprises: collecting and preprocessing user behavior data, transaction data and user attribute data to generate a multi-modal feature vector; calculating causal importance scores for the multi-modal feature vector to generate a local score vector; dynamically adjusting the privacy protection intensity according to the local score vector to generate an encrypted feature vector; calculating the global ATE value of each feature from the encrypted feature vector, comparing the global ATE values with a causal significance threshold, and filtering the encrypted feature vector to obtain a high-value causal feature subset; performing encrypted aggregation on the high-value causal feature subset to generate a global feature statistical vector; performing cross-platform feature distribution alignment through federated learning to generate a fused feature matrix; extracting the dedicated weight vector of each platform from the fused feature matrix and calculating the relative contribution of the feature weight strengths to generate a contribution proportion table; and performing zero-knowledge proof verification on the contribution proportion table to generate a data fusion audit report.
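
The ATE-based filtering step in this scheme can be sketched on plaintext values as follows. Claim 5 states that local ATE vectors are combined by a weighted average and compared with the threshold feature by feature; the sample-count weights, the comparison on the absolute value, the example numbers and the function names are assumptions of this sketch, and the encryption layer is omitted.

```python
def global_ate(local_ates, weights):
    """Weighted average of per-platform ATE vectors, feature by feature."""
    total = sum(weights)
    n_features = len(local_ates[0])
    return [
        sum(ate[j] * w for ate, w in zip(local_ates, weights)) / total
        for j in range(n_features)
    ]


def mark_high_value(ate_vector, threshold):
    """Return 1 for features whose |ATE| reaches the threshold, else 0."""
    return [1 if abs(a) >= threshold else 0 for a in ate_vector]


def filter_features(feature_vector, marks):
    """Keep only the features marked 1."""
    return [x for x, m in zip(feature_vector, marks) if m == 1]


# Hypothetical local ATE vectors from two platforms (100 and 300 samples).
local_ates = [[0.30, 0.02, -0.20], [0.10, 0.04, -0.40]]
ate = global_ate(local_ates, [100, 300])
marks = mark_high_value(ate, threshold=0.1)
subset = filter_features([5.0, 6.0, 7.0], marks)
```

In this example the second feature has a global ATE close to zero and is dropped, while the first and third features are kept as the high-value causal feature subset.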

In a preferred embodiment of the federated-learning-based method for secure fusion of multi-source e-commerce platform data, generating the multi-modal feature vector comprises: preprocessing by data cleaning and standardization; and performing heterogeneous information fusion on the preprocessed user behavior data, transaction data and user attribute data to generate the multi-modal feature vector.

In a preferred embodiment of the federated-learning-based method for secure fusion of multi-source e-commerce platform data, calculating the causal importance scores of the multi-modal feature vector to generate the local score vector comprises: performing modal separation on the multi-modal feature vector to generate a text feature sub-vector, an image feature sub-vector and a numerical feature sub-vector; calculating, from the text feature sub-vector, an intervention effect value of the text features on a target variable, performing gradient-weighted class activation mapping on the image feature sub-vector to extract visual causal saliency, and performing counterfactual reasoning on the nu