CN-121980602-A - Data circulation method and system based on privacy protection

Abstract

The application provides a data circulation method and system based on privacy protection. In the method, a plurality of data holders each perform secret sharing on a local data set to generate share data, and the share data are distributed among the data holders. Based on a secure multiparty computation protocol, the data holders compare the field schemas, sample ranges, and feature dimensions of the share data in an encrypted state to obtain a comparison result. The data holders execute the selected horizontal-federation or vertical-federation merge operation, processing the share data with homomorphic encryption to obtain a merge result. Through the secure multiparty computation protocol, the data holders negotiate a unified time-axis reference in the encrypted state and use the timestamps of the share data to calculate the offset of each party's data set relative to that reference. The data holders then time-align the share data according to the offsets to obtain time-aligned share data and generate a joint data set.
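The secret-sharing step can be illustrated with a minimal sketch. The source does not say which sharing scheme is used, so additive sharing over a prime field is an assumption here, and the names `PRIME`, `share`, `reconstruct`, and `add_shares` are hypothetical.

```python
import secrets

# Assumption: additive secret sharing modulo a public prime.
# The patent text does not name a specific scheme or modulus.
PRIME = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    if n_parties < 2:
        raise ValueError("at least two parties are required")
    # The first n-1 shares are uniformly random; the last one fixes the sum.
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recover the secret by summing all shares modulo PRIME."""
    return sum(shares) % PRIME


def add_shares(a: list[int], b: list[int]) -> list[int]:
    """Add two shared values share by share, without reconstructing either."""
    return [(x + y) % PRIME for x, y in zip(a, b)]


if __name__ == "__main__":
    a = share(10, 3)
    b = share(32, 3)
    print(reconstruct(add_shares(a, b)))
```

Under this assumption each holder keeps one share per value; any set of fewer than all shares is uniformly random, and sums can be computed on shares locally before reconstruction.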

Inventors

  • WANG JUN
  • Request for anonymity
  • Request for anonymity
  • ZHANG WEI
  • Request for anonymity
  • Request for anonymity
  • DENG XIAOJUN
  • CHU CHENG
  • YANG SHAOJIE
  • Request for anonymity
  • ZHENG YING
  • Wang Guohang
  • SUN XIANPING
  • Xie Fenghuang
  • Request for anonymity
  • LIU YUE
  • CUI JUAN
  • WANG LEI
  • LI GUANGYU
  • GU XIAOPENG
  • GUO XIAOLEI

Assignees

  • 中科智慧(苏州)科技有限公司
  • 合肥市公安局
  • 中国电子科技集团有限公司电子科学研究院
  • 安徽华典大数据科技有限公司
  • 合肥中科智慧社区科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-25

Claims (10)

  1. A data circulation method based on privacy protection, comprising: each of a plurality of data holders performing secret sharing on a local data set to generate share data, wherein the share data is distributed among the plurality of data holders; the plurality of data holders comparing the field schemas, sample ranges, and feature dimensions of the share data in an encrypted state based on a secure multiparty computation protocol to obtain a comparison result; the plurality of data holders determining the specific positions for data merging according to the comparison result and selecting horizontal federation or vertical federation, wherein horizontal federation merges along the sample range and vertical federation merges along the feature dimension; the plurality of data holders executing the selected horizontal-federation or vertical-federation merge operation and processing the share data using homomorphic encryption to obtain a merge result; the plurality of data holders negotiating a unified time-axis reference in the encrypted state through the secure multiparty computation protocol and calculating the offset of each party's data set relative to the unified time-axis reference using the timestamps of the share data; and the plurality of data holders time-aligning the share data according to the offset to obtain time-aligned share data and generating a joint data set.
  2. The method of claim 1, wherein the plurality of data holders comparing the field schemas, sample ranges, and feature dimensions of the share data in an encrypted state based on a secure multiparty computation protocol to obtain a comparison result comprises: the plurality of data holders encrypting the share data using homomorphic encryption to generate encrypted share data; the plurality of data holders calculating, through the secure multiparty computation protocol, the similarity of the field schemas, the degree of overlap of the sample ranges, and the degree of complementarity of the feature dimensions in the encrypted share data; generating the comparison result according to the similarity, the degree of overlap, and the degree of complementarity, wherein the comparison result indicates alignable fields, overlapping sample ranges, and complementary feature dimensions; and the plurality of data holders extracting, from the comparison result, the alignable fields for field mapping, the overlapping sample ranges for sample alignment, and the complementary feature dimensions for feature supplementation.
  3. The method of claim 1, wherein the plurality of data holders determining the specific positions for data merging according to the comparison result and selecting horizontal federation or vertical federation comprises: the plurality of data holders acquiring the coverage and feature richness of each party's data set; calculating a combined utility value according to the coverage and the feature richness; if the overlapping sample range exceeds a preset range, selecting horizontal federation to merge along the sample range; if the complementary feature dimension exceeds a preset dimension, selecting vertical federation to merge along the feature dimension; and the plurality of data holders determining the specific positions for data merging as the combination of the alignable fields, the overlapping sample ranges, and the complementary feature dimensions.
  4. The method of claim 3, wherein calculating the combined utility value according to the coverage and the feature richness comprises: the plurality of data holders aggregating the coverage of all parties through a secure multiparty computation protocol to obtain a total coverage; the plurality of data holders calculating the ratio of the total coverage to the coverage of each party's data set as a coverage utility; the plurality of data holders aggregating the feature richness of all parties to obtain a total feature richness; the plurality of data holders calculating the ratio of the total feature richness to the feature richness of each party's data set as a richness utility; and the plurality of data holders adding the coverage utility and the richness utility to obtain the combined utility value.
  5. The method of claim 1, wherein the plurality of data holders executing the selected horizontal-federation or vertical-federation merge operation and processing the share data using homomorphic encryption to obtain a merge result comprises: if horizontal federation is selected, the plurality of data holders performing joint aggregation on the share data within the overlapping sample range; if vertical federation is selected, the plurality of data holders performing feature concatenation on the share data along the complementary feature dimensions; the plurality of data holders applying homomorphic encryption to calculate intermediate results during the joint aggregation or feature concatenation; and the plurality of data holders recovering the merge result from the intermediate results, the merge result characterizing the structure of the joint data set.
  6. The method of claim 1, wherein the plurality of data holders negotiating a unified time-axis reference in an encrypted state through a secure multiparty computation protocol and calculating the offset of each party's data set relative to the unified time-axis reference using the timestamps of the share data comprises: the plurality of data holders performing secret sharing on the timestamps of the share data to generate timestamp shares; the plurality of data holders calculating, through the secure multiparty computation protocol, the minimum value of the timestamp shares of the parties as the unified time-axis reference; the plurality of data holders calculating, using homomorphic encryption, the difference between each party's timestamp share and the unified time-axis reference as the offset; and the plurality of data holders verifying the consistency of the offset and distributing it to each party.
  7. The method of claim 6, wherein the plurality of data holders calculating, using homomorphic encryption, the difference between each party's timestamp share and the unified time-axis reference as the offset comprises: the plurality of data holders homomorphically encrypting each party's timestamp share to obtain encrypted timestamp shares; the plurality of data holders calculating, in encrypted form, the value of each encrypted timestamp share minus the unified time-axis reference; the plurality of data holders decrypting the calculation result to obtain the offset of each party; and the plurality of data holders adjusting the timestamp positions of the share data according to the offset.
  8. The method of claim 1, wherein the plurality of data holders time-aligning the share data according to the offset to obtain time-aligned share data and generating a joint data set comprises: the plurality of data holders applying the offset to each party's share data to translate its timestamps; the plurality of data holders verifying the consistency of the translated timestamps against the unified time-axis reference; the plurality of data holders inputting the time-aligned share data into the merge operation; and the plurality of data holders outputting the joint data set from the merge operation.
  9. A data circulation system based on privacy protection, comprising: a share generation module, configured for each of a plurality of data holders to perform secret sharing on a local data set to generate share data, the share data being distributed among the plurality of data holders; an encrypted comparison module, configured for the plurality of data holders to compare the field schemas, sample ranges, and feature dimensions of the share data in an encrypted state based on a secure multiparty computation protocol to obtain a comparison result; a position determination module, configured for the plurality of data holders to determine the specific positions for data merging according to the comparison result and to select horizontal federation or vertical federation, wherein horizontal federation merges along the sample range and vertical federation merges along the feature dimension; a merge operation module, configured for the plurality of data holders to execute the selected horizontal-federation or vertical-federation merge operation and process the share data using homomorphic encryption to obtain a merge result; a reference negotiation module, configured for the plurality of data holders to negotiate a unified time-axis reference in the encrypted state through the secure multiparty computation protocol and to calculate the offset of each party's data set relative to the unified time-axis reference using the timestamps of the share data; and a time alignment module, configured for the plurality of data holders to time-align the share data according to the offset to obtain time-aligned share data and generate a joint data set.
  10. The system of claim 9, wherein the encrypted comparison module comprises: an encryption processing unit, configured for the plurality of data holders to encrypt the share data using homomorphic encryption to generate encrypted share data; a similarity calculation unit, configured for the plurality of data holders to calculate, through the secure multiparty computation protocol, the similarity of the field schemas, the degree of overlap of the sample ranges, and the degree of complementarity of the feature dimensions in the encrypted share data; a comparison result generation module, configured to generate the comparison result according to the similarity, the degree of overlap, and the degree of complementarity, the comparison result indicating alignable fields, overlapping sample ranges, and complementary feature dimensions; and an alignment field extraction unit, configured for the plurality of data holders to extract, from the comparison result, the alignable fields for field mapping, the overlapping sample ranges for sample alignment, and the complementary feature dimensions for feature supplementation.
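
The arithmetic behind the time-axis steps in claims 6 to 8 can be sketched in plaintext. This is only an illustration of the numbers involved: in the claimed method the minimum is found under a secure multiparty computation protocol and the differences are computed under homomorphic encryption, neither of which is modelled here. Taking each party's offset as its earliest timestamp minus the shared reference is one reading of claim 6 and is an assumption, as are the function names and the sample data.

```python
# Plaintext sketch only: no secret sharing, MPC, or homomorphic encryption.
# Assumption: a party's offset is its earliest timestamp minus the reference.

def negotiate_reference(party_timestamps: dict[str, list[int]]) -> int:
    """Unified time-axis reference: the minimum timestamp across all parties."""
    return min(min(ts) for ts in party_timestamps.values())


def compute_offsets(party_timestamps: dict[str, list[int]],
                    reference: int) -> dict[str, int]:
    """Offset of each party's data set relative to the reference."""
    return {party: min(ts) - reference
            for party, ts in party_timestamps.items()}


def align(party_timestamps: dict[str, list[int]],
          offsets: dict[str, int]) -> dict[str, list[int]]:
    """Translate every timestamp of a party by that party's offset."""
    return {party: [t - offsets[party] for t in ts]
            for party, ts in party_timestamps.items()}


def is_consistent(aligned: dict[str, list[int]], reference: int) -> bool:
    """After translation, every party's series should start at the reference."""
    return all(min(ts) == reference for ts in aligned.values())


if __name__ == "__main__":
    # Hypothetical timestamps in seconds for two parties.
    data = {"bank": [1000, 1060, 1120], "payments": [1300, 1360]}
    ref = negotiate_reference(data)
    offsets = compute_offsets(data, ref)
    aligned = align(data, offsets)
    print(ref, offsets, aligned, is_consistent(aligned, ref))
```

With the sample data the reference is 1000, the offsets are 0 for `bank` and 300 for `payments`, and the translated `payments` series becomes `[1000, 1060]`.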

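The combined utility value of claim 4 and the threshold test of claim 3 reduce to simple arithmetic once the aggregates are known. The sketch below works on plaintext numbers and assumes that "aggregating" means summing; the claims do not define the aggregation, do not say what happens when both or neither threshold is exceeded, and do not give threshold values, so returning a list of every applicable mode is an assumption. All names and sample figures are hypothetical.

```python
# Plaintext sketch; the claimed method aggregates under secure multiparty
# computation. Assumption: aggregation is a plain sum over all parties.

def combined_utility(coverage: dict[str, float],
                     richness: dict[str, float]) -> dict[str, float]:
    """Per-party utility: total/own coverage plus total/own feature richness."""
    if any(v <= 0 for v in coverage.values()) or \
       any(v <= 0 for v in richness.values()):
        raise ValueError("coverage and richness must be positive")
    total_coverage = sum(coverage.values())
    total_richness = sum(richness.values())
    return {
        party: total_coverage / coverage[party]
               + total_richness / richness[party]
        for party in coverage
    }


def select_federation(overlap_samples: int, complementary_dims: int,
                      preset_range: int, preset_dims: int) -> list[str]:
    """Return every merge mode whose threshold from claim 3 is exceeded."""
    modes = []
    if overlap_samples > preset_range:
        modes.append("horizontal")  # merge along the sample range
    if complementary_dims > preset_dims:
        modes.append("vertical")    # merge along the feature dimension
    return modes


if __name__ == "__main__":
    utility = combined_utility({"A": 100, "B": 300}, {"A": 10, "B": 30})
    print(utility)
    print(select_federation(5000, 3, preset_range=1000, preset_dims=8))
```

For the sample figures, party `A` gets 400/100 + 40/10 = 8.0, and the second call selects only the horizontal mode because 5000 exceeds 1000 while 3 does not exceed 8.
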
Description

Data circulation method and system based on privacy protection

Technical Field

The present invention relates to the field of data processing technologies, and in particular to a data circulation method and system based on privacy protection.

Background

In the field of data collaboration and sharing, joint processing of data across organizations is regarded as a key support for technical innovation and for increasing service value, and efficient integration and analysis of data can significantly improve decision making and resource allocation. However, the field faces a sharp tension between privacy protection and data utilization: how to achieve efficient multiparty data collaboration without revealing sensitive information remains an open challenge.

Although some approaches attempt to protect data privacy through encryption or anonymization, these schemes tend to fall short in complex scenarios. In particular, when data held by multiple data holders needs to be merged, existing methods struggle with inconsistent data formats, non-uniform time records, and similar problems. As a result, collaboration either sacrifices data integrity or consumes a large amount of time and resources on manual adjustment, which is inefficient and error-prone.

A further challenge is achieving time consistency, a core factor in data merging, while still protecting privacy. The parties' data differ in how time is recorded: for example, one party records data hourly while another summarizes it daily, and the timestamps may even be offset from one another. Such inconsistency directly affects the accuracy and usability of the merged data.
For example, in financial transaction analysis, if multiparty data cannot be aligned to the same points in time, transaction trends may be misjudged, which affects the resulting decisions. Further, in cross-organization collaboration, one bank's transaction data may record fluctuations every minute while another institution's payment data is summarized daily; if these time differences cannot be securely compared and adjusted in an encrypted state, a unified time reference is difficult to form and the value of the data is greatly reduced.

Disclosure of Invention

The invention provides a data circulation method and system based on privacy protection, which aim to automate the alignment and merging of multi-dimensional data and make it intelligent while guaranteeing data privacy through encryption technology, and to significantly improve the security and efficiency of distributed data collaboration.

In a first aspect, the present invention provides a data circulation method based on privacy protection, which mainly includes the following. A plurality of data holders each perform secret sharing on a local data set to generate share data, and the share data is distributed among the data holders. The data holders compare the field schemas, sample ranges, and feature dimensions of the share data in an encrypted state based on a secure multiparty computation protocol to obtain a comparison result. The data holders determine the specific positions for data merging according to the comparison result and select horizontal federation or vertical federation, wherein horizontal federation merges along the sample range and vertical federation merges along the feature dimension. The data holders execute the selected horizontal-federation or vertical-federation merge operation and process the share data using homomorphic encryption to obtain a merge result. The data holders negotiate a unified time-axis reference in the encrypted state through the secure multiparty computation protocol and calculate the offset of each party's data set relative to the unified time-axis reference using the timestamps of the share data. The data holders time-align the share data according to the offset to obtain time-aligned share data and generate a joint data set.

Further, the plurality of data holders comparing the field schemas, sample ranges, and feature dimensions of the share data in an encrypted state based on the secure multiparty computation protocol to obtain a comparison result includes: the plurality of data holders encrypt the share data using homomorphic encryption to generate encrypted share data; calculate, through the secure multiparty computation protocol, the similarity of the field schemas, the degree of overlap of the sample ranges, and the degree of complementarity of the feature dimensions in the encrypted share data; generate the comparison result according to the similarity, the degree of overlap, and the degree of complementarity, the comparison result indicating alignable fields, overlapping sample ranges, and complementary feature dimensions; and e