CN-122001659-A - Method and related device for solving unbalanced data of digital networking terminal

Abstract

The invention discloses a method and a related device for intersecting unbalanced data of a digital networking terminal. The method comprises: screening the data of a first data party to obtain a sample data set; sending the sample data set to a second data party, and receiving hit information and a second data set returned by the second data party; sending the hit information to the first data party, and receiving a first data set returned by the first data party; and intersecting the first data set with the second data set. Because the data of the first data party and of the second data party are each reduced before the intersection to obtain the first data set and the second data set, and only these reduced sets are intersected, the data scale to be processed during the intersection is greatly reduced and the efficiency of unbalanced data intersection is improved.
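
As a rough illustration of the flow summarized above, the following Python sketch walks through the four steps with in-memory sets. It is only a sketch under stated assumptions, none of which come from the source: the screening rule is taken to be a truncated SHA-256 digest, the hit information is taken to be the set of digest prefixes that matched, and a plain set intersection stands in for a secure intersection protocol. All function names and `PREFIX_LEN` are hypothetical.

```python
import hashlib

PREFIX_LEN = 2  # hypothetical: number of hex digest characters kept in the sample


def prefix(value):
    """Truncated SHA-256 digest of a join key (assumed screening rule)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:PREFIX_LEN]


def screen(first_party):
    """Step 1: screen the first party's data into a sample data set."""
    return {prefix(v) for v in first_party}


def second_party_reduce(second_party, sample):
    """Step 2: the second party keeps the records that hit the sample."""
    second_set = {v for v in second_party if prefix(v) in sample}
    hit_info = {prefix(v) for v in second_set}  # assumed form of hit information
    return hit_info, second_set


def first_party_reduce(first_party, hit_info):
    """Step 3: the first party keeps the records that conform to the hit information."""
    return {v for v in first_party if prefix(v) in hit_info}


def reduced_intersection(first_party, second_party):
    """Run the reduce-then-intersect flow and report the reduced set sizes."""
    sample = screen(first_party)
    hit_info, second_set = second_party_reduce(second_party, sample)
    first_set = first_party_reduce(first_party, hit_info)
    # Step 4: a plain set intersection stands in for a secure protocol here.
    return first_set & second_set, len(first_set), len(second_set)
```

In this sketch the result equals the exact intersection, because any element held by both parties passes both reduction steps; only non-matching records are dropped before the final step. For example, `reduced_intersection({"id-1", "id-2"}, {"id-2", "id-9"})` returns `{"id-2"}` together with the sizes of the two reduced sets.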

Inventors

  • RU ZHIQIANG
  • LI JIAO
  • GUO YE
  • LAI CHUNJIANG
  • ZHOU BOYANG
  • CHEN ZHUO
  • WANG JIBIN

Assignees

  • China Mobile Information Technology Co., Ltd. (中移动信息技术有限公司)
  • China Mobile Communications Group Co., Ltd. (中国移动通信集团有限公司)

Dates

Publication Date
2026-05-08
Application Date
2026-02-11

Claims (11)

  1. A method for intersecting unbalanced data of a digital networking terminal, characterized by comprising the following steps: screening the data of a first data party to obtain a sample data set; sending the sample data set to a second data party, and receiving hit information and a second data set returned by the second data party, wherein the hit information is determined according to the sample data set, the second data party combines the data, among the data of the second data party, that hit the sample data set to obtain the second data set, and the hit information represents information about the second data in the second data set; sending the hit information to the first data party, and receiving a first data set returned by the first data party, wherein the first data party combines the data, among the data of the first data party, that conform to the hit information to obtain the first data set, and the data volume of the first data party is smaller than the data volume of the second data party; and intersecting the first data set with the second data set.
  2. The method for intersecting unbalanced data of a digital networking terminal according to claim 1, wherein intersecting the first data set with the second data set comprises: performing data bucketing on the first data set and the second data set respectively to obtain a plurality of first sub-bucket data of the first data set and a plurality of second sub-bucket data of the second data set; and aligning the first sub-bucket data with the second sub-bucket data, and intersecting the aligned first sub-bucket data with the aligned second sub-bucket data.
  3. The method for intersecting unbalanced data of a digital networking terminal according to claim 2, wherein the aligned first sub-bucket data and the aligned second sub-bucket data are intersected concurrently.
  4. The method for intersecting unbalanced data of a digital networking terminal according to claim 2 or 3, wherein a low-bandwidth intersection algorithm is used to intersect the aligned first sub-bucket data with the aligned second sub-bucket data.
  5. The method for intersecting unbalanced data of a digital networking terminal according to claim 2, wherein performing data bucketing on the first data set and the second data set respectively to obtain a plurality of first sub-bucket data of the first data set and a plurality of second sub-bucket data of the second data set comprises: using a Bloom filter to screen the first data set and the second data set.
  6. The method for intersecting unbalanced data of a digital networking terminal according to claim 1, wherein screening the data of the first data party to obtain a sample data set comprises: determining an intersection field; performing a hash calculation on the intersection field in the data of the first data party to obtain digest information; and performing fuzzy screening on the data of the first data party according to the digest information to obtain the sample data set.
  7. The method for intersecting unbalanced data of a digital networking terminal according to claim 6, wherein screening the data of the first data party to obtain a sample data set comprises: de-duplicating the data of the first data party.
  8. A device for intersecting unbalanced data of a digital networking terminal, characterized by comprising: a screening module, configured to screen the data of a first data party to obtain a sample data set; a first processing module, configured to send the sample data set to a second data party and to receive hit information and a second data set returned by the second data party, wherein the hit information is determined according to the sample data set, the second data party combines the data, among the data of the second data party, that hit the sample data set to obtain the second data set, and the hit information represents information about the second data in the second data set; a second processing module, configured to send the hit information to the first data party and to receive a first data set returned by the first data party, wherein the first data party combines the data, among the data of the first data party, that conform to the hit information to obtain the first data set, and the data volume of the first data party is smaller than the data volume of the second data party; and an intersection module, configured to intersect the first data set with the second data set.
  9. An electronic device comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein the program or instructions, when executed by the processor, implement the steps of the method for intersecting unbalanced data of a digital networking terminal according to any one of claims 1 to 7.
  10. A readable storage medium storing a program or instructions, wherein the program or instructions, when executed by a processor, implement the steps of the method for intersecting unbalanced data of a digital networking terminal according to any one of claims 1 to 7.
  11. A computer program product comprising computer instructions which, when executed by a processor, implement the steps of the method for intersecting unbalanced data of a digital networking terminal according to any one of claims 1 to 7.
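
The screening step of claims 6 and 7 (de-duplication, hashing of the intersection field, fuzzy screening) might look roughly like the Python sketch below. The claims do not define the hash function or what "fuzzy screening" means, so the choices here are assumptions: SHA-256 is used for the hash, and fuzzy screening is modeled as keeping only a short digest prefix so that many distinct values map to the same sample entry. The names `digest`, `screen_records`, and `prefix_len` are hypothetical.

```python
import hashlib


def digest(value):
    """SHA-256 digest of an intersection field value (hash choice is assumed)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def screen_records(records, field, prefix_len=4):
    """Build a sample data set from the first party's records.

    records: list of dicts; field: name of the intersection field.
    prefix_len: hypothetical knob controlling how coarse the sample is.
    """
    # De-duplicate on the intersection field.
    unique_values = {record[field] for record in records}
    # Hash each value, then keep only a digest prefix (assumed "fuzzy" rule).
    return sorted({digest(value)[:prefix_len] for value in unique_values})
```

A call such as `screen_records([{"phone": "13800000001"}, {"phone": "13800000001"}], "phone")` yields a single four-character prefix, since the duplicate record is removed before hashing. A shorter prefix reveals less about individual values but lets more of the second party's records hit the sample.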

Description

Method and related device for solving unbalanced data of digital networking terminal

Technical Field

The invention belongs to the technical field of data security, and particularly relates to a method and a related device for intersecting unbalanced data of a digital networking terminal.

Background

Unbalanced secure intersection is a protocol used in secure multi-party computation and federated learning to compute the intersection of two or more data sets. Each participant wants to find the elements common to the data sets (the intersection) without exposing its own data. Unbalanced secure intersection suits scenarios in which the participants' data sets differ greatly in size, particularly those involving ultra-large-scale data (billions). At present it mainly serves the large-scale data query requirements of government affairs, making full use of abundant data resources to provide a billion-scale batch data query function. For ultra-large-scale unbalanced data scenarios on the order of one billion to one hundred thousand, existing unbalanced data intersection methods are inefficient.

Disclosure of Invention

The embodiments of the invention provide a method and a related device for intersecting unbalanced data of a digital networking terminal, which address the low intersection efficiency of existing unbalanced data intersection methods in ultra-large-scale unbalanced data scenarios at the hundred-million-to-hundred-thousand level.
In a first aspect, a method for intersecting unbalanced data of a digital networking terminal is provided, including: screening the data of a first data party to obtain a sample data set; sending the sample data set to a second data party, and receiving hit information and a second data set returned by the second data party, wherein the hit information is determined according to the sample data set, the second data party combines the data, among the data of the second data party, that hit the sample data set to obtain the second data set, and the hit information represents information about the second data in the second data set; sending the hit information to the first data party, and receiving a first data set returned by the first data party, wherein the first data party combines the data, among the data of the first data party, that conform to the hit information to obtain the first data set, and the data volume of the first data party is smaller than the data volume of the second data party; and intersecting the first data set with the second data set.
Optionally, intersecting the first data set with the second data set includes: performing data bucketing on the first data set and the second data set respectively to obtain a plurality of first sub-bucket data of the first data set and a plurality of second sub-bucket data of the second data set; and aligning the first sub-bucket data with the second sub-bucket data, and intersecting the aligned first sub-bucket data with the aligned second sub-bucket data.
Optionally, the aligned first sub-bucket data and the aligned second sub-bucket data are intersected concurrently.
Optionally, a low-bandwidth intersection algorithm is used to intersect the aligned first sub-bucket data with the aligned second sub-bucket data.
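
The optional bucketing, alignment, and concurrent intersection steps can be pictured with the Python sketch below. It assumes, without support from the source, that records are assigned to buckets by a SHA-256 hash of the value modulo the bucket count, that buckets with the same index are the aligned pairs, and that a thread pool supplies the concurrency. A plain set intersection per bucket pair stands in for whatever low-bandwidth secure intersection algorithm is actually used. All names and defaults are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def bucket_index(value, num_buckets):
    """Map a value to a bucket via a hash (assumed bucketing rule)."""
    raw = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(raw[:8], "big") % num_buckets


def split_into_buckets(data, num_buckets):
    """Partition a data set into num_buckets disjoint sub-buckets."""
    buckets = [set() for _ in range(num_buckets)]
    for value in data:
        buckets[bucket_index(value, num_buckets)].add(value)
    return buckets


def intersect_pair(pair):
    """Intersect one aligned pair of sub-buckets (placeholder for a secure protocol)."""
    first_bucket, second_bucket = pair
    return first_bucket & second_bucket


def bucketed_intersection(first_set, second_set, num_buckets=8, workers=4):
    """Bucket both sets, align buckets by index, and intersect the pairs concurrently."""
    first_buckets = split_into_buckets(first_set, num_buckets)
    second_buckets = split_into_buckets(second_set, num_buckets)
    aligned = list(zip(first_buckets, second_buckets))  # same index = aligned pair
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(intersect_pair, aligned))
    return set().union(*partial_results)
```

Because both parties apply the same bucketing rule, equal values always land in the same bucket index, so intersecting only aligned pairs loses nothing. Each pair is independent of the others, which is what makes concurrent processing possible.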
Optionally, performing data bucketing on the first data set and the second data set respectively to obtain a plurality of first sub-bucket data of the first data set and a plurality of second sub-bucket data of the second data set includes: using a Bloom filter to screen the first data set and the second data set.
Optionally, screening the data of the first data party to obtain a sample data set includes: determining an intersection field; performing a hash calculation on the intersection field in the data of the first data party to obtain digest information; and performing fuzzy screening on the data of the first data party according to the digest information to obtain the sample data set.
Optionally, screening the data of the first data party to obtain a sample data set includes: de-duplicating the data of the first data party.
In a second aspect, a device for intersecting unbalanced data of a digital networking terminal is provided, including: a screening module, configured to screen the data of a first data party to obtain a sample data set; a first processing module, configured to send the sample data set to a second data party and to receive hit information and a second data set returned by the second data party, wherein the hit information is determined according to the sample data set, the second data party combines the data, among the data of the second data party, that hit the sample data set to obtain the second data set, and the hit information represents information about the second data in the second data set; a second processing module, configured to send the hit information to the f