CN-121996693-A - Data processing method, device, equipment and readable storage medium

Abstract

The application discloses a data processing method, apparatus, device, and readable storage medium. A deduplication processing request triggered by a target business service is obtained; the request carries at least a received object identifier, a deduplication time range, deduplication information, and a target business service identifier. Partition creation parameters of a distributed bloom filter for the target business service are determined from pre-cached target deduplication configuration information. A corresponding routing partition identifier is determined according to the received object identifier and the partition creation parameters. At least one target partition corresponding to the deduplication time range is selected in a distributed service system according to the target business service identifier and the routing partition identifier. The received object identifier and the deduplication information are sent to the bloom filter corresponding to each target partition, so that the bloom filter associated with each target partition performs deduplication detection based on the received object identifier and the deduplication information. In this way, deduplication for each business service can be managed uniformly, and deduplication processing efficiency and accuracy are improved.

Inventors

  • ZHU JIANKAI
  • WU SHIDA
  • ZHAO GUOHAO

Assignees

  • 腾讯科技(深圳)有限公司

Dates

Publication Date
20260508
Application Date
20241108

Claims (15)

  1. A data processing method, comprising: obtaining a deduplication processing request triggered by a target business service, wherein the deduplication processing request carries at least a received object identifier, a deduplication time range, deduplication information, and a target business service identifier; determining, according to the target business service identifier, partition creation parameters of a distributed bloom filter for the target business service from pre-cached target deduplication configuration information; determining a corresponding routing partition identifier according to the received object identifier and the partition creation parameters, and selecting at least one target partition corresponding to the deduplication time range in a distributed service system according to the target business service identifier and the routing partition identifier, wherein the distributed service system comprises partitions created in advance according to the deduplication configuration information of different business services, and a bloom filter associated with each partition; and sending the received object identifier and the deduplication information to the bloom filter corresponding to each target partition, so that the bloom filter associated with each target partition performs deduplication detection based on the received object identifier and the deduplication information.
  2. The method according to claim 1, wherein selecting at least one target partition corresponding to the deduplication time range in the distributed service system according to the target business service identifier and the routing partition identifier comprises: determining, according to the target business service identifier, the partitions pre-created for the target business service in the distributed service system; querying a plurality of candidate partitions corresponding to the routing partition identifier from the pre-created partitions; and determining, from the plurality of candidate partitions, at least one target partition corresponding to at least one target deduplication period contained in the deduplication time range.
  3. The method of claim 2, wherein determining, from the plurality of candidate partitions, at least one target partition corresponding to at least one target deduplication period contained in the deduplication time range comprises: determining a plurality of target deduplication periods corresponding to the deduplication time range; and selecting, from the plurality of candidate partitions, a plurality of target partitions corresponding to the target deduplication periods.
  4. The method according to any one of claims 1 to 3, wherein determining a corresponding routing partition identifier according to the received object identifier and the partition creation parameters comprises: determining, based on the partition creation parameters, the number of partitions created per unit period for the distributed bloom filter of the target business service; performing a hash calculation on the received object identifier to obtain a candidate hash value; and taking the candidate hash value modulo the number of partitions to obtain the routing partition identifier.
  5. The method according to any one of claims 1 to 4, wherein determining, according to the target business service identifier, partition creation parameters of a distributed bloom filter for the target business service from pre-cached target deduplication configuration information comprises: searching, through a deduplication detection service process, a cache space for the pre-cached target deduplication configuration information corresponding to the target business service identifier; and determining, according to the target deduplication configuration information, the partition creation parameters used when creating the distributed bloom filter for the target business service.
  6. The method of claim 5, wherein before determining, according to the target business service identifier, partition creation parameters of a distributed bloom filter for the target business service from pre-cached target deduplication configuration information, the method further comprises: loading, through the deduplication detection service process, the target deduplication configuration information corresponding to the target business service identifier from a deduplication configuration information public end, wherein the deduplication configuration information public end comprises deduplication configuration information entered by at least one business service for deduplication detection; and caching the loaded target deduplication configuration information into the cache space.
  7. The method of claim 6, wherein before selecting at least one target partition corresponding to the deduplication time range in the distributed service system according to the target business service identifier and the routing partition identifier, the method further comprises: loading the target deduplication configuration information corresponding to the target business service identifier from the deduplication configuration information public end; determining a creation address, deduplication duration information, and the partition creation parameters of the distributed bloom filter according to the target deduplication configuration information; and creating, at the creation address of the distributed service system, a bloom filter for each corresponding partition according to the target business service identifier, the deduplication duration information, and the partition creation parameters.
  8. The method of claim 7, wherein creating, at the creation address of the distributed service system, a bloom filter for each corresponding partition according to the target business service identifier, the deduplication duration information, and the partition creation parameters comprises: generating a plurality of filter indexes corresponding to at least one deduplication period according to the target business service identifier, the deduplication duration information, and the partition creation parameters; sequentially checking, according to the plurality of filter indexes corresponding to each deduplication period, for the corresponding bloom filters at the creation address of the distributed service system to obtain detection results; and determining, according to the detection results, a target filter index for which no corresponding bloom filter has been created, and creating, at the creation address of the distributed service system, a bloom filter for the partition corresponding to the target filter index.
  9. The method of claim 8, wherein generating a plurality of filter indexes corresponding to at least one deduplication period according to the target business service identifier, the deduplication duration information, and the partition creation parameters comprises: determining a deduplication period range according to the deduplication duration information; determining, according to the partition creation parameters, a plurality of to-be-created partition identifiers corresponding to one deduplication period; and for each deduplication period in the deduplication period range, constructing a plurality of filter indexes corresponding to that deduplication period by combining the target business service identifier with each to-be-created partition identifier.
  10. The method according to any one of claims 1 to 9, wherein obtaining the deduplication processing request triggered by the target business service comprises: obtaining, through a target interface of a deduplication detection service process, a deduplication processing request input by the target business service, wherein the request parameters defined by the target interface comprise at least a received object identifier, deduplication information, a deduplication time range, and a target business service identifier.
  11. The method of claim 10, wherein the type of the target interface includes a deduplication write interface, and wherein after sending the received object identifier and the deduplication information to the bloom filter corresponding to each target partition, the method further comprises: when the deduplication detection results received from the bloom filter corresponding to each target partition do not contain a deduplication information index, determining the target bloom filter of the target partition that matches current time information; constructing the deduplication information index according to the received object identifier and the deduplication information; and sending the deduplication information index to the target bloom filter for storage, and pushing target content corresponding to the deduplication information to a target receiving object corresponding to the received object identifier.
  12. A data processing apparatus, comprising: an acquisition unit, configured to obtain a deduplication processing request triggered by a target business service, wherein the deduplication processing request carries at least a received object identifier, a deduplication time range, deduplication information, and a target business service identifier; a determining unit, configured to determine, according to the target business service identifier, partition creation parameters of a distributed bloom filter for the target business service from pre-cached target deduplication configuration information; a selecting unit, configured to determine a corresponding routing partition identifier according to the received object identifier and the partition creation parameters, and to select at least one target partition corresponding to the deduplication time range in a distributed service system according to the target business service identifier and the routing partition identifier, wherein the distributed service system comprises partitions created in advance according to the deduplication configuration information of different business services, and a bloom filter associated with each partition; and a sending unit, configured to send the received object identifier and the deduplication information to the bloom filter corresponding to each target partition, so that the bloom filter associated with each target partition performs deduplication detection based on the received object identifier and the deduplication information.
  13. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the data processing method of any one of claims 1 to 11 when executing the computer program.
  14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program adapted to be loaded by a processor to perform the data processing method of any one of claims 1 to 11.
  15. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the data processing method of any one of claims 1 to 11.
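
The routing and index-construction steps recited in claims 2 to 4 and 8 to 9 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the application: the choice of SHA-256 as the hash, one-day deduplication periods, the `service:YYYYMMDD:partition` index layout, and all function names.

```python
import hashlib
from datetime import date, timedelta


def routing_partition_id(received_object_id: str, partitions_per_period: int) -> int:
    """Hash the received object identifier, then take it modulo the partition count."""
    # SHA-256 is an assumed choice; the claims only require "a hash calculation".
    digest = hashlib.sha256(received_object_id.encode("utf-8")).digest()
    candidate_hash = int.from_bytes(digest[:8], "big")
    return candidate_hash % partitions_per_period


def filter_index(service_id: str, period: date, partition_id: int) -> str:
    """Hypothetical index layout: service identifier, period, partition identifier."""
    return f"{service_id}:{period:%Y%m%d}:{partition_id}"


def periods_in_range(start: date, end: date) -> list[date]:
    """Enumerate the (assumed daily) deduplication periods covered by [start, end]."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def indexes_to_create(service_id: str, start: date, end: date,
                      partitions_per_period: int) -> list[str]:
    """Filter indexes for every period and every partition, as used at creation time."""
    return [
        filter_index(service_id, period, partition_id)
        for period in periods_in_range(start, end)
        for partition_id in range(partitions_per_period)
    ]


def target_partitions(service_id: str, received_object_id: str, start: date,
                      end: date, partitions_per_period: int) -> list[str]:
    """One target partition per deduplication period in the deduplication time range."""
    partition_id = routing_partition_id(received_object_id, partitions_per_period)
    return [filter_index(service_id, period, partition_id)
            for period in periods_in_range(start, end)]
```

Under these assumptions, a request for a three-day deduplication time range touches three bloom filters, one per day, all sharing the same routing partition identifier, so a given receiving object is always checked against the same partition within each period.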

Description

Data processing method, device, equipment and readable storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and readable storage medium.

Background

The development of information technology has driven the growth of online business services, such as online shopping, electronic ticket booking, and game services, and most online business services involve information push. In the information push of any business service, repeated pushing of information is a concern: to avoid pushing the same information repeatedly within a short period, deduplication processing must be performed before pushing. The related art prevents the insertion of duplicate data by setting a unique constraint at the database level, or stores elements in a hash table and relies on its low lookup time complexity to determine whether a stored element is duplicated, or performs deduplication through a data structure of a third-party service.

In the research and practice of the related art, the inventors found that deduplication detection by means of database constraints, hash tables, or third-party services is only suitable for business services with basic deduplication detection requirements and cannot meet the deduplication detection requirements of a large number of business services. Moreover, because deduplication detection standards are inconsistent among different business services, it is difficult to manage the deduplication detection of each business service uniformly, which reduces the deduplication detection efficiency and accuracy of each business service.
Disclosure of Invention

The application provides a data processing method, apparatus, device, and readable storage medium, which can meet the deduplication processing requirements of a large number of business services, realize unified management of the deduplication processing of each business service, and improve the deduplication processing efficiency and accuracy of each business service.

To solve the above technical problems, the application provides the following technical solutions.

An embodiment of the application provides a data processing method, comprising:

obtaining a deduplication processing request triggered by a target business service, wherein the deduplication processing request carries at least a received object identifier, a deduplication time range, deduplication information, and a target business service identifier;

determining, according to the target business service identifier, partition creation parameters of a distributed bloom filter for the target business service from pre-cached target deduplication configuration information;

determining a corresponding routing partition identifier according to the received object identifier and the partition creation parameters, and selecting at least one target partition corresponding to the deduplication time range in a distributed service system according to the target business service identifier and the routing partition identifier, wherein the distributed service system comprises partitions created in advance according to the deduplication configuration information of different business services, and a bloom filter associated with each partition; and

sending the received object identifier and the deduplication information to the bloom filter corresponding to each target partition, so that the bloom filter associated with each target partition performs deduplication detection based on the received object identifier and the deduplication information.
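
As a rough illustration of the final step, the following Python sketch checks a deduplication index against the bloom filters of the selected target partitions and records it in the filter for the current period when no filter reports it. It is a single-process stand-in and not the distributed implementation described in the application: the `BloomFilter` class, its bit-array size and hash count, the `received_object_id|dedup_info` index format, and the partition key strings are all assumptions.

```python
import hashlib


class BloomFilter:
    """Tiny in-memory bloom filter; a real deployment would use a distributed store."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def check_and_record(filters: dict, current_key: str, target_keys: list,
                     received_object_id: str, dedup_info: str) -> bool:
    """Return True if the push may proceed, False if it looks like a duplicate."""
    index = f"{received_object_id}|{dedup_info}"  # assumed index format
    # Deduplication detection across every target partition in the time range.
    if any(index in filters[key] for key in target_keys if key in filters):
        return False
    # Not seen: store the index in the filter matching the current period.
    filters.setdefault(current_key, BloomFilter()).add(index)
    return True


# Usage: the second identical request is reported as a duplicate.
filters = {}
keys = ["push:20241107:3", "push:20241108:3"]
first = check_and_record(filters, "push:20241108:3", keys, "user-1", "promo-42")
second = check_and_record(filters, "push:20241108:3", keys, "user-1", "promo-42")
```

Because a bloom filter can return false positives but not false negatives, a sketch like this may occasionally suppress a push that was never sent, but it will not miss an index that was actually stored.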
Accordingly, an embodiment of the present application provides a data processing apparatus, comprising: an acquisition unit, configured to obtain a deduplication processing request triggered by a target business service, wherein the deduplication processing request carries at least a received object identifier, a deduplication time range, deduplication information, and a target business service identifier; a determining unit, configured to determine, according to the target business service identifier, partition creation parameters of a distributed bloom filter for the target business service from pre-cached target deduplication configuration information; a selecting unit, configured to determine a corresponding routing partition identifier according to the received object identifier and the partition creation parameters, and to select at least one target partition corresponding to the deduplication time range in a distributed service system according to the target business service identifier and the routing partition identifier. The distributed service system comprises p