CN-122019501-A - Database data synchronization method, device, equipment and medium

Abstract

The invention discloses a data synchronization method, device, equipment and medium for a database, belonging to the field of data synchronization. The method is applicable to a producer cluster of a distributed database, the producer cluster being communicatively connected to a plurality of consumer clusters. The method comprises: constructing Binlog information by binding monotonically increasing DR sequence numbers of each partition to Binlog events; sending the Binlog information to the consumer clusters in parallel, so that each consumer cluster identifies missing sequence numbers in the corresponding partition; and sending corresponding complement data to each consumer cluster according to the complement request uploaded by that consumer cluster, so that the consumer clusters perform data synchronization using the complement data. Because the Binlog information is sent in parallel, the consumer clusters identify missing data at the same time and the partitions of each consumer cluster no longer need to be identified and tracked one by one, which shortens processing time and improves data synchronization efficiency.

Inventors

LIN XIZHEN
FANG QIFENG
YU JIAFA
FENG SHIQUAN
WANG BEIBEI
XIE ZHOULONG
CAI ZHENHAO

Assignees

广州市双照电子科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260114

Claims (10)

1. A method of data synchronization of a database, the method being applicable to a producer cluster of a distributed database, the producer cluster being communicatively connected to a number of consumer clusters, the method comprising: constructing Binlog information, wherein the Binlog information is generated by binding monotonically increasing DR sequence numbers of each partition to Binlog events; sending the Binlog information to the consumer clusters in parallel, so that each consumer cluster identifies a plurality of missing sequence numbers in a corresponding partition according to the sequence numbers of the Binlog information; and sending corresponding complement data to each consumer cluster according to the complement request uploaded by that consumer cluster, so that the consumer clusters perform data synchronization using the complement data, wherein the complement request is a request constructed by the consumer cluster according to the plurality of missing sequence numbers identified for its partitions.
2. The method for synchronizing data of a database according to claim 1, wherein said constructing Binlog information comprises: determining Binlog events of a target partition, wherein the target partition is a partition added by scaling out or a partition executing a write transaction; acquiring a DR sequence number corresponding to the target partition according to the Binlog event, wherein the DR sequence number is a sequence number range constructed by a counter that the target partition establishes when the partition is created or initialized and that counts monotonically upward with each executed event; and binding the DR sequence number to the Binlog event to obtain the Binlog information.
3. The method for synchronizing data of a database according to claim 2, wherein said acquiring a DR sequence number corresponding to the target partition according to the Binlog event comprises: determining a corresponding Binlog entry count according to the transaction type of the Binlog event, and acquiring the real-time count value of the counter of the target partition; determining an initial sequence number range from the Binlog entry count and the real-time count value; and dynamically adjusting the initial sequence number range, then updating and verifying it, to obtain the DR sequence number, wherein the dynamic adjustment adjusts the maximum value of the initial sequence number range according to a preset reserved buffer or a preset allocation step length, and the updating and verifying checks the validity of the sequence number range after the counter of the target partition has been updated with the initial sequence number range.
4. The method for synchronizing data of a database according to claim 1, wherein said sending the Binlog information to a number of said consumer clusters in parallel comprises: distributing the Binlog information to corresponding partition tasks according to the number of partitions of the database, and matching each partition task to a thread in a preset thread pool, so that each thread performs batch reading and sequence number packaging on the Binlog information of its partition task to obtain a Binlog data packet; and calling a preset network transmission interface to send the Binlog data packet to the consumer clusters, and clearing the cache after receiving a confirmation signal fed back by the consumer clusters, wherein the confirmation signal is returned to the corresponding producer cluster after a consumer cluster receives the Binlog data packet and verifies the sequence number continuity of the Binlog data packet.
5. A method of data synchronization of a database, the method being applicable to a consumer cluster of a distributed database, the consumer cluster being communicatively connected to a producer cluster, the method comprising: obtaining Binlog information sent by the producer cluster, wherein the Binlog information is generated by the producer cluster by binding monotonically increasing DR sequence numbers of each partition to Binlog events; identifying a plurality of missing sequence numbers in the corresponding partition according to the sequence numbers of the Binlog information; and acquiring complement data from the producer cluster based on the missing sequence numbers, and updating data using the complement data so that the data stored in the partition is synchronized with the data stored by the producer cluster.
6. The method for synchronizing data of a database according to claim 5, wherein said identifying a plurality of missing sequence numbers in the corresponding partition according to the sequence numbers of the Binlog information comprises: extracting a sequence number to be updated and a partition ID from the Binlog information, wherein the sequence number to be updated is a sequence number that has passed verification, and the verification checks the value of the sequence number, its maximum sequence number and its range; acquiring a real-time sequence range from an embedded tracker for the partition ID, wherein the embedded tracker is a mapping table between partition IDs and the sequence range sets of the producer cluster; traversing all intervals of the real-time sequence range, and determining the membership relation between the sequence number to be updated and each interval of the real-time sequence range; and determining a plurality of sequence expansion ranges based on the membership relation, and acquiring the sequence numbers of each sequence expansion range to obtain the plurality of missing sequence numbers.
7. A data synchronization device for a database, the device being adapted for a producer cluster of a distributed database, the producer cluster being communicatively connected to a number of consumer clusters, the device comprising: a construction module, used for constructing Binlog information, wherein the Binlog information is generated by binding monotonically increasing DR sequence numbers of each partition to Binlog events; a parallel sending module, used for sending the Binlog information to the consumer clusters in parallel, so that each consumer cluster identifies a plurality of missing sequence numbers in a corresponding partition according to the sequence numbers of the Binlog information; and a data sending module, used for sending corresponding complement data to each consumer cluster according to the complement request uploaded by that consumer cluster, wherein the complement request is a request constructed by the consumer cluster according to the plurality of missing sequence numbers identified for its partitions, and the consumer clusters perform data synchronization using the complement data.
8. A data synchronization device for a database, the device being adapted for a consumer cluster of a distributed database, the consumer cluster being communicatively connected to a producer cluster, the device comprising: an acquisition module, used for acquiring Binlog information sent by the producer cluster, wherein the Binlog information is generated by the producer cluster by binding monotonically increasing DR sequence numbers of each partition to Binlog events; an identification module, used for identifying a plurality of missing sequence numbers in the corresponding partition according to the sequence numbers of the Binlog information; and a synchronization module, used for acquiring complement data from the producer cluster based on the missing sequence numbers, and updating data using the complement data so that the data stored in the partition is synchronized with the data stored by the producer cluster.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of data synchronization of a database according to any one of claims 1-4 or the method of data synchronization of a database according to any one of claims 5-6 when executing the program.
10. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method of data synchronization of a database according to any one of claims 1-4 or the method of data synchronization of a database according to any one of claims 5-6.
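
As an illustration of the consumer-side gap identification recited in claims 5 and 6, the following is a minimal sketch and not the patented implementation. The class `SequenceTracker`, its methods, and the assumption that sequence numbers start at 0 are hypothetical choices made for this example; the claims only require a mapping from partition ID to a set of sequence ranges.

```python
class SequenceTracker:
    """Hypothetical stand-in for the per-partition tracker: partition ID -> received ranges."""

    def __init__(self):
        # partition ID -> sorted, non-overlapping inclusive (first, last) ranges
        self.ranges = {}

    def record(self, partition_id, first, last):
        """Verify an incoming range, then merge it into the partition's received ranges."""
        if first < 0 or last < first:
            raise ValueError(f"invalid sequence range ({first}, {last})")
        merged = []
        for lo, hi in sorted(self.ranges.get(partition_id, []) + [(first, last)]):
            if merged and lo <= merged[-1][1] + 1:
                # Overlapping or adjacent: extend the previous range.
                merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
            else:
                merged.append((lo, hi))
        self.ranges[partition_id] = merged

    def missing(self, partition_id, start=0):
        """List sequence numbers from `start` (assumed first number) up to the
        highest one seen that have not arrived yet."""
        gaps, expected = [], start
        for lo, hi in self.ranges.get(partition_id, []):
            gaps.extend(range(expected, lo))
            expected = max(expected, hi + 1)
        return gaps


tracker = SequenceTracker()
tracker.record(7, 0, 4)        # sequence numbers 0..4 arrive for partition 7
tracker.record(7, 8, 9)        # 8..9 arrive, so 5..7 were skipped
request = tracker.missing(7)   # would be sent to the producer as a complement request
tracker.record(7, 5, 7)        # complement data arrives and closes the gap
```

Keeping merged ranges instead of individual sequence numbers means the tracker's size depends on the number of gaps, not on the number of events received.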

Description

Database data synchronization method, device, equipment and medium

Technical Field

The present invention relates to the field of replica data processing technologies, and in particular to a method, an apparatus, a device, and a medium for synchronizing data in a database.

Background

The nodes of a distributed database are deployed on different machines, and different partitions are processed by different sites within the nodes. Each site of a node has its own CPU and memory for transaction processing, which improves processing efficiency. In a distributed environment, however, such multi-replica mechanisms tend to cause data inconsistencies. Data synchronization ensures that the data on every node is consistent and avoids problems caused by outdated or erroneous data on some nodes. For example, the Raft algorithm or another consistency protocol coordinates the order of operations between replicas through a synchronization mechanism to prevent data conflicts.

To synchronize the data of the nodes of a database, the commonly used existing method periodically updates the replication site corresponding to each partition, identifies the missing sequence by taking the replication site as the starting point, and then transmits data to fill the missing part, thereby synchronizing the data of the different partitions.

However, this method has an obvious technical defect: a distributed database usually comprises thousands of data partitions, the base sites of the partitions differ, and each partition must be identified and tracked one by one, so processing takes a long time and efficiency is low.

Disclosure of Invention

The invention provides a data synchronization method, device, equipment and medium for a database, which can solve the technical problem of low data synchronization efficiency in the prior art.
A first aspect of an embodiment of the present invention provides a method for synchronizing data of a database, the method being applicable to a producer cluster of a distributed database, the producer cluster being communicatively connected to a plurality of consumer clusters, the method comprising: constructing Binlog information, wherein the Binlog information is generated by binding monotonically increasing DR sequence numbers of each partition to Binlog events; sending the Binlog information to the consumer clusters in parallel, so that each consumer cluster identifies a plurality of missing sequence numbers in a corresponding partition according to the sequence numbers of the Binlog information; and sending corresponding complement data to each consumer cluster according to the complement request uploaded by that consumer cluster, so that the consumer clusters perform data synchronization using the complement data, wherein the complement request is a request constructed by the consumer cluster according to the plurality of missing sequence numbers identified for its partitions.
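The three producer-side steps of the first aspect can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the class `Producer`, its methods `append`, `broadcast` and `serve_complement`, the dictionary record layout, and the modelling of consumers as plain callables are all hypothetical, and real network transmission and acknowledgements are omitted.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class Producer:
    """Hypothetical producer: binds per-partition sequence numbers to Binlog events."""

    def __init__(self):
        self._next_seq = defaultdict(int)  # partition ID -> next DR sequence number
        self._log = defaultdict(dict)      # partition ID -> {sequence number: record}

    def append(self, partition_id, event):
        """Bind the partition's next sequence number to a Binlog event.
        Assumes a single writer per partition, so no locking is shown."""
        seq = self._next_seq[partition_id]
        self._next_seq[partition_id] = seq + 1
        record = {"partition": partition_id, "seq": seq, "event": event}
        self._log[partition_id][seq] = record
        return record

    def broadcast(self, records, consumers):
        """Send the same records to every consumer in parallel, one thread each."""
        with ThreadPoolExecutor(max_workers=max(1, len(consumers))) as pool:
            list(pool.map(lambda deliver: deliver(records), consumers))

    def serve_complement(self, partition_id, missing):
        """Answer a complement request with the records for the missing numbers."""
        log = self._log[partition_id]
        return [log[seq] for seq in missing if seq in log]


producer = Producer()
records = [producer.append(3, f"INSERT row {i}") for i in range(4)]

received_a, received_b = [], []

def lossy(batch):
    # Simulated consumer that loses sequence number 2 in transit.
    received_b.extend(r for r in batch if r["seq"] != 2)

producer.broadcast(records, [received_a.extend, lossy])
complement = producer.serve_complement(3, [2])  # the second consumer asks for number 2
```

Because every record carries its partition ID and sequence number, the producer does not need to track each consumer's position per partition; each consumer works out what it is missing and asks only for that.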
A second aspect of an embodiment of the present invention provides a method for synchronizing data of a database, the method being applicable to a consumer cluster of a distributed database, the consumer cluster being communicatively connected to a producer cluster, the method comprising: obtaining Binlog information sent by the producer cluster, wherein the Binlog information is generated by the producer cluster by binding monotonically increasing DR sequence numbers of each partition to Binlog events; identifying a plurality of missing sequence numbers in the corresponding partition according to the sequence numbers of the Binlog information; and acquiring complement data from the producer cluster based on the missing sequence numbers, and updating data using the complement data so that the data stored in the partition is synchronized with the data stored by the producer cluster.
A third aspect of an embodiment of the present invention provides a data synchronization apparatus for a database, the apparatus being adapted for a producer cluster of a distributed database, the producer cluster being communicatively connected to a number of consumer clusters, the apparatus comprising: a construction module, used for constructing Binlog information, wherein the Binlog information is generated by binding monotonically increasing DR sequence numbers of each partition to Binlog events; a parallel sending module, used for sending the Binlog information to the consumer clusters in parallel, so that each consumer cluster identifies a plurality of missing sequence numbers in a corresponding partition according to the sequence numbers of the Binlog information; and a data sending module, used for sending corresponding complement data to each consumer cluster according to the complement request uploaded by that consumer cluster, wherein the complement request is a request const