CN-122019266-A - Database fault recovery method, device, equipment and medium
Abstract
The invention discloses a fault recovery method, apparatus, device, and medium for a database, in the field of database data recovery. The method comprises: while communication with the consumer clusters is normal, sending preset Binlog information to each consumer cluster, so that each consumer cluster updates its stored sequence number range set according to the preset Binlog information; when a consumer node fails, obtaining a recovery request generated from the missing sequence number ranges identified in the range set stored by that node; and sending recovery processing data to the failed consumer node according to the recovery request, so that the failed consumer node can perform recovery processing according to the recovery processing data. By replacing the traditional single-point position record with range set tracking, the invention can identify multiple gaps of missing data immediately, which improves the efficiency of fault recovery and greatly shortens recovery time. Because intermediate gaps are not skipped during recovery, data is not lost in the process and recovery accuracy is improved.
Inventors
- LIN XIZHEN
- FANG QIFENG
- YU JIAFA
- FENG SHIQUAN
- WANG BEIBEI
- XIE ZHOULONG
- CAI ZHENHAO
Assignees
- 广州市双照电子科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260114
Claims (10)
- 1. A database fault recovery method, applicable to a producer cluster of a distributed database, the producer cluster being connected to at least one consumer cluster, the method comprising: while communication with the consumer clusters is normal, sending preset Binlog information to each consumer cluster, so that each consumer cluster updates a stored sequence number range set according to the preset Binlog information, wherein the preset Binlog information is a Binlog file carrying a sequence number; when any consumer node of the consumer cluster fails, obtaining a recovery request from a processing node, wherein the processing node is a new node added to the consumer cluster, and the recovery request is generated after the processing node identifies a missing sequence number range based on the sequence number range set stored by the failed consumer node; and sending recovery processing data to the failed consumer node according to the recovery request, so that the failed consumer node performs recovery processing according to the recovery processing data, wherein the recovery processing data is data retrieved according to the missing sequence number range in the recovery request.
- 2. The method of claim 1, wherein generating the preset Binlog information comprises: when a write transaction is executed on any database partition, generating an initial Binlog file; assigning a corresponding DR sequence number to the initial Binlog file to obtain a sequence Binlog file, wherein the DR sequence number starts from 0 and is unique within a database partition; and storing the sequence Binlog file in a local transmission queue to obtain the preset Binlog information, wherein the local transmission queue is a first-in, first-out buffer structure that accumulates the preset Binlog information to be transmitted.
- 3. The method of claim 2, wherein sending the preset Binlog information to each consumer cluster comprises: when a preset trigger condition is met, extracting a preset number of sequence Binlog files from the local transmission queue, wherein the preset trigger condition comprises the number of sequence Binlog files stored in the local transmission queue, the time interval between sends of sequence Binlog files, or the service priority of a consumer node; encapsulating the sequence Binlog files into one or more Binlog data packets, wherein each Binlog data packet comprises a partition ID, a sequence number list, the sequence Binlog files, a batch identifier, and verification information; and sending the Binlog data packets to the consumer cluster according to a transmission protocol and the capacity of each Binlog data packet.
- 4. The method of claim 3, wherein sending the Binlog data packets to the consumer cluster according to the transmission protocol and the capacity of each Binlog data packet comprises: if the capacity of a Binlog data packet is less than or equal to a preset capacity, sending the Binlog data packet to the consumer cluster according to the transmission protocol; and if the capacity of the Binlog data packet is greater than the preset capacity, splitting the Binlog data packet into a plurality of fragments and sending each fragment to the consumer cluster according to the transmission protocol, wherein each fragment comprises a serial number and a partition ID.
- 5. A database fault recovery method, applicable to a newly added node of a consumer cluster of a distributed database, the consumer cluster being connected to a producer cluster, the method comprising: when any consumer node of the consumer cluster fails, identifying a missing sequence number range based on the sequence number range set stored by the failed consumer node, and generating a corresponding recovery request, wherein the sequence number range set stored by the failed consumer node is obtained by updating the stored sequence number range set according to preset Binlog information that the consumer node received from the producer cluster while communication with the producer cluster was normal; and sending the recovery request to the producer cluster, so that the producer cluster retrieves recovery processing data according to the missing sequence number range in the recovery request and sends the recovery processing data to the failed consumer node, for the failed consumer node to perform recovery processing according to the recovery processing data.
- 6. The method of claim 5, wherein identifying the missing sequence number range based on the sequence number range set stored by the failed consumer node and generating the corresponding recovery request comprises: reading the sequence number range set stored by the failed consumer node, wherein the stored sequence number range set is a verified sequence number range set that the failed consumer node stored for each data partition it was responsible for; sorting the stored sequence number range set, determining a global synchronization point and a maximum sequence number, and determining a scanning sequence range of the stored sequence number range set based on the global synchronization point and the maximum sequence number; identifying a plurality of missing sequence number ranges within the scanning sequence range by leading (preamble) gap analysis or internal gap analysis; and generating the recovery request from the plurality of missing sequence number ranges, wherein the recovery request comprises a missing range list and an incremental stream starting point, and the missing range list comprises the plurality of missing sequence number ranges.
- 7. A database fault recovery apparatus, applicable to a producer cluster of a distributed database, the producer cluster being connected to at least one consumer cluster, the apparatus comprising: an information sending module, configured to send preset Binlog information to each consumer cluster while communication with the consumer clusters is normal, so that each consumer cluster updates a stored sequence number range set according to the preset Binlog information, wherein the preset Binlog information is a Binlog file carrying a sequence number; an acquisition request module, configured to obtain a recovery request from a processing node when any consumer node of the consumer cluster fails, wherein the processing node is a new node added to the consumer cluster, and the recovery request is generated after the processing node identifies a missing sequence number range based on the sequence number range set stored by the failed consumer node; and a data sending module, configured to send recovery processing data to the failed consumer node according to the recovery request, so that the failed consumer node performs recovery processing according to the recovery processing data, wherein the recovery processing data is data retrieved according to the missing sequence number range in the recovery request.
- 8. A database fault recovery apparatus, applicable to a newly added node of a consumer cluster of a distributed database, the consumer cluster being connected to a producer cluster, the apparatus comprising: a generation request module, a recovery request module and a storage module, wherein the generation request module is configured to identify, when any consumer node of the consumer cluster fails, a missing sequence number range based on the sequence number range set stored by the failed consumer node, and to generate a corresponding recovery request, wherein the sequence number range set stored by the failed consumer node is obtained by updating the stored sequence number range set according to preset Binlog information that the consumer node received from the producer cluster while communication with the producer cluster was normal; and a sending request module, configured to send the recovery request to the producer cluster, so that the producer cluster retrieves recovery processing data according to the missing sequence number range in the recovery request and sends the recovery processing data to the failed consumer node, for the failed consumer node to perform recovery processing according to the recovery processing data.
- 9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the database fault recovery method of any one of claims 1-4 or the database fault recovery method of any one of claims 5-6.
- 10. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the database fault recovery method of any one of claims 1 to 4 or the database fault recovery method of any one of claims 5 to 6.
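As a non-limiting illustration of the producer-side steps in claims 2 to 4, the following Python sketch assigns per-partition DR sequence numbers starting from 0, buffers entries in a first-in, first-out queue, and packs a batch into one packet or into fragments when a preset capacity is exceeded. The class and field names, the count-based trigger, the use of CRC32 as verification information, the measurement of capacity on the payload bytes only, and the per-fragment number are all assumptions made for the example and are not taken from the claims.

```python
import zlib
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass
class SequencedBinlog:
    partition_id: int
    dr_seq: int      # DR sequence number: starts at 0, unique within the partition
    payload: bytes


class PartitionProducer:
    """Hypothetical per-partition producer; all names here are illustrative."""

    def __init__(self, partition_id, batch_size=2, max_packet_bytes=4):
        self.partition_id = partition_id
        self.batch_size = batch_size              # assumed count-based trigger
        self.max_packet_bytes = max_packet_bytes  # assumed preset capacity
        self._next_seq = count(0)
        self._next_batch = count(0)
        self.queue = deque()                      # FIFO local transmission queue

    def on_write(self, payload):
        """Tag a freshly written Binlog entry with the next DR sequence number."""
        entry = SequencedBinlog(self.partition_id, next(self._next_seq), payload)
        self.queue.append(entry)
        return entry

    def drain(self):
        """Return the packet (or its fragments) to send, or [] if not triggered."""
        if len(self.queue) < self.batch_size:
            return []
        batch = [self.queue.popleft() for _ in range(self.batch_size)]
        body = b"".join(e.payload for e in batch)
        packet = {
            "partition_id": self.partition_id,
            "seqs": [e.dr_seq for e in batch],
            "batch_id": next(self._next_batch),
            "checksum": zlib.crc32(body),  # stand-in for the verification information
            "body": body,
        }
        if len(body) <= self.max_packet_bytes:
            return [packet]
        # Oversized packet: split the body; every fragment keeps the partition ID
        # and carries its own fragment number.
        size = self.max_packet_bytes
        return [
            {"partition_id": self.partition_id, "batch_id": packet["batch_id"],
             "fragment_no": i, "body": body[off:off + size]}
            for i, off in enumerate(range(0, len(body), size))
        ]
```

In this sketch a caller would invoke `on_write` for each write transaction and `drain` whenever it checks the trigger; a time-based or priority-based trigger, which claim 3 also allows, would replace the length check at the top of `drain`.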
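The consumer-side steps of claims 5 and 6 can likewise be sketched, purely by way of example. The sketch below keeps the sequence number range set as a sorted list of closed `[start, end]` ranges, merges each received sequence number into it, and then scans the ranges to list the leading (preamble) gap and the internal gaps. It assumes that the global synchronization point is the first sequence number expected to be present and that the incremental stream starting point is one past the maximum sequence number; the claims do not define either term in this detail, and the function names are hypothetical.

```python
def add_seq(ranges, seq):
    """Merge one received sequence number into a sorted list of closed ranges."""
    merged = []
    new = [seq, seq]
    for start, end in ranges:
        if end + 1 < new[0]:
            merged.append([start, end])   # entirely before the new range
        elif new[1] + 1 < start:
            merged.append(new)            # entirely after: emit and carry this one
            new = [start, end]
        else:
            # Overlapping or adjacent: grow the pending range.
            new = [min(start, new[0]), max(end, new[1])]
    merged.append(new)
    return merged


def build_recovery_request(ranges, sync_point=0):
    """List missing ranges from sync_point up to the maximum stored number."""
    missing = []
    expected = sync_point                 # next sequence number that should exist
    for start, end in sorted(ranges):
        if start > expected:
            # First hit is the leading gap; later hits are internal gaps.
            missing.append((expected, start - 1))
        expected = max(expected, end + 1)
    # expected now equals max(sync_point, maximum sequence number + 1).
    return {"missing": missing, "stream_from": expected}
```

For a stored set of `[[0, 1], [3, 5], [9, 9]]`, the request produced by this sketch lists the ranges 2 to 2 and 6 to 8 as missing and asks for the incremental stream to start at 10, so both gaps are requested in one pass rather than being discovered one at a time.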
Description
Database fault recovery method, device, equipment and medium
Technical Field
The present invention relates to the field of database data recovery technologies, and in particular to a fault recovery method, apparatus, device, and medium for a database.
Background
The nodes of a distributed database are deployed on different machines. Different partitions are processed by different sites within a node, and each site has its own CPU and memory, so transactions are processed in a single-threaded, serial, lock-free manner, which improves processing efficiency. However, replicas share no content, which makes them prone to data divergence caused by non-deterministic operations (such as locally computed timestamps or pseudo-random number generation), network failures, and the like. Because each replica independently processes the same transactions in parallel, once the database fails every replica must be recovered so that the content and format of all replicas are fully consistent and business logic errors are avoided.
One common fault recovery method is the traditional position-based database replication method. It relies either on recording an exact replication position (e.g., a Binlog file name and offset) or on a "high water mark" (e.g., the log ID of the last applied entry), and uses that position or mark as the starting point from which the consumer resumes pulling data from the producer.
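By way of illustration only, resuming from a high water mark can be sketched as follows. The sketch assumes that entries may be applied out of order, so that the set of applied log IDs can contain holes below the mark; the function names and the sample IDs are hypothetical.

```python
def resume_from_high_water_mark(applied_seqs):
    """Position-based resume: remember only the last applied ID, continue after it."""
    return max(applied_seqs) + 1


def skipped_by_resume(applied_seqs):
    """IDs below the resume point that were never applied and are never re-requested."""
    resume_at = resume_from_high_water_mark(applied_seqs)
    return sorted(set(range(resume_at)) - set(applied_seqs))
```

With applied IDs 0, 1, 2, 5 and 6, the consumer in this sketch resumes at 7, and IDs 3 and 4 are never requested again.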
This traditional method has the following technical problems. A distributed database often comprises many data partitions, and the consumer must maintain position information for each partition, which is complex to manage. If any position information is inaccurate or is loaded late, data may be lost or applied repeatedly, which leaves the data inconsistent and the recovered data incorrect. In addition, resuming transmission from a mark or position can skip gaps in the middle of the sequence, causing permanent data loss and further increasing the deviation.
Disclosure of Invention
The invention provides a fault recovery method, apparatus, device, and medium for a database, which can solve the technical problem that data recovery in the prior art is inaccurate.
A first aspect of an embodiment of the present invention provides a database fault recovery method, applicable to a producer cluster of a distributed database, the producer cluster being connected to at least one consumer cluster, the method comprising: while communication with the consumer clusters is normal, sending preset Binlog information to each consumer cluster, so that each consumer cluster updates a stored sequence number range set according to the preset Binlog information, wherein the preset Binlog information is a Binlog file carrying a sequence number; when any consumer node of the consumer cluster fails, obtaining a recovery request from a processing node, wherein the processing node is a new node added to the consumer cluster, and the recovery request is generated after the processing node identifies a missing sequence number range based on the sequence number range set stored by the failed consumer node; and sending recovery processing data to the failed consumer node according to the recovery request, so that the failed consumer node performs recovery processing according to the recovery processing data, wherein the recovery processing data is data retrieved
according to the missing sequence number range of the recovery request.
A second aspect of an embodiment of the present invention provides a database fault recovery method, applicable to a newly added node of a consumer cluster of a distributed database, the consumer cluster being connected to a producer cluster, the method comprising: when any consumer node of the consumer cluster fails, identifying a missing sequence number range based on the sequence number range set stored by the failed consumer node, and generating a corresponding recovery request, wherein the sequence number range set stored by the failed consumer node is obtained by updating the stored sequence number range set according to preset Binlog information that the consumer node received from the producer cluster while communication with the producer cluster was normal; and sending the recovery request to the producer cluster, so that the producer cluster retrieves recovery processing data according to the missing sequence number range of the recovery request and sends the recovery processing data to the failed consumer node, for the failed consumer node to perform recovery processing according to the recovery processing data.
A third aspect of an embodiment of the present invention provides a database fault recovery apparatus, applicable to a producer cluster of a distributed database, the producer cluster being connected to at least one consum