CN-121996641-A - Database active-standby switching method and system based on performance self-adaptive evaluation

Abstract

The application discloses a database active-standby switching method and system based on performance adaptive evaluation. The method collects multidimensional performance indexes of the main and standby database nodes in real time and dynamically calculates a service health score for each node using an adaptive weight model, shifting the strategy from 'recovery after failure' to 'prevention before performance degradation'. When the health score of the main node stays below a preset threshold and a standby node with a significantly higher health score exists, the system automatically triggers a preventive switching instruction and executes, in order, the graceful degradation steps of write protection, data catch-up, traffic switching, and role flipping. The application addresses three weaknesses of the traditional heartbeat mechanism: single-dimension state judgment, lagging switch triggers, and coarse process control. It can identify the sub-health condition in which the process is alive but the service is unavailable, preserves data consistency and service continuity during switching, reduces the impact on the business, and suits critical scenarios with strict high-availability requirements, such as finance and telecom.
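
The trigger logic summarized in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the class name PreventiveTrigger, the threshold of 60, the margin of 15 and the node names are hypothetical values chosen for illustration, and the requirement of three consecutive low evaluation periods is taken from claim 6.

```python
from collections import deque
from dataclasses import dataclass, field

# Ordered steps of the graceful switching flow, as listed in the abstract.
SWITCH_STEPS = ("write_protection", "data_catch_up", "traffic_switching", "role_flipping")


@dataclass
class PreventiveTrigger:
    """Decides when to issue a preventive switching instruction.

    All default values are illustrative assumptions, not values from the source.
    """
    threshold: float = 60.0   # assumed health-score threshold on the 0-100 scale
    min_gap: float = 15.0     # assumed margin for "significantly higher"
    periods: int = 3          # consecutive low evaluation periods required
    _recent: deque = field(default_factory=deque, init=False)

    def evaluate(self, main_score, standby_scores):
        """Return the name of the target standby node, or None if no switch is needed."""
        # Keep only the most recent `periods` scores of the main node.
        self._recent.append(main_score)
        if len(self._recent) > self.periods:
            self._recent.popleft()
        sustained_low = (
            len(self._recent) == self.periods
            and all(s < self.threshold for s in self._recent)
        )
        if not sustained_low or not standby_scores:
            return None
        # Pick the healthiest standby and require a clear margin over the main node.
        best, best_score = max(standby_scores.items(), key=lambda kv: kv[1])
        return best if best_score - main_score > self.min_gap else None


trigger = PreventiveTrigger()
target = None
for score in (72.0, 55.0, 52.0, 48.0):
    target = trigger.evaluate(score, {"standby-1": 88.0, "standby-2": 61.0})
if target is not None:
    print("switch to", target, "via", " -> ".join(SWITCH_STEPS))
```

Requiring both a sustained low score and a clear margin over the main node keeps a single noisy sample, or a standby that is only marginally better, from causing an unnecessary switch.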

Inventors

ZHU CHEN
Su Zhangyan

Assignees

广州海量数据库技术有限公司

Dates

Publication Date: 20260508
Application Date: 20260129

Claims (10)

1. A database active-standby switching method based on performance adaptive evaluation, the method comprising: S1, collecting multidimensional performance index data of a main database node and at least one standby database node; S2, evaluating the multidimensional performance index data based on an adaptive weight model, and calculating service health scores of the main database node and each standby database node, wherein the adaptive weight model dynamically adjusts index weights according to the real-time state of each performance index; S3, generating a preventive switching instruction when the service health score of the main database node meets the switching trigger condition and a standby database node whose service health score is higher than that of the main database node exists; and S4, executing a graceful degradation switching flow to switch the service traffic from the main database node to the target standby database node, wherein the graceful degradation switching flow comprises write protection, data catch-up, traffic switching, and role flipping operations executed in sequence.
2. The method of claim 1, wherein the multidimensional performance index data in step S1 comprises at least CPU utilization, memory usage, disk I/O latency, network round-trip latency, and transactions per second or queries per second; the multidimensional performance index data is collected by acquisition agents deployed on the database nodes and the underlying infrastructure, and the acquisition interval is configured to be 1 to 30 seconds.
3. The method according to claim 1, wherein the adaptive weight model in step S2 includes a sliding window statistics module for maintaining a fixed-length time-series window for each performance index, and for calculating the moving average and moving standard deviation within the window to characterize the recent normal baseline and fluctuation range of the index.
4. The method according to claim 1, wherein the calculating of the service health score in step S2 comprises: normalizing the original value of each performance index into a standard score of 0 to 10 points, wherein the normalization comprises a piecewise score based on static thresholds and an inverse deviation score based on a dynamic baseline; calculating adaptive weights based on the base weight and the dynamic adjustment rules of each performance index; and synthesizing a service health score of 0 to 100 points from the standard scores and the normalized adaptive weights, wherein a lower score indicates worse service health.
5. The method according to claim 1, wherein the adaptive weight model dynamically adjusts the index weights according to the real-time state of each performance index in step S2, and the dynamic adjustment rules include: a deviation amplification rule, under which, when the deviation of the current value of a performance index from its moving average exceeds K times the moving standard deviation, K being a configurable parameter, the index weight is amplified according to the degree of deviation, the amplification factor being positively correlated with the degree of deviation; a trend deterioration enhancement rule, under which, when a performance index shows a deteriorating trend over three or more consecutive evaluation periods, an increment factor based on the linear regression slope is added to the index weight; and a critical anomaly override rule, under which, when any performance index exceeds an absolute safety red line, the index weight is set to the maximum value so that it dominates the health score calculation and the decision, wherein the absolute safety red line comprises at least one of: replication delay exceeding 10 seconds, and CPU utilization exceeding 95% sustained for 1 minute.
6. The method according to claim 1, wherein the switching trigger condition in step S3 comprises: the service health score of the main database node is lower than a preset health score threshold in three consecutive evaluation periods; and the service health score of the standby database node exceeds the service health score of the main database node by more than a preset difference.
7. The method according to claim 1, wherein the graceful degradation switching flow in step S4 specifically includes: write protection, namely sending a write protection instruction to the main database node so that it enters read-only mode and rejects new transaction writes; data catch-up, namely monitoring the replication delay of the target standby database node, and proceeding to traffic switching only after that delay approaches zero or falls within a preset safety threshold; traffic switching, namely smoothly directing application write traffic to the target standby database node through database connection middleware or virtual IP drift; and role flipping, namely promoting the target standby database node to a new, writable main database node and demoting the original main database node to a standby database node.
8. A database active-standby switching system based on performance adaptive evaluation, wherein the system is configured to implement the steps of the database active-standby switching method based on performance adaptive evaluation as claimed in any one of claims 1 to 7, the system comprising: a multidimensional performance data acquisition module for collecting multidimensional performance index data of the main database node and at least one standby database node; an adaptive weight health assessment module, communicatively connected to the multidimensional performance data acquisition module, for evaluating the multidimensional performance index data based on an adaptive weight model and calculating service health scores of the main database node and each standby database node, wherein the adaptive weight model dynamically adjusts index weights according to the real-time state of each performance index; a switching instruction generation module for generating a preventive switching instruction when the service health score of the main database node meets the switching trigger condition and a standby database node whose service health score is higher than that of the main database node exists; and an active switching and graceful degradation module, communicatively connected to the adaptive weight health assessment module, for executing a graceful degradation switching flow to switch the service traffic from the main database node to the target standby database node, wherein the graceful degradation switching flow comprises write protection, data catch-up, traffic switching, and role flipping operations executed in sequence.
9. The system of claim 8, wherein: the multidimensional performance data acquisition module comprises a CPU utilization monitoring unit, a memory usage monitoring unit, a disk I/O latency monitoring unit, a network round-trip latency monitoring unit, and a transactions-per-second/queries-per-second monitoring unit; the adaptive weight health assessment module comprises a sliding window statistics sub-module, an index normalization sub-module, an adaptive weight calculation sub-module, a service health score synthesis sub-module, and a service health score analysis sub-module, wherein the sliding window statistics sub-module is used for maintaining a fixed-length time-series window for each performance index and calculating a moving average and a moving standard deviation; and the active switching and graceful degradation module comprises an active decision logic sub-module and a graceful degradation sub-module, wherein the active decision logic sub-module is used for judging whether the switching trigger condition is met, and the graceful degradation sub-module is used for sequentially executing the write protection, data catch-up, traffic switching, and role flipping operations.
10. An electronic device, comprising a memory and a processor, wherein the memory is used for storing a computer program, and the processor is used for executing the computer program to implement the steps of the database active-standby switching method based on performance adaptive evaluation according to any one of claims 1 to 7.
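
The scoring pipeline recited in claims 3 to 5 (sliding-window baseline, 0 to 10 standard scores, adaptive weights, 0 to 100 health score) can be illustrated with a minimal Python sketch. The claims do not give formulas, so the scoring curve, the amplification factor, the maximum weight of 100 and all names (MetricWindow, score_and_weight, health_score) are assumptions made for illustration. The piecewise static-threshold score and the trend deterioration rule are omitted for brevity, and every index is assumed to worsen as its value rises.

```python
from collections import deque
from statistics import mean, pstdev


class MetricWindow:
    """Fixed-length sliding window for one performance index (higher value = worse)."""

    def __init__(self, base_weight, red_line, size=30):
        self.base_weight = base_weight
        self.red_line = red_line            # absolute safety red line for this index
        self.values = deque(maxlen=size)    # old samples fall out automatically

    def observe(self, value):
        self.values.append(value)

    def baseline(self):
        """Moving average and moving standard deviation (needs at least one sample)."""
        return mean(self.values), pstdev(self.values)


def score_and_weight(window, current, k=2.0, max_weight=100.0):
    """Return (standard score in 0..10, unnormalized adaptive weight)."""
    avg, std = window.baseline()
    # Deviation in units of the moving standard deviation; only deterioration counts.
    z = max(0.0, (current - avg) / std) if std > 0 else 0.0
    score = max(0.0, 10.0 - 2.0 * z)        # assumed inverse-deviation scoring curve
    weight = window.base_weight
    if z > k:                               # deviation amplification rule
        weight *= 1.0 + (z - k)             # assumed amplification factor
    if current > window.red_line:           # red line crossed: weight dominates
        weight, score = max_weight, 0.0     # forcing the score to 0 is an assumption
    return score, weight


def health_score(readings, windows):
    """Combine per-index standard scores into a 0..100 service health score."""
    pairs = [score_and_weight(windows[name], value) for name, value in readings.items()]
    total = sum(w for _, w in pairs)        # normalizes the adaptive weights
    return 10.0 * sum(s * w for s, w in pairs) / total


windows = {
    "cpu_pct": MetricWindow(base_weight=0.4, red_line=95.0),
    "disk_io_ms": MetricWindow(base_weight=0.6, red_line=200.0),
}
for cpu, io in [(40, 5), (42, 6), (38, 5), (41, 4), (39, 5)]:
    windows["cpu_pct"].observe(cpu)
    windows["disk_io_ms"].observe(io)

healthy = health_score({"cpu_pct": 40, "disk_io_ms": 5}, windows)
degraded = health_score({"cpu_pct": 97, "disk_io_ms": 5}, windows)
print(round(healthy, 1), round(degraded, 1))
```

In this sketch a node whose readings sit on their baseline scores at the top of the scale, while a single index past its red line pulls the score close to zero regardless of the other indexes, which is the dominating behavior that claim 5 describes.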

Description

Database active-standby switching method and system based on performance self-adaptive evaluation

Technical Field

The present application relates to the field of database management technologies, and in particular to a database active-standby switching method and system based on performance adaptive evaluation, a computer-readable storage medium, and an electronic device.

Background

The database is the core data carrier of an information system, and its high availability directly determines the continuity and stability of upper-layer services. With active-standby replication now the mainstream high-availability architecture, the industry commonly relies on a heartbeat detection mechanism to implement failover. Under this mechanism, the main database and the standby database periodically exchange heartbeat packets; if the standby database receives no heartbeat from the main database within a preset time, the main database is judged to be down and switching is triggered. However, this judgment mechanism, based on a binary alive-or-dead state, has inherent defects and struggles to meet the demands that modern business places on database service quality:

(1) The state judgment has a single dimension and cannot identify sub-health risks

The heartbeat mechanism can only judge whether the main database process is alive, which is a Boolean judgment. It fails completely in the 'hung' state, in which the process is alive but cannot provide effective service because of resource exhaustion (such as sustained CPU saturation), internal blocking (such as lock waits and accumulating slow queries), or network performance degradation (such as latency spikes and packet loss) on the main database.
As a result, the system keeps the connection to the main database even though the service has in effect been interrupted, which leaves the service unavailable for a long time.

(2) The switching trigger is passive and cannot prevent failures

The prior art follows a typical 'remedy after failure' strategy: the switching process starts only after an explicit failure event occurs, that is, after the main database is completely unavailable (heartbeat timeout). The system therefore cannot intervene in the early stages, while performance deteriorates progressively and service capacity keeps declining. It loses the critical time window for avoiding a service interruption and can hardly move from 'recovery after failure' to 'risk prevention'.

(3) The switching process is coarsely controlled, and the business impact is significant

When switching is triggered, the traditional scheme usually performs a hard switch and lacks fine-grained guarantees of data consistency and service continuity. In addition, the standby database takes over all traffic instantly, without preloading or warm-up, which easily causes performance jitter and leads to delays or renewed interruption of the service after switching.

In summary, the existing database active-standby switching technology based on heartbeat detection has clear shortcomings in the foresight of state awareness, the timeliness of the switching trigger, and the precision of process control. It can hardly meet the strict requirements for continuity, consistency, and stability of database services in demanding scenarios such as finance, telecom, and the internet.

Disclosure of Invention

To overcome the reliance of existing active-standby switching technology on heartbeat detection alone, the application provides a novel database active-standby switching method and system based on performance adaptive evaluation.
By constructing a weight model based on real-time statistical feedback, the application synthesizes the multidimensional performance indexes into a single, accurate service health score and uses it to drive proactive, graceful active-standby switching, thereby shifting the strategy from 'recovery after failure' to 'prevention before service degradation'.

Specifically, the application provides the following technical scheme:

The first aspect of the application provides a database active-standby switching method based on performance adaptive evaluation, which comprises the following steps:
S1, collecting multidimensional performance index data of a main database node and at least one standby database node;
S2, evaluating the multidimensional performance index data based on an adaptive weight model, and calculating service health scores of the main database node and each standby database node, wherein the adaptive weight model dynamically adjusts index weights according to the real-time state of each performance index;
S3, when the service health score of the main database node meets the switching trigger condition and the service health score is