CN-122027453-A - Switching method and device of fault node, storage medium and electronic equipment
Abstract
The application discloses a method and device for switching a fault node, a storage medium, and electronic equipment, and relates to the technical field of computers. The method comprises: when the sending state of a heartbeat signal matched with a target node at a first moment is detected to meet an abnormal state condition, determining current performance parameter sets respectively matched with at least two candidate nodes in a candidate node set; calculating a plurality of performance parameter values in each current performance parameter set to obtain a priority evaluation value of the corresponding candidate node; determining a candidate node that meets a priority condition, determined based on the priority evaluation value, as a target candidate node; and, at the target moment when the target candidate node finishes a data recovery operation, establishing a communication connection with the target candidate node as the new target node. This addresses the technical problem of inaccurate switching of fault nodes in the prior art.
Inventors
- FENG JIAJIA
- DENG CAN
Assignees
- Jinan Inspur Data Technology Co., Ltd. (济南浪潮数据技术有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20260130
Claims (10)
- 1. A method for switching a fault node, comprising: when the sending state of a heartbeat signal matched with a target node at a first moment is detected to meet an abnormal state condition, determining current performance parameter sets respectively matched with at least two candidate nodes in a candidate node set, wherein the abnormal state condition indicates that the target node is in an abnormal working state, each current performance parameter set indicates the current health condition of the corresponding candidate node, and the candidate nodes are used for synchronizing the data of transaction processing performed on the target node; calculating a plurality of performance parameter values in the current performance parameter set to obtain a priority evaluation score of the corresponding candidate node; determining, as a target candidate node, the candidate node that meets a priority condition determined based on the priority evaluation score, wherein the priority condition indicates that the health condition of the candidate node meets the requirements of the transaction processing; and, at the target moment when the target candidate node finishes a data recovery operation, establishing a communication connection with the target candidate node as the new target node.
- 2. The method of claim 1, wherein calculating the plurality of performance parameter values in the current performance parameter set to obtain the priority evaluation score of the corresponding candidate node comprises: determining an available resource evaluation score matched with the candidate node according to a plurality of performance parameter values in a first parameter subset of the performance parameter set, wherein the first parameter subset indicates the current resource usage of the candidate node; determining a load evaluation score matched with the candidate node according to a plurality of performance parameter values in a second parameter subset of the performance parameter set, wherein the second parameter subset indicates the current load condition of the candidate node; determining a communication evaluation score matched with the candidate node according to a plurality of performance parameter values in a third parameter subset of the performance parameter set, wherein the third parameter subset indicates the communication delay condition of the candidate node; determining a data synchronization evaluation score matched with the candidate node according to a plurality of performance parameter values in a fourth parameter subset of the performance parameter set, wherein the fourth parameter subset indicates the data synchronization delay condition of the candidate node; and performing a weighted summation of the available resource evaluation score, the load evaluation score, the communication evaluation score and the data synchronization evaluation score to obtain the priority evaluation score of the candidate node.
- 3. The method of claim 2, wherein determining the candidate node that meets the priority condition based on the priority evaluation score comprises at least one of: determining a change trend of the resource utilization rate corresponding to each candidate node, and determining that the candidate node meets the priority condition when the priority evaluation score is greater than a first threshold and the change trend of the resource utilization rate is a decreasing trend; determining a load change trend corresponding to each candidate node, and determining that the candidate node meets the priority condition when the priority evaluation score is greater than a second threshold and the load change trend meets a stable fluctuation condition; determining a historical average communication delay corresponding to each candidate node, and determining that the candidate node meets the priority condition when the priority evaluation score is greater than a third threshold and the historical average communication delay is smaller than a fourth delay threshold; and determining a data synchronization delay corresponding to each candidate node, and determining that the candidate node meets the priority condition when the priority evaluation score is greater than a second threshold and the data synchronization delay is smaller than a fifth delay threshold.
- 4. The method of claim 1, wherein, after establishing the communication connection with the target candidate node as the new target node at the target moment when the target candidate node finishes the data recovery operation, the method further comprises: determining the latest log sequence identifier of the full recovery data obtained by the target candidate node executing the data recovery operation; determining a first log sequence identifier corresponding to the target node at the first moment; executing a full data synchronization operation on the node that was the target node at the first moment when the difference between the latest log sequence identifier and the first log sequence identifier is greater than a target threshold; and executing an incremental data synchronization operation on the node that was the target node at the first moment when the difference between the latest log sequence identifier and the first log sequence identifier is less than or equal to the target threshold.
- 5. The method of claim 1, wherein determining that the sending state of the heartbeat signal matched with the target node meets the abnormal state condition comprises at least one of: receiving the heartbeat signal in response to a signal receiving instruction, sending a detection signal to the target node when reception of the heartbeat signal fails N consecutive times, and determining that the abnormal state condition is met when the target node does not respond to the detection signal within a target time period, wherein N is a natural number greater than 1; determining a plurality of monitoring nodes from the candidate node set, wherein the monitoring nodes receive the heartbeat signal in response to the signal receiving instruction, and determining that the abnormal state condition is met when a target number of the monitoring nodes fail to receive the heartbeat signal; and monitoring the sending frequency of the heartbeat signal sent by the target node under ideal load conditions, and determining that the abnormal state condition is met when the sending frequency is greater than a first frequency threshold or less than a second frequency threshold.
- 6. The method of claim 1, wherein, before detecting the sending state of the heartbeat signal matched with the target node at the first moment, the method further comprises: determining a target time for generating full transaction data and a target time interval for generating incremental transaction data; and applying, by the target node, for a storage lock service at the target time for generating the full transaction data and at a reference time for generating the incremental transaction data, wherein the reference time is determined according to the target time and the target time interval, and the storage lock service is used for completing the commit of running transactions and prohibiting the commit of new transactions.
- 7. The method of claim 1, wherein establishing the communication connection with the target candidate node as the new target node comprises: applying for a role lock service for the target candidate node, wherein the role lock service is used for prohibiting the candidate nodes other than the target candidate node in the candidate node set from being switched to the new target node; and re-establishing a communication connection with a client according to a target candidate address matched with the target candidate node, and adding the node that was the target node at the first moment to the candidate node set as a new candidate node.
- 8. A device for switching a fault node, comprising: a first determining unit, configured to determine current performance parameter sets respectively matched with at least two candidate nodes in a candidate node set when the sending state of a heartbeat signal matched with a target node at a first moment is detected to meet an abnormal state condition, wherein the abnormal state condition indicates that the target node is in an abnormal working state, each current performance parameter set indicates the current health condition of the corresponding candidate node, and the candidate nodes are used for synchronizing the data of transaction processing performed on the target node; a computing unit, configured to calculate a plurality of performance parameter values in the current performance parameter set to obtain a priority evaluation score of the corresponding candidate node; a second determining unit, configured to determine, as a target candidate node, the candidate node that meets a priority condition determined based on the priority evaluation score, wherein the priority condition indicates that the health condition of the candidate node meets the requirements of the transaction processing; and a switching unit, configured to establish a communication connection with the target candidate node as the new target node at the target moment when the target candidate node finishes a data recovery operation.
- 9. An electronic device, comprising: a memory for storing a computer program; and a processor for implementing the steps of the method for switching a fault node according to any one of claims 1 to 7 when executing the computer program.
- 10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the steps of the method for switching a fault node according to any one of claims 1 to 7.
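The weighted summation recited in claim 2, together with one of the alternative checks in claim 3 (score above a threshold and data synchronization delay below a delay threshold), can be illustrated with a minimal Python sketch. The names `SubScores`, `WEIGHTS`, `priority_score` and `meets_priority_condition`, the 0 to 100 score scale, and all numeric weights and thresholds are hypothetical; the claims do not specify any of them.

```python
from dataclasses import dataclass

# Hypothetical weights for the four sub-scores; the claims give no values.
WEIGHTS = {"resource": 0.3, "load": 0.3, "communication": 0.2, "sync": 0.2}


@dataclass
class SubScores:
    """Per-candidate sub-scores, assumed here to lie on a 0-100 scale."""
    resource: float        # available resource evaluation score
    load: float            # load evaluation score
    communication: float   # communication evaluation score
    sync: float            # data synchronization evaluation score


def priority_score(s: SubScores, weights: dict = WEIGHTS) -> float:
    """Weighted sum of the four sub-scores, as described in claim 2."""
    return (
        weights["resource"] * s.resource
        + weights["load"] * s.load
        + weights["communication"] * s.communication
        + weights["sync"] * s.sync
    )


def meets_priority_condition(
    score: float,
    score_threshold: float,
    sync_delay_ms: float,
    max_sync_delay_ms: float,
) -> bool:
    """One alternative from claim 3: the score must exceed a threshold and
    the data synchronization delay must stay below a delay threshold."""
    return score > score_threshold and sync_delay_ms < max_sync_delay_ms


if __name__ == "__main__":
    candidate = SubScores(resource=80, load=60, communication=90, sync=70)
    score = priority_score(candidate)
    print(score, meets_priority_condition(score, 60.0, 120.0, 500.0))
```

Keeping the weights in a single mapping makes it straightforward to tune how much each health dimension contributes without touching the scoring logic.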
Description
Switching method and device of fault node, storage medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular to a method and an apparatus for switching a fault node, a storage medium, and an electronic device.
Background
The high availability of a database system is directly related to business continuity and data security. To keep the database running stably in any situation, high availability solutions such as master-slave replication and NFS-based shared storage become particularly important. These schemes aim to avoid service interruption caused by single-node faults through data redundancy and fault detection mechanisms, and to improve the stability and reliability of database services. However, existing schemes generally rely on a single heartbeat-loss judgment and lack multi-dimensional health checks, so they are prone to misjudgment. They also elect the new master node with a fixed priority strategy that cannot be adjusted dynamically according to the real-time state of the nodes, so a node with insufficient performance may be selected as the new master, which affects subsequent operation of the system and limits improvement of the database's high availability. The prior art therefore has the technical problem that the switching of fault nodes is inaccurate.
Disclosure of Invention
The application provides a method and device for switching a fault node, a storage medium, and electronic equipment, aiming to at least solve the technical problem that the switching of fault nodes is inaccurate in the prior art.
The application provides a method for switching a fault node, which comprises: when the sending state of a heartbeat signal matched with a target node at a first moment is detected to meet an abnormal state condition, determining current performance parameter sets respectively matched with at least two candidate nodes in a candidate node set, wherein the abnormal state condition indicates that the target node is in an abnormal working state, each current performance parameter set indicates the current health condition of the corresponding candidate node, and the candidate nodes are used for synchronizing the data of transaction processing performed on the target node; calculating a plurality of performance parameter values in the current performance parameter set to obtain a priority evaluation score of the corresponding candidate node; determining, as a target candidate node, the candidate node that meets a priority condition determined based on the priority evaluation score, wherein the priority condition indicates that the health condition of the candidate node meets the requirements of the transaction processing; and, at the target moment when the target candidate node finishes a data recovery operation, establishing a communication connection with the target candidate node as the new target node.
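The detection and selection steps of this method can be sketched roughly in Python as follows. The detection rule used here is the first alternative of claim 5 (N consecutive missed heartbeats followed by an unanswered detection signal). Picking the highest-scoring eligible candidate is an assumption of this sketch, since the text only requires that the chosen node meet the priority condition. `HeartbeatMonitor`, `pick_target_candidate`, and the `probe` callback are hypothetical names, not part of the application.

```python
from typing import Callable, Dict, Optional


class HeartbeatMonitor:
    """Counts consecutive missed heartbeats, then probes the target node
    before declaring that the abnormal state condition is met."""

    def __init__(self, max_misses: int, probe: Callable[[], bool]):
        if max_misses <= 1:
            raise ValueError("N must be a natural number greater than 1")
        self.max_misses = max_misses
        self.probe = probe  # hypothetical: returns True if the node answers
        self.misses = 0

    def record(self, received: bool) -> bool:
        """Record one heartbeat interval; return True if the node is abnormal."""
        if received:
            self.misses = 0
            return False
        self.misses += 1
        if self.misses < self.max_misses:
            return False
        # N consecutive misses: send the detection signal as a second check.
        return not self.probe()


def pick_target_candidate(
    scores: Dict[str, float], threshold: float
) -> Optional[str]:
    """Return the highest-scoring candidate above the threshold, or None.

    Choosing the maximum is an assumption of this sketch."""
    eligible = {node: s for node, s in scores.items() if s > threshold}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


if __name__ == "__main__":
    monitor = HeartbeatMonitor(max_misses=3, probe=lambda: False)
    abnormal = [monitor.record(False) for _ in range(3)][-1]
    if abnormal:
        print(pick_target_candidate({"a": 70.0, "b": 85.0, "c": 40.0}, 60.0))
```

In a real deployment the selected node would still have to finish its data recovery operation before clients are reconnected to it; that step is outside the scope of this sketch.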
The application further provides a device for switching a fault node, which comprises a first determining unit, a calculating unit, a second determining unit and a switching unit. The first determining unit is used for determining current performance parameter sets respectively matched with at least two candidate nodes in a candidate node set when the sending state of a heartbeat signal matched with a target node at a first moment is detected to meet an abnormal state condition, wherein the abnormal state condition indicates that the target node is in an abnormal working state, each current performance parameter set indicates the current health condition of the corresponding candidate node, and the candidate nodes are used for synchronizing the data of transaction processing performed on the target node. The calculating unit is used for calculating a plurality of performance parameter values in the current performance parameter set to obtain a priority evaluation score of the corresponding candidate node. The second determining unit is used for determining, as a target candidate node, the candidate node that meets a priority condition determined based on the priority evaluation score, wherein the priority condition indicates that the health condition of the candidate node meets the requirements of the transaction processing. The switching unit is used for establishing a communication connection with the target candidate node as the new target node at the target moment when the target candidate node finishes a data recovery operation. The application also provides electronic equipment comprising a memory and a processor, wherein the memory is used for storing a computer program, and the processor is used for implementing the steps of any of the fault node switching methods when executing the computer program. The application also provides a computer-readable storage medium in which a computer program is stored, wh