CN-121984843-A - Master-slave equipment fault switching method and system
Abstract
The application discloses a failover method and system for main and standby devices. The method comprises: detecting performance state parameters of the main device and network state parameters between the main device and the standby device; determining a current heartbeat frequency strategy according to the network state parameters and the performance state parameters; when the main device is judged to be at risk of failure according to the current heartbeat frequency strategy, collecting corresponding running state data from the main device based on preset detection dimensions; performing cross verification on the running state data; and, if the main device is determined to be faulty according to the cross-verification result, executing a switching operation to switch service traffic from the main device to the standby device. Multi-dimensional verification avoids invalid or frequent switching caused by misjudgment and ensures that each switching operation is reliable and necessary, thereby improving the overall efficiency and accuracy of failover between the main and standby devices.
Inventors
- FENG WEI
- YUAN FENG
Assignees
- 深圳市丰润达科技有限公司
Dates
- Publication Date: 20260505
- Application Date: 20260407
Claims (10)
- 1. A main and standby device failover method, the method comprising: detecting a performance state parameter of a main device and a network state parameter between the main device and a standby device; determining a current heartbeat frequency strategy according to the network state parameter and the performance state parameter; when it is judged according to the current heartbeat frequency strategy that the main device is at risk of failure, collecting corresponding running state data from the main device based on a preset detection dimension, wherein the preset detection dimension comprises at least two of a network dimension, a hardware dimension and an application dimension; performing cross verification according to the running state data to obtain a cross-verification result; and if the main device is determined to be faulty according to the cross-verification result, executing a switching operation to switch service traffic from the main device to the standby device.
- 2. The method of claim 1, wherein the step of determining a current heartbeat frequency strategy according to the network state parameter and the performance state parameter comprises: determining, based on the threshold intervals in which the network state parameter and the performance state parameter fall, the target heartbeat frequency corresponding to those threshold intervals, wherein each target heartbeat frequency is preset with a corresponding threshold interval triggering condition and interval response requirement; and adjusting the current sending frequency of the heartbeat message to the target heartbeat frequency, and determining the current heartbeat frequency strategy according to the target heartbeat frequency, wherein the current heartbeat frequency strategy comprises a strategy of sending the heartbeat message at the target heartbeat frequency.
- 3. The method of claim 2, wherein the step of adjusting the current sending frequency of the heartbeat message to the target heartbeat frequency comprises: comparing the current sending frequency with the target heartbeat frequency; and when the current sending frequency is inconsistent with the target heartbeat frequency, adjusting the current sending frequency to the target heartbeat frequency in an intermediate frequency transition mode.
- 4. The method of claim 2, wherein the current heartbeat frequency strategy further comprises a strategy of detecting whether a heartbeat response of the main device meets the interval response requirement; and the step of collecting corresponding running state data from the main device based on a preset detection dimension, when it is judged according to the current heartbeat frequency strategy that the main device is at risk of failure, comprises: acquiring a current heartbeat response of the main device at the target heartbeat frequency, and judging whether the current heartbeat response meets the interval response requirement; and if not, collecting the running state data of the main device from at least two of the network dimension, the hardware dimension and the application dimension.
- 5. The method of claim 4, wherein the step of collecting the running state data of the main device comprises: performing connectivity detection, port availability detection and link quality detection on the main device to obtain network layer state data as the running state data corresponding to the network dimension; acquiring hardware performance data of the main device to obtain hardware performance state data as the running state data corresponding to the hardware dimension; or performing a simulated service request on the main device and analyzing the corresponding service traffic characteristics to obtain application layer running state data as the running state data corresponding to the application dimension.
- 6. The method of claim 1, wherein the step of performing cross verification according to the running state data to obtain a cross-verification result comprises: judging whether the running state data corresponding to each dimension contains a data abnormality; when a data abnormality exists in the running state data corresponding to at least two dimensions, judging that the main device is faulty; when a data abnormality exists in the running state data corresponding to only one dimension, judging that the main device shows abnormal fluctuation; and when no data abnormality exists in the running state data corresponding to any dimension, judging that the main device is not faulty.
- 7. The method of claim 1, further comprising, prior to the step of executing the switching operation to switch service traffic from the main device to the standby device: synchronizing basic configuration information and service state information of the main device to the standby device while the main device operates normally; wherein the basic configuration information is full information synchronized to the standby device when the main device is initialized, and the service state information is incremental information synchronized to the standby device according to the current heartbeat frequency strategy.
- 8. The method of claim 7, wherein the step of executing the switching operation to switch service traffic from the main device to the standby device comprises: sending an activation instruction to the standby device so that the standby device loads the basic configuration information and the service state information; updating a local routing table of the standby device to point the next hop corresponding to the virtual IP address to the standby device; and directing traffic destined for the main device to the standby device based on a traffic smooth-transition policy.
- 9. The method of claim 8, further comprising, prior to the step of sending the activation instruction to the standby device: sending a status confirmation request to the main device; if no response from the main device is received within a preset timeout, confirming that the main device is faulty; and activating a lock mechanism to prohibit the main device and the standby device from changing state during the switching operation.
- 10. A main and standby device failover system, the system comprising: a heartbeat detection module, configured to detect the performance state parameters of a main device and the network state parameters between the main device and a standby device; a state monitoring module, configured to collect corresponding running state data from the main device based on a preset detection dimension when it is judged according to the current heartbeat frequency strategy that the main device is at risk of failure, wherein the preset detection dimension comprises at least two of a network dimension, a hardware dimension and an application dimension; and a fault processing module, configured to execute a switching operation to switch service traffic from the main device to the standby device if the main device is determined to be faulty according to the cross-verification result.
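The heartbeat frequency selection of claims 2 and 3 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the application: the choice of latency and CPU load as the two state parameters, the tier values in `TIERS`, the idea that a more degraded state maps to a higher frequency, and the reading of the "intermediate frequency transition mode" as a bounded step-by-step change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeartbeatTier:
    """One threshold interval with its preset target frequency (hypothetical)."""
    max_latency_ms: float        # upper bound of the network-state interval (assumed metric)
    max_cpu_load: float          # upper bound of the performance-state interval (assumed metric)
    frequency_hz: float          # target heartbeat frequency for this interval
    response_deadline_ms: float  # interval response requirement


# Hypothetical tiers, ordered from healthiest to most degraded.
TIERS = [
    HeartbeatTier(20.0, 0.50, 1.0, 500.0),
    HeartbeatTier(100.0, 0.80, 2.0, 300.0),
    HeartbeatTier(float("inf"), float("inf"), 5.0, 150.0),
]


def select_tier(latency_ms, cpu_load, tiers=TIERS):
    """Return the first tier whose interval contains both parameters."""
    for tier in tiers:
        if latency_ms <= tier.max_latency_ms and cpu_load <= tier.max_cpu_load:
            return tier
    return tiers[-1]


def transition_steps(current_hz, target_hz, max_step_hz=1.0):
    """List the intermediate frequencies visited on the way to the target.

    Each step changes the frequency by at most max_step_hz, which is one
    possible reading of an intermediate frequency transition.
    """
    if max_step_hz <= 0:
        raise ValueError("max_step_hz must be positive")
    steps = []
    freq = current_hz
    while abs(target_hz - freq) > 1e-9:
        delta = target_hz - freq
        freq += max(-max_step_hz, min(max_step_hz, delta))
        steps.append(round(freq, 6))
    return steps


if __name__ == "__main__":
    tier = select_tier(latency_ms=50.0, cpu_load=0.3)
    print(tier.frequency_hz, transition_steps(1.0, tier.frequency_hz))
```

If the current frequency already equals the target, `transition_steps` returns an empty list, which matches the claim 3 condition that an adjustment happens only when the two frequencies are inconsistent.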
Description
Master-slave equipment fault switching method and system
Technical Field
The present application relates to the field of failover technologies, and in particular to a main and standby device failover method and system.
Background
With the rapid development of network services, and especially the growing number of application scenarios with strict real-time requirements, high availability of network devices has become a key foundation for keeping services running continuously. A main-standby redundancy architecture is a common way to improve system reliability in network devices: a main device and a standby device are deployed, and when the main device fails the standby device takes over its services, so that the impact of the device failure on services is kept as small as possible.
At present, the Virtual Router Redundancy Protocol (VRRP) and related schemes are commonly used for fault detection and switching between main and standby devices. This technology detects whether the peer is alive mainly by periodically exchanging heartbeat messages between the main device and the standby device. When the standby device does not receive a heartbeat response from the main device within a set time, it judges that the main device has failed and triggers the switching flow, in which the standby device takes over network resources such as the virtual IP and forwards service traffic.
However, because this mechanism sends heartbeat packets at a fixed time interval, faults of the main device cannot be found in time. In addition, the fault detection means between the main and standby devices is limited and relies mainly on simple connectivity detection, so faults are hard to identify promptly when the main device suffers an application-layer abnormality or a drop in service processing capacity, which further reduces the timeliness of failover.
Disclosure of Invention
The main purpose of the application is to provide a main and standby device failover method and system, aiming to solve the technical problem that existing main and standby device failover is not timely enough.
To achieve the above objective, the present application provides a main and standby device failover method, which includes: detecting a performance state parameter of a main device and a network state parameter between the main device and a standby device; determining a current heartbeat frequency strategy according to the network state parameter and the performance state parameter; when it is judged according to the current heartbeat frequency strategy that the main device is at risk of failure, collecting corresponding running state data from the main device based on a preset detection dimension, wherein the preset detection dimension comprises at least two of a network dimension, a hardware dimension and an application dimension; performing cross verification according to the running state data to obtain a cross-verification result; and if the main device is determined to be faulty according to the cross-verification result, executing a switching operation to switch service traffic from the main device to the standby device.
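The verification and switching steps of the method can be sketched in Python as follows. The decision rule is the one stated in claim 6 (abnormality in at least two dimensions means a fault, in exactly one means abnormal fluctuation, in none means no fault). The names `Verdict`, `cross_verify`, `handle_risk` and the `switch_to_standby` callback are hypothetical, and how each dimension decides that its data is abnormal is left outside the sketch.

```python
from enum import Enum


class Verdict(Enum):
    HEALTHY = "no fault"
    FLUCTUATION = "abnormal fluctuation"
    FAULT = "fault"


def cross_verify(abnormal_by_dimension):
    """Combine per-dimension abnormality flags into one verdict.

    abnormal_by_dimension maps a dimension name (for example "network",
    "hardware", "application") to True when that dimension's running
    state data is abnormal. At least two dimensions must be supplied.
    """
    if len(abnormal_by_dimension) < 2:
        raise ValueError("cross verification needs at least two dimensions")
    abnormal_count = sum(1 for flag in abnormal_by_dimension.values() if flag)
    if abnormal_count >= 2:
        return Verdict.FAULT
    if abnormal_count == 1:
        return Verdict.FLUCTUATION
    return Verdict.HEALTHY


def handle_risk(abnormal_by_dimension, switch_to_standby):
    """Run cross verification and switch only on a confirmed fault."""
    verdict = cross_verify(abnormal_by_dimension)
    if verdict is Verdict.FAULT:
        switch_to_standby()  # hypothetical callback that performs the switching operation
    return verdict


if __name__ == "__main__":
    flags = {"network": True, "hardware": False, "application": True}
    print(handle_risk(flags, lambda: print("switching traffic to standby")))
```

Requiring agreement between two dimensions before switching is what keeps a single noisy probe, such as one dropped ping, from triggering an unnecessary switch.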
The step of determining a current heartbeat frequency strategy according to the network state parameter and the performance state parameter includes: determining, based on the threshold intervals in which the network state parameter and the performance state parameter fall, the target heartbeat frequency corresponding to those threshold intervals, wherein each target heartbeat frequency is preset with a corresponding threshold interval triggering condition and interval response requirement; and adjusting the current sending frequency of the heartbeat message to the target heartbeat frequency, and determining the current heartbeat frequency strategy according to the target heartbeat frequency, wherein the current heartbeat frequency strategy comprises a strategy of sending the heartbeat message at the target heartbeat frequency.
In an embodiment, the step of adjusting the current sending frequency of the heartbeat message to the target heartbeat frequency includes: comparing the current sending frequency with the target heartbeat frequency; and when the current sending frequency is inconsistent with the target heartbeat frequency, adjusting the current sending frequency to the target heartbeat frequency in an intermediate frequency transition mode.
In an embodiment, the current heartbeat frequency strategy further includes a strategy of detecting whether a heartbeat response of the main device meets the interval response requirement; the step of collecting corresponding running state data in the main device based on a preset detection dimension when the main