US-12621203-B2 - Network address assignment failover
Abstract
In some examples, a first controller of a cluster of controllers provides a first controller network address assignment service for a first collection of client devices. The first controller detects a communication interruption with a second controller of the cluster of controllers, and determines whether a second controller network address assignment service of the second controller is unavailable, based on a determination of whether a network address assignment was performed within a specified recent time interval at the second controller. Based on determining that the second controller network address assignment service of the second controller is unavailable, the first controller transitions to a partner down state as part of a network address assignment failover in which the first controller provides the first controller network address assignment service for the first collection of client devices and for a second collection of client devices associated with the second controller.
Inventors
- Sanjay Kaniyoor Surendra Hegde
- ISAAC THEOGARAJ
- Ashutosh Nasikkar
Assignees
- HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Dates
- Publication Date
- 20260505
- Application Date
- 20240626
- Priority Date
- 20240222
Claims (20)
- 1 . A first controller of a cluster of controllers, the first controller comprising: a hardware processor; and a non-transitory storage medium storing instructions executable on the hardware processor to: provide a first controller network address assignment service for client devices of a first collection of client devices; detect a communication interruption with a second controller of the cluster of controllers in which communication over a first communication path between the first controller and the second controller is interrupted; determine whether a second controller network address assignment service of the second controller is unavailable, based on a determination of whether a network address assignment was performed within a specified recent time interval at the second controller; and based on determining that the second controller network address assignment service of the second controller is unavailable, transition the first controller to a partner down state as part of a network address assignment failover in which the first controller provides the first controller network address assignment service for the client devices of the first collection of client devices and for client devices of a second collection of client devices associated with the second controller.
- 2 . The first controller of claim 1 , wherein the first controller network address assignment service and the second controller network address assignment service are Dynamic Host Configuration Protocol (DHCP) services.
- 3 . The first controller of claim 1 , wherein the determination of whether the network address assignment was performed within the specified recent time interval is based on a check of a network address database.
- 4 . The first controller of claim 3 , wherein the network address database comprises a first network address database used by the second controller, and the instructions are executable on the hardware processor to: use a second network address database at the first controller, the second network address database being synchronized to the first network address database.
- 5 . The first controller of claim 4 , wherein the first network address database and the second network address database become unsynchronized responsive to the first controller providing the first controller network address assignment service for the client devices of the second collection of client devices, and wherein the instructions are executable on the hardware processor to: detect that a communication is re-established between the first controller and the second controller; and based on detecting that the communication is re-established between the first controller and the second controller, initiate synchronization of the first network address database and the second network address database.
- 6 . The first controller of claim 4 , wherein the first network address database and the second network address database are Internet Protocol (IP) address lease databases.
- 7 . The first controller of claim 1 , wherein the first controller is to perform the first controller network address assignment service for the first collection of client devices that are associated with a first subset of Media Access Control (MAC) addresses, and the second controller is to perform the second controller network address assignment service for the second collection of client devices that are associated with a second subset of MAC addresses.
- 8 . The first controller of claim 1 , wherein the instructions are executable on the hardware processor to: detect the communication interruption with the second controller based on detecting a failure to receive a first heartbeat indication over the first communication path.
- 9 . The first controller of claim 8 , wherein the failure to receive the first heartbeat indication over the first communication path comprises a failure to receive the first heartbeat indication in an Internet Protocol Security (IPsec) tunnel of the first communication path.
- 10 . The first controller of claim 9 , wherein the instructions are executable on the hardware processor to: detect the communication interruption with the second controller further based on detecting a failure to receive a second heartbeat indication over a second communication path that is different from the first communication path.
- 11 . The first controller of claim 1 , wherein the instructions are executable on the hardware processor to: responsive to the detecting of the communication interruption with the second controller, cause sending, from the first controller to the second controller, a health query over a second communication path between the first controller and the second controller, wherein the second communication path is different from the first communication path; and receive, at the first controller over the second communication path in response to the health query, an indication that the second controller network address assignment service of the second controller is unavailable, wherein the determining that the second controller network address assignment service of the second controller is unavailable is based on receiving the indication, and wherein the indication is based on the second controller determining that the network address assignment was not performed within the specified recent time interval.
- 12 . The first controller of claim 11 , wherein the determining of whether the network address assignment was performed within the specified recent time interval is relative to a timestamp of the health query.
- 13 . The first controller of claim 1 , wherein the instructions are executable on the hardware processor to: based on determining that the second controller network address assignment service of the second controller is unavailable, cause sending, from the first controller, a shutdown indication to the second controller to request that the second controller enter a shutdown state.
- 14 . The first controller of claim 1 , wherein the instructions are executable on the hardware processor to: detect a further communication interruption with the second controller; responsive to the detecting of the further communication interruption with the second controller, cause sending, from the first controller, a health query over a second communication path to the second controller; and based on detecting that the second controller has not responded to the health query after a threshold number of retries, determine that the second controller network address assignment service of the second controller is unavailable.
- 15 . The first controller of claim 1 , wherein the determination of whether the network address assignment is performed within the specified recent time interval is performed by a management system separate from the first controller and the second controller, and wherein instructions are executable on the hardware processor to: receive, at the first controller, an indication that the management system has caused the second controller to shut down based on the management system determining that the second controller has not performed the network address assignment within the specified recent time interval.
- 16 . The first controller of claim 1 , wherein the instructions are executable on the hardware processor to: detect, at the first controller, greater than a threshold number of client requests for network address assignments from a same client device or collection of client devices within a specified time interval, wherein the client requests from the same client device or collection of client devices should have been serviced by the second controller; and based on the detection of greater than the threshold number of client requests from the same client device or collection of client devices should have been serviced by the second controller, indicate a communication interruption between the first controller and the second controller.
- 17 . A system comprising: a controller cluster comprising a first controller to provide a first controller network address assignment service and a second controller to provide a second controller network address assignment service, wherein the first controller has an inter-controller communication path to the second controller, the first controller to: detect a communication interruption over the inter-controller communication path with the second controller; determine whether the second controller network address assignment service of the second controller is unavailable, based on a determination of whether a network address assignment was performed within a specified recent time interval at the second controller; and based on determining that the second controller network address assignment service of the second controller is unavailable, transition the first controller to a partner down state as part of a network address assignment failover in which the first controller provides the first controller network address assignment service for client devices of a first collection of client devices associated with the first controller and for client devices of a second collection of client devices associated with the second controller.
- 18 . The system of claim 17 , wherein the first controller is to: responsive to transitioning to the partner down state, send, from the first controller, a partner down indication to the second controller, wherein the second controller is to: responsive to the partner down indication from the first controller, store an indication that the second controller when transitioning from an unavailable state is to enter a recovery state; and in the recovery state, synchronize a database of assigned network addresses associated with the second controller with a database of assigned network addresses associated with the first controller.
- 19 . A method comprising: providing, by a first controller of a controller cluster, a first Dynamic Host Configuration Protocol (DHCP) service to a first collection of client devices having properties that map to the first controller, wherein the first DHCP service uses a first lease database; providing, by a second controller of the controller cluster, a second DHCP service to a second collection of client devices having properties that map to the second controller, wherein the first DHCP service uses a second lease database; synchronizing the first lease database and the second lease database over an inter-controller communication path between the first and second controllers; detecting, by the first controller, a communication interruption over the inter-controller communication path with the second controller; determining, by the first controller, whether the second DHCP service of the second controller is unavailable, based on a determination of whether a DHCP lease activity occurred within a specified recent time interval at the second controller; and based on determining that the second DHCP service of the second controller is unavailable, transitioning the first controller to a partner down state as part of a DHCP service failover in which the first controller provides the first DHCP service for the first collection of client devices and for the second collection of client devices.
- 20 . The method of claim 19 , wherein the determination of whether the DHCP lease activity occurred within the specified recent time interval at the second controller is based on information communicated over a second communication path through a computing environment comprising a service to be accessed by client devices of the first collection of client devices and the second collection of client devices.
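The decision recited in claim 1 (and the lease-database check of claim 3) can be illustrated with a minimal sketch. It is not the claimed implementation. The names `State`, `Controller`, `partner_unavailable`, and `handle_interruption`, the `COMMUNICATION_INTERRUPTED` state, and the 60-second default interval are all hypothetical; the claims only refer to "a specified recent time interval" and a "partner down state".

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    # Hypothetical intermediate state; not named in the claims.
    COMMUNICATION_INTERRUPTED = "communication-interrupted"
    PARTNER_DOWN = "partner-down"


@dataclass
class Controller:
    name: str
    state: State = State.NORMAL
    # Collections of client devices this controller currently serves.
    served: set = field(default_factory=lambda: {"first"})


def partner_unavailable(last_assignment_ts, now, recent_interval):
    """Return True if the partner made no address assignment recently.

    last_assignment_ts is the time of the partner's most recent assignment
    (for example, read from a lease database), or None if none is known.
    """
    if last_assignment_ts is None:
        return True
    return (now - last_assignment_ts) > recent_interval


def handle_interruption(ctrl, partner_last_assignment_ts, now,
                        recent_interval=60.0):
    """Choose the first controller's state after a communication interruption."""
    if partner_unavailable(partner_last_assignment_ts, now, recent_interval):
        # Failover: serve the partner's collection as well as our own.
        ctrl.state = State.PARTNER_DOWN
        ctrl.served = {"first", "second"}
    else:
        # The partner still appears to assign addresses, so only the
        # communication path is presumed faulty; keep serving our own clients.
        ctrl.state = State.COMMUNICATION_INTERRUPTED
    return ctrl.state
```

In this sketch, a partner whose last assignment is older than the interval is treated as unavailable, while a recently active partner leaves the first controller serving only its own collection.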
Description
BACKGROUND
Computing environments such as data centers, cloud environments, or other types of computing environments can provide services for client devices. The client devices are able to access the computing environments over a network.
BRIEF DESCRIPTION OF THE DRAWINGS
Some implementations of the present disclosure are described with respect to the following figures.
FIG. 1 is a block diagram of a network arrangement that includes a controller cluster, a computing environment, a management system, and client devices, according to some examples.
FIG. 2A and FIG. 2B depict a flow diagram of a process for handling a communication interruption between the controllers of a controller cluster, in accordance with some examples.
FIG. 3 is a flow diagram of a process for handling a communication interruption between the controllers of a controller cluster, in accordance with further examples.
FIG. 4 is a block diagram of a controller according to some examples.
FIG. 5 is a block diagram of a system according to some examples.
FIG. 6 is a flow diagram of a process according to some examples.
Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements. The figures are not necessarily to scale, and the size of some parts may be exaggerated to more clearly illustrate the example shown. Moreover, the drawings provide examples and/or implementations consistent with the description; however, the description is not limited to the examples and/or implementations provided in the drawings.
DETAILED DESCRIPTION
Client devices can communicate data with a computing environment through network devices of a network arrangement. Examples of network devices include switches, wireless access points, gateways, concentrators, or other network devices. In some examples, a network arrangement can include a controller that has a Dynamic Host Configuration Protocol (DHCP) server for assigning Internet Protocol (IP) addresses to client devices.
For example, the DHCP server may receive a DHCP message from a client device, where the DHCP message contains a Media Access Control (MAC) address of the client device. Based on the MAC address, the DHCP server sends, to the client device, a response DHCP message containing the IP address correlated to the MAC address.
Multiple controllers with respective DHCP servers may be included in a controller cluster for redundancy to protect against faults. For example, a two-node controller cluster may include two controllers with their respective DHCP servers. The DHCP servers of the controller cluster can include a primary DHCP server and a secondary DHCP server. Both the primary and secondary DHCP servers can actively provide DHCP service, with the primary DHCP server serving a first collection of client devices, and the secondary DHCP server serving a second collection of client devices.
The multiple DHCP servers in the controller cluster form a DHCP-High Availability (DHCP-HA) arrangement. If a first DHCP server (serving the first collection of client devices) of the DHCP-HA arrangement were to fail, then a second DHCP server of the DHCP-HA arrangement can take over providing DHCP service to client devices of both the first collection and the second collection.
In some cases, communications between the primary and secondary DHCP servers may be interrupted. The communication interruption between the primary and secondary DHCP servers may be due to a fault of a communication path between the primary and secondary DHCP servers. However, even though communications between the primary and secondary DHCP servers are interrupted, the primary and secondary DHCP servers may continue to be active (i.e., the primary and secondary DHCP servers are able to continue providing DHCP service).
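The division of client devices between the primary and secondary DHCP servers, and the takeover when one server fails, can be sketched as follows. This is only an illustration under stated assumptions: the text above does not say how client devices are divided, so the hash-based split in `owner_of` is hypothetical, as are the function names and the `partner_down` flag.

```python
import hashlib


def owner_of(mac: str) -> str:
    """Map a MAC address to "primary" or "secondary".

    Hypothetical split: the parity of the first byte of a SHA-256 digest
    of the normalized MAC address decides which server owns the client.
    """
    normalized = mac.lower().replace("-", ":")
    digest = hashlib.sha256(normalized.encode("ascii")).digest()
    return "primary" if digest[0] % 2 == 0 else "secondary"


def should_answer(server: str, mac: str, partner_down: bool) -> bool:
    """Decide whether `server` responds to a DHCP message from `mac`.

    In normal operation a server answers only the clients it owns. Once it
    has concluded that its partner is down, it answers every client.
    """
    return partner_down or owner_of(mac) == server
```

With both servers active, exactly one of them answers any given client device. A server that treats its partner as down answers client devices of both collections.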
In other cases, it is also possible that the communication interruption is caused by a fault of one of the primary and secondary DHCP servers; in this latter case, the faulty DHCP server may no longer be able to provide DHCP service.
In response to a detection of a communication interruption between the primary and secondary DHCP servers, a first DHCP server (referred to as “DHCP server A”) in the DHCP-HA arrangement may not be able to make a determination of whether the communication interruption is caused by a communication path fault or a fault of the partner DHCP server (referred to as “DHCP server B”) in the DHCP-HA arrangement.
In some cases, DHCP server A may simply continue to serve its collection of client devices (“collection A”) and not serve the other collection of client devices (“collection B”) associated with DHCP server B even though the communication interruption occurred. In such cases, if the communication interruption is caused by a faulty DHCP server B that is no longer able to provide DHCP service, then the client devices of collection B associated with DHCP server B would not be able to obtain IP address assignments (referred to as “leases”) from DHCP se