EP-4250094-B1 - HITLESS UPGRADE OF A NETWORK DEVICE
Inventors
- MAHISHI, Shrish
- RAJAN, Ramesh
- PAUL, Vijay
- MAHAJAN, Sanjeev Anandrao
- JAIN, Atit
- SRINIVASAN, Pramod
Dates
- Publication Date: 2026-05-13
- Application Date: 2022-05-05
Claims (13)
- A network device (210), comprising: one or more memories; and one or more processors to: obtain a data package associated with an in-service software upgrade, ISSU, procedure; determine, based on the data package, that a control plane of the network device is to be rebooted to facilitate performance of the ISSU procedure; cause, based on determining that the control plane is to be rebooted, a plurality of applications of the network device to stop executing on the network device; cause, based on causing the plurality of applications to stop executing, a control plane state of the network device to be frozen, wherein causing the control plane state of the network device to be frozen comprises identifying data objects associated with the control plane state that are stored in one or more distributed data structures, and causing the data objects to be stored in a non-distributed data structure; cause, based on causing the control plane state of the network device to be frozen, the ISSU procedure to be performed; cause, based on causing the ISSU procedure to be performed, the control plane state of the network device to be restored, wherein causing the control plane state of the network device to be restored comprises identifying data objects associated with the control plane state that are stored in a non-distributed data structure, and causing the data objects to be stored in one or more distributed data structures; and cause, based on causing the control plane state of the network device to be restored, the plurality of applications to resume executing on the network device.
- The network device of claim 1, wherein the one or more processors, to obtain the data package, are to: receive, from a client device, a command indicating that the network device is to be updated via performance of the ISSU procedure; send, to a server device and based on the command, a request for the data package; and receive, based on sending the request, the data package.
- The network device of claim 1 or claim 2, wherein the one or more processors, to determine that the control plane of the network device is to be rebooted to facilitate performance of the ISSU procedure, are to: process the data package to identify a first set of one or more applications, of the plurality of applications, that are to be updated as a result of performance of the ISSU procedure; determine, based on the first set of one or more applications of the network device, a second set of one or more applications of the plurality of applications that are to be impacted by performance of the ISSU procedure; and determine, based on the second set of one or more applications, that the control plane of the network device is to be rebooted to facilitate performance of the ISSU procedure.
- The network device of any preceding claim, wherein the plurality of applications includes a first set of one or more applications and a second set of one or more applications, wherein the one or more processors, to cause the plurality of applications to stop executing on the network device, are to: cause the first set of one or more applications to stop executing; and cause, after the first set of one or more applications have stopped executing, the second set of one or more applications to stop executing, wherein an updating of the control plane state is to cease after the first set of one or more applications have stopped executing.
- The network device of claim 4, wherein the one or more processors, to cause the plurality of applications to resume executing on the network device, are to: cause the second set of one or more applications to resume executing; and cause, after the second set of one or more applications have resumed executing, the first set of one or more applications to resume executing, wherein the updating of the control plane state is to resume after the first set of one or more applications have resumed executing.
- The network device of any preceding claim, wherein the one or more processors, to cause the ISSU procedure to be performed, are to: cause, based on the data package, a control plane operating system and kernel of the network device to update the control plane operating system and kernel.
- The network device of any preceding claim, wherein a data plane of the network device continues to operate during a period of time that begins when the network device causes the plurality of applications to stop executing and that ends when the network device causes the plurality of applications to resume executing.
- A computer-readable medium comprising a set of instructions, the set of instructions comprising: one or more instructions that, when executed by one or more processors of a network device (210), cause the network device to: obtain a data package associated with an in-service software upgrade, ISSU, procedure; determine, based on the data package, that a control plane of the network device is to be rebooted to facilitate performance of the ISSU procedure; cause, based on determining that the control plane is to be rebooted, a plurality of applications of the network device to stop executing on the network device; cause, based on causing the plurality of applications to stop executing, a control plane state of the network device to be frozen, wherein causing the control plane state of the network device to be frozen comprises identifying data objects associated with the control plane state that are stored in one or more distributed data structures, and causing the data objects to be stored in a non-distributed data structure; cause, based on causing the control plane state of the network device to be frozen, the ISSU procedure to be performed; cause, based on causing the ISSU procedure to be performed, the control plane state of the network device to be restored, wherein causing the control plane state of the network device to be restored comprises identifying data objects associated with the control plane state that are stored in a non-distributed data structure, and causing the data objects to be stored in one or more distributed data structures; and cause, based on causing the control plane state of the network device to be restored, the plurality of applications to resume executing on the network device.
- The computer-readable medium of claim 8, wherein the one or more instructions, that cause the network device to determine that the control plane of the network device is to be rebooted as part of performance of the ISSU procedure, cause the network device to: determine that a control plane operating system and kernel of the network device is to be updated as a result of performance of the ISSU procedure; and determine, based on determining that the control plane operating system and kernel of the network device is to be updated, that the control plane of the network device is to be rebooted as part of performance of the ISSU procedure.
- The computer-readable medium of claim 8 or claim 9, wherein the one or more instructions, that cause the network device to cause the ISSU procedure to be performed, cause the network device to: send, to another network device that is connected to the network device, one or more messages indicating that the network device is to be unavailable for a particular period of time.
- The computer-readable medium of any of claims 8 to 10, wherein the one or more instructions, when executed by the one or more processors, further cause the network device to: cause, based on causing the plurality of applications to resume executing on the network device, a hardware state of the network device to be updated.
- The computer-readable medium of any of claims 8 to 11, wherein the one or more instructions, when executed by the one or more processors, further cause the network device to: determine that the network device is to not be rebooted as part of performance of another ISSU procedure at the network device; and cause, based on determining that the network device is to not be rebooted, the other ISSU procedure to be performed at the network device, wherein causing the other ISSU procedure to be performed does not cause the plurality of applications to stop executing on the network device and does not cause the control plane state of the network device to be frozen.
- A method performed by a network device (210), the method comprising: obtaining a data package associated with an in-service software upgrade, ISSU, procedure; determining, based on the data package, that a control plane of the network device is to be rebooted to facilitate performance of the ISSU procedure; causing, based on determining that the control plane is to be rebooted, a plurality of applications of the network device to stop executing on the network device; causing, based on causing the plurality of applications to stop executing, a control plane state of the network device to be frozen, wherein causing the control plane state of the network device to be frozen comprises identifying data objects associated with the control plane state that are stored in one or more distributed data structures, and causing the data objects to be stored in a non-distributed data structure; causing, based on causing the control plane state of the network device to be frozen, the ISSU procedure to be performed; causing, based on causing the ISSU procedure to be performed, the control plane state of the network device to be restored, wherein causing the control plane state of the network device to be restored comprises identifying data objects associated with the control plane state that are stored in a non-distributed data structure, and causing the data objects to be stored in one or more distributed data structures; and causing, based on causing the control plane state of the network device to be restored, the plurality of applications to resume executing on the network device.
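The ordering recited in the independent claims, together with the staged stop and resume of claims 4 and 5, can be illustrated with a minimal sketch. Everything named here (`Device`, `freeze`, `restore`, `hitless_upgrade`, the dictionary layout of the stores) is hypothetical and chosen only for illustration; the claims do not prescribe any of it. The sketch also assumes that the distributed data structures do not survive the control plane reboot, which is modeled by clearing them after the upgrade callable returns.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical model of the state referred to in the claims."""
    first_set: list    # applications whose stopping ends control plane state updates
    second_set: list   # remaining applications
    distributed: dict = field(default_factory=dict)  # store name -> {key: object}
    snapshot: dict = field(default_factory=dict)     # single non-distributed structure
    running: set = field(default_factory=set)
    log: list = field(default_factory=list)


def freeze(dev):
    """Copy every data object from the distributed stores into one flat snapshot."""
    dev.snapshot = {
        (store, key): obj
        for store, objects in dev.distributed.items()
        for key, obj in objects.items()
    }
    dev.log.append("freeze")


def restore(dev):
    """Write every snapshotted object back into its distributed store."""
    for (store, key), obj in dev.snapshot.items():
        dev.distributed.setdefault(store, {})[key] = obj
    dev.snapshot = {}
    dev.log.append("restore")


def hitless_upgrade(dev, perform_issu):
    # Stop the first set, then the second set (ordering of claim 4).
    for app in dev.first_set + dev.second_set:
        dev.running.discard(app)
        dev.log.append(f"stop:{app}")

    freeze(dev)

    perform_issu()         # stands in for the upgrade and control plane reboot
    dev.distributed = {}   # assumption: distributed stores are lost on reboot
    dev.log.append("issu")

    restore(dev)

    # Resume the second set, then the first set (ordering of claim 5).
    for app in dev.second_set + dev.first_set:
        dev.running.add(app)
        dev.log.append(f"resume:{app}")
```

Calling `hitless_upgrade(dev, some_callable)` on a `Device` whose `distributed` field holds route objects leaves those objects back in `distributed` afterwards, and `dev.log` records the order in which the steps ran. The data plane does not appear in the sketch because, per claim 7, it keeps operating throughout and is not touched by any of these steps.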
Description
BACKGROUND
An in-service software upgrade (ISSU) procedure is a technique for updating software on a network device without taking the network device offline. In this way, the network device can be updated while minimizing disruption to traffic and control plane services provided by the network device. In this context, US 2016/313986 A1 discloses the live-update of a virtual machine monitor executing virtual machines in a control plane.
SUMMARY
The invention is set out in the appended set of claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Figs. 1A-1G are diagrams of an example implementation described herein.
Fig. 2 is a diagram of an example environment in which systems and/or methods described herein may be implemented.
Figs. 3-4 are diagrams of example components of one or more devices of Fig. 2.
Fig. 5 is a flowchart of example processes relating to hitless upgrade of a network device.
DETAILED DESCRIPTION
The following detailed description of example implementations refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements.
A network device includes a control plane (e.g., to manage operation of the network device) and a data plane (e.g., to route packets, or other data, to and from the network device). The network device may execute a plurality of applications to enable one or more functionalities of the control plane, such as one or more applications to generate and maintain routes (e.g., to and from other network devices), one or more applications to manage the network device (e.g., related to power management, thermal management, hardware management, and/or other management of the network device), and/or one or more applications to manage packet forwarding by the network device, among other examples. Typically, a network device includes multiple routing engine modules, such as a primary routing engine module and a backup routing engine module.
When the network device is to be upgraded, the network device may perform an in-service software upgrade (ISSU) procedure to update the network device. Accordingly, the routing engine modules are updated one at a time, with one routing engine module in operation while the other routing engine module is updated. In this way, the control plane of the network device can be updated without taking the network device offline. However, in many cases, a network device does not have multiple routing engine modules, so performing an ISSU procedure causes the network device to be taken offline. This impacts an operational performance of the network device and causes networking issues (e.g., misrouting issues, blackholing issues, or other issues) associated with the impacted operational performance of the network device.
Some implementations described herein are directed to providing a "hitless upgrade" of a network device (e.g., the network device is able to continue routing packets via a data plane of the network device while a control plane of the network device undergoes an upgrade). The network device obtains a data package associated with an ISSU procedure (e.g., to update the control plane of the network device) and determines whether the control plane of the network device is to be rebooted to facilitate performance of the ISSU procedure. For example, the network device determines that the control plane is to be rebooted when an application, of a plurality of applications of the network device (e.g., an application that is able to modify the control plane state of the network device), is impacted by performance of the ISSU procedure and/or when a control plane operating system and kernel of the network device are to be updated by performance of the ISSU procedure.
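The reboot determination described above, and in claims 3 and 9, can be sketched as follows. This is an illustration under stated assumptions, not the patented method: the package is modeled as a plain dictionary with hypothetical keys `updated_apps` and `updates_os_and_kernel`, and the set of impacted applications is derived from a hypothetical dependency map `depends_on`. The description does not say how impacted applications are identified, so the transitive-dependency rule is an assumption.

```python
def impacted_applications(updated, depends_on):
    """Return the updated applications plus every application that depends,
    directly or transitively, on one of them (assumed rule, see lead-in)."""
    impacted = set(updated)
    changed = True
    while changed:
        changed = False
        for app, deps in depends_on.items():
            if app not in impacted and impacted & set(deps):
                impacted.add(app)
                changed = True
    return impacted


def control_plane_reboot_needed(package, depends_on, state_writers):
    """Decide whether the control plane must be rebooted for this package.

    package       -- dict with hypothetical keys 'updated_apps' and
                     'updates_os_and_kernel'
    depends_on    -- hypothetical map: application -> applications it depends on
    state_writers -- applications able to modify the control plane state
    """
    # An operating system and kernel update always requires a reboot.
    if package.get("updates_os_and_kernel", False):
        return True
    # Otherwise reboot only if an impacted application can modify the state.
    impacted = impacted_applications(package.get("updated_apps", ()), depends_on)
    return bool(impacted & set(state_writers))
```

When the function returns `False`, the situation corresponds to claim 12: the upgrade can proceed without stopping the applications or freezing the control plane state.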
Accordingly, the network device causes the plurality of applications to stop executing on the network device and causes the control plane state of the network device to be frozen (e.g., by saving data objects associated with execution of the plurality of applications in a data structure). The network device then causes the ISSU procedure to be performed (e.g., while the plurality of applications are stopped and while the control plane state is frozen). After performance of the ISSU procedure, the network device causes the control plane state to be restored (e.g., by restoring the data objects to distributed data structures of the network device) and causes the plurality of applications to resume executing.
In this way, the network device continues to route traffic (e.g., via the data plane of the network device) while the network device performs processing steps associated with performing the ISSU procedure. Notably, by stopping execution of the plurality of applications and freezing the control plane state, the network device is able to quickly recover a control plane functionality (after performance of the ISSU procedure) that is similar to a control plane functionality of the network device prior to performance of the ISSU procedure. In this way, the contro