US-12619497-B2 - Failover handling for pods executing an application in a high availability mode


Abstract

The technology disclosed herein enables a service manager of a container orchestration platform to handle failovers of pods executing an application in a high availability mode. In a particular example, a method includes receiving pod information including unique application identifiers generated by the application and indications of which of the pods are active and which are on standby. The method further includes configuring service objects provided by the container orchestration platform to each correspond to respective ones of the pods based on the unique application identifiers. The method also includes receiving updated pod information indicating that a first pod of the pods, which was on standby, is now active with a first application identifier of the unique application identifiers previously assigned to a second pod that failed. Additionally, the method includes reconfiguring a service object associated with the first application identifier to correspond to the first pod instead of the second pod.

Inventors

Rohit Juneja
Pardhiva Janardhana Krishna Munnaluru

Assignees

ORACLE INTERNATIONAL CORPORATION

Dates

Publication Date: 20260505
Application Date: 20230714

Claims (16)

1 . A method for a service manager to handle failover for pods executing an application in a high availability mode, the method comprising: receiving pod information about the pods, wherein the pod information includes unique pod names indicated in pod labels assigned to the pods, unique application identifiers generated by the application for the pods, and indications of which of the pods are active and which of the pods are on standby, and wherein the pods are orchestrated by a container orchestration platform; configuring service objects provided by the container orchestration platform to each correspond to respective ones of the pods based on the unique application identifiers by labelling the service objects with service labels that include the unique application identifiers of the respective ones of the pods, wherein the service objects expose corresponding ones of the pods to a network; receiving updated pod information indicating a first pod of the pods, which was previously on standby, is now active with a first application identifier of the unique application identifiers previously assigned to a second pod of the pods that failed; and reconfiguring a service object of the service objects associated with the first application identifier to select a first label of the pod labels assigned to the first pod instead of a second label of the pod labels assigned to the second pod.
2 . The method of claim 1 , comprising: receiving second updated pod information, wherein the second updated pod information indicates the second pod is now on standby with a second application identifier of the unique application identifiers previously assigned to the first pod; and reconfiguring a second service object of the service objects associated with the second application identifier to correspond to the second pod instead of the first pod.
3 . The method of claim 1 , wherein the pods transmit the pod information.
4 . The method of claim 1 , wherein a controller pod for the application manages the application and wherein the controller pod transmits the pod information after receiving the pod information from the pods.
5 . The method of claim 4 , wherein the controller pod generates the unique application identifiers on behalf of the application.
6 . The method of claim 1 , wherein the pod information includes a service name from the application and wherein configuring the service objects comprises: naming the service objects with names generated based on the pod information, wherein each of the names indicate the service name, an application identifier of a corresponding pod, and whether the corresponding pod is active or on standby.
7 . The method of claim 1 , wherein the service manager is implemented in another pod of the container orchestration platform.
8 . An apparatus to handle failover for pods executing an application in a high availability mode, the apparatus comprising: one or more computer readable storage media; a processing system operatively coupled with the one or more computer readable storage media; and program instructions stored on the one or more computer readable storage media that, when read and executed by the processing system, direct the apparatus to: receive pod information about the pods, wherein the pod information includes unique pod names indicated in pod labels assigned to the pods, unique application identifiers generated by the application for the pods and indications of which of the pods are active and which of the pods are on standby, and wherein the pods are orchestrated by a container orchestration platform; configure service objects provided by the container orchestration platform to each correspond to respective ones of the pods based on the unique application identifiers by labelling the service objects with service labels that include the unique application identifiers of the respective ones of the pods, wherein the service objects expose corresponding ones of the pods to a network; receive updated pod information indicating a first pod of the pods, which was previously on standby, is now active with a first application identifier of the unique application identifiers previously assigned to a second pod of the pods that failed; and reconfigure a service object of the service objects associated with the first application identifier to select a first label of the pod labels assigned to the first pod instead of a second label of the pod labels assigned to the second pod.
9 . The apparatus of claim 8 , wherein the program instructions direct the processing system to: receive second updated pod information, wherein the second updated pod information indicates the second pod is now on standby with a second application identifier of the unique application identifiers previously assigned to the first pod; and reconfigure a second service object of the service objects associated with the second application identifier to correspond to the second pod instead of the first pod.
10 . The apparatus of claim 8 , wherein the pods transmit the pod information.
11 . The apparatus of claim 8 , wherein a controller pod for the application manages the application and wherein the controller pod transmits the pod information after receiving the pod information from the pods.
12 . The apparatus of claim 11 , wherein the controller pod generates the unique application identifiers on behalf of the application.
13 . The apparatus of claim 8 , wherein the pod information includes a service name from the application and wherein to configure the service objects, the program instructions direct the processing system to: name the service objects with names generated based on the pod information, wherein each of the names indicate the service name, an application identifier of a corresponding pod, and whether the corresponding pod is active or on standby.
14 . The apparatus of claim 8 , wherein the program instructions are executed via another pod of the container orchestration platform executing on the apparatus.
15 . One or more computer-readable storage media having program instructions stored thereon for handling failover for pods executing an application in a high availability mode, the program instructions, when read and executed by a processing system, direct the processing system to: receive pod information about the pods, wherein the pod information includes unique pod names indicated in pod labels assigned to the pods, unique application identifiers generated by the application for the pods and indications of which of the pods are active and which of the pods are on standby, and wherein the pods are orchestrated by a container orchestration platform; configure service objects provided by the container orchestration platform to each correspond to respective ones of the pods based on the unique application identifiers by labelling the service objects with service labels that include the unique application identifiers of the respective ones of the pods, wherein the service objects expose corresponding ones of the pods to a network; receive updated pod information indicating a first pod of the pods, which was previously on standby, is now active with a first application identifier of the unique application identifiers previously assigned to a second pod of the pods that failed; and reconfigure a service object of the service objects associated with the first application identifier to select a first label of the pod labels assigned to the first pod instead of a second label of the pod labels assigned to the second pod.
16 . The one or more computer-readable storage media of claim 15 , wherein the program instructions direct the processing system to: receive second updated pod information, wherein the second updated pod information indicates the second pod is now on standby with a second application identifier of the unique application identifiers previously assigned to the first pod; and reconfigure a second service object of the service objects associated with the second application identifier to correspond to the second pod instead of the first pod.

Description

BACKGROUND

Container orchestration platforms, such as Kubernetes®, manage deployment of containerized applications in what are commonly referred to as pods. Each pod may include one or more containers for executing application processes therein. Kubernetes is at least one platform that does not natively include features that support high availability for an application executing in pods. For instance, an application that supports high availability may designate a set of active pods and a set of standby pods. If one of the active pods fails, then the application will designate one of the standby pods to take over for the failed pod. However, since Kubernetes does not support the application's failover capabilities, a service object that avails one or more of the pods to a network will still point to the failed pod once that pod is respawned within the standby set as directed by the application. The service will not point to the pod that the application activated to take over for the failed pod.

SUMMARY

The technology disclosed herein enables a service manager of a container orchestration platform to handle failovers of pods executing an application in a high availability mode. In a particular example, a method includes receiving pod information about the pods. The pod information includes unique application identifiers generated by the application for the pods and indications of which of the pods are active and which of the pods are on standby. The pods are orchestrated by the container orchestration platform. The method further includes configuring service objects provided by the container orchestration platform to each correspond to respective ones of the pods based on the unique application identifiers. The service objects expose corresponding ones of the pods to a network.
The method also includes receiving updated pod information indicating a first pod of the pods, which was previously on standby, is now active with a first application identifier of the unique application identifiers previously assigned to a second pod of the pods that failed. Additionally, the method includes reconfiguring a service object of the service objects associated with the first application identifier to correspond to the first pod instead of the second pod.

In other examples, an apparatus performs the above-recited method and program instructions stored on computer readable storage media direct a processing system to perform the above-recited method.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an implementation for handling failover of pods executing an application in a high availability mode.
FIG. 2 illustrates an operation to handle failover of pods executing an application in a high availability mode.
FIG. 3A illustrates an operational scenario for handling failover of pods executing an application in a high availability mode.
FIG. 3B illustrates an operational scenario for handling failover of pods executing an application in a high availability mode.
FIG. 3C illustrates an operational scenario for handling failover of pods executing an application in a high availability mode.
FIG. 4 illustrates a computing system for handling failover of pods executing an application in a high availability mode.

DETAILED DESCRIPTION

Container orchestration platforms that organize containers into pods may use service objects (often simply referred to as services) to avail those pods to a communication network. When data on a network is intended for an application executing in a pod, a service object implemented by the container orchestration platform and corresponding to the pod directs the data to the pod.
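As an illustration only, the relationship between service objects and pods, and the reconfiguration recited in the summary, can be sketched in plain Python. This is a toy in-memory model under stated assumptions, not the Kubernetes API and not the disclosed implementation: the class names (Pod, Service, ServiceManager), the label keys ("pod-name", "app-id"), and the pod and identifier values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """One entry of the pod information (field names are hypothetical)."""
    name: str     # unique pod name, carried in a pod label
    app_id: str   # unique identifier generated by the application
    active: bool  # True if active, False if on standby

    @property
    def labels(self) -> dict:
        # Assumed label key; the source only says the name is in a pod label.
        return {"pod-name": self.name}


@dataclass
class Service:
    """Stand-in for a service object of the orchestration platform."""
    labels: dict    # service labels, including the application identifier
    selector: dict  # pod labels that this service directs data to


class ServiceManager:
    def __init__(self) -> None:
        self.services = {}  # application identifier -> Service

    def configure(self, pods) -> None:
        """Create one service per pod, keyed by application identifier."""
        for pod in pods:
            self.services[pod.app_id] = Service(
                labels={"app-id": pod.app_id},
                selector=dict(pod.labels),
            )

    def update(self, pods) -> list:
        """Re-point each service at the pod now holding its identifier.

        Returns the application identifiers whose selector changed.
        """
        changed = []
        for pod in pods:
            svc = self.services[pod.app_id]
            if svc.selector != pod.labels:
                svc.selector = dict(pod.labels)
                changed.append(pod.app_id)
        return changed

    def route(self, app_id, pods) -> str:
        """Return the name of the pod that the service for app_id selects."""
        svc = self.services[app_id]
        for pod in pods:
            if all(pod.labels.get(k) == v for k, v in svc.selector.items()):
                return pod.name
        raise LookupError(f"no pod matches the selector for {app_id}")


# Initial state: pod-a is active with app-1, pod-b is on standby with app-2.
before = [Pod("pod-a", "app-1", True), Pod("pod-b", "app-2", False)]
manager = ServiceManager()
manager.configure(before)
print(manager.route("app-1", before))  # pod-a

# pod-a fails; the application activates pod-b under app-1, and the
# respawned pod-a rejoins on standby under app-2.
after = [Pod("pod-b", "app-1", True), Pod("pod-a", "app-2", False)]
print(manager.update(after))           # ['app-1', 'app-2']
print(manager.route("app-1", after))   # pod-b
```

Keying each service by the application identifier rather than by the pod means a client addressing a given identifier keeps reaching whichever pod currently holds it; only the selector changes on failover.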
Kubernetes is an example container orchestration platform that uses service objects but other container orchestration platforms may use service objects, or something analogous thereto, in a similar manner.

Traditionally, when a pod fails, a service object will continue to associate with that pod. As such, communications with the pod and processing by the pod will stop until the pod has recovered (e.g., is respawned by the container orchestration platform). Even if the pod takes a relatively small amount of time to recover, the time may still be enough to have adverse consequences for functionality being provided by an application executing in the pod. For example, if the application provides transcoding functionality for real-time user communication sessions, then failure of the pod may cause an undesirable gap in communications or may cause the sessions to drop altogether.

An application may be configured to operate in a high availability mode across the pods to avoid issues caused by a pod's failure. When an active pod fails, another pod waiting on standby can be activated by the application to take the place of the failed pod. However, the service object for the failed pod will not automatically switch its association from the failed pod to the newly activated pod, which prevents the application in the newly activated pod f