EP-4740404-A1 - METHOD AND SYSTEM FOR MANAGING NETWORK CONNECTIVITY

Abstract

The present disclosure relates to a system (104) and a method (400) for managing network connectivity during a process failure in a communication network (106). The system (104) includes a transceiver (208) configured to receive a failure trigger from a first node (102). The system (104) further includes a network manager (210) configured to disconnect a connection between the first node (102) and the system (104) on receiving the failure trigger in real time. The system (104) further includes a framework unit (202) configured to initiate one or more failover procedures and to transfer tasks to be performed by the first node (102) to a second node (108), thereby managing network connectivity.
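
The flow summarized in the abstract can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the disclosed implementation: the names `Node` and `ConnectivityManager`, the string trigger values, and the "failed" role label are all hypothetical, and the three component roles are collapsed into one class for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    role: str                                  # "active" or "standby"
    tasks: list = field(default_factory=list)
    connected: bool = True


class ConnectivityManager:
    """Hypothetical stand-in for the system (104)."""

    def __init__(self, first_node, second_node):
        self.first_node = first_node
        self.second_node = second_node
        self.events = []                       # ordered record of what happened

    def receive_failure_trigger(self, trigger):
        # Transceiver (208) role: accept the failure trigger from the first node.
        self.events.append(("trigger", trigger))
        self._disconnect()
        self._failover()

    def _disconnect(self):
        # Network manager (210) role: drop the connection to the first node.
        self.first_node.connected = False
        self.events.append(("disconnect", self.first_node.name))

    def _failover(self):
        # Framework unit (202) role: start failover and hand the tasks over.
        self.events.append(("failover", self.second_node.name))
        self.second_node.tasks.extend(self.first_node.tasks)
        self.first_node.tasks.clear()
        self.second_node.role = "active"
        self.first_node.role = "failed"        # assumed label, not from the source


active = Node("node-102", "active", tasks=["session-1", "session-2"])
standby = Node("node-108", "standby")
manager = ConnectivityManager(active, standby)
manager.receive_failure_trigger("crash")
print(standby.role, standby.tasks)
```

In this sketch the second node ends up holding both tasks and the "active" role, and the event list records the order trigger, disconnect, failover.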

Inventors

BHATNAGAR, AAYUSH
BISHT, BIRENDRA
SINGH, HARBINDER
Soren, Rohit
Aggarwal, Pravesh
Sahu, Bidhu
SINGH, PRIYANKA
JASWANI, Tikam
Swami, Mukul

Assignees

Jio Platforms Limited

Dates

Publication Date: 2026-05-13
Application Date: 2024-06-27

Claims (17)

1. A method (400) of managing network connectivity during failure, the method comprising the steps of: receiving (402), by one or more processors (204), a failure trigger from a first node (102); disconnecting (404), by the one or more processors (204), a connection between the first node (102) and the one or more processors (204) on receiving the failure trigger in real time; initiating (406), by the one or more processors (204), one or more failover procedures; and transferring (408), by the one or more processors (204), tasks to be performed by the first node (102) to a second node (108).
2. The method (400) as claimed in claim 1, wherein the failure trigger is one of a crash signal, a resource exhaust signal, a deadlock, and a closure notification pertaining to closing of the tasks at the first node (102).
3. The method (400) as claimed in claim 1, wherein upon disconnection, the one or more processors (204) is informed of the failure in real time owing to the disconnection.
4. The method (400) as claimed in claim 1, wherein the first node (102) is an active node, and the second node (108) is a standby node.
5. The method (400) as claimed in claim 1, wherein the step of initiating one or more failover procedures includes the step of: invoking, by the one or more processors (204), one or more Application Programming Interface (API) functions.
6. The method (400) as claimed in claim 5, wherein upon invoking the one or more API functions, the one or more processors (204) is configured to initiate a kernel cleanup.
7. The method (400) as claimed in claim 5, wherein the second node (108) is selected by the one or more processors (204) simultaneously when the one or more API functions are invoked, thereby transitioning from the first node (102) to the second node (108).
8. The method (400) as claimed in claim 1, wherein the one or more processors (204), by transferring tasks to be performed by the first node (102) to the second node (108), enables takeover of restarting procedure and managing network connectivity during failure.
9. A system (104) for managing network connectivity during failure, the system comprising: a transceiver (208) configured to receive a failure trigger from a first node; a network manager (210) configured to disconnect a connection between the first node and the system (104) on receiving the failure trigger; and a framework unit (202) configured to: initiate one or more failover procedures; and transfer tasks to be performed by the first node to a second node.
10. The system (104) as claimed in claim 9, wherein the failure trigger is one of a crash signal, a resource exhaust signal, a deadlock and a closure notification pertaining to closing of the tasks at the first node (102).
11. The system (104) as claimed in claim 9, wherein upon disconnection, the framework unit (202) is informed regarding the failure in real time owing to the disconnection.
12. The system (104) as claimed in claim 9, wherein the first node (102) is an active node and the second node (108) is a standby node.
13. The system (104) as claimed in claim 9, wherein the framework unit (202) initiates one or more failover procedures by invoking one or more Application Programming Interface (API) functions.
14. The system (104) as claimed in claim 13, wherein upon invoking the one or more API functions, the framework unit (202) is configured to initiate a kernel cleanup.
15. The system (104) as claimed in claim 9, wherein the framework unit (202), by transferring tasks to be performed by the first node (102) to the second node (108), enables takeover of restarting procedure and managing network connectivity during failure.
16. The system (104) as claimed in claim 13, wherein the second node (108) is selected by the framework unit (202) simultaneously when the one or more API functions are invoked, thereby transitioning from the first node (102) to the second node (108).
17. A non-transitory computer-readable medium having stored thereon computer-readable instructions that, when executed by a processor (204), cause the processor (204) to: receive a failure trigger from a first node (102); disconnect a connection between the first node (102) and the one or more processors (204) on receiving the failure trigger; initiate one or more failover procedures; and transfer tasks to be performed by the first node (102) to a second node (108), thereby recovering the tasks and managing network connectivity during failure.
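
Claims 3 and 11 state that the failure becomes known in real time owing to the disconnection itself. A minimal Python sketch of that idea follows, assuming the connection is an ordinary stream socket (the claims do not specify the transport). The helper name `peer_has_disconnected` and the 50 ms wait are hypothetical, and a local `socket.socketpair()` stands in for the connection between the first node and the processors.

```python
import socket


def peer_has_disconnected(conn, wait_s=0.05):
    """Return True if the peer has closed its end of a stream socket."""
    conn.settimeout(wait_s)
    try:
        # MSG_PEEK leaves any pending data in the receive buffer. An empty
        # result on a stream socket means the peer closed the connection.
        return conn.recv(1, socket.MSG_PEEK) == b""
    except socket.timeout:
        return False                 # nothing to read yet, still connected
    except ConnectionResetError:
        return True                  # abrupt close also counts as disconnected


# A connected pair of local sockets stands in for node <-> manager.
node_end, manager_end = socket.socketpair()

before = peer_has_disconnected(manager_end)   # peer is still open
node_end.close()                              # the node side disconnects
after = peer_has_disconnected(manager_end)    # close is visible immediately

manager_end.close()
print(before, after)
```

Here the first check should report that the peer is still connected and the second that it has gone, without waiting for any periodic probe interval to elapse.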

Description

METHOD AND SYSTEM FOR MANAGING NETWORK CONNECTIVITY

FIELD OF THE INVENTION

[0001] The present invention relates to wireless communication networks, and more particularly to managing network connectivity during a process failure in a communication network.

BACKGROUND OF THE INVENTION

[0002] In a network, a large volume of requests is received and processed within a very short time frame. Processing such a large volume of requests satisfactorily contributes to a positive user experience. The network uses various network components, such as servers and network functions in the form of a cluster-based application, to process the received requests. However, components of the network sometimes experience downtime due to errors or hardware malfunctions. Resolving such errors and failures is essential to maintaining the quality of service of the network.

[0003] In the cluster-based application of the network, a high availability (HA) framework is provided, which detects process failures based on TCP connection failure or a heartbeat mechanism. The heartbeat mechanism involves sending a communication packet from one node to another node in the network in order to monitor the health of the nodes, networks, and network interfaces, and to prevent cluster partitioning, which occurs when communication is lost between one or more nodes in the cluster and a failure of the lost nodes cannot be confirmed.

[0004] It is well known in the art that a cluster is a group of interconnected nodes or hosts that work together to support one or more applications and middleware. In cluster mode, the HA framework launches a driver inside the cluster. A single cluster has a single driver node responsible for managing the one or more applications. Cluster mode is a good choice for production workloads pertaining to one or more applications that require high availability, scalability, and security.

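The heartbeat mechanism described in paragraph [0003] can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the class name `HeartbeatMonitor`, the node names, and the 3-second timeout are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each node (illustrative only)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}          # node name -> time of last heartbeat

    def record_heartbeat(self, node, now_s):
        self.last_seen[node] = now_s

    def suspected_failed(self, now_s):
        # A node is suspected only after it has been silent for longer than
        # the timeout, so detection always lags the actual failure.
        return sorted(
            node for node, seen in self.last_seen.items()
            if now_s - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=3.0)
monitor.record_heartbeat("node-a", 0.0)
monitor.record_heartbeat("node-a", 1.0)      # node-a goes silent after t = 1.0
for t in (0.0, 1.0, 2.0, 3.0):
    monitor.record_heartbeat("node-b", t)

early = monitor.suspected_failed(3.5)        # node-a silent for 2.5 s
monitor.record_heartbeat("node-b", 4.0)
late = monitor.suspected_failed(4.5)         # node-a silent for 3.5 s
print(early, late)
```

In this sketch node-a stops sending at t = 1.0 but is first reported at the t = 4.5 check, once its silence exceeds the timeout. That gap is the kind of detection lag a periodic heartbeat introduces.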
[0005] Because detection relies on TCP connection failure or periodic heartbeats, the HA framework experiences delays in detecting process failures and initiating failover. These delays arise from the time taken by the Linux kernel to clean up resources such as TCP connections, RAM, and open files after a process exits due to a crash, resource exhaustion, a deadlock, or a similar event. Periodic heartbeat mechanisms introduce further delays. Therefore, the development of a rapid failure detection and recovery framework is crucial to improving the overall reliability and performance of time-sensitive systems.

SUMMARY OF THE INVENTION

[0006] One or more embodiments of the present disclosure provide a system and method for managing network connectivity during a process failure in a communication network.

[0007] In one aspect of the present invention, a method of managing network connectivity during a process failure in a communication network is provided. The method includes receiving a failure trigger from a first node. In an embodiment, the failure trigger is one of a crash signal, a resource exhaust signal, and a deadlock. The method further includes disconnecting a connection between the first node and one or more processors on receiving the failure trigger in real time. In an embodiment, upon disconnection, the one or more processors are informed of the process failure in real time owing to the disconnection. The method further includes initiating one or more failover procedures. The method further includes invoking one or more Application Programming Interface (API) functions. Upon invoking the one or more API functions, the method further initiates a kernel cleanup. The method further includes transferring tasks to be performed by the first node to a second node. In an embodiment, the first node is an active node, and the second node is a standby node. In an embodiment, the second node is selected simultaneously when the one or more API functions are invoked, thereby transitioning from the first node to the second node. In an embodiment, transferring the tasks to be performed by the first node to the second node enables takeover of the restarting procedure and management of network connectivity during the process failure.

[0008] In another aspect of the present invention, a system for managing network connectivity during a process failure in a communication network is disclosed. The system includes a transceiver configured to receive a failure trigger from a first node. In an embodiment, the failure trigger is one of a crash signal, a resource exhaust signal, and a deadlock. Further, the system includes a network manager configured to disconnect a connection between the first node and the system on receiving the failure trigger in real time. In an embodiment, upon disconnection, the framework unit is informed of the process failure in real time owing to the disconnection. The framework unit is further configured to initiate one or more failover procedures by invoking one or more Application Programming Interface (API) functions. Upon invoking one or more API functions, the framew