EP-4740385-A1 - SYSTEM AND METHOD FOR RESOLVING A DEADLOCK EVENT IN A NETWORK
Abstract
The present disclosure relates to a system (125) and a method (400) for resolving a deadlock event in a network (105). The system (125) includes a registration module (220) to receive a request for thread registration from a thread initiating device (110). The system further includes a detection module (225) to detect a deadlock event in the registered threads and to capture data pertaining to the deadlock event for efficient resolution of the deadlock event. The system further includes a recovery module (230) to initiate a standby process for uninterrupted service provision while the deadlock is being resolved. The system (125) thereby resolves the deadlock event in the network (105) in an optimized manner and facilitates post-recovery debugging of the registered thread. The method (400) includes various steps for resolving the deadlock in the network (105).
Inventors
- BHATNAGAR, AAYUSH
- Swami, Mukul
- BISHT, BIRENDRA
- SINGH, HARBINDER
- Soren, Rohit
- SINGH, PRIYANKA
- Aggarwal, Pravesh
- Sahu, Bidhu
- JASWANI, Tikam
- BHAMU, Sarita
Assignees
- Jio Platforms Limited
Dates
- Publication Date
- 20260513
- Application Date
- 20240627
Claims (14)
- 1. A method (400) for resolving a deadlock event in a network, the method comprises the steps of: receiving, by one or more processors (205), a request for a thread registration from a thread initiating device (110); capturing, by the one or more processors (205), data pertaining to the thread requesting for the thread registration and completing the thread registration; initiating, by the one or more processors (205), a deadlock detection task by periodically transmitting signals to the registered thread; detecting, by the one or more processors (205), occurrence of a deadlock event when number of responses received from the registered thread for the periodically transmitted signals is less than a pre-defined threshold; and initiating, by the one or more processors (205), a standby process until the deadlock event is resolved.
- 2. The method as claimed in claim 1, wherein the request is received from the thread initiating device (110) for the thread registration includes at least one of, request generated voluntarily, and generated manually using the thread initiating device.
- 3. The method as claimed in claim 1, wherein the request is intended to register the thread such that the thread is protected against the deadlock event.
- 4. The method as claimed in claim 1, wherein an auditor thread is utilized by the one or more processors (205) to detect occurrence of the deadlock event, wherein the one or more processors utilizing the auditor thread periodically transmit signals to the registered thread.
- 5. The method as claimed in claim 1, wherein the pre-defined threshold is the minimum number of times the one or more processors (205) is required to receive responses from the registered thread within a pre-defined time period.
- 6. The method as claimed in claim 1, wherein the method further comprises the steps of: gathering, by the one or more processors (205), information about the registered thread experiencing the deadlock event; and capturing, by the one or more processors (205), stack trace of the registered thread for ascertaining root cause of the deadlock event, thereby facilitating post-recovery debugging.
- 7. A system (125) for resolving a deadlock event in a network, the system comprising: a registration module (220) configured to: receive, a request for a thread registration from a thread initiating device (110); and capture, data pertaining to the thread requesting for thread registration and completing the thread registration; a detection module (225) configured to: initiate, a deadlock detection task by periodically transmitting signals to the registered thread; and detect, occurrence of a deadlock event when number of responses received from the registered thread for the periodically transmitted signals is less than a pre-defined threshold; and a recovery module (230) configured to, initiate, a standby process until the deadlock event is resolved.
- 8. The system as claimed in claim 7, wherein the request is received from the thread initiating device (110) for the thread registration includes at least one of, request generated voluntarily, and generated manually using the thread initiating device (110).
- 9. The system as claimed in claim 7, wherein the request is intended to register the thread such that the thread is protected against the deadlock event.
- 10. The system as claimed in claim 7, wherein the detection module (225) is configured to utilize an auditor thread to detect occurrence of the deadlock event, wherein the detection module utilizing the auditor thread periodically transmit signals to the registered thread.
- 11. The system as claimed in claim 7, wherein the pre-defined threshold is the minimum number of times the one or more processors is required to receive responses from the registered thread within a pre-defined time period.
- 12. The system as claimed in claim 7, wherein the system is further configured to: gather, by the recovery module (230), information about the registered thread experiencing the deadlock event; and capture, by the recovery module (230), stack trace of the registered thread for ascertaining root cause of the deadlock event, thereby facilitating post-recovery debugging.
- 13. A thread initiating device (110), comprising: one or more primary processors (305) communicatively coupled to one or more processors (205), the one or more primary processors (305) coupled with a memory (310), wherein said memory stores instructions which when executed by the one or more primary processors causes the thread initiating device (110) to: transmit, a request for thread registration to the one or more processors (205); and wherein the one or more processors is configured to perform the steps as claimed in claim 1.
- 14. A non-transitory computer-readable medium having stored thereon computer-readable instructions that, when executed by a processor (205), causes the processor (205) to: receive, a request for a thread registration from a thread initiating device (110); capture, data pertaining to the thread requesting for thread registration and completing the thread registration; initiate, a deadlock detection task by periodically transmitting signals to the registered thread; detect, occurrence of a deadlock event when number of responses received from the registered thread for the periodically transmitted signals is less than a pre-defined threshold; and initiate, a standby process until the deadlock event is resolved.
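The steps recited in claims 1 and 4 to 6 (registration, periodic signals from an auditor, a response threshold, stack-trace capture, and a standby) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the disclosure: the `Registered` record, the `answer_signals`, `audit`, and `capture_stack` helpers, the queue-based signalling, and the values of `signals`, `interval`, and `threshold`. The standby is represented only by a placeholder string. `sys._current_frames()` is a CPython-specific function.

```python
import queue
import sys
import threading
import time
import traceback


class Registered:
    """Data captured when a thread registers (hypothetical structure)."""

    def __init__(self, name):
        self.name = name
        self.ident = threading.get_ident()  # must be built by the registering thread
        self.inbox = queue.Queue()  # auditor -> thread signals
        self.acks = queue.Queue()   # thread -> auditor responses


def answer_signals(reg):
    """A healthy thread calls this in its work loop to answer pending signals."""
    while True:
        try:
            reg.inbox.get_nowait()
        except queue.Empty:
            return
        reg.acks.put("ack")


def audit(reg, signals=5, interval=0.1, threshold=3):
    """Send `signals` signals; report a deadlock if fewer than `threshold` are answered."""
    received = 0
    for _ in range(signals):
        reg.inbox.put("ping")
        try:
            reg.acks.get(timeout=interval)
            received += 1
        except queue.Empty:
            pass  # no response within this interval
    return received < threshold


def capture_stack(reg):
    """Return the current stack of the registered thread for later debugging."""
    frame = sys._current_frames().get(reg.ident)
    return traceback.format_stack(frame) if frame else []


def demo():
    registry = {}
    ready = threading.Barrier(3)  # two workers plus the main thread
    stop = threading.Event()
    gate = threading.Lock()
    gate.acquire()  # held by main so the "stuck" worker blocks on it

    def healthy():
        reg = registry["healthy"] = Registered("healthy")
        ready.wait()
        while not stop.is_set():
            answer_signals(reg)
            time.sleep(0.001)

    def stuck():
        registry["stuck"] = Registered("stuck")
        ready.wait()
        with gate:  # simulates waiting on a lock that is not released
            pass

    workers = [threading.Thread(target=healthy), threading.Thread(target=stuck)]
    for w in workers:
        w.start()
    ready.wait()  # both threads have registered

    report = {}
    for name, reg in registry.items():
        deadlocked = audit(reg)
        report[name] = deadlocked
        if deadlocked:
            report["trace"] = capture_stack(reg)
            report["standby"] = f"standby-for-{name}"  # placeholder for a standby process

    stop.set()
    gate.release()  # let the stuck worker finish so the demo exits
    for w in workers:
        w.join()
    return report
```

Calling `demo()` audits one responsive thread and one thread blocked on a lock. In this sketch the blocked thread answers no signals, so its response count falls below the threshold and its stack is captured before the lock is released.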
Description
SYSTEM AND METHOD FOR RESOLVING A DEADLOCK EVENT IN A NETWORK
FIELD OF THE INVENTION
[0001] The present disclosure relates to network communication, and more particularly relates to resolving a deadlock event in the network.
BACKGROUND OF THE INVENTION
[0002] In distributed systems and cloud computing, multiple processes execute within the same time frame, which may sometimes cause a deadlock. Deadlock refers to a situation in which two or more processes/threads are each waiting for a resource held by another process/thread, so that each waits for another to complete an action and release a lock. A deadlock may also refer to two or more processes/threads waiting for resources in a circular chain until a lock is released. Generally, only the process/thread holding a resource may release it, and typically a process/thread will not release the resource until its processing has been completed. In simpler terms, deadlock is a state of the distributed or cloud computing system in which progress is halted because conflicting processes/threads are unable to continue.
[0003] A deadlock in a distributed system or computing system may arise if the system relies on resource allocation mechanisms that are inefficient. The resource allocation mechanism may be used for access control in the computing system, for shared resources, or for files/programs with sharing attributes enabled.
[0004] To deal with deadlock, certain systems with high reliability requirements may choose to ignore deadlocks altogether, assuming that they will occur infrequently and that the impact can be mitigated only when the system is restarted manually or by other similar means. In other scenarios, systems periodically detect deadlock using algorithms such as resource allocation graphs or the banker's algorithm. Once a deadlock is detected, the system can recover by terminating one or more processes or by pre-empting resources.
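The graph-based detection mentioned in [0004] reduces, in its simplest form, to finding a cycle in a wait-for graph, where an edge from T1 to T2 means T1 is waiting for a resource held by T2. The following is a minimal sketch under that assumption. The function name `find_cycle` and the dictionary representation are illustrative and are not taken from the disclosure.

```python
def find_cycle(wait_for):
    """Return a list of threads forming a circular wait, or None if there is none.

    `wait_for` maps each thread name to the threads it is waiting on.
    """
    visiting = []  # current depth-first search path
    done = set()   # threads already shown not to lead to a cycle

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Reaching a node already on the path closes a cycle.
            return visiting[visiting.index(node):]
        visiting.append(node)
        for nxt in wait_for.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for node in wait_for:
        cycle = visit(node)
        if cycle:
            return cycle
    return None
```

For example, `find_cycle({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]})` returns `["T1", "T2", "T3"]`, while a chain with no back edge returns `None`. A recovery step would then pick one member of the returned cycle to terminate or pre-empt.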
[0005] Deadlock is observed not only in computing systems but also in hardware such as a System-on-Chip (SoC), where multiple hardware or software components within the SoC are unable to make progress due to resource conflicts or synchronization issues. SoCs are also found in routers, and hence a router may also face a deadlock when multiple Transmission Control Protocol (TCP) requests arrive from devices and applications alike.
[0006] Further, a deadlock in hardware or computing may occur for multiple reasons. Multiple components within the hardware, or programs within the computing system, may contend for shared resources such as memory, buses, or peripheral interfaces. If the coordination or resource allocation mechanism in place is insufficient, this contention can lead to deadlocks. For example, if two components/threads are simultaneously trying to access the same memory region, and one holds a lock while waiting for another resource held by the second component/thread, a deadlock can occur if the second component/thread is also waiting for the first component's lock.
[0007] A deadlock may also arise from improper synchronization mechanisms between different components/threads. For example, if two or more components/threads are waiting for specific signals or events to occur before proceeding, and those signals or events are never triggered, a deadlock can happen.
[0008] A hardware deadlock may arise if an interrupt service routine (ISR) is not designed carefully. For instance, if an ISR tries to access a resource that is already held by another component, and that component is waiting for the ISR to complete before releasing the resource, a deadlock can arise.
[0009] Further, communication mechanisms such as message passing or shared memory can also be a source of deadlocks. If components engage in blocking communication and get stuck waiting for messages from each other, a deadlock can occur.
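The two-lock scenario in [0006] can be reproduced in a few lines. The sketch below is illustrative only: the function name `circular_wait_demo`, the thread names, and the 0.2-second timeout are assumptions. A timeout is used on the second acquisition so that the demonstration terminates instead of hanging, and two barriers force the problematic interleaving to occur every time.

```python
import threading


def circular_wait_demo(timeout=0.2):
    """Each thread holds one lock and waits for the other's; neither can proceed."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    both_hold = threading.Barrier(2)   # reached once each thread holds its first lock
    both_tried = threading.Barrier(2)  # reached once each thread has tried the second lock
    got_second = {}

    def worker(name, first, second):
        with first:
            both_hold.wait()
            # Without the timeout this call would block forever (a deadlock).
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            if acquired:
                second.release()
            both_tried.wait()  # keep `first` held until both attempts have ended

    t1 = threading.Thread(target=worker, args=("T1", lock_a, lock_b))
    t2 = threading.Thread(target=worker, args=("T2", lock_b, lock_a))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return got_second
```

Both entries of the returned dictionary are `False`: T1 holds `lock_a` and waits for `lock_b`, while T2 holds `lock_b` and waits for `lock_a`. Acquiring the locks in the same order in both threads removes the circular wait.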
[0010] The current state of the art discloses deadlock detection algorithms that periodically analyze the system state to identify potential deadlocks. If a deadlock is detected, appropriate recovery mechanisms, such as resource preemption or process termination, can be invoked to resolve the deadlock.
[0011] Further, in multi-threaded software, multiple threads execute concurrently and perform various tasks. Due to the complexity of the system and the interdependencies among threads, applications use mutexes and semaphores to protect resources while they are in use by a thread, guarding against data inconsistency and corruption. However, a coding bug or an unhandled event can cause a deadlock. Deadlocks can freeze some threads or the entire system and can be challenging to identify and resolve. A partial or full system deadlock, without auto-detection and auto-recovery, significantly impacts system responsiveness and throughput. The issue needs to be addressed.
[0012] Therefore, there is a need for a system and method configured to detect and resolve the dea