EP-4740379-A1 - SYSTEM AND METHOD FOR HANDLING ALARMS
Abstract
The present disclosure relates to handling of alarms in a Network Management System (125). Stream data collected from network elements (110) is parsed and transformed into raise alarms or clear alarms. The alarms are stored in a distributed cache (225) after comparison with existing alarms already present in the distributed cache (225). A unique alarm identifier is generated for each alarm, and the clear alarms corresponding to the unique alarm identifiers are retrieved. A database (220) is checked for presence of associated raise alarms; the raise alarms are deleted from an active section when the associated raise alarms are identified to be present, and the clear alarms are streamed for retrying when the associated raise alarms are identified to be absent. The database (220) is also checked for presence of the raise alarms corresponding to retry alarm data, and those raise alarms are deleted from the active section when identified to be present.
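The clear-and-retry flow summarized above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names `AlarmStore`, `ClearAlarm`, `handle_clear`, and `MAX_RETRIES` are hypothetical, the database (220) is replaced by in-memory dictionaries, and the retry threshold and archived section are taken from dependent claims 5 and 8 rather than from the abstract.

```python
from collections import deque
from dataclasses import dataclass

MAX_RETRIES = 3  # assumed retry threshold; the disclosure does not fix a value


@dataclass
class ClearAlarm:
    alarm_id: str          # unique alarm identifier shared with the raise alarm
    retry_count: int = 0   # incremented each time the clear alarm is re-streamed


class AlarmStore:
    """In-memory stand-in for the database, with active and archived sections."""

    def __init__(self):
        self.active = {}    # alarm_id -> raise alarm record
        self.archived = {}  # alarm_id -> cleared raise alarm record


def handle_clear(store, clear, retry_queue):
    """Process one clear alarm and report what happened to it."""
    raise_alarm = store.active.pop(clear.alarm_id, None)
    if raise_alarm is not None:
        # Associated raise alarm is present: remove it from the active section,
        # attach clearance metadata, and keep it in the archived section.
        raise_alarm["cleared"] = True
        store.archived[clear.alarm_id] = raise_alarm
        return "cleared"
    if clear.retry_count < MAX_RETRIES:
        # Raise alarm is absent (it may not have been persisted yet):
        # stream the clear alarm back for a later retry.
        clear.retry_count += 1
        retry_queue.append(clear)
        return "retry"
    # Retry threshold exhausted: stop retrying this clear alarm.
    return "exhausted"
```

In this sketch a clear alarm that arrives before its raise alarm is queued in a `deque` and succeeds on a later attempt once the raise alarm has reached the active section, so no clear event is silently lost.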
Inventors
- BHATNAGAR, AAYUSH
- BISHT, SANDEEP
- MISHRA, RAHUL
- DIVY, Dipankar
- SHARMA, Smridhi
- E, Elanchezhiyan
- TIWARI, SUMIT
Assignees
- Jio Platforms Limited
Dates
- Publication Date: 2026-05-13
- Application Date: 2024-07-02
Claims (20)
- 1. A method of handling alarms in a Network Management System (NMS) (125), the method comprising: collecting, by one or more processors (205), stream data from one or more network elements (110); parsing and transforming, by the one or more processors (205), the stream data into alarms in a standardized format, the alarms comprising one of two event types, raise alarm and clear alarm; storing, by the one or more processors (205), the alarms in a distributed cache (225) after comparing the alarms with existing alarms already present in the distributed cache (225); generating, by the one or more processors (205), a unique alarm identifier for each alarm; retrieving, by the one or more processors (205), the clear alarms corresponding to the unique alarm identifiers from the distributed cache (225); checking, by the one or more processors (205), a database (220) for presence of associated raise alarms, and deleting the raise alarms from an active section when the associated raise alarms are identified to be present and streaming the clear alarms for retrying when the associated raise alarms are identified to be absent; and checking, by the one or more processors (205), the database (220) for presence of the raise alarms corresponding to retry alarm data and deleting the raise alarms from the active section when identified to be present.
- 2. The method as claimed in claim 1, the stream data including Fault, Configuration, Accounting, Performance, and Security (FCAPs) information.
- 3. The method as claimed in claim 1, comprising updating, by the one or more processors (205), an occurrence count and a timestamp array of the alarms based on the comparison of the alarms with the existing alarms.
- 4. The method as claimed in claim 1, comprising updating, by the one or more processors (205), metadata associated with the alarms, enriching the alarms with additional information, and storing details of the alarms into the distributed cache (225).
- 5. The method as claimed in claim 1, comprising incrementing, by the one or more processors (205), a retry count and reproducing the retry alarm data into the stream data when the raise alarms are identified to be absent, thereby retrying an alarm until cleared or a count of retry threshold is exhausted.
- 6. The method as claimed in claim 1, comprising performing, by the one or more processors (205), one or more operations on the alarms including planned event processing, Artificial Intelligence (AI)-based correlation to identify patterns or related events, and trouble ticketing to initiate incident management processes.
- 7. The method as claimed in claim 1, comprising performing, by the one or more processors (205), one or more of monitoring and running fault processor topics, performing a lookup on a table based on the topics, extracting alarms present in the distributed cache for more than a configurable duration, and processing the alarms.
- 8. The method as claimed in claim 1, comprising deleting, by the one or more processors (205), the raise alarms from the active section, adding clearance metadata to the raise alarms, and storing the raise alarms in an archived section of the database (220).
- 9. The method as claimed in claim 1, comprising providing, by the one or more processors (205), the distributed cache (225) with the configurable interval for periodically storing the alarms.
- 10. The method as claimed in claim 1, comprising scanning, by the one or more processors (205), the distributed cache (225) for identifying a stranded alarm.
- 11. A Network Management System (NMS) (125) for handling alarms, wherein the NMS (125) comprises: a collector component (228) configured to: collect stream data from network elements (110); and parse and transform the stream data into alarms in a standardized format, the alarms comprising one of two event types, raise alarm and clear alarm; a distributed cache (225) configured to store the alarms after comparing the alarms with existing alarms already present in the distributed cache (225); a Fault processor Master (FM) module (230) configured to generate a unique alarm identifier for each alarm; a clear FM module (240) configured to: retrieve clear alarms corresponding to the unique alarm identifiers from the distributed cache (225); and check a database (220) for presence of associated raise alarms, and delete the raise alarms from an active section when the associated raise alarms are identified to be present and stream the clear alarms for retrying when the associated raise alarms are identified to be absent; and a retry FM module (245) configured to check the database (220) for presence of the raise alarms corresponding to retry alarm data and delete the raise alarms from the active section when identified to be present.
- 12. The NMS (125) as claimed in claim 11, wherein the stream data includes Fault, Configuration, Accounting, Performance, and Security (FCAPs) information.
- 13. The NMS (125) as claimed in claim 11, wherein an occurrence count and a timestamp array of the alarms are updated based on the comparison of the alarms with the existing alarms.
- 14. The NMS (125) as claimed in claim 11, wherein metadata associated with the alarms is updated, the alarms are enriched with additional information, and details of the alarms are stored into the distributed cache (225).
- 15. The NMS (125) as claimed in claim 11, wherein a retry count is incremented and the retry alarm data is reproduced into the stream data when the raise alarms are identified to be absent, thereby retrying an alarm until cleared or a count of retry threshold is exhausted.
- 16. The NMS (125) as claimed in claim 11, wherein the NMS (125) performs one or more operations on the alarms including planned event processing, AI-based correlation to identify patterns or related events, and trouble ticketing to initiate incident management processes.
- 17. The NMS (125) as claimed in claim 11, wherein the NMS (125) is configured to perform one or more of monitoring running fault processor topics, performing a lookup on a table based on the topics, extracting alarms present in the distributed cache (225) for more than a configurable duration, and processing the alarms.
- 18. The NMS (125) as claimed in claim 11, wherein when the raise alarms are deleted from the active section, the NMS (125) adds clearance metadata to the raise alarms, and stores the raise alarms in an archived section of the database (220).
- 19. The NMS (125) as claimed in claim 11, wherein the distributed cache (225) is provided with a configurable interval for periodically storing the alarms.
- 20. The NMS (125) as claimed in claim 11, wherein the NMS (125) scans the distributed cache (225) for identifying a stranded alarm.
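The cache-side behaviour recited in claims 3, 7, 10, 13, 17, and 20 (comparing an incoming alarm with cached alarms, updating an occurrence count and timestamp array, and finding alarms that have stayed in the cache longer than a configurable duration) can be sketched in Python as follows. This is an illustrative sketch only. The class `AlarmCache`, its methods, and the comparison key of source element, alarm type, and event type are assumptions; the claims do not specify how alarms are compared, and a single-process dictionary stands in for the distributed cache (225).

```python
class AlarmCache:
    """Single-process stand-in for the distributed cache."""

    def __init__(self):
        # Assumed comparison key: (source element, alarm type, event type).
        self.entries = {}

    def upsert(self, source, alarm_type, event, timestamp):
        """Store a new alarm, or fold a repeat into the existing entry."""
        key = (source, alarm_type, event)
        entry = self.entries.get(key)
        if entry is None:
            entry = {
                "source": source,
                "alarm_type": alarm_type,
                "event": event,            # "raise" or "clear"
                "occurrence_count": 0,
                "timestamps": [],          # one element per occurrence
                "first_seen": timestamp,
            }
            self.entries[key] = entry
        # A repeat of a cached alarm only updates the count and timestamp array.
        entry["occurrence_count"] += 1
        entry["timestamps"].append(timestamp)
        return entry

    def stranded(self, now, max_age):
        """Return entries that have been cached for longer than max_age seconds."""
        return [
            entry for entry in self.entries.values()
            if now - entry["first_seen"] > max_age
        ]
```

Folding repeats into one entry means a flapping interface produces a single cached record with a growing timestamp array instead of a new insert per occurrence, which is the redundancy the background section identifies as a bottleneck.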
Description
SYSTEM AND METHOD FOR HANDLING ALARMS
FIELD OF THE INVENTION
[0001] The present invention relates to the field of network management systems and, more specifically, to the efficient handling of repetitive alarms and auditing.
BACKGROUND OF THE INVENTION
[0002] Network Management Systems (NMS) are essential for monitoring and maintaining the health and performance of computer networks. These systems generate alarms or alerts when specific events occur, indicating potential issues or anomalies within the network infrastructure. Alarms play a crucial role in monitoring and maintaining the health and performance of network elements, services, and interfaces. Alarms are raised to notify operators and administrators about critical events or abnormal conditions that require attention or intervention.
[0003] Alarm management is an integral part of network management systems and involves the detection, processing, and communication of alarms. Alarms are typically raised when certain predefined thresholds are exceeded, errors occur, or specific conditions are met within the network infrastructure. These alarms act as signals to indicate potential issues, faults, or performance degradation that may impact the overall network functionality.
[0004] Alarms are raised for various reasons, including network faults, equipment failures, performance bottlenecks, security breaches, or service disruptions. Repetitive alarms often occur when the underlying cause of an alarm persists, leading to recurring notifications. This commonly happens when services are down or when common interfaces experience fluctuations, causing multiple nodes, network elements, or network functions to repeatedly raise and clear alarms until the issue is resolved.
[0005] A typical alarm management system consists of several components that work together to handle alarms efficiently. The components include a Fault Processor (FP) or Fault Manager (FM), Enrichment and Correlation modules, and a Trouble Ticketing (TT) system.
[0006] In the field of network management, various approaches have been proposed to handle alarms and improve system performance. Existing systems utilize distributed processing and auditing of alarms to address some of the challenges associated with repetitive alarms. However, these solutions often lack comprehensive optimization and fail to provide efficient handling of repetitive alarms.
[0007] Despite the existing approaches, network management systems still encounter delays and inefficiencies when processing repetitive alarms. The recurring insertion, updating, and deletion of alarms from persistent storage and other processing stages lead to redundant operations and resource consumption, causing performance bottlenecks. Moreover, running multiple processing stages for each raise and clear alarm, such as enrichment, correlation, and trouble ticketing, further exacerbates the problem.
[0008] There is therefore a need for an innovative solution that efficiently handles repetitive alarms, optimizes the alarm processing workflow, reduces the consumption of computational resources, and provides effective auditing, without compromising accuracy or losing any alarm data.
BRIEF SUMMARY OF THE INVENTION
[0009] One or more embodiments of the present disclosure provide a system and method of handling alarms in a Network Management System (NMS).
[0010] In one aspect of the present invention, a system for handling alarms in the NMS (hereinafter referred to as the system) is disclosed. The system includes a collector component configured to collect stream data from network elements, and parse and transform the stream data into alarms in a standardized format. The stream data includes Fault, Configuration, Accounting, Performance, and Security (FCAPs) information. The alarms comprise one of two event types, raise alarm and clear alarm. The system further includes a distributed cache configured to store the alarms after comparing the alarms with existing alarms already present in the distributed cache. The system further includes a Fault processor Master (FM) module configured to generate a unique alarm identifier for each alarm. The system further includes a raise FM module configured to retrieve the alarms corresponding to the unique alarm identifiers from the distributed cache. The system further includes a clear FM module configured to retrieve clear alarms corresponding to the unique alarm identifiers from the distributed cache and check a database for presence of associated raise alarms, and delete the raise alarms from an active section when the associated raise alarms are identified to be present and stream the clear alarms for retrying when the associated raise alarms are identified to be absent. The system further includes a retry FM module configured to check the database for presence of the raise alarms corresponding to retry alarm data and delete the raise alarms from the active section when identified to be present.
[0011] In one aspect, one or more parameters i