
CN-122027441-A - Comprehensive monitoring alarm system for running state of server


Abstract

The invention discloses a comprehensive monitoring and alarm system for the running state of servers, in the field of monitoring and alarm systems. The system comprises a monitoring management module, a program management module, an alarm configuration module, and a data acquisition and processing module. The monitoring management module provides a user interface for displaying and managing the information list of one or more monitored servers; the information of each monitored server includes at least the server IP address, server account information, and the ports used by programs running on the server. The program management module manages the list of programs deployed on the monitored servers. The invention enables comprehensive, multi-dimensional monitoring of server hardware resources, application states, network traffic, and other aspects, thereby improving operation and maintenance efficiency and system reliability.
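The server information list and the program list are essentially two record types linked by the server IP address. The following minimal Python sketch models them with dataclasses. Every class, field, and helper name here (`MonitoredServer`, `ProgramEntry`, `programs_on`) is a hypothetical illustration, not a schema disclosed in the patent; the program fields simply follow the list enumerated in claim 1.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoredServer:
    """One row of the monitored-server list (hypothetical schema)."""
    ip: str
    account: str  # login account used by the collector; secrets assumed stored elsewhere
    program_ports: list[int] = field(default_factory=list)  # ports used by programs on this host


@dataclass
class ProgramEntry:
    """One row of the program list (hypothetical schema; fields as enumerated in claim 1)."""
    name: str
    git_repo: str
    branch: str
    server_ip: str
    account: str
    listen_port: int
    deploy_path: str
    heartbeat_ok: bool = False
    depends_on: list[str] = field(default_factory=list)
    start_script: str = ""
    start_order: int = 0


def programs_on(server: MonitoredServer, programs: list[ProgramEntry]) -> list[ProgramEntry]:
    """Return the programs deployed on `server`, sorted by their configured start order."""
    return sorted(
        (p for p in programs if p.server_ip == server.ip),
        key=lambda p: p.start_order,
    )
```

Keeping the start order on each program entry lets a collector or restart script walk a server's programs in dependency-safe order without a separate table.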

Inventors

  • WANG YAN
  • TANG DONGLIANG

Assignees

  • 山东道春信息技术有限公司 (Shandong Daochun Information Technology Co., Ltd.)

Dates

Publication Date
2026-05-12
Application Date
2025-12-17

Claims (9)

  1. A comprehensive monitoring and alarm system for the running state of a server, characterized by comprising a monitoring management module, a program management module, an alarm configuration module, and a data acquisition and processing module; the monitoring management module is configured to provide a user interface for displaying and managing the information list of one or more monitored servers, the information of a monitored server comprising at least the server IP address, server account information, and the ports used by programs running on the server; the program management module is configured to manage the list of programs deployed on the monitored servers, the program list comprising the program name, the Git source code repository to which the program belongs, the branch the program runs from, the server and account on which the program runs, the program's listening port, the program deployment path, the program heartbeat state, the services the program depends on, and the program start script and start order; the alarm configuration module is configured to receive alarm rules customized by the user according to the requirements of the service platform, the alarm rules being associated with state metrics of the monitored server and/or the running state of the application programs; and the data acquisition and processing module is configured to access the corresponding monitored server using the server and account information provided by the monitoring management module, collect system resource state metrics of the monitored server and running state data of the application programs based on the program list provided by the program management module, process the collected state data, evaluate the state against the alarm rules set in the alarm configuration module, and generate and send alarm information when an alarm condition is met.
  2. The comprehensive monitoring and alarm system for the running state of a server according to claim 1, wherein the data acquisition and processing module comprises a data acquisition unit, a data transmission unit and a data storage unit; the data acquisition unit is configured to collect system resource state metrics of the monitored server through an agent program deployed on the monitored server, the system resource state metrics comprising CPU (central processing unit) utilization, memory utilization, disk I/O (input/output) throughput and network traffic data, and to collect running state data of the application program, the running state data comprising process liveness, port listening state, application performance metrics, log output, and heartbeat signals periodically sent by the program; the data acquisition unit is further configured to perform traffic monitoring of the server, wherein the traffic monitoring process records the monitoring time, monitoring records, monitoring personnel identifiers and monitoring device information, analyzes the recorded traffic data to identify abnormal traffic targets, and extracts traffic features from each identified abnormal target, the traffic features comprising a source IP address, a destination IP address, a port number, a protocol type and a packet size; the data transmission unit is configured to package and transmit the state data, traffic features and source-tracing results collected by the data acquisition unit using message queue middleware or a remote procedure call framework; and the data storage unit is configured to receive and store the state data, traffic features and source-tracing results sent by the data transmission unit, and comprises a time-series database for storing timestamped system resource state metrics and application running state data, and a relational database for storing traffic monitoring records and source-tracing results.
  3. The comprehensive monitoring and alarm system for the running state of a server according to claim 2, wherein the data acquisition and processing module further comprises a data preprocessing unit and a tracing priority management unit; the data preprocessing unit is configured to perform data cleaning, format standardization and data aggregation on the raw state data before it is sent to the data storage unit, wherein data cleaning comprises removing outliers and filling missing values, format standardization converts state data from different sources into a uniform internal data format, and data aggregation comprises downsampling high-frequency state data over a preset time window; and the tracing priority management unit is communicatively connected to the data acquisition unit and is configured to receive the source-tracing results generated by the data acquisition unit and assign each a priority code, the priority code being computed as a weighted combination of the severity, scope of impact, frequency of occurrence, and importance of the affected services of the abnormal traffic in the tracing result, yielding a priority level.
  4. The system according to claim 3, further comprising an analysis and alarm engine module that is communicatively coupled to the data acquisition and processing module and invokes the alarm rules set in the alarm configuration module; the analysis and alarm engine module is provided with a rule engine that allows the user to define alarm condition logic in a domain-specific language, the alarm condition logic supporting combined judgment over state metrics of multiple different dimensions; the analysis and alarm engine module is further provided with a memory usage state analysis unit configured to monitor the memory usage of each software application at runtime, analyze the memory usage of each application independently, and analyze the joint changes in memory usage when multiple applications run simultaneously; the memory usage state analysis unit is further configured to generate a dynamic line chart from the time-series memory usage data and extract memory usage pattern features from that chart; and the analysis and alarm engine module is further configured to compare the current memory usage pattern features with past memory usage patterns stored in the history and identify memory usage anomalies by calculating a similarity or deviation value, thereby achieving multi-dimensional memory state analysis.
  5. The system according to claim 4, wherein the analysis and alarm engine module further integrates an intelligent detection unit; the intelligent detection unit is provided with a machine learning model that performs time-series analysis on the historical and real-time state data acquired by the data acquisition and processing module, so as to dynamically learn the normal behavior pattern of the monitored server and/or application program and establish dynamic thresholds from that pattern; the intelligent detection unit compares real-time state data with the dynamic thresholds and, on detecting an abnormal state that deviates from the normal behavior pattern, triggers the analysis and alarm engine module to generate anomaly alarm information, which may be used independently of, or as a supplement to, alarm rules based on fixed thresholds; and the intelligent detection unit uses a clustering algorithm or an anomaly detection algorithm to model, from the memory usage state, the memory usage patterns of a single application and of multiple applications running concurrently, and combines this model with the dynamic line chart analysis results to improve the accuracy of memory anomaly detection.
  6. The comprehensive monitoring and alarm system for the running state of a server according to claim 5, further comprising a Git management module communicatively coupled to the program management module and configured to manage the Git source code repository information associated with the application programs; the Git management module provides a user interface that allows the user to query and retrieve project source code by platform type, Git repository address and developer information; and the Git management module maintains the branch information associated with each Git repository, each branch name being referenced as a program version identifier in the program list of the program management module, thereby mapping each running program instance to a specific source code branch.
  7. The comprehensive monitoring and alarm system for the running state of a server according to claim 6, wherein the program list in the program management module further records the services the program depends on; when the data acquisition and processing module or the analysis and alarm engine module detects that a program instance is faulty or in an abnormal state, the system automatically checks the running state of those dependent services according to the dependency information recorded in the program list, and outputs their state as part of the alarm context or root cause analysis; and in memory monitoring scenarios, when a memory usage anomaly is detected, the system further analyzes, according to the dependencies in the program list, whether the anomaly is caused by a memory leak or resource contention in a dependent service, and includes the associated information in the analysis report.
  8. The comprehensive monitoring and alarm system for the running state of a server according to claim 7, further comprising a notification and action module that communicates with the analysis and alarm engine module and the data acquisition and processing module and is configured to receive the generated alarm information; the notification and action module supports configuring multiple alarm notification channels, including email, SMS, instant messaging bot interfaces and Webhook callback interfaces; the notification and action module is further provided with an alarm routing policy that routes alarm information to different notification channels or designated recipients according to the severity level of the alarm, the service platform to which the alarm source belongs, or a preset on-duty schedule; and for source-tracing results and memory analysis results, the notification and action module adjusts the urgency and repetition frequency of notifications according to the priority code or the severity of the anomaly.
  9. The system is characterized by further comprising a system settings module for managing the background configuration data of the whole system; the system settings module comprises a user management unit, a role management unit and a menu management unit; the user management unit is configured to manage the account information of system users, including creating users, resetting user passwords, and querying and editing user details; the role management unit is configured to define and manage different user roles and assign different system operation permissions to each role, the operation permissions comprising at least access and operation permissions for the monitoring management module, the program management module and the alarm configuration module; the menu management unit is configured to dynamically manage the menu items of the system user interface, supporting adding and deleting menu items and adjusting their display order in the user interface; and the system settings module is further configured to manage parameters related to traffic monitoring and memory monitoring, including monitoring schedules, monitoring device registration information, historical data retention policy configuration, and source-tracing priority rules.
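Claim 2 states that the data acquisition unit analyzes recorded traffic to identify abnormal targets and then extracts five features from them, but it does not say how the analysis works. One simple possibility, sketched below in Python purely as an assumption, is to flag source IPs whose total byte count far exceeds the median per-source volume and then emit the claimed feature tuple for each of their flows. The record layout, the `abnormal_flows` name, and the default `factor` of 10 are hypothetical.

```python
from collections import defaultdict
from statistics import median


def abnormal_flows(flows, factor=10.0):
    """Return (src, dst, port, proto, size) tuples for flows from abnormal sources.

    `flows` is a list of dicts with keys src, dst, port, proto, size (bytes).
    A source is treated as abnormal when its total bytes exceed `factor` times
    the median per-source total (a heuristic assumed here for illustration).
    """
    totals = defaultdict(int)
    for f in flows:
        totals[f["src"]] += f["size"]
    if not totals:
        return []
    cutoff = factor * median(totals.values())
    flagged = {src for src, total in totals.items() if total > cutoff}
    # Extract the traffic features listed in claim 2 for each flow of a flagged source.
    return [
        (f["src"], f["dst"], f["port"], f["proto"], f["size"])
        for f in flows
        if f["src"] in flagged
    ]
```

With only a handful of sources the median is a weak baseline, so a practical design would probably also compare per-port or per-protocol statistics.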
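The cleaning and aggregation steps of the data preprocessing unit in claim 3 can be illustrated with a short Python sketch. The following assumptions are not from the patent: samples are `(timestamp_seconds, value)` pairs for a percentage metric, readings outside 0 to 100 count as abnormal and are dropped, missing readings (`None`) are filled with the last valid value, and downsampling averages each fixed, non-overlapping window. Format standardization is omitted because it depends entirely on the data sources.

```python
from statistics import mean


def clean(samples, lo=0.0, hi=100.0):
    """Drop out-of-range readings and forward-fill missing (None) readings."""
    cleaned, last = [], None
    for ts, v in samples:
        if v is None:
            if last is None:
                continue  # no earlier valid value to carry forward
            v = last      # fill the gap with the last valid reading
        elif not lo <= v <= hi:
            continue      # drop impossible readings, e.g. CPU% above 100
        cleaned.append((ts, v))
        last = v
    return cleaned


def downsample(samples, window):
    """Average samples into non-overlapping windows of `window` seconds."""
    buckets = {}
    for ts, v in samples:
        start = ts - ts % window  # start time of the window containing ts
        buckets.setdefault(start, []).append(v)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]
```

For example, cleaning the series `(0, 10.0), (5, None), (10, 250.0), (15, 30.0)` keeps three points (the gap at 5 s is filled with 10.0 and the 250.0 reading is dropped) before any aggregation.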
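Claim 3 computes the priority code of a source-tracing result as a weighted combination of four factors. The Python sketch below assumes each factor has already been normalized to the range 0 to 1 and maps the weighted score to four levels. The weights, cut-offs, and level names are illustrative assumptions, since the patent names the factors but discloses neither weights nor levels.

```python
# Hypothetical weights; the patent names the factors but not their weights.
WEIGHTS = {"severity": 0.4, "scope": 0.25, "frequency": 0.15, "importance": 0.2}
# (minimum score, level) pairs, checked from most to least urgent.
LEVELS = [(0.75, "P1"), (0.5, "P2"), (0.25, "P3"), (0.0, "P4")]


def priority_code(factors, weights=WEIGHTS):
    """Return (score, level) for a tracing result.

    `factors` maps each weight key to a normalized value in [0, 1].
    """
    for key in weights:
        if not 0.0 <= factors[key] <= 1.0:
            raise ValueError(f"{key} must be in [0, 1], got {factors[key]}")
    # Round to suppress floating-point noise before comparing against cut-offs.
    score = round(sum(weights[k] * factors[k] for k in weights), 6)
    for cutoff, level in LEVELS:
        if score >= cutoff:
            return score, level
    return score, LEVELS[-1][1]
```

Because the weights sum to 1, the score stays in [0, 1], so the cut-offs can be read directly as fractions of the maximum urgency.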
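The rule engine in claim 4 lets users combine conditions on several metrics through a domain-specific language whose syntax is not disclosed. As a stand-in for such a language, the Python sketch below represents a rule as a nested tuple: `("all", [...])` and `("any", [...])` combine sub-rules, and `(metric, op, threshold)` is a leaf comparison. This tuple format and the metric names are assumptions for illustration only; a metric that was not collected makes its leaf evaluate to false, and `all`/`any` are reserved and cannot be used as metric names.

```python
import operator

# Comparison operators a leaf condition may use.
OPS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def evaluate(rule, metrics):
    """Evaluate a nested rule against a dict of current metric values."""
    head = rule[0]
    if head in ("all", "any"):
        results = (evaluate(sub, metrics) for sub in rule[1])
        return all(results) if head == "all" else any(results)
    name, op, threshold = rule
    # A metric that was not collected cannot satisfy its condition.
    return name in metrics and OPS[op](metrics[name], threshold)
```

For instance, `("all", [("cpu_pct", ">", 90), ("any", [("mem_pct", ">", 85), ("port_listening", "==", False)])])` fires only when CPU is high and either memory is high or the program's port has stopped listening, a cross-dimension condition that a single fixed threshold cannot express.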
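Claim 4 also compares current memory usage pattern features with historical ones through a similarity or deviation value, without naming the features. The minimal Python sketch below assumes three features per curve (mean, peak, and least-squares slope in MB per sample) and a single stored baseline curve; the tolerances `rel_tol` and `slope_tol` are hypothetical. A slope that is steadily higher than the baseline's is one common signature of a memory leak.

```python
from statistics import mean


def slope(values):
    """Least-squares slope of `values` against sample index (units per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    x_bar, y_bar = (n - 1) / 2, mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def pattern(values):
    """Summarize a memory usage curve as a small feature dict."""
    return {"mean": mean(values), "peak": max(values), "slope": slope(values)}


def is_abnormal(current, baseline, rel_tol=0.3, slope_tol=1.0):
    """Flag `current` if its level or trend deviates too far from `baseline`."""
    cur, base = pattern(current), pattern(baseline)
    # Relative deviation of the level features (baseline values assumed > 0).
    level_dev = max(abs(cur[k] - base[k]) / base[k] for k in ("mean", "peak"))
    # Extra growth per sample compared with the baseline trend.
    trend_dev = cur["slope"] - base["slope"]
    return level_dev > rel_tol or trend_dev > slope_tol
```

Separating level from trend matters here: a curve such as `[500, 540, 580, 620, 660]` stays within 30% of a roughly 505 MB baseline on mean and peak, yet its slope of 40 MB per sample still flags it.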
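Claim 5 derives dynamic thresholds from a machine learning model of normal behavior. The Python sketch below does not implement a machine learning model; it substitutes a deliberately simple statistical baseline (the mean plus or minus `k` population standard deviations over a sliding window of recent samples) to show how a learned band can replace a fixed threshold. The window length of 20 and `k = 3` are assumptions.

```python
from statistics import mean, pstdev


def dynamic_band(history, k=3.0):
    """Return (low, high) bounds learned from recent samples."""
    mu, sigma = mean(history), pstdev(history)
    return mu - k * sigma, mu + k * sigma


def detect(stream, window=20, k=3.0):
    """Return (index, value) pairs that fall outside the band learned
    from the `window` samples immediately preceding each point."""
    anomalies = []
    for i in range(window, len(stream)):
        low, high = dynamic_band(stream[i - window:i], k)
        if not low <= stream[i] <= high:
            anomalies.append((i, stream[i]))
    return anomalies
```

One limitation of this stand-in: once an anomalous point enters the window it widens the band for later points, so a fuller design would likely exclude flagged samples from the learning window.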
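Claim 7 checks the dependent services recorded in the program list when a program instance fails. If the recorded dependencies are treated as a directed graph, a depth-first walk can collect the state of every upstream service for the alarm context, and any service that is not up becomes a root-cause candidate. The Python sketch below is an illustration only; the `deps` and `status` dictionaries and the status strings are hypothetical.

```python
def dependency_context(program, deps, status):
    """Return (service, state) for every direct and transitive dependency of `program`.

    `deps` maps a program or service name to the names it depends on;
    `status` maps a service name to its last observed state.
    """
    seen, report = set(), []
    stack = list(deps.get(program, []))
    while stack:
        svc = stack.pop()
        if svc in seen:
            continue  # guards against shared or cyclic dependencies
        seen.add(svc)
        report.append((svc, status.get(svc, "unknown")))
        stack.extend(deps.get(svc, []))
    return report


def root_cause_candidates(program, deps, status):
    """Dependencies not reported as up, in discovery order."""
    return [svc for svc, state in dependency_context(program, deps, status) if state != "up"]
```

Services with no recorded status are reported as `"unknown"` and therefore also appear as candidates, which errs on the side of including them in the alarm context.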
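Claim 8 routes alarms by severity, source platform, or duty schedule and scales notification repetition with priority. A minimal Python sketch of such a routing table follows. The route entries, channel names, shift hours, receiver names, and repeat intervals are all hypothetical configuration, and severity is assumed to be an integer from 1 (low) to 3 (critical).

```python
from datetime import datetime

# Checked top to bottom: (minimum severity, platform or None for any, channels).
ROUTES = [
    (3, None, ["sms", "im_bot"]),
    (2, "payments", ["email", "im_bot"]),
    (1, None, ["email"]),
]
# Minutes between repeat notifications while an alarm stays open.
REPEAT_MINUTES = {3: 5, 2: 30, 1: 240}


def on_duty(now: datetime) -> str:
    """Hypothetical two-shift schedule: day shift 08:00 to 19:59, night shift otherwise."""
    return "day-oncall" if 8 <= now.hour < 20 else "night-oncall"


def route(alert, now: datetime):
    """Pick channels, receiver and repeat interval for an alert dict with
    'severity' and 'platform' keys; return None if no route matches."""
    for min_sev, platform, channels in ROUTES:
        if alert["severity"] >= min_sev and platform in (None, alert["platform"]):
            return {
                "channels": channels,
                "receiver": on_duty(now),
                # Clamp so severities above 3 reuse the most urgent interval.
                "repeat_every_min": REPEAT_MINUTES[min(alert["severity"], 3)],
            }
    return None
```

Ordering the table from most to least specific means the first match wins, so a critical alarm always reaches SMS regardless of platform, while a medium-severity alarm from the payments platform gets an extra bot channel.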

Description

Comprehensive monitoring alarm system for running state of server

Technical Field

The invention relates to the field of monitoring and alarm systems, and in particular to a comprehensive monitoring and alarm system for the running state of a server.

Background

With the development of computer network technology, the growing number of network devices, servers, middleware and service systems has become difficult for network administrators to manage, so monitoring these devices, and servers in particular, is essential to maintaining a computer network. To obtain server running state information in real time and keep servers running safely and stably, the industry currently relies mainly on dedicated monitoring software to monitor the running state of server systems.

Prior patent 201710025251.9 discloses a server running state monitoring system in which a data acquisition module collects data on the working condition of the server and sends it to a data processing module. The data processing module sets the running time of the data acquisition module by broadcast, uniformly scheduling when that module acquires and sends data, and communicates with it over optical cable, Wi-Fi, or CAN. The data processing module compares the acquired data with preset thresholds and with current data; when the current data exceeds a preset threshold or compares unfavorably with the current data, it raises an alarm, compiles and stores alarm statistics, and, on receiving a request from a client, returns the processed data to the user over HTTP.
Existing server monitoring systems have limited functions: most only apply threshold checks to basic hardware resources such as the CPU, memory and disk, and therefore cannot provide comprehensive, in-depth awareness of the server's running state. A comprehensive monitoring and alarm system for the running state of a server is therefore proposed to solve the problems described in the background.

Disclosure of Invention

To address the defects of the prior art, the invention provides a comprehensive monitoring and alarm system for the running state of a server, comprising a monitoring management module, a program management module, an alarm configuration module, and a data acquisition and processing module.

The monitoring management module is configured to provide a user interface for displaying and managing the information list of one or more monitored servers, the information of a monitored server comprising at least the server IP address, server account information, and the ports used by programs running on the server.

The program management module is communicatively connected to the monitoring management module and is configured to manage the list of programs deployed on the monitored servers, the program list comprising the program name, the Git source code repository to which the program belongs, the branch the program runs from, the server and account on which the program runs, the program's listening port, the program deployment path, the program heartbeat state, the services the program depends on, and the program start script and start order.

The alarm configuration module is either integrated into the monitoring management module or provided independently, and is configured to receive alarm rules customized by the user according to the requirements of the service platform, the alarm rules being associated with the state metrics of the monitored server and/or the running state of the application programs.

The data acquisition and processing module is communicatively connected to both the monitoring management module and the program management module. It is configured to access the corresponding monitored server using the server and account information provided by the monitoring management module, collect system resource state metrics of the monitored server and running state data of the application programs based on the program list provided by the program management module, process the collected state data, evaluate the state against the alarm rules set in the alarm configuration module, and generate and send alarm information when an alarm condition is met.

The data acquisition and processing module comprises a data acquisition unit, a data transmission unit and a data storage unit, wherein the data