
CN-122019312-A - Method, device, computer equipment and medium for processing alarm event in server


Abstract

The application relates to a method, a device, computer equipment, and a medium for processing an alarm event in a server. In response to an alarm system configuration operation, configuration dynamic data corresponding to that operation is issued to a jLogtail collector. The original server logs collected by the jLogtail collector are acquired, converted into standard format events, and pushed to a message queue. An MCP context manager is called to consume the standard format events from the message queue and, taking the core identifier of each standard format event as a clue, to supplement context information and generate complete context information. Based on the complete context information, a dynamic threshold algorithm is combined with the weighted diagnosis result of a diagnostic agent cluster to judge whether a preset alarm triggering condition is met. The scheme as a whole enables efficient and accurate alarm event processing.

Inventors

  • LU HAIYUAN
  • DU CONG

Assignees

  • 湖南长银五八消费金融股份有限公司

Dates

Publication Date
20260512
Application Date
20260129

Claims (10)

  1. A method for processing an alarm event in a server, the method comprising: in response to an alarm system configuration operation, issuing configuration dynamic data corresponding to the alarm system configuration operation to a jLogtail collector; acquiring original logs of the server collected by the jLogtail collector, converting the collected original logs into standard format events, and pushing the standard format events to a message queue; calling an MCP context manager to consume a standard format event from the message queue, and supplementing context information by taking a core identifier corresponding to the standard format event as a clue, to generate complete context information; and judging, based on the complete context information and by combining a dynamic threshold algorithm with a weighted diagnosis result of a diagnostic agent cluster, whether a preset alarm triggering condition is met.
  2. The method of claim 1, wherein issuing the configuration dynamic data corresponding to the alarm system configuration operation to the jLogtail collector in response to the alarm system configuration operation comprises: displaying a visual Web configuration interface; obtaining the configuration dynamic data in response to an alarm system configuration operation on the visual Web configuration interface; persistently storing the configuration dynamic data in a MySQL database; and calling an HTTP interface to push the configuration dynamic data to the jLogtail collector.
  3. The method of claim 1, wherein acquiring the original logs of the server collected by the jLogtail collector, converting the collected original logs into standard format events, and pushing the standard format events to a message queue comprises: acquiring the original logs of the server collected by the jLogtail collector; preprocessing the original logs to obtain preprocessed logs; converting, by an MCP adapter, the preprocessed logs into standard format events including a time ID, a timestamp, a service name, an error type, and core context information; and pushing the standard format events to a Kafka message queue.
  4. The method of claim 1, wherein calling the MCP context manager to consume a standard format event from the message queue, and supplementing context information by taking the core identifier corresponding to the standard format event as a clue, to generate complete context information comprises: calling the MCP context manager to consume a standard format event from the message queue, and extracting a service name, an error type, and core fields from the standard format event as a basic context; querying and aggregating associated data from log, metric, link tracing, and knowledge base systems by taking the core identifier in the standard format event as a clue, to obtain aggregated data; enriching the basic context based on the aggregated data to generate complete context information including the basic context, an enhanced context, and lifecycle information; and initializing and updating the complete context information through preset MCP lifecycle hook functions.
  5. The method of claim 1, wherein judging, based on the complete context information and by combining a dynamic threshold algorithm with a weighted diagnosis result of a diagnostic agent cluster, whether a preset alarm triggering condition is met comprises: acquiring real-time transaction data and historical service data of the server, and calculating a real-time transaction loss and a maximum allowable loss according to the real-time transaction data, the historical service data, and the complete context information; calculating a business influence coefficient according to the real-time transaction loss and the maximum allowable loss, and calculating a dynamic alarm threshold according to the business influence coefficient; extracting an error type field from the complete context information, and determining a target event type of the current event by matching against a built-in event type library with context-assisted verification; calling a plurality of agents in the diagnostic agent cluster to perform collaborative diagnosis according to the target event type, to obtain a plurality of agent diagnosis results; performing weighted aggregation on the plurality of agent diagnosis results to obtain a comprehensive confidence; and judging whether the preset alarm triggering condition is met according to the dynamic alarm threshold and the comprehensive confidence.
  6. The method of claim 1, wherein, after judging whether the preset alarm triggering condition is met based on the complete context information and by combining the dynamic threshold algorithm with the weighted diagnosis result of the diagnostic agent cluster, the method further comprises: if the preset alarm triggering condition is met, generating structured alarm information based on the complete context information; pushing the structured alarm information and continuously monitoring the alarm resolution state; and if the alarm remains unresolved after a timeout, creating and pushing an operation and maintenance work order on a trusted work order platform based on the structured alarm information.
  7. The method of claim 6, wherein, after pushing the structured alarm information and continuously monitoring the alarm resolution state, the method further comprises: if the alarm has been handled, collecting the alarm event ID and associated information of historical alarm events; constructing an initial association graph of different alarm events according to preset association dimensions by taking alarm event IDs as nodes, wherein the preset association dimensions comprise link association, metric association, error type association, and service association; calculating a comprehensive similarity between nodes based on the error type, problem source, and repair measure data corresponding to the nodes of the initial association graph; and assigning weights to the association edges in the initial association graph based on the comprehensive similarity, and removing association edges whose weight is lower than a preset weight threshold, to obtain an updated association graph.
  8. A device for processing an alarm event in a server, the device comprising: a configuration module, used for issuing, in response to an alarm system configuration operation, configuration dynamic data corresponding to the alarm system configuration operation to a jLogtail collector; a log acquisition module, used for acquiring original logs of the server collected by the jLogtail collector, converting the collected original logs into standard format events, and pushing the standard format events to a message queue; a context supplementing module, used for calling an MCP context manager to consume a standard format event from the message queue, and supplementing context information by taking a core identifier corresponding to the standard format event as a clue, to generate complete context information; and an alarm judging module, used for judging, based on the complete context information and by combining a dynamic threshold algorithm with a weighted diagnosis result of a diagnostic agent cluster, whether a preset alarm triggering condition is met.
  9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
  10. A computer readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
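
The alarm decision recited in claim 5 can be illustrated with a minimal Python sketch. The claims do not disclose any formulas, so every formula and constant here is an assumption made for illustration only: the business influence coefficient is taken as the loss ratio clamped to [0, 1], the dynamic alarm threshold falls linearly from a hypothetical `base` to a hypothetical `floor` as the coefficient grows, and the comprehensive confidence is a weighted mean of the agent confidences. The names `AgentDiagnosis`, `business_impact`, `dynamic_threshold`, `aggregate_confidence`, and `should_alert` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AgentDiagnosis:
    agent: str         # hypothetical agent name
    confidence: float  # that agent's confidence in [0, 1]
    weight: float      # assumed non-negative aggregation weight


def business_impact(real_time_loss: float, max_allowed_loss: float) -> float:
    """Assumed form: ratio of real-time loss to maximum allowable loss, clamped to [0, 1]."""
    if max_allowed_loss <= 0:
        raise ValueError("max_allowed_loss must be positive")
    return min(max(real_time_loss / max_allowed_loss, 0.0), 1.0)


def dynamic_threshold(impact: float, base: float = 0.8, floor: float = 0.4) -> float:
    """Assumed form: higher business impact lowers the threshold linearly from base to floor."""
    return base - (base - floor) * impact


def aggregate_confidence(results: Sequence[AgentDiagnosis]) -> float:
    """Weighted mean of agent confidences (one possible weighted aggregation)."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("at least one agent must have a positive weight")
    return sum(r.weight * r.confidence for r in results) / total_weight


def should_alert(results: Sequence[AgentDiagnosis],
                 real_time_loss: float,
                 max_allowed_loss: float) -> bool:
    """Trigger when the aggregated confidence reaches the dynamic threshold."""
    threshold = dynamic_threshold(business_impact(real_time_loss, max_allowed_loss))
    return aggregate_confidence(results) >= threshold
```

Under these assumptions, the same set of agent diagnoses can trigger an alarm while transaction losses are high and stay silent while losses are negligible, which is the behaviour a static threshold cannot provide.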

Description

Method, device, computer equipment and medium for processing alarm event in server

Technical Field

The present application relates to the field of computer technologies, and in particular to a method, an apparatus, a computer device, a storage medium, and a computer program product for processing an alarm event in a server.

Background

In server operation and maintenance, the alarm system is key to keeping servers running stably and to finding and resolving potential problems in time. Traditional server alarm systems mainly adopt the following technical schemes: open source monitoring and alarm tools such as Prometheus Alertmanager trigger an alarm when data reaches a threshold defined by preset threshold rules; commercial alarm management platforms such as PagerDuty provide alarm grouping, scheduling, and escalation policies; log analysis systems such as the ELK Stack trigger alarms based on keyword matching in logs; and collector technologies such as Logstash and Fluentd process log data using a plug-in architecture.

However, these conventional techniques have significant limitations in practice. On the one hand, multi-source data differs in format and origin, so it cannot be described uniformly or transmitted effectively; the data is therefore hard to integrate, and it is hard to form comprehensive and accurate information for alarm judgment. On the other hand, alarm rules are mostly static threshold configurations that lack dynamic adjustment capability: the alarm strategy cannot be adjusted promptly as the actual running state of the server changes, and configuration updates often require manual operation or even a service restart, so response efficiency is low.
More importantly, when facing complex server alarm events, the conventional techniques rely mainly on manual experience to locate the problem. This consumes a great deal of manpower and time and yields low accuracy, so efficient and accurate alarm event processing is difficult to achieve.

Disclosure of Invention

In view of the foregoing, it is desirable to provide an efficient and accurate method, apparatus, computer device, computer readable storage medium, and computer program product for processing an alarm event in a server.

In a first aspect, the present application provides a method for processing an alarm event in a server. The method comprises the following steps: in response to an alarm system configuration operation, issuing configuration dynamic data corresponding to the alarm system configuration operation to a jLogtail collector; acquiring original logs of the server collected by the jLogtail collector, converting the collected original logs into standard format events, and pushing the standard format events to a message queue; calling an MCP (Multi-Context Processing) context manager to consume a standard format event from the message queue, and supplementing context information by taking a core identifier corresponding to the standard format event as a clue, to generate complete context information; and judging, based on the complete context information and by combining a dynamic threshold algorithm with a weighted diagnosis result of a diagnostic agent cluster, whether a preset alarm triggering condition is met.
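
The log-to-event conversion step of the first aspect can be sketched as follows in Python. This is an illustration under stated assumptions, not the disclosed implementation: the raw log layout matched by `RAW_LOG`, the function names `to_standard_event` and `publish`, and the JSON encoding are all hypothetical. The event field names follow the fields recited in claim 3 (time ID, timestamp, service name, error type, core context information), and an in-memory `queue.Queue` stands in for the message queue so that the example needs no broker.

```python
import json
import queue
import re
import uuid
from typing import Optional

# Hypothetical raw log layout (an assumption, not from the source):
#   <timestamp> <service> ERROR <ErrorType> trace_id=<id> <free-text message>
RAW_LOG = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<service>\S+)\s+ERROR\s+"
    r"(?P<error_type>\w+)\s+trace_id=(?P<trace_id>\S+)\s*(?P<message>.*)$"
)


def to_standard_event(raw_line: str) -> Optional[dict]:
    """Convert one raw log line into a standard format event, or None if it does not match."""
    match = RAW_LOG.match(raw_line.strip())
    if match is None:
        return None
    return {
        "id": uuid.uuid4().hex,  # stands in for the "time ID" field; random UUID is an assumption
        "timestamp": match["timestamp"],
        "service_name": match["service"],
        "error_type": match["error_type"],
        # The trace ID plays the role of the core identifier used later as a clue.
        "core_context": {"trace_id": match["trace_id"], "message": match["message"]},
    }


def publish(event: dict, topic: queue.Queue) -> None:
    """Serialize the event as JSON and enqueue it (in-memory stand-in for the message queue)."""
    topic.put(json.dumps(event))
```

A consumer such as the context manager would then read the JSON payload back from the queue and use `core_context["trace_id"]` as the lookup key when gathering further context.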
In one embodiment, issuing the configuration dynamic data corresponding to the alarm system configuration operation to the jLogtail collector in response to the alarm system configuration operation includes: displaying a visual Web configuration interface; obtaining the configuration dynamic data in response to an alarm system configuration operation on the visual Web configuration interface; persistently storing the configuration dynamic data in a MySQL database; and calling an HTTP (HyperText Transfer Protocol) interface to push the configuration dynamic data to the jLogtail collector.

In one embodiment, acquiring the original logs of the server collected by the jLogtail collector, converting the collected original logs into standard format events, and pushing the standard format events to the message queue includes: acquiring the original logs of the server collected by the jLogtail collector; preprocessing the original logs to obtain preprocessed logs; converting, by an MCP adapter, the preprocessed logs into standard format events including a time ID, a timestamp, a service name, an error type, and core context information; and pushing the standard format events to a Kafka message queue.

In one embodiment, calling the MCP context manager to consume a standard format event from the message queue, and supplementing context information by taking the core identifier corresponding to the standard format event as a clue, to generate complete context information includes: calling an MCP context manager to consume a standard format event from the m