CN-122001738-A - Distributed self-adaptive node management method, system and equipment
Abstract
The invention relates to the field of intelligent operation and maintenance, and in particular to a distributed self-adaptive node management method, system and equipment. The method comprises: updating a server list according to service heartbeat information received from other monitoring servers; updating an other-managed node list according to node management messages broadcast by the other monitoring servers; updating a master node list according to registration information and node heartbeat information sent by monitored nodes; broadcasting node management messages, according to the master node list, to each other monitoring server in the server list; and, in response to received node list query information, returning the other-managed node list and the master node list. The invention enhances the system's ability to respond autonomously to global state queries, achieves seamless migration of monitoring responsibilities and consistent maintenance of the global state, avoids monitoring blind areas and overlaps, and improves the robustness and the operation and maintenance efficiency of the whole monitoring system in a dynamic environment.
Inventors
- ZHANG YUKUN
- FAN RONGHAI
Assignees
- 云尖信息技术股份有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-19
Claims (10)
- 1. A distributed adaptive node management method, deployed at a monitoring server, the method comprising: updating a server list according to received service heartbeat information sent by other monitoring servers, wherein the server list comprises a server identifier of at least one other monitoring server; updating an other-managed node list according to received node management messages broadcast by the other monitoring servers, wherein the other-managed node list comprises the server identifier of at least one other monitoring server and the node identifiers corresponding to each server identifier; updating a master node list according to received registration information and node heartbeat information sent by monitored nodes, wherein the master node list comprises node identifiers, and the monitored node corresponding to each node identifier is registered with the monitoring server; broadcasting node management messages, according to the master node list, to each other monitoring server in the server list; and, in response to received node list query information, returning the other-managed node list and the master node list.
- 2. The method according to claim 1, wherein the step of updating the server list according to the received service heartbeat information sent by the other monitoring servers comprises: receiving service heartbeat information broadcast by another monitoring server; extracting the server identifier of that monitoring server from the service heartbeat information to obtain a target server identifier; taking the receiving time of the service heartbeat information as the service heartbeat time of that monitoring server; traversing the server list to judge whether it contains the target server identifier, and if it does not, adding the target server identifier and the service heartbeat time to the server list; and traversing the server list to find each service heartbeat time that lies further from the current time than a preset service timeout threshold, and deleting the corresponding server identifier from the server list.
- 3. The method according to claim 1, wherein the node management message comprises the server identifier of the other monitoring server, at least one node identifier to be processed, and a node update type corresponding to the node identifier to be processed; and the step of updating the other-managed node list according to the received node management messages broadcast by the other monitoring servers comprises: judging whether the node update type is adding a node or deleting a node; if the node update type is adding a node, adding the node identifier to be processed to the node identifiers corresponding to the server identifier in the other-managed node list; and if the node update type is deleting a node, deleting the node identifier to be processed from the node identifiers corresponding to the server identifier in the other-managed node list.
- 4. The method according to claim 3, wherein the step of adding the node identifier to be processed to the node identifiers corresponding to the server identifier in the other-managed node list comprises: judging, according to the other-managed node list and the master node list, whether the node management message is the latest add-node message for the node identifier to be processed; and if so, deleting the node identifier to be processed from the other-managed node list and the master node list, and then adding the node identifier to be processed to the node identifiers corresponding to the server identifier in the other-managed node list.
- 5. The method according to claim 4, wherein each node identifier in the node management message further corresponds to a node registration time based on a global timing service; and the step of judging, according to the other-managed node list and the master node list, whether the node management message is the latest add-node message for the node identifier to be processed comprises: traversing the other-managed node list and the master node list, and looking up the node registration time recorded for the node identifier to be processed as a registered time; if the registered time is found and is later than the node registration time carried in the node management message, judging that the node management message is not the latest; and otherwise, judging that the node management message is the latest.
- 6. The method according to claim 1, wherein the step of updating the master node list according to the received registration information sent by the monitored node comprises: receiving registration information sent by a monitored node, wherein the registration information comprises hardware information and a node identifier of the monitored node; taking the current time as the node registration time; and adding the node identifier and the node registration time to the master node list.
- 7. The method according to claim 1, wherein the step of updating the master node list according to the received node heartbeat information sent by the monitored node comprises: receiving node heartbeat information sent by a monitored node, wherein the node heartbeat information comprises a node identifier; taking the receiving time of the node heartbeat information as the node heartbeat time of the monitored node; updating the last heartbeat time corresponding to the node identifier in the master node list according to the node heartbeat time; and traversing the master node list to find each last heartbeat time that lies further from the current time than a preset node timeout threshold, and deleting the corresponding node identifier from the master node list.
- 8. The method according to claim 1, wherein the method further comprises: in response to received node control information, wherein the node control information comprises a node identifier to be controlled, determining, according to the other-managed node list and the master node list, the monitoring server corresponding to the node identifier to be controlled as a target monitoring server; and determining whether the target monitoring server is the monitoring server itself: if so, processing the node control information; otherwise, sending the node control information to the target monitoring server.
- 9. A distributed adaptive node management system, comprising: a first updating module, configured to update a server list according to received service heartbeat information sent by other monitoring servers, wherein the server list comprises a server identifier of at least one other monitoring server; a second updating module, configured to update an other-managed node list according to received node management messages broadcast by the other monitoring servers, wherein the other-managed node list comprises the server identifier of at least one other monitoring server and the node identifiers corresponding to each server identifier; a third updating module, configured to update a master node list according to received registration information and node heartbeat information sent by monitored nodes, wherein the master node list comprises node identifiers, and the monitored node corresponding to each node identifier is registered with the monitoring server; a broadcasting module, configured to broadcast node management messages, according to the master node list, to each other monitoring server in the server list; and a response module, configured to return the other-managed node list and the master node list in response to received node list query information.
- 10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the distributed adaptive node management method according to any one of claims 1 to 8.
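The list maintenance recited in claims 2 to 8 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class name `MonitoringServer`, its field and method names, the `"add"`/`"delete"` encodings, the timeout defaults and the use of plain floats as timestamps are all assumptions made for illustration, and network transport and broadcasting are left out.

```python
from dataclasses import dataclass, field

ADD, DELETE = "add", "delete"  # hypothetical encodings of the node update type


@dataclass
class MonitoringServer:
    server_id: str
    service_timeout: float = 30.0  # assumed value; the claims only say "preset"
    node_timeout: float = 30.0     # assumed value; the claims only say "preset"
    servers: dict = field(default_factory=dict)       # server id -> last service heartbeat time
    master_nodes: dict = field(default_factory=dict)  # node id -> [registration time, last heartbeat time]
    other_nodes: dict = field(default_factory=dict)   # server id -> {node id: registration time}

    def on_service_heartbeat(self, sender_id, now):
        # Claim 2: record the sender, then evict servers whose heartbeat is stale.
        # Refreshing the time of an already known server is an assumption;
        # the claim only spells out the "not yet in the list" case.
        self.servers[sender_id] = now
        for sid in [s for s, t in self.servers.items() if now - t > self.service_timeout]:
            del self.servers[sid]

    def register_node(self, node_id, now):
        # Claim 6: the registration time is the time the registration is received.
        self.master_nodes[node_id] = [now, now]

    def on_node_heartbeat(self, node_id, now):
        # Claim 7: refresh the node's last heartbeat, then evict timed-out nodes.
        if node_id in self.master_nodes:
            self.master_nodes[node_id][1] = now
        for nid in [n for n, (_, hb) in self.master_nodes.items() if now - hb > self.node_timeout]:
            del self.master_nodes[nid]

    def _registered_time(self, node_id):
        # Claim 5: look the node up in the master list and the other-managed list.
        if node_id in self.master_nodes:
            return self.master_nodes[node_id][0]
        for nodes in self.other_nodes.values():
            if node_id in nodes:
                return nodes[node_id]
        return None

    def on_node_management(self, sender_id, node_id, update_type, reg_time):
        # Claims 3 to 5: apply an add or delete broadcast by another server.
        if update_type == DELETE:
            self.other_nodes.get(sender_id, {}).pop(node_id, None)
            return
        known = self._registered_time(node_id)
        if known is not None and known > reg_time:
            return  # a later registration is already recorded: message is not the latest
        self.master_nodes.pop(node_id, None)
        for nodes in self.other_nodes.values():
            nodes.pop(node_id, None)
        self.other_nodes.setdefault(sender_id, {})[node_id] = reg_time

    def query(self):
        # Claim 1: answer a node list query with both lists.
        return self.other_nodes, self.master_nodes

    def owner_of(self, node_id):
        # Claim 8: find the server a control request should be handled by.
        if node_id in self.master_nodes:
            return self.server_id
        for sid, nodes in self.other_nodes.items():
            if node_id in nodes:
                return sid
        return None


a = MonitoringServer("A")
a.register_node("n1", now=100.0)
a.on_node_management("B", "n1", ADD, reg_time=120.0)  # n1 re-registered with B later
a.on_node_management("C", "n1", ADD, reg_time=110.0)  # older registration, ignored
print(a.owner_of("n1"))  # B
```

Comparing registration times before accepting an add message is what lets every server converge on a single owner per node even when broadcasts arrive out of order; the comparison is only meaningful if the timestamps come from a shared clock, which is the role of the global timing service in claim 5.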
Description
Distributed self-adaptive node management method, system and equipment
Technical Field
The invention relates to the field of intelligent operation and maintenance, and in particular to a distributed self-adaptive node management method, system and equipment.
Background
In the current information age, large-scale distributed systems have become the infrastructure supporting internet services, cloud computing platforms and enterprise core applications. Such systems contain a large number of monitored nodes and monitoring servers. A monitored node is a physical server, virtual machine or intelligent hardware device that runs specific business logic or provides specific services, and is the unit that actually carries the system's workload. A monitoring server is a core management entity dedicated to collecting, processing and analyzing the state data of the monitored nodes. Centralized monitoring of these scattered and numerous monitored nodes helps operation and maintenance personnel quickly locate performance bottlenecks and analyze the root causes of faults, provides data support for capacity planning and resource optimization, and enables advanced management functions such as automated operation and maintenance and elastic scaling; it is a core link in building a stable, efficient and intelligent modern operation and maintenance system.
Currently, the distributed node monitoring and management methods common in the industry generally adopt a centralized or hierarchical static architecture. In this mode, the server instances in a monitoring server cluster typically work independently, or divide monitoring responsibilities through a pre-configured, fixed master-slave relationship. Each monitoring server is responsible only for the monitored nodes that are pre-allocated to it or registered with it, and maintains state information only for the nodes it manages.
However, in such a static architecture each monitoring server has only a partial view. When the operation and maintenance platform needs to obtain the state of nodes across the whole network, or to perform load balancing or fault root cause analysis across server domains, either a centralized coordination component must be introduced or complex multi-round query aggregation between servers must be initiated. This not only introduces the risk of single points of failure and performance bottlenecks, but also significantly increases system complexity and response delay. Moreover, when the monitoring server cluster itself changes dynamically or a network partition occurs, statically configured monitoring responsibilities cannot be migrated and rebalanced automatically, which causes monitoring blind areas or overlaps and seriously reduces the robustness and the operation and maintenance efficiency of the whole monitoring system.
Disclosure of Invention
In order to solve the above problems, the invention provides a distributed self-adaptive node management method, system and equipment.
The first aspect of the present invention discloses a distributed adaptive node management method, which is deployed at a monitoring server and comprises: updating a server list according to received service heartbeat information sent by other monitoring servers, wherein the server list comprises a server identifier of at least one other monitoring server; updating an other-managed node list according to received node management messages broadcast by the other monitoring servers, wherein the other-managed node list comprises the server identifier of at least one other monitoring server and the node identifiers corresponding to each server identifier; updating a master node list according to received registration information and node heartbeat information sent by monitored nodes, wherein the master node list comprises node identifiers, and the monitored node corresponding to each node identifier is registered with the monitoring server; broadcasting node management messages, according to the master node list, to each other monitoring server in the server list; and, in response to received node list query information, returning the other-managed node list and the master node list.
Further, the step of updating the server list according to the received service heartbeat information sent by the other monitoring servers comprises: receiving service heartbeat information broadcast by another monitoring server; extracting the server identifier of that monitoring server from the service heartbeat information to obtain a target server identifier; taking the receiving time of the service heartbeat information as the service heartbeat time of that monitoring server; traversing the server list to judge whether it contains the target server identifier, and if it does not, adding the target serve