CN-122019229-A - Smart park big data cluster memory leakage checking method, system, equipment and medium

Abstract

The invention relates to a method, system, device, and medium for troubleshooting memory leaks in a smart park big data cluster. The method comprises: screening abnormal nodes; cross-verifying the memory usage of each abnormal node with a host monitoring tool and operating system commands to determine whether the node has a system-level or an application-level memory anomaly; for a node with an application-level memory anomaly, looking up the corresponding application identifiers from the identifiers of the processes with high memory usage; and performing space-time correlation analysis on the applications so identified. Through multi-source data cross-validation and hierarchical, progressive analysis, the method, system, device, and medium locate memory leaks accurately and allow them to be handled efficiently.
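The screening and cross-verification steps can be illustrated with a minimal Python sketch. It rests on several assumptions that the patent does not state: an 85% threshold (the claims give a range of 80%-90%), a least-squares slope as the measure of sustained growth (the claims name linear regression but give no slope cutoff), and a 20% unexplained-memory ratio as the meaning of "significantly greater". The function names and the parameters `min_slope` and `gap_ratio` are hypothetical.

```python
from typing import Sequence


def slope_per_hour(times_h: Sequence[float], usage_pct: Sequence[float]) -> float:
    """Least-squares slope of memory usage (percent) against time (hours)."""
    n = len(times_h)
    if n < 2 or n != len(usage_pct):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times_h) / n
    mean_u = sum(usage_pct) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("timestamps must not all be equal")
    sxy = sum((t - mean_t) * (u - mean_u) for t, u in zip(times_h, usage_pct))
    return sxy / sxx


def is_abnormal_node(
    times_h: Sequence[float],
    usage_pct: Sequence[float],
    threshold_pct: float = 85.0,  # assumed value inside the claimed 80%-90% range
    min_slope: float = 0.5,       # assumed cutoff, percent per hour
) -> bool:
    """Screening: latest usage above the threshold AND an upward trend."""
    above_threshold = usage_pct[-1] > threshold_pct
    growing = slope_per_hour(times_h, usage_pct) >= min_slope
    return above_threshold and growing


def classify_anomaly(
    host_used_mib: float,
    rss_sum_mib: float,
    gap_ratio: float = 0.2,  # assumed meaning of "significantly greater"
) -> str:
    """Cross-verification: compare host-level used memory with the sum of process RSS."""
    if host_used_mib <= 0:
        raise ValueError("host_used_mib must be positive")
    unexplained = (host_used_mib - rss_sum_mib) / host_used_mib
    return "system-level" if unexplained > gap_ratio else "application-level"
```

For example, readings of 80, 82, 84, and 86 percent at hours 0 to 3 give a slope of 2.0 percent per hour with the latest reading above 85%, so the node would be flagged. Because the resident set size of each process counts shared pages, the sum over processes can overstate application memory, so any gap ratio would need tuning per cluster.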

Inventors

  • Huang Lingdi

Assignees

  • 上海张江智荟科技有限公司

Dates

Publication Date
20260512
Application Date
20260115

Claims (9)

  1. A method for troubleshooting memory leaks in a smart park big data cluster, characterized by comprising the following steps: step S1, screening out from a cluster monitoring system, as abnormal nodes, the nodes whose system memory utilization exceeds a preset threshold and whose system memory utilization curve shows a sustained growth trend; step S2, for each abnormal node, cross-verifying its memory usage through a host monitoring tool and operating system commands to determine whether the node has a system-level memory anomaly or an application-level memory anomaly; step S3, for an abnormal node with an application-level memory anomaly, obtaining, based on operating system commands, the identifiers of the processes with high memory usage, and looking up the corresponding application identifiers from those process identifiers; and step S4, performing space-time correlation analysis on the applications corresponding to the application identifiers found, the analysis comprising checking whether the application's run log shows anomalies and whether the application's run period coincides with the point in time at which the abnormal node's memory began to grow abnormally, thereby locating the application causing the memory leak.
  2. The method for troubleshooting memory leaks in a smart park big data cluster according to claim 1, wherein the preset threshold for a node's system memory utilization is 80%-90%; the node's memory utilization curve is quantified by linear regression over a given time window; and, when the cluster is a YARN cluster, a node in the YARN cluster is defined as an abnormal node when its YARN-layer memory utilization exceeds a preset threshold, its system memory utilization exceeds the preset threshold, and its system memory utilization curve shows a sustained growth trend.
  3. The method for troubleshooting memory leaks in a smart park big data cluster according to claim 1, wherein the cross-verification in step S2 comprises: comparing the host's total memory usage collected by the monitoring system with the sum of the resident memory of all processes computed via the operating system command top; and, when the host's total memory usage is significantly greater than the sum of the resident memory of all processes, determining that the anomaly on the abnormal node is caused by a memory management problem in the system kernel layer.
  4. The method for troubleshooting memory leaks in a smart park big data cluster according to claim 1, wherein in step S3 the application identifier is looked up from the process identifier by: querying the command-line information corresponding to the process identifier, and parsing from that command-line information the application identifier assigned by the framework.
  5. The method for troubleshooting memory leaks in a smart park big data cluster according to claim 1, wherein the space-time correlation analysis in step S4 comprises time correlation and space correlation; the time correlation analyzes whether the application's run period coincides with the point in time at which the node's memory began to grow abnormally, and the space correlation checks whether the application's run log contains memory-related anomaly information.
  6. The method for troubleshooting memory leaks in a smart park big data cluster according to claim 1, further comprising: repeating steps S1-S4 for a plurality of abnormal nodes of the cluster, screening out the common applications that are present on the plurality of nodes at the same time, and treating those common applications as key suspects.
  7. A system for troubleshooting memory leaks in a smart park big data cluster, used to implement the method for troubleshooting memory leaks in a smart park big data cluster according to any one of claims 1 to 6, characterized by comprising: a node screening module, configured to screen out from a cluster monitoring system, as abnormal nodes, the nodes whose system memory utilization exceeds a preset threshold and whose system memory utilization curve shows a sustained growth trend; a cross-verification module, configured to cross-verify, through a host monitoring tool and operating system commands, the memory usage of the nodes screened out by the node screening module, so as to determine whether each abnormal node has a system-level memory anomaly or an application-level memory anomaly; a process-application association module, configured to obtain, based on operating system commands, the identifiers of the processes with high memory usage on an abnormal node with an application-level memory anomaly, and to look up the corresponding application identifiers from those process identifiers; and a space-time analysis engine, configured to perform space-time correlation analysis on the applications corresponding to the application identifiers found by the process-application association module, the analysis comprising checking whether the application's run log shows anomalies and whether the application's run period coincides with the point in time at which the abnormal node's memory began to grow abnormally, thereby locating the application causing the memory leak.
  8. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method for locating clustered memory exception operations according to any one of claims 1-6.
  9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the method for troubleshooting memory leaks in a smart park big data cluster according to any one of claims 1-6.
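The lookup in claim 4, the time correlation in claim 5, and the cross-node screening in claim 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: it assumes a Linux host, where `/proc/<pid>/cmdline` holds the NUL-separated command line, and a YARN cluster whose container command lines contain an `application_<timestamp>_<sequence>` or `container_..._<timestamp>_<sequence>_...` token. All function names and the sample identifiers are hypothetical.

```python
import re
from typing import Dict, Optional, Set

# A YARN container ID embeds the cluster timestamp and application sequence number,
# e.g. container_e17_1700000000000_0042_01_000003 -> application_1700000000000_0042.
_APP_RE = re.compile(r"application_(\d+)_(\d+)")
_CONTAINER_RE = re.compile(r"container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+")


def app_id_from_cmdline(cmdline: str) -> Optional[str]:
    """Parse the framework-assigned application ID out of a process command line."""
    match = _APP_RE.search(cmdline) or _CONTAINER_RE.search(cmdline)
    if match is None:
        return None
    return f"application_{match.group(1)}_{match.group(2)}"


def read_cmdline(pid: int) -> str:
    """Read /proc/<pid>/cmdline (Linux only); arguments are NUL-separated."""
    with open(f"/proc/{pid}/cmdline", "rb") as fh:
        raw = fh.read()
    return raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()


def overlaps_growth_onset(app_start: float, app_end: float, growth_onset: float) -> bool:
    """Time correlation: was the application running when memory growth began?"""
    return app_start <= growth_onset <= app_end


def common_suspects(apps_by_node: Dict[str, Set[str]]) -> Set[str]:
    """Cross-node screening: applications present on every abnormal node."""
    app_sets = list(apps_by_node.values())
    return set.intersection(*app_sets) if app_sets else set()
```

In use, `app_id_from_cmdline(read_cmdline(pid))` would be called for each high-memory PID reported on an abnormal node, and the per-node results would be passed to `common_suspects`. Requiring an application to appear on every abnormal node is a strict reading of "common"; a count-based ranking would be a looser alternative.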

Description

Smart park big data cluster memory leakage checking method, system, equipment and medium

Technical Field

The invention belongs to the technical field of computers, and in particular relates to a method, system, device, and storage medium for troubleshooting memory leaks in a smart park big data cluster.

Background

With the rapid development of new-generation information technologies such as the Internet of Things and cloud computing, building smart parks has become an important way to improve a park's management efficiency, service level, and ability to attract industry. A smart park achieves comprehensive sensing and real-time monitoring of the people, vehicles, objects, facilities, and environment in the park by deploying large numbers of sensors, monitoring devices, and management systems. This process produces massive, multi-source, heterogeneous real-time data that constitutes the "blood" of the park's intelligent operation.

Extracting value from this data depends heavily on modern big data technology. By building a big data platform that collects, stores, cleans, integrates, and analyzes park operation data, smart applications such as energy optimization, security early warning, traffic guidance, and predictive facility maintenance can be realized, ultimately giving park managers strong data support for evidence-based decisions. The big data platform is thus the "smart brain" and core infrastructure of the smart park.

However, with the explosive growth of data volume and the increasing complexity of data processing in parks, the underlying big data infrastructure often adopts a distributed cluster architecture (e.g., Hadoop, Spark) to meet high-concurrency, high-throughput computing and storage requirements.
Under such an architecture, the stability and reliability of the system depend directly on the health of the individual nodes in the cluster. In actual operation, especially in large-scale, high-load smart park scenarios, big data clusters frequently suffer from nodes dropping offline or losing their heartbeat. Frequent node drops can cause some computing tasks to fail, interrupt data reads and writes, and sharply degrade cluster performance; they can even trigger an avalanche across the whole data processing pipeline, seriously threatening the continuity and reliability of every application in the smart park. How to effectively diagnose, give early warning of, and resolve frequent node drops in big data clusters has therefore become a key technical challenge in keeping a smart park running stably and efficiently.

Disclosure of Invention

The invention aims to solve the prior-art problem of nodes in a big data cluster frequently dropping offline or losing their heartbeat, and provides a method, system, device, and medium for troubleshooting memory leaks in a smart park big data cluster, which locate memory leaks accurately and allow them to be handled efficiently through multi-source data cross-validation and hierarchical, progressive analysis. To solve the above problems, the invention adopts the following technical solution.
The method for troubleshooting memory leaks in a smart park big data cluster comprises the following steps:

Step S1, screening out from a cluster monitoring system, as abnormal nodes, the nodes whose system memory utilization exceeds a preset threshold and whose system memory utilization curve shows a sustained growth trend;

Step S2, for each abnormal node, cross-verifying its memory usage through a host monitoring tool and operating system commands to determine whether the node has a system-level memory anomaly or an application-level memory anomaly;

Step S3, for an abnormal node with an application-level memory anomaly, obtaining, based on operating system commands, the identifiers of the processes with high memory usage, and looking up the corresponding application identifiers from those process identifiers;

Step S4, performing space-time correlation analysis on the applications corresponding to the application identifiers found, the analysis comprising checking whether the application's run log shows anomalies and whether the application's run period coincides with the point in time at which the abnormal node's memory began to grow abnormally, thereby locating the application causing the memory leak.

In this method, the preset threshold for a node's system memory utilization is 80%-90%, the memory utilization curve of the node is quantized through linear regression w