CN-121996524-A - System health degree assessment method and device and electronic equipment
Abstract
The embodiments of the application relate to the technical field of operation and maintenance monitoring, and in particular to a system health evaluation method and device and electronic equipment. The method comprises: collecting infrastructure performance data, application performance data and business index data of a target system as original data; performing standardization processing on the original data and arranging the data by the technical index-business index dimension to generate standardized data; invoking a business influence quantization model to calculate the influence coefficient of each technical index in the standardized data on each business index, and aggregating the coefficients into an influence matrix; assigning a scoring weight to each technical index according to the influence matrix; calculating a single index score for each technical index based on real-time technical index data in the standardized data; and performing a weighted calculation on the single index scores according to the scoring weights to generate a system health score. The method can accurately quantify the actual degree to which technical anomalies affect core business, moving intelligent operation and maintenance from purely technical monitoring to business value guarantee.
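The scoring flow in the abstract can be illustrated with a minimal sketch. The abstract does not state how weights are derived from the influence matrix or how a single index score is computed, so the choices here are assumptions made for illustration only: weights are taken as each technical index's share of the total absolute influence, and single index scores are a linear 0 to 100 mapping between a hypothetical healthy bound and a hypothetical critical bound. All names and sample values (`scoring_weights`, `single_index_score`, `health_score`, `cpu_utilization`, `api_error_rate`) are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical influence matrix: technical index -> {business index: coefficient}.
InfluenceMatrix = Dict[str, Dict[str, float]]


def scoring_weights(matrix: InfluenceMatrix) -> Dict[str, float]:
    """Assumed rule: weight = share of total absolute influence on all business indexes."""
    totals = {tech: sum(abs(c) for c in row.values()) for tech, row in matrix.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No measurable influence anywhere: fall back to equal weights.
        return {tech: 1.0 / len(totals) for tech in totals}
    return {tech: total / grand_total for tech, total in totals.items()}


def single_index_score(value: float, healthy: float, critical: float) -> float:
    """Assumed rule: 100 at (or better than) the healthy bound, 0 at (or worse than)
    the critical bound, linear in between."""
    if healthy == critical:
        raise ValueError("healthy and critical bounds must differ")
    fraction = (value - critical) / (healthy - critical)
    return 100.0 * min(1.0, max(0.0, fraction))


def health_score(
    matrix: InfluenceMatrix,
    realtime: Dict[str, float],
    bounds: Dict[str, Tuple[float, float]],
) -> float:
    """Weighted sum of single index scores, using the influence-derived weights."""
    weights = scoring_weights(matrix)
    return sum(
        weights[tech] * single_index_score(realtime[tech], *bounds[tech])
        for tech in weights
    )


# Illustrative data only; none of these numbers come from the application.
matrix = {
    "cpu_utilization": {"transaction_success_rate": -0.25},
    "api_error_rate": {"transaction_success_rate": -0.75},
}
realtime = {"cpu_utilization": 70.0, "api_error_rate": 0.01}
bounds = {"cpu_utilization": (50.0, 90.0), "api_error_rate": (0.0, 0.05)}  # (healthy, critical)

print(round(health_score(matrix, realtime, bounds), 1))
```

Because the weights come from the influence matrix rather than from fixed expert values, an index that affects the business indexes more strongly dominates the score in this sketch.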
Inventors
- ZHANG ZENGJUN
- CHEN CUNLI
- ZHANG WEIJIAN
Assignees
- 度小满云智科技(北京)有限公司
- 度小满科技(北京)有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-12-25
Claims (10)
- 1. A method for evaluating system health, comprising: collecting multi-source performance data of a target system as original data, wherein the multi-source performance data comprises infrastructure performance data, application performance data and business index data; performing standardization processing on the original data and then arranging the data by the technical index-business index dimension to generate standardized data, wherein the standardization processing comprises format standardization and time dimension alignment; invoking a business influence quantization model to calculate an influence coefficient of each technical index in the standardized data on each business index, and aggregating the influence coefficients into an influence matrix; assigning a scoring weight to each technical index according to the influence matrix; and calculating a single index score for each technical index based on real-time technical index data in the standardized data, performing a weighted calculation on the single index scores according to the scoring weights, and taking the result of the weighted calculation as a system health score.
- 2. The method for evaluating system health according to claim 1, wherein the business influence quantization model is generated by: invoking a machine learning algorithm to extract historical standardized data from a data lake and analyze the association between technical index fluctuations and business index changes; defining, based on the association, calculation logic relating technical index variation to business index variation, and constructing an initial model; storing real-time standardized data in the data lake; and periodically extracting real-time incremental data from the data lake, updating model parameters of the initial model according to the real-time incremental data, and taking the initial model with the optimized parameters as the business influence quantization model.
- 3. The method for evaluating system health according to claim 1, wherein performing standardization processing on the original data and then arranging the data by the technical index-business index dimension comprises: removing invalid data from the original data and filtering out noise data to obtain preprocessed data; converting the preprocessed data into a preset standardized format and sorting it by time to generate time series data; and associating technical index data with business index data that fall within the same time window in the time series data, and storing them, classified by the technical index-business index association dimension, as the standardized data.
- 4. The method of claim 1, wherein collecting the multi-source performance data of the target system comprises: collecting server CPU utilization, memory utilization, network throughput, database query latency and middleware service state of an infrastructure layer and a technical component layer as the infrastructure performance data; collecting response time, error rate and call frequency of key interfaces of an application layer as the application performance data; and collecting transaction success rate, order flow, payment latency and user activity of a business system as the business index data.
- 5. The method according to any one of claims 1 to 4, further comprising, after the weighted calculation on the single index scores according to the scoring weights: taking the result of the weighted calculation as a preliminary health score; acquiring business index data from the standardized data; calculating a deviation direction and a deviation degree of the business index data relative to a corresponding normal threshold; and adjusting the preliminary health score according to the deviation direction and the deviation degree; wherein taking the result of the weighted calculation as the system health score specifically comprises taking the adjusted score as the system health score.
- 6. The method of claim 5, wherein adjusting the preliminary health score according to the deviation direction and the deviation degree comprises: if the deviation direction indicates that the business index data is lower than the corresponding normal threshold, adjusting the preliminary health score downward in proportion to the deviation degree; and if the deviation direction indicates that the business index data is higher than the corresponding normal threshold, adjusting the preliminary health score upward in proportion to the deviation degree.
- 7. The method of claim 5, further comprising, after adjusting the preliminary health score according to the deviation direction and the deviation degree: tracing back, based on the influence matrix, to a core technical index that caused the preliminary health score to change; extracting an influence coefficient and an influence direction corresponding to the core technical index, wherein the influence direction is determined by the sign of the influence coefficient; estimating the business index variation that results when the core technical index is abnormal, to obtain an estimated business index variation; and collecting the name, influence coefficient, influence direction and estimated business index variation corresponding to each core technical index, and generating an influence factor labeling report.
- 8. The method of claim 7, wherein tracing back, based on the influence matrix, to the core technical index that caused the preliminary health score to change comprises: screening out, as primary selection indexes, the technical indexes in the influence matrix whose influence coefficients have absolute values greater than a core index threshold; sorting the primary selection indexes in descending order of the absolute values of their influence coefficients and selecting the first N indexes, wherein N is a preset positive integer; determining linkage influence paths among the technical indexes, and determining, according to the linkage influence paths, all technical indexes associated with the first N indexes; and taking the first N indexes and their associated technical indexes as the core technical indexes.
- 9. A device for evaluating system health, comprising: a multi-source acquisition unit configured to collect multi-source performance data of a target system as original data, wherein the multi-source performance data comprises infrastructure performance data, application performance data and business index data; a data processing unit configured to perform standardization processing on the original data and then arrange the data by the technical index-business index dimension to generate standardized data, wherein the standardization processing comprises format standardization and time dimension alignment; a technical-business influence assessment unit configured to invoke a business influence quantization model to calculate an influence coefficient of each technical index in the standardized data on each business index, and to aggregate the influence coefficients into an influence matrix; a weight distribution unit configured to assign a scoring weight to each technical index according to the influence matrix; and a scoring calculation unit configured to calculate a single index score for each technical index based on real-time technical index data in the standardized data, perform a weighted calculation on the single index scores according to the scoring weights, and take the result of the weighted calculation as a system health score.
- 10. An electronic device, comprising at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for evaluating system health according to any one of claims 1 to 8.
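The score adjustment of claims 5 and 6 and the core index selection of claim 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the claims do not fix how the deviation degree is converted into a score change, so a hypothetical `sensitivity` factor applied to the relative deviation is assumed; the strength of a technical index is assumed to be its largest absolute coefficient across business indexes; and the linkage influence paths are assumed to be available as a simple one-hop adjacency mapping. All function and index names are hypothetical.

```python
from typing import Dict, List

# Hypothetical influence matrix: technical index -> {business index: coefficient}.
InfluenceMatrix = Dict[str, Dict[str, float]]


def adjust_score(
    preliminary: float,
    business_value: float,
    normal_threshold: float,
    sensitivity: float = 100.0,  # assumed scaling factor, not given in the claims
) -> float:
    """Claims 5-6: move the preliminary score down when the business index is below
    its normal threshold and up when it is above, in proportion to the deviation."""
    if normal_threshold == 0:
        raise ValueError("normal_threshold must be non-zero for a relative deviation")
    # The sign carries the deviation direction, the magnitude the deviation degree.
    deviation = (business_value - normal_threshold) / abs(normal_threshold)
    return min(100.0, max(0.0, preliminary + sensitivity * deviation))


def core_indexes(
    matrix: InfluenceMatrix,
    core_threshold: float,
    top_n: int,
    linkage: Dict[str, List[str]],
) -> List[str]:
    """Claim 8: threshold filter, descending sort, top N, then linked indexes."""
    # Assumed strength measure: largest absolute coefficient of each technical index.
    strength = {
        tech: max(abs(c) for c in row.values())
        for tech, row in matrix.items()
        if row
    }
    primary = [tech for tech, s in strength.items() if s > core_threshold]
    primary.sort(key=lambda tech: strength[tech], reverse=True)
    top = primary[:top_n]

    core = list(top)
    for tech in top:
        # Assumed one-hop linkage: add every index directly associated with a top index.
        for linked in linkage.get(tech, []):
            if linked not in core:
                core.append(linked)
    return core


# Illustrative data only; none of these numbers come from the application.
example_matrix = {
    "db_query_time": {"transaction_success_rate": -0.8},
    "cpu_utilization": {"transaction_success_rate": -0.3},
    "memory_utilization": {"transaction_success_rate": -0.05},
    "network_throughput": {"transaction_success_rate": 0.5},
}
example_linkage = {"db_query_time": ["cpu_utilization"]}

print(adjust_score(80.0, business_value=0.95, normal_threshold=0.99))
print(core_indexes(example_matrix, core_threshold=0.2, top_n=2, linkage=example_linkage))
```

In this sketch the sign of each selected coefficient would also supply the influence direction required by claim 7, since a negative coefficient means the business index falls as the technical index rises.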
Description
System health degree assessment method and device and electronic equipment
Technical Field
The embodiments of the application relate to the technical field of operation and maintenance monitoring, and in particular to a system health evaluation method and device and electronic equipment.
Background
With the rapid development of the digital economy, online financial business has become a core pillar of economic activity owing to its high transaction frequency, real-time service and diverse scenarios. The stability and reliability of system operation directly affect transaction success rate, user experience and core enterprise revenue, so system health assessment has become a key link in the field of operation and maintenance monitoring. Operation and maintenance work is currently shifting from traditional technical fault repair to business value guarantee. The core requirement is that the health score accurately reflect the actual impact of system anomalies on core business, so as to give clear priority guidance for operation and maintenance decisions and to avoid resource misallocation or loss of key business caused by the disconnect between technology and business.
In the prior art, system health evaluation mainly relies on three schemes. The first is rule-based scoring with static thresholds: fixed threshold intervals are preset for technical indexes such as CPU utilization and memory utilization, a score is assigned according to the interval in which each index falls, and a comprehensive score is obtained by weighting. The second is a scorecard model with fixed weights: experts preset the weight of each index, the health score is obtained by linear weighting, and the weights remain unchanged while the system is running. The third is a machine learning model: a model is trained on historical technical monitoring data to perform anomaly detection and scoring.
The prior art has the following obvious defects. First, the technical perspective is severely disconnected from business value: the scoring result reflects only the operating state of technical components and cannot quantify the impact of system anomalies on core business such as transaction success rate and revenue, so operation and maintenance teams find it difficult to judge the business severity of a fault. Second, the strategies are rigid and resource allocation is inefficient: fixed thresholds or fixed weights cannot adapt to the different requirements of business peak and off-peak periods, so the fault impact of indexes tied to key business may be underestimated, or anomalies in non-core indexes may trigger excessive alarms. Third, the level of intelligence is insufficient and alarm storms are easily triggered: the machine learning model cannot perform alarm convergence and priority ranking of multi-index anomalies according to business impact, and therefore can hardly support efficient operation and maintenance decisions. How to make the system health score accurately reflect the actual impact of technical anomalies on core business is therefore a problem to be solved by those skilled in the art.
Disclosure of Invention
The application aims at least to provide a system health evaluation method, a system health evaluation device and electronic equipment that can accurately quantify the actual degree to which technical anomalies affect core business, moving intelligent operation and maintenance from purely technical monitoring to business value guarantee.
To solve the above technical problem, at least one embodiment of the present application provides a method for evaluating system health, including: collecting multi-source performance data of a target system as original data, wherein the multi-source performance data comprises infrastructure performance data, application performance data and business index data; performing standardization processing on the original data and then arranging the data by the technical index-business index dimension to generate standardized data, wherein the standardization processing comprises format standardization and time dimension alignment; invoking a business influence quantization model to calculate an influence coefficient of each technical index in the standardized data on each business index, and aggregating the influence coefficients into an influence matrix; assigning a scoring weight to each technical index according to the influence matrix; and calculating a single index score for each technical index based on real-time technical index data in the standardized data, performing a weighted calculation on the single index scores according to the scoring weights, and taking the result of the weighted calculation as a system health score. In one embodiment, the method for generating the business influence quantization model inclu