CN-121997209-A - Data anomaly detection method, apparatus, device, storage medium, and program product
Abstract
The invention discloses a data anomaly detection method, apparatus, device, storage medium, and program product. Data to be evaluated is checked for anomalies using multiple models, combining several anomaly detection techniques that together account for statistical characteristics, long-term trends, seasonal changes, and cumulative changes. This improves the comprehensiveness, accuracy, and robustness of anomaly detection and addresses the problem that a single method cannot cope with complex data patterns. In addition, each evaluation model is assigned a weight according to its importance or its performance on historical data, and a target evaluation result is selected from a plurality of candidate evaluation results through a weighted voting mechanism. This reduces false alarms and missed alarms, improves the accuracy and reliability of anomaly detection, reduces dependence on a single method or threshold, and provides stronger real-time adaptability when the data stream changes dynamically.
Inventors
- LONG JIE
- PAN WEI
- BAI YANG
- LIU JIALIN
- WANG TIANZHU
- ZHONG JIANGYING
- LI QIANG
- ZHANG BIN
- CHI YONG
- CHEN JING
- ZHANG JIN
- LIU MINGYI
Assignees
- 中国移动通信集团设计院有限公司 (China Mobile Group Design Institute Co., Ltd.)
- 中国移动通信集团有限公司 (China Mobile Communications Group Co., Ltd.)
Dates
- Publication Date: 2026-05-08
- Application Date: 2024-11-06
Claims (13)
- 1. A data anomaly detection method, comprising: acquiring data to be evaluated and historical data; inputting the data to be evaluated into at least two evaluation models to obtain at least two candidate evaluation results; obtaining a corresponding confidence according to the number of correct detections of each evaluation model on the historical data, and determining a model weight of the evaluation model using the confidence; and screening a target evaluation result from the at least two candidate evaluation results according to the model weight, wherein the target evaluation result is the detection result of the data to be evaluated.
- 2. The data anomaly detection method of claim 1, wherein the at least two evaluation models are any two or three of a moving average model, a time series prediction model, and an accumulation and monitoring model.
- 3. The data anomaly detection method of claim 2, wherein when the moving average model is included in the at least two evaluation models, the step of generating the corresponding candidate evaluation result by the moving average model includes: calculating a moving average and a standard deviation of the historical data; determining an evaluation range according to the moving average and the standard deviation; and comparing the data to be evaluated with the evaluation range to generate the corresponding candidate evaluation result.
- 4. A data anomaly detection method as claimed in claim 3, wherein the evaluation range consists of a first upper limit value and a first lower limit value, and wherein the determining the evaluation range from the moving average and the standard deviation comprises: determining the first upper limit value according to the moving average value, the standard deviation and a preset first fixed value; and determining the first lower limit value according to the moving average value, the standard deviation and a preset second fixed value.
- 5. The data anomaly detection method of claim 2, wherein when the time series prediction model is included in the at least two evaluation models, the step of generating the corresponding candidate evaluation result by the time series prediction model includes: inputting the historical data into the time series prediction model so that the time series prediction model outputs a predicted value; calculating an error value between the predicted value and the data to be evaluated; and comparing the error value with a preset error threshold to generate the corresponding candidate evaluation result.
- 6. The data anomaly detection method of claim 2, wherein when the accumulation and monitoring model is included in the at least two evaluation models, the step of generating the corresponding candidate evaluation result by the accumulation and monitoring model includes: determining a reference value and a deviation range according to the historical data; calculating a deviation value between the data to be evaluated and the reference value; adding the deviation value to a reference accumulation sum to obtain a corresponding target accumulation sum, wherein the reference accumulation sum is the deviation value calculated when the previous data to be evaluated was evaluated; and comparing the target accumulation sum with the deviation range to generate the corresponding candidate evaluation result.
- 7. The method of claim 6, wherein the reference accumulation sum comprises a reference positive accumulation sum and a reference negative accumulation sum, the target accumulation sum comprises a target positive accumulation sum and a target negative accumulation sum, the deviation range consists of a second upper limit value and a second lower limit value, and the comparing the target accumulation sum with the deviation range to generate the corresponding candidate evaluation result comprises: comparing the target positive accumulation sum with the second upper limit value and comparing the target negative accumulation sum with the second lower limit value to generate the corresponding candidate evaluation result.
- 8. The data anomaly detection method of claim 1, wherein the obtaining a corresponding confidence according to the number of correct detections of each evaluation model on the historical data includes: acquiring the number of correct detections of each evaluation model on the historical data; and taking the ratio of the number of correct detections to the total number of detections as the confidence of the current evaluation model.
- 9. The data anomaly detection method of claim 8, wherein the method further comprises: calculating a confidence difference between the current confidence and the initial confidence; and updating the model weight using the confidence difference.
- 10. A data anomaly detection apparatus, comprising: a data acquisition module, configured to acquire data to be evaluated and historical data; a candidate evaluation result generation module, configured to input the data to be evaluated into at least two evaluation models to obtain at least two candidate evaluation results; a model weight determining module, configured to obtain a corresponding confidence according to the number of correct detections of each evaluation model on the historical data, and to determine a model weight of the evaluation model using the confidence; and a detection result generation module, configured to screen a target evaluation result from the at least two candidate evaluation results according to the model weight, and to take the target evaluation result as the detection result of the data to be evaluated.
- 11. A data anomaly detection device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the data anomaly detection method of any one of claims 1 to 9 when the computer program is executed.
- 12. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the data anomaly detection method according to any one of claims 1 to 9.
- 13. A computer program product comprising computer instructions which, when executed by a processor, implement a data anomaly detection method as claimed in any one of claims 1 to 9.
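The steps recited in claims 1 and 6 to 8 can be sketched in Python as follows. This is only an illustrative reading of the claims, not the applicant's implementation. The claims do not give formulas, so several details are assumptions: the accumulation and monitoring model is read as a conventional two-sided cumulative sum (CUSUM) check, with the sums clamped at zero and a hypothetical `slack` term; model weights are taken as normalized confidences; and "screening" is read as picking the candidate result with the largest total weight. All function and variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CusumState:
    """Accumulation sums carried over from the previous evaluation (claim 7)."""
    positive: float = 0.0
    negative: float = 0.0


def cusum_step(x, reference, upper, lower, state, slack=0.0):
    """Update both accumulation sums with one new value and flag an anomaly.

    Clamping at zero and the `slack` term follow the conventional two-sided
    CUSUM form; the claims do not spell them out (assumption).
    """
    deviation = x - reference
    positive = max(0.0, state.positive + deviation - slack)
    negative = min(0.0, state.negative + deviation + slack)
    is_anomaly = positive > upper or negative < lower
    return is_anomaly, CusumState(positive, negative)


def confidence(correct, total):
    """Claim 8: ratio of correct detections to total detections."""
    return correct / total if total else 0.0


def weighted_vote(candidates, weights):
    """Return the candidate result backed by the largest total weight.

    `candidates` maps model name -> candidate result (True means anomaly);
    `weights` maps model name -> model weight. Summing weights per result
    is one plausible reading of the screening step (assumption).
    """
    totals = {}
    for name, result in candidates.items():
        totals[result] = totals.get(result, 0.0) + weights[name]
    return max(totals, key=totals.get)


# Hypothetical detection history: (correct detections, total detections).
history = {"moving_average": (90, 100), "forecast": (70, 100), "cusum": (80, 100)}
conf = {name: confidence(c, t) for name, (c, t) in history.items()}
weights = {name: c / sum(conf.values()) for name, c in conf.items()}

# Feed a slowly drifting series through the CUSUM check.
state = CusumState()
flags = []
for x in [10.5, 11.0, 12.5, 13.0]:
    flag, state = cusum_step(x, reference=10.0, upper=5.0, lower=-5.0, state=state)
    flags.append(flag)

# Made-up candidate results for the other two models.
candidates = {"moving_average": False, "forecast": True, "cusum": flags[-1]}
print(weighted_vote(candidates, weights))  # True
```

In this example no single value is far from the reference, but the positive accumulation sum crosses the upper limit on the fourth value, so the CUSUM check flags a drift that a per-point threshold would miss. The two models voting "anomaly" together outweigh the more confident moving average model.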
Description
Data anomaly detection method, apparatus, device, storage medium, and program product
Technical Field
The present invention relates to the field of data processing, and in particular to a data anomaly detection method, apparatus, device, storage medium, and program product.
Background
With the wide application of artificial intelligence technology, time series data is increasingly used across industries, and time series data processing is an important application field of artificial intelligence. For example, by monitoring the service data continuously generated while a service system runs, anomalies can be found and handled in time, which keeps the service system operating stably.
In the prior art, anomalies in the time series data of a service are generally identified as follows: the time series value for each day is calculated, the average of the historical data is calculated as a contemporaneous mean, the difference between the data and the contemporaneous mean is calculated, a threshold is set manually according to service experience, and an observed value is marked as abnormal if the difference exceeds the threshold.
Although this approach can identify abnormal data, the contemporaneous mean is affected by data noise when it is calculated. As a result, anomaly detection is inaccurate, noise cannot be effectively distinguished from actual anomalies, and the likelihood of false alarms and missed alarms increases.
Disclosure of Invention
The embodiments of the present invention aim to provide a data anomaly detection method, apparatus, device, storage medium, and program product that can effectively reduce false alarms and missed alarms and improve the accuracy of data anomaly detection.
To achieve the above object, an embodiment of the present invention provides a data anomaly detection method, including: acquiring data to be evaluated and historical data; inputting the data to be evaluated into at least two evaluation models to obtain at least two candidate evaluation results; obtaining a corresponding confidence according to the number of correct detections of each evaluation model on the historical data, and determining a model weight of the evaluation model using the confidence; and screening a target evaluation result from the at least two candidate evaluation results according to the model weight, wherein the target evaluation result is the detection result of the data to be evaluated.
As an improvement of the above solution, the at least two evaluation models are any two or three of a moving average model, a time series prediction model, and an accumulation and monitoring model.
As an improvement of the above solution, when the moving average model is included in the at least two evaluation models, the step of generating the corresponding candidate evaluation result by the moving average model includes: calculating a moving average and a standard deviation of the historical data; determining an evaluation range according to the moving average and the standard deviation; and comparing the data to be evaluated with the evaluation range to generate the corresponding candidate evaluation result.
As an improvement of the above solution, the evaluation range consists of a first upper limit value and a first lower limit value, and determining the evaluation range from the moving average and the standard deviation includes: determining the first upper limit value according to the moving average, the standard deviation, and a preset first fixed value; and determining the first lower limit value according to the moving average, the standard deviation, and a preset second fixed value.
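The moving average check described above can be sketched in Python as follows. The text only says that the limits are determined from the moving average, the standard deviation, and two preset fixed values. The form "moving average plus or minus a fixed value times the standard deviation" used here is an assumption, as are the window length and the names `moving_average_check`, `window`, `k_upper`, and `k_lower`.

```python
from statistics import mean, stdev


def moving_average_check(history, x, window=5, k_upper=3.0, k_lower=3.0):
    """Return (is_anomaly, lower, upper) for value x given recent history.

    k_upper and k_lower stand in for the preset first and second fixed
    values; how they enter the limits is an assumption of this sketch.
    """
    recent = history[-window:]
    if len(recent) < 2:
        raise ValueError("need at least two historical points")
    avg = mean(recent)          # moving average over the last `window` points
    sd = stdev(recent)          # sample standard deviation of the same points
    upper = avg + k_upper * sd  # first upper limit value
    lower = avg - k_lower * sd  # first lower limit value
    return not (lower <= x <= upper), lower, upper


history = [10, 11, 9, 10, 10]
print(moving_average_check(history, 15)[0])    # True: outside the range
print(moving_average_check(history, 10.5)[0])  # False: inside the range
```

Using separate multipliers for the two limits mirrors the two preset fixed values in the text and allows an asymmetric range, for example when upward spikes matter more than dips.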
As an improvement of the above solution, when the time series prediction model is included in the at least two evaluation models, the step of generating the corresponding candidate evaluation result by the time series prediction model includes: inputting the historical data into the time series prediction model so that the time series prediction model outputs a predicted value; calculating an error value between the predicted value and the data to be evaluated; and comparing the error value with a preset error threshold to generate the corresponding candidate evaluation result.
As an improvement of the above solution, when the accumulation and monitoring model is included in the at least two evaluation models, the step of generating the corresponding candidate evaluation result by the accumulation and monitoring model includes: determining a reference value and a deviation range according to the historical data; calculating a deviation value between the data to be evaluated and the reference value; adding the deviation value into a reference accumulation sum to obt