CN-118133103-B - Time sequence data anomaly detection model generation method and device and electronic equipment

Abstract

The disclosure provides a method and a device for generating a time sequence data anomaly detection model, and electronic equipment. It relates to the technical field of computers, in particular to artificial intelligence fields such as data processing and deep learning. The method comprises: obtaining a plurality of candidate models and a verification data set, wherein each candidate model is used for reconstructing time sequence data, the verification data set comprises sample time sequence data and a label, and the label describes whether each data in the sample time sequence data is abnormal; determining first time sequence data corresponding to the sample time sequence data reconstructed by each candidate model and a first recall rate corresponding to each candidate model; selecting a target model from the plurality of candidate models according to the first time sequence data, the first recall rate and the verification data set; and generating a time sequence data anomaly detection model according to the target model.

Inventors

  • SUN TING
  • DU YUNING
  • LIU YI
  • ZHAO QIAO
  • HU XIAOGUANG
  • YU DIANHAI
  • MA YANJUN

Assignees

  • 北京百度网讯科技有限公司

Dates

Publication Date
20260505
Application Date
20231220

Claims (20)

  1. A method for generating a time sequence data anomaly detection model, comprising: acquiring a plurality of candidate models and a verification data set, wherein each candidate model is used for reconstructing time sequence data, the verification data set comprises sample time sequence data and a label, the label is used for describing whether each item of machine operation data in the sample time sequence data is abnormal, and the sample time sequence data is a data sequence in which the machine operation data obtained by sensor monitoring are arranged in time order; determining first time sequence data corresponding to the sample time sequence data reconstructed by each candidate model and a first recall rate corresponding to each candidate model; selecting a target model from the plurality of candidate models according to the first time sequence data, the first recall rate and the verification data set; and generating a time sequence data anomaly detection model according to the target model, wherein the time sequence data anomaly detection model is used for judging whether the machine has a fault.
  2. The method of claim 1, wherein the selecting a target model from the plurality of candidate models according to the first time sequence data, the first recall rate and the verification data set comprises: determining a plurality of integrated models, wherein each integrated model is formed by integrating at least one candidate model; fusing, based on the first recall rate, the first time sequence data reconstructed by the candidate models in each integrated model to obtain second time sequence data corresponding to each integrated model; determining a second recall rate corresponding to each integrated model according to the second time sequence data and the verification data set; and determining a candidate model in the integrated model with the highest second recall rate as the target model.
  3. The method of claim 2, wherein the fusing, based on the first recall rate, the first time sequence data reconstructed by the candidate models in each integrated model to obtain second time sequence data corresponding to each integrated model comprises: adding the first recall rates corresponding to the candidate models in the integrated model to obtain a first value; determining the ratio of the first recall rate corresponding to each candidate model in the integrated model to the first value as a first weight corresponding to that candidate model; and fusing, based on the first weight corresponding to each candidate model in the integrated model, the first time sequence data reconstructed by each candidate model in the integrated model to obtain the second time sequence data corresponding to the integrated model.
  4. The method of claim 2, further comprising, after the determining a second recall rate corresponding to each integrated model according to the second time sequence data and the verification data set: updating the integrated model, and determining a second recall rate corresponding to the updated integrated model; and continuing to update the updated integrated model until a preset iteration stop condition is reached, and determining a candidate model in the integrated model with the highest second recall rate as the target model.
  5. The method of claim 4, wherein the updating the integrated model comprises: updating the integrated model based on a genetic algorithm.
  6. The method of claim 4, wherein the updating the integrated model comprises at least one of: replacing the candidate model with the lowest first recall rate in the integrated model with another candidate model; deleting the candidate model with the lowest first recall rate in the integrated model; adding another candidate model to the integrated model; swapping any candidate model between two integrated models.
  7. The method of claim 4, wherein the iteration stop condition is any one of: the number of iterations reaches a preset number; the difference between the maximum second recall rate in the n-th iteration result and the maximum second recall rate in each of the previous m iteration results is smaller than a first threshold, wherein m is a positive integer, and n is a positive integer greater than m.
  8. The method of any of claims 1-7, wherein the generating a time sequence data anomaly detection model according to the target model comprises: in the case that there are a plurality of target models, adding the first recall rates corresponding to the target models to obtain a second value; determining the ratio of the first recall rate corresponding to each target model to the second value as a target weight corresponding to that target model; and integrating the target models into the time sequence data anomaly detection model, wherein the time sequence data anomaly detection model comprises the target weight corresponding to each target model.
  9. The method of any of claims 1-7, wherein the determining first time sequence data corresponding to the sample time sequence data reconstructed by each candidate model and a first recall rate corresponding to each candidate model comprises: inputting the sample time sequence data into each candidate model to obtain the first time sequence data reconstructed by each candidate model; determining a difference value between each data in the first time sequence data and the corresponding data in the sample time sequence data; in the case that the difference value corresponding to any data in the first time sequence data is larger than a difference threshold, determining that data as predicted abnormal data; determining a first total number of the predicted abnormal data and a second total number of the abnormal data marked by the label; and determining the ratio of the first total number to the second total number as the first recall rate.
  10. The method of any of claims 1-7, wherein the acquiring a plurality of candidate models comprises: acquiring a training data set and a plurality of initial models, wherein the training data set comprises third time sequence data, and each data in the third time sequence data is normal data; and training each of the initial models based on the training data set to obtain the plurality of candidate models.
  11. A time sequence data anomaly detection method, comprising: acquiring time sequence data to be detected; inputting the time sequence data to be detected into a time sequence data anomaly detection model to obtain target reconstructed time sequence data corresponding to the time sequence data to be detected, wherein the time sequence data anomaly detection model is generated based on the method of any one of claims 1-10; and determining abnormal data in the time sequence data to be detected according to the difference between the time sequence data to be detected and the target reconstructed time sequence data.
  12. The method of claim 11, wherein the inputting the time sequence data to be detected into the time sequence data anomaly detection model to obtain the target reconstructed time sequence data corresponding to the time sequence data to be detected comprises: in the case that the time sequence data anomaly detection model comprises a plurality of target models, inputting the time sequence data to be detected into each target model to obtain initial reconstructed time sequence data output by each target model; and fusing the plurality of initial reconstructed time sequence data to obtain the target reconstructed time sequence data.
  13. The method of claim 12, wherein the fusing the plurality of initial reconstructed time sequence data to obtain the target reconstructed time sequence data comprises: obtaining a target weight corresponding to each target model; and fusing the plurality of initial reconstructed time sequence data based on the target weight corresponding to each target model to obtain the target reconstructed time sequence data.
  14. An apparatus for generating a time sequence data anomaly detection model, comprising: an acquisition module for acquiring a plurality of candidate models and a verification data set, wherein each candidate model is used for reconstructing time sequence data, the verification data set comprises sample time sequence data and a label, the label is used for describing whether each item of machine operation data in the sample time sequence data is abnormal, and the sample time sequence data is a data sequence in which the machine operation data obtained by sensor monitoring are arranged in time order; a determining module for determining first time sequence data corresponding to the sample time sequence data reconstructed by each candidate model and a first recall rate corresponding to each candidate model; a selection module for selecting a target model from the plurality of candidate models according to the first time sequence data, the first recall rate and the verification data set; and a generation module for generating a time sequence data anomaly detection model according to the target model, wherein the time sequence data anomaly detection model is used for judging whether the machine has a fault.
  15. The apparatus of claim 14, wherein the selection module is configured to: determine a plurality of integrated models, wherein each integrated model is formed by integrating at least one candidate model; fuse, based on the first recall rate, the first time sequence data reconstructed by the candidate models in each integrated model to obtain second time sequence data corresponding to each integrated model; determine a second recall rate corresponding to each integrated model according to the second time sequence data and the verification data set; and determine a candidate model in the integrated model with the highest second recall rate as the target model.
  16. The apparatus of claim 15, wherein the selection module is configured to: add the first recall rates corresponding to the candidate models in the integrated model to obtain a first value; determine the ratio of the first recall rate corresponding to each candidate model in the integrated model to the first value as a first weight corresponding to that candidate model; and fuse, based on the first weight corresponding to each candidate model in the integrated model, the first time sequence data reconstructed by each candidate model in the integrated model to obtain the second time sequence data corresponding to the integrated model.
  17. The apparatus of claim 16, further comprising an update module configured to: update the integrated model, and determine a second recall rate corresponding to the updated integrated model; and continue to update the updated integrated model until a preset iteration stop condition is reached, and determine a candidate model in the integrated model with the highest second recall rate as the target model.
  18. The apparatus of claim 17, wherein the update module is configured to: update the integrated model based on a genetic algorithm.
  19. The apparatus of claim 17, wherein the update module is configured to: replace the candidate model with the lowest first recall rate in the integrated model with another candidate model; delete the candidate model with the lowest first recall rate in the integrated model; add another candidate model to the integrated model; swap any candidate model between two integrated models.
  20. The apparatus of claim 17, wherein the iteration stop condition is any one of: the number of iterations reaches a preset number; the difference between the maximum second recall rate in the n-th iteration result and the maximum second recall rate in each of the previous m iteration results is smaller than a first threshold, wherein m is a positive integer, and n is a positive integer greater than m.
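The recall computation of claim 9 and the recall-weighted fusion of claim 3 can be illustrated with a minimal Python sketch. All function names, the sample values, and the threshold are hypothetical and do not come from the patent. Two assumptions are worth flagging: the difference value is taken as the absolute difference, and recall is computed in the conventional way (predicted abnormal points that the label also marks as abnormal, divided by all labeled abnormal points), whereas claim 9 is worded as the ratio of the predicted-abnormal count to the labeled-abnormal count. The equal-weight fallback when all recall rates are zero is likewise an assumption.

```python
def predicted_anomalies(sample, reconstructed, diff_threshold):
    """Flag each point whose absolute reconstruction error exceeds the threshold."""
    return [abs(x - r) > diff_threshold for x, r in zip(sample, reconstructed)]


def recall(predicted, labels):
    """Conventional recall: flagged-and-labeled points over all labeled points."""
    labeled = sum(1 for y in labels if y)
    if labeled == 0:
        return 0.0
    hits = sum(1 for p, y in zip(predicted, labels) if p and y)
    return hits / labeled


def recall_weights(recalls):
    """Weight of each model = its recall rate divided by the sum of recall rates."""
    total = sum(recalls)
    if total == 0:
        # Assumption: fall back to equal weights when every recall rate is zero.
        return [1.0 / len(recalls)] * len(recalls)
    return [r / total for r in recalls]


def fuse(reconstructions, weights):
    """Point-wise weighted sum of the reconstructed series."""
    length = len(reconstructions[0])
    return [sum(w * series[t] for w, series in zip(weights, reconstructions))
            for t in range(length)]


# Made-up validation data: points 2 and 4 are labeled abnormal.
sample = [1.0, 1.0, 9.0, 1.0, 8.0]
labels = [0, 0, 1, 0, 1]
recon_a = [1.0, 1.0, 1.0, 1.0, 8.0]  # hypothetical model A misses the anomaly at index 4
recon_b = [1.0, 1.0, 9.0, 1.0, 1.0]  # hypothetical model B misses the anomaly at index 2
threshold = 2.0

recalls = [recall(predicted_anomalies(sample, r, threshold), labels)
           for r in (recon_a, recon_b)]
weights = recall_weights(recalls)
second_series = fuse([recon_a, recon_b], weights)
second_recall = recall(predicted_anomalies(sample, second_series, threshold), labels)
print(recalls, weights, second_series, second_recall)
```

In this toy data each model alone recovers only one of the two labeled anomalies, while the fused series differs from the sample by more than the threshold at both, which is the kind of gain the integrated model is meant to capture.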

Description

Time sequence data anomaly detection model generation method and device and electronic equipment

Technical Field

The disclosure relates to the technical field of computers, in particular to artificial intelligence fields such as data processing and deep learning, and specifically to a method and a device for generating a time sequence data anomaly detection model, and electronic equipment.

Background

Time series data refers to a set of data arranged in time order, wherein each data point is associated with a particular point in time or time period. Abnormal points in time series data are points at which the data deviates from its usual pattern, for example abrupt rises or falls, trend changes, level shifts, or values exceeding the historical maximum or minimum. Anomaly detection of time series data aims to find these abnormal points quickly and accurately.

Disclosure of Invention

The disclosure provides a method and a device for generating a time sequence data anomaly detection model, and electronic equipment.
According to a first aspect of the present disclosure, there is provided a method for generating a time sequence data anomaly detection model, including: acquiring a plurality of candidate models and a verification data set, wherein each candidate model is used for reconstructing time sequence data, the verification data set comprises sample time sequence data and a label, and the label is used for describing whether each data in the sample time sequence data is abnormal; determining first time sequence data corresponding to the sample time sequence data reconstructed by each candidate model and a first recall rate corresponding to each candidate model; selecting a target model from the plurality of candidate models according to the first time sequence data, the first recall rate and the verification data set; and generating a time sequence data anomaly detection model according to the target model.
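The selection step of the first aspect can be sketched as follows, under stated assumptions. The claims describe updating the integrated models iteratively (for example with a genetic algorithm); this sketch instead enumerates every non-empty combination of candidates exhaustively, which is only practical for a handful of models and is a swapped-in simplification, not the claimed search. The names `select_target_models`, `flag`, `recall`, and `fuse`, the data, and the threshold are all hypothetical, and recall is computed in the conventional true-positive sense.

```python
from itertools import combinations


def flag(sample, recon, threshold):
    """Mark points whose absolute reconstruction error exceeds the threshold."""
    return [abs(x - r) > threshold for x, r in zip(sample, recon)]


def recall(flags, labels):
    """Flagged-and-labeled points divided by all labeled points."""
    labeled = sum(1 for y in labels if y)
    hits = sum(1 for f, y in zip(flags, labels) if f and y)
    return hits / labeled if labeled else 0.0


def fuse(recons, recalls):
    """Recall-weighted point-wise sum (equal weights if all recalls are zero)."""
    total = sum(recalls)
    if total:
        weights = [r / total for r in recalls]
    else:
        weights = [1.0 / len(recalls)] * len(recalls)
    return [sum(w * s[t] for w, s in zip(weights, recons))
            for t in range(len(recons[0]))]


def select_target_models(sample, labels, first_series, threshold):
    """first_series maps a candidate name to its reconstruction of `sample`."""
    first_recall = {name: recall(flag(sample, rec, threshold), labels)
                    for name, rec in first_series.items()}
    names = sorted(first_series)
    best_names, best_recall = None, -1.0
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):  # one "integrated model"
            second_series = fuse([first_series[n] for n in subset],
                                 [first_recall[n] for n in subset])
            second_recall = recall(flag(sample, second_series, threshold), labels)
            if second_recall > best_recall:  # ties keep the earlier, smaller subset
                best_names, best_recall = subset, second_recall
    return best_names, best_recall, first_recall


# Made-up validation data and reconstructions from three hypothetical candidates.
sample = [1.0, 1.0, 9.0, 1.0, 8.0]
labels = [0, 0, 1, 0, 1]
first_series = {
    "a": [1.0, 1.0, 1.0, 1.0, 8.0],
    "b": [1.0, 1.0, 9.0, 1.0, 1.0],
    "c": [1.0, 1.0, 9.0, 1.0, 8.0],  # copies its input, so it flags nothing
}
best_names, best_recall, first_recall = select_target_models(
    sample, labels, first_series, threshold=2.0)
print(best_names, best_recall, first_recall)
```

Because ties are broken in favor of the subset found first, the sketch prefers smaller integrated models when several reach the same second recall rate; the patent text does not specify a tie-breaking rule.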
According to a second aspect of the present disclosure, there is provided a time sequence data anomaly detection method, including: acquiring time sequence data to be detected; inputting the time sequence data to be detected into a time sequence data anomaly detection model to obtain target reconstructed time sequence data corresponding to the time sequence data to be detected, wherein the time sequence data anomaly detection model is generated based on the method for generating a time sequence data anomaly detection model in the first aspect; and determining abnormal data in the time sequence data to be detected according to the difference between the time sequence data to be detected and the target reconstructed time sequence data.
According to a third aspect of the present disclosure, there is provided an apparatus for generating a time sequence data anomaly detection model, including: an acquisition module for acquiring a plurality of candidate models and a verification data set, wherein each candidate model is used for reconstructing time sequence data, the verification data set comprises sample time sequence data and a label, and the label is used for describing whether each data in the sample time sequence data is abnormal; a determining module for determining first time sequence data corresponding to the sample time sequence data reconstructed by each candidate model and a first recall rate corresponding to each candidate model; a selection module for selecting a target model from the plurality of candidate models according to the first time sequence data, the first recall rate and the verification data set; and a generating module for generating a time sequence data anomaly detection model according to the target model.
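The detection method of the second aspect can be sketched in Python as below. The two "target models" are simple stand-in functions (a constant-median reconstructor and a neighbor-mean reconstructor) chosen only so the example runs without a trained network; they, the target weights, and the difference threshold are hypothetical. The second aspect speaks only of "the difference" between the input and its reconstruction, so comparing the absolute difference against a fixed threshold is an assumption borrowed from the wording of claim 9.

```python
from statistics import median


def median_model(series):
    """Stand-in target model: reconstructs every point as the series median."""
    m = median(series)
    return [m] * len(series)


def neighbor_mean_model(series):
    """Stand-in target model: reconstructs each point from its two neighbors."""
    last = len(series) - 1
    return [(series[max(t - 1, 0)] + series[min(t + 1, last)]) / 2
            for t in range(len(series))]


def detect_anomalies(series, target_models, target_weights, diff_threshold):
    """Return the indices whose fused reconstruction error exceeds the threshold."""
    # Initial reconstructed series, one per target model.
    initial = [model(series) for model in target_models]
    # Target reconstructed series: weighted point-wise fusion.
    fused = [sum(w * rec[t] for w, rec in zip(target_weights, initial))
             for t in range(len(series))]
    return [t for t, (x, r) in enumerate(zip(series, fused))
            if abs(x - r) > diff_threshold]


# Made-up readings with a single spike at index 2.
readings = [1.0, 1.0, 9.0, 1.0, 1.0]
anomalies = detect_anomalies(
    readings,
    target_models=[median_model, neighbor_mean_model],
    target_weights=[0.75, 0.25],  # hypothetical target weights that sum to 1
    diff_threshold=2.0,
)
print(anomalies)
```

The neighbor-mean stand-in smears the spike onto the adjacent points, but with the lower weight assigned to it the fused error at those points stays below the threshold, so only the spike itself is reported.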
According to a fourth aspect of the present disclosure, there is provided a time sequence data anomaly detection apparatus, including: a first acquisition module for acquiring time sequence data to be detected; a second acquisition module for inputting the time sequence data to be detected into a time sequence data anomaly detection model to obtain target reconstructed time sequence data corresponding to the time sequence data to be detected, wherein the time sequence data anomaly detection model is generated based on the apparatus of any one of claims 14 to 23; and a determining module for determining abnormal data in the time sequence data to be detected according to the difference between the time sequence data to be detected and the target reconstructed time sequence data.
According to a fifth aspect of the present disclosure, there is provided an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method for generating a time sequence data anomaly detection model as described in the first aspec