
CN-116244632-B - Training method and device for an unsupervised anomaly detection model for time series data, and unsupervised anomaly detection method and device for time series data


Abstract

A training method and device for a time series anomaly detection model, and a time series anomaly detection method and device, relating to the technical field of time series analysis. To address the contamination of data sets in prior-art unsupervised time series anomaly detection, the invention provides a training method for a time series anomaly detection model comprising the following steps: step 1, collecting a time series data set as original data and preprocessing the original data to generate perturbed data; and step 2, training the detection model on the perturbed data and the original data. In step 1, the preprocessing includes partitioning the time series data and randomly perturbing it. In step 2, training the detection model specifically includes extracting features from the perturbed data and the original data and training a preset model on the resulting features. The method is suitable for unsupervised time series anomaly detection.
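The random perturbation that turns original windows into negative samples can be illustrated with a minimal Python sketch. It covers a point perturbation, a contextual perturbation, and a whole-window perturbation (the sketch's interpretation of the third mode named in claim 1). The perturbation formulas, the helper names, the `ratio` parameter, and the labeling convention (original windows labeled 0, perturbed windows labeled 1) are all assumptions for illustration; the patent text does not specify them.

```python
import random
from typing import Callable, List, Tuple

Window = List[float]


def point_anomaly(window: Window, rng: random.Random, scale: float = 3.0) -> Window:
    """Point perturbation (hypothetical): push one random value far outside the window's range."""
    out = list(window)
    spread = (max(out) - min(out)) or 1.0  # avoid a zero-size spike on flat windows
    i = rng.randrange(len(out))
    out[i] += rng.choice((-1.0, 1.0)) * scale * spread
    return out


def contextual_anomaly(window: Window, rng: random.Random, seg_frac: float = 0.25) -> Window:
    """Contextual perturbation (hypothetical): reverse a short segment, so the values
    stay in range but no longer follow the local temporal pattern."""
    out = list(window)
    n = len(out)
    if n < 2:
        return out
    seg = min(n, max(2, int(n * seg_frac)))
    start = rng.randrange(0, n - seg + 1)
    out[start:start + seg] = out[start:start + seg][::-1]
    return out


def global_anomaly(window: Window, rng: random.Random, noise: float = 0.5) -> Window:
    """Whole-window perturbation (hypothetical): add Gaussian noise to every value."""
    spread = (max(window) - min(window)) or 1.0
    return [x + rng.gauss(0.0, noise * spread) for x in window]


PERTURBATIONS: List[Callable[[Window, random.Random], Window]] = [
    point_anomaly,
    contextual_anomaly,
    global_anomaly,
]


def perturb_dataset(windows: List[Window], ratio: float = 0.5,
                    seed: int = 0) -> List[Tuple[Window, int]]:
    """Return (window, label) pairs: every original window with label 0, plus a
    perturbed copy (label 1) of a `ratio` fraction of the windows, each using a
    randomly chosen perturbation mode."""
    rng = random.Random(seed)
    samples = [(list(w), 0) for w in windows]
    k = int(round(len(windows) * ratio))
    for idx in rng.sample(range(len(windows)), k):
        mode = rng.choice(PERTURBATIONS)
        samples.append((mode(windows[idx], rng), 1))
    return samples
```

Keeping the originals alongside their perturbed copies gives the lower branch a labeled binary task and gives the upper branch original/perturbed pairs to contrast, even though the raw data set itself is unlabeled.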

Inventors

  • ZHANG FENGBIN
  • SHENG CHAOYANG
  • HE DONG

Assignees

  • Harbin University of Science and Technology (哈尔滨理工大学)

Dates

Publication Date
2026-05-12
Application Date
2023-03-09

Claims (10)

  1. A training method for an unsupervised anomaly detection model for time series data, characterized by comprising the following steps: step 1, collecting a data set of univariate time series, preprocessing the time series data, and generating initial time domain data and frequency domain data; step 2, training a detection model according to the frequency domain data and the initial time domain data. Specifically, the time series data is represented in both the time domain and the frequency domain; a perturbation mode (point anomaly, contextual anomaly, or global anomaly) is selected at random and applied to the original data, and the perturbed data set and the original data set are fed into a temporal convolutional network (TCN) for feature extraction. In the data input stage, a time series input consisting of time domain features and frequency domain features is given; the data features are perturbed in a certain proportion to generate the samples that the model treats as negative; the original data and the perturbed data are then input into the TCN feature extraction network simultaneously for feature extraction, and the extracted features are passed to the upper and lower branch models for training. In the model training stage, the upper branch model feeds the extracted features of the perturbed time series data and of the original data into a contrastive learning module, in which a cosine distance function is constructed and applied to detect anomalies. The lower branch model feeds the perturbed data and the original data into the model simultaneously to compute the binary cross-entropy loss (BCELoss): $\mathcal{L}_{\mathrm{BCE}} = -\left[y \log \hat{y} + (1 - y) \log(1 - \hat{y})\right]$, wherein $\hat{y}$ is the output of the model and $y$ is the true label.
  2. The method according to claim 1, wherein in step 1 the preprocessing includes data normalization, data division and modality conversion of the time series data.
  3. The training method for an unsupervised anomaly detection model for time series data according to claim 1, wherein in step 2 the detection model is trained specifically by obtaining two low-dimensional embeddings from the frequency domain data and the initial time domain data, respectively, and training the detection model on the two low-dimensional embeddings.
  4. A training device for an unsupervised anomaly detection model for time series data, the device comprising: module 1, used for collecting a data set of univariate time series, preprocessing the time series data, and generating initial time domain data and frequency domain data; module 2, used for training a detection model according to the frequency domain data and the initial time domain data. Specifically, the time series data is represented in both the time domain and the frequency domain; a perturbation mode (point anomaly, contextual anomaly, or global anomaly) is selected at random and applied to the original data, and the perturbed data set and the original data set are fed into a temporal convolutional network (TCN) for feature extraction. In the data input stage, a time series input consisting of time domain features and frequency domain features is given; the data features are perturbed in a certain proportion to generate the samples that the model treats as negative; the original data and the perturbed data are then input into the TCN feature extraction network simultaneously for feature extraction, and the extracted features are passed to the upper and lower branch models for training. In the model training stage, the upper branch model feeds the extracted features of the perturbed time series data and of the original data into a contrastive learning module, in which a cosine distance function is constructed and applied to detect anomalies. The lower branch model feeds the perturbed data and the original data into the model simultaneously to compute the binary cross-entropy loss (BCELoss): $\mathcal{L}_{\mathrm{BCE}} = -\left[y \log \hat{y} + (1 - y) \log(1 - \hat{y})\right]$, wherein $\hat{y}$ is the output of the model and $y$ is the true label.
  5. The device according to claim 4, wherein in module 1 the preprocessing includes data normalization, data division and modality conversion of the time series data.
  6. The training device for an unsupervised anomaly detection model for time series data according to claim 4, wherein in module 2 the detection model is trained specifically by obtaining two low-dimensional embeddings from the frequency domain data and the initial time domain data, respectively, and training the detection model on the two low-dimensional embeddings.
  7. An unsupervised anomaly detection method for time series data, characterized by comprising the following steps: step 3, training a detection model, wherein the training method is the training method for an unsupervised anomaly detection model for time series data according to claim 1; step 4, verifying the detection model, and performing step 5 if the verification result meets a preset requirement; and step 5, detecting anomalies in the time series data set using the detection model.
  8. An unsupervised anomaly detection device for time series data, the device comprising: module 3, used for training a detection model, wherein module 3 is the training device for an unsupervised anomaly detection model for time series data according to claim 4; module 4, used for verifying the detection model, wherein module 5 is executed if the verification result meets a preset requirement; and module 5, used for detecting anomalies in the time series data set using the detection model.
  9. A computer storage medium storing a computer program, wherein the computer program, when read by a computer, performs the training method for an unsupervised anomaly detection model for time series data according to any one of claims 1 to 3 or the unsupervised anomaly detection method for time series data according to claim 7.
  10. A computer comprising a processor and a storage medium storing a computer program, characterized in that, when the processor reads the computer program, the computer performs the training method for an unsupervised anomaly detection model for time series data according to any one of claims 1 to 3 or the unsupervised anomaly detection method for time series data according to claim 7.
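The two training signals described in claims 1 and 4 can be sketched in plain Python: a cosine distance used by the contrastive (upper) branch and the binary cross-entropy used by the lower branch. This is a sketch under assumptions, not the patented implementation. The claims only state that a cosine distance function is used; the margin-based form of `contrastive_loss`, its `margin` value, the use of a second unperturbed embedding as the positive, and averaging BCE over a batch are all hypothetical choices.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """1 - cosine similarity: 0 for identical directions, 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / max(norm_a * norm_b, eps)  # eps guards against zero vectors


def contrastive_loss(anchor: Sequence[float], positive: Sequence[float],
                     negative: Sequence[float], margin: float = 0.5) -> float:
    """Hypothetical margin loss for the upper branch: pull the embedding of an
    original window toward another unperturbed embedding, and push the embedding
    of its perturbed copy at least `margin` away in cosine distance."""
    pull = cosine_distance(anchor, positive)
    push = max(0.0, margin - cosine_distance(anchor, negative))
    return pull + push


def bce_loss(y_hat: Sequence[float], y: Sequence[float], eps: float = 1e-7) -> float:
    """Batch mean of -[y log(y_hat) + (1 - y) log(1 - y_hat)] for the lower branch."""
    total = 0.0
    for p, t in zip(y_hat, y):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() stays finite
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(y)


print(round(bce_loss([0.5], [1]), 4))  # 0.6931, i.e. ln 2
```

A prediction of 0.5 for a sample labeled 1 costs ln 2, and the loss falls as the output moves toward the true label. At inference time, the same `cosine_distance` between an embedding and a reference embedding could serve as an anomaly score; how the patent derives its score from the distance is not stated.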

Description

Technical Field
The invention relates to the technical field of time series analysis, in particular to anomaly detection in time series data.
Background
Over recent decades, the rapid development of information technology has continuously produced large amounts of time series data. Because these data reflect the functional status of many target systems, such as large data centers, cloud servers, spacecraft, and even human beings, the present invention can monitor the target systems and warn of potential faults, threats, and risks by identifying their abnormal states (i.e., anomalies). Anomaly detection, which is critical to achieving this goal, is an important area of data mining and analysis; its purpose is to find anomalous observations that differ significantly from the majority of the data. Because labeling is costly and difficult in these practical applications, time series anomaly detection is generally formulated as an unsupervised task with unlabeled training data. Without the guidance of supervision signals, unsupervised time series anomaly detection typically relies on learning the normality of the data through one-class classification. However, this learning process faces two key challenges: (1) the presence of unknown anomalies in the training set, and (2) the lack of knowledge about the anomalies of interest. In particular, the learning process may be biased by anomalies hidden in the training set (i.e., anomaly contamination), because the entire training set is typically fed into a one-class classification model on the direct assumption that all of its observations are normal. Anomaly contamination can severely interfere with learning and lead to serious overfitting.
Furthermore, if real anomalies are unknown, the learning process may find an inaccurate normality boundary, because the range of normal behavior is difficult to define in this case.
Disclosure of Invention
To address the problem of data set contamination in prior-art unsupervised time series anomaly detection, the invention provides the following technical solution: a training method for an unsupervised anomaly detection model for time series data, comprising the following steps: step 1, collecting a data set of univariate time series, preprocessing the time series data, and generating initial time domain data and frequency domain data; and step 2, training a detection model according to the frequency domain data and the initial time domain data. In a preferred embodiment, in step 1 the preprocessing includes data normalization, data division and modality conversion of the time series data. In a further preferred embodiment, in step 2 the detection model is trained specifically by obtaining two low-dimensional embeddings from the frequency domain data and the initial time domain data and training the detection model on the two low-dimensional embeddings.
Based on the same inventive concept, the invention also provides a training device for an unsupervised anomaly detection model for time series data, comprising: module 1, used for collecting a data set of univariate time series, preprocessing the time series data, and generating initial time domain data and frequency domain data; and module 2, used for training a detection model according to the frequency domain data and the initial time domain data.
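The preprocessing of step 1 (data normalization, data division, and modality conversion into time domain and frequency domain data) can be sketched as follows. The sketch assumes z-score normalization, fixed-size sliding windows with hypothetical `size` and `stride` parameters, and a one-sided DFT amplitude spectrum as the frequency domain modality; the description does not state which normalization, window length, or transform the invention actually uses.

```python
import cmath
import math
from typing import List, Tuple


def zscore(series: List[float]) -> List[float]:
    """Normalization (assumed z-score): zero mean, unit variance."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(var) or 1.0  # leave a constant series centered but unscaled
    return [(x - mean) / std for x in series]


def sliding_windows(series: List[float], size: int, stride: int) -> List[List[float]]:
    """Data division: cut the series into fixed-size, possibly overlapping windows."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, stride)]


def amplitude_spectrum(window: List[float]) -> List[float]:
    """Modality conversion: one-sided DFT magnitudes |X_k| for k = 0..n//2.
    A naive O(n^2) DFT keeps the sketch dependency-free; a real pipeline
    would use an FFT."""
    n = len(window)
    return [
        abs(sum(window[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def preprocess(series: List[float], size: int = 32,
               stride: int = 16) -> List[Tuple[List[float], List[float]]]:
    """Return (time domain window, frequency domain window) pairs."""
    windows = sliding_windows(zscore(series), size, stride)
    return [(w, amplitude_spectrum(w)) for w in windows]
```

For example, an alternating series `[1.0, -1.0] * 32` split with `size=8, stride=4` yields 15 windows, and each window's spectrum concentrates all of its energy in the highest frequency bin (`|X_4| = 8`). The two outputs per window correspond to the initial time domain data and the frequency domain data from which the two low-dimensional embeddings would be learned.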
In a preferred embodiment, in module 1 the preprocessing includes data normalization, data division and modality conversion of the time series data. In a further preferred embodiment, in module 2 the detection model is trained specifically by obtaining two low-dimensional embeddings from the frequency domain data and the initial time domain data and training the detection model on the two low-dimensional embeddings.
Based on the same inventive concept, the invention also provides an unsupervised anomaly detection method for time series data, comprising the following steps: step 3, training a detection model using the training method for an unsupervised anomaly detection model for time series data described above; step 4, verifying the detection model, and performing step 5 if the verification result meets a preset requirement; and step 5, detecting anomalies in the time series data set using the detection model.
Based on the same inventive concept, the invention also provides an unsupervised anomaly detection device for time series data, comprising: module 3, used for training a detection mode