CN-122020127-A - Time series anomaly detection method based on a hybrid expert framework


Abstract

The application provides a time series anomaly detection method based on a hybrid expert framework. Through cooperation between a shared expert and attention-gated vertical experts, the framework addresses the difficulty a single model has in adapting to dynamic, heterogeneous time series patterns. It balances computational efficiency against detection accuracy by dynamically adjusting the number of active experts according to resource load. With a two-stage transfer learning strategy (pre-training followed by fine-tuning), it adapts quickly to a new scenario using only a small amount of target-scenario data, avoiding the cost of modeling every scenario separately. A balance loss function keeps expert utilization balanced and prevents model degradation. The method is clearly superior to cluster-then-fine-tune and large-model approaches in cross-scenario generality, real-time performance, and stability.

Inventors

CHEN PENGFEI
HUANG WEIWEI
Tan Gou
ZHENG ZIBIN

Assignees

Sun Yat-sen University

Dates

Publication Date: 2026-05-12
Application Date: 2026-02-13

Claims (10)

1. A time series anomaly detection method based on a hybrid expert framework, the method comprising: acquiring an original time series and preprocessing it to obtain a target time series; inputting the target time series into the hybrid expert framework to determine an anomaly score corresponding to the target time series, wherein the hybrid expert framework comprises a first branch and a second branch, the first branch determines a shared score through a shared expert module, and the second branch selects a preset number of vertical expert modules through an attention gating network and determines the corresponding vertical scores through the selected vertical expert modules; and determining whether a target time point in the target time series is anomalous according to a comparison between the anomaly score and a preset anomaly threshold.
2. The method of claim 1, wherein the attention gating network is configured to: convert the target time series into a query projection through a first matrix, and determine an expert key corresponding to each vertical expert module; determine an attention score for each vertical expert module according to the query projection, the expert key, and the length of the target time series; and select a preset number of vertical expert modules according to the attention scores.
3. The method of claim 2, wherein determining the attention score of each vertical expert module according to the query projection, the expert key, and the length of the target time series comprises: calculating a first attention factor from the vector product of the query projection and the expert key together with the length of the target time series; and calculating the attention score of each vertical expert module from the first attention factor and a trained second attention factor corresponding to that module, wherein the second attention factor indicates the static preference of each vertical expert module for the target time series; and the method further comprises: normalizing the attention scores of the selected vertical expert modules to serve as their weight parameters, wherein each weight parameter is used to determine the vertical score of the corresponding vertical expert module.
4. The method according to claim 2, wherein the preset number is dynamically configured according to resource load, specifically comprising: monitoring in real time the load rate and memory occupancy rate of the processor hosting the vertical expert modules, and performing a dynamic configuration check once every preset period based on the processor load rate or the memory occupancy rate: when the processor load rate is higher than a first preset threshold, reducing the preset number; and when the memory occupancy rate is lower than a second preset threshold, increasing the preset number.
5. The method of any one of claims 1-4, wherein the training process of the hybrid expert framework comprises: a first stage of jointly updating the parameters of the shared expert module, the vertical expert modules, and the attention gating network on a source-scenario dataset; and a second stage of freezing the parameters of the shared expert module and updating the parameters of the vertical expert modules and the attention gating network on a target-scenario dataset corresponding to the target time series.
6. The method of claim 5, wherein the loss function of the training process comprises a shared expert module loss function, a vertical expert module loss function, and a balance loss function; the balance loss function indicates the overall importance of each vertical expert module within a preset time window and is calculated by summing, over the vertical expert modules, the product of each module's selection frequency and weight fraction within the preset time window, and scaling the sum by a hyperparameter coefficient.
7. A time series anomaly detection device based on a hybrid expert framework, the device comprising: an acquisition module configured to acquire an original time series and preprocess it to obtain a target time series; and a processing module configured to input the target time series into the hybrid expert framework and determine an anomaly score corresponding to the target time series, wherein the hybrid expert framework comprises a first branch and a second branch, the first branch determines a shared score through a shared expert module, and the second branch selects a preset number of vertical expert modules through an attention gating network and determines the corresponding vertical scores through the selected vertical expert modules; the processing module is further configured to determine whether a target time point in the target time series is anomalous according to a comparison between the anomaly score and a preset anomaly threshold.
8. The device of claim 7, wherein the attention gating network is configured to: convert the target time series into a query projection through a first matrix, and determine an expert key corresponding to each vertical expert module; determine an attention score for each vertical expert module according to the query projection, the expert key, and the length of the target time series; and select a preset number of vertical expert modules according to the attention scores.
9. A computer device comprising one or more processors and a memory storing computer-readable instructions which, when executed by the one or more processors, perform the steps of the method of any one of claims 1-6.
10. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method of any one of claims 1-6.
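Claims 4 and 6 describe the dynamic adjustment of the preset number of active experts and the balance loss only in words. The Python sketch below shows one possible reading of each and is not the applicant's implementation. The function names `adjust_top_k` and `balance_loss`, the threshold defaults, the step size of 1, the bounds `k_min`/`k_max`, the priority given to load reduction when both conditions hold, and the exact definitions of "selection frequency" (`f_i`) and "weight fraction" (`P_i`) are all hypothetical assumptions.

```python
from typing import Sequence


def adjust_top_k(top_k: int, cpu_load: float, mem_usage: float,
                 cpu_high: float = 0.8, mem_low: float = 0.5,
                 k_min: int = 1, k_max: int = 8) -> int:
    """One periodic configuration check for the number of active vertical experts.

    Thresholds, step size and bounds are hypothetical defaults. If both
    conditions hold, this sketch gives priority to reducing the load.
    """
    if cpu_load > cpu_high:      # processor load above the first preset threshold
        return max(k_min, top_k - 1)
    if mem_usage < mem_low:      # memory occupancy below the second preset threshold
        return min(k_max, top_k + 1)
    return top_k


def balance_loss(selections: Sequence[Sequence[int]],
                 weights: Sequence[Sequence[float]],
                 num_experts: int, alpha: float = 0.01) -> float:
    """alpha * sum_i f_i * P_i over one time window (assumed definitions).

    f_i: share of all selections in the window that went to expert i.
    P_i: mean normalized gate weight given to expert i per time step
         (counted as 0 on steps where expert i was not selected).
    """
    steps = len(selections)
    counts = [0.0] * num_experts
    weight_sums = [0.0] * num_experts
    for chosen, step_weights in zip(selections, weights):
        for i, w in zip(chosen, step_weights):
            counts[i] += 1.0
            weight_sums[i] += w
    total = sum(counts)
    freq = [c / total for c in counts]          # selection frequencies f_i
    frac = [s / steps for s in weight_sums]     # weight fractions P_i
    return alpha * sum(f * p for f, p in zip(freq, frac))


if __name__ == "__main__":
    print(adjust_top_k(4, cpu_load=0.9, mem_usage=0.3))               # 3
    print(balance_loss([[0], [1]], [[1.0], [1.0]], 2, alpha=1.0))     # 0.5
    print(balance_loss([[0], [0]], [[1.0], [1.0]], 2, alpha=1.0))     # 1.0
```

In this top-1 example, routing every step to a single expert gives a loss of `alpha`, and spreading selections evenly over N experts gives `alpha / N`, so minimizing the term pushes toward balanced expert use. The load-balancing loss of the Switch Transformer has the same product form but additionally multiplies by N and computes `P_i` from the full router probabilities rather than only the selected experts' weights. Whether claim 6 follows that variant is not stated.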

Description

Time series anomaly detection method based on a hybrid expert framework

Technical Field

The application relates to the field of artificial intelligence, and in particular to a time series anomaly detection method based on a hybrid expert framework.

Background

Time series analysis aims to mine the regularities of data over time (e.g., trends, periodicity, and abnormal fluctuations) in order to predict, monitor, or diagnose system conditions. Its core is building a mathematical model that relates historical data to future states. Time series anomaly detection can be framed as learning a function that maps an input sequence to a label sequence, so that in practice the anomalous values in the input sequence are identified from the values of the label sequence.

Real-world time series exhibit dynamic behavior and heterogeneous patterns, so a single conventional model struggles to adapt to all of them. Moreover, time series from different scenarios follow different distributions, so training a separate neural network model for each scenario greatly increases cost. In some possible implementations, cross-scenario anomaly detection through transfer learning uses either a clustering method or a large-model architecture. The clustering method groups time series by pattern and then fine-tunes a model for each cluster; the large-model approach is based on the Transformer architecture and relies on the generalization capability of a large model to pre-train and fine-tune across scenarios.
Fine-tuning a model per cluster improves adaptability but increases computational cost and is highly sensitive to clustering quality: misclassified time series degrade detection performance. Large-model architectures incur high computational cost, heavy memory consumption, and large parameter storage, making them unsuitable for resource-constrained or real-time monitoring systems. A time series anomaly detection method is therefore needed that incorporates the mechanism of a hybrid expert (mixture-of-experts) model as a general anomaly detection framework, addressing the insufficient stability, generality, and application efficiency of current models.

Disclosure of Invention

The present application aims to solve at least one of the above technical drawbacks, in particular the insufficient stability, generality, and application efficiency of prior-art time series anomaly detection frameworks.

In a first aspect, the present application provides a time series anomaly detection method based on a hybrid expert framework, the method comprising: acquiring an original time series and preprocessing it to obtain a target time series; inputting the target time series into the hybrid expert framework to determine an anomaly score corresponding to the target time series, wherein the hybrid expert framework comprises a first branch and a second branch, the first branch determines a shared score through a shared expert module, and the second branch selects a preset number of vertical expert modules through an attention gating network and determines the corresponding vertical scores through the selected vertical expert modules; and determining whether a target time point in the target time series is anomalous according to a comparison between the anomaly score and a preset anomaly threshold.
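The two-branch scoring of the first aspect, together with the attention gating described in claims 2 and 3, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the applicant's method. The experts are stand-in callables. The "first attention factor" is assumed to be the dot product of query and key divided by the square root of the series length. The "second attention factor" is assumed to be an additive per-expert bias. The shared and vertical scores are assumed to be summed, and a point is assumed anomalous when its score exceeds the threshold. All function names (`gate`, `anomaly_score`, `is_anomalous`) and the toy values are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical expert interface: maps a window of values to a scalar score.
Expert = Callable[[Sequence[float]], float]


def matvec(matrix: Sequence[Sequence[float]], vector: Sequence[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gate(window, w_query, expert_keys, expert_bias, top_k) -> Tuple[List[int], List[float]]:
    """Attention gating: pick top_k vertical experts and their normalized weights."""
    query = matvec(w_query, window)       # query projection via the "first matrix"
    scale = math.sqrt(len(window))        # assumption: scaled by sqrt(series length)
    scores = []
    for key, bias in zip(expert_keys, expert_bias):
        first_factor = sum(q * k for q, k in zip(query, key)) / scale
        scores.append(first_factor + bias)  # assumption: static preference as additive bias
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in chosen])  # normalize over selected experts only
    return chosen, weights


def anomaly_score(window, shared_expert: Expert, vertical_experts: Sequence[Expert],
                  w_query, expert_keys, expert_bias, top_k: int) -> float:
    """Shared score plus gate-weighted vertical scores (assumed: a plain sum)."""
    shared = shared_expert(window)
    chosen, weights = gate(window, w_query, expert_keys, expert_bias, top_k)
    vertical = sum(w * vertical_experts[i](window) for i, w in zip(chosen, weights))
    return shared + vertical


def is_anomalous(score: float, threshold: float) -> bool:
    """Flag the target time point when the score exceeds the preset threshold."""
    return score > threshold


window = [1.0, 2.0, 3.0, 4.0]
w_query = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]   # toy 2x4 "first matrix"
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]             # one key per vertical expert
bias = [0.0, 0.0, 0.0]
experts = [lambda w: 1.0, lambda w: 3.0, lambda w: 100.0]
shared = lambda w: abs(w[-1] - sum(w) / len(w))
score = anomaly_score(window, shared, experts, w_query, keys, bias, top_k=2)
print(gate(window, w_query, keys, bias, 2)[0])  # [1, 0]
```

Normalizing with a softmax over only the selected experts mirrors claim 3, where just the selected modules' scores become weight parameters. The unselected third expert, whose toy score is large, therefore has no effect on the result.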
As an alternative embodiment, the attention gating network is configured to: convert the target time series into a query projection through a first matrix, and determine an expert key corresponding to each vertical expert module; determine an attention score for each vertical expert module according to the query projection, the expert key, and the length of the target time series; and select a preset number of vertical expert modules according to the attention scores.

As an alternative embodiment, determining the attention score of each vertical expert module according to the query projection, the expert key, and the length of the target time series includes: calculating a first attention factor from the vector product of the query projection and the expert key together with the length of the target time series; and calculating the attention score of each vertical expert module from the first attention factor and a trained second attention factor corresponding to that module, wherein the second attention factor indicates the static preference of each vertical expert module for the target time series. The method further comprises: normalizing the attention score of each selected vertical expert module to