
CN-120579100-B - Meta-learning anomaly detection method and system under full life cycle of data


Abstract

The invention discloses a meta-learning anomaly detection method and system under the full life cycle of data. The method comprises: collecting original data streams at a preset sampling frequency to obtain an original data set with stage labels; performing null-value and zero-value cleaning on the original data set and outputting a coarse screening data set; dividing the coarse screening data set into a plurality of meta-learning tasks according to life cycle stages; performing outer-loop and inner-loop iterations task by task to obtain stage-independent initialization parameters; generating a stage-adaptive anomaly detection model; calculating anomaly scores for the unlabeled query sets and for the stream data to be tested; rechecking the results against a median deviation threshold; and outputting a final anomaly data set. By dividing meta-learning tasks according to full life cycle stages and using a gradient-based model-agnostic meta-learning framework, the invention addresses three problems of the full data life cycle: data characteristics differ markedly between stages, anomaly detection models generalize poorly, and models struggle to adapt to concept drift between stages.

Inventors

  • WU CHUNQING

Assignees

  • 国投智能(南京)信息科技有限公司

Dates

Publication Date
20260508
Application Date
20250520

Claims (9)

  1. A meta-learning anomaly detection method under the full life cycle of data, characterized by comprising the following steps: according to a preset sampling frequency, acquiring original data streams in each of the six stages of the data life cycle of a target information system (generation, transmission, storage, processing, archiving and destruction) to obtain an original data set D_0 with stage labels; performing null-value and zero-value cleaning on the original data set D_0, and performing out-of-limit filtering according to the adaptive upper and lower dynamic thresholds and the interquartile range criterion of each life cycle stage, to output a coarse screening data set D_1; dividing the coarse screening data set D_1 into a plurality of meta-learning tasks T_i according to life cycle stages, and extracting, for each task, a stage feature vector, a labeled support sample set S_i and an unlabeled query set Q_i; adopting a gradient-based model-agnostic meta-learning framework and performing outer-loop and inner-loop iterations with the task T_i as the unit, to obtain stage-independent initialization parameters; for a target-stage data stream collected in real time, starting from the stage-independent initialization parameters, performing K gradient-update steps on the labeled support sample set S_i to generate a stage-adaptive anomaly detection model, wherein generating the stage-adaptive anomaly detection model comprises: constructing a new support sample buffer S' from the target-stage data stream collected in real time; starting from the stage-independent initialization parameters, unfreezing only the weights of the density estimation module and performing n_1 steps of mini-batch gradient updates on the new support sample buffer S', with the inner-loop learning rate set as […]; after each update step, immediately evaluating the loss on a miniature validation set V' collected synchronously with the new support sample buffer S', and stopping the updates if the loss is continuously reduced for q times, the model parameters obtained when updating stops being denoted θ' and the corresponding model being the stage-adaptive anomaly detection model; calculating anomaly scores for the unlabeled query set Q_i and for the stream data to be tested by using the anomaly detection model, rechecking them against a median deviation threshold Z_1, and outputting a final anomaly data set A; and, when a stage distribution drift metric is detected to exceed a preset threshold Z_2, triggering an incremental meta-learning process for retraining.
  2. The meta-learning anomaly detection method under the full life cycle of data according to claim 1, wherein obtaining the original data set D_0 with stage labels comprises: deploying data acquisition agent points in each of the six stages of the target information system (generation, transmission, storage, processing, archiving and destruction); triggering sampling synchronously at every data acquisition agent point according to a unified sampling time reference, and writing a unique timestamp for each sampling batch; writing the sampled original data into a temporary buffer while attaching a stage label, a data source node ID, the sampling frequency and the sampling precision; after the buffer consistency check is passed, writing the annotated original data into the original data set D_0 in batches and recording the numbers of the written batches; wherein the original data set D_0 comprises at least a data timestamp, a data value, a data source node ID, the corresponding life cycle stage label, the sampling frequency and the sampling precision.
  3. The meta-learning anomaly detection method under the full life cycle of data according to claim 1 or 2, wherein outputting the coarse screening data set D_1 comprises: performing a field integrity check on the original data set D_0 and removing data rows with missing key fields; flagging fields detected as all-zero or empty strings as missing and moving them into an anomaly candidate table; calculating adaptive upper and lower thresholds from a reference value V_ref of the current life cycle stage, and detecting out-of-limit data based on the interquartile range criterion; and writing data samples that are within limits and have complete fields into the coarse screening data set D_1, while moving out-of-limit samples into the anomaly candidate table.
  4. The meta-learning anomaly detection method under the full life cycle of data according to claim 3, wherein dividing the coarse screening data set D_1 into a plurality of meta-learning tasks T_i according to life cycle stages comprises: setting a sliding window of fixed length r in each life cycle stage and sliding it with a step size r_1, the data in each window together with their stage label forming a candidate sub-data set; filtering the candidate sub-data sets against a data quantity threshold N_min and retaining each sub-data set whose data quantity meets the requirement as a task T_i; and writing a unique task number, a stage label and window time range metadata for each task T_i.
  5. The meta-learning anomaly detection method under the full life cycle of data according to claim 4, wherein extracting a stage feature vector, a labeled support sample set S_i and an unlabeled query set Q_i for each task comprises: forming the stage feature vector by concatenating statistical features, frequency-domain features and time-series features, wherein the statistical features comprise the mean, variance, skewness and kurtosis, the frequency-domain features are the amplitudes of the top three main peaks of the Fourier spectrum, and the time-series features comprise autocorrelation coefficients and a fluctuation rate; selecting the labeled support sample set S_i by stratified sampling so that it contains both normal samples and suspected abnormal samples, with a total sample number |S_i| ≤ m; and composing the unlabeled query set Q_i from the remaining data of the task window in time order, ensuring that Q_i and S_i do not overlap and together cover the complete window period.
  6. The meta-learning anomaly detection method under the full life cycle of data according to claim 1, wherein obtaining the stage-independent initialization parameters comprises: randomly drawing b tasks T_i from each life cycle stage in turn to form a group of training tasks; for each task T_i in the group, performing n gradient-update steps on the labeled support sample set S_i to obtain temporary model parameters, and calculating the loss value of the temporary model parameters on the corresponding unlabeled query set Q_i; averaging the loss values of all tasks and performing one meta-gradient update on the original model parameters with the outer-loop learning rate to obtain new model parameters; repeating these steps until the maximum number of iteration rounds M is reached; and fixing the finally converged model parameters as the stage-independent initialization parameters.
  7. The meta-learning anomaly detection method under the full life cycle of data according to claim 1, wherein outputting the final anomaly data set A comprises: calculating an anomaly score for each sample to be tested; flagging a sample as a preliminary anomaly according to the median deviation threshold Z_1; performing a stage consistency check on each preliminary anomaly and confirming it as a final anomaly when its anomaly score exceeds the threshold Z_2; and recording the anomaly type, occurrence stage and timestamp of each confirmed anomaly and storing them in the anomaly data set A.
  8. A meta-learning anomaly detection system under the full life cycle of data, comprising: one or more processors; and a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising the flow of the meta-learning anomaly detection method under the full life cycle of data according to any one of claims 1 to 7.
  9. A computer-readable medium storing software, wherein the software comprises instructions executable by one or more computers, the instructions causing the one or more computers to perform operations comprising the flow of the meta-learning anomaly detection method under the full life cycle of data according to any one of claims 1 to 7.
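
The coarse screening of claim 3 can be illustrated with a minimal Python sketch for a single life cycle stage. The patent does not give the formula for the adaptive limits, so this sketch assumes a relative band of ±delta around V_ref and intersects it with Tukey's interquartile-range fences (multiplier k). The function name coarse_screen and the parameters delta and k are hypothetical.

```python
import numpy as np

def coarse_screen(values, v_ref, delta=0.5, k=1.5):
    """Hypothetical sketch of claim 3 for one life cycle stage.

    values: raw readings (None or "" = missing); v_ref: stage reference value V_ref.
    delta (assumed): relative band around V_ref; k (assumed): Tukey fence multiplier.
    Returns (kept, candidates) as lists of (index, value).
    """
    kept, candidates, clean = [], [], []
    for i, v in enumerate(values):
        # null / zero cleaning: missing, empty or zero readings go to the candidate table
        if v is None or v == "" or v == 0:
            candidates.append((i, v))
        else:
            clean.append((i, float(v)))
    if not clean:
        return kept, candidates
    x = np.array([v for _, v in clean])
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    # assumed adaptive limits: tighter of the V_ref band and the IQR fences
    lower = max(v_ref * (1 - delta), q1 - k * iqr)
    upper = min(v_ref * (1 + delta), q3 + k * iqr)
    for i, v in clean:
        (kept if lower <= v <= upper else candidates).append((i, v))
    return kept, candidates
```

For example, coarse_screen([10, 11, None, 0, 9, 10.5, 50, 10.2, ""], v_ref=10) keeps indices 0, 1, 4, 5 and 7 and moves the missing, zero, empty and out-of-limit (50) readings to the candidate list. Intersecting the two ranges means a sample must satisfy both the stage reference band and the IQR criterion, which is one plausible reading of the claim.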
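
The sliding-window task division of claim 4 can be sketched as follows. This sketch assumes that the window length r and the step r_1 are measured in time units and that each window is the half-open interval [t, t + r). With time-based windows, gaps in the data make the N_min filter meaningful. The Task record and the task-number format "<stage>-NNNN" are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str    # hypothetical unique task number format
    stage: str      # life cycle stage label
    t_start: float  # window time range metadata
    t_end: float
    samples: list

def split_tasks(records, stage, r, r1, n_min):
    """records: time-sorted (timestamp, value) pairs from one stage of D_1."""
    tasks = []
    if not records:
        return tasks
    start, last, k = records[0][0], records[-1][0], 0
    while start <= last:
        end = start + r
        window = [(t, v) for t, v in records if start <= t < end]
        # keep only candidate sub-data sets with at least N_min samples
        if len(window) >= n_min:
            tasks.append(Task(f"{stage}-{k:04d}", stage, start, end, window))
            k += 1
        start += r1
    return tasks
```

For instance, ten records at times 0 to 9 with r=4, r_1=2 and N_min=3 yield four tasks. The last candidate window [8, 12) holds only two samples and is discarded.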
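
The stage feature vector of claim 5 can be sketched with NumPy. Several details here are assumptions, not the patent's definitions. Kurtosis is reported as excess kurtosis. The "top three main peaks" are approximated by the three largest non-DC magnitudes of the one-sided FFT rather than by true local maxima. Autocorrelation is taken at a single lag. The fluctuation rate is taken as the mean absolute first difference divided by the standard deviation.

```python
import numpy as np

def stage_feature_vector(x, lag=1, eps=1e-12):
    """Hypothetical 9-element feature vector: 4 statistical + 3 spectral + 2 temporal."""
    x = np.asarray(x, dtype=float)
    mu, var = x.mean(), x.var()
    sd = np.sqrt(var)
    z = (x - mu) / sd if sd > 0 else np.zeros_like(x)
    skew = np.mean(z ** 3)
    kurt = np.mean(z ** 4) - 3.0  # excess kurtosis (assumed convention)
    # frequency domain: three largest non-DC amplitudes of the one-sided spectrum
    amp = np.abs(np.fft.rfft(x - mu))
    top3 = np.sort(amp[1:])[::-1][:3]
    top3 = np.pad(top3, (0, 3 - len(top3)))  # zero-pad very short windows
    # time series: lag autocorrelation (biased estimator) and assumed fluctuation rate
    acf = np.sum(z[:-lag] * z[lag:]) / len(x) if len(x) > lag else 0.0
    fluct = np.mean(np.abs(np.diff(x))) / (sd + eps) if len(x) > 1 else 0.0
    return np.concatenate([[mu, var, skew, kurt], top3, [acf, fluct]])
```

For a unit sine with 5 full periods over 64 samples, the variance is 0.5 and the largest spectral amplitude is 32 (N/2). This is a quick sanity check of the frequency-domain part.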
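
The meta-training of claim 6 and the online stage adaptation of claim 1 can be illustrated together. The patent does not specify the density estimation module, its loss, or how the meta-gradient is computed, so this sketch swaps in the following:

- A one-dimensional Gaussian density with parameters [mu, log_sigma], trained by negative log-likelihood. This loss needs no labels, so it can be evaluated on the unlabeled Q_i.
- The first-order MAML approximation, which drops the second-order terms of full MAML.
- Sampling of b tasks per round instead of b tasks per stage.
- Full-batch rather than mini-batch updates on S'.
- A reading of the claim's stopping rule ("continuously reduced for q times") as the usual patience rule: stop when the validation loss has not improved for q consecutive steps. This is an interpretation of ambiguous translated wording.

All function names are hypothetical.

```python
import numpy as np

def nll_and_grad(theta, x):
    """Mean Gaussian NLL (constant dropped) and its gradient; theta = [mu, log_sigma]."""
    mu, log_sigma = theta
    inv_var = np.exp(-2.0 * log_sigma)
    d = x - mu
    m2 = np.mean(d ** 2)
    loss = log_sigma + 0.5 * m2 * inv_var
    grad = np.array([-np.mean(d) * inv_var, 1.0 - m2 * inv_var])
    return loss, grad

def inner_adapt(theta, support, lr, steps):
    """Inner loop: plain gradient descent on one task's support set."""
    theta = np.array(theta, dtype=float)
    for _ in range(steps):
        theta = theta - lr * nll_and_grad(theta, support)[1]
    return theta

def meta_train(tasks, theta0, alpha=0.1, beta=0.05, n=5, b=2, rounds=200, seed=0):
    """Outer loop (first-order MAML). tasks: list of (support, query) arrays."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    for _ in range(rounds):
        idx = rng.choice(len(tasks), size=min(b, len(tasks)), replace=False)
        meta_grad = np.zeros_like(theta)
        for i in idx:
            support, query = tasks[i]
            phi = inner_adapt(theta, support, alpha, n)  # temporary task parameters
            meta_grad += nll_and_grad(phi, query)[1]     # query-set gradient at phi
        theta = theta - beta * meta_grad / len(idx)      # averaged meta-update
    return theta  # stage-independent initialization

def adapt_online(theta_init, buffer, val, lr=0.1, n1=50, q=3):
    """Adapt on buffer S' for at most n1 steps; stop after q steps without improvement on V'."""
    theta = np.array(theta_init, dtype=float)
    best, wait = nll_and_grad(theta, val)[0], 0
    for _ in range(n1):
        theta = theta - lr * nll_and_grad(theta, buffer)[1]
        loss = nll_and_grad(theta, val)[0]
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= q:
                break
    return theta  # plays the role of theta' in claim 1
```

A natural anomaly score under this toy model is a sample's own negative log-likelihood at θ'. Because every step starts from the meta-learned initialization rather than from scratch, the few adaptation steps on S' are what makes the detector stage-specific.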
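
The two-step recheck of claim 7 and the drift trigger of claim 1 can be sketched as below. This sketch makes four assumptions:

- The median deviation threshold Z_1 is applied to the robust z-score |s - median| / (1.4826 · MAD).
- The "stage consistency check" is taken to mean that the sample's stage label matches the target stage.
- The two-sample Kolmogorov-Smirnov statistic stands in for the unspecified stage distribution drift metric.
- The sample dictionary layout and the "score_outlier" type label are hypothetical.

```python
import numpy as np

def recheck(samples, target_stage, z1=3.5, z2=5.0):
    """samples: dicts with 'score', 'stage', 'ts'. Returns confirmed anomaly records."""
    s = np.array([x["score"] for x in samples], dtype=float)
    med = np.median(s)
    mad = np.median(np.abs(s - med))
    robust_z = np.abs(s - med) / (1.4826 * mad + 1e-12)
    confirmed = []
    for x, rz in zip(samples, robust_z):
        # preliminary screen on Z_1, then stage check and raw-score threshold Z_2
        if rz > z1 and x["stage"] == target_stage and x["score"] > z2:
            confirmed.append({"type": "score_outlier", "stage": x["stage"], "ts": x["ts"]})
    return confirmed

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic, a stand-in drift metric."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

Retraining would then be triggered by a check such as ks_statistic(reference, current) > drift_threshold. The claims use the symbol Z_2 both for this drift threshold (claim 1) and for the anomaly score threshold (claim 7), so the sketch keeps the two as separate parameters. The median and MAD are used instead of the mean and standard deviation because a few large anomaly scores barely move them.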

Description

Technical Field

The invention relates to the technical field of data processing and anomaly detection, and in particular to a meta-learning anomaly detection method and system under the full life cycle of data.

Background

With the rapid development of big data, artificial intelligence and information system technologies, anomaly detection at each stage of the data life cycle has become an important means of guaranteeing data quality and the stable operation of systems. In recent years, meta-learning, a machine learning approach that enables efficient few-shot learning and fast adaptation to new tasks, has attracted wide attention and research in the field of anomaly detection. Existing meta-learning anomaly detection methods mostly use static or fixed model parameters. In an actual production environment, when the operating environment of the system or the data characteristics undergo concept drift, the detection performance of such models drops markedly and high detection accuracy is hard to maintain over long periods. Moreover, existing anomaly detection for multi-stage data life cycle scenarios lacks a unified stage-adaptive technical framework: a model usually has to be designed separately for each stage, which leads to high development cost and slow response, and prevents existing knowledge and parameters from being shared effectively between stages.
CN114239712A discloses an anomaly detection method based on a heterogeneous information network meta-learning architecture, which uses graph neural networks to capture, respectively, the structural information, heterogeneous characteristics and unlabeled information of heterogeneous information networks, thereby achieving effective transfer between different networks. CN119939445A discloses a few-shot anomaly detection method based on meta-learning and multimodal large models, which builds a semantic-anomaly network from vision and language encoders to improve the generalization and robustness of cross-class anomaly detection and to raise detection precision. These methods are effective in their specific fields, but they mainly target anomaly detection in static data scenarios or fixed-class tasks and do not fully account for the differing characteristics and dynamic changes of data at each stage of the full life cycle. They therefore struggle with rapid self-adaptation when data characteristics differ markedly between stages of the data life cycle, which limits their effectiveness and generalization in complex information system scenarios. An anomaly detection scheme is therefore needed that accounts for the differences in data characteristics at each stage of the full life cycle and adjusts rapidly and adaptively when those characteristics drift, so as to meet the urgent requirements for real-time, robust and high-precision anomaly detection in complex information systems.
In summary, existing anomaly detection technology suffers from poor generalization across stages, slow response to concept drift, and difficulty in effectively reusing knowledge across life cycle stages. The invention provides a meta-learning anomaly detection method under the full life cycle of data which, through meta-learning task division by full life cycle stage, a gradient-based model-agnostic meta-learning framework and a stage-adaptive online updating mechanism, effectively solves the problems that data characteristics differ markedly between stages of the full data life cycle, that anomaly detection models generalize insufficiently, and that concept drift between stages is difficult to adapt to.

Disclosure of Invention

This section is intended to summarize some aspects of the embodiments of the application and to briefly introduce some preferred embodiments. Simplifications or omissions may be made in this section, as well as in the description abstract and the title of the application, to avoid obscuring the purpose of this section, the description abstract and the title; such simplifications or omissions are not intended to limit the scope of the application.

The present invention has been made in view of the above problems in the prior art.

To solve the above technical problems, the invention provides the following technical scheme: according to a preset sampling frequency, collecting original data streams in each of the six stages of the data life cycle of a target information system (generation, transmission, storage, processing, archiving and destruction) to obtain an original data set D_0 with stage labels; performing null-value and zero-value cleaning on the original data set D_0, and performing out-of-limit filtering according to the adaptive upper and lower limit dynamic thresholds and the quartile