CN-121997024-A - Cross-domain log anomaly detection method based on contrast learning

Abstract

The invention belongs to the technical field of operation and maintenance of the electric power Internet of Things, and in particular relates to a cross-domain log anomaly detection method based on contrast learning. It aims to solve the problems that log distributions differ greatly and anomaly patterns are difficult to transfer across different systems and operating environments. The method comprises: performing field extraction, template parsing and sequence construction on original log data to obtain a log sequence data set; generating positive samples from the log sequence data set with a two-stage enhancement strategy, and constructing negative samples by combining cross-domain log sequences and abnormal log sequences, to form positive and negative sample pairs; inputting the positive and negative sample pairs into an encoder and pre-training the encoder with a joint loss function that is a weighted sum of a contrastive learning loss and a classification learning loss, to obtain a general representation; freezing the backbone parameters of the pre-trained encoder and introducing a condition-aware adapter into the encoder layers; performing target domain adaptation optimization on the general representation by updating the adapter and related gating parameters through classification learning; and outputting a log anomaly detection result.

Inventors

  • YANG ZHAOFEI
  • LI RONGSHENG
  • WU YAO
  • XU CHENGYU
  • SUN HAICHUAN
  • LI JIACHAO
  • MEN LIYAN
  • GAO WEI
  • JING FENG
  • LI WEIBO
  • HUANG DACHENG
  • WU HANWEI
  • HE JING
  • ZHOU XIN
  • DUAN XIAORONG

Assignees

  • 国网山西省电力有限公司信息通信分公司 (State Grid Shanxi Electric Power Company, Information and Communication Branch)

Dates

Publication Date
20260508
Application Date
20260127

Claims (10)

  1. The cross-domain log anomaly detection method based on contrast learning is characterized by comprising the following steps: S1, performing field extraction, template parsing and sequence construction on original log data to obtain a log sequence data set, wherein the log sequence data set comprises a plurality of sequence sets formed by standardized log templates arranged in time order; S2, generating positive samples from the log sequence data set by adopting a two-stage enhancement strategy, and constructing negative samples by combining cross-domain log sequences and abnormal log sequences, to form positive and negative sample pairs, wherein the two-stage enhancement strategy is a feature enhancement mode that applies entry-level fine-grained perturbation and sequence-level structural perturbation, respectively, to a log template sequence; S3, inputting the positive and negative sample pairs into an encoder, and pre-training the encoder by adopting a joint loss function that is a weighted sum of a contrastive learning loss and a classification learning loss, to obtain a general representation with cross-system robust semantic and temporal discrimination capability; and S4, freezing the backbone parameters of the pre-trained encoder, introducing a condition-aware adapter into the encoder layers, performing target domain adaptation optimization on the general representation by updating the adapter and related gating parameters through classification learning, and outputting a log anomaly detection result, wherein the condition-aware adapter is a parameter-efficient adaptation component that generates a condition vector based on statistical features of target domain log windows in order to compensate for the target domain feature shift.
  2. The cross-domain log anomaly detection method based on contrast learning according to claim 1, characterized in that step S1 comprises: splitting each original log into its core fields of timestamp, level, component and content, and filtering out redundant information to obtain a log core content set; replacing dynamic parameters in the log core content with uniform placeholders by means of a log parsing algorithm to generate standardized log templates; and combining the standardized log templates in time order into log template sequences, using a window strategy corresponding to the log type, to form the log sequence data set.
  3. The cross-domain log anomaly detection method based on contrast learning according to claim 1, wherein the process of generating positive samples by the two-stage enhancement strategy in step S2 includes entry-level enhancement and sequence-level enhancement, wherein the entry-level enhancement applies fine-grained perturbation to individual log templates in the log sequence data set, the sequence-level enhancement applies structural perturbation to the sequence that has undergone entry-level enhancement, and the sequence that has undergone sequence-level enhancement is used as the positive sample.
  4. The cross-domain log anomaly detection method based on contrast learning according to claim 1, wherein the process of constructing the negative samples in step S2 includes selecting cross-domain log sequences whose source differs from that of the positive samples and log sequences labeled as the abnormal category, and combining the two types of sequences into a negative sample set.
  5. The cross-domain log anomaly detection method based on contrast learning according to claim 1, wherein in the joint loss function in step S3, the contrastive learning loss is used for clustering the positive sample representations and separating them from the negative sample representations, the classification learning loss is used for adapting to the binary classification task of log anomaly detection, and the contributions of the two losses are balanced through weight coefficients.
  6. The cross-domain log anomaly detection method based on contrast learning according to claim 1, characterized in that the encoder in step S3 is a multi-layer Transformer encoder, and the process of inputting positive and negative sample pairs into the encoder comprises model input construction, embedding layer encoding, position encoding, and attention enhancement and feature aggregation, wherein: the model input construction concatenates the log templates in the positive and negative sample sequences in time order, inserts a preset separator between adjacent log templates, splits the concatenated sequence into Tokens, maps the Tokens to a Token ID sequence according to a preset vocabulary, a Token ID being the unique numeric identifier of a Token in the preset vocabulary, truncates or pads the Token ID sequence to form a fixed-length input sequence, and generates an attention mask identifying the valid Token positions; the embedding layer encoding maps the fixed-length Token ID input sequence to an initial feature representation in continuous vector form; the position encoding introduces a position encoding mechanism so that the encoder perceives the position information of each element in the log sequence; and the attention enhancement and feature aggregation capture, by linear mapping, the dependency relations between elements of the sequence, and aggregate the resulting sequence-level features into a global feature representation.
  7. The cross-domain log anomaly detection method based on contrast learning according to claim 1, characterized in that in step S4 the target domain adaptation specifically comprises constructing a target domain condition vector, inserting condition-aware adapters, and updating adapter parameters, wherein: constructing the target domain condition vector extracts statistical features from target domain log window sequences, the statistical features comprising template frequency, template diversity ratio, template repetitiveness and template distribution entropy; inserting the condition-aware adapters places an adapter after the output of the attention sub-layer and after the output of the feed-forward network sub-layer of each Transformer encoder layer; and updating the adapter parameters freezes the parameters of the embedding layer, attention layers and feed-forward network layers of the encoder backbone and updates only the adapter and gating-related parameters.
  8. The cross-domain log anomaly detection method based on contrast learning according to claim 7, characterized in that the processing performed by the condition-aware adapter comprises: compressing the output features of an encoder sub-layer into a low-dimensional bottleneck space; applying a nonlinear activation to the low-dimensional features; generating gating coefficients based on the target domain condition vector and using them to modulate the bottleneck features channel by channel; mapping the modulated bottleneck features back to the original feature dimension to generate a feature compensation term; and fusing the feature compensation term with the output features of the encoder sub-layer in residual form.
  9. The cross-domain log anomaly detection method based on contrast learning according to claim 6, wherein the feature aggregation converts sequence-level features into a sample-level global feature representation by average pooling, first-element feature extraction, or weighted pooling.
  10. The cross-domain log anomaly detection method based on contrast learning according to claim 1, wherein the process of outputting the anomaly detection result in step S4 includes inputting the adapted and optimized feature representation into a classification head, mapping the feature representation to a probability distribution over the normal and abnormal classes by an activation function, and determining the anomaly state of a log sample according to a preset probability threshold.
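The preprocessing of step S1 (claim 2) can be illustrated with a minimal Python sketch. The log line layout, the regular expressions for dynamic parameters, the `<*>` placeholder and the fixed-size sliding window are all assumptions made for illustration; the claims name neither a specific log parsing algorithm nor a specific window strategy.

```python
import re

# Hypothetical log layout (an assumption): "<date> <time> <LEVEL> <component>: <content>"
LINE_RE = re.compile(r"^(\S+ \S+) (\w+) ([\w.\-]+): (.*)$")

# Assumed dynamic-parameter patterns: IPv4 addresses, hex values, plain integers.
PARAM_RE = re.compile(r"\b(?:\d+\.\d+\.\d+\.\d+|0x[0-9a-fA-F]+|\d+)\b")
PLACEHOLDER = "<*>"


def to_template(content):
    """Replace dynamic parameters in the log content with a uniform placeholder."""
    return PARAM_RE.sub(PLACEHOLDER, content)


def parse_line(line):
    """Split one raw log line into core fields; return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # treated here as redundant information and filtered out
    timestamp, level, component, content = match.groups()
    return {
        "timestamp": timestamp,
        "level": level,
        "component": component,
        "template": to_template(content),
    }


def sliding_windows(templates, size, step):
    """Group time-ordered templates into fixed-size windows (one possible strategy)."""
    return [templates[i:i + size] for i in range(0, len(templates) - size + 1, step)]


def build_sequences(lines, size=20, step=10):
    """Raw log lines -> list of log template sequences."""
    records = [r for r in (parse_line(line) for line in lines) if r is not None]
    templates = [r["template"] for r in records]
    return sliding_windows(templates, size, step)
```

A session-based window (grouping by a block or session identifier) would be a drop-in alternative to `sliding_windows` for log types that carry such an identifier.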
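The two-stage enhancement of claim 3 can be sketched as follows. The claims only state that the entry-level stage perturbs individual templates and the sequence-level stage perturbs sequence structure; the concrete operations chosen here (random token masking, swapping one adjacent pair, random entry dropping) and all parameter names are illustrative assumptions.

```python
import random


def entry_level_augment(template, rng, mask_prob=0.15, mask_token="[MASK]"):
    """Fine-grained perturbation of one template: randomly mask tokens (assumed operation)."""
    tokens = template.split()
    return " ".join(mask_token if rng.random() < mask_prob else tok for tok in tokens)


def sequence_level_augment(sequence, rng, drop_prob=0.1):
    """Structural perturbation: swap one adjacent pair, then randomly drop entries."""
    seq = list(sequence)
    if len(seq) >= 2:
        i = rng.randrange(len(seq) - 1)
        seq[i], seq[i + 1] = seq[i + 1], seq[i]
    kept = [tpl for tpl in seq if rng.random() >= drop_prob]
    return kept if kept else seq  # never return an empty sequence


def make_positive(sequence, rng):
    """Stage 1 on every entry, then stage 2 on the result; the output is the positive sample."""
    stage_one = [entry_level_augment(tpl, rng) for tpl in sequence]
    return sequence_level_augment(stage_one, rng)
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible, which helps when comparing pre-training runs.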
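The input construction of claim 6 and the average-pooling option of claim 9 can be sketched in plain Python. Whitespace tokenization, the special token names `[SEP]`, `[PAD]` and `[UNK]`, and the helper names are assumptions for illustration; the claims only require a preset separator, a preset vocabulary, truncation or padding, and an attention mask.

```python
def build_input(templates, vocab, max_len, sep="[SEP]", pad="[PAD]", unk="[UNK]"):
    """Concatenate templates with separators and return (token_ids, attention_mask)."""
    tokens = []
    for i, template in enumerate(templates):
        if i > 0:
            tokens.append(sep)  # preset separator between adjacent templates
        tokens.extend(template.split())  # whitespace tokenization (assumption)
    ids = [vocab.get(tok, vocab[unk]) for tok in tokens][:max_len]  # map, then truncate
    mask = [1] * len(ids)  # 1 marks a valid Token position
    padding = max_len - len(ids)
    return ids + [vocab[pad]] * padding, mask + [0] * padding


def masked_mean_pool(features, mask):
    """Average the per-token feature vectors at valid positions only."""
    valid = [vec for vec, m in zip(features, mask) if m]
    if not valid:
        raise ValueError("attention mask marks no valid positions")
    dim = len(valid[0])
    return [sum(vec[j] for vec in valid) / len(valid) for j in range(dim)]
```

Pooling over valid positions only keeps padded positions from diluting the sample-level representation; first-element extraction would simply return `features[0]`.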
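The joint loss of claim 5 can be illustrated with a small pure-Python sketch. The claims do not fix the form of the contrastive loss; an InfoNCE-style loss over cosine similarities is assumed here, and the temperature and the weight names `w_con` and `w_cls` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (assumed form of the contrastive loss)."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]  # equals -log softmax probability of the positive


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy for the normal/abnormal classification task."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def joint_loss(l_con, l_cls, w_con=1.0, w_cls=1.0):
    """Weighted sum of the contrastive and classification losses."""
    return w_con * l_con + w_cls * l_cls
```

The contrastive term is small when the anchor is close to its positive and far from cross-domain or abnormal negatives; the weights control how strongly that geometry is traded off against classification accuracy.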
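The target domain condition vector of claim 7 and the adapter processing of claim 8 can be sketched for a single feature vector. The precise definitions of the four statistics are not given in the claims, so the ones below (share of the most frequent template, unique-template ratio, share of adjacent repeats, Shannon entropy) are assumptions, as are the ReLU activation, the sigmoid gate and the weight matrix names.

```python
import math
from collections import Counter


def condition_vector(window):
    """Four window statistics (assumed definitions) for a non-empty template window."""
    if not window:
        raise ValueError("window must contain at least one template")
    n = len(window)
    probs = [count / n for count in Counter(window).values()]
    top_frequency = max(probs)                 # template frequency
    diversity = len(probs) / n                 # template diversity ratio
    repeats = sum(1 for a, b in zip(window, window[1:]) if a == b) / max(n - 1, 1)
    entropy = -sum(p * math.log(p) for p in probs)  # template distribution entropy
    return [top_frequency, diversity, repeats, entropy]


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def adapter_forward(h, cond, w_down, w_up, w_gate):
    """Bottleneck adapter with condition-driven gating and a residual connection."""
    z = matvec(w_down, h)                                   # compress d -> r
    a = [max(0.0, v) for v in z]                            # nonlinear activation (ReLU assumed)
    g = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w_gate, cond)]  # gating coefficients
    m = [gi * ai for gi, ai in zip(g, a)]                   # channel-by-channel modulation
    delta = matvec(w_up, m)                                 # expand r -> d: compensation term
    return [hi + di for hi, di in zip(h, delta)]            # residual fusion
```

During target domain adaptation only `w_down`, `w_up` and `w_gate` would be trained, while the encoder backbone stays frozen; initializing `w_up` to zeros makes the adapter start as an identity mapping.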

Description

Cross-domain log anomaly detection method based on contrast learning

Technical Field

The invention relates to the technical field of operation and maintenance of the electric power Internet of Things, and in particular to a cross-domain log anomaly detection method based on contrast learning.

Background

With the continued advancement of smart grid construction, the power grid has become a complex heterogeneous network covering many types of equipment, such as substation automation equipment, dispatching control systems, power distribution terminals and new energy grid-connection devices. Logs are the core carrier for recording the running state, operation behavior and fault events of power grid equipment, and are characterized by strong temporal dependence, high domain-specific heterogeneity and severe consequences when anomalies are missed. They contain not only normal operation records such as equipment start-stop and parameter adjustment, but also hidden anomaly information such as equipment faults, communication interruptions and malicious intrusions. If an anomaly is not detected in time, it can cause serious safety accidents such as large-area power outages and equipment damage. Log-based anomaly detection has therefore become a core technical support for the safe and stable operation of the smart grid.

Existing log anomaly detection and adaptation techniques still have two core shortcomings and struggle to meet the high-reliability operation and maintenance requirements of complex systems.
On the one hand, cross-domain adaptation capability is lacking and generalization under domain shift is poor. The traditional full fine-tuning scheme must update all parameters of a pre-trained Transformer model, relies on massive annotated data, incurs high computational cost and deployment difficulty, and cannot adapt to low-resource cross-domain scenarios. Existing parameter-efficient fine-tuning and lightweight adaptation schemes do not incorporate log semantics and temporal dependency characteristics and use only general-domain logic, so they struggle to accurately capture cross-system log semantic differences, and manually designed prompts are constrained by professional domain knowledge and are insufficient.

On the other hand, semantic and temporal features are insufficiently captured and efficient feature enhancement means are lacking. Traditional methods capture only shallow temporal features, cannot mine the deep semantic associations of logs, and are weak at recognizing complex anomalies such as multi-step cascading faults. Existing Transformer-based methods improve semantic capture but do not introduce efficient feature enhancement strategies such as contrastive learning, so subtle differences between similar log sequences are hard to distinguish. Enhancement schemes targeting entry-level and sequence-level log features are lacking, feature learning is insufficient in few-sample scenarios, and detection accuracy and reliability are limited.

Disclosure of Invention

The invention aims to provide a cross-domain log anomaly detection method based on contrast learning, so as to solve the problems that log distributions differ greatly and anomaly patterns are difficult to transfer across different systems and operating environments.
To achieve this purpose, the invention adopts the following technical solution: a cross-domain log anomaly detection method based on contrast learning. The method comprises the following steps. S1, performing field extraction, template parsing and sequence construction on original log data to obtain a log sequence data set, wherein the log sequence data set comprises a plurality of sequence sets formed by standardized log templates arranged in time order. S2, generating positive samples from the log sequence data set by adopting a two-stage enhancement strategy, and constructing negative samples by combining cross-domain log sequences and abnormal log sequences, to form positive and negative sample pairs, wherein the two-stage enhancement strategy is a feature enhancement mode that applies entry-level fine-grained perturbation and sequence-level structural perturbation, respectively, to a log template sequence. S3, inputting the positive and negative sample pairs into an encoder, and pre-training the encoder by adopting a joint loss function that is a weighted sum of a contrastive learning loss and a classification learning loss, to obtain a general representation with cross-system robust semantic and temporal discrimination capability. S4, freezing the backbone parameters of the pre-trained encoder, introducing a condition-aware adapter into the encoder layers, performing target domain adaptation optimization on