
CN-121996796-A - Lightweight log semantic enhancement and anomaly detection method adapting to electric power Internet of things


Abstract

The invention belongs to the technical field of operation and maintenance of the electric power Internet of things, and in particular relates to a lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things. It aims to solve the problems of inaccurate semantic representation and insufficient adaptation to complex scenarios in prior-art log anomaly detection. The method comprises: preprocessing original logs of the electric power Internet of things to obtain structured logs and log templates; performing semantic embedding on the log templates and on the structured logs, respectively, to generate a first semantic vector and a second semantic vector; fusing the first and second semantic vectors through cross-semantic attention fusion to obtain a fused semantic vector sequence; and inputting the fused semantic vector sequence into a teacher-student distillation anomaly detection model to obtain a log anomaly detection result. The teacher-student distillation anomaly detection model consists of a complex teacher model and a lightweight student model, and anomaly detection is achieved through knowledge distillation.

Inventors

  • LI WEIBO
  • JING FENG
  • XU CHENGYU
  • HUANG DACHENG
  • SUN HAICHUAN
  • LI JIACHAO
  • Men Liyan
  • YANG ZHAOFEI
  • WU HANWEI
  • DUAN XIAORONG
  • ZHOU XIN
  • HE JING
  • LI RONGSHENG
  • WU YAO
  • GAO WEI

Assignees

  • State Grid Shanxi Electric Power Company, Information and Communication Branch (国网山西省电力有限公司信息通信分公司)

Dates

Publication Date
2026-05-08
Application Date
2026-01-27

Claims (10)

  1. The lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things, characterized by comprising the following steps: S1, preprocessing an original log of the electric power Internet of things to obtain a structured log and a log template, wherein the log template is a standardized expression formed by the repeated static text in the log, and the structured log is structured data comprising the template and variable descriptions; S2, performing semantic embedding on the log template and on the structured log obtained in step S1, respectively, to generate a first semantic vector and a second semantic vector, wherein a semantic vector is a machine-readable numerical vector into which text semantics are converted; S3, fusing the first semantic vector and the second semantic vector through cross-semantic attention fusion to obtain a fused semantic vector sequence, wherein cross-semantic attention fusion aligns and fuses the structural semantics of the log template with business semantics through an attention mechanism and strengthens the expression of key features; and S4, inputting the fused semantic vector sequence into a teacher-student distillation anomaly detection model to obtain a log anomaly detection result, wherein the teacher-student distillation anomaly detection model consists of a complex teacher model and a lightweight student model, and anomaly detection is achieved through knowledge distillation.
  2. The lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things according to claim 1, wherein the preprocessing in step S1 comprises extracting, for each log message in the original log, the corresponding log template and variable part, expressed as $(T_i, V_i) = P(L_i)$, where $L_i$ is the i-th original log, $P(\cdot)$ is the log parsing function, $T_i$ is the log template corresponding to the i-th original log, and $V_i$ is its variable part, which comprises at least the numbers, addresses and identifiers in the original log; the static text in the original log is parsed with the Drain algorithm to obtain the log template, and Qwen is used to generate the structured log in JSON format.
  3. The lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things according to claim 2, characterized in that the structured log in JSON format containing variable descriptions is generated by driving Qwen through prompt engineering, wherein the prompt engineering comprises at least one of self-prompting, chain-of-thought prompting and context prompting, and the variable descriptions comprise at least explanations of the numbers, addresses and identifiers in the original log; self-prompting uses preset instructions to guide Qwen to autonomously mine the variable-association semantics implicit in the original log; chain-of-thought prompting guides Qwen to complete the structured parsing step by step, in the order of log parsing, variable identification and semantic description; and context prompting incorporates the business-scenario context of the original log so that the variable descriptions are consistent with the equipment operation logic.
  4. The lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things according to claim 1, wherein in step S2 a pre-trained FastText model is used to perform semantic embedding on the log template to obtain the first semantic vector, and a pre-trained FastText model is used to perform semantic embedding on the analysis text in the structured log to obtain the second semantic vector.
  5. The lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things according to claim 4, wherein the analysis text is text generated by Qwen from the log template and the variable part that describes the semantic meaning of the log.
  6. The lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things according to claim 1, wherein the fusion in step S3 comprises taking the first semantic vector as the query vector and the second semantic vector as the key and value vectors, and jointly modeling structural semantics and business semantics through the projection, pooling and weighted fusion operations of a multi-head attention mechanism.
  7. The lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things according to claim 6, wherein the fusion further comprises arranging the fused semantic vectors of individual logs in chronological order and applying a residual connection and normalization to obtain a fused semantic vector sequence with strengthened key-feature expression.
  8. The lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things according to claim 1, characterized in that in step S4 the teacher model is a multi-layer time-series model that learns semantic knowledge from historical logs and generates soft labels, and the student model is a single-layer time-series model that receives the fused semantic vector sequence and outputs the anomaly detection result.
  9. The lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things according to claim 8, wherein the knowledge distillation comprises training the student model with a total loss function constructed from the loss between the student model's hard predictions and the ground-truth labels and the loss between the student model's soft predictions and the teacher model's soft labels.
  10. The lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things according to claim 1, characterized in that the anomaly detection result in step S4 comprises an anomaly identification and an anomaly cause description, wherein the anomaly cause description is generated from the variable meanings in the structured log and the log semantic analysis results.
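Claim 2 names the Drain algorithm for template extraction but gives no parameters. The Python sketch below is a heavily simplified, Drain-style parser written for illustration only: it routes each log by token count and first token, merges it into the most similar existing template when positional token similarity reaches a threshold, and replaces disagreeing positions with `<*>`. The masking regexes, the `SimpleDrain` class and the 0.5 threshold are assumptions not taken from the source; the real Drain algorithm uses a fixed-depth parse tree and further heuristics.

```python
import re
from dataclasses import dataclass

# Hypothetical masking rules for the variable types named in claim 2
# (numbers, addresses, identifiers); real deployments would tune these.
MASKS = [
    (re.compile(r"^\d{1,3}(\.\d{1,3}){3}(:\d+)?$"), "<IP>"),  # IPv4 address, optional port
    (re.compile(r"^0x[0-9a-fA-F]+$"), "<HEX>"),               # hexadecimal identifier
    (re.compile(r"^-?\d+(\.\d+)?$"), "<NUM>"),                # integer or decimal number
]


def mask(token: str) -> str:
    """Replace a token with a placeholder if it matches a variable pattern."""
    for pattern, label in MASKS:
        if pattern.match(token):
            return label
    return token


@dataclass
class Cluster:
    template: list   # template tokens; "<*>" marks a generalised position
    size: int = 0    # number of logs assigned to this template


class SimpleDrain:
    """Simplified Drain-style parser (illustrative, not the full algorithm)."""

    def __init__(self, sim_threshold: float = 0.5):
        self.sim_threshold = sim_threshold
        # Clusters grouped by (token count, first token), echoing Drain's tree levels.
        self.groups = {}

    @staticmethod
    def _similarity(template, tokens):
        """Fraction of positions where template and log tokens are identical."""
        same = sum(1 for a, b in zip(template, tokens) if a == b)
        return same / len(tokens)

    def parse(self, line: str):
        """Return (template string, list of variable values) for one raw log line."""
        raw = line.split()
        if not raw:
            return "", []
        tokens = [mask(t) for t in raw]
        clusters = self.groups.setdefault((len(tokens), tokens[0]), [])
        best = max(clusters, key=lambda c: self._similarity(c.template, tokens), default=None)
        if best is None or self._similarity(best.template, tokens) < self.sim_threshold:
            best = Cluster(template=list(tokens))   # start a new template
            clusters.append(best)
        else:
            # Generalise positions where the new log disagrees with the template.
            best.template = [a if a == b else "<*>" for a, b in zip(best.template, tokens)]
        best.size += 1
        # Any raw token that does not literally appear in the template is a variable.
        variables = [r for r, t in zip(raw, best.template) if r != t]
        return " ".join(best.template), variables
```

For example, parsing `"Breaker B12 opened by relay R3"` and then `"Breaker B07 opened by relay R5"` generalises both into the template `Breaker <*> opened by relay <*>`, with `["B07", "R5"]` returned as the variable part of the second log.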
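Claim 3 describes the prompt engineering only at the level of intent. A minimal sketch of how the context, self- and chain-of-thought prompts could be assembled into one instruction, and how a JSON reply could be validated, is given below. The prompt wording, the function names `build_structuring_prompt` and `parse_structured_log`, and the JSON keys (`template`, `variables`, `analysis`) are all hypothetical; no call to Qwen is made here.

```python
import json

REQUIRED_KEYS = {"template", "variables", "analysis"}  # hypothetical output schema


def build_structuring_prompt(raw_log, template, variables, scenario):
    """Assemble context, self- and chain-of-thought prompts into one instruction."""
    context_prompt = f"Business context: {scenario}."
    self_prompt = ("Before answering, work out which variables in the log are related "
                   "to each other and what those relationships imply.")
    cot_prompt = ("Proceed step by step: (1) parse the log against the template, "
                  "(2) identify every number, address and identifier, "
                  "(3) describe what each variable means for the equipment.")
    output_spec = ('Answer with JSON only: {"template": str, "variables": '
                   '[{"value": str, "type": str, "meaning": str}], "analysis": str}.')
    return "\n".join([
        context_prompt, self_prompt, cot_prompt, output_spec,
        f"Template: {template}",
        f"Variables: {json.dumps(variables)}",
        f"Log: {raw_log}",
    ])


def parse_structured_log(reply):
    """Extract the outermost JSON object from a model reply and check its keys."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    record = json.loads(reply[start:end + 1])
    missing = REQUIRED_KEYS - set(record)
    if missing:
        raise ValueError(f"structured log is missing keys: {sorted(missing)}")
    return record
```

Validating the reply before use matters because a generative model may wrap the JSON in extra prose or omit fields; the `analysis` field here plays the role of the analysis text that claim 5 feeds into the second embedding.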
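Claims 4 and 5 rely on a pre-trained FastText model. FastText represents a word through hashed character n-grams, which helps with the unseen identifiers and abbreviations common in device logs. The stdlib-only sketch below imitates that subword idea with a randomly initialised hash table instead of trained weights, so its vectors carry no real semantics; `DIM`, `BUCKETS` and the n-gram range 3 to 6 are illustrative choices. In practice a trained model (for example one loaded with the `fasttext` Python package) would replace this code.

```python
import hashlib
import math
import random

DIM = 16         # illustrative; trained FastText models commonly use 100-300 dimensions
BUCKETS = 2000   # illustrative number of hash buckets for n-grams
_rng = random.Random(0)
# Random stand-in for trained n-gram embeddings (carries no real semantics).
TABLE = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(BUCKETS)]


def char_ngrams(word, n_min=3, n_max=6):
    """FastText-style subwords: the bracketed word plus its character n-grams."""
    padded = f"<{word}>"
    grams = [padded]
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def bucket(gram):
    """Deterministic hash of an n-gram into a bucket index."""
    digest = hashlib.md5(gram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % BUCKETS


def word_vector(word):
    """Average the bucket vectors of all subwords of a lower-cased word."""
    grams = char_ngrams(word.lower())
    vec = [0.0] * DIM
    for gram in grams:
        row = TABLE[bucket(gram)]
        for k in range(DIM):
            vec[k] += row[k]
    return [v / len(grams) for v in vec]


def token_vectors(text):
    """One vector per whitespace token, e.g. as input to the attention step in S3."""
    return [word_vector(w) for w in text.split()]


def sentence_vector(text):
    """Mean of L2-normalised word vectors; zero vector for empty text."""
    vecs = token_vectors(text)
    if not vecs:
        return [0.0] * DIM
    normed = []
    for v in vecs:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        normed.append([x / norm for x in v])
    return [sum(col) / len(normed) for col in zip(*normed)]
```

Under this reading, the first semantic vector would come from `token_vectors(template)` and the second from `token_vectors(analysis_text)`, keeping token-level vectors so that the attention step has a sequence to attend over.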
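Claims 6 and 7 list the fusion operations (projection, multi-head attention, pooling, residual connection, normalization) without dimensions or layer details. A dependency-free sketch of one possible reading follows: template token vectors act as queries, analysis-text token vectors act as keys and values, each head computes scaled dot-product attention, the attended template tokens are mean-pooled, and the pooled vector is added to the mean template vector and layer-normalised before the per-log results are ordered by timestamp. The class name, random weight initialisation, mean pooling and the `(timestamp, template_tokens, analysis_tokens)` input format are assumptions.

```python
import math
import random


def matmul(x, w):
    """Multiply an n x d matrix by a d x m matrix (lists of lists)."""
    cols = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(v, eps=1e-5):
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]


def mean_rows(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


class CrossSemanticFusion:
    """Template tokens attend to analysis-text tokens (multi-head cross-attention)."""

    def __init__(self, dim, heads, seed=0):
        if dim % heads:
            raise ValueError("dim must be divisible by heads")
        rng = random.Random(seed)

        def init():
            # Random projections stand in for trained weights.
            return [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]

        self.wq, self.wk, self.wv, self.wo = init(), init(), init(), init()
        self.dim, self.heads, self.head_dim = dim, heads, dim // heads

    def fuse_one(self, template_tokens, analysis_tokens):
        q = matmul(template_tokens, self.wq)   # queries from the first semantic vectors
        k = matmul(analysis_tokens, self.wk)   # keys from the second semantic vectors
        v = matmul(analysis_tokens, self.wv)   # values from the second semantic vectors
        heads_out = [[0.0] * self.dim for _ in q]
        scale = math.sqrt(self.head_dim)
        for h in range(self.heads):
            lo, hi = h * self.head_dim, (h + 1) * self.head_dim
            for i, qi in enumerate(q):
                scores = [sum(a * b for a, b in zip(qi[lo:hi], kj[lo:hi])) / scale for kj in k]
                weights = softmax(scores)      # attention weights over analysis tokens
                for j, vj in enumerate(v):
                    for c in range(lo, hi):
                        heads_out[i][c] += weights[j] * vj[c]
        attended = matmul(heads_out, self.wo)  # output projection of concatenated heads
        pooled = mean_rows(attended)           # pool over template tokens
        residual = [p + t for p, t in zip(pooled, mean_rows(template_tokens))]
        return layer_norm(residual)            # residual connection + normalisation

    def fuse_sequence(self, logs):
        """logs: iterable of (timestamp, template_tokens, analysis_tokens)."""
        ordered = sorted(logs, key=lambda item: item[0])
        return [self.fuse_one(t, a) for _, t, a in ordered]
```

The output of `fuse_sequence` is a chronologically ordered list of fixed-size vectors, which is the shape a time-series student model would consume in step S4.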
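Claim 9 states only that the total loss combines a hard loss against the true labels with a soft loss against the teacher's soft labels. The sketch below uses the widely used temperature-scaled formulation: cross-entropy for the hard term, KL divergence between temperature-softened teacher and student distributions for the soft term, weighted by `alpha` and scaled by T². Whether the patent uses exactly this weighting, and the defaults `alpha=0.5` and `temperature=2.0`, are assumptions.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def distillation_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """alpha * hard loss + (1 - alpha) * T^2 * soft loss for one sample."""
    # Hard loss: cross-entropy between the student's prediction and the true label.
    hard = -log_softmax(student_logits)[label]
    # Soft loss: KL(teacher || student) between temperature-softened distributions.
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    soft = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # T^2 keeps the soft-loss gradient scale comparable across temperatures.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft
```

For a binary normal/anomalous head the logits have length 2. When student and teacher logits agree, the soft term vanishes and the loss reduces to `alpha` times the cross-entropy.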
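Claim 10 pairs the anomaly flag with a cause description built from the variable meanings in the structured log. The following sketch shows one way such a result could be emitted as JSON for a downstream operation and maintenance system; the `DetectionResult` fields, the 0.5 score threshold and the structured-log keys it reads (`analysis`, `variables`, `value`, `meaning`) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DetectionResult:
    log_id: str
    is_anomaly: bool
    score: float   # student-model anomaly probability
    cause: str     # empty when the log is judged normal


def describe_result(log_id, score, structured_log, threshold=0.5):
    """Combine the anomaly decision with the variable meanings from the structured log."""
    is_anomaly = score >= threshold
    cause = ""
    if is_anomaly:
        meanings = "; ".join(
            f"{var['value']} = {var['meaning']}" for var in structured_log.get("variables", [])
        )
        cause = f"{structured_log.get('analysis', '')} Variables: {meanings}".strip()
    return DetectionResult(log_id, is_anomaly, score, cause)
```

`json.dumps(asdict(result))` then gives a uniform record that another system can consume without parsing free text.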

Description

Lightweight log semantic enhancement and anomaly detection method adapting to electric power Internet of things

Technical Field

The invention relates to the technical field of operation and maintenance of the electric power Internet of things, and in particular to a lightweight log semantic enhancement and anomaly detection method suitable for the electric power Internet of things.

Background

In a power Internet of things system, the dispatching control system, power transmission and transformation equipment, edge sensing terminals and communication network continuously generate massive volumes of log data during operation. These logs record core information such as equipment operating status, communication transmission quality, the execution flow of dispatching instructions and key system events, and they are essential data support for keeping the power grid running safely and stably, quickly locating latent faults and identifying security threats. As the electric power Internet of things keeps expanding, the core systems of a large dispatching center generate terabytes of log data per day. Such high-frequency, large-scale data places strict requirements on the accuracy of log parsing, the efficiency of real-time analysis and the accuracy of anomaly detection. To meet these log-analysis requirements, various log semantic modeling and anomaly detection methods have been proposed in the prior art.
These methods mainly represent logs with bag-of-words models or simple averages of word vectors, and perform semantic modeling through global average pooling or fixed-weight term aggregation; some more advanced schemes rely on deep time-series models or introduce a large language model directly into online inference in an attempt to improve detection accuracy. Meanwhile, the log parsing stage usually extracts log templates with traditional algorithms, focuses on formal modeling of templates or event sequences, and performs anomaly recognition on that basis. However, the prior art has the following shortcomings. First, it cannot distinguish the different semantic contributions of fault-related terms and common vocabulary, so anomaly-related semantics are diluted, and discrimination is limited when templates are similar but their semantics differ significantly. Second, it does not model the device operation logic, fault cause-and-effect relationships and business meaning of variables implicit in the logs, so it struggles to express numerical changes and the associations between device states and fault types, which limits its understanding of complex anomaly scenarios. Third, the anomaly detection result consists only of labels or probability scores, with no explanation of the analysis process, variable meanings or anomaly causes and no unified structured representation; this hinders automatic integration and linked analysis with downstream operation and maintenance systems and seriously limits the practical effectiveness of log analysis technology in electric power Internet of things scenarios.
Disclosure of Invention

The invention aims to provide a lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things, so as to solve the problems of inaccurate semantic representation and insufficient adaptation to complex scenarios in prior-art log anomaly detection. To achieve this purpose, the following technical solution is provided: a lightweight log semantic enhancement and anomaly detection method adapted to the electric power Internet of things, comprising the steps of: S1, preprocessing an original log of the electric power Internet of things to obtain a structured log and a log template, wherein the log template is a standardized expression formed by the repeated static text in the log, and the structured log is structured data comprising templates and variable descriptions; S2, performing semantic embedding on the log template and on the structured log obtained in step S1, respectively, to generate a first semantic vector and a second semantic vector, wherein semantic vectors are machine-readable numerical vectors into which text semantics are converted; S3, fusing the first semantic vector and the second semantic vector through cross-semantic attention fusion to obtain a fused semantic vector sequence, wherein cross-semantic attention fusion aligns and fuses the structural semantics of the log template with business semantics through an attention mechanism and strengthens key-feature expression; S4, inputting the fused semantic vector sequence into a teacher-student distillation type anoma