CN-122020458-A - CLI replay attack-based power consumption anomaly detection method


Abstract

The invention discloses a power consumption anomaly detection method based on CLI replay attack, relating to the technical field of network security. The method comprises the steps of (1) data acquisition and preprocessing, (2) adaptive minority-sample enhancement, (3) data isolation and consistency verification, (4) multi-model collaborative training, (5) iterative hard-sample processing, (6) adversarial sample generation and robust training, (7) adaptive threshold optimization, (8) neuro-symbolic joint reasoning, and (9) result prediction and evaluation. Step (4) combines the ability of a deep model to capture complex feature interactions with the stability of a traditional model in small-sample and class-imbalanced scenarios, improving the overall accuracy and robustness of the anomaly detection results. By further combining dynamic rule optimization with composite-perturbation enhancement, the method can accurately capture minority-class attacks in complex scenarios while keeping detection performance stable, efficient and traceable, effectively addressing the missed detections of existing methods in dynamic attack scenarios.

Inventors

ZHAO JIANWEN
QIAO FUQUAN
NIE YINGKUN
GAO LIDONG
GU YANG
ZHANG LIHUA

Assignees

State Grid Shandong Electric Power Company Tai'an Power Supply Company (国网山东省电力公司泰安供电公司)

Dates

Publication Date: 20260512
Application Date: 20260115

Claims (10)

1. A power consumption anomaly detection method based on CLI replay attack, characterized by comprising the steps of (1) data acquisition and preprocessing, (2) adaptive minority-sample enhancement, (3) data isolation and consistency verification, (4) multi-model parallel training and prediction fusion, (5) iterative hard-sample processing, (6) adversarial sample generation and robust training, (7) adaptive threshold optimization, (8) neuro-symbolic joint reasoning, and (9) result prediction and evaluation. In step (4), after data isolation and consistency verification are completed, multi-model parallel training and prediction fusion are built on the training data. A deep-learning tabular model and a traditional machine-learning classification model are constructed. The deep-learning model is an FT-Transformer, which performs high-order interaction modeling of sample features through feature embedding and a multi-head self-attention mechanism. The traditional machine-learning model is a multi-layer perceptron (MLP), which captures nonlinear discriminative relations among features. For any sample, the trained FT-Transformer outputs the predicted probability of the anomaly class, and the MLP likewise outputs its own anomaly prediction probability. Once the base-model predictions are obtained, the model outputs are integrated by probability-level fusion.
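The probability-level fusion in step (4) can be illustrated as a weighted average of the two base models' anomaly probabilities. This is a minimal sketch under stated assumptions. The claim does not specify the fusion rule, so the weighted arithmetic mean, the helper fuse_probabilities and the weight w_ft are hypothetical and serve only as illustration.

```python
import numpy as np

def fuse_probabilities(p_ft, p_mlp, w_ft=0.5):
    """Hypothetical probability-level fusion of two base models.

    p_ft:  anomaly probabilities from the FT-Transformer, shape (n,)
    p_mlp: anomaly probabilities from the MLP, shape (n,)
    w_ft:  assumed weight of the FT-Transformer; the MLP receives 1 - w_ft
    """
    p_ft = np.asarray(p_ft, dtype=float)
    p_mlp = np.asarray(p_mlp, dtype=float)
    if p_ft.shape != p_mlp.shape:
        raise ValueError("both models must score the same samples")
    if not 0.0 <= w_ft <= 1.0:
        raise ValueError("w_ft must lie in [0, 1]")
    # A convex combination of probabilities is itself a probability in [0, 1].
    return w_ft * p_ft + (1.0 - w_ft) * p_mlp
```

In practice, w_ft could be tuned on the validation set alongside the threshold in step (7). Equal weights are only a neutral default.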
2. The power consumption anomaly detection method based on CLI replay attack according to claim 1, wherein step (1) adopts a multi-stage data processing and feature construction method comprising the following. Missing-value processing: for gaps in the time-series electricity consumption data, numerical filling is combined with missing indication. Missing values of the target numerical features are filled with the median to keep the data distribution robust, and a corresponding missing-indicator feature is constructed for each feature. The filled numerical features are then concatenated with the missing-indicator features to form an expanded feature representation that carries both numerical information and missing-pattern information. This improves the recognition capability of the subsequent anomaly detection model without discarding the structure of the missing data. Original time-series feature extraction: multidimensional statistical features are extracted from the original daily electricity consumption sequence, comprising basic statistical features, periodic features, frequency-domain features, abnormal fluctuation features and behavior consistency features. The basic statistical features comprise the daily average consumption, the standard deviation of consumption, the coefficient of variation, skewness and kurtosis. The periodic features comprise the degree of difference between workday and holiday consumption patterns, the slope of the monthly consumption trend and seasonal indexes. The abnormal fluctuation features comprise the number of sudden consumption increases/decreases, the number of abnormal zero-value/low-value days and continuous abnormal patterns. The behavior consistency features comprise the Mahalanobis distance between a user's consumption pattern and the standard pattern of users of the same type.
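The preprocessing and feature extraction of step (1) can be sketched with NumPy alone. The sketch covers median imputation with missing indicators, the basic statistics, a count of sudden increases and decreases, and the Mahalanobis distance to peer users. Several choices here are assumptions not fixed by the claim: the 50% day-over-day change threshold (ratio), the use of the peers' mean vector as the "standard pattern", and a pseudo-inverse for a possibly singular covariance. All function names are hypothetical.

```python
import numpy as np

def impute_with_indicators(X):
    """Median imputation plus one 0/1 missing-indicator column per feature.

    X: 2-D array with np.nan marking missing values.
    Returns [filled features | indicators], shape (n, 2 * d).
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)
    medians = np.nanmedian(X, axis=0)          # per-feature median, NaN ignored
    filled = np.where(mask, medians, X)        # broadcast medians into the gaps
    return np.hstack([filled, mask.astype(float)])

def basic_stats(daily):
    """Mean, std, coefficient of variation, skewness, excess kurtosis."""
    x = np.asarray(daily, dtype=float)
    mean, std = x.mean(), x.std()              # population standard deviation
    cv = std / mean if mean != 0 else 0.0
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    skewness = np.mean(z ** 3)
    kurt = np.mean(z ** 4) - 3.0               # excess kurtosis (normal -> 0)
    return np.array([mean, std, cv, skewness, kurt])

def surge_drop_counts(daily, ratio=0.5):
    """Count day-over-day relative rises/falls beyond an assumed `ratio`."""
    x = np.asarray(daily, dtype=float)
    prev = x[:-1]
    # When the previous day is zero, the raw difference is used instead.
    change = np.diff(x) / np.where(prev == 0, 1.0, prev)
    return int((change > ratio).sum()), int((change < -ratio).sum())

def mahalanobis_to_peers(user_vec, peer_matrix):
    """Distance from one user's pattern to the mean pattern of same-type users."""
    peers = np.asarray(peer_matrix, dtype=float)
    mu = peers.mean(axis=0)
    inv = np.linalg.pinv(np.cov(peers, rowvar=False))  # tolerates singular cov
    diff = np.asarray(user_vec, dtype=float) - mu
    return float(np.sqrt(diff @ inv @ diff))
```

The indicator columns let a downstream model learn from the missing pattern itself, for example meter outages that coincide with tampering.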
3. The power consumption anomaly detection method based on CLI replay attack according to claim 1, wherein in step (2), minority-class samples are adaptively enhanced. Statistical analysis is performed on the sample label distribution, and when the detected proportion of anomaly samples falls below a preset threshold, a minority-sample enhancement module is triggered; the preset threshold is an anomaly-to-normal sample ratio in the range of 1:3 to 1:50. The adaptive enhancement selects the enhancement mode and strength according to three factors: the distribution density of anomaly samples in the feature space, their neighborhood structure, and their boundary relation with normal samples. Synthesized anomaly samples are generated to supplement the original anomaly distribution by at least one of neighborhood-based sample interpolation, local-density-aware sample expansion and feature-perturbation-based sample generation. During generation, the number of synthesized samples is controlled dynamically according to a target class-balance interval. A validity check is then performed on the synthesized samples that removes low-confidence samples lying at abnormal distances or overlapping normal samples. This avoids excessive anomaly-sample generation, relieves class imbalance, and improves the generalization of the subsequently trained model on anomaly samples.
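One way to realize the neighborhood-based interpolation branch of step (2) is SMOTE-style interpolation between a minority sample and one of its minority-class nearest neighbors, followed by a validity filter. This is a sketch under assumptions, not the patent's exact procedure. In particular, the filter rule is hypothetical: a candidate is rejected when it lies closer to a majority (normal) sample than to any minority sample. The target minority-to-majority ratio is likewise illustrative, and both function names are hypothetical.

```python
import numpy as np

def needed_count(n_min, n_maj, target_ratio=0.25):
    """Synthetic samples needed to reach an assumed minority/majority ratio."""
    return max(0, int(np.ceil(target_ratio * n_maj)) - n_min)

def synthesize_minority(X_min, X_maj, target_count, k=5, seed=0):
    """SMOTE-style interpolation with a simple overlap-based validity check."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)
    # k nearest minority neighbours of every minority sample (self excluded)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]

    accepted, attempts = [], 0
    while len(accepted) < target_count and attempts < 20 * target_count:
        attempts += 1
        i = rng.integers(n)
        j = neighbors[i, rng.integers(k)]
        cand = X_min[i] + rng.random() * (X_min[j] - X_min[i])  # point on segment
        # Validity check: drop candidates that fall nearer to normal samples.
        d_maj = np.min(np.linalg.norm(X_maj - cand, axis=1))
        d_min = np.min(np.linalg.norm(X_min - cand, axis=1))
        if d_maj > d_min:
            accepted.append(cand)
    return np.array(accepted).reshape(-1, X_min.shape[1])
```

The attempt cap keeps the loop finite when the classes overlap heavily. In that case fewer than target_count samples are returned, which matches the claim's intent of avoiding excessive generation.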
4. The power consumption anomaly detection method based on CLI replay attack according to claim 1, wherein the data set isolation and consistency verification in step (3) first adopts a sample-fingerprint-based duplicate detection method. The feature vector of each sample and its label are concatenated and encoded into a sample fingerprint vector, and a hash function generates a sample fingerprint value from it. Completely duplicated or highly similar samples are detected by comparing fingerprint values across the different data subsets. When identical or similar fingerprint values are found, a risk of data leakage is judged to exist and the data partitioning is re-executed. A statistical distribution consistency detection method then models the statistical distribution of each feature dimension separately in the training, validation and test sets, and judges their similarity by computing a distance index between the feature distributions of the data subsets.
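The fingerprint check in step (3) can be sketched as hashing the concatenated (features, label) vector and looking for fingerprints that appear in more than one subset. The sketch makes several assumptions. SHA-256 (from Python's hashlib) stands in for the unspecified hash function. Rounding to a fixed number of decimals is an assumed way to make near-identical floats collide, whereas plain hashing only catches exact duplicates. The helper names are hypothetical.

```python
import hashlib
import numpy as np

def sample_fingerprint(x, y, decimals=6):
    """SHA-256 fingerprint of the concatenated (features, label) vector."""
    vec = np.append(np.round(np.asarray(x, dtype=float), decimals), float(y))
    vec = vec + 0.0        # normalise -0.0 to 0.0 so both hash identically
    return hashlib.sha256(vec.tobytes()).hexdigest()

def leaked_fingerprints(subsets):
    """Map each fingerprint seen in more than one subset to those subset names.

    subsets: dict such as {"train": (X, Y), "val": (X, Y), "test": (X, Y)}
    """
    seen = {}
    for name, (X, Y) in subsets.items():
        for x, y in zip(X, Y):
            seen.setdefault(sample_fingerprint(x, y), set()).add(name)
    return {fp: names for fp, names in seen.items() if len(names) > 1}
```

A non-empty result would trigger the re-partitioning described in the claim.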
5. The power consumption anomaly detection method based on CLI replay attack according to claim 4, wherein in step (3) the Kullback-Leibler divergence measures the difference between the training-set distribution $P$ and the validation-set or test-set distribution $Q$, defined as formula (1): $D_{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$. When the divergence value is lower than a preset distribution-difference threshold, abnormal distribution overlap between the data subsets is judged to exist, and a repartitioning or manual verification process is triggered. A cross-detection method based on a feature-space distance constraint is also adopted. Samples are mapped into the original feature space, or into a low-dimensional feature space after principal component analysis (PCA), and the Euclidean distance between samples $x_i$ and $x_j$ drawn from different data subsets is computed as formula (2): $d(x_i, x_j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2}$, where $m$ is the feature dimension. When the distance of a pair of samples from different data subsets is less than a preset distance threshold, the pair is judged to be highly similar in the feature space and to pose a potential information-leakage risk, and the samples concerned are removed or reassigned. Combining sample-fingerprint hash detection, statistical distribution consistency detection and feature-space distance constraint detection verifies the independence of the data subsets at multiple levels. This keeps model training, validation and testing mutually isolated and improves the objectivity and credibility of the anomaly detection model's evaluation results.
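Formulas (1) and (2) can be sketched per feature and per sample pair as follows. Some parts are assumptions. The continuous distributions are estimated with histograms over shared bin edges. A small eps smoothing avoids log(0) in empty bins. PCA is omitted, so distances are computed in the given space. Both helper names are hypothetical.

```python
import numpy as np

def kl_divergence_hist(a, b, bins=20, eps=1e-9):
    """Histogram estimate of D_KL(P || Q) for one feature (formula (1))."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Shared edges make the two histograms directly comparable bin by bin.
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    return float(np.sum(p * np.log(p / q)))

def close_cross_pairs(A, B, threshold):
    """Index pairs (i, j) with Euclidean d(A_i, B_j) < threshold (formula (2))."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(d < threshold))]
```

The pairwise distance matrix costs O(|A|·|B|) memory. For large subsets, a KD-tree or a batched loop would be the usual substitute.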
6. The power consumption anomaly detection method based on CLI replay attack according to claim 1, wherein in step (5), hard-sample identification and iterative reinforcement, a hard-sample identification and iterative reinforcement mechanism is introduced after multi-model prediction fusion is completed. This mechanism further improves the model's discrimination in decision-boundary regions and on complex anomaly samples. Based on the comprehensive anomaly probability $p_i$ output by the fusion model, samples with high prediction uncertainty or that are misclassified are identified as the hard-sample set. For a sample $x_i$, prediction uncertainty is measured by the distance from the classification threshold $\tau$, $u_i = |p_i - \tau|$. When $u_i$ is less than a preset uncertainty threshold $\delta$, the sample is judged to be a hard sample. In addition, samples whose predicted label $\hat{y}_i$ is inconsistent with the true label $y_i$ are added directly to the hard-sample set. For the hard samples, a local-perturbation data enhancement method applies controlled noise in the original feature space to generate reinforced samples $\tilde{x}_i = x_i + \epsilon_i$, where $\epsilon_i$ is a controlled noise vector. The reinforced samples are added to the training set together with the original samples, and the model is retrained over multiple rounds, so that its expressive capacity in the abnormal boundary region is progressively strengthened.
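Hard-sample selection and local perturbation in step (5) can be sketched directly from the uncertainty rule $u_i = |p_i - \tau| < \delta$ and the misclassification rule. The sketch assumes Gaussian noise $\epsilon_i \sim \mathcal{N}(0, \sigma^2 I)$, since the claim only says "controlled noise". The values of sigma and copies, and the helper names, are hypothetical.

```python
import numpy as np

def hard_sample_mask(p, y, tau=0.5, delta=0.1):
    """True where |p_i - tau| < delta or the thresholded prediction != y_i."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y).astype(int)
    uncertain = np.abs(p - tau) < delta
    misclassified = (p >= tau).astype(int) != y
    return uncertain | misclassified

def perturb_hard_samples(X, y, mask, sigma=0.01, copies=2, seed=0):
    """Return `copies` noisy versions of each hard sample plus their labels.

    Gaussian noise with standard deviation `sigma` is an assumption.
    """
    rng = np.random.default_rng(seed)
    hard_X = np.asarray(X, dtype=float)[mask]
    hard_y = np.asarray(y)[mask]
    X_rep = np.repeat(hard_X, copies, axis=0)
    y_rep = np.repeat(hard_y, copies)
    return X_rep + rng.normal(0.0, sigma, size=X_rep.shape), y_rep
```

In an iterative loop, the augmented pairs would be appended to the training set, the models retrained, and the mask recomputed on the new fused probabilities.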
7. The power consumption anomaly detection method based on CLI replay attack according to claim 1, wherein step (6), adversarial sample generation and robust training, generates adversarial samples with the fast gradient sign method (FGSM) based on the loss function $\mathcal{L}(\theta; x, y)$ of the trained model. The perturbation takes the form $x^{adv} = x + \alpha \cdot \mathrm{sign}(\nabla_x \mathcal{L}(\theta; x, y))$, where $\alpha$ is the perturbation-strength coefficient. The adversarial samples and the original samples are input into the model together for joint training that minimizes the combined loss over the original and adversarial samples. This improves the stability and generalization of the model in the face of input noise and malicious perturbation.
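FGSM and joint training in step (6) can be illustrated on a logistic-regression model, whose input gradient has the closed form $\nabla_x \mathcal{L} = (\sigma(w^\top x + b) - y)\,w$ for binary cross-entropy. This is a sketch under assumptions: logistic regression stands in for the FT-Transformer/MLP ensemble, and the combined loss is taken as the unweighted average of clean and adversarial losses. The learning rate, epoch count and helper names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(X, y, w, b, alpha):
    """x_adv = x + alpha * sign(grad_x L) for binary cross-entropy loss."""
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]   # closed-form input gradient
    return X + alpha * np.sign(grad_x)

def train_joint(X, y, alpha=0.1, lr=0.5, epochs=200):
    """Gradient descent on clean + FGSM samples (assumed equal weighting)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        X_adv = fgsm(X, y, w, b, alpha)      # regenerate for the current model
        Xa = np.vstack([X, X_adv])
        ya = np.concatenate([y, y])
        p = sigmoid(Xa @ w + b)
        # Mean over the stacked batch = average of clean and adversarial loss.
        w -= lr * Xa.T @ (p - ya) / len(ya)
        b -= lr * np.mean(p - ya)
    return w, b
```

For a deep model, the same loop would obtain grad_x through automatic differentiation instead of the closed form.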
8. The power consumption anomaly detection method based on CLI replay attack according to claim 1, wherein the adaptive threshold optimization in step (7) introduces an adaptive threshold optimization module after the anomaly prediction probability of the fusion model is obtained, so as to avoid the degradation of detection performance caused by using a fixed threshold.
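A simple form of the adaptive threshold in step (7) is a search over candidate thresholds on the validation set, keeping the one that maximizes F1. The claim does not state the objective. F1 is assumed here because claim 10 names it as the main evaluation basis. Using the observed scores as the candidate grid, and the helper name best_threshold, are also assumptions.

```python
import numpy as np

def best_threshold(p_val, y_val, grid=None):
    """Return (threshold, F1) maximizing F1 on validation scores."""
    p_val = np.asarray(p_val, dtype=float)
    y_val = np.asarray(y_val).astype(int)
    if grid is None:
        grid = np.unique(p_val)            # every observed score is a candidate
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        pred = (p_val >= t).astype(int)
        tp = int(np.sum((pred == 1) & (y_val == 1)))
        fp = int(np.sum((pred == 1) & (y_val == 0)))
        fn = int(np.sum((pred == 0) & (y_val == 1)))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1
```

Because the threshold is fitted on validation data only, the test set stays untouched, which is consistent with the isolation requirements of step (3).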
9. The power consumption anomaly detection method based on CLI replay attack according to claim 1, wherein step (8), neuro-symbolic joint reasoning, introduces a neuro-symbolic collaborative reasoning mechanism on top of the model prediction results to secondarily correct the anomaly decision. A symbolic rule set is constructed from expert knowledge, and each rule outputs a rule confidence. The prediction probability of the fusion model and the rule-reasoning result are then combined by weighted fusion to obtain the corrected anomaly score.
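The weighted fusion in step (8) can be sketched as follows. Each rule either fires with a confidence or abstains, and the fired confidences are averaged by rule weight. The result is blended with the model probability as $s = \lambda p + (1 - \lambda) r$. Everything beyond "weighted fusion" is an assumption, not the patent's specification: the abstain semantics, the weighted-mean aggregation, the blending weight lam and the example rules. The rules use hypothetical feature names (zero_days, peer_mahalanobis) that echo the features of claim 2.

```python
def corrected_score(p, features, rules, lam=0.7):
    """Blend a model probability with the weighted mean of fired rule confidences.

    rules: list of (rule_fn, weight); rule_fn(features) returns a confidence
           in [0, 1] when the rule fires, or None when it does not apply.
    """
    num = den = 0.0
    for rule_fn, weight in rules:
        conf = rule_fn(features)
        if conf is not None:                # the rule fired
            num += weight * conf
            den += weight
    if den == 0.0:
        return p                            # no rule fired: keep the model score
    r = num / den                           # weighted mean of rule confidences
    return lam * p + (1.0 - lam) * r

# Hypothetical expert rules, for illustration only.
RULES = [
    (lambda f: 0.9 if f["zero_days"] >= 7 else None, 1.0),
    (lambda f: 0.6 if f["peer_mahalanobis"] > 3.0 else None, 0.5),
]
```

Keeping the fired rules and their confidences alongside the score gives the audit trail that the background section asks for.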
10. The power consumption anomaly detection method based on CLI replay attack according to claim 1, wherein step (9), result prediction and evaluation, performs final prediction and performance evaluation on the test data set after model training, fusion prediction and adaptive threshold optimization are completed. For a sample $x_i$, the fusion model outputs the final anomaly prediction probability $p_i$. The final predicted label is generated with the optimal classification threshold $\tau^*$, obtained by optimization on the validation set, according to the decision rule $\hat{y}_i = 1$ if $p_i \ge \tau^*$ and $\hat{y}_i = 0$ otherwise, where $\hat{y}_i = 1$ denotes an abnormal sample and $\hat{y}_i = 0$ a normal sample. The detection performance is then evaluated systematically against the true labels. Because the experimental data set contains significantly fewer abnormal samples than normal samples, accuracy alone cannot objectively reflect the model's true performance. Recall and F1 score are therefore selected as the comprehensive evaluation indexes, with the F1 score serving as the main basis for measuring the model's overall discrimination under class imbalance.
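The standard definitions behind the evaluation in step (9) are given below, together with a worked example that shows why accuracy alone misleads under class imbalance. The counts (1000 normal, 60 abnormal; TP = 40, FP = 10, FN = 20) are hypothetical and chosen only for the arithmetic.

```latex
\begin{aligned}
\mathrm{Precision} &= \frac{TP}{TP+FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP+FN}, \qquad
F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}
    = \frac{2\,TP}{2\,TP+FP+FN} \\[4pt]
&\text{All-normal predictor on 1000 normal / 60 abnormal samples:} \\
\mathrm{Accuracy} &= \tfrac{1000}{1060} \approx 0.943, \qquad \mathrm{Recall} = 0, \qquad F_1 = 0 \\[4pt]
&\text{Detector with } TP = 40,\ FP = 10,\ FN = 20: \\
\mathrm{Precision} &= \tfrac{40}{50} = 0.8, \qquad
\mathrm{Recall} = \tfrac{40}{60} \approx 0.667, \qquad
F_1 = \tfrac{80}{110} \approx 0.727
\end{aligned}
```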

Description

CLI replay attack-based power consumption anomaly detection method

Technical Field

The invention relates to the technical field of network security, in particular to a power consumption anomaly detection method based on CLI replay attack. It is particularly suitable for accurate and interpretable detection of replay attacks on CLI operations, achieved by fusing the perception capability of a neural network model with the reasoning capability of symbolic rules.

Background

In the field of network security, Command Line Interface (CLI) replay attacks achieve disguised penetration by synthesizing the operation sequences of legitimate users. They are a typical threat that bypasses static permission management and behavior-baseline auditing. Existing detection methods rely on deep neural networks to build end-to-end learning frameworks, using the models' ability to extract high-dimensional features automatically in order to identify abnormal patterns. They can reach a certain detection precision on closed data sets, but their black-box nature breaks the logical chain behind an attack verdict: security personnel can hardly tell which key indicators a decision was based on. This lack of interpretability weakens the feasibility of attack tracing and policy optimization. Lacking an audit basis, it also hinders compliant deployment in critical sectors such as finance and energy, exposing an inherent weakness of purely data-driven methods in understanding security semantics. A further problem is that existing schemes neglect the trustworthiness and environmental adaptability of the detection pipeline.
The training stage often lacks a strict data isolation mechanism, so pre-training data and target test data seep into the training process through implicit associations, and evaluation results diverge from real-world performance. Real command-line operation data are also generally severely class-imbalanced, with attack samples usually making up less than fifteen percent. Traditional undersampling tends to lose the fine-grained characteristics of the majority class, while oversampling tends to introduce redundant noise that aggravates overfitting. Some studies have introduced hierarchical undersampling to alleviate the problem, but they mostly stop at a shallow strategy of random sampling by class proportion and do not design the stratification around the feature-space distribution. The sampled data therefore still cannot fully represent the diverse command patterns of normal samples or the fixed-parameter characteristics of attack samples, which limits the model's ability to recognize minority-class attacks. In addition, on real data containing noise, class overlap and adversarial perturbation, a single model tends to overfit or to miss low-frequency attack patterns at a high rate. Its robustness also drops sharply under perturbations such as parameter fine-tuning and timing jitter. Some earlier studies tried to mitigate these problems through model stacking or feature engineering, but none brought data isolation, interpretable reasoning, hierarchical undersampling and robust training into a unified framework. Detection systems therefore still face an imbalance between false alarms and missed detections in dynamic attack scenarios, and improving security and reliability together remains difficult.
In the prior art, data isolation has alleviated inflated evaluation results and rules have been tried as a way to improve interpretability, but several problems remain unsolved. First, the class imbalance caused by the low proportion of attack samples in real data persists: traditional undersampling ignores the feature-space distribution, so key discriminative features are lost after sampling and the model's recall on minority-class attacks is insufficient. Second, interpretable reasoning is decoupled from model training: rule-base updates lag behind the evolution of attack techniques, and rule weights cannot be optimized dynamically as the sample distribution changes during training. Third, robust training strategies generalize poorly to complex perturbations, such as signal distortion caused by device heterogeneity and semi-plausible operation sequences crafted by attackers, and under such perturbations the model's real-time performance and stability decline. Finally, the modules operate in isolation without a parameter-linkage optimization mechanism, so that break points exist in the data balance, feature learning and decision interpretation lin