CN-116781321-B - LSTM-based numerical control system log auditing method and terminal

Abstract

The invention belongs to the technical field of log auditing and discloses an LSTM-based log auditing method and terminal for numerical control systems. The method comprises: rebuilding dedicated parsers for three types of logs using a new log parsing tool improved from the Drain log parsing tool, and parsing the logs into template statements and variables; and recognizing log patterns with a bidirectional long short-term memory (Bi-LSTM) model, namely performing time-series-based anomaly detection on the logs by predicting log keys within a given window size and computing the Gaussian error of the logs. The invention achieves high precision on classical datasets, with accuracy, recall and F-score higher than those of traditional methods. The method improves the accuracy of log parsing, completes the system's log processing pipeline, makes log parsing fully automatic, implements the two main anomaly detection models in the system, tunes the models' hyperparameters through testing, and refines the details of the pipeline from log parsing to model input.
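The two detection checks summarized above (window-based prediction of log keys and a Gaussian test on the prediction error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the `probs` dictionary standing in for the Bi-LSTM softmax output, the reading of "the first g log keys" in claim 4 as the g most probable keys, and the 0.98 confidence level are all hypothetical. The claims name the stats module of scipy for the confidence interval; this sketch substitutes `statistics.NormalDist` from the Python standard library so that it runs without third-party packages.

```python
from statistics import NormalDist, mean, stdev


def sliding_windows(keys, h):
    """Yield (history, next_key) pairs: h consecutive log keys and the key that follows."""
    for t in range(h, len(keys)):
        yield keys[t - h:t], keys[t]


def key_is_normal(probs, target, g):
    """Return True if `target` is among the g most probable next keys.

    `probs` maps each log key to its predicted probability; in the method
    this would come from the model's softmax output (assumption).
    """
    top_g = sorted(probs, key=probs.get, reverse=True)[:g]
    return target in top_g


def mse(true_vec, pred_vec):
    """Mean squared error between a true and a predicted parameter vector."""
    return sum((a - b) ** 2 for a, b in zip(true_vec, pred_vec)) / len(true_vec)


def mse_interval(normal_errors, confidence=0.98):
    """Fit a Gaussian to MSE values observed on normal data and return
    the central interval that holds `confidence` of its probability mass."""
    dist = NormalDist(mean(normal_errors), stdev(normal_errors))
    tail = (1.0 - confidence) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


def error_is_normal(error, normal_errors, confidence=0.98):
    """Return True if `error` lies inside the Gaussian confidence interval."""
    low, high = mse_interval(normal_errors, confidence)
    return low <= error <= high
```

For example, `list(sliding_windows([1, 2, 3, 4, 5], 3))` produces the two training pairs `([1, 2, 3], 4)` and `([2, 3, 4], 5)`. A log entry would be flagged when either `key_is_normal` or `error_is_normal` returns False.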

Inventors

LU SONGFENG
ZHOU LITIAN
ZHU JIANXIN
LUO YONG
Nie Hewang

Assignees

Huazhong University of Science and Technology (华中科技大学)

Dates

Publication Date: 20260508
Application Date: 20230504

Claims (6)

1. An LSTM-based numerical control system log auditing method, characterized by comprising: rebuilding dedicated parsers for three types of logs using a new log parsing tool improved from the Drain log parsing tool, parsing the logs into template statements and variables, and converting the log template statements into corresponding log keys; and recognizing log patterns with a bidirectional long short-term memory (Bi-LSTM) model, namely performing time-series-based anomaly detection on the logs by predicting log keys within a given window size and computing the Gaussian error of the logs;
wherein the method comprises the following steps:
step one, parsing the logs with the fixed-depth parse tree of the self-developed tool ReDrain, using the improved parse tree, to separate log template statements from log parameters;
step two, converting the log template statements into log keys, i.e. corresponding numbers, using a predefined log template dictionary; assigning the log keys to sessions according to their session marker bits to obtain a time-ordered log key sequence within each session; and dividing the log variables by log type and feeding the variables of the same log type into the same parameter anomaly detection model to obtain the feature-selected data;
step three, constructing and training an execution path anomaly detection model: training a Bi-LSTM neural network built with keras on the log keys parsed from normal-mode logs, using the trained model to detect log keys in which normal and abnormal modes are mixed, and outputting the execution path anomaly detection result;
step four, constructing and training a parameter anomaly detection model, and using the trained model to detect whether the system key records in the log parameter values are abnormal;
step five, feeding mispredicted samples back for parameter updating: fitting again with the fit function of the keras framework on the basis of the trained models, so that after training, updating and weight adjustment the network correctly recognizes normal log key sequences;
wherein parsing the logs with the fixed-depth parse tree of ReDrain to obtain log template statements and log parameters comprises:
first, defining regular expressions from prior knowledge of each log framework to extract log variables, for example r'(\d+\.){3}\d+';
second, grouping log messages by length;
then, searching by the first token: when the first token is a constant, the search branches on that token, and when the first token is a variable, it is represented by <*>;
finally, continuing the search by token similarity: after the maximum simSeq between the input log and the log groups in the parse tree is obtained, it is compared with the similarity threshold st passed to the tool, and if it is greater than st, the corresponding log group is the matching log group; where a token of the input log equals the token at the same position in the subtree, no update is made; where they differ, the token in the subtree is replaced by <*>; if no matching subtree exists, a new subtree is created, and if no matching log group is found, a new log group is created;
wherein constructing and training the execution path anomaly detection model comprises:
(1) constructing the model with an input layer that receives a log key sequence w whose length equals the sliding window size, where $w = \{x_{t-h}, \dots, x_{t-2}, x_{t-1}\}$, each $x$ is a log key obtained by parsing a real log and belongs to the set $X$ of unique log keys, and $x_t$ denotes the next log key; and an output layer that uses the multi-class softmax function to convert the output of the last hidden layer into the probability distribution $\Pr(x_t = k_i \mid x_{t-h}, \dots, x_{t-2}, x_{t-1})$, which gives the probability of each log key $k_i$ in $X$ being the next log key;
(2) training the model using a small number of segments of the log set produced by the target system in its normal state as the training set;
wherein detecting, with the trained parameter anomaly detection model, whether the system key records in the log parameter values are abnormal comprises:
computing the MSE between the predicted value and the true value with the parameter anomaly detection model as $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_t^{(i)} - x_p^{(i)}\right)^2$, where $x_t$ is the true value, $x_p$ is the predicted value, and $n$ is the number of values over which the squared error is averaged;
computing a confidence interval for the MSE with a function of the stats module in scipy; when the error lies within the high-confidence interval of the MSE, the corresponding log is judged normal, and otherwise the log is judged abnormal.
2. The LSTM-based numerical control system log auditing method according to claim 1, wherein the log message length is the number of tokens obtained by splitting a log statement, and maxchild is greater than 3; the similarity used to continue the search is computed as $\mathrm{simSeq} = \frac{\sum_{i=1}^{n} \mathrm{equ}\left(seq_1(i), seq_2(i)\right)}{n}$, where $seq_1$ is the preprocessed input log, $seq_2$ is the log template of the log group represented by a subtree of the tree, $seq(i)$ is the $i$-th token of a sequence, and $n$ is the sequence length; equ is defined as $\mathrm{equ}(t_1, t_2) = 1$ if $t_1 = t_2$, and $0$ otherwise.
3. The LSTM-based numerical control system log auditing method according to claim 1, wherein feature vectorization is applied to log template statements and log parameters using different methods, comprising: (1) counting the log keys by template statement and using the sequence number assigned to each template statement as the code of its log key, or characterizing the log keys directly by counting, the log keys being log template statements; (2) preprocessing the log parameters by removing flag-bit parameters, punctuation marks, special symbols and other information that cannot serve as a criterion for judging parameter anomalies; forming a two-dimensional matrix from the remaining parameter characters, in which each x on the x axis represents one parameter of a log; using the text.token module of the keras framework to collect statistics over the text and convert it into a dictionary, thereby obtaining word-frequency information for the text in the parameter values; and converting all text in the matrix into numbers using built-in functions.
4. The LSTM-based numerical control system log auditing method according to claim 1, wherein detecting whether a target log key is normal using the trained execution path anomaly detection model and obtaining the log template statement anomaly detection result comprises: feeding the h parsed log keys preceding the target log key into the execution path anomaly detection model to obtain a plurality of candidate log keys; retaining the first g of them; and judging whether the target log key is among the retained log keys: if so, the target log key is normal, and if not, it is abnormal; an abnormal target log key indicates that an anomaly has occurred at the program location represented by the log statement of that log key.
5. The LSTM-based numerical control system log auditing method according to claim 1, wherein constructing and training the parameter anomaly detection model comprises: constructing the parameter anomaly detection model with an input layer that receives the parameter value vector of a log at a given timestamp, and an output layer that, based on the sequence of parameter value vectors in the most recent history, outputs a parameter prediction vector of the same dimension as the input vector as the prediction of the next parameter value vector; normalizing the values in each vector by the mean and standard deviation of all values at the same parameter position in the training data; setting the MSE function as the loss function; and training the parameter anomaly detection model on a normal dataset.
6. An information data processing terminal, characterized in that the information data processing terminal is configured to perform the steps of the LSTM-based numerical control system log auditing method according to any one of claims 1 to 5.
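The token-similarity search recited in claims 1 and 2 can be sketched as below. This is a simplified illustration under stated assumptions rather than the ReDrain tool itself: it keeps a flat list of templates in place of the fixed-depth parse tree (the length and first-token layers are omitted), the function names are hypothetical, and the wildcard is written `<*>` as in Drain.

```python
WILDCARD = "<*>"  # assumed wildcard notation, following Drain


def sim_seq(seq1, seq2):
    """Fraction of positions at which two equal-length token lists agree."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    if not seq1:
        return 1.0
    matches = sum(1 for a, b in zip(seq1, seq2) if a == b)
    return matches / len(seq1)


def merge_template(template, tokens):
    """Keep tokens that agree; replace differing positions with the wildcard."""
    return [t if t == tok else WILDCARD for t, tok in zip(template, tokens)]


def match_or_create(groups, tokens, st):
    """Assign `tokens` to the most similar template in `groups`.

    `groups` is a list of templates (token lists). If the best similarity
    exceeds the threshold `st`, that template is updated in place;
    otherwise a new log group is appended. Returns the group index.
    """
    best_index, best_sim = None, -1.0
    for index, template in enumerate(groups):
        if len(template) != len(tokens):
            continue  # logs are grouped by length before comparison
        similarity = sim_seq(tokens, template)
        if similarity > best_sim:
            best_index, best_sim = index, similarity
    if best_index is not None and best_sim > st:
        groups[best_index] = merge_template(groups[best_index], tokens)
        return best_index
    groups.append(list(tokens))
    return len(groups) - 1
```

With a threshold of 0.5, the messages `send block 42 to node7` and `send block 43 to node9` share three of five tokens (simSeq 0.6) and fall into one group whose template becomes `send block <*> to <*>`. Because equ is a strict equality test, wildcard positions in a template do not count as matches for later logs.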

Description

LSTM-based numerical control system log auditing method and terminal

Technical Field

The invention belongs to the technical field of log auditing, and in particular relates to an LSTM-based log auditing method and terminal for numerical control systems.

Background

With the rapid development of big data technology, log auditing technology is continually being upgraded to meet the needs of more fields. As network attacks increase and attack patterns and techniques keep changing, traditional rule-based log analysis can no longer meet current network security requirements because of its low efficiency, low precision and inability to adapt to environmental change. With the development of big data and artificial intelligence, log analysis has gradually shifted from traditional rule-based analysis to pattern recognition based on machine learning. Existing log analysis methods fall into two main categories: model-based methods built on statistical analysis, machine learning, deep learning and the like, and methods based on expert knowledge, pattern recognition and the like. First, model-based methods built on statistical analysis, machine learning and deep learning are currently the most popular and most studied log analysis models, because they depend little on the log data and describe complex patterns of change better. Second, methods based on expert knowledge and pattern recognition are also mainstream in current log analysis research; because they analyze log data according to user experience and attack characteristics, they have certain limitations.
To overcome these shortcomings, log analysis methods based on log feature selection and machine learning have been proposed; from a large volume of attack log data, such a method can predict the hosts and attack processes most likely to be involved in an attack. Because traditional log analysis is rule-based, it is prone to false positives and false negatives. Methods based on machine learning and deep learning do not preprocess the raw data, so they can detect attacks quickly and accurately when faced with new attack patterns and techniques, and can issue early warnings and responses. Because the volume of system log data is large enough for deep learning to learn features from it, using deep learning for anomaly detection in numerical control systems is the comparatively more advanced approach. Both supervised and unsupervised learning have been discussed for log anomaly detection. At the application level, log data analysis based on machine learning has become mainstream; for example, running machine-learning-based log audit models on big data platforms such as Hadoop and Spark has achieved significant results. However, these methods have the following disadvantages: (1) they cannot audit newly generated log data; (2) existing machine learning algorithms have problems in processing logs, such as failing to make full use of historical data or failing to account for the impact of new data; (3) most existing studies address a specific classification or clustering task without fully considering the relationships between logs; (4) existing methods cannot handle the large amount of noise and abnormal data present in logs. In industry there are a variety of security audit systems, such as log audit systems and intrusion detection systems. Mature products include NSFOCUS's NIDS and Qingteng's Wanxiang, among others.
These systems can carry out a complete set of inspection operations, including collection, auditing and final anomaly detection. In practical deployment, however, most of them suit only a single type of host and cannot cover all anomalies, so accuracy cannot be guaranteed and the high false-alarm rate requires manual verification. They also require a large deployment, in some cases joint deployment across multiple hosts. From the above analysis, the problems and defects of the prior art are as follows: existing log audit technology has low accuracy when parsing logs, cannot audit newly generated log data, and cannot handle the large amount of noise and abnormal data present in logs; and the prior art cannot detect the two different types of anomaly, execution path and parameter, at the same time.

Disclosure of Invention

In view of the problems in the prior art, the invention provides an LSTM-based numerical control system log auditing method. The invention is realized as follows: the LSTM-based numerical control system log auditing method comprises: reconstructing three types of dedicated log parsers by using a new log parsing tool im