CN-121462396-B - Data intelligence-based composite network fault classification model, training method and fault classification method


Abstract

A data intelligence-based composite network fault classification model, training method and fault classification method, belonging to the field of network fault classification and addressing the problem of classifying network faults. The key technical points are as follows: for each fault class belonging to the minority classes, an oversampling operation is performed on the embedded vectors of that class through a first CVAE model to obtain synthesized embedded vectors; for each fault class belonging to the majority classes, a downsampling operation is performed on the embedded vectors of that class through the first CVAE model to obtain representative embedded vectors; a fifth training set is constructed from the synthesized embedded vectors of the minority classes and the representative embedded vectors of the majority classes; and a third BERT model and its adaptive classifier are fine-tuned using the fifth training set to obtain a fourth BERT model, the fourth BERT model and its adaptive classifier constituting the network fault classification model.
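
As an illustration of the downsampling step, the following minimal Python sketch selects representative embedded vectors in the way the claims describe it: points are sampled inside the convex hull of a class's latent variables, and the embedded vector whose latent variable is nearest to each sampled point is kept. It is a sketch under stated assumptions, not the patented implementation. The latent variables are assumed to have already been produced by the CVAE encoder, the hull is sampled with random convex combinations, already-selected vectors are excluded so that the representatives are distinct, and the names `sample_in_convex_hull`, `select_representatives` and `n_keep` are hypothetical.

```python
import random


def sample_in_convex_hull(points, rng):
    """Return a random convex combination of `points` (lies in their convex hull)."""
    # Normalized exponential draws give non-negative weights that sum to 1.
    weights = [rng.expovariate(1.0) for _ in points]
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_representatives(embeddings, latents, n_keep, rng):
    """Keep up to n_keep embeddings whose latents are nearest to hull samples.

    latents[i] is assumed to be the encoder output for embeddings[i].
    Excluding already chosen indices is a design choice of this sketch.
    """
    n_keep = min(n_keep, len(embeddings))
    remaining = set(range(len(latents)))
    chosen = []
    while len(chosen) < n_keep:
        point = sample_in_convex_hull(latents, rng)
        nearest = min(remaining,
                      key=lambda i: squared_distance(latents[i], point))
        chosen.append(nearest)
        remaining.remove(nearest)
    return [embeddings[i] for i in chosen]


# Toy data: five 2-D latent variables and their (made-up) embedded vectors.
rng = random.Random(0)
latents = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
embeddings = [(0.1, 0.9), (0.8, 0.2), (0.3, 0.3), (0.7, 0.6), (0.5, 0.4)]
reps = select_representatives(embeddings, latents, 3, rng)
print(len(reps), "representatives kept out of", len(embeddings))
```

Random convex combinations concentrate near the centroid of the latent variables, so a real implementation might prefer a sampler that covers the hull more evenly; the source does not say which sampler is used.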

Inventors

QI HENG
Ye Xuezhou
WANG XINQI

Assignees

Dalian University of Technology (大连理工大学)

Dates

Publication Date: 20260508
Application Date: 20251201

Claims (20)

1. A network fault classification model training method, characterized by comprising the following steps: the training set comprises a second training set, wherein the second training set comprises samples of fault-related log groups, the logs in each fault-related log group represent the same fault type according to a causal relationship, and the logs are arranged in time order; fine-tuning the second BERT model and its adaptive classifier using a third training set to obtain a third BERT model, wherein the third training set comprises the fault-related log groups of the second training set and their corresponding fault types, and different fault-related log groups have labels of the same fault type or labels of different fault types; generating, with the third BERT model, an embedded vector for each fault-related log group sample of the third training set, constructing a fourth training set from the embedded vectors and their corresponding fault type labels, and training a CVAE model using the fourth training set to obtain a first CVAE model; obtaining, from the fault class labels of the embedded vectors in the fourth training set, the number of embedded-vector samples for each fault class, and dividing the fault classes by that number into classes comprising a minority class and a majority class; performing an oversampling operation through the first CVAE model on the embedded vectors of each fault class belonging to the minority class to obtain synthesized embedded vectors for that fault class; performing a downsampling operation through the first CVAE model on the embedded vectors of each fault class belonging to the majority class to obtain representative embedded vectors for that fault class; and constructing a fifth training set from the synthesized embedded vectors of the fault classes belonging to the minority class and the representative embedded vectors of the fault classes belonging to the majority class, and fine-tuning the third BERT model and its adaptive classifier using the fifth training set to obtain a fourth BERT model, wherein the fourth BERT model and its adaptive classifier are the network fault classification model.
2. The method of claim 1, wherein obtaining the fault-related log groups from network logs comprises: obtaining log templates from the network logs, and constructing a frequency time series for each log template of the device; constructing a fully connected undirected graph of the log templates; performing, based on the frequency time series, a causal relationship test on the edges of the fully connected undirected graph, and cutting off the pseudo-causal edges to obtain a plurality of connected subgraphs; and arranging the log templates corresponding to all nodes of each connected subgraph in time order to obtain the corresponding fault-related log group.
3. The method of claim 2, wherein the training set further comprises a first training set, wherein the first training set comprises data of logs and descriptions.
4. The method according to any one of claims 1-3, characterized in that the number of embedded-vector samples generated from the fault-related log groups of each fault class is counted, and the mean and the standard deviation of these numbers are calculated, with a hyper-parameter also being defined; a fault class whose number of samples satisfies the minority-class condition [formula omitted in source] is classified into the minority class; wherein, if a fault class is classified into the minority class, the embedded vectors corresponding to that fault class are original embedded vectors and their number is the original number of samples, and the synthesized embedded vectors are obtained as follows: a latent variable is sampled from the standard normal distribution, the fault class label vector is used as the condition of the CVAE model, and the decoder of the CVAE model generates a synthesized embedded vector from the latent variable; these steps are repeated until the sum of the number of synthesized embedded vectors and the original number of samples reaches the target value [value omitted in source].
5. The method according to any one of claims 1-3, characterized in that the number of embedded-vector samples generated from the fault-related log groups of each fault class is counted, and the mean and the standard deviation of these numbers are calculated, with a hyper-parameter also being defined; a fault class whose number of samples satisfies the majority-class condition [formula omitted in source] is classified into the majority class; wherein, if a fault class is classified into the majority class, the embedded vectors corresponding to that fault class are original embedded vectors, and the representative embedded vectors of the fault class are obtained as follows: from each original embedded vector and the fault class label vector, the encoder of the CVAE model generates the latent variable corresponding to that original embedded vector, so that the latent variables of all embedded vectors of the fault class are obtained; points are sampled within the convex hull region of the class latent space constructed from all the latent variables; for each sampled point, the nearest-neighbor latent variable is searched for; and the embedded vector corresponding to the nearest-neighbor latent variable is used as a representative embedded vector.
6. The method of claim 1, wherein the division of fault classes further comprises a balanced class; the number of embedded-vector samples generated from the fault-related log groups of each fault class is counted, and the mean and the standard deviation of these numbers are calculated, with a hyper-parameter also being defined; a fault class whose number of embedded-vector samples falls within the interval [interval omitted in source] is classified into the balanced class; a fifth training set is constructed from the embedded vectors of the fault classes of the balanced class, the synthesized embedded vectors of the fault classes of the minority class together with their corresponding original embedded vectors, and the representative embedded vectors of the fault classes belonging to the majority class; and the third BERT model and its adaptive classifier are fine-tuned using the fifth training set to obtain a fourth BERT model, wherein the fourth BERT model and its adaptive classifier are the network fault classification model.
7. The method of claim 1, wherein the training further comprises: while the model is being used to classify network faults, the model obtains, for an input fault-related log group from a data set comprising fault-related log groups, the confidence of each fault class for that fault-related log group; outputting, according to the confidences of the fault classes and a confidence threshold, the fault types whose confidence is higher than the confidence threshold, wherein if the highest of the confidences is lower than the confidence threshold, the fault-related log group belongs to a fault of unknown type; labeling the fault-related log group corresponding to the fault of unknown type to obtain a fault class label; and placing the fault-related log group and the fault class label into the third training set to obtain an updated third training set, and using the updated third training set for model training.
8. The method of claim 1, wherein a contrastive learning task on logs and descriptions and a word mask prediction task on fault-related log groups are trained jointly, the model parameters being jointly optimized by the MLM loss and the LDA loss during training until they converge to the optimal parameters; in the contrastive learning task on logs and descriptions, the input of BERT comprises a log and its description, and BERT encodes the log and the description into hidden representations; in the word mask prediction task on fault-related log groups, the input of BERT comprises a fault-related log group, a certain proportion of the logs in each fault-related log group are randomly masked, BERT encodes the fault-related log group into hidden representations used to predict the masked content, and the optimization objective comprises predicting the masked logs with a cross-entropy loss.
9. A network fault classification method based on a network fault classification model, characterized in that the network fault classification model comprises a fourth BERT model and its adaptive classifier, wherein the classifier comprises a global average pooling layer, a fully connected layer and a probability conversion layer; the network fault classification method comprises the following steps: the fourth BERT model converts input data into sequence embedded vectors, wherein the input data comprises a fault-related log group; the global average pooling layer compresses the sequence embedded vectors into a first feature vector; the fully connected layer applies a linear transformation to the first feature vector to obtain a second feature vector; and the probability conversion layer maps the second feature vector output by the fully connected layer, through a Sigmoid function, into a probability distribution over the labels of all fault classes; wherein the network fault classification model is trained by the training method of claim 1.
10. The method of claim 9, wherein obtaining the fault-related log groups from network logs comprises: obtaining log templates from the network logs, and constructing a frequency time series for each log template of the device; constructing a fully connected undirected graph of the log templates; performing, based on the frequency time series, a causal relationship test on the edges of the fully connected undirected graph, and cutting off the pseudo-causal edges to obtain a plurality of connected subgraphs; and arranging the log templates corresponding to all nodes of each connected subgraph in time order to obtain the corresponding fault-related log group.
11. The method of claim 10, wherein the training set further comprises a first training set, wherein the first training set comprises data of logs and descriptions.
12. The method according to any one of claims 9-11, wherein the number of embedded-vector samples generated from the fault-related log groups of each fault class is counted, and the mean and the standard deviation of these numbers are calculated, with a hyper-parameter also being defined; a fault class whose number of samples satisfies the minority-class condition [formula omitted in source] is classified into the minority class; wherein, if a fault class is classified into the minority class, the embedded vectors corresponding to that fault class are original embedded vectors and their number is the original number of samples, and the synthesized embedded vectors are obtained as follows: a latent variable is sampled from the standard normal distribution, the fault class label vector is used as the condition of the CVAE model, and the decoder of the CVAE model generates a synthesized embedded vector from the latent variable; these steps are repeated until the sum of the number of synthesized embedded vectors and the original number of samples reaches the target value [value omitted in source].
13. The method according to any one of claims 9-11, wherein the number of embedded-vector samples generated from the fault-related log groups of each fault class is counted, and the mean and the standard deviation of these numbers are calculated, with a hyper-parameter also being defined; a fault class whose number of samples satisfies the majority-class condition [formula omitted in source] is classified into the majority class; wherein, if a fault class is classified into the majority class, the embedded vectors corresponding to that fault class are original embedded vectors, and the representative embedded vectors of the fault class are obtained as follows: from each original embedded vector and the fault class label vector, the encoder of the CVAE model generates the latent variable corresponding to that original embedded vector, so that the latent variables of all embedded vectors of the fault class are obtained; points are sampled within the convex hull region of the class latent space constructed from all the latent variables; for each sampled point, the nearest-neighbor latent variable is searched for; and the embedded vector corresponding to the nearest-neighbor latent variable is used as a representative embedded vector.
14. The method of claim 9, wherein the division of fault classes further comprises a balanced class; the number of embedded-vector samples generated from the fault-related log groups of each fault class is counted, and the mean and the standard deviation of these numbers are calculated, with a hyper-parameter also being defined; a fault class whose number of embedded-vector samples falls within the interval [interval omitted in source] is classified into the balanced class; a fifth training set is constructed from the embedded vectors of the fault classes of the balanced class, the synthesized embedded vectors of the fault classes of the minority class together with their corresponding original embedded vectors, and the representative embedded vectors of the fault classes belonging to the majority class; and the third BERT model and its adaptive classifier are fine-tuned using the fifth training set to obtain a fourth BERT model, wherein the fourth BERT model and its adaptive classifier are the network fault classification model.
15. The method of claim 9, wherein the training further comprises: while the model is being used to classify network faults, the model obtains, for an input fault-related log group from a data set comprising fault-related log groups, the confidence of each fault class for that fault-related log group; outputting, according to the confidences of the fault classes and a confidence threshold, the fault types whose confidence is higher than the confidence threshold, wherein if the highest of the confidences is lower than the confidence threshold, the fault-related log group belongs to a fault of unknown type; labeling the fault-related log group corresponding to the fault of unknown type to obtain a fault class label; and placing the fault-related log group and the fault class label into the third training set to obtain an updated third training set, and using the updated third training set for model training.
16. The method of claim 9, wherein a contrastive learning task on logs and descriptions and a word mask prediction task on fault-related log groups are trained jointly, the model parameters being jointly optimized by the MLM loss and the LDA loss during training until they converge to the optimal parameters; in the contrastive learning task on logs and descriptions, the input of BERT comprises a log and its description, and BERT encodes the log and the description into hidden representations; in the word mask prediction task on fault-related log groups, the input of BERT comprises a fault-related log group, a certain proportion of the logs in each fault-related log group are randomly masked, BERT encodes the fault-related log group into hidden representations used to predict the masked content, and the optimization objective comprises predicting the masked logs with a cross-entropy loss.
17. A network fault classification device comprising a network fault classification model, characterized in that the network fault classification model comprises a fourth BERT model and its adaptive classifier, wherein the classifier comprises a global average pooling layer, a fully connected layer and a probability conversion layer; the fourth BERT model is used to convert input data into sequence embedded vectors, wherein the input data comprises a fault-related log group; the global average pooling layer is used to compress the sequence embedded vectors into a first feature vector; the fully connected layer is used to apply a linear transformation to the first feature vector to obtain a second feature vector; and the probability conversion layer is used to map the second feature vector output by the fully connected layer, through a Sigmoid function, into a probability distribution over the labels of all fault classes; wherein the network fault classification model is trained by the training method of claim 1.
18. The apparatus of claim 17, wherein obtaining the fault-related log groups from network logs comprises: obtaining log templates from the network logs, and constructing a frequency time series for each log template of the device; constructing a fully connected undirected graph of the log templates; performing, based on the frequency time series, a causal relationship test on the edges of the fully connected undirected graph, and cutting off the pseudo-causal edges to obtain a plurality of connected subgraphs; and arranging the log templates corresponding to all nodes of each connected subgraph in time order to obtain the corresponding fault-related log group.
19. The apparatus of claim 18, wherein the training set further comprises a first training set, wherein the first training set comprises data of logs and descriptions.
20. The apparatus according to any one of claims 17-19, wherein the number of embedded-vector samples generated from the fault-related log groups of each fault class is counted, and the mean and the standard deviation of these numbers are calculated, with a hyper-parameter also being defined; a fault class whose number of samples satisfies the minority-class condition [formula omitted in source] is classified into the minority class; wherein, if a fault class is classified into the minority class, the embedded vectors corresponding to that fault class are original embedded vectors and their number is the original number of samples, and the synthesized embedded vectors are obtained as follows: a latent variable is sampled from the standard normal distribution, the fault class label vector is used as the condition of the CVAE model, and the decoder of the CVAE model generates a synthesized embedded vector from the latent variable; these steps are repeated until the sum of the number of synthesized embedded vectors and the original number of samples reaches the target value [value omitted in source].
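
The class division and the oversampling loop recited in the claims can be sketched in Python as follows. This is a minimal illustration under loudly stated assumptions, not the claimed formulas: the threshold expressions and the target count are missing from this text, so the sketch assumes the common rule "minority below mean - k * std, majority above mean + k * std" and a target equal to the rounded mean. The function names, the hyper-parameter name `k`, the class labels and `toy_decoder` (a stand-in for a trained CVAE decoder) are all hypothetical.

```python
import random
import statistics


def partition_classes(counts, k):
    """Split fault classes by sample count.

    ASSUMPTION: thresholds are mean - k*std and mean + k*std; the actual
    expressions are not reproduced in the source text.
    """
    values = list(counts.values())
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    low, high = mean - k * std, mean + k * std
    groups = {"minority": [], "balanced": [], "majority": []}
    for label, n in counts.items():
        if n < low:
            groups["minority"].append(label)
        elif n > high:
            groups["majority"].append(label)
        else:
            groups["balanced"].append(label)
    return groups, mean, std


def oversample(n_original, label_vector, target, decode, latent_dim, rng):
    """Generate synthesized vectors until originals + synthesized == target."""
    synthesized = []
    while n_original + len(synthesized) < target:
        # Latent variable drawn from the standard normal distribution.
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        # The label vector conditions the decoder.
        synthesized.append(decode(z, label_vector))
    return synthesized


def toy_decoder(z, label_vector):
    """Hypothetical stand-in for a trained CVAE decoder (just concatenates)."""
    return list(z) + list(label_vector)


# Made-up sample counts per fault class.
counts = {"A": 100, "B": 100, "C": 100, "D": 10, "E": 190}
groups, mean, std = partition_classes(counts, k=1.0)
print(groups)

# ASSUMPTION: the oversampling target is the rounded mean count.
target = round(mean)
one_hot_d = [0, 0, 0, 1, 0]
synthetic = oversample(counts["D"], one_hot_d, target, toy_decoder,
                       latent_dim=4, rng=random.Random(0))
print(len(synthetic), "synthesized vectors for class D")
```

With these made-up counts the mean is 100 and the population standard deviation is about 56.9, so class D (10 samples) falls below the lower threshold and class E (190 samples) above the upper one.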

Description

Data intelligence-based composite network fault classification model, training method and fault classification method

Technical Field

The invention belongs to the field of network fault classification, and relates to a data intelligence-based composite network fault classification model, a training method and a fault classification method.

Background

One of the core challenges of network operation and maintenance is accurately identifying risks and faults from massive volumes of network device logs, which directly determines how well system faults can be prevented and how quickly they can be responded to. A network device generates a large number of logs in a single day; most of them are negligible redundant information, and the core alarm logs that truly indicate serious risk account for only 3%-5%. To complicate matters, the information in a single log is often limited, and associated logs representing the same event may be distributed discontinuously in time, so the relevant information has to be actively searched for on the basis of operation and maintenance experience before a comprehensive analysis and an accurate judgment can be made. For example, when an EIGRP neighbor interruption alarm (%EIGRP-5-NBRCHANGE: EIGRP-IPv4 (1) neighbor 10.0.0.2 (GigabitEthernet0/3) is down) occurs on a device, the fault type (which may be temporary network jitter or a configuration problem) cannot be determined from this log alone. Only by combining it with the port link state change alarm (%LINK-3-UPDOWN: Interface GigabitEthernet0/3, changed state to down) and the packet loss threshold exceeded alarm (%LINEPROTO-4-PLTHRESH: packet loss threshold exceeded on Gi0/3) in the logs of the same device can one determine that a routing protocol interruption caused by a hardware fault has occurred, and then check the port hardware or cables to repair the fault. Such multi-layer superimposed faults, which require cross-log correlation analysis, are referred to as "composite network faults".
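
The cross-log correlation described above is what the grouping step in the claims automates: build a fully connected graph over log templates, cut the edges that fail a dependence test on their frequency time series, and read each connected subgraph as one fault-related log group ordered in time. The Python sketch below illustrates that flow under assumptions. The source does not specify the causal relationship test, so `co_active` is only a simple co-occurrence stand-in and not a causality test; the time series, timestamps, the `SYS-5-CONFIG_I` template and all function names are made up for illustration.

```python
from itertools import combinations


def co_active(x, y, threshold=0.5):
    """Stand-in dependence test (NOT a causal test): share of time bins in
    which both templates fire, among bins in which at least one fires."""
    both = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    either = sum(1 for a, b in zip(x, y) if a > 0 or b > 0)
    return either > 0 and both / either >= threshold


def fault_related_groups(series, first_seen, is_dependent):
    """Group log templates into connected subgraphs, each sorted by time.

    series: {template: [count per time bin]}
    first_seen: {template: timestamp of first occurrence}
    is_dependent: edge test; edges that fail it are cut.
    """
    templates = list(series)
    # Start from the complete undirected graph and keep only passing edges.
    adjacency = {t: set() for t in templates}
    for a, b in combinations(templates, 2):
        if is_dependent(series[a], series[b]):
            adjacency[a].add(b)
            adjacency[b].add(a)
    # Connected components via depth-first search.
    groups, seen = [], set()
    for start in templates:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(component, key=first_seen.get))
    return groups


# Made-up frequency time series (counts per time bin) for four templates.
series = {
    "EIGRP-5-NBRCHANGE": [0, 0, 1, 1, 0, 0],
    "LINK-3-UPDOWN": [0, 0, 1, 0, 0, 0],
    "LINEPROTO-4-PLTHRESH": [0, 1, 1, 1, 0, 0],
    "SYS-5-CONFIG_I": [1, 0, 0, 0, 0, 1],
}
first_seen = {
    "EIGRP-5-NBRCHANGE": 21.0,
    "LINK-3-UPDOWN": 20.0,
    "LINEPROTO-4-PLTHRESH": 10.0,
    "SYS-5-CONFIG_I": 3.0,
}
groups = fault_related_groups(series, first_seen, co_active)
print(groups)
```

In this toy data the edge between the link-state and packet-loss templates is cut, yet both stay in the same group because each remains connected to the EIGRP template; the unrelated configuration template ends up in a group of its own.
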
In the early days, engineers commonly screened logs with methods such as keyword filtering (which depends on fixed rules and easily misses entries), threshold triggering (static thresholds are difficult to adapt to a dynamic network) and manual sampling (whose randomness causes key information to be lost), and then diagnosed faults on the basis of expert knowledge. These methods suffer from high labor cost, poor dynamic adaptability and a high miss rate. With the development of computer science and artificial intelligence, data-driven fault diagnosis has reduced the dependence on manual work and improved diagnostic efficiency, so that it has quickly become a research hotspot in the field of fault diagnosis and has been applied successfully in many areas. Methods based on unsupervised learning (anomaly detection) do not depend on labeled data sets, but they can only determine whether the data are abnormal; moreover, the behavior patterns of a network system evolve with factors such as service updates and configuration changes, so that a pre-trained normal pattern becomes invalid and a large number of false positives are generated. Supervised learning methods are limited by the scarcity of labeled data: manually labeling composite network faults requires a large amount of human resources, and many types of composite faults are rare, so that there are too few samples to support model learning.

Disclosure of Invention

In order to solve the problem of network fault classification, and the problem of poor classification model accuracy caused by an extremely unbalanced number of samples across fault types, the invention provides the following technical solutions.
In a first aspect, a network fault classification model training method according to some embodiments of the application comprises: the training set comprises a second training set, wherein the second training set comprises samples of fault-related log groups, the logs in each fault-related log group represent the same fault type according to a causal relationship, and the logs are arranged in time order; fine-tuning the second BERT model and its adaptive classifier using a third training set to obtain a third BERT model, wherein the third training set comprises the fault-related log groups of the second training set and their corresponding fault types, and different fault-related log groups may have labels of the same fault type or labels of different fault types; generating, with the third BERT model, an embedded vector for each fault-related log group sample of the third training set, constructing a fourth training set from the embedded vectors and their corresponding fault type labels, and training a CVAE model using the fourth training set to obtain a first CVAE model; obtaining the number of samples of the embedded vectors labeled with each fault category according to the fault category labels corresponding to the embedded vectors of the fourth training