CN-121984792-A - Adaptive network attack prediction defense method, device and readable storage medium

Abstract

The invention belongs to the technical field of network security and relates to an adaptive network attack prediction defense method, a device and a readable storage medium. A network data stream is sliced to obtain a time sequence, and a plurality of historical time-series sampling sequences are obtained by sampling backward from the current moment, taken as the end point, with time windows of different scales. Feature extraction and preliminary attack prediction are performed on each historical time-series sampling sequence, and weights are assigned to the feature vectors based on the prediction results. An attack risk probability interval and its width are obtained from the historical time-series sampling sequences, the feature vectors, the weights, and preset upper and lower attack risk bounds. A first attack risk probability sequence, a second attack risk probability sequence, a third attack risk probability sequence and an attack risk fluctuation sequence are obtained from the attack risk probability intervals at the current moment and the historical moments. The defense strategy acquisition process is described as a Markov decision process; a state space, an action space and a reward function are established to obtain a deep reinforcement learning framework, which is solved to obtain the network attack defense measures at the current moment.
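As an illustration only, the multi-scale sampling and the construction of the four sequences summarized above can be sketched in Python as follows. The function names `sample_windows` and `build_risk_sequences`, the representation of each moment's interval as a `(lower, upper)` pair, and the sign convention of the fluctuation sequence (later width minus earlier width) are assumptions made for this sketch and are not specified in the source. In practice the intervals would come from the deep time-series coding network, which is not modeled here.

```python
from typing import List, Sequence, Tuple


def sample_windows(series: Sequence[float],
                   window_lengths: Sequence[int]) -> List[List[float]]:
    """Return one history per window length, each ending at the last (current) sample."""
    histories = []
    for w in window_lengths:
        if not 1 <= w <= len(series):
            raise ValueError(f"window length {w} outside 1..{len(series)}")
        # The current moment is the end point, so take the trailing w samples.
        histories.append(list(series[-w:]))
    return histories


def build_risk_sequences(
    intervals: Sequence[Tuple[float, float]],
) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Build the four sequences from per-moment (lower, upper) risk intervals.

    `intervals` is ordered from the oldest historical moment to the current one.
    """
    for lower, upper in intervals:
        if lower > upper:
            raise ValueError("interval lower bound exceeds upper bound")
    first = [upper for _, upper in intervals]           # interval maxima
    second = [lower for lower, _ in intervals]          # interval minima
    third = [upper - lower for lower, upper in intervals]  # interval widths
    # Assumed convention: width at the later moment minus width at the earlier one.
    fluctuation = [b - a for a, b in zip(third, third[1:])]
    return first, second, third, fluctuation
```

For example, `sample_windows([1, 2, 3, 4, 5, 6], [2, 3])` yields the two histories `[5, 6]` and `[4, 5, 6]`, both ending at the current sample. The fluctuation sequence is one element shorter than the other three because it is taken over adjacent pairs of moments.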

Inventors

ZHAI HUANHUAN
ZHU GUIQIAN

Assignees

Soochow University (苏州大学)

Dates

Publication Date: 20260505
Application Date: 20260407

Claims (10)

1. An adaptive network attack prediction defense method, characterized by comprising the following steps: slicing a network data stream at a fixed time interval to obtain a time sequence with equal time intervals, and sampling backward from the current moment, taken as the end point, with time windows of different scales to obtain a plurality of historical time-series sampling sequences of different lengths at the current moment; performing feature extraction and preliminary attack prediction on each historical time-series sampling sequence at the current moment, and assigning a weight to the feature vector of each historical time-series sampling sequence based on the preliminary attack prediction result; inputting each historical time-series sampling sequence at the current moment, the feature vector of each historical time-series sampling sequence, the weight of each historical time-series sampling sequence, a preset attack risk upper bound and a preset attack risk lower bound into a deep time-series coding network, and outputting an attack risk probability interval at the current moment and the width of the attack risk probability interval; obtaining a first attack risk probability sequence at the current moment based on the maximum values of the attack risk probabilities at the current moment and the historical moments, obtaining a second attack risk probability sequence at the current moment based on the minimum values of the attack risk probabilities at the current moment and the historical moments, obtaining a third attack risk probability sequence at the current moment based on the widths of the attack risk probability intervals at the current moment and the historical moments, and obtaining an attack risk fluctuation sequence at the current moment based on the differences between the widths of the attack risk probability intervals at every two adjacent moments among the current moment and the historical moments; and, based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence and the attack risk fluctuation sequence, describing the defense strategy acquisition process at the current moment as a Markov decision process, establishing a state space, an action space and a reward function, obtaining a deep reinforcement learning framework, and solving it to obtain the network attack defense measures at the current moment.
2. The adaptive network attack prediction defense method according to claim 1, wherein obtaining the time windows of different scales comprises: obtaining a first-scale time window based on the length of the time sequence, wherein the length of the first-scale time window is greater than or equal to 1/3 and less than or equal to 1/2 of the length of the time sequence; sampling backward from the current moment, taken as the end point, with the first-scale time window to obtain a first historical time-series sampling sequence; calculating the mean and the standard deviation of the first historical time-series sampling sequence to obtain its coefficient of variation, and scaling the first-scale time window by the coefficient of variation to obtain a second-scale time window; and calculating an autocorrelation coefficient of the first historical time-series sampling sequence using the autocorrelation function, and scaling the first-scale time window based on the autocorrelation coefficient to obtain a third-scale time window.
3. The adaptive network attack prediction defense method according to claim 1, wherein performing feature extraction and preliminary attack prediction on each historical time-series sampling sequence at the current moment, and assigning a weight to the feature vector of each historical time-series sampling sequence based on the preliminary attack prediction result, comprises: inputting each historical time-series sampling sequence into a first convolutional neural network for feature extraction to obtain the feature vector of each historical time-series sampling sequence; inputting the feature vector of each historical time-series sampling sequence into a second convolutional neural network for preliminary attack prediction to obtain the attack risk probability and the prediction confidence of each historical time-series sampling sequence; clustering the historical time-series sampling sequences based on attack risk probability and prediction confidence to obtain at least two clusters of historical time-series sampling sequences; obtaining the attack risk probability and the prediction confidence of each cluster as the mean attack risk probability and the mean prediction confidence of all historical time-series sampling sequences in the cluster, and taking these two means as the cluster center; calculating, for each historical time-series sampling sequence in each cluster, the distance from the sequence to the center of its cluster based on its attack risk probability and prediction confidence; obtaining the intra-class concentration of each cluster based on the mean distance from all historical time-series sampling sequences in the cluster to the cluster center; and sorting all clusters in descending order of intra-class concentration, and assigning weights to the feature vectors of the historical time-series sampling sequences in each cluster based on the sorting result; wherein the weight of the feature vector of a historical time-series sampling sequence in a higher-ranked cluster is greater than the weight of the feature vector of a historical time-series sampling sequence in a lower-ranked cluster.
4. The adaptive network attack prediction defense method according to claim 1, wherein describing the defense strategy acquisition process at the current moment as a Markov decision process based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence and the attack risk fluctuation sequence, and establishing a state space, an action space and a reward function, comprises: defining the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, the attack risk fluctuation sequence, the traffic characteristics of the network data stream, the system resource state and the attack frequency as the state space; defining an attack early-warning threshold, system resource scheduling, a defense strategy and an attack early-warning level as the action space; and constructing the reward function based on the reduction in attack risk probability interval width, the system resource scheduling cost and the defense strategy switching frequency.
5. The adaptive network attack prediction defense method according to claim 4, wherein constructing the reward function based on the reduction in attack risk probability interval width, the system resource scheduling cost and the defense strategy switching frequency comprises: taking the maximum reduction in attack risk probability and the reduction in attack risk probability interval width between adjacent moments as positive reward terms; taking the system resource scheduling cost and the defense strategy switching frequency as penalty terms; and constructing the reward function based on the difference between the weighted sum of the positive reward terms and the weighted sum of the penalty terms.
6. The adaptive network attack prediction defense method according to claim 5, wherein solving the deep reinforcement learning framework to obtain the network attack defense measures at the current moment comprises: S11, combining the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence, the attack risk fluctuation sequence, the traffic characteristics of the network data stream, the system resource state and the attack frequency at the current moment, and inputting the combination, as the state vector of time slot t, into a main Q network to obtain the action vector of time slot t output by the main Q network; S12, executing the action vector of time slot t, calculating the reward function value of time slot t and the state vector of time slot t+1, and storing the state vector of time slot t, the action vector of time slot t, the reward function value of time slot t and the state vector of time slot t+1 as a tuple in an experience replay pool; S13, updating t = t+1 and returning to step S11 until the number of tuples in the experience replay pool reaches a preset number; S14, randomly drawing from the experience replay pool a sample batch consisting of a preset number of tuples, inputting each tuple in the sample batch into the main Q network and a target Q network, calculating the value of a loss function based on the predicted Q value output by the main Q network and the target Q value output by the target Q network for each tuple, and updating the parameters of the main Q network by minimizing the value of the loss function; and S15, returning to step S11 to iteratively optimize the main Q network until the number of optimization iterations reaches a preset number, and obtaining the network attack defense measures at the current moment based on the action vector of time slot t output by the main Q network.
7. The adaptive network attack prediction defense method according to claim 6, wherein calculating the reward function value of time slot t further comprises: if the reward function value of time slot t is less than -1, setting the reward function value of time slot t to -1; and if the reward function value of time slot t is greater than 1, setting the reward function value of time slot t to 1.
8. The adaptive network attack prediction defense method according to claim 6, wherein the state vector $s_t$ of time slot t is expressed as: $s_t = [P_t^{\max}, P_t^{\min}, W_t, F_t, X_t, R_t, f_t]$, where $P_t^{\max}$ represents the first attack risk probability sequence; $P_t^{\min}$ represents the second attack risk probability sequence; $W_t$ represents the third attack risk probability sequence; $F_t$ represents the attack risk fluctuation sequence; $X_t$ represents the traffic characteristics of the network data stream; $R_t$ represents the system resource state; and $f_t$ represents the attack frequency; the action vector $a_t$ of time slot t is expressed as: $a_t = [\theta_t, u_t, d_t, l_t]$, where $\theta_t$ represents the attack early-warning threshold; $u_t$ represents the system resource scheduling; $d_t$ represents the defense strategy; and $l_t$ represents the attack early-warning level; and the reward function value $r_t$ of time slot t is calculated as: $r_t = \omega_1 \Delta P_t + \omega_2 \Delta W_t - \omega_3 C_t - \omega_4 N_t$, where $\Delta P_t$ represents the maximum reduction in attack risk probability between adjacent moments; $\Delta W_t$ represents the reduction in attack risk probability interval width between adjacent moments; $C_t$ represents the system resource scheduling cost; $N_t$ represents the defense strategy switching frequency; $\omega_1$ represents the weight of $\Delta P_t$; $\omega_2$ represents the weight of $\Delta W_t$; $\omega_3$ represents the weight of $C_t$; and $\omega_4$ represents the weight of $N_t$.
9. An adaptive network attack prediction defense device, comprising: a multi-scale sampling module, configured to slice a network data stream at a fixed time interval to obtain a time sequence with equal time intervals, and to sample backward from the current moment, taken as the end point, with time windows of different scales to obtain a plurality of historical time-series sampling sequences of different lengths at the current moment; an attack probability prediction and weight assignment module, configured to perform feature extraction and preliminary attack prediction on each historical time-series sampling sequence at the current moment, and to assign a weight to the feature vector of each historical time-series sampling sequence based on the preliminary attack prediction result; an attack risk probability interval prediction module, configured to input each historical time-series sampling sequence at the current moment, the feature vector of each historical time-series sampling sequence, the weight of each historical time-series sampling sequence, a preset attack risk upper bound and a preset attack risk lower bound into a deep time-series coding network, and to output an attack risk probability interval at the current moment and the width of the attack risk probability interval; a data sequence construction module, configured to obtain a first attack risk probability sequence at the current moment based on the maximum values of the attack risk probabilities at the current moment and the historical moments, to obtain a second attack risk probability sequence at the current moment based on the minimum values of the attack risk probabilities at the current moment and the historical moments, to obtain a third attack risk probability sequence at the current moment based on the widths of the attack risk probability intervals at the current moment and the historical moments, and to obtain an attack risk fluctuation sequence at the current moment based on the differences between the widths of the attack risk probability intervals at every two adjacent moments among the current moment and the historical moments; and a defense strategy acquisition module, configured to describe the defense strategy acquisition process at the current moment as a Markov decision process based on the first attack risk probability sequence, the second attack risk probability sequence, the third attack risk probability sequence and the attack risk fluctuation sequence, to establish a state space, an action space and a reward function, to obtain a deep reinforcement learning framework, and to solve it to obtain the network attack defense measures at the current moment.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the adaptive network attack prediction defense method according to any of claims 1 to 8.
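A minimal Python sketch of the reward computation described in claims 5, 7 and 8 is given below for illustration. It forms the weighted sum of the two positive reward terms, subtracts the weighted sum of the two penalty terms, and clips the result to the range [-1, 1]. The names `RewardWeights` and `reward` and the default weight values are hypothetical; the source does not give numeric weights or say how the four input quantities are normalized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights; the source leaves their values unspecified."""
    risk_drop: float = 0.4      # weight of the attack risk probability reduction
    width_drop: float = 0.3     # weight of the interval width reduction
    resource_cost: float = 0.2  # weight of the system resource scheduling cost
    switching: float = 0.1      # weight of the defense strategy switching frequency


def reward(risk_drop: float,
           width_drop: float,
           resource_cost: float,
           switch_freq: float,
           weights: RewardWeights = RewardWeights()) -> float:
    """Weighted positive terms minus weighted penalty terms, clipped to [-1, 1]."""
    positive = weights.risk_drop * risk_drop + weights.width_drop * width_drop
    penalty = weights.resource_cost * resource_cost + weights.switching * switch_freq
    raw = positive - penalty
    # Claim 7: values below -1 are set to -1, values above 1 are set to 1.
    return max(-1.0, min(1.0, raw))
```

With all four weights set to 1.0, `reward(0.5, 0.25, 0.25, 0.0, RewardWeights(1.0, 1.0, 1.0, 1.0))` evaluates to 0.5, while a large resource cost such as `reward(0.0, 0.0, 10.0, 10.0, RewardWeights(1.0, 1.0, 1.0, 1.0))` is clipped to -1.0. Bounding the reward in this way is a common means of keeping Q-value targets on a stable scale during training.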

Description

Adaptive network attack prediction defense method, device and readable storage medium

Technical Field

The invention relates to the technical field of network security, and in particular to an adaptive network attack prediction defense method, a device and a computer readable storage medium.

Background

With the rapid development of cloud computing, the Internet of Things and the industrial Internet, network systems keep growing in scale, and service forms are becoming diversified and dynamic. Meanwhile, network attacks are growing more complex: attack behavior is increasingly concealed, attack chains are longer, attack steps are more staged, and attack intensity fluctuates dynamically. Typical attacks include distributed denial of service (DDoS) attacks, scanning and probing, malicious code propagation, data theft, advanced persistent threat (APT) attacks, and the like. To secure network systems, the prior art generally relies on an intrusion detection system (IDS), an intrusion prevention system (IPS), firewalls, traffic cleaning devices, a security information and event management (SIEM) system and similar means for attack detection and protection.

Common intrusion detection and prevention systems mostly identify attacks by means of a rule base or feature matching. For example, network data packets, session behaviors or log entries are matched against preset attack feature rules, and when a rule is triggered an alarm is output or a blocking operation (such as blocking an IP address or a port) is executed. This scheme is simple to implement and responds in real time, but it depends heavily on a manually maintained rule base, has difficulty recognizing unknown or variant attacks, and its strategy configuration is usually fixed and adapts poorly. To address this problem, deep learning methods have in recent years been applied to network intrusion detection and attack prediction, recognizing abnormal behavior and evaluating attack probability by modeling the time-series features of network traffic. However, network traffic data is multidimensional and heterogeneous, strongly non-stationary, markedly bursty and irregular in sequence length. If a time-series model is used directly for point-value prediction, outputting an attack type or a single attack risk score, it is difficult to fully describe the fluctuation of traffic features and the uncertainty of the prediction results. When traffic changes sharply under attack mutation or noise interference, the model is highly prone to prediction deviation, which undermines the reliability of the subsequent defense strategy.

In summary, existing adaptive network attack prediction defense methods ignore the multidimensional heterogeneity, strong non-stationarity, marked burstiness and irregular sequence length of network traffic data and are prone to attack prediction deviation, which undermines the reliability of the defense strategy.

Disclosure of Invention

Therefore, the technical problem to be solved by the invention is that adaptive network attack prediction defense methods in the prior art ignore the multidimensional heterogeneity, strong non-stationarity, marked burstiness and irregular sequence length of network traffic data and are prone to attack prediction deviation, which undermines the reliability of the defense strategy.

To solve the above technical problem, the invention provides an adaptive network attack prediction defense method comprising the following steps: slicing a network data stream at a fixed time interval to obtain a time sequence with equal time intervals, and sampling backward from the current moment, taken as the end point, with time windows of different scales to obtain a plurality of historical time-series sampling sequences of different lengths at the current moment; performing feature extraction and preliminary attack prediction on each historical time-series sampling sequence at the current moment, and assigning a weight to the feature vector of each historical time-series sampling sequence based on the preliminary attack prediction result; inputting each historical time-series sampling sequence at the current moment, the feature vector of each historical time-series sampling sequence, the weight of each historical time-series sampling sequence, a preset attack risk upper bound and a preset attack risk lower bound into a deep time-series coding network, and outputting an attack risk probability interval at the current moment and the attack risk probability int