CN-122027370-A - Abnormal attack identification method and system based on flow data analysis


Abstract

The present invention relates to the field of data processing technologies, and in particular to a method and a system for identifying abnormal attacks based on traffic data analysis. The method comprises: obtaining a data packet sequence for each source IP within a preset time window; determining a time-series rhythm stability index from the packet-sending time intervals and the packet byte lengths; determining a spatio-temporal topology erosion degree from the dispersion of destination IP accesses and the breadth of the destination port distribution, in combination with the rhythm stability index; dynamically correcting the erosion degree using the real-time whole-network traffic load and the historical increment of the erosion degree, to generate a final abnormal attack judgment coefficient; and identifying abnormal attack sources accordingly. The invention addresses the difficulty conventional techniques have in capturing low-frequency slow attacks and in detecting multi-source coordinated attacks. Through spatio-temporal feature fusion and an environment-adaptive mechanism, it significantly improves the accurate identification of covert network attacks while keeping operation stable during service peak periods.

Inventors

ZHANG FEI
ZHANG YUHAO

Assignees

广州大一互联网络科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260415

Claims (10)

1. An abnormal attack identification method based on traffic data analysis, characterized by comprising the following steps: acquiring a data packet sequence of each source IP address based on a preset time window; acquiring a time-series rhythm stability index of each source IP address according to the packet-sending time intervals and the data packet byte lengths in the data packet sequence, and, in combination with the time-series rhythm stability index, acquiring a spatio-temporal topology erosion degree of each source IP address according to the degree of dispersion of its accesses to destination IP addresses and the distribution breadth of its accesses to destination ports; dynamically correcting the spatio-temporal topology erosion degree according to the real-time load state of the whole-network traffic and the historical change trend of the spatio-temporal topology erosion degree, to obtain a final abnormal attack judgment coefficient of each source IP address; and comparing the final abnormal attack judgment coefficient with a preset judgment coefficient threshold to identify abnormal attack sources.
2. The abnormal attack identification method based on traffic data analysis according to claim 1, wherein acquiring the time-series rhythm stability index of each source IP address comprises: taking the sum of the byte lengths of all data packets in the data packet sequence of the kth source IP as the total byte count of the kth source IP; taking the ratio of the total byte count to the number of data packets in the data packet sequence of the kth source IP as the average byte length of the data packets sent by the kth source IP; acquiring the mean and the variance of the time intervals of the data packets sent by the kth source IP; and computing the index according to a formula (not reproduced in this text) whose terms denote, respectively: the time-series rhythm stability index of the kth source IP; the mean of the time intervals of the data packets sent by the kth source IP; the variance of those time intervals; a preset hyperparameter; the number of data packets in the data packet sequence of the kth source IP; and a logarithmic function.
3. The abnormal attack identification method based on traffic data analysis according to claim 2, wherein acquiring the mean and the variance of the time intervals of the data packets sent by the kth source IP comprises: taking the mean of the time intervals between all pairs of adjacent data packets in the data packet sequence of the kth source IP as the mean of the time intervals of the data packets sent by the kth source IP, and taking the variance of the time intervals between all pairs of adjacent data packets in that sequence as the variance of the time intervals of the data packets sent by the kth source IP.
4. The abnormal attack identification method based on traffic data analysis according to claim 1, wherein the spatio-temporal topology erosion degree of each source IP address is acquired according to a formula (not reproduced in this text) whose terms denote, respectively: the spatio-temporal topology erosion degree of the kth source IP; the time-series rhythm stability index of the kth source IP; the number of destination IP categories of the kth source IP; the number of connections of the kth source IP to its mth destination IP category; the total number of connections of the kth source IP to all of its destination IPs; the number of unique destination ports accessed by the kth source IP; a first weight coefficient; a second weight coefficient, used to adjust the weight of one term (symbol not reproduced) in the overall index; and a base-2 logarithmic function.
5. The abnormal attack identification method based on traffic data analysis according to claim 4, wherein acquiring the number of connections of the kth source IP to its mth destination IP category comprises: counting the number of distinct destination IP addresses in the flow records corresponding to all data packets in the data packet sequence of each source IP, and taking that count as the number of destination IP categories of that source IP; and counting the number of occurrences of the mth destination IP address in the flow records corresponding to all data packets in the data packet sequence of the kth source IP, and taking that count as the number of connections of the kth source IP to its mth destination IP category.
6. The abnormal attack identification method based on traffic data analysis according to claim 4, wherein acquiring the number of unique destination ports accessed by the kth source IP comprises: counting the number of distinct destination ports in the flow records corresponding to all data packets in the data packet sequence of each source IP, and taking that count as the number of unique destination ports accessed by that source IP.
7. The abnormal attack identification method based on traffic data analysis according to claim 1, wherein the final abnormal attack judgment coefficient of each source IP address is obtained according to a formula (not reproduced in this text) whose terms denote, respectively: the final abnormal attack judgment coefficient of the kth source IP; the spatio-temporal topology erosion degree of the kth source IP; the historical spatio-temporal topology erosion degree of the kth source IP; the whole-network throughput per second at the current moment; the whole-network throughput per second under stable network conditions; a trend sensitivity factor, used to adjust the weight the system gives to the erosion-degree increment; e, the natural constant; and max(), the maximum function.
8. The abnormal attack identification method based on traffic data analysis according to claim 7, wherein acquiring the historical spatio-temporal topology erosion degree of the kth source IP comprises: presetting a time window length N; marking the timestamp of the first data packet in the data packet sequence of the kth source IP as the starting moment; taking the sequence formed by the data packets of the kth source IP acquired within the N minutes before the starting moment as the historical data packet sequence of the kth source IP; and acquiring the historical spatio-temporal topology erosion degree of the kth source IP from that historical data packet sequence, using the same method by which the spatio-temporal topology erosion degree of a source IP is acquired.
9. The abnormal attack identification method based on traffic data analysis according to claim 1, wherein acquiring the data packet sequence of each source IP address based on the preset time window comprises: capturing data packets of network traffic in real time through probe equipment deployed at a network gateway or a switch mirror port; performing structured parsing of the header information of each captured data packet using standard-protocol deep packet inspection, and extracting a flow record containing a source IP address, a destination port, a timestamp and a data packet size; presetting a time window length N; arranging all data packets sent by the same source IP address up to the current moment in ascending order of the timestamps in their flow records, to obtain an original data packet sequence of each source IP; and taking the sequence formed by the data packets acquired within the N minutes before the timestamp of the last data packet in the original data packet sequence of the kth source IP as the data packet sequence of the kth source IP.
10. An abnormal attack identification system based on traffic data analysis, comprising a processor and a memory, the memory storing computer program instructions which, when executed by the processor, implement the abnormal attack identification method based on traffic data analysis according to any one of claims 1 to 9.
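
The windowing of claim 9 and the interval and byte-length statistics of claims 2 and 3 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `FlowRecord` layout, the function names, and the choice of population variance are assumptions made here, since the claims do not say whether a population or a sample variance is intended.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean, pvariance


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical record layout; the field names are illustrative only.
    src_ip: str
    dst_ip: str
    dst_port: int
    timestamp: float  # seconds
    size: int         # bytes


def window_by_source(records, window_minutes):
    """Group records by source IP, sort each group by timestamp, and keep
    the records within `window_minutes` before that source's last packet."""
    by_src = defaultdict(list)
    for rec in records:
        by_src[rec.src_ip].append(rec)
    sequences = {}
    for src, recs in by_src.items():
        recs.sort(key=lambda r: r.timestamp)
        cutoff = recs[-1].timestamp - window_minutes * 60
        sequences[src] = [r for r in recs if r.timestamp >= cutoff]
    return sequences


def timing_and_size_stats(sequence):
    """Return (mean interval, interval variance, average byte length)
    for one non-empty, time-ordered packet sequence."""
    # Intervals between each pair of adjacent packets.
    gaps = [b.timestamp - a.timestamp for a, b in zip(sequence, sequence[1:])]
    mean_gap = fmean(gaps) if gaps else 0.0
    # Assumption: population variance; the claims do not specify which.
    var_gap = pvariance(gaps) if gaps else 0.0
    avg_len = sum(r.size for r in sequence) / len(sequence)
    return mean_gap, var_gap, avg_len
```

A source that sends equally sized packets at perfectly regular intervals yields an interval variance of zero, which is the kind of machine-like rhythm the stability index is meant to expose.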

Description

Abnormal attack identification method and system based on flow data analysis

Technical Field

The present invention relates to the field of data processing technologies, and in particular to a method and a system for identifying abnormal attacks based on traffic data analysis.

Background

With the rapid development of Internet technology, network attack techniques have become increasingly complex and covert. New threats such as low-frequency, highly covert slow attacks and distributed coordinated attacks keep emerging, posing unprecedented challenges to the accuracy and timeliness of network security protection systems. The conventional network intrusion detection system (IDS) is one of the basic core devices of network security protection. Its abnormal-behavior recognition logic generally focuses on macroscopic statistical indicators such as the instantaneous flow rate and the number of data packets per unit time; for example, an alarm is triggered when the number of data packets per second exceeds a certain value and the instantaneous flow rate exceeds a specific upper limit. However, the prior art has significant limitations in dealing with modern complex attacks.
On the one hand, existing detection techniques perform feature matching on individual data packets or individual network connections and do not consider time-series correlation features, which severely limits them against low-frequency slow attacks launched by automated scripts. For example, such an attack regulates its packet-sending behavior through a timed trigger mechanism, so that the macroscopic indicators generated by a single node, such as flow rate and packet count, always remain below the preset alarm threshold, thereby evading detection. In summary, the prior art makes it difficult for a detection system to identify complex attack patterns that are covert and span sessions, so the missed-detection rate remains high over the long term and reliable, effective security protection cannot be provided for the network environment.

Disclosure of Invention

The invention provides an abnormal attack identification method and system based on traffic data analysis, to solve the technical problem that prior-art detection systems have difficulty identifying complex attack patterns that are covert and span sessions, which keeps the missed-detection rate high over the long term.
In a first aspect, the present invention provides an abnormal attack identification method based on traffic data analysis, which adopts the following technical scheme. The method comprises the following steps: acquiring a data packet sequence of each source IP address based on a preset time window; acquiring a time-series rhythm stability index of each source IP address according to the packet-sending time intervals and the data packet byte lengths in the data packet sequence, and, in combination with the time-series rhythm stability index, acquiring a spatio-temporal topology erosion degree of each source IP address according to the degree of dispersion of its accesses to destination IP addresses and the distribution breadth of its accesses to destination ports; dynamically correcting the spatio-temporal topology erosion degree according to the real-time load state of the whole-network traffic and the historical change trend of the spatio-temporal topology erosion degree, to obtain a final abnormal attack judgment coefficient of each source IP address; and comparing the final abnormal attack judgment coefficient with a preset judgment coefficient threshold to identify abnormal attack sources. The innovation of the invention is that the final abnormal attack judgment coefficient of each source IP address is obtained by constructing the temporal rhythm stability and the spatial topology erosion degree. This removes the limitation that conventional detection depends only on a traffic threshold, allows covert low-frequency slow attacks and multi-source coordinated attacks to be identified effectively, significantly reduces the false alarm rate during service peak periods, and improves the security protection precision of a power grid and a network environment.
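
The two spatial inputs named in the steps above, the dispersion of accesses across destination IPs and the breadth of destination ports, can be illustrated with a short Python sketch. The erosion-degree formula itself is not reproduced in this text, so the code below is only an assumption-laden illustration: it uses base-2 Shannon entropy over per-destination connection counts as one plausible dispersion measure (suggested by the per-category counts, their total, and the base-2 logarithm listed among the formula's terms), and the function names are hypothetical.

```python
import math
from collections import Counter


def destination_dispersion(dst_ips):
    """Base-2 Shannon entropy of the destination-IP distribution.

    Assumption: an illustrative dispersion measure, not the patent's formula.
    """
    counts = Counter(dst_ips)          # connections per destination IP
    total = sum(counts.values())       # total connections of this source
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def port_breadth(dst_ports):
    """Number of distinct destination ports accessed by one source."""
    return len(set(dst_ports))
```

Under this measure a source that contacts a single host scores 0 bits, while one that spreads its connections evenly over four hosts scores 2 bits, so scanning-like behavior raises the value.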
Preferably, the obtaining the time sequence rhythm stability index of each source IP address includes: taking the sum of byte lengths of all data packets in a data packet sequence of a kth source IP as the total byte number of the data packets of the kth source IP, taking the ratio of the total byte number of the data packets to the number of the data packets in the data packet sequence of the kth source IP as the average byte length of data packets sent by the kth source IP and recording as Acquiring the mean value and the variance of the time interv