CN-121690862-B - Method and device for detecting high-speed network traffic abnormality
Abstract
The invention discloses a method and a device for detecting high-speed network traffic anomalies, and belongs to the technical field of real-time detection of network data flows. The method works in two stages:
- At a fine-grained (per-flow) level, the data flow identifier of the network data packets in each data flow determines a network feature cardinality estimate at that flow's storage location. Abnormal data flows are identified from these estimates.
- At a coarse-grained level, abnormal subnets are identified from the subnet identifiers of the packets in the abnormal flows.

By capturing both individual abnormal behavior and group anomaly patterns, the invention achieves high-precision detection of large-cardinality flows together with subnet-level aggregation analysis under limited memory. This resolves the difficulty that current high-speed network traffic anomaly detection cannot combine lightweight computation with detection accuracy.
Inventors
- LI CHUANWEI
- SUN YUE
- HUANG HE
- ZHANG HANWEN
- WANG BICHEN
- WANG ZHAOJIE
- JI PENG
Assignees
- Soochow University (苏州大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260210
Claims (8)
- 1. A method for detecting high-speed network traffic anomalies, comprising: acquiring the data flow identifier of each network data packet in each data flow; determining, according to the data flow identifier, a network feature cardinality estimate of the data flow at a storage location, the estimate representing the size of the data flow; wherein the storage location comprises a first storage unit for storing and updating data flow state information and a second storage unit for storing and updating the corresponding subnet aggregation feature information based on the data flow state information; and the first storage unit stores and updates the data flow state information as follows: the data flow state information comprises a data flow identifier, a flow increment count value and a network feature cardinality estimate; a current data flow identifier is generated for each arriving network data packet of the data flow from its source address, destination address and protocol type; a first index position of each network data packet in the storage location is calculated from the current data flow identifier as $p_i = H_1(f \oplus s) \bmod m$, where $p_i$ denotes the first index position, in the first storage unit, of the $i$-th network data packet, $H_1$ denotes a first hash function, $f$ denotes the current data flow identifier, $\oplus$ denotes the exclusive-or operator, $s$ denotes a random seed, $\bmod$ denotes the modulo operator, and $m$ denotes the number of first storage units; if the first index position in the first storage unit is occupied, that is, a historical data flow identifier and its historical network feature cardinality estimate are already stored there, the historical data flow identifier is compared with the current data flow identifier: if they match, the flow increment count value is incremented and the corresponding network feature cardinality estimate is updated; otherwise a conflict handling mechanism is triggered; screening out abnormal data flows based on the network feature cardinality estimates; acquiring the subnet identifier to which the network data packets in each abnormal data flow belong; determining the abnormal subnet corresponding to the abnormal data flow based on the subnet identifier; and performing association analysis on all abnormal data flows and abnormal subnets to determine the key source flows of each abnormal subnet.
- 2. The method for detecting high-speed network traffic anomalies according to claim 1, wherein screening out abnormal data flows based on the network feature cardinality estimates comprises: comparing each network feature cardinality estimate with a preset abnormal flow threshold; if the estimate is greater than the threshold, the data flow corresponding to that estimate is an abnormal data flow.
- 3. The method for detecting high-speed network traffic anomalies according to claim 1, wherein determining the abnormal subnet corresponding to an abnormal data flow based on the subnet identifier comprises: determining, according to the subnet identifier, an aggregation cardinality estimate that corresponds to the subnet identifier at the storage location of the network data packet and represents the significance of the network feature; and comparing each aggregation cardinality estimate with a preset abnormal subnet threshold; if the estimate is greater than the threshold, marking the subnet corresponding to the abnormal data flow as an abnormal subnet.
- 4. The method for detecting high-speed network traffic anomalies according to claim 1, wherein performing association analysis on all abnormal data flows and abnormal subnets to determine the key source flows of each abnormal subnet comprises: extracting the corresponding subnet identifiers from the source address and destination address of each abnormal data flow to build a mapping from each subnet to its set of abnormal data flows; comparing this mapping with the abnormal subnets one by one to separate the abnormal data flows that belong to abnormal subnets from those that belong to normal subnets; calculating the cardinality contribution of each abnormal data flow to the anomaly of the subnet it belongs to; and outputting the key source flows of each abnormal subnet in descending order of cardinality contribution.
- 5. The method for detecting high-speed network traffic anomalies according to claim 1, wherein triggering the conflict handling mechanism comprises: generating a random value in the range (0, 1); calculating a replacement probability $P$ from the flow increment count value $c$; comparing the replacement probability with the random value; if the replacement probability is greater than the random value, acquiring the element identifier of the data packet corresponding to the current data flow identifier and sending that element identifier together with the current data flow identifier to the second storage unit; if the replacement probability is not greater than the random value, sending the historical data flow identifier and its historical network feature cardinality estimate to the second storage unit; and resetting the first storage unit that triggered the conflict handling mechanism.
- 6. The method according to claim 1, wherein the second storage unit comprises a first counting unit and a second counting unit, and the subnet aggregation feature information comprises a subnet identifier, a feature intensity value and an aggregation cardinality estimate. The first counting unit stores a subnet identifier $S_j^{(1)}$, a first feature intensity value $R_j^{(1)}$ and an aggregation cardinality estimate $E_j^{(1)}$: respectively, the subnet identifier, feature intensity value and aggregation cardinality estimate stored by the first counting unit in the $j$-th second storage unit. The second counting unit stores a subnet identifier $S_j^{(2)}$, a second feature intensity value $R_j^{(2)}$ and an aggregation cardinality estimate $E_j^{(2)}$: respectively, the corresponding values stored by the second counting unit in the $j$-th second storage unit.
- 7. The method of claim 6, wherein the second storage unit stores and updates the corresponding subnet aggregation feature information based on the data flow state information as follows: if an update of the data flow state information triggers the conflict handling mechanism, acquiring the subnet identifier of the current data packet in the data flow; calculating a second index position of the current data packet in the storage location from the subnet identifier as $q_i = H_2(g \oplus s) \bmod n$, where $q_i$ denotes the second index position, in the second storage units, of the $i$-th (current) data packet, $H_2$ denotes a second hash function, $g$ denotes the subnet identifier, $\oplus$ denotes the exclusive-or operator, $s$ denotes a random seed, $\bmod$ denotes the modulo operator, and $n$ denotes the number of second storage units in the storage location; acquiring, according to the second index position, the first feature intensity value and the second feature intensity value of the second storage unit in which the current data packet falls; acquiring the element identifier of the current data packet; calculating the real-time feature intensity value of the current data packet from its element identifier as $r_i = \mathrm{LZ}(H_3(e_i))$, where $r_i$ denotes the real-time feature intensity value of the current data packet, $\mathrm{LZ}(\cdot)$ denotes the leading-zero count function, $H_3$ denotes a third hash function, and $e_i$ denotes the element identifier of the current data packet; comparing the real-time feature intensity value with the second feature intensity value: if the real-time value is greater, replacing the second feature intensity value with it to obtain an updated second feature intensity value, otherwise discarding the current data packet; and, if the first feature intensity value is smaller than the updated second feature intensity value, exchanging the two values.
- 8. A high-speed network traffic anomaly detection apparatus for implementing the high-speed network traffic anomaly detection method of claim 1, the apparatus comprising: a first acquisition module, configured to acquire the data flow identifier of each network data packet in each data flow; a first calculation module, configured to determine, according to the data flow identifier, a network feature cardinality estimate of the data flow at a storage location, the estimate representing the size of the data flow; wherein the storage location comprises a first storage unit for storing and updating data flow state information and a second storage unit for storing and updating the corresponding subnet aggregation feature information based on the data flow state information; and the first storage unit stores and updates the data flow state information as follows: the data flow state information comprises a data flow identifier, a flow increment count value and a network feature cardinality estimate; a current data flow identifier is generated for each arriving network data packet of the data flow from its source address, destination address and protocol type; a first index position of each network data packet in the storage location is calculated from the current data flow identifier as $p_i = H_1(f \oplus s) \bmod m$, where $p_i$ denotes the first index position, in the first storage unit, of the $i$-th network data packet, $H_1$ denotes a first hash function, $f$ denotes the current data flow identifier, $\oplus$ denotes the exclusive-or operator, $s$ denotes a random seed, $\bmod$ denotes the modulo operator, and $m$ denotes the number of first storage units; if the first index position in the first storage unit is occupied, that is, a historical data flow identifier and its historical network feature cardinality estimate are already stored there, the historical data flow identifier is compared with the current data flow identifier: if they match, the flow increment count value is incremented and the corresponding network feature cardinality estimate is updated; otherwise a conflict handling mechanism is triggered; a processing module, configured to screen out abnormal data flows based on the network feature cardinality estimates; a second acquisition module, configured to acquire the subnet identifier to which the network data packets in each abnormal data flow belong; a second calculation module, configured to determine the abnormal subnet corresponding to the abnormal data flow based on the subnet identifier; and an analysis module, configured to perform association analysis on all abnormal data flows and abnormal subnets and determine the key source flows of each abnormal subnet.
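The first-storage-unit update in claims 1 and 8 can be sketched in Python as below. This is an illustration under stated assumptions, not the patented implementation:
- BLAKE2b with distinct personalization strings stands in for the unspecified hash functions $H_1$ and $H_3$.
- The hypothetical helper `flow_id` hashes the (source, destination, protocol) triple into a 64-bit identifier.
- The claims do not say how the network feature cardinality estimate is updated. Each slot therefore keeps a single HyperLogLog-style "maximum leading zeros" register over hypothetical per-packet element identifiers.

All class and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def hash64(value: int, tag: bytes) -> int:
    """64-bit hash of a 64-bit integer; `tag` selects an independent function (H1, H3, ...)."""
    digest = hashlib.blake2b(value.to_bytes(8, "big"), digest_size=8, person=tag).digest()
    return int.from_bytes(digest, "big")


def flow_id(src: str, dst: str, proto: int) -> int:
    """Hypothetical 64-bit data flow identifier built from (source, destination, protocol)."""
    digest = hashlib.blake2b(f"{src}|{dst}|{proto}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def leading_zeros64(x: int) -> int:
    """Number of leading zero bits in a 64-bit word."""
    return 64 - x.bit_length()


@dataclass
class FlowSlot:
    flow: int       # data flow identifier
    count: int      # flow increment count value
    register: int   # hypothetical HLL-style register standing in for the cardinality estimate


class FirstStorageUnits:
    def __init__(self, m: int, seed: int) -> None:
        self.m = m                                  # number of first storage units
        self.seed = seed & 0xFFFFFFFFFFFFFFFF       # random seed s, kept to 64 bits
        self.slots: List[Optional[FlowSlot]] = [None] * m

    def index(self, f: int) -> int:
        """First index position p = H1(f XOR s) mod m."""
        return hash64(f ^ self.seed, b"H1") % self.m

    def update(self, f: int, element: int) -> str:
        p = self.index(f)
        slot = self.slots[p]
        rho = leading_zeros64(hash64(element, b"H3"))
        if slot is None:            # empty slot: start tracking this flow
            self.slots[p] = FlowSlot(f, 1, rho)
            return "inserted"
        if slot.flow == f:          # same flow: increment count, refresh the estimate register
            slot.count += 1
            slot.register = max(slot.register, rho)
            return "updated"
        return "conflict"           # different flow: caller runs the conflict handling mechanism
```

For example, `t = FirstStorageUnits(m=1024, seed=0x5EED)` followed by two calls to `t.update(f, ...)` with the same `f` returns "inserted" and then "updated". On a conflict the slot is left untouched, and the caller is expected to apply the conflict handling of claim 5.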
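The conflict handling steps of claim 5 can be sketched as follows, in the same step order. The replacement-probability formula itself is not reproduced in this text. The sketch therefore takes it as a caller-supplied function `replace_prob(c)`, and no particular formula should be read into it.

Other assumptions:
- `forward` is a hypothetical callback standing in for "send to the second storage unit".
- The final reset step is read as applying to both branches.
- `random.random()` returns values in [0, 1), so a zero draw is redrawn to stay in the open interval (0, 1).

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FlowSlot:
    flow: int        # historical data flow identifier
    count: int       # flow increment count value c
    estimate: float  # historical network feature cardinality estimate


def handle_conflict(
    slots: List[Optional[FlowSlot]],
    p: int,
    current_flow: int,
    current_element: int,
    replace_prob: Callable[[int], float],
    rng: random.Random,
    forward: Callable[[tuple], None],
) -> str:
    """Conflict handling for the occupied slot p, following the step order of claim 5."""
    slot = slots[p]
    if slot is None:
        raise ValueError("conflict handling requires an occupied slot")
    u = rng.random()
    while u == 0.0:                     # redraw so that u lies in the open interval (0, 1)
        u = rng.random()
    if replace_prob(slot.count) > u:
        # Send the current flow identifier and its packet's element identifier onward.
        forward(("current", current_flow, current_element))
        branch = "current"
    else:
        # Send the historical flow identifier and its estimate onward.
        forward(("historical", slot.flow, slot.estimate))
        branch = "historical"
    slots[p] = None                     # reset the first storage unit that triggered the conflict
    return branch
```

For experimentation, a hypothetical choice such as `replace_prob=lambda c: 1.0 / (c + 1)` works, and `forward=sent.append` collects what would go to the second storage unit. That probability is only a placeholder, not the formula in claim 5.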
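The second-storage-unit update of claims 6 and 7 keeps two counting units per slot, holding the two largest feature intensity values seen so far. It can be sketched as below: a minimal illustration, assuming BLAKE2b for $H_2$ and $H_3$ and IPv4 /24 subnets via the hypothetical helper `subnet_id`.

Two further details are assumptions because the claims leave them open:
- The subnet identifier of the second counting unit is overwritten together with its intensity value.
- The "exchange" swaps the whole counting units.

The aggregation cardinality estimate field is carried along but not updated, since its rule is not given here. All names are hypothetical.

```python
import hashlib
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional


def hash64(value: int, tag: bytes) -> int:
    digest = hashlib.blake2b(value.to_bytes(8, "big"), digest_size=8, person=tag).digest()
    return int.from_bytes(digest, "big")


def leading_zeros64(x: int) -> int:
    return 64 - x.bit_length()


def subnet_id(ip: str, prefix: int = 24) -> int:
    """Hypothetical subnet identifier: integer network address of ip/prefix (IPv4 assumed)."""
    return int(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)


@dataclass
class CountingUnit:
    subnet: Optional[int] = None   # subnet identifier S_j
    intensity: int = 0             # feature intensity value R_j
    estimate: float = 0.0          # aggregation cardinality estimate E_j (not updated here)


@dataclass
class SecondStorageUnit:
    first: CountingUnit = field(default_factory=CountingUnit)
    second: CountingUnit = field(default_factory=CountingUnit)


class SubnetSketch:
    def __init__(self, n: int, seed: int) -> None:
        self.n = n                                  # number of second storage units
        self.seed = seed & 0xFFFFFFFFFFFFFFFF       # random seed s
        self.units: List[SecondStorageUnit] = [SecondStorageUnit() for _ in range(n)]

    def index(self, g: int) -> int:
        """Second index position q = H2(g XOR s) mod n."""
        return hash64(g ^ self.seed, b"H2") % self.n

    def update(self, g: int, element: int) -> bool:
        """Meant to be called on the conflict path; returns False if the packet is discarded."""
        unit = self.units[self.index(g)]
        r = leading_zeros64(hash64(element, b"H3"))   # real-time intensity r = LZ(H3(e))
        if r <= unit.second.intensity:
            return False                              # not larger: discard the packet
        unit.second.subnet = g                        # assumption: identifier follows the new value
        unit.second.intensity = r
        if unit.first.intensity < unit.second.intensity:
            unit.first, unit.second = unit.second, unit.first   # keep first >= second
        return True
```

The invariant this maintains is that the first counting unit always holds an intensity value at least as large as the second. Because of this, each slot behaves like a small top-2 tracker over HyperLogLog-style leading-zero observations.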
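Threshold screening (claims 2 and 3) and the association analysis of claim 4 can be sketched together as follows. The claims do not define the "cardinality contribution", so this sketch assumes a hypothetical measure: a flow's share of the summed estimates of the abnormal flows mapped to the same subnet. Two other assumptions:
- Flows are keyed by (source, destination) address pairs.
- Subnets are taken as /24 prefixes through the hypothetical helper `slash24`.

Positive estimates are assumed, so the per-subnet total is non-zero.

```python
import ipaddress
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Set, Tuple

Flow = Tuple[str, str]   # (source address, destination address); hypothetical flow key


def screen(estimates: Dict[Hashable, float], threshold: float) -> Dict[Hashable, float]:
    """Claims 2 and 3: keep entries whose estimate is strictly greater than the threshold."""
    return {key: est for key, est in estimates.items() if est > threshold}


def slash24(ip: str) -> str:
    """Hypothetical subnet identifier: the /24 network containing ip."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def key_source_flows(
    abnormal_flows: Dict[Flow, float],
    abnormal_subnets: Set[str],
    subnet_of: Callable[[str], str] = slash24,
) -> Dict[str, List[Tuple[Flow, float]]]:
    """Claim 4: map subnets to abnormal flows, keep abnormal subnets, rank by contribution."""
    by_subnet: Dict[str, List[Tuple[Flow, float]]] = defaultdict(list)
    for flow, est in abnormal_flows.items():
        src, dst = flow
        # A flow is mapped to the subnets of both its source and destination address.
        for subnet in {subnet_of(src), subnet_of(dst)}:
            by_subnet[subnet].append((flow, est))
    ranked: Dict[str, List[Tuple[Flow, float]]] = {}
    for subnet, flows in by_subnet.items():
        if subnet not in abnormal_subnets:
            continue                                   # flows attributed to a normal subnet
        total = sum(est for _, est in flows)
        # Hypothetical contribution: the flow's share of this subnet's abnormal-flow estimates.
        shares = [(flow, est / total) for flow, est in flows]
        ranked[subnet] = sorted(shares, key=lambda item: item[1], reverse=True)
    return ranked
```

The same `screen` function serves both thresholds. It is applied once to per-flow estimates with the abnormal flow threshold and once to per-subnet aggregation estimates with the abnormal subnet threshold.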
Description
Method and device for detecting high-speed network traffic abnormality
Technical Field
The invention relates to a method and a device for detecting high-speed network traffic anomalies, and belongs to the technical field of real-time detection of network data flows.
Background
In the high-speed networks of modern data centers, link rates can reach hundreds of Gbps or even the Tbps level. To keep monitoring performance in step with line rate, the traffic monitoring mechanism usually has to be deployed on the network processing chip of a switch or router and process data in real time using on-chip cache. On-chip cache, however, is very limited, typically only a few megabytes, and must be shared with other network function modules, which makes it the main bottleneck for high-precision traffic monitoring.

Existing approaches each fall short in this setting. Traditional monitoring based on exact counting must maintain complete state for every data flow, and the resulting memory overhead makes real-time processing in high-speed networks impractical. Sampling-based schemes reduce resource consumption but lose key flow features, which lowers detection accuracy.

Network traffic also follows a long-tail distribution: a few super-spreader nodes account for most of the traffic, while most ordinary hosts open only a small number of connections, so uniform sampling struggles to capture the critical anomalies. In addition, modern network attacks tend to be distributed and low-rate. Attackers coordinate a large number of compromised hosts that each send small amounts of traffic to evade threshold-based detection, so each individual flow looks normal while the aggregate effect can be severe.
In the prior art, probabilistic data structures are commonly introduced to relieve cache pressure, but they lose accuracy under hash collisions and in fine-grained analysis. When several data flows hash to the same storage position, the estimation error rises markedly, making it difficult to identify super-spreaders and distributed attacks accurately. Moreover, existing schemes focus on per-flow detection and lack effective monitoring of subnet-level aggregate traffic. As a result, they cannot achieve host-level and subnet-level monitoring simultaneously under limited memory, which limits the system's ability to detect complex distributed anomalies.
Disclosure of Invention
The invention aims to overcome these shortcomings of the prior art by providing a high-speed network traffic anomaly detection method and device that achieve high-precision detection of large-cardinality flows together with subnet-level aggregation analysis under limited memory. This resolves the difficulty that current high-speed network traffic anomaly detection cannot combine lightweight computation with detection accuracy.
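The prior-art limitation noted above is that estimation error grows when several flows hash to the same storage position. It can be put in numbers with the standard balls-into-bins calculation. If $n$ flows hash independently and uniformly into $m$ slots, a given flow shares its slot with at least one other flow with probability $1-(1-1/m)^{n-1}$. The Python sketch below evaluates this expression and checks it by simulation. The uniform-hashing assumption and the function names are illustrative and not taken from the patent.

```python
import random
from collections import Counter


def expected_collision_fraction(n_flows: int, m_slots: int) -> float:
    """Expected fraction of flows sharing their slot with at least one other flow,
    assuming every flow hashes independently and uniformly into one of m_slots slots."""
    return 1.0 - (1.0 - 1.0 / m_slots) ** (n_flows - 1)


def simulated_collision_fraction(n_flows: int, m_slots: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity, averaged over several trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        load = Counter(rng.randrange(m_slots) for _ in range(n_flows))
        shared = sum(count for count in load.values() if count > 1)   # flows in crowded slots
        total += shared / n_flows
    return total / trials
```

With as many flows as slots (n = m = 1024), roughly 63% of flows share a slot. With 64 times more slots than flows (m = 65536), roughly 1.5% still do. Under tight memory, collisions are therefore the common case rather than an exception, which is one reason a sketch of this kind needs an explicit conflict handling policy.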
To solve the above technical problems, the invention adopts the following technical solution. The invention provides a method for detecting high-speed network traffic anomalies, comprising: acquiring the data flow identifier of each network data packet in each data flow; determining, according to the data flow identifier, a network feature cardinality estimate of the data flow at a storage location, the estimate representing the size of the data flow; screening out abnormal data flows based on the network feature cardinality estimates; acquiring the subnet identifier to which the network data packets in each abnormal data flow belong; determining the abnormal subnet corresponding to the abnormal data flow based on the subnet identifier; and performing association analysis on all abnormal data flows and abnormal subnets to determine the key source flows of each abnormal subnet. Further, screening out abnormal data flows based on the network feature cardinality estimates comprises comparing each estimate with a preset abnormal flow threshold; if the estimate is greater than the threshold, the corresponding data flow is an abnormal data flow. Further, determining the abnormal subnet corresponding to the abnormal data flow based on the subnet identifier comprises: determining an aggregation cardinality estimate which corresponds to the subnet identifier and is used for representing the importance of the network characteristics at a storage position of the network d