CN-121984931-A - Priority-based high-visibility progressive data packet sampling method and system

CN121984931ACN 121984931 ACN121984931 ACN 121984931ACN-121984931-A

Abstract

The invention relates to a priority-based high-visibility stream-by-stream data packet sampling method and system, comprising the steps of step 1, stream initiation and priority marking, step 2, stream establishment and controller decision, step 3, rule issuing and policy execution, step 4, data forwarding, scheduling and sampling, wherein a priority scheduling mechanism of a data plane and global optimization of a control plane are decoupled and cooperated. In the data plane, the end host marks the degradation priority for the data packet according to the number of bytes sent by the stream, and the switch performs strict queue scheduling and probability sampling based on the priority. In the control plane, the controller cooperatively models the configuration of the route, the sampling probability and the degradation threshold value as a nonlinear optimization problem with the maximum sampling utility as a target, and solves by adopting an efficient online original-dual interior point method. The invention ensures high visibility and load balance, and obviously reduces control and calculation cost.

Inventors

CAI BINLEI
DONG XIAODONG
WANG YINGLONG

Assignees

齐鲁工业大学(山东省科学院)
山东省计算中心（国家超级计算济南中心）
南开大学

Dates

Publication Date: 20260505
Application Date: 20251222

Claims (9)

1. The high-visibility progressive data packet sampling method based on the priority is characterized by running on a system architecture, wherein the system architecture comprises three logic units and a collector, the three logic units comprise an end host, a software defined network switch and a controller, and the method comprises the following steps: Step 1, stream initiation and priority marking: when an application on the terminal host initiates a new data stream, the terminal host carries out degradation priority marking, which comprises immediately marking the first data packet of the new data stream with the highest priority, continuously tracking the accumulated byte number sent by the new data stream, and automatically distributing the priority of a lower level to the subsequent data packet when the byte number exceeds a preset degradation threshold value every time; step 2, flow establishment and controller decision: The first data PACKET of the new data flow arrives at a software defined network switch, the software defined network switch packages the first data PACKET into a PACKET-IN message, the PACKET-IN message is reported to a controller, and after the controller receives the message, a cooperative control mechanism is started, namely, an optimal transmission path is selected for the first data PACKET according to global topology and current load; And 3, rule issuing and policy execution: The controller installs the calculated flow table item to all software defined switches on the path through the OpenFlow protocol, wherein the flow table item definitely prescribes that the data packet with specific priority is classified to a specified queue and sampled with specified probability; Meanwhile, the controller configures a new degradation threshold value, namely the degradation threshold value sent back to the source host in the step 2, to the terminal host through a secure channel; and 4, data forwarding, scheduling and sampling: The subsequent packets of the new data flow undergo the following procedure in each software defined network switch on the optimal transmission path, according to their current priority: the data packets are matched in the flow table items and sent into corresponding strict priority queues according to the priorities of the data packets; the dispatcher of the software defined network switch always sends the head packet of the specified queue with highest priority and non-empty; When the data packet is scheduled and forwarded, a random number is generated according to the sampling probability associated with the appointed queue where the data packet is located; in the whole path, a software defined network switch supporting explicit congestion control monitors the length of a queue, marks a data packet when congestion occurs, and a terminal host dynamically adjusts the sending rate according to an ECN-Echo mark in an ACK packet to maintain the stability of the flow.
2. The high-visibility stream-by-stream packet sampling method according to claim 1, wherein in step 1, the priority flag comprises: Is provided with K priorities, expressed as ; Defining a degradation threshold vector as The starting values satisfy: ; for a transmitted number of bytes The priority p of the data packets of the data stream is determined by: If and only if 。
3. The method for sampling high-visibility stream-by-stream packets based on priority as claimed in claim 1, wherein in step 2, the cooperative control mechanism comprises: first, assume that the entire network topology is known to employ a directed graph A representation; Wherein, the Representing an ith software defined network switch, Representing an ith link; defining the upper limit of the sampling ability of the ith software defined network switch and the bandwidth of the ith network link as respectively And ; Assuming that the flow arrival rates of all source-destination node pairs are known And traffic-scale distribution function for all source-destination node pairs Corresponding probability density function Expected value ; Furthermore, a set of possible routing paths between all source-destination nodes is also known, denoted as ; Expressed in terms of And Distribution ratio of flow on feasible path for source-destination node pair; General benefit , wherein, Is a representation of And A concave increase function of the sampling gain of the flow of the source-destination node pair; Expressed in terms of And Sampling ratio of the stream for the source-destination node pair; Will all of 、 And Forming a variable vector x; Solving the route by adopting the following steps Probability of sampling Degradation threshold : 1) Setting convergence threshold epsilon, maximum iteration number I and algorithm parameters ; 2) Randomly selecting an initial point from a feasible solution set X of variable vectors X I.e. All the following constraints must be satisfied: Link capacity constraint: ; sampling load constraints: ; Sampling probability normalization constraint: , , ; Threshold ordering constraint: Wherein, the method comprises the steps of, ; ; ; Wherein, the Indicating the desired traffic size of the traffic transmitted by each flow through the queue k, Representing links The remaining bandwidth of the upper kth queue, Representing links The bandwidth occupied by the upper kth queue, For judging the function, if Belonging to The value is 1, otherwise 0; 3) Initializing dual Lagrangian variables Is a positive vector which is used for the generation of the data, ; 4) Build Newton's equation, bring in , And Solving to obtain affine direction And : ; Wherein, the Is that Is used for the gradient function of (a), For the same number of forward vectors as the number of constraints, And Jacobian matrix and diagonal matrix respectively composed of all constraint conditions, As dual variable A diagonal matrix is formed and is arranged, A Hessian matrix being a lagrangian function; 5) Introducing correction parameters Solving Newton's equation brought into step 4) to obtain corrected direction And The affine direction is calculated by the following formula And Correcting to obtain the descending direction And : ; ; Wherein, the ; 6) When the optimal solution is approached, a second-order correction step is performed, namely, the following second-order optimization problem is solved: ; ; Wherein, the And Respectively representing the j constraint condition and the gradient thereof; ; And Is that And Is used to determine the (j) th component of the (c), ; 7) The direction of descent after correction is calculated by the following formula: ; Wherein, the Is a collection The first formula in (a) is a value satisfied; 8) Updating decision variables according to the following formula Lagrangian multiplier : ; ; Returning to the step 4) to enter the next iteration until convergence or the maximum iteration number I is reached.
4. The method for sampling high-visibility per-flow data packets based on priority as claimed in claim 1, wherein a software defined network switch supporting explicit congestion control monitors a queue length, marks data packets when congestion occurs, and a terminal host dynamically adjusts a sending rate according to an ECN-Echo flag in an ACK packet to maintain flow stability, comprising: after receiving the ECN-CE marked packet, the end host sets an ECN-Echo flag in a subsequent series of ACK packets, maintains a weighted average congestion window adjustment factor, and smoothly reduces the congestion window according to the congestion window adjustment factor.
5. The high-visibility stream-by-stream packet sampling method based on priority of claim 1, wherein each output port of the software defined network switch maintains K queues, denoted as The scheduler works by queuing from the highest priority Starting the inspection if Non-empty, send the highest priority queue First pack, if Empty, check And so on.
6. A high visibility stream-wise data packet sampling method based on priority according to any of claims 1-5, wherein the sampling actions are integrated into a data packet processing pipeline of a software defined network switch, in which, Representing pairs belonging to queues on the software defined network switch Is used for the sampling probability of the data packet, Is encoded directly in the instruction set of the stream entry.
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the priority-based high-visibility progressive packet sampling method of any one of claims 1-5 when the computer program is executed.
8. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the steps of the priority-based high-visibility progressive packet sampling method of any of claims 1-5.
9. A priority-based high-visibility stream-by-stream packet sampling system, comprising: the system comprises a stream initiation and priority marking module, a priority marking module and a priority marking module, wherein the stream initiation and priority marking module is configured to carry out degradation priority marking on an end host when an application on the end host initiates a new data stream, wherein the end host immediately marks the highest priority on the first data packet of the new data stream; The flow establishing and controller decision module is configured to enable a first data PACKET of the new data flow to reach a software defined network switch, the software defined network switch packages the first data PACKET into a PACKET-IN message, the PACKET-IN message is reported to the controller, and after receiving the message, the controller starts a cooperative control mechanism, namely, an optimal transmission path is selected for the first data PACKET according to global topology and current load; The rule issuing and policy executing module is configured to install the calculated flow table item to all software defined switches on the path through the OpenFlow protocol, wherein the flow table item definitely prescribes that the data packet with the specific priority is classified to a specified queue and sampled with specified probability; the data forwarding, scheduling and sampling module is configured to make the following data packet of the new data flow undergo the following processes in each software defined network switch on the optimal transmission path according to the current priority thereof, wherein the data packet is matched in a flow table entry and sent into a corresponding strict priority queue according to the priority thereof; the dispatcher of the software defined network switch always sends the head packet of the specified queue with highest priority and non-empty; When the data packet is scheduled and forwarded, a random number is generated according to the sampling probability associated with the appointed queue where the data packet is located; in the whole path, a software defined network switch supporting explicit congestion control monitors the length of a queue, marks a data packet when congestion occurs, and a terminal host dynamically adjusts the sending rate according to an ECN-Echo mark in an ACK packet to maintain the stability of the flow.

Description

Priority-based high-visibility progressive data packet sampling method and system Technical Field The invention relates to the technical field of Software Defined Networks (SDNs) and data center networks, in particular to a method and a system for efficiently sampling fine-grained data packets in a network. Background Network monitoring is a key stone for network operation and maintenance of a modern data center, and provides data support for various key tasks such as fault diagnosis, flow measurement, intrusion detection, performance optimization, charging and the like. However, as data center network sizes, complexities, and traffic grow exponentially, traditional "full traffic" monitoring methods (e.g., deep packet inspection, DPI) become impractical due to their huge resource consumption and scalability bottlenecks. In this context, packet sampling techniques have become an indispensable alternative to a lightweight approach to infer the overall network state by analyzing a representative subset of packets traversing the network. The existing data packet sampling frame mainly can be divided into three main categories according to the working principle and granularity, and each category has inherent and difficult-to-overcome defects: The first category is port-based sampling techniques, which represent industry standards including sFlow, cSamp, and SDN-based OpenSample, among others. Such techniques operate independently on each port of the switch, sampling the passing data packets with a fixed probability (e.g., 1 per N packet samples). Although sFlow is favored because it is widely supported by conventional switches, it has a fundamental drawback-the short-stream visibility is very low. Since the sampling is blind and stream independent, short streams containing only a few packets have a very low probability of being sampled when competing with the huge number of long streams, and are likely to be completely missing from the sample set. For example, at a sampling rate of 1/1000 or less, a short stream containing only 10 packets has almost no packets captured. This is fatal to network management tasks (especially intrusion detection and malware traffic analysis) that rely on complete flow sequences for analysis, as the spider traces of malicious activity or network failures tend to be hidden in these short flows. The second type is a sampling technique based on flow statistics, which typically represents the NetFlow of Cisco, and a series of Sketch (Sketch) schemes designed to save memory, such as ELASTICSKETCH, SKETCHVISOR and Nitrosketch, etc. Instead of directly sampling and storing data packets, such methods maintain a flow cache or sketch data structure at the switch, update per-flow statistics (e.g., number of packets, bytes) by sampling or measurement, and then report the flow statistics to the collector on a regular or regular basis. Their core goal is to make flow level measurements and inferences, such as building a traffic matrix. However, one fundamental limitation is that they are completely unable to obtain packet-level information. This means that the details of the payload content, the sequence of specific protocol fields, etc. of all data packets are all lost. Thus, such techniques cannot be applied to any scenario where deep inspection of the packet content is required, such as accurate traffic classification, matching of specific attack features, or application-level performance analysis, thus greatly limiting its scope of application. The third class is based on stream-by-stream packet sampling techniques, which are proposed to overcome the drawbacks of the first two classes of techniques. With the rise of Software Defined Networking (SDN) paradigms, researchers have proposed schemes such as Sample-on-Demand (SoD) and FlowShark, taking advantage of their centralized control capabilities and natural support for per-flow packet forwarding operations. They allow the controller to specify a sampling policy for each flow by extending the OpenFlow protocol, thus enabling true, customizable per-flow packet sampling. Despite advanced ideas, these approaches face serious scalability challenges, which are rooted in enormous control and computational overhead. In order to assign the appropriate sampling rate to each stream, the controller must know precisely the real-time packet rate of each stream. This relies on additional modules such as OpenTM or specialized packet rate estimators and long stream detectors (typically employing machine learning models). However, the flow completion time in a data center network is typically on the order of microseconds, while the rate estimator may take tens of seconds to stabilize, resulting in a significant deviation between the estimated and actual values, resulting in a sub-optimal sampling strategy. More seriously, frequent polling of the flow counter to obtain real-time information generates a huge amount of control messages, flooding the control chann