US-20260128882-A1 - FLOW-LEVEL DEDUPLICATION OF NETWORK TRAFFIC IN A NETWORK TRAFFIC VISIBILITY SYSTEM

Abstract

A system and method for flow-level deduplication of network traffic are disclosed. A network node receives a first plurality of packets from a first network endpoint. The first plurality of packets represent a flow of data being communicated between the first network endpoint and a second network endpoint. The network node further receives a second plurality of packets from the second network endpoint. The network node identifies a sequence identifier of each packet of the first and second pluralities of packets. The network node determines that the first and second pluralities of packets are all associated with the same flow, based on the sequence identifiers of the first and second pluralities of packets. In response to that determination, the network node deduplicates the flow by discarding the first plurality of packets or the second plurality of packets. The network node may be a traffic visibility node.
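The deduplication step summarized above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Packet` type, its field names, and the choice to keep the first plurality and discard the second are all assumptions made for the example. It also assumes each packet already carries its sequence identifier.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Packet:
    source: str    # hypothetical label for the endpoint that sent this copy
    seq_id: str    # sequence identifier carried by the packet
    payload: bytes


def deduplicate_flow(first: List[Packet], second: List[Packet]) -> List[Packet]:
    """Keep one copy of a flow that was reported by both endpoints.

    If every packet in both pluralities carries the same sequence
    identifier, the two pluralities are treated as the same flow and the
    second plurality is discarded (an arbitrary choice for this sketch).
    Otherwise nothing is discarded.
    """
    ids = {p.seq_id for p in first} | {p.seq_id for p in second}
    same_flow = bool(first) and bool(second) and len(ids) == 1
    if same_flow:
        return list(first)
    return list(first) + list(second)
```

Discarding the second plurality rather than the first is only a convention here; the abstract permits discarding either one.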

Inventors

  • Murali BOMMANA
  • Sandeep DAHIYA
  • Santhosh Kumar

Assignees

  • GIGAMON INC.

Dates

Publication Date
20260507
Application Date
20251229

Claims (20)

  1. A method comprising: receiving, by a traffic visibility node, a first plurality of packets from a first network endpoint, wherein the first plurality of packets represent a flow of data being communicated between the first network endpoint and a second network endpoint; receiving, by the traffic visibility node, a second plurality of packets from the second network endpoint; identifying, by the traffic visibility node, a sequence identifier of each packet of the first plurality of packets and of each packet of the second plurality of packets; determining, by the traffic visibility node, that the first plurality of packets and the second plurality of packets are all associated with the same flow, based on the sequence identifiers of the first plurality of packets and the second plurality of packets; and in response to determining that the first plurality of packets and the second plurality of packets are all associated with the same flow, deduplicating the flow, by the traffic visibility node, by discarding at least a portion of the first plurality of packets or at least a portion of the second plurality of packets.
  2. The method of claim 1, wherein determining that the first plurality of packets and the second plurality of packets are all associated with the same flow comprises determining that the sequence identifiers of all of the first plurality of packets and the second plurality of packets are identical.
  3. The method of claim 1, wherein for each packet of the first plurality of packets and the second plurality of packets, the sequence identifier of the packet comprises a hash of a five-tuple and a directional indicator, the directional indicator being indicative of a communication direction of the packet.
  4. The method of claim 1, wherein for each packet of the first plurality of packets and the second plurality of packets, the sequence identifier of the packet comprises a hash of header information from the packet, including source IP address, destination IP address, source port, destination port, protocol and a directional indicator, the directional indicator being indicative of a communication direction of the packet.
  5. The method of claim 1, wherein the determining that the first plurality of packets and the second plurality of packets are all associated with the same flow comprises: reconstructing at least a portion of the flow at the traffic visibility node, by comparing at least a portion of data in the first plurality of packets with at least a portion of data in the second plurality of packets, within a sliding window.
  6. The method of claim 1, wherein the first plurality of packets is at least a portion of an SSL Read stream or an SSL Write stream synthesized at the first network endpoint.
  7. The method of claim 1, wherein the first plurality of packets and the second plurality of packets are each at least a portion of an SSL Read stream or an SSL Write stream synthesized at the first network endpoint or the second network endpoint.
  8. The method of claim 1, wherein: the first plurality of packets and the second plurality of packets correspond to a flow of data being transmitted from the first network endpoint to the second network endpoint; the first plurality of packets is at least a portion of a synthesized SSL Write stream from the first network endpoint, corresponding to the flow of data being transmitted from the first network endpoint to the second network endpoint; and the second plurality of packets is at least a portion of a synthesized SSL Read stream from the second network endpoint, corresponding to the flow of data being transmitted from the first network endpoint to the second network endpoint.
  9. The method of claim 1, wherein: the first plurality of packets and the second plurality of packets correspond to a flow of data being transmitted from the second network endpoint to the first network endpoint; the first plurality of packets is at least a portion of a synthesized SSL Read stream from the first network endpoint, corresponding to the flow of data being transmitted from the second network endpoint to the first network endpoint; and the second plurality of packets is at least a portion of a synthesized SSL Write stream from the second network endpoint, corresponding to the flow of data being transmitted from the first network endpoint to the second network endpoint.
  10. The method of claim 1, wherein the deduplicating the flow results in a deduplicated flow, the method further comprising: forwarding, by the traffic visibility node, at least a payload of a packet of the deduplicated flow to an external tool coupled to the traffic visibility node, for analysis.
  11. The method of claim 1, wherein: for each packet of the first plurality of packets and the second plurality of packets, the sequence identifier of the packet comprises a hash of header information from the packet, including source IP address, destination IP address, source port, destination port, protocol and a directional indicator, the directional indicator being indicative of a communication direction of the packet; the first plurality of packets and the second plurality of packets are each at least a portion of an SSL Read stream or an SSL Write stream synthesized at the first network endpoint or the second network endpoint; determining that the first plurality of packets and the second plurality of packets are all associated with the same flow comprises determining that the sequence identifiers of all of the first plurality of packets and the second plurality of packets are identical; and deduplicating the flow results in a deduplicated flow; the method further comprising: forwarding, by the traffic visibility node, at least a payload of a packet of the deduplicated flow to an external tool coupled to the traffic visibility node, for analysis.
  12. At least one machine-readable storage medium having instructions stored thereon, execution of which by at least one processor causes performance of operations comprising: receiving, by a network node, a first plurality of packets from a first network endpoint that is external to the network node, wherein the first plurality of packets represent a flow of data being communicated between the first network endpoint and a second network endpoint that is external to the network node; receiving, by the network node, a second plurality of packets from the second network endpoint; identifying, by the network node, a sequence identifier of each packet of the first plurality of packets and of each packet of the second plurality of packets; determining, by the network node, that the first plurality of packets and the second plurality of packets are all associated with the same flow, based on the sequence identifiers of the first plurality of packets and the second plurality of packets; and in response to determining that the first plurality of packets and the second plurality of packets are all associated with the same flow, deduplicating the flow, by the network node, by discarding at least a portion of the first plurality of packets or at least a portion of the second plurality of packets, wherein deduplicating the flow results in a deduplicated flow.
  13. The at least one machine-readable storage medium of claim 12, wherein determining that the first plurality of packets and the second plurality of packets are all associated with the same flow comprises determining that the sequence identifiers of all of the first plurality of packets and the second plurality of packets are identical.
  14. The at least one machine-readable storage medium of claim 12, wherein for each packet of the first plurality of packets and the second plurality of packets, the sequence identifier of the packet comprises a hash of a five-tuple and a directional indicator, the directional indicator being indicative of a communication direction of the packet.
  15. The at least one machine-readable storage medium of claim 12, wherein for each packet of the first plurality of packets and the second plurality of packets, the sequence identifier of the packet comprises a hash of header information from the packet, including source IP address, destination IP address, source port, destination port, protocol and a directional indicator, the directional indicator being indicative of a communication direction of the packet.
  16. The at least one machine-readable storage medium of claim 12, wherein the determining that the first plurality of packets and the second plurality of packets are all associated with the same flow comprises reconstructing at least a portion of the flow at the traffic visibility node, by comparing at least a portion of data in the first plurality of packets with at least a portion of data in the second plurality of packets, within a sliding window.
  17. The at least one machine-readable storage medium of claim 12, wherein the first plurality of packets is at least a portion of an SSL Read stream or an SSL Write stream synthesized at the first network endpoint.
  18-21. (canceled)
  22. A method comprising: detecting, by a worker node, invocation of an encryption/decryption function implemented in the worker node, wherein the invocation is to trigger encryption or decryption of a packet, wherein at least a portion of the packet is produced by or destined for a workload application in the worker node; and in response to detecting the invocation of the encryption/decryption function, capturing a clear text payload of the packet from the encryption/decryption function in the worker node, creating a modified packet based on the captured clear text payload of the packet, including synthesizing a plurality of headers for the modified packet and appending the plurality of headers to the clear text payload, the modified packet further including a hash of: a) at least some of the plurality of headers, and b) a directional indicator indicative of a communication direction of the packet, and sending the modified packet to a processing entity that is external to the worker node.
  23. The method of claim 22, wherein the processing entity that is external to the worker node is a traffic visibility node.
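The sequence identifier recited in the claims (a hash of the five-tuple plus a directional indicator) might be computed along the following lines. This is a sketch under stated assumptions: the claims do not name a hash function, a field encoding, or the form of the directional indicator, so SHA-256, the `|` separator, and the `"a_to_b"` / `"b_to_a"` labels are all hypothetical choices for illustration. The sketch further assumes that both endpoints synthesize headers describing the flow in the same orientation, so that the Write-stream copy and the Read-stream copy of the same data hash to the same value.

```python
import hashlib


def sequence_identifier(
    src_ip: str,
    dst_ip: str,
    src_port: int,
    dst_port: int,
    protocol: str,
    direction: str,
) -> str:
    """Hash the five-tuple together with a directional indicator.

    SHA-256 and the '|' field separator are assumptions of this sketch;
    the source only states that the identifier comprises a hash.
    """
    fields = [src_ip, dst_ip, str(src_port), str(dst_port), protocol, direction]
    canonical = "|".join(fields).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Two copies of the same one-way flow yield the same identifier,
# while the reverse direction of the connection yields a different one.
write_copy = sequence_identifier("10.0.0.1", "10.0.0.2", 40000, 443, "tcp", "a_to_b")
read_copy = sequence_identifier("10.0.0.1", "10.0.0.2", 40000, 443, "tcp", "a_to_b")
reverse = sequence_identifier("10.0.0.1", "10.0.0.2", 40000, 443, "tcp", "b_to_a")
```

Including the directional indicator in the hash keeps the two directions of one connection distinct, so only copies of the same one-way stream are candidates for discarding.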

Description

This is a continuation of U.S. patent application Ser. No. 18/441,400, filed on Feb. 14, 2024, which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

At least one embodiment of the present disclosure pertains to techniques for providing deduplication of network data traffic, and more particularly, to a technique for providing flow-level deduplication of network data traffic in a network traffic visibility system.

BACKGROUND

Network communications traffic may be acquired at numerous entry points on a network by one or more devices called network traffic “visibility nodes” to provide extensive visibility of communications traffic flow and network security. These network traffic visibility nodes (or simply “visibility nodes” herein) may include physical devices, virtual devices, and Software Defined Networking (SDN)/Network Function Virtualization (NFV) environments, and may be collectively referred to as the computer network's “visibility fabric.”

Various kinds of network tools are commonly coupled to such visibility nodes and used to identify, analyze, and/or handle security threats to the computer network, bottlenecks in the computer network, etc. Examples of such tools include an intrusion detection system (IDS), an intrusion prevention system (IPS), a network monitoring system, and an application monitoring system. The network visibility nodes are typically used to route network traffic (e.g., packets) to and from one or more connected network tools for these purposes. Examples of network visibility nodes suitable for these purposes include any of the GigaVUE® series of visibility appliances available from Gigamon® Inc. of Santa Clara, California. A network visibility node can be a physical device or system, or it can be a virtual device that is hosted by a physical device or system. A network visibility node commonly applies one or more policies to acquire and monitor traffic communicated in the target network.

Encryption is often used to protect sensitive data communicated on computer networks. For example, encryption applications may encrypt data sent between servers and clients. However, the use of encryption may limit the capabilities of security tools that require data in clear text.

BRIEF DESCRIPTION OF THE DRAWINGS

Various features of the technology will become apparent to those skilled in the art from a study of the Detailed Description in conjunction with the drawings. Embodiments of the technology are illustrated by way of example and not limitation in the drawings, in which like references may indicate similar elements.

FIG. 1A depicts an example of a network arrangement in which a network visibility node receives data packets from nodes in a computer network.
FIG. 1B depicts another example of a network arrangement in which a network visibility node receives data packets from a node in a computer network.
FIG. 2 is a block diagram showing an example of a network visibility node.
FIG. 3 shows an example of a Kubernetes deployment.
FIG. 4 shows an example of the relationship between services, nodes and pods in a Kubernetes deployment.
FIG. 5 illustrates an example of a containerized environment in which the dynamic adaptation of traffic monitoring policies can be performed.
FIG. 6 schematically illustrates the components of, and functional relationships between, a worker node and a corresponding encryption-compatible visibility (ECV) host.
FIG. 7 is a flowchart illustrating an example of a process to configure the system to perform ECV.
FIG. 8 is a flowchart illustrating an example of the runtime ECV process associated with a given worker node.
FIG. 9 is a flowchart illustrating in greater detail an example of the step of capturing clear text payload data.
FIG. 10 illustrates an example of synthesized Read and Write streams being sent to a traffic visibility node (TVN) for a particular encrypted flow between two applications.
FIG. 11 is a flow diagram illustrating an example of a process of flow deduplication.
FIG. 12 is a block diagram of significant components of a processing system, representing a physical platform that can implement one or more of the components described herein.

DETAILED DESCRIPTION

In this description, references to “an embodiment”, “one embodiment” or the like, mean that the particular feature, function, structure or characteristic being described is included in at least one embodiment of the technique introduced here. Occurrences of such phrases in this specification do not necessarily all refer to the same embodiment. On the other hand, the embodiments referred to also are not necessarily mutually exclusive.

Encryption techniques and protocols, such as secure sockets layer (SSL) for example, may be used to protect sensitive data communicated on computer networks. For example, encryption applications may encrypt data sent between servers and clients. However, the use of encryption may limit the capabilities of a traffic visibility fabric, security to