US-12621224-B2 - Abstracting network traffic as video for representation learning

Abstract

A plurality of captured packets are received. The plurality of captured packets are from a plurality of packet flows. A packet flow is a communication session between two devices. For example, a packet flow may be a communication session between a client and a server. The plurality of captured packets are sorted into individual packet flows. The individual packet flows are converted into individual videos. For example, each packet from each packet flow is stored as a separate video frame in an individual video. A machine learning algorithm is applied to the individual videos to perform analytic tasks on the individual videos. For example, the machine learning algorithm may be used to identify anomalies within a packet flow and/or between packet flows.
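The sort-then-convert steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation the disclosure describes: the `Packet` type, the bidirectional flow key, the frame dimensions, and the choice to truncate or zero-pad each packet to one frame are all hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple

# Hypothetical frame size for the example; the source does not fix one.
FRAME_WIDTH = 8
FRAME_HEIGHT = 4


class Packet(NamedTuple):
    """Hypothetical captured-packet record used only for this sketch."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    data: bytes


def flow_key(pkt: Packet) -> Tuple:
    # Assumption: a flow is identified by protocol plus the two endpoints,
    # ordered so that both directions of a session share one key.
    endpoints = sorted([(pkt.src_ip, pkt.src_port), (pkt.dst_ip, pkt.dst_port)])
    return (pkt.protocol, endpoints[0], endpoints[1])


def sort_into_flows(packets: List[Packet]) -> Dict[Tuple, List[Packet]]:
    """Group captured packets into individual flows, preserving capture order."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)


def packet_to_frame(data: bytes) -> List[List[int]]:
    """Map each byte to one grayscale pixel (0-255) of a fixed-size frame.

    Assumption: packets longer than one frame are truncated and shorter
    ones are zero-padded; splitting across several frames is not shown.
    """
    size = FRAME_WIDTH * FRAME_HEIGHT
    padded = data[:size].ljust(size, b"\x00")
    return [
        list(padded[row * FRAME_WIDTH:(row + 1) * FRAME_WIDTH])
        for row in range(FRAME_HEIGHT)
    ]


def flow_to_video(flow: List[Packet]) -> List[List[List[int]]]:
    """Convert one flow into a video: one frame per packet, in order."""
    return [packet_to_frame(pkt.data) for pkt in flow]
```

Each resulting video is a list of frames, so it could be handed to any model that accepts a sequence of 2-D arrays; the classification step itself is outside the scope of this sketch.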

Inventors

Manish Marwah
Martin Fraser Arlitt

Assignees

MICRO FOCUS LLC

Dates

Publication Date: 2026-05-05
Application Date: 2023-02-07

Claims (20)

1. A system comprising: a microprocessor; and a computer readable medium, coupled with the microprocessor and comprising microprocessor readable and executable instructions that, when executed by the microprocessor, cause the microprocessor to: receive a plurality of captured packets, wherein the plurality of captured packets are from a plurality of packet flows; sort the plurality of captured packets into individual packet flows; convert the individual packet flows into individual videos comprising a plurality of video frames by mapping bytes of one or more packets in each individual packet flow into corresponding pixels of one or more video frames; and apply a machine learning algorithm to the individual videos to classify the individual videos, wherein video frames in the individual videos comprise metadata and wherein the metadata comprises one or more of a source IP address, a destination IP address, a source port number, a destination port number, a type of frame, a multiple frame packet indicator, a packet number, a cryptographic hash offset, failure information, packet header locations, an event, a Media Access (MAC) address, and a network address.
2. The system of claim 1, wherein classifying the individual videos comprises one or more of: clustering the individual videos; identifying an anomaly in a specific individual video; identifying similar individual videos; and training the machine learning model.
3. The system of claim 1, wherein converting the individual packet flows into the individual videos comprises: placing all bytes of each packet in each of the individual packet flows to create each of the individual videos.
4. The system of claim 3, wherein each byte of each packet in the individual packet flows is represented as a corresponding pixel in an individual video frame.
5. The system of claim 3, wherein each protocol header starts in a fixed location in each video frame in the individual videos.
6. The system of claim 1, wherein converting the individual packet flows into the individual videos comprises a coarse transformation that only discriminates between header and payload information.
7. The system of claim 1, wherein converting the individual packet flows into individual videos comprises a medium transformation that uses pooling to convert the individual packet flows into the individual videos.
8. The system of claim 1, wherein converting the individual packet flows into individual videos comprises a video summarization.
9. The system of claim 1, wherein one or more of the classified individual videos are displayed visually to a user.
10. The system of claim 9, wherein individual headers and/or a payload are displayed to the user in different colors and/or grey scale for identifying differences between the individual videos.
11. The system of claim 9, wherein the user can select individual headers and/or payload and wherein the selection causes the individual headers and/or payload to be converted back to original packet data for display.
12. The system of claim 1, wherein a first individual video is compared to a second individual video and wherein the second individual video is retrieved from a malicious flow database.
13. The system of claim 1, wherein the metadata comprises a plurality of: a source IP address, a destination IP address, a source port number, a destination port number, a type of frame, a multiple frame packet indicator, a packet number, a cryptographic hash offset, failure information, packet header locations, an event, a Media Access (MAC) address, and a network address.
14. A method comprising: receiving, by a microprocessor, a plurality of captured packets, wherein the plurality of captured packets are from a plurality of packet flows; sorting, by the microprocessor, the plurality of captured packets into individual packet flows; converting, by the microprocessor, the individual packet flows into individual videos, wherein converting the individual packet flows into the individual videos comprises one or more of a coarse transformation that only discriminates between header and payload information and a medium transformation that uses pooling to convert the individual packet flows into the individual videos; and applying, by the microprocessor, a machine learning algorithm to the individual videos to classify the individual videos, wherein one or more of the classified individual videos are displayed visually to a user.
15. The method of claim 14, wherein converting the individual packet flows into the individual videos comprises: placing all bytes of each packet in each of the individual packet flows to create each of the individual videos and wherein each byte of each packet in the individual packet flows is represented as a pixel in an individual video frame.
16. The method of claim 15, wherein video frames in the individual videos comprise metadata, and wherein the metadata comprises one or more of: a source IP address, a destination IP address, a source port number, a destination port number, a type of frame, a multiple frame packet indicator, a packet number, a cryptographic hash offset, failure information, packet header locations, an event, a Media Access (MAC) address, and a network address.
17. The method of claim 14, wherein converting the individual packet flows into the individual videos comprises the coarse transformation that only discriminates between header and payload information.
18. The method of claim 14, wherein converting the individual packet flows into individual videos comprises the medium transformation that uses pooling to convert the individual packet flows into the individual videos.
19. The method of claim 14, wherein converting the individual packet flows into individual videos comprises a video summarization.
20. A non-transient computer readable medium having stored thereon instructions that cause a processor to execute a method, the method comprising instructions to: receive a plurality of captured packets, wherein the plurality of captured packets are from a plurality of packet flows; sort the plurality of captured packets into individual packet flows; convert the individual packet flows into individual videos, wherein converting the individual packet flows into individual videos comprises a video summarization; and apply a machine learning algorithm to the individual videos to classify the individual videos, wherein the converting of the individual packet flows into individual videos comprises at least one of the following: a coarse transformation that only discriminates between header and payload information; a medium transformation that uses pooling to convert the individual packet flows into the individual videos; and a video summarization.
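The claims name a coarse transformation (discriminating only between header and payload) and a medium transformation (using pooling) without fixing their details. The sketch below shows one possible reading of each, purely as an illustration: the pixel values, the flat one-dimensional frame, the caller-supplied header length, and the use of average pooling are assumptions of this example, not details taken from the claims.

```python
from typing import List

# Hypothetical pixel values chosen for this example only.
HEADER_PIXEL = 128
PAYLOAD_PIXEL = 255
PAD_PIXEL = 0


def coarse_frame(packet: bytes, header_len: int, size: int) -> List[int]:
    """Coarse reading: keep only whether each byte is header or payload.

    Assumption: the header length is known to the caller (for example,
    from parsing the protocol headers); byte values are discarded.
    Positions beyond the end of the packet are padding.
    """
    pixels = []
    for i in range(size):
        if i >= len(packet):
            pixels.append(PAD_PIXEL)
        elif i < header_len:
            pixels.append(HEADER_PIXEL)
        else:
            pixels.append(PAYLOAD_PIXEL)
    return pixels


def pool_frame(pixels: List[int], window: int) -> List[int]:
    """Medium reading: shrink a frame by average pooling.

    Assumption: non-overlapping windows with integer averaging; max
    pooling or 2-D windows would be equally plausible choices.
    """
    pooled = []
    for start in range(0, len(pixels), window):
        chunk = pixels[start:start + window]
        pooled.append(sum(chunk) // len(chunk))
    return pooled


if __name__ == "__main__":
    packet = bytes(range(6))  # toy packet: 4 header bytes, 2 payload bytes
    print(coarse_frame(packet, header_len=4, size=8))
    print(pool_frame(list(packet.ljust(8, b"\x00")), window=4))
```

Both functions trade detail for size: the coarse frame keeps only structure, while pooling keeps a reduced-resolution view of the byte values.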

Description

BACKGROUND

Systematically extracting actionable information from network traffic data is key in addressing many important cybersecurity problems, such as intrusion and malware detection, and for network management problems, such as application and device identification. A major challenge in building machine learning models for these applications is manually engineering features from network traffic data, which is voluminous, heterogeneous (e.g., may contain IP addresses, MAC addresses, port numbers, categorical and numerical values, etc.), and dynamic (e.g., there is a continuous initiation and termination of flows between hosts).

SUMMARY

These and other needs are addressed by the various embodiments and configurations of the present disclosure. The present disclosure can provide a number of advantages depending on the particular configuration. These and other advantages will be apparent from the disclosure contained herein.

A plurality of captured packets are received. The plurality of captured packets are from a plurality of packet flows. A packet flow is a communication session between two devices. For example, a packet flow may be a communication session between a client and a server. The plurality of captured packets are sorted into individual packet flows. The individual packet flows are converted into individual videos. For example, each packet from each packet flow is stored as one or more video frames in an individual video. A machine learning algorithm is applied to the individual videos to classify the individual videos. For example, the machine learning algorithm may be used to identify anomalies within a packet flow and/or between packet flows.

The phrases “at least one”, “one or more”, “or”, and “and/or” are open-ended expressions that are both conjunctive and disjunctive in operation.
For example, each of the expressions “at least one of A, B and C”, “at least one of A, B, or C”, “one or more of A, B, and C”, “one or more of A, B, or C”, “A, B, and/or C”, and “A, B, or C” means A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.

The term “a” or “an” entity refers to one or more of that entity. As such, the terms “a” (or “an”), “one or more” and “at least one” can be used interchangeably herein. It is also to be noted that the terms “comprising”, “including”, and “having” can be used interchangeably.

The term “automatic” and variations thereof, as used herein, refers to any process or operation, which is typically continuous or semi-continuous, done without material human input when the process or operation is performed. However, a process or operation can be automatic, even though performance of the process or operation uses material or immaterial human input, if the input is received before performance of the process or operation. Human input is deemed to be material if such input influences how the process or operation will be performed. Human input that consents to the performance of the process or operation is not deemed to be “material”.

Aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium.

A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, p