EP-4172827-B1 - INFORMATION ENHANCED CLASSIFICATION

Inventors

  • Raghuramu, Arun
  • SONG, Yuzhou
  • ZHANG, YANG

Dates

Publication Date
20260513
Application Date
20210615

Claims (15)

  1. A method comprising: accessing network traffic from a network (100, 200), wherein the network traffic is associated with a plurality of entities (102, 104, 106, 120, 122, 130, 150); selecting an entity of the plurality of entities (102, 104, 106, 120, 122, 130, 150); determining one or more values associated with one or more properties associated with the entity, wherein the one or more values are accessed from the network traffic; determining that the one or more values associated with the one or more properties are insufficient to classify the entity; in response to determining that the one or more values associated with the one or more properties are insufficient to classify the entity: applying one or more heuristics to the one or more properties associated with the entity to generate a search query based on the one or more values associated with the one or more properties associated with the entity; performing the search query based on the one or more values determined from the network traffic; receiving a search query result, wherein the search query result comprises a plurality of webpages; and accessing data from a webpage (306) of the plurality of webpages, wherein the data comprises one or more additional values associated with one or more additional properties of the entity; determining, by a processing device (1000), a classification result of the entity based on the data from the webpage (306) of the plurality of webpages and the one or more values associated with the one or more properties associated with the entity accessed from the network traffic; and storing the classification result.
  2. The method of claim 1 further comprising: performing an action based on the classification result.
  3. The method of claim 1, wherein the classification result is determined using a model trained based on data from a plurality of webpages.
  4. The method of claim 1, wherein the classification result is determined based on a database comprising data based on a plurality of webpages.
  5. The method of claim 1 further comprising: accessing a keyword list comprising a plurality of keywords; and determining one or more keyword matches of the plurality of keywords and data from the webpage (306), wherein the classification result is based on at least one keyword of the keyword list.
  6. A system comprising: a memory (1004, 1006, 1018); and a processing device (1000), operatively coupled to the memory (1004, 1006, 1018), adapted to: access network traffic from a network (100, 200), wherein the network traffic is associated with a plurality of entities (102, 104, 106, 120, 122, 130, 150); select an entity of the plurality of entities (102, 104, 106, 120, 122, 130, 150); determine one or more values associated with one or more properties associated with the entity, wherein the one or more values are accessed from the network traffic; determine that the one or more values associated with the one or more properties are insufficient to classify the entity; in response to determining that the one or more values associated with the one or more properties are insufficient to classify the entity: apply one or more heuristics to the one or more properties associated with the entity to generate a search query based on the one or more values associated with the one or more properties associated with the entity; perform the search query based on the one or more values determined from the network traffic; receive a search query result, wherein the search query result comprises a plurality of webpages; and access data from a webpage (306) of the plurality of webpages, wherein the data comprises one or more additional values associated with the one or more additional properties of the entity; determine, by the processing device (1000), a classification result of the entity based on the data from the webpage (306) of the plurality of webpages and the one or more values associated with the one or more properties associated with the entity accessed from the network traffic; and store the classification result.
  7. The system of claim 6, the processing device (1000) further adapted to: perform an action based on the classification result.
  8. The system of claim 6, wherein the classification result is determined using a model trained based on data from a plurality of webpages.
  9. The system of claim 6, wherein the classification result is determined based on a database comprising data based on a plurality of webpages.
  10. The system of claim 6, the processing device (1000) further adapted to: extract text from the webpage (306) of the plurality of webpages.
  11. The system of claim 6, the processing device (1000) further adapted to: access a keyword list comprising a plurality of keywords; and determine one or more keyword matches of the plurality of keywords and data from the webpage (306), wherein the classification result is based on at least one keyword of the keyword list.
  12. A non-transitory computer readable medium having instructions encoded thereon that, when executed by a processing device (1000), cause the processing device (1000) to: access network traffic from a network (100, 200), wherein the network traffic is associated with a plurality of entities (102, 104, 106, 120, 122, 130, 150); select an entity of the plurality of entities (102, 104, 106, 120, 122, 130, 150); determine one or more values associated with one or more properties associated with the entity, wherein the one or more values are accessed from the network traffic; determine that the one or more values associated with the one or more properties are insufficient to classify the entity; in response to determining that the one or more values associated with the one or more properties are insufficient to classify the entity: apply one or more heuristics to the one or more properties associated with the entity to generate a search query based on the one or more values associated with the one or more properties associated with the entity; perform the search query based on the one or more values determined from the network traffic; receive a search query result, wherein the search query result comprises a plurality of webpages; and access data from a webpage (306) of the plurality of webpages, wherein the data comprises one or more additional values associated with the one or more additional properties of the entity; determine, by the processing device (1000), a classification result of the entity based on the data from the webpage (306) of the plurality of webpages and the one or more values associated with the one or more properties associated with the entity accessed from the network traffic; and store the classification result.
  13. The non-transitory computer readable medium of claim 12, wherein the instructions further cause the processing device (1000) to: perform an action based on the classification result.
  14. The non-transitory computer readable medium of claim 12, wherein the classification result is determined using a model trained based on data from a plurality of webpages.
  15. The non-transitory computer readable medium of claim 12, wherein the classification result is determined based on a database comprising data based on a plurality of webpages.
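
Illustrative sketch (not part of the claims or the specification): the flow recited in claims 1 and 5 might be approximated as below. Everything named here is an assumption made for illustration: the property names `vendor`, `model`, and `function`, the sufficiency test, the query-building heuristic, the keyword-to-class mapping, and the `search` callable, which stands in for whatever search service an implementation would use and is expected to return the text of the result webpages.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical: properties that must be known before an entity can be classified.
REQUIRED_PROPERTIES = ("vendor", "function")


def is_insufficient(values: Dict[str, str]) -> bool:
    """Assumed sufficiency test: a required property is missing or empty."""
    return any(not values.get(name) for name in REQUIRED_PROPERTIES)


def build_search_query(values: Dict[str, str]) -> str:
    """Assumed heuristic: join the non-empty observed values, skipping repeats."""
    seen = set()
    terms = []
    for value in values.values():
        value = value.strip()
        if value and value.lower() not in seen:
            seen.add(value.lower())
            terms.append(value)
    return " ".join(terms)


def match_keywords(page_text: str, keyword_to_class: Dict[str, str]) -> Optional[str]:
    """Return the class whose keywords occur most often in the page text."""
    text = page_text.lower()
    counts: Dict[str, int] = {}
    for keyword, label in keyword_to_class.items():
        hits = text.count(keyword.lower())
        if hits:
            counts[label] = counts.get(label, 0) + hits
    if not counts:
        return None
    # Sorting first makes ties resolve alphabetically, so results are repeatable.
    return max(sorted(counts), key=lambda label: counts[label])


def classify_entity(
    values: Dict[str, str],
    search: Callable[[str], Iterable[str]],
    keyword_to_class: Dict[str, str],
) -> Optional[str]:
    """Classify from traffic values alone, or fall back to webpage data."""
    if not is_insufficient(values):
        return values["function"]
    query = build_search_query(values)
    for page_text in search(query):
        label = match_keywords(page_text, keyword_to_class)
        if label is not None:
            return label
    return None


if __name__ == "__main__":
    observed = {"vendor": "Acme", "model": "X100", "function": ""}
    keywords = {"ip camera": "IP Camera", "printer": "Printer"}

    def fake_search(query: str) -> Iterable[str]:
        # Stand-in for a real search service; returns canned page text.
        return ["The Acme X100 is an IP camera with night vision."]

    print(classify_entity(observed, fake_search, keywords))
```

Passing the search step in as a callable keeps the sketch free of any particular search API and lets the fallback path be exercised with canned page text.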

Description

TECHNICAL FIELD

Aspects and implementations of the present disclosure relate to network monitoring, and more specifically, classification of entities of a network.

BACKGROUND

As technology advances, the number and variety of devices that are connected to communications networks are rapidly increasing. Each device may have its own respective vulnerabilities which may leave the network open to compromise or other risks. Preventing the spreading of an infection of a device or an attack through a network can be important for securing a communication network.

EP3065076A1 (SECURE NOK AS) concerns a method for responding to a cyber attack on an industrial control system. Data is to be collected from both internal and external sources on the industrial control systems. The data is to be aggregated into databases and knowledge bases. It is to be compared to previously collected data so as to formulate a response to a cyber attack.

In accordance with the present invention, there is a method, a system, and a non-transitory computer readable medium in accordance with the appended independent claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects and implementations of the present disclosure will be understood more fully from the detailed description given below and from the accompanying drawings of various aspects and implementations of the disclosure, which, however, should not be taken to limit the disclosure to the specific aspects or implementations, but are for explanation and understanding only.

Figure 1 depicts an illustrative communication network in accordance with one implementation of the present disclosure.
Figure 2 depicts an illustrative network topology in accordance with one implementation of the present disclosure.
Figure 3 depicts a diagram of aspects of classification using external data in accordance with one implementation of the present disclosure.
Figure 4 depicts a flow diagram of aspects of a method for training a model using external data in accordance with one implementation of the present disclosure.
Figure 5 depicts a flow diagram of aspects of a method for determining a classification using a model in accordance with one implementation of the present disclosure.
Figure 6 depicts a flow diagram of aspects of a method for determining a classification using a data store in accordance with one implementation of the present disclosure.
Figure 7A depicts example aspects of a webpage in accordance with one implementation of the present disclosure.
Figure 7B depicts example aspects of data extracted from the webpage in accordance with one implementation of the present disclosure.
Figure 8A depicts example aspects of another webpage in accordance with one implementation of the present disclosure.
Figure 8B depicts example aspects of data extracted from the additional webpage in accordance with one implementation of the present disclosure.
Figure 9 depicts illustrative components of a system for classifying entities, training models, or a combination thereof in accordance with one implementation of the present disclosure.
Figure 10 is a block diagram illustrating an example computer system, in accordance with one implementation of the present disclosure.

DETAILED DESCRIPTION

Aspects and implementations of the present disclosure are directed to training and using models (e.g., machine learning models, etc.) to perform classification of entities of a network (but may be applicable in other areas), with the model trained based on external data (e.g., from the Internet). The systems and methods disclosed can be employed with respect to network security, among other fields. More particularly, it can be appreciated that devices with vulnerabilities are a significant and growing problem. At the same time, the proliferation of network-connected devices (e.g., internet of things (IoT) devices such as televisions, security cameras (IP cameras), wearable devices, medical devices, etc.) can make it difficult to effectively ensure that network security is maintained. Classification can be particularly important for securing a network because lack of knowledge about what a device is can prevent application of appropriate security measures.

Accordingly, described herein in various implementations are systems, methods, techniques, and related technologies, which allow for improved classification of entities to enable securing of a network including performing one or more policies based on classification of an entity.

Accordingly, described herein in various implementations are systems, methods, techniques, and related technologies, which enable better classification by using publicly available data (e.g., data from the Internet). The usage of external data allows classification of less common entities or devices (e.g., entities made by smaller companies). Embodiments improve overall classification while allowing dynamic (e.g., on-the-fly) classification of an entity. One of the problems with classifying devices on a network is t