US-12619723-B1 - Identifying malicious contents that are stored in distributed hash table networks


Abstract

Method and system for evaluating contents stored in a Distributed Hash Table (DHT) network are described. Contents are stored as chunks across nodes of the DHT network. Contents are subjected to cybersecurity evaluations to generate risk histories of the chunks. A reputation of a target content is determined based on corresponding risk histories of individual chunks that are present in the target content.

Inventors

Vincenzo Ciancaglini
Morton Swimmer
Roel Sotto Reyes

Assignees

TREND MICRO INCORPORATED

Dates

Publication Date: 2026-05-05
Application Date: 2023-11-07

Claims (11)

1. A method of evaluating contents stored in a Distributed Hash Table (DHT) network for maliciousness, the method comprising: performing cybersecurity evaluations on a plurality of sample files that are distributed across nodes of a Distributed Hash Table (DHT) network, wherein the cybersecurity evaluations assign a file risk score to each of the plurality of sample files; updating corresponding risk histories of a plurality of chunks of the plurality of sample files based on results of the cybersecurity evaluations by propagating the file risk score of each of the plurality of sample files to each corresponding chunk; receiving a plurality of chunks of a target file from nodes of the DHT network; identifying a set of chunks of the plurality of chunks of the target file that each has a corresponding risk history from the cybersecurity evaluations; determining a reputation of the target file based on corresponding risk histories of the set of chunks of the plurality of chunks of the target file; flagging the target file as malicious in response to the reputation of the target file indicating that the target file is malicious; detecting a malicious file among the plurality of sample files; and flagging as malicious a first node of the DHT network that points to or provides a chunk of the malicious file.
2. The method of claim 1, further comprising: in response to flagging the first node as malicious, preventing a second node of the DHT network from receiving a chunk of a content from the first node.
3. The method of claim 1, wherein determining the reputation of the target file comprises: calculating a chunk risk score of each chunk of the plurality of chunks of the plurality of sample files based on a risk history of the chunk; storing chunk risk scores of the plurality of chunks of the plurality of sample files in a storage location; retrieving, from the storage location, corresponding chunk risk scores of the set of chunks of the plurality of chunks of the target file; calculating an overall risk score of the target file from the corresponding chunk risk scores of the set of chunks of the plurality of chunks of the target file; and comparing the overall risk score of the target file to a threshold.
4. The method of claim 3, wherein the storage location comprises nodes of the DHT network.
5. The method of claim 1, wherein determining the reputation of the target file comprises: retrieving the corresponding risk histories of the set of chunks of the plurality of chunks of the target file from a storage location; calculating corresponding chunk risk scores of the set of chunks of the plurality of chunks of the target file based on the corresponding risk histories of the set of chunks of the plurality of chunks of the target file; calculating an overall risk score of the target file from the corresponding chunk risk scores of the set of chunks of the plurality of chunks of the target file; and comparing the overall risk score of the target file to a threshold.
6. The method of claim 5, wherein the storage location comprises nodes of the DHT network.
7. A system for evaluating contents stored in a Distributed Hash Table (DHT) network for maliciousness, the system comprising: a plurality of peer nodes, each of the plurality of peer nodes being a node of the DHT network and comprising a computer system that stores one or more chunks of a plurality of contents stored in the DHT network; a plurality of probe nodes, each of the plurality of probe nodes being a node of the DHT network and comprising a computer system that collects network traffic data of the DHT network, wherein a peer node of the plurality of peer nodes comprises at least one processor and a memory, the memory of the peer node storing instructions that when executed by the at least one processor of the peer node cause the peer node to: receive a plurality of chunks of a target content; identify a set of chunks of the plurality of chunks of the target content that each has a risk history from cybersecurity evaluations performed on a plurality of sample contents; and determine a reputation of the target content based on corresponding risk histories of the set of chunks of the plurality of chunks of the target content; and a backend system comprising at least one processor and a memory, the memory of the backend system storing instructions that when executed by the at least one processor of the backend system cause the backend system to: perform the cybersecurity evaluations on the plurality of sample contents, wherein the cybersecurity evaluations assign a content risk score to each of the plurality of sample contents; update corresponding risk histories of a plurality of chunks of the plurality of sample contents based on results of the cybersecurity evaluations by propagating the content risk score of each of the plurality of sample contents to each corresponding chunk; detect a malicious content among the plurality of sample contents; and flag as malicious a first node of the DHT network that points to or provides a chunk of the malicious content.
8. The system of claim 7, wherein the backend system that provides the corresponding risk histories of the set of chunks of the plurality of chunks of the target content to the peer node of the plurality of peer nodes.
9. The system of claim 7, wherein the corresponding risk histories of the set of chunks of the plurality of chunks of the target content are stored across the plurality of peer nodes.
10. The system of claim 7, wherein the instructions stored in the memory of the peer node, when executed by the at least one processor of the peer node, cause the peer node to determine the reputation of the target content by: calculating corresponding chunk risk scores of the set of chunks of the plurality of chunks of the target content from the corresponding risk histories of the set of chunks of the plurality of chunks of the target content; calculating an overall risk score of the target content based on the corresponding chunk risk scores of the set of chunks of the plurality of chunks; and comparing the overall risk score of the target content to a threshold.
11. The system of claim 7, wherein the instructions stored in the memory of the peer node, when executed by the at least one processor of the peer node, cause the peer node to determine the reputation of the target content by: retrieving corresponding chunk risk scores of the set of chunks of the plurality of chunks of the target content that have been calculated from the corresponding risk histories of the set of chunks of the plurality of chunks of the target content; calculating an overall risk score of the target content based on the corresponding chunk risk scores of the set of chunks of the plurality of chunks; and comparing the overall risk score of the target content to a threshold.
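
The scoring and node-flagging flow recited in the claims can be illustrated with a minimal Python sketch. The claims do not specify how a chunk risk score is derived from a risk history or how chunk scores are combined into an overall score, so the mean-of-history rule, the max-of-chunks rule, the 0.0 to 1.0 score scale, the 0.5 threshold, and every name used here (`RiskStore`, `record_evaluation`, `chunk_score`, `evaluate_target`, `flag_nodes`) are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


class RiskStore:
    """Hypothetical in-memory store of per-chunk risk histories."""

    def __init__(self):
        # chunk ID -> file risk scores propagated from evaluated sample files
        self.histories = defaultdict(list)
        # IDs of nodes flagged as malicious
        self.flagged_nodes = set()

    def record_evaluation(self, chunk_ids, file_risk_score):
        """Propagate one sample file's risk score to each of its chunks."""
        for cid in chunk_ids:
            self.histories[cid].append(file_risk_score)

    def chunk_score(self, chunk_id):
        """Assumed rule: a chunk's risk score is the mean of its history."""
        return mean(self.histories[chunk_id])

    def evaluate_target(self, target_chunk_ids, threshold=0.5):
        """Return (overall_score, is_malicious) for a target file.

        Only chunks that already have a risk history contribute.
        Assumed rule: the overall score is the highest chunk score.
        """
        known = [c for c in target_chunk_ids if c in self.histories]
        if not known:
            return None, False  # no history available: reputation unknown
        overall = max(self.chunk_score(c) for c in known)
        return overall, overall >= threshold

    def flag_nodes(self, malicious_chunk_ids, providers):
        """Flag nodes that point to or provide a chunk of a malicious file.

        `providers` maps a node ID to the chunk IDs that node serves.
        """
        bad = set(malicious_chunk_ids)
        for node_id, served in providers.items():
            if bad & set(served):
                self.flagged_nodes.add(node_id)
        return self.flagged_nodes


store = RiskStore()
store.record_evaluation(["c1", "c2"], 0.9)  # sample file scored as risky
store.record_evaluation(["c2", "c3"], 0.3)  # sample file scored as low risk
overall, malicious = store.evaluate_target(["c2", "c3", "c9"])
flagged = store.flag_nodes(["c1", "c2"], {"n1": {"c1"}, "n2": {"c3"}})
```

In this sketch, chunk `c9` has no risk history and is ignored, chunk `c2` carries the scores of both sample files that contain it, and node `n1` is flagged because it serves a chunk of the file treated as malicious. A deployment could store either the histories or the precomputed chunk scores, which corresponds to the two alternatives in claims 3 and 5.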

Description

TECHNICAL FIELD

The present disclosure is directed to cybersecurity.

BACKGROUND

Distributed Hash Table (DHT) networks allow for distributed storage of content across a plurality of peer nodes. Examples of DHT networks include Content-Addressable Networks and Interplanetary File System (IPFS) networks. Generally, DHT networks employ a DHT algorithm that follows these principles:

(a) nodes of the DHT network and contents are assigned an identifier (ID), which is usually a hash of the content or a hash of a node's fingerprint;
(b) node and content IDs are mapped in the same addressing space, i.e., they use the same hash algorithm;
(c) each node is responsible for a partition of the addressing space, hence each node is responsible for storing chunks (i.e., portions) of content whose ID falls in that partition;
(d) nodes maintain a logical routing table of other nodes in the network that they have discovered; and
(e) routing of network messages (e.g., messages for putting content, getting content, and finding node IDs) is content based, i.e., it depends on the content ID or node ID.

Routing may also be based on locality properties, such as geographical or network proximity of the nodes, in which case the locality properties may be included in the node hashing function to retain nodes that are close together in terms of physical, network, or geographic location or in terms of addressing space. Messages are routed from one node to another until they find the node that is responsible according to the partitioning metric.

Because content is stored as separate chunks in different nodes, evaluating the content for maliciousness can be very difficult. An attacker (i.e., malicious actor) can exploit the way files and folders are divided into chunks to create a malicious payload that is divided into several pieces, which are distributed as part of seemingly harmless content and subsequently individually fetched and reassembled at a victim computer.
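
Principles (a) through (c) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the disclosure does not name a hash function or a partitioning metric, so SHA-256 and a Kademlia-style XOR distance are choices made here, and the helper names (`make_id`, `split_chunks`, `responsible_node`), the node names, and the 32-byte chunk size are hypothetical.

```python
import hashlib


def make_id(data: bytes) -> int:
    """Map a node fingerprint or a chunk into one shared 256-bit ID space."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def split_chunks(content: bytes, size: int) -> list[bytes]:
    """Divide content into fixed-size chunks (the last one may be shorter)."""
    return [content[i:i + size] for i in range(0, len(content), size)]


def responsible_node(chunk_id: int, node_ids: list[int]) -> int:
    """Assumed partitioning metric: the node with the smallest XOR distance."""
    return min(node_ids, key=lambda n: n ^ chunk_id)


# Node IDs and chunk IDs come from the same hash function, per principle (b).
nodes = {make_id(name.encode()): name
         for name in ("node-a", "node-b", "node-c", "node-d")}

content = b"example content stored in the DHT " * 4
placement = {}
for chunk in split_chunks(content, 32):
    cid = make_id(chunk)
    placement[cid] = nodes[responsible_node(cid, list(nodes))]
```

In this sketch, the chunks of one content may be assigned to different nodes, so no single node necessarily holds the complete content, which is the situation the Background describes.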
For example, an attacker can hide some cryptominer code that is left inactive as part of a seemingly harmless website. At a later stage of the attack, only the content ID of the cryptominer code is needed to retrieve and reassemble the chunks of the cryptominer code at the victim computer. Because the content ID is not necessarily malicious and all of the chunks are not stored in the victim computer, the maliciousness of the cryptominer code is very difficult to detect before its activation.

BRIEF SUMMARY

In one embodiment, contents are stored as chunks across nodes of a DHT network. The contents are subjected to cybersecurity evaluations, which may be performed by one or more cybersecurity authorities. Each chunk of the contents has a risk history from the results of the cybersecurity evaluations. Risk scores of chunks of the contents are determined based on the risk histories of the chunks of the contents. A reputation of a target content is determined based on risk histories of chunks that are present in the target content. For example, an overall risk score of the target content may be calculated from risk scores of chunks that are present in the target content. The overall risk score may be compared to a threshold to determine the reputation of the target content.

These and other features of the present disclosure will be readily apparent to persons of ordinary skill in the art upon reading the entirety of this disclosure, which includes the accompanying drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the subject matter may be derived by referring to the detailed description and claims when considered in conjunction with the following figures, wherein like reference numbers refer to similar elements throughout the figures.

FIG. 1 shows a block diagram of a Distributed Hash Table (DHT) network, in accordance with an embodiment of the present invention.
FIG. 2 shows chunk risk histories of a sample content, in accordance with an embodiment of the present invention.
FIGS. 3-5 show block diagrams of chunk risk histories and risk scores deployment scenarios, in accordance with embodiments of the present invention.
FIG. 6 shows a flow diagram of a method of evaluating contents stored in a DHT network for maliciousness, in accordance with an embodiment of the present invention.
FIG. 7 shows a flow diagram of a method of identifying malicious nodes in a DHT network, in accordance with an embodiment of the present invention.
FIG. 8 shows a block diagram of a computer system that may be employed with embodiments of the present invention.

DETAILED DESCRIPTION

In the present disclosure, numerous specific details are provided, such as examples of systems, components, and methods, to provide a thorough understanding of embodiments of the invention. Persons of ordinary skill in the art will recognize, however, that the invention can be practiced without one or more of the specific details. In other instances, well-known details are not shown or des