JP-7856807-B2 - Large-scale graph partitioning method using streaming clustering in a GPU environment
Inventors
- Lee, Hyeonbyeong
- Song, Sangho
- Im, Jongtae
- Yu, Jaesu
Assignees
- Chungbuk National University Industry Academic Cooperation Foundation
Dates
- Publication Date
- 2026-05-11
- Application Date
- 2025-02-26
- Priority Date
- 2024-09-06
Claims (4)
- A method for large-scale graph partitioning using streaming clustering in a GPU environment, comprising: an input step (S100) in which a host CPU receives original graph data; a clustering step (S200) in which the CPU generates a cluster map array for the received original graph data using a streaming clustering algorithm and then generates cluster graph data using the generated cluster map array; a transmission step (S300) in which the generated cluster graph data is transmitted to a GPU; an initial partitioning step (S400) in which a pre-stored algorithm is applied on the GPU to the transmitted cluster graph data; and an optimization step (S500) in which the CPU receives the assigned partition label data resulting from the initial partitioning and optimizes the assigned partition label data using the generated cluster map array.
- The method for large-scale graph partitioning using streaming clustering in a GPU environment according to claim 1, wherein in the clustering step (S200), the original graph data is assumed to be in edge stream (ES) form in which edges are input sequentially, edge information is provided in batch units, and closely connected vertices are grouped into the same cluster on a per-vertex basis.
- The method for large-scale graph partitioning using streaming clustering in a GPU environment according to claim 2, wherein in the initial partitioning step (S400), a pre-stored Label Propagation algorithm is applied to repeatedly update the label of each node according to the label distribution of its neighboring nodes.
- The method for large-scale graph partitioning using streaming clustering in a GPU environment according to claim 3, wherein in the optimization step (S500), using the assigned partition label data, it is determined for each vertex whether the quality of the partitioned graph would improve if the vertex were assigned to a partition of an adjacent vertex instead of the partition to which it is currently assigned, and the graph partitioning is optimized by vertex duplication based on the determination result.
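The CPU-side portion of the claimed pipeline (S200, building the cluster map and the compressed cluster graph from a batched edge stream) can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: the batch size, the absorb-into-neighbor clustering heuristic, and all function names are choices made for this example.

```python
from collections import defaultdict

def streaming_cluster(edge_stream, batch_size=4):
    """S200 sketch: group closely connected vertices into the same
    cluster, consuming the edge stream (ES form) in batch units."""
    cluster_map = {}      # vertex -> cluster id (the "cluster map array")
    next_cluster = 0
    batch = []
    for edge in edge_stream:
        batch.append(edge)
        if len(batch) == batch_size:
            next_cluster = _process_batch(batch, cluster_map, next_cluster)
            batch = []
    if batch:
        next_cluster = _process_batch(batch, cluster_map, next_cluster)
    return cluster_map

def _process_batch(batch, cluster_map, next_cluster):
    for u, v in batch:
        cu, cv = cluster_map.get(u), cluster_map.get(v)
        if cu is None and cv is None:          # open a new cluster
            cluster_map[u] = cluster_map[v] = next_cluster
            next_cluster += 1
        elif cu is None:                       # absorb u into v's cluster
            cluster_map[u] = cv
        elif cv is None:                       # absorb v into u's cluster
            cluster_map[v] = cu
    return next_cluster

def build_cluster_graph(edges, cluster_map):
    """Compress the original graph: one super-vertex per cluster,
    with edge weights counting original inter-cluster edges."""
    weights = defaultdict(int)
    for u, v in edges:
        cu, cv = cluster_map[u], cluster_map[v]
        if cu != cv:
            weights[(min(cu, cv), max(cu, cv))] += 1
    return dict(weights)
```

For instance, the edge stream `[(0, 1), (1, 2), (3, 4), (2, 3)]` yields two clusters ({0, 1, 2} and {3, 4}) and a cluster graph with a single weighted super-edge between them; only this compressed graph would then be transmitted to the GPU in step S300.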
Description
Application of Article 30, Paragraph 2 of the Patent Act: 1. Journal of the Korea Contents Association, vol. 24, no. 2, pp. 22-41, 2024. 2. Proceedings of the Korean Society of Computer Information Conference: Korea Computer Congress 2024, pp. 209-211, ICC Jeju, Jeju Island, South Korea, June 26, 2024.

This invention relates to a method for large-scale graph partitioning using streaming clustering in a GPU environment, and more specifically to a method that compresses the graph through clustering and improves data processing efficiency, enabling large-scale graph partitioning within the limited memory resources of a GPU.

Graphs are used as a core data structure in various fields. Graph structures can represent the relationships between entities (e.g., people or objects) and are utilized in areas such as medical information systems and social network analysis. Recently, with the development of network technology and the proliferation of social networking services such as Twitter and Facebook, the scale of data has been increasing exponentially. This necessitates "large-scale graphs," which are far larger than conventional graphs. Large-scale graphs have a significantly greater number of vertices and edges, and are used for more effective analysis and a wider range of applications than conventional graphs.

Analyzing such large-scale graphs requires significantly more computing resources. Traditionally, a single CPU-based system was sufficient to process all the data; for large-scale graphs, however, processing with a single system is impossible. To address this, methods have been proposed for distributing, storing, and processing large-scale graph data across multiple systems and nodes. The efficiency of such graph partitioning techniques is considered a crucial factor in processing large-scale graph data.
Graph partitioning techniques in CPU environments can be broadly divided into two categories: in-memory graph partitioning, which stores the entire graph in memory and processes it in batch, and streaming graph partitioning, which loads the graph into memory incrementally and processes it repeatedly.

Recently, with the development of GPUs (Graphics Processing Units), distributed parallel processing techniques using GPUs have become important. Although GPUs are devices designed for graphics processing, they are also used to process large amounts of data. GPU-based SIMT (Single Instruction, Multiple Threads) computing leverages data parallelism to accelerate computation. Because GPUs provide extremely high throughput and can execute a large number of threads simultaneously, they are effective for data-parallel processing.

Nevertheless, loading large-scale graph data onto a GPU has clear limitations. GPUs generally have less memory capacity than CPUs, making it impossible to load large-scale graph data into GPU memory. For example, the conventional technique Thanos proposes a fast graph partitioning algorithm using a cross-partitioning algorithm, but when the data becomes extremely large it faces GPU memory shortage (GPU out of memory), making graph partitioning difficult.

In this regard, Patent Document 1 ("Hybrid GPU/CPU Data Processing Method", Japanese Patent Publication No. 6346893, registered June 1, 2018) discloses a technique for performing large-scale graph searches on a parallel processing platform.

Drawings
- An illustrative diagram showing a conventional graph partitioning method.
- A sequence diagram illustrating a method for large-scale graph partitioning using streaming clustering in a GPU environment according to one embodiment of the present invention.
- An illustrative diagram showing the structure of the method according to one embodiment of the present invention.
- An illustrative diagram showing the cluster graph generation process in the clustering step (S200).
- An illustrative diagram showing the data transformation process in the initial partitioning step (S400).
- An illustrative diagram showing the partition allocation process in the initial partitioning step (S400).
- An illustrative diagram showing the vertex replication process in the optimization step (S500).
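The initial partitioning step (S400) applies Label Propagation on the GPU, repeatedly updating each node's label from the label distribution of its neighbors. A minimal CPU-side sketch of that update rule follows; the synchronous update, the tie-breaking choice, the iteration cap, and all names are assumptions, and a real implementation would run this as a data-parallel GPU kernel, one thread per node.

```python
from collections import Counter

def label_propagation(adj, labels, max_iters=10):
    """S400 sketch: repeatedly reassign each node the most frequent
    label among its neighbours, stopping when no label changes."""
    for _ in range(max_iters):
        changed = False
        new_labels = dict(labels)          # synchronous update
        for node, neighbours in adj.items():
            if not neighbours:
                continue
            counts = Counter(labels[n] for n in neighbours)
            # most frequent neighbour label; break ties by smallest label id
            best = min(counts, key=lambda lab: (-counts[lab], lab))
            if best != labels[node]:
                new_labels[node] = best
                changed = True
        labels = new_labels
        if not changed:
            break
    return labels
```

On two triangles joined by a single edge (nodes 0-2 and 3-5), starting from unique labels, the propagation converges so that each triangle shares one label, giving a natural two-way initial partition of the (cluster) graph.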