KR-20260063078-A - SKEW RESISTANCE PROCESSING IN DIMM DEVICE AND PROCESSING METHOD
Abstract
The present invention relates to a skew-resistant PID (Processing in DIMM) device and a processing method. The device comprises: DIMMs (Dual In-line Memory Modules) comprising a plurality of ranks, each rank comprising a plurality of banks, and an IDP (In-DIMM Processor) that processes internal memory operations; a memory controller; and a host CPU that is connected to the memory modules through the memory controller and increases the parallel processing performance of the IDP by replicating join keys in units of banks and ranks.
Inventors
- 김영석
- 이수현
- 임채민
- 최진우
- 김한준
Assignees
- 연세대학교 산학협력단
Dates
- Publication Date
- 20260507
- Application Date
- 20241030
Claims (10)
- A skew-resistant PID (Processing in DIMM) device comprising: Dual In-line Memory Modules (DIMMs) that comprise an In-DIMM Processor (IDP) processing internal memory operations and are composed of multiple ranks, each containing multiple banks; a memory controller; and a host CPU that is connected to the memory modules through the memory controller and replicates join keys in bank and rank units to increase the parallel processing performance of the IDP.
- The skew-resistant PID device of claim 1, wherein the host CPU analyzes the configuration of the R table and the S table and determines a cost model that determines the replication ratio based on the configuration of the PID device.
- The skew-resistant PID device of claim 2, wherein the host CPU determines the number of bank sets and the number of rank sets by calculating the optimal join key replication ratio () through the cost model.
- The skew-resistant PID device of claim 2, wherein the host CPU performs a Host-to-DIMM Scatter operation that distributes and transmits the R table and the S table to the DIMMs.
- The skew-resistant PID device of claim 1, wherein the IDP performs a Bank and Rank Set-aware Partitioning operation that replicates the R table according to the bank sets and rank sets and distributes the S table to the bank sets and rank sets based on the replication of the R table.
- The skew-resistant PID device of claim 5, wherein the host CPU performs an All-to-All Inter-IDP Shuffle operation in which the data of the R table and the S table are transmitted to each IDP, so that the IDPs can exchange and process the data of the R table and the S table.
- The skew-resistant PID device of claim 6, wherein each IDP performs a Single-IDP Join operation that generates a join result by joining the data of the R table and the S table.
- The skew-resistant PID device of claim 7, wherein each IDP performs a hash join or a sort-merge join as the join operation.
- The skew-resistant PID device of claim 7, wherein each IDP transmits the join result to the host CPU so that the host CPU collects the join results to generate a final result.
- A skew-resistant PID (Processing in DIMM) processing method performed in a skew-resistant PID device that comprises: DIMMs (Dual In-line Memory Modules) that comprise an IDP (In-DIMM Processor) processing internal memory operations and are composed of multiple ranks, each comprising multiple banks; a memory controller; and a host CPU connected to the memory modules through the memory controller, the method comprising: determining a replication ratio based on the configuration of the PID device; and increasing the parallel processing performance of the IDP by replicating join keys in bank and rank units of the DIMMs based on the replication ratio.
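As a rough illustration of the replication idea recited in the claims above, the following Python sketch simulates IDPs as in-memory buckets. It is a simplified model under stated assumptions, not the claimed implementation: bank sets and rank sets are collapsed into one hypothetical `set_size` parameter, `key % num_sets` stands in for a hash function, round-robin placement of S tuples inside a set is an assumed policy, and the name `spid_style_join` and the sample tables are invented for this example.

```python
from collections import defaultdict
from itertools import count


def spid_style_join(r_table, s_table, num_idps, set_size):
    """Join R and S with R replicated inside each IDP set (assumed model).

    IDPs are grouped into sets of `set_size` members. A key is mapped to one
    set; its R tuples are copied to every member, and its S tuples are dealt
    round-robin to the members, so a hot key is probed by the whole set.
    """
    assert num_idps % set_size == 0, "set_size must divide num_idps"
    num_sets = num_idps // set_size
    r_parts = defaultdict(list)       # IDP id -> R tuples (replicated)
    s_parts = defaultdict(list)       # IDP id -> S tuples (distributed)
    cursor = defaultdict(count)       # per-set round-robin counter

    for key, payload in r_table:
        base = (key % num_sets) * set_size
        for member in range(set_size):            # replicate R in the set
            r_parts[base + member].append((key, payload))

    for key, payload in s_table:
        set_id = key % num_sets
        member = next(cursor[set_id]) % set_size  # spread S inside the set
        s_parts[set_id * set_size + member].append((key, payload))

    results, load = [], {}
    for idp in range(num_idps):                   # single-IDP hash join
        build = defaultdict(list)
        for key, r_payload in r_parts[idp]:
            build[key].append(r_payload)
        load[idp] = len(s_parts[idp])             # probe tuples on this IDP
        results.extend((k, r, s) for k, s in s_parts[idp]
                       for r in build.get(k, []))
    return results, load                          # host gathers the results


# Hypothetical data: R holds 8 distinct keys, S is dominated by key 0.
R = [(k, f"r{k}") for k in range(8)]
S = [(0, f"s{i}") for i in range(90)] + [(k, f"t{k}") for k in range(1, 8)]

for set_size in (1, 2, 4):
    _, load = spid_style_join(R, S, num_idps=4, set_size=set_size)
    print(set_size, load)
# 1 {0: 91, 1: 2, 2: 2, 3: 2}
# 2 {0: 47, 1: 46, 2: 2, 3: 2}
# 4 {0: 25, 1: 24, 2: 24, 3: 24}
```

With `set_size=1` nothing is replicated and the IDP that owns the hot key receives almost all of the probe work. Larger sets even out the load but store more copies of R, which is the kind of trade-off a replication-ratio cost model would have to weigh.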
Description
Skew-Resistant PID Device and Processing Method
The present invention relates to PID (Processing-In-DIMM) technology and, more specifically, to a skew-resistant PID device and processing method that can improve the parallel processing performance of an IDP (In-DIMM Processor) by replicating join keys in bank and rank units through a memory controller.
Recent advances in dual in-line memory modules (DIMMs) have enabled DIMMs to support Processing-In-DIMM (PID) by placing In-DIMM Processors (IDPs) near the memory banks. PID can accelerate applications that suffer from the memory wall by offloading memory-intensive tasks to the IDPs. Offloading lets applications exploit the DIMM's high internal memory bandwidth and minimizes data movement between the host CPU and the DIMM. Commercial DIMMs supporting PID were not available until recently, but the introduction of UPMEM DIMMs and Samsung AxDIMMs has led to growing interest in PID across various fields, including bioinformatics, machine learning, and security.
In-memory databases often suffer from the memory wall, and PID has been shown to alleviate this problem significantly. In particular, previous research proposed a PID join algorithm to accelerate in-memory join operations. In this join operation, given two tables R and S, the CPU evenly distributes the tuples of R and S to the IDPs, and each IDP performs global partitioning on the tuples it received. The CPU then shuffles the tuples among the IDPs so that each IDP can process its own partition, and each IDP performs a single-IDP join. Finally, the CPU collects the output tuples from all IDPs, so that the in-memory join operation is performed rapidly.
However, existing PID join algorithms suffer from poor performance and scalability when the input tables are skewed.
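The join flow described above can be sketched in Python, with each IDP modeled as an in-memory bucket. This is only an illustrative simulation: the IDP count, the use of `key % num_idps` as the partitioning hash, the sample tables, and all function names are assumptions made for the example rather than details taken from the prior work.

```python
from collections import defaultdict

NUM_IDPS = 4  # hypothetical IDP count, for illustration only


def scatter(table, num_idps):
    """Host step: hand out tuples evenly to the IDPs, ignoring join keys."""
    return [table[i::num_idps] for i in range(num_idps)]


def partition_and_shuffle(chunks, num_idps):
    """Each IDP partitions its chunk by key; partition p is then moved to IDP p."""
    inbox = [[] for _ in range(num_idps)]
    for chunk in chunks:                      # local work on each IDP
        for key, payload in chunk:
            inbox[key % num_idps].append((key, payload))  # modulo as the hash
    return inbox


def single_idp_join(r_part, s_part):
    """Hash join on one IDP: build a table on R, probe it with S."""
    build = defaultdict(list)
    for key, r_payload in r_part:
        build[key].append(r_payload)
    return [(key, r, s) for key, s in s_part for r in build.get(key, [])]


def pid_join(r_table, s_table, num_idps=NUM_IDPS):
    r_in = partition_and_shuffle(scatter(r_table, num_idps), num_idps)
    s_in = partition_and_shuffle(scatter(s_table, num_idps), num_idps)
    load = {idp: len(s_in[idp]) for idp in range(num_idps)}  # probe tuples
    results = []
    for idp in range(num_idps):               # host gathers every IDP's output
        results.extend(single_idp_join(r_in[idp], s_in[idp]))
    return results, load


# Hypothetical skewed input: key 0 accounts for 90 of the 97 S tuples.
R = [(k, f"r{k}") for k in range(8)]
S = [(0, f"s{i}") for i in range(90)] + [(k, f"t{k}") for k in range(1, 8)]

results, load = pid_join(R, S)
print(len(results))  # 97
print(load)          # {0: 91, 1: 2, 2: 2, 3: 2}
```

Because every tuple with the same key lands on the same IDP, the IDP that owns the hot key does 91 of the 97 probes in this run while the other three handle two each.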
Although these algorithms use global partitioning on each IDP to equalize the computational load among the IDPs, skewed input tables cause severe load imbalance, so that some IDPs sit idle while others are still processing.
FIG. 1 is a diagram illustrating the characteristics of a conventional PID join algorithm and SPID-Join.
FIG. 2 is a diagram illustrating the functional configuration of a skew-resistant PID device according to one embodiment of the present invention.
FIG. 3 is a diagram illustrating a PID join algorithm according to one embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of a cost model according to one embodiment of the skew-resistant PID device of FIG. 2.
FIG. 5 is a flowchart illustrating a skew-resistant PID processing method according to the present invention.
FIG. 6 is a diagram illustrating the bank set-based join key replication process of the SPID-Join algorithm.
FIG. 7 is a diagram illustrating the rank set-based join key replication process of the SPID-Join algorithm.
FIG. 8 is a diagram illustrating the bank and rank set partitioning of SPID-Join for IDPs and input tuples.
FIG. 9 is a diagram illustrating variables for constructing a cost model according to one embodiment of the present invention.
FIG. 10 is a diagram showing the join operation latency of a PID join algorithm according to one embodiment of the present invention.
The description of the present invention is merely an example for structural or functional explanation, and therefore the scope of the present invention should not be interpreted as being limited by the examples described in the text. That is, since the examples are subject to various modifications and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing the technical concept.
Furthermore, the objectives or effects presented in the present invention do not imply that a specific example must include all of them or only such effects; therefore, the scope of the present invention should not be understood as being limited by them.
Meanwhile, the meaning of the terms described in this application should be understood as follows.
Terms such as "first," "second," etc., are intended to distinguish one component from another, and the scope of rights shall not be limited by these terms. For example, the first component may be named the second component, and similarly, the second component may be named the first component.
When it is stated that one component is "connected" to another component, it should be understood that it may be directly connected to that other component, or that there may be other components in between. Conversely, when it is stated that one component is "directly connected" to another component, it should be understood that there are no other components in between. Meanwhile, other expressions describing the relationships between components, such as "between" and "exactly between," or "adjacent to" and "directly adjacent to," should be interpreted in the same way.
A singular expression should be understood to include a plural expression unl