
CN-122019453-A - User mode RDMA communication simulation system and method based on persistent memory


Abstract

The invention discloses a user-mode RDMA communication simulation system and method based on persistent memory, and belongs to the technical field of computer systems and network communication. The system comprises a persistent memory management module, a lock-free ring queue module, a zero-copy transmission engine, an RDMA semantic implementation module, a memory region registration module, and a polling-mode driver module. Through memory mapping, multiple user-mode processes share access to the same persistent storage space, and a lock-free ring queue built in that space provides inter-process communication. Processor non-temporal store instructions provide zero-copy data transfer and are used to simulate RDMA write and read operations, while processor atomic instructions simulate RDMA atomic operations. A memory region registration and key management mechanism ensures access security. The invention requires no dedicated RDMA hardware, runs entirely in user mode, supports data persistence, and can be used in scenarios such as RDMA application development and debugging, persistent memory programming research, and distributed system prototyping.
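The following minimal C sketch shows the shared mapping described above, which also corresponds to claims 1 and 2 and step S1. It assumes a POSIX system. The backing-file path, the control-area size, the per-node block size and the node count are hypothetical values chosen for illustration; they are not figures from the patent. Any two processes that call `pm_open` on the same file, or two mappings within one process, see the same bytes because the file is mapped with `MAP_SHARED`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sizes for illustration; the patent does not specify them. */
#define PM_CTRL_SIZE  (64u * 1024u)    /* control area: queues + region registry */
#define PM_NODE_BLOCK (1024u * 1024u)  /* one data block per node identifier */
#define PM_MAX_NODES  4u

typedef struct {
    uint8_t *base;  /* start of the shared mapping */
    size_t   size;  /* total bytes mapped */
    uint8_t *ctrl;  /* control area at the start of the mapping */
    uint8_t *data;  /* data area, divided into per-node blocks */
} pm_region;

/* Create or open the backing file and map it with MAP_SHARED, so every
   process that maps the same file reads and writes the same bytes. */
static int pm_open(const char *path, pm_region *r) {
    size_t size = PM_CTRL_SIZE + (size_t)PM_MAX_NODES * PM_NODE_BLOCK;
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)size) != 0) { close(fd); return -1; }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;
    r->base = p;
    r->size = size;
    r->ctrl = r->base;
    r->data = r->base + PM_CTRL_SIZE;
    return 0;
}

/* Address of the data block owned by a node identifier. */
static uint8_t *pm_node_block(const pm_region *r, unsigned node_id) {
    assert(node_id < PM_MAX_NODES);
    return r->data + (size_t)node_id * PM_NODE_BLOCK;
}

/* Write dirty pages back to the file, then unmap. Returns msync's result. */
static int pm_close(pm_region *r) {
    int rc = msync(r->base, r->size, MS_SYNC);
    munmap(r->base, r->size);
    return rc;
}
```

Without real persistent memory, the backing file can live on an ordinary filesystem or in /dev/shm. The sketch uses `msync` as the portable way to push dirty pages to the file. A file on a DAX-mounted persistent memory device could instead rely on cache-line flush and fence instructions.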

Inventors

  • LIU YINGBO

Assignees

  • 云南财经大学 (Yunnan University of Finance and Economics)

Dates

Publication Date
20260512
Application Date
20260129

Claims (10)

  1. A persistent memory-based user-mode RDMA communication simulation system, comprising: a persistent memory management module, configured to map a persistent storage region into the address spaces of multiple user-mode processes through a memory mapping interface and to divide the region into a control area and a data area, wherein the control area stores the communication queue structures and memory region registration information, and the data area is divided into multiple independent memory blocks according to node identifiers; a lock-free ring queue module, configured to carry messages between nodes, wherein the ring queue follows a single-producer, single-consumer model, the queue capacity is a power of 2, and false sharing between the cores of a multi-core processor is eliminated by cache-line alignment; a zero-copy transmission engine, configured to write data directly into the persistent memory with non-temporal store instructions that bypass the CPU cache hierarchy, and to use a store barrier instruction to ensure data persistence; an RDMA semantic implementation module, configured to support RDMA write operations, RDMA read operations, and atomic operations, wherein the atomic operations comprise compare-and-swap and fetch-and-add; a memory region registration module, configured to register and verify memory regions that can be accessed remotely, and to assign a local access key and a remote access key to each region; and a polling-mode driver module, configured to check the inbound communication queues by polling, process requests, and support batch dequeue operations.
  2. The system of claim 1, wherein the persistent memory management module uses a shared memory-mapping flag to allow multiple processes to share access to the same persistent storage region.
  3. The system of claim 1, wherein the lock-free ring queue module has a queue capacity of 1024 entries and computes slot indices with a bitwise operation instead of a modulo operation.
  4. The system of claim 1, wherein each pointer variable in the lock-free ring queue module exclusively occupies a full 64-byte cache line, the alignment being achieved with padding bytes.
  5. The system of claim 1, wherein the zero-copy transmission engine uses the streaming store instructions of a single-instruction multiple-data (SIMD) instruction set for bulk data transfer, each instruction processing 128 bits of data and a combination of such instructions processing data at cache-line granularity.
  6. The system of claim 1, wherein, in the RDMA semantic implementation module, the compare-and-swap operation is implemented with the __sync_val_compare_and_swap function and the fetch-and-add operation is implemented with the __sync_fetch_and_add function.
  7. A persistent memory-based method for simulating user-mode RDMA communication, applying the system of any one of claims 1-6, the method comprising the following steps: S1, initializing the persistent memory space: creating or opening a persistent storage file, mapping it into the process address space through a memory mapping interface, and initializing the control area and the data area; S2, registering a memory region: generating a local access key and a remote access key for the memory buffer to be registered, and recording the registration information in a global memory region registry; S3, executing an RDMA operation: performing an RDMA write, RDMA read, or atomic operation according to the type of the work request, and writing the completion status into a completion queue once the operation finishes; and S4, polling the completion queue: traversing the completion queue to obtain the status information of completed operations, and returning it to the application.
  8. The method of claim 7, wherein the RDMA write operation in step S3 comprises: calculating the offsets of the source address and the destination address in the persistent memory; dividing the data to be transferred, at a 64-byte cache-line boundary, into an aligned portion and an unaligned tail portion; transferring the aligned portion with the _mm_stream_si128 instruction; transferring the unaligned tail portion with the memcpy function; and executing an sfence memory barrier to ensure data persistence.
  9. The method of claim 7, wherein the RDMA read operation in step S3 comprises: calculating the offsets of the source address and the destination address in the persistent memory; prefetching the remote data into the CPU cache at cache-line granularity with the _mm_prefetch instruction; copying the data to the local buffer with the memcpy function; and executing an lfence load barrier to ensure that the read has completed.
  10. The method of claim 7, wherein the polling operation in step S4 supports a batch dequeue mode in which a single polling call returns status information for multiple completed operations.
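The queue properties listed in claims 1, 3, 4 and 10 can be sketched in C11 as follows: single producer and single consumer, 1024 slots, bit-mask indexing, each index on its own 64-byte cache line, and batch dequeue. The entry layout `rq_entry`, the function names, and the acquire/release ordering from `<stdatomic.h>` are assumptions made for illustration. The claims do not state which memory ordering the invention uses. The struct holds no pointers, so it could sit in the shared control area and be used by two processes.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RQ_CAPACITY 1024u               /* power of two, per claim 3 */
#define RQ_MASK     (RQ_CAPACITY - 1u)  /* idx & RQ_MASK == idx % RQ_CAPACITY */
#define CACHE_LINE  64

static_assert((RQ_CAPACITY & RQ_MASK) == 0, "capacity must be a power of two");

/* Hypothetical message layout; the patent does not define the entry format. */
typedef struct {
    uint32_t opcode;
    uint32_t src_node;
    uint64_t addr;
    uint64_t len;
} rq_entry;

typedef struct {
    /* Producer-owned index, padded to a full cache line (claim 4). */
    alignas(CACHE_LINE) _Atomic uint64_t tail;
    char pad0[CACHE_LINE - sizeof(_Atomic uint64_t)];
    /* Consumer-owned index, on its own cache line to avoid false sharing. */
    alignas(CACHE_LINE) _Atomic uint64_t head;
    char pad1[CACHE_LINE - sizeof(_Atomic uint64_t)];
    rq_entry slots[RQ_CAPACITY];
} ring_queue;

static_assert(offsetof(ring_queue, head) == CACHE_LINE, "head must start a new line");

static void rq_init(ring_queue *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side: returns 1 on success, 0 if the queue is full. */
static int rq_enqueue(ring_queue *q, const rq_entry *e) {
    uint64_t t = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint64_t h = atomic_load_explicit(&q->head, memory_order_acquire);
    if (t - h == RQ_CAPACITY) return 0;                 /* full */
    q->slots[t & RQ_MASK] = *e;                         /* bit mask, not modulo */
    atomic_store_explicit(&q->tail, t + 1, memory_order_release);
    return 1;
}

/* Consumer side: removes up to max entries in one call (batch dequeue). */
static size_t rq_dequeue_batch(ring_queue *q, rq_entry *out, size_t max) {
    uint64_t h = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint64_t t = atomic_load_explicit(&q->tail, memory_order_acquire);
    size_t n = (size_t)(t - h);
    if (n > max) n = max;
    for (size_t i = 0; i < n; i++)
        out[i] = q->slots[(h + i) & RQ_MASK];
    atomic_store_explicit(&q->head, h + n, memory_order_release);
    return n;
}
```

The `& RQ_MASK` step is why the capacity must be a power of two. For such a capacity, `idx & (capacity - 1)` equals `idx % capacity`. Because the 64-bit indices run freely, `tail - head` always gives the number of queued entries, even after the indices wrap around the slot array.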
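Claims 5, 8 and 9 describe the write and read paths in enough detail for a short sketch, which assumes an x86-64 compiler with SSE2 intrinsics. On the write side, it streams each whole 64-byte chunk with four 128-bit `_mm_stream_si128` stores, copies the remaining tail with `memcpy`, and ends with `_mm_sfence`. On the read side, it prefetches each cache line with `_mm_prefetch`, copies with `memcpy`, and ends with `_mm_lfence`. The function names and the offset-based addressing are assumptions of this sketch. So is the requirement that the destination be 16-byte aligned, which `_mm_stream_si128` needs.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated RDMA write: copy len bytes from src into the shared region at
   pm_base + remote_off. Whole 64-byte chunks use non-temporal stores. */
static void sim_rdma_write(uint8_t *pm_base, uint64_t remote_off,
                           const void *src, size_t len) {
    uint8_t *dst = pm_base + remote_off;
    const uint8_t *s = src;
    assert(((uintptr_t)dst & 15u) == 0);  /* streaming stores need 16-byte alignment */
    size_t aligned = len & ~(size_t)63;   /* bytes covered by whole 64-byte chunks */
    for (size_t i = 0; i < aligned; i += 64) {
        /* four 128-bit streaming stores cover one cache line */
        for (size_t j = 0; j < 64; j += 16) {
            __m128i v = _mm_loadu_si128((const __m128i *)(s + i + j));
            _mm_stream_si128((__m128i *)(dst + i + j), v);
        }
    }
    if (len > aligned)                    /* unaligned tail goes through the cache */
        memcpy(dst + aligned, s + aligned, len - aligned);
    _mm_sfence();                         /* order the streaming stores */
}

/* Simulated RDMA read: prefetch the source lines, copy, then a load fence. */
static void sim_rdma_read(const uint8_t *pm_base, uint64_t remote_off,
                          void *dst, size_t len) {
    const uint8_t *src = pm_base + remote_off;
    for (size_t i = 0; i < len; i += 64)
        _mm_prefetch((const char *)(src + i), _MM_HINT_T0);
    memcpy(dst, src, len);
    _mm_lfence();
}
```

`_mm_sfence` orders the streaming stores but does not write back cached lines. On real persistent memory, the tail copied with `memcpy` would normally also need a cache-line write-back, for example `_mm_clwb`, before the fence. The sketch follows the claims and omits that step.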
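Claim 6 names the GCC `__sync` builtins. The sketch below shows one way the two RDMA atomics might map onto them. Like verbs atomics, both functions act on an 8-byte-aligned 64-bit word and return the value that was there before the operation. The function names and the offset-based addressing are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated RDMA compare-and-swap: if the word equals expected, replace it
   with desired. Always returns the value seen before the operation. */
static uint64_t sim_atomic_cas(uint8_t *pm_base, uint64_t remote_off,
                               uint64_t expected, uint64_t desired) {
    uint64_t *target = (uint64_t *)(pm_base + remote_off);
    assert(((uintptr_t)target & 7u) == 0);  /* 64-bit atomics need 8-byte alignment */
    return __sync_val_compare_and_swap(target, expected, desired);
}

/* Simulated RDMA fetch-and-add: adds add to the word, returns the old value. */
static uint64_t sim_atomic_fetch_add(uint8_t *pm_base, uint64_t remote_off,
                                     uint64_t add) {
    uint64_t *target = (uint64_t *)(pm_base + remote_off);
    assert(((uintptr_t)target & 7u) == 0);
    return __sync_fetch_and_add(target, add);
}
```

`__sync_val_compare_and_swap` returns the prior value, so a caller detects success by comparing the result with `expected`. This matches how an RDMA compare-and-swap reports its outcome to the requester.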
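Step S2 and the registration module of claim 1 can be sketched as a small registry of regions. Each entry records the region's location in the shared mapping, its access rights, a local key and a remote key. The key scheme is hypothetical: a counter, with even values used as local keys and odd values as remote keys. The access flags and the 64-entry limit are also hypothetical. A counter-based key is predictable and would not protect against a malicious peer. The sketch also assumes a single registering thread, so concurrent registration would need extra synchronization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MR_MAX 64                      /* hypothetical registry size */
#define MR_REMOTE_WRITE  0x1u          /* hypothetical access flags */
#define MR_REMOTE_READ   0x2u
#define MR_REMOTE_ATOMIC 0x4u

typedef struct {
    uint64_t offset;   /* region start, as an offset into the shared mapping */
    uint64_t length;
    uint32_t lkey;     /* local access key */
    uint32_t rkey;     /* remote access key */
    uint32_t access;
    int      valid;
} mr_entry;

typedef struct {
    mr_entry entries[MR_MAX];
    uint32_t next_key;  /* hypothetical monotonic key source */
} mr_registry;

/* Register a region and hand back its keys. Returns the slot index, or -1 if full. */
static int mr_register(mr_registry *reg, uint64_t offset, uint64_t length,
                       uint32_t access, uint32_t *lkey, uint32_t *rkey) {
    assert(length > 0);
    for (int i = 0; i < MR_MAX; i++) {
        mr_entry *e = &reg->entries[i];
        if (e->valid) continue;
        uint32_t gen = ++reg->next_key;
        e->offset = offset;
        e->length = length;
        e->access = access;
        e->lkey = gen << 1;          /* even values: local keys */
        e->rkey = (gen << 1) | 1u;   /* odd values: remote keys */
        e->valid = 1;
        *lkey = e->lkey;
        *rkey = e->rkey;
        return i;
    }
    return -1;
}

/* Verify a remote request: the rkey must exist, grant every right in need,
   and [offset, offset + len) must lie inside the region (overflow-safe). */
static int mr_check_remote(const mr_registry *reg, uint32_t rkey,
                           uint64_t offset, uint64_t len, uint32_t need) {
    for (int i = 0; i < MR_MAX; i++) {
        const mr_entry *e = &reg->entries[i];
        if (!e->valid || e->rkey != rkey) continue;
        if ((e->access & need) != need) return 0;
        if (offset < e->offset || len > e->length ||
            offset - e->offset > e->length - len) return 0;
        return 1;
    }
    return 0;  /* unknown key */
}
```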

Description

User mode RDMA communication simulation system and method based on persistent memory

Technical Field

The invention relates to the technical field of computer systems and network communication, and in particular to a user-mode remote direct memory access (RDMA) communication simulation system and method based on persistent memory.

Background

Remote direct memory access (RDMA) is one of the core communication technologies in modern data centers and high-performance computing. It allows one computer to read and write the memory of another computer directly. The entire data transfer proceeds without CPU involvement and without processing by the operating system kernel's network protocol stack. This kernel-bypass design gives RDMA significant performance advantages, including end-to-end latency on the order of microseconds, bandwidth utilization close to wire speed, and very low CPU usage.

There are currently three main implementations of RDMA. The first is InfiniBand, a dedicated high-speed interconnect that offers the most complete RDMA feature support and is widely used in supercomputers and high-performance clusters. The second is RoCE, which encapsulates the InfiniBand transport layer in Ethernet frames so that RDMA can run over standard Ethernet infrastructure. The third is iWARP, which implements RDMA semantics on top of the TCP/IP protocol stack; it offers the best compatibility but relatively lower performance.

The core RDMA operations fall into the following classes. An RDMA Write operation lets the sender write data directly into a memory region designated by the receiver, without involving the receiver's CPU. An RDMA Read operation lets the sender read data directly from a memory region of the receiver.
The Send/Receive operation is the traditional messaging mode, in which the receiver must post a Receive request in advance. Atomic operations comprise compare-and-swap and fetch-and-add, two synchronization primitives used to coordinate processes in a distributed environment.

However, the development and testing of RDMA technology face many challenges, as follows.

First, hardware costs are high. InfiniBand network adapters and switches are expensive: even an entry-level InfiniBand adapter costs several hundred dollars, and high-end products cost more. RoCE network adapters are relatively inexpensive but require Ethernet switches that support Data Center Bridging. These high hardware costs limit the adoption of RDMA in teaching, research, and small-scale development environments.

Second, environment configuration is complex. Building an RDMA environment involves multiple steps, such as installing drivers, updating firmware, configuring the subnet manager, and setting queue pair parameters. RDMA devices from different vendors differ in configuration details, and inexperienced developers often spend a great deal of time on environment setup. Configuring a multi-node RDMA cluster is even more complex, because the network parameters and security policies of every node must be coordinated.

Third, debugging tools are limited. RDMA operations take place at the hardware level, so conventional software debugging tools have difficulty capturing detailed information about the communication process. When an RDMA application misbehaves, developers find it hard to determine whether the cause is an application logic error, a queue pair configuration problem, or a network failure. Multi-node distributed debugging is harder still, because it requires coordination across multiple machines.

Fourth, it is difficult to validate RDMA in combination with persistent memory.
With the development of Intel Optane persistent memory, combining RDMA with persistent memory has become an important research direction for distributed storage systems. However, real persistent memory devices are also expensive and require specific hardware platform support. Developers therefore lack a convenient way to verify the correctness and consistency of RDMA when it is used together with persistent memory.

Some software-level RDMA emulation schemes exist in the prior art. The SoftRoCE module provided by the Linux kernel emulates the RoCE protocol over ordinary Ethernet, allowing developers to run RDMA applications without dedicated hardware. However, SoftRoCE still depends on kernel module support, and its data transfers are processed by the kernel protocol stack, so it cannot achieve true user-mode zero-copy transmission. In addition, SoftRoCE neither supports the verification of persistent memory features nor emulates persistence semantics.

Thus, there is a need in the industry for a persistent memory-based user-mode RDMA communication emulation system. The system should provide programming interfaces and semantic features that app