CN-122018809-A - Parallel read-write accelerating device based on client cache of distributed file system

CN 122018809 A

Abstract

The application discloses a parallel read-write acceleration device based on a client cache of a distributed file system, comprising a write cache module, a parallel write module, a parallel read module and a read cache module. The write cache module writes data into a chunk according to a data write request, marks the chunk as dirty, and pushes the chunk into a write shared thread pool when the chunk is full or when no data has been written to it for longer than a preset time. The parallel write module writes the data into the distributed storage cluster in parallel through the write threads in the write shared thread pool. The parallel read module pre-reads data into a plurality of chunks through a plurality of read threads in the read shared thread pool. The read cache module reads the corresponding data from the chunks according to a data read request and returns it to the read client. By aggregating small I/O requests through the caching mechanism and combining concurrent writing with concurrent pre-reading, the application significantly reduces the latency between data being written at the sending end and being read at the receiving end, and improves the effective bandwidth.
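As a concrete illustration of the write-side aggregation described above, the sketch below buffers small writes into fixed-size chunks, marks them dirty, and hands full or idle chunks to a shared thread pool. It is a minimal sketch in Python: the names (Chunk, WriteCache, flush_to_cluster) and the idle timeout value are assumptions, the 1 MB chunk size is taken from claim 3, and the actual cluster write is stubbed out.

```python
# Minimal sketch of the write-side aggregation in the abstract; all names
# are illustrative, not from the patent, and the backing store is stubbed.
import threading
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 * 1024 * 1024   # 1 MB chunks, per claim 3
IDLE_FLUSH_SECONDS = 0.5       # the "preset time" in the abstract; value assumed

class Chunk:
    def __init__(self):
        self.buf = bytearray()
        self.dirty = False
        self.last_write = time.monotonic()

    def append(self, data: bytes) -> bytes:
        """Copy as much as fits; return the remainder."""
        room = CHUNK_SIZE - len(self.buf)
        self.buf += data[:room]
        self.dirty = True
        self.last_write = time.monotonic()
        return data[room:]

    def full(self) -> bool:
        return len(self.buf) >= CHUNK_SIZE

def flush_to_cluster(chunk: Chunk) -> None:
    # Placeholder for the parallel write module: a thread of the shared
    # pool writes the chunk to the distributed storage cluster, then
    # marks it clean. A chunk handed to the pool is never mutated again,
    # standing in for the locking during write-back recited in claim 5.
    chunk.dirty = False

class WriteCache:
    def __init__(self, pool_workers: int = 4):
        self.pool = ThreadPoolExecutor(max_workers=pool_workers)  # "write shared thread pool"
        self.current = Chunk()
        self.lock = threading.Lock()

    def write(self, data: bytes) -> None:
        """Aggregate a small write; flush whenever a chunk fills up."""
        with self.lock:
            rest = self.current.append(data)
            while True:
                if self.current.full():
                    self.pool.submit(flush_to_cluster, self.current)
                    self.current = Chunk()
                if not rest:
                    break
                rest = self.current.append(rest)

    def flush_idle(self) -> None:
        """Called periodically: flush a dirty chunk idle past the preset time."""
        with self.lock:
            c = self.current
            if c.dirty and time.monotonic() - c.last_write > IDLE_FLUSH_SECONDS:
                self.pool.submit(flush_to_cluster, c)
                self.current = Chunk()
```

Many 4 KB writes thus collapse into one 1 MB transfer per chunk, which is the aggregation effect the abstract credits for the latency and bandwidth gains.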

Inventors

  • PAN ZIHAO

Assignees

  • 北京同有飞骥科技股份有限公司 (Beijing Toyou Feiji Electronics Co., Ltd.)

Dates

Publication Date
2026-05-12
Application Date
2026-02-03

Claims (10)

  1. A parallel read-write acceleration device based on a client cache of a distributed file system, characterized in that the device can perform data reading and data writing simultaneously, and comprises: a write cache module for receiving and responding to a data write request initiated by a write client, writing data into the corresponding chunk of a chunk linked list according to the data write request, and marking the corresponding chunk as dirty; a parallel write module for writing the data in a plurality of chunks into the distributed storage cluster in parallel through a plurality of write threads in a write shared thread pool, and marking the corresponding chunks as clean; a parallel read module for pre-reading data into a plurality of chunks through a plurality of read threads in a read shared thread pool; and a read cache module for receiving and responding to a data read request sent by a read client, reading the corresponding data from the plurality of chunks according to the data read request, and returning it to the read client.
  2. The parallel read-write acceleration device based on the client cache of the distributed file system according to claim 1, wherein the memory size of a chunk in the chunk linked list is consistent with the memory size of the storage objects in the distributed storage cluster.
  3. The parallel read-write acceleration device based on the client cache of the distributed file system according to claim 1, wherein the memory size of a chunk is 1 MB.
  4. The parallel read-write acceleration device based on the client cache of the distributed file system according to claim 1, wherein each chunk records the location information, access time and status of the data in the chunk.
  5. The parallel read-write acceleration device based on the client cache of the distributed file system according to claim 1, wherein the parallel write module locks the corresponding chunk during dirty-page write-back.
  6. The parallel read-write acceleration device based on the client cache of the distributed file system according to claim 1, wherein, when the parallel read module pre-reads data into a plurality of chunks, the pre-read window is initially 2 chunk sizes, and if active pre-read is triggered again, the parallel read module sets the pre-read window to 4 chunk sizes.
  7. The parallel read-write acceleration device based on the client cache of the distributed file system according to claim 1, wherein, when the read cache module reads the corresponding data from the plurality of chunks according to the data read request, if the corresponding data is not in any chunk, an idle chunk is obtained, the idle chunk is associated with a chunk_space object and the index of the file in the data read request and then inserted into a hash table, an underlying I/O request is sent, and the data is fetched from the distributed storage cluster into the idle chunk according to the underlying I/O request.
  8. The parallel read-write acceleration device based on the client cache of the distributed file system according to claim 1, wherein the parallel read module dynamically monitors the access pattern of the read client; when it detects that the read client is accessing sequentially, it not only requests the chunk data accessed this time but also asynchronously pre-reads the subsequent chunk data, and dynamically adjusts the number of pre-read chunks according to the sequential read rate.
  9. The parallel read-write acceleration device based on the client cache of the distributed file system according to claim 8, wherein the parallel read module automatically pre-reads one more chunk at the tail each time the read client consumes a chunk.
  10. A parallel read-write acceleration method based on a client cache of a distributed file system, characterized by comprising the following steps: receiving and responding to a data write request initiated by a write client, writing data into the corresponding chunk of a chunk linked list according to the data write request, and marking the corresponding chunk as dirty; writing the data in a plurality of chunks into a distributed storage cluster in parallel through a plurality of write threads in a write shared thread pool, and marking the corresponding chunks as clean; pre-reading data into a plurality of chunks through a plurality of read threads in a read shared thread pool; and receiving and responding to a data read request sent by the read client, reading the corresponding data from the plurality of chunks according to the data read request, and returning it to the read client.
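Claims 6, 8 and 9 together recite an adaptive pre-read policy: a window that starts at 2 chunks and widens to 4 when active pre-read is triggered again, detection of sequential access, and pulling in one more chunk at the tail for each chunk consumed. The Python sketch below illustrates that policy under assumed details; the class and method names and the exact trigger conditions are not from the patent, and the chunk fetch itself is stubbed out.

```python
# Illustrative pre-read window logic (claims 6, 8, 9); names and trigger
# details are assumed, and the chunk fetch is a caller-supplied stub.
from concurrent.futures import ThreadPoolExecutor

class Prereader:
    def __init__(self, fetch, pool_workers: int = 4):
        self.fetch = fetch                    # callable: chunk index -> None
        self.pool = ThreadPoolExecutor(max_workers=pool_workers)  # "read shared thread pool"
        self.window = 2                       # initial pre-read window: 2 chunks (claim 6)
        self.last_index = None

    def on_read(self, index: int) -> None:
        """Monitor the access pattern; pre-read on sequential access (claim 8)."""
        sequential = self.last_index is not None and index == self.last_index + 1
        self.last_index = index
        if sequential:
            # Asynchronously pre-read the chunks behind the current one,
            # window-many at a time (claim 8).
            for i in range(index + 1, index + 1 + self.window):
                self.pool.submit(self.fetch, i)
            # Active pre-read has now been triggered again: widen the
            # window from its initial 2 chunks to 4 chunks (claim 6).
            self.window = 4

    def on_chunk_consumed(self, tail_index: int) -> None:
        # Each consumed chunk pulls one more chunk in at the tail (claim 9).
        self.pool.submit(self.fetch, tail_index + 1)
```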
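Claim 7's miss path (obtain an idle chunk, associate it with a chunk_space object and the file's index, insert it into a hash table, then issue the underlying I/O) could look roughly like the following. The dict-as-hash-table, the folding of the chunk_space binding into the table key, and every helper name are assumptions for illustration only.

```python
# Rough sketch of the read-miss path in claim 7; all helpers are hypothetical.
chunk_table = {}          # hash table: (file_index, chunk_offset) -> chunk
free_chunks = []          # pool of idle chunks

def issue_underlying_io(file_index, chunk_offset, chunk):
    # Placeholder: fetch the data from the distributed storage cluster
    # into the idle chunk.
    pass

def read_chunk(file_index: int, chunk_offset: int):
    key = (file_index, chunk_offset)
    chunk = chunk_table.get(key)
    if chunk is None:
        # Miss: grab an idle chunk (1 MB, per claim 3), bind it to the
        # file index (the chunk_space association is folded into the
        # table key here), and issue the underlying I/O request.
        chunk = free_chunks.pop() if free_chunks else bytearray(1024 * 1024)
        chunk_table[key] = chunk
        issue_underlying_io(file_index, chunk_offset, chunk)
    return chunk
```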

Description

Parallel read-write accelerating device based on a client cache of a distributed file system

Technical Field

The application relates to the technical field of distributed storage, and in particular to a parallel read-write acceleration device based on a client cache of a distributed file system.

Background

With the explosive growth of data scale and the rapid development of technologies such as cloud computing, big data and artificial intelligence, the traditional centralized storage architecture has gradually exposed bottlenecks in scalability, fault tolerance and performance. To meet the demand for efficient storage and access of mass data, distributed storage systems have been developed. By distributing data across multiple nodes that cooperate over a network, such systems not only improve throughput and storage capacity but also enhance data reliability and availability.

The initial design goal of distributed storage systems, however, was to handle large files, data-intensive tasks and concurrent access by many clients. In emerging applications such as AI training, high-frequency transaction logs, virtualized disks and database backups, workloads exhibit distinct features: mainly single data streams with small-granularity I/O requests, yet with extremely high demands on throughput and latency. Such requirements fundamentally conflict with the architectural assumptions of traditional distributed file systems, exposing their performance bottlenecks and inadequacies in high-concurrency small-I/O scenarios.

Ceph is the most mainstream open-source distributed storage system. The typical path of a small I/O write request in a Ceph client (especially one accessing object storage directly through librados) is as follows:

1. Request initiation: the application issues a 4 KB/8 KB write request to the client library (e.g., librbd or libcephfs).
2. Metadata lookup and calculation: for CephFS, the client may need to query the metadata server to parse the file path and obtain the object ID and layout information of the file; for RBD/objects, the client maps data objects to specific PGs through hash computation.
3. CRUSH calculation: based on the PG ID, the client uses the CRUSH algorithm to calculate on which OSD or OSDs (typically a primary and its replicas) the data object should be stored.
4. Network communication: the client establishes (or acquires from a connection pool) a network connection to the calculated primary OSD and sends the I/O request.
5. Primary OSD coordination: after receiving the write request, the primary OSD is responsible for replicating it to the other replica OSDs and waiting for all of them to acknowledge.
6. Acknowledgement: after the primary OSD receives the acknowledgements of all replicas, it returns a write-success signal to the client.
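Every small request incurs each step of this path anew. As an illustration, the sketch below writes a single 4 KB object through the Python binding of librados; the library and its calls are real, but the pool name, object name and configuration path are assumptions.

```python
# Each small write below goes through the full librados request path:
# PG mapping, CRUSH calculation, connection to the primary OSD, and
# replication acknowledgement. Pool/object names and the conf path are
# assumed for illustration.
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # assumed config location
cluster.connect()
try:
    ioctx = cluster.open_ioctx('testpool')             # assumed pool name
    try:
        payload = b'\0' * 4096                         # one 4 KB small I/O
        ioctx.write_full('small_obj_0', payload)       # full per-request path
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```

Issuing many such 4 KB writes in a loop repeats the mapping and replication round trip every time, which is exactly the overhead the defects below describe.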
In this flow, the following inherent bottlenecks cause a dramatic drop in small-I/O performance.

Defect one: deep coupling of metadata to the data path results in high latency. Every small I/O request must traverse the complete "metadata/CRUSH computation -> network communication -> primary-replica coordination" path. Even when the metadata is already cached, the CRUSH calculation and the determination of the target OSD cannot be omitted.

Defect two: the network protocol stack and system-call overhead become a performance bottleneck. Every small I/O must be encapsulated and decapsulated by the kernel network protocol stack, with context switches between user mode and kernel mode. The AsyncMessenger used by librados improves this, but does not eliminate the per-message overhead.

Defect three: an intelligent aggregation and scheduling mechanism for single-stream small I/O is lacking. Request processing in a legacy client is a one-in, one-out pipeline. While Ceph supports light ordering of its Op queues, the client side lacks the ability to perform cross-object, cross-time intelligent aggregation.

In summary, because of these conventional design choices, existing distributed file system clients suffer from the following problems in single-stream, 4K/8K small-I/O application scenarios: 1. every small I/O must go through metadata computation, protocol encapsulation/decapsulation and the like, so latency is too high; 2. small I/O lacks a reasonable aggregation mechanism, so the upper bound on bandwidth is too low.

Disclosure of Invention

Therefore, the application provides a parallel read-write acceleration device based on a client cache of a distributed file system, aiming to solve the problems of high latency and a low bandwidth upper limit when the distributed file systems of the prior art process small I/O requests.