CN-122019476-A - Directory entry processing method, apparatus, storage medium, and program product


Abstract

The embodiments of the application provide a directory entry processing method, a device, a storage medium, and a program product, and relate to the distributed field. The method includes: receiving an operation for creating a directory or newly adding a directory entry; updating a directory entry count in response to the operation, the directory entry count being used to record the total number of files and subdirectories under a target directory, the target directory being the directory at which the operation is directed; and, if the value of the directory entry count satisfies a large directory condition, allocating a contiguous space including N blocks in a resource group and storing a new directory entry generated based on the operation in the contiguous space, where N is greater than 1. The method increases the continuity of metadata access and improves the traversal and statistics performance of the cluster file system when processing directories that contain massive numbers of files.

Inventors

SHAO XIN
SHAN CHUNXIN

Assignees

中电科金仓(北京)科技股份有限公司

Dates

Publication Date: 20260512
Application Date: 20251229

Claims (10)

1. A directory entry processing method, characterized in that the method is applied to a cluster file system, wherein the cluster file system comprises a resource group, the resource group being a basic management unit for metadata and data storage, and the method comprises: receiving an operation for creating or adding a directory entry; updating a directory entry count in response to the operation, the directory entry count being used to record the total number of files and subdirectories under a target directory, the target directory being the directory at which the operation is directed; and, in the case where the value of the directory entry count satisfies a large directory condition, allocating a contiguous space including N blocks in the resource group and storing a new directory entry generated based on the operation in the contiguous space, N being greater than 1.
2. The method of claim 1, wherein the value of the directory entry count satisfying the large directory condition comprises: the value of the directory entry count being greater than or equal to a preset value; wherein the resource group comprises a metadata area for storing metadata of directory entries and a normal data area for storing data of normal files; and wherein the allocating a contiguous space including N blocks in the resource group comprises: allocating a contiguous space including N blocks in the metadata area of the resource group.
3. The method according to claim 2, wherein the method further comprises: receiving a directory traversal instruction for the target directory; acquiring metadata of a directory entry based on the directory traversal instruction, wherein the metadata of the directory entry comprises a name of the directory entry and an address of an index block; and, in the case where the contiguous space in which the address of the index block is located is indicated as not yet pre-read, asynchronously pre-reading metadata of file information in the contiguous space in which the address of the index block is located, and marking that contiguous space as being in a pre-read state.
4. The method of claim 3, wherein, in asynchronously pre-reading metadata of file information in the contiguous space in which the address of the index block is located, distributed locks for all metadata in that contiguous space are requested in a batch, and the granting of the distributed locks is awaited asynchronously.
5. The method of claim 3 or 4, wherein the cluster file system comprises a virtual file system layer, a cluster file system layer, and a distributed lock layer; the receiving a directory traversal instruction for the target directory comprises: receiving the directory traversal instruction at the virtual file system layer; and the acquiring metadata of the directory entry based on the directory traversal instruction comprises: calling, by the virtual file system layer, an interface of the cluster file system layer, so that the cluster file system layer obtains the directory traversal instruction; requesting, by the cluster file system layer, a distributed lock on the target directory from the distributed lock layer; and acquiring the source data of the target directory from the cache or the disk, and releasing the distributed lock on the target directory.
6. The method of claim 5, wherein the cluster file system further comprises a cache layer, and the asynchronously pre-reading metadata of file information in the contiguous space in which the address of the index block is located comprises: determining the type of each block in the contiguous space to be pre-read, in which the address of the index block is located, and selecting the index blocks; requesting the distributed locks of the index blocks in a batch, and asynchronously awaiting the granting of the distributed locks; in the case where the distributed locks are obtained, reading the metadata of file information in the index blocks in a batch; and caching the metadata of the file information in the cache layer, and releasing the distributed locks.
7. The method of any one of claims 1-4, further comprising at least one of: dynamically adjusting, according to the load state of the cluster file system, a preset value used for determining whether the large directory condition is satisfied; and adjusting the preset value according to the directory entry growth rate of the target directory, wherein the preset value is reduced when the directory entry growth rate exceeds a preset rate threshold.
8. An electronic device comprising a processor and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; and the processor executes the computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 7.
9. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1 to 7.
10. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 7.
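
For illustration only, the pre-read flow recited in claims 3, 4 and 6 might be sketched as follows. This is a minimal single-process sketch, not the claimed implementation: the class `Prefetcher`, the function `traverse`, the constant `EXTENT_BLOCKS`, the assumption that contiguous spaces are aligned on multiples of N blocks, and the `asyncio.sleep(0)` stand-in for the distributed lock layer are all hypothetical.

```python
import asyncio

EXTENT_BLOCKS = 4  # hypothetical N: blocks per contiguous space


class Prefetcher:
    def __init__(self, disk):
        self.disk = disk        # block number -> (block type, metadata)
        self.cache = {}         # stand-in for the cache layer
        self.preread = set()    # contiguous spaces already marked as pre-read
        self.tasks = []         # outstanding asynchronous pre-read tasks

    async def _lock_batch(self, blocks):
        # Stand-in for a batched distributed lock request; the grant is
        # awaited asynchronously instead of blocking the traversal.
        await asyncio.sleep(0)
        return list(blocks)

    async def _preread_extent(self, extent):
        start = extent * EXTENT_BLOCKS
        # Check the type of every block in the space and keep the index blocks.
        index_blocks = [
            b for b in range(start, start + EXTENT_BLOCKS)
            if self.disk.get(b, ("free", None))[0] == "index"
        ]
        locked = await self._lock_batch(index_blocks)
        try:
            for b in locked:                 # batch read into the cache layer
                self.cache[b] = self.disk[b][1]
        finally:
            locked.clear()                   # release the (simulated) locks

    def on_entry(self, index_block_addr):
        # Assumption: contiguous spaces are aligned on multiples of N blocks.
        extent = index_block_addr // EXTENT_BLOCKS
        if extent not in self.preread:
            self.preread.add(extent)         # mark as pre-read state
            self.tasks.append(asyncio.create_task(self._preread_extent(extent)))


async def traverse(entries, disk):
    """Walk (name, index block address) pairs and trigger pre-reads."""
    prefetcher = Prefetcher(disk)
    for _name, addr in entries:
        prefetcher.on_entry(addr)
    await asyncio.gather(*prefetcher.tasks)
    return prefetcher
```

In this sketch, two directory entries whose index blocks fall in the same contiguous space trigger only one pre-read task, since the space is marked as pre-read when the first entry is seen.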

Description

Directory entry processing method, apparatus, storage medium, and program product

Technical Field

The present application relates to the distributed field, and in particular, to a directory entry processing method, an apparatus, a storage medium, and a program product.

Background

With the development of distributed storage technology, the cluster file system (CFS) has become a core infrastructure for very-large-scale data storage scenarios. Users frequently access large directories that contain massive numbers of files and subdirectories. When the number of files under a directory reaches tens of thousands or even hundreds of thousands, the traditional metadata storage and access mechanism struggles to meet high-concurrency and low-latency requirements and becomes a key bottleneck limiting CFS performance.

In current related implementations, a CFS typically employs a decentralized storage policy when handling large directory metadata. Specifically, the system stores directory entries and the corresponding index node (inode) information discretely across a plurality of data blocks or different storage nodes to achieve distributed management of metadata. During a directory traversal or statistics operation, the system needs to traverse the directory entries one by one and obtain the inode information corresponding to each file by looking up an index or a mapping relation.

However, the above implementation struggles to provide efficient metadata access in scenarios with massive numbers of files.

Disclosure of Invention

The application provides a directory entry processing method, a device, a storage medium, and a program product, which are used to solve the technical problem that efficient metadata access is difficult to achieve in scenarios with massive numbers of files.
In a first aspect, the present application provides a directory entry processing method applied to a cluster file system, where the cluster file system includes a resource group, and the resource group is a basic management unit for metadata and data storage. The method includes: receiving an operation for creating or adding a directory entry; updating a directory entry count in response to the operation, the directory entry count being used to record the total number of files and subdirectories under a target directory, the target directory being the directory at which the operation is directed; and, in the case where the value of the directory entry count satisfies a large directory condition, allocating a contiguous space including N blocks in the resource group and storing a new directory entry generated based on the operation in the contiguous space, where N is greater than 1. In this embodiment, when the system recognizes through a dynamic counter that the directory size has reached the large directory threshold, the storage mode for subsequently added metadata is switched from the default randomly dispersed mode to a centralized contiguous mode. This lays the foundation for efficient subsequent traversal of large directories: metadata that would otherwise require a large number of random I/O reads can be fetched quickly with a small number of sequential I/O operations, alleviating the I/O amplification problem at its root.
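As an illustration of the first aspect above, the counter-driven switch from scattered to contiguous allocation might be sketched as follows. This is a toy model under stated assumptions, not the application's implementation: the names `ResourceGroup`, `Directory`, `add_entry`, `LARGE_DIR_THRESHOLD` and `EXTENT_BLOCKS` are hypothetical, the resource group is reduced to a simple bump allocator, and each directory entry is assumed to occupy one block.

```python
from dataclasses import dataclass, field

LARGE_DIR_THRESHOLD = 4  # hypothetical preset value for the large directory condition
EXTENT_BLOCKS = 4        # hypothetical N (must be greater than 1)


class ResourceGroup:
    """Toy resource group: hands out block numbers in increasing order."""

    def __init__(self, total_blocks: int) -> None:
        self.total_blocks = total_blocks
        self.next_free = 0

    def alloc_contiguous(self, n: int) -> range:
        if self.next_free + n > self.total_blocks:
            raise MemoryError("resource group exhausted")
        start = self.next_free
        self.next_free += n
        return range(start, start + n)


@dataclass
class Directory:
    entry_count: int = 0                          # directory entry count
    entries: dict = field(default_factory=dict)   # entry name -> block number
    extent: list = field(default_factory=list)    # unused blocks of the current contiguous space


def add_entry(rg: ResourceGroup, d: Directory, name: str) -> int:
    """Update the count, then place the new entry according to directory size."""
    d.entry_count += 1
    if d.entry_count >= LARGE_DIR_THRESHOLD:
        if not d.extent:
            # Large directory: reserve N contiguous blocks at once.
            d.extent = list(rg.alloc_contiguous(EXTENT_BLOCKS))
        block = d.extent.pop(0)
    else:
        # Small directory: one block at a time, which interleaves with
        # allocations made on behalf of other directories.
        block = rg.alloc_contiguous(1)[0]
    d.entries[name] = block
    return block
```

When entries are added alternately to two directories, the first three entries of each directory land on interleaved blocks, while every entry added after the threshold is reached is placed in that directory's own run of consecutive blocks.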
In one possible implementation, the value of the directory entry count satisfying the large directory condition includes: the value of the directory entry count being greater than or equal to a preset value. The resource group includes a metadata area and a normal data area, the metadata area being used for storing metadata of directory entries and the normal data area being used for storing data of normal files. Allocating a contiguous space including N blocks in the resource group includes: allocating a contiguous space including N blocks in the metadata area of the resource group. In this implementation, through the coordinated design of the storage structure, the recognition logic, and the allocation mechanism, the metadata of a large directory is stored centrally and contiguously in a dedicated physical area. This replaces the randomly scattered metadata layout of the traditional scheme, so that when directory traversal or space statistics is subsequently performed on massive numbers of files, the large number of random disk I/O operations originally required is converted into a small number of efficient sequential I/O operations, improving the metadata access performance of the system in large directory scenarios.
In one possible implementation, the method further includes: receiving a directory traversal instruction for the target directory; acquiring metadata of a directory entry based on the directory traversal instruction, where the metadata of the directory entry includes a name of the directory entry and an address of an index block; And under the condition that the continuous space where the address of the index block is located indicates th