CN-121658437-B - Distributed file system recycle bin performance optimization method

Abstract

The invention relates to a performance optimization method for a distributed file system recycle bin. During execution of the Linux file system unlink interface, the method sequentially determines whether the recycle bin function is enabled, whether the file can be deleted, and whether the file should be placed in the recycle bin, and acquires the metadata of the parent directory of the file to be deleted from the server. A globally unified root recycle bin directory is generated together with the file system when the file system is created. When a file is deleted, a recycle bin subdirectory named after the parent directory inode number is created on the target MDS, and the file is moved into that subdirectory by an atomic operation through the standard Linux rename interface, so that the operation is confined to a single MDS. When a file is restored, the parent directory inode number is extracted and the metadata is queried iteratively to rebuild the original complete path. During batch restoration, if the original directory does not exist or the files are to be restored to a new directory, the corresponding recycle bin directory is moved as a whole; otherwise the files are restored one by one. The method significantly improves file deletion and restoration performance, reduces resource overhead, and ensures data consistency.

Inventors

  • YAN XIN
  • DONG BO
  • CAO XUEGUI
  • HUANG YAONIAN

Assignees

  • 四川省华存智谷科技有限责任公司

Dates

Publication Date
2026-05-05
Application Date
2026-02-06

Claims (10)

  1. A distributed file system recycle bin performance optimization method, characterized by comprising the following steps: S1, during execution of the unlink standard processing interface of the Linux file system, sequentially determining whether the directory containing the file has the recycle bin function enabled, whether the file can be deleted, and whether the file needs to be placed in the recycle bin, and acquiring the metadata of the parent directory of the target file to be deleted from the server through the file path, wherein the metadata includes the parent directory inode number; S2, synchronously generating, when the file system is created, a globally unified root recycle bin directory for storing all deleted files; when a file deletion operation is detected, sending a directory creation request to the target MDS, using the parent directory inode number as the directory name and the index number of the MDS to which the parent directory metadata belongs as the routing parameter; S3, the client initiating an atomic move operation based on the standard rename interface of the Linux file system to transfer the target file directly into the recycle bin directory; S4, when a file restoration operation is initiated, extracting, from the path of the target file in the recycle bin, the parent directory name, namely the parent directory inode number, querying the corresponding metadata by that inode number to obtain the inode number of the next higher parent directory, and traversing iteratively to obtain the complete path from the original directory to the root directory; S5, acquiring the original directory of the deleted file: if the original directory path does not exist or the files are to be restored to a specified new directory, initiating a rename atomic operation to move the corresponding recycle bin directory to the target location as a whole; if the original directory path exists, repeating the operation of step S4 for each file in the directory.
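The delete and batch-restore flow of claim 1 (S2, S3, S5) can be sketched on a single host. This is a minimal illustration, not the patented implementation: it assumes a local POSIX file system in place of the distributed one, so `os.rename` is atomic only because source and target sit on the same file system (mirroring the patent's point that both entries live on one MDS). The root directory name `.recycle` and all function names are hypothetical.

```python
import os
import tempfile

# Single-host sketch of claim 1. ASSUMPTION: a local POSIX file system stands in
# for the distributed one; ".recycle" is a hypothetical root recycle bin name.

def ensure_root_recycle_bin(fs_root: str) -> str:
    """S2 (first part): create the global root recycle bin once."""
    root = os.path.join(fs_root, ".recycle")
    os.makedirs(root, exist_ok=True)
    return root

def delete_to_recycle_bin(path: str, recycle_root: str) -> str:
    """S2 (second part) + S3: move a file into <recycle_root>/<parent inode>/."""
    parent = os.path.dirname(os.path.abspath(path))
    parent_ino = os.stat(parent).st_ino          # parent directory inode number
    bin_dir = os.path.join(recycle_root, str(parent_ino))
    os.makedirs(bin_dir, exist_ok=True)          # subdirectory named by that inode number
    target = os.path.join(bin_dir, os.path.basename(path))
    if os.path.exists(target):
        # os.rename would silently replace an existing file on POSIX; the source
        # does not say how name collisions are handled, so this sketch refuses.
        raise FileExistsError(target)
    os.rename(path, target)                      # atomic within one file system
    return target

def restore_batch(bin_dir: str, original_dir: str) -> None:
    """S5: move the whole directory if the destination is absent, else file by file."""
    if not os.path.exists(original_dir):
        os.rename(bin_dir, original_dir)         # one rename restores every file
        return
    for name in os.listdir(bin_dir):
        os.rename(os.path.join(bin_dir, name), os.path.join(original_dir, name))
```

Naming the subdirectory after the parent inode number is what allows S5 to restore a whole directory with a single rename when the destination does not exist.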
  2. The method for optimizing the performance of a distributed file system recycle bin according to claim 1, wherein in step S1 the sequential determination of whether the directory containing the file has the recycle bin function enabled, whether the file can be deleted, and whether the file needs to be placed in the recycle bin proceeds as follows: S11, determining, through a directory metadata query and configuration, whether the directory containing the file has the recycle bin function enabled; S12, confirming whether the file is allowed to be deleted through permission verification, file occupancy detection and lock state detection; S13, confirming whether the file needs to be brought under recycle bin management through system key file detection and storage policy matching; S14, outputting the final determination result: if the system key file detection, the lock state detection and the storage policy matching detection all pass, the file is determined to require placement in the recycle bin and the subsequent recycle bin directory creation and file move process begins; if any detection fails, the recycle bin process is terminated and the file is deleted according to the conventional unlink logic.
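Step S14 amounts to a short-circuit chain: the checks run in order and the first failure hands the file back to conventional unlink. A minimal sketch, assuming each check is supplied as a labelled callable; the labels and the `choose_path` function are hypothetical.

```python
from typing import Callable, List, Tuple

# A check is a (label, predicate) pair; labels here are illustrative only.
Check = Tuple[str, Callable[[], bool]]

def choose_path(checks: List[Check]) -> Tuple[str, List[str]]:
    """Run the S11-S13 checks in order; return (decision, labels of checks run)."""
    ran: List[str] = []
    for label, check in checks:
        ran.append(label)
        if not check():
            # S14: any failure ends the recycle bin flow; the file is then
            # handled by the conventional unlink logic.
            return "unlink", ran
    return "recycle", ran
```

Evaluating lazily means an expensive check (for example a storage policy lookup) is skipped once an earlier, cheaper check has already failed.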
  3. The method for optimizing the performance of a distributed file system recycle bin according to claim 2, wherein step S12 proceeds as follows: S121, acquiring the user account that initiated the unlink operation and the user groups to which it belongs, querying the access control list of the target file, and confirming whether the user has delete permission; S122, determining the occupancy state of the file by checking whether its reference count is 0, where 0 indicates unoccupied and 1 indicates occupied by the system or an application; if the file is not occupied, proceeding to the check in S123; S123, checking whether the file carries the special undeletable attribute: if it does, the file is determined to be undeletable regardless of permissions; if it does not, the file is determined to be deletable, having passed the permission and occupancy checks, and the next determination begins.
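The three S12 checks can be expressed over a metadata record. The `FileMeta` fields, the `"delete"` permission string and the `group:` prefix for group principals are hypothetical; the sketch also treats any non-zero reference count as occupied, a slight generalization of the 0/1 values stated in S122.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class FileMeta:
    # Hypothetical metadata record; field names are illustrative only.
    acl: Dict[str, Set[str]] = field(default_factory=dict)  # principal -> permissions
    ref_count: int = 0         # S122: 0 means unoccupied
    undeletable: bool = False  # S123: special "cannot be deleted" attribute

def can_delete(meta: FileMeta, user: str, groups: Set[str]) -> bool:
    # S121: the user or one of its groups must hold a "delete" permission.
    principals = {user} | {"group:" + g for g in groups}
    if not any("delete" in meta.acl.get(p, set()) for p in principals):
        return False
    # S122: a file with a non-zero reference count is treated as occupied.
    if meta.ref_count != 0:
        return False
    # S123: the special attribute overrides any permission.
    return not meta.undeletable
```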
  4. The method for optimizing the performance of a distributed file system recycle bin according to claim 2, wherein in step S1 the metadata of the parent directory of the target file is acquired from the server through the file path as follows: S15, parsing the file path and locating the parent directory, including standardizing the file path, extracting the parent directory path and generating a check code for the parent directory path; S16, routing to the target MDS and querying the metadata; S17, the target MDS performing an integrity check on the metadata; after the check passes, the target MDS digitally signs the metadata with the node private key, packages the signed metadata, the signature information and a response timestamp into a response packet, and returns the packet to the client.
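Step S15 does not name a normalization rule or a check-code algorithm. The sketch below assumes lexical POSIX normalization with `posixpath.normpath` and uses CRC-32 from `zlib` as a stand-in check code; `locate_parent` is a hypothetical name. Lexical normalization resolves `..` without consulting the file system, so it can disagree with the real path when symbolic links are involved.

```python
import posixpath
import zlib
from typing import Tuple

def locate_parent(path: str) -> Tuple[str, str, int]:
    """S15 sketch: normalize the path, split off the parent, compute a check code."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    norm = posixpath.normpath(path)   # collapse repeated '/', '.', and '..'
    if norm.startswith("//"):         # POSIX normpath keeps exactly two leading slashes
        norm = "/" + norm.lstrip("/")
    parent = posixpath.dirname(norm) or "/"
    # ASSUMPTION: CRC-32 as the check code; the source does not name an algorithm.
    check = zlib.crc32(parent.encode("utf-8"))
    return norm, parent, check
```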
  5. The method for optimizing the performance of a distributed file system recycle bin according to claim 4, wherein step S16 proceeds as follows: S161, locating, through path routing, the MDS node responsible for the parent directory; S162, sending a parent directory metadata query request to the target MDS; S163, the target MDS executing the metadata query.
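The source leaves the S161 routing rule unspecified. One common scheme, assumed here purely for illustration, is a static subtree map resolved by longest-prefix match on whole path components; `route_to_mds` and the map contents are hypothetical.

```python
from typing import Dict

def route_to_mds(parent_path: str, subtree_map: Dict[str, int], default_mds: int = 0) -> int:
    """Return the MDS index owning the longest subtree prefix of parent_path."""
    best, best_len = default_mds, -1
    for prefix, mds in subtree_map.items():
        p = prefix.rstrip("/") or "/"
        # Match whole components only, so "/data/proj" does not own "/data/projects".
        if p == "/" or parent_path == p or parent_path.startswith(p + "/"):
            if len(p) > best_len:
                best, best_len = mds, len(p)
    return best
```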
  6. The method for optimizing the performance of a distributed file system recycle bin according to claim 1, wherein in step S2 the globally unified root recycle bin directory for storing all deleted files is synchronously generated at file system creation as follows: S21, presetting the configuration parameters of the root recycle bin directory, including the directory path, the directory identifier, the policy for the owning MDS node and the upper limit of the storage quota; S22, creating and registering the root recycle bin directory, including cluster initialization coordination and determination of the master MDS managing the root directory, after which the master MDS node performs the initial creation of the root directory, master-slave metadata synchronization and consistency verification, and cluster-wide directory reachability registration; S23, configuring permissions and activating the function, including binding the permission policy, activating the storage policy, verifying function availability and feeding back the creation result.
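The S21 parameters map naturally onto an immutable configuration record. Field names, defaults and the quota check below are hypothetical; the source names the parameters but not their representation or how the quota is enforced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RootRecycleBinConfig:
    # S21 parameters; names and defaults are hypothetical.
    path: str = "/.recycle"
    dir_id: str = "recycle-root"
    mds_policy: str = "root-master"  # which MDS node manages the root recycle bin
    quota_bytes: int = 1 << 40       # storage quota upper limit (1 TiB)

    def validate(self) -> None:
        if not self.path.startswith("/"):
            raise ValueError("path must be absolute")
        if self.quota_bytes <= 0:
            raise ValueError("quota must be positive")

    def admits(self, used_bytes: int, incoming_bytes: int) -> bool:
        """Would moving incoming_bytes into the bin stay within the quota?"""
        return used_bytes + incoming_bytes <= self.quota_bytes
```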
  7. The method for optimizing the performance of a distributed file system recycle bin according to claim 1, wherein step S3 proceeds as follows: S31, the client gathering the key data from the preceding steps, determining the core parameters of the rename operation, including the source path, the target path and the target MDS index, and pre-configuring the atomic operation; S32, initiating and executing the atomic rename operation; S33, feeding back the result of the atomic rename operation and synchronizing state.
  8. The method for optimizing the performance of a distributed file system recycle bin according to claim 7, wherein step S32 proceeds as follows: S321, establishing a dedicated communication channel with the target MDS; S322, constructing and sending the rename atomic operation request; S323, the target MDS executing the atomic move operation.
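Claims 7 and 8 describe a request carrying the source path, target path and target MDS index, executed atomically by that MDS. The toy model below assumes an in-memory map of paths to inode numbers guarded by a lock; `RenameRequest` and `ToyMDS` are hypothetical, and a real MDS would also journal the change. Because the recycle bin subdirectory was created on the MDS that owns the parent directory, both directory entries are local to one MDS and no cross-MDS transaction is needed.

```python
import threading
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class RenameRequest:
    # S31: core parameters assembled by the client (field names hypothetical).
    src: str
    dst: str
    mds_index: int

class ToyMDS:
    """In-memory stand-in for one MDS: maps paths to inode numbers."""

    def __init__(self, index: int, entries: Dict[str, int]):
        self.index = index
        self.entries = dict(entries)
        self._lock = threading.Lock()

    def execute(self, req: RenameRequest) -> bool:
        # Reject requests routed to the wrong MDS.
        if req.mds_index != self.index:
            return False
        # S323: both entries live on this MDS, so the move is one update made
        # under a single lock; concurrent execute() calls cannot interleave.
        with self._lock:
            if req.src not in self.entries or req.dst in self.entries:
                return False
            self.entries[req.dst] = self.entries.pop(req.src)
            return True
```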
  9. The method for optimizing the performance of a distributed file system recycle bin according to claim 1, wherein step S4 proceeds as follows: S41, parsing the recycle bin path and extracting the core inode number: when the client receives a file restoration request, it acquires the complete path of the target file in the recycle bin and extracts the original parent directory inode number; S42, iteratively querying the metadata to trace back the complete original path; S43, binding the restored original complete path to the corresponding recycle bin path and caching it locally on the client, so that if the same file is restored again the cached path is reused directly, reducing the cost of repeated iterative queries.
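Step S43 is a client-side memo from recycle bin path to original path. A minimal sketch, assuming the S42 lookup is injected as a callable; `RestorePathCache` is hypothetical, and the `invalidate` method covers a concern the source does not discuss (a cached path goes stale if an ancestor directory is later renamed).

```python
from typing import Callable, Dict

class RestorePathCache:
    """S43 sketch: remember recycle bin path -> original path on the client."""

    def __init__(self, trace: Callable[[str], str]):
        self._trace = trace              # expensive S42 iterative lookup (injected)
        self._cache: Dict[str, str] = {}

    def original_path(self, bin_path: str) -> str:
        if bin_path not in self._cache:
            self._cache[bin_path] = self._trace(bin_path)
        return self._cache[bin_path]

    def invalidate(self, bin_path: str) -> None:
        # Not described in the source: drop an entry once the file leaves the
        # recycle bin or its original ancestors are renamed.
        self._cache.pop(bin_path, None)
```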
  10. The method for optimizing the performance of a distributed file system recycle bin according to claim 9, wherein step S42 proceeds as follows: S421, initializing the traceback parameters, including the inode number currently to be queried, the path component list, the termination flag and the home MDS node to be located; S422, iteratively querying the next higher inode in a loop and splicing the path; S423, splicing and verifying the original complete path.
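The traceback loop of claim 10 can be sketched over an in-memory inode table. The table layout (inode mapped to parent inode and entry name, with `None` as the root's parent), the `.recycle/<inode>/<name>` path shape and the `max_depth` cycle guard are assumptions; in the described system each lookup would be a metadata query to the inode's home MDS, which this sketch omits.

```python
import posixpath
from typing import Dict, List, Optional, Tuple

# Hypothetical metadata view: inode -> (parent inode, entry name).
# The root directory is recorded with parent None.
Meta = Dict[int, Tuple[Optional[int], str]]

def trace_original_path(bin_path: str, meta: Meta, max_depth: int = 4096) -> str:
    # S41: the recycle bin subdirectory name is the original parent inode number.
    parent_dir, file_name = posixpath.split(bin_path)
    ino: Optional[int] = int(posixpath.basename(parent_dir))
    components: List[str] = []       # S421: path components, collected leaf to root
    for _ in range(max_depth):       # S422: one metadata query per level
        if ino is None:
            break                    # termination flag: the root has been passed
        parent, name = meta[ino]     # real system: query the inode's home MDS
        if parent is not None:
            components.append(name)  # the root itself contributes no component
        ino = parent
    else:
        raise RuntimeError("root not reached; metadata may contain a cycle")
    # S423: splice components from root to leaf and append the file name.
    return posixpath.join("/", *reversed(components), file_name)
```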

Description

Technical Field

The invention belongs to the technical field of recycle bin performance optimization, and in particular relates to a distributed file system recycle bin performance optimization method.

Background

With the wide application of distributed file systems in fields such as big data and cloud computing, file deletion and restoration have become core scenarios in day-to-day system operation and maintenance. The recycle bin is a key function for safeguarding data, and its design directly affects the performance and user experience of a distributed file system. In the prior art, distributed file system recycle bins are implemented mainly in two ways: in one, the file is moved to the recycle bin on deletion and its original path is recorded in a file attribute for later restoration; in the other, the path information is stored in a separate database and restoration is completed by querying that database.
However, a distributed file system typically deploys multiple metadata servers (MDSs), each hosting a portion of the file system metadata, and the prior-art approaches above have significant drawbacks. First, deletion latency is high: moving a file to the recycle bin requires coordination across several MDSs, which markedly increases the response latency of deletion, and the performance bottleneck becomes prominent under highly concurrent deletion. Second, restoration is complex and inefficient: the file's extended attributes or the path information stored in the database must be read additionally, which complicates metadata operations and further lengthens restoration latency. Third, resource overhead is large: storing path information in a database or recording it in file attributes consumes extra storage, makes data consistency harder to maintain, and leaves files unrecoverable if the path information is lost or corrupted. Fourth, compatibility and atomicity are insufficient: some schemes do not use the standard Linux file system interfaces and therefore integrate poorly with the existing ecosystem, and the lack of atomicity guarantees during file moves can leave intermediate states such as damaged files or lost paths. The technical problem to be solved is therefore how to optimize the deletion and restoration flow of the recycle bin in a distributed file system architecture, reducing cross-MDS coordination and metadata overhead and improving batch restoration performance, while guaranteeing the atomicity and data consistency of operations.
Disclosure of Invention

The invention aims to provide a distributed file system recycle bin performance optimization method that optimizes the deletion and restoration flow of the recycle bin in a distributed file system architecture, reduces cross-MDS coordination, reduces metadata overhead, improves batch restoration performance, and guarantees the atomicity and data consistency of operations. The technical solution adopted by the invention is as follows: a distributed file system recycle bin performance optimization method, comprising the following steps: S1, during execution of the unlink standard processing interface of the Linux file system, sequentially determining whether the directory containing the file has the recycle bin function enabled, whether the file can be deleted, and whether the file needs to be placed in the recycle bin, and acquiring the metadata of the parent directory of the target file to be deleted from the server through the file path, wherein the metadata includes the parent directory inode number; S2, synchronously generating, when the file system is created, a globally unified root recycle bin directory for storing all deleted files; when a file deletion operation is detected, sending a directory creation request to the target MDS, using the parent directory inode number as the directory name and the index number of the MDS to which the parent directory metadata belongs as the routing parameter; S3, the client initiating an atomic move operation based on the standard rename interface of the Linux file system to transfer the target file directly into the recycle bin directory; S4, when a file restoration operation is initiated, extracting, from the path of the target file in the recycle bin, the parent directory name, namely the parent directory inode number, querying the corresponding metadata by that inode number to obtain the inode number of the next higher parent directory, and traversing iteratively to obtain a complete path from the origi