CN-121996615-A - Distributed file management system and method
Abstract
A distributed file management system and method, belonging to the technical field of computer software. The system organizes files in a tree hierarchy and comprises an application layer, a file system layer, a data layer and an infrastructure layer. The application layer receives file operation requests from a plurality of independent clients, each request carrying a unique application identifier. The file system layer comprises configuration, operation, metadata management, log, timing task and gateway modules, and routes each request to the corresponding logical storage partition in the data layer according to the application identifier. The data layer comprises a database cluster and an object storage cluster, and stores and manages metadata and file data with logical isolation according to the application identifier. The infrastructure layer provides extensible server and network resources. Based on a distributed architecture, the invention achieves load balancing and efficient access through data storage partitioning, metadata management and indexing, and client-side load balancing, and improves the scalability, fault tolerance and performance of the system.
Inventors
- ZHUANG LI
Assignees
- 华泰证券股份有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20251219
Claims (9)
- 1. A distributed file management system, characterized by comprising an application layer, a file system layer, a data layer and an infrastructure layer, wherein: the application layer is used for receiving file operation requests from a plurality of independent clients, each request carrying a unique application identifier of the application program from which the request originates; the file system layer is coupled to the application layer, comprises a configuration module, an operation module, a metadata management module, a log module, a timing task module and a gateway module, and is used for routing the file operation requests to the logical storage partition in the data layer corresponding to the application identifier; the data layer is coupled to the file system layer, comprises a database cluster and an object storage cluster, and is used for persistently storing and managing file metadata and file data from different applications with logical isolation according to the application identifier; and the infrastructure layer is used for providing extensible server and network resources for the data layer and the file system layer.
- 2. The distributed file management system according to claim 1, wherein the application layer comprises an application software development kit (SDK) that provides the application with a programming interface for file operations.
- 3. The distributed file management system of claim 2, wherein in the file system layer: the configuration module is connected with the database, and is configured to receive an externally input configuration command, process configuration information according to that command, store the configuration information in the database, and request the log module to add operation log information; the operation module is connected with the database and the object storage, and is used for receiving externally input file query, search and download commands, querying information from the database according to the commands, and acquiring file content from the object storage; the metadata management module is connected with the application program, the gateway module, the log module and the database, and is used for executing metadata operation commands on files, including adding, querying, modifying and deleting file metadata; the log module is connected with the configuration module, the metadata management module and the database, and is used for receiving externally input add-log and query-log commands, and accordingly storing operation log information in the database or querying log information from it; the timing task module is connected with the configuration module, the metadata management module, the database and the object storage, and is used for executing delayed deletion of files and cleaning of the recycle bin according to a preset time period; and the gateway module is connected with the configuration module, the metadata management module, the log module and the object storage, and is used for providing communication between the modules based on the HTTP or HTTPS protocol.
- 4. The distributed file management system of claim 3, wherein in the data layer: the database cluster is used for persisting configuration information, file metadata, delayed deletion information, timing task execution history and operation log information; and the object storage cluster is used for storing file data bodies and supports the S3 protocol, wherein the object storage cluster achieves logical isolation of file data by allocating independent storage buckets to the file data of different applications, supports host-level and rack-level fault domains by adopting multi-copy and erasure-code redundancy mechanisms, and performs a CRC (cyclic redundancy check) when data is read and written online so as to prevent data read and write errors.
- 5. The distributed file management system according to claim 4, wherein the following data distribution policies are employed: metadata is distributed to different database nodes according to the application, and is mapped to different database tables by applying a hash function to the file identifier, so that data is distributed uniformly; and when a large file is uploaded, the file is divided into fixed-size data blocks by means of the blocking mechanism of the object storage, and the data blocks are distributed to different storage nodes to achieve load balancing at the data block level.
- 6. The distributed file management system according to claim 3, wherein service registration and discovery and client load balancing techniques are employed at the application SDK end: each module of the system registers its address and metadata with a service registration center when it starts, and the client dynamically queries the service registration center to obtain a list of available services, so that service addresses are discovered elastically; and under the client load balancing technique, the client autonomously selects a target module address from the available service list for request distribution according to one of several load balancing strategies, including polling (round-robin), random and weighted, so as to achieve load balancing.
- 7. A distributed file management system according to claim 6 wherein zero copy, non-blocking IO, IO multiplexing and connection multiplexing techniques are used in the application SDK.
- 8. The distributed file management system according to claim 1, wherein the distributed file management system provides a configuration page, an operation page and a log query page which interact with a user, the configuration page, the operation page and the log query page are respectively connected with the configuration module, the operation module and the log module, the user can perform corresponding operations through corresponding pages after passing identity authentication and permission verification, and the configuration page, the operation page and the log query page support interaction forms including a browser.
- 9. A method of distributed file management, characterized in that it is based on a distributed file management system according to any of claims 1 to 8 and comprises the following steps: step 1, receiving file operation requests from a client through the application layer, wherein each request carries a unique application identifier of the application program from which the request originates; step 2, the file system layer authenticates and parses the request, extracts the application identifier, and routes the request to the corresponding logical storage space in the data layer according to the application identifier, routing the metadata operation to the database partition corresponding to the application identifier for execution, and routing the file data operation to the object storage space corresponding to the application identifier for execution; step 3, executing the operation indicated by the file operation request in the corresponding storage partition in the data layer; and step 4, recording log information for the current file operation, triggering a corresponding life cycle management task according to the operation type, and returning the execution result through the application layer to the client.
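The routing in step 2 of claim 9, combined with the data distribution policy of claim 5, can be illustrated with a minimal Python sketch. It is not the claimed implementation. The claims do not specify a hash function, a table count or any naming scheme, so `TABLES_PER_APP`, the SHA-256 hash, and the `db_`, `file_meta_` and `bucket-` name patterns are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass

# Assumption: the claims do not state how many tables each application uses.
TABLES_PER_APP = 16


@dataclass(frozen=True)
class Route:
    database_partition: str  # logical database partition for the application
    metadata_table: str      # table selected by hashing the file identifier
    bucket: str              # independent object-storage bucket for the application


def table_index(file_id: str, table_count: int = TABLES_PER_APP) -> int:
    """Map a file identifier to a table index using a stable hash."""
    digest = hashlib.sha256(file_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % table_count


def route(app_id: str, file_id: str) -> Route:
    """Resolve where one request's metadata and file data should go."""
    if not app_id:
        raise ValueError("request carries no application identifier")
    idx = table_index(file_id)
    return Route(
        database_partition=f"db_{app_id}",     # hypothetical naming scheme
        metadata_table=f"file_meta_{idx:02d}",  # hypothetical naming scheme
        bucket=f"bucket-{app_id}",             # hypothetical naming scheme
    )


if __name__ == "__main__":
    # Two applications storing a file with the same identifier stay isolated.
    print(route("app-a", "file-123"))
    print(route("app-b", "file-123"))
```

In this sketch the application identifier alone decides the database partition and the bucket, which is what keeps applications logically isolated. The file identifier only selects a table inside that partition. A stable hash is used rather than Python's built-in `hash()`, whose output for strings varies between processes.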
Description
Distributed file management system and method
Technical Field
The invention belongs to the technical field of computer software, and particularly relates to a distributed file management system and method.
Background
As the business of securities and financial companies continues to grow, data assets increase rapidly during IT system construction. The working-paper (manuscript) management system of a securities firm is an important information system for storing and managing the working papers of the sponsorship business. File storage in particular is a core component supporting the operation of that system, and it faces a number of technical challenges. The number of files in the business systems of a large securities institution ranges from tens of millions to billions, storage capacity ranges from TB to PB, and individual file sizes range from KB to GB. Both file information and file contents must be accessible in real time, and files and directories are organized in a tree hierarchy. A traditional single file server is limited in scalability, availability, fault tolerance, performance and capacity, and cannot meet these requirements. The development of distributed computing and network technology has paved the way for distributed file systems. Currently, the common distributed file storage systems are mainly HDFS and Ceph. HDFS, the Hadoop Distributed File System, is a distributed file system for storing large-scale data, designed specifically for big data workloads in Hadoop. In HDFS, files are divided into fixed-size data blocks (typically 128 MB or 256 MB) that are replicated across multiple DataNodes. It offers high fault tolerance and scalability and is suitable for storing large files.
However, because its design focuses on large-scale data storage and batch processing, HDFS is not suitable for low-latency access, and it stores and manages large numbers of small files inefficiently. Ceph is an open-source distributed storage system that provides block storage, object storage and file storage. The Ceph file system (CephFS) is a distributed file system built on RADOS, offering high reliability, high fault tolerance, scalability and multiple access interfaces. However, the metadata server (MDS) of CephFS, which manages the file system's metadata, is a key component and carries a risk of single-point failure: when the metadata server fails, switching over takes a relatively long time, during which the file system may be unavailable. When multiple clients access the file system concurrently, consistency and latency problems arise; for example, if one client suffers an abnormal condition such as a crash or power loss after reading a file, another client may hang when reading that file. CephFS also suffers performance degradation when managing very large numbers of files.
Disclosure of Invention
To address the availability, scalability, performance and cost challenges that the prior art faces in managing, storing and using massive numbers of files, the invention provides a distributed file management system and method. It aims to provide a logically unified, tree-structured file management view for a plurality of independent applications, and to solve the availability, scalability, performance and cost problems of massive file storage and management through a distributed architecture, multidimensional data isolation, intelligent load balancing and data redundancy strategies.
To achieve the above purpose, the invention adopts the following technical scheme: A distributed file management system organizes files in a tree hierarchy and comprises an application layer, a file system layer, a data layer, an infrastructure layer and a network resource layer, wherein: the application layer is used for receiving file operation requests from a plurality of independent clients, each request carrying a unique application identifier of the application program from which the request originates; the file system layer is coupled to the application layer, comprises a configuration module, an operation module, a metadata management module, a log module, a timing task module and a gateway module, and is used for routing the file operation requests to the logical storage partition in the data layer corresponding to the application identifier; the data layer is coupled to the file system layer, comprises a database cluster and an object storage cluster, and is used for persistently storing and managing file metadata and file data from different applications with logical isolation according to the application identifier; and the infrastructure layer is used for providing extensible server and network resources for the data layer and the file system layer. Further, the application layer includes an ap