CN-122018787-A - Method for effectively improving the read-write efficiency and concurrency of small-file storage
Abstract
The invention discloses a method for effectively improving the read-write efficiency and concurrency of small-file storage. The method comprises the following steps: merging a plurality of small files in storage and packing them into a storage block; and introducing a lock mechanism and read-write priority control to realize concurrency control, so that concurrent access to small files by multiple threads or processes does not cause data conflicts while concurrency capability is maximized. The method merges small files to reduce storage fragmentation, optimizes the scheduling algorithm to improve access response speed, and controls concurrency to avoid data conflicts. It thereby effectively improves the read-write efficiency and concurrent processing capability of small-file storage, and is suitable for storage scenarios dense with small files.
Inventors
- WANG SHIWEI
- ZHAO TIANCHENG
- WU HANGFENG
- SHENG GUOLIN
- LV LIANXIN
Assignees
- 杭州联汇科技股份有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20251230
Claims (5)
- 1. A method for effectively improving the read-write efficiency and concurrency of small-file storage, characterized by comprising the following steps: step one, merging a plurality of small files in storage and packing them into a storage block; step two, adjusting the read-write scheduling algorithm according to the access pattern of the files, so that frequently accessed files are loaded preferentially; and step three, introducing a lock mechanism and read-write priority control to realize concurrency control, so that concurrent access to small files by multiple threads or processes does not cause data conflicts while concurrency capability is maximized.
- 2. The method for effectively improving the read-write efficiency and concurrency of small-file storage according to claim 1, wherein the plurality of small files are merged in step one as follows: the size of each file in the storage system is set to [symbol missing], where i is the file index, and the size of each file satisfies: [formula missing], where [symbol missing] is the maximum capacity of the storage block; the files to be merged are then packed into the storage block to complete the merge.
- 3. The method for effectively improving the read-write efficiency and concurrency of small-file storage according to claim 2, wherein, in the step of merging a plurality of small files, a merge factor M is defined to represent the number of files merged into one storage block, and the following condition is satisfied: the merge factor M is dynamically adjusted to accommodate variation in file sizes, so that as many small files as possible are merged without exceeding the storage block size.
- 4. The method for effectively improving the read-write efficiency and concurrency of small-file storage according to any one of claims 1-3, wherein the read-write scheduling algorithm is adjusted in step two as follows: priority functions [symbol missing] and [symbol missing] are set for the access pattern, representing the priorities of read and write operations respectively, and the priority function is calculated from the file's access history and access frequency: [formula missing], where [symbol missing] and [symbol missing] are the read and write rates of the file, each calculated as the ratio of the file's access frequency to its past read or write time.
- 5. The method for effectively improving the read-write efficiency and concurrency of small-file storage according to any one of claims 1-3, wherein, in the concurrency control of step three, a thread concurrency degree [symbol missing] is set, representing the maximum number of concurrent threads that the system can handle simultaneously; the concurrency control strategy ensures the atomicity of each read-write operation through a lock mechanism, and a concurrency scheduling function [symbol missing] is defined, whose value depends on the current number of threads in the system and the task queue size: [formula missing], where [symbol missing] is the length of the waiting access queue corresponding to file [symbol missing].
Description
Method for effectively improving the read-write efficiency and concurrency of small-file storage
Technical Field
The invention relates to the technical field of storage, in particular to a method for effectively improving the read-write efficiency and concurrency of small-file storage.
Background
Traditional storage systems were not designed with large numbers of small files in mind, and they face significant performance bottlenecks when processing such files. On the one hand, read-write efficiency is low: the characteristics of the storage medium are poorly matched to the random access pattern of small files, so a single file operation requires multiple independent storage accesses, which amplifies IO overhead. On the other hand, concurrent access is poorly managed: a centralized metadata architecture easily forms access hot spots and struggles to support metadata query and operation demands under high concurrency, which further degrades overall system responsiveness. These problems become more prominent as the data scale grows, and they limit efficiency gains in related application scenarios.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a method for effectively improving the read-write efficiency and concurrency of small-file storage. Through file merging, scheduling algorithm optimization, and a concurrency control mechanism, it resolves the performance bottleneck that traditional storage systems encounter when processing large numbers of small files.
To achieve this purpose, the invention provides the following technical scheme: a method for effectively improving the read-write efficiency and concurrency of small-file storage, comprising the following steps: step one, merging a plurality of small files in storage and packing them into a storage block; step two, adjusting the read-write scheduling algorithm according to the access pattern of the files, so that frequently accessed files are loaded preferentially; and step three, introducing a lock mechanism and read-write priority control to realize concurrency control, so that concurrent access to small files by multiple threads or processes does not cause data conflicts while concurrency capability is maximized.
As a further improvement of the invention, the plurality of small files are merged in step one as follows: the size of each file in the storage system is set to [symbol missing], where i is the file index, and the size of each file satisfies: [formula missing], where [symbol missing] is the maximum capacity of the storage block; the files to be merged are then packed into the storage block to complete the merge.
As a further improvement of the invention, in the step of merging a plurality of small files, a merge factor M is defined to represent the number of files merged into one storage block, and the following condition is satisfied: the merge factor M is dynamically adjusted to accommodate variation in file sizes, so that as many small files as possible are merged without exceeding the storage block size.
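The merging step described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Block` class, the `pack_small_files` function, the smallest-first packing order, and the example file names and sizes are all assumptions introduced here for illustration. Each block keeps an (offset, length) index entry per file, and the merge factor M is taken to be the number of entries in a block.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical storage block holding several merged small files."""
    capacity: int                                # maximum block capacity, in bytes
    used: int = 0                                # bytes already occupied
    index: dict = field(default_factory=dict)    # file name -> (offset, length)

    @property
    def merge_factor(self) -> int:
        """M: the number of files merged into this block."""
        return len(self.index)


def pack_small_files(sizes: dict, capacity: int) -> list:
    """Greedily pack files into blocks without exceeding `capacity`.

    Smallest-first ordering is an assumption made for this sketch so that
    early blocks hold many files; the source does not name a packing order.
    """
    blocks = []
    current = Block(capacity)
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
        if size > capacity:
            raise ValueError(f"{name} ({size} B) exceeds the block capacity")
        if current.used + size > capacity:
            # The current block is full: close it and open a new one.
            blocks.append(current)
            current = Block(capacity)
        current.index[name] = (current.used, size)
        current.used += size
    if current.index:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    files = {"a.json": 300, "b.png": 700, "c.txt": 100, "d.log": 500}
    for i, block in enumerate(pack_small_files(files, capacity=1024)):
        print(i, block.merge_factor, block.used, block.index)
```

In this sketch M is not fixed in advance; it falls out of the file sizes encountered, which is one simple way to read "dynamically adjusted". A real system would also need to persist the index and handle deletions, which the sketch leaves out.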
As a further improvement of the invention, the read-write scheduling algorithm is adjusted in step two as follows: priority functions [symbol missing] and [symbol missing] are set for the access pattern, representing the priorities of read and write operations respectively, and the priority function is calculated from the file's access history and access frequency: [formula missing], where [symbol missing] and [symbol missing] are the read and write rates of the file, each calculated as the ratio of the file's access frequency to its past read or write time.
As a further improvement of the invention, in the concurrency control of step three, a thread concurrency degree [symbol missing] is set, representing the maximum number of concurrent threads that the system can handle simultaneously. The concurrency control strategy ensures the atomicity of each read-write operation through a lock mechanism, and a concurrency scheduling function [symbol missing] is defined, whose value depends on the current number of threads in the system and the task queue size: [formula missing], where [symbol missing] is the length of the waiting access queue corresponding to file [symbol missing].
The advantages of the method are as follows: merging a plurality of small files into a storage block reduces the number of IO operations of the storage system and significantly improves read-write efficiency; adjusting the scheduling algorithm according to the access pattern loads frequently accessed files preferentially and optimizes resource allocation; introducing a lock mechanism and priority control realizes concurrency control, guaranteeing data consistency under multi-thread access while maximizing concurrency capability; and the performance bottleneck of traditional storage systems is effectively resolved through multi-dimensional collaborative optimi