CN-122018783-A - F2FS garbage recycling optimization method based on ZNS SSD
Abstract
The invention relates to an F2FS garbage collection optimization method based on ZNS SSD. A monitoring thread tracks the data failure (invalidation) rate of the ZNS SSD partitions; according to the intensity of the write load, the sampling period of the monitoring thread and the model used to predict the partition data failure rate are adjusted adaptively, so as to predict the data failure rate of each partition of the ZNS SSD. A suitable partition is then selected for reclamation according to the partition data effective rate, the partition data failure rate, and the current write load intensity. The heat of the data to be migrated is defined from several of its characteristics; the data is clustered by heat and placed, by class, into dedicated partitions. In addition, the migrated data is managed uniformly, which can significantly reduce redundant garbage collection and improve the running efficiency of the system.
Inventors
- LONG LINBO
- SHI KAIDI
- RAN JING
- SHEN JINGCHENG
- WANG HAOWEN
Assignees
- Chongqing University of Posts and Telecommunications (重庆邮电大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20251218
Claims (5)
- 1. An F2FS garbage collection optimization method based on ZNS SSD, characterized by comprising the following steps: S1, designing a load-based prediction method for the ZNS SSD partition data failure rate according to the write load intensity of F2FS, so as to predict the data failure rate of each partition of the ZNS SSD; S2, providing a garbage collection partition selection method based on multiple data features, which selects a suitable partition for reclamation according to the partition data effective rate, the partition data failure rate, and the current F2FS write load intensity, so as to reduce the amount of data migration; S3, for the data to be migrated during garbage collection, providing a partition allocation and data placement method based on the heat of the migrated data; and S4, for the migrated data, providing a unified management method that fits the overall garbage collection mechanism and reduces redundant garbage collection.
- 2. The garbage collection optimization method based on the multiple data features of claim 1, wherein the load-based prediction method for the ZNS SSD partition data failure rate in step S1 specifically includes: S11, constructing a monitoring thread for the partition data failure rate (ZIR), and initializing the sliding window size and the sampling frequency T of the monitoring thread; S12, as the workload subsequently changes, adjusting the sampling frequency T of the ZIR thread for different load intensities according to a formula in which Current IOPS represents the current IOPS, IOPS_max is the highest IOPS the system can reach, k is an adjustment coefficient with a value of 1 to 2, T_low represents the lowest sampling frequency, and T_high represents the highest sampling frequency; S13, after the ZIR monitoring thread is started, calculating the current ZIR from the amount of valid data in the partition at different moments, where valid_blks_T1 represents the amount of valid data in the partition at a time T1, and valid_blks_(T1+T) represents the amount of valid data in the partition after a period T has elapsed from T1; S14, predicting ZIR by using two models, namely the exponentially weighted moving average (EWMA) and linear regression.
Under medium- and low-intensity write loads (for example, write bandwidth below 70% of the maximum bandwidth), linear regression is used for prediction because the data failure trend is stable; under high-intensity write loads (for example, write bandwidth at or above 70% of the maximum bandwidth), the exponentially weighted moving average (EWMA) is used to respond to short-term trends because the data failure trend is unstable.
- 3. The garbage collection optimization method based on the multiple data features according to claim 1, wherein the garbage collection partition selection method based on the multiple data features in step S2 specifically comprises: S21, calculating the garbage collection urgency GC_Urgency, the valid data block ratio of the partition, and the partition data failure rate according to the write load intensity of the current system, and designing a garbage collection partition selection algorithm based on multiple data features; S22, after garbage collection is started, traversing all partitions to be reclaimed and calculating the reclamation cost of each partition with the newly designed partition selection algorithm; S23, comparing the reclamation costs of the partitions to be reclaimed and selecting the partition with the minimum reclamation cost for data migration.
- 4. The garbage collection optimization method based on the multiple data features of claim 1, wherein the partition allocation and data placement method based on the heat of the migrated data in step S3 specifically comprises: S31, recording three data characteristics of the data to be migrated to define the heat of the data: the update frequency (the more frequently the data is updated, the higher the heat), the update recency (the more recently the data was updated, the higher the heat), and the predicted future failure rate zir_pre (the higher the future failure rate of the data, the higher the heat); S32, setting 3 center points for the k-means algorithm according to the three data characteristics recorded in step S31; S33, for each piece of data to be migrated, calculating the distances between its data characteristics and the 3 cluster centers, and assigning the data to the cluster of the closest center; S34, for each cluster, calculating the average of the data characteristics in the cluster and taking the average as the new cluster center; S35, repeating steps S33 and S34 until the cluster centers no longer change or the maximum number of iterations is reached, whereby the data to be migrated is classified into three types, namely cold data, warm data and hot data; S36, adding a partition type, the GC Zone, dedicated to holding migrated data; and S37, during garbage collection migration, determining the type of the data from the clustering result and placing the data into the GC Zone.
- 5. The garbage collection optimization method based on the multiple data features of claim 1, wherein the unified management method for migrated data in step S4 specifically comprises the following steps: S41, during garbage collection, selecting a partition of the same type as the migrated data from the GC Zones, which are the partitions dedicated to storing migrated data; if no partition of the same type exists, selecting one partition from the remaining idle partitions and adding it to the GC Zones; S42, when the number of GC Zones reaches 6, preferentially removing the partition that has undergone more garbage collection rounds; if two partitions have undergone the same number of garbage collection rounds, removing the partition with the lower ZIR; S43, in foreground GC, skipping the GC Zones, because they store migrated data and skipping them avoids re-migrating data that may fail within a short time; these partitions are collected uniformly in background garbage collection.
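The bookkeeping in claim 5 (steps S41 to S43) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the names `GcZone`, `GcZoneSet`, `zone_for`, `evict` and `foreground_candidates` are hypothetical, a `full` flag is assumed so that more than one zone per heat class can exist, and what happens to a zone after it is removed from the set is left open because the claim does not specify it.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_GC_ZONES = 6  # limit stated in step S42


@dataclass
class GcZone:
    zone_id: int
    heat: str             # "cold", "warm" or "hot"
    gc_rounds: int = 0    # garbage collection rounds this zone has undergone
    zir: float = 0.0      # current data failure rate of the zone
    full: bool = False    # assumption: a full zone cannot take more data


class GcZoneSet:
    """Hypothetical container for the zones that hold migrated data."""

    def __init__(self, free_zone_ids: Iterable[int]):
        self.free: List[int] = list(free_zone_ids)
        self.zones: List[GcZone] = []

    def zone_for(self, heat: str) -> GcZone:
        """S41: reuse a zone of the same type, else take an idle zone."""
        for zone in self.zones:
            if zone.heat == heat and not zone.full:
                return zone
        if len(self.zones) >= MAX_GC_ZONES:
            self.evict()
        zone = GcZone(self.free.pop(0), heat)  # IndexError if no idle zone is left
        self.zones.append(zone)
        return zone

    def evict(self) -> GcZone:
        """S42: remove the zone with the most GC rounds; ties go to the lower ZIR."""
        victim = max(self.zones, key=lambda z: (z.gc_rounds, -z.zir))
        self.zones.remove(victim)
        return victim

    def foreground_candidates(self, all_zone_ids: Iterable[int]) -> List[int]:
        """S43: foreground GC skips every zone currently in the GC Zone set."""
        gc_ids = {z.zone_id for z in self.zones}
        return [zid for zid in all_zone_ids if zid not in gc_ids]
```

The tie-break in `evict` uses a tuple key so that a single `max` call expresses both rules of S42: more rounds first, then the lower ZIR.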
Description
F2FS garbage recycling optimization method based on ZNS SSD
Technical Field
The invention belongs to the technical field of computer storage, and particularly relates to an F2FS garbage collection optimization method based on ZNS SSD.
Background
With the commercial deployment of NVMe Zoned Namespace (ZNS) solid-state storage technology, the collaborative optimization between storage devices and file systems needs to be rethought systematically. A ZNS SSD exposes a partition (Zone) interface aligned with physical erase blocks, which allows the host to control the data layout directly and fundamentally changes the design premise of the traditional garbage collection (GC) mechanism. In a 4 KB random write scenario, a ZNS SSD can reduce write amplification by about 58 percent compared with a traditional FTL SSD, providing a new hardware foundation for building high-performance persistent storage systems. A ZNS SSD requires data to be written strictly sequentially within a physical Zone, and erase operations can only be performed at the granularity of an entire Zone (typically hundreds of MB to the GB level). The Flash-Friendly File System (F2FS) is a log-structured file system designed specifically for flash memory; its LFS (Log-structured File System) mode forces sequential writing of data, which makes it highly compatible with ZNS SSDs. F2FS has already added basic support for ZNS SSDs. However, the GC mechanism of F2FS has not been optimized for the Zone characteristics of ZNS SSDs, so the GC efficiency of conventional F2FS degrades significantly on ZNS SSDs; in write-intensive workloads in particular, frequent GC may cause severe performance degradation.
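The two Zone constraints described above, strictly sequential writes and whole-Zone erasure, can be modelled with a write pointer. The toy class below is an assumption-laden sketch for illustration only; the name `Zone` and its methods are hypothetical and do not correspond to any kernel or NVMe interface.

```python
class Zone:
    """Toy model of one zone: append-only writes plus a whole-zone reset."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.write_pointer = 0  # next block that may be written

    def append(self, n_blocks: int) -> int:
        """Write n_blocks at the write pointer and return the start block."""
        if n_blocks <= 0:
            raise ValueError("n_blocks must be positive")
        if self.write_pointer + n_blocks > self.capacity:
            raise ValueError("zone is full")
        start = self.write_pointer
        self.write_pointer += n_blocks
        return start

    def reset(self) -> None:
        """Erase the whole zone; partial erasure is not possible."""
        self.write_pointer = 0
```

Because a zone can only be reset as a whole, any data that is still valid must be migrated elsewhere before the reset, which is the cost that garbage collection tries to keep low.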
On the one hand, in a write-intensive workload, F2FS frequently triggers foreground GC and, following a greedy algorithm, selects the Zone with the most invalid data as the Zone to reclaim. Such a Zone often stores hot data, so the valid data remaining in it is likely to fail again shortly after migration and may have to be migrated again in a subsequent GC, which causes a large amount of repeated data migration. Therefore, the reclamation overhead of the selected Zone should be considered carefully in the GC process of F2FS, so as to further reduce the impact of GC. On the other hand, when GC migrates data, F2FS treats all migrated valid data as cold data, although the actual heat of the data may differ greatly. As a result, data of different heat is mixed and interleaved within the same Zone, which further increases the frequency with which GC is triggered.
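One way to avoid mixing data of different heat is the three-way k-means clustering named in claim 4. The following pure-Python sketch assumes that each piece of data is described by three features normalized to [0, 1] (update frequency, update recency, predicted failure rate) and that the initial centers are supplied by the caller; the source does not give its initialization rule. The function names `kmeans3` and `label_clusters`, and the rule of ranking clusters by the sum of their center coordinates, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]  # (update frequency, recency, zir_pre), all in [0, 1]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans3(points: Sequence[Point], init_centers: Sequence[Point],
            max_iter: int = 50) -> Tuple[List[int], List[Point]]:
    """Cluster points into 3 groups; returns (assignment per point, centers)."""
    centers = [tuple(c) for c in init_centers]
    assign: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its closest center.
        assign = [min(range(3), key=lambda c: _dist2(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(3):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centers.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # centers no longer change
        centers = new_centers
    return assign, centers


def label_clusters(centers: Sequence[Point]) -> Dict[int, str]:
    """Assumption: a larger feature sum means hotter data."""
    order = sorted(range(3), key=lambda c: sum(centers[c]))
    return {c: name for c, name in zip(order, ["cold", "warm", "hot"])}
```

With the labels in hand, each block to be migrated can be routed to a Zone that holds data of the same class instead of being treated uniformly as cold.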
Disclosure of Invention
In order to solve the problems described in the background, the invention provides a ZNS SSD-based F2FS garbage collection optimization method, which comprises the following steps:
S1, starting a thread to monitor the data failure rate of the partitions and, based on the load type, adaptively adjusting the sampling period of the monitoring thread and the model used to predict the partition data failure rate, so as to obtain an accurate partition data failure rate;
S2, when selecting the partition to be reclaimed, selecting a suitable partition based on the partition data effective rate, the partition data failure rate, and the current load intensity;
S3, defining the heat of the data according to several characteristics of the data to be migrated, clustering the data to be migrated by heat, and placing the data, by class, into dedicated partitions;
S4, managing the migrated data uniformly so that it fits the overall garbage collection mechanism and redundant garbage collection is reduced.
The invention has at least the following beneficial effects: by comprehensively analyzing multiple features of the data and optimizing the GC mechanism accordingly, the method can significantly improve the overall performance of the ZNS SSD. A purpose-built monitoring thread collects the important characteristics of the data and adapts itself to different loads. A feature-based reclamation algorithm selects a suitable Zone for reclamation and can reduce redundant garbage collection. The invention clusters data based on multiple features, optimizes the data migration mechanism, and manages the migrated data uniformly so that it fits the overall GC mechanism.
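Steps S1 and S2 can be illustrated together: a predictor chosen by load intensity produces a failure-rate estimate, and that estimate feeds a reclamation cost. The sketch below uses the 70% threshold and the two models named in claim 2 (linear regression below the threshold, EWMA at or above it). Everything else is an assumption for illustration: the smoothing factor `alpha`, the names `ZoneStats`, `predict_zir`, `reclaim_cost` and `select_victim`, and in particular the cost formula, which the source text does not give.

```python
from dataclasses import dataclass
from typing import List, Sequence

HIGH_LOAD_RATIO = 0.70  # threshold from claim 2: 70% of maximum write bandwidth


def ewma(samples: Sequence[float], alpha: float = 0.5) -> float:
    """Exponentially weighted moving average; the final estimate is the prediction."""
    estimate = samples[0]
    for x in samples[1:]:
        estimate = alpha * x + (1 - alpha) * estimate
    return estimate


def linear_regression_next(samples: Sequence[float]) -> float:
    """Least-squares line over t = 0..n-1, extrapolated to t = n (clamped at 0)."""
    n = len(samples)
    if n < 2:
        return samples[0]
    mean_t = (n - 1) / 2
    mean_y = sum(samples) / n
    var_t = sum((t - mean_t) ** 2 for t in range(n))
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(samples))
    slope = cov / var_t
    return max(slope * n + (mean_y - slope * mean_t), 0.0)


def predict_zir(window: Sequence[float], load_ratio: float) -> float:
    """Pick the model by load: EWMA under high load, linear regression otherwise."""
    if not window:
        raise ValueError("ZIR window is empty")
    if load_ratio >= HIGH_LOAD_RATIO:
        return ewma(window)
    return linear_regression_next(window)


@dataclass
class ZoneStats:
    zone_id: int
    valid_ratio: float        # fraction of blocks in the zone that are still valid
    zir_window: List[float]   # recent ZIR samples from the monitoring thread


def reclaim_cost(zone: ZoneStats, load_ratio: float) -> float:
    """Hypothetical cost: migration volume plus a penalty for still-hot zones.

    Assumption: the penalty weight shrinks as the load (urgency) grows, so the
    choice approaches the greedy one when space must be freed quickly.
    """
    urgency = min(max(load_ratio, 0.0), 1.0)
    return zone.valid_ratio + (1.0 - urgency) * predict_zir(zone.zir_window, load_ratio)


def select_victim(zones: Sequence[ZoneStats], load_ratio: float) -> ZoneStats:
    """S22 and S23: evaluate every candidate and take the minimum cost."""
    return min(zones, key=lambda z: reclaim_cost(z, load_ratio))
```

Under a light load this cost passes over a zone whose data is still being invalidated quickly, since waiting makes that zone cheaper to reclaim later; under a saturated load the penalty vanishes and the zone with the least valid data is chosen.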
In summary, the garbage collection optimization method based on the multiple data features provided by the invention realizes the optimization of the GC mechanism by comprehensively considering the multiple features of the data, can effectively reduce the data migration amount and the Zone erasure