CN-121996568-A - Hard disk garbage recycling method and device, electronic equipment and storage medium

Abstract

The application discloses a hard disk garbage recycling method and device. The method comprises: dividing a storage object to be recovered into a plurality of continuous fixed-size intervals; determining a start interval and an end interval according to the offset and length of the data to be migrated; and classifying the read scene as single-interval, double-interval or multi-interval based on the positional relation between the two intervals. In the single-interval scene, if the interval is already cached, the data is copied directly from the cache; otherwise, when the read IOPS concurrency count has not reached its upper limit, the data of the whole interval is read into the cache. In the double-interval scene, the same logic is executed on each of the two intervals. In the multi-interval scene, the data is read directly from the hard disk and the IOPS concurrency count is converted from the actual data length. The valid data is then migrated to a new storage object and the space of the original object is released. Through the interval pre-reading mechanism, the application reduces the number of hard disk read requests and the head seek overhead of mechanical disks, uses cache hits to reduce IOPS flow-control conflicts, and improves the garbage recycling efficiency of the storage system and the capacity stability of the cluster.
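The interval location and scene classification summarized above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the names `locate`, `classify` and `Scene` are hypothetical, the 1MB interval size is the value given in claim 2, and the rounding-down formulas and the classification thresholds are those stated in claims 3 and 4.

```python
from enum import Enum
from typing import Tuple

INTERVAL_SIZE = 1 << 20  # 1 MB per interval (value taken from claim 2)


class Scene(Enum):
    """Hypothetical labels for the three read scenes."""
    SINGLE = "single-interval"
    DOUBLE = "double-interval"
    MULTI = "multi-interval"


def locate(offset: int, length: int,
           interval_size: int = INTERVAL_SIZE) -> Tuple[int, int]:
    """Return (start interval, end interval) using rounding-down division."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    start = offset // interval_size
    end = (offset + length - 1) // interval_size
    return start, end


def classify(start: int, end: int) -> Scene:
    """Classify by the distance between the end and start intervals."""
    span = end - start
    if span == 0:
        return Scene.SINGLE
    if span == 1:
        return Scene.DOUBLE
    return Scene.MULTI
```

For instance, a 2-byte record that begins on the last byte of interval 0 ends in interval 1, so it falls into the double-interval scene even though it is tiny.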

Inventors

  • ZHANG ZHEN

Assignees

  • 中电云计算技术有限公司

Dates

Publication Date
20260508
Application Date
20260407

Claims (10)

  1. A method for recycling hard disk garbage, the method comprising: S1, dividing a storage object to be recovered into a plurality of continuous intervals of fixed size; S2, determining a start interval and an end interval occupied by the data to be migrated according to the offset and the data length of that data in the storage object; S3, classifying the data reading scene as a single-interval scene, a cross-double-interval scene or a cross-multi-interval scene based on the positional relation between the start interval and the end interval, and executing a different reading strategy for each type: (1) for the single-interval scene, if the interval has already been read into a cache, directly copying the data corresponding to the interval from the cache; if the interval has not been read, and the read IOPS concurrency count has not reached its upper limit, incrementing the IOPS concurrency count, reading the complete data of the interval from the hard disk and loading it into the cache, and, after the read is finished, decrementing the IOPS concurrency count and marking the interval as read; (2) for the cross-double-interval scene, executing on each of the two intervals the same reading and caching logic as in the single-interval scene, until the data of both intervals has been read; (3) for the cross-multi-interval scene, reading the data directly from the hard disk, converting the IOPS concurrency count according to the actual length of the data, and performing flow control without using the interval cache mechanism; S4, migrating the valid data that has been read to a new storage object, and releasing the space occupied by the original storage object.
  2. The method according to claim 1, wherein the size of each continuous interval in step S1 is fixed at 1MB, and the size of the storage object is 128MB.
  3. The method of claim 1, wherein in step S2 the start interval is calculated by rounding down, start interval = $\lfloor \text{Offset} / \text{interval size} \rfloor$, and the end interval is calculated by rounding down, end interval = $\lfloor (\text{Offset} + \text{data length} - 1) / \text{interval size} \rfloor$.
  4. The method according to claim 1, wherein in step S3, the judging condition of the single-interval scene is that the start interval is equal to the end interval, the judging condition of the cross-double-interval scene is that the end interval minus the start interval is equal to 1, and the judging condition of the cross-multi-interval scene is that the end interval minus the start interval is greater than 1.
  5. The method of claim 1, wherein in the cross-multi-interval scene of step S3 the IOPS concurrency count is converted by rounding down, added concurrency count = $\lfloor \text{actual data length} / \text{interval size} \rfloor$.
  6. The method according to claim 1, wherein before executing step S1, the method further comprises determining a storage object to be garbage collected, wherein the storage object stores user data and corresponding metadata, and the metadata comprises the offset and the data length of the user data in the storage object; and before executing step S3, the method further comprises looking up in advance, in the index pool, the metadata corresponding to the user data stored in the storage object, and comparing it with the metadata in the storage object to determine the validity of the data.
  7. The method of claim 1, wherein the method is applied to garbage collection of a hard disk pool in a distributed storage system employing redirect-on-write (ROW) technology.
  8. A hard disk garbage recycling device, the device comprising: an interval segmentation module, configured to divide a storage object to be recovered into a plurality of continuous intervals of fixed size; an interval positioning module, configured to determine a start interval and an end interval occupied by the data to be migrated according to the offset and the data length of that data in the storage object; a reading strategy execution module, configured to classify the data reading scene as a single-interval scene, a cross-double-interval scene or a cross-multi-interval scene based on the positional relation between the start interval and the end interval, and to execute a different reading strategy for each type: (1) for the single-interval scene, if the interval has already been read into a cache, directly copying the data corresponding to the interval from the cache; if the interval has not been read, and the read IOPS concurrency count has not reached its upper limit, incrementing the IOPS concurrency count, reading the complete data of the interval from the hard disk and loading it into the cache, and, after the read is finished, decrementing the IOPS concurrency count and marking the interval as read; (2) for the cross-double-interval scene, executing on each of the two intervals the same reading and caching logic as in the single-interval scene, until the data of both intervals has been read; (3) for the cross-multi-interval scene, reading the data directly from the hard disk, converting the IOPS concurrency count according to the actual length of the data, and performing flow control without using the interval cache mechanism; and a data migration module, configured to migrate the valid data that has been read to a new storage object and to release the space occupied by the original storage object.
  9. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the hard disk garbage recycling method according to any one of claims 1 to 7.
  10. An electronic device, comprising a memory and a processor; the memory being configured to store a computer program; and the processor being configured to execute the computer program to implement the steps of the hard disk garbage recycling method according to any one of claims 1 to 7.
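The differentiated reading strategy of step S3 can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions, not the claimed implementation: `IntervalReader`, `read_disk`, `max_inflight` and `disk_reads` are hypothetical names; raising an error when the concurrency limit is reached is an assumption, since the claims do not say how a caller waits or retries; and checking the converted count against the same limit in the multi-interval path is likewise an assumed form of the flow control.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class IntervalReader:
    # Hypothetical disk accessor: (offset, length) -> bytes of the source object.
    read_disk: Callable[[int, int], bytes]
    interval_size: int = 1 << 20      # 1 MB, the value given in claim 2
    max_inflight: int = 8             # assumed upper limit on read concurrency
    inflight: int = 0                 # current IOPS concurrency count
    disk_reads: int = 0               # number of requests actually sent to disk
    cache: Dict[int, bytes] = field(default_factory=dict)

    def _load_interval(self, idx: int) -> bytes:
        """Single-interval logic: serve from cache, else read the whole interval."""
        if idx in self.cache:         # interval is already in the "read" state
            return self.cache[idx]
        if self.inflight >= self.max_inflight:
            # Assumption: the claims do not specify the behaviour at the limit.
            raise RuntimeError("read concurrency limit reached")
        self.inflight += 1
        try:
            data = self.read_disk(idx * self.interval_size, self.interval_size)
            self.disk_reads += 1
        finally:
            self.inflight -= 1
        self.cache[idx] = data        # mark the interval as read
        return data

    def read(self, offset: int, length: int) -> bytes:
        start = offset // self.interval_size
        end = (offset + length - 1) // self.interval_size
        if end - start <= 1:
            # Single or double interval: same cached logic per interval.
            buf = b"".join(self._load_interval(i) for i in range(start, end + 1))
            rel = offset - start * self.interval_size
            return buf[rel:rel + length]
        # Multi-interval: bypass the cache and charge floor(length / interval size).
        cost = length // self.interval_size
        if self.inflight + cost > self.max_inflight:
            raise RuntimeError("read concurrency limit reached")
        self.inflight += cost
        try:
            self.disk_reads += 1
            return self.read_disk(offset, length)
        finally:
            self.inflight -= cost
```

In this sketch, several small records that fall in the same interval cost one disk request in total, which is the effect the interval pre-reading mechanism aims for. A production version would need real concurrency control and cache eviction, neither of which is shown here.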

Description

Hard disk garbage recycling method and device, electronic equipment and storage medium

Technical Field

The application belongs to the technical field of hard disk cleaning, and particularly relates to a hard disk garbage recycling method and device, a computer-readable storage medium and electronic equipment.

Background

In a storage system, in order to make full use of the write performance of a hard disk, the industry commonly adopts Redirect-on-Write (ROW) technology, which aggregates scattered write requests into large sequential I/O and writes them into pre-allocated objects in a hard disk pool. In distributed storage architectures, append-only writing significantly improves write throughput, so ROW also uses this mechanism.

However, append-only writing inevitably produces stale versions of data (garbage data) when the same data block is updated multiple times, and relies on a garbage collection (GC) mechanism to free the invalid space. Because ROW typically manages large objects (e.g., 128MB for a single object), conventional GC strategies scan the amount of garbage inside each object and move data out of objects whose garbage ratio exceeds a preset threshold, i.e., they read the valid data from the source object, write it to a newly allocated object, and then delete the source object to reclaim storage space.

However, the efficiency of GC data movement is limited by read I/O performance: a read IOPS that is too high easily overloads the hard disk, causing performance degradation and even RPC blocking, while a read IOPS that is too low cannot fully exploit concurrency. In addition, for mechanical hard disks (HDDs), head seek time significantly affects read efficiency, and for the same data volume a single large sequential read performs significantly better than multiple small random reads.
Therefore, under the dual limits of read IOPS and bandwidth, it is difficult for conventional GC approaches to achieve optimal efficiency.

Disclosure of Invention

In order to solve the problems in the prior art, the application provides a new hard disk garbage recycling method based on intelligent pre-reading, which aims to improve hard disk garbage recycling efficiency. Specifically, the application provides the following technical scheme.

A first aspect of the application provides a hard disk garbage recycling method, which comprises the following steps:

S1, dividing a storage object to be recovered into a plurality of continuous intervals of fixed size;

S2, determining a start interval and an end interval occupied by the data to be migrated according to the offset and the data length of that data in the storage object;

S3, classifying the data reading scene as a single-interval scene, a cross-double-interval scene or a cross-multi-interval scene based on the positional relation between the start interval and the end interval, and executing a different reading strategy for each type:

(1) For the single-interval scene, if the interval has already been read into a cache, directly copying the data corresponding to the interval from the cache; if the interval has not been read, and the read IOPS concurrency count has not reached its upper limit, incrementing the IOPS concurrency count, reading the complete data of the interval from the hard disk and loading it into the cache, and, after the read is finished, decrementing the IOPS concurrency count and marking the interval as read;

(2) For the cross-double-interval scene, executing on each of the two intervals the same reading and caching logic as in the single-interval scene, until the data of both intervals has been read;

(3) For the cross-multi-interval scene, reading the data directly from the hard disk, converting the IOPS concurrency count according to the actual length of the data, and performing flow control without using the interval cache mechanism;

S4, migrating the valid data that has been read to a new storage object, and releasing the space occupied by the original storage object.

Further, in the method of the present application, the size of each continuous interval in step S1 is fixed at 1MB, and the size of the storage object is 128MB.

Further, in the method of the present application, the start interval in step S2 is calculated by rounding down, namely start interval = $\lfloor \text{Offset} / \text{interval size} \rfloor$, and the end interval is calculated by rounding down, namely end interval = $\lfloor (\text{Offset} + \text{data length} - 1) / \text{interval size} \rfloor$.

Further, in the method of the present application, in step S3, the judging condition of the single-interval scene is that the start interval is equal to the end interval, the judging condition of the cross-double-interval scene is that the end interval minus the start interval is equal to 1, and the judging condition of the cross-multi-int