Search

US-12625855-B1 - Bulk insertion of data into log-structured merge tree

US12625855B1US 12625855 B1US12625855 B1US 12625855B1US-12625855-B1

Abstract

Techniques are provided for managing LSM tree structures and enabling bulk insertion of data into LSM tree structures. An exemplary embodiment includes a method that is performed by a data management system. The data management system receives a plurality of data records generated at least in part by at least one background process, and generates a burst segment which comprises at least a portion of the plurality of data records. The data management system inserts the burst segment into a non-root level of a LSM tree structure.

Inventors

  • Ziv Dor
  • Sabina Rachev
  • Yosef SHATSKY
  • Xiaomei Liu

Assignees

  • DELL PRODUCTS L.P.

Dates

Publication Date
20260512
Application Date
20250124

Claims (20)

  1. 1 . A method, comprising: utilizing, by a data management system, a log-structured merge (LSM) tree structure as a metadata structure to manage metadata records used for searching and locating stored data in a storage system, wherein utilizing the LSM tree structure comprises: inserting, by the data management system, new segments of metadata records generated in response to user input/output (I/O) requests, into a root level of the LSM tree structure; receiving, by the data management system, a plurality of metadata records generated at least in part by at least one background process that is performed with regard to at least a portion of the stored data in the storage system; generating, by the data management system, a burst segment which comprises at least a portion of the plurality of metadata records generated at least in part by the at least one background process, wherein generating the burst segment comprises assigning a timestamp to the burst segment which corresponds to a time of creation of the burst segment; and inserting, by the data management system, the burst segment into a non-root level of the LSM tree structure at a given time based at least in part on the timestamp of the burst segment and a timestamp of at least one other segment in the non-root level.
  2. 2 . The method of claim 1 , wherein the background process comprises at least one of a deduplication process, a data migration process, and a data replication process.
  3. 3 . The method of claim 1 , wherein inserting the burst segment into the non-root level of the LSM tree structure comprises inserting the burst segment into the non-root level of the LSM tree structure subsequent to an existing segment in the non-root level such that the existing segment has at least one of (i) a timestamp which is less than the timestamp of the burst segment, and (ii) a timestamp which is greater than the timestamp of the burst segment, but which comprises a set of merged segments that includes at least one segment with a timestamp that is less than the timestamp of the burst segment.
  4. 4 . The method of claim 1 , further comprising: performing a merge operation to aggregate metadata records of a set of segments in the non-root level of the LSM tree structure into a new segment; assigning a timestamp to the new segment which corresponds to a timestamp of a segment in the set of segments which is deemed to be a newest segment based on the respective timestamps of the segments within the set of segments; and inserting the new segment with the assigned timestamp into a next non-root level of the LSM tree structure.
  5. 5 . The method of claim 1 , wherein: each new segment of metadata records inserted into the root level comprises a respective timestamp that corresponds to a time of creation of the new segment; and the new segments in the root level are arranged in time order based on the respective timestamps of the new segments.
  6. 6 . The method of claim 5 , wherein each segment inserted into the root level of the LSM tree structure, comprises sorted metadata records which correspond to the user I/O requests and which are destaged from a write cache.
  7. 7 . The method of claim 5 , further comprising: performing a merge operation to aggregate metadata records of a set of segments in the root level of the LSM tree structure into a new segment; assigning a timestamp to the new segment which corresponds to a timestamp of a segment in the set of segments which is deemed to be a newest segment based on the respective timestamps of the segments within the set of segments; and inserting the new segment with the assigned timestamp into a next level of the LSM tree structure.
  8. 8 . An article of manufacture comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code is executable by one or more processors to implement a method which comprises: utilizing, by a data management system, a log-structured merge (LSM) tree structure as a metadata structure to manage metadata records used for searching and locating stored data in a storage system, wherein utilizing the LSM tree structure comprises: inserting, by the data management system, new segments of metadata records generated in response to user input/output (I/O) requests, into a root level of the LSM tree structure; receiving, by the data management system, a plurality of metadata records generated at least in part by at least one background process that is performed with regard to at least a portion of the stored data in the storage system; generating, by the data management system, a burst segment which comprises at least a portion of the plurality of metadata records generated at least in part by the at least one background process, wherein generating the burst segment comprises assigning a timestamp to the burst segment which corresponds to a time of creation of the burst segment; and inserting, by the data management system, the burst segment into a non-root level of the LSM tree structure at a given time based at least in part on the timestamp of the burst segment and a timestamp of at least one other segment in the non-root level.
  9. 9 . The article of manufacture of claim 8 , wherein the background process comprises at least one of a deduplication process, a data migration process, and a data replication process.
  10. 10 . The article of manufacture of claim 8 , wherein the program code for inserting the burst segment into the non-root level of the LSM tree structure comprises program code for inserting the burst segment into the non-root level of the LSM tree structure subsequent to an existing segment in the non-root level such that the existing segment has at least one of (i) a timestamp which is less than the timestamp of the burst segment, and (ii) a timestamp which is greater than the timestamp of the burst segment, but which comprises a set of merged segments that includes at least one segment with a timestamp that is less than the timestamp of the burst segment.
  11. 11 . The article of manufacture of claim 8 , further comprising program code for: performing a merge operation to aggregate metadata records of a set of segments in the non-root level of the LSM tree structure into a new segment; assigning a timestamp to the new segment which corresponds to a timestamp of a segment in the set of segments which is deemed to be a newest segment based on the respective timestamps of the segments within the set of segments; and inserting the new segment with the assigned timestamp into a next non-root level of the LSM tree structure.
  12. 12 . The article of manufacture of claim 8 , wherein: each new segment of metadata records inserted into the root level comprises a respective timestamp that corresponds to a time of creation of the new segment; and the new segments in the root level are arranged in time order based on the respective timestamps of the new segments.
  13. 13 . The article of manufacture of claim 12 , wherein each segment inserted into the root level of the LSM tree structure, comprises sorted metadata records which correspond to the user I/O requests and which are destaged from a write cache.
  14. 14 . The article of manufacture of claim 12 , further comprising program code for: performing a merge operation to aggregate metadata records of a set of segments in the root level of the LSM tree structure into a new segment; assigning a timestamp to the new segment which corresponds to a timestamp of a segment in the set of segments which is deemed to be a newest segment based on the respective timestamps of the segments within the set of segments; and inserting the new segment with the assigned timestamp into a next level of the LSM tree structure.
  15. 15 . An apparatus comprising: at least one processor; and memory configured to store program code, wherein the program code is executable by the at least one processor to instantiate a data management system, wherein the data management system is configured to: utilize a log-structured merge (LSM) tree structure as a metadata structure to manage metadata records used for searching and locating stored data in a storage system, wherein utilizing the LSM tree structure comprises: insert new segments of metadata records generated in response to user input/output (I/O) requests, into a root level of the LSM tree structure; receive a plurality of metadata records generated at least in part by at least one background process that is performed with regard to at least a portion of the stored data in the storage system; generate a burst segment which comprises at least a portion of the plurality of metadata records generated at least in part by the at least one background process, wherein generating the burst segment comprises assigning a timestamp to the burst segment which corresponds to a time of creation of the burst segment; and insert the burst segment into a non-root level of the LSM tree structure at a given time based at least in part on the timestamp of the burst segment and a timestamp of at least one other segment in the non-root level.
  16. 16 . The apparatus of claim 15 , wherein the background process comprises at least one of a deduplication process, a data migration process, and a data replication process.
  17. 17 . The apparatus of claim 15 , wherein in inserting the burst segment into the non-root level of the LSM tree structure, the data management system is configured to insert the burst segment into the non-root level of the LSM tree structure subsequent to an existing segment in the non-root level such that the existing segment has at least one of (i) a timestamp which is less than the timestamp of the burst segment, and (ii) a timestamp which is greater than the timestamp of the burst segment, but which comprises a set of merged segments that includes at least one segment with a timestamp that is less than the timestamp of the burst segment.
  18. 18 . The apparatus of claim 15 , wherein the data management system is further configured to: perform a merge operation to aggregate metadata records of a set of segments in the non-root level of the LSM tree structure into a new segment; assign a timestamp to the new segment which corresponds to a timestamp of a segment in the set of segments which is deemed to be a newest segment based on the respective timestamps of the segments within the set of segments; and insert the new segment with the assigned timestamp into a next non-root level of the LSM tree structure.
  19. 19 . The apparatus of claim 15 , wherein: each new segment of metadata records inserted into the root level comprises a respective timestamp that corresponds to a time of creation of the new segment; and the new segments in the root level are arranged in time order based on the respective timestamps of the new segments.
  20. 20 . The apparatus of claim 19 , wherein the data management system is further configured to: perform a merge operation to aggregate metadata records of a set of segments in the root level of the LSM tree structure into a new segment; assign a timestamp to the new segment which corresponds to a timestamp of a segment in the set of segments which is deemed to be a newest segment based on the respective timestamps of the segments within the set of segments; and insert the new segment with the assigned timestamp into a next level of the LSM tree structure.

Description

TECHNICAL FIELD This disclosure relates generally to data management techniques and, more particularly, to techniques for utilizing log-structured merge tree structures for data management in, e.g., a storage system. BACKGROUND Log-structured merge (LSM) trees are data structures that are commonly used to manage data in various applications such as database and data storage systems. LSM trees are configured to handle frequent updates to data and to enable efficient data searching. An LSM tree generally includes a root level (referred to herein as Level 0 or L0) which is configured as the point of insertion of new segments of data records into the LSM tree. The segments of data records in the LSM tree are eventually “merged” into larger segments of data records in lower levels (i.e., non-root levels) of the LSM tree using, e.g., merge-sort algorithms. In circumstances where a significantly large amount of data records (e.g., burst of updates) needs to be bulk inserted into the LSM tree at once, in addition to the regular flow of data record insertion into L0, such bulk insertion of data records can result in resource contention of the L0 insertion resources, and deny resources from the regular flow of updates, thereby leading to degraded system performance. SUMMARY Exemplary embodiments of the disclosure include techniques for managing LSM tree structures and, in particular, to techniques to enable bulk insertion of data into LSM tree structures. For example, an exemplary embodiment includes a method that is performed by a data management system. The data management system receives a plurality of data records generated at least in part by at least one background process, and generates a burst segment which comprises at least a portion of the plurality of data records. The data management system inserts the burst segment into a non-root level of a LSM tree structure. In other exemplary embodiments, the data management system assigns a timestamp to the burst segment which corresponds to a time of creation of the burst segment. The data management system inserts the burst segment into the non-root level of the LSM tree structure subsequent to an existing segment in the non-root level such that the existing segment has at least one of (i) a timestamp which is less than the timestamp of the burst segment, and (ii) a timestamp which is greater than the timestamp of the burst segment, but which comprises a set of merged segments that includes at least one segment with a timestamp that is less than the timestamp of the burst segment. Other embodiments of the disclosure include, without limitation, systems and articles of manufacture comprising processor-readable storage media, which are configured for managing metadata of a storage system. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 schematically illustrates a network computing system comprising a data storage system which implements LSM tree structures for metadata management, according to an exemplary embodiment of the disclosure. FIG. 2 schematically illustrates a storage node which implements an LSM tree structure for metadata management, according to an exemplary embodiment of the disclosure. FIG. 3 schematically illustrates a process for utilizing multiple data structures including an LSM tree structure for managing metadata in a storage system, according to an exemplary embodiment of the disclosure. FIG. 4 schematically illustrates an exemplary LSM tree structure which is configured to enable bulk insertion of data, according to an exemplary embodiment of the disclosure. FIG. 5 schematically illustrates a method for bulk insertion of data into an LSM tree structure, according to an exemplary embodiment of the disclosure. FIG. 6 illustrates a flow diagram of a method for bulk insertion of data into an LSM tree structure, according to an exemplary embodiment of the disclosure. FIG. 7 schematically illustrates a framework of a server node for hosting a storage node, according to an exemplary embodiment of the disclosure. DETAILED DESCRIPTION Exemplary embodiments of the disclosure will now be discussed in further detail with regard to techniques for managing LSM tree structures and, in particular, to systems and methods to enable bulk insertion of data into LSM tree structures. As noted above, an issue with the use of LSM tree structures arises in circumstances where a significantly large amount of data records (e.g., burst of data updates) needs to be bulk inserted into an LSM tree at once, in conjunction with the regular insertion flow of data records into the L0 level. Consequently, such bulk data insertion can result in resource contention of the L0 insertion resources, and deny resources from the regular flow of updates, thereby leading to degraded system performance. For example, in the context of exemplary data storage systems as discussed herein where an LSM tree is utilized to manage storage metadata associated with user input/output (I/O) requests, the re