Search

US-12619582-B1 - Managing file system transaction dependencies

US12619582B1US 12619582 B1US12619582 B1US 12619582B1US-12619582-B1

Abstract

Embodiments manage file system transaction dependencies in distributed file systems. A transaction log containing log entries is obtained, where each log entry represents a transaction with file system operations, execution times, and associated inodes. Dependency graphs are constructed based on shared inode references between log entries. Each inode stores the last modified transaction log key, which serves as a root into the dependency graph of all dependent transactions. Depth-first search traversal identifies leaf log entries ready for application. These leaf entries are communicated to target file systems for replay. Independent subgraphs are identified and processed in parallel to maximize throughput. In-flight windows with bounded capacity prevent deadlock conditions by ensuring dependencies fit within available buffer space. Log entries from multiple subgraphs may be combined into single communication messages to optimize network efficiency across geographically distributed file system infrastructure.

Inventors

  • Thomas Gregory Rothschilds
  • Graham Edwin Ellis
  • Aaron James Passey

Assignees

  • Qumulo, Inc.

Dates

Publication Date
20260505
Application Date
20251202

Claims (20)

  1. 1 . A method for managing data in a file system over a network using one or more processors to execute instructions that are configured to cause performance of actions, comprising: obtaining a transaction log that includes a plurality of log entries, wherein each log entry is associated with a transaction that includes one or more file system operations, and wherein each log entry corresponds to an execution time and one or more inodes associated with its associated transaction; collecting one or more dependency graphs based on one or more of the plurality of log entries, wherein each log entry in a dependency graph shares at least one inode with at least one adjacent log entry in the same dependency graph, and wherein a root log entry of each dependency graph is associated with latest execution time in the dependency graph based on a highest log entry key value among log entries included in the dependency graph; collecting one or more leaf log entries based on one or more traversals of the one or more graphs, wherein a number of log entries are visited in each dependency graph until a traversal threshold is reached, and wherein a continuation key associated with a continuation log entry in each dependency graph is collected for a first unvisited log entry following visitation of one or more log entries during the continuing traversal of the dependency graph; communicating the one or more leaf log entries to a target file system, wherein the target file system applies each communicated transaction associated with the communicated leaf log entries to execute each file system operation associated with each communicated transaction on the target file system; and obtaining a user interface that includes one or more display panels for content that includes a transaction log performance metrics and other information associated with the file system, wherein the content is dynamically transformed and arranged for display to a user based on one or more of user interaction telemetry, user feedback or a metric.
  2. 2 . The method of claim 1 , wherein the traversal of the one or more dependency graphs, further comprises: suspending the traversal at each continuation log entry; and resuming the traversal from each continuation log entry based on the continuation key, wherein the continuation key enables traversal of each dependency graph.
  3. 3 . The method of claim 1 , further comprising: collecting one or more first log entries from a first dependency graph of the one or more dependency graphs; collecting one or more second log entries from a second dependency graph of the one or more dependency graphs, wherein the first dependency graph and the second dependency graph are independent based on an intersection of log entry keys in the first dependency graph and the second dependency graph being empty; and communicating the one or more first log entries and the one or more second log entries to the target file system in parallel, wherein the target file system applies one or more transactions associated with the one or more first log entries and one or more other transactions associated with the one or more second log entries in parallel.
  4. 4 . The method of claim 1 , further comprising: collecting one or more read operations for a new transaction based on one or more first inodes that are accessed without modification during the new transaction; and collecting one or more write operations for the other new transaction based on one or more second inodes that are modified during the new transaction, wherein the one or more read operations and the one or more write operations are included in a log entry associated with the new transaction.
  5. 5 . The method of claim 1 , wherein collecting the one or more leaf log entries, further comprises: obtaining an in-flight window that includes a bounded capacity for tracking one or more in-flight log entries pending communication to the target file system; collecting one or more candidate log entries for communication to the target file system; and using the in-flight window to perform further actions, including: collecting one or more dependency log entries associated with each candidate log entry; and adding the one or more candidate log entries to the in-flight window based on the capacity of the in-flight window accommodating the one or more candidate log entries and the one or more dependency log entries to avoid one or more deadlock conditions.
  6. 6 . The method of claim 1 , further comprising: collecting a first group of the one or more log entries from a first dependency graph of the one or more dependency graphs; collecting a second group of the one or more log entries from a second dependency graph of the one or more dependency graphs; and combining the first group of the one or more log entries and the second group of the one or more log entries into a same communication message, wherein the same communication message is communicated to the target file system.
  7. 7 . The method of claim 1 , further comprising: updating one or more inodes associated with each log entry to include an associated key value based on storage of the log entry in the transaction log, wherein the associated key value is used to selectively collect one or more log entries associated with a particular inode.
  8. 8 . A network computer for managing data in a file system over a network, comprising: a memory that stores at least instructions; and one or more processors that execute instructions that are configured to cause performance of actions, including: obtaining a transaction log that includes a plurality of log entries, wherein each log entry is associated with a transaction that includes one or more file system operations, and wherein each log entry corresponds to an execution time and one or more inodes associated with its associated transaction; collecting one or more dependency graphs based on one or more of the plurality of log entries, wherein each log entry in a dependency graph shares at least one inode with at least one adjacent log entry in the same dependency graph, and wherein a root log entry of each dependency graph is associated with latest execution time in the dependency graph based on a highest log entry key value among log entries included in the dependency graph; collecting one or more leaf log entries based on one or more traversals of the one or more graphs, wherein a number of log entries are visited in each dependency graph until a traversal threshold is reached, and wherein a continuation key associated with a continuation log entry in each dependency graph is collected for a first unvisited log entry following visitation of one or more log entries during the continuing traversal of the dependency graph; communicating the one or more leaf log entries to a target file system, wherein the target file system applies each communicated transaction associated with the communicated leaf log entries to execute each file system operation associated with each communicated transaction on the target file system; and obtaining a user interface that includes one or more display panels for content that includes a transaction log performance metrics and other information associated with the file system, wherein the content is dynamically transformed and arranged for display to a user based on one or more of user interaction telemetry, user feedback or a metric.
  9. 9 . The network computer of claim 8 , wherein the traversal of the one or more dependency graphs, further comprises: suspending the traversal at each continuation log entry; and resuming the traversal from each continuation log entry based on the continuation key, wherein the continuation key enables traversal of each dependency graph.
  10. 10 . The network computer of claim 8 , further comprising: collecting one or more first log entries from a first dependency graph of the one or more dependency graphs; collecting one or more second log entries from a second dependency graph of the one or more dependency graphs, wherein the first dependency graph and the second dependency graph are independent based on an intersection of log entry keys in the first dependency graph and the second dependency graph being empty; and communicating the one or more first log entries and the one or more second log entries to the target file system in parallel, wherein the target file system applies one or more transactions associated with the one or more first log entries and one or more other transactions associated with the one or more second log entries in parallel.
  11. 11 . The network computer of claim 8 , further comprising: collecting one or more read operations for a new transaction based on one or more first inodes that are accessed without modification during the new transaction; and collecting one or more write operations for the new transaction based on one or more second inodes that are modified during the new transaction, wherein the one or more read operations and the one or more write operations are included in a log entry associated with the new transaction.
  12. 12 . The network computer of claim 8 , wherein collecting the one or more leaf log entries, further comprises: obtaining an in-flight window that includes a bounded capacity for tracking one or more in-flight log entries pending communication to the target file system; collecting one or more candidate log entries for communication to the target file system; and using the in-flight window to perform further actions, including: collecting one or more dependency log entries associated with each candidate log entry; and adding the one or more candidate log entries to the in-flight window based on the capacity of the in-flight window accommodating the one or more candidate log entries and the one or more dependency log entries to avoid one or more deadlock conditions.
  13. 13 . The network computer of claim 8 , further comprising: collecting a first group of the one or more log entries from a first dependency graph of the one or more dependency graphs; collecting a second group of the one or more log entries from a second dependency graph of the one or more dependency graphs; and combining the first group of the one or more log entries and the second group of the one or more log entries into a same communication message, wherein the same communication message is communicated to the target file system.
  14. 14 . The network computer of claim 8 , further comprising: updating one or more inodes associated with each log entry to include an associated key value based on storage of the log entry in the transaction log, wherein the associated key value is used to selectively collect one or more log entries associated with a particular inode.
  15. 15 . A processor readable non-transitory storage media that includes instructions for managing data in a file system over a network, wherein execution of the instructions by one or more processors on one or more network computers performs actions, comprising: obtaining a transaction log that includes a plurality of log entries, wherein each log entry is associated with a transaction that includes one or more file system operations, and wherein each log entry corresponds to an execution time and one or more inodes associated with its associated transaction; collecting one or more dependency graphs based on one or more of the plurality of log entries, wherein each log entry in a dependency graph shares at least one inode with at least one adjacent log entry in the same dependency graph, and wherein a root log entry of each dependency graph is associated with latest execution time in the dependency graph based on a highest log entry key value among log entries included in the dependency graph; collecting one or more leaf log entries based on one or more traversals of the one or more graphs, wherein a number of log entries are visited in each dependency graph until a traversal threshold is reached, and wherein a continuation key associated with a continuation log entry in each dependency graph is collected for a first unvisited log entry following visitation of one or more log entries during the continuing traversal of the dependency graph; communicating the one or more leaf log entries to a target file system, wherein the target file system applies each communicated transaction associated with the communicated leaf log entries to execute each file system operation associated with each communicated transaction on the target file system; and obtaining a user interface that includes one or more display panels for content that includes a transaction log performance metrics and other information associated with the file system, wherein the content is dynamically transformed and arranged for display to a user based on one or more of user interaction telemetry, user feedback or a metric.
  16. 16 . The processor readable non-transitory storage media of claim 15 , wherein the traversal of the one or more dependency graphs, further comprises: suspending the traversal at each continuation log entry; and resuming the traversal from each continuation log entry based on the continuation key, wherein the continuation key enables traversal of each dependency graph.
  17. 17 . The processor readable non-transitory storage media of claim 15 , further comprising: collecting one or more first log entries from a first dependency graph of the one or more dependency graphs; collecting one or more second log entries from a second dependency graph of the one or more dependency graphs, wherein the first dependency graph and the second dependency graph are independent based on an intersection of log entry keys in the first dependency graph and the second dependency graph being empty; and communicating the one or more first log entries and the one or more second log entries to the target file system in parallel, wherein the target file system applies one or more transactions associated with the one or more first log entries and one or more other transactions associated with the one or more second log entries in parallel.
  18. 18 . The processor readable non-transitory storage media of claim 15 , further comprising: collecting one or more read operations for a new transaction based on one or more first inodes that are accessed without modification during the new transaction; and collecting one or more write operations for the new transaction based on one or more second inodes that are modified during the new transaction, wherein the one or more read operations and the one or more write operations are included in a log entry associated with the new transaction.
  19. 19 . The processor readable non-transitory storage media of claim 15 , wherein collecting the one or more leaf log entries, further comprises: obtaining an in-flight window that includes a bounded capacity for tracking one or more in-flight log entries pending communication to the target file system; collecting one or more candidate log entries for communication to the target file system; and using the in-flight window to perform further actions, including: collecting one or more dependency log entries associated with each candidate log entry; and adding the one or more candidate log entries to the in-flight window based on the capacity of the in-flight window accommodating the one or more candidate log entries and the one or more dependency log entries to avoid one or more deadlock conditions.
  20. 20 . The processor readable non-transitory storage media of claim 15 , further comprising: collecting a first group of the one or more log entries from a first dependency graph of the one or more dependency graphs; collecting a second group of the one or more log entries from a second dependency graph of the one or more dependency graphs; and combining the first group of the one or more log entries and the second group of the one or more log entries into a same communication message, wherein the same communication message is communicated to the target file system.

Description

TECHNICAL FIELD The present invention relates generally to file systems, and more particularly, but not exclusively, to managing file system transaction dependencies. BACKGROUND Modern computing often requires the collection, processing, or storage of very large data sets or file systems. Accordingly, to accommodate the capacity requirements as well as other requirements, such as, high availability, redundancy, latency/access considerations, or the like, modern file systems may be very large or distributed across multiple hosts, networks, or data centers, and so on. Further, reliable or highly-available file systems may be expected to perform various actions to operate, recover from errors, perform backups, rebalancing data, or the like, all the while maintaining file system consistency across the entirety of the file system. Further, more recently organizations are increasingly relying on distributed resources, including distributed/work-from-home employees, geographically distant work centers, geographically distant data centers, and so on. Often these distant/separate resources need to share data. Using a central file system may enable some shared access across far distances, however, in many data intensive workflows, relying on distantly located file systems may have various disadvantages, including poor responsiveness, redundant data copying, dependence on unreliable long distance communication, connectivity, excess network resource consumption, or the like. Thus, it is with respect to these considerations and others that the present invention has been made. BRIEF DESCRIPTION OF THE DRAWINGS Non-limiting and non-exhaustive embodiments of the present innovations are described with reference to the following drawings. In the drawings, like reference numerals refer to like parts throughout the various figures unless otherwise specified. For a better understanding of the described innovations, reference will be made to the following Detailed Description of Various Embodiments, which is to be read in association with the accompanying drawings, wherein: FIG. 1 illustrates a system environment in which various embodiments may be implemented; FIG. 2 illustrates a schematic embodiment of a client computer; FIG. 3 illustrates a schematic embodiment of a network computer; FIG. 4 illustrates a logical architecture of a system for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 5 illustrates a logical schematic of a file system for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 6 illustrates a logical schematic of a distributed file system for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 7 illustrates a logical schematic for a system for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 8 illustrates a logical schematic for a system for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 9 illustrates a logical schematic of a system for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 10 illustrates a logical schematic of a transaction log for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 11 illustrates a logical schematic of a system for collecting and applying telemetry information and telemetry metrics for file system administration and managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 12 illustrates an overview flowchart for a process for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 13 illustrates a flowchart for a process for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 14 illustrates a flowchart for a process for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 15 illustrates a flowchart for a process for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 16 illustrates a flowchart for a process for managing file system transaction dependencies in accordance with one or more of the various embodiments; FIG. 17 illustrates a flowchart for a process for managing file system transaction dependencies in accordance with one or more of the various embodiments; and FIG. 18 illustrates a flowchart for a process for collecting and applying telemetry information and telemetry metrics for managing file system transaction dependencies in accordance with one or more of the various embodiments. DETAILED DESCRIPTION OF VARIOUS EMBODIMENTS Various embodiments now will be described more fully here