EP-4736021-A1 - DYNAMICALLY RESIZING ROOT NODES OF HIERARCHICAL DATA STRUCTURES
Abstract
A processing system dynamically adjusts the size of root nodes in hierarchical data structures, such as B+ trees. Upon creation, a root node has a predefined initial size. When a request is received to insert a new record, a determination is made as to whether the size of the root node has reached a predefined maximum size. If the root node has not reached the predefined maximum size, the size of the root node is increased to accommodate the new record and the new record is added to the root node. If the root node has reached the predefined maximum size, one or more new child nodes are created, and records are moved from the root node to the new child nodes. The size of the root node is then reduced to a predefined minimum size.
Inventors
- CAREY, OMAR
- DAS, RAJSEKHAR
Assignees
- Microsoft Technology Licensing, LLC
Dates
- Publication Date
- 20260506
- Application Date
- 20240619
Claims (15)
- 1. A computer-implemented method, comprising: creating a hierarchical data structure representing at least a portion of a filesystem for a storage volume maintained by a processing system, the hierarchical data structure comprising a root node of a predefined initial size; receiving a request to insert a new record into the hierarchical data structure; determining that a current size of the root node is a not a predefined maximum size; and responsive to determining that the current size of the root node is not the predefined maximum size, increasing the cunent size of the root node to accommodate the new record and adding the new record to the root node.
- 2. The computer-implemented method of claim 1, wherein the hierarchical data structure comprises a B+ tree.
- 3. The computer-implemented method of claim 2, wherein the B+ tree defines a streams run table defining a mapping between virtual cluster numbers and logical cluster numbers associated with the storage volume of the processing system.
- 4. The computer-implemented method of claim 3, wherein the B+ tree is embedded in a file table of the filesystem.
- 5. The computer-implemented method of claim 1, wherein the hierarchical data structure comprises a file table storing metadata for files in a directory of the filesystem, and wherein the hierarchical data structure is embedded in a directory table of the filesystem.
- 6. A computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by a processing system, cause the processing system to: store a hierarchical data structure representing at least a portion of a filesystem for a storage volume maintained by the processing system, the hierarchical data structure comprising at least a root node of a predefined initial size; determine that a current size of the root node of the hierarchical data structure is a predefined maximum size; and responsive to determining that the current size of the root node of the hierarchical data structure is the predefined maximum size, create a new node in the hierarchical data structure, move a record from the root node of the hierarchical data structure to the new node, and reduce the current size of the root node of the hierarchical data structure to a predefined minimum size.
- 7. The computer-readable storage medium of claim 6, wherein the hierarchical data structure comprises a B+ tree.
- 8. The computer-readable storage medium of claim 7, wherein the B+ tree comprises a streams run table defining a mapping between virtual cluster numbers and logical cluster numbers associated with the storage volume of the processing system.
- 9. The computer-readable storage medium of claim 8, wherein the B+ tree is embedded in a file table of the filesystem.
- 10. The computer-readable storage medium of claim 7, wherein the B+ tree comprises a file table storing metadata for files in a directory of the filesystem, and wherein the B+ tree is embedded in a directory table of the filesystem.
- 11. A processing system, comprising: a processor; and a computer-readable storage medium having computer-executable instructions stored thereupon that, when executed by the processing system, cause the processing system to: store a hierarchical data structure corresponding to a filesystem, the hierarchical data structure comprising at least a root node of a predefined initial size; receive a request to insert a new record into the hierarchical data structure; responsive to receiving the request, determine that a current size of the root node is a predefined maximum size; and responsive to determining that the current size of the root node is not the predefined maximum size, increase the current size of the root node to accommodate the new record and store the new record in the root node.
- 12. The processing system of claim 11, wherein the hierarchical data structure comprises a B+ tree.
- 13. The processing system of claim 12, wherein the B+ tree comprises a streams run table embedded in a file table of the filesystem, the streams run table defining a mapping between virtual cluster numbers and logical cluster numbers associated with a storage volume of the processing system.
- 14. The processing system of claim 12, wherein the B+ tree comprises a file table embedded in a directory table of the filesystem, the file table storing metadata for files in a directory of the filesystem.
- 15. The processing system of claim 11 , wherein the new record comprises a leaf record.
Description
DYNAMICALLY RESIZING ROOT NODES OF HIERARCHICAL DATA STRUCTURES BACKGROUND [0001] B+ trees are self-balancing hierarchical data structures with a variable but often large number of children per node. B+ trees frequently have a relatively high branching factor and, as a result, also have a relatively high fanout (i.e., the number of pointers to child nodes from a root or internal node) when compared to other types of tree data structures, such as binary search trees. In general, a tree data structure with a higher fanout has fewer internal nodes and, as a result, requires fewer node traversals to locate a leaf node storing a desired record than a tree data structure having a lower fanout. This property of B+ trees makes them highly suitable for storing data requiring efficient retrieval, such as filesystem metadata. [0002] One disadvantage of using B+ trees to store filesystem metadata results from the fact that the root node of a B+ tree is mostly empty after a push operation moves records from the root node. For example, 16K of space may be initially allocated for a root node in a B+ tree. When the root node reaches its allocated capacity, a push operation occurs that creates two child nodes and moves records from the root node into the new child nodes. Thereafter, each of the new child nodes will contain approximately 8K of data from the root node. The root node, however, will contain only keys and pointers that reference the new child nodes even though 16K of space was previously allocated to it. This results in nearly 16K of allocated but unused storage in the root node. This problem is exacerbated on storage volumes created by filesystems that cause B+ trees to frequently perform push operations. SUMMARY [0003] Technologies are disclosed herein for dynamically resizing root nodes of hierarchical data structures, such as B+ trees. Through implementations of the disclosed technologies, root nodes of hierarchical data structures, such as B+ trees, that are utilized to store filesystem metadata can be dynamically resized in order to reduce the amount of allocated but unused space in the root nodes. For example, the size of a root node of a B+ tree can be dynamically reduced to a minimum size following a push operation, thereby reducing the amount of allocated but unused storage consumed by the root node. Dynamically resizing root nodes of B+ trees in the disclosed manner can also speed up retrieval operations by improving the locality of reference for records stored in the root node of a B+ tree. Other technical benefits not specifically mentioned herein might also be realized through implementations of the disclosed subject matter. [0004] In order to provide aspects of the functionality disclosed herein, a processing system maintains a filesystem for a storage volume. The filesystem utilizes B+ trees to store filesystem metadata for the storage volume. For example, in an embodiment, the filesystem utilizes B+ trees to store streams run tables that specify a mapping between virtual cluster numbers (“VCNs”) and logical cluster numbers (“LCNs”) for files stored on the storage volume. In this embodiment, the filesystem also utilizes B+ trees that define file tables that store metadata for the files on the storage volume. The B+ trees defining the file tables are embedded in directory tables and the B+ trees defining the streams run tables are embedded in file tables, according to embodiments. [0005] When a B+ tree utilized by the filesystem is created, a root node of the B+ tree is created that has a predefined initial size. When a request is received to insert a new record into the B+ tree, a determination is made as to whether the root node has reached a predefined maximum size. If the size of the root node has not yet reached the predefined maximum size, the size of the root node is increased to accommodate the new record. The new record is then added to the root node. In this manner, the root node grows dynamically as new records are added until it reaches the predefined maximum size. [0006] If the root node of the B+ tree has reached the predefined maximum size when a request to add a new record is received, one or more new child nodes (e.g., internal nodes or leaf nodes) of the root node are created in the B+ tree and records are moved from the root node to the newly created child nodes. The size of the root node of the B+ tree is then reduced to a predefined minimum size, for example the minimum size required to store the new record. In this manner, the size of the root node can be dynamically reduced to eliminate allocated but unused space in the root node following a push operation. [0007] The above-described subject matter is implemented as a computer-controlled apparatus, a computer-implemented method, a processing system, or as an article of manufacture such as a computer readable medium in various embodiments disclosed herein. These and various other features will be apparent from a reading of