DE-102022108668-B4 - Journal groups for a metadata housekeeping process
Abstract
A storage system (100) comprising: a processor; a memory (115); and a machine-readable storage storing instructions, wherein the instructions are executable by the processor to: detect a housekeeping operation to perform a set of housekeeping updates on a set of container indexes (220), each container index containing metadata for deduplicated data units; in response to a detection of the housekeeping operation, identify from the set of container indexes (220) a subset of container indexes that are associated with a particular journal group (120; 310), wherein the particular journal group (120; 310) comprises a plurality of journals (130; 320) to store changes to metadata contained in the subset of container indexes, and wherein each journal of the particular journal group (120; 310) is to store changes to metadata contained in an associated container index of the subset of container indexes; and during the housekeeping operation, maintain the particular journal group (120; 310) loaded in the memory (115) until all of the set of housekeeping updates have been stored in the respective journals (130; 320) of the particular journal group (120; 310).
Inventors
- Richard Phillip MAYO
- Callum Murray
- David Malcolm Falkinder
Assignees
- HEWLETT PACKARD ENTERPRISE DEVELOPMENT LP
Dates
- Publication Date
- 20260513
- Application Date
- 20220410
- Priority Date
- 20220127
Claims (20)
- A storage system (100) comprising: a processor; a memory (115); and a machine-readable storage storing instructions, wherein the instructions are executable by the processor to: detect a housekeeping operation to perform a set of housekeeping updates on a set of container indexes (220), each container index comprising metadata for deduplicated data units; in response to a detection of the housekeeping operation, identify from the set of container indexes (220) a subset of container indexes that are associated with a particular journal group (120; 310), wherein the particular journal group (120; 310) comprises a plurality of journals (130; 320) to store changes to metadata contained in the subset of container indexes, with each journal of the particular journal group (120; 310) being intended to store changes to metadata contained in an associated container index of the subset of container indexes; and during the housekeeping operation, maintain the particular journal group (120; 310) loaded in the memory (115) until all of the set of housekeeping updates have been stored in the respective journals (130; 320) of the particular journal group (120; 310).
- The storage system (100) according to Claim 1, wherein the housekeeping operation consists of updating reference counts contained in the set of container indexes (220) based on a stack of manifests (150), and wherein the set of container indexes (220) is referenced by the stack of manifests (150).
- The storage system (100) according to Claim 1, which includes instructions that can be executed by the processor to: load the set of container indexes (220) into memory (115); determine a plurality of reference count changes associated with the set of container indexes (220); and generate a work summary (180) based on the plurality of reference count changes.
- The storage system (100) according to Claim 1, which includes instructions that can be executed by the processor to: identify a plurality of journal groups (120; 310) associated with the set of container indexes (220) using a stored lookup table (186); and sort the set of container indexes (220) in a sorted order according to the associated journal groups (120; 310).
- The storage system (100) according to Claim 4, which includes instructions that can be executed by the processor to: determine, for each journal group of the plurality of journal groups (120; 310), a number of container indexes that are associated with the journal group.
- The storage system (100) according to Claim 5, which includes instructions that can be executed by the processor to: select a particular container index according to the sorted order, the particular container index being associated with the particular journal group (120; 310); and in response to a determination that the particular container index is an initial container index to be processed for the particular journal group (120; 310): load the particular journal group (120; 310) into memory (115); and set a countdown counter (182) equal to the number of container indexes associated with the particular journal group (120; 310).
- The storage system (100) according to Claim 6, which includes instructions that can be executed by the processor to: update the particular journal group (120; 310) to store reference count changes associated with the particular container index; and decrement the countdown counter (182) in response to a determination that the particular journal group (120; 310) contains all reference count changes associated with the particular container index.
- The storage system (100) according to Claim 7, which includes instructions that can be executed by the processor to: in response to a determination that the countdown counter (182) has reached zero: write the particular journal group (120; 310) to persistent memory (140); and remove the particular journal group (120; 310) from memory (115).
- A method (800) comprising: detecting (810), by a storage controller (110) of a deduplication storage system, a housekeeping operation to perform a set of housekeeping updates on a set of container indexes (220), each container index (220) containing metadata for deduplicated data units; in response to a detection of the housekeeping operation, identifying (820), by the storage controller (110), from the set of container indexes (220) a subset of container indexes associated with a particular journal group (120; 310), the particular journal group (120; 310) comprising a plurality of journals (130; 320) to store changes to metadata contained in the subset of container indexes, each journal of the particular journal group (120; 310) being intended to store changes to metadata contained in an associated container index of the subset of container indexes; and during the housekeeping operation, maintaining (830), by the storage controller (110), the particular journal group (120; 310) loaded in a memory (115) until all of the set of housekeeping updates have been stored in the respective journals (130; 320) of the particular journal group (120; 310).
- The method (800) according to Claim 9, wherein the housekeeping operation consists of updating reference counts contained in the set of container indexes (220) based on a stack of manifests (150), and wherein the set of container indexes (220) is referenced by the stack of manifests (150).
- The method (800) according to Claim 9, further comprising: loading the set of container indexes (220) into memory (115); determining a plurality of reference count changes associated with the set of container indexes (220); and generating a work summary (180) based on the plurality of reference count changes.
- The method (800) according to Claim 9, further comprising: identifying a plurality of journal groups (120; 310) associated with the set of container indexes (220) using a stored lookup table (186); and sorting the set of container indexes (220) in a sorted order according to the associated journal groups (120; 310).
- The method (800) according to Claim 12, further comprising: for each journal group of the plurality of journal groups (120; 310), determining a number of container indexes that are assigned to the journal group; selecting a particular container index according to the sorted order, the particular container index being assigned to the particular journal group (120; 310); determining whether the particular container index is an initial container index to be processed for the particular journal group (120; 310); and in response to a determination that the particular container index is the initial container index to be processed for the particular journal group (120; 310): loading the particular journal group (120; 310) into memory (115); and setting a countdown counter (182) equal to the number of container indexes assigned to the particular journal group (120; 310).
- The method (800) according to Claim 13, further comprising: updating the particular journal group (120; 310) to include reference count changes associated with the particular container index; and decrementing the countdown counter (182) in response to a determination that the particular journal group (120; 310) contains all reference count changes associated with the particular container index.
- The method (800) according to Claim 14, further comprising: determining whether the countdown counter (182) has reached zero; and in response to a determination that the countdown counter (182) has reached zero: writing the particular journal group (120; 310) to a persistent memory (140); and removing the particular journal group (120; 310) from the memory (115).
- A non-transitory machine-readable medium (600) that stores instructions which, when executed, cause a processor to: detect (610) a housekeeping operation to perform a set of housekeeping updates on a set of container indexes (220), each container index (220) containing metadata for deduplicated data units; in response to a detection of the housekeeping operation, identify (620) from the set of container indexes (220) a subset of container indexes that are associated with a particular journal group (120; 310), the particular journal group (120; 310) comprising a plurality of journals (130; 320) to store changes to metadata contained in the subset of container indexes, each journal of the particular journal group (120; 310) being intended to store changes to metadata contained in an associated container index of the subset of container indexes; and during the housekeeping operation, maintain (630) the particular journal group (120; 310) loaded in a memory (115) until all of the set of housekeeping updates have been stored in the respective journals (130; 320) of the particular journal group (120; 310).
- The non-transitory machine-readable medium (600) according to Claim 16, wherein the housekeeping operation consists of updating reference counts contained in the set of container indexes (220) based on a stack of manifests (150), and wherein the set of container indexes (220) is referenced by the stack of manifests (150).
- The non-transitory machine-readable medium (600) according to Claim 16, which includes instructions which, when executed, cause the processor to: load the set of container indexes (220) into memory (115); determine a plurality of reference count changes associated with the set of container indexes (220); generate a work summary (180) based on the plurality of reference count changes; using a stored lookup table (186), identify a plurality of journal groups (120; 310) associated with the set of container indexes (220); and sort the set of container indexes (220) in a sorted order according to the associated journal groups (120; 310).
- The non-transitory machine-readable medium (600) according to Claim 18, which includes instructions which, when executed, cause the processor to: determine, for each journal group of the plurality of journal groups (120; 310), a number of container indexes that are assigned to the journal group; select a particular container index according to the sorted order, the particular container index being assigned to the particular journal group (120; 310); and, in response to a determination that the particular container index is an initial container index to be processed for the particular journal group (120; 310): load the particular journal group (120; 310) into memory (115); and set a countdown counter (182) equal to the number of container indexes assigned to the particular journal group (120; 310).
- The non-transitory machine-readable medium (600) according to Claim 19, which includes instructions that, when executed, cause the processor to: update the particular journal group (120; 310) to store reference count changes associated with the particular container index; decrement the countdown counter (182) in response to a determination that the particular journal group (120; 310) contains all reference count changes associated with the particular container index; and in response to a determination that the countdown counter (182) has reached zero: write the particular journal group (120; 310) to persistent memory (140); and remove the particular journal group (120; 310) from memory (115).
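For orientation only, the flow recited in the dependent claims (sort the container indexes by journal group, load a journal group when its first container index is reached, count down as each index is journaled, write the group back at zero) might be sketched as follows. This is a minimal illustration and not the claimed implementation: the names `JournalGroup` and `run_housekeeping`, the dictionary-based lookup table, and the modelling of persistent memory as a plain dictionary are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class JournalGroup:
    """Hypothetical in-memory journal group: one journal (a list) per container index."""
    group_id: str
    journals: dict = field(default_factory=dict)


def run_housekeeping(refcount_changes, lookup_table, persistent_store):
    """Journal reference count changes one journal group at a time.

    refcount_changes: {container_index_id: [delta, ...]}  (stands in for the work summary)
    lookup_table:     {container_index_id: journal_group_id}
    persistent_store: {journal_group_id: {container_index_id: [delta, ...]}}
    Returns how many times a journal group was loaded into memory.
    """
    # Sort container indexes so members of the same journal group are adjacent.
    ordered = sorted(refcount_changes, key=lambda ci: (lookup_table[ci], ci))
    # Number of container indexes to process for each journal group.
    per_group = Counter(lookup_table[ci] for ci in ordered)

    loads = 0
    current = None
    countdown = 0
    for ci in ordered:
        gid = lookup_table[ci]
        if current is None:
            # Initial container index for this group: load the group, set the counter.
            stored = persistent_store.get(gid, {})
            current = JournalGroup(gid, {k: list(v) for k, v in stored.items()})
            countdown = per_group[gid]
            loads += 1
        # Store every change for this container index in its own journal.
        current.journals.setdefault(ci, []).extend(refcount_changes[ci])
        countdown -= 1
        if countdown == 0:
            # All updates for the group are journaled: write back, then drop from memory.
            persistent_store[gid] = current.journals
            current = None
    return loads
```

In this sketch, a run over four container indexes that alternate between two journal groups loads each group exactly once, because sorting makes each group's indexes contiguous and the countdown counter defers the write-back until the last of them has been journaled.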
Description
Background
Data reduction techniques can be used to decrease the amount of data stored in a storage system. One example of a data reduction technique is data deduplication. Data deduplication identifies duplicate data units and attempts to reduce or eliminate the number of duplicate data units stored in the storage system.
US 2019/0121705 A1 describes how, in response to an event in the deduplication system, a system accesses the element metadata of a backup element stored in a remote object storage system. The backup element's element metadata contains scope information, which specifies a range of identifier values for sub-objects of the backup element stored in the remote object storage system. Based on this scope information, the system issues queries to retrieve the respective attribute information of the sub-objects of the backup element stored in the remote object storage system. Using this attribute information, the system determines a name for a specific sub-object of the backup element that has already been used.
US 8,712,970 B1 describes a data management method in which a real-time history of a database system is stored as a logical representation, and this logical representation is then used for any point-in-time recovery of the database system. US 8,712,970 B1 provides a method for capturing transaction data, binary data changes, metadata, and events, and for tracking a real-time history of a database system according to these events. The method enables the tracking and storage of consistent checkpoint images of the database system and also allows for the tracking of transaction activity between checkpoints. The database system can be restored to any consistent checkpoint or to any point between two checkpoints.
Brief description
A storage system according to claims 1 to 8, a method according to claims 9 to 15, and a non-transitory machine-readable medium according to claims 16 to 20 are disclosed.
Brief description of the drawings
Some implementations are described with reference to the following figures:
- a schematic diagram of an example storage system according to some implementations;
- a representation of example data structures according to some implementations;
- a representation of example data structures according to some implementations;
- an illustration of an example process according to some implementations;
- illustrations of example data structures according to some implementations;
- a diagram of a machine-readable medium that stores instructions according to some implementations;
- a schematic diagram of an example computing device according to some implementations;
- an illustration of an example process according to some implementations.
In the drawings, identical reference numbers denote similar, but not necessarily identical, elements. The figures are not necessarily to scale, and the size of some parts may be exaggerated to make the example shown clearer. Furthermore, the drawings contain examples and/or implementations that correspond to the description; however, the description is not limited to the examples and/or implementations shown in the drawings.
Detailed description
In this disclosure, the use of the terms "a," "an," or "the" includes the plural forms unless the context clearly indicates otherwise. Similarly, the terms "includes," "including," "comprises," "comprising," "have," or "having," when used in this disclosure, specify the presence of the stated elements but do not exclude the presence or addition of other elements.
In some examples, a storage system can deduplicate data to reduce the amount of space required to store the data. The storage system can perform a deduplication process that divides a data stream into discrete data units or "chunks."
Furthermore, the storage system can determine identifiers or "fingerprints" of incoming data units and determine which incoming data units are duplicates of previously stored data units. In the case of duplicate data units, the storage system can store references to the previous data units instead of storing the duplicate incoming data units themselves. As used here, the term "fingerprint" refers to a value derived by applying a function to the content of the data unit (where "content" can be all or a subset of the data unit's content). An example of a function that can be applied is a hash function, which produces a hash value based on the incoming data unit. Examples of hash functions include cryptographic hash functions such as those of the Secure Hash Algorithm 2 (SHA-2) family, e.g., SHA-224, SHA-256, SHA-384, etc. Other examples may use other types of hash functions or other types of fingerprint functions.
A "storage system" can comprise a storage device or an array of storage devices. A storage system can also include one or more storage controllers that manage access to the storage device(s). A "data unit