
US-12627733-B2 - Techniques for coordinating parallel performance and cancellation of commands in a storage cluster system

US 12627733 B2

Abstract

Various embodiments are directed to techniques for coordinating at least partially parallel performance and cancellation of data access commands between nodes of a storage cluster system. An apparatus may include a processor component of a first node coupled to a first storage device storing client device data; an access component to perform replica data access commands of replica command sets on the client device data, each replica command set assigned a set ID; a communications component to analyze a set ID included in a network packet to determine whether a portion of a replica command set in the network packet is redundant, and to reassemble the replica command set from the portion if the portion is not redundant; and an ordering component to provide the communications component with set IDs of replica command sets of which the access component has fully performed the set of replica data access commands.
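The redundancy check described in the abstract can be sketched in code. The following is a hypothetical illustration only — the class and method names are invented, not taken from the patent: the ordering component reports the set IDs of fully performed replica command sets, and the communications component uses those IDs to discard redundant retransmissions before reassembling a replica command set from the portions carried in network packets.

```python
class CommunicationsComponent:
    """Hypothetical sketch: tracks completed set IDs and reassembles
    replica command sets from portions received in network packets."""

    def __init__(self):
        self.completed_set_ids = set()  # reported by the ordering component
        self.partial_sets = {}          # set_id -> {portion_index: payload}

    def mark_fully_performed(self, set_id):
        """Ordering component reports a replica command set as fully performed."""
        self.completed_set_ids.add(set_id)
        self.partial_sets.pop(set_id, None)

    def receive_portion(self, set_id, index, total, payload):
        """Return the reassembled replica command set once all portions
        have arrived; return None if the portion is redundant (its set
        was already fully performed) or the set is still incomplete."""
        if set_id in self.completed_set_ids:
            return None  # redundant: this set was already performed
        portions = self.partial_sets.setdefault(set_id, {})
        portions[index] = payload
        if len(portions) == total:
            del self.partial_sets[set_id]
            return b"".join(portions[i] for i in range(total))
        return None
```

In this sketch, a retransmitted portion of an already-performed set is silently dropped, which is one plausible way to realize the "determine whether a portion ... is redundant" step.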

Inventors

  • Manoj Sundararajan
  • Paul Yuedong Mu
  • Paul Ngan

Assignees

  • NETAPP, INC.

Dates

Publication Date
2026-05-12
Application Date
2024-04-29

Claims (20)

  1. A method comprising: receiving data access commands for execution by a first node; identifying a first subset of the data access commands that target a first version of a database; grouping the first subset of the data access commands into a first command set; creating a first replica command set comprising replica data access commands that are replicas of the first subset of the data access commands within the first command set; and replicating the first replica command set to a second node for execution.
  2. The method of claim 1, comprising: identifying a second subset of the data access commands that target a second version of the database; and grouping the second subset of the data access commands into a second command set.
  3. The method of claim 2, comprising: creating a second replica command set comprising replica data access commands that are replicas of the second subset of the data access commands within the second command set.
  4. The method of claim 3, comprising: replicating the second replica command set to the second node for execution.
  5. The method of claim 1, comprising: assigning sequence identifiers to each of the data access commands within the first command set.
  6. The method of claim 5, comprising: assigning matching sequence identifiers to each of the replica data access commands within the first replica command set.
  7. The method of claim 6, wherein a matching sequence identifier assigned to a replica data access command matches a sequence identifier of a data access command for which the replica data access command is a replica.
  8. A non-transitory machine readable medium comprising instructions for performing a method, which when executed by a machine, causes the machine to perform operations comprising: receiving data access commands for execution by a first node; identifying a first subset of the data access commands that target a first version of a database; grouping the first subset of the data access commands into a first command set; creating a first replica command set comprising replica data access commands that are replicas of the first subset of the data access commands within the first command set; and replicating the first replica command set to a second node for execution.
  9. The non-transitory machine readable medium of claim 8, wherein the operations comprise: identifying a second subset of the data access commands that target a second version of the database; and grouping the second subset of the data access commands into a second command set.
  10. The non-transitory machine readable medium of claim 9, wherein the operations comprise: creating a second replica command set comprising replica data access commands that are replicas of the second subset of the data access commands within the second command set.
  11. The non-transitory machine readable medium of claim 10, wherein the operations comprise: replicating the second replica command set to the second node for execution.
  12. The non-transitory machine readable medium of claim 8, wherein the operations comprise: assigning sequence identifiers to each of the data access commands within the first command set.
  13. The non-transitory machine readable medium of claim 12, wherein the operations comprise: assigning matching sequence identifiers to each of the replica data access commands within the first replica command set.
  14. The non-transitory machine readable medium of claim 13, wherein a matching sequence identifier assigned to a replica data access command matches a sequence identifier of a data access command for which the replica data access command is a replica.
  15. A computing device comprising: a memory comprising machine executable code; and a processor coupled to the memory, the processor configured to execute the machine executable code to cause the processor to perform operations comprising: receiving data access commands for execution by a first node; identifying a first subset of the data access commands that target a first version of a database; grouping the first subset of the data access commands into a first command set; creating a first replica command set comprising replica data access commands that are replicas of the first subset of the data access commands within the first command set; and replicating the first replica command set to a second node for execution.
  16. The computing device of claim 15, wherein the operations comprise: identifying a second subset of the data access commands that target a second version of the database; and grouping the second subset of the data access commands into a second command set.
  17. The computing device of claim 16, wherein the operations comprise: creating a second replica command set comprising replica data access commands that are replicas of the second subset of the data access commands within the second command set.
  18. The computing device of claim 17, wherein the operations comprise: replicating the second replica command set to the second node for execution.
  19. The computing device of claim 15, wherein the operations comprise: assigning sequence identifiers to each of the data access commands within the first command set.
  20. The computing device of claim 19, wherein the operations comprise: assigning matching sequence identifiers to each of the replica data access commands within the first replica command set.
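The method recited in the claims — grouping incoming data access commands by the database version they target, building a replica command set, and assigning matching sequence identifiers to originals and replicas — might be sketched as follows. This is a hypothetical illustration only; the function name and data shapes are assumptions, not part of the claims.

```python
from collections import defaultdict

def build_replica_sets(commands):
    """Hypothetical sketch of the claimed grouping and replication.

    commands: iterable of (version, command) pairs, where version is the
    database version a command targets.
    Returns {version: (command_set, replica_command_set)}, where each set
    is a list of (sequence_id, command) pairs and a replica carries a
    sequence identifier matching that of its original command."""
    grouped = defaultdict(list)
    for version, cmd in commands:
        grouped[version].append(cmd)  # one subset per targeted version
    result = {}
    for version, cmds in grouped.items():
        command_set = list(enumerate(cmds))        # assign sequence IDs
        replica_set = [(seq, cmd) for seq, cmd in command_set]  # matching IDs
        result[version] = (command_set, replica_set)
    return result
```

In a storage cluster, each `replica_set` would then be transmitted to the second node for execution; the matching sequence identifiers let the nodes correlate a replica with its original command.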

Description

RELATED APPLICATIONS

This application claims priority to and is a continuation of U.S. patent application Ser. No. 17/989,102, filed on Nov. 17, 2022 and titled “TECHNIQUES FOR COORDINATING PARALLEL PERFORMANCE AND CANCELLATION OF COMMANDS IN A STORAGE CLUSTER SYSTEM,” which claims priority to and is a continuation of U.S. Pat. No. 11,509,718, filed on Jan. 28, 2020 and titled “TECHNIQUES FOR COORDINATING PARALLEL PERFORMANCE AND CANCELLATION OF COMMANDS IN A STORAGE CLUSTER SYSTEM,” which claims priority to and is a continuation of U.S. Pat. No. 10,587,668, filed on Sep. 19, 2014 and titled “TECHNIQUES FOR COORDINATING PARALLEL PERFORMANCE AND CANCELLATION OF COMMANDS IN A STORAGE CLUSTER SYSTEM,” which are incorporated herein by reference.

BACKGROUND

Remotely accessed storage cluster systems may include multiple interconnected nodes that may be geographically dispersed to perform the storage of client device data in a fault-tolerant manner and to enable the speedy retrieval of that data. Each of such nodes may include multiple interconnected modules, each of which may be specialized to perform a portion of the tasks of storing and retrieving client device data. Distant communications may need to occur on short notice among multiple ones of such nodes to coordinate handling of an error that may arise in the performance of such tasks. Thus, the architectures of such storage cluster systems may be quite complex. In contrast, client devices may not be configured to monitor and/or control aspects of such complex architectures or the complexities of the manner in which they achieve fault tolerance. Client devices may communicate with storage cluster systems using protocols that are not well suited to convey the details of such complexities, and client devices may employ operating systems that provide little flexibility in dealing with delays arising from such complexities.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example embodiment of a storage cluster system.
FIG. 2A illustrates an example embodiment of a pair of high availability groups of a cluster.
FIG. 2B illustrates an example embodiment of a pair of high availability groups of different clusters.
FIG. 3 illustrates an example embodiment of a HA group of partnered nodes.
FIG. 4 illustrates an example embodiment of duplication and storage of metadata within a shared set of storage devices.
FIG. 5A illustrates an example embodiment of replication of commands between nodes.
FIG. 5B illustrates an example embodiment of relaying responses to replicated commands between nodes.
FIG. 6 illustrates an example embodiment of synchronization of commands and metadata among nodes.
FIG. 7 illustrates an example embodiment of a mesh of communications sessions among nodes.
FIG. 8 illustrates an example embodiment of active nodes of different HA groups exchanging replica data access commands.
FIGS. 9A and 9B, together, illustrate an example embodiment of formation and transmission of replica command sets between two active nodes.
FIGS. 10A and 10B, together, illustrate an example embodiment of transmission of replica command sets in portions within network packets between two active nodes.
FIG. 11 illustrates a logic flow according to an embodiment.
FIG. 12 illustrates a logic flow according to an embodiment.
FIG. 13 illustrates a logic flow according to an embodiment.
FIG. 14 illustrates a logic flow according to an embodiment.
FIG. 15 illustrates a processing architecture according to an embodiment.

DETAILED DESCRIPTION

Various embodiments are generally directed to techniques for coordinating the at least partially parallel performance and cancellation of data access commands between nodes of a storage cluster system. In a storage cluster system, multiple nodes may be grouped into two or more clusters that may each be made up of one or more high availability (HA) groups of nodes. The two or more clusters may be positioned at geographically distant locations and may be coupled via one or more interconnects extending through networks such as the Internet or dedicated leased lines. A single node of a HA group of each cluster may be an active node that communicates with the other(s) via an active communications session to exchange replicas of data access commands to enable at least partially parallel performance of those data access commands to synchronize the state of the client device data between their HA groups. Those active nodes may also exchange cancel commands to enable at least partially parallel performance of cancellation of data access commands to again synchronize the state of the client device data between their HA groups. Further, one of those active nodes may additionally communicate with one or more client devices to receive requests for storage services and to translate those requests into the data access commands. Within each HA group, at least one other node may be an inactive node partnered with the active node and prepared via duplication