CN-115398415-B - Method, device and system for writer preselection in distributed data system
Abstract
A metadata service maintains and updates copies of the namespace of a distributed and replicated file system and coordinates updates to the data by generating an ordered set of protocols corresponding to received proposals. The ordered set of protocols specifies the order in which the nodes modify the data stored in the data nodes and cause corresponding changes in the state of the namespace. For each protocol in the generated ordered set of protocols, a respective writer list may be provided that comprises an ordered list of nodes to execute the protocol and make the respective change to the namespace. The ordered set of protocols may then be sent to the plurality of nodes along with, for each protocol in the ordered set, the corresponding writer list or a pre-generated index thereof, and each node in the plurality of nodes may be configured to execute only those protocols for which it is the first-listed node on the received writer list.
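The writer-list rule summarized above can be illustrated with a minimal sketch. It assumes that every node sees the same set of failed or suspected failed nodes; how that set is learned and agreed upon is not shown here. The function names `designated_writer` and `should_execute` and the node identifiers are hypothetical and do not come from the source.

```python
from typing import Optional, Sequence, Set


def designated_writer(writer_list: Sequence[str],
                      suspected_failed: Set[str]) -> Optional[str]:
    """Return the first listed node that is not failed or suspected failed."""
    for node in writer_list:
        if node not in suspected_failed:
            return node
    return None  # every listed node is failed or suspected failed


def should_execute(node: str,
                   writer_list: Sequence[str],
                   suspected_failed: Set[str]) -> bool:
    """A node executes a protocol only if it is the designated writer for it."""
    return designated_writer(writer_list, suspected_failed) == node


# Hypothetical three-node writer list for one protocol.
writers = ["n2", "n1", "n3"]
print(should_execute("n2", writers, set()))    # n2 is first listed
print(should_execute("n1", writers, {"n2"}))   # n1 takes over once n2 is suspected
```

Because the list is fixed when the protocol is issued, each node can decide locally whether it is the writer, so no election is needed at execution time.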
Inventors
- Y. Allard
- M. Du contest
- N. Aksa
- R. Turimera
- C mcgee
Assignees
- WANdisco, Inc. (万迪斯科股份有限公司)
Dates
- Publication Date
- 20260508
- Application Date
- 20210519
- Priority Date
- 20200821
Claims (19)
- 1. A computer-implemented method, comprising: receiving a proposal for modifying data stored in a distributed and replicated file system coupled to a network, the distributed and replicated file system comprising a plurality of nodes, each node comprising a server; configuring a metadata service to maintain copies of a namespace of the distributed and replicated file system; coordinating the updating of the data by generating an ordered set of protocols corresponding to the received proposal, the ordered set of protocols specifying an order in which the nodes modify the data stored in data nodes and cause changes in the state of the namespace, wherein each node is configured to delay modifying the data stored in the data nodes and causing changes in the state of the namespace until the ordered set of protocols is received; for each protocol in the generated ordered set of protocols, providing a respective writer list, or identifying an index of the respective writer list, the writer list comprising an ordered list of nodes to execute the protocol and make a respective change to the namespace, each writer list comprising preferred operational nodes ordered towards a first end of the writer list and failed or suspected failed nodes towards a second end of the writer list; transmitting the ordered set of protocols to the plurality of nodes along with, for each protocol in the ordered set of protocols, the respective writer list or the identified index of the respective writer list, each node of the plurality of nodes being configured to execute only those protocols for which it is the first-listed node on the provided writer list or on the writer list identified by the index; enabling, when a previously listed node in the writer list has failed or is suspected of having failed, the next-listed node in the writer list to execute the protocol; and causing the metadata service to update the maintained copy of the namespace.
- 2. The computer-implemented method of claim 1, wherein providing comprises generating a writer list for at least some of the protocols in the generated ordered set of protocols.
- 3. The computer-implemented method of claim 1, wherein providing comprises selecting from a plurality of pre-generated writer lists.
- 4. The computer-implemented method of claim 1, wherein providing further comprises, after a first proposal has been executed, providing the same writer list for a second proposal as for the first proposal.
- 5. The computer-implemented method of claim 1, further comprising enabling a predetermined replacement node to execute a protocol when the first-listed node in the writer list has failed.
- 6. The computer-implemented method of claim 1, further comprising propagating, by each node that has executed a protocol, information related to the executed protocol to each node of the plurality of nodes.
- 7. The computer-implemented method of claim 6, wherein propagating further comprises guaranteeing delivery of the propagated information.
- 8. The computer-implemented method of claim 1, further comprising updating, as the node executes the protocol, a deterministic state machine with information related to the executed protocol, the deterministic state machine coupled to each of the other nodes and serving as a persistent message service between the plurality of nodes.
- 9. The computer-implemented method of claim 1, further comprising learning of a failed or suspected failed node and placing the failed or suspected failed node at the bottom of any generated writer list.
- 10. The computer-implemented method of claim 1, further comprising: pre-generating an indexed writer list for each possible ordering of the plurality of nodes; and distributing the pre-generated indexed writer lists to each of the plurality of nodes, wherein providing a writer list comprises selecting one of the pre-generated indexed writer lists, and wherein transmitting includes transmitting the ordered set of protocols to the plurality of nodes along with an index for each protocol in the ordered set of protocols, the index pointing to the selected one of the pre-generated indexed writer lists.
- 11. A network of servers configured to implement a distributed file system, the distributed file system comprising: a plurality of data nodes, each data node configured to store a data block of a client file; a plurality of nodes, each node comprising a server; a metadata service configured to maintain and update a state of a namespace of the distributed file system in response to a change in a data block of a client file; a distributed coordination engine embedded in the metadata service and configured to coordinate received proposals to update the data blocks by generating an ordered set of protocols corresponding to the received proposals, the ordered set of protocols specifying an order in which the nodes modify the data stored in the data nodes; wherein the metadata service is further configured to, for each protocol in the generated ordered set of protocols, provide a respective writer list or generate an index to the respective writer list, the writer list comprising an ordered list of nodes to execute the protocol and cause a respective change to the namespace, each writer list comprising preferred operational nodes ordered towards a first end of the writer list and failed or suspected failed nodes towards a second end of the writer list; and wherein the metadata service is further configured to send the ordered set of protocols to the plurality of nodes along with, for each protocol in the ordered set of protocols, the corresponding writer list or the pre-generated index thereof, such that each node in the plurality of nodes executes only those protocols for which it is the first-listed node on the writer list and such that, when a previously listed node in the writer list has failed or is suspected of having failed, the next-listed node in the writer list is enabled to execute the protocol.
- 12. The network of servers of claim 11, wherein the metadata service is further configured to generate a writer list for at least some of the protocols in the generated ordered set of protocols.
- 13. The network of servers of claim 11, wherein the metadata service is further configured to select from a plurality of pre-generated writer lists.
- 14. The network of servers of claim 11, wherein the metadata service is further configured to provide, after a first proposal has been executed, the same writer list for a second proposal as for the first proposal.
- 15. The network of servers of claim 11, wherein the metadata service is further configured to enable a predetermined replacement node to execute a protocol when the first-listed node in the writer list has failed.
- 16. The network of servers of claim 11, wherein each node that has executed a protocol is further configured to propagate information related to the executed protocol to each node of the plurality of nodes.
- 17. The network of servers of claim 11, wherein, when executing the protocol, the node is further configured to update a deterministic state machine with information about the executed protocol, the deterministic state machine coupled to each of the other nodes and serving as a persistent message service between the plurality of nodes.
- 18. The network of servers of claim 11, wherein the metadata service is further configured to learn of failed or suspected failed nodes and to place the failed or suspected failed nodes at the bottom of any generated writer list.
- 19. The network of servers of claim 11, wherein the metadata service is further configured to: pre-generate an indexed writer list for each possible ordering of the plurality of nodes; distribute the pre-generated indexed writer lists to each of the plurality of nodes; and send the ordered set of protocols to the plurality of nodes along with an index for each protocol in the ordered set of protocols, the index pointing to a selected one of the pre-generated indexed writer lists.
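Claims 9, 10, 18 and 19 describe pre-generated indexed writer lists and the placement of failed nodes at the bottom of a list. A minimal sketch of one way this could look follows. It assumes that "each possible ordering" means every permutation of the node names and that all nodes build the table from the same sorted input so that indices agree. The function names and node identifiers are hypothetical.

```python
from itertools import permutations
from typing import Dict, Iterable, Sequence, Set, Tuple

WriterList = Tuple[str, ...]


def pregenerate_writer_lists(nodes: Iterable[str]) -> Dict[int, WriterList]:
    """Index every ordering of the nodes; sorting makes the table reproducible."""
    return dict(enumerate(permutations(sorted(nodes))))


def order_with_failed_last(preferred: Sequence[str],
                           failed: Set[str]) -> WriterList:
    """Keep the preferred order, but move failed or suspected nodes to the bottom."""
    operational = [n for n in preferred if n not in failed]
    down = [n for n in preferred if n in failed]
    return tuple(operational + down)


def index_of(table: Dict[int, WriterList], writer_list: WriterList) -> int:
    """Find the index to transmit in place of the full writer list."""
    reverse = {wl: i for i, wl in table.items()}
    return reverse[writer_list]


table = pregenerate_writer_lists(["n1", "n2", "n3"])       # 3! = 6 lists
chosen = order_with_failed_last(("n2", "n1", "n3"), {"n2"})
idx = index_of(table, chosen)
print(idx, table[idx])  # receivers resolve the index with their own copy of the table
```

Sending an index instead of the list keeps each message small, at the cost of a table that grows factorially with the number of nodes (5 nodes already give 120 lists).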
Description
Method, device and system for writer preselection in distributed data system
Technical Field
The present invention relates to the field of distributed, replicated data systems, and more particularly to a method, apparatus, and system for writer preselection in a distributed data system.
Background
The field of the embodiments disclosed herein includes distributed, replicated data systems. Some distributed data systems may define a logical structure called a zone. Each such zone may include a server tasked with executing received commands by writing to and reading from the metadata service (hereinafter MDS) of its zone. Many systems provide such a metadata service through a single server. In applications where read consistency is sacrificed for improved performance, read commands are optionally configured to bypass the server. When there are multiple servers in each zone for failover, a selection must be made to ensure that only one server in each zone is designated to execute a given command at any time, although different commands may have different designated servers. In this case, logic is provided to select a new write-enabled server when the current write-enabled server is deemed to have crashed.
It may be difficult to reliably distinguish between a crashed server and a slow-running server. In fact, when the periodic heartbeat signal indicating that a server continues to function properly is not received within the expected time frame, the server may have indeed crashed, may simply be running slowly, or may be delayed or unable to communicate due to temporary network latency or other reasons. In this case, if a new server is programmatically enabled to write and the old server resumes operation for any reason, there may now be two servers able to execute the same command by writing to back-end storage.
As a result, data integrity is no longer guaranteed, as both servers now independently execute the same command, which may lead to data corruption. Thus, some distributed systems require a human to confirm that the first server is in a terminated or non-operational state before the second server is allowed to write to the back-end data store. This confirms that the server considered to have terminated has actually terminated, thereby preventing two active writers from being present simultaneously in the same zone. Such manual confirmation is not optimal because no data can be written until the currently write-enabled server has been manually confirmed to be disabled.
Disclosure of Invention
The invention provides a method, a device and a system for writer preselection in a distributed data system.
The technical solution provided by the invention is a computer-implemented method comprising: receiving a proposal for modifying data stored in a distributed and replicated file system coupled to a network, the distributed and replicated file system comprising a plurality of nodes, each node comprising a server; configuring a metadata service to maintain copies of a namespace of the distributed and replicated file system; coordinating the updating of the data by generating an ordered set of protocols corresponding to the received proposal, the ordered set of protocols specifying an order in which the nodes modify the data stored in data nodes and cause changes in the state of the namespace, wherein each node is configured to delay modifying the data stored in the data nodes and causing changes in the state of the namespace until the ordered set of protocols is received; for each protocol in the generated ordered set of protocols, providing a respective writer list, or identifying an index of the respective writer list, the writer list comprising an ordered list of nodes to execute the protocol and make a respective change to the namespace, each writer list comprising preferred operational nodes ordered towards a first end of the writer list and failed or suspected failed nodes towards a second end of the writer list; transmitting the ordered set of protocols to the plurality of nodes along with, for each protocol in the ordered set of protocols, the respective writer list or the identified index of the respective writer list, each node of the plurality of nodes being configured to execute only those protocols for which it is the first-listed node on the provided writer list or on the writer list identified by the index; enabling, when a previously listed node in the writer list has failed or is suspected of having failed, the next-listed node in the writer list to execute the protocol; and causing the metadata service to update the maintained copy of the namespace.
Preferably, the providing includes generating a writer list for at least some of the protocols in the generated ordered set of protocols.
Preferably, the providing includes selecting from a plurality of pre-generated writer lists.
Preferably, the providing further comprises, after a first proposal has been executed, providing the same writer list for a second proposal as for the first proposal.
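The ordering and deferral steps of the method summarized above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the global sequence number `seq`, the classes `OrderedProtocol` and `Node`, and the fields `namespace_log` and `writes` are all hypothetical, and failover to the next-listed node is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class OrderedProtocol:
    seq: int                       # assumed position in the agreed global order
    command: str                   # the data modification, e.g. "create /a"
    writer_list: Tuple[str, ...]   # ordered list of candidate writers


@dataclass
class Node:
    name: str
    next_seq: int = 0
    pending: Dict[int, OrderedProtocol] = field(default_factory=dict)
    namespace_log: List[str] = field(default_factory=list)  # namespace changes seen
    writes: List[str] = field(default_factory=list)         # data writes performed

    def receive(self, protocol: OrderedProtocol) -> None:
        """Buffer the protocol, then apply everything that is next in order."""
        self.pending[protocol.seq] = protocol
        while self.next_seq in self.pending:
            p = self.pending.pop(self.next_seq)
            if p.writer_list[0] == self.name:
                self.writes.append(p.command)      # only the first-listed node writes
            self.namespace_log.append(p.command)   # every node tracks the namespace change
            self.next_seq += 1


p0 = OrderedProtocol(0, "create /a", ("n1", "n2"))
p1 = OrderedProtocol(1, "append /a", ("n2", "n1"))
n1, n2 = Node("n1"), Node("n2")
n1.receive(p1)   # arrives early, so it is held back
n1.receive(p0)   # now both are applied in order
n2.receive(p0)
n2.receive(p1)
print(n1.namespace_log == n2.namespace_log, n1.writes, n2.writes)
```

In this sketch both nodes end with the same namespace history regardless of arrival order, while each command is written to storage by exactly one node.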