US-12625880-B1 - Generating a version vector for a data object in a distributed system to determine whether the data object is current

Abstract

A distributed system includes multiple servers that various client devices access. Each server maintains a copy of a data object, so the data object is accessible regardless of which server is accessed. Each server may modify its local copy of the data object, with the modifications propagated to other servers. A server generates a local version vector that includes pairs, each comprising a server identifier of a server from which modifications were received and a server-specific version identifier for those modifications. When a client device retrieves the data object, the client device transmits a client version vector for the data object, which includes a pair of a server identifier and a server-specific version identifier for each server from which the client device received version events for modifications to the data object. The server compares the local version vector to the client version vector to determine whether to transmit a local copy of the data object to the client device.
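
The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes version identifiers are integer sequence numbers (one of the options the claims mention), represents a version vector as a dictionary, and uses the hypothetical names `VersionVector` and `includes`.

```python
from typing import Dict

# Assumed representation: server identifier -> server-specific version identifier.
VersionVector = Dict[str, int]


def includes(local: VersionVector, client: VersionVector) -> bool:
    """Return True when the local vector includes the client vector.

    Every server identifier in the client vector must appear in the local
    vector with a version that is at least as recent (not less than) the
    client's version for that server.
    """
    for server_id, client_version in client.items():
        local_version = local.get(server_id)
        if local_version is None or local_version < client_version:
            return False
    return True


local_vector = {"server-a": 7, "server-b": 3}
print(includes(local_vector, {"server-a": 5}))                 # True
print(includes(local_vector, {"server-a": 5, "server-b": 4}))  # False
print(includes(local_vector, {"server-c": 1}))                 # False
```

Versions are only ever compared between pairs that share a server identifier, so no ordering across servers is needed.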

Inventors

  • Jiang Wu
  • Maxime Lasserre
  • Stephane Major

Assignees

  • MANGO TECHNOLOGIES, INC.

Dates

Publication Date
20260512
Application Date
20250221

Claims (20)

  1. A method comprising: storing, at a server of a distributed system, a data object; storing an updated version of the data object at the server in response to receiving one or more modifications to the data object; storing a server-specific version identifier generated by the server in association with an object identifier of the data object at the server; generating, by the server, a version vector for the data object, the version vector including a pair including a server identifier of the server and the server-specific version identifier of the data object generated by the server; transmitting a version event from the server to one or more client devices, the version event including an object identifier for the data object and the version vector for the data object; receiving, at an additional server of the distributed system that is different from the server and that includes a copy of the data object, a request from a client device including the data object identifier and a client version vector for the data object, the client version vector including one or more client pairs, each client pair including a server identifier of a server from which the client device received one or more version events including the object identifier and a client server-specific version identifier generated by the server from which the client device received one or more version events including the object identifier; and transmitting the copy of the data object from the additional server to the client device in response to the additional server determining a local version vector associated with the object identifier by the additional server includes the client version vector, the local version vector including a local pair for each server from which the additional server obtained one or more modifications to the data object, the local pair including a server identifier of a server from which the additional server obtained one or more modifications to the data object and a local server-specific version identifier generated by the server from which the additional server obtained one or more modifications to the data object.
  2. The method of claim 1, wherein the additional server determining the local version vector associated with the object identifier by the additional server includes the client version vector comprises: the additional server determining the local version vector includes each server identifier of a server from which the client device received one or more version events in the client version vector; and for each client pair included in the client version vector, the additional server identifying a local pair included in the local version vector having the server identifier of the server from which the client device received one or more version events including the object identifier and determining the identified local pair includes a local server-specific version identifier that is at least as recent as the client server-specific version identifier included in the client pair including the server identifier of the server from which the client device received one or more version events including the object identifier.
  3. The method of claim 2, wherein the client server-specific version identifier comprises a client monotonically increasing sequence number and the local server-specific version identifier comprises a local monotonically increasing sequence number.
  4. The method of claim 3, wherein determining the identified local pair includes the local server-specific version identifier that is at least as recent as the client server-specific version identifier included in the client pair including the server identifier of the server from which the client device received one or more version events including the object identifier comprises: determining the local monotonically increasing sequence number in the identified local pair in the local version vector is not less than the client monotonically increasing sequence number in the client pair in the client version vector.
  5. The method of claim 2, wherein a server-specific version identifier generated by the server from which the client device received one or more version events comprises a timestamp generated by the server from which the client device received one or more version events and a server-specific version identifier generated by the server from which the additional server received modifications to the copy of the data object comprises an additional timestamp generated by the server from which the additional server received modifications to the copy of the data object.
  6. The method of claim 5, wherein determining the identified local pair includes the local server-specific version identifier that is at least as recent as the client server-specific version identifier included in the client pair including the server identifier of the server from which the client device received one or more version events including the object identifier comprises: determining the additional timestamp in the identified local pair in the local version vector is not less than the timestamp in the client pair in the client version vector.
  7. The method of claim 1, further comprising: transmitting an indication from the additional server to the client device that the copy of the data object stored on the additional server is not current in response to the additional server determining the local version vector associated with the object identifier by the additional server does not include the client version vector.
  8. The method of claim 7, wherein determining the local version vector associated with the object identifier by the additional server does not include the client version vector comprises: the additional server determining at least one local pair in the local version vector having a server identifier matching a server identifier included in a client pair in the client version vector includes a local server-specific version identifier that is older than the client server-specific version identifier included in a pair in the client version vector including the server identifier.
  9. A method comprising: storing a data object at a server of a distributed system, the data object associated with an object identifier; generating, by the server, a local version vector for the data object, the local version vector including a local pair for each server from which the server obtained one or more modifications to the data object, the local pair including a server identifier of a server from which the server obtained one or more modifications to the data object and a local server-specific version identifier generated by the server from which the server obtained one or more modifications to the data object; storing the local version vector at the server in association with the object identifier; receiving a request from a client device including the object identifier and a client version vector for the data object, the client version vector including one or more client pairs, each client pair including a server identifier of a server from which the client device received a version event including the object identifier and a client server-specific version identifier generated by the server from which the client device received the version event including the object identifier; and in response to the server determining the local version vector associated with the object identifier by the server includes the client version vector, transmitting the data object from the server to the client device.
  10. The method of claim 9, wherein determining the local version vector associated with the object identifier by the server includes the client version vector comprises: determining the local version vector includes each server identifier included in the client version vector; and for each client pair included in the client version vector, the server identifying a local pair included in the local version vector having the server identifier of the server from which the client device received one or more version events including the object identifier and determining the identified local pair includes a local server-specific version identifier that is at least as recent as the client server-specific version identifier included in the client pair including the server identifier of the server from which the client device received one or more version events including the object identifier.
  11. The method of claim 10, wherein the client server-specific version identifier comprises a client monotonically increasing sequence number and the local server-specific version identifier comprises a local monotonically increasing sequence number.
  12. The method of claim 11, wherein determining the identified local pair includes the local server-specific version identifier that is at least as recent as the client server-specific version identifier included in the client pair including the server identifier of the server from which the client device received one or more version events including the object identifier comprises: determining the local monotonically increasing sequence number in the identified local pair in the local version vector is not less than the client monotonically increasing sequence number in the client pair in the client version vector.
  13. The method of claim 10, wherein a server-specific version identifier generated by the server from which the client device received one or more version events comprises a timestamp generated by the server from which the client device received one or more version events and a server-specific version identifier generated by the server from which the server obtained one or more modifications to the data object comprises an additional timestamp generated by the server from which the server obtained one or more modifications to the data object.
  14. The method of claim 13, wherein determining the identified local pair includes the local server-specific version identifier that is at least as recent as the client server-specific version identifier included in the client pair including the server identifier of the server from which the client device received one or more version events including the object identifier comprises: determining the additional timestamp in the identified local pair in the local version vector is not less than the timestamp in the client pair in the client version vector.
  15. The method of claim 9, further comprising: in response to the server determining the local version vector associated with the object identifier by the server does not include the client version vector, transmitting an indication from the server to the client device that the data object stored by the server is not current.
  16. The method of claim 15, wherein determining the local version vector associated with the object identifier by the server does not include the client version vector comprises: determining at least one local pair in the local version vector having a server identifier matching a server identifier included in a client pair in the client version vector includes a local server-specific version identifier that is older than the client server-specific version identifier included in a pair in the client version vector including the server identifier included in the client pair.
  17. A computer program product comprising a non-transitory computer readable storage medium having instructions encoded thereon that, when executed by a processor, cause the processor to perform steps comprising: storing a data object at a server of a distributed system, the data object associated with an object identifier; generating, by the server, a local version vector for the data object, the local version vector including a local pair for each server from which the server obtained one or more modifications to the data object, the local pair including a server identifier of a server from which the server obtained one or more modifications to the data object and a local server-specific version identifier generated by the server from which the server obtained one or more modifications to the data object; storing the local version vector at the server in association with the object identifier; receiving a request from a client device including the object identifier and a client version vector for the data object, the client version vector including one or more client pairs, each client pair including a server identifier of a server from which the client device received a version event including the object identifier and a client server-specific version identifier generated by the server from which the client device received the version event including the object identifier; and in response to the server determining the local version vector associated with the object identifier by the server includes the client version vector, transmitting the data object from the server to the client device.
  18. The computer program product of claim 17, wherein determining the local version vector associated with the object identifier by the server includes the client version vector comprises: determining the local version vector includes each server identifier included in the client version vector; and for each client pair included in the client version vector, the server identifying a local pair included in the local version vector having the server identifier of the server from which the client device received one or more version events including the object identifier and determining the identified local pair includes a local server-specific version identifier that is at least as recent as the client server-specific version identifier included in the client pair including the server identifier of the server from which the client device received one or more version events including the object identifier.
  19. The computer program product of claim 17, wherein the non-transitory computer readable storage medium further has instructions encoded thereon that, when executed by the processor, cause the processor to perform steps comprising: in response to the server determining the local version vector associated with the object identifier by the server does not include the client version vector, transmitting an indication from the server to the client device that the data object stored by the server is not current.
  20. The computer program product of claim 19, wherein determining the local version vector associated with the object identifier by the server does not include the client version vector comprises: determining at least one local pair in the local version vector having a server identifier matching a server identifier included in a client pair in the client version vector includes a local server-specific version identifier that is older than the client server-specific version identifier included in a pair in the client version vector including the server identifier included in the client pair.
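
The flow recited in the independent claims (modify, emit a version event, propagate, then answer a client request) might be sketched as below. This is an illustrative sketch under stated assumptions, not the claimed implementation: the names `Server`, `VersionEvent`, `modify`, `apply_remote`, and `handle_request` are hypothetical, version identifiers are assumed to be per-server sequence numbers, each version event is assumed to carry only the emitting server's own pair, and conflict resolution between concurrent modifications is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Assumed representation: server identifier -> server-specific version identifier.
VersionVector = Dict[str, int]


@dataclass
class VersionEvent:
    object_id: str
    vector: VersionVector


@dataclass
class Server:
    server_id: str
    objects: Dict[str, str] = field(default_factory=dict)
    vectors: Dict[str, VersionVector] = field(default_factory=dict)
    sequence: int = 0  # assumption: one monotonically increasing counter per server

    def modify(self, object_id: str, content: str) -> VersionEvent:
        """Apply a local modification and build the version event sent to clients."""
        self.sequence += 1
        self.objects[object_id] = content
        self.vectors.setdefault(object_id, {})[self.server_id] = self.sequence
        return VersionEvent(object_id, {self.server_id: self.sequence})

    def apply_remote(self, object_id: str, content: str,
                     origin_id: str, origin_version: int) -> None:
        """Apply a modification propagated from another server (no conflict handling)."""
        vector = self.vectors.setdefault(object_id, {})
        if origin_version > vector.get(origin_id, 0):
            self.objects[object_id] = content
            vector[origin_id] = origin_version

    def handle_request(self, object_id: str,
                       client_vector: VersionVector) -> Tuple[str, Optional[str]]:
        """Return the object if the local vector includes the client vector."""
        local = self.vectors.get(object_id, {})
        current = all(
            server_id in local and local[server_id] >= version
            for server_id, version in client_vector.items()
        )
        if current:
            return ("object", self.objects.get(object_id))
        return ("not_current", None)


a = Server("A")
b = Server("B", objects={"doc": "v0"})       # B holds an older copy
event = a.modify("doc", "v1")                # client receives this version event
client_vector = dict(event.vector)
print(b.handle_request("doc", client_vector))  # ('not_current', None)
b.apply_remote("doc", "v1", "A", 1)            # modification reaches B
print(b.handle_request("doc", client_vector))  # ('object', 'v1')
```

In this sketch the "not current" reply corresponds to the indication of claims 7, 15, and 19. What the client does next (retry, or ask another server) is not covered by the claims shown here.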

Description

BACKGROUND
A distributed system includes multiple servers that communicate with each other. Each server maintains a local copy of data objects comprising data, such as documents, files, tasks, or other types of data. When a user accesses the distributed system via a client device, the client device accesses a particular server in the distributed system. For example, the distributed system has different servers in different geographic areas, and a user accesses the server in the geographic area nearest to the user's client device. Maintaining copies of a particular data object on multiple different servers allows a user to retrieve the data object from the distributed system regardless of which server of the distributed system is accessed.
Many distributed systems allow multiple servers to modify a data object, with a server that modifies the data object propagating the modification to other servers. Each of the other servers modifies its locally stored copy of the data object accordingly. To improve data throughput, a distributed system allows different servers to temporarily maintain different versions of the data object while modifications to the data object are propagated to various servers. This allows a server to locally modify a copy of the data object without immediately communicating the modification to other servers and verifying that the modification was successfully made at the other servers.
However, when a client device requests the data object from a server, allowing different servers to temporarily maintain different versions of the data object prevents the server from determining whether its copy of the data object has incorporated modifications made by one or more other servers. This may result in the client device obtaining an older version of the data object, with incomplete or outdated content, from the server.
Some conventional distributed systems provide consistency across multiple copies of a data object on different servers by having a single server modify the data object and preventing other servers from modifying the data object. However, limiting data object modification to a single server of the distributed system increases the amount of time for the data object to be modified and limits the rate at which the data object may be modified. Further, limiting data object modification to a single server increases network traffic to the single server capable of modifying data objects.
Other conventional distributed systems may implement cross-server transactions, where multiple servers are capable of modifying a data object, but a server that modifies a data object transmits the modifications to each other server of the distributed system and waits for verification of successful propagation across servers. However, communicating and verifying modifications of a data object to different servers increases the amount of time for the data object to be modified across the distributed system before access by client devices, increasing latency for data object modification.
Some distributed systems avoid transmitting modifications of a data object to other servers at the time a server modifies the data object, and instead add a version to the data object. A version is a monotonically increasing value associated with an identifier of the data object. Examples of a version include a sequence number or a timestamp. The data object and a corresponding version of the data object are stored on every server of such a distributed system.
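
A single-version scheme of the kind described in the background might look like the sketch below. It is illustrative only: `VersionedObject`, `SingleVersionStore`, `put`, and `is_current` are hypothetical names, the version is assumed to be an integer sequence number, and one store is assumed to be the only writer that assigns versions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VersionedObject:
    content: str
    version: int  # monotonically increasing; a sequence number in this sketch


class SingleVersionStore:
    """Stores each data object with one version keyed by its object identifier."""

    def __init__(self) -> None:
        self._objects: Dict[str, VersionedObject] = {}

    def put(self, object_id: str, content: str) -> int:
        """Store new content and return the next version for this object."""
        current = self._objects.get(object_id)
        next_version = 1 if current is None else current.version + 1
        self._objects[object_id] = VersionedObject(content, next_version)
        return next_version

    def is_current(self, object_id: str, seen_version: int) -> bool:
        """True if the stored copy is at least as recent as the version a client saw."""
        current = self._objects.get(object_id)
        return current is not None and current.version >= seen_version


store = SingleVersionStore()
store.put("doc", "draft")
latest = store.put("doc", "final")
print(latest)                               # 2
print(store.is_current("doc", latest))      # True
print(store.is_current("doc", latest + 1))  # False
```

The single integer is only meaningful while one writer assigns it. Two servers incrementing independently could hand out the same version for different content.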
However, there are challenges in maintaining a consistent version of a data object across multiple servers of a distributed system that enables multiple servers to independently modify the data object. Creating a globally comparable sequence number for a version of the data object requires synchronizing the sequence number across servers, so each server must communicate with every other server. If such a distributed system uses timestamps for versions of a data object, a timestamp from one server is not comparable to a timestamp from another server because of clock drift between different servers. Accordingly, many distributed systems provide version consistency by allowing only a single server at a time to receive modifications to a data object from client devices and to store them to the data object. With a single server modifying a data object at a time, that server determines the version of the data object, ensuring the version is monotonically increasing. However, limiting modification of a data object to a single server limits the rate at which the data object may be modified.
SUMMARY
In accordance with one or more aspects of the disclosure, a distributed syste