US-20260127141-A1 - Incremental Data Replication at Scale

Abstract

Incremental replication systems, methods, and apparatuses are described. An illustrative source storage system receives a request from a first target storage system for replication data associated with a snapshot of a file system, computes a delta between the snapshot and an earlier-in-time snapshot of the file system, sends the delta to the first target storage system, caches the delta, receives a request from a second target storage system for replication data associated with the snapshot of the file system, determines that the cached delta can be used to respond to the request from the second target storage system, and uses the cached delta to send the delta to the second target storage system.
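
The flow summarized in the abstract can be illustrated with a minimal sketch. It is not the implementation described in the application. It assumes, purely for illustration, that a snapshot can be modeled as a mapping from file path to content hash, that a delta is a set of changed and deleted paths, and that cache reuse is decided by a simple time threshold. The names `SourceSystem`, `compute_delta`, `handle_request`, and `ttl_seconds` are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model: a snapshot maps file path -> content hash.
Snapshot = dict[str, str]


def compute_delta(earlier: Snapshot, later: Snapshot) -> dict:
    """Return the paths that changed or were deleted between two snapshots."""
    changed = {p: h for p, h in later.items() if earlier.get(p) != h}
    deleted = sorted(p for p in earlier if p not in later)
    return {"changed": changed, "deleted": deleted}


@dataclass
class SourceSystem:
    """Illustrative source system that caches deltas it has already computed."""
    snapshots: dict[str, Snapshot]
    ttl_seconds: float = 300.0  # assumed time threshold for reusing a delta
    clock: Callable[[], float] = time.monotonic
    _cache: dict = field(default_factory=dict)
    computations: int = 0  # counts how often a delta was actually computed

    def handle_request(self, base_id: str, snap_id: str) -> dict:
        """Serve a target's request for the delta from base_id to snap_id."""
        key = (base_id, snap_id)
        now = self.clock()
        entry = self._cache.get(key)
        # Reuse the cached delta if it was stored within the time threshold.
        if entry is not None and now - entry[0] <= self.ttl_seconds:
            return entry[1]
        delta = compute_delta(self.snapshots[base_id], self.snapshots[snap_id])
        self.computations += 1
        self._cache[key] = (now, delta)
        return delta
```

In this sketch, a first target calling `handle_request("s1", "s2")` triggers the delta computation, and a second target issuing the same request within `ttl_seconds` receives the cached delta without recomputation.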

Inventors

  • MANCHUN ZHENG
  • Abhishek Jain
  • Robert Matych
  • Victor Yip
  • Peter Vajgel
  • Dong Liu

Assignees

  • PURE STORAGE, INC.

Dates

Publication Date: 2026-05-07
Application Date: 2024-11-06

Claims (20)

  1. A source storage system that replicates a file system to a plurality of target storage systems comprising a first target storage system and a second target storage system, the source storage system comprising: a memory storing instructions; and one or more processors communicatively coupled to the memory and configured to execute the instructions to perform a process comprising: receiving a request from the first target storage system for replication data associated with a snapshot of the file system; computing, based on the request from the first target storage system, a delta between the snapshot and an earlier-in-time snapshot of the file system; sending the delta to the first target storage system; caching the delta at the source storage system; receiving, at the source storage system, a request from the second target storage system for replication data associated with the snapshot of the file system; determining, at the source storage system and based on the request from the second target storage system, that the cached delta can be used to respond to the request from the second target storage system; and using the cached delta to send the delta to the second target storage system.
  2. The source storage system of claim 1, wherein the determining that the cached delta can be used to respond to the request from the second target storage system comprises determining that the request from the second target storage system is received within a time threshold.
  3. The source storage system of claim 1, wherein the determining that the cached delta can be used to respond to the request from the second target storage system comprises determining that the cached delta can be logically merged with other incremental replication data to satisfy the request from the second target storage system.
  4. The source storage system of claim 1, wherein the using the cached delta to send the delta to the second target storage system comprises logically merging the cached delta with a different cached delta and sending merged delta data to the second target storage system.
  5. The source storage system of claim 1, wherein the process further comprises: catching up a replica file system at the first target storage system to be synchronized with the file system at the source storage system; and after the catching up, proxying, by the source storage system, I/O operations for the file system between a client and the first target storage system.
  6. The source storage system of claim 5, wherein the proxying comprises the source storage system forwarding I/O requests received from the client to the first target storage system.
  7. The source storage system of claim 6, wherein: the forwarding I/O requests from the client to the first target storage system comprises forwarding file handles included in the I/O requests, the first target storage system configured to use the file handles to process the I/O requests.
  8. The source storage system of claim 5, wherein the proxying comprises capturing and sending file protocol locking information to the first target storage system.
  9. The source storage system of claim 5, wherein the catching up and the proxying are performed for a failover from the source storage system to the first target storage system.
  10. The source storage system of claim 5, wherein the catching up and the proxying are performed for a migration of the file system from the source storage system to the first target storage system.
  11. The source storage system of claim 1, wherein: the request from the first target storage system for replication data associated with the snapshot of the file system comprises a pull request issued by the first target storage system based on a pull schedule of the first target storage system to pull new snapshot data from a plurality of source storage systems that comprises the source storage system; and the pull schedule is configured to distribute resources of the first target storage system fairly among the plurality of source storage systems.
  12. A method of incremental data replication, the method comprising: receiving, by a source storage system that replicates a dataset to a plurality of target storage systems comprising a first target storage system and a second target storage system, a request from the first target storage system for replication data for the dataset; determining, by the source storage system and based on the request from the first target storage system, incremental replication data for the dataset; sending, by the source storage system, the incremental replication data to the first target storage system; caching, by the source storage system, the incremental replication data; receiving, by the source storage system, a request from a second target storage system for replication data for the dataset; determining, by the source storage system and based on the request from the second target storage system, that the cached incremental replication data can be used to respond to the request from the second target storage system; and using, by the source storage system, the cached incremental replication data to send the incremental replication data to the second target storage system.
  13. The method of claim 12, wherein the dataset comprises a file system.
  14. The method of claim 12, wherein the dataset comprises a volume, an object bucket, or a database.
  15. The method of claim 12, wherein: the request from the first target storage system for replication data for the dataset comprises a request for replication data associated with a snapshot of the dataset; and the determining the incremental replication data for the dataset comprises determining a delta between the requested snapshot of the dataset and an earlier-in-time snapshot of the dataset.
  16. The method of claim 12, wherein the determining that the cached incremental replication data can be used to respond to the request from the second target storage system comprises determining that the request from the second target storage system is received within a time threshold.
  17. The method of claim 12, further comprising: catching up a replica dataset at the first target storage system to be synchronized with the dataset at the source storage system; and after the catching up, proxying, by the source storage system, I/O operations for the dataset between a client and the first target storage system.
  18. The method of claim 17, wherein the proxying comprises forwarding I/O requests received from the client to the first target storage system.
  19. The method of claim 18, wherein: the dataset comprises a file system; and the forwarding I/O requests from the client to the first target storage system comprises forwarding file handles included in the I/O requests, the first target storage system configured to use the file handles to process the I/O requests.
  20. A computer program product embodied in a non-transitory computer readable storage medium and comprising computer instructions for: receiving a request from a first target storage system for replication data associated with a snapshot of a file system; computing, based on the request from the first target storage system, a delta between the snapshot and an earlier-in-time snapshot of the file system; sending the delta from a source storage system to the first target storage system; caching the delta at the source storage system; receiving, at the source storage system, a request from a second target storage system for replication data associated with the snapshot of the file system; determining, at the source storage system and based on the request from the second target storage system, that the cached delta can be used to respond to the request from the second target storage system; and using the cached delta to send the delta from the source storage system to the second target storage system.
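
Claims 3 and 4 refer to logically merging a cached delta with a different cached delta. The claims do not specify how the merge is performed. The following is one possible sketch, assuming a hypothetical delta representation (a mapping of changed paths to content hashes plus a list of deleted paths) and two deltas that cover consecutive snapshot intervals. The function names `merge_deltas` and `apply_delta` are illustrative only.

```python
def merge_deltas(first: dict, second: dict) -> dict:
    """Compose a delta for s1 -> s2 with a delta for s2 -> s3 into one for s1 -> s3.

    Each delta is assumed to look like:
        {"changed": {path: content_hash, ...}, "deleted": [path, ...]}
    The merged delta is correct when applied to s1 but is not guaranteed to be
    minimal (for example, it may list a deletion of a path that s1 never had).
    """
    changed = dict(first["changed"])
    deleted = set(first["deleted"])
    # Later changes override earlier ones and undo earlier deletions.
    for path, content_hash in second["changed"].items():
        changed[path] = content_hash
        deleted.discard(path)
    # Later deletions cancel earlier changes to the same path.
    for path in second["deleted"]:
        changed.pop(path, None)
        deleted.add(path)
    return {"changed": changed, "deleted": sorted(deleted)}


def apply_delta(snapshot: dict, delta: dict) -> dict:
    """Return a new snapshot (path -> content hash) with the delta applied."""
    result = dict(snapshot)
    result.update(delta["changed"])
    for path in delta["deleted"]:
        result.pop(path, None)
    return result
```

Under these assumptions, a target holding snapshot s1 that requests s3 could be served by merging a cached s1-to-s2 delta with a cached s2-to-s3 delta, since applying the merged delta to s1 gives the same result as applying the two deltas in sequence.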

Description

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings illustrate various embodiments and are a part of the specification. The illustrated embodiments are merely examples and do not limit the scope of the disclosure. Throughout the drawings, identical or similar reference numbers designate identical or similar elements.

FIG. 1A illustrates a first example system for data storage in accordance with some implementations.
FIG. 1B illustrates a second example system for data storage in accordance with some implementations.
FIG. 1C illustrates a third example system for data storage in accordance with some implementations.
FIG. 1D illustrates a fourth example system for data storage in accordance with some implementations.
FIG. 2A is a perspective view of a storage cluster with multiple storage nodes and internal storage coupled to each storage node to provide network attached storage, in accordance with some embodiments.
FIG. 2B is a block diagram showing an interconnect switch coupling multiple storage nodes in accordance with some embodiments.
FIG. 2C is a multiple level block diagram, showing contents of a storage node and contents of one of the non-volatile solid state storage units in accordance with some embodiments.
FIG. 2D shows a storage server environment, which uses embodiments of the storage nodes and storage units of some previous figures in accordance with some embodiments.
FIG. 2E is a blade hardware block diagram, showing a control plane, compute and storage planes, and authorities interacting with underlying physical resources, in accordance with some embodiments.
FIG. 2F depicts elasticity software layers in blades of a storage cluster, in accordance with some embodiments.
FIG. 2G depicts authorities and storage resources in blades of a storage cluster, in accordance with some embodiments.
FIG. 3A sets forth a diagram of a storage system that is coupled for data communications with a cloud services provider in accordance with some embodiments of the present disclosure.
FIG. 3B sets forth a diagram of a storage system in accordance with some embodiments.
FIG. 3C sets forth an example of a cloud-based storage system in accordance with some embodiments.
FIG. 3D illustrates an exemplary computing device that may be specifically configured to perform one or more of the processes described herein.
FIG. 3E illustrates an example of a fleet of storage systems for providing storage services in accordance with some embodiments.
FIG. 3F illustrates an example container system in accordance with some embodiments.
FIG. 3G illustrates an example of a storage node for a large-scale storage platform in accordance with embodiments of the disclosure.
FIG. 4 illustrates an example configuration of storage systems configured to replicate data between them in accordance with some embodiments.
FIG. 5 illustrates an example process of incremental replication in accordance with some embodiments.
FIG. 6 illustrates an example method of incremental replication in accordance with some embodiments.
FIG. 7 illustrates another example method of incremental replication in accordance with some embodiments.
FIG. 8 illustrates an example configuration of storage systems configured to replicate data between them in accordance with some embodiments.
FIG. 9 illustrates an example method of incremental replication in accordance with some embodiments.
FIG. 10 illustrates an example method in accordance with some embodiments.

DESCRIPTION OF EMBODIMENTS

Example methods, apparatuses, and products for incremental data replication in accordance with embodiments of the present disclosure are described with reference to the accompanying drawings, beginning with FIG. 1A.

FIG. 1A illustrates an example system for data storage, in accordance with some implementations.
System 100 (also referred to as “storage system” herein) includes numerous elements for purposes of illustration rather than limitation. In other implementations, system 100 may include the same, more, or fewer elements configured in the same or a different manner. System 100 includes a number of computing devices 164A-B. Computing devices (also referred to as “client devices” herein) may be embodied as, for example, a server in a data center, a workstation, a personal computer, a notebook, or the like. Computing devices 164A-B may be coupled for data communications to one or more storage arrays 102A-B through a storage area network (‘SAN’) 158 or a local area network (‘LAN’) 160. The SAN 158 may be implemented with a variety of data communications fabrics, devices, and protocols. For example, the fabrics for SAN 158 may include Fibre Channel, Ethernet, InfiniBand, Serial Attached Small Computer System Interface (‘SAS’), or the like. Data communications protocols for use with SAN 158 may include Advanced Technology Attachment (‘ATA’), Fibre Channel Protocol, Small Computer System Interface (‘SCSI’), Internet Small Computer System Interface (‘iSCSI’), HyperSC