US-12625772-B2 - Remote backup restore with a local dedupe engine

Abstract

A system can maintain a backup of data stored on a source computer, wherein the backup is stored on a remote computer. The system can maintain, on the source computer, a file catalog, wherein the file catalog comprises a local path for a first file on the source computer, a remote backup location for the first file on the remote computer, and a first hash of the first file. The system can determine that a copy of the first file at the local path on the source computer is corrupted. The system can identify whether a second hash of a second file on the source computer matches the first hash of the first file. The system can, in response to determining that the second hash matches the first hash, repair first data of the first file with second data of the second file, and repair first metadata of the first file with second metadata from the remote backup location for the first file on the remote computer.
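
The flow summarized above can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not the patented implementation: `CatalogEntry`, `sha1_of`, `restore_file`, and the two `fetch_remote_*` callables are hypothetical names, the local file system is modeled as a dictionary, and SHA-1 is used only because claim 20 names it as one possible hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CatalogEntry:
    """One file catalog record (hypothetical layout)."""
    local_path: str       # path of the file on the source computer
    remote_location: str  # backup location on the remote computer
    sha1: str             # hash of the file data recorded at backup time


def sha1_of(data: bytes) -> str:
    """Return the hex SHA-1 digest of the given bytes."""
    return hashlib.sha1(data).hexdigest()


def restore_file(
    entry: CatalogEntry,
    local_files: Dict[str, bytes],
    fetch_remote_data: Callable[[str], bytes],
    fetch_remote_metadata: Callable[[str], dict],
) -> dict:
    """Restore a corrupted file, preferring an intact local duplicate for data.

    Metadata always comes from the remote backup location, because a local
    duplicate may carry different metadata than the file being restored.
    """
    # Look for another local file whose hash matches the catalog hash.
    donor = None
    for path, data in local_files.items():
        if path != entry.local_path and sha1_of(data) == entry.sha1:
            donor = path
            break

    if donor is not None:
        data = local_files[donor]  # no data transfer from the remote computer
        source = "local"
    else:
        data = fetch_remote_data(entry.remote_location)  # full remote restore
        source = "remote"

    metadata = fetch_remote_metadata(entry.remote_location)
    local_files[entry.local_path] = data
    return {"data_source": source, "metadata": metadata}
```

In this sketch a matching local copy avoids transferring file data from the remote computer, and only the metadata is fetched remotely.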

Inventors

  • Shiv Kumar
  • Kaushik Gupta

Assignees

  • DELL PRODUCTS L.P.

Dates

Publication Date
2026-05-12
Application Date
2023-06-15

Claims (20)

  1. A system, comprising: at least one processor; and at least one memory coupled to the processor, comprising instructions that, in response to execution by the at least one processor, cause the system to perform operations, comprising: maintaining a backup of data stored on a source computer, wherein the backup is stored on a remote computer; maintaining, on the source computer, a file catalog, wherein the file catalog comprises a local path for a first file on the source computer, a remote backup location for the first file on the remote computer, and a first hash of the first file; determining that a copy of the first file at the local path on the source computer is corrupted; identifying whether a second hash of a second file on the source computer matches the first hash of the first file on the source computer; and in response to determining that the second hash of the second file on the source computer matches the first hash of the first file on the source computer, restoring first data of the first file with second data of the second file; obtaining second metadata that corresponds to the first file from the remote backup location for the first file on the remote computer, to produce remotely-obtained second metadata, and restoring first metadata of the first file with the remotely-obtained second metadata from the remote backup location for the first file on the remote computer, wherein the source computer and the remote computer differ.
  2. The system of claim 1, wherein the operations further comprise: in response to determining that no hash of a file on the source computer matches the first hash of the first file, restoring the first data and the first metadata of the first file with third data and the second metadata from the remote backup location for the first file on the remote computer.
  3. The system of claim 1, wherein the identifying of whether the second hash matches the first hash of the first file is performed based on determining that a number of duplicate files on the source computer is greater than or equal to a determined threshold value.
  4. The system of claim 1, wherein the operations further comprise: subsequent to performing the identifying of whether the second hash matches the first hash of the first file, determining to halt considering local copies of files when restoring files in response to determining that a number of duplicate files on the source computer is less than a determined threshold value.
  5. The system of claim 4, wherein the operations further comprise: storing hashes of files in the file catalog while processing local copies of files when restoring files is halted.
  6. The system of claim 1, wherein the identifying of whether the second hash matches the first hash of the first file is performed by a data deduplication component of the source computer.
  7. The system of claim 1, wherein the identifying of whether the second hash matches the first hash of the first file comprises: sending, to a data deduplication component of the source computer, the first hash from the file catalog; and receiving respective local file paths of a group of files that comprises the second file, wherein respective hashes of respective files of the group of files match the first hash.
  8. A method, comprising: maintaining, by a system comprising at least one processor, a file catalog, wherein the file catalog comprises a local path for a first file on the system, a remote backup location for the first file on a remote computer that stores a backup of data of the system, and a first hash of the first file; determining, by the system, that a copy of the first file at the local path on the system is corrupted; identifying, by the system, whether a second hash of a second file on the system matches the first hash of the first file; and in response to determining that the second hash matches the first hash, restoring, by the system, data of the first file with data of the second file, obtaining, by the system, second metadata that corresponds to the first file from the remote backup location for the first file on the remote computer, and restoring, by the system, first metadata of the first file with the second metadata that is obtained from the remote backup location for the first file on the remote computer.
  9. The method of claim 8, wherein the identifying of whether the second hash matches the first hash of the first file comprises: sending, to a data deduplication component of the system, the first hash from the file catalog; and receiving respective stubs of a group of files that comprises the second file, wherein respective hashes of respective files of the group of files match the first hash.
  10. The method of claim 8, wherein the identifying of whether the second hash matches the first hash of the first file comprises: sending, to a data deduplication component of the system, the first hash from the file catalog; and receiving, from the data deduplication component, an indication that no hashes of files match the first hash.
  11. The method of claim 8, further comprising: in response to initiating, by the system, an incremental backup to the remote computer, determining hashes for files of the incremental backup; and storing, by the system, the hashes for files in the file catalog.
  12. The method of claim 8, further comprising: enabling, by the system, processing of local file copies when restoring the file based on file duplication statistics received from a deduplication engine.
  13. The method of claim 8, further comprising: enabling, by the system, processing of local file copies when restoring the file based on a current level of file deduplication on the system.
  14. The method of claim 13, further comprising: determining, by the system, the current level of file deduplication on the system from a deduplication component of the system.
  15. A non-transitory computer-readable medium comprising instructions that, in response to execution, cause a system comprising at least one processor to perform operations, comprising: maintaining a file catalog, wherein the file catalog comprises a local path for a first file, a remote backup location for the first file on a remote computer that stores a backup of local data, and a first hash of the first file; determining that a copy of the first file at the local path is to be repaired; identifying whether a second hash of a second file that is stored locally matches the first hash of the first file; and in response to determining that the second hash matches the first hash, restoring data of the first file with data of the second file, obtaining second metadata that corresponds to the first file from the remote backup location for the first file on the remote computer, and restoring first metadata of the first file with the second metadata from the remote backup location.
  16. The non-transitory computer-readable medium of claim 15, wherein first metadata of the first file stored locally differs from second metadata of the second file stored locally.
  17. The non-transitory computer-readable medium of claim 15, wherein the remote backup location in the file catalog comprises an identifier of the remote computer and a snapshot identifier of a backup snapshot on the remote computer.
  18. The non-transitory computer-readable medium of claim 15, wherein first metadata of the first file comprises an extended attribute of the first file.
  19. The non-transitory computer-readable medium of claim 15, wherein a first computing cluster comprises the local data, and wherein a second computing cluster comprises the remote computer.
  20. The non-transitory computer-readable medium of claim 15, wherein the first hash comprises a secure hash algorithm 1 (SHA-1) hash.
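
The dedupe-component lookup (claims 7 and 10) and the duplicate-count threshold (claims 3 and 4) can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: `DedupeIndex`, `duplicate_count`, `lookup`, and `local_candidates` are hypothetical names, and the claims do not specify how the number of duplicate files is counted, so this sketch counts every file that shares its hash with at least one other file.

```python
from typing import Dict, List


class DedupeIndex:
    """Hypothetical stand-in for the data deduplication component."""

    def __init__(self) -> None:
        self._paths_by_hash: Dict[str, List[str]] = {}

    def add(self, file_hash: str, path: str) -> None:
        """Record that the file at `path` has the given hash."""
        self._paths_by_hash.setdefault(file_hash, []).append(path)

    def duplicate_count(self) -> int:
        """Count files that share a hash with at least one other file."""
        return sum(len(p) for p in self._paths_by_hash.values() if len(p) > 1)

    def lookup(self, file_hash: str) -> List[str]:
        """Return local paths whose hash matches; an empty list means no match."""
        return list(self._paths_by_hash.get(file_hash, []))


def local_candidates(
    index: DedupeIndex, first_hash: str, corrupted_path: str, threshold: int
) -> List[str]:
    """Return local paths usable as a restore source, or [] to go remote.

    Local lookup is skipped entirely when too few duplicates exist for the
    lookup to be worthwhile (assumed reading of the threshold in claims 3-4).
    """
    if index.duplicate_count() < threshold:
        return []
    # Exclude the corrupted file itself from the candidate group.
    return [p for p in index.lookup(first_hash) if p != corrupted_path]
```

An empty result from `local_candidates` corresponds to the fallback in claim 2, where both data and metadata are restored from the remote backup location.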

Description

BACKGROUND

Computer systems can facilitate data storage. Backups of data can be maintained. Stored data can become corrupted.

SUMMARY

The following presents a simplified summary of the disclosed subject matter in order to provide a basic understanding of some of the various embodiments. This summary is not an extensive overview of the various embodiments. It is intended neither to identify key or critical elements of the various embodiments nor to delineate the scope of the various embodiments. Its sole purpose is to present some concepts of the disclosure in a streamlined form as a prelude to the more detailed description that is presented later.

An example system can operate as follows. The system can maintain a backup of data stored on a source computer, wherein the backup is stored on a remote computer. The system can maintain, on the source computer, a file catalog, wherein the file catalog comprises a local path for a first file on the source computer, a remote backup location for the first file on the remote computer, and a first hash of the first file. The system can determine that a copy of the first file at the local path on the source computer is corrupted. The system can identify whether a second hash of a second file on the source computer matches the first hash of the first file. The system can, in response to determining that the second hash matches the first hash, repair first data of the first file with second data of the second file, and repair first metadata of the first file with second metadata from the remote backup location for the first file on the remote computer.

An example method can comprise maintaining, by a system comprising a processor, a file catalog, wherein the file catalog comprises a local path for a first file on the system, a remote backup location for the first file on a remote computer that stores a backup of data of the system, and a first hash of the first file. The method can further comprise determining, by the system, that a copy of the first file at the local path on the system is corrupted. The method can further comprise identifying, by the system, whether a second hash of a second file on the system matches the first hash of the first file. The method can further comprise, in response to determining that the second hash matches the first hash, repairing, by the system, data of the first file with data of the second file, and repairing, by the system, metadata of the first file with metadata from the remote backup location for the first file on the remote computer.

An example non-transitory computer-readable medium can comprise instructions that, in response to execution, cause a system comprising a processor to perform operations. These operations can comprise maintaining a file catalog, wherein the file catalog comprises a local path for a first file, a remote backup location for the first file on a remote computer that stores a backup of local data, and a first hash of the first file. These operations can further comprise determining that a copy of the first file at the local path is to be repaired. These operations can further comprise identifying whether a second hash of a second file that is stored locally matches the first hash of the first file. These operations can further comprise, in response to determining that the second hash matches the first hash, repairing data of the first file with data of the second file, and repairing metadata of the first file with metadata from the remote backup location.

BRIEF DESCRIPTION OF THE DRAWINGS

Numerous embodiments, objects, and advantages of the present embodiments will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:

FIG. 1 illustrates an example system architecture that can facilitate remote backup restore with a local dedupe engine, in accordance with an embodiment of this disclosure;
FIG. 2 illustrates another example system architecture that can facilitate remote backup restore with a local dedupe engine, in accordance with an embodiment of this disclosure;
FIG. 3 illustrates another example system architecture that can facilitate remote backup restore with a local dedupe engine, in accordance with an embodiment of this disclosure;
FIG. 4 illustrates an example process flow that can facilitate remote backup restore with a local dedupe engine, in accordance with an embodiment of this disclosure;
FIG. 5 illustrates another example process flow that can facilitate remote backup restore with a local dedupe engine, in accordance with an embodiment of this disclosure;
FIG. 6 illustrates another example process flow that can facilitate remote backup restore with a local dedupe engine, in accordance with an embodiment of this disclosure;
FIG. 7 illustrates another example process flow that can facilitate remote backup restore with a local dedupe engine, in acco