US-20260127078-A1 - SYSTEMS AND METHODS FOR GENERATING A DECENTRALIZED INDEX FOR A DISTRIBUTED BACKUP SOLUTION

Abstract

A system generates a plurality of data archives using a plurality of backup agents in the distributed backup system, wherein each respective backup agent is configured to: create a respective data archive; build a respective data index while populating the respective data archive such that a data chunk is indexed simultaneously with the data chunk being added to the respective data archive; and upload the respective data archive and the respective data index to a centralized storage of the distributed backup system. The system receives, by the centralized storage, the plurality of data archives and a plurality of data indexes from the plurality of backup agents, and merges, by the centralized storage, the plurality of data indexes for access by a user.

Inventors

Maxim CHEREY
Mikhail Rybakov
Ivan Krestinin
Serg Bell
Stanislav Protasov

Assignees

ACRONIS INTERNATIONAL GMBH

Dates

Publication Date: 2026-05-07
Application Date: 2024-11-04

Claims (17)

1. A method for managing a distributed backup system, the method comprising: generating a plurality of data archives using a plurality of backup agents in the distributed backup system, wherein each respective backup agent is configured to: create a respective data archive; build a respective data index while populating the respective data archive such that a data chunk is indexed simultaneously with the data chunk being added to the respective data archive; and upload the respective data archive and the respective data index to a centralized storage of the distributed backup system; receiving, by the centralized storage, the plurality of data archives and a plurality of data indexes from the plurality of backup agents; and merging, by the centralized storage, the plurality of data indexes for access by a user without mounting the plurality of data archives or accessing data chunks within the plurality of data archives.
2. The method of claim 1, wherein the centralized storage comprises a plurality of servers each configured to process a data index received from a respective backup agent of the plurality of backup agents.
3. The method of claim 1, wherein adding additional backup agents to the plurality of backup agents does not require proportionally scaling an amount of servers in the centralized storage.
4. The method of claim 1, wherein each respective backup agent is configured to build the respective data index based on a pre-existing full snapshot comprising a plurality of data chunks.
5. The method of claim 4, wherein the respective backup agent is configured to map the data chunk to a specific region of the pre-existing full snapshot when indexing.
6. The method of claim 4, wherein the plurality of data chunks are indexed as folders and files.
7. The method of claim 1, wherein each respective backup agent of the plurality of backup agents uses a different indexing scheme and includes an index identifier in the respective data index.
8. The method of claim 7, wherein the centralized storage is configured to convert the plurality of data indexes into a universal indexing scheme prior to merging.
9. A system for managing distributed backups, comprising: at least one memory; at least one hardware processor coupled with the at least one memory and configured, individually or in combination, to: generate a plurality of data archives using a plurality of backup agents in the distributed backup system, wherein each respective backup agent is configured to: create a respective data archive; build a respective data index while populating the respective data archive such that a data chunk is indexed simultaneously with the data chunk being added to the respective data archive; and upload the respective data archive and the respective data index to a centralized storage of the distributed backup system; receive, by the centralized storage, the plurality of data archives and a plurality of data indexes from the plurality of backup agents; and merge, by the centralized storage, the plurality of data indexes for access by a user without mounting the plurality of data archives or accessing data chunks within the plurality of data archives.
10. The system of claim 9, wherein the centralized storage comprises a plurality of servers each configured to process a data index received from a respective backup agent of the plurality of backup agents.
11. The system of claim 9, wherein adding additional backup agents to the plurality of backup agents does not require proportionally scaling an amount of servers in the centralized storage.
12. The system of claim 9, wherein each respective backup agent is configured to build the respective data index based on a pre-existing full snapshot comprising a plurality of data chunks.
13. The system of claim 12, wherein the respective backup agent is configured to map the data chunk to a specific region of the pre-existing full snapshot when indexing.
14. The system of claim 12, wherein the plurality of data chunks are indexed as folders and files.
15. The system of claim 9, wherein each respective backup agent of the plurality of backup agents uses a different indexing scheme and includes an index identifier in the respective data index.
16. The system of claim 15, wherein the centralized storage is configured to convert the plurality of data indexes into a universal indexing scheme prior to merging.
17. A non-transitory computer readable medium storing thereon computer executable instructions for managing a distributed backup system, including instructions for: generating a plurality of data archives using a plurality of backup agents in the distributed backup system, wherein each respective backup agent is configured to: create a respective data archive; build a respective data index while populating the respective data archive such that a data chunk is indexed simultaneously with the data chunk being added to the respective data archive; and upload the respective data archive and the respective data index to a centralized storage of the distributed backup system; receiving, by the centralized storage, the plurality of data archives and a plurality of data indexes from the plurality of backup agents; and merging, by the centralized storage, the plurality of data indexes for access by a user without mounting the plurality of data archives or accessing data chunks within the plurality of data archives.

Description

FIELD OF TECHNOLOGY

The present disclosure relates to the field of data storage, and, more specifically, to systems and methods for generating a decentralized index for a distributed backup solution.

BACKGROUND

The typical data backup process consists of several sequential steps. After a data archive is created, a backup agent uploads the data to a centralized storage. After that, to enable browsing the stored data or searching through it, an index is built. Conventionally, index creation is a time- and resource-intensive process because it requires going through all the data in the archive. In a distributed system, after the index is created, it is merged and stored along with the indexes of other data archives.

SUMMARY

The present disclosure describes building a data index for data archives based on pre-calculated information that is collected by backup agents during the backup process. The disclosed systems and methods allow the full index across all the data archives to be built efficiently, avoiding high resource and processing time consumption in the centralized storage and allowing immediate access to an up-to-date index as soon as a data archive is added to the centralized storage.
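The agent-side behavior summarized above, where each data chunk is indexed at the moment it is written to the archive, might be sketched as follows. This is a minimal illustration and not the disclosed implementation: the names `BackupAgent`, `IndexEntry`, and `add_chunk` are hypothetical, an in-memory buffer stands in for the data archive, and recording a SHA-256 digest per chunk is an assumption that the disclosure does not state.

```python
import hashlib
import io
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One index record (hypothetical layout, not taken from the disclosure)."""
    path: str    # file or folder path the chunk belongs to
    offset: int  # byte offset of the chunk inside the archive
    length: int  # chunk length in bytes
    digest: str  # SHA-256 of the chunk contents (an assumption)


@dataclass
class BackupAgent:
    """Hypothetical agent that indexes each chunk as it is archived."""
    agent_id: str
    archive: io.BytesIO = field(default_factory=io.BytesIO)  # stand-in archive
    index: list = field(default_factory=list)

    def add_chunk(self, path: str, data: bytes) -> IndexEntry:
        # Record where the chunk will land, write it, and index it in the
        # same step, so no later pass over the archive is needed.
        offset = self.archive.tell()
        self.archive.write(data)
        entry = IndexEntry(path, offset, len(data),
                           hashlib.sha256(data).hexdigest())
        self.index.append(entry)
        return entry


agent = BackupAgent("agent-1")
agent.add_chunk("/docs/a.txt", b"hello")
agent.add_chunk("/docs/b.txt", b"world!")
# Both agent.archive and agent.index are now ready to upload together.
```

Because the index is complete as soon as the last chunk is written, the centralized storage in this sketch would never need to read the archive contents to make them browsable.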
In one exemplary aspect, the techniques described herein relate to a method for managing a distributed backup system, the method including: generating a plurality of data archives using a plurality of backup agents in the distributed backup system, wherein each respective backup agent is configured to: create a respective data archive; build a respective data index while populating the respective data archive such that a data chunk is indexed simultaneously with the data chunk being added to the respective data archive; and upload the respective data archive and the respective data index to a centralized storage of the distributed backup system; receiving, by the centralized storage, the plurality of data archives and a plurality of data indexes from the plurality of backup agents; and merging, by the centralized storage, the plurality of data indexes for access by a user.
In some aspects, the techniques described herein relate to a method, wherein the centralized storage includes a plurality of servers each configured to process a data index received from a respective backup agent of the plurality of backup agents.
In some aspects, the techniques described herein relate to a method, wherein adding additional backup agents to the plurality of backup agents does not require proportionally scaling an amount of servers in the centralized storage.
In some aspects, the techniques described herein relate to a method, wherein each respective backup agent is configured to build the respective data index based on a pre-existing full snapshot including a plurality of data chunks.
In some aspects, the techniques described herein relate to a method, wherein the respective backup agent is configured to map the data chunk to a specific region of the full snapshot when indexing.
In some aspects, the techniques described herein relate to a method, wherein the plurality of data chunks are indexed as folders and files.
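The storage-side merge described in the aspects above might look like the following sketch, which combines per-agent indexes into a single folder-and-file view using index records only. The function names `merge_indexes` and `list_folder`, the `(path, offset, length)` tuple layout, and the agent identifiers are all hypothetical and chosen for illustration; the disclosure does not specify a record format.

```python
from collections import defaultdict


def merge_indexes(indexes):
    """Merge per-agent indexes into one path-keyed view.

    `indexes` maps an agent id to a list of (path, offset, length) tuples.
    Only index records are read; no archive is opened or mounted.
    """
    merged = defaultdict(list)
    for agent_id, entries in indexes.items():
        for path, offset, length in entries:
            merged[path].append(
                {"agent": agent_id, "offset": offset, "length": length}
            )
    return dict(merged)


def list_folder(merged, folder):
    """Return the sorted paths stored under `folder` in the merged index."""
    prefix = folder.rstrip("/") + "/"
    return sorted(p for p in merged if p.startswith(prefix))


uploaded = {
    "agent-1": [("/docs/a.txt", 0, 5)],
    "agent-2": [("/docs/a.txt", 0, 7), ("/pics/c.png", 7, 3)],
}
merged = merge_indexes(uploaded)
docs = list_folder(merged, "/docs")  # paths under /docs from every agent
```

In this sketch the merge cost depends on the number of index records rather than on the size of the archives.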
In some aspects, the techniques described herein relate to a method, wherein each respective backup agent of the plurality of backup agents uses a different indexing scheme and includes an index identifier in the respective data index.
In some aspects, the techniques described herein relate to a method, wherein the centralized storage is configured to convert the plurality of data indexes into a universal indexing scheme prior to merging.
It should be noted that the methods described above may be implemented in a system comprising a hardware processor. Alternatively, the methods may be implemented using computer executable instructions of a non-transitory computer readable medium.
In some aspects, the techniques described herein relate to a system for managing distributed backups, including: at least one memory; at least one hardware processor coupled with the at least one memory and configured, individually or in combination, to: generate a plurality of data archives using a plurality of backup agents in the distributed backup system, wherein each respective backup agent is configured to: create a respective data archive; build a respective data index while populating the respective data archive such that a data chunk is indexed simultaneously with the data chunk being added to the respective data archive; and upload the respective data archive and the respective data index to a centralized storage of the distributed backup system; receive, by the centralized storage, the plurality of data archives and a plurality of data indexes from the plurality of backup agents; and merge, by the centralized storage, the plurality of data indexes for access by a user.
In some aspects, the techniques described herein relate to a non-transitory computer readable medium storing thereon computer executable