US-20260127077-A1 - RAPID ACCESS FOR RESTORED CLOUD BACKUP DATA

Abstract

Techniques are disclosed relating to providing fast access to restored data in a cloud storage environment. A computing system stores multiple sets of incremental backup data that reflect changes in data being backed up between backup intervals and metadata that indicates which set of incremental backup data stores a given object. The computing system generates an endpoint for a requesting computing system, where the endpoint supports requests for restored backup data and data responses. In response to a request from the requesting computing system via the endpoint, the computing system queries the metadata based on the request and stores metadata retrieved from the query in a key/value store using object identification information as key data and the retrieved metadata as value data. The computing system provides requested data according to a lazy loading technique, including providing requested data via the endpoint based on the metadata in the key/value store.
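
As an illustration only, the flow summarized above might be sketched in Python as follows. The names `ObjectMetadata`, `RestoreEndpoint`, `restore_time`, and `fetch_count`, and the in-memory dictionaries standing in for cloud storage, are hypothetical and do not come from the disclosure. For brevity, the sketch runs the metadata query once when the endpoint is created rather than per request, and it assumes a point-in-time parameter similar to the restoration time parameter recited in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectMetadata:
    """Hypothetical metadata record: which incremental set holds an object."""
    object_id: str
    backup_set: str   # name of the incremental backup set storing the object
    backup_time: int  # backup interval at which this version was written


class RestoreEndpoint:
    """Read-only, lazily loaded view of a backup at one point in time."""

    def __init__(self, backup_sets, metadata_rows, restore_time):
        self._backup_sets = backup_sets  # set name -> {object_id: bytes}
        self.fetch_count = 0             # data reads actually performed
        # Metadata query: keep the newest version of each object at or
        # before restore_time, keyed by object id (the key/value store).
        self._kv_store = {}
        for row in metadata_rows:
            if row.backup_time > restore_time:
                continue
            current = self._kv_store.get(row.object_id)
            if current is None or row.backup_time > current.backup_time:
                self._kv_store[row.object_id] = row

    def get(self, object_id):
        """Fetch one object on demand; no other object data is read."""
        meta = self._kv_store[object_id]  # KeyError if not in this backup
        self.fetch_count += 1
        return self._backup_sets[meta.backup_set][object_id]


# Assumed sample data: three incremental sets and their metadata rows.
backup_sets = {
    "inc-1": {"a.txt": b"a-v1", "b.txt": b"b-v1"},
    "inc-2": {"a.txt": b"a-v2"},
    "inc-3": {"b.txt": b"b-v3"},
}
metadata_rows = [
    ObjectMetadata("a.txt", "inc-1", 1),
    ObjectMetadata("b.txt", "inc-1", 1),
    ObjectMetadata("a.txt", "inc-2", 2),
    ObjectMetadata("b.txt", "inc-3", 3),
]

endpoint = RestoreEndpoint(backup_sets, metadata_rows, restore_time=2)
restored_a = endpoint.get("a.txt")
restored_b = endpoint.get("b.txt")
```

In this sketch, each call to `get` resolves the object's location from the key/value store and reads only that object. The set `inc-3` is never read, because its metadata row falls after the chosen restore time.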

Inventors

Xia Hua
Woon Ho Jung
Jinmyeong Kim
Junhwan Kim
Seunghyeon KIM
Yuchon Yi
Nicholas Gerald Zehender
Jason Sarich
Jaewon Wi

Assignees

COMMVAULT SYSTEMS, INC.

Dates

Publication Date: 20260507
Application Date: 20260106

Claims (20)

1. A computer system that is cloud-based, wherein the computer system comprises a hardware processor subsystem that is coupled to system memory that stores program instructions, which, when executed by the hardware processor subsystem, configure the computer system to: store, at a data storage resource of the computer system, multiple sets of incremental backup data that reflect changes in data objects between backup intervals; store, at a metadata storage resource of the computer system, metadata that identifies, for a given data object, which set of incremental backup data stores the given data object; generate a key/value store configured to store metadata that is retrieved from the metadata storage resource by way of metadata queries, wherein the key/value store uses data object identification information as key data and retrieved metadata as value data; generate an endpoint that supports requests from an application outside the computer system; and responsive to a request to restore a first backup data object, received from the application by way of the endpoint, look up first metadata corresponding to the first backup data object in the key/value store, and, based on location information included in the first metadata, retrieve the first backup data object from the data storage resource of the computer system, and, via the endpoint, provide the first backup data object to the application based on a lazy loading technique; wherein the computer system is further configured to perform, at least partially in parallel with the lazy loading technique, a full restoration of requested backup data objects to a destination data storage associated with the application and outside the computer system.
2. The computer system of claim 1, wherein the metadata at the metadata storage resource comprises a backup time corresponding to each data object among the multiple sets of incremental backup data.
3. The computer system of claim 1, wherein the key/value store comprises multiple lookup tables that are populated concurrently from multiple metadata partitions, wherein each partition includes an ordered list of data objects.
4. The computer system of claim 3, wherein each metadata partition is queried based on a partition key corresponding to an application sub-module, and metadata within each partition is sorted based on a timestamp sort key.
5. The computer system of claim 3, wherein the computer system includes a worker function that is configured to execute multiple parallel queries to scan multiple partitions of metadata and map query results to the multiple lookup tables of the key/value store.
6. The computer system of claim 1, wherein the endpoint is configured to provide read-only access to a subset of restored backup data corresponding to one or more protection groups identified in a request.
7. The computer system of claim 6, wherein the endpoint provides access to a virtual namespace allowing the application to query backup data objects using same identifiers as in a source data storage.
8. The computer system of claim 1, wherein the endpoint is generated in response to an explicit endpoint request specifying a protection group and a restoration time parameter identifying a particular backup point.
9. The computer system of claim 1, wherein the computer system is configured to perform lazy loading by retrieving only data objects requested by the application and deferring retrieval of remaining data objects until subsequent requests are received.
10. The computer system of claim 1, wherein the computer system is configured to maintain ordering of metadata values in the key/value store according to an order of metadata within partitions and an order of partitions across multiple lookup tables.
11. A computer-implemented method comprising: storing, by a computing system that operates in a cloud, multiple sets of incremental backup data that reflect changes in data objects between backup intervals; storing, by the computing system, at a metadata storage resource of the computing system, metadata that identifies, for a given data object, which set of incremental backup data stores the given data object; generating, by the computing system, a key/value store configured to store metadata retrieved from the metadata storage resource by way of metadata queries, wherein the key/value store uses data object identification information as key data and retrieved metadata as value data; generating, by the computing system, an endpoint that supports requests from an application external to the computing system; and responsive to a request to restore a first backup data object received from the application via the endpoint, looking up first metadata corresponding to the first backup data object in the key/value store and, based on location information in the first metadata, retrieving the first backup data object from a data storage resource and providing the first backup data object to the application via the endpoint according to a lazy loading technique; wherein the computing system performs, at least partially in parallel with the lazy loading technique, a full restoration of requested backup data objects to a destination data storage associated with the application.
12. The computer-implemented method of claim 11, wherein the metadata includes a backup time corresponding to each data object among the multiple sets of incremental backup data.
13. The computer-implemented method of claim 11, wherein the key/value store comprises multiple lookup tables that are populated concurrently from multiple metadata partitions, each partition including an ordered list of data objects.
14. The computer-implemented method of claim 13, wherein each metadata partition is queried based on a partition key corresponding to an application sub-module, and metadata within each partition is sorted based on a timestamp sort key.
15. The computer-implemented method of claim 13, wherein a worker function executes multiple parallel queries to scan multiple partitions of metadata and map query results to the multiple lookup tables of the key/value store.
16. The computer-implemented method of claim 11, wherein the endpoint provides read-only access to a subset of restored backup data corresponding to one or more protection groups identified in a request.
17. The computer-implemented method of claim 16, wherein the endpoint provides access to a virtual namespace allowing the application to query backup data objects using same identifiers as in a source data storage.
18. The computer-implemented method of claim 11, wherein the endpoint is generated in response to an explicit endpoint request specifying a protection group and a restoration time parameter identifying a particular backup point.
19. The computer-implemented method of claim 11, wherein performing the lazy loading technique comprises retrieving only those data objects requested by the application and deferring retrieval of remaining data objects until subsequent requests are received.
20. The computer-implemented method of claim 11, further comprising maintaining ordering of metadata values in the key/value store according to an order of metadata within partitions and an order of partitions across multiple lookup tables.
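
For illustration, the concurrent population of multiple lookup tables from multiple metadata partitions recited in claims 3 to 5, 10, 13 to 15, and 20 might be sketched in Python as below. This is a minimal sketch under stated assumptions, not the claimed implementation: `PARTITIONS`, `scan_partition`, `build_lookup_tables`, and `lookup` are hypothetical names, an in-memory dictionary stands in for the metadata storage resource, and a thread pool stands in for the worker function's parallel queries.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical metadata partitions: partition key -> rows already sorted by
# a timestamp sort key. Each row is (timestamp, object_id, backup_set).
PARTITIONS = {
    "module-a": [(1, "a/1", "inc-1"), (2, "a/2", "inc-2")],
    "module-b": [(1, "b/1", "inc-1"), (3, "b/2", "inc-3")],
}


def scan_partition(partition_key):
    """Stand-in for one metadata query; builds one ordered lookup table."""
    table = {}
    for timestamp, object_id, backup_set in PARTITIONS[partition_key]:
        # dicts keep insertion order, so the partition's order is preserved
        table[object_id] = {"backup_set": backup_set, "backup_time": timestamp}
    return table


def build_lookup_tables(partition_keys):
    """Scan partitions in parallel, producing one lookup table per partition."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() yields results in input order, so partition order survives
        return list(pool.map(scan_partition, partition_keys))


def lookup(tables, object_id):
    """Return the metadata for object_id from whichever table holds it."""
    for table in tables:
        if object_id in table:
            return table[object_id]
    raise KeyError(object_id)


tables = build_lookup_tables(["module-a", "module-b"])
ordered_ids = [object_id for table in tables for object_id in table]
```

The sketch keeps two orderings: rows stay in timestamp order within each table, and tables stay in the order in which their partition keys were supplied, regardless of which scan finishes first.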

Description

PRIORITY

This application is a Continuation of U.S. patent application Ser. No. 18/505,868 filed on 9 Nov. 2023, which is incorporated by reference in its entirety herein. Any and all applications for which a foreign or domestic priority claim is identified in the Application Data Sheet of the present application are hereby incorporated by reference in their entireties under 37 C.F.R. 1.57.

BACKGROUND

Technical Field

This disclosure relates generally to data backups and more particularly to fast access to restored data in a cloud storage environment.

Description of Related Art

Backup and archiving of data in computing systems is important in various contexts, e.g., to mitigate or prevent data loss due to equipment failure or malicious activity. Many data stores are now cloud based, e.g., Amazon Web Services S3 storage, Microsoft Azure, IBM cloud databases, etc. Cloud-based systems may allow entities to store substantial amounts of data without maintaining the storage hardware. Providing backups in the cloud-based context may be challenging in terms of security, ransomware protection, compute resources, cost, etc.

Some data stores are structured while others are unstructured, each of which may have various advantages and drawbacks. A key-value database is one type of non-relational database that stores data as a collection of key-value pairs, where each key serves as a unique identifier used to retrieve its associated value. The keys and values may be, for example, strings, numbers, complex objects, etc. Amazon Simple Storage Service (S3), for example, is a cloud-based key-value data store for storing diverse and mostly unstructured data. S3 buckets are containers that store uploaded objects. Backup storage services are typically utilized to protect various types of information from being lost due to hardware failure, file corruption, malicious entities, natural disasters, etc.
It may be desirable to back up cloud-based data and potentially to use cloud-based solutions for the backup storage. Generally, backups may be challenging in terms of differentiating between types of data, time constraints, data size and cost, querying backups, restoration, etc. In addition, cloud-based backup services may face challenges of object versioning overhead, costs for small files, etc. Further, restoration of data for a given backup to a client account may take a substantial amount of time, particularly for large backups in the incremental backup context.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an example backup service that allows an application to access backup data via an endpoint, according to some embodiments.
FIG. 2 is a block diagram illustrating an example key/value store, according to some embodiments.
FIG. 3 is a block diagram illustrating an example of relationships between multiple key/value lookup tables and multiple metadata partitions, according to some embodiments.
FIG. 4 is a block diagram illustrating an example request, according to some embodiments.
FIG. 5 is a flow diagram illustrating an example method for providing access to backup data by a computer system, according to some embodiments.
FIG. 6 is a flow diagram illustrating an example method for accessing backup data by a requesting computer system, according to some embodiments.
FIG. 7 is a block diagram illustrating an example computing device, according to some embodiments.

DETAILED DESCRIPTION

U.S. patent application Ser. No. 17/929,591 titled “Protection Groups for Backing up Cloud-Based Key-Value Stores,” and filed Sep. 2, 2022 is incorporated by reference herein in its entirety. The '591 application discusses various techniques for granular key/value-based cloud storage backups. It also discusses various structures for organizing and retrieving backup data, e.g., using protection groups.
Generally, a backup system may store data very differently than an application that uses the data. In particular, a backup may be organized for retention, storage size, etc. and may use metadata that specifies the organization (e.g., using pointers to indicate where a given object is stored in an incremental backup). In this context, it may be challenging to access restore data quickly. Customers may want granular point-in-time recovery, however, while also wanting rapid access to certain restore data. For example, some customers may run tests or validations of their backups periodically. In the context of the '591 application, for example, a restore operation may involve querying a metadata engine to obtain a list of objects and using worker functions (e.g., Amazon S3 lambda functions) to return the data to customer data buckets. This full restore may take substantial time, however. Therefore, in disclosed embodiments, a system performs a metadata query and then inserts the metadata in a key/value store such as a DynamoDB table (e.g., with the key being an object identifier and the value being the corresponding metadata). The syste