US-12625773-B2 - Client-informed preferred restore throughput via client-side deduplication library in a deduplication filesystem
Abstract
Multiple internal read-ahead streams are established for each of a set of current restore streams handling restorations of files managed by a deduplication filesystem. An API signature is received from a client. The signature includes an identification of a file to restore and a request to pin the file as priority. A restore stream for the file is tagged as priority requested. A determination of whether to establish multiple internal read-ahead streams for the restore stream includes when a threshold number of current restore streams using the multiple internal read-ahead streams has been reached, checking whether a current restore stream using the multiple internal read-ahead streams has the priority requested tag. When the current restore stream using the multiple internal read-ahead streams does not have the priority requested tag, the multiple internal read-ahead streams of the current restore streams are torn down.
Inventors
- Nitin Madan
- Kedar Godbole
- Aditi Tejas Gosavi
Assignees
- DELL PRODUCTS L.P.
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-03-29
Claims (15)
- 1. A method of allowing an administrator to indicate and update priority of file restorations comprising: establishing multiple internal read-ahead streams for each of a plurality of current restore streams handling restorations of files managed by a deduplication system; receiving an application programming interface (API) signature called by a client, the API signature comprising an identification of a first file to restore and a request, from the administrator, to pin the first file with a priority requested tag; tagging a first restore stream for the first file with the priority requested tag, the priority requested tag being stored as metadata and indicating a preference from the administrator for the first file to be restored using the multiple internal read-ahead streams; determining whether to establish the multiple internal read-ahead streams for the first restore stream, the determining comprising: when a threshold number of the plurality of current restore streams using the multiple internal read-ahead streams has been reached, checking whether a current second restore stream using the multiple internal read-ahead streams for restoration of a second file has the priority requested tag from the administrator; and when the current second restore stream using the multiple internal read-ahead streams does not have the priority requested tag, tearing down the multiple internal read-ahead streams of the current second restore stream for the restoration of the second file to allow the multiple internal read-ahead streams to be established for the first restore stream of the first file having the priority requested tag from the administrator; after the tearing down, establishing the multiple internal read-ahead streams for the first restore stream of the first file having the priority requested tag from the administrator; while the first file is being restored using the multiple internal read-ahead streams, receiving a second API signature called by the client, the second API signature comprising the identification of the first file and a request, from the administrator, to unpin the priority requested tag from the first file, thereby removing the priority requested tag from the first file; and in response to the request from the administrator to unpin the first file, tearing down the multiple internal read-ahead streams for the first restore stream of the first file.
- 2. The method of claim 1 wherein the determining further comprises: when the current second restore stream using the multiple internal read-ahead streams does have the priority requested tag, maintaining the multiple internal read-ahead streams for the current second restore stream.
- 3. The method of claim 2 further comprising: after the maintaining, checking whether a next current restore stream using the multiple internal read-ahead streams has the priority requested tag.
- 4. The method of claim 1 wherein the determining further comprises: when the threshold number of the plurality of current restore streams using the multiple internal read-ahead streams has not been reached, establishing the multiple internal read-ahead streams for the restore stream having the priority requested tag.
- 5. The method of claim 1 further comprising: after the tearing down the multiple internal read-ahead streams of the current second restore stream, using a single read stream to continue restoration of a file associated with the current second restore stream.
- 6. A system of allowing an administrator to indicate and update priority of file restorations, the system comprising: a processor; and memory configured to store one or more sequences of instructions which, when executed by the processor, cause the processor to carry out the steps of: establishing multiple internal read-ahead streams for each of a plurality of current restore streams handling restorations of files managed by a deduplication system; receiving an application programming interface (API) signature called by a client, the API signature comprising an identification of a first file to restore and a request, from the administrator, to pin the first file with a priority requested tag; tagging a first restore stream for the first file with the priority requested tag, the priority requested tag being stored as metadata and indicating a preference from the administrator for the first file to be restored using the multiple internal read-ahead streams; determining whether to establish the multiple internal read-ahead streams for the restore stream, the determining comprising: when a threshold number of the plurality of current restore streams using the multiple internal read-ahead streams has been reached, checking whether a current second restore stream using the multiple internal read-ahead streams for restoration of a second file has the priority requested tag from the administrator; and when the current second restore stream using the multiple internal read-ahead streams does not have the priority requested tag, tearing down the multiple internal read-ahead streams of the current second restore stream for the restoration of the second file to allow the multiple internal read-ahead streams to be established for the first restore stream of the first file having the priority requested tag from the administrator; after the tearing down, establishing the multiple internal read-ahead streams for the first restore stream of the first file having the priority requested tag from the administrator; while the first file is being restored using the multiple internal read-ahead streams, receiving a second API signature called by the client, the second API signature comprising the identification of the first file and a request, from the administrator, to unpin the priority requested tag from the first file, thereby removing the priority requested tag from the first file; and in response to the request from the administrator to unpin the first file, tearing down the multiple internal read-ahead streams for the first restore stream of the first file.
- 7. The system of claim 6 wherein the determining further comprises: when the current second restore stream using the multiple internal read-ahead streams does have the priority requested tag, maintaining the multiple internal read-ahead streams for the current second restore stream.
- 8. The system of claim 7 wherein the processor further carries out the steps of: after the maintaining, checking whether a next current restore stream using the multiple internal read-ahead streams has the priority requested tag.
- 9. The system of claim 6 wherein the determining further comprises: when the threshold number of the plurality of current restore streams using the multiple internal read-ahead streams has not been reached, establishing the multiple internal read-ahead streams for the restore stream having the priority requested tag.
- 10. The system of claim 6 wherein the processor further carries out the steps of: after the tearing down the multiple internal read-ahead streams of the current second restore stream, using a single read stream to continue restoration of a file associated with the current second restore stream.
- 11. A computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein, the computer-readable program code adapted to be executed by one or more processors to implement a method of allowing an administrator to indicate and update priority of file restorations, the method comprising: establishing multiple internal read-ahead streams for each of a plurality of current restore streams handling restorations of files managed by a deduplication system; receiving an application programming interface (API) signature called by a client, the API signature comprising an identification of a first file to restore and a request, from the administrator, to pin the first file with a priority requested tag; tagging a first restore stream for the first file with the priority requested tag, the priority requested tag being stored as metadata and indicating a preference from the administrator for the first file to be restored using the multiple internal read-ahead streams; determining whether to establish the multiple internal read-ahead streams for the restore stream, the determining comprising: when a threshold number of the plurality of current restore streams using the multiple internal read-ahead streams has been reached, checking whether a current second restore stream using the multiple internal read-ahead streams for restoration of a second file has the priority requested tag; and when the current second restore stream using the multiple internal read-ahead streams does not have the priority requested tag, tearing down the multiple internal read-ahead streams of the current second restore stream for the restoration of the second file to allow the multiple internal read-ahead streams to be established for the first restore stream of the first file having the priority requested tag from the administrator; after the tearing down, establishing the multiple internal read-ahead streams for the first restore stream of the first file having the priority requested tag from the administrator; while the first file is being restored using the multiple internal read-ahead streams, receiving a second API signature called by the client, the second API signature comprising the identification of the first file and a request, from the administrator, to unpin the priority requested tag from the first file, thereby removing the priority requested tag from the first file; and in response to the request from the administrator to unpin the first file, tearing down the multiple internal read-ahead streams for the first restore stream of the first file.
- 12. The computer program product of claim 11 wherein the determining further comprises: when the current second restore stream using the multiple internal read-ahead streams does have the priority requested tag, maintaining the multiple internal read-ahead streams for the current second restore stream.
- 13. The computer program product of claim 12 wherein the method further comprises: after the maintaining, checking whether a next current restore stream using the multiple internal read-ahead streams has the priority requested tag.
- 14. The computer program product of claim 11 wherein the determining further comprises: when the threshold number of the plurality of current restore streams using the multiple internal read-ahead streams has not been reached, establishing the multiple internal read-ahead streams for the restore stream having the priority requested tag.
- 15. The computer program product of claim 11 wherein the method further comprises: after the tearing down the multiple internal read-ahead streams of the current second restore stream, using a single read stream to continue restoration of a file associated with the current second restore stream.
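For illustration only, and without limiting the claims, the pin, threshold check, tear-down, and unpin steps recited above might be sketched as follows. Every name here (`RestoreStream`, `MsrManager`, `msr_threshold`, `streams_per_restore`, and the method names) is hypothetical and does not come from the specification. The sketch also assumes behavior the claims do not state: a stream without the tag is simply not granted multiple read-ahead streams once the threshold is reached, and a pin request is not granted when every current holder is itself pinned.

```python
from dataclasses import dataclass


@dataclass
class RestoreStream:
    """One restore stream. Field names are illustrative assumptions."""
    file_id: str
    priority_requested: bool = False  # the "priority requested" tag, kept as metadata
    read_ahead_streams: int = 0       # 0 means a single read stream is in use


class MsrManager:
    """Hypothetical bookkeeping for multiple internal read-ahead streams."""

    def __init__(self, msr_threshold, streams_per_restore=4):
        self.msr_threshold = msr_threshold              # max restores using read-ahead streams
        self.streams_per_restore = streams_per_restore  # assumed fan-out per restore
        self.streams = {}                               # file_id -> RestoreStream

    def open_restore(self, file_id):
        stream = RestoreStream(file_id)
        self.streams[file_id] = stream
        return stream

    def _teardown(self, stream):
        # The restore itself continues on a single read stream.
        stream.read_ahead_streams = 0

    def try_establish(self, stream):
        """Decide whether this restore gets multiple read-ahead streams."""
        if stream.read_ahead_streams > 0:
            return True
        current = [s for s in self.streams.values() if s.read_ahead_streams > 0]
        if len(current) >= self.msr_threshold:
            if not stream.priority_requested:
                return False  # assumption: untagged requests wait
            # Walk the current holders; tagged ones are maintained and the
            # next one is checked until an untagged holder is found.
            victim = next((s for s in current if not s.priority_requested), None)
            if victim is None:
                return False  # assumption: every holder is pinned
            self._teardown(victim)
        stream.read_ahead_streams = self.streams_per_restore
        return True

    def pin(self, file_id):
        """Handle a pin request: tag the stream, then attempt to establish."""
        stream = self.streams.get(file_id) or self.open_restore(file_id)
        stream.priority_requested = True
        return self.try_establish(stream)

    def unpin(self, file_id):
        """Handle an unpin request: remove the tag and tear down."""
        stream = self.streams[file_id]
        stream.priority_requested = False
        self._teardown(stream)
```

With a threshold of two, a third restore is refused multiple read-ahead streams until it is pinned, at which point the first untagged holder falls back to a single read stream; unpinning later returns the pinned restore to a single read stream as well.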
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is related to U.S. patent application Ser. No. 18/622,620, filed Mar. 29, 2024, which is assigned to the assignee of the present application, and is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present invention relates generally to information processing systems, and more particularly to large scale filesystems.
BACKGROUND
Backups of client files to a backup system may be conducted at periodic intervals. The backup copies can serve a number of different purposes such as data protection, testing and development, reporting, data mining, and so forth. A backup copy may be restored from the backup system to a client. It may be the case, however, that different files undergoing restoration have different priorities. There is a need for improved systems and techniques to identify certain restorations as high priority so that they can be completed in the shortest amount of time given the available resources.
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
BRIEF SUMMARY
A deduplication filesystem offers multiple mechanisms for restoring a file. In an embodiment, a mechanism referred to as multi-stream restore (MSR) includes the establishment of multiple internal threads that prefetch data of a file to restore into a read-ahead cache. A client can indicate whether a restoration should be designated as MSR. When the client makes the indication of MSR, the filesystem attempts to fulfill the request.
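As a non-limiting illustration of the prefetching idea behind MSR, the following minimal sketch uses several worker threads to load regions of a file into a read-ahead cache while the restore reads the regions back in order. The names `read_chunk`, `restore`, `CHUNK`, and `internal_streams` are assumptions made for this example, and an in-memory byte string stands in for the backend storage of the filesystem.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # bytes per prefetch unit; kept tiny so the example is easy to trace


def read_chunk(data: bytes, index: int) -> bytes:
    """Stand-in for reading one region of the file from backend storage."""
    start = index * CHUNK
    return data[start:start + CHUNK]


def restore(data: bytes, internal_streams: int) -> bytes:
    """Restore `data` using `internal_streams` prefetching threads."""
    n_chunks = (len(data) + CHUNK - 1) // CHUNK
    with ThreadPoolExecutor(max_workers=internal_streams) as pool:
        # The read-ahead cache maps a chunk index to a future; the worker
        # threads fill it ahead of the sequential reader below.
        read_ahead_cache = {
            i: pool.submit(read_chunk, data, i) for i in range(n_chunks)
        }
        # The reader consumes chunks in order and blocks only when a chunk
        # has not been prefetched yet.
        return b"".join(read_ahead_cache[i].result() for i in range(n_chunks))
```

Calling `restore(payload, internal_streams=1)` models a single read stream, while a larger value models the multiple internal threads described above; both return the same bytes.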
BRIEF DESCRIPTION OF THE FIGURES
In the following drawings like reference numerals designate like structural elements. Although the figures depict various examples, the one or more embodiments and implementations described herein are not limited to the examples depicted in the figures.
- FIG. 1 shows a block diagram of an information processing system for a client-informed restore, according to one or more embodiments.
- FIG. 2 shows an example of a deduplication process, according to one or more embodiments.
- FIG. 3 shows an example of a tree data structure of the namespace, according to one or more embodiments.
- FIG. 4 shows an overall flow for client-informed restore, according to one or more embodiments.
- FIG. 5 shows an example of prefetching, according to one or more embodiments.
- FIG. 6 shows a graph of a read highway, according to one or more embodiments.
- FIG. 7 shows a block diagram of a read-ahead cache, according to one or more embodiments.
- FIG. 8 shows another block diagram of a read-ahead cache, according to one or more embodiments.
- FIG. 9 shows a flow for MSR, according to one or more embodiments.
- FIG. 10 shows another flow for MSR, according to one or more embodiments.
- FIG. 11 shows a process for tagging a stream as a priority MSR, according to one or more embodiments.
- FIG. 12 shows a flow of a system load check routine, according to one or more embodiments.
- FIG. 13 shows another flow of a system load check routine, according to one or more embodiments.
- FIG. 14 shows a block diagram of a REST architecture, according to one or more embodiments.
- FIG. 15 shows a block diagram of a processing platform that may be utilized to implement at least a portion of an information processing system, according to one or more embodiments.
- FIG. 16 shows a block diagram of a computer system suitable for use with the system, according to one or more embodiments.
DETAILED DESCRIPTION
A detailed description of one or more embodiments is provided below along with accompanying figures that illustrate the principles of the described embodiments. While aspects of the invention are described in conjunction with such embodiment(s), it should be understood that it is not limited to any one embodiment. On the contrary, the scope is limited only by the claims and the invention encompasses numerous alternatives, modifications, and equivalents. For the purpose of example, numerous specific details are set forth in the following description in order to provide a thorough understanding of the described embodiments, which may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the embodiments has not been described in detail so that the described embodiments are not unnecessarily obscured. It should be appreciated that the described embodiments can be implemented in numerous ways, including as a process, an apparatus, a system, a device, a method, or a computer-readable medium such as a computer-readable st