
US-12625998-B2 - Managing storage space use in distributed systems

US 12625998 B2

Abstract

Methods and systems for managing operation of a deployment are disclosed. The operation may be managed by selecting, by an inference model, a storage location to write data. To write the data, data access patterns of the data may be ingested by the inference model and a storage location may be selected. The storage location may include a local storage of the data processing system and/or a cloud storage of a management system of the deployment. To read the data, the inference model may perform a first search for the data in the local storage. If, instead of the data, metadata is found in the local storage, then the inference model may perform a second search in the cloud storage. If the data is found, then the inference model may read the data.

Inventors

  • Maxim Balin
  • Nechami Maman
  • Eric Joseph Bruno

Assignees

  • Dell Products L.P.

Dates

Publication Date
2026-05-12
Application Date
2024-10-29

Claims (20)

  1. A method for managing operation of a deployment, the method comprising: obtaining, by a data manager of a data processing system of the deployment, a data access request; in a first instance of the obtaining where the data access request is a write request: selecting, by the data manager, a storage location using an inference model, the inference model being based on data usage patterns, data characteristics, and data priority; storing, by the data manager, a portion of data at the storage location to service the data access request; in a second instance of the obtaining where the data access request is a read request: reading, by the data manager, a second portion of the data from a second storage location where requested data is believed to be stored; in a first instance of the reading where the second portion of the data is successfully read from the second storage location: providing the second portion of the data to a requestor to service the data access request; in a second instance of the reading where metadata, rather than the second portion of the data, is read from the second storage location: identifying, using the metadata, a third storage location; reading a third portion of the data from the third storage location; and providing the third portion of the data to the requestor to service the data access request.
  2. The method of claim 1, wherein the data usage patterns comprise a frequency of accessing the data, at least one time at which the accessing of the data occurs, and latency associated with the accessing of the data.
  3. The method of claim 1, wherein the data characteristics comprise a type of the data, a volume of the data, and at least one security protocol by which to secure the data.
  4. The method of claim 1, wherein the data priority categorizes a level of importance of the data associated with an operation by the data processing system.
  5. The method of claim 1, wherein the metadata of the data comprises at least a storage location of the data.
  6. The method of claim 5, wherein the metadata is stored in a physical block address space of a data storage device that is reserved for the data, the data is not stored in the physical block address space, and performing a read of the physical block address space returns the metadata and not the data.
  7. The method of claim 1, wherein selecting, by the data manager, the storage location using the inference model comprises: predicting, using the inference model, a quantity of computing resources likely to be used at a future time for accessing, by the data processing system, the portion of the data at the future time; comparing the quantity to a threshold quantity; in a first instance of the comparing where the quantity exceeds the threshold quantity: selecting local storage for storage of the portion of the data; and in a second instance of the comparing where the quantity does not exceed the threshold quantity: selecting remote storage for storage of the portion of the data.
  8. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations for managing operation of a deployment, the operations comprising: obtaining, by a data manager of a data processing system of the deployment, a data access request; in a first instance of the obtaining where the data access request is a write request: selecting, by the data manager, a storage location using an inference model, the inference model being based on data usage patterns, data characteristics, and data priority; storing, by the data manager, a portion of data at the storage location to service the data access request; in a second instance of the obtaining where the data access request is a read request: reading, by the data manager, a second portion of the data from a second storage location where requested data is believed to be stored; in a first instance of the reading where the second portion of the data is successfully read from the second storage location: providing the second portion of the data to a requestor to service the data access request; in a second instance of the reading where metadata, rather than the second portion of the data, is read from the second storage location: identifying, using the metadata, a third storage location; reading a third portion of the data from the third storage location; and providing the third portion of the data to the requestor to service the data access request.
  9. The non-transitory machine-readable medium of claim 8, wherein the data usage patterns comprise a frequency of accessing the data, at least one time at which the accessing of the data occurs, and latency associated with the accessing of the data.
  10. The non-transitory machine-readable medium of claim 8, wherein the data characteristics comprise a type of the data, a volume of the data, and at least one security protocol by which to secure the data.
  11. The non-transitory machine-readable medium of claim 8, wherein the data priority categorizes a level of importance of the data associated with an operation by the data processing system.
  12. The non-transitory machine-readable medium of claim 8, wherein the metadata of the data comprises at least a storage location of the data.
  13. The non-transitory machine-readable medium of claim 12, wherein the metadata is stored in a physical block address space of a data storage device that is reserved for the data, the data is not stored in the physical block address space, and performing a read of the physical block address space returns the metadata and not the data.
  14. The non-transitory machine-readable medium of claim 8, wherein selecting, by the data manager, the storage location using the inference model comprises: predicting, using the inference model, a quantity of computing resources likely to be used at a future time for accessing, by the data processing system, the portion of the data at the future time; comparing the quantity to a threshold quantity; in a first instance of the comparing where the quantity exceeds the threshold quantity: selecting local storage for storage of the portion of the data; and in a second instance of the comparing where the quantity does not exceed the threshold quantity: selecting remote storage for storage of the portion of the data.
  15. A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations for managing operation of a deployment, the operations comprising: obtaining, by a data manager of the data processing system of the deployment, a data access request; in a first instance of the obtaining where the data access request is a write request: selecting, by the data manager, a storage location using an inference model, the inference model being based on data usage patterns, data characteristics, and data priority; storing, by the data manager, a portion of data at the storage location to service the data access request; in a second instance of the obtaining where the data access request is a read request: reading, by the data manager, a second portion of the data from a second storage location where requested data is believed to be stored; in a first instance of the reading where the second portion of the data is successfully read from the second storage location: providing the second portion of the data to a requestor to service the data access request; in a second instance of the reading where metadata, rather than the second portion of the data, is read from the second storage location: identifying, using the metadata, a third storage location; reading a third portion of the data from the third storage location; and providing the third portion of the data to the requestor to service the data access request.
  16. The data processing system of claim 15, wherein the data usage patterns comprise a frequency of accessing the data, at least one time at which the accessing of the data occurs, and latency associated with the accessing of the data.
  17. The data processing system of claim 15, wherein the data characteristics comprise a type of the data, a volume of the data, and at least one security protocol by which to secure the data.
  18. The data processing system of claim 15, wherein the data priority categorizes a level of importance of the data associated with an operation by the data processing system.
  19. The data processing system of claim 15, wherein the metadata of the data comprises at least a storage location of the data.
  20. The data processing system of claim 19, wherein the metadata is stored in a physical block address space of a data storage device that is reserved for the data, the data is not stored in the physical block address space, and performing a read of the physical block address space returns the metadata and not the data.
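The storage-selection step recited in claims 7, 14, and 20 (predict a quantity of computing resources, compare it to a threshold, and pick local or remote storage) can be sketched as follows. This is an illustrative reading of the claim language, not an implementation from the patent; the function name, parameter types, and tier labels are assumptions.

```python
def select_storage_location(predicted_quantity: float, threshold_quantity: float) -> str:
    """Map a predicted future resource demand for a portion of data to a storage tier.

    predicted_quantity would come from the inference model; the threshold is a
    hypothetical policy value.
    """
    if predicted_quantity > threshold_quantity:
        # First instance of the comparing: high expected access cost,
        # so keep the data in local storage, close to the processor.
        return "local"
    # Second instance of the comparing: low expected access cost,
    # so remote (e.g., cloud) storage is acceptable.
    return "remote"
```

In this reading, only the comparison is fixed by the claims; how the quantity is predicted is left to the trained inference model.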

Description

FIELD

Embodiments disclosed herein relate generally to managing operation of a distributed system. More particularly, embodiments disclosed herein relate to managing data in distributed systems.

BACKGROUND

Computing devices may provide computer-implemented services. The computer-implemented services may be used by users of the computing devices and/or devices operably connected to the computing devices. The computer-implemented services may be performed with hardware components such as processors, memory modules, storage devices, and communication devices. The operation of these components and the components of other devices may impact the performance of the computer-implemented services.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments disclosed herein are illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements. FIG. 1 shows a diagram illustrating a system in accordance with an embodiment. FIGS. 2A-2C show interaction diagrams illustrating operation of a system in accordance with an embodiment. FIGS. 3A-3C show flow diagrams illustrating a method in accordance with an embodiment. FIG. 4 shows a block diagram illustrating a data processing system in accordance with an embodiment.

DETAILED DESCRIPTION

Various embodiments will be described with reference to details discussed below, and the accompanying drawings will illustrate the various embodiments. The following description and drawings are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding of various embodiments. However, in certain instances, well-known or conventional details are not described in order to provide a concise discussion of embodiments disclosed herein.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in conjunction with the embodiment can be included in at least one embodiment. The appearances of the phrases “in one embodiment” and “an embodiment” in various places in the specification do not necessarily all refer to the same embodiment. References to an “operable connection” or “operably connected” mean that a particular device is able to communicate with one or more other devices. The devices themselves may be directly connected to one another or may be indirectly connected to one another through any number of intermediary devices, such as in a network topology.

In general, embodiments disclosed herein relate to managing operation of a deployment. The operation may be managed by selecting, based on a data access request, a storage location of data using an inference model. The inference model may be trained to perform a read and/or write request using data access patterns of the data. The data access patterns may include data usage patterns, data characteristics, data priorities, etc. The data usage patterns may include a frequency of accessing the data using at least one application, latency associated with the accessing of the data, etc. The data characteristics may include at least a type of the data, a volume of the data, and/or at least one security protocol by which to secure the data. The data priorities may categorize at least one level of importance of the data associated with an operation by a data processing system.

During a write request, the data manager may obtain the data from an application. The inference model may ingest the data access patterns from metadata of the data and make a selection, based on the data access patterns, of a storage location.
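The write path described above, in which the inference model ingests data access patterns (usage frequency, latency, priority, and so on) and selects a storage location, can be sketched as below. The feature names and the simple rule standing in for the trained inference model are assumptions chosen for illustration; the patent does not specify them.

```python
def choose_write_location(access_patterns: dict) -> str:
    """Stand-in for the inference model: map a portion of data's access
    patterns to a storage location.

    access_patterns is a hypothetical feature dictionary; a real deployment
    would feed these features to a trained model rather than fixed rules.
    """
    frequently_accessed = access_patterns.get("accesses_per_day", 0) > 100
    high_priority = access_patterns.get("priority", "low") == "high"
    # Hot or important data stays in local storage of the data processing
    # system; everything else goes to cloud storage of the management system.
    if frequently_accessed or high_priority:
        return "local"
    return "cloud"
```

A usage example: `choose_write_location({"accesses_per_day": 500})` selects local storage, while rarely accessed, low-priority data is routed to cloud storage.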
The storage location may be selected from storage locations including local storage of the data processing system and/or cloud storage of a management system of the deployment. Once the storage location has been selected by the inference model, the write of the data may be performed in the storage location by the inference model.

During a read request, the data manager may obtain and ingest the data access request. The inference model may obtain a first location to search for the data from the data manager. The first search location may be local storage of the data processing system. The inference model may also ingest the first search location from a storage allocation table of the data processing system. The inference model may perform a first search of the local storage for the data. If the data is not found, then the first search by the inference model may end. However, if the data is found, then the inference model may transfer, by the data manager, the data to the application. Further, if, during the first search, metadata of the data is instead found, then the inference model may ingest the metadata. From the metadata, the inference model may extract a pointer to a second location of the data. The second loc
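The read path described above (a first search of local storage that may return the data itself, nothing, or metadata containing a pointer to a second location) can be sketched as follows. The stores are modeled as plain dictionaries and the entry layout is an illustrative assumption; a real system would operate on physical block addresses and a cloud storage API.

```python
def read_data(key: str, local_store: dict, cloud_store: dict):
    """Sketch of the two-stage read: search local storage first, and follow a
    metadata pointer to cloud storage when metadata is found instead of data.
    """
    entry = local_store.get(key)
    if entry is None:
        # First search ends: the data is not in local storage.
        return None
    if entry.get("is_metadata"):
        # Metadata was read instead of the data: extract the pointer to the
        # second location and perform the second search in cloud storage.
        return cloud_store.get(entry["pointer"])
    # The data was found locally; return it to the requestor.
    return entry["payload"]
```

For example, with a local entry `{"is_metadata": True, "pointer": "blob-17"}`, the read falls through to cloud storage under the key `"blob-17"`.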