
US-12625853-B2 - Method and system for performing high-performance writes of data to columnar storage


Abstract

Techniques described herein relate to a method for storing data in columnar storage. The method includes obtaining a columnar storage write request associated with a file, wherein the file comprises rows and columns of file data; in response to obtaining the columnar storage write request: writing the file data to storage using column-based caches; generating file metadata based on the writing of the file data to the storage; and assigning a key to the file metadata; and storing the file metadata using a key-value service.
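The last steps of the abstract (generating file metadata, assigning it a key, and storing it through a key-value service) can be illustrated with a minimal Python sketch. It assumes a dict-backed stand-in for the key-value service and a hypothetical key scheme (`file-metadata/<file name>`); the text specifies neither. The names `PageMetadata`, `FileMetadata`, `InMemoryKeyValueService`, `assign_key`, and `store_file_metadata` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PageMetadata:
    column: str
    page_number: int
    storage_id: str                # storage identifier from the page's storage location
    physical_addresses: List[int]  # physical addresses of the page's data


@dataclass
class FileMetadata:
    file_name: str
    pages: List[PageMetadata] = field(default_factory=list)


class InMemoryKeyValueService:
    """Hypothetical dict-backed stand-in for the key-value service."""

    def __init__(self) -> None:
        self._store: Dict[str, FileMetadata] = {}

    def put(self, key: str, value: FileMetadata) -> None:
        self._store[key] = value

    def get(self, key: str) -> FileMetadata:
        return self._store[key]


def assign_key(metadata: FileMetadata) -> str:
    # Hypothetical key scheme; the text only says that a key is assigned.
    return f"file-metadata/{metadata.file_name}"


def store_file_metadata(kv: InMemoryKeyValueService, metadata: FileMetadata) -> str:
    # Assign a key to the file metadata, store it, and return the key for later lookups.
    key = assign_key(metadata)
    kv.put(key, metadata)
    return key
```

In this sketch, `store_file_metadata(kv, FileMetadata("example_file", pages))` returns `"file-metadata/example_file"`, and a single `kv.get` on that key returns the storage locations of every page of the file.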

Inventors

  • Lu Lei
  • Flavio Paiva Junqueira
  • JIANG CAO
  • Xia Wang

Assignees

  • DELL PRODUCTS L.P.

Dates

Publication Date
2026-05-12
Application Date
2023-03-24

Claims (8)

  1. A method for storing data in columnar storage, comprising: obtaining a columnar storage write request associated with a file, wherein the file comprises rows and columns of file data; in response to obtaining the columnar storage write request: writing the file data to storage using column-based caches, wherein writing the file data to the storage comprises: loading row data associated with a first column of the file into a first column-based cache of the column-based caches; loading row data associated with a second column of the file into a second column-based cache of the column-based caches; making a determination that the first column-based cache is full, wherein the determination is made based upon a predetermined cache threshold, wherein the predetermined cache threshold is a percentage of the first column-based cache's total capacity, wherein each column-based cache of the column-based caches has a different capacity and a different cache threshold; in response to the determination: flushing the data in the first column-based cache to the storage as a first page; generating first page metadata associated with the first page, wherein the first page metadata comprises a storage location of the first page, wherein the storage location comprises a storage identifier associated with the storage, and one or more physical addresses associated with data of the first page; and emptying the first column-based cache; loading second row data associated with the first column of the file into the first column-based cache of the column-based caches; loading second row data associated with the second column of the file into the second column-based cache of the column-based caches; making a second determination that the second column-based cache and the first column-based cache are full; in response to the second determination: flushing the data in the second column-based cache to the storage as a second page; generating second page metadata associated with the second page; emptying the second column-based cache; flushing the data in the first column-based cache to the storage as a third page; generating third page metadata associated with the third page; and emptying the first column-based cache; generating file metadata based on the writing of the file data to the storage; and assigning a key to the file metadata; and storing the file metadata using a key-value service.
  2. The method of claim 1, wherein writing the file data to storage using the column-based caches further comprises: after emptying the first column-based cache: loading third row data associated with the first column of the file into the first column-based cache of the column-based caches; loading third row data associated with the second column of the file into the second column-based cache of the column-based caches; making a third determination that the second column-based cache is full; in response to the third determination: flushing the data in the second column-based cache to the storage as a fourth page; generating fourth page metadata associated with the fourth page; and emptying the second column-based cache.
  3. The method of claim 1, wherein the flushing of the data in the second column-based cache to the storage as a second page and the flushing of the data in the first column-based cache to the storage as a third page are performed concurrently.
  4. The method of claim 1, wherein the file metadata comprises: the first page metadata; the second page metadata; and the third page metadata.
  5. A system for storing data in columnar storage, comprising: storage for storing data; and a format handler, comprising a processor and memory, and programmed to: obtain a columnar storage write request associated with a file, wherein the file comprises rows and columns of file data; in response to obtaining the columnar storage write request: write the file data to storage using column-based caches, wherein writing the file data to the storage comprises: loading row data associated with a first column of the file into a first column-based cache of the column-based caches; loading row data associated with a second column of the file into a second column-based cache of the column-based caches; making a determination that the first column-based cache is full, wherein the determination is made based upon a predetermined cache threshold, wherein the predetermined cache threshold is a percentage of the first column-based cache's total capacity, wherein each column-based cache of the column-based caches has a different capacity and a different cache threshold; in response to the determination: flushing the data in the first column-based cache to the storage as a first page; generating first page metadata associated with the first page, wherein the first page metadata comprises a storage location of the first page, wherein the storage location comprises a storage identifier associated with the storage, and one or more physical addresses associated with data of the first page; and emptying the first column-based cache; loading second row data associated with the first column of the file into the first column-based cache of the column-based caches; loading second row data associated with the second column of the file into the second column-based cache of the column-based caches; making a second determination that the second column-based cache and the first column-based cache are full; in response to the second determination: flushing the data in the second column-based cache to the storage as a second page; generating second page metadata associated with the second page; emptying the second column-based cache; flushing the data in the first column-based cache to the storage as a third page; generating third page metadata associated with the third page; and emptying the first column-based cache; generate file metadata based on the writing of the file data to the storage; and assign a key to the file metadata; and store the file metadata using a key-value service.
  6. The system of claim 5, wherein writing the file data to storage using the column-based caches further comprises: after emptying the first column-based cache: loading third row data associated with the first column of the file into the first column-based cache of the column-based caches; loading third row data associated with the second column of the file into the second column-based cache of the column-based caches; making a third determination that the second column-based cache is full; in response to the third determination: flushing the data in the second column-based cache to the storage as a fourth page; generating fourth page metadata associated with the fourth page; and emptying the second column-based cache.
  7. The system of claim 5, wherein the flushing of the data in the second column-based cache to the storage as a second page and the flushing of the data in the first column-based cache to the storage as a third page are performed concurrently.
  8. The system of claim 5, wherein the file metadata comprises: the first page metadata; the second page metadata; and the third page metadata.
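The cache-and-flush loop of claims 1 through 4 (and of their system counterparts, claims 5 through 8) can be sketched in Python. This is a minimal illustration under stated assumptions, not the claimed implementation: capacities are counted in values rather than bytes, `InMemoryStorage` is a hypothetical stand-in that assigns one sequential address per value, flushing leftover values in `finish()` is an added assumption, and all class and method names are illustrative. Each cache has its own capacity and its own threshold percentage; a full cache is flushed as a page whose metadata records a storage identifier and physical addresses; and caches that are full at the same time are flushed concurrently, as in claims 3 and 7.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from threading import Lock
from typing import Dict, List


@dataclass
class PageMetadata:
    column: str
    page_number: int
    storage_id: str                # identifies the storage the page was written to
    physical_addresses: List[int]  # where the page's values landed


class InMemoryStorage:
    """Hypothetical storage that assigns one sequential address per value."""

    def __init__(self, storage_id: str) -> None:
        self.storage_id = storage_id
        self._next_address = 0
        self._lock = Lock()  # flushes may run concurrently

    def write_page(self, values: List[object]) -> List[int]:
        with self._lock:
            start = self._next_address
            self._next_address += len(values)
        return list(range(start, start + len(values)))


@dataclass
class ColumnCache:
    column: str
    capacity: int         # total capacity, in values (assumed unit)
    threshold_pct: float  # cache counts as "full" at this percentage of capacity
    values: List[object] = field(default_factory=list)

    def is_full(self) -> bool:
        return len(self.values) >= self.capacity * self.threshold_pct / 100


class ColumnarWriter:
    def __init__(self, storage: InMemoryStorage, caches: Dict[str, ColumnCache]) -> None:
        self.storage = storage
        self.caches = caches
        self.pages: List[PageMetadata] = []  # becomes part of the file metadata
        self._page_count = 0

    def write_row(self, row: Dict[str, object]) -> None:
        # Load each column value of the row into that column's own cache.
        for column, value in row.items():
            self.caches[column].values.append(value)
        self._flush([c for c in self.caches.values() if c.is_full()])

    def finish(self) -> List[PageMetadata]:
        # Assumption: leftover values are flushed as final, possibly short, pages.
        self._flush([c for c in self.caches.values() if c.values])
        return self.pages

    def _flush(self, caches: List[ColumnCache]) -> None:
        if not caches:
            return
        # Snapshot and empty each cache, numbering pages in a fixed order.
        jobs = []
        for cache in caches:
            self._page_count += 1
            jobs.append((cache.column, self._page_count, list(cache.values)))
            cache.values.clear()
        # Write all pages concurrently, then record page metadata in order.
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(self.storage.write_page, vals) for _, _, vals in jobs]
            for (column, number, _), future in zip(jobs, futures):
                self.pages.append(
                    PageMetadata(column, number, self.storage.storage_id, future.result())
                )
```

For example, a cache with capacity 4 and a 50% threshold counts as full at 2 values, while one with capacity 10 and a 30% threshold counts as full at 3 values, so the two columns flush at different rates and occasionally in the same step. Snapshotting and emptying the caches on the calling thread keeps page numbering deterministic; only the storage writes run in parallel.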

Description

BACKGROUND

Computing devices may provide services for users. To provide the services, the computing devices may generate data. The data may be important to users. The data may be stored in storage for later use. The data may be written to the storage using a cache.

SUMMARY

In general, certain embodiments described herein relate to a method for storing data in columnar storage. The method may include obtaining a columnar storage write request associated with a file, wherein the file comprises rows and columns of file data; in response to obtaining the columnar storage write request: writing the file data to storage using column-based caches; generating file metadata based on the writing of the file data to the storage; and assigning a key to the file metadata; and storing the file metadata using a key-value service.

In general, certain embodiments described herein relate to a system for storing data in columnar storage. The system includes a storage for storing data. The system also includes a format handler that includes a processor and memory and is programmed to obtain a columnar storage write request associated with a file, wherein the file comprises rows and columns of file data; in response to obtaining the columnar storage write request: write the file data to storage using column-based caches; generate file metadata based on the writing of the file data to the storage; and assign a key to the file metadata; and store the file metadata using a key-value service.

In general, certain embodiments described herein relate to a non-transitory computer readable medium that includes computer readable program code, which when executed by a computer processor enables the computer processor to perform a method for storing data in columnar storage. The method may include obtaining a columnar storage write request associated with a file, wherein the file comprises rows and columns of file data; in response to obtaining the columnar storage write request: writing the file data to storage using column-based caches; generating file metadata based on the writing of the file data to the storage; and assigning a key to the file metadata; and storing the file metadata using a key-value service.

Other aspects of the embodiments disclosed herein will be apparent from the following description and the appended claims.

BRIEF DESCRIPTION OF DRAWINGS

Certain embodiments of the invention will be described with reference to the accompanying drawings. However, the accompanying drawings illustrate only certain aspects or implementations of the invention by way of example and are not meant to limit the scope of the claims.

FIG. 1A shows a diagram of a system in accordance with one or more embodiments disclosed herein.
FIG. 1B shows a diagram of a columnar storage in accordance with one or more embodiments disclosed herein.
FIG. 1C shows a diagram of a columnar storage client in accordance with one or more embodiments disclosed herein.
FIG. 2A shows a flowchart of a method for servicing a write request to a columnar storage in accordance with one or more embodiments disclosed herein.
FIG. 2B shows a flowchart of a method for writing data to columnar storage in accordance with one or more embodiments disclosed herein.
FIG. 3 shows a diagram of a computing device in accordance with one or more embodiments disclosed herein.

DETAILED DESCRIPTION

Specific embodiments will now be described with reference to the accompanying figures. In the following description, numerous details are set forth as examples of the embodiments disclosed herein.
It will be understood by those skilled in the art that one or more embodiments disclosed herein may be practiced without these specific details and that numerous variations or modifications may be possible without departing from the scope of the embodiments disclosed herein. Certain details known to those of ordinary skill in the art are omitted to avoid obscuring the description. In the following description of the figures, any component described with regard to a figure, in various embodiments disclosed herein, may be equivalent to one or more like-named components described with regard to any other figure. For brevity, descriptions of these components will not be repeated with regard to each figure. Thus, each and every embodiment of the components of each figure is incorporated by reference and assumed to be optionally present within every other figure having one or more like-named components. Additionally, in accordance with various embodiments disclosed herein, any description of the components of a figure is to be interpreted as an optional embodiment, which may be implemented in addition to, in conjunction with, or in place of the embodiments described with regard to a corresponding like-named component in any other figure. Throughout this application, elements of figures may be labeled as A to N. As used herein, the aforementioned labeling means that the element may include any number of items a