US-12619586-B1 - Hardware implemented data layer for processing compressed columnar data

US12619586B1US 12619586 B1US12619586 B1US 12619586B1US-12619586-B1

Abstract

An integrated circuit and method are disclosed for processing of compressed columnar data. The processing of compressed columnar data includes loading compressed columnar data associated with a first and a second column into a first on-chip buffer, performing time-shared processing of the compressed columnar data from the first on-chip buffer into a second on-chip buffer, transcoding the compressed columnar data from the second on-chip buffer into unified columnar data having a unified format, loading the unified columnar data into a third on-chip buffer so that the unified columnar data are logically aligned in the third on-chip buffer, and providing at least a portion of the unified columnar data to a query operator. The integrated circuit includes a column loader, a balancer, a transcoder, and a decoder for performing processing of the compressed columnar data.

Inventors

Runbin Shi
Conor John Cunningham
BLAKE DOUGLAS PELTON

Assignees

MICROSOFT TECHNOLOGY LICENSING, LLC

Dates

Publication Date: 20260505
Application Date: 20241107

Claims (20)

1 . An integrated circuit for line-rate processing of compressed columnar data, the integrated circuit comprising: a column loader to load first compressed columnar data associated with a first column and second compressed columnar data associated with a second column into a first on-chip buffer; a balancer to perform time-shared processing of the first compressed columnar data and the second compressed columnar data from the first on-chip buffer and load the processed first compressed columnar data and second compressed columnar data into a second on-chip buffer; a transcoder to transcode the first compressed columnar data and the second compressed columnar data from the second on-chip buffer into first unified columnar data and second unified columnar data having a unified format and load the first unified columnar data and second unified columnar data into a third on-chip buffer, wherein the first unified columnar data and the second unified columnar data in the third on-chip buffer are logically aligned; and a decoder to perform value-wise decoding of at least a portion of the first unified columnar data or the second unified columnar data from the third on-chip buffer into first decoded columnar data and second decoded columnar data and provide the first decoded columnar data and the second decoded columnar data to a query operator, wherein at least one of the first compressed columnar data or the second compressed columnar data comprises at least one of: a uniform block tuple comprising a length indicating a number of rows in a uniform block of consecutive rows of the first column or the second column that share a common row value, and the common row value shared by consecutive rows of uniform block; or a nonuniform block tuple comprising length indicating a number of rows in a nonuniform block of consecutive rows of the first column or the second column that do not share a common value, and a pointer to a position in a value array comprising row values of nonuniform blocks of the first column or the second column.
2 . The integrated circuit of claim 1 , wherein, to load the first compressed columnar data and second compressed columnar data into the first on-chip buffer, the column loader is configured to: determine that available space in the first on-chip buffer associated with the first column satisfies an availability condition; and stream the first compressed columnar data into the available space in the first on-chip buffer associated with the first column.
3 . The integrated circuit of claim 1 , wherein, to perform time-shared processing of the first compressed columnar data and the second compressed columnar data from the first on-chip buffer, the balancer is configured to: perform time-shared processing based at least on a first compression ratio associated with the first compressed columnar data and a second compression ratio associated with the second compressed columnar data.
4 . The integrated circuit of claim 1 , wherein, to perform time-shared processing of the first compressed columnar data and the second compressed columnar data from the first on-chip buffer, the balancer is configured to: associate a first number of credits with the first column, the first number of credits determined based at least on a first compression ratio associated with the first compressed columnar data and a second compression ratio associated with the second compressed columnar data; associate a second number of credits with the second column, the second number of credits determined based at least on the first compression ratio and the second compression ratio; determine that the first number of credits is greater than the second number of credits; forward at least a portion of the first compressed columnar data from the first on-chip buffer to the second on-chip buffer; and decrement the first number of credits.
5 . The integrated circuit of claim 1 , wherein, to transcode the first compressed columnar data and the second compressed columnar data, the transcoder is configured to: transcode at least one of the first compressed columnar data or the second compressed columnar data from a first format comprising values that are of varying lengths into the unified format comprising values that are of a fixed length selected from a predetermined set of fixed lengths.
6 . The integrated circuit of claim 1 , wherein the transcoder comprises: a controller to select a column to transcode during a current transcoding cycle, and determine a number of values to transcode during the current transcoding cycle; an aligner to normalize values associated with the selected column that are of varying lengths into normalized values that are of a fixed length selected from a predetermined set of fixed lengths; and an accumulation buffer to accumulate the normalized values until a predetermined number of normalized values are accumulated, and loading the predetermined number of normalized values into the third on-chip buffer.
7 . The integrated circuit of claim 6 , wherein, to normalize values associated with the selected column, the aligner is configured to: pad the values that are of varying lengths until the values that are of varying lengths are of the fixed length selected from the predetermined set of fixed lengths.
8 . The integrated circuit of claim 1 , wherein, to perform value-wise decoding of at least a portion of the first unified columnar data and the second unified columnar data, the decoder is configured to: determine that a value of at least one of the first unified columnar data or the second unified columnar data is a pointer to a dictionary entry; determine a real value associated with the pointer using a dictionary lookup; and replace the pointer in the first unified columnar data or the second unified columnar data with the real value.
9 . The integrated circuit of claim 1 , wherein the decoder is configured to provide at least the portion of the first unified columnar data or the second unified columnar data to the query operator without converting the first compressed columnar data or the second compressed columnar data into raw uncompressed columnar data.
10 . The integrated circuit of claim 1 , wherein at least one of the first on-chip buffer, the second on-chip buffer, or the third on-chip buffer comprises: a first logical buffer associated with the first column; and a second logical buffer associated with the second column, wherein the first logical buffer and the second logical buffer share on-chip memory.
11 . A method for line-rate processing of compressed columnar data using an integrated circuit, the method comprising: loading first compressed columnar data associated with a first column and second compressed columnar data associated with a second column into a first on-chip buffer; performing time-shared processing of the first compressed columnar data and the second compressed columnar data from the first on-chip buffer into a second on-chip buffer; transcoding the first compressed columnar data and the second compressed columnar data from the second on-chip buffer into first unified columnar data and second unified columnar data having a unified format; loading the first unified columnar data and the second unified columnar data into a third on-chip buffer, wherein the first unified columnar data and the second unified columnar data in the third on-chip buffer are logically aligned; and providing at least a first portion of the first unified columnar data or the second unified columnar data to a query operator, wherein at least one of the first compressed columnar data or the second compressed columnar data comprises at least one of: a uniform block tuple comprising a length indicating a number of rows in a uniform block of consecutive rows of the first column or the second column that share a common row value, and the common row value shared by consecutive rows of uniform block; or a nonuniform block tuple comprising length indicating a number of rows in a nonuniform block of consecutive rows of the first column or the second column that do not share a common value, and a pointer to a position in a value array comprising row values of nonuniform blocks of the first column or the second column.
12 . The method of claim 11 , wherein said loading first compressed columnar data associated with a first column and second compressed columnar data associated with a second column into a first on-chip buffer comprises: determining that available space in the first on-chip buffer associated with the first column satisfies an availability condition; and streaming the first compressed columnar data into the available space in the first on-chip buffer associated with the first column.
13 . The method of claim 11 , wherein said performing time-shared processing of the first compressed columnar data and the second compressed columnar data from the first on-chip buffer and load the processed first compressed columnar data and second compressed columnar data into a second on-chip buffer comprises: performing time-shared processing based at least on a first compression ratio associated with the first compressed columnar data and a second compression ratio associated with the second compressed columnar data.
14 . The method of claim 11 , wherein said performing time-shared processing of the first compressed columnar data and the second compressed columnar data from the first on-chip buffer and load the processed first compressed columnar data and second compressed columnar data into a second on-chip buffer comprises: associating a first number of credits with the first column, the first number of credits determined based at least on a first compression ratio associated with the first compressed columnar data and a second compression ratio associated with the second compressed columnar data; associating a second number of credits with the second column, the second number of credits determined based at least on the first compression ratio and the second compression ratio; determining that the first number of credits is greater than the second number of credits; forwarding at least a portion of the first compressed columnar data from the first on-chip buffer to the second on-chip buffer; and decrementing the first number of credits.
15 . The method of claim 11 , wherein said transcoding the first compressed columnar data and the second compressed columnar data from the second on-chip buffer into first unified columnar data and second unified columnar data having a unified format comprises: transcoding at least one of the first compressed columnar data or the second compressed columnar data from a first format comprising values that are of varying lengths into the unified format comprising values that are of a fixed length selected from a predetermined set of fixed lengths.
16 . The method of claim 11 , wherein said transcoding the first compressed columnar data and the second compressed columnar data from the second on-chip buffer into first unified columnar data and second unified columnar data having a unified format comprises: selecting a column to transcode during a current transcoding cycle; determining a number of values to transcode during the current transcoding cycle; normalizing values associated with the selected column that are of varying lengths into normalized values that are of a fixed length selected from a predetermined set of fixed lengths; accumulating the normalized values in an accumulation buffer until a predetermined number of normalized values are accumulated; and loading the predetermined number of normalized values into the third on-chip buffer.
17 . The method of claim 16 , wherein said normalizing values associated with the selected column that are of varying lengths into normalized values that are of a fixed length selected from a predetermined set of fixed lengths comprises: padding the values that are of varying lengths until the values that are of varying lengths are of the fixed length selected from the predetermined set of fixed lengths.
18 . The method of claim 11 , further comprising: performing value-wise decoding of at least a second portion of the first unified columnar data or the second unified columnar data into first decoded columnar data or second decoded columnar data; and providing the first decoded columnar data or the second decoded columnar data to the query operator.
19 . The method of claim 11 , wherein said performing value-wise decoding of at least a second portion of the first unified columnar data or the second unified columnar data into first decoded columnar data or second decoded columnar data comprises: determining that a value of at least one of the first unified columnar data or the second unified columnar data is a pointer to a dictionary entry; determining a real value associated with the pointer using a dictionary lookup; and replacing the pointer in the first unified columnar data or the second unified columnar data with the real value.
20 . The method of claim 11 , wherein said provide at least the portion of the first unified columnar data or the second unified columnar data to the query operator comprises: providing at least the portion of the first unified columnar data or the second unified columnar data to the query operator without converting the first compressed columnar data or the second compressed columnar data into raw uncompressed columnar data.

Description

BACKGROUND A database is an organized collection of data (e.g., a data store) based on the use of a database management system (DBMS) that enables end users and applications to interact with the data of the database via queries. Columnar storage is a database storage format where data of a database is organized and stored by columns rather than rows. In columnar storage, values of a particular column are stored together in storage (e.g., storage of a computer cluster, which may be local or cloud-based). This approach allows for efficient access to specific columns of a database and enhances query performance for analytical workloads. Columnar storage also allows for better compression relative to row storage because data within a column typically has similar values, often making it more compressible. SUMMARY This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Hardware-based systems and methods are disclosed for processing of compressed columnar data. Compressed columnar data associated with a plurality of columns are loaded from memory directly to query operators using a hardware-implemented data layer pipeline. The columnar data is loaded from memory in its compressed form and provided to the query operators without converting the compressed columnar data into raw uncompressed columnar data. Further features and advantages of the embodiments, as well as the structure and operation of various embodiments, are described in detail below with reference to the accompanying drawings. It is noted that the claimed subject matter is not limited to the specific embodiments described herein. Such embodiments are presented herein for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments of the present application and, together with the description, further serve to explain the principles of the embodiments and to enable a person skilled in the pertinent art to make and use the embodiments. FIG. 1 shows a block diagram of an example system for processing compressed columnar data, in accordance with an embodiment. FIG. 2 shows a block diagram of an example system for loading compressed columnar data into a first on-chip buffer, in accordance with an embodiment. FIG. 3 shows a block diagram of an example system for time-shared processing of compressed columnar data from a first on-chip buffer into a second on-chip buffer, in accordance with an embodiment. FIG. 4 shows a block diagram of an example system for transcoding compressed columnar data into unified compressed columnar data, in accordance with an embodiment. FIG. 5 shows a flowchart of an example process for processing compressed columnar data, in accordance with an embodiment. FIG. 6 depicts a flowchart of an example process for loading compressed columnar data into a first on-chip buffer, in accordance with an embodiment. FIG. 7 shows a flowchart of an example process for time-shared processing of compressed columnar data from a first on-chip buffer into a second on-chip buffer, in accordance with an embodiment. FIG. 8 shows a flowchart of an example process for transcoding compressed columnar data into unified compressed columnar data, in accordance with an embodiment. FIG. 9 shows a flowchart of an example process for value-wise decoding unified compressed columnar data, in accordance with an embodiment. FIG. 10 shows a block diagram of an example computer system in which embodiments may be implemented. The subject matter of the present application will now be described with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears. DETAILED DESCRIPTION I. Introduction The following detailed description discloses numerous example embodiments. The scope of the present patent application is not limited to the disclosed embodiments, but also encompasses combinations of the disclosed embodiments, as well as modifications to the disclosed embodiments. It is noted that any section/subsection headings provided herein are not intended to be limiting. Embodiments are described throughout this document, and any type of embodiment may be included under any section/subsection. Furthermore, embodiments disclosed in any section/subsection may be combined with any other embodiments described in the same section/subsection and/or a different section/s