US-12619628-B2 - Analytical query processing with decoupled compute instances
Abstract
The subject technology receives, by a first set of decoupled compute instances of a distributed database, a plurality of transactional queries against a transactional processing database stored on a first set of decoupled storage instances, at least one of the first set of decoupled compute instances bypassing a compiler of the distributed database to process the plurality of transactional queries using locally cached data. The subject technology receives at least one select query for processing on an aggregated analytical query database of the distributed database. The subject technology generates, by the compiler of the distributed database, a query plan for execution of the at least one select query using a second set of decoupled compute instances. The subject technology generates, by the second set of decoupled compute instances, an aggregated analytical query result dataset for the at least one select query according to the query plan.
Inventors
- Leonidas Galanis
- Alexander Miller
- William Waddington
- Khaled Yagoub
Assignees
- SNOWFLAKE INC.
Dates
- Publication Date
- 20260505
- Application Date
- 20230428
Claims (20)
- 1 . A method comprising: assigning, by a first cluster, a session of a client device to a first set of decoupled compute instances, the first cluster, prior to assigning the session, receiving a request from the client device to use the first set of decoupled compute instances, each of the decoupled compute instances comprising an execution node, the execution node being included in a second cluster that is separate from the first cluster; after assigning the session, receiving from the client device, by the first set of decoupled compute instances, a plurality of transactional queries against a transactional processing database stored on a first set of decoupled storage instances, at least one of the first set of decoupled compute instances bypassing a compiler to process at least one transactional query from the plurality of transactional queries using locally cached data in a cache provided by the second cluster; determining, by the second cluster, that a compiled execution plan of statements, corresponding to a particular transactional query from the plurality of transactional queries, does not exist in the cache provided by the second cluster; sending, by the second cluster, the particular transactional query to a third cluster for compiling and generating an execution plan of the particular transactional query, the third cluster being separate from the first cluster and the second cluster, wherein the second cluster processes OLTP traffic and performs OLTP operations on transactional data stores, and the third cluster includes SQL compiler functionality for generating query plans, the clusters being implemented with separate compute nodes; receiving at least one select query for processing on an aggregated analytical query database, the aggregated analytical query database included in the third cluster that is separate from the first cluster and the second cluster; generating, by the compiler included in the third cluster, a query plan for execution of the at least one select query using a second set of decoupled compute instances; and generating, by the second set of decoupled compute instances, an aggregated analytical query result dataset for the at least one select query according to the query plan.
- 2 . The method of claim 1 , wherein the first cluster comprises a server cluster, the second cluster comprises an OLTP cluster, the third cluster comprises a global services cluster, and further comprising: receiving a transactional query for processing on transactional data in the transactional processing database; determining that the transactional data is cached on the first set of decoupled compute instances; and generating transactional results data by applying the transactional query on the transactional data, the first set of decoupled compute instances bypassing the compiler in generating the transactional results data and returning results to a client device that generated the transactional query, the bypassing comprising setting an active session between the client device and the first set of decoupled compute instances.
- 3 . The method of claim 1 , further comprising: replicating data that has changed from the transactional processing database to an object store database of an aggregated analytical query database managed by the second set of decoupled compute instances.
- 4 . The method of claim 3 , wherein the aggregated analytical query database is stored on a second set of decoupled storage instances, the second set of decoupled compute instances being remote from the second set of decoupled storage instances.
- 5 . The method of claim 1 , wherein the first set of decoupled compute instances is remote from the first set of decoupled storage instances, each of the plurality of transactional queries comprising data manipulation language statements that change data in the transactional processing database.
- 6 . The method of claim 1 , wherein the transactional processing database is an Online Transactional Processing (OLTP) database that tracks a large volume of data transactions, and wherein the aggregated analytical query database is an Online Analytical Processing (OLAP) database comprising aggregated queries and a low volume of data transactions relative to the OLTP database.
- 7 . The method of claim 6 , further comprising: receiving a query and determining whether the query is an OLTP query or an OLAP query.
- 8 . The method of claim 7 , further comprising: in response to determining the query is a OLAP query, directing the query to the OLAP database and processing the query using the second set of decoupled compute instances.
- 9 . The method of claim 7 , further comprising: in response to determining the query is a OLTP query, directing the query to the OLTP database and processing the query using the first set of decoupled compute instances.
- 10 . The method of claim 1 , wherein the execution node comprises a processor and a cache, the first set of decoupled compute instances are remotely connected over a network to the first set of decoupled storage instances of the transaction processing database, and wherein the second set of decoupled compute instances are remotely connected over the network to the second set of decoupled storage instances of the aggregated analytical query database.
- 11 . A system comprising: at least one hardware processor; and memory storing instructions that, when executed by the at least one hardware processor, cause the at least one hardware processor to perform operations comprising: assigning, by a first cluster, a session of a client device to a first set of decoupled compute instances, the first cluster, prior to assigning the session, receiving a request from the client device to use the first set of decoupled compute instances, each of the decoupled compute instances comprising an execution node, the execution node being included in a second cluster that is separate from the first cluster; after assigning the session, receiving from the client device, by the first set of decoupled compute instances, a plurality of transactional queries against a transactional processing database stored on a first set of decoupled storage instances, at least one of the first set of decoupled compute instances bypassing a compiler to process at least one transactional query from the plurality of transactional queries using locally cached data in a cache provided by the second cluster; determining, by the second cluster, that a compiled execution plan of statements, corresponding to a particular transactional query from the plurality of transactional queries, does not exist in the cache provided by the second cluster; sending, by the second cluster, the particular transactional query to a third cluster for compiling and generating an execution plan of the particular transactional query, the third cluster being separate from the first cluster and the second cluster, wherein the second cluster processes OLTP traffic and performs OLTP operations on transactional data stores, and the third cluster includes SQL compiler functionality for generating query plans, the clusters being implemented with separate compute nodes; receiving at least one select query for processing on an aggregated analytical query database, the aggregated analytical query database included in the third cluster that is separate from the first cluster and the second cluster; generating, by the compiler included in the third cluster, a query plan for execution of the at least one select query using a second set of decoupled compute instances; and generating, by the second set of decoupled compute instances, an aggregated analytical query result dataset for the at least one select query according to the query plan.
- 12 . The system of claim 11 , wherein the operations further comprise: receiving a transactional query for processing on transactional data in the transactional processing database; determining that the transactional data is cached on the first set of decoupled compute instances; and generating transactional results data by applying the transactional query on the transactional data, the first set of decoupled compute instances bypassing the compiler in generating the transactional results data and returning results to a client device that generated the transactional query, the bypassing comprising setting an active session between the client device and the first set of decoupled compute instances.
- 13 . The system of claim 11 , wherein the operations further comprise: replicating data that has changed from the transactional processing database to an object store database of an aggregated analytical query database managed by the second set of decoupled compute instances.
- 14 . The system of claim 13 , wherein the aggregated analytical query database is stored on a second set of decoupled storage instances, the second set of decoupled compute instances being remote from the second set of decoupled storage instances.
- 15 . The system of claim 11 , wherein the first set of decoupled compute instances is remote from the first set of decoupled storage instances, each of the plurality of transactional queries comprising data manipulation language statements that change data in the transactional processing database.
- 16 . The system of claim 11 , wherein the transactional processing database is an Online Transactional Processing (OLTP) database that tracks a large volume of data transactions, and wherein the aggregated analytical query database is an Online Analytical Processing (OLAP) database comprising aggregated queries and a low volume of data transactions relative to the OLTP database.
- 17 . The system of claim 16 , wherein the operations further comprise: receiving a query and determining whether the query is an OLTP query or an OLAP query.
- 18 . The system of claim 17 , wherein the operations further comprise: in response to determining the query is a OLAP query, directing the query to the OLAP database and processing the query using the second set of decoupled compute instances.
- 19 . The system of claim 17 , wherein the operations further comprise: in response to determining the query is a OLTP query, directing the query to the OLTP database and processing the query using the first set of decoupled compute instances.
- 20 . A computer-storage medium embodying instructions that, when executed by a at least one hardware processor, cause the at least one hardware processor to perform operations comprising: assigning, by a first cluster, a session of a client device to a first set of decoupled compute instances, the first cluster, prior to assigning the session, receiving a request from the client device to use the first set of decoupled compute instances, each of the decoupled compute instances comprising an execution node, the execution node being included in a second cluster that is separate from the first cluster; after assigning the session, receiving from the client device, by the first set of decoupled compute instances, a plurality of transactional queries against a transactional processing database stored on a first set of decoupled storage instances, at least one of the first set of decoupled compute instances bypassing a compiler to process at least one transactional query from the plurality of transactional queries using locally cached data in a cache provided by the second cluster; determining, by the second cluster, that a compiled execution plan of statements, corresponding to a particular transactional query from the plurality of transactional queries, does not exist in the cache provided by the second cluster; sending, by the second cluster, the particular transactional query to a third cluster for compiling and generating an execution plan of the particular transactional query, the third cluster being separate from the first cluster and the second cluster, wherein the second cluster processes OLTP traffic and performs OLTP operations on transactional data stores, and the third cluster includes SQL compiler functionality for generating query plans, the clusters being implemented with separate compute nodes; receiving at least one select query for processing on an aggregated analytical query database, the aggregated analytical query database included in third cluster that is separate from the first cluster and the second cluster; generating, by the compiler included in the third cluster, a query plan for execution of the at least one select query using a second set of decoupled compute instances; and generating, by the second set of decoupled compute instances, an aggregated analytical query result dataset for the at least one select query according to the query plan.
Description
PRIORITY APPLICATIONS This application is a continuation of U.S. patent application Ser. No. 17/249,598, filed Mar. 5, 2021, which claims priority to U.S. Provisional Application Ser. No. 63/125,004, filed Dec. 14, 2020, the disclosure of each of which is incorporated herein in its entirety by reference for all purposes. TECHNICAL FIELD The present disclosure generally relates to special-purpose machines that manage database data and improvements to such variants, and to the technologies by which such special-purpose machines become improved compared to other special-purpose machines for performing aggregate queries and transactional queries on a network based network-based database platform. BACKGROUND Databases are used for data storage and access in computing applications. A goal of database storage is to provide enormous sums of information in an organized manner so that it can be accessed, managed, and updated. In a database, data may be organized into rows, columns, and tables. Different database storage systems may be used for storing different types of content, such as bibliographic, full text, numeric, and/or image content. Further, in computing, different database systems may be classified according to the organization approach of the database. Some databases can be configured for aggregated queries, in which the database undergoes a small number of transactions, but in which each transaction is complex and includes multiple computations are performed to complete the query. Other types of databases undergo a high number of small data transactions which must be accurately managed to ensure data integrity. It can be difficult to implement a single database platform that efficiently and accurately performs both aggregated queries and transactional queries in a simultaneously manner. BRIEF DESCRIPTION OF THE DRAWINGS Various ones of the appended drawings merely illustrate example embodiments of the present disclosure and should not be considered as limiting its scope. FIG. 1 is a block diagram illustrating an example computing environment in which a network-based data warehouse system can implement aggregate and transactional database systems, according to some example embodiments. FIG. 2 is a block diagram illustrating components of a compute service manager, according to some example embodiments. FIG. 3 is a block diagram illustrating components of an execution platform, according to some example embodiments. FIG. 4 shows a hybrid aggregate and transactional query architecture, according to some example embodiments. FIG. 5A-5C show example transactional data flows, according to some example embodiments. FIG. 6 shows a flow diagram for performing transactional and aggregate queries, according to some example embodiments. FIG. 7 shows a flow diagram of a method for performing transactional queries using the hybrid architecture, according to some example embodiments. FIG. 8 illustrates a diagrammatic representation of a machine in the form of a computer system within which a set of instructions may be executed for causing the machine to perform any one or more of the methodologies discussed herein, in accordance with some embodiments of the present disclosure. DETAILED DESCRIPTION The description that follows includes systems, methods, techniques, instruction sequences, and computing machine program products that embody illustrative embodiments of the disclosure. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the inventive subject matter. It will be evident, however, to those skilled in the art, that embodiments of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures, and techniques are not necessarily shown in detail. FIG. 1 illustrates an example shared data processing platform 100 in which a network-based data warehouse system 102 can be implemented for processing of aggregate and transactional database systems, in accordance with some embodiments of the present disclosure. To avoid obscuring the inventive subject matter with unnecessary detail, various functional components that are not germane to conveying an understanding of the inventive subject matter have been omitted from the figures. However, a skilled artisan will readily recognize that various additional functional components may be included as part of the shared data processing platform 100 to facilitate additional functionality that is not specifically described herein. As shown, the shared data processing platform 100 comprises the network-based data warehouse system 102, a cloud computing storage platform 104 (e.g., an object storage platform, AWS® S3, Microsoft Azure®, or Google Cloud Services®), and a remote computing device 106. The network-based data warehouse system 102 is a network-based system used for storing and access