Search

EP-4740112-A2 - QUERY ENGINE FOR GRAPH DATABASES AND HETEROGENEOUS HARDWARE

EP4740112A2EP 4740112 A2EP4740112 A2EP 4740112A2EP-4740112-A2

Abstract

Database query processing techniques are disclosed. In various embodiments, a query associated with a database is received. A byte code representation of the query is generated, including by decomposing the query into a discrete set of streaming operators defined over associated data frames, wherein the byte code includes code defining for each operator in the discrete set of streaming operators the processing to be performed by that operator and further embodies a data flow graph that defines a flow of data to and through the discrete set of streaming operators. The byte code is executed by a query processing engine to generate and return a query result.

Inventors

  • CLARKSON, JAMES

Assignees

  • Neo4j Sweden AB

Dates

Publication Date
20260513
Application Date
20240702

Claims (20)

  1. 1. A system, comprising: a processor configured to: receive a query associated with a database; generate a byte code representation of the query, including by decomposing the query into a discrete set of streaming operators defined over associated data frames, wherein the byte code includes code defining for each operator in the discrete set of streaming operators the processing to be performed by that operator and further embodies a data flow graph that defines a flow of data to and through the discrete set of streaming operators; and a memory coupled to the processor and configured to store the byte code representation of the query.
  2. 2. The system of claim 1, wherein the processor is further configured to optimize the byte code representation of the query.
  3. 3. The system of claim 2, wherein the byte code is optimized based at least in part using metadata associated with the database.
  4. 4. The system of claim 2, wherein the byte code is optimized based at least in part using data read from the database.
  5. 5. The system of claim 1, further comprising a communications interface coupled to the processor and wherein the processor is further configured to send the byte code representation of the query to a database server with which the database is associated.
  6. 6. The system of claim 5, wherein the database server is configured to optimize or further optimize the byte code representation of the query.
  7. 7. The system of claim 5, wherein the database server comprises a virtual machine or other runtime configured to execute the byte code representation of the query.
  8. 8. The system of claim 7, wherein the virtual machine or other runtime is configured to compile the byte code representation of the query to generate executable machine code.
  9. 9. The system of claim 7, wherein execution of the byte code representation of the query causes the database server to create an instance of each of a plurality of operators comprising the discrete set of streaming operators and provide a scheduler configured to cause data frames to be provided to the respective input queues of each of the plurality of operators.
  10. 10. The system of claim 9, wherein the scheduler is configured to cause a data frame to be moved from the output queue of a first operator to the input queue of a second operator based at least in part on the data flow graph that defines the flow of data to and through the discrete set of streaming operators.
  11. 11. The system of claim 7, wherein the database server comprises a plurality of processing cores and the database server is configured to execute portions of the byte code representation of the query in parallel, across multiple of said cores, as permitted or required by the data flow graph that defines the flow of data to and through the discrete set of streaming operators.
  12. 12. The system of claim 7, wherein the database server comprises a distributed system comprising a plurality of physical machines and the database server is configured to execute portions of the byte code representation of the query across multiple of the physical machines.
  13. 13. The system of claim 12, wherein the database server implements a first subset of the discrete set of streaming operators at a first machine and a second subset of the discrete set of streaming operators at a second machine.
  14. 14. The system of claim 13, wherein the database server provides at the first machine a first operator proxy configured to send data frames via a communications channel to the second machine, and wherein the second machine is configured to place said data frames in an input queue of an instance of an associated operator running on the second machine.
  15. 15. The system of claim 7, wherein the database server is configured to distribute work among the plurality of machines according to a load balancing algorithm.
  16. 16. The system of claim 7, wherein the database server is configured to allocate work among the plurality of machines in response to back pressure.
  17. 17. The system of claim 1, wherein the processor is further configured to cache the byte code representation of the query.
  18. 18. The system of claim 17, wherein the query comprises a first query and wherein the processor is further configured to receive a second query, determine the second query is the same as the first query, and retrieve and use the previously cached byte code representation of the first query to process the second query.
  19. 19. A method, comprising: receiving a query associated with a database; generating a byte code representation of the query, including by decomposing the query into a discrete set of streaming operators defined over associated data frames, wherein the byte code includes code defining for each operator in the discrete set of streaming operators the processing to be performed by that operator and further embodies a data flow graph that defines a flow of data to and through the discrete set of streaming operators; and storing the byte code representation of the query.
  20. 20. A computer program product embodied in a non-transitory computer readable medium and comprising computer instructions for: receiving a query associated with a database; generating a byte code representation of the query, including by decomposing the query into a discrete set of streaming operators defined over associated data frames, wherein the byte code includes code defining for each operator in the discrete set of streaming operators the processing to be performed by that operator and further embodies a data flow graph that defines a flow of data to and through the discrete set of streaming operators; and storing the byte code representation of the query.

Description

QUERY ENGINE FOR GRAPH DATABASES AND HETEROGENEOUS HARDWARE CROSS REFERENCE TO OTHER APPLICATIONS [0001] This application claims priority to U.S. Provisional Patent Application No. 63/524,827 entitled QUERY ENGINE FOR GRAPH DATABASES AND HETEROGENEOUS HARDWARE filed July 03, 2023, which is incorporated herein by reference for all purposes. BACKGROUND OF THE INVENTION [0002] Typically, data stored in a database is accessed via a client or web-based application. A user enters a query expressed using a query language, such as Structure Query Language (SQL) or other languages used to access and/or manage relational databases, or Cypher™, Graph Query Language (GQL), or other languages used to access graph databases. [0003] Queries expressed using such languages typically are sent to a database management/access server, which parses the query as expressed using the query language and develops and executes a plan to perform the query and return results. BRIEF DESCRIPTION OF THE DRAWINGS [0004] Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings. [0005] Figure l is a block diagram illustrating an embodiment of a database management system comprising a compiled query-based query engine. [0006] Figure 2 is a flow diagram illustrating an embodiment of a process to perform a database query. [0007] Figure 3 is a block diagram illustrating an example of query processing in a prior art database management system. [0008] Figure 4 is a block diagram illustrating an example of query processing in an embodiment of a database management system comprising a compiled query-based query engine. [0009] Figure 5A is a diagram illustrating an example of query decomposition in an embodiment of a database management system. [0010] Figure 5B is a diagram illustrating an example of query execution in an embodiment of a database management system. [0011] Figure 6 is a flow diagram illustrating an embodiment of a process to perform dynamically adaptive query execution. [0012] Figure 7 is a functional flow block diagram illustrating an embodiment of a system configured to perform a query. [0013] Figure 8 is a diagram illustrating an example of distributed query execution in an embodiment of a database management system. [0014] Figure 9A is a flow diagram illustrating an embodiment of a process to retrieve or generate an intermediate representation of a query. [0015] Figure 9B is a flow diagram illustrating an embodiment of a process to generate an intermediate representation of a query. [0016] Figure 10 is a flow diagram illustrating an embodiment of a process to iteratively revert to, compile, and execute a bytecode representation of a query. DETAILED DESCRIPTION [0017] The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions. [0018] A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured. [0019] Techniques are disclosed to apply advances in programming language capabilities to provide an improved database management system. In various embodiments, a novel query language runtime based on modern programming languages techniques is provided. In various embodiments, approaches d