Search

US-12619605-B2 - Zero copy optimization for select * queries

US12619605B2US 12619605 B2US12619605 B2US 12619605B2US-12619605-B2

Abstract

A computer-implemented method includes receiving a query specifying an operation to perform on a first table of a plurality of data blocks stored. Each data block in the first table includes a respective reference count indicating a number of tables referencing the data block. The method also includes determining that the operation specified by the query includes copying the plurality of data blocks in the first table into a second table and, in response, for each data block of the plurality of data blocks in the first table copied into the second table, incrementing, the respective reference count associated with the data block in the first table, appending, by the data processing hardware, into metadata of the second table, a reference of the corresponding data block copied into the second table.

Inventors

  • Pavan Edara
  • Jordan Tigani

Assignees

  • GOOGLE LLC

Dates

Publication Date
20260505
Application Date
20230317

Claims (14)

  1. 1 . A method comprising: receiving, by data processing hardware, a query specifying an operation to perform on a database stored on memory hardware in communication with the data processing hardware, the query comprising a copy operation for copying a plurality of data blocks of a first table stored in the database into a second table in the database, each data block of the first table comprising a respective reference count indicating a number of tables referencing the data block; initiating, by the data processing hardware and in a first process, the copy operation for copying the plurality of data blocks from the first table into the second table; determining, by the data processing hardware and in a second process that runs in parallel with the first process, whether the copy operation for copying the plurality of data blocks comprises copying the plurality of data blocks in the first table into the second table; responsive to determining that the copy operation for copying the plurality of data blocks comprises copying the plurality of data blocks in the first table into the second table, halting, by the data processing hardware, execution of the first process; executing, by the data processing hardware, the query without duplicating any of the plurality of data blocks by, for each data block of the plurality of data blocks: appending, into metadata of the second table, a reference to a respective data block in the first table; and incrementing the respective reference count associated with the respective data block in the first table; receiving, by the data processing hardware, a request to alter a particular data block of the plurality of data blocks stored on the memory hardware for the first table; generating, by the data processing hardware, a new particular data block in the memory hardware by duplicating the particular data block in the memory hardware; altering, by the data processing hardware, the new particular data block according to the request; and updating, by the data processing hardware, the respective reference for the particular data block in the first table to refer to the new particular data block.
  2. 2 . The method of claim 1 , wherein determining, by the data processing hardware, whether the copy operation for copying the plurality of data blocks comprises copying the plurality of data blocks in the first table into the second table comprises determining that the copy operation comprises a SELECT*Structured Query Language (SQL) statement.
  3. 3 . The method of claim 1 , further comprising, after appending the reference to the respective data block for each data block of the plurality of data blocks: receiving, by the data processing hardware, a request to delete the second table; and decrementing, by the data processing hardware, the respective reference count of each data block of the first table.
  4. 4 . The method of claim 1 , further comprising, after appending the reference to the respective data block for each data block of the plurality of data blocks: receiving, by the data processing hardware, a request to delete the first table; and decrementing, by the data processing hardware, the respective reference count of each data block of the first table.
  5. 5 . The method of claim 1 , wherein the first table comprises a respective reference for each data block of the plurality of data blocks stored on the memory hardware.
  6. 6 . The method of claim 5 , further comprising: receiving, by the data processing hardware, a request to alter a particular data block of the plurality of data blocks stored on the memory hardware; and altering, by the data processing hardware, the particular data block according to the request without altering the first table or the second table.
  7. 7 . The method of claim 1 , wherein determining, by the data processing hardware, whether the copy operation for copying the plurality of data blocks comprises copying the plurality of data blocks in the first table into the second table comprises determining that the query comprises a filter operation.
  8. 8 . A system comprising: data processing hardware; and memory hardware in communication with the data processing hardware, the memory hardware storing instructions that, when executed on the data processing hardware, cause the data processing hardware to: receive a query specifying an operation to perform on a database stored on memory hardware in communication with the data processing hardware, the query comprising a copy operation for copying a plurality of data blocks of a first table stored in the database into a second table in the database, each data block of the first table comprising a respective reference count indicating a number of tables referencing the data block; initiate, in a first process, the copy operation for copying the plurality of data blocks from the first table into the second table; determine, in a second process that runs in parallel with the first process, whether the copy operation for copying the plurality of data blocks comprises copying the plurality of data blocks in the first table into the second table; responsive to determining that the copy operation for copying the plurality of data blocks comprises copying the plurality of data blocks in the first table into the second table, halting, by the data processing hardware, execution of the first process; execute the query without duplicating any of the plurality of data blocks by, for each data block of the plurality of data blocks: appending, into metadata of the second table, a reference to a respective data block in the first table; and incrementing the respective reference count associated with the respective data block in the first table; receive a request to alter a particular data block of the plurality of data blocks stored on the memory hardware for the first table; generate a new particular data block in the memory hardware by duplicating the particular data block in the memory hardware; alter the new particular data block according to the request; and update the respective reference for the particular data block in the first table to refer to the new particular data block.
  9. 9 . The system of claim 8 , wherein the instructions that cause the data processing hardware to determine that the copy operation for copying the plurality of data blocks comprises copying the plurality of data blocks in the first table into the second table include instructions that cause the data processing hardware to determine that the copy operation comprises a SELECT*SQL statement.
  10. 10 . The system of claim 8 , wherein the instructions further cause the data processing hardware to, after appending the reference to the respective data block for each data block of the plurality of data blocks: receive a request to delete the second table; and decrement the respective reference count of each data block of the first table.
  11. 11 . The system of claim 8 , wherein the instructions further cause the data processing hardware to, after appending the reference to the respective data block for each data block of the plurality of data blocks: receive a request to delete the first table; and decrement the respective reference count of each data block of the first table.
  12. 12 . The system of claim 8 , wherein the first table comprises a respective reference for each data block of the plurality of data blocks stored on the memory hardware.
  13. 13 . The system of claim 12 , wherein the instructions further cause the data processing hardware to: receive a request to alter a particular data block of the plurality of data blocks stored on the memory hardware; and alter the particular data block according to the request without altering the first table or the second table.
  14. 14 . The system of claim 8 , wherein the instructions that cause the data processing hardware to determine that the copy operation for copying the plurality of data blocks comprises copying the plurality of data blocks in the first table into the second table include instructions that cause the data processing hardware to determine that the query comprises a filter operation.

Description

CROSS REFERENCE TO RELATED APPLICATIONS This U.S. patent application is a continuation of, and claims priority under 35 U.S.C. § 120 from, U.S. patent application Ser. No. 17/315,281, filed on May 8, 2021, which claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application 63/023,409, filed on May 12, 2020. The disclosures of these prior applications are considered part of the disclosure of this application and are hereby incorporated by reference in their entireties. TECHNICAL FIELD This disclosure relates to techniques for zero copy optimization for database operations. BACKGROUND As distributed storage (i.e., cloud storage) becomes increasingly popular and storage costs decrease, sizes of cloud databases have dramatically increased. For example, tables are often terabytes if not larger in size. Operations on these tables (e.g., duplicating a table) is often quite expensive, both in system resources and time. Structured Query Language (SQL) is the standard language for relational database management systems. Users often craft SQL queries to interact with cloud databases. SUMMARY An aspect of the disclosure provides a method. The method includes receiving, at data processing hardware, a query specifying an operation to perform on a first table of a plurality of data blocks stored on memory hardware in communication with the data processing hardware, each data block in the first table including a respective reference count indicating a number of tables referencing the data block. The method also includes determining, by the data processing hardware, that the operation specified by the query includes copying the plurality of data blocks in the first table into a second table. The method further includes, in response to determining that the operation specified by the query includes copying the plurality of data blocks in the first table into the second table, for each data block of the plurality of data blocks in the first table copied into the second table, incrementing, by the data processing hardware, the respective reference count associated with the data block in the first table, appending, by the data processing hardware, into metadata of the second table, a reference of the corresponding data block copied into the second table. Implementations of the disclosure may include one or more of the following optional features. In some implementations, copying the plurality of data blocks in the first table into the second table includes copying each data block in the first table into the second table without duplicating any of the plurality of data blocks. In some examples, the query includes a SELECT*Structured Query Language (SQL) statement. In some configurations, determining that the operation specified by the query includes copying the plurality of data blocks in the first table into the second table includes, during algebrization of the query, determining sub-operations of the query, and determining that the determined sub-operations of the query include a sub-operation for writing at least one data block to the memory hardware and a sub-operation for reading at least one data block from the memory hardware. Optionally, the sub-operation for writing at least one data block to the memory hardware includes a materialize sub-operation. In some examples, the sub-operation for reading at least one data block from the memory hardware includes a scan sub-operation. Optionally, determining that the determined sub-operations of the query include the scan sub-operation for reading the at least one data block includes determining that the determined sub-operations of the query include only a single scan sub-operation for reading the at least one data block. In some examples, determining that the query includes copying the plurality of the data blocks in the first table into the second table includes determining an order of a list of columns associated with the query, determining an order of columns of the first table, and determining that the order of the list of columns associated with the query is the same as the order of columns of the first table. In some implementations, the method further includes, while determining that the query includes copying the plurality of data blocks in the first table into the second table, initiating, by the data processing hardware, execution of the operation specified by the query over the first table. Optionally, the method further includes, in response to determining that the query includes copying the plurality of data blocks in the first table into the second table halting, by the data processing hardware, execution of the query over the first table. In some implementations, the method further includes, after appending the reference of each data block of the plurality of the data blocks into the metadata of the second table receiving, at the data processing hardware, a request to delete the first table, and decrementing, by the data processing hardware, the refere