Search

US-12619595-B1 - Optimistically concurrent view loading

US12619595B1US 12619595 B1US12619595 B1US 12619595B1US-12619595-B1

Abstract

A database management system receives a request to execute a long-running transaction. The database management system sub-divides the transaction into a plurality of sub-transactions. The transaction is initiated using optimistic locking. The database management system attempts to execute and validate the sub-transactions. When a sub-transaction fails, it is re-executed after a delay period. When all sub-transactions have been executed and validated, the transaction commits.

Inventors

  • Yannis Papakonstantinou
  • Tate A. Certain
  • ALLAN HENRY VERMEULEN
  • Christopher Richard Jacques de Kadt

Assignees

  • AMAZON TECHNOLOGIES, INC.

Dates

Publication Date
20260505
Application Date
20190318

Claims (20)

  1. 1 . A system, comprising: at least one processor; and at least one memory comprising computer-executable instructions that, in response to being executed by the at least one processor, cause the system to at least: receive a request to create a view, the view created by execution of a transaction comprising a plurality of sub-operations that compute the view, the transaction having a length of time that is above a threshold associated with a probability that the transaction will be affected by an intervening transaction; initiate the transaction using an optimistic concurrency model that confirms serializability of the transaction prior to committing of the transaction and based at least in part on confirming that an outcome of the transaction has not been affected by an intervening transaction; process a first sub-operation of the plurality of sub-operations to compute at least a first portion of the view; responsive to a determination that a second sub-operation, of the plurality of sub-operations, has failed validation based at least in part on an intervening change to data referenced by the second sub-operation occurring before the second sub-operation has committed: confirm the serializability of the second sub-operation to determine that the second sub-operation can be committed, and retry execution of the second sub-operation after a delay period; finalize the transaction responsive to successful validation of the first sub-operation and the retried second sub-operation that computes at least a second portion of the view; commit the transaction responsive to completion of each of the plurality of sub-operations including the retried second sub-operation after initial failed validation due to an error; and determine a probability that an additional transaction will be affected by an intervening transaction and that the probability is above a threshold level, and in response to determining that the probability is above the threshold level, initiating the additional transaction using the optimistic concurrency model that confirms serializability of the additional transaction prior to committing of the additional transaction.
  2. 2 . The system of claim 1 , wherein the at least one memory comprises further computer-executable instructions that, in response to being executed by the at least one processor, cause the system to at least determine that the sub-operation has failed validation based at least in part on an intervening change to data referenced by the sub-operation.
  3. 3 . The system of claim 1 , wherein the at least one memory comprises further computer-executable instructions that, in response to being executed by the at least one processor, cause the system to at least determine that a probability of a conflict with an additional transaction is above a threshold amount.
  4. 4 . The system of claim 1 , wherein the transaction is committed in response to completion of each of the plurality of sub-operations.
  5. 5 . The system of claim 1 , wherein the at least one memory comprises further computer-executable instructions that, in response to being executed by the at least one processor, cause the system to at least re-attempt to validate the sub-operation.
  6. 6 . A computer-implemented method, comprising: receiving a request to perform a transaction determined to be long-running based at least in part on the transaction having a length of time that is above a threshold associated with a probability that the transaction will be affected by an intervening transaction, the transaction comprising a plurality of sub-operations that when performed compute the transaction; determining that the probability that the transaction will be affected by an intervening transaction is above a threshold level; initiating execution of the plurality of sub-operations with a model that confirms serializability of the transaction prior to committing of the transaction and based at least in part on confirming that an outcome of the transaction has not been affected by an intervening transaction; responsive to determining that a sub-operation of the plurality of sub-operations failed, confirming the serializability of the sub-operation to determine if the sub-operation can be committed; after a delay period, retrying execution of the sub-operation, of the plurality of sub-operations, in response to the determining that the sub-operation failed validation based at least in part on an intervening change to data referenced by the sub-operation occurring before the sub-operation has committed, the execution of the sub-operation computing a first portion of the transaction; finalizing the transaction responsive to successful validation of the retried sub-operation, wherein the transaction is finalized in response to completion of each of the plurality of sub-operations including the retried sub-operation after initial failed validation; committing the transaction responsive to completion of each of the plurality of sub-operations including the retried second sub-operation after initial failed validation due to an error; and responsive to determining that a probability of an additional transaction being affected by an intervening transaction is above a threshold level, initiating the additional transaction using the optimistic concurrency model that confirms serializability of the additional transaction prior to committing of the additional transaction.
  7. 7 . The computer-implemented method of claim 6 , further comprising determining that the sub-operation has failed validation based at least in part on an intervening change to data referenced by the sub-operation.
  8. 8 . The computer-implemented method of claim 6 , wherein execution of the sub-operation is suspended during the delay period and completing execution of the sub-operation comprises attempting to re-validate the sub-operation.
  9. 9 . The computer-implemented method of The computer-implemented method of wherein the transaction is committed based at least in part on completion of each of the plurality of sub-operations.
  10. 10 . The computer-implemented method of claim 6 , further comprising determining that the sub-operation can be retried.
  11. 11 . The computer-implemented method of claim 6 , wherein the request to perform the transaction corresponds to a request to compute a view summary.
  12. 12 . The computer-implemented method of claim 11 , wherein the sub-operation, when performed, computes a portion of the view summary.
  13. 13 . The computer-implemented method of claim 6 , wherein the plurality of sub-operations each operate on a logical partition of a table.
  14. 14 . A non-transitory computer-readable storage medium having stored thereon executable instructions that, as a result of being executed by one or more processors of a computer system, cause the computer system to at least: determine a probability that a transaction will be affected by an intervening transaction and that the probability is above a threshold level; determine that the transaction is dividable into a plurality of sub-operations usable to complete at least a portion of the transaction, the transaction associated with a length of time that is above a time threshold associated with the probability that the transaction will be affected by the intervening transaction; initiate execution of a transaction comprising the plurality of sub-operations with a method that confirms serializability of the transaction based at least in part on confirming that an outcome of the transaction has not been affected by an intervening transaction; in response to determining that a sub-operation of the plurality of sub-operations failed, attempt to confirm the serializability of the sub-operation to determine if the sub-operation can be committed; after a delay period, reattempt completion of execution of the sub-operation, of the plurality of sub-operations, responsive to the determining that the sub-operation has failed validation based at least in part on an intervening change to data referenced by the sub-operation occurring before the sub-operation has committed, the execution of the sub-operation completing a part of the transaction; finalize the transaction responsive to successful validation of the sub-operation whose completion was reattempted, wherein the transaction is finalized in response to completion of each of the plurality of sub-operations including the reattempted sub-operation after initial failed validation; commit the transaction responsive to completion of each of the plurality of sub-operations including the sub-operation whose completion was reattempted after initial failed validation due to an error; and responsive to determining that a probability of an additional transaction being affected by an intervening transaction is above a threshold level, initiate the additional transaction using the optimistic concurrency model that confirms serializability of the additional transaction prior to committing of the additional transaction.
  15. 15 . The non-transitory computer-readable storage medium of claim 14 , wherein the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the computer system to at least determine that the sub-operation has failed validation based at least in part on a change to data referenced by the sub-operation.
  16. 16 . The non-transitory computer-readable storage medium of claim 14 , wherein the sub-operation is re-applied after a delay period.
  17. 17 . The non-transitory computer-readable storage medium of claim 14 , wherein the transaction is committed based at least in part on completing each of the plurality of sub-operations.
  18. 18 . The non-transitory computer-readable storage medium of claim 14 , further comprising determining that the sub-operation can be retried.
  19. 19 . The non-transitory computer-readable storage medium of claim 14 , wherein a sub-operation of the plurality of sub-operations, when executed, computes a portion of an index.
  20. 20 . The non-transitory computer-readable storage medium of claim 14 , wherein the sub-operation, when performed, computes a portion of a view.

Description

BACKGROUND Database management systems provide facilities to store and retrieve data. Although a wide variety of database management systems exists, the most popular may be divided into one of two categories. The first category of databases, relational databases, are those built on the relational model and generally supporting tables of fixed-length records. The second category is non-relational databases, which may substitute the comparatively rigid structured query language (“SQL”) with other query mechanisms. Databases of both of these categories are widely used. However, database management systems in both categories have their own respective limitations. BRIEF DESCRIPTION OF THE DRAWINGS Various techniques will be described with reference to the drawings, in which: FIG. 1 illustrates a ledger-based database system, in accordance with an embodiment; FIG. 2 illustrates distributed storage of a ledger used in conjunction with a ledger-based database system, in accordance with an embodiment; FIG. 3 illustrates aspects of query processing in a ledger-based database system, in accordance with an embodiment; FIG. 4 illustrates a table structure of a ledger-based database system, in accordance with an embodiment; FIG. 5 illustrates a journal, in accordance with an embodiment; FIG. 6 illustrates aspects of a journal record, in accordance with an embodiment; FIG. 7 illustrates aspects of a storage technique for summary data, in accordance with an embodiment; FIG. 8 illustrates aspects of executing a long-running transaction, in accordance with an embodiment; FIG. 9 illustrates an example process for executing a long-running transaction using an optimistic concurrency model; FIG. 10 illustrates aspects of an example process for executing a long-running transaction; and FIG. 11 illustrates a system in which various embodiments can be implemented. DETAILED DESCRIPTION Described herein are systems and techniques related to the operation of a ledger-based database management system. A ledger, as used herein, comprises journal and summary data structures adapted for use in a database management system. A journal records an immutable history of transactions performed on a document managed by the system, and a summary provides a synopsis of the document's current state. In an example embodiment, a ledger-based database management system creates views and indexes. The views and indexes are created using transactions executed under an optimistic concurrency model. Here, a transaction refers to a set of sub-operations that are performed as an atomic unit. The use of an optimistic concurrency model refers to avoiding the acquisition of locks on the database during the reading and processing phases of a transaction. Typically, transactions which create views and indexes are long-running, and as such are likely to be subverted or “starved” by shorter-running transactions. However, embodiments may execute long-running transactions, such as those used to create views or indexes, by attempting to execute the various sub-operations of which the transaction is comprised. If a particular sub-operation fails, it can be re-validated, retried, or re-executed. When all of the sub-operations have successfully completed, the transaction is committed. In an example embodiment, a ledger based database management system receives a request to create a view or index, where the view or index is to be created by a transaction which comprises a number of sub-operations. The transaction is initiated under optimistic concurrency assumptions, and as such no write locks are acquired on the data at least until a later commit phase, although certain embodiments may also avoid locking during the commit phase. The sub-operations are executed. When a sub-operation fails validation, e.g., by having the data it depends on written to by another transaction, the system attempts to complete execution of the sub-operation after a delay period. When all sub-operations successfully complete, the transaction is committed and the creation of the view or index is completed. As noted, the example embodiment initiates the transaction under an optimistic concurrency model, which may also be described as optimistic locking, or optimistic concurrency assumptions, and so forth. Optimistic concurrency is described as such because it is based on an assumption that no conflicts will arise in between the time data is write operations are prepared and the time the write operations are applied to the underlying data. Note that this assumption is sometimes wrong, and as such embodiments confirm the serializability of the transaction prior to committing the transaction. Confirming the serializability of a transaction comprises confirming that the outcome of a transaction (in this example, a long-running transaction) has not been affected by an intervening transaction. In the preceding and following description, various techniques are described. For purposes of explanation, speci