EP-4738140-A1 - GENERATION OF TABLE METADATA IN HTAP DATABASE DURING FLUSH
Abstract
A hybrid transactional/analytical processing, HTAP, database (10) is provided, which includes an online transaction processing, OLTP, engine (12), an online analytical processing, OLAP, engine (14) provided separately from the OLTP engine, and a shared storage (26) holding data. The shared storage includes an OLTP storage (28) holding data in row format for responding to queries of the OLTP engine, and an OLAP storage (30) holding data in column format for responding to queries of the OLAP engine. The OLAP storage includes a delta store (42) configured to record changes made to the OLTP storage in an update, and a base store (44) configured to implement the update from the delta store on data in persistent storage in a flush. The HTAP database further includes a table metadata service (62) configured to generate and perform operations on table metadata (64A) of the data that is updated in the flush.
Inventors
- LIU, YE
- ZHANG, LI
- ZHANG, Lutong
- XIONG, Mu
- JIN, Yupeng
- JIE, Zhou
- LIU, Bingyu
- CHEN, JIANJUN
Assignees
- Lemon Inc.
- DOUYIN VISION CO., LTD.
- Beijing Zitiao Network Technology Co., Ltd.
Dates
- Publication Date
- 20260506
- Application Date
- 20250714
Claims (15)
- A hybrid transactional/analytical processing, HTAP, database (10), comprising: an online transaction processing, OLTP, engine (12); an online analytical processing, OLAP, engine (14), provided separately from the OLTP engine (12); a shared storage (26) holding data, the shared storage (26) including: an OLTP storage (28) holding data in row format for responding to queries of the OLTP engine (12); and an OLAP storage (30) holding data in column format for responding to queries of the OLAP engine (14), the OLAP storage (30) including: a delta store (42) configured to record changes made to the OLTP storage (28) in an update; and a base store (44) configured to implement the update from the delta store (42) on data in persistent storage in a flush; and a table metadata service (62) configured to generate and perform operations on table metadata (64A) of the data that is updated in the flush.
- The HTAP database (10) of claim 1, wherein the table metadata service (62) is further configured to merge the table metadata (64A) of the data that is updated in the flush with prior table metadata (66A), optionally wherein the table metadata (64A) includes table-level metadata including one or more of total inserted row count, total deleted row count, and total final row count, optionally, wherein the table metadata (64A) includes column-level metadata including at least one of average length for variable length columns, number of distinct values, and a minimum-maximum value, and optionally wherein the OLAP engine (14) further comprises a query optimizer (49) configured to receive a query (34) and, based at least on the table metadata (64A), generate a plan (84) for responding to the query (34).
- The HTAP database (10) of claim 1 or 2, wherein the merged table metadata (72A) is first partition metadata pertaining to data of a first partition (76A), and the table metadata service (62) is further configured to aggregate the first partition metadata with merged table metadata (72B... 72N) of other partitions to form table-wide metadata pertaining to an entire table of data stored in the HTAP database (10).
- The HTAP database (10) of claim 3, wherein the table metadata service (62) is further configured to: identify in-memory data (58) that is not yet stored in the persistent storage; scan the in-memory data (58) to generate table metadata (64A) of the in-memory data (58); and merge the table metadata (64A) of the in-memory data with the table-wide metadata.
- The HTAP database (10) of claim 3 or 4, wherein the table metadata service (62) is further configured to: identify a portion of the table containing data that has not been flushed and for which table metadata (64A) has not been generated by the table metadata service (62); retrieve corresponding metadata (78) for the portion of the table from the OLTP storage (28); and merge the corresponding metadata (78) with the table-wide metadata (74).
- The HTAP database (10) of claim 3, 4 or 5, wherein the table metadata service (62) is further configured to: receive user input (80) instructing regeneration of the table-wide metadata (74); scan each partition (76A, 76B... 76N) to regenerate respective partition metadata (82A, 82B···82N); replace the merged table metadata (76A) of each partition (76A, 76B... 76N) with the respective partition metadata (82A, 82B···82N); aggregate each respective partition metadata (82A, 82B···82N) together to form fresh table-wide metadata(74A); and replace the table-wide metadata (74) with the fresh table-wide metadata(74A).
- The HTAP database (10) of any preceding claim, wherein the OLTP storage (28) comprises: a log store (36) configured to persist logs (38); and a page store (40) configured to store versions of data pages and apply the logs to construct current versions of the data pages; and the shared storage (26) comprises a replication framework (46) configured to perform log shipping of the logs from the log store (36) to the delta store (42).
- A method (300) for metadata generation in a hybrid transactional/analytical processing, HTAP, database, the HTAP database including an online transaction processing, OLTP, engine, an online analytical processing, OLAP, engine provided separately from the OLTP engine, and a shared storage holding data, the shared storage including an OLTP storage holding data in row format for responding to queries of the OLTP engine, and an OLAP storage holding data in column format for responding to queries of the OLAP engine, the method comprising: recording (302) changes made to the OLTP storage in a delta store of the OLAP storage in an update; implementing (304) the update from the delta store on data in persistent storage of a base store of the OLAP storage in a flush; and generating and performing (306) operations on table metadata of the data that is updated in the flush.
- The method (300) of claim 8, further comprising merging (308) the table metadata of the data that is updated in the flush with prior table metadata.
- The method (300) of claim 8 or 9, wherein the merged table metadata is first partition metadata pertaining to data of a first partition, and the method further comprises aggregating (310) the first partition metadata with merged table metadata of other partitions to form table-wide metadata pertaining to an entire table of data stored in the HTAP database.
- The method (300) of claim 10, further comprising: identifying (312) in-memory data that is not yet stored in the persistent storage; scanning (314) the in-memory data to generate table metadata of the in-memory data; and merging (316) the table metadata of the in-memory data with the table-wide metadata, optionally, wherein the method further comprises: identifying (318) a portion of the table containing data that has not been flushed and for which table metadata has not been generated by the table metadata service; retrieving (320) corresponding metadata for the portion of the table from the OLTP storage; and merging (322) the corresponding metadata with the table-wide metadata. optionally, wherein the method further comprises: receiving (324) user input instructing regeneration of the table-wide metadata; scanning (326) each partition to regenerate respective partition metadata; replacing (328) the merged table metadata of each partition with the respective partition metadata; aggregating (330) each respective partition metadata together to form fresh table-wide metadata; and replacing (332) the table-wide metadata with the fresh table-wide metadata.
- The method (300) of any one of claims 8-11, wherein the table metadata includes table-level metadata including one or more of total inserted row count, total deleted row count, and total final row count.
- The method (300) of any one of claims 8-12, wherein the table metadata includes column-level metadata including at least one of average length for variable length columns, number of distinct values, and a minimum-maximum value.
- The method (300) of any one of claims 8-13, further comprising, with a query optimizer: receiving a query; and based at least on the table metadata, generating a plan for responding to the query.
- A hybrid transactional/analytical processing, HTAP, database (10), comprising: an online transaction processing, OLTP, engine (12); an online analytical processing, OLAP, engine (14), provided separately from the OLTP engine (12); a shared storage (26) holding data of multiple tenants, the shared storage (26) including: an OLTP storage (28) holding data in row format for responding to queries of the OLTP engine (12); and an OLAP storage (30) holding data in column format for responding to queries of the OLAP engine (14), the OLAP storage (30) including: a delta store (42) configured to record changes made to the OLTP storage (28) in an update; and a base store (44) configured to implement the update from the delta store (42) on data in persistent storage in a flush; and a query optimizer (49) configured to: generate table metadata (64A) of the data that is updated in the flush; receive a query (34); and based at least on the table metadata (64A), generate a plan (84) for responding to the query (34).
Description
BACKGROUND Databases are designed with various types of architecture. Online transaction processing (OLTP) architecture is oriented toward processing transactions. Like with a bank account, transactions tend to be simple and numerous, and may include queries such as read, insert, update, delete, etc. However, OLTP architecture lacks massively parallel processing (MPP) capability. In contrast, online analytical processing (OLAP) architecture is oriented toward processing data analysis. Queries handled by OLAP architecture tend to be more complex, involve more data, and take more time to complete. For example, OLAP architecture may be responsible for determining an average sale price over the past year of a particular product tracked by the database. Many database users have use for both types of processing using the same data, and thus a hybrid transactional/analytical processing (HTAP) database has been developed which includes both types of architecture in a single database. In addition to merely storing data itself, the HTAP database has many uses for metadata or statistics about the data, such as data amount, data distribution, and number of distinct values, etc. One such use is providing the metadata to a query optimizer which can then determine how best to respond to the query. However, while the OLTP side of the HTAP database may be able to utilize existing mechanisms to collect metadata usable by its query optimizer, these mechanisms are activated by methods including random sampling, manual triggering of metadata collection by a user, and detecting invalid metadata upon querying the metadata. These existing mechanisms typically result in a large lump sum of metadata collection at once, which can be especially undesirable during times of already heavy system loads. Furthermore, the OLTP side uses relatively simple metadata, whereas the OLAP side of the HTAP database, which processes more complex queries, would benefit from query optimization and other features using appropriately more complex metadata. SUMMARY To address these issues, a hybrid transactional/analytical processing (HTAP) database is provided herein, which includes an online transaction processing (OLTP) engine, an online analytical processing (OLAP) engine provided separately from the OLTP engine, and a shared storage holding data. The shared storage may include an OLTP storage holding data in row format for responding to queries of the OLTP engine, and an OLAP storage holding data in column format for responding to queries of the OLAP engine. The OLAP storage may include a delta store configured to record changes made to the OLTP storage in an update, and a base store configured to implement the update from the delta store on data in persistent storage in a flush. The HTAP database may further include a table metadata service configured to generate and perform operations on table metadata of the data that is updated in the flush. This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure. BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 illustrates a hybrid transactional/analytical processing (HTAP) database according to one example of the present disclosure.FIG. 2A illustrates a metadata generation and merging process of the HTAP database of FIG. 1, while FIG. 2A illustrates a metadata aggregation process.FIG. 3 shows an example flowchart of a method for metadata generation in an HTAP database according to one example of the present disclosure.FIG. 4 shows a schematic view of an example computing environment in which the HTAP database of FIG. 1 may be enacted. DETAILED DESCRIPTION OF EMBODIMENTS FIG. 1 illustrates a hybrid transactional/analytical processing (HTAP) database 10 according to one example of the present disclosure. The HTAP database 10 may be hybridized in the sense that it includes both an online transaction processing (OLTP) engine 12 and an online analytical processing (OLAP) engine 14, which may be provided separately from the OLTP engine 12. That is, the HTAP database 10 may include separated compute engines. Keeping the two engines 12, 14 separate may allow each separate engine 12, 14 to exhibit peak performance for their respective workloads for which they are specialized while avoiding interference therebetween. Examples of HTAP databases with unified engines include SAP HANA and MEMSQL, and examples with separate engines include WILDFIRE and TIDB. The OLTP engine 12 may include a read-write (RW) node 16 and a read-only (RO) node 18. Each node 16, 18 may include a respective secondary engine plugin 20, which