CN-121979848-A - Server-side batching high-concurrency import method and system for OLAP databases

Abstract

The invention discloses a server-side batching high-concurrency import method and system for OLAP databases. On the server side, concurrent import requests from multiple clients are multiplexed into a single internal transaction and a single data version, and a configurable synchronous/asynchronous dual-mode commit mechanism is introduced. In asynchronous mode, data reliability is ensured by a write-ahead log (WAL), and SQL parsing overhead is reduced by supporting a prepared-statement cache.

Inventors

Mei dai
YANG YONGQIANG
WANG MENG
LIAN LINJIANG
XIAO KANG
YI GUOLEI
MA RUYUE
NIU XIANHUI
CHEN MINGYU

Assignees

北京飞轮数据科技有限公司

Dates

Publication Date: 20260505
Application Date: 20260203

Claims (10)

1. A server-side batching high-concurrency import method for an OLAP database, characterized by comprising the following steps: S1, a database server receives concurrent data import requests from a plurality of clients; S2, the server batches the received concurrent import requests and multiplexes the import operations of the plurality of clients into one internal logical import task, the internal import task corresponding to a unique database transaction and a unique data version; S3, the server processes the internal import task according to a preset import mode: in synchronous mode, the batched data is committed as a whole to the database storage layer as one transaction, and an import-success response is returned to the corresponding client after the transaction commits successfully, so that the data is immediately visible; in asynchronous mode, the batched data is first written to a write-ahead log (WAL) and persisted, an import-received response is then immediately returned to the client, a background process of the server asynchronously commits the data in the WAL to the database storage layer according to a preset commit condition, and the data becomes visible once it is committed; and S4, in asynchronous mode, the background process of the server continuously monitors the data queue corresponding to the internal import task and triggers a data commit operation when a preset time interval threshold is reached or the accumulated data volume reaches a preset size threshold.
2. The method of claim 1, wherein for write requests submitted over JDBC using INSERT INTO VALUES statements, the front-end node of the server supports the PreparedStatement feature of the MySQL protocol; the SQL statement received for the first time and the execution plan generated from it are cached in a session-level in-memory cache, and subsequent identical import requests in the same session directly reuse the cached execution plan, thereby avoiding repeated SQL parsing and plan generation.
3. The server-side batching high-concurrency import method for an OLAP database of claim 1, wherein in step S2, when the server receives the first client import request, it creates an internal import task, a corresponding data queue, and an associated WAL file, and for client import requests that arrive later, it directly reuses the data queue and WAL file of the created internal import task to append data.
4. The server-side batching high-concurrency import method for an OLAP database of claim 1, wherein the data queue is a blocking queue in the memory of a server back-end process, used to temporarily store batch data awaiting processing, and the WAL file is used in asynchronous mode to ensure data reliability before the data is persisted to main storage.
5. The server-side batching high-concurrency import method for an OLAP database of claim 1, wherein the synchronous mode is suited to scenarios that require high data consistency and need to query the data immediately after the import completes, and the asynchronous mode is suited to high-frequency write scenarios that are sensitive to write latency and tolerate eventual consistency.
6. The server-side batching high-concurrency import method for an OLAP database of claim 1, wherein in step S3, at the time the response is returned to the client in asynchronous mode, the data exists only in the WAL and the in-memory queue.
7. The server-side batching high-concurrency import method for an OLAP database of claim 1, wherein in step S4, the time interval threshold and the size threshold of the commit condition are configurable.
8. A server-side batching high-concurrency import system for an OLAP database, based on the server-side batching high-concurrency import method for an OLAP database of any one of claims 1 to 7, characterized by comprising: a client interface module for receiving concurrent data import requests from a plurality of clients; a request scheduling and batching module for multiplexing a plurality of concurrent import requests into one internal import task and managing the corresponding data queue and transaction context; a mode processing module for invoking the corresponding processing logic according to the configured synchronous or asynchronous mode; a WAL management module for managing the writing and persistence of the write-ahead log in asynchronous mode; an asynchronous commit module for monitoring the commit condition in the background and, when the condition is met, triggering persistence of the data from the in-memory queue to the storage layer and committing the transaction; and an execution plan cache module, located at the front-end node, for caching the SQL of prepared statements and their execution plans for subsequent reuse within the same session.
9. The server-side batching high-concurrency import system for an OLAP database of claim 8, wherein the request scheduling and batching module assigns globally unique transaction and version identifiers to internal import tasks when creating them.
10. The server-side batching high-concurrency import system for an OLAP database of claim 8, wherein the asynchronous commit module is coupled to a back-end storage module of the database and is responsible for converting data from the in-memory queue into segment files in columnar storage format and for updating metadata to make the new data visible to queries.
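
By way of non-limiting illustration, the asynchronous-mode batching and threshold-triggered commit described in claims 1, 3 and 7 might be sketched as follows. This is a minimal in-memory sketch, not the claimed implementation: the names `ImportTask`, `BatchImporter`, `submit`, `maybe_commit`, `max_interval_s` and `max_bytes` are hypothetical, the WAL is stood in for by a Python list (a real system would write and fsync a file before acknowledging), and row size is approximated by string length.

```python
import itertools
import threading
import time


class ImportTask:
    """One internal import task: one transaction id and one data version."""

    def __init__(self, txn_id, version):
        self.txn_id = txn_id
        self.version = version
        self.queue = []   # stands in for the in-memory data queue
        self.wal = []     # stands in for the WAL file (hypothetical, in-memory)
        self.size = 0     # accumulated data volume, approximated by len(row)
        self.created = time.monotonic()


class BatchImporter:
    """Multiplexes concurrent client imports into a shared internal task."""

    def __init__(self, max_interval_s=1.0, max_bytes=1024):
        self.max_interval_s = max_interval_s  # configurable time threshold
        self.max_bytes = max_bytes            # configurable size threshold
        self._lock = threading.Lock()
        self._ids = itertools.count(1)
        self._task = None
        self.committed = []  # (txn_id, version, rows) entries visible to queries

    def submit(self, rows):
        """Asynchronous-mode import: append to WAL and queue, then acknowledge."""
        with self._lock:
            if self._task is None:
                # The first request creates the task, its queue and its WAL.
                n = next(self._ids)
                self._task = ImportTask(txn_id=n, version=n)
            task = self._task
            task.wal.extend(rows)    # assumption: durable write happens here
            task.queue.extend(rows)  # later requests reuse the same queue
            task.size += sum(len(r) for r in rows)
            return task.txn_id       # "import received" acknowledgement

    def maybe_commit(self, now=None):
        """Background check: commit when either threshold is reached."""
        now = time.monotonic() if now is None else now
        with self._lock:
            task = self._task
            if task is None:
                return False
            due = (now - task.created >= self.max_interval_s
                   or task.size >= self.max_bytes)
            if due:
                # One transaction and one version for the whole batch.
                self.committed.append(
                    (task.txn_id, task.version, list(task.queue)))
                self._task = None
            return due


# Eight concurrent clients, each importing one row.
importer = BatchImporter(max_interval_s=60.0, max_bytes=20)
clients = [threading.Thread(target=importer.submit, args=([f"row-{i}"],))
           for i in range(8)]
for c in clients:
    c.start()
for c in clients:
    c.join()
importer.maybe_commit()  # size threshold (20) is exceeded by 8 rows of 5 chars
print(len(importer.committed), len(importer.committed[0][2]))  # prints "1 8"
```

In this sketch the eight client requests result in one committed transaction rather than eight. A production background process would call a check like `maybe_commit` periodically instead of once.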

Description

Server-side batching high-concurrency import method and system for OLAP databases

Technical Field

The invention relates to the technical field of databases, and in particular to a server-side batching high-concurrency import method and system for OLAP databases.

Background

In OLAP (online analytical processing) database systems, the prior art generally handles high-concurrency data import in a serial or simple concurrent manner: an independent transaction is created for each import request, a new data version is generated, and full SQL parsing is executed. Another common approach first writes the data into a server memory buffer and then asynchronously flushes it to persistent storage in batches according to a given policy.

In prior-art schemes where the client synchronously waits for the flush to complete, strong consistency and reliability of the data are ensured, but each import request must occupy and finally commit its own transaction, so the number of transactions becomes excessive, and the server's memory buffers and processing contexts cannot be effectively reused across clients. In truly high-concurrency write scenarios, such schemes generate massive numbers of small transactions and data versions in the database system. This sharply increases transaction management overhead, causes the IO and CPU pressure of subsequent data merging (Compaction) operations to surge, and severely limits the overall throughput and scalability of the system.

Therefore, to solve the above problems, the present invention provides a server-side batching high-concurrency import method and system for OLAP databases.
Disclosure of Invention

To solve the problems of excessive transaction and version counts and high system resource pressure in existing high-concurrency import schemes, the invention provides a server-side batching high-concurrency import method and system for OLAP databases.

The technical scheme of the invention is a server-side batching high-concurrency import method for OLAP databases, comprising the following steps:

S1, a database server receives concurrent data import requests from a plurality of clients;

S2, the server batches the received concurrent import requests and multiplexes the import operations of the plurality of clients into one internal logical import task, the internal import task corresponding to a unique database transaction and data version; when the server receives the first client import request, it creates an internal import task, a corresponding data queue, and an associated WAL file, and for client import requests that arrive later, it directly reuses the data queue and WAL file of the created internal import task to append data;

S3, the server processes the internal import task according to a preset import mode: in synchronous mode, the batched data is committed as a whole to the database storage layer as one transaction, and an import-success response is returned to the corresponding client after the transaction commits successfully, so that the data is immediately visible; in asynchronous mode, the batched data is first written to a write-ahead log (WAL) and persisted, and an import-received response is then immediately returned to the client, at which point the data exists only in the WAL and the in-memory queue and no formal storage file or version has yet been formed;

S4, in asynchronous mode, the background process of the server continuously monitors the data queue corresponding to the internal import task and triggers a data commit operation when a preset time interval threshold is reached or the accumulated data volume reaches a preset size threshold, the thresholds being configurable;

S5, for write requests submitted over JDBC using INSERT INTO VALUES statements, the front-end node of the server supports the prepared-statement feature of the MySQL protocol; the SQL statement received for the first time and the execution plan generated from it are cached in session-level memory, and subsequent identical import requests in the same session directly reuse the cached execution plan, thereby avoiding repeated SQL parsing and plan generation.

Preferably, the synchronous mode is suited to scenarios that require high data consistency and need to query the data immediately after the import completes, and the asynchronous mode is suited to high-frequency write scenarios that are sensitive to write latency and tolerate eventual consistency.

The invention provides a server-side batching high-concurrency impor