US-20260127180-A1 - DATA PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, AND STORAGE MEDIUM

Abstract

One or more implementations of the present specification provide a data processing method and apparatus, an electronic device, and a storage medium. The method includes: in a process of writing target data in a memory into a disk, in response to receiving a query instruction for the target data, separately querying data from at least one piece of first local data and at least one piece of second local data based on the query instruction, intermediate index layer information of the at least one piece of first local data, and intermediate index layer information of the at least one piece of second local data, and sorting the queried data as a data query result. The first local data includes some data stored in the disk in the target data, the second local data includes some data stored in the memory in the target data, the target data is in a column-stored form, and each column group of the target data corresponds to one piece of second local data.
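
The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `LocalPiece` class, the `index_info` and `scan` methods, and the use of a (minimum, maximum) row offset pair as the "intermediate index layer information" are all hypothetical names and assumptions chosen for illustration. The sketch queries the on-disk pieces and the in-memory pieces separately and then merges the hits in sorted order.

```python
import heapq
from dataclasses import dataclass


@dataclass
class LocalPiece:
    """One piece of local data; rows are kept sorted by row offset."""
    location: str                 # "disk" or "memory" (a label only in this sketch)
    rows: list[tuple[int, str]]   # (row_offset, value) pairs

    def index_info(self) -> tuple[int, int]:
        # Assumption: the index layer information is just the covered offset range.
        return (self.rows[0][0], self.rows[-1][0])

    def scan(self, lo: int, hi: int) -> list[tuple[int, str]]:
        # Use the index information to skip pieces that cannot contain hits.
        if not self.rows:
            return []
        first, last = self.index_info()
        if hi < first or lo > last:
            return []
        return [row for row in self.rows if lo <= row[0] <= hi]


def query(first_pieces, second_pieces, lo, hi):
    """Query disk pieces and memory pieces separately, then merge in sorted order."""
    parts = [p.scan(lo, hi) for p in first_pieces]
    parts += [p.scan(lo, hi) for p in second_pieces]
    return list(heapq.merge(*parts))


disk = LocalPiece("disk", [(0, "a"), (1, "b"), (2, "c")])    # already flushed
memory = LocalPiece("memory", [(3, "d"), (4, "e")])          # still being written
result = query([disk], [memory], 1, 3)
```

Because each piece is already sorted, `heapq.merge` produces the sorted result without re-sorting the combined data.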

Inventors

  • Ju REN
  • Zhenjiang Xie
  • Yuzhong Zhao

Assignees

  • Beijing Oceanbase Technology Co., Ltd.

Dates

Publication Date: 2026-05-07
Application Date: 2025-12-19
Priority Date: 2023-12-13

Claims (20)

  1. A data processing method, comprising: in a process of writing target data in a memory into a disk, in response to a query instruction for the target data, separately querying data from at least one piece of first local data and at least one piece of second local data based on the query instruction, intermediate index layer information of the at least one piece of first local data, and intermediate index layer information of the at least one piece of second local data, wherein the first local data includes a first portion of the target data stored in the disk, the second local data includes a second portion of the target data stored in the memory, the target data is in a column-stored form, and each column group of the target data corresponds to a piece of second local data.
  2. The data processing method according to claim 1, wherein the query instruction includes a query range and a query condition; and the separately querying the data from the at least one piece of first local data and the at least one piece of second local data based on the query instruction, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data includes: separately querying data within the query range from the at least one piece of first local data and the at least one piece of second local data based on the intermediate index layer information of the at least one piece of first local data and the intermediate index layer information of the at least one piece of second local data; sorting the data queried as to-be-queried data; and querying the to-be-queried data based on the query condition, to determine a data query result.
  3. The data processing method according to claim 2, wherein the query range includes a column query range of at least one column group; and the separately querying the data within the query range from the at least one piece of first local data and the at least one piece of second local data based on the query range, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data includes: for a column group to which a column query range in the query range belongs, determining, based on intermediate index layer information of first local data and intermediate index layer information of second local data in the column group and the column query range of the column group, a first row offset range corresponding to the column query range of the column group; determining a second row offset range based on the first row offset range corresponding to the column query range in the query range; and separately querying data within the second row offset range from the at least one piece of first local data and the at least one piece of second local data based on the second row offset range, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data.
  4. The data processing method according to claim 3, wherein the determining, based on the intermediate index layer information of the first local data and the intermediate index layer information of the second local data in the column group and the column query range of the column group, the first row offset range corresponding to the column query range of the column group includes: separately querying data within a third row offset range within the query range from the first local data and the second local data in the column group based on the intermediate index layer information of the first local data and the intermediate index layer information of the second local data in the column group; and sorting the data queried by row offset as a column query result of the column group; and filtering the column query result of the column group based on the column query range of the column group, to obtain the first row offset range corresponding to the column query range of the column group.
  5. The data processing method according to claim 4, wherein the query range includes a primary key range; and the data processing method further comprises: separately querying data within the primary key range from first local data and second local data in a primary key column based on intermediate index layer information of the first local data and intermediate index layer information of the second local data in the primary key column; sorting the data queried by row offset as a primary key query result; and determining a second row offset range within the query range based on a row offset of data in the primary key query result.
  6. The data processing method according to claim 2, wherein data in the first local data and the second local data is stored in a form of a data block, and each data block stores data; and the separately querying the data within the query range from the at least one piece of first local data and the at least one piece of second local data based on the intermediate index layer information of the at least one piece of first local data and the intermediate index layer information of the at least one piece of second local data includes: separately querying a data block to which the data within the query range belongs from the at least one piece of first local data and the at least one piece of second local data based on the intermediate index layer information of the at least one piece of first local data and the intermediate index layer information of the at least one piece of second local data; and sorting the data block queried as to-be-queried data.
  7. The data processing method according to claim 6, wherein the data block includes a macro block and a micro block.
  8. The data processing method according to claim 1, wherein the target data includes: data generated in the memory based on a data definition language (DDL).
  9. The data processing method according to claim 1, wherein different column groups of the target data correspond to same first local data; or different column groups of the target data correspond to different first local data.
  10. An electronic device, comprising: one or more processors; and one or more storage devices, individually or collectively, having processor-executable instructions stored thereon, the processor-executable instructions, when executed by the one or more processors, enabling the one or more processors to, individually or collectively, implement actions including: in a process of writing target data in a memory into a disk, in response to a query instruction for the target data, separately querying data from at least one piece of first local data and at least one piece of second local data based on the query instruction, intermediate index layer information of the at least one piece of first local data, and intermediate index layer information of the at least one piece of second local data, wherein the first local data includes a first portion of the target data stored in the disk, the second local data includes a second portion of the target data stored in the memory, the target data is in a column-stored form, and each column group of the target data corresponds to a piece of second local data.
  11. The electronic device according to claim 10, wherein the query instruction includes a query range and a query condition; and the separately querying the data from the at least one piece of first local data and the at least one piece of second local data based on the query instruction, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data includes: separately querying data within the query range from the at least one piece of first local data and the at least one piece of second local data based on the intermediate index layer information of the at least one piece of first local data and the intermediate index layer information of the at least one piece of second local data; sorting the data queried as to-be-queried data; and querying the to-be-queried data based on the query condition, to determine a data query result.
  12. The electronic device according to claim 11, wherein the query range includes a column query range of at least one column group; and the separately querying the data within the query range from the at least one piece of first local data and the at least one piece of second local data based on the query range, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data includes: for a column group to which a column query range in the query range belongs, determining, based on intermediate index layer information of first local data and intermediate index layer information of second local data in the column group and the column query range of the column group, a first row offset range corresponding to the column query range of the column group; determining a second row offset range based on the first row offset range corresponding to the column query range in the query range; and separately querying data within the second row offset range from the at least one piece of first local data and the at least one piece of second local data based on the second row offset range, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data.
  13. The electronic device according to claim 12, wherein the determining, based on the intermediate index layer information of the first local data and the intermediate index layer information of the second local data in the column group and the column query range of the column group, the first row offset range corresponding to the column query range of the column group includes: separately querying data within a third row offset range within the query range from the first local data and the second local data in the column group based on the intermediate index layer information of the first local data and the intermediate index layer information of the second local data in the column group; and sorting the data queried by row offset as a column query result of the column group; and filtering the column query result of the column group based on the column query range of the column group, to obtain the first row offset range corresponding to the column query range of the column group.
  14. The electronic device according to claim 13, wherein the query range includes a primary key range; and the actions further include: separately querying data within the primary key range from first local data and second local data in a primary key column based on intermediate index layer information of the first local data and intermediate index layer information of the second local data in the primary key column; sorting the data queried by row offset as a primary key query result; and determining a second row offset range within the query range based on a row offset of data in the primary key query result.
  15. The electronic device according to claim 11, wherein data in the first local data and the second local data is stored in a form of a data block, and each data block stores data; and the separately querying the data within the query range from the at least one piece of first local data and the at least one piece of second local data based on the intermediate index layer information of the at least one piece of first local data and the intermediate index layer information of the at least one piece of second local data includes: separately querying a data block to which the data within the query range belongs from the at least one piece of first local data and the at least one piece of second local data based on the intermediate index layer information of the at least one piece of first local data and the intermediate index layer information of the at least one piece of second local data; and sorting the data block queried as to-be-queried data.
  16. The electronic device according to claim 15, wherein the data block includes a macro block and a micro block.
  17. A computer-readable storage medium, the computer-readable storage medium having computer instructions stored thereon, the computer instructions, when executed by one or more processors, enabling the one or more processors to, individually or collectively, implement actions including: in a process of writing target data in a memory into a disk, in response to a query instruction for the target data, separately querying data from at least one piece of first local data and at least one piece of second local data based on the query instruction, intermediate index layer information of the at least one piece of first local data, and intermediate index layer information of the at least one piece of second local data, wherein the first local data includes a first portion of the target data stored in the disk, the second local data includes a second portion of the target data stored in the memory, the target data is in a column-stored form, and each column group of the target data corresponds to a piece of second local data.
  18. The storage medium according to claim 17, wherein the query instruction includes a query range and a query condition; and the separately querying the data from the at least one piece of first local data and the at least one piece of second local data based on the query instruction, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data includes: separately querying data within the query range from the at least one piece of first local data and the at least one piece of second local data based on the intermediate index layer information of the at least one piece of first local data and the intermediate index layer information of the at least one piece of second local data; sorting the data queried as to-be-queried data; and querying the to-be-queried data based on the query condition, to determine a data query result.
  19. The storage medium according to claim 18, wherein the query range includes a column query range of at least one column group; and the separately querying the data within the query range from the at least one piece of first local data and the at least one piece of second local data based on the query range, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data includes: for a column group to which a column query range in the query range belongs, determining, based on intermediate index layer information of first local data and intermediate index layer information of second local data in the column group and the column query range of the column group, a first row offset range corresponding to the column query range of the column group; determining a second row offset range based on the first row offset range corresponding to the column query range in the query range; and separately querying data within the second row offset range from the at least one piece of first local data and the at least one piece of second local data based on the second row offset range, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data.
  20. The storage medium according to claim 19, wherein the determining, based on the intermediate index layer information of the first local data and the intermediate index layer information of the second local data in the column group and the column query range of the column group, the first row offset range corresponding to the column query range of the column group includes: separately querying data within a third row offset range within the query range from the first local data and the second local data in the column group based on the intermediate index layer information of the first local data and the intermediate index layer information of the second local data in the column group; and sorting the data queried by row offset as a column query result of the column group; and filtering the column query result of the column group based on the column query range of the column group, to obtain the first row offset range corresponding to the column query range of the column group.
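
The block-level lookup recited in claims 6 and 7 (and their device counterparts, claims 15 and 16) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the `MicroBlock` and `MacroBlock` classes and the `blocks_in_range` function are hypothetical, and the sketch assumes that a macro block holds an ordered list of micro blocks and that each micro block covers a contiguous run of row offsets. The claims themselves only state that the data block includes a macro block and a micro block.

```python
import bisect
from dataclasses import dataclass


@dataclass
class MicroBlock:
    start: int          # row offset of the first row in the block
    values: list[int]   # consecutive rows beginning at `start`

    @property
    def end(self) -> int:
        return self.start + len(self.values) - 1


@dataclass
class MacroBlock:
    micro_blocks: list[MicroBlock]   # assumed sorted by start and non-overlapping


def blocks_in_range(macro_blocks, lo, hi):
    """Return the micro blocks overlapping row offsets [lo, hi], sorted by start."""
    hits = []
    for macro in macro_blocks:
        starts = [m.start for m in macro.micro_blocks]
        # Jump to the last micro block starting at or before `lo`.
        i = max(bisect.bisect_right(starts, lo) - 1, 0)
        while i < len(macro.micro_blocks) and macro.micro_blocks[i].start <= hi:
            if macro.micro_blocks[i].end >= lo:
                hits.append(macro.micro_blocks[i])
            i += 1
    return sorted(hits, key=lambda m: m.start)


on_disk = MacroBlock([MicroBlock(0, [1, 2, 3]), MicroBlock(3, [4, 5, 6])])
in_memory = MacroBlock([MicroBlock(6, [7, 8]), MicroBlock(8, [9])])
found = blocks_in_range([on_disk, in_memory], 2, 6)
found_starts = [m.start for m in found]
```

Only the blocks that overlap the requested row offsets are returned, whether they come from the disk or from memory, and they are sorted before any row-level processing.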

Description

TECHNICAL FIELD

One or more implementations of the present specification relate to the field of database technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.

BACKGROUND

With today's rapid development of the Internet and informatization, data generation is explosively increasing. Therefore, requirements on databases and database management are increasingly high.
During data processing, data manipulation languages (DML) need to be used to operate data tables, for example, add, delete, query, or modify data. During data processing, data definition languages (DDL) are further used to reorganize data, for example, create a new table, delete a column, or change a column type.
In the related technologies, in a process of reorganizing data by using a DDL, related data cannot provide a query service externally, which affects service processing of databases, causing service problems such as service request timeout.

SUMMARY

According to a first aspect of one or more implementations of the present specification, a data processing method is provided. The method includes: in a process of writing target data in a memory into a disk, in response to receiving a query instruction for the target data, separately querying data from at least one piece of first local data and at least one piece of second local data based on the query instruction, intermediate index layer information of the at least one piece of first local data, and intermediate index layer information of the at least one piece of second local data, and sorting the queried data as a data query result. The first local data includes some data stored in the disk in the target data, the second local data includes some data stored in the memory in the target data, the target data is in a column-stored form, and each column group of the target data corresponds to one piece of second local data.
In an implementation of the present specification, the query instruction includes a query range and a query condition; and the separately querying the data from the at least one piece of first local data and the at least one piece of second local data based on the query instruction, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data, and sorting the queried data as the data query result includes: separately querying data within the query range from the at least one piece of first local data and the at least one piece of second local data based on the intermediate index layer information of the at least one piece of first local data and the intermediate index layer information of the at least one piece of second local data, and sorting the queried data as to-be-queried data; and querying the to-be-queried data based on the query condition, to determine the data query result. 
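
The two phases described in the preceding implementation (a range query followed by a condition check) can be sketched in Python. The function names `query_range` and `query_with_condition`, the `(row_offset, value)` row layout, and the use of a plain linear scan in place of the intermediate index layer lookup are assumptions made for brevity, not details taken from the specification.

```python
from typing import Callable

Row = tuple[int, int]   # (row_offset, value); a hypothetical layout


def query_range(pieces: list[list[Row]], lo: int, hi: int) -> list[Row]:
    """Phase 1: gather rows with offset in [lo, hi] from every piece, then sort."""
    hits = [row for piece in pieces for row in piece if lo <= row[0] <= hi]
    return sorted(hits)


def query_with_condition(
    disk_pieces: list[list[Row]],
    memory_pieces: list[list[Row]],
    lo: int,
    hi: int,
    condition: Callable[[int], bool],
) -> list[Row]:
    # The sorted range hits play the role of the "to-be-queried data".
    candidates = query_range(disk_pieces + memory_pieces, lo, hi)
    # Phase 2: apply the query condition to the candidates only.
    return [row for row in candidates if condition(row[1])]


disk_pieces = [[(0, 5), (1, 12)]]
memory_pieces = [[(2, 30), (3, 7)]]
matches = query_with_condition(disk_pieces, memory_pieces, 1, 3, lambda v: v > 10)
```

Applying the condition after the range query means the condition is evaluated only on rows that the range has already narrowed down.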
In an implementation of the present specification, the query range includes a column query range of at least one column group; and the separately querying the data within the query range from the at least one piece of first local data and the at least one piece of second local data based on the query range, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data, and sorting the queried data as the to-be-queried data includes: for a column group to which each column query range in the query range belongs, determining, based on intermediate index layer information of first local data and intermediate index layer information of second local data in the column group and the column query range of the column group, a first row offset range corresponding to the column query range of the column group; determining a second row offset range based on the first row offset range corresponding to each column query range in the query range; and separately querying data within the second row offset range from the at least one piece of first local data and the at least one piece of second local data based on the second row offset range, the intermediate index layer information of the at least one piece of first local data, and the intermediate index layer information of the at least one piece of second local data, and sorting the queried data as the to-be-queried data. 
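
The derivation of row offset ranges described above can be sketched as follows. The specification says only that the second row offset range is determined "based on" the first row offset ranges; this sketch assumes, purely for illustration, that each first range is the bounding interval of matching offsets and that the ranges are combined by intersection, as would suit predicates joined by AND. All function and variable names are hypothetical.

```python
def first_row_offset_range(column_rows, col_lo, col_hi):
    """Bounding (min, max) row offsets whose value lies in [col_lo, col_hi], or None."""
    offsets = [off for off, val in column_rows if col_lo <= val <= col_hi]
    return (min(offsets), max(offsets)) if offsets else None


def second_row_offset_range(first_ranges):
    """Assumption: combine the per-column-group ranges by intersection."""
    if not first_ranges or any(r is None for r in first_ranges):
        return None
    lo = max(r[0] for r in first_ranges)
    hi = min(r[1] for r in first_ranges)
    return (lo, hi) if lo <= hi else None


def plan(column_groups, column_ranges):
    """column_groups maps a name to (disk_rows, memory_rows) of (offset, value) pairs."""
    first_ranges = []
    for name, (col_lo, col_hi) in column_ranges.items():
        disk_rows, memory_rows = column_groups[name]
        merged = sorted(disk_rows + memory_rows)   # sort the queried data by row offset
        first_ranges.append(first_row_offset_range(merged, col_lo, col_hi))
    return second_row_offset_range(first_ranges)


groups = {
    "age": ([(0, 20), (1, 35)], [(2, 40), (3, 18)]),
    "city_id": ([(0, 1), (1, 2)], [(2, 2), (3, 2)]),
}
offsets = plan(groups, {"age": (30, 50), "city_id": (2, 2)})
```

A bounding interval is a coarse filter: rows inside the resulting range would still need to be checked against the query condition before being returned.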
In an implementation of the present specification, the determining, based on the intermediate index layer information of the first local data and the intermediate index layer information of the second local data in the column group and the column query range of the column group, the first row offset range corresponding to the column query range of the column group includes: separately querying data within a third row offset range within the query range from the first local data and the second local data in the column group based on the intermedi