CN-115718725-B - Data processing method, device and system for distributed database system


Abstract

Embodiments of the invention provide a data processing method, device and system for a distributed database system. The method is applied to a management server in the distributed database system. The management server obtains file information of each shard node, where the file information of each shard node at least indicates the associated subfiles among the subfiles stored in that shard node; associated subfiles are a plurality of subfiles that belong to different files and have an association relationship. Based on the obtained file information, the management server selects files to be migrated from the subfiles included in at least one shard node according to a predetermined file selection rule. Under this rule, when associated subfiles are selected, all subfiles they include are taken together as the selection granularity. The files to be migrated are then migrated to an expansion node. This scheme can improve the efficiency with which the expanded distributed database system executes access instructions.

Inventors

WANG TIANYU

Assignees

北京金山云网络技术有限公司

Dates

Publication Date: 20260505
Application Date: 20210824

Claims (19)

1. A data processing method for a distributed database system, characterized in that the method is applied to a management server in the distributed database system, the distributed database system further comprising a plurality of shard nodes, the method comprising: obtaining file information of each shard node, wherein the file information of each shard node at least indicates associated subfiles among the subfiles stored in the shard node, the associated subfiles being a plurality of subfiles that belong to different files but have an association relationship; selecting files to be migrated from the subfiles included in at least one shard node according to a predetermined file selection rule, wherein the file selection rule comprises taking all subfiles included in the associated subfiles as the selection granularity when associated subfiles are selected, the files to be migrated comprise associated subfiles and subfiles without an association relationship, and, where associated subfiles are included, the associated subfiles are selected from the subfiles included in the at least one shard node; and migrating the files to be migrated to a capacity expansion node, wherein the capacity expansion node is a shard node added to the distributed database system in advance when the system is expanded.
2. The method according to claim 1, wherein selecting the files to be migrated from the subfiles included in the at least one shard node according to a predetermined file selection rule based on the acquired file information comprises: determining, according to a designated index value of each shard node, at least one shard node from which files are to be migrated and the amount of data to be migrated for the at least one shard node, wherein the designated index value characterizes storage space usage; and, for each node of the at least one shard node, selecting from the subfiles stored in the node, according to the predetermined file selection rule, independent subfiles and/or associated subfiles that match the amount of data to be migrated for the node, wherein an independent subfile is a subfile that has no association relationship with other subfiles.
3. The method according to claim 2, wherein selecting from the subfiles stored in the node, according to the predetermined file selection rule, independent subfiles and/or associated subfiles that match the amount of data to be migrated for the node, as the files to be migrated of the node, comprises: determining a first data amount of the associated subfiles indicated by the file information of the node and a second data amount of the independent subfiles in the shard node; and selecting from the subfiles stored by the node, according to the predetermined file selection rule and based on the first data amount and the second data amount, independent subfiles and/or associated subfiles that match the amount of data to be migrated for the node, as the files to be migrated of the node.
4. The method according to claim 1, wherein migrating the files to be migrated to the capacity expansion node comprises: issuing a migration instruction for the files to be migrated to the shard node to which the files to be migrated belong, so that the shard node receiving the migration instruction migrates the files to be migrated to the capacity expansion node with subfiles as the migration granularity.
5. The method according to any one of claims 1-4, wherein determining the associated subfiles indicated by the file information of each shard node comprises: determining, among the target files, associated files that have an association relationship, wherein the target files are the files to which the subfiles stored in the shard node belong; and determining the associated subfiles in the shard node by using the subfiles, located in the shard node, of the target files contained in the associated files.
6. The method according to claim 5, wherein determining, among the target files, the associated files that have an association relationship comprises: acquiring associated-file declaration information given by a user; and identifying, based on the associated-file declaration information, the associated files that have an association relationship among the target files.
7. The method according to claim 5, wherein determining the associated subfiles in the shard node by using the subfiles, located in the shard node, of the target files contained in the associated files comprises: determining, among the subfiles stored by the shard node and belonging to each designated file, the subfiles that conform to a predetermined matching rule as associated subfiles; wherein a designated file is a target file contained in the associated files, and the predetermined matching rule comprises having the same subfile identifier or having matching column contents.
8. The method according to claim 7, wherein, before determining, among the subfiles stored by the shard node and belonging to each designated file, the subfiles that conform to the predetermined matching rule as associated subfiles, the method further comprises: detecting the number of subfiles stored by the shard node that belong to each designated file; if the detected number is 1, determining the subfiles belonging to each designated file as associated subfiles; and if the detected number is greater than 1, performing the step of determining, among the subfiles stored by the shard node and belonging to each designated file, the subfiles that conform to the predetermined matching rule as associated subfiles.
9. A distributed database system, characterized by comprising a management server and a plurality of shard nodes; wherein each shard node is configured to determine the file information of the shard node and report the file information of the shard node to the management server, the file information of the shard node at least indicating associated subfiles among the subfiles stored in the shard node, the associated subfiles being a plurality of subfiles that belong to different files but have an association relationship; and the management server is configured to acquire the file information of each shard node, select files to be migrated from the subfiles included in at least one shard node according to a predetermined file selection rule, and migrate the files to be migrated to a capacity expansion node, wherein the file selection rule comprises taking all subfiles included in the associated subfiles as the selection granularity when associated subfiles are selected, the capacity expansion node is a shard node added to the distributed database system in advance when the system is expanded, the files to be migrated comprise associated subfiles and subfiles without an association relationship, and, where associated subfiles are included, the associated subfiles are selected from the subfiles included in the at least one shard node.
10. The system according to claim 9, wherein the management server selecting the files to be migrated from the subfiles included in the at least one shard node according to a predetermined file selection rule based on the obtained file information specifically comprises: determining, according to a designated index value of each shard node, at least one shard node from which files are to be migrated and the amount of data to be migrated for the at least one shard node, wherein the designated index value characterizes storage space usage; and, for each node of the at least one shard node, selecting from the subfiles stored in the node, according to the predetermined file selection rule, independent subfiles and/or associated subfiles that match the amount of data to be migrated for the node, as the files to be migrated of the node; wherein an independent subfile is a subfile that has no association relationship with other subfiles.
11. The system according to claim 10, wherein the management server selecting from the subfiles stored in the node, according to the predetermined file selection rule, independent subfiles and/or associated subfiles that match the amount of data to be migrated for the node, as the files to be migrated of the node, comprises: determining a first data amount of the associated subfiles indicated by the file information of the node and a second data amount of the independent subfiles in the shard node; and selecting from the subfiles stored by the node, according to the predetermined file selection rule and based on the first data amount and the second data amount, independent subfiles and/or associated subfiles that match the amount of data to be migrated for the node, as the files to be migrated of the node.
12. The system according to claim 9, wherein the management server migrating the files to be migrated to the capacity expansion node comprises: issuing a migration instruction for the files to be migrated to the shard node to which the files to be migrated belong, so that the shard node receiving the migration instruction migrates the files to be migrated to the capacity expansion node with subfiles as the migration granularity.
13. The system according to any one of claims 9-12, wherein each shard node determining the file information of the shard node comprises: determining, among the target files, associated files that have an association relationship, wherein the target files are the files to which the subfiles stored in the shard node belong; determining the associated subfiles in the shard node by using the subfiles, located in the shard node, of the target files contained in the associated files; and generating the file information of the shard node based on the determined associated subfiles.
14. The system according to claim 13, wherein each shard node determining, among the target files, the associated files that have an association relationship comprises: acquiring associated-file declaration information given by a user; and identifying, based on the associated-file declaration information, the associated files that have an association relationship among the target files.
15. The system according to claim 13, wherein each shard node determining the associated subfiles in the shard node by using the subfiles, located in the shard node, of the target files contained in the associated files comprises: determining, among the subfiles stored by the shard node and belonging to each designated file, the subfiles that conform to a predetermined matching rule as associated subfiles; wherein a designated file is a target file contained in the associated files, and the predetermined matching rule comprises having the same subfile identifier or having matching column contents.
16. The system according to claim 15, wherein each shard node, before determining, among the subfiles stored by the shard node and belonging to each designated file, the subfiles that conform to the predetermined matching rule as associated subfiles, is further configured to: detect the number of subfiles stored by the shard node that belong to each designated file; if the detected number is 1, determine the subfiles belonging to each designated file as associated subfiles; and if the detected number is greater than 1, perform the step of determining, among the subfiles stored by the shard node and belonging to each designated file, the subfiles that conform to the predetermined matching rule as associated subfiles.
17. A data processing apparatus for a distributed database system, wherein the data processing apparatus is applied to a management server in the distributed database system, the distributed database system further comprises a plurality of shard nodes, and the apparatus comprises: an acquisition module, configured to acquire file information of each shard node, wherein the file information of any shard node at least indicates associated subfiles among the subfiles stored in the shard node, the associated subfiles being a plurality of subfiles that belong to different files but have an association relationship; a file selection module, configured to select files to be migrated from the subfiles included in at least one shard node according to a predetermined file selection rule, wherein the file selection rule comprises taking all subfiles included in the associated subfiles as the selection granularity when associated subfiles are selected, the files to be migrated comprise associated subfiles and subfiles without an association relationship, and, where associated subfiles are included, the associated subfiles are selected from the subfiles included in the at least one shard node; and a migration module, configured to migrate the files to be migrated to a capacity expansion node, wherein the capacity expansion node is a shard node added to the distributed database system in advance when the system is expanded.
18. A management server, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method steps of any one of claims 1-8 when executing the program stored in the memory.
19. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-8.
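
The determination of associated subfiles on one shard node, as recited in claims 5 to 8, can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: the `SubFile` record, the `find_associated_groups` function, and the use of an integer `subfile_id` as the "same subfile identifier" matching rule are all hypothetical names chosen for the example, and the alternative rule of matching column contents is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubFile:
    file_name: str   # logical file the subfile belongs to
    subfile_id: int  # hypothetical subfile identifier
    size: int        # data amount, e.g. in bytes


def find_associated_groups(subfiles, declared_groups):
    """Return groups of associated subfiles stored on one shard node.

    subfiles: SubFile records stored on the node.
    declared_groups: sets of file names that a user declared as associated
    (an assumed encoding of the "associated file declaration information").
    """
    groups = []
    for declared in declared_groups:
        by_file = defaultdict(list)
        for sf in subfiles:
            if sf.file_name in declared:
                by_file[sf.file_name].append(sf)
        # An association needs subfiles from at least two different files.
        if len(by_file) < 2:
            continue
        if all(len(members) == 1 for members in by_file.values()):
            # One subfile per designated file: associate them directly.
            groups.append(frozenset(m[0] for m in by_file.values()))
            continue
        # More than one subfile per file: apply the matching rule
        # (here: same subfile identifier).
        by_id = defaultdict(list)
        for members in by_file.values():
            for sf in members:
                by_id[sf.subfile_id].append(sf)
        for members in by_id.values():
            if len({sf.file_name for sf in members}) >= 2:
                groups.append(frozenset(members))
    return groups
```

Each returned group is the unit that a later selection step would treat as indivisible; subfiles that appear in no group are the independent subfiles.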

Description

Data processing method, device and system for distributed database system

Technical Field

The present invention relates to the field of data processing technologies, and in particular to a data processing method, apparatus, and system for a distributed database system.

Background

In a distributed database system, the management server stores each file across a plurality of shard nodes in the form of physical sub-tables, which yields a storage form of logical files and shard files. Elastic capacity expansion is an essential basic function of a distributed database system. After a capacity expansion node, that is, a new shard node, is added to the distributed database system, the data content of the files is redistributed to complete the expansion.
In the related art, the subfiles in an original shard node may be distributed to different shard nodes after capacity expansion. Before capacity expansion, the management server can push an access instruction down to each shard node, have the shard nodes respond to it, and collect the results they report. After capacity expansion, the management server must instead read the relevant data from each shard node and then respond to the instruction using that data, which greatly reduces the execution efficiency of responding to access instructions.

Disclosure of Invention

Embodiments of the invention aim to provide a data processing method, device and system for a distributed database system, so as to improve the efficiency with which the expanded distributed database system executes access instructions.
The specific technical scheme is as follows: In a first aspect, an embodiment of the present invention provides a data processing method for a distributed database system, applied to a management server in the distributed database system, where the distributed database system further includes a plurality of shard nodes, and the method includes: obtaining file information of each shard node, wherein the file information of each shard node at least indicates associated subfiles among the subfiles stored in the shard node, the associated subfiles being a plurality of subfiles that belong to different files but have an association relationship; selecting, based on the acquired file information, files to be migrated from the subfiles included in at least one shard node according to a predetermined file selection rule, wherein the file selection rule comprises taking all subfiles included in the associated subfiles as the selection granularity when associated subfiles are selected; and migrating the files to be migrated to a capacity expansion node, wherein the capacity expansion node is a shard node added to the distributed database system in advance when the system is expanded.
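
The selection granularity described in the first aspect can be sketched in Python as follows. This is a minimal illustration under assumptions rather than the method as implemented: `SelectionUnit`, `build_units`, `migration_instructions`, the subfile names, and the dictionary layout of an instruction are all hypothetical. The point shown is only that a group of associated subfiles becomes one indivisible selection unit, while the resulting migration can still be issued per subfile to the shard node that holds it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionUnit:
    subfiles: tuple  # names of subfiles that must be selected together
    size: int        # total data amount of the unit


def build_units(associated_groups, independent, sizes):
    """Turn a node's file information into indivisible selection units.

    associated_groups: iterable of sets of subfile names (one set per group).
    independent: subfile names with no association relationship.
    sizes: mapping from subfile name to data amount.
    """
    units = [
        SelectionUnit(tuple(sorted(group)), sum(sizes[name] for name in group))
        for group in associated_groups
    ]
    units += [SelectionUnit((name,), sizes[name]) for name in independent]
    return units


def migration_instructions(selected_by_node, expansion_node):
    """Build one instruction per subfile, addressed to its source shard node."""
    return [
        {"source": node, "target": expansion_node, "subfile": name}
        for node, units in selected_by_node.items()
        for unit in units
        for name in unit.subfiles
    ]
```

Because a whole unit is either selected or not, every subfile of an associated group ends up with the same target node, which is what lets the expanded system keep answering an access instruction that touches the associated files from a single shard node.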
Optionally, selecting, based on the acquired file information, the files to be migrated from the subfiles included in the at least one shard node according to a predetermined file selection rule includes: determining, according to a designated index value of each shard node, at least one shard node from which files are to be migrated and the amount of data to be migrated for the at least one shard node, wherein the designated index value characterizes storage space usage; and, for each node of the at least one shard node, selecting from the subfiles stored in the node, according to the predetermined file selection rule, independent subfiles and/or associated subfiles that match the amount of data to be migrated for the node, wherein an independent subfile is a subfile that has no association relationship with other subfiles.
Optionally, selecting from the subfiles stored in the node, according to the predetermined file selection rule, independent subfiles and/or associated subfiles that match the amount of data to be migrated for the node, as the files to be migrated of the node, includes: determining a first data amount of the associated subfiles indicated by the file information of the node and a second data amount of the independent subfiles in the shard node; and selecting from the subfiles stored by the node, according to the predetermined file selection rule and based on the first data amount and the second data amount, independent subfiles and/or associated subfiles that match the amount of data to be migrated for the node, as the files to be migrated of the node.
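
One possible way to realize these two optional steps is sketched below in Python. The text does not say how the amount of data to be migrated is derived from the index value, nor what "matching" the amount means exactly, so both are assumptions here: the sketch takes used storage in bytes as the index value, aims at the average usage after expansion, and greedily picks the largest units that still fit. The names `amounts_to_migrate` and `select_for_node` are hypothetical.

```python
def amounts_to_migrate(used_bytes, new_nodes=1):
    """Map each over-full shard node to the data amount it should give up.

    used_bytes: mapping from shard node name to used storage (the assumed
    index value). The target is the average usage once new_nodes empty
    expansion nodes have joined (an assumed balancing policy).
    """
    target = sum(used_bytes.values()) / (len(used_bytes) + new_nodes)
    return {node: used - target
            for node, used in used_bytes.items() if used > target}


def select_for_node(units, amount):
    """Pick selection units whose total size does not exceed amount.

    units: list of (subfile_names, size) pairs. A pair holding several
    names is a group of associated subfiles and its size is the group's
    data amount; a pair holding one name is an independent subfile.
    Greedy largest-first is an assumed heuristic, not an optimal packing.
    """
    chosen, remaining = [], amount
    for names, size in sorted(units, key=lambda u: u[1], reverse=True):
        if size <= remaining:
            chosen.append(names)
            remaining -= size
    return chosen
```

For example, with two shard nodes each holding 300 bytes and one expansion node, each source node would give up 100 bytes; a 150-byte associated group is then skipped whole rather than split, and smaller independent subfiles fill the quota instead.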
Optionally, the migrating the file to be migrated to the capacity expansion node includes: And issuing a migration instruction aiming at the file to be migrated to the partition node to which the file to be mig