CN-116628036-B - Execution plan generation method, device, equipment and storage medium
Abstract
The invention discloses an execution plan generation method and apparatus, an electronic device, and a storage medium. The method comprises: if an SQL statement contains no grouping item and all of its query items are target aggregation functions, obtaining the deduplication item parameters, wherein a target aggregation function is an aggregation function containing a deduplication item (for example, SUM(DISTINCT c1)); and generating, according to the SQL statement and the deduplication item parameters, an execution plan corresponding to the SQL statement, wherein the execution plan comprises the operators of each subtask and the parallelism of each subtask. Executing the execution plan provided by the embodiments of the invention requires only minor changes to an existing distributed execution framework and introduces no redundant data replication overhead, so that multithreading resources are fully utilized and the execution efficiency of the SQL statement is improved.
Inventors
- SONG XIN
- WAN WEI
- HAN ZHUZHONG
Assignees
- Shanghai Dameng Database Co., Ltd. (上海达梦数据库有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20230616
Claims (7)
- 1. An execution plan generation method, comprising: if an SQL statement contains no grouping item and all query items are target aggregation functions, obtaining deduplication item parameters, wherein a target aggregation function is an aggregation function containing a deduplication item; and generating an execution plan corresponding to the SQL statement according to the SQL statement and the deduplication item parameters, wherein the execution plan comprises the operators of each subtask and the parallelism of each subtask; wherein generating the execution plan corresponding to the SQL statement according to the SQL statement and the deduplication item parameters comprises: determining a target table according to the SQL statement; determining the parallelism of a first subtask according to the target table and the available hardware resources of the node storing the target table, wherein the first subtask comprises a first distribution operator; determining the parallelism of a second subtask according to the deduplication item parameters and the available hardware resources of the node storing the target table, wherein the second subtask comprises a first receiving operator, a first aggregation operator and a second distribution operator; and determining the parallelism of a third subtask to be a first value, wherein the third subtask comprises a second receiving operator and a second aggregation operator; wherein the method further comprises: executing the first subtask, the second subtask and the third subtask in sequence to obtain a target execution result; wherein executing the first subtask, the second subtask and the third subtask in sequence to obtain the target execution result comprises: determining, according to the target table and the deduplication item parameters, the data to be transmitted for each deduplication item parameter; transmitting, according to the first subtask, the data to be transmitted for each deduplication item parameter to the receiving thread corresponding to that deduplication item parameter; performing a first aggregation on the received data based on the second subtask to obtain a first aggregation result for each receiving thread; and performing a secondary aggregation on the first aggregation results of the receiving threads based on the third subtask to obtain the target execution result corresponding to the SQL statement.
- 2. The method of claim 1, further comprising, before transmitting, according to the first subtask, the data to be transmitted for each deduplication item parameter to the receiving thread corresponding to that deduplication item parameter: obtaining a hash fold value of each deduplication item parameter; and determining the receiving thread corresponding to each deduplication item parameter according to its hash fold value.
- 3. The method of claim 2, wherein determining the receiving thread corresponding to each deduplication item parameter according to its hash fold value comprises: determining the number of parallel threads for each deduplication item parameter according to the parallelism of the second subtask and the number of deduplication item parameters; and determining the receiving thread corresponding to each deduplication item parameter according to the number of parallel threads for that deduplication item parameter and its hash fold value.
- 4. The method of claim 1, wherein performing the secondary aggregation on the first aggregation results of the receiving threads based on the third subtask to obtain the target execution result corresponding to the SQL statement comprises: determining a secondary aggregation function according to the first aggregation result of each receiving thread; and performing, based on the secondary aggregation function, the secondary aggregation on the first aggregation results of the receiving threads in the second subtask to obtain the target execution result corresponding to the SQL statement.
- 5. An execution plan generation apparatus, comprising: a deduplication item parameter acquisition module, configured to obtain deduplication item parameters if an SQL statement contains no grouping item and all query items are target aggregation functions, wherein a target aggregation function is an aggregation function containing a deduplication item; and an execution plan generation module, configured to generate an execution plan corresponding to the SQL statement according to the SQL statement and the deduplication item parameters, wherein the execution plan comprises the operators of each subtask and the parallelism of each subtask; wherein the execution plan generation module is specifically configured to: determine a target table according to the SQL statement; determine the parallelism of a first subtask according to the target table and the available hardware resources of the node storing the target table, wherein the first subtask comprises a first distribution operator; determine the parallelism of a second subtask according to the deduplication item parameters and the available hardware resources of the node storing the target table, wherein the second subtask comprises a first receiving operator, a first aggregation operator and a second distribution operator; determine the parallelism of a third subtask to be a first value, wherein the third subtask comprises a second receiving operator and a second aggregation operator; and execute the first subtask, the second subtask and the third subtask in sequence to obtain a target execution result; wherein executing the first subtask, the second subtask and the third subtask in sequence to obtain the target execution result comprises: determining, according to the target table and the deduplication item parameters, the data to be transmitted for each deduplication item parameter; transmitting, according to the first subtask, the data to be transmitted for each deduplication item parameter to the receiving thread corresponding to that deduplication item parameter; performing a first aggregation on the received data based on the second subtask to obtain a first aggregation result for each receiving thread; and performing a secondary aggregation on the first aggregation results of the receiving threads based on the third subtask to obtain the target execution result corresponding to the SQL statement.
- 6. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the execution plan generation method of any one of claims 1 to 4.
- 7. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the execution plan generation method of any one of claims 1 to 4.
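The execution steps of claims 1 to 4 (route each deduplication value to a receiving thread, aggregate per thread, then merge in a single thread) can be illustrated with a single-process simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the XOR-based `hash_fold`, the layout in which parameter `i` owns receiving threads `[i*k, (i+1)*k)`, the COUNT-to-SUM secondary mapping, and all function names are hypothetical. Rows are assumed to hold only the deduplication columns, in parameter order.

```python
from collections import defaultdict

# Hypothetical "hash fold": XOR the high and low 32 bits of Python's hash.
# The patent names a hash fold value but does not define the folding.
def hash_fold(value) -> int:
    h = hash(value) & 0xFFFFFFFFFFFFFFFF
    return (h >> 32) ^ (h & 0xFFFFFFFF)

def receiving_thread(param_idx: int, value, threads_per_param: int) -> int:
    # Claims 2-3 as interpreted here: parameter i owns receiving threads
    # [i * k, (i + 1) * k); the folded hash selects one thread in that block.
    return param_idx * threads_per_param + hash_fold(value) % threads_per_param

# Claim 4 (assumed mapping): partial COUNTs are summed; SUM/MIN/MAX re-apply themselves.
SECONDARY = {"count": sum, "sum": sum, "min": min, "max": max}
PRIMARY = {"sum": sum, "min": min, "max": max}

def first_aggregate(func: str, values: set):
    if func == "count":
        return len(values)
    return PRIMARY[func](values) if values else None  # SQL NULL for no input

def second_aggregate(func: str, partials: list):
    present = [p for p in partials if p is not None]
    if not present:
        return 0 if func == "count" else None
    return SECONDARY[func](present)

def execute(rows, aggregates, threads_per_param: int):
    """rows: tuples of dedup-column values in parameter order.
    aggregates: (func, param_idx) pairs, e.g. ("sum", 0) for sum(distinct c1)."""
    # First subtask: send each non-NULL value of parameter i to its receiving
    # thread. Only that value travels, not a copy of the row per aggregate.
    inbox = defaultdict(set)  # thread id -> distinct values received
    for row in rows:
        for i, v in enumerate(row):
            if v is not None:
                inbox[receiving_thread(i, v, threads_per_param)].add(v)
    results = []
    for func, i in aggregates:
        block = range(i * threads_per_param, (i + 1) * threads_per_param)
        # Second subtask: each receiving thread aggregates its own distinct values.
        # Equal values always reach the same thread, so per-thread dedup is global.
        partials = [first_aggregate(func, inbox.get(t, set())) for t in block]
        # Third subtask (parallelism 1): merge partials with the secondary function.
        results.append(second_aggregate(func, partials))
    return results

rows = [(1, 10, 5), (1, 20, 5), (2, 10, 7), (3, None, 7)]
print(execute(rows, [("sum", 0), ("count", 1), ("sum", 2)], threads_per_param=2))
# prints [6, 2, 12]
```

Because equal values of one parameter always hash to the same receiving thread, the result does not depend on `threads_per_param`; that choice only changes how the work is spread across threads.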
Description
Execution plan generation method, device, equipment and storage medium
Technical Field
Embodiments of the invention relate to the field of computer technology, and in particular to an execution plan generation method, apparatus, device and storage medium.
Background
In a distributed database environment, there are query statements of the following form, which contain no GROUP BY clause and whose aggregate functions contain DISTINCT. First, create a table with the following statement:
create table t1 (c1 int, c2 int, c3 int, id int) partition by hash (id) partitions 5;
Example 1: select sum(distinct c1), count(distinct c2), sum(distinct c3) from t1;
Existing methods for processing aggregate functions that contain multiple deduplication items fall into two categories: gathering the aggregation into a single thread, and sending the argument data of the aggregate functions repeatedly. The first method cannot take full advantage of multithreaded parallel optimization, and the second method incurs additional network communication overhead.
Disclosure of Invention
Embodiments of the invention provide an execution plan generation method, apparatus, device and storage medium. The execution plan provided by the embodiments of the invention can be executed with only minor changes to an existing distributed execution framework and without introducing redundant data replication overhead, so that multithreading resources are fully utilized and the execution efficiency of SQL statements is improved.
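Why DISTINCT aggregates need special handling on a hash-partitioned table such as t1 can be seen in a toy example: the same c1 value may be stored in several partitions, so evaluating Example 1 per partition and adding the partial results double-counts. The sketch below is illustrative only; the `hash(id) % 5` partition function, the sample rows, and all function names are assumptions, not the database's partitioning code.

```python
N_PARTITIONS = 5  # matches "partitions 5" in the DDL for t1

def partition_of(row_id: int) -> int:
    # Hypothetical stand-in for "partition by hash (id)"; the real hash is not given.
    return hash(row_id) % N_PARTITIONS

def distinct_sum(values):
    s = {v for v in values if v is not None}  # DISTINCT ignores NULLs
    return sum(s) if s else None

def distinct_count(values):
    return len({v for v in values if v is not None})

def example1(rows):
    """Correct result of Example 1 over rows (c1, c2, c3, id) seen in one place."""
    return (distinct_sum(r[0] for r in rows),
            distinct_count(r[1] for r in rows),
            distinct_sum(r[2] for r in rows))

def example1_per_partition_then_add(rows):
    """Wrong shortcut: evaluate Example 1 per partition, then add the partials."""
    parts = [[] for _ in range(N_PARTITIONS)]
    for r in rows:
        parts[partition_of(r[3])].append(r)
    partials = [example1(p) for p in parts if p]
    return tuple(sum(x[j] for x in partials if x[j] is not None) for j in range(3))

rows = [(1, 10, 5, 0), (1, 20, 5, 1), (2, 10, 7, 2)]
print(example1(rows))                         # prints (3, 2, 12)
print(example1_per_partition_then_add(rows))  # prints (4, 3, 17)
```

Calling `example1` on all rows gathered into one thread corresponds to the first existing approach in the Background: correct, but serial. Equal values therefore have to be brought together before they are aggregated.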
According to one aspect of the invention, an execution plan generation method is provided, comprising: if an SQL statement contains no grouping item and all query items are target aggregation functions, obtaining deduplication item parameters, wherein a target aggregation function is an aggregation function containing a deduplication item; and generating an execution plan corresponding to the SQL statement according to the SQL statement and the deduplication item parameters, wherein the execution plan comprises the operators of each subtask and the parallelism of each subtask.
According to another aspect of the invention, an execution plan generation apparatus is provided, comprising: a deduplication item parameter acquisition module, configured to obtain deduplication item parameters if an SQL statement contains no grouping item and all query items are target aggregation functions, wherein a target aggregation function is an aggregation function containing a deduplication item; and an execution plan generation module, configured to generate an execution plan corresponding to the SQL statement according to the SQL statement and the deduplication item parameters, wherein the execution plan comprises the operators of each subtask and the parallelism of each subtask.
According to another aspect of the invention, an electronic device is provided, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the execution plan generation method according to any embodiment of the invention.
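The method aspect (check that the statement qualifies, collect the deduplication item parameters, then build a three-subtask plan with a parallelism per subtask) can be sketched as follows. Everything here is a hypothetical illustration: the `AggCall`/`Query`/`Subtask` structures, the operator names, the extra scan operator, and in particular the sizing rules (first subtask capped by partitions and free cores, second subtask an equal number of threads per parameter, third subtask fixed at 1 as an assumed "first value") are assumptions, since the patent does not give formulas.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AggCall:
    """Hypothetical parsed aggregate call, e.g. sum(distinct c1)."""
    func: str
    arg: str
    distinct: bool


@dataclass
class Query:
    """Hypothetical parsed SELECT; the patent does not describe parser structures."""
    table: str
    select_items: list
    group_by: list = field(default_factory=list)


def dedup_item_params(q: Query) -> Optional[List[str]]:
    # Applies only with no grouping item and every query item an aggregate with DISTINCT.
    if q.group_by or not q.select_items:
        return None
    if not all(isinstance(i, AggCall) and i.distinct for i in q.select_items):
        return None
    params: List[str] = []
    for item in q.select_items:
        if item.arg not in params:  # one parameter per distinct argument, first-seen order
            params.append(item.arg)
    return params


@dataclass
class Subtask:
    operators: List[str]
    parallelism: int


def build_plan(q: Query, params: List[str], n_partitions: int, free_cores: int) -> List[Subtask]:
    # Assumed sizing rules: the patent says only that these depend on the target
    # table / dedup parameters and the storing node's available hardware resources.
    p1 = max(1, min(n_partitions, free_cores))
    per_param = max(1, free_cores // len(params))  # receiving threads per parameter
    p2 = per_param * len(params)
    p3 = 1  # assumed value for the patent's "first value": one merging thread
    return [
        Subtask([f"scan({q.table})", "distribute_1"], p1),
        Subtask(["receive_1", "aggregate_1", "distribute_2"], p2),
        Subtask(["receive_2", "aggregate_2"], p3),
    ]


q = Query("t1", [AggCall("sum", "c1", True), AggCall("count", "c2", True),
                 AggCall("sum", "c3", True)])
params = dedup_item_params(q)
print(params)  # prints ['c1', 'c2', 'c3']
print([s.parallelism for s in build_plan(q, params, n_partitions=5, free_cores=8)])
# prints [5, 6, 1]
```

With Example 1, 5 partitions and 8 free cores, the assumed rules give two receiving threads per parameter, which is the per-parameter thread count that claim 3 feeds into the hash-based routing.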
According to another aspect of the invention, a computer-readable storage medium is provided, storing computer instructions which, when executed, cause a processor to implement the execution plan generation method according to any embodiment of the invention.
For SQL statements that contain no grouping item and whose aggregate functions contain deduplication items, the embodiments of the invention generate the execution plan corresponding to the SQL statement according to the deduplication item parameters and the SQL statement. The resulting execution plan can therefore be executed with only minor changes to the existing distributed execution framework and without introducing redundant data replication overhead, so that multithreading resources are fully utilized and the execution efficiency of the SQL statement is improved.
It should be understood that the content described in this section is not intended to identify key or essential features of the embodiments of the invention, nor to limit the scope of the invention. Other features of the invention will become apparent from the following description.
Drawings
To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings required for the embodiments are briefly introduced below. It should be understood that the following d