CN-121979917-A - Logic plan generation method and related device

Abstract

The application discloses a logical plan generation method and a related device. The method comprises: receiving a data processing request from a user and generating a logical plan based on the request; determining a window function operator to be optimized in the logical plan, namely a window function operator for which no partition column is specified; selecting at least one column from the data table specified by that operator, based on at least one of the partition potential, numerical balance, and computational efficiency of each column in the table, each of which is positively correlated with the probability that the column is selected; and taking the selected column as the partition column of the window function operator to be optimized, thereby obtaining an optimization result of the logical plan. This scheme allows the advantages of the partitioning mechanism to be fully exploited.

Inventors

  • ZHANG DONGLIN
  • WANG JICHAO
  • SHEN ZHUONAN
  • WANG TINGTING
  • CAO JIANJIN
  • QIAN HAODONG

Assignees

  • Zhejiang Dahua Technology Co., Ltd. (浙江大华技术股份有限公司)

Dates

Publication Date
2026-05-05
Application Date
2025-12-30

Claims (10)

  1. A logical plan generation method, comprising: receiving a data processing request from a user, and generating a logical plan based on the data processing request; determining a window function operator to be optimized in the logical plan, wherein the window function operator to be optimized is a window function operator for which no partition column is specified; selecting at least one column from the data table specified by the window function operator to be optimized, based on at least one of the partition potential, numerical balance, and computational efficiency of each column in that data table, wherein the partition potential represents the degree of dispersion of the values in the corresponding column, the numerical balance represents how evenly the values in the column are distributed, and the computational efficiency represents the efficiency of partitioning by the corresponding column; and taking the selected column as the partition column specified by the window function operator to be optimized, to obtain an optimization result of the logical plan.
  2. The method of claim 1, wherein determining the window function operator to be optimized in the logical plan comprises: for each logical plan operator in the logical plan, judging whether the logical plan operator is a window function operator for which no partition column is specified; and in response to the logical plan operator being a window function operator with no partition column specified, determining that the logical plan operator is the window function operator to be optimized.
  3. The method according to claim 2, wherein judging, for each logical plan operator in the logical plan, whether the logical plan operator is a window function operator for which no partition column is specified comprises: judging whether the current logical plan operator is a window function operator; in response to it not being a window function operator, updating the current logical plan operator to the next logical plan operator, and returning to the step of judging whether the current logical plan operator is a window function operator; in response to it being a window function operator, judging whether the current logical plan operator has specified the partition column; in response to the partition column not being specified, determining that the current logical plan operator is a window function operator for which no partition column is specified; and in response to the partition column being specified, updating the current logical plan operator to the next logical plan operator, and returning to the step of judging whether the current logical plan operator is a window function operator.
  4. The method of claim 3, wherein judging whether the current logical plan operator has specified the partition column comprises: judging whether the field value of the partition column information field of the current logical plan operator is empty; in response to it being empty, determining that the partition column is not specified; and in response to it not being empty, determining that the partition column has been specified.
  5. The method according to claim 1, wherein taking the selected column as the partition column specified by the window function operator to be optimized, to obtain the optimization result of the logical plan, comprises: taking the field of the selected column as the field value of the partition column information field contained in the window function operator to be optimized, to obtain an optimized window function operator; and replacing the window function operator to be optimized in the logical plan with the optimized window function operator, to obtain the optimization result of the logical plan; and/or, before selecting at least one column from the data table specified by the window function operator to be optimized based on at least one of the partition potential, numerical balance, and computational efficiency of each column in that data table, the method further comprises: taking the field value of the data table information field contained in the window function operator to be optimized as the data table specified by the window function operator to be optimized; and determining each column in that data table from the metadata of the data table; and/or, selecting at least one column from the data table specified by the window function operator to be optimized based on at least one of the partition potential, numerical balance, and computational efficiency of each column in that data table comprises: for each column, obtaining an evaluation index of the column based on at least one of the partition potential, numerical balance, and computational efficiency of the column; and selecting, from the columns, at least one column whose evaluation index satisfies a preset index condition.
  6. The method of claim 5, wherein obtaining the evaluation index of the column based on at least one of the partition potential, numerical balance, and computational efficiency of the column comprises: weighting at least two of the partition potential, numerical balance, and computational efficiency of the column to obtain the evaluation index; and/or, selecting, from the columns, at least one column whose evaluation index satisfies the preset index condition comprises: selecting, from the columns, a preset number of columns with the highest evaluation indexes.
  7. The method of claim 1, wherein the step of obtaining the partition potential of the column comprises: obtaining the number of unique values of the column from the metadata of the data table; and obtaining the partition potential of the column based on the number of unique values of the column, wherein the partition potential of the column is positively correlated with the number of unique values of the column; and/or, the step of obtaining the numerical balance of the column comprises: obtaining a histogram of the column from the metadata of the data table, wherein the histogram represents the count distribution of the values in the column; and obtaining the count balance of the values in the column based on the histogram of the column; and/or, the step of obtaining the computational efficiency of the column comprises: obtaining the data type of the column from the metadata of the data table; and determining the computational efficiency corresponding to the data type as the computational efficiency of the column.
  8. The method of claim 7, wherein the histogram of the column is an equal-width histogram of the column, the equal-width histogram characterizing the number of values in the column that fall within each of a plurality of adjacent first value intervals, the first value intervals all having the same width; and obtaining the count balance of the values in the column based on the histogram of the column comprises: obtaining a first dispersion statistic of the numbers of values falling within the plurality of adjacent first value intervals; and determining the count balance based on the first dispersion statistic, the count balance being negatively correlated with the first dispersion statistic; or, the histogram of the column is an equal-height histogram of the column, the equal-height histogram characterizing that the same number of values in the column falls within each of a plurality of adjacent second value intervals; and obtaining the count balance of the values in the column based on the histogram of the column comprises: obtaining the width of each second value interval; obtaining a second dispersion statistic of the widths of the second value intervals; and determining the count balance based on the second dispersion statistic, the count balance being negatively correlated with the second dispersion statistic.
  9. An electronic device comprising a memory and a processor, the processor being configured to execute program instructions stored in the memory to implement the method of any one of claims 1-8.
  10. A computer-readable storage medium or program product, on which program instructions or a computer program are stored, wherein the program instructions or computer program, when executed by a processor, implement the method of any one of claims 1-8.
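
As a minimal sketch of how the three per-column metrics named in claims 7 and 8 could be derived from table metadata, the Python below computes a partition potential from the unique-value count, a count balance from either kind of histogram, and a computational efficiency from the data type. The claims only require the stated positive or negative correlations; the normalization by row count, the coefficient of variation as the dispersion statistic, the 1/(1+cv) mapping, and the `TYPE_EFFICIENCY` table are all assumptions made for illustration.

```python
import statistics

# Hypothetical per-type efficiency values; the claims only say that each
# data type maps to some efficiency.
TYPE_EFFICIENCY = {"int": 1.0, "bigint": 1.0, "date": 0.9, "string": 0.5}


def partition_potential(distinct_count, row_count):
    """Grows with the number of unique values (assumed normalization)."""
    return distinct_count / row_count if row_count else 0.0


def computation_efficiency(data_type):
    """Look up the efficiency assumed for a column's data type."""
    return TYPE_EFFICIENCY.get(data_type, 0.3)


def balance_from_dispersion(samples):
    """Map a dispersion statistic to (0, 1]; more dispersion, less balance."""
    if not samples:
        return 0.0
    mean = statistics.fmean(samples)
    if mean == 0:
        return 0.0
    cv = statistics.pstdev(samples) / mean  # coefficient of variation
    return 1.0 / (1.0 + cv)


def balance_equal_width(bucket_counts):
    """Equal-width histogram: dispersion of the per-bucket value counts."""
    return balance_from_dispersion(bucket_counts)


def balance_equal_height(bucket_bounds):
    """Equal-height histogram: dispersion of the bucket widths."""
    widths = [hi - lo for lo, hi in zip(bucket_bounds, bucket_bounds[1:])]
    return balance_from_dispersion(widths)


if __name__ == "__main__":
    print(balance_equal_width([12, 9, 10, 9]))   # fairly even counts
    print(balance_equal_width([37, 1, 1, 1]))    # heavily skewed counts
    print(balance_equal_height([0, 10, 20, 30])) # evenly spaced bounds
```

Both histogram variants reduce to the same helper because, in either case, a perfectly balanced column yields identical samples (counts or widths) and therefore zero dispersion.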

Description

Logic plan generation method and related device

Technical Field

The present application relates to the field of data processing technologies, and in particular to a logical plan generation method and a related device.

Background

Apache Spark is a powerful distributed computing framework, and Spark SQL is the Apache Spark component for processing structured data. Spark SQL uses Structured Query Language (SQL) statements entered by the user to query and analyze data. The Spark SQL workflow is as follows: receive the SQL query statement entered by the user; parse it to obtain a syntax tree; convert the syntax tree into a logical plan; optimize the logical plan with RBO (Rule-Based Optimization) to obtain an optimized logical plan; convert the optimized logical plan into a physical plan; and execute the physical plan to carry out the distributed computation.

A window function in Spark SQL is a special class of function that performs calculations over a particular "window" (partition) of a data table while keeping the number of rows of the data table unchanged. The computations performed by window functions include, but are not limited to, sorting and ranking within data partitions and computing moving averages, running sums, and other statistical indicators. In Spark SQL window functions, partitioning (PARTITION BY) refers to dividing a data table into multiple logical groups according to the values of a specified partition column, each logical group being referred to as a partition or window.
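
The effect of PARTITION BY described above can be illustrated with a small pure-Python sketch. It is not Spark code: the function name `running_sum`, the row layout, and the sample data are assumptions made for illustration. It emulates a running sum computed either over the whole table or independently within each partition, and it returns one value per input row, so the row count is unchanged.

```python
from collections import defaultdict


def running_sum(rows, value_key, order_key, partition_key=None):
    """Emulate SUM(value) OVER ([PARTITION BY p] ORDER BY o) on a list of dicts.

    Ties on order_key are broken by input order (a ROWS-style frame).
    """
    # Group row indices by partition value; one group if no partition column.
    partitions = defaultdict(list)
    for index, row in enumerate(rows):
        key = row[partition_key] if partition_key is not None else None
        partitions[key].append(index)

    # Accumulate inside each partition and write back to the original slot.
    result = [None] * len(rows)
    for indices in partitions.values():
        total = 0
        for index in sorted(indices, key=lambda i: rows[i][order_key]):
            total += rows[index][value_key]
            result[index] = total
    return result


if __name__ == "__main__":
    rows = [
        {"shop": "a", "day": 1, "sales": 10},
        {"shop": "b", "day": 1, "sales": 5},
        {"shop": "a", "day": 2, "sales": 7},
        {"shop": "b", "day": 2, "sales": 1},
    ]
    print(running_sum(rows, "sales", "day", "shop"))  # [10, 5, 17, 6]
    print(running_sum(rows, "sales", "day"))          # [10, 15, 22, 23]
```

With a partition column, each shop's rows form an independent group that could be processed in parallel; without one, every row falls into a single group.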
The partition mechanism has the following advantages:

  • Accurate calculation: the window function computes independently within each partition, so calculations for different partitions do not interfere with each other.
  • High computational efficiency: the Spark cluster can process the partitions in parallel.
  • Resource optimization: the data volume of each partition is controllable, which avoids problems such as memory overflow and improves the utilization of the Spark cluster.
  • Multi-dimensional analysis: a data table can be analyzed along multiple dimensions at the same time, meeting the business need for multi-angle insight.

However, the partition mechanism in the related art cannot fully exploit these advantages.

Disclosure of Invention

The application provides a logical plan generation method and a related device, which can solve the problem that the partitioning mechanism in the related art cannot fully exploit the advantages of partitioning.
The application provides a logical plan generation method, comprising: receiving a data processing request from a user and generating a logical plan based on the data processing request; determining a window function operator to be optimized in the logical plan, wherein the window function operator to be optimized is a window function operator for which no partition column is specified; selecting at least one column from the data table specified by the window function operator to be optimized, based on at least one of the partition potential, numerical balance, and computational efficiency of each column in that data table, each of which is positively correlated with the probability that the column is selected, wherein the partition potential represents the degree of dispersion of the values in the corresponding column, the numerical balance represents how evenly the values in the column are distributed, and the computational efficiency represents the efficiency of partitioning by the corresponding column; and taking the selected column as the partition column specified by the window function operator to be optimized, to obtain an optimization result of the logical plan.
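
The method summarized above can be sketched in Python as follows. This is an illustrative model only, not Spark Catalyst code: `Operator`, `ColumnStats`, `score`, `optimize_plan`, the weights, and the `catalog` mapping are hypothetical names and values, and the weighted sum is just one possible way to combine the three metrics into an evaluation index.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ColumnStats:
    """Per-column metrics, assumed to be precomputed from table metadata."""
    name: str
    partition_potential: float
    balance: float
    efficiency: float


@dataclass(frozen=True)
class Operator:
    """Simplified logical plan operator (hypothetical structure)."""
    kind: str                     # e.g. "scan", "window"
    table: str = ""               # data table information field
    partition_columns: tuple = () # partition column information field


def score(col, weights=(0.5, 0.3, 0.2)):
    """Weighted evaluation index; the weights are assumed, not from the source."""
    w_potential, w_balance, w_efficiency = weights
    return (w_potential * col.partition_potential
            + w_balance * col.balance
            + w_efficiency * col.efficiency)


def optimize_plan(plan, catalog, top_k=1):
    """Fill in partition columns for window operators that specify none."""
    optimized = []
    for op in plan:
        # A window operator with an empty partition field is the target.
        if op.kind == "window" and not op.partition_columns:
            ranked = sorted(catalog[op.table], key=score, reverse=True)
            chosen = tuple(col.name for col in ranked[:top_k])
            op = replace(op, partition_columns=chosen)
        optimized.append(op)
    return optimized


if __name__ == "__main__":
    catalog = {"orders": [
        ColumnStats("order_id", 1.0, 0.9, 0.9),
        ColumnStats("status", 0.01, 0.4, 0.6),
        ColumnStats("note", 0.8, 0.5, 0.1),
    ]}
    plan = [Operator("scan", "orders"), Operator("window", "orders")]
    for op in optimize_plan(plan, catalog):
        print(op)
```

Operators that are not window functions, or that already name a partition column, pass through unchanged, which mirrors the operator-by-operator check described in the claims.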
The application provides a logical plan generation device, comprising a receiving module, a determining module, a selecting module, and a specifying module, wherein: the receiving module is configured to receive a data processing request from a user and generate a logical plan based on the data processing request; the determining module is configured to determine a window function operator to be optimized in the logical plan, the window function operator to be optimized being a window function operator for which no partition column is specified; the selecting module is configured to select at least one column from the data table specified by the window function operator to be optimized, based on at least one of the partition potential, numerical balance, and computational efficiency of each column in that data table; the partition potential, the numerical balance, and the computational efficiency are each positively correlated with the probability of being selected, the part