US-20260127195-A1 - Handling Data Skew For Parallel Execution With Server Mapping Plans
Abstract
Skew handling techniques are provided in parallel execution for even load balancing and scaling. In a compile-time solution, a dynamic sampling query is issued to detect partition skew. The compile-time solution determines the number of skewed partitions and uses a hybrid distribution scheme where skewed partitions use a random distribution and non-skewed partitions use the original server mapping. In a runtime solution, producer server processes create partition mapping vectors that contain partition mapping information. Each producer server process sends its partition mapping vector to the query coordinator (QC). The QC receives the partition mapping vectors from the producer server processes, merges the vectors, and determines a skew result based on the merged mapping vectors and sends the skew result to the producer server processes. The producer server process can alter distribution of skewed partitions based on the skew result.
Inventors
- Srikanth Bellamkonda
- Madhuri KANDEPI
- Yash Shah
- Shuo Chen
Assignees
- ORACLE INTERNATIONAL CORPORATION
Dates
- Publication Date
- 20260507
- Application Date
- 20251105
Claims (20)
- 1 . A method comprising: executing, by a database server, a statement involving loading rows from a source to a plurality of target partitions using a specified server mapping, wherein: the server mapping maps partitions of the plurality of target partitions to a set of server processes, the set of server processes includes a coordinator server process, a set of producer server processes, and a set of consumer server processes, the set of producer server processes send rows from the source to the set of consumer server processes based on the specified server mapping, executing the statement comprises: creating, by each producer server process, a partition mapping vector comprising partition mapping information for each partition of the plurality of target partitions; receiving, by the coordinator server process, a partition mapping vector from each producer server process; merging, by the coordinator server process, the partition mapping vectors from the set of producer server processes to form a merged partition mapping vector; determining, by the coordinator server process, a skew result based on the merged partition mapping vector; and determining a runtime server mapping based on the specified server mapping and the skew result, wherein the method is performed by one or more computing devices.
- 2 . The method of claim 1 , wherein: the set of consumer server processes include a plurality of server process groups, each server process group is mapped to one or more of the partitions, and each server process group of the plurality of server process groups comprises one or more server processes collocated on a single instance of the database server.
- 3 . The method of claim 1 , wherein: the set of consumer server processes include a plurality of server process groups, each server process group is mapped to one or more of the partitions, executing the statement further comprises sending the skew result from the coordinator server process to each server process in the set of producer server processes, a given producer server process alters a server mapping for rows of a given partition to randomize mapping of rows of the given partition to consumer server processes in a server process group corresponding to the given partition.
- 4 . The method of claim 1 , wherein determining the skew result comprises determining that rows of the source are mapped to all partitions of the plurality of target partitions and that a number of rows mapped to a first subset of partitions is greater than a number of rows mapped to a remainder of the partitions by a predetermined threshold.
- 5 . The method of claim 1 , wherein the specified server mapping is a random local distribution or a partition key distribution.
- 6 . The method of claim 1 , wherein: the statement specifies a create partitioned table operation, and each partition mapping vector comprises a row count for each partition.
- 7 . The method of claim 1 , wherein: the statement specifies a join operation of two tables, each producer server process buffers rows encountered in a build side of the join operation and generates the partition mapping vector based on the buffered rows, the partition mapping vector comprise a bit vector, and each bit in the bit vector indicates whether a corresponding partition participates in a probe side of the join operation.
- 8 . The method of claim 1 , wherein: the skew result indicates one of: no skew is detected, extreme skew is detected, or moderate skew is detected, in response to the skew result indicating no skew is detected, the specified server mapping is used, in response to the skew result indicating extreme skew is detected, server mapping is disabled, and in response to the skew result indicating moderate skew is detected, rebuilding the specified server mapping.
- 9 . A method comprising: compiling, by a database server, a statement for loading a source table to a partitioned target table using a specified server mapping to form a compiled statement, wherein: the server mapping maps partitions of the partitioned target table to a set of server processes, the set of server processes includes a plurality of server process groups, each server process group is mapped to one or more of the partitions, compiling the statement comprises: issuing a dynamic sampling query to determine partition numbers that are mapped to the partitioned target table and a number of rows of each partition; identifying a first subset of one or more partitions that are skewed based on the number of rows of each partition; and changing server mapping of the first subset of one or more partitions to a randomized server mapping; and executing, by the database server, the compiled statement, wherein the method is performed by one or more computing devices.
- 10 . The method of claim 9 , wherein each server process group of the plurality of server process groups comprises one or more server processes collocated on a single instance of the database server.
- 11 . The method of claim 9 , wherein identifying the first subset of one or more partitions comprises determining that all rows of the source table are mapped to the first subset of one or more partitions in the partitioned target table.
- 12 . The method of claim 9 , wherein identifying the first subset of one or more partitions comprises determining that rows of the source table are mapped to all partitions of the partitioned target table and that the number of rows mapped to the first subset of one or more partitions is greater than the number of rows mapped to a remainder of the partitions by a predetermined threshold.
- 13 . The method of claim 9 , wherein the specified server mapping is a random local distribution or a partition key distribution.
- 14 . One or more non-transitory computer-readable media storing instructions which, when executed by one or more processors, causes performance of: executing, by a database server, a statement involving loading rows from a source to a plurality of target partitions using a specified server mapping, wherein: the server mapping maps partitions of the plurality of target partitions to a set of server processes, the set of server processes includes a coordinator server process, a set of producer server processes, and a set of consumer server processes, the set of producer server processes send rows from the source to the set of consumer server processes based on the specified server mapping, executing the statement comprises: creating, by each producer server process, a partition mapping vector comprising partition mapping information for each partition of the plurality of target partitions; receiving, by the coordinator server process, a partition mapping vector from each producer server process; merging, by the coordinator server process, the partition mapping vectors from the set of producer server processes to form a merged partition mapping vector; determining, by the coordinator server process, a skew result based on the merged partition mapping vector; and determining a server mapping based on the specified server mapping and the skew result.
- 15 . The one or more non-transitory computer-readable media of claim 14 , wherein: the set of consumer server processes include a plurality of server process groups, each server process group is mapped to one or more of the partitions, and each server process group of the plurality of server process groups comprises one or more server processes collocated on a single instance of the database server.
- 16 . The one or more non-transitory computer-readable media of claim 14 , wherein: the set of consumer server processes include a plurality of server process groups, each server process group is mapped to one or more of the partitions, executing the statement further comprises sending the skew result from the coordinator server process to each server process in the set of producer server processes, a given producer server process alters a server mapping for rows of a given partition to randomize mapping of rows of the given partition to consumer server processes in a server process group corresponding to the given partition.
- 17 . The one or more non-transitory computer-readable media of claim 14 , wherein determining the skew result comprises determining that rows of the source are mapped to all partitions of the plurality of target partitions and that a number of rows mapped to a first subset of partitions is greater than a number of rows mapped to a remainder of the partitions by a predetermined threshold.
- 18 . The one or more non-transitory computer-readable media of claim 14 , wherein the specified server mapping is a random local distribution or a partition key distribution.
- 19 . The one or more non-transitory computer-readable media of claim 14 , wherein: the statement specifies a create partitioned table operation, and each partition mapping vector comprises a row count for each partition.
- 20 . The one or more non-transitory computer-readable media of claim 14 , wherein: the statement specifies a join operation of two tables, each producer server process buffers rows encountered in a build side of the join operation and generates the partition mapping vector based on the buffered rows, the partition mapping vector comprise a bit vector, and each bit in the bit vector indicates whether a corresponding partition participates in a probe side of the join operation.
Description
BENEFIT CLAIM This application claims the benefit under 35 U.S.C. § 121 as a divisional application of application Ser. No. 18/821,805, filed Oct. 30, 2024, by Bellamkonda et al., the entire content each of which is hereby incorporated by reference for all purposes as if fully set forth herein. FIELD OF THE INVENTION The present invention relates to parallel execution with partition and server mapping. More particularly, the present invention relates to data skew aware scaling of query plans with mapping of partitions to parallel server processes. BACKGROUND Parallel execution is the ability to apply multiple processor and input/output (I/O) resources to the execution of a single statement, such as a Structured Query Language (SQL) statement by using multiple computer system processes, referred to herein as servers or server processes herein. Parallel execution dramatically reduces response time for data-intensive operations on large databases often associated with a decision support system (DSS) and data warehouse. As another example, parallel execution can be implemented on an online transaction processing (OLTP) system for batch processing or schema maintenance operations, such as index creation. Parallel execution may improve processing for: queries requiring large table scans, joins, or partitioned index scans;creation of large indexes;creation of large tables, including materialized views; or,bulk insertions, updates, merges, and deletions. Parallel execution is beneficial when a query references a large data set, there is low concurrency, or elapsed time is important. Parallel execution uses a producer/consumer model. A parallel execution plan is carried out as a series of producer/consumer operations. Parallel execution (PX) server processes that produce data for subsequent operations are called producers or send-side server processes. A query coordinator (QC), also referred to as a PX coordinator, is the session that initiates the parallel execution and coordinates the producer/consumer operations. PX server processes that require the output of other operations are called consumers or receive-side server processes. Each producer or consumer parallel operation is performed by a set of PX server processes called PX server sets. The number of PX servers in a PX server set is called the Degree of Parallelism (DOP). Query optimizers attempt to leverage partitioning during query execution plan generation to minimize data transmission and to have a higher likelihood for in-memory operations. However, these schemes do not scale well when the number of partitions is small compared to the degree of parallelism. Parallel query server mapping is used in cases where the number of partitions is smaller than the possible DOP. Without server mapping, the DOP of partition-based operations is limited to the number of partitions accessed. In other words, there is a one-to-one mapping of partitions to server processes. To achieve higher parallelism, the data may be sub-partitioned further, which results in increased overhead for customers, who may decide not to use partition-based operations and alternatively use hash-based operations instead. On the other hand, using hash-based operations negates the benefits of the higher degree of parallelism with sever costs related to redistribution of data. Server mapping allows the database server engine to use a higher degree of parallelism for partition-based operations, such as loads, joins, etc. When using server mapping, the server set is divided into groups called “server groups.” All server processes in a server group are collocated in a single instance. These server groups work on one or a subset of partitions involved in that operation. The intent of this technique is to leverage the partitioning layout of the underlying objects to avoid the communication overhead by not using the interconnect to redistribute the data and keep the communication local to an instance. Using this technique, redistribution of data across the interconnect is avoided, and a higher degree of parallelism is achieved without having to sub-partition the data further. Server mapping techniques are generally identified using “LOCAL” and “PARITION KEY” keywords in the execution plan. These keywords refer to the idea that the communication is local within an instance and partition-based distributions. The mapping between server groups and partitions is static, and this can result in significant degradation in performance if there is skew in a partition. There are two types of possible partition skew: 1. Skew due to partition pruning: This issue is observed for joins utilizing server mapping techniques. Based on the data observed on the build side, the probe side will only access partitions that will participate in the join. This can lead to server groups with no partitions to work on or “idle groups” leading to poor query execution.2. Skew due to partition sizes: This issue is observed for both joi