Execute Queries in Parallel
This document describes the use cases, methods, limitations, and frequently asked questions for executing queries in parallel in SynxDB. SynxDB provides a parallel query feature that dynamically adjusts the degree of parallelism for plan operators (such as SeqScan) based on the data volume during query execution. Parallel query aims to improve query performance by using multiple CPU cores to process a single query.
Use cases
Deploy a small number of segments on each physical machine and improve performance by dynamically adjusting the degree of parallelism, instead of deploying a large number of segments.
Enabling operator parallelism provides a performance advantage when the host CPU and disk load are not high.
How to use
SynxDB supports parallel queries on AO/AOCO tables and heap tables. Both GPORCA and the Postgres query optimizer support parallel execution strategies.
Parallel execution with GPORCA
GPORCA supports intra-segment parallel query execution, using multiple CPU cores on each segment to reduce execution time for long-running queries.
Key capabilities include:
- Parallel table scan: Scans a single table using multiple workers in parallel.
- Parallel hash join: Allows multiple workers to execute hash join operations concurrently, supporting shared hash tables. Supported join types include Inner Join, Left Outer Join, Semi Join (used in `EXISTS` subqueries), and Left Anti Semi Join (used in `NOT IN` subqueries).
- Parallel aggregation: Supports parallel group and hash aggregation, enabling workers to perform partial aggregations locally.
- Parallel index scan: Uses multiple workers to execute an index scan in parallel for queries with selective predicates on indexed columns.
- Parallel UNION ALL: Executes `UNION` and `UNION ALL` queries in parallel within segments, improving throughput for queries that combine multiple result sets. Controlled by the `optimizer_parallel_union` parameter.
- Parallel index only scan: Uses multiple workers to execute an index-only scan in parallel, reading results entirely from the index without accessing table data blocks.
To use parallel queries with GPORCA, ensure that `optimizer` is enabled (the default) and that `enable_parallel` is set to `ON`.
SET enable_parallel = ON;
SET optimizer = ON; -- Default
SET statement_mem = '256MB'; -- At least 32MB is recommended for GPORCA parallel
The following prerequisites apply to GPORCA parallel execution:
- Parallel mode must be enabled (`enable_parallel = on`).
- Tables involved must not be replicated tables.
- The query plan must include other parallel-capable operators.
- `statement_mem` should be set to at least `32MB` (for example, `SET statement_mem = '256MB';`).
- Tables should have the `parallel_workers` storage parameter set to specify the number of parallel workers per segment. For example, `CREATE TABLE t (a INT) WITH (parallel_workers=2) DISTRIBUTED BY (a);`.
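The prerequisites above can be checked in a single short session. The following is a minimal sketch; it assumes that `parallel_workers` can also be set on an existing table with the standard PostgreSQL storage-parameter syntax (`ALTER TABLE ... SET (...)`), which is not confirmed by this document:

```sql
-- Assumption: parallel_workers can be changed on an existing table
-- using the standard ALTER TABLE storage-parameter syntax.
ALTER TABLE t SET (parallel_workers = 4);

SET enable_parallel = on;
SET statement_mem = '256MB';  -- at least 32MB for GPORCA parallel

-- Inspect the plan; parallel operators (for example, Parallel Seq Scan)
-- should appear when the plan qualifies for parallel execution.
EXPLAIN SELECT count(*) FROM t;
```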
Parallel index scan with GPORCA
When a query has a selective predicate on an indexed column, GPORCA evaluates whether a parallel index scan is more cost-effective than a parallel sequential scan and selects the plan accordingly. This applies to standard B-tree indexes.
This optimization applies to a wide range of query patterns, including queries with LIMIT, ORDER BY with LIMIT, various join types (Left Join, Right Join, Semi Join, Anti-Semi Join), IN subqueries, and aggregations.
Query plan example
The following example shows a parallel index scan plan generated by GPORCA when enable_parallel is ON:
CREATE TABLE t (a int, b int) WITH (parallel_workers=2) DISTRIBUTED BY (a);
CREATE INDEX t_a_idx ON t(a);
INSERT INTO t SELECT i, i FROM generate_series(1, 10000) i;
ANALYZE t;
SET enable_parallel = on;
EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t WHERE a < 1000;
Gather Motion 6:1 (slice1; segments: 6)
Output: a, b
-> Parallel Index Scan using t_a_idx on t
Output: a, b
Index Cond: (t.a < 1000)
Optimizer: GPORCA
To disable parallel index scan so that GPORCA falls back to a parallel sequential scan, set the optimizer_enable_indexscan GUC to off:
SET optimizer_enable_indexscan = off;
Parallel index only scan with GPORCA
When a query selects only columns covered by an index, GPORCA evaluates whether a parallel index-only scan is more cost-effective than a parallel index scan or sequential scan. If chosen, workers read results entirely from the index without accessing table data blocks. This applies to B-tree indexes only; bitmap AM indexes are not supported.
The following prerequisites apply in addition to the general GPORCA parallel prerequisites:
- The index must cover all columns referenced by the query (a covering index).
- `VACUUM` must have been run on the table so that the visibility map is populated. Without an up-to-date visibility map, the optimizer's index-only scan cost equals the index scan cost, and it may prefer a parallel index scan instead.
- Bitmap AM indexes are not supported for parallel index-only scans.
Query plan example
The following example shows a parallel index-only scan plan generated by GPORCA when enable_parallel is ON:
CREATE TABLE t (a int, b int) WITH (parallel_workers=2) DISTRIBUTED BY (a);
CREATE INDEX t_a_idx ON t(a);
INSERT INTO t SELECT i, i FROM generate_series(1, 200000) i;
VACUUM ANALYZE t;
SET enable_parallel = on;
EXPLAIN (VERBOSE, COSTS OFF) SELECT a FROM t WHERE a < 1000;
Gather Motion 6:1 (slice1; segments: 6)
Output: a
-> Parallel Index Only Scan using t_a_idx on t
Output: a
Index Cond: (t.a < 1000)
Optimizer: GPORCA
Note that the query selects only a, which is covered by t_a_idx. If you also selected a column not in the index, GPORCA would use a Parallel Index Scan instead.
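Continuing the example above, the non-covered case can be sketched as follows (the exact plan depends on cost estimates, so no specific output is shown):

```sql
-- Column b is not covered by t_a_idx, so an index-only scan is not
-- possible; GPORCA is expected to fall back to a Parallel Index Scan
-- (or another parallel plan) depending on costs.
EXPLAIN (COSTS OFF) SELECT a, b FROM t WHERE a < 1000;
```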
To disable parallel index only scan, set the optimizer_enable_indexonlyscan GUC to off:
SET optimizer_enable_indexonlyscan = off;
Parallel execution with Postgres query optimizer
You can also use the Postgres query optimizer for parallel execution. In this case, you must explicitly disable GPORCA.
SET enable_parallel = ON;
SET optimizer = OFF;
Query heap tables in parallel
Enable the parallel query feature.
If using GPORCA:
SET enable_parallel = ON;
If using the Postgres query optimizer:
SET enable_parallel = ON;
SET optimizer = OFF;
Set the maximum degree of parallelism.
-- The setting should consider the number of CPU cores and segments.
SET max_parallel_workers_per_gather = 4;
Query example:
SET statement_mem = '256MB';
CREATE TABLE t1 (c1 int, c2 int, c3 int, c4 box) WITH (parallel_workers=2) DISTRIBUTED BY (c1);
INSERT INTO t1 SELECT x, 2*x, 3*x, box('6,6,6,6') FROM generate_series(1,1000000) AS x;
SELECT count(*) FROM t1;
Query AO/AOCO tables in parallel
Enable the parallel query feature.
If using GPORCA:
SET enable_parallel = ON;
If using the Postgres query optimizer:
SET enable_parallel = ON;
SET optimizer = OFF;
Set the maximum degree of parallelism.
-- The setting should consider the number of CPU cores and segments.
SET max_parallel_workers_per_gather = 4;
-- Set how many files to insert data into within one transaction.
-- A larger value might decrease performance, especially for AOCO tables.
SET gp_appendonly_insert_files = 8;
-- Switch to the next file for data insertion every 100,000 rows. This can be
-- adjusted to distribute data as evenly as possible across multiple files.
SET gp_appendonly_insert_files_tuples_range = 100000;
Insert and query.
CREATE TABLE ao(x INT) USING ao_row;
INSERT INTO ao SELECT i FROM generate_series(1, 1000000) i;
-- It is recommended to update statistics first.
ANALYZE ao;
SELECT COUNT(*) FROM ao;
Query partitioned tables in parallel
GPORCA supports two complementary features for partitioned table scan performance:
- Parallel Append (`optimizer_enable_parallel_append`): Parallelizes segment-level scans across partitions. GPORCA evaluates both a parallel append path and the default dynamic scan path, selecting the lower-cost plan.
- Parallel Partition Selector (`optimizer_enable_parallel_part_selector`): When parallel append is enabled, this further parallelizes dynamic partition elimination (DPE) across workers using a shared-memory bitmapset, so workers coordinate partition pruning without redundant scans. This is enabled by default.
Enable the parallel query feature and GPORCA optimizer.
SET enable_parallel = ON;
SET optimizer = ON; -- Default
Enable parallel append for partitioned tables.
SET optimizer_enable_parallel_append = on;
The parallel partition selector (`optimizer_enable_parallel_part_selector`) is enabled by default and takes effect automatically once parallel append is turned on. To disable it explicitly:
SET optimizer_enable_parallel_part_selector = off;
Set the maximum degree of parallelism.
SET max_parallel_workers_per_gather = 4;
Query example:
CREATE TABLE sales (id INT, amount DECIMAL)
  DISTRIBUTED BY (id)
  PARTITION BY RANGE (id) (START (1) END (1000001) EVERY (100000));
INSERT INTO sales SELECT i, i * 1.5 FROM generate_series(1, 1000000) i;
ANALYZE sales;
SELECT SUM(amount) FROM sales;
Note
Parallel append is a cost-based optimization. GPORCA evaluates the cost of both the parallel append path and the default dynamic scan path, and selects the plan with the lower estimated cost. In environments with a small number of segments or limited data volume, the optimizer may determine that the dynamic scan path is more efficient, and therefore not choose the parallel append plan.
When a subquery or subplan appears in the `WHERE` clause of a partitioned table join, GPORCA falls back to using Dynamic Seq Scan for partition elimination, which handles subplans correctly.
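A hypothetical query of this shape (reusing the `sales` table from the earlier example) would trigger the fallback; no specific plan output is implied:

```sql
-- The subplan produced by the scalar subquery in the WHERE clause causes
-- GPORCA to use Dynamic Seq Scan for partition elimination.
SELECT s1.id, s1.amount
FROM sales s1
JOIN sales s2 ON s1.id = s2.id
WHERE s1.amount > (SELECT avg(amount) FROM sales);
```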
Execute UNION operations in parallel
SynxDB parallelizes UNION operations by introducing a mechanism called Parallel-oblivious Append. This mechanism allows multiple worker processes to process data independently in a shared-nothing manner, thereby improving the performance and reliability of UNION operations.
When you execute a query containing a UNION, operators such as Parallel Seq Scan and Streaming HashAggregate in the execution plan (EXPLAIN) indicate that the query is being executed in parallel.
Query example:
SELECT DISTINCT a FROM t_distinct_0 UNION SELECT DISTINCT b FROM t_distinct_0;
The execution plan for the query might look like this:
QUERY PLAN
----------------------------------------------------------------------
Gather Motion 6:1 (slice1; segments: 6)
-> HashAggregate
Group Key: t_distinct_0.a
-> Redistribute Motion 6:6 (slice2; segments: 6)
Hash Key: t_distinct_0.a
Hash Module: 3
-> Append
-> GroupAggregate
Group Key: t_distinct_0.a
-> Sort
Sort Key: t_distinct_0.a
-> Redistribute Motion 6:6 (slice3;
segments: 6)
Hash Key: t_distinct_0.a
Hash Module: 3
-> Streaming HashAggregate
Group Key: t_distinct_0.a
-> Parallel Seq Scan on
t_distinct_0
-> GroupAggregate
Group Key: t_distinct_0_1.b
-> Sort
Sort Key: t_distinct_0_1.b
-> Redistribute Motion 6:6 (slice4;
segments: 6)
Hash Key: t_distinct_0_1.b
Hash Module: 3
-> Streaming HashAggregate
Group Key: t_distinct_0_1.b
-> Parallel Seq Scan on
t_distinct_0 t_distinct_0_1
Execute DISTINCT operations in parallel
In a Massively Parallel Processing (MPP) architecture, SynxDB supports parallelizing DISTINCT operations to accelerate query performance.
The core principle is to use Motion operators to redistribute tuples among multiple worker processes. Even if the data’s distribution key does not match the DISTINCT target expression, the system can still generate a parallel plan based on a Parallel Scan of the underlying table.
This process is usually divided into multiple stages:
- Pre-deduplication: Perform an initial parallel deduplication on partial data using the `Streaming HashAggregate` or `HashAggregate` operator.
- Redistribution: Redistribute the data according to the `DISTINCT` expression.
- Final deduplication: Perform the final `DISTINCT` operation.
Query example:
SELECT DISTINCT a FROM t_distinct_0;
Its parallel execution plan might look like the following, where Streaming HashAggregate and Parallel Seq Scan indicate multi-stage parallel processing:
QUERY PLAN
------------------------------------------------------------
Gather Motion 6:1 (slice1; segments: 6)
-> HashAggregate
Group Key: a
-> Redistribute Motion 6:6 (slice2; segments: 6)
Hash Key: a
Hash Module: 3
-> Streaming HashAggregate
Group Key: a
-> Parallel Seq Scan on t_distinct_0
Optimizer: Postgres query optimizer
In addition, the DISTINCT operation is often treated as a special case of an aggregation operation in its implementation. For example, the query SELECT DISTINCT a, b FROM t_distinct_0; is logically equivalent to SELECT a, b FROM t_distinct_0 GROUP BY a, b;.
SynxDB supports parallelizing such aggregate queries with GROUP BY. To achieve parallel aggregation, the system uses different aggregation operators, such as HashAggregate, GroupAggregate, and Streaming HashAggregate. These are concrete implementations of aggregation operations and may appear in the execution plan regardless of whether the query explicitly includes a GROUP BY clause.
As shown in the example below, a query with multiple DISTINCT columns:
EXPLAIN (COSTS OFF)
SELECT DISTINCT a, b FROM t_distinct_0;
Its execution plan will include a parallel GroupAggregate and Parallel Seq Scan:
QUERY PLAN
-----------------------------------------------------------
GroupAggregate
Group Key: a, b
-> Gather Motion 6:1 (slice1; segments: 6)
Merge Key: a, b
-> GroupAggregate
Group Key: a, b
-> Sort
Sort Key: a, b
-> Parallel Seq Scan on t_distinct_0
Optimizer: Postgres query optimizer
Parameter descriptions
| Parameter | Description | Required |
|---|---|---|
| `enable_parallel` | Whether to enable the parallel query feature. | Yes |
| `optimizer` | Whether to enable the GPORCA optimizer. Enabled by default. | Yes |
| `max_parallel_workers_per_gather` | Sets the maximum number of parallel processes per executor node. | Yes |
| `gp_appendonly_insert_files` | The number of files into which data is inserted within one transaction. | No |
| `gp_appendonly_insert_files_tuples_range` | The number of rows after which to switch to the next file for data insertion. | No |
| `statement_mem` | Sets the memory limit per query per segment. It is recommended to set this to at least `32MB` for GPORCA parallel execution. | No |
| `parallel_setup_cost` | Sets the planner's estimate of the cost of starting up worker processes for parallel query. | No |
| `optimizer_enable_indexscan` | Controls whether GPORCA generates index scan plans (including parallel index scans) when parallel execution is enabled. When set to `off`, GPORCA falls back to a parallel sequential scan. | No |
| `optimizer_enable_indexonlyscan` | Controls whether GPORCA generates index-only scan plans (including parallel index-only scans) when parallel execution is enabled. Requires a populated visibility map (run `VACUUM` first). | No |
| `optimizer_enable_parallel_append` | Enables parallel append for seq/bitmap/index scans on partitioned tables in ORCA. | No |
| `optimizer_enable_parallel_part_selector` | Enables the parallel partition selector for parallel append plans. When enabled, workers share a bitmapset in shared memory to coordinate dynamic partition elimination (DPE) across parallel workers. Takes effect only when `optimizer_enable_parallel_append` is on. Enabled by default. | No |
|  | Enables the optimizer's use of parallel group aggregate (streaming) plans in ORCA. | No |
|  | Enables the optimizer's use of parallel hash aggregate plans in ORCA. | No |
| `optimizer_parallel_union` | Enables parallel execution for UNION/UNION ALL queries in ORCA. When enabled, GPORCA distributes UNION/UNION ALL work across multiple workers within each segment. | No |
Frequently asked questions
Currently, parallel execution is supported for queries containing the following operators; queries with other operators do not run in parallel.

- sequence scan
- index scan
- index only scan
- bitmap heap scan
- append
- parallel append
- hash join
- nestloop join
- merge join
- hash aggregate
- group aggregate
Parallel query does not always improve query performance. An excessively high degree of parallelism can overload the system and degrade performance instead.
Enabling parallelism multiplies memory consumption, which can lead to out-of-memory errors.
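If parallel queries hit memory limits, one mitigation sketch is to lower the degree of parallelism and the per-query memory budget. The values below are illustrative, not recommendations:

```sql
-- Reduce the number of workers launched per gather node.
SET max_parallel_workers_per_gather = 2;
-- Keep the per-query memory limit within the segment host's budget.
SET statement_mem = '128MB';
-- Or disable parallelism entirely for a problematic session.
SET enable_parallel = off;
```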