Execute Queries in Parallel

This document describes the use cases, methods, limitations, and frequently asked questions for executing queries in parallel in SynxDB. SynxDB introduces the parallel query feature, in which the degree of parallelism for operators (such as the SeqScan operator) is dynamically adjusted based on the data volume during query execution. Parallel query aims to improve query performance by using multiple CPU cores to process a single query.

Use cases

  • Deploy a small number of segments per physical machine and improve performance by dynamically adjusting the degree of parallelism, rather than by deploying a large number of segments.

  • Enable operator parallelism when the host CPU and disk load are low, where it provides a performance advantage.

How to use

SynxDB supports parallel queries on AO/AOCO tables and heap tables. Both GPORCA and the Postgres query optimizer support parallel execution strategies.

Parallel execution with GPORCA

GPORCA supports intra-segment parallel query execution, using multiple CPU cores on each segment to reduce execution time for long-running queries.

Key capabilities include:

  • Parallel table scan: Scans a single table using multiple workers in parallel.

  • Parallel hash join: Allows multiple workers to execute hash join operations concurrently, supporting shared hash tables. Supported join types include Inner Join, Left Outer Join, Semi Join (used in EXISTS subqueries), and Left Anti Semi Join (used in NOT IN subqueries).

  • Parallel aggregation: Supports parallel group and hash aggregation, enabling workers to perform partial aggregations locally.

  • Parallel index scan: Uses multiple workers to execute an index scan in parallel for queries with selective predicates on indexed columns.

  • Parallel UNION ALL: Executes UNION and UNION ALL queries in parallel within segments, improving throughput for queries combining multiple result sets. Controlled by the optimizer_parallel_union parameter.

  • Parallel index only scan: Uses multiple workers to execute an index-only scan in parallel, reading results entirely from the index without accessing table data blocks.

To use parallel queries with GPORCA, ensure optimizer is enabled (default) and enable_parallel is set to ON.

SET enable_parallel = ON;
SET optimizer = ON; -- Default
SET statement_mem = '256MB'; -- At least 32MB is recommended for GPORCA parallel

The following prerequisites apply to GPORCA parallel execution:

  • Parallel mode must be enabled (enable_parallel = on).

  • Tables involved must not be replicated tables.

  • The query plan must contain parallel-capable operators.

  • statement_mem should be set to at least 32MB (for example, SET statement_mem = '256MB';).

  • Tables should have the parallel_workers storage parameter set to specify the number of parallel workers per segment. For example, CREATE TABLE t (a INT) WITH (parallel_workers=2) DISTRIBUTED BY (a);.
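
The parallel_workers storage parameter can also be changed on an existing table rather than only at creation time. A minimal sketch, assuming a table t already exists:

ALTER TABLE t SET (parallel_workers = 4);

The new setting applies to subsequently planned queries on the table.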

Parallel index scan with GPORCA

When a query has a selective predicate on an indexed column, GPORCA evaluates whether a parallel index scan is more cost-effective than a parallel sequential scan and selects the plan accordingly. This applies to standard B-tree indexes.

This optimization applies to a wide range of query patterns, including queries with LIMIT, ORDER BY with LIMIT, various join types (Left Join, Right Join, Semi Join, Anti-Semi Join), IN subqueries, and aggregations.

Query plan example

The following example shows a parallel index scan plan generated by GPORCA when enable_parallel is ON:

CREATE TABLE t (a int, b int) WITH (parallel_workers=2) DISTRIBUTED BY (a);
CREATE INDEX t_a_idx ON t(a);
INSERT INTO t SELECT i, i FROM generate_series(1, 10000) i;
ANALYZE t;

SET enable_parallel = on;
EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t WHERE a < 1000;
 Gather Motion 6:1  (slice1; segments: 6)
   Output: a, b
   ->  Parallel Index Scan using t_a_idx on t
         Output: a, b
         Index Cond: (t.a < 1000)
 Optimizer: GPORCA

To disable parallel index scan so that GPORCA falls back to a parallel sequential scan, set the optimizer_enable_indexscan GUC to off:

SET optimizer_enable_indexscan = off;
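
After disabling it, the plan for the earlier query might fall back to a parallel sequential scan. The exact plan depends on your cluster, but it might look like this:

EXPLAIN (VERBOSE, COSTS OFF) SELECT * FROM t WHERE a < 1000;
 Gather Motion 6:1  (slice1; segments: 6)
   Output: a, b
   ->  Parallel Seq Scan on t
         Output: a, b
         Filter: (t.a < 1000)
 Optimizer: GPORCA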

Parallel index only scan with GPORCA

When a query selects only columns covered by an index, GPORCA evaluates whether a parallel index-only scan is more cost-effective than a parallel index scan or sequential scan. If chosen, workers read results entirely from the index without accessing table data blocks. This applies to B-tree indexes only; bitmap AM indexes are not supported.

The following prerequisites apply in addition to the general GPORCA parallel prerequisites:

  • The index must cover all columns referenced by the query (a covering index).

  • VACUUM must have been run on the table so that the visibility map is populated. Without an up-to-date visibility map, the optimizer’s index-only scan cost equals the index scan cost, and it may prefer a parallel index scan instead.

  • Bitmap AM indexes are not supported for parallel index-only scans.

Query plan example

The following example shows a parallel index-only scan plan generated by GPORCA when enable_parallel is ON:

CREATE TABLE t (a int, b int) WITH (parallel_workers=2) DISTRIBUTED BY (a);
CREATE INDEX t_a_idx ON t(a);
INSERT INTO t SELECT i, i FROM generate_series(1, 200000) i;
VACUUM ANALYZE t;

SET enable_parallel = on;
EXPLAIN (VERBOSE, COSTS OFF) SELECT a FROM t WHERE a < 1000;
 Gather Motion 6:1  (slice1; segments: 6)
   Output: a
   ->  Parallel Index Only Scan using t_a_idx on t
         Output: a
         Index Cond: (t.a < 1000)
 Optimizer: GPORCA

Note that the query selects only a, which is covered by t_a_idx. If you also selected a column not in the index, GPORCA would use a Parallel Index Scan instead.

To disable parallel index only scan, set the optimizer_enable_indexonlyscan GUC to off:

SET optimizer_enable_indexonlyscan = off;
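
With index-only scans disabled, GPORCA might instead choose a parallel index scan for the same query. The exact plan depends on your cluster, but it might look like this:

EXPLAIN (VERBOSE, COSTS OFF) SELECT a FROM t WHERE a < 1000;
 Gather Motion 6:1  (slice1; segments: 6)
   Output: a
   ->  Parallel Index Scan using t_a_idx on t
         Output: a
         Index Cond: (t.a < 1000)
 Optimizer: GPORCA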

Parallel execution with Postgres query optimizer

You can also use the Postgres query optimizer for parallel execution. In this case, you must explicitly disable GPORCA.

SET enable_parallel = ON;
SET optimizer = OFF;

Query heap tables in parallel

  1. Enable the parallel query feature.

    • If using GPORCA:

      SET enable_parallel = ON;
      
    • If using the Postgres query optimizer:

      SET enable_parallel = ON;
      SET optimizer = OFF;
      
  2. Set the maximum degree of parallelism.

    -- The setting should consider the number of CPU cores and segments.
    SET max_parallel_workers_per_gather = 4;
    

    Query example:

    SET statement_mem = '256MB';
    CREATE TABLE t1 (c1 int, c2 int, c3 int, c4 box) WITH (parallel_workers=2) DISTRIBUTED BY (c1);
    INSERT INTO t1 SELECT x, 2*x, 3*x, box('6,6,6,6') FROM generate_series(1,1000000) AS x;
    SELECT count(*) FROM t1;
    
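
    To verify that the query runs in parallel, inspect its plan with EXPLAIN. With the Postgres query optimizer, the plan might look like this (the exact shape depends on your cluster):

    EXPLAIN (COSTS OFF) SELECT count(*) FROM t1;
     Finalize Aggregate
       ->  Gather Motion 6:1  (slice1; segments: 6)
             ->  Partial Aggregate
                   ->  Parallel Seq Scan on t1
     Optimizer: Postgres query optimizer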

Query AO/AOCO tables in parallel

  1. Enable the parallel query feature.

    • If using GPORCA:

      SET enable_parallel = ON;
      
    • If using the Postgres query optimizer:

      SET enable_parallel = ON;
      SET optimizer = OFF;
      
  2. Set the maximum degree of parallelism.

    -- The setting should consider the number of CPU cores and segments.
    SET max_parallel_workers_per_gather = 4;
    
    -- Set how many files to insert data into within one transaction. A larger value might decrease performance, especially for AOCO tables.
    SET gp_appendonly_insert_files = 8;
    
    -- Switch to the next file for data insertion every 100,000 rows. This can be adjusted to distribute data as evenly as possible across multiple files.
    SET gp_appendonly_insert_files_tuples_range = 100000;
    
  3. Insert and query.

    CREATE TABLE ao(x INT) USING ao_row;
    INSERT INTO ao SELECT i FROM generate_series(1, 1000000) i;
    -- It is recommended to update statistics first.
    ANALYZE ao;
    SELECT COUNT(*) FROM ao;
    
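
    As with heap tables, you can confirm parallel execution with EXPLAIN. With the Postgres query optimizer, the plan might look like this (the exact shape depends on your cluster and file layout):

    EXPLAIN (COSTS OFF) SELECT COUNT(*) FROM ao;
     Finalize Aggregate
       ->  Gather Motion 6:1  (slice1; segments: 6)
             ->  Partial Aggregate
                   ->  Parallel Seq Scan on ao
     Optimizer: Postgres query optimizer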

Query partitioned tables in parallel

GPORCA supports two complementary features for partitioned table scan performance:

  • Parallel Append (optimizer_enable_parallel_append): Parallelizes segment-level scans across partitions. GPORCA evaluates both a parallel append path and the default dynamic scan path, selecting the lower-cost plan.

  • Parallel Partition Selector (optimizer_enable_parallel_part_selector): When parallel append is enabled, this further parallelizes dynamic partition elimination (DPE) across workers using a shared-memory bitmapset, so workers coordinate partition pruning without redundant scans. This is enabled by default.

  1. Enable the parallel query feature and GPORCA optimizer.

    SET enable_parallel = ON;
    SET optimizer = ON; -- Default
    
  2. Enable parallel append for partitioned tables.

    SET optimizer_enable_parallel_append = on;
    

    The parallel partition selector (optimizer_enable_parallel_part_selector) is enabled by default and takes effect automatically once parallel append is turned on. To disable it explicitly:

    SET optimizer_enable_parallel_part_selector = off;
    
  3. Set the maximum degree of parallelism.

    SET max_parallel_workers_per_gather = 4;
    

    Query example:

    CREATE TABLE sales (id INT, amount DECIMAL)
        DISTRIBUTED BY (id)
        PARTITION BY RANGE (id)
        (START (1) END (1000001) EVERY (100000));
    INSERT INTO sales SELECT i, i * 1.5 FROM generate_series(1, 1000000) i;
    ANALYZE sales;
    SELECT SUM(amount) FROM sales;
    
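
    To confirm that the parallel append path was chosen, inspect the plan with EXPLAIN. When GPORCA selects it, the plan includes a Parallel Append node over the partitions. An abbreviated sketch (the child partition names here are illustrative, following the default naming convention):

    EXPLAIN (COSTS OFF) SELECT SUM(amount) FROM sales;
     Finalize Aggregate
       ->  Gather Motion 6:1  (slice1; segments: 6)
             ->  Partial Aggregate
                   ->  Parallel Append
                         ->  Parallel Seq Scan on sales_1_prt_1
                         ->  Parallel Seq Scan on sales_1_prt_2
                         ...
     Optimizer: GPORCA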

    Note

    • Parallel append is a cost-based optimization. GPORCA evaluates the cost of both the parallel append path and the default dynamic scan path, and selects the plan with the lower estimated cost. In environments with a small number of segments or limited data volume, the optimizer may determine that the dynamic scan path is more efficient, and therefore not choose the parallel append plan.

    • When a subquery or subplan appears in the WHERE clause of a partitioned table join, GPORCA falls back to using Dynamic Seq Scan for partition elimination, which handles subplans correctly.

Execute UNION operations in parallel

SynxDB parallelizes UNION operations by introducing a mechanism called Parallel-oblivious Append. This mechanism allows multiple worker processes to process data independently, without coordination among workers, in a shared-nothing manner, thereby improving the performance of UNION operations.

When you execute a query containing a UNION, you can observe operators such as Parallel Seq Scan and Streaming HashAggregate in the execution plan (EXPLAIN). These indicate that the query is being executed in parallel.

Query example:

SELECT DISTINCT a FROM t_distinct_0 UNION SELECT DISTINCT b FROM t_distinct_0;

The execution plan for the query might look like this:

QUERY PLAN
----------------------------------------------------------------------
 Gather Motion 6:1  (slice1; segments: 6)
   ->  HashAggregate
         Group Key: t_distinct_0.a
         ->  Redistribute Motion 6:6  (slice2; segments: 6)
               Hash Key: t_distinct_0.a
               Hash Module: 3
               ->  Append
                     ->  GroupAggregate
                           Group Key: t_distinct_0.a
                           ->  Sort
                                 Sort Key: t_distinct_0.a
                                  ->  Redistribute Motion 6:6  (slice3; segments: 6)
                                        Hash Key: t_distinct_0.a
                                        Hash Module: 3
                                        ->  Streaming HashAggregate
                                              Group Key: t_distinct_0.a
                                              ->  Parallel Seq Scan on t_distinct_0
                      ->  GroupAggregate
                            Group Key: t_distinct_0_1.b
                            ->  Sort
                                  Sort Key: t_distinct_0_1.b
                                  ->  Redistribute Motion 6:6  (slice4; segments: 6)
                                        Hash Key: t_distinct_0_1.b
                                        Hash Module: 3
                                        ->  Streaming HashAggregate
                                              Group Key: t_distinct_0_1.b
                                              ->  Parallel Seq Scan on t_distinct_0 t_distinct_0_1

Execute DISTINCT operations in parallel

In a Massively Parallel Processing (MPP) architecture, SynxDB supports parallelizing DISTINCT operations to accelerate query performance.

The core principle is to use Motion operators to redistribute tuples among multiple worker processes. Even if the data’s distribution key does not match the DISTINCT target expression, the system can still generate a parallel plan based on a Parallel Scan of the underlying table.

This process is usually divided into multiple stages:

  1. Pre-deduplication: Perform an initial parallel deduplication on partial data using the Streaming HashAggregate or HashAggregate operator.

  2. Redistribution: Redistribute the data according to the DISTINCT expression.

  3. Final deduplication: Perform the final DISTINCT operation.

Query example:

SELECT DISTINCT a FROM t_distinct_0;

Its parallel execution plan might look like the following, where Streaming HashAggregate and Parallel Seq Scan indicate multi-stage parallel processing:

QUERY PLAN
------------------------------------------------------------
 Gather Motion 6:1  (slice1; segments: 6)
   ->  HashAggregate
         Group Key: a
         ->  Redistribute Motion 6:6  (slice2; segments: 6)
               Hash Key: a
               Hash Module: 3
               ->  Streaming HashAggregate
                     Group Key: a
                     ->  Parallel Seq Scan on t_distinct_0
 Optimizer: Postgres query optimizer

In addition, the DISTINCT operation is often treated as a special case of an aggregation operation in its implementation. For example, the query SELECT DISTINCT a, b FROM t_distinct_0; is logically equivalent to SELECT a, b FROM t_distinct_0 GROUP BY a, b;.

SynxDB supports parallelizing such aggregate queries with GROUP BY. To achieve parallel aggregation, the system uses different aggregation operators, such as HashAggregate, GroupAggregate, and Streaming HashAggregate. These are concrete implementations of aggregation operations and may appear in the execution plan regardless of whether the query explicitly includes a GROUP BY clause.

As shown in the example below, a query with multiple DISTINCT columns:

EXPLAIN (COSTS OFF)
SELECT DISTINCT a, b FROM t_distinct_0;

Its execution plan will include a GroupAggregate running over a Parallel Seq Scan:

QUERY PLAN
-----------------------------------------------------------
 GroupAggregate
   Group Key: a, b
   ->  Gather Motion 6:1  (slice1; segments: 6)
         Merge Key: a, b
         ->  GroupAggregate
               Group Key: a, b
               ->  Sort
                     Sort Key: a, b
                     ->  Parallel Seq Scan on t_distinct_0
 Optimizer: Postgres query optimizer

Parameter descriptions

| Parameter | Description | Default | Required | Example |
|---|---|---|---|---|
| enable_parallel | Whether to enable the parallel query feature. | OFF | Yes | SET enable_parallel = ON; |
| optimizer | Whether to enable the GPORCA optimizer. | ON | Yes | SET optimizer = ON; |
| max_parallel_workers_per_gather | Sets the maximum number of parallel processes per executor node. | 2 | Yes | SET max_parallel_workers_per_gather = 2; |
| gp_appendonly_insert_files | The number of files into which data is inserted within one transaction. The default value 0 means that the system automatically determines the number of files. If you need to prepare AO/AOCO files in advance for parallel queries, set this parameter to an appropriate value before data insertion. | 0 | No | SET gp_appendonly_insert_files = 8; |
| gp_appendonly_insert_files_tuples_range | The number of rows after which insertion switches to the next file. | 100000 | No | SET gp_appendonly_insert_files_tuples_range = 100000; |
| statement_mem | Sets the memory limit per query per segment. At least 32MB is recommended for GPORCA to generate parallel plans. | 128MB | No | SET statement_mem = '256MB'; |
| parallel_setup_cost | Sets the planner's estimate of the cost of starting up worker processes for parallel query. | 1000 | No | SET parallel_setup_cost = 1000; |
| optimizer_enable_indexscan | Controls whether GPORCA generates index scan plans (including parallel index scans) when parallel execution is enabled. When set to off, GPORCA falls back to parallel sequential scans. | on | No | SET optimizer_enable_indexscan = off; |
| optimizer_enable_indexonlyscan | Controls whether GPORCA generates index-only scan plans (including parallel index-only scans) when parallel execution is enabled. Requires a populated visibility map (run VACUUM first). Does not apply to bitmap AM indexes. When set to off, GPORCA falls back to index scan or sequential scan. | on | No | SET optimizer_enable_indexonlyscan = off; |
| optimizer_enable_parallel_append | Enables parallel append for seq/bitmap/index scans on partitioned tables in GPORCA. | off | No | SET optimizer_enable_parallel_append = on; |
| optimizer_enable_parallel_part_selector | Enables the parallel partition selector for parallel append plans. When enabled, workers share a bitmapset in shared memory to coordinate dynamic partition elimination (DPE) across parallel workers. Takes effect only when optimizer_enable_parallel_append is also set to on. | on | No | SET optimizer_enable_parallel_part_selector = off; |
| optimizer_enable_parallel_groupagg | Enables the optimizer's use of parallel group aggregate (streaming) plans in GPORCA. | on | No | SET optimizer_enable_parallel_groupagg = on; |
| optimizer_enable_parallel_hashagg | Enables the optimizer's use of parallel hash aggregate plans in GPORCA. | on | No | SET optimizer_enable_parallel_hashagg = on; |
| optimizer_parallel_union | Enables parallel execution for UNION/UNION ALL queries in GPORCA. When enabled, GPORCA distributes UNION/UNION ALL work across multiple workers within each segment. | on | No | SET optimizer_parallel_union = off; |

Frequently asked questions

  • Parallel execution is currently supported only for queries whose plans contain the following operators. Queries involving other operators are executed without parallelism.

    sequence scan
    index scan
    index only scan
    bitmap heap scan
    append
    parallel append
    hash join
    nestloop join
    merge join
    hash aggregate
    group aggregate
    
  • Parallel query does not always improve query performance. An excessively high degree of parallelism can overload the system and degrade performance instead.

  • Enabling parallelism multiplies memory consumption, which might lead to “out of memory” errors.