GPORCA Performance Tuning
This document describes the performance optimizations available in the GPORCA optimizer in SynxDB. Each section addresses a common performance scenario and explains how GPORCA can help.
Quick reference
Problem you are facing |
Go to |
|---|---|
Queries are slow on large table scans, joins, or aggregations |
|
Queries read too many rows from disk despite having indexes |
|
Joins are slow or cause excessive data movement |
|
Window functions are slow because of data skew |
|
Top-N-per-group queries are slow |
Optimize top-N-per-group queries with WindowAgg Top-K pushdown |
Queries on large partitioned tables are slow |
|
Vectorized execution does not work on partitioned tables |
Accelerate queries with parallel execution
If your queries spend most of their time scanning large tables, performing heavy joins, or running aggregations, you can enable intra-segment parallelism to distribute the work across multiple CPU cores within each segment. GPORCA generates parallel plans automatically when it estimates that parallelism is beneficial.
These features are controlled by the standard PostgreSQL parallel query GUCs (for example, max_parallel_workers_per_gather). To learn how to configure and use parallelism, see Execute Queries in Parallel.
Parallel table scan
GPORCA can scan a single table using multiple workers in parallel. This applies to both Heap and AO/AOCO tables. The integrated cost model ensures that GPORCA chooses parallel scans only when beneficial, accounting for worker setup costs and efficiency scaling.
Note
Currently, the parallel table scan optimization in GPORCA does not support:
Common Table Expressions (CTEs)
Partitioned tables using Dynamic Scans (for partitioned table parallelism, see Parallel append for partitioned table scans)
Foreign tables
Replicated tables
Rewindable cursors (parallel scans are non-rewindable)
Parallel index scan
GPORCA can use multiple workers to execute an index scan in parallel, speeding up queries that filter rows through an index on a large table. For details, see Parallel Index Scan with GPORCA.
Parallel index only scan
GPORCA can use multiple workers to execute an index-only scan in parallel for queries that select only columns covered by an index. Workers read results entirely from the index without accessing table data blocks, which reduces I/O compared to a regular parallel index scan. This optimization requires a populated visibility map (run VACUUM on the table first) and applies to B-tree indexes only. For details, see Parallel Index Only Scan with GPORCA.
Parallel hash join
GPORCA supports parallel-aware hash joins (Inner and Left Outer Joins), allowing multiple workers to execute a hash join concurrently. The optimizer can transform logical joins into parallel physical joins, with workers coordinating to build shared hash tables or process independent subsets of data.
Parallel hash semi join and hash left anti semi join
GPORCA extends parallel execution to Hash Semi Join (used in EXISTS subqueries) and Hash Left Anti Semi Join (used in NOT IN subqueries). These join types can use multiple CPU cores within each segment, improving performance for queries that filter rows based on the existence or non-existence of matching rows in a subquery.
Parallel aggregation
GPORCA supports parallel group and hash aggregation, enabling workers to perform partial aggregations locally before redistributing or merging results. This helps queries that aggregate over large datasets or use GROUP BY with many distinct values.
Parallel append for partitioned table scans
GPORCA can generate parallel append paths for partitioned table scans, parallelizing segment-level scans across partitions to improve performance on large partitioned tables. This feature is controlled by the optimizer_enable_parallel_append GUC parameter.
When parallel append is enabled, GPORCA can additionally generate a parallel partition selector node that parallelizes dynamic partition elimination (DPE) across workers using a shared-memory bitmapset, preventing redundant scans. This feature is controlled by the optimizer_enable_parallel_part_selector GUC parameter and is enabled by default.
Note
Parallel append is a cost-based optimization. GPORCA evaluates the cost of both the parallel append path and the default dynamic scan path, and selects the plan with the lower estimated cost. In environments with a small number of segments or limited data volume, the optimizer may determine that the dynamic scan path is more efficient, and therefore not choose the parallel append plan.
For detailed usage and configuration, see Query Partitioned Tables in Parallel.
Reduce I/O with index optimizations
If your queries have suitable indexes but GPORCA still reads more data than necessary, the following optimizations can help reduce disk I/O and avoid unnecessary sort operations.
Index-only scans on AO and PAX tables
In addition to B-tree index-only scans on Heap tables, GPORCA supports index-only scans on AO tables and PAX tables. You can also use PostgreSQL-style INCLUDE columns to create covering indexes on these table types, so that the query reads only from the index without touching the table data.
This is useful when your query selects only a subset of columns that the index already covers. For more information about index scans on AO tables, see Use Index Scan on AO Tables.
Backward index scans
When a query requires data in descending order (for example, ORDER BY col DESC LIMIT 10), GPORCA can automatically choose a backward index scan instead of scanning forward and then sorting. This applies to both regular and index-only scans, reducing the need for extra Sort nodes and improving top-N query performance.
ScalarArrayOp predicate pushdown to index paths
When your query filters rows using IN (...) or = ANY(array) predicates, GPORCA can push these ScalarArrayOp predicates down to index paths, including:
B-tree or hash indexes
Index Scan or Bitmap Index Scan paths
The optimizer decides whether to use a Bitmap scan based on cost estimation. In low-selectivity cases, it might still fall back to a sequential scan. Note that this pushdown only applies to the leading column of an index, not to non-leading columns in composite indexes. Example:
CREATE INDEX idx ON t(col1, col2);
SELECT * FROM t WHERE col1 = ANY('{1,2,3}'); -- Pushdown supported
SELECT * FROM t WHERE col2 = ANY('{1,2,3}'); -- No pushdown, used as filter only
Index-only scans inside CTEs
GPORCA supports index-only scans inside common table expressions (CTEs). If your CTE selects only indexed columns, GPORCA can serve the query entirely from the index:
CREATE TABLE t(a int, b int);
CREATE INDEX i ON t(a);
INSERT INTO t SELECT i, i+i FROM generate_series(1, 10)i;
VACUUM ANALYZE t;
EXPLAIN WITH cte AS (SELECT a FROM t WHERE a > 42) SELECT * FROM cte;
Improve join performance
If your queries involve complex joins and you observe slow execution or excessive data movement (Motion), the following optimizations can help GPORCA choose better join strategies.
Hash full join
When your query uses FULL JOIN, GPORCA executes it using the Hash Full Join strategy. This approach does not require sorting join keys and works well for large datasets, high-cardinality join keys, or mismatched distribution keys.
Compared to traditional Merge Join, Hash Full Join offers the following advantages:
No sorting required on join keys
Reduced data movement (Motion) overhead
Potentially better performance when join keys are highly skewed or have high cardinality
GPORCA does not currently support Merge Full Join, so all FULL JOIN queries use Hash Full Join.
Example:
EXPLAIN SELECT * FROM t1 FULL JOIN t2 ON t1.id = t2.id;
This might produce the following plan:
Hash Full Join
Hash Cond: t1.id = t2.id
...
Push JOIN below UNION ALL
When a query joins a small table with a view or subquery that contains a UNION ALL, GPORCA can rewrite the query to push the JOIN into each branch of the UNION ALL. This can significantly improve performance by:
Converting a large join over a
UNION ALLinto multiple smaller joinsAllowing each sub-join to use indexes independently, reducing Motion and
Hash JoinoverheadPushing the
JOINto either the left or right side of theUNION ALL, enabling more flexible query structures
This optimization is off by default. To enable it, set the following GUC:
SET optimizer_enable_push_join_below_union_all = on;
The following example shows how the optimizer pushes the JOIN below each UNION ALL branch when you enable this optimization:
-- Create test tables
CREATE TABLE dist_small_1(c1 int);
INSERT INTO dist_small_1 SELECT generate_series(1, 1000);
CREATE INDEX dist_small_1_index ON dist_small_1 USING btree (c1);
ANALYZE dist_small_1;
CREATE TABLE dist_small_2(c1 int);
INSERT INTO dist_small_2 SELECT generate_series(1, 1000);
ANALYZE dist_small_2;
CREATE TABLE inner_1(cc int);
INSERT INTO inner_1 VALUES(1);
ANALYZE inner_1;
-- Create a view
CREATE VIEW dist_view_small AS
SELECT c1 FROM dist_small_1
UNION ALL
SELECT c1 FROM dist_small_2;
-- Enable the optimization and run the query
SET optimizer_enable_push_join_below_union_all = on;
EXPLAIN ANALYZE
SELECT c1 FROM dist_view_small JOIN inner_1 ON c1 < cc;
The optimizer might produce a plan like the following:
-> Append
-> Nested Loop
...
-> Index Scan using dist_small_1_index on dist_small_1
-> Nested Loop
...
-> Seq Scan on dist_small_2
This optimization is especially useful when:
You join a
UNION ALLview with a small tableEach branch of the
UNION ALLhas indexes that the optimizer can use
Note
This optimization does not support
FULL JOINor Common Table Expressions (CTEs).It also does not support
JOIN of UNION ALLorUNION ALL of JOINstructures.
Speed up NOT IN subqueries
NOT IN subqueries can be slow when the optimizer avoids broadcast paths because of the optimizer_penalize_broadcast_threshold setting. For NOT IN queries (Left Anti Semi Join), GPORCA automatically lifts this penalty so that broadcast paths remain available. This prevents the optimizer from concentrating large tables on the coordinator, which can otherwise lead to severe performance issues or out-of-memory (OOM) errors.
How this optimization works:
Applies only to
NOT INqueries (LASJ)Ignores the setting of
optimizer_penalize_broadcast_thresholdThe penalty strategy remains in place for other types of joins (such as
INorEXISTS)
Example:
SELECT * FROM foo WHERE a NOT IN (SELECT a FROM bar);
Sample query plan:
Gather Motion 2:1
-> Hash Left Anti Semi (Not-In) Join
-> Seq Scan on foo
-> Broadcast Motion
-> Seq Scan on bar
Eliminate unnecessary redistribution in self-joins
When you join a table with itself multiple times using outer joins, GPORCA can detect this pattern and skip unnecessary Redistribute Motion operations. This optimization applies when:
The query contains multiple LEFT OUTER JOIN or RIGHT OUTER JOIN operations
All joined tables are aliases of the same base table
The join conditions are symmetric (for example,
t1.a = t2.a)All tables use the same distribution key and satisfy locality constraints
Example:
CREATE TABLE o1 (a1 int, b1 int) DISTRIBUTED BY (a1);
EXPLAIN (COSTS OFF)
SELECT * FROM (SELECT DISTINCT a1 FROM o1) t1
LEFT OUTER JOIN o1 t2 ON t1.a1 = t2.a1
LEFT OUTER JOIN o1 t3 ON t2.a1 = t3.a1;
Without this optimization, the query inserts Redistribute Motion between each join level. With it, GPORCA detects the multi-level self-join pattern and avoids unnecessary data redistribution.
Handle data skew in window functions
If your window function queries are slow because the PARTITION BY key has skewed data (for example, most rows fall into a few partitions), GPORCA can split the computation into two phases to reduce the amount of data that needs to be redistributed.
Two-phase WindowAgg optimization
When the optimizer detects a window function with a highly selective filter condition (such as rank() < N), it automatically splits the computation into two phases:
Phase 1 — Local computation: The first
WindowAggruns independently on each segment, and the filter condition is applied locally. This filters out most rows before any data redistribution.Phase 2 — Global computation: The small amount of remaining data is redistributed across all segments based on the partition key, and the second
WindowAggruns to produce the final result.
You do not need to modify any SQL statements to benefit from this optimization. By minimizing the amount of data before redistribution, it effectively mitigates the performance impact of data skew.
Note
This optimization only improves performance when the window function has a highly selective filter condition (that is, it can effectively filter out a large amount of data). If the filtering is not effective, enabling this optimization might actually degrade performance because of the extra computation and data movement introduced.
By default, GPORCA assigns a high cost to the two-phase window function path, so the optimizer usually does not choose this plan automatically. To force it, you can enable it with the following GUC parameter:
SET optimizer_force_split_window_function = on;
Query plan example
The following query plan shows how GPORCA automatically applies the two-phase optimization for a window function with a filter condition.
EXPLAIN SELECT * FROM (SELECT rank() OVER (PARTITION BY four ORDER BY ten) AS rank_1, ten, four FROM tenk1 WHERE unique2 < 10) t WHERE rank_1 < 3;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3)
-> Result
Filter: ((rank() OVER (?)) < 3)
-> WindowAgg -- Phase 2 WindowAgg
Partition By: four
-> Redistribute Motion 3:3 -- Redistribute pre-aggregated data
Hash Key: four
-> Result
Filter: ((rank() OVER (?)) < 3)
-> WindowAgg -- Phase 1 WindowAgg (pre-aggregation)
Partition By: four
-> Sort
-> Index Scan using tenk1_unique2 on tenk1
Index Cond: (unique2 < 10)
Optimizer: GPORCA
Optimize top-N-per-group queries with WindowAgg Top-K pushdown
For common top-N-per-group analytics queries such as “find the top 2 items per category”, GPORCA can push down a Top-K filter below the WindowAgg operator. This inserts a Partition Top-K node that prunes non-qualifying rows early during execution, reducing memory and CPU usage.
How it works
When you write a query like:
SELECT * FROM (
SELECT id, category, score, name,
RANK() OVER (PARTITION BY category ORDER BY score DESC) AS r
FROM my_table
) t
WHERE r <= 2;
With the Top-K pushdown enabled, GPORCA inserts a Partition Top-K node below the WindowAgg operator. As each partition is processed, this node retains only rows within the top K ranks and correctly includes all tied rows at the rank boundary. Rows that cannot qualify are discarded early, before the full window rank computation.
Enable the optimization
This optimization is controlled by the GUC parameter optimizer_force_partition_topk, which is off by default. To enable it:
SET optimizer = on;
SET optimizer_force_partition_topk = on;
Note
This optimization requires the GPORCA optimizer to be enabled (
SET optimizer = on).Only
RANK()with<= Kor< (K+1)filter conditions is supported.ROW_NUMBER()andDENSE_RANK()fall back to standard WindowAgg execution.
Query plan example
The following example demonstrates the Top-K pushdown optimization. First, create a test table:
CREATE TABLE test_partition_topk (
id INT,
category TEXT,
score INT,
name TEXT
) DISTRIBUTED BY (id);
Then, enable the optimization and view the query plan:
SET optimizer = on;
SET optimizer_force_partition_topk = on;
EXPLAIN (COSTS OFF)
SELECT * FROM (
SELECT id, category, score, name,
RANK() OVER (PARTITION BY category ORDER BY score DESC) AS r
FROM test_partition_topk
) t
WHERE r <= 2
ORDER BY category, r, id;
QUERY PLAN
---------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3)
Merge Key: category, (rank() OVER (?)), id
-> Sort
Sort Key: category, (rank() OVER (?)), id
-> Result
Filter: ((rank() OVER (?)) <= 2)
-> WindowAgg
Partition By: category
Order By: score
-> Sort
Sort Key: category, score DESC
-> Redistribute Motion 3:3 (slice2; segments: 3)
Hash Key: category
-> Result
Filter: ((rank() OVER (?)) <= 2)
-> WindowAgg
Partition By: category
Order By: score
-> Sort
Sort Key: category, score DESC
-> Partition Top-K
Top K: 2
-> Seq Scan on test_partition_topk
Optimizer: GPORCA
The plan contains a Partition Top-K node with Top K: 2, confirming that the Top-K pushdown is active. This node prunes rows early before they reach the WindowAgg operator, significantly improving performance for large datasets.
Enable vectorized execution for partitioned tables
If you use partitioned tables and want to take advantage of the vectorized executor, you might find that Dynamic Scan does not support multi-threaded metadata reading, which blocks vectorized execution. For more information about vectorized execution, see Vectorization Query Computing.
Scan partitioned tables using the Append operator
To work around this limitation, you can switch from Dynamic Scan to the Append operator by setting the following GUC:
optimizer_disable_dynamic_table_scan = off(default): GPORCA usesDynamic Scanto scan partitioned tables.optimizer_disable_dynamic_table_scan = on: GPORCA uses theAppendoperator to scan partitioned tables for compatibility with the vectorized executor.
Warning
When you enable the Append operator (that is, set optimizer_disable_dynamic_table_scan to on), the following limitation applies:
Index-only scans on partitioned tables are not supported.