GPORCA Performance Tuning

This document describes the performance optimizations available in the GPORCA optimizer in SynxDB. Each section addresses a common performance scenario and explains how GPORCA can help.

Quick reference

Problem you are facing	Go to
Queries are slow on large table scans, joins, or aggregations	Accelerate queries with parallel execution
Queries read too many rows from disk despite having indexes	Reduce I/O with index optimizations
Joins are slow or cause excessive data movement	Improve join performance
Window functions are slow because of data skew	Handle data skew in window functions
Top-N-per-group queries are slow	Optimize top-N-per-group queries with WindowAgg Top-K pushdown
Queries on large partitioned tables are slow	Accelerate partitioned table scans with parallel append
Vectorized execution does not work on partitioned tables	Enable vectorized execution for partitioned tables

Accelerate queries with parallel execution

If your queries spend most of their time scanning large tables, performing heavy joins, or running aggregations, you can enable intra-segment parallelism to distribute the work across multiple CPU cores within each segment. GPORCA generates parallel plans automatically when it estimates that parallelism is beneficial.

These features are controlled by the standard PostgreSQL parallel query GUCs (for example, max_parallel_workers_per_gather). To learn how to configure and use parallelism, see Execute Queries in Parallel.

Parallel table scan

GPORCA can scan a single table using multiple workers in parallel. This applies to both Heap and AO/AOCO tables. The integrated cost model ensures that GPORCA chooses parallel scans only when beneficial, accounting for worker setup costs and efficiency scaling.

Note

Currently, the parallel table scan optimization in GPORCA does not support:

Common Table Expressions (CTEs)
Partitioned tables using Dynamic Scans (for partitioned table parallelism, see Parallel append for partitioned table scans)
Foreign tables
Replicated tables
Rewindable cursors (parallel scans are non-rewindable)

Parallel index scan

GPORCA can use multiple workers to execute an index scan in parallel, speeding up queries that filter rows through an index on a large table. For details, see Parallel Index Scan with GPORCA.

Parallel index only scan

GPORCA can use multiple workers to execute an index-only scan in parallel for queries that select only columns covered by an index. Workers read results entirely from the index without accessing table data blocks, which reduces I/O compared to a regular parallel index scan. This optimization requires a populated visibility map (run VACUUM on the table first) and applies to B-tree indexes only. For details, see Parallel Index Only Scan with GPORCA.

Parallel hash join

GPORCA supports parallel-aware hash joins (Inner and Left Outer Joins), allowing multiple workers to execute a hash join concurrently. The optimizer can transform logical joins into parallel physical joins, with workers coordinating to build shared hash tables or process independent subsets of data.

Parallel hash semi join and hash left anti semi join

GPORCA extends parallel execution to Hash Semi Join (used in EXISTS subqueries) and Hash Left Anti Semi Join (used in NOT IN subqueries). These join types can use multiple CPU cores within each segment, improving performance for queries that filter rows based on the existence or non-existence of matching rows in a subquery.

Parallel aggregation

GPORCA supports parallel group and hash aggregation, enabling workers to perform partial aggregations locally before redistributing or merging results. This helps queries that aggregate over large datasets or use GROUP BY with many distinct values.

Parallel append for partitioned table scans

GPORCA can generate parallel append paths for partitioned table scans, parallelizing segment-level scans across partitions to improve performance on large partitioned tables. This feature is controlled by the optimizer_enable_parallel_append GUC parameter.

When parallel append is enabled, GPORCA can additionally generate a parallel partition selector node that parallelizes dynamic partition elimination (DPE) across workers using a shared-memory bitmapset, preventing redundant scans. This feature is controlled by the optimizer_enable_parallel_part_selector GUC parameter and is enabled by default.

Note

Parallel append is a cost-based optimization. GPORCA evaluates the cost of both the parallel append path and the default dynamic scan path, and selects the plan with the lower estimated cost. In environments with a small number of segments or limited data volume, the optimizer may determine that the dynamic scan path is more efficient, and therefore not choose the parallel append plan.

For detailed usage and configuration, see Query Partitioned Tables in Parallel.

Reduce I/O with index optimizations

If your queries have suitable indexes but GPORCA still reads more data than necessary, the following optimizations can help reduce disk I/O and avoid unnecessary sort operations.

Index-only scans on AO and PAX tables

In addition to B-tree index-only scans on Heap tables, GPORCA supports index-only scans on AO tables and PAX tables. You can also use PostgreSQL-style INCLUDE columns to create covering indexes on these table types, so that the query reads only from the index without touching the table data.

This is useful when your query selects only a subset of columns that the index already covers. For more information about index scans on AO tables, see Use Index Scan on AO Tables.

Backward index scans

When a query requires data in descending order (for example, ORDER BY col DESC LIMIT 10), GPORCA can automatically choose a backward index scan instead of scanning forward and then sorting. This applies to both regular and index-only scans, reducing the need for extra Sort nodes and improving top-N query performance.

ScalarArrayOp predicate pushdown to index paths

When your query filters rows using IN (...) or = ANY(array) predicates, GPORCA can push these ScalarArrayOp predicates down to index paths, including:

B-tree or hash indexes
Index Scan or Bitmap Index Scan paths

The optimizer decides whether to use a Bitmap scan based on cost estimation. In low-selectivity cases, it might still fall back to a sequential scan. Note that this pushdown only applies to the leading column of an index, not to non-leading columns in composite indexes. Example:

CREATE INDEX idx ON t(col1, col2);
SELECT * FROM t WHERE col1 = ANY('{1,2,3}');  -- Pushdown supported
SELECT * FROM t WHERE col2 = ANY('{1,2,3}');  -- No pushdown, used as filter only

Index-only scans inside CTEs

GPORCA supports index-only scans inside common table expressions (CTEs). If your CTE selects only indexed columns, GPORCA can serve the query entirely from the index:

CREATE TABLE t(a int, b int);
CREATE INDEX i ON t(a);
INSERT INTO t SELECT i, i+i FROM generate_series(1, 10)i;
VACUUM ANALYZE t;

EXPLAIN WITH cte AS (SELECT a FROM t WHERE a > 42) SELECT * FROM cte;

Improve join performance

If your queries involve complex joins and you observe slow execution or excessive data movement (Motion), the following optimizations can help GPORCA choose better join strategies.

Hash full join

When your query uses FULL JOIN, GPORCA executes it using the Hash Full Join strategy. This approach does not require sorting join keys and works well for large datasets, high-cardinality join keys, or mismatched distribution keys.

Compared to traditional Merge Join, Hash Full Join offers the following advantages:

No sorting required on join keys
Reduced data movement (Motion) overhead
Potentially better performance when join keys are highly skewed or have high cardinality

GPORCA does not currently support Merge Full Join, so all FULL JOIN queries use Hash Full Join.

Example:

EXPLAIN SELECT * FROM t1 FULL JOIN t2 ON t1.id = t2.id;

This might produce the following plan:

Hash Full Join
Hash Cond: t1.id = t2.id
...

Push JOIN below UNION ALL

When a query joins a small table with a view or subquery that contains a UNION ALL, GPORCA can rewrite the query to push the JOIN into each branch of the UNION ALL. This can significantly improve performance by:

Converting a large join over a UNION ALL into multiple smaller joins
Allowing each sub-join to use indexes independently, reducing Motion and Hash Join overhead
Pushing the JOIN to either the left or right side of the UNION ALL, enabling more flexible query structures

This optimization is off by default. To enable it, set the following GUC:

SET optimizer_enable_push_join_below_union_all = on;

The following example shows how the optimizer pushes the JOIN below each UNION ALL branch when you enable this optimization:

-- Create test tables
CREATE TABLE dist_small_1(c1 int);
INSERT INTO dist_small_1 SELECT generate_series(1, 1000);
CREATE INDEX dist_small_1_index ON dist_small_1 USING btree (c1);
ANALYZE dist_small_1;

CREATE TABLE dist_small_2(c1 int);
INSERT INTO dist_small_2 SELECT generate_series(1, 1000);
ANALYZE dist_small_2;

CREATE TABLE inner_1(cc int);
INSERT INTO inner_1 VALUES(1);
ANALYZE inner_1;

-- Create a view
CREATE VIEW dist_view_small AS
SELECT c1 FROM dist_small_1
UNION ALL
SELECT c1 FROM dist_small_2;

-- Enable the optimization and run the query
SET optimizer_enable_push_join_below_union_all = on;
EXPLAIN ANALYZE
SELECT c1 FROM dist_view_small JOIN inner_1 ON c1 < cc;

The optimizer might produce a plan like the following:

->  Append
      ->  Nested Loop
            ...
            ->  Index Scan using dist_small_1_index on dist_small_1
      ->  Nested Loop
            ...
            ->  Seq Scan on dist_small_2

This optimization is especially useful when:

You join a UNION ALL view with a small table
Each branch of the UNION ALL has indexes that the optimizer can use

Note

This optimization does not support FULL JOIN or Common Table Expressions (CTEs).
It also does not support JOIN of UNION ALL or UNION ALL of JOIN structures.

Speed up NOT IN subqueries

NOT IN subqueries can be slow when the optimizer avoids broadcast paths because of the optimizer_penalize_broadcast_threshold setting. For NOT IN queries (Left Anti Semi Join), GPORCA automatically lifts this penalty so that broadcast paths remain available. This prevents the optimizer from concentrating large tables on the coordinator, which can otherwise lead to severe performance issues or out-of-memory (OOM) errors.

How this optimization works:

Applies only to NOT IN queries (LASJ)
Ignores the setting of optimizer_penalize_broadcast_threshold
The penalty strategy remains in place for other types of joins (such as IN or EXISTS)

Example:

SELECT * FROM foo WHERE a NOT IN (SELECT a FROM bar);

Sample query plan:

Gather Motion 2:1
-> Hash Left Anti Semi (Not-In) Join
      -> Seq Scan on foo
      -> Broadcast Motion
            -> Seq Scan on bar

Eliminate unnecessary redistribution in self-joins

When you join a table with itself multiple times using outer joins, GPORCA can detect this pattern and skip unnecessary Redistribute Motion operations. This optimization applies when:

The query contains multiple LEFT OUTER JOIN or RIGHT OUTER JOIN operations
All joined tables are aliases of the same base table
The join conditions are symmetric (for example, t1.a = t2.a)
All tables use the same distribution key and satisfy locality constraints

Example:

CREATE TABLE o1 (a1 int, b1 int) DISTRIBUTED BY (a1);

EXPLAIN (COSTS OFF)
SELECT * FROM (SELECT DISTINCT a1 FROM o1) t1
             LEFT OUTER JOIN o1 t2 ON t1.a1 = t2.a1
             LEFT OUTER JOIN o1 t3 ON t2.a1 = t3.a1;

Without this optimization, the query inserts Redistribute Motion between each join level. With it, GPORCA detects the multi-level self-join pattern and avoids unnecessary data redistribution.

Handle data skew in window functions

If your window function queries are slow because the PARTITION BY key has skewed data (for example, most rows fall into a few partitions), GPORCA can split the computation into two phases to reduce the amount of data that needs to be redistributed.

Two-phase WindowAgg optimization

When the optimizer detects a window function with a highly selective filter condition (such as rank() < N), it automatically splits the computation into two phases:

Phase 1 — Local computation: The first WindowAgg runs independently on each segment, and the filter condition is applied locally. This filters out most rows before any data redistribution.
Phase 2 — Global computation: The small amount of remaining data is redistributed across all segments based on the partition key, and the second WindowAgg runs to produce the final result.

You do not need to modify any SQL statements to benefit from this optimization. By minimizing the amount of data before redistribution, it effectively mitigates the performance impact of data skew.

Note

This optimization only improves performance when the window function has a highly selective filter condition (that is, it can effectively filter out a large amount of data). If the filtering is not effective, enabling this optimization might actually degrade performance because of the extra computation and data movement introduced.

By default, GPORCA assigns a high cost to the two-phase window function path, so the optimizer usually does not choose this plan automatically. To force it, you can enable it with the following GUC parameter:

SET optimizer_force_split_window_function = on;

Query plan example

The following query plan shows how GPORCA automatically applies the two-phase optimization for a window function with a filter condition.

EXPLAIN SELECT * FROM (SELECT rank() OVER (PARTITION BY four ORDER BY ten) AS rank_1, ten, four FROM tenk1 WHERE unique2 < 10) t WHERE rank_1 < 3;

                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Result
         Filter: ((rank() OVER (?)) < 3)
         ->  WindowAgg  -- Phase 2 WindowAgg
               Partition By: four
               ->  Redistribute Motion 3:3  -- Redistribute pre-aggregated data
                     Hash Key: four
                     ->  Result
                           Filter: ((rank() OVER (?)) < 3)
                           ->  WindowAgg  -- Phase 1 WindowAgg (pre-aggregation)
                                 Partition By: four
                                 ->  Sort
                                       ->  Index Scan using tenk1_unique2 on tenk1
                                             Index Cond: (unique2 < 10)
 Optimizer: GPORCA

Optimize top-N-per-group queries with WindowAgg Top-K pushdown

For common top-N-per-group analytics queries such as “find the top 2 items per category”, GPORCA can push down a Top-K filter below the WindowAgg operator. This inserts a Partition Top-K node that prunes non-qualifying rows early during execution, reducing memory and CPU usage.

How it works

When you write a query like:

SELECT * FROM (
    SELECT id, category, score, name,
           RANK() OVER (PARTITION BY category ORDER BY score DESC) AS r
    FROM my_table
) t
WHERE r <= 2;

With the Top-K pushdown enabled, GPORCA inserts a Partition Top-K node below the WindowAgg operator. As each partition is processed, this node retains only rows within the top K ranks and correctly includes all tied rows at the rank boundary. Rows that cannot qualify are discarded early, before the full window rank computation.

Enable the optimization

This optimization is controlled by the GUC parameter optimizer_force_partition_topk, which is off by default. To enable it:

SET optimizer = on;
SET optimizer_force_partition_topk = on;

Note

This optimization requires the GPORCA optimizer to be enabled (SET optimizer = on).
Only RANK() with <= K or < (K+1) filter conditions is supported.
ROW_NUMBER() and DENSE_RANK() fall back to standard WindowAgg execution.

Query plan example

The following example demonstrates the Top-K pushdown optimization. First, create a test table:

CREATE TABLE test_partition_topk (
    id          INT,
    category    TEXT,
    score       INT,
    name        TEXT
) DISTRIBUTED BY (id);

Then, enable the optimization and view the query plan:

SET optimizer = on;
SET optimizer_force_partition_topk = on;

EXPLAIN (COSTS OFF)
SELECT * FROM (
    SELECT id, category, score, name,
           RANK() OVER (PARTITION BY category ORDER BY score DESC) AS r
    FROM test_partition_topk
) t
WHERE r <= 2
ORDER BY category, r, id;

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Merge Key: category, (rank() OVER (?)), id
   ->  Sort
         Sort Key: category, (rank() OVER (?)), id
         ->  Result
               Filter: ((rank() OVER (?)) <= 2)
               ->  WindowAgg
                     Partition By: category
                     Order By: score
                     ->  Sort
                           Sort Key: category, score DESC
                           ->  Redistribute Motion 3:3  (slice2; segments: 3)
                                 Hash Key: category
                                 ->  Result
                                       Filter: ((rank() OVER (?)) <= 2)
                                       ->  WindowAgg
                                             Partition By: category
                                             Order By: score
                                             ->  Sort
                                                   Sort Key: category, score DESC
                                                   ->  Partition Top-K
                                                         Top K: 2
                                                         ->  Seq Scan on test_partition_topk
 Optimizer: GPORCA

The plan contains a Partition Top-K node with Top K: 2, confirming that the Top-K pushdown is active. This node prunes rows early before they reach the WindowAgg operator, significantly improving performance for large datasets.

Enable vectorized execution for partitioned tables

If you use partitioned tables and want to take advantage of the vectorized executor, you might find that Dynamic Scan does not support multi-threaded metadata reading, which blocks vectorized execution. For more information about vectorized execution, see Vectorization Query Computing.

Scan partitioned tables using the Append operator

To work around this limitation, you can switch from Dynamic Scan to the Append operator by setting the following GUC:

optimizer_disable_dynamic_table_scan = off (default): GPORCA uses Dynamic Scan to scan partitioned tables.
optimizer_disable_dynamic_table_scan = on: GPORCA uses the Append operator to scan partitioned tables for compatibility with the vectorized executor.

Warning

When you enable the Append operator (that is, set optimizer_disable_dynamic_table_scan to on), the following limitation applies:

Index-only scans on partitioned tables are not supported.