v4.1.0 Release Notes
Release date: September 2025
Version: v4.1.0
v4.1.0 is a feature and performance enhancement release that improves query processing efficiency, data lake integration, system high availability, and overall stability.
This version greatly enhances the query optimizer, improving the efficiency of complex analytical queries through advancements in parallel processing, memory management, and locking mechanisms. Data lake integration is expanded with enhanced direct access to mainstream data formats and external data sources, along with increased flexibility in data synchronization. The new version also includes built-in support for common database extensions, simplifying development and management. For system high availability and operations, an automatic failover mechanism for critical nodes has been introduced, and deployment flexibility has been enhanced.
In addition, this release includes many bug fixes, comprehensively improving system stability and reliability.
New features
Database
The following table lists some of the new features in SynxDB v4.1.0. For a full list of features, see the sections below the table.
| Category | Features | User documents |
|---|---|---|
| Query processing and optimization | GPORCA optimizer introduces a two-phase optimization for window functions to effectively address performance bottlenecks caused by data skew of partition keys in window function queries. | Two-phase window function optimization |
| Query processing and optimization | The GPORCA optimizer now supports the Append operator as an alternative for scanning partitioned tables, addressing the limitation where the vectorized executor does not support multi-threaded metadata reading with Dynamic Scan. | Support using Append operator to scan partitioned tables |
| Query processing and optimization | Implements parallel processing for UNION operations by introducing a "Parallel-oblivious Append" mechanism, which allows multiple worker processes to handle data independently without sharing resources. | Execute UNION operations in parallel |
| Query processing and optimization | The runtime filtering feature in the GPORCA optimizer has been extended to the DynamicSeqScan operator, allowing runtime filters to be pushed down from the HashJoin operator to the partitioned table scan phase. | Push down filters to dynamic scans in GPORCA |
| Query processing and optimization | Supports parallelizing `DISTINCT` operations by redistributing tuple data among multiple worker processes using the Motion operator. | Execute DISTINCT operations in parallel |
| Query processing and optimization | Adds detailed hash aggregation statistics to the `EXPLAIN ANALYZE` output, including the number of hash tables, group and batch processing details, number of overflow partitions, disk space used, and hash collision chain length. | Interpret the EXPLAIN ANALYZE output for hash aggregation |
| Query processing and optimization | Introduces a new lock optimization feature for `SELECT ... FOR UPDATE` and similar queries, which downgrades a relation-level lock to a tuple-level lock when certain conditions are met. | Lock optimization for SELECT … FOR UPDATE, enable_lock_optimization parameter |
| Storage | Comprehensively optimizes memory management for the Hash Aggregate operator when processing large-scale data. | Hash aggregation memory management optimization, hash_mem_multiplier parameter |
| Data loading and external tables | `datalake_fdw` external tables now support custom HDFS user access via `CREATE USER MAPPING`, replacing the default `gpadmin` system role. | External table supports HDFS user mapping |
| Data loading and external tables | Adds the `sync_hive_partition_table` function, which supports synchronizing a single partition of a Hive table rather than the entire table. | Use sync_hive_partition_table to synchronize partitioned Hive tables |
| Data loading and external tables | Supports direct access to Apache Iceberg tables stored in Amazon S3 or compatible object storage, without relying on external metadata catalogs (such as Hive Metastore or REST Catalog). | Load Iceberg table data from S3 (without external metadata service) |
| Tools and utilities | Adds support for PL/Container extensions, allowing users to run PL/Python user-defined functions (UDFs) securely within Docker containers. | PL/Container extension |
| Tools and utilities | Adds built-in support for the `pg_partman` extension, which automates the creation and management of partitioned tables based on time or sequence. | Use pg_partman to manage partitioned tables |
Query processing and optimization
Two-phase window function optimization: The GPORCA optimizer introduces a two-phase optimization for window functions to effectively address performance bottlenecks caused by data skew of partition keys in window function queries. When a window function with a highly selective filter condition (for example, `rank() < N`) is detected, the computation is automatically split into two phases: local pre-computation and global computation after redistribution, minimizing data transfer. This optimization is completely transparent to users, delivering performance improvements without requiring any SQL statement modifications. For details, see Two-phase window function optimization.
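For illustration, a query of the shape this optimization targets is a top-N-per-group query with a selective `rank()` filter. The table and column names below are hypothetical; no SQL changes are needed to benefit:

```sql
-- Top 3 orders per customer by amount. The selective rank() filter
-- (rank() < N) is the pattern that triggers the two-phase optimization.
SELECT customer_id, order_id, amount
FROM (
    SELECT customer_id, order_id, amount,
           rank() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS r
    FROM orders
) t
WHERE r < 4;
```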
ORCA supports using Append operator to scan partitioned tables: The GPORCA optimizer now supports the Append operator as an alternative for scanning partitioned tables, addressing the limitation where the vectorized executor does not support multi-threaded metadata reading with Dynamic Scan. Users can enable this feature via the GUC parameter `optimizer_disable_dynamic_table_scan`, allowing the vectorized executor to leverage its performance advantages in partitioned table scenarios. For details, see Support using Append operator to scan partitioned tables.
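A minimal sketch of toggling the behavior (the value semantics are assumed from the parameter name — setting it to `on` steers GPORCA away from Dynamic Scan; the table name is illustrative):

```sql
-- Prefer Append over Dynamic Scan for partitioned table scans
-- in this session, then inspect the chosen plan.
SET optimizer_disable_dynamic_table_scan = on;
EXPLAIN SELECT * FROM sales_partitioned WHERE region = 'east';
```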
Parallelize UNION operations: Implements parallel processing for UNION operations by introducing a “Parallel-oblivious Append” mechanism, which allows multiple worker processes to handle data independently without sharing resources. This optimization improves the performance and reliability of queries containing UNION, and users can observe parallel execution operators such as HashAggregate and Redistribute Motion in the execution plan.
For details, see Execute UNION operations in parallel.
Push down runtime filters to dynamic SeqScan: The runtime filtering feature in the GPORCA optimizer has been extended to the DynamicSeqScan operator, allowing runtime filters to be pushed down from the HashJoin operator to the partitioned table scan phase. By filtering data before it is scanned, this feature effectively reduces the amount of data to be processed, greatly improving join query performance on partitioned tables. This feature can be enabled using the GUC parameter `gp_enable_runtime_filter_pushdown`. For details, see Push down filters to dynamic scans in GPORCA.
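A sketch of the kind of query that benefits — a hash join where a selective dimension-side filter can be pushed into the partitioned fact-table scan. Table and column names are illustrative:

```sql
-- Enable runtime filter pushdown for this session, then run a join
-- against a partitioned fact table; the filter built on dim_users
-- is applied during the DynamicSeqScan of fact_events.
SET gp_enable_runtime_filter_pushdown = on;

EXPLAIN ANALYZE
SELECT f.*
FROM fact_events f
JOIN dim_users u ON f.user_id = u.user_id
WHERE u.signup_date > DATE '2025-01-01';
```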
Parallelize DISTINCT operations: Supports parallelizing `DISTINCT` operations in an MPP architecture by redistributing tuple data among multiple worker processes using the Motion operator. A multi-stage processing strategy is employed, including pre-deduplication via `Streaming HashAggregate`, data redistribution based on the `DISTINCT` expression, and a final deduplication stage. This greatly enhances `DISTINCT` query performance with large data volumes. For details, see Execute DISTINCT operations in parallel.
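As a sketch (table name illustrative), the multi-stage strategy can be observed in the plan of an ordinary `DISTINCT` query once parallel workers are in use:

```sql
-- With parallelism enabled, the plan shows pre-deduplication
-- (Streaming HashAggregate), a Redistribute Motion keyed on the
-- DISTINCT expression, and a final deduplication stage.
EXPLAIN SELECT DISTINCT user_id FROM page_views;
```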
Enhance `EXPLAIN ANALYZE` details for hash aggregation: Adds detailed hash aggregation statistics to the `EXPLAIN ANALYZE` output, including key metrics such as the number of hash tables, group and batch processing details, number of overflow partitions, disk space used, and hash collision chain length. These details help users better understand the memory usage and performance characteristics of hash aggregation operations, facilitating the diagnosis of query performance issues and targeted optimization. For details, see Interpret the EXPLAIN ANALYZE output for hash aggregation.
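For example (table name illustrative), any grouping query run under `EXPLAIN ANALYZE` now carries the extra statistics on its HashAggregate node:

```sql
-- The HashAggregate node in this output now reports hash table count,
-- batch details, overflow partitions, disk usage, and collision
-- chain length.
EXPLAIN ANALYZE
SELECT region, count(*)
FROM orders
GROUP BY region;
```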
Optimize locks for `SELECT ... FOR UPDATE`: Introduces a new lock optimization feature for `SELECT ... FOR UPDATE` and similar queries, which downgrades a relation-level lock to a tuple-level lock when certain conditions are met. Controlled by the `enable_lock_optimization` parameter, enabling this feature can considerably improve concurrent transaction performance and reduce lock wait times. The parameter is `false` by default to maintain compatibility. For details, see Lock optimization for SELECT … FOR UPDATE, enable_lock_optimization parameter.
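A minimal opt-in sketch (the feature is off by default; table name illustrative):

```sql
-- Opt in for this session: qualifying SELECT ... FOR UPDATE queries
-- take tuple-level rather than relation-level locks, reducing
-- contention between concurrent transactions.
SET enable_lock_optimization = on;

BEGIN;
SELECT * FROM accounts WHERE id = 42 FOR UPDATE;
-- ... modify the locked row, then COMMIT as usual ...
COMMIT;
```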
ORCA deduplicates superset conditions in `AND` clauses: Optimizes GPORCA's logic for deduplicating superset conditions when processing `AND` clauses, such as pruning redundant conditions in subquery scenarios, improving the efficiency of query optimization.

ORCA prunes unused columns in CTEs: Optimizes GPORCA's handling of Common Table Expressions (CTEs) to prune unused column references (`ColRefs`) in both the producer and consumer.

ORCA inlines basic wrapper functions: The GPORCA optimizer now inlines some basic wrapper functions, reducing the overhead of function calls and further improving query optimization performance.
Optimize PLT linking method: By using static linking instead of `libpostgres.so`, the procedure language table (PLT) function call overhead has been reduced, resulting in a 5-8% performance improvement in TPC-DS benchmarks.

Optimize AQUMV system: The AQUMV system now directly stores and uses the original SQL when processing materialized views, avoiding the overhead of rule parsing and thus improving query performance.
Optimize path generation for UNION operations: Optimizes the path generation logic for `UNION` operations by skipping mismatched partial paths, which improves the efficiency of query plan generation.
Storage
Optimize hash aggregation memory management: Comprehensively optimizes memory management for the Hash Aggregate operator when processing large-scale data. By raising the default value of the `hash_mem_multiplier` parameter to 2.0 and removing the extra 16-byte memory overhead of `NumericAggState`, memory utilization is greatly improved, effectively reducing the risk of data spilling to disk due to insufficient memory. This optimization is completely transparent to users and can provide a substantial boost to query performance when dealing with large datasets or scenarios with multiple grouping keys. For details, see Hash aggregation memory management optimization, hash_mem_multiplier parameter.
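`hash_mem_multiplier` scales the memory budget for hash-based operations relative to `work_mem`, so with the new default of 2.0 a hash aggregate may use up to twice `work_mem` before spilling. It can be raised further for aggregate-heavy sessions if memory allows:

```sql
-- Inspect the new default (2.0), then raise it for this session.
SHOW hash_mem_multiplier;
SET hash_mem_multiplier = 3.0;
```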
Gracefully downgrades on AO table WAL compression failure: When WAL compression for AO/AOCO tables fails, the system gracefully downgrades by logging the event and reverting to the uncompressed WAL format, thus avoiding a `PANIC` error and improving system stability.

Unifies catalog table paths: To avoid path inconsistencies between QD and QE nodes, catalog table paths are now unified using `relid` instead of `relfilenode`, ensuring metadata consistency.
Data loading and external tables
External table supports HDFS user mapping: `datalake_fdw` external tables now support custom HDFS user access via `CREATE USER MAPPING`, replacing the default `gpadmin` system role. This feature implements fine-grained access control for HDFS resources, which is particularly suitable for permission management in multi-tenant or multi-user environments; users can configure different HDFS access users for different external tables based on actual needs. For details, see External table supports HDFS user mapping.
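A sketch of the mapping statement only: the server name and the option key carrying the HDFS user are assumptions here — consult the HDFS user mapping documentation for the exact option names that `datalake_fdw` accepts:

```sql
-- Map database role analyst_role to a dedicated HDFS user
-- (server name and option key are hypothetical).
CREATE USER MAPPING FOR analyst_role
    SERVER hdfs_server
    OPTIONS (user 'etl_user');
```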
Support for single-partition synchronization of Hive partitioned tables: Adds the `sync_hive_partition_table` function, which supports synchronizing a single partition of a Hive table rather than the entire table. The function synchronizes the data under the partition corresponding to the value specified for the top-level partition key (the first one in the partition column definition), improving the data synchronization efficiency of large partitioned tables. It is particularly suitable for incremental data synchronization scenarios: users can synchronize specific partitions as needed and avoid the overhead of full-table synchronization. For details, see Use sync_hive_partition_table to synchronize partitioned Hive tables.
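A sketch of a call, with the caveat that the argument list below is an assumption (a Hive table identifier plus a top-level partition-key value) — see the `sync_hive_partition_table` documentation for the actual signature:

```sql
-- Hypothetical call: synchronize only the dt=2025-09-01 partition
-- of the Hive table web_logs instead of the whole table.
SELECT sync_hive_partition_table('hive_db', 'web_logs', 'dt=2025-09-01');
```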
Direct read of Iceberg tables (S3 storage): Supports direct access to Apache Iceberg tables stored in Amazon S3 or compatible object storage, without relying on external metadata catalogs (such as Hive Metastore or REST Catalog). Through `datalake_fdw` and `s3.conf` configuration, users can quickly perform read-only queries and analysis on existing Iceberg data, with support for both partitioned and non-partitioned tables. This feature is particularly suitable for quick data exploration and analysis needs in data lake scenarios. For details, see Load Iceberg table data from S3 (without external metadata service).
Tools and utilities
PL/Container: Adds support for the PL/Container extension, allowing users to run PL/Python user-defined functions (UDFs) securely within Docker containers. Containerized isolation ensures that user code cannot access the host operating system or file system, effectively preventing security risks. PL/Container supports Python 2.7, Python 3.6+, and R, and is built into SynxDB with no additional installation required.
For details, see PL/Container extension.
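A minimal sketch of a containerized PL/Python UDF. The `# container:` comment line selects a runtime configured by the administrator; the runtime id `plc_python_shared` is an assumption for this example:

```sql
CREATE EXTENSION IF NOT EXISTS plcontainer;

-- The function body runs inside the Docker container named by the
-- "# container:" directive, isolated from the host.
CREATE FUNCTION py_add(a int, b int) RETURNS int AS $$
# container: plc_python_shared
return a + b
$$ LANGUAGE plcontainer;

SELECT py_add(1, 2);
```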
`pg_partman` partition management extension: Adds built-in support for the `pg_partman` extension, which automates the creation and management of partitioned tables based on time or sequence. Users do not need to manually compile and install it; enabling the extension in the database simplifies the maintenance of partitioned tables. For details, see Use `pg_partman` to manage partitioned tables.

Improve `pg_dump` lock table logic: Optimizes the lock-table logic of the `pg_dump` tool, reducing communication overhead between the frontend and backend and improving backup efficiency.
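To illustrate the `pg_partman` support above, a typical time-based setup follows upstream `pg_partman` usage (a 4.x-style `create_parent` call; argument names can differ between versions, and the parent table must already be partitioned by the control column):

```sql
-- pg_partman is conventionally installed into its own schema.
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman SCHEMA partman;

-- Register public.events for automated daily partition maintenance
-- keyed on the event_time column.
SELECT partman.create_parent(
    p_parent_table => 'public.events',
    p_control      => 'event_time',
    p_type         => 'native',
    p_interval     => 'daily'
);
```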
Interactive manager DBCC
Note
Starting from v4.1.0, the interactive manager CBCC has been renamed to DBCC.
| Feature | User document |
|---|---|
| Supports coordinator automatic failover. | |
| Supports binding virtual IP. | |
| Supports modifying user password. | |
| Supports customizing service ports. | |
| Supports disabling multi-language. | |
Product change information
Configuration parameters
`gp_appendonly_insert_files` parameter default value change: The default value of the system parameter `gp_appendonly_insert_files` has been changed to 0. If users need to prepare AO/AOCO files for parallel queries, they must manually set this parameter before data insertion.
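A minimal sketch of the manual step now required (the value 4 and the table names are illustrative):

```sql
-- Default is now 0. To pre-create multiple AO/AOCO segment files
-- for later parallel queries, set the parameter before inserting.
SET gp_appendonly_insert_files = 4;
INSERT INTO my_ao_table SELECT * FROM staging_table;
```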
Installation and deployment
Environment script renamed: The environment script `greenplum_path.sh` has been officially renamed to `cloudberry-env.sh` to be consistent with Apache Cloudberry™ (Incubating). See Deploy on multi-node.

PAX build optimization: To simplify dependencies and the build process, local source code has replaced the `cpp-stub` submodule.
Data lake and external integration
Default data lake agent server address set to localhost: The default binding address of the `datalake_agent` service has been changed to `localhost`, ensuring that the proxy only accepts connections from the local database instance. This security design effectively prevents the proxy service port from being exposed on the network, enhancing data lake access security.

Data lake agent default port change: The default port of the data lake agent `datalake_agent` has been changed from `5888` to `3888` to resolve the port conflict with PXF.

Default gpfdist compression level adjustment: The default compression level of the gpfdist tool has been adjusted from 1 to 3. The new default value aims to achieve a better balance between CPU overhead and network traffic, thereby improving the performance of most network-intensive ETL tasks.

gpfdist socket buffer optimization: Optimizes the performance of the gpfdist tool by automatically increasing the socket buffer size when compression is enabled, improving data loading efficiency.
Removed features
Remove QuickLZ compression algorithm support: Due to potential security risks and lower compression efficiency, support for the QuickLZ compression algorithm has been removed from the system. Users are recommended to migrate to more modern and efficient compression algorithms such as Zstandard (`zstd`) or `zlib`.
Bug fixes
- Fixed the `pg_dump` tool to correctly handle AO/AOCO tables with `appendonly` and `orientation` options, thereby supporting backup operations on these tables.
- Fixed the internal database version check logic of `pg_upgrade`.
- Fixed the issue where metadata in the `gp_matview_aux` table was not synchronized when renaming materialized views.
- Fixed the system crash and incorrect calculation results that could occur when calling `pg_relation_size` on PAX partitioned tables due to improper handling of the root node access method (AM).
- Optimized the storage space usage of TOAST data in AO row-storage tables.
- The `initdb` tool now prohibits the use of an empty username (`-U ""`) and explicitly reports an error, improving the robustness of initialization.
- Fixed the symbol export conflict between the `postgres` binary and the `libpq` library, improving system stability.
- Optimized the exception handling logic of Interconnect and fixed a stack overflow caused by recursive calls, improving system stability in exceptional scenarios.
- Fixed a possible invalid write in `MotionConn` during explicit data redistribution (Motion), improving query execution stability.
- Fixed the issue where parallel path generation might leave `consider_parallel` disabled due to improper handling of a `NULL` bitmap.
- Fixed the issue where the parallel degree might be calculated inaccurately due to incorrect `activeTaskCount` statistics in dynamic parallelism.
- Fixed the issue where `COUNT(*)` and `COUNT(column)` might be mixed up in the vectorized executor, ensuring accurate aggregation results.
- Fixed conversion issues for `Decimal` and `Timestamp-Millis` types when reading Parquet and ORC files, improving data lake query accuracy.
- Fixed compatibility issues that could arise from the ICU library not supporting certain database encodings.
- Fixed a system crash (core dump) that could occur when executing data lake related queries in a single-Coordinator deployment environment.
- Fixed incorrect query results in GPORCA caused by aggregation number (`aggno`, `aggtransno`) generation errors, which could occur in `HAVING` clause scenarios.
- Fixed an invalid attribute mapping issue that could occur when runtime filter pushdown is enabled.
- Fixed a recovery failure caused by erroneously ported code in the `gprecoverseg` tool.
- Improved the `gpfdist` tool's compatibility with the `libevent 2.0+` library, and enhanced its stability in high-concurrency scenarios by using the thread-safe `event_base` API instead of `event_init`.
- Enhanced problem diagnosis: when the writer (Writer Gang) is lost, the system now prints detailed stack trace information.
- Fixed the issue where materialized view status might be incorrect when processing partitioned tables.
- Fixed a limitation that might occur when processing a large number of object OIDs.