v4.2.0 Release Notes

Release date: October 2025

Version: v4.2.0

SynxDB v4.2.0 introduces a suite of advancements designed to improve data lakehouse integration, AI-readiness, query performance, storage optimization, and operational observability.

  • Data lakehouse integration: Expands data lake capabilities with support for reading Apache Iceberg tables directly from Amazon S3 through the Polaris Catalog. It also introduces granular configuration parameters for HDFS and Alibaba OSS access, along with enhanced local file protocol support.

  • AI-readiness: Features the new SynxDB MCP (Model Context Protocol) service, streamlining the integration of Large Language Models (LLMs) and other AI tools with the database.

  • Query performance & storage optimization: Implements runtime filter pushdown directly to the Table Access Method (AM), significantly reducing data scanned and accelerating query execution. For storage, column-level LZ4 compression is now available for PAX tables, offering a superior balance of compression ratio and decompression speed.

  • Observability & reliability: Enhances system management with CBDR (Continuous Backup and Disaster Recovery) for robust, continuous archiving and recovery. This release also includes multiple DBCC (Database Console Command) enhancements for deeper diagnostics and introduces new summary views that aggregate information from multiple gp_* system views, simplifying cluster monitoring and management.

New features

Database

| Category | Feature | User documents |
| --- | --- | --- |
| Data federation and lakehouse integration | Supports reading Iceberg tables stored in S3 via Polaris Catalog (currently read-only). | Read Iceberg tables from object storage (Polaris Catalog) |
| Data federation and lakehouse integration | Adds HDFS/OSS-related configuration parameters to optimize connection, routing, and context updates. | Configuration parameters (HDFS/OSS) |
| Data federation and lakehouse integration | The file:// external table protocol now supports ON COORDINATOR for loading data from local files on the coordinator node. | Load local files using the file:// protocol |
| AI-ready | Provides the SynxDB MCP service, a secure database interface for LLM applications. | MCP service |
| Query performance and data storage optimization | Pushes down runtime filters to the table access method (AM), using PAX min/max statistics to accelerate scans. | Optimize HashJoin query performance |
| Query performance and data storage optimization | Adds LZ4 compression support for PAX column storage, expanding the available compression options and optimizing read/write performance and storage usage. | PAX table format |
| Observability & reliability | Adds summary system views that provide cluster-wide aggregated views for better observability and capacity assessment. | System views |
| Observability & reliability | Adds gp_resource_group_cgroup_parent to customize the cgroup v2 root directory name. | Manage resources with resource groups; Parameter list |
| Observability & reliability | Introduces CBDR for continuous archiving and recovery (PITR/hot standby), supporting cross-site incremental replication and read-only services. | CBDR continuous archiving and recovery |

Interactive manager DBCC

| Feature | User documents |
| --- | --- |
| Supports Standby Missing and VIP status alerts | Alert template configuration |
| Enhanced SQL query monitoring (filtering, plan text, etc.) | SQL monitoring information |
| Enhanced cluster monitoring (Cluster Metrics time window) | View cluster status |
| Enhanced table queries (fuzzy table names, sub-partition sizes) | Alerts and configuration |
| Supports modifying login names | Installation and configuration |

New feature details

Data federation and lakehouse integration

  • Read Iceberg tables on S3 via Polaris Catalog (read-only): Connect SynxDB to an external Polaris Catalog to query Iceberg tables stored in S3 or compatible object storage directly. Only SELECT queries are currently supported, making this suitable for read-only analysis and exploration of data lakes.

    See: Read Iceberg tables from object storage (Polaris Catalog)
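
    A minimal sketch of what the setup and a read-only query might look like; the wrapper and option names below are hypothetical placeholders, not the exact SynxDB syntax (the linked document has the authoritative steps):

    ```sql
    -- Illustrative only: the FDW name and all OPTIONS are hypothetical placeholders.
    CREATE SERVER polaris_lake
        FOREIGN DATA WRAPPER datalake_fdw   -- assumed wrapper name
        OPTIONS (catalog_type 'polaris',
                 uri 'https://polaris.example.com',
                 warehouse 's3://my-bucket/warehouse');

    CREATE FOREIGN TABLE iceberg_orders (
        order_id   bigint,
        amount     numeric,
        order_date date
    ) SERVER polaris_lake OPTIONS (table_name 'sales.orders');

    -- Only SELECT is supported in v4.2.0 (read-only).
    SELECT order_date, sum(amount)
    FROM iceberg_orders
    GROUP BY order_date;
    ```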

  • HDFS/OSS access related GUC parameters: Adds several new parameters to optimize behaviors such as connections, routing, and context handling. For example:

    • pg_gophermeta.gphdfs_configure_router: Indicates whether to configure multiple routers.

    • pg_gophermeta.gopher_hash_connect_hdfs_router: Hashes traffic by Segment ID in a multi-router scenario.

    • pg_gophermeta.gopher_connect_hdfs_disable_getstate: Controls whether to disable the getFsStats RPC.

    • pg_gophermeta.gopher_enable_update_oss_context: Controls whether to update OSS context information.

    For more information, see: Parameter list
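
    A minimal sketch of adjusting these parameters at the session level; the boolean values are illustrative only, and the Parameter list documents each parameter's default and scope:

    ```sql
    -- Illustrative session-level settings; check the Parameter list for defaults and scope.
    SET pg_gophermeta.gphdfs_configure_router = on;
    SET pg_gophermeta.gopher_hash_connect_hdfs_router = on;
    SET pg_gophermeta.gopher_connect_hdfs_disable_getstate = off;
    SET pg_gophermeta.gopher_enable_update_oss_context = on;
    ```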

  • file:// protocol supports ON COORDINATOR: Allows external tables to load data from local files on the coordinator node, simplifying data import and debugging when the source files live only on the coordinator host.

    See: Load local files using the file:// protocol
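
    A sketch of typical usage, assuming the clause placement shown below (the linked document has the authoritative syntax):

    ```sql
    -- Read a file that lives on the coordinator host (hostname and path are placeholders).
    CREATE EXTERNAL TABLE ext_orders (
        order_id bigint,
        amount   numeric
    )
    LOCATION ('file://coordinator-host/data/orders.csv') ON COORDINATOR
    FORMAT 'CSV';

    -- Import the data into a regular table.
    CREATE TABLE orders AS SELECT * FROM ext_orders DISTRIBUTED BY (order_id);
    ```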

  • Optimized datalake list operations: List operations now run centrally on the coordinator node instead of being distributed across every segment, eliminating unnecessary overhead, reducing latency, and lowering resource usage.

AI-ready

  • SynxDB MCP service: Provides a standardized and secure database interface for LLM applications. It includes built-in protection against SQL injection, parameterized queries, connection pooling, and sensitive table protection, making it suitable for intelligent data queries, automated operations, and AI-assisted development.

    See: MCP service

Query performance and data storage optimization

  • Runtime filter pushdown to the table access method (AM): When scanning PAX tables, pushed-down runtime filters use column-level min/max statistics to skip non-matching data files before they are read, significantly reducing I/O and the amount of data passed to downstream operators. The benefit is largest when the query uses a HashJoin, the outer (probe-side) table is a PAX table, min/max statistics are enabled, and the feature switch is turned on.

    See: Optimize HashJoin query performance
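
    A hedged sketch of a query shape that benefits; the PAX reloption and the GUC name below are assumptions for illustration only (the linked document lists the actual switches):

    ```sql
    -- Assumed setup: a PAX fact table with min/max statistics on the join key.
    CREATE TABLE customers (customer_id bigint, region text) DISTRIBUTED BY (customer_id);
    CREATE TABLE sales (
        sale_id     bigint,
        customer_id bigint,
        amount      numeric
    ) USING pax WITH (minmax_columns = 'customer_id')   -- reloption name assumed
    DISTRIBUTED BY (sale_id);

    SET pax.enable_sparse_filter = on;   -- switch name assumed; see the linked doc

    -- HashJoin with the PAX table on the outer (probe) side: the runtime filter built
    -- from the inner side is pushed into the PAX scan and skips non-matching files.
    EXPLAIN ANALYZE
    SELECT c.region, sum(s.amount)
    FROM sales s JOIN customers c ON s.customer_id = c.customer_id
    WHERE c.region = 'EU'
    GROUP BY c.region;
    ```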

  • PAX supports LZ4 compression: Adds LZ4 column-level compression to the existing zlib/zstd options, balancing compression ratio with read/write performance.

    See: PAX table format
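
    A minimal sketch, assuming PAX accepts a column-level ENCODING clause for compression settings; the PAX table format document has the exact syntax:

    ```sql
    -- Column-level LZ4 on a PAX table (clause syntax assumed; see the linked doc).
    CREATE TABLE events (
        event_id bigint,
        payload  text ENCODING (compresstype = lz4)
    ) USING pax
    DISTRIBUTED BY (event_id);
    ```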

Observability and reliability

  • Adds and enhances multiple summary views based on Segment statistics (for example, gp_stat_*_summary and gp_statio_*_summary series). These cover usage and I/O metrics for activities, processes, archiving, databases, DDL operations, ANALYZE/CLUSTER/VACUUM/COPY progress, and objects like tables, indexes, sequences, and system catalogs, facilitating observability analysis and capacity assessment from a cluster-wide perspective.

    See: System views
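
    For example, assuming the summary views mirror the column layout of their upstream pg_stat/pg_statio counterparts, a cluster-wide hot-table check might look like:

    ```sql
    -- Aggregated table I/O across all segments (view name follows the gp_statio_*_summary series).
    SELECT schemaname, relname, heap_blks_read, heap_blks_hit
    FROM gp_statio_all_tables_summary
    ORDER BY heap_blks_read DESC
    LIMIT 10;
    ```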

  • Additional cgroup v2 level and custom parent directory: Adds the gp_resource_group_cgroup_parent parameter to customize the cgroup root directory name (defaults to gpdb.service), adapting to different operating systems and runtime environments. A restart is required for changes to take effect.

    See: Manage resources with resource groups; Parameter list
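
    To check the value currently in effect, for example:

    ```sql
    -- Defaults to 'gpdb.service'; changing it requires setting the parameter cluster-wide
    -- and restarting the database.
    SHOW gp_resource_group_cgroup_parent;
    ```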

  • CBDR continuous archiving and recovery: Combines full backups with WAL archiving to achieve continuous data protection and Point-in-Time Recovery (PITR), supporting hot standby for read-only queries. It is suitable for business scenarios such as cross-site disaster recovery, incremental replication, and read-only traffic offloading.

    See: CBDR continuous archiving and recovery

  • Enhanced DBCC alert capabilities: Includes built-in event templates for Standby Missing, VIP disconnection, etc., supporting continuous maintenance and alert triggering.

    See: Alert template configuration

  • Enhanced DBCC SQL query monitoring: Query History now supports filtering by submitted_time, user, and database. Query Details now supports displaying the Text plan. A PID column has been added to the list.

    See: SQL monitoring information

  • Enhanced DBCC cluster monitoring: Cluster Metrics now supports custom time ranges for retrospective analysis.

    See: View cluster status

  • Enhanced DBCC database table queries: Supports fuzzy search for table names and displaying properties like sub-partition sizes.

    See: Alerts and configuration

  • DBCC login name management: Supports modifying login names, improving user account management capabilities.

    See: Installation and configuration

Product change information

Metadata

  • Adds the following system summary views to provide aggregated results for progress and statistics across the cluster:

    • gp_stat_progress_vacuum_summary: Aggregates the distributed progress from gp_stat_progress_vacuum. For replicated tables (policytype='r'), count-based metrics are normalized by numsegments to provide a cluster-wide view of VACUUM progress.

    • gp_stat_progress_analyze_summary: Aggregates the distributed progress from gp_stat_progress_analyze. Includes key metrics such as sampled blocks, extended statistics, and child table processing progress, with segment-wise normalization for replicated tables.

    • gp_stat_progress_cluster_summary: Aggregates the distributed progress from gp_stat_progress_cluster. Includes metrics such as command, phase, number of index rebuilds, heap tuples scanned/written, and block-level statistics, with segment-wise normalization for replicated tables.

    • gp_stat_progress_create_index_summary: Aggregates the distributed progress from gp_stat_progress_create_index. Includes metrics such as locking progress, and the number of blocks/tuples/partitions processed, with segment-wise normalization for replicated tables.

  • Implementation note: Adds system_views_gp_summary.sql, which centralizes the definitions of the SynxDB summary views listed above.
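
As an illustration, a coordinator-side query against one of these views, assuming its columns mirror the upstream pg_stat_progress_vacuum view:

```sql
-- Cluster-wide VACUUM progress, normalized across segments.
SELECT datname,
       relid::regclass AS table_name,
       phase,
       heap_blks_scanned,
       heap_blks_total
FROM gp_stat_progress_vacuum_summary;
```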

GUC configuration parameters

  • pax_enable_sparse_filter is renamed to pax.enable_sparse_filter.

  • pax_enable_row_filter is renamed to pax.enable_row_filter.

  • pax_scan_reuse_buffer_size is renamed to pax.scan_reuse_buffer_size.

  • pax_max_tuples_per_group is renamed to pax.max_tuples_per_group.

  • pax_max_tuples_per_file is renamed to pax.max_tuples_per_file.

  • pax_max_size_per_file is renamed to pax.max_size_per_file.

  • pax_enable_toast is renamed to pax.enable_toast.

  • pax_min_size_of_compress_toast is renamed to pax.min_size_of_compress_toast.

  • pax_default_storage_format is renamed to pax.default_storage_format.

  • pax_bloom_filter_work_memory_bytes is renamed to pax.bloom_filter_work_memory_bytes.

  • The default value of pg_gophermeta.gopher_local_capacity_mb is changed from 10240 to 1024000.
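
A minimal sketch of the new dotted spellings; the values shown are illustrative only:

```sql
-- Use the pax.* names introduced in v4.2.0.
SET pax.enable_toast = on;
SET pax.max_tuples_per_group = 131072;   -- illustrative value
SHOW pax.default_storage_format;
```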

Components

  • Upgraded MADlib to version 2.1.0.

  • Updated Gopher version to v4.0.21.

Bug fixes

Query optimizer and executor

  • Fixed an issue where the Locus of a Shared Scan could be empty: When gp_cte_sharing is enabled, the Locus type and parallelism of the Shared Scan are explicitly set to avoid a NULL locus in the query plan.

  • Fixed parallel path worker allocation: Moves the assignment of parallel_workers before the assertion and uses pathnode->parallel_workers in the check, correcting an exception where the number of parallel workers was 0 for partial paths.

  • Fixed locus issue for writable CTEs on replicated tables: In set operations (for example, WITH ... RETURNING and EXCEPT), ensures that the loci are correctly set to SingleQE or Entry to avoid errors caused by inconsistencies between replicated and partitioned loci.

  • Fixed an issue where make_grouping_rel() did not preserve relid and cdbpolicy: Correctly passes these fields from input_rel to grouped_rel, preventing crashes in extensions that rely on these fields in create_upper_paths_hook.

Vectorized executor

  • Fixed memory leaks: Ensures consistent calls to FreeVecExecuteState() in ExecEndVecXXX() for all vectorized operators so that Arrow plans and related structures are correctly released.

Storage and access methods (Table AM / PAX / DataLake)

  • Fixed Table AM sampling logic: When a table access method implements relation_acquire_sample_rows, that method is now called directly to collect sample rows, so sampling follows the access method's own implementation.

  • Fixed PAX related GUC issues: Corrects the handling of PAX GUC configuration items to ensure settings take effect and maintain system stability.

  • Fixed a segmentation fault when reading Iceberg tables: Allocates a buffer using palloc when encountering an empty buffer while reading Iceberg data, preventing a crash.

  • Fixed a liboss2 issue by setting the curl operation timeout (curlopt) to 180 seconds, resolving 403 errors.

Resource groups and cgroup management

  • Fixed unresponsiveness of ALTER RESOURCE GROUP ... SET IO_LIMIT '-1': Cleans up io.max and synchronously updates pg_resgroupcapability to ensure parameter updates take effect immediately.

  • Fixed cgroup directory deletion logic: Deletes only the leaf directories of the Greenplum cgroup in cgroup v2 mode, avoiding cascading issues from failed deletions in cgroup v1 mode.

  • Fixed a double-free issue in the IO Limit callback: Correctly frees and resets the old io_limit pointer in alterResgroupCallback to avoid potential double-free problems.

  • Fixed hardcoded cgroup root directory name in gpcheckresgroupv2impl: Adds get_cgroup_parent() to dynamically read the gp_resource_group_cgroup_parent parameter, replacing hardcoded paths and improving error messages.

  • Fixed instability from concurrent directory creation for resource group IO Limit: When multiple Segments on the same host create directories concurrently, it now catches and ignores “already exists” errors, improving stability.

  • Improved the robustness of IO Limit behavior:

    • Added a function to clean up io.max, ensuring state consistency when changing IO_LIMIT.

    • Added a check for IO limit associations when deleting a tablespace to prevent accidental deletion.

    • Downgraded some IO Limit parsing (parseio) errors to WARNING during InitResGroup/AlterResourceGroup so the cluster can start smoothly in exceptional scenarios.

System views

  • Fixed adaptation issues in system views after merging upstream code: Ensures that relevant view functions and queries work correctly.

  • Fixed naming and structure of pg_stat_all_tables/pg_stat_all_indexes: Restores the original definitions (single Segment statistics), adds the generation of cross-Segment aggregate views gp_stat_all_tables/gp_stat_all_indexes, and introduces summary versions gp_stat_all_tables_summary/gp_stat_all_indexes_summary. Also adds pg_stat_user_tables_summary/pg_stat_user_indexes_summary and improves regression tests.

Processes and concurrency (Gang/Writer/Interconnect)

  • Fixed occasional unresponsiveness due to “writer proc entry not found”: Introduces a configurable retry duration find_writer_proc_retry_time for forking gangs in asynchronous mode, allowing for longer startup waits to accommodate complex environments and reduce jitter.

Compilation and build

  • Fixed --disable-orca build failure: Removes the conditional compilation wrapper around the OptimizerOptions structure, ensuring successful compilation in configurations without ORCA.