v4.3.0 Release Notes

Release date: December 2025

Version: v4.3.0

SynxDB v4.3.0 delivers enhancements in data federation, AI integration, query performance, and operational observability.

  • Data federation and lakehouse integration: Introduces cloudberry_fdw for high-performance cross-cluster data access and optimized datalake list operations for better scalability.

  • Search and AI-ready: Launches pg_search for advanced full-text search and AIFun for seamless LLM integration, along with an upgraded MADlib library for in-database machine learning.

  • Query processing and data storage optimization: Enables ORCA intra-segment parallelism for faster analytics, introduces Streaming Hash Aggregation for improved memory management, adds JIT (Just-In-Time) compilation for CPU-bound queries, and includes PAX RLE batch encoding for efficient storage.

  • Observability and reliability: Enhances DBCC (Database Console Command) with resource management, monitoring display, and configuration management optimizations for improved operational control.

New features

Database Lightning

| Category | Feature | User documents |
| --- | --- | --- |
| Data federation and lakehouse integration | Introduces cloudberry_fdw foreign data wrapper for high-performance cross-cluster data access. | cloudberry_fdw |
| Data federation and lakehouse integration | Upgrades Gopher to v4.0.23 with HTTP status code statistics, no-cache write support, and enhanced monitoring. | Configuration parameters |
| Search and AI-ready | Launches pg_search full-text search extension with BM25 algorithm support. | pg_search extension |
| Search and AI-ready | Introduces AIFun extension for seamless LLM integration with major providers. | AIFun extension |
| Query processing and optimization | Enables ORCA intra-segment parallel execution for table scans, hash joins, and aggregations. | Execute queries in parallel |
| Query processing and optimization | Introduces Streaming Hash Aggregation for multi-phase aggregation plans to minimize overhead and prevent disk spills. | |
| Query processing and optimization | Implements next-generation interconnect protocol (UDP2) to decouple the interconnect layer from the database kernel. | |
| Query processing and optimization | Introduces adaptive motion timeout mechanism with dynamic threshold adjustment and RTT estimation. | |
| Query processing and optimization | Adds JIT (Just-In-Time) compilation using LLVM to optimize CPU-bound analytical queries. | JIT compilation |
| Storage | Adds PAX RLE batch encoding support for improved compression and decoding efficiency. | PAX table format |

Interactive manager DBCC

| Feature | User documents |
| --- | --- |
| Supports modifying and deleting cluster names. | Manage clusters |
| Supports converting JSON-format query plans to text format and listing table skew and bloat information. | SQL monitoring information |
| Supports direct configuration of postgresql.conf and pg_hba.conf files. | Configure database |

New feature details

Data federation and lakehouse integration

  • cloudberry_fdw foreign data wrapper: Based on PostgreSQL’s postgres_fdw and deeply optimized for the SynxDB MPP architecture, it provides parallel read/write capabilities across clusters, avoiding bottlenecks from data aggregation on the coordinator node. It is suitable for scenarios like high-speed migration/synchronization, data federation, ETL, and read/write splitting.

    See cloudberry_fdw.

  • Gopher service upgrade: Upgrades Gopher to v4.0.23, introducing HTTP status code statistics, no-cache write support, HTTP request monitoring classification, and retry count statistics. In addition, the default value of the gopher_local_capacity_mb parameter is set to 1024000 to optimize tool execution defaults.

Search and AI-ready

  • pg_search full-text search extension: Based on Tantivy and the pgrx framework, it provides high-performance full-text search using the BM25 algorithm, supporting complex boolean and phrase queries, as well as aggregate function pushdown. It is suitable for online analysis and report generation.

    See pg_search extension.
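The BM25 ranking function that pg_search builds on can be sketched in a few lines. This is a conceptual illustration of the scoring formula only, not extension code; the function and variable names are ours:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query using classic
    Okapi BM25. k1 and b are the standard tuning constants."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        s = 0.0
        for t in query_terms:
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # Term-frequency saturation, normalized by document length.
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            s += idf * tf[t] * (k1 + 1) / denom
        scores.append(s)
    return scores
```

Repeated occurrences of a term raise a document's score with diminishing returns, and longer documents are penalized; documents without the term score zero.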

  • AIFun extension: A new PostgreSQL extension that seamlessly integrates major Large Language Models (LLMs) like OpenAI, Anthropic, and Google Gemini into the database. It offers a comprehensive set of AI functions for text generation, analysis, and multimodal processing, empowering users to build intelligent applications directly within SQL. The extension prioritizes security with simple API key management and robust user isolation via Row Level Security (RLS).

    See AIFun extension.

Query processing and optimization

  • Streaming HashAgg for multi-phase aggregation: Updates the Postgres planner to use Streaming Hash Aggregation for multi-phase aggregation plans, aligning with ORCA's optimization logic. This change minimizes overhead and prevents the disk spills commonly seen with non-streaming aggregation when processing data with many unique values, improving overall performance. A new GUC parameter, gp_use_streaming_hashagg (default: on), toggles streaming hash aggregation in the first phase of multi-phase aggregations; it also prevents plan divergences in PAX test cases, facilitating consistent result verification.
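The idea behind streaming the first aggregation phase can be sketched as follows. This is a conceptual illustration, not SynxDB executor code, and all names are hypothetical: when the in-memory hash table fills, partial aggregates are emitted downstream and the table is reset, instead of spilling the table to disk.

```python
def streaming_partial_agg(rows, max_groups):
    """First-phase hash aggregation in 'streaming' mode: when the hash
    table reaches max_groups entries, emit the partial sums downstream
    and reset, rather than spilling the table to disk."""
    table = {}
    for key, value in rows:
        if key not in table and len(table) >= max_groups:
            yield from table.items()   # stream partials downstream
            table = {}                 # reset instead of spilling
        table[key] = table.get(key, 0) + value
    yield from table.items()           # flush remaining partials

def final_agg(partials):
    """Second phase: merge partial sums per key after redistribution."""
    out = {}
    for key, value in partials:
        out[key] = out.get(key, 0) + value
    return out
```

A key may be emitted more than once by the first phase, which is harmless because the second phase merges partials; the benefit is that memory stays bounded without disk I/O even with many unique values.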

  • Next-generation interconnect protocol (UDP2): Implements the UDP2 protocol to fully decouple the interconnect layer from the database kernel, using a layered architecture with standardized C/C++ interfaces and lightweight data structures. This update introduces a CMake-based independent build system and a unified error handling mechanism, enabling independent evolution and kernel-free end-to-end testing. These improvements greatly simplify the extension and maintenance of the interconnect layer.

  • Adaptive motion timeout mechanism: Implements a dynamic timeout threshold adjustment and RTT estimation (Jacobson/Karels variant) for the interconnect layer. This optimization filters non-network delays and adapts to volatile latency, significantly reducing unnecessary retransmissions and improving stability and throughput in unreliable or congested network environments.
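The Jacobson/Karels estimator cited above can be sketched as below; this is the textbook variant with the standard gains of 1/8 (mean) and 1/4 (deviation), and the class name is ours, not SynxDB's:

```python
class RttEstimator:
    """Jacobson/Karels RTT estimation: a smoothed RTT plus a mean-deviation
    term drives the retransmission timeout, so the threshold adapts to
    volatile latency instead of staying fixed."""
    def __init__(self, first_sample):
        self.srtt = first_sample
        self.rttvar = first_sample / 2
    def update(self, sample):
        err = sample - self.srtt
        self.srtt += err / 8                          # gain 1/8 on the mean
        self.rttvar += (abs(err) - self.rttvar) / 4   # gain 1/4 on deviation
        return self.timeout()
    def timeout(self):
        # RTO = SRTT + 4 * RTTVAR
        return self.srtt + 4 * self.rttvar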

  • ORCA intra-segment parallel execution: GPORCA now supports worker-level parallelism within segments, enabling parallel execution for table scans, hash joins, and aggregations. This capability allows long-running queries to utilize multiple CPU cores per segment, greatly reducing execution time for compute-intensive workloads. The feature is controlled by standard PostgreSQL parallel query parameters (for example, max_parallel_workers_per_gather), ensuring seamless integration with existing configurations.

    See Execute queries in parallel.

  • JIT (Just-In-Time) compilation: Introduces Just-In-Time (JIT) compilation to turn interpreted expression evaluation into native program execution at run time. Instead of using general-purpose code to evaluate arbitrary SQL expressions, JIT generates functions specific to those expressions, which the CPU executes natively. By emitting native code with direct calls and constant folding, JIT avoids the indirect jumps and branches common in generic interpreted code. SynxDB uses LLVM for JIT compilation; it is an optional execution mode. The JIT workflow is fault tolerant: if the JIT library fails to load on segments (for example, because the dependency is not installed), execution automatically falls back to non-JIT interpretation without interrupting the query. JIT compilation primarily benefits long-running, CPU-bound queries such as analytical workloads, eliminating much of the interpretation overhead and speeding up query completion.

    See JIT compilation.
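The contrast between generic interpretation and per-expression specialization can be illustrated with a toy expression evaluator. Python closures stand in for the native code LLVM would emit; all names here are hypothetical, not SynxDB internals:

```python
# Generic interpreter: dispatches on node type for every row,
# paying branch and indirection cost per evaluation.
def interpret(expr, row):
    op = expr[0]
    if op == "col":   return row[expr[1]]
    if op == "const": return expr[1]
    if op == "add":   return interpret(expr[1], row) + interpret(expr[2], row)
    if op == "mul":   return interpret(expr[1], row) * interpret(expr[2], row)
    raise ValueError(op)

# "JIT": specialize the expression once into a direct function;
# all-constant subtrees are folded at compile time.
def compile_expr(expr):
    op = expr[0]
    if op == "col":
        i = expr[1]
        return lambda row: row[i]
    if op == "const":
        c = expr[1]
        return lambda row: c
    f, g = compile_expr(expr[1]), compile_expr(expr[2])
    if expr[1][0] == "const" and expr[2][0] == "const":
        v = interpret(expr, ())          # constant folding
        return lambda row: v
    if op == "add":
        return lambda row: f(row) + g(row)
    return lambda row: f(row) * g(row)
```

Compiling once and calling the specialized function per row is the same trade-off JIT makes: an up-front cost that pays off on long-running queries that evaluate the expression millions of times.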

Storage

  • PAX RLE batch encoding support: Introduces the pax.enable_rle_batch_encoding GUC parameter (default: off) to enable batch encoding for RLE in PAX storage format. When enabled, this feature allows for encoding data in batches, which both compresses the data and improves decoding efficiency.

    See PAX table format.
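Run-length encoding with batched decoding can be sketched as follows; this is a conceptual illustration of the technique, not the PAX on-disk format, and the function names are ours:

```python
def rle_encode(values):
    """Run-length encode a column chunk into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def rle_decode_batch(runs, batch_size):
    """Decode runs a batch at a time: each run expands in one step,
    so long runs of repeated values avoid per-row decode cost."""
    batch = []
    for v, n in runs:
        batch.extend([v] * n)
        while len(batch) >= batch_size:
            yield batch[:batch_size]
            batch = batch[batch_size:]
    if batch:
        yield batch
```

Encoding whole batches at once is what yields both the compression (runs replace repeats) and the decode speedup (one expansion per run rather than one operation per row).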

Observability and reliability

  • DBCC resource management optimization: DBCC now supports cluster name modification and deletion, enhancing compute resource management capabilities and flexibility in the interactive admin console. See Manage clusters.

  • DBCC monitoring display optimization: Supports converting JSON format query plans to text format and listing table skew and bloat information in DBCC, providing better visibility into query execution and data distribution for performance optimization. See SQL monitoring information.

  • DBCC configuration management optimization: Supports direct configuration of postgresql.conf and pg_hba.conf files through DBCC, enabling administrators to manage database and client authentication settings without direct file access. See Configure database.

Product change information

GUC configuration parameters

  • Add gp_use_streaming_hashagg to control streaming hash aggregation usage (Default: on).

  • Add pax.enable_rle_batch_encoding to control PAX RLE batch encoding (Default: off).

  • The default value of gopher_local_capacity_mb is changed to 1024000.

Components

  • Upgrade Gopher to version v4.0.23.

  • Upgrade MADlib to version 2.1.0.

  • Upgrade DBCC to version v1.4.0.

  • Add cloudberry_fdw extension.

  • Add pg_search extension.

  • Add AIFun extension.

Bug fixes

Query optimizer and executor

  • Fixed unnecessary distribution requests for single-phase parallel global aggregation in ORCA: Parallel global hash/stream aggregation requested distribution even in non-multi-phase scenarios, causing ORCA to prefer single-phase plans. This fix guards SetDistrRequests calls with fMultiStage so that only multi-phase global aggregation requests distribution, resolving the issue where TPC-H Q1 failed to select the expected two-phase aggregation plan.

  • Fixed fallback error when querying standalone AO tables due to MVCC system columns: AO tables do not support MVCC system columns (xmin/xmax/cmin/cmax). Querying AO tables with multiple DISTINCT aggregates would trigger “Invalid system target list found for AO table” errors. This fix skips these columns when building metadata for standalone AO tables, while preserving them for partition tables to maintain column mapping consistency.

  • Fixed inaccurate judgment of whether a relation is empty in ORCA: ORCA previously judged a relation as empty only when reltuples was -1, but reltuples of 0 also indicates an empty relation. This fix adjusts the judgment logic to accurately identify empty relations.

  • Fixed segmentation fault in ORCA when appending group statistics: Corrected the handling of group statistics appending to prevent crashes.

  • Fixed CTE-related issues: Resolved CTE prune instability and fixed a crash when readable CTEs contain SELECT INTO clauses. The original logic called transformWithClause twice, even though Cloudberry only supports one WITH clause per query level. This fix no longer calls the function when traversing CTEs to verify writability, and adds comprehensive test cases.

  • Fixed duplicate distribution keys from subqueries: While the parser prevents duplicate distribution keys in main query syntax, subqueries (especially window function PARTITION BY clauses) could still produce them. This fix triggers fallback processing logic when duplicate distribution keys are detected, ensuring correct degradation.

  • Fixed crash when a UDF in a subquery references an OuterParam: When volatile UDFs in subquery target lists referenced an OuterQuery Param, the SingleQE motion would block parameter setting, causing crashes. This fix adjusts the UDF execution-location strategy: volatile UDFs that reference an OuterParam run in OuterQuery, other volatile UDFs run in SingleQE, and other modified UDFs run in Segment General.

  • Fixed core dump when analyzing partition tables in certain scenarios: When partition tables were empty (reltuples and relpages both 0), leaf_parts_analyzed directly returned false without executing analyze, causing a core dump. This fix adjusts this logic to avoid skipping analyze for empty tables.

  • Fixed low execution efficiency caused by the hash table memory limit: When the hash table exceeded its memory limit, too little memory remained available, so only a small amount of data could be loaded and the rest had to spill to disk again, resulting in low execution efficiency. This fix destroys and recreates the hash table when the memory limit is exceeded, releasing memory for subsequent use.

Storage and access methods

  • Fixed database/tablespace size calculation not including PAX table subdirectories: pg_database_size/pg_tablespace_size previously only calculated physical files in the database/tablespace directory and did not recursively calculate PAX table independent storage subdirectories, leading to inaccurate size calculations. This fix makes db_dir_size recursively calculate subdirectories, including PAX table physical files.

  • Fixed incorrect table column numbering in funcTupleDesc initialization: When initializing funcTupleDesc in process_sample_rows, table columns should start from NUM_SAMPLE_FIXED_COLS + 1 (5) rather than 4. Although this error had no actual harm (subsequent code did not use type information, only column count), the fix ensures code correctness and maintainability.

  • Fixed delayed error detection and reporting for CFTYPE_EXEC external tables: CFTYPE_EXEC external tables previously checked for and reported errors only after external_getnext finished, which could miss errors when execution nodes were suppressed. This fix checks for and reports errors immediately when external_getnext returns null, ensuring timely detection and reporting.

Processes and concurrency

  • Fixed unnecessary network socket opening for auxiliary background worker processes: Auxiliary processes (ftsprobe, global deadlock detector, etc.) were opening unnecessary interconnect communication network sockets, posing security risks and consuming resources. This fix skips calling cdb_setup/cdb_cleanup for these processes in InitPostgres, resolving unnecessary network port exposure.

  • Fixed timeout retry logic synchronization issue in cdbgang_createGang_async: The creation timeout retry logic in cdbgang_createGang_async was not synchronized with the reader, causing the reader to prematurely judge abnormal termination when creation was slow due to platform/container/network reasons. This fix synchronizes the retry logic to avoid this issue.

  • Fixed UDP Motion layer exceptions in resource-constrained environments for 10 TB-scale, 8-parallel processing: Ported UDP motion layer fixes from ic_udpifc.c to UDP2 and fixed four types of exceptions in resource-constrained environments: (1) the receiver's buffer filled while the sender remained unaware, causing retransmission packet loss; added a buffer-full flag that notifies the sender to pause. (2) False deadlock detection; adjusted the deadlock-check timeout and ACK polling logic. (3) A full receive queue dropped packets without waking the main thread; the main thread is now awakened to process backlogged packets. (4) Node execution-time mismatches caused packet timeouts; added retransmission logs and reset the retry count.

Security and TDE

  • Fixed TDE-related issues: (1) Removed unnecessary shared memory usage for the TDE encrypted buffer context: the BufEncCtx/BufDecCtx shared memory size was not calculated and shared memory was not needed, so allocation was changed to malloc (TopMemoryContext is NULL during initialization, so palloc cannot be used). (2) Fixed a database exception when a backend panic occurred with TDE enabled: the postmaster now reinitializes TDE-related shared memory and sets global variables when restarting the database.

System commands and utilities

  • Fixed assertion failure when executing \dit+ command: The translate_columns array had 9 elements but the query returned 10 columns (when displaying both tables and indexes), causing assertion failure. This fix expands the array to 10 elements and correctly tracks cols_so_far when adding the Storage column.

  • Fixed typo from EXITS to EXISTS in SQL and code: Changed DROP TABLE IF EXITS and similar statements as well as code from EXITS to EXISTS, resolving SQL execution failures caused by syntax errors.

DBCC

  • Fixed pagination count selection issue: Corrected the logic for selecting pagination counts in the admin console.

  • Fixed username display error: Resolved issues with incorrect username display in the console interface.

Compilation and build

  • Fixed LLVM compilation errors during kernel upgrade: Resolved compilation errors introduced when LLVM compilation was not enabled during the kernel upgrade process.

  • Fixed compilation errors with newer versions of gcc/g++: Resolved return std::move(temp_value) errors reported by newer g++ versions, removed uninitialized variables, and added the -Wno-error=array-bounds compilation option to avoid protobuf header file errors.