v4.3.0 Release Notes
Release date: December 2025
Version: v4.3.0
SynxDB v4.3.0 delivers enhancements in data federation, AI integration, query performance, and operational observability.
- Data federation and lakehouse integration: Introduces `cloudberry_fdw` for high-performance cross-cluster data access and optimized data lake list operations for better scalability.
- Search and AI-ready: Launches `pg_search` for advanced full-text search and `AIFun` for seamless LLM integration, along with an upgraded MADlib library for in-database machine learning.
- Query processing and data storage optimization: Enables ORCA intra-segment parallelism for faster analytics, introduces Streaming Hash Aggregation for improved memory management, adds JIT (Just-In-Time) compilation for CPU-bound queries, and includes PAX RLE batch encoding for efficient storage.
- Observability and reliability: Enhances DBCC (Database Console Command) with resource management, monitoring display, and configuration management optimizations for improved operational control.
New features
Database Lightning
| Category | Feature | User documents |
|---|---|---|
| Data federation and lakehouse integration | Introduces `cloudberry_fdw` for high-performance cross-cluster data access. | |
| Data federation and lakehouse integration | Upgrades Gopher to v4.0.23 with HTTP status code statistics, no-cache write support, and enhanced monitoring. | |
| Search and AI-ready | Launches `pg_search` for advanced full-text search. | |
| Search and AI-ready | Introduces the AIFun extension for seamless LLM integration with major providers. | |
| Query processing and optimization | Enables ORCA intra-segment parallel execution for table scans, hash joins, and aggregations. | |
| Query processing and optimization | Introduces Streaming Hash Aggregation for multi-phase aggregation plans to minimize overhead and prevent disk spills. | |
| Query processing and optimization | Implements the next-generation interconnect protocol (UDP2) to decouple the interconnect layer from the database kernel. | |
| Query processing and optimization | Introduces an adaptive motion timeout mechanism with dynamic threshold adjustment and RTT estimation. | |
| Query processing and optimization | Adds JIT (Just-In-Time) compilation using LLVM to optimize CPU-bound analytical queries. | |
| Storage | Adds PAX RLE batch encoding support for improved compression and decoding efficiency. | |
Interactive manager DBCC
| Feature | User documents |
|---|---|
| Supports modifying and deleting cluster names. | |
| Supports converting JSON format query plans to text format and listing table skew and bloat information. | |
| Supports direct configuration of the `postgresql.conf` and `pg_hba.conf` files. | |
New feature details
Data federation and lakehouse integration
- `cloudberry_fdw` foreign data wrapper: Based on PostgreSQL's `postgres_fdw` and deeply optimized for the SynxDB MPP architecture, it provides parallel read/write capabilities across clusters, avoiding bottlenecks from data aggregation on the coordinator node. It is suitable for scenarios like high-speed migration/synchronization, data federation, ETL, and read/write splitting. See `cloudberry_fdw`.
- Gopher service upgrade: Upgrades Gopher to v4.0.23, introducing HTTP status code statistics, no-cache write support, HTTP request monitoring classification, and retry count statistics. In addition, the default value of the `gopher_local_capacity_mb` parameter is set to 1024000 to optimize tool execution defaults.
Search and AI-ready
- `pg_search` full-text search extension: Based on Tantivy and the `pgrx` framework, it provides high-performance full-text search using the BM25 algorithm, supporting complex boolean and phrase queries, as well as aggregate function pushdown. It is suitable for online analysis and report generation.
- AIFun extension: A new PostgreSQL extension that seamlessly integrates major Large Language Models (LLMs) like OpenAI, Anthropic, and Google Gemini into the database. It offers a comprehensive set of AI functions for text generation, analysis, and multimodal processing, empowering users to build intelligent applications directly within SQL. The extension prioritizes security with simple API key management and robust user isolation via Row Level Security (RLS). See AIFun extension.
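The BM25 scoring that `pg_search` builds on is a standard ranking formula. The following Python sketch is purely illustrative (it is not SynxDB or Tantivy code; the toy documents and the `k1`/`b` defaults are assumptions), showing how BM25 balances term frequency against document frequency and document length:

```python
import math

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with the classic BM25 formula."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # document frequency of each query term
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        s = 0.0
        for t in query_terms:
            tf = doc.count(t)
            if tf == 0:
                continue
            # rarer terms get a larger inverse-document-frequency weight
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            # term frequency saturates (k1) and is normalized by document length (b)
            s += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(s)
    return scores

docs = [
    "the quick brown fox".split(),
    "lazy dogs sleep all day".split(),
    "the fox jumps over the lazy dog".split(),
]
print(bm25_scores(["fox", "lazy"], docs))  # the third document matches both terms
```

The third document contains both query terms, so it receives the highest score; length normalization keeps long documents from winning on raw term counts alone.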
Query processing and optimization
- Streaming HashAgg for multi-phase aggregation: Updates the Postgres planner to use Streaming Hash Aggregation for multi-phase aggregation plans, aligning with Orca's optimization logic. This change minimizes overhead and prevents disk spills commonly seen with non-streaming aggregation when processing data with many unique values, resulting in better overall performance. Adds the `gp_use_streaming_hashagg` GUC parameter to toggle streaming hash aggregation usage in the first phase of multi-phase aggregations. This parameter is designed to prevent plan divergences in PAX test cases, thereby facilitating consistent result verification. It defaults to `on`.
- Next-generation interconnect protocol (UDP2): Implements the UDP2 protocol to fully decouple the interconnect layer from the database kernel, using a layered architecture with standardized C/C++ interfaces and lightweight data structures. This update introduces a CMake-based independent build system and a unified error handling mechanism, enabling independent evolution and kernel-free end-to-end testing. These improvements greatly simplify the extension and maintenance of the interconnect layer.
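The streaming first phase of multi-phase aggregation can be sketched in a few lines of Python. This is an illustrative model, not the actual executor code: the first phase keeps a bounded hash table and streams its partial results downstream when it fills up, instead of spilling to disk; a final phase then combines the partials.

```python
def partial_hash_agg(rows, max_groups):
    """First (streaming) phase: maintain a bounded hash table of partial sums.
    When the table is full and a new key arrives, emit the current partials
    downstream and start over, rather than spilling them to disk."""
    table = {}
    for key, value in rows:
        if key not in table and len(table) >= max_groups:
            yield from table.items()  # stream partial aggregates downstream
            table.clear()
        table[key] = table.get(key, 0) + value
    yield from table.items()

def final_agg(partials):
    """Second phase: merge the partial sums into final per-key results."""
    out = {}
    for key, value in partials:
        out[key] = out.get(key, 0) + value
    return out

rows = [("a", 1), ("b", 2), ("c", 3), ("a", 4), ("d", 5), ("b", 6)]
print(final_agg(partial_hash_agg(rows, max_groups=2)))
# {'c': 3, 'a': 5, 'd': 5, 'b': 8}
```

The trade-off is that the same key may be emitted more than once by the first phase, but the second phase reconciles duplicates, and no partial ever has to be written to disk.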
- Adaptive motion timeout mechanism: Implements dynamic timeout threshold adjustment and RTT estimation (Jacobson/Karels variant) for the interconnect layer. This optimization filters non-network delays and adapts to volatile latency, significantly reducing unnecessary retransmissions and improving stability and throughput in unreliable or congested network environments.
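The Jacobson/Karels estimator referenced above is a well-known scheme from TCP. The sketch below is illustrative only: it uses the classic gains and 4x variance multiplier from RFC 6298, not SynxDB's actual constants, to show how the timeout widens when latency becomes volatile.

```python
class RttEstimator:
    """Jacobson/Karels-style smoothed RTT tracking for an adaptive
    retransmission timeout (gains follow the classic TCP values)."""
    ALPHA = 1 / 8   # gain for the smoothed RTT (SRTT)
    BETA = 1 / 4    # gain for the RTT variance (RTTVAR)

    def __init__(self, first_sample):
        self.srtt = first_sample
        self.rttvar = first_sample / 2

    def update(self, sample):
        # variance reacts to how far the new sample strays from SRTT
        self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
        self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.timeout()

    def timeout(self):
        # timeout = smoothed RTT plus a safety margin of 4 variances
        return self.srtt + 4 * self.rttvar

est = RttEstimator(first_sample=100.0)   # milliseconds
for sample in (110.0, 90.0, 400.0):      # a latency spike widens the timeout
    print(round(est.update(sample), 1))
```

Because the variance term feeds the timeout, a single spike raises the threshold sharply, suppressing spurious retransmissions, while steady samples let it contract again.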
- ORCA intra-segment parallel execution: GPORCA now supports worker-level parallelism within segments, enabling parallel execution for table scans, hash joins, and aggregations. This capability allows long-running queries to utilize multiple CPU cores per segment, greatly reducing execution time for compute-intensive workloads. The feature is controlled by standard PostgreSQL parallel query parameters (for example, `max_parallel_workers_per_gather`), ensuring seamless integration with existing configurations.
- JIT (Just-In-Time) compilation: Introduces Just-In-Time (JIT) compilation to transform interpreted program evaluation into native program execution at run time. Instead of using general-purpose code to evaluate arbitrary SQL expressions, JIT generates functions specific to those expressions, which the CPU executes natively for faster execution. JIT compilation reduces the overhead of indirect jumps and branches common in generic interpreted code by generating native code with direct calls and constant folding. SynxDB uses LLVM for JIT compilation, which is an optional execution mode. The JIT workflow is designed with fault tolerance: if the JIT library fails to load on segments (for example, if the dependency is not installed), the execution mode automatically falls back to non-JIT interpretation without interrupting the query. JIT compilation primarily benefits long-running CPU-bound queries, such as analytical queries, by optimizing away a large percentage of the interpretation overhead and speeding up query completion for complex workloads.
See JIT compilation.
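The idea behind expression specialization can be shown without LLVM. This Python sketch is a loose analogy, not the actual implementation: a generic tree-walking evaluator pays for dispatch at every node, while a "compiled" version emits code specialized to one expression, with constants folded into the generated source.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def interpret(expr, row):
    """Generic tree-walking evaluator: each node costs dict lookups and
    indirect calls, the kind of overhead JIT compilation removes."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "col":
        return row[expr[1]]
    op, left, right = expr[1], expr[2], expr[3]
    return OPS[op](interpret(left, row), interpret(right, row))

def compile_expr(expr):
    """Emit source code specialized to this one expression and compile it
    once; Python's eval stands in for LLVM emitting native code."""
    def emit(e):
        if e[0] == "const":
            return repr(e[1])
        if e[0] == "col":
            return f"row[{e[1]}]"
        return f"({emit(e[2])} {e[1]} {emit(e[3])})"
    return eval(f"lambda row: {emit(expr)}")

# price * (1 + tax_rate): column 0 is the price, the tax rate is a constant
expr = ("op", "*", ("col", 0), ("op", "+", ("const", 1.0), ("const", 0.07)))
fast = compile_expr(expr)
print(interpret(expr, [100.0]), fast([100.0]))
```

Both paths compute the same result; the compiled lambda simply has no per-node dispatch left, which is the gain JIT delivers for long-running CPU-bound queries.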
Storage
- PAX RLE batch encoding support: Introduces the `pax.enable_rle_batch_encoding` GUC parameter (default: `off`) to enable batch encoding for RLE in the PAX storage format. When enabled, this feature encodes data in batches, which both compresses the data and improves decoding efficiency. See PAX table format.
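Run-length encoding itself is simple to picture. The sketch below is illustrative only (it is not the PAX implementation): a batch of values collapses into (value, run length) pairs in a single pass, and decoding expands one run per stretch of repeats rather than touching every entry individually.

```python
def rle_encode_batch(values):
    """Encode a batch of values as (value, run_length) pairs in one pass."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1        # extend the current run
        else:
            runs.append([v, 1])     # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Decode by expanding each run back into repeated values."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

batch = [7, 7, 7, 7, 2, 2, 9, 9, 9]
runs = rle_encode_batch(batch)
print(runs)                       # [(7, 4), (2, 2), (9, 3)]
print(rle_decode(runs) == batch)  # True
```

Columnar formats like PAX benefit because sorted or low-cardinality columns produce long runs, so batch-level encoding shrinks storage and lets the decoder skip across runs instead of reading value by value.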
Observability and reliability
- DBCC resource management optimization: DBCC now supports cluster name modification and deletion, enhancing compute resource management capabilities and flexibility in the interactive admin console. See Manage clusters.
- DBCC monitoring display optimization: Supports converting JSON format query plans to text format and listing table skew and bloat information in DBCC, providing better visibility into query execution and data distribution for performance optimization. See SQL monitoring information.
- DBCC configuration management optimization: Supports direct configuration of the `postgresql.conf` and `pg_hba.conf` files through DBCC, enabling administrators to manage database and client authentication settings without direct file access. See Database configuration.
Product change information
GUC configuration parameters
- Adds `gp_use_streaming_hashagg` to control streaming hash aggregation usage (default: `on`).
- Adds `pax.enable_rle_batch_encoding` to control PAX RLE batch encoding (default: `off`).
- Changes the default value of `gopher_local_capacity_mb` to `1024000`.
Components
- Upgrades Gopher to v4.0.23.
- Upgrades MADlib to 2.1.0.
- Upgrades DBCC to v1.4.0.
- Adds the `cloudberry_fdw` extension.
- Adds the `pg_search` extension.
- Adds the `AIFun` extension.
Bug fixes
Query optimizer and executor
- Fixed unnecessary distribution requests for single-phase parallel global aggregation in ORCA: Parallel global hash/stream aggregation was requesting distribution even in non-multi-phase scenarios, causing ORCA to prefer single-phase plans. This fix guards `SetDistrRequests` calls with `fMultiStage`, ensuring that only multi-phase global aggregation requests distribution and resolving the issue where TPC-H Q1 should select two-phase aggregation.
- Fixed fallback error when querying standalone AO tables due to MVCC system columns: AO tables do not support MVCC system columns (`xmin`/`xmax`/`cmin`/`cmax`). Querying AO tables with multiple DISTINCT aggregates would trigger "Invalid system target list found for AO table" errors. This fix skips these columns when building metadata for standalone AO tables, while preserving them for partition tables to maintain column mapping consistency.
- Fixed inaccurate judgment of whether a relation is empty in ORCA: ORCA previously judged a relation as empty only when `reltuples` was -1, but a `reltuples` value of 0 also indicates an empty relation. This fix adjusts the judgment logic to accurately identify empty relations.
- Fixed segmentation fault in ORCA when appending group statistics: Corrected the handling of group statistics appending to prevent crashes.
- Fixed CTE-related issues: Resolved CTE prune instability and fixed a crash when readable CTEs contain `SELECT INTO` clauses. The original logic called `transformWithClause` twice, although Cloudberry only supports one WITH clause per query level. This fix stops calling the function when traversing CTEs to verify writability and adds comprehensive test cases.
- Fixed duplicate distribution keys from subqueries: While the parser prevents duplicate distribution keys in main query syntax, subqueries (especially window function `PARTITION BY` clauses) could still produce them. This fix triggers fallback processing logic when duplicate distribution keys are detected, ensuring correct degradation.
- Fixed crash when a UDF in a subquery references `OuterParam`: When volatile UDFs in subquery target lists reference an OuterQuery Param, SingleQE motion would block parameter setting, causing crashes. This fix adjusts the UDF execution path location strategy: volatile UDFs referencing `OuterParam` are set to OuterQuery, other volatile UDFs are set to SingleQE, and other modified UDFs are set to Segment General.
- Fixed core dump when analyzing partition tables in certain scenarios: When partition tables were empty (`reltuples` and `relpages` both 0), `leaf_parts_analyzed` directly returned false without executing analyze, causing a core dump. This fix adjusts the logic to avoid skipping analyze for empty tables.
- Fixed low execution efficiency caused by the hash table memory limit: When hash table memory exceeded the limit, available memory was insufficient, so only a small amount of data could be loaded and the remaining data had to spill to disk again, resulting in low execution efficiency. This fix destroys and recreates the hash table when the memory limit is exceeded, releasing memory for subsequent use.
Storage and access methods
- Fixed database/tablespace size calculation not including PAX table subdirectories: `pg_database_size`/`pg_tablespace_size` previously only calculated physical files in the database/tablespace directory and did not recursively calculate PAX tables' independent storage subdirectories, leading to inaccurate size calculations. This fix makes `db_dir_size` recursively calculate subdirectories, including PAX table physical files.
- Fixed incorrect table column numbering in `funcTupleDesc` initialization: When initializing `funcTupleDesc` in `process_sample_rows`, table columns should start from `NUM_SAMPLE_FIXED_COLS + 1` (5) rather than 4. Although this error caused no actual harm (subsequent code did not use the type information, only the column count), the fix ensures code correctness and maintainability.
- Fixed delayed error detection and reporting for `CFTYPE_EXEC` external tables: `CFTYPE_EXEC` external tables previously checked and reported errors after `external_getnext` ended, which could easily cause errors to be missed when execution nodes were suppressed. This fix immediately checks and reports errors when `external_getnext` returns null, ensuring timely detection and reporting.
Processes and concurrency
- Fixed unnecessary network socket opening for auxiliary background worker processes: Auxiliary processes (ftsprobe, the global deadlock detector, etc.) were opening unnecessary interconnect communication network sockets, posing security risks and consuming resources. This fix skips calling `cdb_setup`/`cdb_cleanup` for these processes in `InitPostgres`, resolving unnecessary network port exposure.
- Fixed timeout retry logic synchronization issue in `cdbgang_createGang_async`: The creation timeout retry logic in `cdbgang_createGang_async` was not synchronized with the reader, causing the reader to prematurely judge abnormal termination when creation was slow due to platform/container/network reasons. This fix synchronizes the retry logic to avoid the issue.
- Fixed UDP Motion layer exceptions in resource-constrained environments for 10TB-scale 8-parallel processing: Ported UDP motion layer fixes from `ic_udpifc.c` to UDP2 and fixed four types of exceptions in resource-constrained environments: 1) Receiver buffer full but sender unaware, causing retransmission packet loss: added a buffer-full flag that notifies the sender to pause; 2) False deadlock detection: adjusted the deadlock check timeout and ACK polling logic; 3) Receive queue full dropping packets without waking the main thread: ensures the main thread is awakened to process backlogged packets; 4) Node execution time mismatch causing packet timeout: added retransmission logs and reset the retry count.
Security and TDE
- Fixed TDE-related issues: 1) Fixed unnecessary shared memory usage for the TDE encrypted buffer context: the `BufEncCtx`/`BufDecCtx` shared memory size was not calculated and shared memory was not needed, so allocation was changed to `malloc` (`TopMemoryContext` is NULL during initialization, so `palloc` cannot be used); 2) Fixed a database exception when a backend panic occurred with TDE enabled: the postmaster now reinitializes TDE-related shared memory and sets global variables when restarting the database.
System commands and utilities
- Fixed assertion failure when executing the `\dit+` command: The `translate_columns` array had 9 elements but the query returned 10 columns (when displaying both tables and indexes), causing an assertion failure. This fix expands the array to 10 elements and correctly tracks `cols_so_far` when adding the Storage column.
- Fixed typo from `EXITS` to `EXISTS` in SQL and code: Changed `DROP TABLE IF EXITS` and similar statements and code from `EXITS` to `EXISTS`, resolving SQL execution failures caused by syntax errors.
DBCC
- Fixed pagination count selection issue: Corrected the logic for selecting pagination counts in the admin console.
- Fixed username display error: Resolved issues with incorrect username display in the console interface.
Compilation and build
- Fixed LLVM compilation errors during kernel upgrade: Resolved compilation errors introduced when LLVM compilation was not enabled during the kernel upgrade process.
- Fixed compilation errors with higher versions of `gcc`/`g++`: Resolved issues including `return std::move(temp_value)` errors in higher versions of `g++`, removed uninitialized variables, and added the `-Wno-error=array-bounds` compilation option to avoid protobuf header file errors.