Connect to a Remote SynxDB
cloudberry_fdw is a foreign data wrapper (FDW) designed for SynxDB. It is based on PostgreSQL’s postgres_fdw and is highly optimized for SynxDB’s Massively Parallel Processing (MPP) architecture.
Compared to the standard postgres_fdw, the core advantage of cloudberry_fdw lies in its MPP-aware data access path. It enables parallel cross-cluster read and write operations, fully leveraging the segment resources of both the source and target clusters. This avoids funneling all data traffic through the coordinator node, thus eliminating performance bottlenecks and significantly improving the efficiency and speed of data interaction between two SynxDB clusters.
Key features include:
Parallel read: Utilizes SynxDB’s
PARALLEL RETRIEVE CURSORmechanism to allow segments in the local cluster to pull data in parallel directly from the corresponding segments in the remote cluster.Parallel write: Uses the
COPYprotocol and thecbcopy_helperutility to enable segments in the local cluster to stream data in parallel to the remote cluster, achieving efficient data insertion, updates, and deletion.Rich SQL functionality: Supports complex queries (for example,
JOIN, aggregation), DML operations (INSERT/UPDATE/DELETE), transactions, and some remote trigger functionalities.
Use cases
cloudberry_fdw provides a high-performance solution for data interaction between multiple SynxDB clusters, primarily for the following scenarios:
High-speed data migration and synchronization: Quickly migrate or synchronize data between SynxDB clusters in different environments (for example, development, testing, production).
Build a data federation: Perform unified federated queries and analysis on data across multiple SynxDB clusters without physically moving the data.
ETL jobs: Serve as a part of the ETL process to efficiently extract data from one SynxDB cluster and load it into another.
Read/write splitting: Offload analytical query workloads to a remote read-only replica cluster, while the primary cluster continues to handle write and transactional workloads.
By leveraging the parallel processing capabilities of the MPP architecture, cloudberry_fdw helps users complete cross-cluster data operations with minimal performance overhead and in the shortest possible time.
How to use
Prerequisites
Before using cloudberry_fdw, you need to complete the following preparations:
Enable the
cloudberry_fdwextension in both the local and remote SynxDB clusters.CREATE EXTENSION cloudberry_fdw;
In the local cluster, create a server object that points to the coordinator node of the remote SynxDB cluster.
CREATE SERVER my_remote_server FOREIGN DATA WRAPPER cloudberry_fdw OPTIONS (host 'remote_host', port 'port', dbname 'remote_db', mpp_execute 'all segments');
Here,
remote_hostandportare the IP address or domain name and port of the remote SynxDB cluster, andremote_dbis the database name of the remote cluster. The optionmpp_execute 'all segments'is mandatory and cannot be changed. Use it as specified.Create a user mapping for a local user to the remote server, providing the login credentials for the remote database.
CREATE USER MAPPING FOR current_user SERVER my_remote_server OPTIONS (user 'remote_user', password 'remote_password');
Here,
remote_userandremote_passwordare the username and password for the remote SynxDB cluster.
Create a foreign table
Use the CREATE FOREIGN TABLE command to define a foreign table on the local cluster, which maps to a real table on the remote cluster. The syntax for creating a foreign table is as follows:
CREATE FOREIGN TABLE local_ft_name (
id INT,
data TEXT,
created_at TIMESTAMP
)
SERVER my_remote_server
OPTIONS (schema_name 'public', table_name 'remote_table_name');
Common options (OPTIONS):
schema_name: The schema name of the remote table.table_name: The table name of the remote table.
Read data in parallel
After creating the foreign table, you can query it just like a local table. cloudberry_fdw will automatically parallelize the query.
Simple query:
SELECT * FROM local_ft_name WHERE id > 100;
Aggregate query:
SELECT date_trunc('day', created_at) AS day, count(*) FROM local_ft_name GROUP BY day ORDER BY day;
Cross-cluster JOIN:
SELECT l.order_id, r.product_name, l.quantity FROM local_orders_table l JOIN local_ft_name r ON l.product_id = r.id;
Write data in parallel
cloudberry_fdw supports parallel data writing, including INSERT, UPDATE, and DELETE.
Parallel
INSERT:The most common use case is to efficiently insert data from a local table into a remote foreign table.
-- Assumes local_source_table is a local data source table. INSERT INTO local_ft_name (id, data, created_at) SELECT id, data, created_at FROM local_source_table;
This operation triggers a parallel
COPYprocess, where data is streamed directly between the segments of the two clusters.Parallel
UPDATE:You can update data in the remote table based on a local table or values.
-- Updates using a WHERE clause. UPDATE local_ft_name SET data = 'updated data' WHERE id = 123; -- Batch updates using a local table (UPDATE ... FROM ...). UPDATE local_ft_name AS remote SET data = local.new_data FROM local_updates_table AS local WHERE remote.id = local.id;
Parallel
DELETE:Similarly, you can efficiently delete data from the remote table.
-- Deletes using a WHERE clause. DELETE FROM local_ft_name WHERE created_at < '2023-01-01'; -- Batch deletes using data from a local table as a condition (DELETE ... USING ...). DELETE FROM local_ft_name AS remote USING local_deletes_table AS local WHERE remote.id = local.id;
Connection management
cloudberry_fdw provides several functions to help manage connections to the remote server.
View active connections:
SELECT * FROM cloudberry_fdw_get_connections();
Disconnect from a specific server:
SELECT cloudberry_fdw_disconnect('my_remote_server');
Disconnect all connections:
SELECT cloudberry_fdw_disconnect_all();
Important notes
Extension required on both sides: The
cloudberry_fdwextension must be successfully installed on both the local and remote SynxDB clusters. Otherwise, you may encounter anERRCODE_UNDEFINED_SCHEMAerror with a message likeEnsure that the extension is also installed on the remote server....Transaction isolation level: Transactions initiated by
cloudberry_fdwin a remote session use theREPEATABLE READisolation level by default (if the local session isSERIALIZABLE, the remote session will also useSERIALIZABLE). This means that multiple scans of the same foreign table within a single transaction will see the same data snapshot, which differs from the localREAD COMMITTEDbehavior.Function and operator pushdown: To perform computations on the remote nodes and reduce data transfer,
cloudberry_fdwattempts to “push down” functions and operators from the query to the remote server. Generally, only built-in,IMMUTABLEfunctions and operators can be successfully pushed down. Custom functions or complex expressions may not be pushed down, which can causecloudberry_fdwto pull more data to the local cluster for computation, affecting performance.Dependency on
cbcopy_helper: The parallel write functionality (INSERT/UPDATE/DELETE) depends on a background utility namedcbcopy_helper. This utility is responsible for establishing a data channel between the segments of the two clusters. Typically,cbcopy_helperis included withcloudberry_fdw, so no manual user intervention is required.Triggers:
AFTERtriggers created on a foreign table are not pushed down for remote execution; they are executed locally.BEFOREtriggers can be pushed down.DDL limitations: Most
ALTER FOREIGN TABLEDDL operations (such as adding/dropping columns) do not propagate to the remote table. You need to manually keep the schemas synchronized between the remote table and the foreign table definition.