Connect to a Remote SynxDB

cloudberry_fdw is a foreign data wrapper (FDW) designed for SynxDB. It is based on PostgreSQL’s postgres_fdw and is highly optimized for SynxDB’s Massively Parallel Processing (MPP) architecture.

Compared to the standard postgres_fdw, the core advantage of cloudberry_fdw is its MPP-aware data access path. It enables parallel cross-cluster read and write operations that fully leverage the segment resources of both the source and target clusters, avoiding the need to funnel all data traffic through the coordinator node. This removes the coordinator as a performance bottleneck and significantly improves the efficiency and speed of data interaction between two SynxDB clusters.

Key features include:

  • Parallel read: Utilizes SynxDB’s PARALLEL RETRIEVE CURSOR mechanism to allow segments in the local cluster to pull data in parallel directly from the corresponding segments in the remote cluster.

  • Parallel write: Uses the COPY protocol and the cbcopy_helper utility to enable segments in the local cluster to stream data in parallel to the remote cluster, achieving efficient inserts, updates, and deletes.

  • Rich SQL functionality: Supports complex queries (for example, JOIN and aggregation), DML operations (INSERT/UPDATE/DELETE), transactions, and some remote trigger functionality.

Use cases

cloudberry_fdw provides a high-performance solution for data interaction between multiple SynxDB clusters, primarily for the following scenarios:

  • High-speed data migration and synchronization: Quickly migrate or synchronize data between SynxDB clusters in different environments (for example, development, testing, production).

  • Build a data federation: Perform unified federated queries and analysis on data across multiple SynxDB clusters without physically moving the data.

  • ETL jobs: Serve as a part of the ETL process to efficiently extract data from one SynxDB cluster and load it into another.

  • Read/write splitting: Offload analytical query workloads to a remote read-only replica cluster, while the primary cluster continues to handle write and transactional workloads.

By leveraging the parallel processing capabilities of the MPP architecture, cloudberry_fdw helps users complete cross-cluster data operations quickly and with minimal performance overhead.

How to use

Prerequisites

Before using cloudberry_fdw, you need to complete the following preparations:

  1. Enable the cloudberry_fdw extension in both the local and remote SynxDB clusters.

    CREATE EXTENSION cloudberry_fdw;
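
    If you are unsure whether the extension is already present, you can check the catalog in each cluster (this is a standard PostgreSQL catalog query, not specific to cloudberry_fdw):

    SELECT extname, extversion
    FROM pg_extension
    WHERE extname = 'cloudberry_fdw';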
    
  2. In the local cluster, create a server object that points to the coordinator node of the remote SynxDB cluster.

    CREATE SERVER my_remote_server
        FOREIGN DATA WRAPPER cloudberry_fdw
        OPTIONS (host 'remote_host', port 'port', dbname 'remote_db', mpp_execute 'all segments');
    

    Here, remote_host is the IP address or hostname of the remote SynxDB coordinator, port is its port number, and remote_db is the name of the remote database. The mpp_execute 'all segments' option is mandatory and must be specified exactly as shown.

  3. Create a user mapping for a local user to the remote server, providing the login credentials for the remote database.

    CREATE USER MAPPING FOR current_user
        SERVER my_remote_server
        OPTIONS (user 'remote_user', password 'remote_password');
    

    Here, remote_user and remote_password are the username and password for the remote SynxDB cluster.

Create a foreign table

Use the CREATE FOREIGN TABLE command to define a foreign table on the local cluster, which maps to a real table on the remote cluster. The syntax for creating a foreign table is as follows:

CREATE FOREIGN TABLE local_ft_name (
    id INT,
    data TEXT,
    created_at TIMESTAMP
)
SERVER my_remote_server
OPTIONS (schema_name 'public', table_name 'remote_table_name');

Common options (OPTIONS):

  • schema_name: The schema name of the remote table.

  • table_name: The table name of the remote table.
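
Because cloudberry_fdw is based on postgres_fdw, the standard IMPORT FOREIGN SCHEMA command may also be available to map many remote tables at once. This is an assumption inherited from postgres_fdw rather than a feature confirmed here, so verify it in your environment:

IMPORT FOREIGN SCHEMA public
    LIMIT TO (remote_table_name)
    FROM SERVER my_remote_server
    INTO public;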

Read data in parallel

After creating the foreign table, you can query it just like a local table. cloudberry_fdw will automatically parallelize the query.

  • Simple query:

    SELECT * FROM local_ft_name WHERE id > 100;
    
  • Aggregate query:

    SELECT date_trunc('day', created_at) AS day, count(*)
    FROM local_ft_name
    GROUP BY day
    ORDER BY day;
    
  • Cross-cluster JOIN:

    SELECT
        l.order_id,
        r.product_name,
        l.quantity
    FROM local_orders_table l
    JOIN local_ft_name r ON l.product_id = r.id;
    

Write data in parallel

cloudberry_fdw supports parallel data writing, including INSERT, UPDATE, and DELETE.

  • Parallel INSERT:

    The most common use case is to efficiently insert data from a local table into a remote foreign table.

    -- Assumes local_source_table is a local data source table.
    INSERT INTO local_ft_name (id, data, created_at)
    SELECT id, data, created_at FROM local_source_table;
    

    This operation triggers a parallel COPY process, where data is streamed directly between the segments of the two clusters.

  • Parallel UPDATE:

    You can update data in the remote table based on a local table or values.

    -- Updates using a WHERE clause.
    UPDATE local_ft_name
    SET data = 'updated data'
    WHERE id = 123;
    
    -- Batch updates using a local table (UPDATE ... FROM ...).
    UPDATE local_ft_name AS remote
    SET data = local.new_data
    FROM local_updates_table AS local
    WHERE remote.id = local.id;
    
  • Parallel DELETE:

    Similarly, you can efficiently delete data from the remote table.

    -- Deletes using a WHERE clause.
    DELETE FROM local_ft_name
    WHERE created_at < '2023-01-01';
    
    -- Batch deletes using data from a local table as a condition (DELETE ... USING ...).
    DELETE FROM local_ft_name AS remote
    USING local_deletes_table AS local
    WHERE remote.id = local.id;
    

Connection management

cloudberry_fdw provides several functions to help manage connections to the remote server.

  • View active connections:

    SELECT * FROM cloudberry_fdw_get_connections();
    
  • Disconnect from a specific server:

    SELECT cloudberry_fdw_disconnect('my_remote_server');
    
  • Disconnect all connections:

    SELECT cloudberry_fdw_disconnect_all();
    

Important notes

  • Extension required on both sides: The cloudberry_fdw extension must be successfully installed on both the local and remote SynxDB clusters. Otherwise, you may encounter an ERRCODE_UNDEFINED_SCHEMA error with a message like "Ensure that the extension is also installed on the remote server...".

  • Transaction isolation level: Transactions that cloudberry_fdw opens in a remote session use the REPEATABLE READ isolation level by default (if the local transaction runs at SERIALIZABLE, the remote transaction uses SERIALIZABLE as well). This means that multiple scans of the same foreign table within a single local transaction see the same remote data snapshot, which differs from the local default READ COMMITTED behavior.
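
    For example, to make the remote transaction run at SERIALIZABLE as well, start the local transaction at that level (standard PostgreSQL syntax; local_ft_name is the foreign table defined earlier):

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM local_ft_name;  -- first scan
    SELECT count(*) FROM local_ft_name;  -- sees the same remote snapshot
    COMMIT;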

  • Function and operator pushdown: To perform computations on the remote nodes and reduce data transfer, cloudberry_fdw attempts to “push down” functions and operators from the query to the remote server. Generally, only built-in, IMMUTABLE functions and operators can be successfully pushed down. Custom functions or complex expressions may not be pushed down, which can cause cloudberry_fdw to pull more data to the local cluster for computation, affecting performance.
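
    To check whether an expression is pushed down, inspect the query plan. Since cloudberry_fdw is based on postgres_fdw, EXPLAIN VERBOSE typically includes the SQL sent to the remote server; the exact plan output may differ in your version:

    EXPLAIN (VERBOSE) SELECT * FROM local_ft_name WHERE id > 100;
    -- Look for the "Remote SQL:" line: if the WHERE condition appears there,
    -- it is evaluated on the remote cluster rather than locally.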

  • Dependency on cbcopy_helper: The parallel write functionality (INSERT/UPDATE/DELETE) depends on a background utility named cbcopy_helper. This utility is responsible for establishing a data channel between the segments of the two clusters. Typically, cbcopy_helper is included with cloudberry_fdw, so no manual user intervention is required.

  • Triggers: AFTER triggers created on a foreign table are not pushed down for remote execution; they are executed locally. BEFORE triggers can be pushed down.

  • DDL limitations: Most ALTER FOREIGN TABLE DDL operations (such as adding/dropping columns) do not propagate to the remote table. You need to manually keep the schemas synchronized between the remote table and the foreign table definition.
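
    For example, if a column is added to the remote table, you must add a matching column to the local foreign table definition yourself (the note column here is purely illustrative):

    -- Run on the remote cluster:
    ALTER TABLE public.remote_table_name ADD COLUMN note TEXT;

    -- Then run on the local cluster; this changes only the local
    -- foreign table definition and is not propagated to the remote table:
    ALTER FOREIGN TABLE local_ft_name ADD COLUMN note TEXT;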