CBDR

CBDR is a backup and recovery tool for SynxDB and Apache Cloudberry™ (Incubating), built on top of WAL-G. It provides a simple command-line interface for performing backup and recovery operations, helping ensure data safety and enabling disaster recovery.

CBDR continuous archiving recovery is a disaster recovery solution based on WAL log archiving. It combines physical full backups with WAL archive files to provide continuous data protection and Point-in-Time Recovery (PITR) for the database cluster, making off-site disaster recovery practical for MPP databases. You can deploy a disaster recovery cluster with fewer servers but the same number of instances at a remote site, use CBDR for incremental data replication and recovery across data centers, and let the disaster recovery cluster serve read-only queries (hot standby), making fuller use of system resources.

CBDR offers the following features:

  • Full backup: Supports full backup of the entire database cluster.

  • Incremental backup: Backs up only the changes made since the last backup.

  • Backup listing: Displays all available backups.

  • Data recovery: Restores data from a specified backup.

  • Continuous archiving and recovery (PITR): Achieves continuous data protection and point-in-time recovery through physical full backups and WAL archive files, providing better RTO and RPO.

  • Hot standby: Supports providing read-only query services on the disaster recovery cluster to improve resource utilization.

  • Storage support: Supports only S3-compatible object storage, not local storage.

  • Configuration management: Generates and manages the configuration files needed for backup and restore.

Tip

Compared with peer tools such as gpbackup and gprestore, CBDR additionally supports storing backups in S3, multiple compression algorithms (lz4, lzma, zstd, brotli), and backup encryption.

Full backup and restore procedure

Before using CBDR to back up or restore a SynxDB cluster, make sure the following requirements are met:

  • SynxDB is properly installed and running.

  • The wal-g binary is installed under /usr/local/bin/ or /usr/bin/.

  • If using S3 storage, the appropriate credentials have been configured.
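
The wal-g binary check above can be scripted. The following is a minimal sketch assuming the two install directories listed above; `find_walg` is an illustrative helper, not part of CBDR:

```python
import os

# Directories this documentation lists as valid install locations for wal-g
WALG_DIRS = ["/usr/local/bin", "/usr/bin"]

def find_walg(dirs=WALG_DIRS):
    """Return the path of the first executable wal-g binary found, or None."""
    for d in dirs:
        candidate = os.path.join(d, "wal-g")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```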

The general procedure for performing a backup using CBDR is as follows:

Backup process

  1. Create a backup configuration file named config.yaml. Assume the file is located at /path/to/config.yaml. For the configuration file template, see Configuration file reference.

  2. Distribute the configuration file to all Segment nodes and update the archive command in postgresql.conf:

    cbdr configure backup --config=/path/to/config.yaml
    
  3. Restart the SynxDB cluster:

    gpstop -ari
    
  4. Perform the backup:

    cbdr backup --config=/path/to/config.yaml
    
  5. View the list of available backups:

    cbdr backup-list --config=/path/to/config.yaml
    

Restore process

  1. Prepare a new SynxDB cluster as the target for restoration, and create the required configuration file config.yaml. Assume the file is located at /path/to/config.yaml.

  2. Generate the restore configuration file restore_cfg.json. Before running this command, make sure the new cluster is reachable:

    cbdr configure restore --config=/path/to/config.yaml --restore-config=/path/to/restore_cfg.json
    
  3. Delete all existing data directories on the new cluster, including both Coordinator and Segment nodes. For example:

    rm -rf /data202502111728221784/coordinator/gpseg-1
    rm -rf /data202502111728221784/segment/gpseg-0
    rm -rf /data202502111728221784/segment/gpseg-1
    rm -rf /data202502111728221784/segment/gpseg-2
    
  4. Perform the restore:

    cbdr restore --config=/path/to/config.yaml --restore-config=/path/to/restore_cfg.json
    
  5. Start the Coordinator node in admin mode and update the gp_segment_configuration system table to set the correct hostname, address, mirror, and other fields:

    gpstart -c -a
    

    If the following error occurs during startup, use ps -ef | grep postgres to check if the Coordinator process is running. If it is, you can safely ignore the error:

    gpstart failed. (Reason='connection to server at "localhost" (127.0.0.1), port 7000 failed: FATAL:  the database system is not accepting connections
    DETAIL:  Hot standby mode is disabled.')
    

    Once the Coordinator starts successfully, you can query the segment configuration:

    select * from gp_segment_configuration;
    

    If you have exited admin mode, restart the cluster in admin mode:

    gpstop -c -a
    
  6. Start the full cluster:

    gpstart -a
    

    You might encounter the following error during startup:

    invalid IP mask "trust": Name or service not known
    

    This happens because the pg_hba.conf file generated by WAL-G is missing CIDR masks (for example, /32). You need to manually fix the configuration files for the Coordinator, Segment, and Mirror nodes.

    Example of incorrect configuration:

    host    all             all             192.168.199.42              trust
    host    all             gpadmin         192.168.192.159             trust
    host    all             gpadmin         192.168.197.5               trust
    

    Corrected configuration:

    host    all             all             192.168.199.42/32           trust
    host    all             gpadmin         192.168.192.159/32          trust
    host    all             gpadmin         192.168.197.5/32            trust
    
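
Rather than editing each pg_hba.conf by hand, the missing masks can be patched with a short script. The following is a minimal sketch that appends /32 to bare IPv4 addresses in `host` entries; `add_cidr_masks` is an illustrative helper, and entries that already carry a mask, or that use hostnames or IPv6 addresses, are deliberately left untouched:

```python
import re

# Matches a host entry whose address column is a bare IPv4 address (no /mask)
BARE_IPV4 = re.compile(r"^(host\s+\S+\s+\S+\s+)(\d{1,3}(?:\.\d{1,3}){3})(\s+)")

def add_cidr_masks(hba_text):
    """Append /32 to host entries whose IPv4 address lacks a CIDR mask."""
    fixed = []
    for line in hba_text.splitlines():
        m = BARE_IPV4.match(line)
        if m:
            line = m.group(1) + m.group(2) + "/32" + m.group(3) + line[m.end():]
        fixed.append(line)
    return "\n".join(fixed)
```

Run the function over each affected pg_hba.conf on the Coordinator, Segment, and Mirror nodes, then restart the cluster.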

Attention

  • Before backing up, make sure the database is running, configuration is correct, and there is enough available disk space.

  • Prepare the restore environment in advance. Do not interrupt the restore process. After restoring, always verify data integrity.

  • For storage management, regularly clean up invalid backups and monitor storage usage. If using S3, ensure a stable network connection.

Incremental backup and restore procedure

Before performing an incremental backup, make sure that you have completed at least one full backup.

  1. On the source cluster, run the following command to start an incremental backup based on a specific full backup. Example:

    cbdr backup --config=/path/to/config.yaml --delta-from-name=backup_20250409T153036Z
    

    Attention

    If you have not run the cbdr configure backup command on the current machine, or if you have run it before but the configuration file has changed, run cbdr configure backup --config=/path/to/config.yaml first. This command distributes the /path/to/config.yaml file from the coordinator node to the same file path on all segment nodes.

    If you have run the cbdr configure backup command on the current machine, and the configuration file has not changed since then, you can simply run cbdr backup.

  2. View all available backups (including full and incremental):

    cbdr backup-list --config=/path/to/config.yaml
    

    Sample output:

    backup_name                                modified                  wal_file_name            storage_name
    backup_20250409T153036Z                    2025-04-09T15:31:36+08:00 ZZZZZZZZZZZZZZZZZZZZZZZZ default
    backup_20250409T153136Z_D_20250409T153036Z 2025-04-09T15:32:36+08:00 ZZZZZZZZZZZZZZZZZZZZZZZZ default
    
  3. Prepare a new SynxDB cluster as the restore target. The preparation steps are the same as for a full backup restore.

  4. Run the restore on the new cluster:

    cbdr restore backup_20250409T153136Z_D_20250409T153036Z \
        --config=/path/to/config.yaml \
        --restore-config=/path/to/restore_cfg.json
    
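
The backup-list output above shows the delta naming convention: an incremental backup is named backup_<timestamp>_D_<base_timestamp>, where the part after _D_ identifies the full backup it was taken from. The following is a minimal sketch of parsing that convention; `delta_base` is an illustrative helper:

```python
def delta_base(backup_name):
    """For a delta backup named backup_<ts>_D_<base_ts>, return the name of
    the base backup it was taken from; return None for a full backup."""
    if "_D_" in backup_name:
        _, base_ts = backup_name.split("_D_", 1)
        return "backup_" + base_ts
    return None
```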

Continuous archiving recovery (PITR) and hot standby procedure

Compared with traditional incremental backups, continuous archiving recovery creates restore points in a more lightweight fashion and at higher frequency, yielding a better Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

This procedure allows you to set up a disaster recovery (DR) cluster that continuously pulls WAL archive logs from the primary cluster and can optionally provide read-only query services (hot standby).

Steps for backing up primary cluster

  1. Prepare configuration file: Prepare the config.yaml configuration file. For specific parameters, refer to Configuration file reference.

  2. Configure backup: Distribute the backup configuration file to all segment nodes and modify the archive command in postgresql.conf.

  3. Restart the cluster:

    gpstop -ari
    
  4. Create a base backup: Perform a full backup to serve as the base for continuous archiving.

    cbdr backup --full=true --config=/path/to/config.yaml
    
  5. Create restore points on demand: Create restore points on the primary cluster as needed. A restore point is a specific time marker to which the DR cluster can recover.

    cbdr create-restore-point "rp1" --config=/path/to/config.yaml
    cbdr create-restore-point "rp2" --config=/path/to/config.yaml
    

Steps for restoring recovery cluster and setting up hot standby

  1. Generate recovery configuration: On the new DR cluster, generate the recovery configuration file restore_cfg.json.

    cbdr configure restore --config=/path/to/config.yaml --restore-config=/path/to/restore_cfg.json
    
  2. Perform initial restore: Ensure that all data directories on the recovery cluster are empty, then perform an initial restore to the latest base backup.

    rm -rf /path/to/all/data/dirs/*
    cbdr restore --restore-config=/path/to/restore_cfg.json --config=/path/to/config.yaml
    
  3. Set up hot standby mode: After running this command, the recovery cluster will start up in hot standby mode and can serve read-only queries.

    cbdr read-replica --config=/path/to/config.yaml --restore-config=/path/to/restore_cfg.json
    
  4. Continuous recovery (track restore points): Use the follow-primary command to make the recovery cluster track the restore points of the primary cluster.

    # The first time you run follow-primary, the cluster will start (if not already running),
    # recover to the specified restore point (e.g., "rp1"), and then pause.
    # At this point, it can accept read-only queries.
    cbdr follow-primary "rp1" --config=/path/to/config.yaml --restore-config=/path/to/restore_cfg.json
    
    # ...after a new restore point "rp2" is created on the primary cluster...
    
    # 1. First, specify the next target restore point "rp2"
    # (This command does not immediately start replaying logs)
    cbdr follow-primary "rp2" --config=/path/to/config.yaml --restore-config=/path/to/restore_cfg.json
    
    # 2. Then, run replay-resume to make the cluster continue replaying logs from "rp1"
    # until it reaches "rp2" and pauses again.
    cbdr replay-resume --config=/path/to/config.yaml --restore-config=/path/to/restore_cfg.json
    
    # ...repeat this process to track "rp3", "rp4", etc.
    
  5. (Optional) Promote to primary: When the primary cluster fails and you need to switch the DR cluster to be the new primary, run this command.

    Attention

    After running promote, the cluster will become read-write and stop replaying WAL logs. This is an irreversible operation. The cluster can no longer be used as a DR cluster for archive recovery.

    cbdr promote --config=/path/to/config.yaml --restore-config=/path/to/restore_cfg.json
    gpstop -ari
    
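
The follow-primary/replay-resume cycle can be summarized as a command sequence: follow-primary once for the first restore point, then a follow-primary/replay-resume pair for each subsequent one. The sketch below only assembles the commands in the order described above and does not invoke cbdr; `track_commands` is an illustrative helper:

```python
def track_commands(restore_points, config, restore_config):
    """Build the cbdr command sequence for tracking a series of restore
    points on the recovery cluster, in the order described above."""
    common = f"--config={config} --restore-config={restore_config}"
    cmds = []
    for i, rp in enumerate(restore_points):
        # Specify the (next) target restore point.
        cmds.append(f'cbdr follow-primary "{rp}" {common}')
        # From the second point on, replay must be resumed explicitly.
        if i > 0:
            cmds.append(f"cbdr replay-resume {common}")
    return cmds
```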

Configuration file reference

CBDR uses YAML configuration files. Currently, it only supports configuring S3 storage parameters and does not support storing backups on the local file system. The following is a sample configuration.

# Database connection settings
PGHOST: "localhost"
PGPORT: 7000
PGUSER: "gpadmin"
PGDATABASE: "postgres"

# Concurrency settings
GOMAXPROCS: 6

# Relative path to the recovery configuration file
WALG_GP_RELATIVE_RECOVERY_CONF_PATH: "conf.d/recovery.conf"

# Polling interval for segment status
WALG_GP_SEG_POLL_INTERVAL: "1m"

# Compression method: supports lz4, lzma, zstd, brotli
WALG_COMPRESSION_METHOD: "lz4"

# Upload and download concurrency
WALG_UPLOAD_CONCURRENCY: 5
WALG_DOWNLOAD_CONCURRENCY: 5

# Retry attempts for file download
WALG_DOWNLOAD_FILE_RETRIES: 5

# Required settings for using S3 storage
WALE_S3_PREFIX: "xxxxxxxxxxxxx"
AWS_ENDPOINT: "xxxxxxxxxxxxx"
AWS_SECRET_ACCESS_KEY: "xxxxxxxxxxxxx"
AWS_ACCESS_KEY_ID: "xxxxxxxxxxxxx"

# Directory for log files
WALG_GP_LOGS_DIR: "/var/log/cbdr"

# Incremental backup limit: maximum number of incremental backups allowed
# after a full backup. For example, if set to 6, a new full backup will be
# forced after 6 consecutive incremental backups to prevent long backup chains.
WALG_DELTA_MAX_STEPS: 10
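
The WALG_DELTA_MAX_STEPS rule can be expressed as the decision a backup scheduler would make. The following is a minimal sketch mirroring the comment above, assuming the count of deltas since the last full backup is tracked externally; `next_backup_kind` is an illustrative name, not CBDR internals:

```python
def next_backup_kind(deltas_since_full, max_steps=10):
    """Return "full" once the delta chain has reached WALG_DELTA_MAX_STEPS,
    otherwise "delta"."""
    return "full" if deltas_since_full >= max_steps else "delta"
```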

Command usage

The basic syntax for running CBDR commands is:

cbdr <command> [options] --config=<config_file>

The following sections describe the main usage of each CBDR command.

Configure commands

The cbdr configure command is used to distribute the backup configuration to all Segment nodes or to generate a restore configuration file.

# Distribute the backup configuration and update the archive command in postgresql.conf
cbdr configure backup --config=<config_file>

# Generate the restore configuration file
cbdr configure restore --config=<config_file> --restore-config=<restore_config_file>

The restore configuration file can be automatically generated using cbdr configure restore (requires the cluster to be reachable), or it can be written manually. A sample JSON format is shown below:

{
   "segments": {
      "-1": {
            "hostname": "localhost",
            "port": 7000,
            "data_dir": "/tmp/tests/gpdemo/datadirs1/qddir/demoDataDir-1"
      },
      "0": {
            "hostname": "localhost",
            "port": 7002,
            "data_dir": "/tmp/tests/gpdemo/datadirs1/dbfast1/demoDataDir0"
      },
      "1": {
            "hostname": "localhost",
            "port": 7003,
            "data_dir": "/tmp/tests/gpdemo/datadirs1/dbfast2/demoDataDir1"
      },
      "2": {
            "hostname": "localhost",
            "port": 7004,
            "data_dir": "/tmp/tests/gpdemo/datadirs1/dbfast3/demoDataDir2"
      }
   }
}
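
When the file is written by hand, it can also be generated from a segment list. The following is a minimal sketch producing JSON in the format shown above; `build_restore_config` and its input tuples are illustrative:

```python
import json

def build_restore_config(segments):
    """segments: iterable of (content_id, hostname, port, data_dir) tuples,
    where content_id -1 is the Coordinator. Returns JSON text in the format
    expected by --restore-config."""
    cfg = {"segments": {}}
    for content_id, hostname, port, data_dir in segments:
        cfg["segments"][str(content_id)] = {
            "hostname": hostname,
            "port": port,
            "data_dir": data_dir,
        }
    return json.dumps(cfg, indent=3)
```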

Cluster backup

The cbdr backup command is used to back up the database cluster. Syntax:

cbdr backup [options] --config=<config_file>

Optional parameters:

  • --permanent: Marks the backup as permanent. A permanent backup cannot be deleted unless deletion is forced (see --force-delete under the delete command).

  • --full: Performs a full backup.

  • --add-user-data=<json>: Attaches custom metadata to the backup in JSON format.

  • --delta-from-user-data=<json>: Specifies the base backup for incremental backup using metadata.

  • --delta-from-name=<backup_name>: Specifies the base backup for incremental backup by name.
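
Because the metadata flags take a JSON string, quoting matters when the command is built from a script. The following is a minimal sketch that serializes metadata for --add-user-data and shell-quotes the assembled command; `backup_command` is an illustrative helper:

```python
import json
import shlex

def backup_command(config, user_data=None):
    """Assemble a cbdr backup command line, attaching custom metadata as a
    single shell-safe JSON argument when provided."""
    cmd = ["cbdr", "backup", f"--config={config}"]
    if user_data is not None:
        cmd.append("--add-user-data=" + json.dumps(user_data))
    return " ".join(shlex.quote(part) for part in cmd)
```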

View backup list

The cbdr backup-list command displays the list of available backups.

cbdr backup-list --config=<config_file> [options]

Optional parameters:

  • --pretty: Outputs the list in a more readable format.

  • --json: Outputs the list in JSON format.

  • --detail: Displays detailed information.

Restore command

The cbdr restore command restores a backup to the target cluster. Usage:

cbdr restore [backup_name] --config=<config_file> [--restore-config=<restore_config_file>] [--target-user-data=<json>]

Parameter descriptions:

  • backup_name: (Optional) The name of the backup to restore. If omitted, the latest backup is restored by default.

  • --restore-config: Path to the restore configuration file.

  • --target-user-data: Restores a backup that matches the specified user-defined metadata.

  • --restore-point: (Deprecated; use the follow-primary procedure instead) Restores to a specific restore point.

Delete command

The cbdr delete command removes an existing backup.

cbdr delete --config=<config_file> [--confirm] [--force-delete]

Parameter descriptions:

  • --confirm: Must be explicitly set to execute the deletion.

  • --force-delete: Forces deletion even for permanent backups.

Continuous recovery and restore point commands

These commands are used for the "continuous archiving recovery (PITR) and hot standby" procedure.

  • Create a restore point: Create a time marker on the primary cluster for continuous recovery.

    cbdr create-restore-point <restore-point-name> --config=<config_file>
    
  • View restore points: View all created restore points.

    cbdr restore-point-list --config=<config_file> [--pretty] [--json] [--detail]
    
  • Set hot standby mode: Run on the recovery cluster to put it in hot standby mode, allowing it to handle read-only queries.

    cbdr read-replica --config=<config_file> --restore-config=<restore_config_file>
    
  • Continuous recovery: Run on the recovery cluster to make it follow the primary cluster’s restore points.

    cbdr follow-primary <restore-point-name> --config=<config_file> --restore-config=<restore_config_file>
    
    • The first time you run this, it will start the cluster (if not already running) and recover to the specified restore point, then pause.

    • Subsequent executions are used to specify the next target restore point (but recovery does not start immediately).

  • Resume replay: After specifying a new restore point with follow-primary on the recovery cluster, run this command to make the cluster continue replaying WAL logs to the next target restore point.

    cbdr replay-resume --config=<config_file> --restore-config=<restore_config_file>
    
  • Promote to primary: Run on the recovery cluster to promote the hot standby cluster to a primary cluster, making it read-write. This operation is irreversible.

    cbdr promote --config=<config_file> --restore-config=<restore_config_file>