Backing up and restoring registry databases

The Postgres Operator by Crunchy Data (PGO), used for managing registry databases, includes pgBackRest, an open-source backup and restore solution. PGO handles many of the common tasks required throughout the lifecycle of a database, including:

  • Configuring schedules for automatic backup and retention policies.

  • Backing up data to multiple locations.

    • Supporting backup storage in Kubernetes, AWS S3 (or S3-compatible systems like MinIO), Google Cloud Storage (GCS), and Azure Blob Storage.

  • Performing one-time backups.

  • Performing point-in-time recovery (PITR).

  • Cloning data to a new database instance.

  • And more.

1. Configuring backup

By default, the Postgres operational and analytical clusters are configured to continuously archive write-ahead logs (WAL) and to create a full backup once a day. The retention policy is set to keep one full backup, so when a new backup is created, pgBackRest removes the previous backup and its associated WAL files.
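
Expressed in the PostgresCluster custom resource, these defaults correspond to a configuration along the following lines (a minimal sketch; the 1 AM schedule time is an assumption, not necessarily what your clusters use):

spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "1"
      repos:
      - name: repo1
        schedules:
          full: "0 1 * * *"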

pgBackRest provides various types of backups, retention policy configurations, and backup schedule options to customize the system according to desired recovery point objectives (RPO), recovery time objectives (RTO), and space utilization requirements.

1.1. Managing backup schedules

Schedules can be configured for all three types of backups:

  • full: backup of the entire Postgres cluster. This is the largest among all types of backups.

  • differential: backup of all data since the last full backup.

  • incremental: backup of all data since the last full, differential, or incremental backup.

Backup schedules are stored in the spec.backups.pgbackrest.repos.schedules section. Each value in this section takes a string in cron format that defines the backup schedule.

For example, suppose the backup policy is to create a full backup every Sunday at 1 AM and an incremental backup at 1 AM on every other day of the week. The configuration should look similar to:

spec:
  backups:
    pgbackrest:
      repos:
      - name: repo1
        schedules:
          full: "0 1 * * 0"
          incremental: "0 1 * * 1-6"

To manage scheduled backups, PGO creates a Kubernetes CronJob for each defined schedule; the resulting backups use the configuration you provided.
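
To verify that the schedules took effect, you can list the CronJobs PGO created. Assuming PGO's standard cluster label and the operational cluster in the postgres-operator namespace used elsewhere in this section:

kubectl get cronjobs -n postgres-operator \
  --selector postgres-operator.crunchydata.com/cluster=operational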

1.2. Managing backup retention policies

PGO allows you to set backup retention policies for full and differential backups. When the retention period of a full backup expires, pgBackRest will clean up all associated backups and WAL files. For example, if you have a full backup with four associated incremental backups, when the retention period of the full backup expires, the retention period of all its incremental backups also expires.

The repo1-retention-full-type option accepts two policy types:

  • count: based on the number of backups you want to retain. This is the default value.

  • time: based on a time in days. Full backups older than repo1-retention-full days will be removed from the repository, provided at least one backup meets or exceeds the repo1-retention-full threshold.

For example, if we want to retain full backups for 14 days, the most convenient way to do this is through the spec.backups.pgbackrest.global section:

spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "14"
        repo1-retention-full-type: time

In this case, if we have two backups, one 12 days old and another 15 days old, no backups will be deleted: removing the 15-day-old backup would leave only the 12-day-old one, violating the requirement to keep at least one backup that is 14 or more days old before removing older backups.
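
If you prefer the default count-based policy instead, for example keeping the four most recent full backups (and with them any dependent differential and incremental backups), the configuration would look like:

spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "4"
        repo1-retention-full-type: count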

It may be that the daily volume of WAL is significant, and saving storage space in the backup repository matters more than the ability to perform point-in-time recovery (PITR) far back in time. To retain WAL for only a certain number of backups, pgBackRest provides the following parameters:

  • repo1-retention-archive: the number of backups for which WAL will be retained.

  • repo1-retention-archive-type: the type of backup for which WAL is retained (incr, diff, full). If set to full, pgBackRest retains WAL for the number of full backups defined by repo1-retention-archive. If set to diff, WAL is retained for the number of full and differential backups defined by repo1-retention-archive. If set to incr, WAL is retained for the number of full, differential, and incremental backups defined by repo1-retention-archive.
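
For example, building on the time-based retention policy above, the following sketch retains WAL only for the two most recent full backups:

spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "14"
        repo1-retention-full-type: time
        repo1-retention-archive: "2"
        repo1-retention-archive-type: full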

1.3. Creating a one-time backup

To perform a one-time backup, first configure the spec.backups.pgbackrest.manual section. It specifies the type of backup you want to create along with any other pgBackRest configuration options.

Let’s configure a custom resource for creating a one-time full backup:

spec:
  backups:
    pgbackrest:
      manual:
        repoName: repo1
        options:
        - --type=full

This doesn’t initiate the backup yet: you also need to add the postgres-operator.crunchydata.com/pgbackrest-backup annotation to your custom resource. The best way to set this annotation is with a timestamp value, so you know when you initiated the backup.

For example, for an operational cluster, you can execute the following command to initiate a one-time backup:

kubectl annotate -n postgres-operator postgrescluster operational \
  postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"

PGO will detect this annotation and create a one-off backup Job.
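
You can follow the progress of that Job with, for example:

kubectl get jobs -n postgres-operator --watch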

If you plan to make one-time backups with similar parameters in the future, you can leave these settings in the specification; simply update the annotation to a new value the next time you take a backup.

To re-run the command above, you will need to add the --overwrite flag so that the annotation value is updated, as in the example below:

kubectl annotate -n postgres-operator postgrescluster operational --overwrite \
  postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"

2. Restoring

2.1. Point-in-time recovery (PITR) or restoring a backup copy

To restore the database to a specific date and time, you need to add the following to the spec.backups.pgbackrest section:

spec:
  backups:
    pgbackrest:
      restore:
        enabled: true
        repoName: repo1
        options:
        - --type=time
        - --target="2022-06-09 14:15:11-04"

where --target is the point in time to recover to, specified as a timestamp with a UTC offset (for example, 2022-06-09 14:15:11-04).

To restore the database to a specific backup, you need to add the following to the spec.backups.pgbackrest section:

spec:
  backups:
    pgbackrest:
      restore:
        enabled: true
        repoName: repo1
        options:
        - --type=immediate
        - --set=20220602-073427F_20220602-073507I

where --set is the name of the target backup. You can view the list of available backups in the S3 backup storage bucket or by executing pgbackrest info --stanza=db inside the database container.
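
One way to run that command is with kubectl exec against the primary database pod; the pod name below is hypothetical, and the database container name assumes PGO's standard pod layout:

kubectl exec -n postgres-operator -it operational-instance1-abcd-0 \
  -c database -- pgbackrest info --stanza=db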

Now, to initiate the restore, add the postgres-operator.crunchydata.com/pgbackrest-restore annotation as follows:

kubectl annotate -n postgres-operator postgrescluster operational --overwrite \
  postgres-operator.crunchydata.com/pgbackrest-restore=id1

Once the restore is complete, you can disable the restore configuration you added:

spec:
  backups:
    pgbackrest:
      restore:
        enabled: false

All these operations need to be performed on both the operational and analytical databases. To synchronize data between them afterwards, follow the steps in Synchronize data on the analytical cluster (section 2.3).

2.2. Cloning from a backup

To clone a database from a backup, you need to add the spec.dataSource section to the manifest that creates a new database instance. To restore to a specific point in time, the section will look similar to the example below:

spec:
  dataSource:
    pgbackrest:
      stanza: db
      configuration:
      - secret:
          name: s3-conf
      global:
        repo1-path: "/postgres-backup/source_system/operational"
        repo1-s3-uri-style: path
        repo1-storage-verify-tls: n
        repo1-storage-port: "9000"
      options:
      - --type=time
      - --target="2022-06-09 14:15:11-04"
      repo:
        name: repo1
        s3:
          bucket: "bucketName"
          endpoint: "endpoint"
          region: "us-east-1"

To restore from a specific backup instead, change the restore type in the spec.dataSource.pgbackrest.options section and add the name of the target backup:

      options:
      - --type=immediate
      - --set=20220602-073427F_20220602-073507I

All of these operations need to be performed on both the operational and analytical databases. To restore data consistency between them, follow the steps in Synchronize data on the analytical cluster (section 2.3).

2.3. Synchronize data on the analytical cluster

Since the operational and analytical databases are asynchronously replicated, their backups are not synchronized. Therefore, even when restoring to the same point in time, data consistency between these databases cannot be guaranteed. To bring the restored databases into a synchronized state, perform the following steps on the registry database of the analytical instance:

  • Disable the subscription: ALTER SUBSCRIPTION operational_sub DISABLE;

  • Truncate all tables included in the subscription: SELECT 'TRUNCATE ' || srrelid::regclass || ' CASCADE;' FROM pg_subscription_rel \gexec

  • Dissociate the replication slot and drop the subscription (dissociating first preserves the slot on the operational cluster so it can be reused in the next step): ALTER SUBSCRIPTION operational_sub SET (slot_name = NONE); DROP SUBSCRIPTION operational_sub;

  • Create the subscription: CREATE SUBSCRIPTION operational_sub CONNECTION 'host=OperationalHost user=postgres dbname=registry password=XXXXXX' PUBLICATION analytical_pub WITH (create_slot = false, slot_name = operational_sub);

Once the initial table synchronization on the analytical cluster is completed, logical replication will automatically be enabled, and the operational and analytical clusters will be in a synchronized state.
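
To check the synchronization progress, you can query the per-table subscription state on the analytical cluster; srsubstate changes to r (ready) once a table's initial synchronization has finished:

SELECT srrelid::regclass, srsubstate FROM pg_subscription_rel;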

3. Continuous restoring on the standby cluster

Advanced high availability and disaster recovery strategies involve distributing database clusters across multiple data centers to maximize uptime. PGO provides ways to deploy PostgreSQL clusters that can span multiple Kubernetes clusters using an external storage system.

3.1. Creating a standby cluster

The backup repository of the primary cluster must be accessible to the standby cluster. When creating a standby cluster, add a standby section to the manifest with enabled set to true, along with the S3 parameters of the primary cluster's backup repository:

spec:
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      repos:
      - name: repo1
        s3:
          bucket: "bucket"
          endpoint: "primary.endpoint"
          region: "ca-central-1"
  standby:
    enabled: true
    repoName: repo1

3.2. Promoting the standby cluster

Before promoting the standby cluster, we need to ensure that we don’t accidentally create a "split-brain" scenario where two primary instances try to write to the same storage. If the primary cluster is still active, make sure you have shut it down before attempting to promote the standby cluster.
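
With PGO, one way to shut down the primary cluster is to set the shutdown field in its manifest (a sketch, assuming the PostgresCluster resource's spec.shutdown field):

spec:
  shutdown: true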

When the primary cluster becomes inactive, we can promote the standby cluster by removing or disabling its spec.standby section:

spec:
  standby:
    enabled: false

This change triggers the promotion of the standby cluster to primary, and the cluster starts accepting writes.
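
To verify the promotion, you can check that Postgres is no longer in recovery mode; the pod name below is hypothetical:

kubectl exec -n postgres-operator -it standby-instance1-abcd-0 \
  -c database -- psql -c 'SELECT pg_is_in_recovery();'

pg_is_in_recovery() returns f once the cluster has been promoted and is accepting writes.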