Backing up and restoring registry databases
The Postgres Operator by Crunchy Data (PGO), used for managing registry databases, includes pgBackRest, an open-source backup and restore solution. PGO facilitates many common tasks required throughout the lifecycle of a database, including:
- Configuring schedules for automatic backups and retention policies.
- Backing up data to multiple locations.
- Supporting backup storage in Kubernetes, AWS S3 (or S3-compatible systems like MinIO), Google Cloud Storage (GCS), and Azure Blob Storage.
- Performing one-time backups.
- Performing point-in-time recovery (PITR).
- Cloning data to a new database instance.
- And more.
1. Configuring backup
By default, the Postgres operational and analytical clusters are configured to continuously archive write-ahead logs (WAL) and to create a full backup once a day. The retention policy is set to keep one full backup, so when a new backup is created, pgBackRest cleans up the previous backup and its associated WAL files.
pgBackRest provides various types of backups, retention policy configurations, and backup schedule options to customize the system according to desired recovery point objectives (RPO), recovery time objectives (RTO), and space utilization requirements.
1.1. Managing backup schedules
Schedules can be configured for all three types of backups:

- full: a backup of the entire Postgres cluster. This is the largest of all backup types.
- differential: a backup of all data changed since the last full backup.
- incremental: a backup of all data changed since the last full, differential, or incremental backup.
Backup schedules are stored in the spec.backups.pgbackrest.repos.schedules section. Each value in this section takes a string in cron format that defines the backup schedule.
For example, suppose the backup policy is to create a full backup every Sunday at 1 AM and an incremental backup at 1 AM on every other day of the week. The configuration would look similar to:
spec:
  backups:
    pgbackrest:
      repos:
      - name: repo1
        schedules:
          full: "0 1 * * 0"
          incremental: "0 1 * * 1-6"
To manage scheduled backups, PGO creates several Kubernetes CronJobs that perform backups at the specified intervals, using the configuration you provided.
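To verify that the CronJobs were created, you can list them; a minimal sketch, assuming the cluster is named operational and lives in the postgres-operator namespace (PGO labels generated resources with the cluster name):

kubectl get cronjobs -n postgres-operator \
  --selector=postgres-operator.crunchydata.com/cluster=operational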
1.2. Managing backup retention policies
PGO allows you to set backup retention policies for full and differential backups. When the retention period of a full backup expires, pgBackRest will clean up all associated backups and WAL files. For example, if you have a full backup with four associated incremental backups, when the retention period of the full backup expires, the retention period of all its incremental backups also expires.
There are two types of repo1-retention-full-type policies that can be set:

- count: based on the number of backups you want to retain. This is the default value.
- time: based on time, in days. Full backups older than repo1-retention-full will be removed from the repository, as long as at least one remaining backup is as old as or older than the repo1-retention-full setting.
For example, if we want to retain full backups for 14 days, the most convenient way to do this is through the spec.backups.pgbackrest.global section:
spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "14"
        repo1-retention-full-type: time
In this case, if we have two backups, one 12 days old and one 15 days old, neither will be deleted: removing the 15-day-old backup would leave only the 12-day-old one, which would violate the policy of keeping at least one backup that is at least 14 days old.
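For comparison, a count-based policy needs only the number of backups to keep. A minimal sketch that retains the four most recent full backups (count is the default type, so the second line may be omitted):

spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "4"
        repo1-retention-full-type: count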
It may be the case that the daily volume of WAL is significant, and saving storage space in the backup repository is more important than the ability to perform point-in-time recovery (PITR) far into the past. To retain WAL only for a certain number of backups, pgBackRest provides the following parameters (see the sketch after this list):

- repo1-retention-archive: the number of backups for which WAL will be retained.
- repo1-retention-archive-type: the type of backup for which WAL is retained (incr, diff, full). If set to full, pgBackRest retains WAL for the number of full backups defined by repo1-retention-archive. If set to diff, it retains WAL for the number of full and differential backups defined by repo1-retention-archive. If set to incr, it retains WAL for the number of full, differential, and incremental backups defined by repo1-retention-archive.
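A minimal sketch combining time-based backup retention with a tighter WAL retention policy; here WAL is kept only for the two most recent full and differential backups (the specific numbers are illustrative):

spec:
  backups:
    pgbackrest:
      global:
        repo1-retention-full: "14"
        repo1-retention-full-type: time
        repo1-retention-archive: "2"
        repo1-retention-archive-type: diff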
1.3. Creating a one-time backup
First, you need to configure the spec.backups.pgbackrest.manual section to be able to perform a one-time backup. It contains the type of backup you want to create, as well as any other pgBackRest configuration parameters.
Let’s configure a custom resource for creating a one-time full backup:
spec:
  backups:
    pgbackrest:
      manual:
        repoName: repo1
        options:
        - --type=full
This doesn’t initiate the backup yet: you also need to add the postgres-operator.crunchydata.com/pgbackrest-backup annotation to your custom resource. A good way to set this annotation is to use a timestamp as its value, so you know when you initiated the backup.
For example, for an operational cluster, you can execute the following command to initiate a one-time backup:
kubectl annotate -n postgres-operator postgrescluster operational \
postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
PGO will detect this annotation and create a new one-off backup Job!
If you plan to make one-time backups with similar parameters in the future, you can leave them in the specification; simply update the annotation to a different value during the next backup.
To re-execute the above command, you will need to add the --overwrite flag so the annotation value can be updated, as in the example below:
kubectl annotate -n postgres-operator postgrescluster operational --overwrite \
postgres-operator.crunchydata.com/pgbackrest-backup="$(date)"
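To confirm that the one-time backup ran, you can inspect the Job that PGO creates; a sketch assuming the operational cluster in the postgres-operator namespace (the exact Job name is generated by PGO):

kubectl get jobs -n postgres-operator \
  --selector=postgres-operator.crunchydata.com/cluster=operational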
2. Restoring
2.1. Point-in-Time Recovery (PITR) or restoring a backup copy
To restore the database to a specific date and time, add the following to the spec.backups.pgbackrest section:
spec:
  backups:
    pgbackrest:
      restore:
        enabled: true
        repoName: repo1
        options:
        - --type=time
        - --target="2022-06-09 14:15:11-04"
where --target is the target time for PITR; an example of a restore target is 2022-06-09 14:15:11-04.
To restore the database to a specific backup, add the following to the spec.backups.pgbackrest section:
spec:
  backups:
    pgbackrest:
      restore:
        enabled: true
        repoName: repo1
        options:
        - --type=immediate
        - --set=20220602-073427F_20220602-073507I
where --set is the name of the target backup. You can view the list of available backups in the S3 backup storage bucket, or by executing the command pgbackrest info --stanza=db in the database console.
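If you prefer not to open a database console manually, the same pgbackrest command can be run through kubectl; a sketch assuming PGO's default labels, where role=master selects the current primary pod and database is the name of the Postgres container:

kubectl exec -n postgres-operator -it \
  $(kubectl get pods -n postgres-operator -o name \
    --selector=postgres-operator.crunchydata.com/cluster=operational,postgres-operator.crunchydata.com/role=master) \
  -c database -- pgbackrest info --stanza=db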
Now, to initiate the restore process, add the postgres-operator.crunchydata.com/pgbackrest-restore annotation as follows:
kubectl annotate -n postgres-operator postgrescluster operational --overwrite \
postgres-operator.crunchydata.com/pgbackrest-restore=id1
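PGO spins up a restore Job in response to the annotation; you can follow its progress with a generic watch (a sketch, no PGO-specific labels assumed):

kubectl get jobs -n postgres-operator --watch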
Once the restore process is complete, you can disable the added configuration:
spec:
  backups:
    pgbackrest:
      restore:
        enabled: false
All these operations need to be performed on both the operational and analytical databases. To synchronize data between the operational and analytical databases afterwards, follow the steps in section 2.3, Synchronize data on the analytical cluster.
2.2. Cloning from a backup
To clone a database from a backup, you need to add the spec.dataSource section to the manifest that creates the new database instance.
To restore to a specific point in time, the section will look similar to the below example:
spec:
  dataSource:
    pgbackrest:
      stanza: db
      configuration:
      - secret:
          name: s3-conf
      global:
        repo1-path: "/postgres-backup/source_system/operational"
        repo1-s3-uri-style: path
        repo1-storage-verify-tls: n
        repo1-storage-port: "9000"
      options:
      - --type=time
      - --target="2022-06-09 14:15:11-04"
      repo:
        name: repo1
        s3:
          bucket: "bucketName"
          endpoint: "endpoint"
          region: "us-east-1"
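The configuration entry above points to a Secret named s3-conf that carries the pgBackRest credentials for the source repository. A minimal sketch of such a Secret, assuming S3-style access keys (replace the placeholders with real values):

apiVersion: v1
kind: Secret
metadata:
  name: s3-conf
  namespace: postgres-operator
type: Opaque
stringData:
  s3.conf: |
    [global]
    repo1-s3-key=<S3_ACCESS_KEY>
    repo1-s3-key-secret=<S3_SECRET_KEY>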
To restore the database from a specific backup, change the restore type in the spec.dataSource.pgbackrest.options section and add the name of the backup copy:
options:
- --type=immediate
- --set=20220602-073427F_20220602-073507I
All of these operations need to be performed on both the operational and analytical databases. To restore data consistency between the operational and analytical databases, follow the steps in section 2.3, Synchronize data on the analytical cluster.
2.3. Synchronize data on the analytical cluster
Since the operational and analytical databases are replicated asynchronously, their backups are not synchronized. Therefore, even when restoring to the same point in time, data consistency between these databases cannot be guaranteed. To bring the restored databases into a synchronized state, perform the following steps on the registry database of the analytical instance:
- Disable the subscription:
  ALTER SUBSCRIPTION operational_sub DISABLE;
- Truncate all tables included in the subscription:
  SELECT 'TRUNCATE ' || srrelid::regclass || ' CASCADE;' FROM pg_subscription_rel \gexec
- Drop the subscription:
  DROP SUBSCRIPTION operational_sub;
- Recreate the subscription:
  CREATE SUBSCRIPTION operational_sub CONNECTION 'host=OperationalHost user=postgres dbname=registry password=XXXXXX' PUBLICATION analytical_pub WITH (create_slot=false, slot_name=operational_sub);
Once the initial table synchronization on the analytical cluster is completed, logical replication will automatically be enabled, and the operational and analytical clusters will be in a synchronized state.
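To check the progress of the initial synchronization, you can query the subscription catalog on the analytical instance; a value of r (ready) in srsubstate means the table is fully synchronized:

SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;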
3. Continuous restoring on the standby cluster
Advanced high availability and disaster recovery strategies involve distributing database clusters across multiple data centers to maximize uptime. PGO provides ways to deploy PostgreSQL clusters that can span multiple Kubernetes clusters using an external storage system.
3.1. Creating a standby cluster
The backup repository of the primary cluster needs to be accessible to the standby cluster.
When creating a standby cluster, add the following to its manifest: the standby.enabled field set to true, and the S3 parameters of the primary cluster's backup repository:
spec:
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      repos:
      - name: repo1
        s3:
          bucket: "bucket"
          endpoint: "primary.endpoint"
          region: "ca-central-1"
  standby:
    enabled: true
    repoName: repo1
3.2. Promoting the standby cluster
Before promoting the standby cluster, we need to ensure that we don’t accidentally create a "split-brain" scenario where two primary instances try to write to the same storage. If the primary cluster is still active, make sure you have shut it down before attempting to promote the standby cluster.
When the primary cluster becomes inactive, we can promote the standby cluster by removing or disabling its spec.standby section:
spec:
  standby:
    enabled: false
This change triggers the promotion of the standby cluster to the primary status, and the cluster starts accepting writes.
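If you prefer not to edit the manifest by hand, the same change can be applied as a patch; a sketch assuming the standby cluster resource is named standby (adjust the name and namespace to your deployment):

kubectl patch -n postgres-operator postgrescluster standby --type merge \
  -p '{"spec":{"standby":{"enabled":false}}}'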