Start with RTO and RPO

Before choosing any technology, define your targets. Every DR decision flows from these two numbers:

Strategy                        | Typical RPO | Typical RTO   | Cost
Cold backup (restore from file) | Hours       | Hours         | Low
Log shipping / WAL archiving    | Minutes     | Minutes–Hours | Low–Medium
Streaming replication           | <1 min      | Minutes       | Medium
Synchronous HA (AG / Patroni)   | Zero        | Seconds       | High

A backup you've never tested is not a backup — it's a hope. Schedule quarterly restore drills to a test environment and measure actual RTO. You'll almost always discover something unexpected.
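Measuring actual RTO during a drill is easy to automate. A minimal sketch, assuming a placeholder target of one hour; the `sleep 2` line stands in for the real restore command (sqlcmd, pg_restore, mongorestore, and so on):

Bash — Time a restore drill (sketch)
```shell
#!/usr/bin/env bash
# Hypothetical drill timer: measures how long the restore actually takes
# and compares it against the RTO target. Replace the 'sleep 2' stand-in
# with the real restore command.
RTO_TARGET_SECONDS=3600   # placeholder target: 1 hour

start=$(date +%s)
sleep 2                   # <-- stand-in for the actual restore
end=$(date +%s)

elapsed=$((end - start))
echo "Measured RTO: ${elapsed}s (target: ${RTO_TARGET_SECONDS}s)"
if [ "$elapsed" -le "$RTO_TARGET_SECONDS" ]; then
    echo "DRILL PASS"
else
    echo "DRILL FAIL"
fi
```

Record the measured number in the runbook after each drill; a target you've never measured against is as hollow as a backup you've never restored.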

SQL Server

Backup, Log Shipping, Always On AG

Backup Strategy

A solid SQL Server backup strategy follows the 3-2-1 rule: 3 copies of data, on 2 different media types, with 1 copy offsite. For most production databases, the schedule looks like this:

T-SQL — Automated backup with compression and checksum
-- Full backup
-- TO DISK accepts only a string literal or a variable, so build the path first
DECLARE @FullPath nvarchar(260) =
    'D:\Backups\YourDB_Full_' + FORMAT(GETDATE(), 'yyyyMMdd_HHmm') + '.bak';

BACKUP DATABASE [YourDB]
TO DISK = @FullPath
WITH
    COMPRESSION,
    CHECKSUM,
    STATS = 10,
    DESCRIPTION = 'Weekly full backup';

-- Verify backup integrity
RESTORE VERIFYONLY
FROM DISK = 'D:\Backups\YourDB_Full_20260328_0200.bak'
WITH CHECKSUM;

-- Transaction log backup (same rule: pass the path via a variable)
DECLARE @LogPath nvarchar(260) =
    'D:\Backups\YourDB_Log_' + FORMAT(GETDATE(), 'yyyyMMdd_HHmm') + '.trn';

BACKUP LOG [YourDB]
TO DISK = @LogPath
WITH COMPRESSION, CHECKSUM;
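The offsite copy in the 3-2-1 rule is usually handled by the storage layer, but the local copy still needs pruning so the backup disk doesn't fill. A minimal sketch using find; a throwaway directory stands in for the real backup share, and RETENTION_DAYS is a placeholder:

Bash — Retention sweep for the local backup copy (sketch)
```shell
#!/usr/bin/env bash
# Hypothetical retention sweep. A temp directory stands in for D:\Backups;
# point BACKUP_DIR at the real share in production.
BACKUP_DIR=$(mktemp -d)
RETENTION_DAYS=14

# Simulate one expired and one fresh full backup (GNU touch)
touch -d "20 days ago" "$BACKUP_DIR/YourDB_Full_old.bak"
touch "$BACKUP_DIR/YourDB_Full_new.bak"

# Delete full backups older than the retention window
find "$BACKUP_DIR" -name '*.bak' -mtime +"$RETENTION_DAYS" -delete

ls "$BACKUP_DIR"    # only the fresh backup should remain
```

In production, prune only after RESTORE VERIFYONLY has confirmed that a newer backup is actually restorable.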

Log Shipping for DR

Log shipping copies transaction log backups to a secondary server and restores them there on a schedule, typically keeping the secondary 15–30 minutes behind the primary; the delay is configurable. It's simple, reliable, and works even when the secondary runs a newer SQL Server version, which makes it a good fit for DR sites where some data loss is tolerable.

T-SQL — Monitor log shipping lag
-- Run on the secondary
SELECT
    primary_server,
    primary_database,
    last_copied_file,
    last_copied_date,
    last_restored_file,
    last_restored_date,
    DATEDIFF(MINUTE, last_restored_date, GETDATE()) AS lag_minutes
FROM msdb.dbo.log_shipping_monitor_secondary
ORDER BY lag_minutes DESC;
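To turn that query into an alert, a hedged sketch: in production, LAG_MINUTES would come from running the monitoring query through sqlcmd; it is hard-coded here purely for illustration.

Bash — Alert on log shipping lag (sketch)
```shell
#!/usr/bin/env bash
# Hypothetical lag alert. Populate LAG_MINUTES from the monitoring query
# above (e.g. via sqlcmd); stubbed here with a fixed value.
LAG_THRESHOLD_MINUTES=30
LAG_MINUTES=12    # stub value for illustration

if [ "$LAG_MINUTES" -gt "$LAG_THRESHOLD_MINUTES" ]; then
    echo "ALERT: log shipping lag ${LAG_MINUTES}m exceeds ${LAG_THRESHOLD_MINUTES}m"
    # hook: send mail / page on-call here
else
    echo "OK: lag ${LAG_MINUTES}m within threshold"
fi
```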

Always On AG for Zero-RPO DR

For RPO = 0, Always On Availability Groups with synchronous commit is the answer. See our dedicated article on Always On AG architecture for full setup details.

Point-in-Time Restore

T-SQL — Restore to a specific point in time
-- Step 1: Restore full backup (NORECOVERY keeps it in restoring state)
RESTORE DATABASE [YourDB]
FROM DISK = 'D:\Backups\YourDB_Full_20260323_0200.bak'
WITH NORECOVERY, REPLACE;

-- Step 2: Restore differential (if available)
RESTORE DATABASE [YourDB]
FROM DISK = 'D:\Backups\YourDB_Diff_20260327_0200.bak'
WITH NORECOVERY;

-- Step 3: Restore log backups up to the target time
RESTORE LOG [YourDB]
FROM DISK = 'D:\Backups\YourDB_Log_20260328_0945.trn'
WITH
    NORECOVERY,
    STOPAT = '2026-03-28 09:43:00';  -- restore to just before the incident

-- Step 4: Bring database online
RESTORE DATABASE [YourDB] WITH RECOVERY;

PostgreSQL

pg_dump, WAL Archiving, pgBackRest, Streaming Replication

Logical Backups with pg_dump

For smaller databases or selective restores, pg_dump creates portable logical backups. It's not suitable as your sole DR strategy for large databases — use it alongside physical backups.

Bash — pg_dump with compression
# Full database dump in custom format (recommended)
pg_dump \
    --host=localhost \
    --port=5432 \
    --username=postgres \
    --format=custom \
    --compress=9 \
    --file=/backups/yourdb_$(date +%Y%m%d_%H%M).dump \
    yourdb

# Restore from custom format dump
pg_restore \
    --host=localhost \
    --port=5432 \
    --username=postgres \
    --dbname=yourdb_restored \
    --jobs=4 \
    /backups/yourdb_20260328_0200.dump

Physical Backups with pgBackRest

pgBackRest is the industry standard for production PostgreSQL backups. It supports full, differential, and incremental backups, parallel processing, compression, encryption, and S3/Azure Blob storage.

Bash — pgBackRest configuration and backup
# /etc/pgbackrest/pgbackrest.conf
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
repo1-retention-diff=14
compress-type=lz4
log-level-console=info

[yourdb]
pg1-path=/var/lib/postgresql/15/main

# Take a full backup
pgbackrest --stanza=yourdb backup --type=full

# Take a differential backup
pgbackrest --stanza=yourdb backup --type=diff

# Take an incremental backup
pgbackrest --stanza=yourdb backup --type=incr

# Point-in-time restore (a time target requires --type=time)
pgbackrest --stanza=yourdb restore \
    --type=time \
    --target="2026-03-28 09:43:00" \
    --target-action=promote

WAL Archiving for PITR

Enable WAL archiving in postgresql.conf to continuously ship WAL segments to a backup location. Combined with a base backup, this enables point-in-time recovery to any moment since the last base backup.

postgresql.conf — WAL archiving settings
wal_level = replica
archive_mode = on
archive_command = 'pgbackrest --stanza=yourdb archive-push %p'
archive_timeout = 300      # force WAL switch every 5 minutes (max RPO)
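The restore side needs the inverse of archive_command. A sketch of the recovery settings for PITR, using the same hypothetical incident timestamp as elsewhere in this article; PostgreSQL enters targeted recovery when a recovery.signal file is present in the data directory:

postgresql.conf — PITR recovery settings (restore side)
```
# Fetch archived WAL segments back from the pgBackRest repository
restore_command = 'pgbackrest --stanza=yourdb archive-get %f "%p"'
recovery_target_time = '2026-03-28 09:43:00'
recovery_target_action = 'promote'   # promote once the target is reached
```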

Streaming Replication for HA/DR

PostgreSQL streaming replication sends WAL records from primary to standby in real time. Combined with Patroni for automated failover, this is the standard production HA setup.

Bash — Create a streaming replica with pg_basebackup
# On the standby server
pg_basebackup \
    --host=primary-server \
    --port=5432 \
    --username=replicator \
    --pgdata=/var/lib/postgresql/15/main \
    --wal-method=stream \
    --checkpoint=fast \
    --progress \
    --verbose

# postgresql.conf on standby
primary_conninfo = 'host=primary-server port=5432 user=replicator'
recovery_target_timeline = 'latest'

# Create standby.signal to activate standby mode
touch /var/lib/postgresql/15/main/standby.signal

MongoDB

mongodump, Replica Sets, Ops Manager, Atlas

Logical Backups with mongodump

mongodump creates BSON exports of your data. It's suitable for smaller datasets and selective restores, but not as the sole DR strategy for large or high-write databases: dumps are slow and resource-intensive, and the result is only consistent across collections when the dump is taken with --oplog.

Bash — mongodump with oplog for consistent backup
# Dump with oplog (ensures consistency across collections)
mongodump \
    --host="replicaset/mongo1:27017,mongo2:27017,mongo3:27017" \
    --oplog \
    --gzip \
    --out=/backups/mongo_$(date +%Y%m%d_%H%M)

# Restore with oplog replay
mongorestore \
    --host="localhost:27017" \
    --oplogReplay \
    --gzip \
    /backups/mongo_20260328_0200

Replica Sets — The Foundation of MongoDB DR

Every production MongoDB deployment should use a replica set. A replica set is a group of mongod instances that maintain the same dataset — one primary accepts writes, secondaries replicate asynchronously and can become primary on failover.

MongoDB Shell — Initiate and configure a replica set
// Initiate the replica set (run on primary)
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017", priority: 2 },
    { _id: 1, host: "mongo2:27017", priority: 1 },
    { _id: 2, host: "mongo3:27017", priority: 0, hidden: true, votes: 1 }
    // hidden member: good for backups, doesn't serve reads
  ]
});

// Check replica set status
rs.status();

// Check replication lag
rs.printSecondaryReplicationInfo();
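Note that the hidden member above keeps votes: 1, so the set still has three voters. Elections need a strict majority of voting members — floor(N/2) + 1 — which is why odd member counts are recommended. A quick sanity check of the math:

Bash — Replica set election majority (sketch)
```shell
#!/usr/bin/env bash
# Voting majority needed to elect a primary for N-member replica sets:
# floor(N/2) + 1 (integer division does the floor for us).
for n in 3 5 7; do
    echo "members=$n majority=$(( n / 2 + 1 ))"
done
```

A 3-member set tolerates one member down; lose two, and the survivor cannot elect itself primary.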

Point-in-Time Restore with Oplog

MongoDB's oplog (operations log) is a capped collection that records all write operations. By combining a base backup with oplog replay, you can restore to any point in time within the oplog window.

Bash — Restore to a specific timestamp using oplog
# Restore base backup first
mongorestore \
    --host="localhost:27017" \
    --gzip \
    /backups/mongo_20260327_0200

# Replay oplog up to a specific timestamp
# Timestamp format: Timestamp(epochSeconds, ordinal)
mongorestore \
    --host="localhost:27017" \
    --oplogReplay \
    --oplogLimit="1743155580:1" \
    /backups/mongo_oplog
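The --oplogLimit value is epoch seconds plus an ordinal, which is awkward to produce by hand. A small hypothetical helper to derive it from a wall-clock cutoff (GNU date assumed):

Bash — Derive an --oplogLimit value (sketch)
```shell
#!/usr/bin/env bash
# Convert a wall-clock cutoff into the value mongorestore's --oplogLimit
# expects (<epochSeconds>:<ordinal>). The ordinal disambiguates multiple
# operations within the same second; 1 is a safe default.
CUTOFF='2026-03-28 09:43:00'
EPOCH=$(date -d "$CUTOFF" +%s)    # GNU date
echo "--oplogLimit=${EPOCH}:1"
```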

MongoDB Atlas Built-in DR

If you use MongoDB Atlas, cloud backup and point-in-time restore are built in. Atlas provides continuous cloud backups with configurable retention (up to 35 days) and restores to any second within the retention window — no infrastructure to manage.

For on-premises MongoDB, Ops Manager (included with MongoDB Enterprise) provides the same automated backup, PITR, and monitoring capabilities as Atlas, with full control over your infrastructure.


The DR Runbook Checklist

Whatever platform you use, every DR plan needs a documented runbook. At minimum it should cover:

- Where backups live (paths, buckets, credentials) and who can access them
- The step-by-step restore procedure, including point-in-time recovery
- The failover procedure and the decision criteria for invoking it
- A contact list and escalation path
- Verification steps that confirm the restored system is healthy
- The drill schedule and the measured RTO/RPO from the last drill

The most common DR failure is not a technical one — it's an organisational one. Backups exist, but nobody knows how to restore them. Document and drill the procedure, not just the infrastructure.


João Rocha

Database Administrator and Software Consultant at Bytedivision with 20+ years of experience in SQL Server, PostgreSQL, MongoDB and Azure cloud solutions.