advancedHigh AvailabilityFree Guide

Database Replication

Definition

Database replication is the process of continuously copying data from one database server (primary/master) to one or more replica servers, enabling read scaling, high availability, and geographic distribution.

Introduction to Database Replication

Replication is how production databases achieve high availability and scale beyond a single server's capacity. By maintaining synchronized copies of the data on multiple servers, replication enables read load distribution, automatic failover when a server fails, and disaster recovery with geographically distributed replicas.

Key Takeaways

Primary-replica (master-slave): all writes go to primary, reads can go to replicas
Synchronous replication: primary waits for replica acknowledgment before committing — zero data loss, higher latency
Asynchronous replication: primary commits immediately, replica catches up — lower latency, possible data loss on failover
Replication lag: the time delay between a write on primary and it appearing on replicas
Read replicas scale read throughput horizontally without scaling write capacity
Multi-primary (multi-master) replication allows writes to multiple nodes — complex conflict resolution

Real-World Examples & SQL Schema

PostgreSQL Streaming Replication Configuration

-- On PRIMARY: postgresql.conf
-- wal_level = replica
-- max_wal_senders = 10
-- hot_standby = on

-- Create replication user:
CREATE USER replicator REPLICATION LOGIN
ENCRYPTED PASSWORD 'secure_password';

-- On REPLICA: recovery.conf / postgresql.conf
-- primary_conninfo = 'host=primary_host port=5432 user=replicator'
-- hot_standby = on

-- Check replication status:
SELECT client_addr, state, sent_lsn, write_lsn,
       flush_lsn, replay_lsn,
       (sent_lsn - replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;

PostgreSQL streaming replication sends WAL (Write-Ahead Log) records to replicas in near real-time.

Run code in Playground

Check Replication Lag (MySQL)

-- On replica:
SHOW REPLICA STATUS\G
-- Look at: Seconds_Behind_Source (replication lag in seconds)
-- Replica_IO_Running: Yes
-- Replica_SQL_Running: Yes

-- On primary — see connected replicas:
SHOW PROCESSLIST;

-- Monitor replication lag with a query:
SELECT TIMESTAMPDIFF(SECOND,
  MIN(create_time), NOW()) AS lag_seconds
FROM performance_schema.replication_applier_status_by_worker;

Monitoring replication lag is critical — stale reads from lagging replicas can cause data consistency issues.

Run code in Playground

Primary Use Cases

Read scaling: route SELECT queries to read replicas to offload primary

High availability: automatic failover to replica if primary fails

Disaster recovery: geographically distributed replicas for regional resilience

Reporting: run expensive analytical queries on replicas without impacting production

Zero-downtime backups: take backups from replica without blocking primary writes

Frequently Asked Questions

What is database replication and why is it important?

Database replication maintains synchronized copies of data across multiple servers. It is important for: high availability (automatic failover when primary server fails), read scaling (distribute read queries across multiple replica servers), disaster recovery (geographically distributed data copies), and zero-downtime maintenance (promote a replica while maintaining the primary).

What is replication lag and how do you minimize it?

Replication lag is the time delay between a write completing on the primary and that write appearing on the replica. It occurs because replication is typically asynchronous. To minimize lag: ensure replicas have adequate hardware (disk I/O is the common bottleneck), use parallel replication (multiple threads applying changes), avoid write storms on the primary, and consider synchronous replication for zero-lag requirements (at the cost of write latency).

What is the difference between synchronous and asynchronous replication?

Synchronous replication: the primary waits for at least one replica to confirm it has written the data before acknowledging the commit to the application. Guarantees zero data loss on failover. Increases write latency by at least one network round-trip. Asynchronous replication: the primary commits immediately without waiting for replicas. Lower latency, but there is a window of data loss if the primary fails before the replica catches up.

Related Concepts

Sharding Transactions Indexing Partitioning