Databases at Scale

The database is where your bugs become permanent. Schema, replication, and index decisions outlast every other architectural choice.

02.1 · CONCEPT

Replication

Copy the data so reads scale and a single node failure isn't fatal. The cost is consistency — replicas always lag, and the lag is a feature, not a bug.

ASYNC vs SYNC

Async replication is the default: low write latency, replicas may be seconds behind. Sync waits for at least one replica — survives loss but doubles tail latency.

READ-YOUR-WRITES

Route a user's reads to the primary for N seconds after their write, or until the replicas catch up. Otherwise users see stale data and file bug reports.

FAILOVER

Promote a replica when the primary dies. The dangerous case is split-brain: old primary comes back, both accept writes. Use a coordinator (etcd, Sentinel) to fence.

FIG · 02.1

02.2 · CONCEPT

Sharding

When a single primary can't hold the dataset or absorb the writes, split it. The choice of shard key is the single most important — and irreversible — schema decision.

SHARD KEY

Pick a key with high cardinality and even access. user_id beats country (US is hot). Avoid timestamps (today's shard takes all writes).

CONSISTENT HASHING

Hash-mod-N rebalances 100% of keys when N changes. Consistent hashing moves only 1/N. Use it the moment you might add a shard.

CROSS-SHARD QUERIES

JOINs across shards are scatter-gather: query every shard, merge results. Slow and fragile. Denormalise or maintain a search index instead.

FIG · 02.2

02.3 · CONCEPT

Indexes & Trade-offs

An index is a cached copy of one query plan, kept in sync by the database on every write. Powerful and expensive — most production databases have too many.

READ vs WRITE

Every index turns one row write into N+1 writes. On a hot insert path, six secondary indexes can dominate IO.

COVERING INDEXES

If the index contains every column the query needs, the engine never touches the table. Huge win for hot read paths — at the cost of disk.

RIGHT TOOL

B-tree for range scans. Hash for equality. GIN for arrays/JSON. Bitmap for low-cardinality columns. The engine cannot fix a wrong choice.

FIG · 02.3

← PREVIOUS · 01

Scalability Fundamentals

NEXT · 03 →

Distributed Systems