← all chapters
CHAPTER 02
3 DIAGRAMS · ~9 MIN

Databases at Scale

The database is where your bugs become permanent. Schema, replication, and index decisions outlast every other architectural choice.

02.1 · CONCEPT

Replication

Copy the data so reads scale and a single node failure isn't fatal. The cost is consistency — replicas always lag, and the lag is a feature, not a bug.

ASYNC vs SYNC
Async replication is the default: low write latency, replicas may be seconds behind. Sync waits for at least one replica — survives loss but doubles tail latency.
READ-YOUR-WRITES
Route a user's reads to the primary for N seconds after their write, or until the replicas catch up. Otherwise users see stale data and file bug reports.
FAILOVER
Promote a replica when the primary dies. The dangerous case is split-brain: old primary comes back, both accept writes. Use a coordinator (etcd, Sentinel) to fence.
FIG · 02.1
PRIMARY / REPLICA · ASYNCPrimarywritesReplica 1read-onlyReplica 2read-onlyReplica 3read-onlyWAL streamreplication lag · failover · split-brain risk
02.2 · CONCEPT

Sharding

When a single primary can't hold the dataset or absorb the writes, split it. The choice of shard key is the single most important — and irreversible — schema decision.

SHARD KEY
Pick a key with high cardinality and even access. user_id beats country (US is hot). Avoid timestamps (today's shard takes all writes).
CONSISTENT HASHING
Hash-mod-N rebalances 100% of keys when N changes. Consistent hashing moves only 1/N. Use it the moment you might add a shard.
CROSS-SHARD QUERIES
JOINs across shards are scatter-gather: query every shard, merge results. Slow and fragile. Denormalise or maintain a search index instead.
FIG · 02.2
HASH RING · CONSISTENT HASHINGRouterhash(key) mod NShard 00000 – 3FFF~250k rowsShard 14000 – 7FFF~250k rowsShard 28000 – BFFF~250k rowsShard 3C000 – FFFF~250k rowshotspots · rebalance cost · cross-shard joinspick shard key from access pattern, not convenience
02.3 · CONCEPT

Indexes & Trade-offs

An index is a cached copy of one query plan, kept in sync by the database on every write. Powerful and expensive — most production databases have too many.

READ vs WRITE
Every index turns one row write into N+1 writes. On a hot insert path, six secondary indexes can dominate IO.
COVERING INDEXES
If the index contains every column the query needs, the engine never touches the table. Huge win for hot read paths — at the cost of disk.
RIGHT TOOL
B-tree for range scans. Hash for equality. GIN for arrays/JSON. Bitmap for low-cardinality columns. The engine cannot fix a wrong choice.
FIG · 02.3
B-TREE · READ ↑ WRITE ↓50 | 100< 5050-100> 100read O(log n)write amplifydisk overheadmemory cacheevery index is a write tax · don't index what you don't query
← PREVIOUS · 01
Scalability Fundamentals
NEXT · 03
Distributed Systems