CHAPTER 03
3 DIAGRAMS · ~9 MIN
Distributed Systems
A distributed system is one in which the failure of a machine you didn't know existed renders your own machine unusable. — Leslie Lamport
03.1 · CONCEPT
Message Queues
Synchronous calls couple your uptime to every downstream service. A queue lets producers fire-and-forget and consumers process at their own pace.
BACK-PRESSURE
A queue absorbs traffic spikes the consumer couldn't handle live. The queue depth is your real-time signal that consumers are falling behind.
DELIVERY GUARANTEES
Exactly-once is mostly marketing. Real systems give you at-least-once + idempotent consumers, or at-most-once for things you can afford to lose.
ORDERING
Global order is expensive. Most brokers give per-partition order — partition by entity (user_id) so each entity's events stay sequential.
FIG · 03.1
03.2 · CONCEPT
CAP Theorem
When the network partitions — and it will — every system must choose: keep accepting writes and serve potentially stale data, or refuse writes to preserve consistency.
PARTITIONS ARE GIVEN
P is not optional in a real network. The real choice is CP (refuse) vs AP (accept and reconcile). Pick per-feature, not per-system.
PACELC
CAP only describes partitions. PACELC adds: when there's no partition, choose between Latency and Consistency. Most systems live here 99% of the time.
PER OPERATION
Same database can serve different consistency levels per query. Strong for billing, eventual for the feed. Push the choice to the caller.
FIG · 03.2
03.3 · CONCEPT
Consensus (Raft)
Multiple nodes have to agree on a single sequence of events despite crashes and network drops. Raft is the modern, understandable answer to this question.
QUORUM
A write is committed when a majority (⌈N/2⌉+1) of nodes have it in their log. 5-node clusters survive 2 failures. Even numbers buy you nothing.
LEADER ELECTION
Followers start an election if they don't hear from the leader. The first to get majority votes for a new term becomes leader. Randomised timeouts prevent split votes.
WHEN TO REACH FOR IT
Use a Raft library (etcd, Consul) for config, locks, leader election. Don't roll your own. Don't put high-throughput data through it.
FIG · 03.3
← PREVIOUS · 02
Databases at Scale
NEXT · 04 →
APIs & Security