Inside Consensus: How Distributed Systems Discover and Propagate Leadership Timelines
Share this article
Distributed systems face a fundamental challenge: How do you safely transfer leadership while preserving every durable decision and maintaining sequential consistency? This deep dive explores the critical mechanics of timeline discovery and propagation during leadership transitions—where one misstep could mean data loss or inconsistency.
The Foundation: Durability and Consistency Rules
At the core lie two non-negotiable rules:
- Durability: Every distributed decision must be made durable before application. If a decision met durability criteria before a failure, it must survive.
- Consistency: Decisions must be applied sequentially. Agents must discover and honor previously durable-but-unapplied decisions, treating them as new decisions under their own term.
Figure: Generalized durability policy requiring replication across nodes.
Timeline Discovery: The Serendipity of Revocation
Leadership revocation has a crucial side effect: It forces coordinators to recruit nodes that likely hold completed requests. Consider a scenario where Node 1 (leader) replicates requests A-D across nodes:
- A is durable (reached N2 and N3)
- B is durable (reached N2 and N3)
- C and D are incomplete
Figure 1: A coordinator recruiting nodes discovers durable requests (A, B) and incomplete ones (C, D).
If the coordinator recruits all nodes, it discards C/D. But during a partition, if only N3 is reachable, it must honor A/B as potentially applied (Rule 2b(i)). If N2 is reachable, it honors A/B/C. The outcome is non-deterministic—yet B always survives because it met durability.
Propagation: Making Leadership Changes Durable
Once a timeline is chosen, the coordinator must propagate it as a new decision. This demands atomic overwrites and versioning. Three approaches emerge:
1. The Paxos Way
- Treats the entire timeline as a single composite value.
- Tracks two variables:
node_term(current term) andvalue_term(term when value was accepted). - Propagation atomically overwrites previous timelines under the new term.
2. The Raft Way
- Appends log entries non-atomically but marks them with the original term.
- Critical nuance: A new request under the current term must be appended to “re-validate” the entire timeline for durability.
- Ensures prior-term entries only become durable when ratified by the new leader.
Figure 2: Raft propagates logs incrementally but requires a new term entry (Step 3) for durability.
3. The Timestamp Way
- For systems like PostgreSQL WAL replication where adding metadata is impractical.
- Uses timestamps instead of term numbers to order decisions.
- Requires bounded clock skew but eliminates explicit term tracking.
Navigating Failure Scenarios
When coordinators fail mid-propagation, successors inherit partial states. The rules for timeline selection are strict:
- Recruit enough nodes to revoke all prior leaderships.
- Choose the timeline with the latest term.
- For equal terms, pick the most progressed timeline.
Consider five failure scenarios:
- Coordinator can’t recruit enough nodes: Stalemate. No progress.
- Propagation fails before durability: Partial state discarded.
- Conflicting timeline attempt fails: New coordinator honors latest durable state.
- Successor discovers multiple timelines: Chooses highest term (e.g., term 8 > term 7).
Figure 6: Timeline priority—newer terms always supersede older ones, regardless of progress.
Why This Matters
These mechanics ensure that even with overlapping coordinator failures and network partitions, no durable decision is lost, and consistency is preserved. Systems like etcd, Consul, and distributed databases rely on these principles. The “timestamp” approach also shows how consensus can adapt to constrained environments.
In the next installment, we’ll explore how these rules enable dynamic changes to a system’s durability policies—critical for scaling and fault tolerance.
Source: Generalized Consensus: Part 7 from Multigres