Fast Eventual Consistency: Inside Corrosion, the Distributed System Powering Fly.io

Fly.io transitioned from Consul to a custom gossip-based system called Corrosion to replicate SQLite data globally with low latency. Built on CR-SQLite and the SWIM protocol, Corrosion prioritizes speed and availability over strong consistency, enabling the Fly Proxy to make optimal routing decisions based on near real-time machine state across 800+ physical servers.

Fly.io operates a global application platform that routes users to the nearest instance of their application. This requires the networking layer to maintain a highly available, low-latency view of machine state—where apps are running, their health, and which ports are open. Initially, Fly.io relied on Consul for this state management, but the architecture introduced significant latency and scalability challenges. This led to the development of Corrosion, a custom distributed system designed for fast eventual consistency using SQLite, CRDTs, and gossip protocols.

The Problem with Consul

Fly.io's architecture involves two server types: edges that announce application IP addresses via Anycast BGP, and workers that host Firecracker VMs. The Fly Proxy software on edges routes requests to the nearest healthy worker. For this routing to be effective, the proxy needs immediate access to global state changes.

Consul, a distributed key-value store using Raft consensus, was the initial solution. However, it operated a single cluster in the US-East (IAD) region. Every other global region—Tokyo, London, etc.—had to pull data from this central cluster, introducing latency. The team used consul-templaterb to update a JSON file that the proxy read, but as traffic grew, deserializing and loading this file became unwieldy.

The next iteration, attache, ran on every node. It pulled from the central Consul API and wrote to a local SQLite database. While this eliminated JSON parsing, it created a bandwidth storm: 800+ nodes simultaneously pulling the same data from a single source. The problem wasn't solved; it was just moved.

Corrosion: A Gossip-Based Architecture

Corrosion was built to decouple state replication from any central authority. It is a distributed system that replicates SQLite data across nodes using a gossip protocol for dissemination and CRDTs for conflict resolution.

Core Components:

  1. CR-SQLite: Corrosion is built on CR-SQLite, an extension that brings conflict-free replicated data types to SQLite. It tracks changes at the column level using version numbers rather than timestamps (a short sketch of enabling this on a table follows the list).
  2. Foca (SWIM Protocol): Cluster membership is managed via Foca, which implements the SWIM protocol. This handles node discovery, failure detection, and membership updates.
  3. QUIC Transport: Replication happens over QUIC, not TCP or raw UDP, providing reliable, low-latency transport.
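
As a concrete starting point, the sketch below shows, in Rust (the language Corrosion is written in), how an application might open a SQLite database, load the cr-sqlite extension, and opt a table into CRDT replication. The extension path, the schema, and the use of rusqlite are illustrative assumptions, not Corrosion's internal code.

```rust
// Minimal sketch: load the cr-sqlite extension and mark a table as a
// conflict-free replicated relation (CRR). Assumes rusqlite is built with
// its "load_extension" feature and that the crsqlite loadable extension
// is available at the given path.
use rusqlite::{Connection, LoadExtensionGuard, Result};

fn open_crdt_db(path: &str) -> Result<Connection> {
    let conn = Connection::open(path)?;

    // Loading an extension is gated behind an unsafe API in recent rusqlite.
    unsafe {
        let _guard = LoadExtensionGuard::new(&conn)?;
        conn.load_extension("./crsqlite", None)?;
    }

    // Corrosion-style schema: a primary key is required, and unique
    // constraints on non-primary-key columns are not supported.
    conn.execute_batch(
        "CREATE TABLE IF NOT EXISTS machines (
            id    TEXT PRIMARY KEY,
            state TEXT,
            node  TEXT
        );",
    )?;

    // Opt the table into column-level CRDT change tracking.
    conn.query_row("SELECT crsql_as_crr('machines')", [], |_| Ok(()))?;

    // cr-sqlite asks callers to run `SELECT crsql_finalize();` before the
    // connection is closed.
    Ok(conn)
}
```

From here, local writes to machines are tracked by cr-sqlite and can be shipped to peers as change sets.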

How CR-SQLite Resolves Conflicts

CR-SQLite avoids the complexity of distributed timestamps by tracking state changes per column. Each Corrosion-managed table carries per-column version metadata, which CR-SQLite exposes through its internal crsql_changes virtual table. When a column in a row is modified, that column's version number increments.

When nodes sync, they compare these versions. The algorithm resolves conflicts deterministically:

  1. Causal Length: First, it compares causal lengths. A row's causal length starts at 1 when the row is created and increments on every delete or re-insert, so live rows have odd causal lengths and deleted rows even ones. The higher causal length wins, meaning the most recent delete or resurrection takes precedence.
  2. Column Version: Higher versions win.
  3. Value Comparison: If versions are identical, the actual value is compared (e.g., higher integers win).
  4. Site ID: Finally, the node's unique Site ID acts as a tiebreaker.

This ensures that regardless of the order in which changes arrive, all replicas eventually converge to the same state.
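
The sketch below captures that precedence as a pure function, assuming a simplified column-change record; the struct and field names are illustrative and do not mirror CR-SQLite's actual internals.

```rust
use std::cmp::Ordering;

/// One candidate write to a single column of a single row, as a peer might
/// advertise it during replication. Representation is illustrative only.
struct ColumnChange {
    causal_length: u64, // increments on delete/re-insert; odd = live, even = deleted
    col_version: u64,   // per-column version counter
    value: Vec<u8>,     // serialized column value
    site_id: [u8; 16],  // unique id of the writing node, the final tiebreaker
}

/// Deterministic "which write wins" ordering: causal length, then column
/// version, then the value itself, then the site id.
fn winner<'a>(a: &'a ColumnChange, b: &'a ColumnChange) -> &'a ColumnChange {
    let ord = a
        .causal_length
        .cmp(&b.causal_length)
        .then(a.col_version.cmp(&b.col_version))
        .then(a.value.cmp(&b.value))
        .then(a.site_id.cmp(&b.site_id));
    if ord == Ordering::Less { b } else { a }
}

fn main() {
    let local = ColumnChange { causal_length: 1, col_version: 3, value: b"started".to_vec(), site_id: [1; 16] };
    let remote = ColumnChange { causal_length: 1, col_version: 4, value: b"stopped".to_vec(), site_id: [2; 16] };
    // Every replica computes the same winner regardless of arrival order.
    println!("winner: {}", String::from_utf8_lossy(&winner(&local, &remote).value));
}
```

Because the comparison defines a total order over the change metadata, applying the same set of changes in any order yields the same final state, which is the convergence property described above.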

Propagation: Broadcast and Sync

Corrosion uses a dual-strategy for data propagation to ensure speed and reliability:

  • Gossip Broadcast: When a local change occurs, it is broadcast to a random subset of nodes (e.g., 3 neighbors). Those nodes forward it to their neighbors. This "infection-style" spread ensures rapid dissemination. The broadcast includes a counter to prevent infinite loops.
  • Periodic Sync: To handle missed messages or nodes that were offline, Corrosion nodes periodically initiate syncs. They exchange version vectors ("I have versions 1-50") and request only the missing data. This acts as a repair mechanism, sketched below.
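
The following is a toy sketch of that repair step, assuming each node summarizes what it has applied from a given peer as a plain set of version numbers; Corrosion's real sync exchanges compressed version ranges and the corresponding change sets, so this only illustrates how the gap gets computed.

```rust
use std::collections::BTreeSet;

/// Versions this node has applied from one particular peer. A plain set keeps
/// the sketch simple; a production implementation would track ranges.
#[derive(Default)]
struct AppliedVersions(BTreeSet<u64>);

impl AppliedVersions {
    fn mark_applied(&mut self, range: std::ops::RangeInclusive<u64>) {
        self.0.extend(range);
    }

    /// Given a peer's summary ("I have everything up to `peer_max`"),
    /// return the versions we still need to request from it.
    fn missing_up_to(&self, peer_max: u64) -> Vec<u64> {
        (1..=peer_max).filter(|v| !self.0.contains(v)).collect()
    }
}

fn main() {
    let mut local = AppliedVersions::default();
    local.mark_applied(1..=30);
    local.mark_applied(41..=50); // versions 31..=40 were missed while offline
    // Peer advertises "I have versions 1-50"; we request only the gap.
    println!("request from peer: {:?}", local.missing_up_to(50));
}
```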

Subscriptions: The Fly Proxy Integration

To prevent the Fly Proxy from constantly polling the database, Corrosion offers an HTTP streaming subscription API. A client can subscribe to a specific SQL query. When the underlying data changes, Corrosion pushes updates.

The API is efficient because Corrosion knows exactly which primary keys changed. It sends lightweight notifications ("PK X changed") or full row updates. This allows the proxy to maintain a hot in-memory cache that updates reactively rather than polling.
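
Below is a sketch of what a subscriber could look like in Rust, using reqwest to open a long-lived streaming request. The agent address, the /v1/subscriptions path, the payload shape, and the machines schema are assumptions made for illustration; check the Corrosion documentation for the actual API.

```rust
// Assumed Cargo dependencies: reqwest (features "json", "stream"),
// tokio (features "macros", "rt-multi-thread"), futures-util.
use futures_util::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical local Corrosion agent and subscription endpoint.
    let resp = reqwest::Client::new()
        .post("http://127.0.0.1:8080/v1/subscriptions")
        .json(&"SELECT id, state, node FROM machines WHERE app_id = 'my-app'")
        .send()
        .await?;

    // The response stays open: first the current result set, then incremental
    // change events as matching rows are inserted, updated, or deleted.
    let mut body = resp.bytes_stream();
    while let Some(chunk) = body.next().await {
        let chunk = chunk?;
        // A real client would parse these events and update its in-memory cache.
        print!("{}", String::from_utf8_lossy(&chunk));
    }
    Ok(())
}
```

This push model is what lets the proxy keep its in-memory view hot without polling.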

Production Reality at Scale

Corrosion currently runs on approximately 800 physical servers at Fly.io. It is not the source of truth but a replication layer. The flyd daemon, which manages local machines, writes to its own database and then to Corrosion.

Performance:

  • P99 replication latency is under one second globally.
  • The system handles high throughput with constant chatter between nodes.

Constraints and Caveats:

  • Schema Requirements: Tables must have primary keys. Unique constraints on non-primary keys are not supported.
  • Destructive Schema Changes: Renaming columns or dropping tables is difficult because schema changes must propagate before data changes, which is hard to guarantee in an async system.
  • No Built-in Auth: The HTTP API currently lacks authentication.

Lessons Learned

  1. Start with the Problem, Not the Tool: The team initially tried to adapt Consul (via attache). Stepping back to define the core requirement—fast, global state replication—led to building Corrosion.
  2. The Broadcast Storm: Early versions faced a "broadcast storm" where failed transactions caused nodes to endlessly re-broadcast the same change. The fix was to track applied changes persistently and stop broadcasting once a change is applied or fails permanently (see the sketch after this list).
  3. Reasoning About Eventual Consistency: Developers must build systems resilient to stale data. If a proxy routes a request to a node that just deleted a machine, the node rejects it, and the proxy must handle the failure gracefully.
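
A toy illustration of that fix (not Corrosion's actual bookkeeping): a guard that remembers which changes were already applied or permanently failed, so duplicates arriving via other peers are dropped rather than re-broadcast. In production the set would need to be persisted, for example in SQLite, so a restart does not reintroduce the storm.

```rust
use std::collections::HashSet;

/// Identifies one change: (originating node id, version). Illustrative only.
type ChangeId = (u128, u64);

/// Decides whether an incoming change should be applied and forwarded.
#[derive(Default)]
struct BroadcastGuard {
    processed: HashSet<ChangeId>, // applied OR permanently failed
}

impl BroadcastGuard {
    /// Returns true only the first time a given change is seen.
    fn should_process(&mut self, id: ChangeId) -> bool {
        self.processed.insert(id) // `insert` returns false if already present
    }
}

fn main() {
    let mut guard = BroadcastGuard::default();
    let change: ChangeId = (42, 7);
    assert!(guard.should_process(change));  // first arrival: apply and re-broadcast
    assert!(!guard.should_process(change)); // duplicate from another peer: drop it
    println!("duplicate suppressed; no re-broadcast");
}
```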

Conclusion

Corrosion represents a shift from consensus-based coordination (Raft) to a high-speed, availability-focused replication model. By leveraging CRDTs via CR-SQLite and gossip protocols, Fly.io achieved a system that replicates state globally in under a second, enabling the low-latency routing required for their Anycast architecture.

The project is open source and written in Rust. You can explore the code and documentation on GitHub.

Q&A Highlights

Q: How does Corrosion handle nodes that go offline for extended periods? A: The cluster marks the node as down. When it returns, it automatically detects peers and initiates a sync to catch up on missed versions. However, until it syncs, it serves stale data, which the team monitors via alerts.

Q: How are multi-statement transactions handled? A: Corrosion does not have a concept of distributed transactions. However, all changes within a local SQLite transaction are tagged with the same internal version. When syncing, nodes batch changes by version, ensuring that partial transaction states are never visible.
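
A small sketch of that batching step, with an illustrative change record that does not match Corrosion's wire format: incoming changes are grouped by the version stamped on their originating transaction, and each group would be applied inside a single local SQLite transaction.

```rust
use std::collections::BTreeMap;

/// One column-level change received during sync. Fields are illustrative.
struct Change {
    db_version: u64, // every change from the same local transaction shares this
    table: String,
    pk: String,
    column: String,
    value: String,
}

/// Group changes by originating version so partial transactions never become visible.
fn batch_by_version(changes: Vec<Change>) -> BTreeMap<u64, Vec<Change>> {
    let mut batches: BTreeMap<u64, Vec<Change>> = BTreeMap::new();
    for change in changes {
        batches.entry(change.db_version).or_default().push(change);
    }
    batches
}

fn main() {
    let changes = vec![
        Change { db_version: 7, table: "machines".into(), pk: "m1".into(), column: "state".into(), value: "started".into() },
        Change { db_version: 7, table: "machines".into(), pk: "m1".into(), column: "node".into(), value: "worker-3".into() },
        Change { db_version: 8, table: "machines".into(), pk: "m2".into(), column: "state".into(), value: "stopped".into() },
    ];
    for (version, batch) in batch_by_version(changes) {
        // Each batch would be applied atomically; here we just print the grouping.
        for c in &batch {
            println!("v{version}: {}[{}].{} = {}", c.table, c.pk, c.column, c.value);
        }
    }
}
```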

Q: Can Corrosion replace Consul entirely? A: Yes. Fly.io has retired the central Consul cluster for health checks. A local "Corrosion Consul" component now pulls data from each node's local Consul agent (not the central cluster) and injects it into Corrosion for global replication.

Presented by Somtochi Onyekwere, InfraOps Engineer at Fly.io, at QCon London 2026.
