Martin Kleppmann connects AT Protocol and local-first software to user data control
Infrastructure Reporter

Kleppmann said open sync protocols, client-held data, and portable service providers give users more control than cloud apps that keep the only copy on a vendor’s servers.

Martin Kleppmann, a University of Cambridge associate professor and author of Designing Data-Intensive Applications, said developers can give users more control over their data by combining portable protocols, client-side storage, and open sync systems.

In an InfoQ podcast, Kleppmann connected three threads: cloud-native data systems, Bluesky’s AT Protocol, and the local-first software movement. His argument starts with infrastructure and ends with product design. Teams have spent the past decade moving data systems away from single database processes and toward modular stacks built from object stores, columnar formats, query engines, and sync protocols. Product teams can apply the same idea to user data.

Kleppmann said cloud-native databases changed a core assumption in distributed systems. Older databases stored data on local disks and handled replication inside the database. Newer systems often place data in object stores such as S3-compatible storage, where the storage layer already handles replication. That change lets database builders split storage, compute, file format, and query execution into separate components.

Teams now compose systems from pieces such as Apache Arrow, Apache Parquet, Apache DataFusion, and Apache Arrow Flight. A team can pick an object store, a columnar file format, a table format, and a query engine without buying one monolithic database for the whole workload.

That shift matters for application architects because the same design pressure shows up in user-facing software. If one vendor controls identity, storage, sync, search, and collaboration, users depend on that vendor for access to their work. If engineers split those concerns behind open protocols, users can keep their data and change providers.

Kleppmann used Bluesky as one example. The Bluesky team built AT Protocol for social networking with portable accounts, personal data servers, relays, and indexers. Users store posts and social graph data in Personal Data Servers, or PDSs. Relays collect repository updates from PDS hosts into a firehose. Indexing services consume that stream and build views such as timelines, reply threads, likes, and search.

That design favors a consistent view of common social interactions. In a pure federation model, two servers may show different reply threads because each server may have seen a different subset of replies. AT Protocol adds a relay and indexing layer so client apps can show the same thread and count the same likes across providers.
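The data flow described above can be sketched as a toy pipeline. This is a minimal illustration, not the AT Protocol wire format: the `Event`, `PDS`, `Relay`, and `Indexer` classes, their fields, and the host names are all hypothetical, and the "firehose" here is an in-memory generator rather than a network stream.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One repository update (hypothetical shape, not the real record schema)."""
    author: str    # account that wrote the record
    kind: str      # "reply" or "like"
    subject: str   # id of the post being replied to or liked
    text: str = ""


class PDS:
    """Toy personal data server: holds the records of its own users."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def write(self, event):
        self.log.append(event)


class Relay:
    """Toy relay: gathers updates from every known PDS into one stream."""

    def __init__(self, hosts):
        self.hosts = hosts

    def firehose(self):
        for host in self.hosts:
            yield from host.log


class Indexer:
    """Toy indexer: consumes the stream and builds per-post views."""

    def __init__(self):
        self.replies = defaultdict(list)
        self.likes = defaultdict(set)

    def consume(self, stream):
        for ev in stream:
            if ev.kind == "reply":
                self.replies[ev.subject].append((ev.author, ev.text))
            elif ev.kind == "like":
                # A set keeps one like per account, even if an event repeats.
                self.likes[ev.subject].add(ev.author)


# Two users on two different providers interact with the same post.
pds_a, pds_b = PDS("a.example"), PDS("b.example")
pds_a.write(Event("alice", "reply", "post-1", "Nice write-up"))
pds_b.write(Event("bob", "reply", "post-1", "Agreed"))
pds_b.write(Event("bob", "like", "post-1"))
pds_a.write(Event("alice", "like", "post-1"))

index = Indexer()
index.consume(Relay([pds_a, pds_b]).firehose())
thread = index.replies["post-1"]
like_count = len(index.likes["post-1"])
print(thread, like_count)
```

Because every indexer reads the same merged stream, any client that queries it sees both replies and both likes, regardless of which provider hosts each author.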

The trade-off sits in the relay and indexer tier. Bluesky the company still runs services many users depend on, so the deployment has more central control than a federated network built on ActivityPub. Kleppmann said the protocol reduces lock-in because another organization can run compatible services and give users access to the same public data stream.

That portability goal shapes the AT Protocol design. A user should keep a handle, posts, followers, and replies after changing service providers. The protocol cannot force other companies to run competing infrastructure, but it gives them the data model and software path to do it. Kleppmann mentioned Blacksky as one alternative provider in the AT Protocol ecosystem.
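One way to picture this kind of portability is an indirection layer: followers point at a stable account identifier, and only the identifier's host pointer changes when the user moves. AT Protocol's identity layer is built on decentralized identifiers (DIDs), which play roughly this role. The sketch below is not that mechanism; the `Directory` class, the `id:alice-123` identifier format, and the host names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Directory:
    """Toy identity directory: handle -> stable id -> current host."""
    handle_to_id: dict = field(default_factory=dict)
    id_to_host: dict = field(default_factory=dict)

    def register(self, handle, account_id, host):
        self.handle_to_id[handle] = account_id
        self.id_to_host[account_id] = host

    def migrate(self, account_id, new_host):
        # Only the host pointer changes; the id and the handle stay the same.
        self.id_to_host[account_id] = new_host

    def resolve(self, handle):
        return self.id_to_host[self.handle_to_id[handle]]


directory = Directory()
directory.register("alice.example", "id:alice-123", "pds.provider-one.example")

# Followers store the stable id, not the host name.
follows = {"bob": ["id:alice-123"]}

# Alice changes providers; Bob's follow list needs no update.
directory.migrate("id:alice-123", "pds.provider-two.example")
print(directory.resolve("alice.example"))
```

The design choice to note is that nothing a follower stores mentions the provider, so a migration touches one directory entry instead of every follower's data.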

Local-first software applies the same principle to collaboration tools. Instead of treating the cloud as the primary holder of a document, local-first apps place the main working copy on the user’s device. The cloud can still sync data between devices and collaborators, but the app can keep working without a network connection.

That model fits editors, spreadsheets, design tools, bug trackers, and other apps where users create and edit their own data. It fits less well for bank balances, warehouse stock, and online shop catalogs because those systems track shared physical or financial state that a server must validate.

Kleppmann compared the local-first model with Git. Developers can inspect history, branch, merge, and push the same repository to more than one remote. GitHub and GitLab add services around Git, but the repository format and sync protocol give users an exit path. Google Docs does not offer the same provider portability for documents, comments, history, and sync.

The hard part comes from data types beyond text. Git can store a spreadsheet or CAD file, but it treats many such files as opaque binary data. Engineers who build local-first apps need data structures that capture edits as structured operations, merge concurrent changes, and preserve enough history for review.

Kleppmann’s group works on Automerge, an open-source library for local-first collaboration. Automerge uses conflict-free replicated data types, or CRDTs, to let devices accept edits without a central coordinator and merge those edits after devices reconnect. The project supports real-time collaboration, offline work, branching, diffing, and merging.
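To make the CRDT idea concrete, here is a minimal sketch of one textbook CRDT, an add-wins observed-remove set. It is not Automerge's data model or API. It only illustrates the property the paragraph describes: replicas accept edits independently and converge after merging in any order. The `ORSet` class and the shopping-list data are illustrative assumptions.

```python
import uuid


class ORSet:
    """Observed-remove set: each add gets a unique tag, removes target tags."""

    def __init__(self):
        self.adds = {}        # element -> set of unique add tags
        self.removed = set()  # tags whose adds have been removed

    def add(self, element):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Remove only the adds this replica has seen so far.
        self.removed |= self.adds.get(element, set())

    def merge(self, other):
        # Set union is commutative, associative, and idempotent,
        # so replicas converge whatever the merge order.
        for element, tags in other.adds.items():
            self.adds.setdefault(element, set()).update(tags)
        self.removed |= other.removed

    def values(self):
        return {e for e, tags in self.adds.items() if tags - self.removed}


# Two devices start from the same state.
a = ORSet()
a.add("milk")
b = ORSet()
b.merge(a)

# Offline, device a removes "milk" while device b re-adds it and adds "eggs".
a.remove("milk")
b.add("milk")
b.add("eggs")

# After reconnecting, each side merges the other's state.
a.merge(b)
b.merge(a)
print(a.values() == b.values())
```

The concurrent re-add survives because the remove on device a only covered the tag it had observed. That "add wins" rule is one possible conflict policy among several.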

The Automerge team writes the core library in Rust and ships JavaScript bindings through WebAssembly. Bindings also exist for Swift, Java, Python, Go, and C. That portability lets teams use one sync engine across browser, desktop, iOS, Android, and server-side tooling.

For a first experiment, Kleppmann suggested a to-do app that syncs across devices. That example looks small, but it tests the core architecture: client-side state, durable local storage, structured changes, sync, conflict handling, and UI updates after merge. A team that cannot make those pieces clear in a small app will struggle with a spreadsheet or graphics editor.
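The pieces listed above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not Automerge and not a design from the podcast: the `TodoReplica` class, its JSON file layout, and the whole-item last-writer-wins rule (ordered by a Lamport-style counter plus device id) are all hypothetical choices, and sync is simulated by passing a JSON string between two objects.

```python
import json
import tempfile
from pathlib import Path


class TodoReplica:
    """One device's copy of the to-do list (hypothetical design)."""

    def __init__(self, device_id, path, on_change=None):
        self.device_id = device_id
        self.path = Path(path)
        self.on_change = on_change or (lambda items: None)
        self.clock = 0
        self.items = {}  # id -> {"title", "done", "deleted", "stamp"}
        if self.path.exists():  # durable local storage: reload on start
            saved = json.loads(self.path.read_text())
            self.clock, self.items = saved["clock"], saved["items"]

    def _save(self):
        state = {"clock": self.clock, "items": self.items}
        self.path.write_text(json.dumps(state))

    def set_item(self, item_id, title, done=False, deleted=False):
        """Record a local edit as a structured, timestamped change."""
        self.clock += 1
        self.items[item_id] = {"title": title, "done": done,
                               "deleted": deleted,
                               "stamp": [self.clock, self.device_id]}
        self._save()
        self.on_change(self.visible())

    def export_changes(self):
        return json.dumps(self.items)

    def merge(self, payload):
        """Conflict handling: per item, the higher stamp wins."""
        changed = False
        for item_id, remote in json.loads(payload).items():
            local = self.items.get(item_id)
            if local is None or remote["stamp"] > local["stamp"]:
                self.items[item_id] = remote
                changed = True
            self.clock = max(self.clock, remote["stamp"][0])
        if changed:
            self._save()
            self.on_change(self.visible())  # UI update after merge

    def visible(self):
        return {i: (v["title"], v["done"])
                for i, v in self.items.items() if not v["deleted"]}


tmp = Path(tempfile.mkdtemp())
screen = []  # stands in for a UI re-render
phone = TodoReplica("phone", tmp / "phone.json", on_change=screen.append)
laptop = TodoReplica("laptop", tmp / "laptop.json")

phone.set_item("t1", "Buy milk")
laptop.merge(phone.export_changes())

# Both devices edit while offline.
laptop.set_item("t1", "Buy oat milk")
phone.set_item("t2", "Call plumber")

# Sync in both directions once the network returns.
phone.merge(laptop.export_changes())
laptop.merge(phone.export_changes())

# Simulate an app restart on the phone: state comes back from disk.
restarted = TodoReplica("phone", tmp / "phone.json")
print(restarted.visible())
```

Whole-item last-writer-wins is the weak point of this sketch: if one device renames an item while another marks it done, one of the two edits is discarded. CRDT libraries aim to merge at a finer granularity so that both edits survive.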

Retrofitting an existing app depends on where the team placed its business logic. A single-user React app with a clear client-side data model may accept Automerge as a replacement state layer. A server-heavy app that computes core behavior in backend services needs a larger redesign because the team must move more logic into the client.

Kleppmann said Automerge contributors now work on performance, sync protocol design, end-to-end encryption, and decentralized access control. Those areas matter because local-first software pushes more responsibility onto client devices while still asking cloud services to coordinate collaboration. Teams need compact storage, fast merge operations, clear authorization, and usable recovery paths.

The broader pattern links modern data infrastructure with user agency. Object storage, table formats, and query engines give backend teams more freedom to compose systems. AT Protocol gives social app providers a path to build compatible services around portable user repositories. Local-first libraries give application teams a way to keep user-created data on user devices while retaining sync and collaboration.

Engineers who adopt these ideas still face deployment choices. They must decide which data belongs on the client, which data needs a server authority, how much history to retain, and how users recover access after device loss. They also need sync services that can survive flaky networks, mobile sleep cycles, and version skew between clients.

Kleppmann’s point lands in those details. User control does not come from an export button at the edge of a closed service. It comes from data models, protocols, and product flows that let users keep working, change providers, and inspect their own history without asking one vendor for permission.
