CRDT vs Operational Transformation: How Google Docs and Notion Avoid Edit Chaos
#Infrastructure

Backend Reporter

A deep dive into the two dominant concurrency models for collaborative editing—Operational Transformation and Conflict‑free Replicated Data Types—explaining their mechanics, trade‑offs, and why modern tools choose one over the other.

When you watch several cursors dance across a shared document, the experience feels effortless. Behind that smoothness are two very different engineering philosophies: Operational Transformation (OT) and Conflict‑free Replicated Data Types (CRDTs). Both aim to keep a single logical view of a document while dozens, hundreds, or even thousands of users type, delete, and rearrange text at the same time. The difference lies in when and how the system resolves conflicts.


1. The problem both solve

Consider a tiny document "Hello World". Two users act simultaneously:

  • User A inserts the word "Beautiful" after "Hello".
  • User B deletes the word "World".

The two operations touch the same spot in the text, and each was computed against the original "Hello World", so whichever one is applied second carries stale positions. Without a coordination mechanism, the final state could be ambiguous, duplicated, or outright corrupted. OT and CRDTs each provide a deterministic rule set that guarantees convergence: every replica ends up with the same text, regardless of the order in which operations arrive or how much network latency there is.
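The divergence is easy to reproduce with plain strings. In this Python sketch (the helpers `insert` and `delete` are hypothetical, and offsets are 0-based, so "World" starts at index 6), each replica applies both edits using the positions the users originally saw, just in a different order:

```python
def insert(doc: str, pos: int, text: str) -> str:
    """Insert text at a 0-based character offset."""
    return doc[:pos] + text + doc[pos:]

def delete(doc: str, pos: int, length: int) -> str:
    """Delete `length` characters starting at a 0-based offset."""
    return doc[:pos] + doc[pos + length:]

doc = "Hello World"
# Both edits were computed against the original text and are applied unchanged.
replica_1 = delete(insert(doc, 6, "Beautiful "), 6, 5)  # A's edit arrived first
replica_2 = insert(delete(doc, 6, 5), 6, "Beautiful ")  # B's edit arrived first
print(repr(replica_1))  # 'Hello iful World'
print(repr(replica_2))  # 'Hello Beautiful '
```

In the first replica, B's index 6 now points at A's new word, so the delete removes "Beaut" instead of "World". Neither replica did anything wrong locally, yet they disagree.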


2. Two philosophies at a glance

| Aspect | Operational Transformation (OT) | Conflict‑free Replicated Data Types (CRDT) |
|---|---|---|
| Core idea | Transform incoming operations so they remain valid in the context of previously applied operations. | Encode the data structure so that any merge of concurrent edits is mathematically safe. |
| Typical coordinator | Central server (or a designated leader) that orders operations. | No central coordinator; each replica can apply operations independently. |
| Offline support | Hard – requires replaying a log against a globally ordered sequence. | Natural – operations carry enough metadata to be merged later. |
| Scaling profile | Server becomes a bottleneck as user count grows. | Load spreads across all replicas; network traffic grows linearly with edits. |
| Implementation complexity | Complex transformation functions; edge‑case heavy. | Complex data structures (e.g., RGA, LSEQ, WOOT) but simpler runtime logic. |

3. A concrete analogy

OT is like a traffic cop at a busy intersection. Cars (edits) arrive, the cop decides who goes first, and may tell a driver to wait a fraction of a second to avoid a crash. The cop must be present for every decision.

CRDT is like a fleet of self‑driving cars that all follow a shared rulebook. Each vehicle knows how to merge its path with any other vehicle’s path without ever stopping for a central authority.


4. How OT works (simplified flow)

  1. User generates an operation – e.g., Insert(“Beautiful ”, pos=6).
  2. Operation is sent to the server.
  3. Server places every operation in a single global order, typically by assigning each accepted operation the next revision (sequence) number.
  4. Before applying, the server transforms the incoming operation against any concurrent operations that have already been accepted. The transformation adjusts the position index so the intent remains correct.
  5. Transformed operation is broadcast to all clients, which apply it in the same order (after transforming it against any of their own local edits that the server has not yet acknowledged).

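If the server receives User A's Insert and User B's Delete concurrently (both computed against the same revision) and applies the insert first, the transformation step shifts the delete's starting index from 6 to 16, past the 10 inserted characters. Both edits survive, and every replica ends up with "Hello Beautiful ".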
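The transformation step can be sketched in a few lines of Python. This is a minimal illustration, not Google Docs' implementation: `Insert`, `Delete`, `apply`, and `transform` are hypothetical names, positions are 0-based character offsets, and only insert-versus-delete pairs are handled (insert/insert needs a tie-break rule, and overlapping deletes need their own cases). When an insert lands strictly inside a concurrently deleted range, this sketch lets the delete win; that policy is an assumption.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str

@dataclass(frozen=True)
class Delete:
    pos: int
    length: int

def apply(doc: str, op) -> str:
    """Apply a positional operation to a plain string."""
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + op.length:]

def transform(op, against):
    """Rewrite `op` so it can be applied after the concurrent op `against`.

    Only insert-versus-delete pairs are handled in this sketch.
    """
    if isinstance(op, Delete) and isinstance(against, Insert):
        if against.pos <= op.pos:                    # inserted before the range: shift right
            return replace(op, pos=op.pos + len(against.text))
        if against.pos >= op.pos + op.length:        # inserted after the range: unaffected
            return op
        # Inserted inside the range: widen the delete so it removes the new text too.
        return replace(op, length=op.length + len(against.text))
    if isinstance(op, Insert) and isinstance(against, Delete):
        if op.pos <= against.pos:                    # before the deleted range: unaffected
            return op
        if op.pos >= against.pos + against.length:   # after the range: shift left
            return replace(op, pos=op.pos - against.length)
        # Inside the deleted range: drop the insert, matching the widened delete above.
        return Insert(against.pos, "")
    raise NotImplementedError("insert/insert and delete/delete need extra rules")

doc = "Hello World"
a = Insert(6, "Beautiful ")   # User A
b = Delete(6, 5)              # User B deletes "World"

r1 = apply(apply(doc, a), transform(b, a))   # replica that applied A first
r2 = apply(apply(doc, b), transform(a, b))   # replica that applied B first
print(repr(r1), repr(r2))     # 'Hello Beautiful ' 'Hello Beautiful '
```

Whatever policy a real system picks for an insert inside a deleted range, both directions of `transform` must agree on it; if one side keeps the text and the other deletes it, the replicas diverge.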


5. How CRDT works (simplified flow)

  1. Every character (or structural element) receives a globally unique identifier – often a tuple of (clientId, counter).
  2. Operations are expressed as “add this identifier with this payload” or “remove identifier X”. No need to refer to a mutable position.
  3. Each replica applies operations locally using a deterministic merge rule, such as “order identifiers lexicographically”.
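  4. When replicas exchange state, the merge algorithm takes the union of the inserted elements and of the removal markers (tombstones), then orders the result by the same rule. Concurrent edits do not conflict, because identifiers are immutable and every replica applies the same ordering rule.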

Because identifiers never change, a user can edit offline, generate a batch of operations, and later sync them; the merge will still converge.
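The flow above can be illustrated with a toy state-based sequence CRDT in Python. It is not RGA, LSEQ, or WOOT: as a simplifying assumption, each identifier is a hypothetical `(position, site, counter)` tuple whose position is a `fractions.Fraction` chosen halfway between its neighbours, deletions are tombstones, and merging is plain set union. `Replica` and its methods are illustrative names. If two sites insert at the same spot concurrently, the site id breaks the tie, so replicas still converge, but this toy scheme does not guarantee sensible interleaving after that; real algorithms add rules for it.

```python
from dataclasses import dataclass, field
from fractions import Fraction

# Hypothetical identifier: (position, site_id, counter). Tuples compare element
# by element, so every replica sorts identifiers into the same total order.
Ident = tuple[Fraction, str, int]

@dataclass
class Replica:
    site: str
    counter: int = 0
    chars: dict[Ident, str] = field(default_factory=dict)  # grow-only: id -> character
    tombstones: set[Ident] = field(default_factory=set)    # grow-only: deleted ids

    def visible(self) -> list[Ident]:
        return sorted(i for i in self.chars if i not in self.tombstones)

    def insert(self, index: int, text: str) -> None:
        """Insert text before the visible character at `index`."""
        ids = self.visible()
        left = ids[index - 1][0] if index > 0 else Fraction(0)
        right = ids[index][0] if index < len(ids) else left + 1
        for ch in text:
            pos = (left + right) / 2          # midpoint of the neighbours' positions
            self.counter += 1
            self.chars[(pos, self.site, self.counter)] = ch
            left = pos                        # the next character goes after this one

    def delete(self, index: int, length: int) -> None:
        """Tombstone `length` visible characters; nothing is physically removed."""
        self.tombstones.update(self.visible()[index:index + length])

    def merge(self, other: "Replica") -> None:
        """State-based merge: union both grow-only sets."""
        self.chars.update(other.chars)
        self.tombstones |= other.tombstones

    def text(self) -> str:
        return "".join(self.chars[i] for i in self.visible())

base = Replica("base")
base.insert(0, "Hello World")

a, b = Replica("A"), Replica("B")
a.merge(base)
b.merge(base)                  # both start from the same shared document

a.insert(6, "Beautiful ")      # User A, possibly offline
b.delete(6, 5)                 # User B deletes "World"

a.merge(b)
b.merge(a)                     # sync later, in either order
print(repr(a.text()), repr(b.text()))  # 'Hello Beautiful ' 'Hello Beautiful '
```

Neither edit refers to a mutable index once it has been recorded: A's new characters sit between the identifiers of " " and "W", and B's tombstones name the identifiers of "World" directly.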


6. Trade‑offs in practice

Coordination vs. autonomy

  • OT requires a central sequencer. That makes reasoning about causality easier, but the sequencer can become a latency hotspot. Large‑scale Google Docs deployments mitigate this by sharding documents and using a fast, geographically distributed coordination layer.
  • CRDT removes the sequencer, which simplifies horizontal scaling and improves resilience to network partitions. However, the data structures often carry extra metadata (identifiers, tombstones) that increase memory footprint.
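The sequencer's role can be sketched in Python. Everything here is an assumption for illustration: `Sequencer`, `submit`, and the insert-only `Insert` type are hypothetical names, a revision is simply the count of accepted operations, and ties between inserts at the same position go to the lower client id. Every edit to a document has to pass through `submit`, which is why this component is the hotspot that gets sharded (one sequencer per document). A real client would also transform broadcast operations against its own unacknowledged edits; that half is omitted.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    client: str   # used only to break ties between inserts at the same position

def transform(op: Insert, against: Insert) -> Insert:
    """Shift `op` right if `against` landed before it (ties: lower client id first)."""
    if against.pos < op.pos or (against.pos == op.pos and against.client < op.client):
        return replace(op, pos=op.pos + len(against.text))
    return op

class Sequencer:
    """Hypothetical per-document server that owns the revision counter."""

    def __init__(self, doc: str = ""):
        self.doc = doc
        self.history: list[Insert] = []   # history[r] produced revision r + 1

    def submit(self, op: Insert, base_rev: int) -> tuple[int, Insert]:
        """Accept an op written against `base_rev`; return (new revision, op to broadcast)."""
        for accepted in self.history[base_rev:]:   # ops the client had not seen yet
            op = transform(op, accepted)
        self.doc = self.doc[:op.pos] + op.text + self.doc[op.pos:]
        self.history.append(op)
        return len(self.history), op

server = Sequencer("Hello World")
# Both clients are at revision 0 and edit concurrently.
server.submit(Insert(6, "Beautiful ", "A"), base_rev=0)
rev, op = server.submit(Insert(11, "!", "B"), base_rev=0)   # B appends "!"
print(rev, op.pos, repr(server.doc))   # 2 21 'Hello Beautiful World!'
```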

Conflict handling

  • OT fixes conflicts after they are detected. The transformation functions must handle every possible combination of inserts, deletes, and moves, which is a source of subtle bugs: several published OT algorithms were later shown to violate the convergence properties they claimed.
  • CRDT prevents conflicts by design. The cost is a more complex initial data model and, in some algorithms, a higher per‑character overhead.

Offline friendliness

  • OT needs a replay log that matches the global order; when connectivity returns, every offline edit must be transformed against all the operations the server accepted in the meantime, which can be error‑prone.
  • CRDT’s identifier‑based approach works naturally offline. When a device reconnects, its operations are simply merged.
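"Simply merged" rests on a property of state-based CRDT merges: they are commutative, associative, and idempotent, so it does not matter in which order devices reconnect or whether a batch is delivered twice. A minimal Python sketch, assuming each replica's state is reduced to two grow-only sets of hypothetical string identifiers (inserted and deleted, as in a 2P-Set) and leaving character ordering out:

```python
from functools import reduce
from itertools import permutations

# Hypothetical simplified state: (inserted ids, deleted ids), both grow-only.
State = tuple[frozenset, frozenset]

def merge(x: State, y: State) -> State:
    """Join two replica states by set union."""
    return (x[0] | y[0], x[1] | y[1])

laptop = (frozenset({"A1", "A2"}), frozenset())
phone = (frozenset({"A1", "B1"}), frozenset({"A1"}))   # phone deleted A1 while offline
tablet = (frozenset({"A1", "C1"}), frozenset())

# Fold the three states together in every possible reconnection order.
results = {reduce(merge, order) for order in permutations([laptop, phone, tablet])}
print(len(results))                     # 1: every order yields the same state
print(merge(laptop, laptop) == laptop)  # True: re-syncing the same batch is harmless
```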

7. Where you encounter them today

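| Product | Concurrency model | Notes |
|---|---|---|
| Google Docs (classic architecture) | OT | Uses a central server to serialize ops; heavy engineering around transformation functions. |
| Etherpad | OT | Open‑source real‑time editor; its Easysync changeset protocol is an OT variant coordinated by a central server. |
| Notion | CRDT | Stores blocks as CRDTs, enabling offline editing and seamless sync across devices. |
| Figma (vector layer) | CRDT‑inspired | Figma has described its multiplayer system as inspired by CRDTs but server‑authoritative, resolving concurrent edits to a shape property with last‑writer‑wins. |
| CouchDB / PouchDB | Replication with deterministic conflict resolution | Not a text CRDT: replicas exchange revision trees and pick the same winning revision, keeping losing revisions for the application to resolve. Shows how database replication can adopt similar principles. |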

8. Why OT feels harder to reason about

Transformation functions must be order‑aware. A single off‑by‑one error can cause divergent states that are hard to reproduce.
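A boundary-condition slip of this kind is easy to reproduce. In this hypothetical Python sketch, two sites insert at position 0 of an empty document at the same time. A transform that only shifts when the other insert is strictly to the left has no rule for equal positions, so the replicas disagree; breaking ties by site id (an assumed, arbitrary rule that both sides share) restores convergence.

```python
def insert(doc: str, pos: int, text: str) -> str:
    return doc[:pos] + text + doc[pos:]

def shift_buggy(pos: int, other_pos: int, other_len: int) -> int:
    """Shift right only if the concurrent insert is strictly to the left (no tie rule)."""
    return pos + other_len if other_pos < pos else pos

def shift_fixed(pos: int, site: str, other_pos: int, other_site: str, other_len: int) -> int:
    """Same, but on equal positions the insert from the lower site id goes first."""
    if other_pos < pos or (other_pos == pos and other_site < site):
        return pos + other_len
    return pos

# Site "A" inserts "X" at 0 and site "B" inserts "Y" at 0, concurrently, in "".
buggy_1 = insert(insert("", 0, "X"), shift_buggy(0, 0, 1), "Y")  # X applied first
buggy_2 = insert(insert("", 0, "Y"), shift_buggy(0, 0, 1), "X")  # Y applied first
print(buggy_1, buggy_2)  # YX XY (replicas diverged)

fixed_1 = insert(insert("", 0, "X"), shift_fixed(0, "B", 0, "A", 1), "Y")
fixed_2 = insert(insert("", 0, "Y"), shift_fixed(0, "A", 0, "B", 1), "X")
print(fixed_1, fixed_2)  # XY XY
```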

The central sequencer is a single point of failure, and any latency spike delays every collaborator's view of remote edits.

When you add new operation types (e.g., rich‑text formatting, embedded tables), the transformation matrix grows quadratically, increasing the maintenance burden.
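The quadratic growth comes from needing a rule for every ordered pair of operation types: n types require n² transform cases (2 types need 4, 3 need 9, 4 need 16). A small hypothetical bookkeeping sketch in Python shows how adding a single type, rich‑text formatting, leaves five new pairs unwritten:

```python
from itertools import product

def missing_transforms(op_types: list[str], implemented: set) -> list[tuple[str, str]]:
    """List every (incoming, concurrent) pair of op types with no transform rule yet."""
    return [pair for pair in product(op_types, repeat=2) if pair not in implemented]

# Hypothetical codebase that has only handled plain-text inserts and deletes so far.
implemented = set(product(["insert", "delete"], repeat=2))   # 4 rules

missing = missing_transforms(["insert", "delete", "format"], implemented)
print(len(missing))  # 5
print(missing)
# [('insert', 'format'), ('delete', 'format'), ('format', 'insert'),
#  ('format', 'delete'), ('format', 'format')]
```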


9. Why CRDT feels magical

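Because identifiers are immutable, merging needs only the current replica state, not a replayable history of positional operations. Tombstones can be compacted, but only once every replica is known to have seen the corresponding deletions.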

Offline scenarios become first‑class citizens; the same code path works whether the client is online or not.

Horizontal scaling is straightforward: add more replicas, each runs the same merge algorithm, and all of them converge to the same state once they have exchanged updates.

The trade‑off is that designers must choose identifier schemes that balance entropy (to avoid collisions) with size (to keep storage reasonable). Allocation strategies such as the LSEQ algorithm aim to keep identifier size growing sub‑linearly with the number of edits, rather than linearly as in naive schemes.
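Why identifier size matters can be shown with fractional positions. This Python sketch is a hypothetical naive allocator, not LSEQ: if every new character is inserted at the very front of the document, each identifier must fall halfway between 0 and the previous first character, so its denominator doubles and the identifier grows by one bit per insert, linearly in the number of edits at that spot.

```python
from fractions import Fraction

def front_insert_id_bits(n: int) -> list[int]:
    """Bits in each new identifier's denominator when every insert goes at the
    front of the document (hypothetical naive midpoint allocation)."""
    left, right = Fraction(0), Fraction(1)
    bits = []
    for _ in range(n):
        pos = (left + right) / 2   # halfway between 0 and the current first character
        bits.append(pos.denominator.bit_length())
        right = pos                # the new character is now the first one
    return bits

print(front_insert_id_bits(5))          # [2, 3, 4, 5, 6]
print(front_insert_id_bits(1000)[-1])   # 1001
```

Adaptive allocators such as LSEQ exist to avoid exactly this kind of edit pattern blowing up identifier size.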


10. The key insight for engineers

Both models are solving the convergence problem, but they sit on opposite sides of the timing axis:

  • OT: conflict resolution before a remote edit becomes visible.
  • CRDT: conflict avoidance built into the edit representation.

When you design a new collaborative feature, ask yourself:

  1. Do you need a strong central authority for business logic? If yes, OT may integrate more cleanly with existing request‑response pipelines.
  2. Will your users work offline or in high‑latency environments? CRDTs give you graceful degradation without extra replay logic.
  3. What are your scaling targets? Large‑scale, geo‑distributed products usually benefit from the decentralised nature of CRDTs.

11. Final thought

The next time you watch multiple cursors type, remember you are seeing one of two invisible engines at work: either a referee constantly rewrites incoming edits so the shared story stays consistent (OT), or a set of self‑organising rules guarantees that every edit fits together like puzzle pieces (CRDT). Understanding the trade‑offs helps you pick the right tool for the job, avoid pitfalls that have taken years of production experience to surface, and build collaboration features that truly feel effortless.
