The Case for a New Revision Control System in the LLM Era

A deep dive into why traditional git workflows are breaking down as AI coding assistants become ubiquitous, and what a modern CRDT-based system might look like.

The software development landscape is undergoing a fundamental shift that's exposing the limitations of our decades-old version control systems. As AI coding assistants like GitHub Copilot, Claude, and ChatGPT become integral to daily development workflows, the friction points in git are becoming increasingly apparent—and increasingly costly.

The LLM Revolution Has Changed Everything

The way we write code is transforming. Developers are spending less time typing and more time reviewing, prompting, and iterating with AI assistants. This shift has several critical implications:

High-volume, low-value changes: LLMs can generate dozens of code variations in minutes, but sorting through these changes in git becomes a bottleneck. The diff review process that worked for human-authored commits now creates disproportionate overhead when dealing with AI-generated variations.

Collaborative development with AI: Even solo developers now work with AI as a constant collaborator. This creates merge-like scenarios constantly—every AI suggestion, every prompt refinement, every code variation needs to be tracked and potentially reconciled.

The browsing imperative: As JetBrains co-founder Sergey Dmitriev noted years ago, "Code is hypertext, IDE is a browser." This insight is even more relevant now. Understanding code, not just writing it, has become the primary bottleneck. Without effective browsing and history navigation, developers are "like a rider who fell off a horse with his foot caught in the stirrup."

Git's Fundamental Architectural Problems

Git was revolutionary when it emerged, but its core design decisions are now showing their age. The issues aren't just surface-level inconveniences—they're baked into git's fundamental architecture:

The Monorepo Problem

Git struggles with modular codebases. Submodules were a clumsy workaround that never fully solved the problem of splitting and joining code. The conceptual framework for managing code modules is fundamentally lacking.

This becomes critical when you want to maintain separate repositories for different concerns (like prompts, plans, and implementation) but need to work with them as a unified whole. Git has no concept of "overlay branches"—a way to logically combine separate codebases without physical merging.
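
As a rough illustration of the overlay idea, here is a minimal Python sketch. It is not an existing git feature or the proposed system's design: each repository is modeled as a plain mapping of path to content, and the repository names and the overlay helper are hypothetical. Python's standard ChainMap provides the "logical combination without physical merging" behavior.

```python
from collections import ChainMap

# Hypothetical repositories, each modeled as a mapping of path -> content.
prompts = {"prompts/refactor.md": "Extract the parser into its own module."}
plans = {"plans/roadmap.md": "1. Split parser\n2. Add tests"}
implementation = {
    "src/parser.py": "def parse(text): ...",
    "README.md": "impl readme",
}
local_overrides = {"README.md": "combined workspace readme"}


def overlay(*layers):
    """Return a read-through view of the layers; earlier layers shadow later ones.

    Nothing is copied or merged: lookups fall through the layers in order.
    """
    return ChainMap(*layers)


workspace = overlay(local_overrides, implementation, plans, prompts)

for path in sorted(workspace):
    print(path, "->", workspace[path])
```

The underlying repositories stay untouched, so each can keep its own history while the workspace presents them as one tree.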

The Merge/Rebase Dilemma

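Git's merge commits create friction, while rebases rewrite commits and discard the context in which the work was originally done. The fundamental issue is that git merges are acts of will, not deterministic operations: whenever changes overlap, a person decides the outcome by hand, so two developers merging the same branches can end up with different results. This makes collaboration unpredictable and history reconstruction difficult.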

The Data Accretion Problem

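Once committed, data in git is permanently tied into the Merkle graph: every commit hash covers its parents, so history cannot be dropped or altered without rewriting everything that descends from it. Shallow and partial clones let you fetch only the latest version or defer downloading file contents, but there's no general "pay-as-you-go" mode for accessing repository data. This becomes problematic as repositories grow and AI-generated content multiplies.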
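
To see why committed data is hard to detach from a Merkle graph, consider this simplified Python sketch. The commit_id and chain functions are illustrative assumptions, not git's real object format (actual commit objects also cover the tree, author, and message, and git defaults to SHA-1), but the chaining property is the same.

```python
import hashlib


def commit_id(parent_id: str, content: bytes) -> str:
    """Simplified commit hash that covers the parent id and the content."""
    return hashlib.sha256(parent_id.encode() + b"\0" + content).hexdigest()


def chain(contents):
    """Hash a linear history; each id depends on every earlier commit."""
    ids, parent = [], ""
    for content in contents:
        parent = commit_id(parent, content)
        ids.append(parent)
    return ids


original = chain([b"v1", b"v2 with leaked secret", b"v3"])
rewritten = chain([b"v1", b"v2 scrubbed", b"v3"])

# The first commit is unchanged, but scrubbing the second one changes
# its id and the id of every commit that descends from it.
for old, new in zip(original, rewritten):
    print(old[:12], new[:12], "same" if old == new else "changed")
```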

The Data Model Limitation

Git works with blobs: raw content addressed by hash. It's essentially a content-addressable filesystem, not a content-addressable database. It cannot see inside a file, so for structured data like code, everything from a one-character rename to a full rewrite is just a new blob with a different hash.
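
The blob model is simple enough to reproduce in a few lines of Python. In a SHA-1 repository, git computes a blob's id by hashing a short header ("blob", the byte length, and a NUL byte) followed by the raw file content. The sample file contents below are made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a blob in a SHA-1 repository."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents: the same function before and after a rename.
before = b"def parse(text):\n    return text.split()\n"
after = b"def parse_expr(text):\n    return text.split()\n"

print(git_blob_id(before))
print(git_blob_id(after))
```

The two ids share nothing, even though the files differ by a single identifier. Any notion of "what changed" has to be recomputed afterwards by diffing text, because the storage layer only knows about opaque bytes.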

The CRDT Solution: Beyond Line-Based Text

The solution lies in CRDTs (Conflict-free Replicated Data Types) and a shift from treating code as text to treating it as structured data. The CRDT community has been discussing "overlay branches" and "CRDT revision control" for 15 years, but the technology is finally mature enough for practical implementation.

The Vector Approach

One approach represents text as a CRDT vector of letters. This aligns with systems like Zed's DeltaDB and has been implemented in various experimental systems. It's a safe default that works well for collaborative editing.
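
A minimal Python sketch of the "vector of letters" idea follows. It is a toy in the style of a replicated growable array, not DeltaDB's design or the proposed system's implementation; the TextReplica class and all its names are assumptions made for illustration. Each letter gets a unique id and remembers the letter it was inserted after, deletions leave tombstones, and merging is a plain union of what both replicas know.

```python
from dataclasses import dataclass, field

ROOT = (0, "")  # sentinel id marking the start of the text


@dataclass
class TextReplica:
    name: str
    chars: dict = field(default_factory=dict)  # id -> (parent_id, letter)
    deleted: set = field(default_factory=set)  # tombstoned ids

    def _next_id(self):
        # Lamport-style clock: one more than the largest clock seen so far.
        clock = max((i[0] for i in self.chars), default=0) + 1
        return (clock, self.name)

    def visible_ids(self):
        children = {}
        for cid, (parent, _) in self.chars.items():
            children.setdefault(parent, []).append(cid)
        order, stack = [], [ROOT]
        while stack:
            node = stack.pop()
            if node != ROOT:
                order.append(node)
            # Push in ascending order so the highest id is visited first.
            stack.extend(sorted(children.get(node, [])))
        return [i for i in order if i not in self.deleted]

    def text(self):
        return "".join(self.chars[i][1] for i in self.visible_ids())

    def insert(self, index, letter):
        ids = self.visible_ids()
        parent = ids[index - 1] if index > 0 else ROOT
        self.chars[self._next_id()] = (parent, letter)

    def delete(self, index):
        self.deleted.add(self.visible_ids()[index])

    def merge(self, other):
        # Set union: order of merges and repeated merges do not matter.
        self.chars.update(other.chars)
        self.deleted |= other.deleted


a = TextReplica("a")
for position, letter in enumerate("cat"):
    a.insert(position, letter)

b = TextReplica("b")
b.merge(a)

a.insert(3, "s")  # replica a sees "cats"
b.insert(1, "h")  # replica b, concurrently, sees "chat"

a.merge(b)
b.merge(a)
print(a.text(), b.text())
```

Both replicas end up with the same text without any conflict prompt, because the rendered order is a pure function of the set of letters each replica holds.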

The AST Approach

However, code has inherent structure that line-based text ignores. Modern IDEs and compilers work with ASTs (Abstract Syntax Trees) because they capture the semantic structure of code. A revision control system that works with AST-like trees could provide:

Formal deterministic merge algorithms: Associative, commutative, idempotent operations, so replicas converge on the same result regardless of the order in which changes arrive, with no manual conflict resolution step
Reversible operations: Split, join, fork, and merge operations that preserve context
Structural awareness: Understanding code semantics rather than just text differences
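
The three merge properties named above can be checked mechanically. The Python sketch below uses a deliberately simple rule, per-node last-writer-wins, over a flat map of AST node ids; the node ids, replica names, and stamps are all hypothetical, and a real structural merge would be finer-grained than replacing whole nodes.

```python
from itertools import permutations

# A replica's state: AST node id -> (stamp, source text of that node).
# A stamp is (logical_clock, replica_name); tuples compare lexicographically,
# so ties on the clock are broken by replica name.


def merge(left: dict, right: dict) -> dict:
    """Per-node last-writer-wins: keep the entry with the larger stamp."""
    merged = dict(left)
    for node_id, entry in right.items():
        if node_id not in merged or entry > merged[node_id]:
            merged[node_id] = entry
    return merged


alice = {
    "fn:parse/name": ((2, "alice"), "parse_expr"),
    "fn:parse/body": ((1, "alice"), "return tokens"),
}
bob = {"fn:parse/body": ((2, "bob"), "return list(tokens)")}
llm = {
    "fn:parse/name": ((3, "llm"), "parse_expression"),
    "fn:parse/doc": ((1, "llm"), "Parse an expression."),
}

# Merge the three replicas in every possible order.
results = [merge(merge(x, y), z) for x, y, z in permutations([alice, bob, llm])]
print(all(result == results[0] for result in results))
```

Determinism here means every replica computes the same answer; it does not by itself guarantee that the answer is what the authors intended, which is where structural awareness has to do the remaining work.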

RDX: A JSON-Based Foundation

The proposed system uses Replicated Data eXchange format (RDX), a JSON superset with CRDT merge semantics. This provides:

Structured data handling: Native support for the hierarchical nature of code
Conflict resolution: Built-in mechanisms for handling concurrent edits
Extensibility: Easy to add new data types and operations
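
To give a feel for what "JSON with merge semantics" can mean, here is a small Python sketch. It does not use RDX syntax or the RDX merge rules, which are not described here; it is an assumed stand-in in which every leaf is a [clock, replica, value] list, objects merge key by key, and the leaf with the larger stamp wins. It also assumes stamps are unique and that a key never changes between object and leaf.

```python
import json


def merge(a, b):
    """Recursively merge two stamped JSON-like documents."""
    if isinstance(a, dict) and isinstance(b, dict):
        out = dict(a)
        for key, value in b.items():
            out[key] = merge(out[key], value) if key in out else value
        return out
    if isinstance(a, list) and isinstance(b, list):
        # Leaves are [clock, replica, value]; the larger stamp wins.
        return max(a, b, key=lambda leaf: (leaf[0], leaf[1]))
    raise TypeError("object/leaf mismatch; this sketch assumes a stable shape")


def plain(doc):
    """Strip the stamps to get ordinary JSON back."""
    if isinstance(doc, dict):
        return {key: plain(value) for key, value in doc.items()}
    return doc[2]


# Hypothetical documents from two replicas.
alice = {"module": {
    "name": [1, "alice", "parser"],
    "exports": {"parse": [2, "alice", "def parse(text)"]},
}}
llm = {"module": {
    "name": [1, "alice", "parser"],
    "exports": {
        "parse": [3, "llm", "def parse(text, strict=False)"],
        "tokenize": [1, "llm", "def tokenize(text)"],
    },
}}

merged = merge(alice, llm)
print(json.dumps(plain(merged), indent=2, sort_keys=True))
```

Because the document is a tree rather than a flat file, concurrent edits to different keys never touch each other, and only edits to the same leaf need a tie-breaking rule.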

What This Means for Developers

The implications are profound. A CRDT-based revision control system could:

Eliminate merge conflicts: Deterministic operations mean no more "merge hell"
Enable true modular development: Overlay branches that let you work with separate concerns while maintaining a unified view
Scale with AI collaboration: Handle the high throughput of AI-generated changes without friction
Preserve context: Keep the semantic meaning of changes, not just the text differences
Provide better queries: A proper query language for code history and structure

The Path Forward

This isn't just theoretical. The technology exists, and the need is urgent. As AI becomes more integrated into development workflows, the limitations of git will only become more painful.

The vision is clear: a revision control system that treats code as structured data, provides deterministic operations, and scales with the new reality of AI-augmented development. It's time to move beyond git's filesystem model to a true database for code.

The original discussion on GitHub provides more technical details on the implementation approach and experiments underway.
