Anthropic engineer Adam Wolff details three engineering war stories from building Claude Code, the terminal-based agentic coding tool, using Claude Code itself. The talk reveals how AI shifts software development bottlenecks from implementation to architectural decision-making, why rapid experimentation and unshipping failed changes are critical, and how the speed of learning becomes the primary competitive advantage when coding costs approach zero.

Engineering at AI Speed: Lessons from the First Agentically Accelerated Software Project
Anthropic engineer Adam Wolff shared hard-won lessons from building Claude Code, the company’s terminal-based agentic coding tool, using Claude Code itself. The talk, presented at QCon San Francisco and hosted by InfoQ, details three engineering war stories that illustrate how AI shifts software development lifecycle bottlenecks, why conventional wisdom often fails in AI-accelerated projects, and how the speed of learning becomes the only sustainable competitive advantage when coding costs approach zero.
Wolff brings extensive experience to the topic, having served as Head of Engineering at Robinhood and led Facebook’s product infrastructure group, which produced widely adopted open source tools including React and GraphQL. His talk draws on direct experience building one of the first production tools developed almost entirely by AI, with Anthropic calculating that 90% of Claude Code’s production code is written by or with Claude. The team ships daily to internal users and maintains a weekday external release cadence, supported by fast feedback channels that let them iterate quickly on user reports and feature requests.

What Changed: The AI-Driven Shift in Software Development
Traditional software development prioritizes upfront design, lengthy requirements gathering, and proof-of-concept phases. This approach exists because implementation has historically been the most expensive, most time-consuming part of the SDLC. Teams invest heavily in planning to avoid wasting developer hours on rework.
Wolff explains that AI eliminates this bottleneck. With tools like Claude Code, a developer can prototype most features within a single day. The new constraint is not writing code, but making architectural decisions, running experiments, and learning from real-world usage. This shift requires entirely new processes: instead of debating design for weeks, teams ship quickly, collect feedback, and adjust course. Failures become expected parts of experimentation rather than costly mistakes.
Provider Comparison: Traditional SDLC vs. AI-Accelerated Development
Anthropic’s approach to building Claude Code differs sharply from traditional enterprise software practices, including those Wolff oversaw at Robinhood. In financial services, databases serve as a critical safety net, providing concurrency and data integrity guarantees that protect against costly errors. Consistency is prioritized over availability, as data errors can have regulatory or financial consequences.
For a developer tool like Claude Code, these priorities reverse. The worst outcome is a tool that fails to start, even if that means accepting some data inconsistency in conversation persistence. Wolff notes that conventional wisdom, such as storing data in a database rather than flat files, does not apply when the deployment context changes. Claude Code is distributed as an npm package, a context where native dependencies and complex data layers create more risk than the problems they solve.
This case also highlights a divide between providers that dogfood their own AI tools and those that treat tool development as separate from tool usage. Anthropic’s team uses Claude Code to build Claude Code, creating a feedback loop where the tool’s developers are also its heaviest users. This approach accelerates learning but also exposes the team to the same pain points their users face, leading to faster iteration on features like input handling and shell integration.
Three War Stories: Experimentation, Failure, and Learning
Wolff structures his talk around three detailed engineering narratives, each illustrating different tradeoffs of AI-accelerated development.
Episode I: Rebuilding Input
The first story covers the development of Claude Code's custom input handler, which supports features like slash commands, @-file mentions, and tab completion. Conventional wisdom warns against rebuilding terminal input: existing libraries like GNU Readline, a C library that predates Linux and inherits Emacs-style keybindings, encode decades of accumulated edge-case handling for key bindings, word wrapping, and special-character behavior.
The Claude Code team chose to build a custom cursor class anyway, prioritizing control over input behavior. The initial implementation was 300 lines of immutable, testable code with a fluent interface, making it easy to extend. When the team needed to add Vim mode ahead of external launch, the testable design allowed the feature to ship in a single pull request.
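None of Claude Code's internal class or method names are public, but the design Wolff describes, immutable state behind a fluent interface, might look roughly like this hypothetical TypeScript sketch:

```typescript
// Hypothetical sketch: every operation returns a new Cursor, so edits
// compose fluently and each intermediate state is trivially testable.
class Cursor {
  constructor(
    readonly text: string,
    readonly offset: number,
  ) {}

  left(): Cursor {
    return new Cursor(this.text, Math.max(0, this.offset - 1));
  }

  insert(s: string): Cursor {
    const next =
      this.text.slice(0, this.offset) + s + this.text.slice(this.offset);
    return new Cursor(next, this.offset + s.length);
  }

  deleteBackward(): Cursor {
    if (this.offset === 0) return this;
    const next =
      this.text.slice(0, this.offset - 1) + this.text.slice(this.offset);
    return new Cursor(next, this.offset - 1);
  }
}

// Because each operation is a pure function from Cursor to Cursor, a
// Vim-style command reduces to composing these primitives.
const state = new Cursor("", 0).insert("git status").left().left();
```

Immutability is what made the Vim-mode addition cheap: new behaviors compose existing pure operations rather than mutating shared editor state.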
The win came with hidden complexity. International users exposed Unicode edge cases: double-width characters, grapheme clusters, and NFC normalization requirements added hundreds of lines of code and new test cases. Performance also suffered because the input handler ran JavaScript on every keystroke, prompting a "lazy cursor" rewrite that defers layout work until render time.
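To make the Unicode problem concrete (this is illustrative, not the team's code), grapheme-aware cursor movement in TypeScript must operate on normalized grapheme clusters rather than raw string indices:

```typescript
// A single user-perceived character can span many UTF-16 code units:
// "é" written as e + combining accent, plus a two-code-point flag.
const text = "e\u0301\u{1F1EB}\u{1F1F7}";

// NFC normalization collapses e + combining accent into one "é".
const normalized = text.normalize("NFC");

// Intl.Segmenter (built into modern Node.js) splits on grapheme
// clusters, the unit that cursor movement must use.
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemes = [...segmenter.segment(normalized)].map((s) => s.segment);

console.log(text.length);      // 6 code units
console.log(graphemes.length); // 2 user-perceived characters
```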
The lesson here is that even simple architectural choices can explode in complexity, but testable, self-contained designs make it feasible to iterate quickly with AI assistance. Claude Code excelled at fixing bugs and adding features once the core architecture was sound.
Episode II: Reimagining Shell
The second story covers Claude Code’s Bash tool, which allows the agent to run shell commands. The initial implementation, called persistent shell, mimicked human terminal usage: it ran one command at a time, waiting for output before processing the next, and persisted environment variables and working directories across commands.
This design became a bottleneck when the team added a batch tool to run commands in parallel. Persistent shell forced all commands to run sequentially, negating the performance benefits of parallel execution. The team deleted the original persistent shell implementation and pivoted to transient shells, which spawn a new shell process for each command.
The challenge was preserving user environment context. Transient shells did not inherit bashrc, zshrc, or custom aliases, breaking workflows for users with heavily customized shells. The team spent seven months building a snapshot system that captures a user’s shell environment once and replays it for each transient shell spawn. The result is a composable architecture: snapshots handle environment setup, a separate sandboxing layer handles security, and the shell spawner combines both without tight coupling.
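The article's description suggests a pattern along these lines; the TypeScript below is a heavily simplified sketch (real support for zsh, sandboxing, and shell quirks is far more involved):

```typescript
import { execFile } from "node:child_process";
import { tmpdir } from "node:os";
import { join } from "node:path";

// One-time snapshot: run the user's login shell once and dump its
// variables, functions, and aliases to a file for later replay.
function snapshotEnvironment(shell = "/bin/bash"): Promise<string> {
  const path = join(tmpdir(), `shell-snapshot-${process.pid}.sh`);
  return new Promise((resolve, reject) => {
    execFile(
      shell,
      ["-l", "-c", `{ declare -p; declare -f; alias -p; } > '${path}'`],
      (err) => (err ? reject(err) : resolve(path)),
    );
  });
}

// Per-command transient shell: replay the snapshot, run the command,
// exit. No state leaks between commands, so they can run in parallel,
// yet user aliases and environment variables still apply.
function runTransient(snapshot: string, command: string): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(
      "/bin/bash",
      ["-c", `shopt -s expand_aliases; source '${snapshot}'; ${command}`],
      (err, stdout) => (err ? reject(err) : resolve(stdout)),
    );
  });
}
```

The composability Wolff highlights falls out of this split: the snapshot step and the spawn step know nothing about each other, so a sandboxing layer can wrap the spawn without touching environment capture.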
Wolff notes that persistent shell had comprehensive tests, but the entire test suite was deleted when the abstraction proved wrong. Tests are valuable only when the underlying architecture aligns with actual requirements, which often become clear only after shipping.
Episode III: Reversing SQLite
The final story is a clear example of a failed experiment. The team wanted to add a conversation resume feature, allowing users to pick up past Claude Code sessions. The initial implementation used JSONL files, which worked but lacked schema migration support and data integrity guarantees. Conventional wisdom suggested moving to SQLite, a widely trusted, well-tested database, paired with Drizzle ORM for TypeScript-native schema management and migrations.
The implementation hit immediate problems. SQLite requires a native module (better-sqlite3), which is difficult to distribute via npm. The module failed to install on many users' systems, and attempts to rebuild it from source during installation caused further issues. Beyond distribution problems, SQLite's locking model, which locks the entire database on writes, created concurrency issues for multiprocess usage, where users might have 15-20 Claude Code sessions open at once.
Migration also proved harder than expected. SQLite does not support adding constraints like ON DELETE CASCADE to existing tables, requiring table recreation and data copying that risks data integrity. Most critically, the team realized that for a developer tool, availability matters far more than consistency. A failed database startup prevented users from launching Claude Code at all, a worse outcome than occasional data inconsistency in conversation history.
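The workaround SQLite forces here is the standard recreate-and-copy migration, sketched below with better-sqlite3 and hypothetical table names; every step that copies data is a step that can lose it if interrupted:

```typescript
import Database from "better-sqlite3";

const db = new Database("sessions.db");

// SQLite's ALTER TABLE cannot add ON DELETE CASCADE to an existing
// foreign key, so the migration must rebuild the table and copy rows.
db.exec(`
  BEGIN;
  CREATE TABLE messages_new (
    id INTEGER PRIMARY KEY,
    session_id INTEGER REFERENCES sessions(id) ON DELETE CASCADE,
    content TEXT
  );
  INSERT INTO messages_new (id, session_id, content)
    SELECT id, session_id, content FROM messages;
  DROP TABLE messages;
  ALTER TABLE messages_new RENAME TO messages;
  COMMIT;
`);
```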
The team unshipped the SQLite implementation two weeks after launching it, reverting to JSONL files. Wolff notes that the entire experiment took 15 days, a cost that would have been far higher without AI-accelerated development. The failure confirmed that context matters more than conventional best practices: what works for a financial system (Postgres, strong consistency) fails for a distributed npm-based CLI tool.
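By contrast, the JSONL format the team reverted to needs no native module and no database lock; here is a minimal sketch of the idea, with a hypothetical message shape:

```typescript
import { appendFile, readFile } from "node:fs/promises";

interface SessionMessage {
  role: "user" | "assistant";
  content: string;
  timestamp: string;
}

// Append-only JSONL: one JSON object per line. There is no lock for
// 15-20 concurrent sessions to contend on, and a torn final line can
// simply be skipped on read.
async function appendMessage(path: string, msg: SessionMessage) {
  await appendFile(path, JSON.stringify(msg) + "\n");
}

async function loadSession(path: string): Promise<SessionMessage[]> {
  const raw = await readFile(path, "utf8");
  return raw
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .flatMap((line) => {
      try {
        return [JSON.parse(line) as SessionMessage];
      } catch {
        return []; // tolerate a partial trailing write
      }
    });
}
```

The tradeoff is explicit: no schema migrations and weaker integrity guarantees, in exchange for a tool that always starts.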
Business Impact: Adapting to AI-Speed Development
For organizations adopting AI coding tools, the lessons from Claude Code’s development point to several strategic shifts. First, the primary competitive advantage is no longer implementation capacity, but learning speed. Teams that can ship, collect feedback, and iterate faster will outpace those that rely on upfront design.
Second, processes must prioritize reversibility. The Claude Code team uses feature flags, modular architecture, and fast rollback mechanisms to unship failed experiments quickly. Investing in build, release, and distribution pipelines reduces the cost of both shipping and reverting changes.
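As one hypothetical illustration of the reversibility point, gating a risky backend behind a flag turns unshipping into a configuration change rather than a code revert:

```typescript
// Hypothetical sketch: the flag name and mechanism are illustrative.
type PersistenceBackend = "jsonl" | "sqlite";

function chooseBackend(flags: Set<string>): PersistenceBackend {
  return flags.has("sqlite-persistence") ? "sqlite" : "jsonl";
}

// Rollback means no longer setting the flag; every client quietly
// falls back to the JSONL path.
const enabled = new Set((process.env.FLAGS ?? "").split(",").filter(Boolean));
const backend = chooseBackend(enabled);
```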
Third, dogfooding is non-negotiable. Teams that use their own AI tools to build internal processes, write design docs, and analyze feedback will develop better intuition for how to apply the technology than those that treat AI as a separate development aid.
Finally, conventional wisdom must be tested, not assumed. All three stories defied common best practices: that terminal input should never be rebuilt, that shells should persist state the way human sessions do, and that databases are safer than flat files. In AI-accelerated development, the cost of testing these assumptions is low enough that experimentation is always cheaper than prolonged debate.
Conclusion
The core takeaway from Wolff’s talk is that when coding costs drop to zero, the feedback loop becomes the only thing that matters. Optimizing for how quickly you can learn from what you ship, rather than how quickly you can ship, is the key to engineering at AI speed. Failures like the SQLite experiment are not mistakes, but necessary parts of a process that prioritizes learning over perfection.
As AI tools become more capable, the bottleneck will shift further up the SDLC, toward longer feedback loops like bug report analysis, eval systems for prompt changes, and organizational adoption of AI-augmented processes. Teams that master fast learning now will be best positioned to adapt as the technology continues to evolve.
