A deep dive into the Thoughtworks Future of Software Development Retreat, where industry leaders grappled with AI's transformative impact on coding practices, team structures, and the very nature of software engineering.
The software development world is in the midst of a transformation that's both exhilarating and unsettling. At the Thoughtworks Future of Software Development Retreat, industry leaders gathered to confront the uncomfortable reality that the practices, tools, and organizational structures built for human-only software development are breaking under the weight of AI-assisted work.

The retreat surfaced a consistent pattern: what worked before isn't working anymore. The replacements are forming, but they're not yet mature. This isn't just about new tools—it's about rethinking the fundamental nature of how we build software.
The Middle Loop: A New Category of Work
One of the most intriguing concepts to emerge was the "supervisory engineering middle loop." As AI takes over more of the mechanical aspects of coding, developers are shifting from writing code to supervising AI agents that write code. This creates a new category of work that sits between high-level architecture and low-level implementation.
The traditional engineering workflow—understand requirements, design solution, implement, test, deploy—is being disrupted. Now there's this middle layer where you're constantly reviewing, guiding, and correcting AI-generated code. It's less about typing and more about directing.
Risk Tiering as the New Core Discipline
With AI accelerating development velocity, the old model of treating all code changes equally is breaking down. The retreat proposed risk tiering as the new core engineering discipline. Not all changes carry the same risk, and AI makes it possible to move faster on low-risk changes while being more deliberate about high-risk ones.
This shift requires a fundamental change in how teams think about quality assurance. Instead of applying the same rigorous process to everything, teams need to develop sophisticated risk assessment capabilities to determine where to apply resources.
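To make the idea concrete, a risk-tiering policy can be as simple as routing each change to a review process proportional to its risk. The following is a minimal sketch of that discipline; the tier names, signals, and thresholds are illustrative assumptions, not a scheme proposed at the retreat.

```python
# A minimal sketch of risk tiering -- the tiers, signals, and thresholds
# here are illustrative assumptions, not the retreat's actual scheme.
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    LOW = "low"        # e.g. auto-merge AI-generated change on green tests
    MEDIUM = "medium"  # e.g. one human review required
    HIGH = "high"      # e.g. full review, staged rollout, security sign-off


@dataclass
class Change:
    touches_auth: bool
    touches_payments: bool
    lines_changed: int


def classify(change: Change) -> Tier:
    """Route a change to a review process proportional to its risk."""
    if change.touches_auth or change.touches_payments:
        return Tier.HIGH
    if change.lines_changed > 200:
        return Tier.MEDIUM
    return Tier.LOW


print(classify(Change(False, False, 12)).value)  # -> low
```

The point is less the specific rules than the shape of the discipline: an explicit, inspectable policy that lets teams move fast on low-risk changes while concentrating human attention where it matters.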
TDD as Prompt Engineering
Perhaps the most surprising insight was the renewed emphasis on Test-Driven Development. Far from being obsolete in an AI-driven world, TDD emerged as perhaps the strongest form of prompt engineering.
When you're directing AI agents to write code, having clear, executable specifications in the form of tests becomes even more critical. The AI can generate code that passes tests, but if the tests aren't clear about the desired behavior, you're just generating garbage faster.
One heavy user of LLM coding agents put it bluntly: "Thank you for all your advocacy of TDD. TDD has been essential for us to use LLMs effectively." This isn't confirmation bias—it's a pattern emerging from the leading edge of AI adoption.
The TDD cycle provides a structured way to interact with AI coding tools. You describe what you want in a test, let the AI generate code to pass it, review the result, and iterate. This cycle is proving more effective than trying to describe entire systems upfront.
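That cycle can be sketched in miniature. In the hypothetical example below, the test is the executable specification you hand to the agent, and the function is the kind of implementation that comes back to be reviewed; `normalize_email` and its behavior are invented for illustration.

```python
# TDD as prompt engineering, in miniature. The test is written first and
# acts as the precise, executable prompt; the implementation is the
# agent's (reviewable) response. All names here are hypothetical.

def normalize_email(raw: str) -> str:
    """Lowercase and strip whitespace -- the kind of small, well-specified
    function an agent can generate reliably against a clear test."""
    return raw.strip().lower()


# Step 1: write the test first. It pins down the desired behavior far more
# precisely than a prose prompt would.
def test_normalize_email():
    assert normalize_email("  Alice@Example.COM ") == "alice@example.com"
    assert normalize_email("bob@test.org") == "bob@test.org"


# Step 2: run it (red), let the agent generate code to make it pass
# (green), review the result, then iterate with the next test.
test_normalize_email()
print("all tests pass")
```

Each iteration stays small and verifiable, which is exactly what keeps an AI agent from drifting away from the desired behavior.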
The Human Side: Roles, Skills, and Experience
The retreat didn't just focus on technical practices—it grappled with the human implications of AI-driven development. Will the rise of specifications bring us back to waterfall-style development? The natural impulse of many business folks is "don't bother me until it's finished."
But here's the crucial insight: LLMs don't change the value of rapidly building and releasing small slices of capability. If anything, they promise to increase the frequency of that cycle. The ability to quickly validate assumptions and get feedback remains essential.
Security: The Forgotten Discipline
One of the more sobering observations was that the security session had a small turnout. In an enterprise context, one participant noted they were deliberately slow with AI tech, staying about a quarter (three months) behind the leading edge. "We're not in the business of avoiding all risks, but we do need to manage them."
Security is tedious, and people naturally want to first make things work, then make them reliable, and only then make them secure. But AI accelerates everything, including the creation of security vulnerabilities. The retreat concluded that platform thinking is essential here: platform teams need to create a fast but safe path, "bullet trains" for those using AI to build applications.
The Cost Question
Will LLMs be cheaper than humans once the subsidies for tokens go away? At this point, we have little visibility into what the true cost of tokens is now, let alone what it will be in a few years. It could be so cheap that we don't care how many tokens we send to LLMs, or it could be high enough that we have to be very careful.
This uncertainty affects everything from architectural decisions to team structures. If tokens remain expensive, we'll optimize for token efficiency. If they become cheap, we might use AI more liberally, potentially changing the economics of software development entirely.
The Democratization Effect
Stephen O'Grady's analysis provides crucial context: these tools are, or can be, powerful accelerants and enablers that dramatically lower the barriers to software development. They can democratize access to skills that used to be very difficult, or for some even impossible, to acquire.
Even Grady Booch, who has been appropriately dismissive of AGI claims, recently admitted he was "gobsmacked" by Claude's abilities. His advice to developers alarmed by AI? "Be calm" and "take a deep breath."
From his perspective, having watched and shaped the evolution of the technology firsthand over decades, AI is just another step in the industry's long history of abstractions, and one that will open new doors for the industry.
Code Health in the Age of AI
Adam Tornhill's research on code health and its impact on agentic development provides empirical backing for some of the retreat's intuitions. The study "Code for Machines, Not Just Humans" defines "AI-friendliness" as the probability that AI-generated refactorings preserve behavior and improve maintainability.
They found that LLMs performed consistently better in healthy code bases: the risk of defects was 30% higher in less-healthy code. A limitation of the study is that even its less-healthy code was nowhere near as bad as much real-world legacy code.
What would the AI error rate be on such code? Based on patterns observed across all Code Health research, the relationship is almost certainly non-linear. As code health deteriorates, AI performance likely degrades much faster than linearly.
The Open Space Format: A Meta-Success
One of the most heartening aspects of the retreat was the meta-stuff. While many participants were very familiar with the Open Space format, it was the first time for a few. It's always fun to see how people quickly realize how this style of (un)conference leads to wide-ranging yet deep discussions.
I hope we made a few more open space fans. One participant commented on how much they appreciated the deep and respectful dialog in the sessions. There weren't the interruptions, or the few people gobbling up airtime, that they'd seen across so much of the tech world.
Another attendee commented, "it was great that while I was here I didn't have to feel I was a woman, I could just be one of the participants." One of the lovely things about Thoughtworks is that I've got used to that sense of camaraderie, and it can be a sad shock when I go outside the bubble.
The Uncomfortable Truth
Annie Vella's take-aways capture the essence of what made the retreat valuable: "I walked into that room expecting to learn from people who were further ahead. People who'd cracked the code on how to adopt AI at scale, how to restructure teams around it, how to make it work."
And nobody has it all figured out. There is more uncertainty than certainty. About how to use AI well, what it's really doing to productivity, how roles are shifting, what the impact will be, how things will evolve. Everyone is working it out as they go.
I actually found that to be quite comforting, in many ways. Yes, we walked away with more questions than answers, but at least we now have a shared understanding of the sorts of questions we should be asking. That might be the most valuable outcome of all.
The retreat didn't produce a new manifesto, and that's probably for the best. The Agile Manifesto emerged from a specific context and solved specific problems. The challenges we face with AI are different, more complex, and perhaps more fundamental.
What we do know is that the practices, tools, and organizational structures built for human-only software development are breaking in predictable ways under the weight of AI-assisted work. The replacements are forming, but they are not yet mature.
The ideas ready for broader industry conversation include the supervisory engineering middle loop, risk tiering as the new core engineering discipline, TDD as the strongest form of prompt engineering, and the agent experience reframe for developer experience investment.
But perhaps the most important insight is that we're all figuring this out together. The uncertainty isn't a bug—it's a feature. It means we're pushing boundaries, asking hard questions, and being honest about what we don't know.
In a field that often pretends to have all the answers, that kind of intellectual honesty might be the most valuable outcome of all.
