Apenwarr challenges the dogma of small pull requests, arguing that while annealing works for stable systems, it can become an enemy of necessary change in software development.
In the intricate dance of software development, we've come to worship a particular ritual: the small pull request. Limited to a few files, a few hundred lines, doing just one thing and doing it well. These requirements, often enforced with religious fervor, promise quality, maintainability, and ease of review. We tell ourselves this is the pinnacle of software engineering, the crystalline structure of mature code. But what if this annealing process, while appropriate for stable systems, has become an invisible cage preventing necessary evolution?
The author introduces an elegant analogy from metallurgy and computer science: simulated annealing. In this process, one begins with high energy—making bold, sweeping changes to quickly explore the solution space—then gradually reduces energy to refine and stabilize the system. The result, in physical annealing, is stronger, more stable structures. In software, we've seemingly inverted this process, starting with small changes and making them smaller still, fearing that any significant jump will break our carefully constructed crystalline lattice.
This approach works remarkably well for mature systems. Consider Tailscale, with its seven years of evolution and millions of users dependent on its stability. Here, annealing isn't just appropriate; it's essential. The cost of breaking established patterns is measured in real user frustration and lost trust. But the annealing mindset creates a dangerous assumption: that all systems benefit from the same treatment.
The emergence of AI-driven coding presents a fascinating counterpoint. LLMs, trained using processes remarkably similar to annealing, operate without our fear of change. They produce changes as large and interconnected as prompts demand, jumping across the solution space with abandon. The mathematical predictions hold true: the output is often less coherent, more likely to fail. Yet, this fearless exploration enables a capability we've lost—the ability to make substantial, risky changes to mature systems and quickly discard what doesn't work.
The author's experience developing Aperture reveals the critical limitation of pure annealing. When implementing dollar-based spend quotas, the change required three interconnected components: a new Grant syntax for applying attributes to sessions, a query cost approximator combining multiple sources, and the actual quota enforcement system. None could meaningfully exist without the others. Breaking this into 24 separate 500-line PRs would have been not just inefficient but impossible, as each component's design evolved in response to insights gained from the others.
This is the essence of the annealing paradox: sometimes the smallest possible step is no step at all. When components are fundamentally interdependent, artificial decomposition doesn't enable progress; it prevents it. The author wisely implemented the 12,000-line change as a whole, then afterward broke it into manageable parts for review—a practical acknowledgment that some ideas must be born whole.
The economics of change are shifting dramatically. AI assistance has dramatically reduced the cost of producing large changes, making experimentation feasible in ways previously unimaginable. This doesn't make large changes safer, but it does change the risk calculation. When a 12,000-line AI-generated change takes as much effort to produce as a 500-line human-written change, the potential upside of bold experimentation begins to outweigh the costs.
Yet, we remain trapped in a false dichotomy. GitHub's code review system, still lacking stacked diffs after 18 years, forces us to choose between unreviewable monstrosities and tedious, fragmented sequences of small changes. The solution isn't to abandon either approach but to develop new tools that can handle the reality of complex, interconnected changes.
The author suggests a promising direction: AI-assisted review workflows that can analyze large changes holistically, identify potential issues, and provide actionable feedback before human review begins. In a world where writing code becomes easier but reviewing it becomes harder, we should place greater demands on writers, not less.
Ultimately, the annealing debate is about more than just pull request size. It's about recognizing that systems exist in different states and require different approaches. New systems need the energy to explore broadly, finding their form through experimentation. Mature systems benefit from the stability of careful refinement. But even mature systems occasionally require the boldness of a high-energy jump to escape local optima and evolve.
The future of software development lies not in choosing between small and large changes, but in developing the wisdom to know when each approach is appropriate. As AI continues to transform our tools and processes, we have an opportunity to break free from the annealing dogma and embrace a more flexible, context-aware approach to software evolution—one that honors both the need for stability and the imperative to grow.
Comments
Please log in or register to join the discussion