Programming, Evolved: Lessons from 18 Months of Pair-Programming with AI
#AI

Startups Reporter

A software engineer's deep dive into the practical realities of using coding assistants like Claude Code, Codex, and Cursor for over a year, exploring how they're reshaping the craft of software engineering and what it means for developers' mental models and workflows.

The author of the GitHub document summarized here has spent the last 18 months in a relentless experiment, using every major coding assistant available—from Copilot and Cody to Claude Code, Codex, and Gemini. The result isn't a simple verdict on whether AI can write code. It's a nuanced field report on how the craft of software engineering is being fundamentally reshaped, and what it takes to thrive in this new environment.

The New Pair-Programming Buddy

The core thesis is straightforward: every software engineer, regardless of level, should pick a model and shape it into their best pair-programming buddy. This isn't about delegation; it's about collaboration. The author draws a critical distinction between programming—the pursuit of one person solving a known problem with code—and software engineering, which is inherently a team sport involving people, time, and trade-offs.

For decades, programming evolved by stripping away incidental complexity: from punch cards to line editors, from low-level languages to high-level paradigms. Each step lowered barriers and expanded the pool of practitioners. The author argues we're in the midst of a similar but faster and broader evolution—a "Cambrian explosion of programming"—driven by AI assistants.

The Three Dimensions of Improvement

Over the past year, coding assistants have improved across three key dimensions:

  1. Code Quality: Models generate better quality code for languages in their distribution (Python, TypeScript, Rust, Go). They're no longer just autocomplete on steroids; they can reason about structure and patterns.
  2. Context Awareness: Assistants are increasingly grounded in the specific codebase they're working on, not just the vast corpus they were trained on. This is a crucial shift from generic suggestions to project-specific solutions.
  3. Reliability: Innovations in the "harness" built around models allow them to work on problems for longer periods while producing coherent output. They can maintain context through multi-step tasks.
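The "harness" idea is concrete enough to sketch: a loop that executes the model's tool requests and feeds the results back, so the model keeps context across a multi-step task. The sketch below is purely illustrative—`call_model`, the `read_file` tool, and the message shapes are assumptions for the example, not any vendor's real API.

```python
# A minimal, hypothetical harness loop. call_model is a stand-in for a real
# model call; here it fakes a two-step session (one tool call, then an answer).

def call_model(messages):
    # First turn: no tool results yet, so ask the harness to read a file.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "app.py"}}
    # Second turn: tool output is in context, so produce a final answer.
    return {"answer": "The bug is in app.py"}

TOOLS = {"read_file": lambda path: f"<contents of {path}>"}

def run_harness(task, max_steps=10):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "answer" in reply:                 # model is done
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # execute the tool
        messages.append({"role": "tool", "content": result})
    return None  # step budget exhausted

print(run_harness("Find the bug"))  # → The bug is in app.py
```

Real harnesses add sandboxing, context compaction, and retries around this same loop; the step budget is one simple way they keep long sessions coherent instead of runaway.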

Where They Win, Where They Struggle

For solving known problems—standard business logic, API integrations, boilerplate—the author finds that assistants beat an average programmer on both speed and quality. They also excel as bug hunters: when pointed at a symptom, models like Opus 4.5 and GPT 5.2 can read the code, identify the root cause (deadlocks, starvation), and explain the logic flaw. The author has encountered only one counterexample, in which a model incorrectly blamed the macOS virtualization layer for a connection pool exhaustion issue.

However, significant gaps remain, particularly in frontend development. The author's experience shows models struggle with generating good-looking, well-functioning user interfaces using idiomatic code. They are "bad at Tailwind, bad at Ink, bad at Textual, and OK at Ratatui." It's unclear whether this is a sampling problem in training data or if the heavy abstractions in UI frameworks trip them up. For web and mobile UIs, the author starts with design mocks from tools like Google Stitch, but notes Stitch cannot yet produce mocks for terminal UIs (TUIs).

The "Personality" Problem and Prompt Engineering

A critical observation is that a model's default "personality" is to solve the problem in front of it as quickly as possible to earn user praise. This leads to sub-optimal, short-term fixes. The author caught Opus 4.5 trying to solve a deadlock by having a process "sleep for 2 seconds."
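The sleep "fix" is a classic anti-pattern: sleeping only makes the race less likely, while imposing a single global lock-acquisition order removes the wait-for cycle entirely. A minimal sketch of the ordering fix in Python (toy names, not the author's code):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
results = []

def worker(name, first, second):
    # Sort by id() so every thread acquires the locks in the same global
    # order, regardless of which lock it logically wants first. This makes
    # the "hold one, wait for the other" cycle impossible.
    low, high = sorted((first, second), key=id)
    with low, high:
        results.append(name)

# Without the sort, t1 (A then B) and t2 (B then A) can deadlock;
# a sleep would only shrink the window, not close it.
t1 = threading.Thread(target=worker, args=("t1", lock_a, lock_b))
t2 = threading.Thread(target=worker, args=("t2", lock_b, lock_a))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results))  # → ['t1', 't2']
```

This is exactly the kind of root-cause reasoning the "idiomatic" prompting below is meant to elicit, instead of a quick patch that earns praise today and pages someone at 3 a.m. later.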

This tendency can be altered with guidance. The author uses specific language to steer models toward better solutions:

  • "Idiomatic": Using phrases like "come up with an idiomatic solution" or "is that the most idiomatic way?" pushes the model toward established patterns and best practices.
  • "Intent": When writing or reviewing tests, peppering prompts with "intent of the function under test" yields better, more meaningful test cases.

Claude Code's own harness uses similar tricks to keep its model on the rails, suggesting these are fundamental techniques for effective collaboration.

The Erosion of the Mental Model

Perhaps the most profound insight is about the developer's mental model. The act of writing code—line by line, function by function—builds a deep, personal understanding of the system. This mental model is invaluable for making architectural decisions and debugging complex incidents.

When coding assistants write most of the software, the fidelity of this mental model degrades quickly. The author isn't fighting this; instead, they're developing methods to use the AI as a tool to query and develop the mental model on-demand. It's not the same as building it through manual coding, but it's becoming the new norm.

This shift necessitates new tooling and a new approach to training. The author suggests we may need to train software engineers regularly on the failure modes of their systems, much like the aviation industry trains pilots, to compensate for the reduced hands-on familiarity.

The New Workflow: From Editor to Editor-in-Chief

For years, developers spent hundreds of hours fine-tuning their terminal and editor to feel just right. That editor was their primary tool. Now, the author notes, "I am the 'editor' for my coding assistant." The time once spent tweaking the IDE is now spent learning about the AI, teaching it new tricks, skills, and commands.

To facilitate this learning, the author built personal tools like Catsyphon and Aiobscura to review and analyze their interactions with the AI, extracting lessons from the process. This reframes the relationship: the developer becomes a mentor, growing and guiding their pair-programming buddy.

Practical Recommendations

For those hesitant to adopt coding assistants, the author suggests starting with toil—tasks that are necessary but not creative. Assistants are excellent at:

  • Comprehending stack traces
  • Making sense of poorly written code
  • Summarizing documentation
  • Querying documentation for specific details

Regarding security, the author notes that while coding assistants ship with a built-in sandbox, that sandbox can sometimes hinder the agents running inside it. The author's personal choice—acknowledged as one with trade-offs—is an "exo-sandbox" (such as sandbox-exec) that contains the whole session, with the assistant's internal sandbox turned off.
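As a rough illustration of the exo-sandbox idea on macOS—this profile is an assumed config fragment for the example, not the author's actual setup, and `sandbox-exec` is deprecated by Apple though it still ships—one shape it could take:

```shell
# coding-session.sb -- illustrative SBPL profile (assumed name and rules)
cat > coding-session.sb <<'EOF'
(version 1)
(deny default)
(allow process-exec)
(allow process-fork)
(allow file-read*)
(allow file-write* (subpath (param "PROJECT")))
(allow network-outbound)
EOF

# Run the whole assistant session inside the profile, confining
# writes to the current project directory.
sandbox-exec -f coding-session.sb -D PROJECT="$PWD" claude
```

The design choice here is coarse but simple: the process tree can read broadly but write only under the project directory, so a misbehaving agent can't trash the rest of the filesystem even with its internal sandbox disabled.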

Finally, there's a clear-eyed view on the joy of coding. "There is so much fun, beauty and pleasure in writing code by hand," the author writes. "You can still handcraft code. Just don’t expect this to be your job. This is your passion."

The evolution is here. The tools are powerful but imperfect. Success lies not in blind adoption, but in deliberate collaboration—shaping the AI into a true partner while consciously preserving and adapting the human skills that remain irreplaceable.
