EvanFlow: Structuring AI-Assisted Development with TDD and Human Oversight

EvanFlow introduces a structured, TDD-driven approach to AI-assisted software development, emphasizing human control through checkpoints while leveraging Claude Code's capabilities. The framework addresses common failure modes in agentic coding through deliberate constraints and verification steps.

The recent emergence of AI coding assistants has brought both excitement and challenges to the development community. While these tools promise increased productivity, they also introduce new failure modes and require careful integration into existing workflows. EvanFlow, a new plugin for Claude Code, attempts to address these challenges by providing a structured, TDD-driven iterative feedback loop that maintains human oversight throughout the development process.

Understanding EvanFlow's Approach

EvanFlow presents itself as more than just a collection of coding tools; it's a comprehensive methodology designed to guide developers through the entire software development lifecycle using Claude Code. The framework centers around a specific loop structure: brainstorm → plan → execute → tdd → iterate, with deliberate checkpoints where human approval is required before proceeding.

What distinguishes EvanFlow from other AI coding approaches is its emphasis on discipline and control. Rather than allowing the AI to autonomously complete tasks, the framework stops at critical decision points, requiring explicit human approval. This "conductor, not autopilot" philosophy aims to balance AI assistance with human judgment.

The framework's creator emphasizes that "the loop is conductor, not autopilot: real checkpoints at design approval, plan approval, and after iteration. The agent stops short of every git operation and waits for your direction." This approach directly addresses one of the primary concerns with AI coding assistants: the potential for uncontrolled, error-prone generation.
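The checkpoint structure can be pictured as a small driver that refuses to advance without approval. The following is only an illustrative sketch: the phase names and checkpoint positions come from the description above, while `run_loop`, the `approve` callback, and the log strings are hypothetical and are not part of EvanFlow itself.

```python
from typing import Callable, List

# Phase order as described in the article.
PHASES = ["brainstorm", "plan", "execute", "tdd", "iterate"]

# Checkpoints named in the creator's quote, keyed by the phase they follow.
# The mapping structure itself is an assumption made for illustration.
CHECKPOINT_AFTER = {
    "brainstorm": "design approval",
    "plan": "plan approval",
    "iterate": "post-iteration review",
}


def run_loop(approve: Callable[[str], bool]) -> List[str]:
    """Walk the phases in order, pausing at each checkpoint.

    `approve` stands in for the human: it receives the checkpoint name and
    returns True to continue or False to halt the loop.
    """
    log: List[str] = []
    for phase in PHASES:
        log.append(f"ran {phase}")
        checkpoint = CHECKPOINT_AFTER.get(phase)
        if checkpoint is None:
            continue
        if not approve(checkpoint):
            log.append(f"stopped at {checkpoint}")
            return log
        log.append(f"approved {checkpoint}")
    # The agent stops short of git operations and hands control back.
    log.append("awaiting git direction from human")
    return log
```

Calling `run_loop(lambda name: name != "plan approval")` halts right after planning, which is the "conductor, not autopilot" behavior in miniature: nothing downstream runs until a person says so.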

The Technical Architecture

At its core, EvanFlow consists of 16 cohesive skills plus 2 custom subagents that work together to implement the development loop. These skills are organized into four categories:

Default Loop (5 skills): The core skills that implement the main development loop
Special-Purpose (8 skills): Additional skills for specific tasks like debugging, architecture improvement, and QA
Cross-Cutting (1 skill): Context management for longer sessions
Meta (1 skill): An index and shared vocabulary

The two custom subagents, evanflow-coder and evanflow-overseer, play crucial roles in the parallel execution mode. The coder implements code while the overseer provides read-only review, ensuring that implementation stays within bounds without directly modifying code.

The framework also includes a git guardrail hook that blocks dangerous git operations, addressing another significant concern with AI coding assistants: the potential for unintended destructive changes. This bundled hook prevents operations like git reset --hard or git clean -f from being executed automatically.
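To make the idea of a guardrail concrete, here is a minimal sketch of a deny-list check on a shell command. It is not EvanFlow's bundled hook and does not use any Claude Code hook API; `blocked_reason` and the `BLOCKED` table are hypothetical names, and only `git reset --hard` and `git clean -f` are taken from the description above.

```python
import shlex
from typing import List, Optional, Set, Tuple

# Hypothetical deny-list: a git subcommand plus the flags that make it destructive.
BLOCKED: List[Tuple[str, Set[str]]] = [
    ("reset", {"--hard"}),
    ("clean", {"-f", "--force"}),
]


def blocked_reason(command: str) -> Optional[str]:
    """Return a reason string if the command should be blocked, else None."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Fail closed on input that cannot be tokenized (e.g. unbalanced quotes).
        return "unparseable command"
    if len(tokens) < 2 or tokens[0] != "git":
        return None
    subcommand, args = tokens[1], tokens[2:]
    for name, flags in BLOCKED:
        if subcommand != name:
            continue
        for arg in args:
            if arg in flags:
                return f"git {name} {arg} is destructive"
            # Expand combined short flags such as -fd into -f and -d.
            if arg.startswith("-") and not arg.startswith("--"):
                expanded = {"-" + ch for ch in arg[1:]}
                if expanded & flags:
                    return f"git {name} {arg} is destructive"
    return None
```

This sketch deliberately stays simple: it does not handle global options placed before the subcommand (for example `git -C path reset --hard`) or commands chained with `&&`, both of which a production guardrail would need to consider.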

Addressing Common AI Coding Failures

EvanFlow's design appears to be informed by research on common failure modes in AI-assisted coding. Several hard rules are baked into the framework:

Never invent values: The framework explicitly prevents hallucination of file paths, environment variables, IDs, function names, and library APIs. If uncertain, the agent stops and asks.
Assertion-correctness warning: Research shows that 62% of LLM-generated test assertions are incorrect. EvanFlow's TDD skill and overseer review explicitly check whether test assertions would catch one-character bugs in implementation.
Context drift management: The framework includes specific mechanisms to detect and address context drift, which industry data suggests accounts for approximately 65% of enterprise AI coding failures.
Five Failure Modes checklist: During the iteration phase, the framework explicitly checks against hallucinated actions, scope creep, cascading errors, context loss, and tool misuse.

These constraints represent a deliberate attempt to create guardrails that prevent common failure modes while still leveraging the power of AI assistance.
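The one-character-bug check in the assertion-correctness rule can be illustrated with a toy example. The functions below are hypothetical and only show the idea: a test is judged by whether it passes against the correct implementation and fails against a variant that differs by a single character.

```python
from typing import Callable


def is_adult(age: int) -> bool:
    """Correct implementation: 18 counts as adult."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """One-character bug: '>=' has become '>'."""
    return age > 18


def weak_test(fn: Callable[[int], bool]) -> bool:
    # Only probes a value far from the boundary.
    return fn(30) is True


def strong_test(fn: Callable[[int], bool]) -> bool:
    # Probes both sides of the boundary.
    return fn(18) is True and fn(17) is False


def catches_mutant(
    test: Callable[[Callable[[int], bool]], bool],
    correct: Callable[[int], bool],
    mutant: Callable[[int], bool],
) -> bool:
    """A useful test passes on the correct code and fails on the mutant."""
    return test(correct) and not test(mutant)
```

Here `weak_test` passes against both versions, so it would not flag the bug, while `strong_test` fails against the mutant. This is the kind of question the overseer review is described as asking of each assertion.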

The Development Loop in Detail

The EvanFlow loop is structured around five main phases:

Brainstorming: Clarifies intent and proposes 2-3 approaches with embedded stress-testing. This phase requires design approval before proceeding.
Planning: Maps file structure first (following "deep modules" principles), then creates bite-sized tasks. This phase includes a plan approval checkpoint and can offer parallel execution for plans with 3+ independent units.
Execution: Runs task-by-task with inline verification. Blockers stop the loop and surface to the developer. For parallel execution, the framework uses a coder/overseer orchestration with integration tests at touchpoints.
TDD: Implements vertical-slice TDD (one failing test → minimal implementation → repeat). Tests verify behavior through public interfaces to ensure they survive refactoring.
Iteration: Self-review loop that re-reads diffs, fixes issues, runs quality checks, and screenshots UI changes. It includes a Five Failure Modes checklist and has a hard cap of 5 iterations.

This structured approach aims to provide both the benefits of AI assistance and the reliability of human-guided development.
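The capped self-review in the iteration phase can be sketched as a bounded loop. The cap of 5 and the five failure-mode names come from the description above; the `iterate` function, the `review` callback, and the decision to escalate to the human once the cap is reached are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

MAX_ITERATIONS = 5  # hard cap stated in the article

FAILURE_MODES = [
    "hallucinated actions",
    "scope creep",
    "cascading errors",
    "context loss",
    "tool misuse",
]


def iterate(review: Callable[[int], List[str]]) -> Tuple[str, int]:
    """Run self-review passes until one is clean or the cap is hit.

    `review(n)` is a hypothetical callback returning the failure modes
    found on pass n; an empty list means the pass was clean.
    """
    for attempt in range(1, MAX_ITERATIONS + 1):
        findings = review(attempt)
        unknown = [f for f in findings if f not in FAILURE_MODES]
        if unknown:
            raise ValueError(f"not on the checklist: {unknown}")
        if not findings:
            return ("clean", attempt)
    # Assumed behavior: hand the problem back rather than loop indefinitely.
    return ("escalate to human", MAX_ITERATIONS)
```

The cap matters because a self-review loop with no bound can keep "fixing" its own fixes; stopping at a fixed count turns a stuck loop into a visible handoff.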

Parallel Execution and Integration Testing

One of EvanFlow's more sophisticated features is its support for parallel execution. For plans with three or more truly independent units, the loop can fork into a parallel coder/overseer orchestration:

One coder per unit (using vertical-slice TDD)
One overseer per coder (read-only review)
An integration overseer that runs named integration tests at every touchpoint

The integration tests serve as executable contracts, ensuring that interfaces can't drift as both sides must satisfy the same passing tests. This approach addresses a common challenge in parallel development: maintaining consistency across independently developed components.
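A small example may help show what "executable contract" means here. In this sketch two units are imagined to be written by separate coders, and a named integration test pins down the interface between them. All function names, the payload shape, and the test registry are hypothetical and are not taken from EvanFlow.

```python
import json
from typing import Callable, Dict


# Unit A (written by one coder): produces the payload.
def serialize_order(order_id: int, total_cents: int) -> str:
    return json.dumps({"order_id": order_id, "total_cents": total_cents})


# Unit B (written by another coder): consumes the payload.
def parse_order(payload: str) -> Dict[str, int]:
    data = json.loads(payload)
    return {
        "order_id": int(data["order_id"]),
        "total_cents": int(data["total_cents"]),
    }


# The contract: both sides must keep this round trip passing.
def test_order_round_trip() -> None:
    assert parse_order(serialize_order(7, 1999)) == {
        "order_id": 7,
        "total_cents": 1999,
    }


# Named integration tests that would be run at each touchpoint.
INTEGRATION_TESTS: Dict[str, Callable[[], None]] = {
    "order_round_trip": test_order_round_trip,
}


def run_touchpoint_checks() -> Dict[str, bool]:
    """Run every named integration test and report pass/fail per name."""
    results: Dict[str, bool] = {}
    for name, test in INTEGRATION_TESTS.items():
        try:
            test()
            results[name] = True
        except Exception:
            results[name] = False
    return results
```

If the coder for Unit A renamed `total_cents` to `total`, Unit A's own tests could still pass, but `order_round_trip` would fail at the next touchpoint, surfacing the drift before the units are merged.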

Installation and Usage

EvanFlow can be installed through three paths:

Claude Code Plugin Marketplace (recommended): The cleanest install that automatically activates skills, agents, and the guardrail hook.
npx skills CLI: Installs the skills only, without the guardrail hook or custom subagents.
Manual Copy: For users who want full control without CLI dependencies.

Once installed, developers can initiate the loop by saying "let's evanflow this" followed by their idea. The framework then guides them through the complete development process.

The framework also includes extensive customization options, allowing developers to adapt skills to their specific project needs, replace placeholder values, and adjust quality checks.

Community Context and Adoption

EvanFlow emerges at a time when the developer community is still grappling with how to effectively integrate AI into development workflows. While some developers embrace AI coding assistants as productivity multipliers, others express concerns about code quality, maintainability, and the potential for introducing subtle errors.

EvanFlow's approach represents a middle ground: leveraging AI's capabilities while maintaining human oversight and implementing safeguards against common failure modes. This balanced approach may appeal to developers who are interested in AI assistance but wary of fully autonomous coding.

The framework's emphasis on TDD and iterative development aligns with established best practices in software engineering, suggesting that it may integrate more smoothly into existing development workflows than more radical approaches.

Potential Limitations and Counter-Perspectives

Despite its thoughtful design, EvanFlow is not without potential limitations:

Overhead: The checkpoint system, while providing control, may introduce additional overhead compared to more direct AI coding approaches.
Learning Curve: The framework's comprehensive nature means developers may need time to learn and effectively use all its components.
Claude Code Dependency: EvanFlow is specifically designed for Claude Code, limiting its applicability to users of other AI coding assistants.
Opinionated Approach: The framework's deliberate constraints may feel restrictive to developers who prefer more open-ended AI assistance.

Some developers may also question whether the level of control and structure provided by EvanFlow is necessary, particularly for simpler tasks where more direct AI assistance might suffice.

Conclusion

EvanFlow represents a thoughtful approach to AI-assisted development, attempting to balance the power of AI coding assistants with the need for human oversight and quality control. Its structured loop, emphasis on TDD, and safeguards against common failure modes address many legitimate concerns about AI in development.

As the developer community continues to explore how to effectively integrate AI into workflows, frameworks like EvanFlow provide valuable models for maintaining quality and control while leveraging AI's capabilities. Whether its particular approach becomes widely adopted remains to be seen, but it contributes an important perspective to the ongoing conversation about AI in software development.

For developers interested in exploring EvanFlow, the GitHub repository provides comprehensive documentation and installation instructions. The framework's plugin marketplace page offers another entry point for those using Claude Code's plugin system.

As with any development tool, the value of EvanFlow will ultimately depend on how well it fits into individual and team workflows. Its emphasis on structure and control may appeal to some developers while feeling overly constraining to others. Regardless, it represents an important contribution to the emerging field of AI-assisted software development.
