Porting MiniJinja to Go With an Agent: A 10-Hour Experiment in AI-Assisted Code Migration

Armin Ronacher used AI agents to port MiniJinja from Rust to Go in just 45 minutes of active time, revealing how the value of software development is shifting from writing code to designing tests and guiding AI systems.

Yesterday, I ported MiniJinja from Rust to Go using an AI agent. The agent did almost all the work. I spent about 45 minutes actively guiding it, while it ran for 10 hours total. The result is a fully functional Go implementation that passes the same snapshot tests as the original Rust version.

This isn't just a story about code porting. It's about how the economics of software development are changing when you can delegate 95% of the implementation work to an agent.

What is MiniJinja?

MiniJinja is a Jinja2-compatible template engine I wrote in Rust. It started as an infrastructure automation experiment but became useful enough that others adopted it. Its key feature is that it's designed to be embeddable, with a focus on safety and performance.

The Go port maintains the same API surface: template parsing, rendering, filters, tests, macros, and all the Jinja2 features you'd expect. But the implementation is completely different because the languages have different constraints.

The Test-Driven Approach

The entire porting strategy hinged on one decision: reuse the existing Rust snapshot tests. This turned out to be the critical insight.

MiniJinja uses insta for snapshot testing. Each test has an input template and an expected output, stored in .snap files. The agent and I agreed on this progression:

  1. Lexer: Tokenize templates
  2. Parser: Build AST from tokens
  3. Runtime: Execute templates and produce output

For each phase, the agent built Go-side tooling to:

  • Parse Rust test files (which embed settings as JSON headers)
  • Read reference .snap files
  • Compare outputs with fuzzy matching
  • Maintain a skip-list for temporarily failing tests

This created a tight feedback loop. Every missing behavior showed up as a failing snapshot. The agent had a clear goal: make all tests pass.
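
To make the harness concrete, here is a minimal sketch of what that tooling can look like on the Go side. Everything here is illustrative: the file layout, the skipList entry, and the renderFixture stub are assumptions rather than the agent's actual code; only the snapshot format (insta files carry a YAML header terminated by a line of "---") reflects the real .snap layout.

```go
package snaptest

import (
	"os"
	"path/filepath"
	"strings"
	"testing"
)

// skipList marks snapshots that are expected to fail while the port is
// incomplete; entries are deleted as behaviors land.
var skipList = map[string]bool{
	"lexer_block_whitespace": true, // hypothetical entry
}

// loadSnapshot extracts the expected output from an insta .snap file.
// Insta snapshots carry a YAML header terminated by a line of "---".
func loadSnapshot(t *testing.T, path string) string {
	t.Helper()
	raw, err := os.ReadFile(path)
	if err != nil {
		t.Fatalf("reading %s: %v", path, err)
	}
	parts := strings.SplitN(string(raw), "\n---\n", 2)
	if len(parts) != 2 {
		t.Fatalf("malformed snapshot %s", path)
	}
	return parts[1]
}

// renderFixture stands in for the code that feeds the test's input
// template through the ported engine.
func renderFixture(t *testing.T, name string) string {
	t.Helper()
	// ... parse and render testdata/inputs/<name> with the Go port ...
	return ""
}

func TestSnapshots(t *testing.T) {
	snaps, err := filepath.Glob("testdata/snapshots/*.snap")
	if err != nil {
		t.Fatal(err)
	}
	for _, path := range snaps {
		name := strings.TrimSuffix(filepath.Base(path), ".snap")
		t.Run(name, func(t *testing.T) {
			if skipList[name] {
				t.Skip("known failure, still porting")
			}
			want := loadSnapshot(t, path)
			if got := renderFixture(t, name); strings.TrimSpace(got) != strings.TrimSpace(want) {
				t.Errorf("snapshot mismatch:\nwant:\n%s\ngot:\n%s", want, got)
			}
		})
	}
}
```

The useful property of this shape is that go test becomes the agent's entire feedback channel: every unimplemented behavior is a named, re-runnable failure.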

How I Used the Agent

I used Pi for voice prompting, starting with Claude Opus 4.5 and switching to GPT-5.2 Codex for the overnight test-fixing run.

The session structure was crucial. Pi's branching feature let me organize the work into phases. When I rewound to an earlier point and branched, Pi would:

  • Stay in the same session so I could navigate freely
  • Create a new branch off an earlier message
  • Add a summary of what it had already done as context

This avoided the agent "re-discovering" work it had already completed. Without branching, I'd have needed separate sessions and manual context management.

Where the Agent Diverged

The agent quickly moved from literal porting to behavioral porting. It made implementation choices that made sense for Go, even if they differed from the Rust version:

Tree-walking interpreter vs bytecode: The Rust version uses bytecode because it lacks a garbage collector. Go has GC, so a tree-walking interpreter is simpler and idiomatic.
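
To make the shape of that choice concrete, here is a minimal sketch of a tree-walking evaluator. The names (State, EmitRaw, IfCond) are hypothetical, not the port's actual API; the point is that each AST node simply evaluates itself against the current state, with no compile-to-bytecode step at all:

```go
package engine

import "strings"

// State carries the variables in scope and the output buffer.
type State struct {
	vars map[string]any
	out  *strings.Builder
}

// Node is a statement-level AST node that executes against the state.
type Node interface {
	Eval(s *State) error
}

// Expr is an expression-level node that produces a value.
type Expr interface {
	Eval(s *State) (any, error)
}

// EmitRaw writes literal template text straight through.
type EmitRaw struct{ Text string }

func (n EmitRaw) Eval(s *State) error {
	s.out.WriteString(n.Text)
	return nil
}

// IfCond walks one of two branches depending on a condition.
type IfCond struct {
	Cond       Expr
	Then, Else []Node
}

func (n IfCond) Eval(s *State) error {
	v, err := n.Cond.Eval(s)
	if err != nil {
		return err
	}
	branch := n.Else
	if isTruthy(v) {
		branch = n.Then
	}
	for _, child := range branch {
		if err := child.Eval(s); err != nil {
			return err
		}
	}
	return nil
}

// isTruthy applies Jinja-style truthiness to a dynamic value.
func isTruthy(v any) bool {
	switch x := v.(type) {
	case nil:
		return false
	case bool:
		return x
	case string:
		return x != ""
	case int:
		return x != 0
	default:
		return true
	}
}
```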

Reflection for value types: Go's reflect package handles dynamic typing naturally. The Rust version had to implement its own type system because Rust lacks runtime type information.
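
And here is a hedged sketch of the kind of reflection-based lookup that paragraph describes. getAttr is a hypothetical helper; the real resolution logic would also have to cover methods, nested pointers, and Jinja naming conventions:

```go
package main

import (
	"fmt"
	"reflect"
)

// getAttr resolves a template lookup like user.Name against an arbitrary
// Go value, handling maps and structs through the reflect package.
func getAttr(v any, name string) (any, bool) {
	rv := reflect.ValueOf(v)
	for rv.Kind() == reflect.Pointer {
		rv = rv.Elem()
	}
	switch rv.Kind() {
	case reflect.Map:
		if rv.Type().Key().Kind() == reflect.String {
			if mv := rv.MapIndex(reflect.ValueOf(name)); mv.IsValid() {
				return mv.Interface(), true
			}
		}
	case reflect.Struct:
		if fv := rv.FieldByName(name); fv.IsValid() && fv.CanInterface() {
			return fv.Interface(), true
		}
	}
	return nil, false
}

func main() {
	type User struct{ Name string }
	fmt.Println(getAttr(User{Name: "armin"}, "Name")) // armin true
	fmt.Println(getAttr(map[string]any{"n": 42}, "n")) // 42 true
	fmt.Println(getAttr(map[string]any{}, "missing"))  // <nil> false
}
```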

I didn't steer these decisions. They were sensible adaptations that respected the original behavior while embracing Go's strengths.

Where I Had to Push Back

Not everything went smoothly. The agent made two problematic changes to get tests passing:

1. Giving up on "must fail" tests: The agent wanted to skip tests that verify specific error messages, arguing that replicating exact error text across runtimes was impossible. I insisted on fuzzy matching instead (sketched below).

2. Regressing desired behavior: It tried to change HTML escaping semantics and make range return a slice instead of an iterator. These weren't just implementation details—they were behavioral contracts.

Without steering, the agent might have finished with a "working" port that violated subtle expectations. This is where human judgment still matters: knowing which test failures are acceptable and which indicate a real problem.
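
The fuzzy matching from point 1 doesn't need to be clever. A minimal sketch of the idea, assuming a hypothetical errorMatches helper (the real harness's heuristics may differ): the test asserts that the Go error mentions the same salient fragments as the Rust snapshot, rather than matching its exact wording.

```go
package snaptest

import "strings"

// errorMatches reports whether the Go error mentions every salient
// fragment taken from the reference Rust error message. Matching is
// case-insensitive and order-independent, so "undefined value: foo"
// and "foo is undefined" both satisfy []string{"undefined", "foo"}.
func errorMatches(got error, wantFragments []string) bool {
	if got == nil {
		return false // a "must fail" test that didn't fail is a real failure
	}
	msg := strings.ToLower(got.Error())
	for _, frag := range wantFragments {
		if !strings.Contains(msg, strings.ToLower(frag)) {
			return false
		}
	}
	return true
}
```

This keeps the behavioral contract intact: the port still has to fail exactly where the original fails, without being chained to Rust's phrasing.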

The Grinding Phase

Once major semantic mismatches were fixed, the remaining work was tedious: missing filters, test functions, loop extras, macros, call blocks, edge cases.

I went to bed and switched to GPT-5.2 Codex with a simple prompt: "Continue making all tests pass if they are not passing yet." I let it run overnight, relying on Pi's compaction feature to keep the long-running context manageable.

This worked because the agent had the right foundation. The tight feedback loop and clear goal meant it could iterate independently. I woke up to a passing test suite.

Final Cleanup

After functional convergence, I asked the agent to:

  • Document internal functions
  • Reorganize code (move filters to separate files)
  • Match the Rust codebase's documentation style
  • Set up CI and release processes

This was also when I reviewed the architecture and made sure the Go idioms felt right. The agent had done the heavy lifting, but final architectural review still required human eyes.

The Numbers

Main session stats:

  • Agent run duration: 10 hours (3 supervised)
  • Active human time: ~45 minutes
  • Total messages: 2,698
  • My prompts: 34
  • Tool calls: 1,386
  • Raw API token cost: $60
  • Total tokens: 2.2 million
  • Models: Claude Opus 4.5 (supervised), GPT-5.2 Codex (unattended)

These numbers don't include docstrings and smaller fixups.

What This Changes

1. Ecosystem Lock-in Becomes Optional

For years, choosing a language meant committing to its ecosystem. If you needed a template engine, you picked one written in your language or accepted the cost of porting it.

Now, "portability" has a different meaning. A good test suite becomes the portable artifact. The implementation can be regenerated for different platforms.

This doesn't mean NumPy will be ported to Go tomorrow (that would require years of optimization work). But it does mean many more libraries are accessible than before.

2. Tests Are More Valuable Than Code

The Rust test suite was worth more than the Rust implementation. It captured the behavioral contract that the Go port had to satisfy.

This has implications for how we think about open source. If you want your library to be portable across languages, invest in:

  • Comprehensive test coverage
  • Clear behavioral specifications
  • Documentation that explains why decisions were made

Generating tests with good coverage is also getting easier. But however the tests are produced, the suite becomes the shared source of truth.

3. The Social Dynamic Changes

Once, having someone port your code to another language was a sign of project success. It meant your library was "cool enough" to merit the effort.

With agents, that social signal disappears. Anyone can port anything in an afternoon. The pride of cross-language adoption shifts from the porter to the test writer.

What I Learned About Agent-Guided Development

The Feedback Loop Is Everything

The agent succeeded because it had:

  • A clear goal (all tests passing)
  • Immediate feedback (snapshot comparison)
  • Incremental progression (lexer → parser → runtime)

Without this structure, the agent would have wandered. The 10-hour runtime would have been 10 hours of confusion instead of 10 hours of iteration.

Human Time Shifts from Writing to Guiding

My 45 minutes wasn't spent typing code. It was spent:

  • Setting up the test harness
  • Correcting behavioral regressions
  • Making architectural decisions at boundaries
  • Reviewing final results

The agent handled the mechanical translation. I handled the judgment calls.

Branching Avoids Context Loss

Pi's branching feature prevented the agent from "re-learning" what it had already done. This is a pattern I'll use again: structure sessions as a tree of contexts rather than linear conversations.

Overnight Unattended Work Is Viable

Once the foundation was solid, I could trust the agent to grind through the remaining tests. This is a powerful pattern: use supervised time for architecture, unattended time for completion.

The Go Implementation Details

For those curious about the technical choices:

Value Representation: Uses interface{} with reflection for dynamic typing. This is slower than Rust's static dispatch but far simpler to implement.

AST Structure: Tree-walking interpreter with explicit node types. No bytecode because Go's GC makes allocation cheap.

Error Handling: Uses Go's error interface with context wrapping. The fuzzy matching for error messages ensures behavioral compatibility.
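
As a hedged sketch of that wrapping pattern, using only the standard errors and fmt machinery (ErrUndefined and lookupVar are hypothetical names, not the port's actual error types):

```go
package main

import (
	"errors"
	"fmt"
)

// ErrUndefined is a sentinel for lookups of names missing from the
// template context.
var ErrUndefined = errors.New("undefined value")

// lookupVar fails with positional context layered on top of the
// sentinel via %w, so callers can still match the root cause.
func lookupVar(name string, line int) (any, error) {
	return nil, fmt.Errorf("template.html, line %d: cannot resolve %q: %w", line, name, ErrUndefined)
}

func main() {
	_, err := lookupVar("user", 3)
	fmt.Println(err)                          // template.html, line 3: cannot resolve "user": undefined value
	fmt.Println(errors.Is(err, ErrUndefined)) // true: wrapping preserves the root cause
}
```

Because the sentinel survives wrapping, tests can match on the root cause while the rendered message still carries template and line context.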

Concurrency: The runtime is single-threaded per template evaluation. Go's goroutines aren't needed for template rendering.

You can see the result in PR #854 and the session transcript.

A Note on Cost

$60 in API costs for a 10-hour session seems high until you compare it to human time. A senior engineer might take 2-3 days to do this port manually. At $150/hour for 16-24 hours of work, that's $2,400-$3,600. The agent cost roughly 2% of that.

But the real savings isn't money—it's time. I got a working Go port in a day instead of a week, and I spent most of that time sleeping.

The Bigger Picture

This experiment suggests a future where:

Libraries are more portable: Test suites become the specification, implementations can be regenerated.

Human expertise moves upstream: We spend more time defining behavior and less time implementing it.

Language choice becomes more flexible: The cost of switching ecosystems drops dramatically.

The value is in the tests: A comprehensive test suite is worth more than any single implementation.

This doesn't mean developers become obsolete. It means we spend our time on higher-value activities: designing systems, specifying behavior, reviewing AI output, and making architectural decisions.

The agent wrote the code. I wrote the tests and guided the process. In the end, we both contributed what we're best at.


This post was written on January 14, 2026. The MiniJinja Go port is available on GitHub. For more details on the session, see the narrated video and session transcript.
