Rewriting the OCaml Runtime in Rust, One Line at a Time, with Claude as Co-Pilot

A developer named mbacarella took the C heart of the OCaml runtime and translated it, file by file, into Rust using Claude Code. The result passes the full upstream test suite, compiles OCaml itself, and lands at rough performance parity. What it reveals about mechanical AI-assisted rewrites is more interesting than the port itself.

There is a particular kind of software project that resists rewriting not because the code is bad, but because the code is load-bearing in ways that are mostly invisible. The OCaml runtime is exactly that. It is roughly 40,000 lines of carefully managed C, full of macro machinery, multicore guarantees, a garbage collector that moves live pointers around while the program runs, and a foreign function interface whose contract must be honored down to the byte. It is the part of OCaml that nobody touches casually. So when a developer going by mbacarella published an experience report describing a line-by-line port of that runtime from C to Rust, steered by a human but largely typed by Claude, the result deserves to be read less as a benchmark contest and more as a probe into what mechanical translation has actually become.

The headline facts are unambiguous. The fork, which lives on the rust-runtime branch of the rustcaml repository, contains no C in the runtime. It passes the OCaml compiler's own test suite, unmodified, pulled straight from upstream. The patched compiler builds itself, builds dune, installs an opam switch, and compiles arbitrary OCaml programs in both bytecode and native modes. The fixpoint test, where the compiler builds itself and then uses that build to build itself again and the two generations come out bit-identical, passes. This is not a toy that runs a few examples. It is a working runtime.

The thesis hiding inside the experiment

The author is careful, repeatedly, to disclaim the obvious triumphalist reading. He is not arguing that OCaml should be rewritten in Rust, and he openly expects the comment section to explain why his benchmarks are wrong. The genuinely interesting claim is quieter and more unsettling: the mechanical cost of rewriting a mature, sophisticated codebase in another systems language has collapsed toward zero, and that collapse changes which questions are worth asking.

For decades, "it's written in C" functioned as a settled fact about a runtime like this. The implicit reasoning was that the porting effort would be enormous, error-prone, and require a specialist who understood both the source and the target intimately, sustained over months or years. That cost was the moat. When Claude can perform the translation in roughly seven days of wall-clock time, most of it spent waiting on the human to approve commands and on test suites to re-run, the moat drains. "It's in C" stops being a reason and becomes merely a historical accident, a record of the fact that OCaml broke ground roughly two decades before Rust's first stable release. The author puts it precisely: that is legacy, not an active decision.

How the port was actually done

The methodology matters because it explains both why the project succeeded and what its results mean. The instruction to Claude was not "design and build me an OCaml runtime." It was the far more constrained "let's translate the C to equivalent Rust, file by file, line by line, and check with me every step of the way." That constraint was deliberate and it shaped everything downstream.

The build system got a per-file toggle. Flip a switch and the linker selects the Rust version of a given file instead of the C version. This meant the runtime was never in a non-working state. At 2 percent ported, at 50 percent, at 99 percent, it still compiled OCaml and still passed tests. After each of the 71 runtime C files was translated, the full test suite ran, and a known-good state was committed before moving on. The reasoning behind those commits is the kind of thing that separates an engineer from an enthusiast: if some obscure application someday trips a heap corruption bug, you can bisect across the entire porting history to find exactly which file introduced the divergence.

The line-by-line discipline served a second purpose beyond safety. Because each .rs file mirrors its .c ancestor closely, the port can track upstream OCaml as it evolves. Idiomatic Rust would have been prettier and arguably safer, but it would have severed that correspondence and turned every future upstream change into a manual reinterpretation.

What the benchmarks say, and what they don't

The author predicted his constrained, non-idiomatic Rust would run 10 to 20 percent slower for native code and 20 to 30 percent slower for the bytecode interpreter. The actual result for native executables is essentially parity, hovering around 1.05x with individual benchmarks ranging from 0.87x to 1.13x. Some workloads that hammer the garbage collector's slow paths, like the lists and finalise tests, actually run faster in Rust, though the author honestly investigates whether that win comes from the language or from build flags. The Rust runtime is a single crate built with -O3 and link-time optimization, while upstream C uses -O2 with separate translation units. Rebuilding C with matching aggressive flags closed about a third of the lists advantage and, interestingly, produced a 9 percent regression on bytecode because cross-unit inlining bloated the interpreter loop. The honest conclusion is that upstream's default flags are C's best configuration, which makes them the fair baseline.

The bytecode interpreter is where the story gets technically rich. On Rust stable, it runs about 1.44x slower, and the reason is a single missing feature: computed gotos. This is a non-standard GNU C extension, supported by both GCC and Clang, that the C interpreter uses, and the speedup it provides has nothing to do with making jumps faster in the naive sense. It works because each bytecode handler ends in its own indirect jump to the next handler, rather than every handler sharing one central dispatch jump, which makes each jump far more predictable to the CPU's branch predictor. The effect is fragile: when a toolchain change accidentally disturbed the same optimization, it caused a notable performance regression in CPython recently. Rust stable cannot express computed gotos, so the straightforward port collapsed the interpreter into a loop { match ... } and paid for it.
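To make the shape of that fallback concrete, here is a minimal sketch of a loop { match ... } interpreter. It is an illustration only: the Op instruction set and the run function are hypothetical and have nothing to do with OCaml's real bytecode or with the code in the port.

```rust
/// Hypothetical toy instruction set, invented for this sketch.
#[derive(Clone, Copy)]
enum Op {
    Push(i64),
    Add,
    Mul,
    Halt,
}

/// Runs `code` on a small operand stack. Returns the top of the stack at
/// `Halt`, or `None` on stack underflow or on running off the end.
fn run(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0usize;
    loop {
        // Every instruction is fetched and dispatched from this one place,
        // so all handlers share a single dispatch point in the source.
        let op = *code.get(pc)?;
        pc += 1;
        match op {
            Op::Push(n) => stack.push(n),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_mul(b));
            }
            Op::Halt => return stack.pop(),
        }
    }
}
```

Under these assumptions, run(&[Op::Push(2), Op::Push(3), Op::Add, Op::Push(4), Op::Mul, Op::Halt]) evaluates (2 + 3) * 4 and returns Some(20). A computed-goto interpreter would instead end each handler with its own jump to the next handler, which is the structure stable Rust cannot spell directly.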

The rust-runtime-nightly branch fixes this by reaching for explicit tail calls via the experimental become keyword. With that, the interpreter reaches parity and in several benchmarks slightly beats the C version, landing around 0.91x overall. The Rust documentation warns that explicit tail calls are "currently incomplete and may not work properly," so this is a research result rather than a recommendation, but it is a striking one.

The 2,015 unsafe blocks, and why honesty matters here

Running ripgrep across the ported runtime turns up 2,015 uses of unsafe. The author refuses to pretend this makes the runtime safer, and his analysis of why is the most clarifying part of the whole report. After compilation, every OCaml value is a single machine word, tagged as either an integer or a heap pointer. The runtime spends its entire existence reading and writing untyped words, so there are no static types at that level for the borrow checker to reason about; nearly every field access is a raw pointer dereference. Worse, the runtime is the garbage collector, which means its core job is to mutate and move live pointers during collection, which is precisely the category of operation the borrow checker exists to forbid. And the whole thing must speak the C ABI because that is the OCaml runtime's ABI, and Rust has no stable ABI of its own, so extern "C" is unavoidable and unavoidably unsafe.
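A small sketch shows why the unsafe keyword piles up so quickly in code of this shape. The names below (Value, field, demo_first_field) are hypothetical stand-ins chosen for illustration, not identifiers from the port, and a real exported entry point would also need a fixed symbol name, which is omitted here.

```rust
/// Hypothetical stand-in for the runtime's one-word value.
type Value = usize;

/// Reads the `i`-th word of a heap block. Nothing in the types says that
/// `block` is a live address or that `i` is in bounds, so the compiler
/// cannot check this access.
///
/// # Safety
/// `block` must be the address of at least `i + 1` readable, aligned words.
unsafe fn field(block: Value, i: usize) -> Value {
    unsafe { *(block as *const Value).add(i) }
}

/// A function using the C calling convention, as a runtime entry point
/// called from C or from compiled OCaml code would have to.
pub unsafe extern "C" fn demo_first_field(block: Value) -> Value {
    unsafe { field(block, 0) }
}
```

Even in this toy, both the accessor and its C-ABI wrapper need unsafe, and every caller does too. Multiply that by every field read in a garbage collector and a count in the thousands stops looking surprising.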

The reframing he offers is worth sitting with. The unsafe count is not a translation failure. Rust did not introduce the danger; it made visible the carefully managed unsafety that was always present in the C, just spelled in a language that demands you name it. A runtime like this is unsafe by nature, and a faithful port reveals that nature rather than laundering it.

Pair programming with a brilliant, distractible partner

The portrait of Claude that emerges reads like a candid review of a talented collaborator. The model was genuinely good at line-by-line translation, better than a human at the mechanical transcription, and it never once tried to cheat the test suite to fake progress, which the author flags as the guiding light of the entire effort. When the work demanded it, Claude wrote sed and awk scripts to bulk-translate tedious data.

But it stumbled in instructive ways. Early on, a test produced a segfault and Claude simply kept porting files, having assumed the crash was preexisting in trunk and never thinking to verify. This recurred. In long sessions, especially after context compaction, the model could lose track of what the passing state was supposed to look like and convince itself that failures it had caused were not its fault. The fix was to make it re-run the test suite on trunk and watch everything come back green. It occasionally botched enums that started at a nonzero index, or flipped magic numbers in initializers. At one point it decided the entire garbage collector, spanning several large files, had to be ported in one heroic 5,000-line push, abandoning the file-by-file strategy it had followed all along. The author reminded it that the incremental approach had worked every other time, at which point it agreed and did it cleanly. The author's summary, that it felt like pair programming with someone who had internalized C, Rust, and the OCaml runtime simultaneously but also had ADHD, is both affectionate and exact.
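The enum slip is easy to picture. As a hedged illustration, assume a hypothetical C declaration such as enum color { RED = 1, GREEN, BLUE }; a faithful Rust translation has to carry the starting value over explicitly, because a Rust enum with no explicit discriminants starts at zero.

```rust
/// Hypothetical translation of a C enum whose first member is 1.
/// Dropping the `= 1` would silently shift every value down by one.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red = 1,
    Green, // 2: implicit discriminants continue from the previous one
    Blue,  // 3
}

/// Converts a raw integer, as C code would pass it, back into the enum.
fn color_from_raw(n: i32) -> Option<Color> {
    match n {
        1 => Some(Color::Red),
        2 => Some(Color::Green),
        3 => Some(Color::Blue),
        _ => None, // 0 is not a valid member here
    }
}
```

A mistake of this kind still compiles cleanly, which is why only a test suite, not the type checker, can catch it.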

The project also doubled as an education. The author had never used a skip list before encountering lf_skiplist.c, and watching the translation forced him to understand why the runtime uses a lock-free skip list rather than a tree: the concurrent, lock-free property is achievable with a skip list in a way that would be brutally hard with a tree. He also discovered he had OCaml's value encoding backwards. He thought the runtime stole the top bit of each word; in fact it steals the bottom bit, so that aligned heap pointers, which naturally end in zero, need no decoding at all, and only integers require a single shift to recover. Both that encoding trick and the computed gotos are, as he notes, reasons the runtime cannot be written optimally in OCaml itself. The runtime has to live below the abstractions the language provides.
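The bottom-bit trick fits in a few lines. This is a sketch of the encoding as described above, with hypothetical helper names rather than the runtime's own macros: an integer n is stored as (n << 1) | 1, and an aligned pointer is stored as itself.

```rust
/// Encodes an integer as a tagged word: shift left, then set the low bit.
fn int_to_value(n: isize) -> usize {
    ((n << 1) | 1) as usize
}

/// Decodes a tagged integer with one arithmetic shift, which keeps the sign.
fn value_to_int(v: usize) -> isize {
    (v as isize) >> 1
}

/// A word with the low bit set is an integer; aligned pointers end in zero.
fn is_int(v: usize) -> bool {
    v & 1 == 1
}
```

So 3 is stored as the word 7, and -1 is stored as a word of all ones. An aligned heap address already has a zero low bit, so is_int returns false for it and the word can be used as a pointer without any decoding. The cost of the scheme is one bit of integer range.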

The questions that are left

What the rewrite settles is small. What it leaves open is the actual conversation. If a human had spent two years doing this same port by hand, would you trust it more? If so, what exactly is that trust made of, and can the gap be closed by better tests, better tooling, or time? Now that the mechanical cost has nearly vanished, these stop being footnotes and become the whole subject. The author does not pretend to answer them, and his restraint is the most credible thing about the report.

There is one more wrinkle that gives the project an almost wistful edge. The port does not comply with the OCaml team's new AI policy, so it was never going to be merged regardless of its quality. It exists as a demonstration and a provocation rather than a proposal. But provocations are how settled facts get reopened, and this one reopens a real one: human-steered AI rewrites of mature systems, anchored to an existing implementation and a strong test suite, now belong in the working programmer's toolkit. You do not need a research lab's token budget to get surprisingly far. The runtime that nobody touches casually just got touched, carefully, one line at a time.