Jane Street's Reversal on Formal Methods, and What Agentic Coding Changed

After 25 years of telling people that formal methods weren't worth the cost, Jane Street's Yaron Minsky now argues that agentic coding has rewritten the math. The firm is building a team to make mathematical proof as routine a tool as type systems.

For a quarter century, Yaron Minsky says he told anyone who asked that Jane Street simply wasn't interested in formal methods. The firm that pushed OCaml into production trading systems, that built an internal culture around type safety and correctness, drew a hard line at the more demanding end of the verification spectrum. Now Minsky is saying something different, and the reason he gives is worth sitting with, because it reframes formal verification not as a luxury for safety-critical systems but as a structural response to how code gets written in an era of capable coding agents.

The central claim in his recent post is deceptively modest: the cost/benefit calculation around formal methods has shifted, and shifted enough that a firm famous for its pragmatism is now hiring proof engineers in London and New York. Understanding why requires taking seriously both halves of that calculation, because both moved at once.

The old math, and why it held for so long

Minsky is careful not to claim the firm was ever wrong. The skepticism was reasoned. Type systems, he notes, are themselves a kind of lightweight formal method, and Jane Street has extracted enormous value from them. The objection was never to mathematical rigor as a principle. It was to the price.

The example he reaches for is seL4, the formally verified microkernel that stands as one of the genuine landmarks of the field. The numbers tell the story of why most organizations never followed its path: roughly 25 person-years of effort to verify 8,700 lines of C, which works out to something like 23 lines of proof for every line of code, and about half a person-day of verification labor per line. For a security-critical kernel where the specification is clear and the stakes are existential, that investment can be justified. For ordinary software, even ordinary software handling enormous sums of money, it looked like a bad trade. Jane Street concluded that even its most critical systems didn't clear that bar.

That conclusion was coherent for as long as the costs stayed fixed and the benefits stayed bounded. Agentic coding, in Minsky's telling, disturbed both assumptions in the same motion.

What the agents changed on the cost side

The first shift is the obvious one, though Minsky states it with more precision than the usual enthusiasm. He does not claim that models can construct arbitrarily hard proofs on their own. His footnote is explicit that models still need a human to supply the high-level intuition for why a system works, the shape of the argument. What models are good at is the drudgery: translating a programmer's informal sense of why something is correct into the exacting syntax that a proof assistant will actually accept. A human might know why a concurrent data structure is safe without having any idea how to encode that reasoning for Lean or Rocq. The model bridges that gap, and in doing so widens the set of people who can use these tools productively.

That matters because the historical bottleneck in formal methods was never just compute. It was the scarcity of people fluent in both the problem domain and the proof apparatus. If models lower that barrier, the per-line cost that made seL4 look so forbidding starts to come down, and the whole calculation deserves a fresh look.

What changed on the benefit side, which is the more interesting argument

The cost story is intuitive. The benefit story is where Minsky's reasoning gets sharper, and it rests on a concept he calls the verification bottleneck.

Models are increasingly good at producing code that works, or appears to. But there is a wide gulf between code that achieves a stated goal and code you would actually want to ship. Minsky is blunt about the failure mode: agentic code tends toward slop. It is often overcomplicated, riddled with strange corner-case bugs, and indifferent to the invariants that hold a codebase together. The agent hits the target you set and ignores everything you didn't think to specify. The consequence is that humans now spend enormous effort verifying machine-generated output, and that verification, not the writing, is becoming the real constraint on throughput.

Formal methods offer a way to relieve that burden. If a property is proven rather than merely tested, the reviewer does not have to re-examine it. The proof carries the guarantee. Review becomes a matter of checking that the right things were specified, not exhaustively re-deriving that they hold.

The second benefit is subtler and connects to how agents behave. Agents thrive on feedback, both during reinforcement learning and during the act of coding itself. A type error, a failed test, a rejected proof obligation, each is a signal the agent can act on. Formal methods are a uniquely powerful source of such feedback because of what they can express. Tests sample the state space; they check particular inputs and hope the rest behave. A type system, by contrast, delivers the universal quantifier, the ∀. Minsky's examples are concrete: a type discipline that rules out data races eliminates all of them, not most of them; types that make cross-site scripting unrepresentable remove the entire class of vulnerability rather than the instances a test happened to probe. Jane Street has already seen, in its work on OxCaml, how much agents benefit from these universal guarantees. The bet on fuller formal methods is essentially an extrapolation: if types help this much, stronger proof techniques should help more.

Minsky is honest about the escape hatches. OCaml has Obj.magic, a way to circumvent the type system entirely, and he concedes the guarantees are not quite universal in practice. But such exceptions can be tracked, banned across most of a codebase, and where they remain, formal methods can be used to make explicit why a given use is actually safe. The universality is recoverable with discipline.

Why this firm, and not the startups

There is a fair question lurking here, which Minsky raises himself: the entire industry is thinking about agents and verification, and there is no shortage of startups chasing the intersection of formal methods and AI. Why should this happen inside a trading firm?

His answer rests on two assets that are hard to assemble elsewhere. The first is deep control over the language. Because Jane Street effectively owns its dialect of OCaml, it can reshape the language to make it hospitable to proof-oriented work: folding modular property specifications into the type system, adding type-level constraints around ownership and mutability that make certain proofs tractable, even building proof techniques directly into the language rather than bolting them on. Most teams working on programming languages can invent good ideas easily; the hard part is convincing anyone to adopt them. Jane Street has the rarer problem of users impatient that promised type-system features aren't arriving fast enough.

That community is the second asset, and it is the one outsiders tend to underrate. A population of programmers who are not merely tolerant of weird type-system features but actively demanding them gives the effort room to pursue two timelines at once: near-term improvements with immediate impact, and a more ambitious multi-year vision, learning from the former while building toward the latter.

The broader pattern, and where the skepticism should remain

yminsky avatar

What makes this reversal worth attention beyond Jane Street is the general shape of the argument. For decades the case against formal methods was economic, not philosophical. Everyone agreed proofs were good; almost no one could afford them. If the marginal cost of a proof is collapsing while the marginal value of a guarantee is rising, because guarantees are exactly what relieves the new verification bottleneck and exactly what agents feed on, then the equilibrium moves. Verification stops being the expensive last mile reserved for microkernels and becomes a routine part of the loop, closer to how type checking already functions today.

It is fair to hold some reservations. Minsky's footnotes do a lot of honest work in qualifying the optimism: models still cannot find the proofs, only help write them down; the universal guarantees leak at the edges. The history of formal methods is littered with moments when the costs seemed about to fall and didn't. And there is a circularity worth watching in using agents both to generate the slop and to staff the verification effort that cleans it up.

Still, the most credible signal here is the source. This is not a vendor selling proof tooling or a lab promoting its models. It is an organization that spent 25 years publicly arguing the opposite, now committing headcount to the reversed position and explaining its reasoning in terms a skeptic would recognize. The firms that built their reputations on knowing exactly when rigor was worth paying for are the ones whose change of mind tells you the price has genuinely moved. Whether the rest of the industry follows will depend on whether the tooling that made this look affordable inside a language a single firm controls can survive contact with the messier languages everyone else is stuck with. Jane Street, with its control over OxCaml and its unusually willing users, is running the experiment under nearly ideal conditions. The rest of us get to watch what it learns.