Anthropic's Self-Improvement Warning Buries the Real Ceiling: Compute, Fabs, and Grid Capacity

Chips Reporter

Anthropic says Claude now writes more than 80% of the code merged into its own codebase and warns of a recursive self-improvement loop that could outrun human control. The numbers that actually govern the outcome sit lower in the report: HBM sold out for the year, transformers carrying three-to-five-year lead times, and interconnect queues running into the 2030s.

Anthropic published a report on June 4 titled "When AI builds itself," and the headline figure is designed to land hard: Claude now writes more than 80% of the code merged into Anthropic's own production codebase, up from the low single digits before Claude Code reached research preview in February 2025. The company frames this as early movement toward recursive self-improvement, the point at which a model designs and builds its successor without meaningful human input. The framing that matters more for anyone tracking chip supply, though, comes further down the page, where Anthropic concedes that compute capacity is the binding constraint on the entire scenario.

The report arrives from a company that, a few weeks earlier, said its unreleased Mythos model was too capable to release publicly. The same firm now argues the industry may need to consider pausing development, even as it teaches its models to accelerate their own development. That tension runs through the document, but the technical core is worth separating from the messaging.

The capability numbers, and their caveats

Anthropic's claims are sourced entirely from internal data, none of it independently audited. The company says that in Q2 2026 the typical Anthropic engineer merges eight times as much code per day as in 2024. On the hardest, least-specified coding tasks, Claude succeeded 76% of the time in May 2026, a 50-percentage-point rise over six months. On an internal benchmark that asks each new model to make training code run faster, results climbed from roughly 3x the original speed with Claude Opus 4 in May 2025 to about 52x with the unreleased Mythos Preview by April 2026. A skilled human researcher, by comparison, needs four to eight hours to achieve a fourfold gain.
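
To put the benchmark trajectory in perspective, the reported endpoints can be converted into an implied doubling time. The sketch below is a back-of-the-envelope calculation, not anything Anthropic published. It assumes smooth exponential growth between the two reported points (roughly 3x in May 2025, about 52x in April 2026, treated here as 11 months apart), and the function name `implied_doubling_months` is hypothetical.

```python
import math


def implied_doubling_months(start: float, end: float, months: float) -> float:
    """Doubling time implied by growth from `start` to `end` over `months`.

    Assumes (hypothetically) a smooth exponential between the two points.
    """
    if start <= 0 or end <= start or months <= 0:
        raise ValueError("need 0 < start < end and months > 0")
    return months * math.log(2) / math.log(end / start)


# Reported endpoints: ~3x (May 2025) to ~52x (April 2026), about 11 months.
speedup_doubling = implied_doubling_months(3.0, 52.0, 11.0)

# The 76% success rate followed a 50-percentage-point rise, so the
# implied starting rate six months earlier is 26%.
baseline_rate = 0.76 - 0.50

print(f"implied doubling time: {speedup_doubling:.1f} months")
print(f"implied baseline success rate: {baseline_rate:.0%}")
```

Under that assumption the benchmark speedup doubled a little under every three months. The figure says nothing about whether the pace can continue, and it inherits every caveat attached to an internal, unaudited benchmark.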

Anthropic itself walks several of these back. It calls lines of code a poor proxy for output and admits the eight-times figure almost certainly overstates the real gain. Its research-judgment study, in which the model's proposed next step beat the human researcher's 64% of the time, drew on 129 moments the company deliberately selected because the human choice had room for improvement, so it is not a like-for-like contest. Critically, the report publishes no breakdown isolating how much recent capability gain comes from the self-improvement loop rather than from raw compute, more training data, and human-led research.

The external response has been skeptical. Cognitive scientist Gary Marcus called the piece a "bait and switch" on his Substack, arguing Anthropic had shown faster coding under human direction rather than a system improving itself. Bentley University mathematician Noah Giansiracusa told Scientific American, "I don't think it's a genuine call to slow down."

Where independent measurement lands

Third-party benchmarks support fast improvement without confirming a runaway loop. METR found that the length of task an AI can finish with 50% reliability has been doubling roughly every seven months. On its RE-Bench research benchmark, the best agents beat human experts when both were given a two-hour budget, but humans pulled ahead at eight hours and roughly doubled the top agent's score at 32 hours. The pattern is consistent: AI's advantage sits in short, well-defined bursts, not the sustained, open-ended work that frontier research actually depends on. That is the same human edge Anthropic says is still holding.
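
The seven-month doubling METR describes compounds quickly, which a few lines of arithmetic make visible. The sketch below is illustrative only: it assumes the trend holds as a clean exponential, and the one-hour starting horizon and the function names are hypothetical choices, not METR figures.

```python
def horizon_multiplier(months_elapsed: float, doubling_months: float = 7.0) -> float:
    """Growth factor of the 50%-reliability task horizon after
    `months_elapsed`, if it doubles every `doubling_months`."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months_elapsed / doubling_months)


def projected_horizon_hours(start_hours: float, months_elapsed: float) -> float:
    """Projected task horizon in hours, starting from `start_hours`."""
    return start_hours * horizon_multiplier(months_elapsed)


# Hypothetical one-hour starting horizon, chosen only for illustration.
for months in (7, 14, 28, 35):
    hours = projected_horizon_hours(1.0, months)
    print(f"after {months:>2} months: {hours:.0f} h")
```

Five doublings, or 35 months at this pace, multiply the horizon by 32. Whether a longer task horizon translates into the sustained research work where humans still lead on RE-Bench is a separate question the extrapolation cannot answer.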

Anthropic is not alone in publishing these claims. Google CEO Sundar Pichai said in an April blog post that 75% of new code at Google is AI-generated and engineer-approved, up from 50% the previous autumn. OpenAI's Jakub Pachocki has described the company's Codex agent as "a very early version of an AI researcher." Chinese developer MiniMax marketed its M2.7 model in March as "self-evolving," claiming it ran its own scaffold-optimization rounds, though the benchmarks were internal and unreplicated.

The constraint Anthropic half-buries

The section that should interest hardware readers most is where Anthropic names the physical limits, citing chip fabrication, grid expansion, and interconnect bandwidth as the factors that could cap progress before intelligence itself does. These are not speculative. They are the supply realities the industry is already living inside.

SK hynix and Micron have sold out HBM output for the year. High-power transformers carry three-to-five-year lead times. Electrical switchgear is booked into 2028. Grid-interconnection queues run three to seven years in most U.S. regions. A Sightline Climate analysis estimated that 30% to 50% of large data centers due to open in 2026 will slip or cancel outright. U.S. data centers drew about 4.4% of national electricity in 2023, a share the Department of Energy's Lawrence Berkeley National Laboratory expects to reach 6.7% to 12% by 2028. Against that backdrop, the four largest hyperscalers are on course to spend more than $650 billion on AI infrastructure this year.

These figures reframe the self-improvement argument. If progress becomes paced almost entirely by available compute, as Anthropic predicts in its most extreme scenario, then HBM allocation, advanced packaging capacity at TSMC, and substation lead times become the actual governors of how fast any model can iterate. A model that designs a better successor still needs silicon to train it on, and that silicon is rate-limited by foundry capacity and power delivery that no amount of software cleverness can conjure.

Whether compute ultimately caps a self-improving loop is genuinely unsettled. Forethought researcher Tom Davidson argues compute bottlenecks might not slow a software intelligence explosion until its late stages. Epoch AI counters that if compute and cognitive labor are complements rather than substitutes, software-only acceleration stalls the moment it hits a compute wall. The disagreement is fundamental, and it turns on questions of manufacturing throughput as much as algorithms.
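
The complements-versus-substitutes question can be made concrete with a constant-elasticity-of-substitution (CES) production function, a standard tool in economics. The sketch below is illustrative only and is not the model used by Davidson or Epoch AI. The function name `ces_output`, the equal weighting of the two inputs, and the parameter values are all assumptions chosen to show the qualitative behavior.

```python
def ces_output(compute: float, labor: float, rho: float, share: float = 0.5) -> float:
    """CES production: (share * C^rho + (1 - share) * L^rho)^(1 / rho).

    rho < 0       -> inputs are complements (each bottlenecks the other)
    0 < rho <= 1  -> inputs are substitutes
    rho == 0 is the Cobb-Douglas limit and is not handled here.
    """
    if rho == 0 or rho > 1:
        raise ValueError("rho must be nonzero and at most 1")
    if compute <= 0 or labor <= 0:
        raise ValueError("inputs must be positive")
    return (share * compute**rho + (1 - share) * labor**rho) ** (1 / rho)


# Hold compute fixed at one unit and scale cognitive labor, as a
# software-only acceleration would. All values are hypothetical.
for labor in (1, 10, 100, 1000):
    complements = ces_output(1.0, labor, rho=-1.0)
    substitutes = ces_output(1.0, labor, rho=0.5)
    print(f"labor={labor:>5}  complements={complements:.3f}  substitutes={substitutes:.1f}")
```

With `rho = -1` and compute fixed at one unit, output approaches but never exceeds 2 however much labor is added, which is the compute wall in miniature. With `rho = 0.5`, output keeps growing as labor scales. Which regime describes AI research is the empirical question the two camps dispute.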

The pause that isn't a pause

Anthropic says it will halt development only if rival labs at or near the frontier do the same in a verifiable way, adding that a unilateral halt would not change who leads. That conditional renders the proposal effectively inert. No lab this far down the road is going to ease off, and the report doubles as marketing for how quickly Anthropic can make Claude build Claude.

The timing sharpens the read. The report arrived days after Anthropic confidentially filed for an IPO at a reported valuation near $965 billion. A front-runner lobbying for limits it would help define, while filing to raise capital on the strength of the very acceleration it warns about, invites the skepticism it received. The company's April self-assessment, in which Mythos Preview reportedly found thousands of severe vulnerabilities, later drew scrutiny over how much of the claim rested on a small manual sample.

The debate over loss of control will continue. The International AI Safety Report, chaired by Yoshua Bengio and published in January 2025 with input from more than 100 experts across 30 countries, defines the scenario as one in which AI systems operate outside anyone's control with no path to regaining it. Geoffrey Hinton has put the odds of AI causing human extinction within three decades at 10% to 20%. Those are serious arguments worth weighing on their own terms. But the practical near-term ceiling is not philosophical. It is fab capacity, transformer lead times, and grid queues, and those numbers are not improving on the timeline the warning implies.
