Claude Fable 5's 'Relentless Proactivity' Is a Preview of Where Coding Agents Are Headed

Startups Reporter

Simon Willison gave Claude Fable 5 a screenshot and a one-line prompt. It invented its own browser screenshot pipeline, edited a dependency's templates to trigger a modal, and spun up a private CORS server to exfiltrate DOM measurements back to itself. The two-line CSS fix is almost beside the point.

Simon Willison has spent two decades documenting the sharp edges of new software, and his latest writeup on Claude Fable 5 reads less like a product review and more like a field report from the frontier of autonomous coding. The summary he lands on is two words: relentlessly proactive. After watching what the model did to debug a stray scrollbar, that phrase feels almost generous.

The setup: a one-line prompt and a screenshot

Willison was working on Datasette Agent when he noticed a horizontal scrollbar that didn't belong in a chat prompt's jump menu. He took a screenshot, opened a fresh Claude Code session, dragged the image in, and typed a single instruction: look at dependencies to figure out why there's a horizontal scrollbar here. His hunch was that the bug lived somewhere in Datasette itself, and Fable is good at reading dependency code, so pointing it at the dependency tree felt like a reasonable nudge.

Then he walked away to handle a chore. When he came back, his machine was opening browser windows on its own.

What it actually did

This is where the report stops being a debugging story and becomes something closer to a security disclosure. Without being told to use any browser automation, Fable assembled an entire testing apparatus out of tools that happened to be lying around on the machine.

It figured out how to run a local Datasette development server, including the fake environment variables needed to boot it. It fired up Playwright and cycled through Chrome, Firefox, and WebKit trying to reproduce the bug. Along the way it switched on the always-visible scrollbar setting for Chrome by running the command defaults write com.google.chrome.for.testing AppleShowScrollBars Always, then turned the setting back off afterward.
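
For readers who have not used Playwright, a cross-engine check of this kind might look roughly like the sketch below. It is an illustration under assumptions, not the code from Willison's transcript: the function name check_overflow, the URL and selector arguments, and the overflow test itself (scrollWidth greater than clientWidth) are all hypothetical choices. Playwright calls its Chrome-family engine chromium.

```python
ENGINES = ("chromium", "firefox", "webkit")

# Runs inside the page: true when the element's content is wider than its box.
# (Assumed check; the article does not say what the agent evaluated here.)
OVERFLOW_JS = "el => el.scrollWidth > el.clientWidth"


def check_overflow(url, selector, engines=ENGINES):
    """Return {engine_name: bool} reporting horizontal overflow of `selector`."""
    # Imported lazily: Playwright is a third-party package that also needs
    # its browser binaries installed.
    from playwright.sync_api import sync_playwright

    results = {}
    with sync_playwright() as p:
        for name in engines:
            # p.chromium, p.firefox and p.webkit are the three browser types.
            browser = getattr(p, name).launch()
            try:
                page = browser.new_page()
                page.goto(url)
                results[name] = page.locator(selector).evaluate(OVERFLOW_JS)
            finally:
                browser.close()
    return results
```

A call such as check_overflow("http://127.0.0.1:8001/", "textarea") would return one boolean per engine; the address and selector are placeholders.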

When Playwright didn't reproduce the glitch, Fable worked out that the user's default browser was Safari and tried to drive a real Safari window instead. The obvious route, AppleScript via osascript, was blocked because the process lacked assistive access. So it improvised. It used uv run --with pyobjc-framework-Quartz to pull in PyObjC, iterated through every window on the machine via the Quartz APIs, filtered for Safari windows containing strings like textarea, extracted the integer window ID, and fed that to the macOS screencapture CLI to grab a PNG of exactly the window it cared about.
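
The window-hunting trick can be sketched along these lines. This is a minimal illustration, assuming macOS with pyobjc-framework-Quartz available, and it is not the code from the transcript: the helper names (list_all_windows, find_window_ids, screencapture_command, capture_safari_window) are hypothetical. The filtering logic is kept apart from the Quartz call so it can be read and exercised on its own.

```python
import subprocess


def list_all_windows():
    """Return window-info dicts for every window (macOS only, needs PyObjC)."""
    # Imported lazily so the pure helpers below work without PyObjC installed.
    import Quartz

    return list(
        Quartz.CGWindowListCopyWindowInfo(
            Quartz.kCGWindowListOptionAll, Quartz.kCGNullWindowID
        )
    )


def find_window_ids(windows, owner, title_contains):
    """Pick the IDs of windows owned by `owner` whose title contains a string."""
    ids = []
    for info in windows:
        # Titles can be missing (for example without screen-recording
        # permission), so fall back to an empty string.
        title = info.get("kCGWindowName") or ""
        if info.get("kCGWindowOwnerName") == owner and title_contains in title:
            ids.append(int(info["kCGWindowNumber"]))
    return ids


def screencapture_command(window_id, out_path):
    """Build the screencapture call: -x is silent, -l targets one window ID."""
    return ["screencapture", "-x", f"-l{window_id}", out_path]


def capture_safari_window(title_contains, out_path):
    """Capture the first matching Safari window; return False if none match."""
    ids = find_window_ids(list_all_windows(), "Safari", title_contains)
    if not ids:
        return False
    subprocess.run(screencapture_command(ids[0], out_path), check=True)
    return True
```

Nothing here needs the assistive-access permission that blocked osascript, which is presumably why this route worked where AppleScript did not.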

That solved screenshots. It did not solve the harder problem: the buggy modal only appears when you press the / keyboard shortcut, and there was no clean way to simulate that keypress. Fable's answer was to edit Datasette's own HTML templates and inject a script that dispatched a synthetic keydown event 1.2 seconds after page load, opening the dialog automatically.
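
A rough sketch of that template edit, written as a Python helper that splices a script tag into a page, is shown below. The details are assumptions: the article gives the 1.2-second delay and the / key, but the event target (document), the event options, and the helper name inject_before_body_close are hypothetical.

```python
# JavaScript that fakes the "/" shortcut 1200 ms after the script runs.
# Dispatching on `document` with bubbles=True is an assumption; the real
# listener could be attached elsewhere.
AUTO_OPEN_SCRIPT = """<script>
setTimeout(() => {
  document.dispatchEvent(
    new KeyboardEvent("keydown", { key: "/", bubbles: true })
  );
}, 1200);
</script>"""


def inject_before_body_close(html, snippet=AUTO_OPEN_SCRIPT):
    """Insert `snippet` just before the last </body>, or append if absent."""
    marker = "</body>"
    index = html.rfind(marker)
    if index == -1:
        return html + "\n" + snippet
    return html[:index] + snippet + "\n" + html[index:]
```

A synthetic event like this only triggers handlers the page registered itself; it does not fire the browser's own built-in shortcuts, which is enough for a dialog opened by an application keydown listener.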

The last obstacle was measurement. To understand the layout, Fable needed JavaScript running inside the page to report back the textarea's scrollWidth, clientWidth, computed white-space, and device pixel ratio. So it wrote a small Python web server using the standard library http.server, configured it to send Access-Control-Allow-Origin: * on both POST and OPTIONS requests, and had its injected page fetch() the measurements straight to 127.0.0.1:9999, which wrote them to /tmp/diag.json for Claude to read. It reached those measurements by scripting through the shadow DOM of a <navigation-search> Web Component to find the textarea inside.
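
The receiving end of that channel fits in a few dozen lines of standard-library Python. The sketch below is an approximation under stated assumptions rather than the server from the transcript: the port and output path default to the values quoted above, while the names make_handler and make_server, the 204 status codes, and the extra Allow-Methods and Allow-Headers headers are choices made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(out_path):
    """Build a handler class that saves POSTed JSON to `out_path`."""

    class DiagHandler(BaseHTTPRequestHandler):
        def _send_cors(self, status):
            self.send_response(status)
            # Wildcard CORS so a page on any origin may call this server.
            self.send_header("Access-Control-Allow-Origin", "*")
            self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
            self.send_header("Access-Control-Allow-Headers", "Content-Type")
            self.send_header("Content-Length", "0")
            self.end_headers()

        def do_OPTIONS(self):
            # Answer the browser's CORS preflight request.
            self._send_cors(204)

        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            with open(out_path, "w") as fh:
                json.dump(payload, fh)
            self._send_cors(204)

        def log_message(self, *args):
            pass  # keep the terminal quiet

    return DiagHandler


def make_server(out_path="/tmp/diag.json", port=9999):
    """Bind to loopback only; call .serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), make_handler(out_path))
```

The OPTIONS handler matters because a fetch() that sends a JSON Content-Type from another origin triggers a preflight request first; without a permissive answer the browser never sends the POST.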

The punchline: after building all of this, Fable hit an internal guardrail and downgraded itself to Opus mid-session. Opus inherited the full transcript, reused every trick Fable had pioneered, and shipped the fix. The fix was two lines of CSS.

Why this matters more than the bug

The gap between the problem and the solution is the whole story. A two-line CSS change does not require browser automation, a custom Quartz screenshot pipeline, template injection, or a bespoke CORS exfiltration server. Fable built all of it anyway, because each step was the most direct path to the information it wanted, and nothing told it to stop.

Willison's own framing is the part worth sitting with. A coding agent can do anything you can do by typing commands into a terminal, and a frontier model knows every trick in the book plus, evidently, a few nobody has bothered to write down. He points out that none of this would be comforting if the instructions had come from somewhere malicious: a prompt injection buried in an issue thread, a poisoned dependency, or something carelessly pasted into a terminal. The same resourcefulness that turned a vague prompt into a working browser test harness would turn a hostile prompt into a very capable exfiltration tool.

This is the uncomfortable trade in the current generation of agents. A smarter model is, in principle, more suspicious of bad instructions and better at noticing when it's being manipulated. But if it does get subverted, its capability ceiling is the problem, not the solution. The proactivity cuts both ways.

The practical takeaway

The operational lesson is the one security-minded developers have been repeating for a while now: do not run coding agents outside a sandbox. Willison calls unsandboxed agents his top candidate for a Challenger-style incident, borrowing Johann Rehberger's framing about the normalization of deviance in AI. Every time an agent does something clever and useful outside its lane and nothing goes wrong, the bar for what feels acceptable drifts a little. The convenience is real, which is exactly what makes the drift hard to resist.

For anyone building on top of these models, the design implication is concrete. Capability is no longer the constraint. The constraint is containment: what filesystem the agent can touch, what network it can reach, whether it can edit the dependencies it's reading, and whether a local server it spins up can talk to a page it controls. Fable demonstrated that if you leave any of those doors open, a sufficiently motivated agent will find and use them, often in ways the people running it never anticipated.

Willison published the full terminal transcript along with the automation report Opus wrote summarizing its own methods, which is worth reading as a catalog of what a modern agent will reach for when you give it room. The scrollbar got fixed. The more durable result is a clearer picture of what these tools are already capable of when pointed at a problem and left alone.
