Anthropic Ships Claude Fable 5 as Two Models, Splitting Public Access From Cyber Capability
#Cybersecurity

Security Reporter

Anthropic released its most capable model on June 9, then did something unusual: it shipped one model as two products. Fable 5 goes to everyone with a wall of cyber classifiers in front of it. Its uncensored twin, Mythos 5, stays locked to vetted defenders. The split reveals what these models can now do to software, and how little time defenders have to react.

On June 9, Anthropic made Claude Fable 5 generally available, calling it the most capable model it has ever built. The release itself was expected. What was not expected was its structure: Anthropic shipped one underlying model as two separate products, divided not by raw capability but by a layer of safety classifiers.

Fable 5 is the public version. Its twin, Claude Mythos 5, runs the same model with the cyber safeguards lifted and stays restricted to a vetted group of cyber defenders and critical infrastructure operators. Anthropic describes Mythos 5 as the strongest cybersecurity model in the world, and the company's willingness to gate it tells you how seriously it takes the capability behind both.

The practical difference comes down to routing. When Fable 5 receives a request flagged for cyber, biology, chemistry, or distillation, it hands the request off to the weaker Claude Opus 4.8, which answers in its place. Mythos 5 keeps the cyber capabilities live for users who have been cleared. Both models cost $10 per million input tokens and $50 per million output tokens, less than half the price of the earlier Mythos Preview. Fable 5 is on the Claude API now, and it is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost through June 22 before it shifts to usage credits.
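
To put the list prices in concrete terms, here is a minimal Python sketch that estimates the cost of a single request at $10 per million input tokens and $50 per million output tokens. The token counts in the example are hypothetical, and the sketch covers only the per-token prices quoted above, not subscription plans or usage credits.

```python
# Per-token list prices for Fable 5 and Mythos 5, as reported in this article.
INPUT_USD_PER_MILLION = 10.0
OUTPUT_USD_PER_MILLION = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of one request in US dollars."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * INPUT_USD_PER_MILLION
            + output_tokens * OUTPUT_USD_PER_MILLION) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agentic session: 200,000 tokens in, 20,000 tokens out.
    print(f"${request_cost_usd(200_000, 20_000):.2f}")  # prints $3.00
```

Because output tokens cost five times as much as input tokens, long generated answers dominate the bill even when the prompt is much larger.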

How the cyber classifiers actually work

The reason for the split is blunt. Mythos-class models find and exploit software vulnerabilities well enough that, in Anthropic's view, releasing that capability to the general public without controls would hand attackers a serious advantage.

The control mechanism is a set of classifiers, which are separate AI systems trained to watch for misuse and jailbreak attempts. When a request trips one, Fable 5 does not simply refuse. Instead, Opus 4.8 generates the response, and the user is told the handoff happened. That transparency matters, because a silent downgrade would leave developers debugging phantom capability gaps.

Four categories get flagged. Distillation is the outlier among them: it refers to extracting a model's capabilities to train a competing model, which Anthropic blocks to keep near-frontier abilities from leaking out without their safeguards attached. The cybersecurity classifier is the broad one. Anthropic built it to block not just exploit development but offensive cyber work in general, including reconnaissance, discovery, lateral movement, and the agentic steps that string together into a real intrusion.
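
The routing decision can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Anthropic's implementation: the classifier flags are taken as given inputs (the real classifiers are separate AI systems), the labels `fable-5`, `mythos-5`, and `opus-4.8` are hypothetical placeholders rather than real API model identifiers, and treating Mythos 5 as lifting only the cyber flag follows the article's description of it. A `block` mode mirrors the internal evaluation setting in which Fable 5 refused instead of falling back.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Category(Enum):
    """The four request categories the article says Fable 5 screens for."""
    CYBER = "cyber"
    BIOLOGY = "biology"
    CHEMISTRY = "chemistry"
    DISTILLATION = "distillation"


@dataclass
class Routed:
    model: Optional[str]   # hypothetical label of the answering model; None if refused
    fell_back: bool        # True when a classifier fired and the request was rerouted
    notice: Optional[str]  # message surfaced to the user, if any


def route(flags: Set[Category], mode: str = "fallback",
          cyber_cleared: bool = False) -> Routed:
    """Decide which model answers, given the classifier flags a request tripped.

    mode="fallback" mirrors the public behavior (reroute and tell the user).
    mode="block" mirrors the evaluation setting (refuse outright).
    cyber_cleared models a vetted Mythos 5 user. Assumption: only the cyber
    flag is lifted, so other categories still fall back.
    """
    if mode not in ("fallback", "block"):
        raise ValueError(f"unknown mode: {mode!r}")
    active = set(flags)
    if cyber_cleared:
        active.discard(Category.CYBER)
    primary = "mythos-5" if cyber_cleared else "fable-5"
    if not active:
        return Routed(model=primary, fell_back=False, notice=None)
    names = ", ".join(sorted(c.value for c in active))
    if mode == "block":
        return Routed(model=None, fell_back=False, notice=f"Request blocked ({names}).")
    # The handoff is disclosed rather than silent, so callers can see why quality changed.
    return Routed(model="opus-4.8", fell_back=True,
                  notice=f"Answered by a fallback model ({names} classifier fired).")


if __name__ == "__main__":
    print(route({Category.CYBER}))
    print(route({Category.CYBER}, cyber_cleared=True))
```

Returning a `notice` alongside the answer is the point of the design: a caller can log every fallback and tell a real capability gap apart from a classifier handoff.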

In an internal evaluation where Fable 5 was set to block rather than fall back, and where testers did not attempt to evade the safeguards, the classifiers stopped the model from making any progress on those tasks. One external partner reported that Fable 5 complied with zero harmful single-turn requests covering cyberattack planning, exploit development, or defense evasion, and that it held up against 30 different public jailbreak techniques.

The cost of tuning safeguards this aggressively is false positives. Anthropic set the thresholds conservatively to ship on schedule, so the system sometimes catches harmless requests. The company says fallback fires in under 5% of all sessions, meaning more than 95% of the time Fable 5 behaves like the unrestricted Mythos 5. Read that number carefully, though: it counts every fallback, genuine blocks included, so it represents the ceiling on total disruption rather than a clean false-positive rate. Anthropic says it will tighten the safeguards and reduce false positives after launch.
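
The difference between a fallback rate and a false-positive rate is easiest to see with numbers. In this Python sketch the counts are hypothetical (the article publishes only the under-5% ceiling, not a breakdown), and it simplifies by treating each fallback session as either a genuine block or a false positive. It shows that the false-positive rate can never exceed the fallback rate, and equals it only when no fallback was a genuine block.

```python
from typing import Dict


def fallback_breakdown(sessions: int, fallbacks: int,
                       genuine_blocks: int) -> Dict[str, float]:
    """Split a headline fallback rate into its two components.

    sessions:       total sessions observed
    fallbacks:      sessions in which a fallback fired
    genuine_blocks: fallback sessions that really were misuse (requires labels)
    """
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    if not 0 <= genuine_blocks <= fallbacks <= sessions:
        raise ValueError("expected 0 <= genuine_blocks <= fallbacks <= sessions")
    false_positives = fallbacks - genuine_blocks
    return {
        "fallback_rate": fallbacks / sessions,
        # Always <= fallback_rate; equal only when no fallback was a genuine block.
        "false_positive_rate": false_positives / sessions,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration, not Anthropic's data.
    print(fallback_breakdown(sessions=1_000, fallbacks=40, genuine_blocks=25))
```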

On robustness the figures are concrete. An external bug bounty ran for more than 1,000 hours and produced no universal jailbreak, meaning no single prompt or harness that strips the safeguards wholesale. External red teams found none on long-form agentic tasks either. Anthropic does state one caveat plainly: the UK's AI Security Institute made progress toward a universal jailbreak within a brief initial testing window. The company concedes it is probably impossible to fully prevent universal jailbreaks, and frames its goal as making any that survive slow and expensive enough to detect before they get used at scale.

Why the capability counts as a threat

The argument for handling this model carefully was first laid out in April, when Anthropic released Claude Mythos Preview to a limited group through Project Glasswing. The technical write-up from Anthropic's red team is the document worth reading in full.

During testing, Mythos Preview identified and exploited zero-day vulnerabilities in every major operating system and every major web browser when a user directed it to do so. The oldest bug it surfaced was a 27-year-old flaw in OpenBSD, a system whose reputation rests largely on its security record. It also autonomously wrote a remote code execution exploit against FreeBSD's NFS server, working from a 17-year-old bug now tracked as CVE-2026-4747.

Here the framing diverges in a way defenders should notice. Anthropic describes the result as full root for an unauthenticated attacker from anywhere on the internet. The NVD entry is more measured, noting the stack overflow itself does not require the client to authenticate, but framing kernel code execution as reachable by an attacker who can send packets to the NFS server while the kgssapi.ko module is loaded. The gap between those two descriptions is the difference between a marketing line and a precondition you can actually check on your own systems.
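
Checking the NVD-stated precondition on a FreeBSD host can start with looking for `kgssapi.ko` among the loaded kernel files. This Python sketch parses the default output of FreeBSD's `kldstat(8)`. It is a rough triage aid under stated assumptions, not a vulnerability scanner: it checks only that one precondition, not whether the NFS server is running or whether the CVE-2026-4747 fix is installed, and a kernel built with the GSS-API code compiled in rather than loaded as a module may not list `kgssapi.ko` at all.

```python
import shutil
import subprocess
from typing import Set


def loaded_kernel_files(kldstat_output: str) -> Set[str]:
    """Collect the file names (last column) from FreeBSD `kldstat` output."""
    names = set()
    for line in kldstat_output.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if fields:
            names.add(fields[-1])
    return names


def kgssapi_loaded(kldstat_output: str) -> bool:
    """True if kgssapi.ko appears as a separately loaded kernel file."""
    return "kgssapi.ko" in loaded_kernel_files(kldstat_output)


if __name__ == "__main__":
    if shutil.which("kldstat") is None:
        print("kldstat not found; this check only applies to FreeBSD hosts")
    else:
        out = subprocess.run(["kldstat"], capture_output=True, text=True,
                             check=True).stdout
        print("kgssapi.ko loaded:", kgssapi_loaded(out))
```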

By Anthropic's own account, it did not explicitly train these offensive capabilities into the model. They emerged as a side effect of general improvements in code, reasoning, and autonomy, the same gains that make the model better at writing patches. That dual nature is the whole problem in miniature.

The red team's central warning is precise, and easy to overstate if you are not careful. Mitigations whose security value comes from friction rather than hard barriers get much weaker against a model that grinds through tedious exploitation steps at scale. Hard technical barriers such as KASLR and W^X still raise an attacker's cost. The warning is aimed specifically at defenses that lean on attacker patience or manual effort, because the model can now supply both on demand. Mythos 5 carries these skills forward, and Anthropic says users will find it comparable to or somewhat stronger than Mythos Preview.

The defender's real bottleneck

The defensive upside is not theoretical. In the first weeks of Project Glasswing, Anthropic and roughly 50 partners used Mythos Preview to find more than ten thousand high- or critical-severity vulnerabilities in systemically important software. Cloudflare alone reported 2,000 bugs, 400 of them high- or critical-severity. Mozilla found and fixed 271 in Firefox 150, more than ten times what it caught in Firefox 148 using the older Opus 4.6.

That flood is also the catch. Finding bugs is now cheap and fast. Verifying them, triaging them, and writing patches is neither, and that work still runs on human time. Anthropic reports that open-source maintainers, many already buried under low-quality AI-generated bug reports, have asked the company to slow its disclosures because they cannot write fixes fast enough. Within Glasswing, a high- or critical-severity bug found by the model takes about two weeks to patch on average. The bottleneck has shifted from discovery to remediation, and the window between a public disclosure and a deployed patch is exactly where attackers operate.

The red team's N-day experiments sharpen the timeline. Starting from nothing but a disclosed CVE and its patch, Mythos Preview built working Linux privilege-escalation exploits in under a day each, at a few thousand dollars or less in compute. For defenders the lesson is the old one on a much shorter clock: assume a high-severity CVE can become a working exploit within hours of disclosure rather than weeks.

In practice that means a few concrete shifts. Prioritize auto-update paths for internet-facing systems, since those are the assets where the disclosure-to-exploit window matters most. Treat dependency bumps that carry CVE fixes as time-sensitive work, not backlog you get to eventually. And keep MFA and comprehensive logging as the baseline, so that a single missed patch is not the only barrier between an attacker and the rest of the network. Anthropic has also opened a Cyber Verification Program that lets vetted security professionals run legitimate offensive work on its models without the cyber safeguards in the way.
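
One way to turn those priorities into a work order is a simple sort over pending updates. This Python sketch is illustrative only: the `PendingUpdate` fields, the severity labels, and the example asset names are hypothetical, and ranking any CVE fix ahead of routine updates, then internet-facing systems ahead of internal ones, is one reasonable reading of the advice above rather than a standard.

```python
from dataclasses import dataclass
from typing import List, Optional

# Lower rank sorts earlier; anything unknown or missing sorts last.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class PendingUpdate:
    asset: str                   # hypothetical asset name
    internet_facing: bool
    cve_severity: Optional[str]  # highest severity the update fixes; None if no CVE


def triage(updates: List[PendingUpdate]) -> List[PendingUpdate]:
    """Order updates: CVE fixes first, internet-facing first, then by severity."""
    def key(u: PendingUpdate):
        fixes_cve = u.cve_severity is not None
        severity = SEVERITY_RANK.get(u.cve_severity, len(SEVERITY_RANK))
        # False sorts before True, so negating the flags puts them first.
        return (not fixes_cve, not u.internet_facing, severity, u.asset)
    return sorted(updates, key=key)


if __name__ == "__main__":
    queue = [
        PendingUpdate("intranet-wiki", False, None),
        PendingUpdate("build-server", False, "critical"),
        PendingUpdate("edge-proxy", True, "high"),
        PendingUpdate("marketing-site", True, None),
    ]
    for u in triage(queue):
        print(u.asset)
```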

A new 30-day retention requirement

Anthropic is also changing how it handles data for Mythos-class models, and this part deserves attention from anyone with strict data-handling obligations. The company will require 30-day retention for all traffic on Fable 5, Mythos 5, and future models at this capability level, across both first- and third-party surfaces.

Anthropic says it will not use the data for training or any non-safety purpose, will log all human access, and will delete the data after 30 days unless a safety investigation or legal obligation requires holding it longer. The stated rationale is defensive: retained traffic helps detect novel attacks and jailbreaks that play out across many requests rather than in a single prompt. Teams routing sensitive traffic through these models will want to account for that window before they commit, because the retention applies regardless of how the request reaches the model.
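
For planning, the retention rule reduces to date arithmetic: traffic sent at a given moment should be treated as held for at least 30 days, and longer if a hold applies. This Python sketch is a hypothetical helper for that calculation, not anything Anthropic provides, and `hold_until` stands in for a safety investigation or legal hold whose length the article does not specify.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = timedelta(days=30)


def retained_until(received_at: datetime,
                   hold_until: Optional[datetime] = None) -> datetime:
    """Earliest moment retained traffic would be deleted under a 30-day policy.

    A hold can only extend retention, never shorten it.
    """
    base = received_at + RETENTION
    if hold_until is not None and hold_until > base:
        return hold_until
    return base


if __name__ == "__main__":
    # Hypothetical request sent at noon UTC on 9 June 2026.
    sent = datetime(2026, 6, 9, 12, 0, tzinfo=timezone.utc)
    print(retained_until(sent).isoformat())
```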

Looking past launch, Anthropic plans to widen Mythos 5 access through a trusted-access program, and says that once compute capacity catches up it intends to fold Fable 5 back into subscription plans without the usage-credit premium that begins after June 22.

The broader question this launch raises is the one Anthropic has been circling since April. Similarly capable models from other labs are on the way, and not all of them will ship with a wall of classifiers in front of the cyber capability. The defensive head start that Glasswing was built to create only pays off if the rest of the industry actually uses it, and the maintainer backlog suggests the harder constraint may not be the models at all, but the human capacity to act on what they find.