Project Glasswing: Mythos Preview's Impact on Security Vulnerability Research

Cloudflare's Project Glasswing reveals how Mythos Preview transforms vulnerability discovery through exploit chain construction and proof generation, while highlighting the need for specialized harnesses over generic coding agents.

For the last few months, Cloudflare has been testing security-focused LLMs on its own infrastructure to identify potential vulnerabilities and understand what attackers might accomplish with the latest models. Among these, Mythos Preview from Anthropic has drawn significant attention as part of Project Glasswing. Applying Mythos Preview to over fifty repositories showed Cloudflare both what the model can do in vulnerability discovery and where it falls short.

Mythos Preview: A New Approach to Vulnerability Hunting

Mythos Preview represents a substantial advancement beyond previous general-purpose frontier models. Rather than viewing it as merely an incremental improvement, Cloudflare researchers describe it as "a different kind of tool doing a different kind of work," making direct comparisons with earlier models challenging.

Two key capabilities distinguish Mythos Preview:

Exploit Chain Construction

Real-world attacks rarely rely on a single vulnerability. Instead, they combine multiple attack primitives into working exploits. For example, an attacker might:

Convert a use-after-free bug into arbitrary read/write capabilities
Hijack control flow
Implement return-oriented programming (ROP) chains to gain system control

Mythos Preview can take these primitives and reason about combining them into functional exploits. The reasoning process resembles that of a senior security researcher rather than an automated scanner.
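One way to picture this chaining is as a small planning problem over capabilities: each primitive needs some capabilities and yields others. The sketch below is purely illustrative. The `Primitive` type, the capability names, and `plan_chain` are hypothetical, and nothing here reflects how the model reasons internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """An abstract attack primitive: what it needs and what it yields."""
    name: str
    requires: frozenset
    grants: frozenset


def plan_chain(primitives, start, goal):
    """Greedily order primitives until `goal` is held, or return None."""
    held = set(start)
    chain = []
    remaining = list(primitives)
    while goal not in held:
        usable = [p for p in remaining
                  if p.requires <= held and not p.grants <= held]
        if not usable:
            return None  # no primitive makes progress, so the chain is incomplete
        step = usable[0]
        chain.append(step.name)
        held |= step.grants
        remaining.remove(step)
    return chain


# Hypothetical primitives mirroring the three steps listed above.
primitives = [
    Primitive("rop_chain", frozenset({"control_flow"}), frozenset({"system_control"})),
    Primitive("uaf_to_rw", frozenset({"use_after_free"}), frozenset({"arbitrary_rw"})),
    Primitive("hijack_flow", frozenset({"arbitrary_rw"}), frozenset({"control_flow"})),
]

chain = plan_chain(primitives, start={"use_after_free"}, goal="system_control")
print(chain)
```

A scanner that reports each primitive in isolation corresponds to returning the unordered list; the chain only has value once every step's requirements are met by the steps before it.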

Proof Generation

Finding a vulnerability and confirming its exploitability represent distinct challenges. Mythos Preview addresses both by:

Writing code to trigger suspected bugs
Compiling this code in a controlled environment
Executing the code to validate the hypothesis
Adjusting and retrying when the initial attempt fails

This iterative process transforms speculative findings into confirmed vulnerabilities, closing a critical gap in automated security analysis.
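The write, compile, run, and retry cycle above can be sketched as a bounded loop. This is a minimal sketch under assumptions: `write_poc` stands in for the model producing or adjusting trigger code, and `build_and_run` stands in for a sandboxed compile-and-execute step. Both are hypothetical callables, not part of any real API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Attempt:
    source: str      # candidate proof-of-concept code
    triggered: bool  # did running it reproduce the suspected bug?
    log: str         # output fed back into the next attempt


def confirm_hypothesis(
    write_poc: Callable[[Optional[Attempt]], str],
    build_and_run: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Optional[Attempt]:
    """Return the first attempt that triggers the bug, or None if none does."""
    previous = None
    for _ in range(max_attempts):
        source = write_poc(previous)            # write, or adjust after a failure
        triggered, log = build_and_run(source)  # compile and execute in isolation
        previous = Attempt(source, triggered, log)
        if triggered:
            return previous                     # hypothesis confirmed
    return None                                 # finding stays speculative


# Stubs for illustration: the second attempt succeeds.
def fake_write(prev):
    return "poc_v1" if prev is None else "poc_v2"


def fake_run(source):
    return (source == "poc_v2", f"ran {source}")


result = confirm_hypothesis(fake_write, fake_run)
print(result)
```

Capping the number of attempts keeps a wrong hypothesis from consuming the budget, and a `None` result marks the finding as unconfirmed rather than silently dropping it.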

While other frontier models identified similar underlying vulnerabilities, they consistently fell short in chaining these vulnerabilities into complete exploits. Models would describe bugs thoroughly but leave the actual exploit construction unfinished.

Model Refusals and Inconsistent Behavior

Mythos Preview lacks the additional safeguards present in generally available models, yet it exhibits emergent guardrails that cause it to refuse certain legitimate security research requests.

However, these refusals lack consistency:

The model initially refused to analyze a specific project but agreed after changes to the project's environment that had no bearing on the request
After confirming memory vulnerabilities, the model refused to write demonstration exploits
Identical requests sometimes produced different results across runs due to the probabilistic nature of the model

This inconsistency means that while organic refusals exist, they cannot serve as complete safety boundaries for general use. Any future broadly available cyber frontier model will require additional safeguards beyond these emergent behaviors.

The Signal-to-Noise Challenge

Vulnerability triage was difficult even in the pre-AI era, and AI-powered scanners have made it harder. Two factors drive most of the noise:

Programming Language Impact

Memory-unsafe languages like C and C++ introduce vulnerability classes, such as buffer overflows and out-of-bounds accesses, that memory-safe languages largely rule out by design. Cloudflare observed consistently more false positives in projects using these languages.

Model Bias

Unlike human researchers, who state how confident they are in a finding, models tend to hedge with qualifiers like "possibly," "potentially," and "could in theory." These speculative findings vastly outnumber confirmed vulnerabilities, creating significant triage overhead.

Mythos Preview improves this situation by chaining primitives into working proofs of concept rather than reporting vulnerabilities in isolation. Findings accompanied by executable proofs require less validation time and reduce the "is this even real?" questioning that consumes security teams' attention.
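To illustrate the triage effect, the sketch below orders a queue so that findings with an executable proof come first and heavily hedged findings come last. The finding fields (`id`, `summary`, `poc`) and the sample data are hypothetical; only the hedge phrases come from the text above.

```python
import re

# Hedge phrases quoted in the text above.
HEDGES = re.compile(r"\b(possibly|potentially|could in theory)\b", re.IGNORECASE)


def triage_key(finding):
    """Sort key: findings with a proof first, then fewest hedging phrases."""
    has_proof = bool(finding.get("poc"))
    hedge_count = len(HEDGES.findall(finding["summary"]))
    return (not has_proof, hedge_count)


# Hypothetical findings for illustration only.
findings = [
    {"id": "A", "summary": "This could in theory possibly overflow.", "poc": None},
    {"id": "B", "summary": "Heap overflow in parser; PoC crashes the target.", "poc": "poc_b.c"},
    {"id": "C", "summary": "Length check is potentially bypassable.", "poc": None},
]

queue = sorted(findings, key=triage_key)
print([f["id"] for f in queue])
```

Counting hedge words is a crude heuristic; the stronger signal is whether a proof exists at all, which is why it is the first element of the key.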

Why Generic Coding Agents Fail for Vulnerability Research

Cloudflare's initial approach—pointing generic coding agents at repositories to discover vulnerabilities—proved inadequate despite producing findings. Two fundamental limitations explain this:

Context Mismatch

Coding agents are optimized for focused, sequential work: building features, fixing bugs, or refactoring code. They maintain a single hypothesis and iterate against it. This approach misaligns with vulnerability research, which requires narrow, parallel investigation across many potential issues.

Human researchers investigate specific features, security boundaries, or vulnerability classes thoroughly before moving to the next. A single agent session against a large repository might cover only a fraction of a percent of the surface before context limitations force compaction that could discard critical earlier findings.

Throughput Limitations

Real codebases require simultaneous hypothesis testing across multiple components, with the ability to expand investigation when promising leads emerge. While single-agent approaches work for manual investigation with pre-existing leads, they cannot achieve comprehensive coverage.

The Necessity of Specialized Harnesses

Through extensive testing, Cloudflare identified four key insights that led to the development of a specialized harness:

Narrow scope produces better findings: Specific guidance ("Look for command injection in this function with this trust boundary") yields results closer to human researcher behavior than broad directives.
Adversarial review reduces noise: A second agent with different prompts and no ability to generate its own findings catches many issues the first agent would miss when self-reviewing.
Splitting the chain across agents improves reasoning: Asking "Is this code buggy?" and "Can an attacker reach this bug?" as separate questions produces better results than combining them.
Parallel narrow tasks beat exhaustive agents: Coverage improves when multiple agents work on tightly scoped questions with deduplication afterward, rather than expecting one agent to be exhaustive.
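The first and third insights can be combined in a small sketch: one narrowly scoped task produces two separate prompts, and a finding only counts when both questions come back positive. The `ScopedQuestion` type, its field names, and the prompt wording are assumptions for illustration, not Cloudflare's actual prompts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedQuestion:
    attack_class: str
    target: str
    trust_boundary: str

    def bug_prompt(self) -> str:
        """Question 1: is the code itself buggy?"""
        return (f"Look for {self.attack_class} in {self.target}. "
                f"Trust boundary: {self.trust_boundary}. Is this code buggy?")

    def reach_prompt(self) -> str:
        """Question 2, asked separately: can an attacker get there?"""
        return (f"Assume a {self.attack_class} bug exists in {self.target}. "
                f"Can an attacker reach it across this boundary: "
                f"{self.trust_boundary}?")


def is_actionable(is_buggy: bool, is_reachable: bool) -> bool:
    """A finding matters only if both independent answers are yes."""
    return is_buggy and is_reachable


# Hypothetical target and boundary.
question = ScopedQuestion(
    attack_class="command injection",
    target="run_backup() in backup.py",
    trust_boundary="HTTP request body to shell",
)
print(question.bug_prompt())
print(question.reach_prompt())
```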

Cloudflare's Vulnerability Discovery Harness

Cloudflare implemented a multi-stage harness that leverages Mythos Preview's strengths while addressing its limitations:

Recon Stage

An agent reads the repository top-down, deploying subagents for each subsystem to produce an architecture document covering build commands, trust boundaries, entry points, and likely attack surface. This shared context prevents downstream agents from wandering aimlessly.
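As a rough sketch of what the shared architecture document might look like as data, the example below merges per-subsystem notes into one record. The four categories come from the description above; the `SubsystemNotes` type, the `merge_architecture` helper, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field

CATEGORIES = ("build_commands", "trust_boundaries", "entry_points", "attack_surface")


@dataclass
class SubsystemNotes:
    """What one recon subagent reports for its subsystem."""
    name: str
    build_commands: list = field(default_factory=list)
    trust_boundaries: list = field(default_factory=list)
    entry_points: list = field(default_factory=list)
    attack_surface: list = field(default_factory=list)


def merge_architecture(notes):
    """Combine subagent notes into one document, dropping repeated items."""
    doc = {category: [] for category in CATEGORIES}
    for subsystem in notes:
        for category in CATEGORIES:
            for item in getattr(subsystem, category):
                if item not in doc[category]:
                    doc[category].append(item)
    doc["subsystems"] = [subsystem.name for subsystem in notes]
    return doc


# Hypothetical notes from two subagents.
notes = [
    SubsystemNotes("api", build_commands=["make"], entry_points=["POST /upload"]),
    SubsystemNotes("decoder", build_commands=["make"], attack_surface=["image parsing"]),
]
architecture = merge_architecture(notes)
print(architecture)
```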

Hunt Stage

Each task combines an attack class with a scope hint. Multiple hunters run concurrently (typically around fifty), each exploring with subagents and tools to compile and run proof-of-concept code in per-task scratch directories.
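A minimal sketch of this fan-out, assuming a hypothetical `hunt` callable that stands in for one hunter agent: tasks are the cross product of attack classes and scope hints, each task gets its own scratch directory, and a thread pool caps concurrency at fifty. The attack classes and paths are made up for the example.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from pathlib import Path


def build_tasks(attack_classes, scope_hints):
    """One task per (attack class, scope hint) pair."""
    return [{"attack_class": a, "scope": s}
            for a, s in product(attack_classes, scope_hints)]


def run_hunts(tasks, hunt, max_workers=50):
    """Run hunters concurrently, each in its own scratch directory."""
    with tempfile.TemporaryDirectory() as root:
        def run_one(indexed_task):
            index, task = indexed_task
            scratch = Path(root) / f"task-{index}"
            scratch.mkdir()  # isolated space for compiling and running PoCs
            return hunt(task, scratch)

        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map preserves task order in the results
            return list(pool.map(run_one, enumerate(tasks)))


# Stub hunter for illustration: it only reports that its scratch dir exists.
def stub_hunt(task, scratch):
    return {"task": task, "scratch_ok": scratch.is_dir()}


tasks = build_tasks(["command injection", "path traversal"], ["src/api", "src/cli"])
results = run_hunts(tasks, stub_hunt)
print(len(results))
```

Threads are a reasonable choice here on the assumption that each hunter spends most of its time waiting on model and tool calls rather than on local CPU work.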

Validate Stage

An independent agent re-examines the code attempting to disprove original findings. Using different prompts and lacking the ability to emit new findings, this stage catches significant noise the hunter would miss when self-reviewing.
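The key constraint, that the validator can reject but never add, can be enforced by the shape of its interface. In this sketch the reviewer is a hypothetical `try_to_disprove` callable that may only return a verdict, so the output is always a subset of the input. The names and the sample findings are assumptions.

```python
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "confirmed"
    REJECTED = "rejected"


def validate(findings, try_to_disprove):
    """Keep only the findings the reviewer fails to disprove."""
    surviving = []
    for finding in findings:
        verdict = try_to_disprove(finding)
        if not isinstance(verdict, Verdict):
            # The reviewer has no channel for emitting new findings.
            raise TypeError("validator may only return a Verdict")
        if verdict is Verdict.CONFIRMED:
            surviving.append(finding)
    return surviving


# Stub reviewer for illustration: rejects anything whose PoC did not reproduce.
def stub_reviewer(finding):
    return Verdict.CONFIRMED if finding["poc_reproduces"] else Verdict.REJECTED


candidates = [
    {"id": 1, "poc_reproduces": True},
    {"id": 2, "poc_reproduces": False},
]
kept = validate(candidates, stub_reviewer)
print([f["id"] for f in kept])
```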

Gapfill Stage

Hunters flag areas they touched but didn't cover thoroughly, which get re-queued for additional passes. This counteracts the model's tendency to drift toward attack classes where it has already found success.
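The re-queueing can be sketched as a work queue where each hunt returns both its findings and the areas it only skimmed. The `hunt` callable and its return shape are hypothetical, and the `max_passes` cap is an added assumption so that an area flagged repeatedly cannot loop forever.

```python
from collections import deque


def run_with_gapfill(initial_scopes, hunt, max_passes=3):
    """Hunt each scope, re-queueing under-covered areas up to max_passes deep."""
    queue = deque((scope, 1) for scope in initial_scopes)
    findings = []
    while queue:
        scope, pass_number = queue.popleft()
        found, undercovered = hunt(scope)
        findings.extend(found)
        if pass_number < max_passes:
            queue.extend((gap, pass_number + 1) for gap in undercovered)
    return findings


# Stub hunter for illustration: the first pass flags one area it only skimmed.
def stub_hunt(scope):
    outcomes = {
        "parser": (["finding-1"], ["parser/utf8"]),
        "parser/utf8": (["finding-2"], []),
    }
    return outcomes[scope]


all_findings = run_with_gapfill(["parser"], stub_hunt)
print(all_findings)
```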

Dedupe Stage

Findings sharing root causes collapse into single records. Variant analysis becomes a feature rather than a way to inflate the queue with duplicates.
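A minimal sketch of the collapse, assuming a root cause can be keyed by file, function, and bug class. That key is an assumption for illustration; the text does not say how the harness actually decides that two findings share a cause.

```python
from collections import defaultdict


def dedupe(findings):
    """Collapse findings with the same root-cause key into one record."""
    groups = defaultdict(list)
    for finding in findings:
        key = (finding["file"], finding["function"], finding["bug_class"])
        groups[key].append(finding)

    records = []
    for (file, function, bug_class), members in groups.items():
        records.append({
            "root_cause": {"file": file, "function": function, "bug_class": bug_class},
            # Variants are kept as evidence instead of separate queue entries.
            "variants": [member["id"] for member in members],
        })
    return records


# Hypothetical findings: two hunters hit the same bug via different inputs.
raw = [
    {"id": "h1-3", "file": "decode.c", "function": "read_chunk", "bug_class": "heap overflow"},
    {"id": "h7-1", "file": "decode.c", "function": "read_chunk", "bug_class": "heap overflow"},
    {"id": "h2-5", "file": "auth.c", "function": "check_token", "bug_class": "timing leak"},
]
records = dedupe(raw)
print(records)
```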

Trace Stage

For confirmed findings in shared libraries, tracer agents fan out across consumer repositories using cross-repo symbol indices to determine whether attacker-controlled input can actually reach the vulnerability from outside the system.
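At its core the reachability question is a graph search: starting from externally exposed entry points, is there a call path to the vulnerable symbol? The sketch below uses breadth-first search over a toy call graph. The graph, the symbol names, and `find_trace` are hypothetical, and a real tracer would also have to check that attacker-controlled data survives along the path.

```python
from collections import deque


def find_trace(call_graph, entry_points, vulnerable_symbol):
    """Return the shortest call path from an entry point to the symbol, or None."""
    queue = deque([entry] for entry in entry_points)
    seen = set(entry_points)
    while queue:
        path = queue.popleft()
        if path[-1] == vulnerable_symbol:
            return path
        for callee in call_graph.get(path[-1], ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


# Toy cross-repo call graph: caller -> callees.
call_graph = {
    "api.handle_upload": ["api.parse"],
    "api.parse": ["libimg.decode"],
    "cron.cleanup": ["libfs.remove"],
}

trace = find_trace(call_graph, ["api.handle_upload"], "libimg.decode")
print(trace)
```

A trace that returns `None` does not prove the bug is unreachable, only that the index shows no path, so it lowers priority rather than closing the finding.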

Feedback Stage

Reachable traces become new hunt tasks in consumer repositories where the bug is exposed, closing the loop and improving the pipeline over time.

Report Stage

An agent generates structured reports against predefined schemas, fixing validation errors itself, and submits to an ingest API. This produces queryable data rather than free-form prose.
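The validate-and-repair loop can be sketched as follows. The schema fields, the `repair` callable (standing in for the agent fixing its own output), and the round limit are all assumptions; the real schema and ingest API are not described in the text.

```python
# Hypothetical report schema: field name -> required type.
SCHEMA = {"title": str, "severity": str, "file": str, "line": int}


def validation_errors(report, schema=SCHEMA):
    """List every way the report fails the schema."""
    errors = []
    for field_name, expected in schema.items():
        if field_name not in report:
            errors.append(f"missing field: {field_name}")
        elif not isinstance(report[field_name], expected):
            errors.append(f"{field_name}: expected {expected.__name__}")
    return errors


def finalize(report, repair, max_rounds=3):
    """Let the agent repair its report until it validates or rounds run out."""
    errors = validation_errors(report)
    rounds = 0
    while errors and rounds < max_rounds:
        report = repair(report, errors)
        errors = validation_errors(report)
        rounds += 1
    if errors:
        raise ValueError(f"report still invalid: {errors}")
    return report  # safe to submit as structured, queryable data


# Stub repair step for illustration: coerce the line number to an integer.
def stub_repair(report, errors):
    fixed = dict(report)
    fixed["line"] = int(fixed["line"])
    return fixed


draft = {"title": "Heap overflow in decoder", "severity": "high",
         "file": "decode.c", "line": "128"}
final = finalize(draft, stub_repair)
print(final)
```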

Implications for Security Teams

Many security teams are responding to AI-enhanced threats by attempting to compress response cycles, with some operating under two-hour SLAs from CVE release to patch in production.

However, simply patching faster addresses symptoms rather than causes. If regression testing takes a day, achieving a two-hour SLA requires skipping it—a practice that often introduces worse bugs than those being patched. Cloudflare learned this when models generated patches that fixed the original vulnerability while breaking dependent functionality.

The more fundamental solution involves architectural changes that make exploitation harder regardless of vulnerabilities:

Implementing defenses that block bugs from being reached
Designing applications so flaws in one component don't provide access to others
Enabling simultaneous deployment of fixes across all instances of code

Cloudflare notes that these principles align with its product architecture, which protects millions of applications on the Internet. It plans to share more about customer implications in the coming weeks.

For teams conducting similar research, Cloudflare welcomes comparison notes at [email protected]. All vulnerabilities discovered during this research were triaged, validated, and remediated through Cloudflare's formal vulnerability management process.

This research represents a significant step forward in understanding how specialized AI models can transform security vulnerability discovery while highlighting the critical importance of proper implementation through specialized harnesses rather than direct model interaction.
