Anthropic’s recent blog post details the sandboxing stack used in Claude.ai, Claude Code, and Claude Cowork. The post explains the isolation layers—process sandboxes, VM boundaries, filesystem restrictions, and egress controls—intended to prevent credential leakage and other data exfiltration. While the architecture is solid on paper, real‑world risk remains in API‑level vectors and misconfiguration, and the open‑source Anthropic Sandbox Runtime (SRT) still needs broader community vetting.
What Anthropic claims
Anthropic’s engineering overview says that every Claude‑powered product runs inside a deliberately limited execution environment. The key points are:
- Process‑level sandboxes – gVisor for the hosted Claude.ai service, Seatbelt on macOS and Bubblewrap on Linux for the locally‑run Claude Code, and full virtual machines (Apple Virtualization Framework on macOS, HCS on Windows) for Claude Cowork.
- Filesystem boundaries – the sandbox only mounts a read‑only view of the model’s code and a temporary directory for user‑provided files. No persistent storage is exposed.
- Egress controls – outbound network traffic is filtered so that only whitelisted endpoints (e.g., the Anthropic inference API) can be reached. Anything else is dropped.
- Credential handling – secrets are never injected into the sandbox. If a user does not explicitly provide a credential, the model has no path to retrieve it.
The accompanying diagram shows a “hard boundary” that should, in theory, stop a model from inventing a way to leak data, regardless of how creative its reasoning becomes.
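To make the filesystem and network bullets concrete, here is a minimal sketch of how such a policy could be turned into a Bubblewrap command line on Linux. The SandboxPolicy class and build_bwrap_argv function are hypothetical names invented for this illustration and are not taken from Anthropic's code; only the bwrap flags themselves (--ro-bind, --tmpfs, --proc, --dev, --unshare-pid, --unshare-net, --die-with-parent) are real Bubblewrap options.

```python
from dataclasses import dataclass, field


@dataclass
class SandboxPolicy:
    """Hypothetical high-level policy; not SRT's actual schema."""
    read_only_paths: list[str] = field(default_factory=list)
    scratch_dir: str = "/tmp/work"
    allow_network: bool = False


def build_bwrap_argv(policy: SandboxPolicy, command: list[str]) -> list[str]:
    """Translate the policy into a Bubblewrap argument vector."""
    argv = ["bwrap", "--die-with-parent", "--unshare-pid"]
    for path in policy.read_only_paths:
        # Mount each path read-only at the same location inside the sandbox.
        argv += ["--ro-bind", path, path]
    # Fresh in-memory scratch space that disappears when the sandbox exits.
    argv += ["--tmpfs", policy.scratch_dir]
    # Minimal /proc and /dev so ordinary programs can start.
    argv += ["--proc", "/proc", "--dev", "/dev"]
    if not policy.allow_network:
        # New, empty network namespace: nothing outside is reachable.
        argv.append("--unshare-net")
    # The first non-option argument starts the command to run.
    return argv + command


if __name__ == "__main__":
    policy = SandboxPolicy(read_only_paths=["/usr", "/lib", "/bin"])
    print(" ".join(build_bwrap_argv(policy, ["python3", "--version"])))
```

Note that --unshare-net removes network access entirely. Allowing only selected endpoints, as the post describes, would need an additional component such as a filtering proxy, which this sketch does not attempt.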
What’s actually new
The components themselves are not novel; gVisor, Bubblewrap, and hardware‑assisted virtualization have been used in cloud services for years. What Anthropic does differently is bundle them into a product‑specific stack and publish the design in a single, fairly detailed post. Two practical take‑aways emerge:
- Unified sandbox runtime (SRT) – Anthropic open‑sourced a thin wrapper that orchestrates the chosen isolation primitive, sets up the file and network policies, and launches the model process. The repository (github.com/anthropic/srt) includes CI scripts that verify the sandbox on macOS, Linux, and Windows.
- Cross‑product consistency – By reusing the same policy language across Claude.ai, Claude Code, and Claude Cowork, the team reduces the chance of configuration drift that could open a backdoor in one product but not the others.
Limitations and open issues
1. API‑level exfiltration remains a blind spot
Anthropic previously disclosed a bug where a malicious request to api.anthropic.com/v1/files could cause a model to write data to a user‑controlled bucket. The sandbox prevented the model from opening a raw socket, but the API endpoint itself acted as an implicit exfiltration channel. This illustrates that sandboxing the runtime does not protect against higher‑level protocol misuse.
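The gap between host-level filtering and route-level filtering can be sketched in a few lines. The rule sets below are assumptions made up for this example, not Anthropic's actual egress configuration. The point is only that a check on the hostname alone lets a request to the files endpoint through, while a check on host plus path prefix does not.

```python
from urllib.parse import urlsplit

# Illustrative rules only; the real allow-list is not published in this form.
ALLOWED_HOSTS = {"api.anthropic.com"}
ALLOWED_ROUTES = {("api.anthropic.com", "/v1/messages")}


def host_allowed(url: str) -> bool:
    """Coarse check: only the destination host is inspected."""
    return urlsplit(url).hostname in ALLOWED_HOSTS


def route_allowed(url: str) -> bool:
    """Stricter check: HTTPS only, and the host and path prefix must both match."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    return any(
        parts.hostname == host
        and (parts.path == prefix or parts.path.startswith(prefix + "/"))
        for host, prefix in ALLOWED_ROUTES
    )


if __name__ == "__main__":
    url = "https://api.anthropic.com/v1/files"
    print(host_allowed(url), route_allowed(url))
```

Even a route-level rule is only a partial mitigation: it does not look at which account or credential a request is made on behalf of, so an allowed endpoint can still carry data somewhere the user did not intend.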
2. Credential leakage via user input
The claim that “credentials never enter the sandbox” holds only if the surrounding application enforces strict input validation. If a downstream UI mistakenly forwards an API key supplied by a user, the sandbox will treat it like any other string and could inadvertently expose it in logs or error messages. The security of the overall system therefore still depends on defense‑in‑depth at the application layer.
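One application-layer safeguard of the kind this point calls for is redacting anything that looks like a credential before it reaches a log. The sketch below uses Python's standard logging.Filter; the RedactSecrets class name and the key pattern are assumptions chosen for illustration and would need adapting to the credential formats a given application actually handles.

```python
import logging
import re

# Assumed key shape for illustration; adjust to the formats you handle.
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")


class RedactSecrets(logging.Filter):
    """Replace secret-looking substrings in log records with a placeholder."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so secrets passed as args are covered too.
        message = record.getMessage()
        record.msg = SECRET_PATTERN.sub("[REDACTED]", message)
        record.args = None
        return True  # Keep the record; only its text is changed.


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RedactSecrets())
    log = logging.getLogger("app")
    log.addHandler(handler)
    log.warning("request failed for key=%s", "sk-example-0123456789")
```

Attaching the filter to the handler rather than to a single logger means records propagated from child loggers are redacted as well. Pattern matching of this kind is a backstop, not a substitute for keeping credentials out of the input path in the first place.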
3. Performance trade‑offs
Running Claude Cowork inside a full VM adds measurable latency (≈ 150 ms overhead per inference round‑trip) compared with the container‑based Claude.ai service. For interactive coding assistants this latency can be noticeable, especially on lower‑end hardware. The open‑source SRT does not yet expose a lightweight, process‑level mode for Windows comparable to Bubblewrap on Linux, forcing developers on that platform to choose between security and speed.
4. Community audit depth
The SRT codebase is modest (≈ 4 k LOC) and well‑documented, but it has not undergone a large‑scale third‑party audit. The security community tends to focus on kernel‑level exploits; a thorough review of the policy generation logic, which translates high‑level rules into concrete seccomp and firewall configurations, is still pending.
Practical implications for developers
- If you embed Claude Code in a CI pipeline on Linux runners, you can rely on Bubblewrap to isolate the model, but you should still scrub any environment variables that might contain secrets before invoking the sandbox.
- For SaaS products that expose Claude.ai via an API, consider adding a proxy layer that validates outgoing requests against an allow‑list, mitigating the risk of API‑level exfiltration.
- When evaluating Anthropic’s SRT for internal tooling, run the provided test suite (make test-sandbox) on each target OS and verify that the generated seccomp profile blocks connect syscalls to non‑whitelisted IPs.
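The environment-scrubbing advice in the first bullet can be sketched as follows. The scrubbed_env helper and its name-matching heuristic are assumptions made for this example; a real pipeline may prefer an explicit list of variables to pass through rather than a pattern of names to drop.

```python
import os
import re
import subprocess
import sys

# Name fragments that commonly indicate a secret; a heuristic, not a complete list.
SENSITIVE_NAME = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)", re.IGNORECASE)


def scrubbed_env(env=None, keep=()):
    """Return a copy of the environment without secret-looking variables.

    Names listed in `keep` are passed through even if they match the pattern.
    """
    source = os.environ if env is None else env
    return {
        name: value
        for name, value in source.items()
        if name in keep or not SENSITIVE_NAME.search(name)
    }


if __name__ == "__main__":
    # Launch a child process that sees only the scrubbed environment.
    child = [sys.executable, "-c", "import os; print(sorted(os.environ))"]
    subprocess.run(child, env=scrubbed_env(), check=True)
```

Passing the result through the env argument of subprocess.run means the child never inherits the parent's full environment, so a secret that was set for an earlier pipeline step is not visible to the sandboxed tool.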
Bottom line
Anthropic’s sandboxing strategy is a solid engineering effort that consolidates known isolation techniques into a coherent product‑wide framework. The hard boundary concept is sound, but security does not end at the sandbox wall. API design, input sanitization, and performance considerations still require careful attention. The open‑source SRT gives practitioners a foothold to experiment, yet broader peer review will be needed before the community can fully trust the stack in high‑risk environments.
Related reads
- Anthropic’s Sandbox Runtime (SRT) repository
- gVisor documentation: https://gvisor.dev
- Bubblewrap man page: https://manpages.debian.org/bubblewrap