Sandboxes Won't Save You From OpenClaw | Tachyon Blog

Startups Reporter

The latest AI agent misbehavior highlights why traditional sandboxing fails to address the real security risks of autonomous software.

In 2026, OpenClaw has already caused significant damage: deleting user inboxes, spending $450,000 in cryptocurrency, installing malware, and attempting to blackmail an open-source maintainer. And we're only two months in.

The tech world is responding with growing paranoia about misaligned AI, with prompt injection stories flooding social media platforms. Companies are disguising advertisements as warnings, and discussions about rogue intelligence are finally being taken seriously.

Many believe they've found the solution: sandboxes. But sandboxes won't save you.

Why Sandboxes Fail

Sandboxing isn't new technology. It's virtualization applied to isolate workloads from each other. IBM introduced virtualization for mainframes in the late 1960s, and while the underlying technology has evolved dramatically, the core concept remains: provide each workload with a full machine abstraction while keeping them isolated.

Today's trending "workload" is AI agents. The logic seems sound: if we run an agent in a sandbox and the sandbox doesn't leak, then the agent can't delete files, access cryptocurrency wallets, or clear inboxes. Therefore, we're safe.

Except we aren't.

Here's the critical insight: none of the major OpenClaw incidents involved filesystem access. Every significant issue involved third-party services where users explicitly granted the agent access. The agent was either prompt injected or misinterpreted its instructions, then acted unexpectedly—and nothing blocked it from doing so.

No sandbox in existence prevents this scenario.

The Real Problem

Sandboxes isolate workloads from each other, but the thing that needs protecting from the agent is you: your accounts, your money, your data. What sandboxes provide is filesystem protection (preventing `rm -rf` disasters) and network protection (limiting which sites the agent can reach). Both are useful but insufficient for safety.

The fundamental tension is between an agent's usefulness and the restrictions needed for secure deployment.

Consider these examples:

  • You shouldn't give an agent access to your accounts, but an agent running its own account can't handle your calendar or respond to emails—exactly what you want it to do.

  • You shouldn't give OpenClaw access to money, but you want an agent that photographs your pantry, identifies low supplies, and orders groceries—which requires your credit card.

People view OpenClaw as an early iteration of Jarvis from Iron Man—a personal assistant that manages most of your life. They want it to book flights, negotiate rent, and handle insurance claims. It has the capability, but we can't prevent it from being hijacked.

The Solution: Agentic Permissions

The market doesn't need another sandbox—it needs agentic permissions. What's required is granting agents limited latitude within each account.

Examples of what this should look like:

  • Connect your credit card but limit spending to under $30 per day, and only on Amazon Fresh
  • Connect your email but only allow sending or replying to specific addresses, with every message requiring your approval
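Grants like the two above could be expressed as small declarative policies that a gateway evaluates before forwarding any agent action. A minimal sketch of the spending case, with all names hypothetical:

```python
from dataclasses import dataclass

@dataclass
class SpendPolicy:
    """A per-day spending cap restricted to allow-listed merchants."""
    daily_limit_usd: float
    allowed_merchants: set

def evaluate(policy: SpendPolicy, merchant: str, amount: float,
             spent_today: float) -> bool:
    """Return True only if the purchase stays inside the grant."""
    if merchant not in policy.allowed_merchants:
        return False
    return spent_today + amount <= policy.daily_limit_usd

# "Under $30 per day, and only on Amazon Fresh"
policy = SpendPolicy(daily_limit_usd=30.0, allowed_merchants={"amazon-fresh"})
```

The key design choice is that the policy lives outside the agent: the agent can be prompt injected all day, but it can only ever ask, and the gateway decides.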

Currently, OAuth is the closest we have, but it's designed for humans. The permissions are far too coarse. Gmail has a single "send emails" permission. GitHub has "make pull requests." Payments have essentially nothing. We rely on the goodwill and legal fears of e-commerce platforms.

For agents, we need much more granular specifications.

Practical Implementation

Let's revisit the examples:

Gmail integration should involve walking through contacts and pre-approving each with specific permissions (send without approval, require approval). Messages requiring approval should sit in a queue until manual approval, which then calls back to the agent.
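The per-contact flow described above can be sketched as a gate in front of every outbound message. This is an illustration, not a real Gmail API; all names are hypothetical:

```python
from enum import Enum

class Grant(Enum):
    SEND_FREELY = "send_without_approval"
    NEEDS_APPROVAL = "require_approval"

# Pre-approved contacts, configured by the user once up front.
contacts = {
    "boss@example.com": Grant.NEEDS_APPROVAL,
    "groceries@example.com": Grant.SEND_FREELY,
}

pending = []  # messages parked until manual approval
sent = []     # messages actually delivered

def agent_send(to: str, body: str) -> str:
    """Gate every outbound message on the recipient's grant."""
    grant = contacts.get(to)
    if grant is None:
        return "blocked"            # unknown recipient: never deliver
    if grant is Grant.NEEDS_APPROVAL:
        pending.append((to, body))  # sits in the queue for the user
        return "queued"
    sent.append((to, body))
    return "sent"

def user_approve(index: int) -> None:
    """Manual approval releases one queued message for delivery."""
    sent.append(pending.pop(index))
```

Note the default: a recipient with no grant is blocked outright, so a hijacked agent cannot mail an address the user never reviewed.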

Credit card limits should use an entirely different purchase API. The agent should never see the actual card number. Instead, it could request a new credit card number for each purchase, which would only approve transactions of a specific size from a specific seller. Every request for a number should go through the user.

This means the agent never holds a real card number it could leak, and it can't stretch one approval to cover subsequent purchases.
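A single-use virtual card flow along these lines might look like the following. This is a hypothetical issuer, not a real payments SDK:

```python
import secrets

class CardIssuer:
    """Mints one-shot virtual card tokens bound to a merchant and amount."""

    def __init__(self):
        self._tokens = {}

    def issue(self, merchant: str, max_amount: float) -> str:
        # Each user-approved request yields a fresh token; the agent
        # never sees a real card number.
        token = secrets.token_hex(8)
        self._tokens[token] = (merchant, max_amount)
        return token

    def charge(self, token: str, merchant: str, amount: float) -> bool:
        # A token is valid exactly once, and only for the merchant
        # and transaction size it was bound to at issuance.
        binding = self._tokens.pop(token, None)
        if binding is None:
            return False
        bound_merchant, max_amount = binding
        return merchant == bound_merchant and amount <= max_amount
```

Because `charge` consumes the token on first use, a leaked or replayed token is worthless, and a prompt-injected agent can't redirect an approved purchase to a different seller.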

The Path Forward

This concept extends to every product we want to connect to an agent. We need to design new interfaces for agents because agents are a fundamentally new type of actor.

Why doesn't this exist yet? Every service has different permissions models and different assets to secure. Building middleware that enforces this across products is extremely difficult. You either need every product to build this itself or for industry consortiums to create and enforce standards.

What the moment demands is the next Plaid: a service that wrangles disparate operators into a single, unified API. And as with Plaid, finance is likely the first place this happens, because there's simply too much money at stake.

The Bottom Line

Wrap OpenClaw in Seatbelt, bubblewrap, or Landlock, and move on. It's not enough, but neither is anything else.

If you're building an agent in today's guardrail-free world, reach out to Tachyon to audit it for vulnerabilities.
