How a Shared Contact and a Friendly Email Both Hijacked the OpenClaw AI Agent
#Cybersecurity

How a Shared Contact and a Friendly Email Both Hijacked the OpenClaw AI Agent

Security Reporter
6 min read

Two research teams showed this week that OpenClaw, the popular self-hosted AI agent, can be talked into running attacker code or leaking secrets through inputs that look completely ordinary. One flaw is patched in version 2026.4.23; the other is an architecture problem no patch will close.

Two security teams went after the same target this week and arrived at the same uncomfortable conclusion from opposite directions. OpenClaw, the self-hosted AI agent that has spread fast since its launch late last year, can be driven to execute attacker-controlled code or surrender sensitive data through inputs that look entirely normal. Imperva did it by hiding instructions inside shared contacts and location pins. Varonis did it with a single plain email. Different doors, same room: the agent trusts whatever reaches it, and its access becomes the attacker's access.

Featured image

One of these problems has a patch. The other does not, and that distinction is the whole story.

The hidden command in a shared contact

Imperva researcher Yohann Sillam looked at the plumbing, specifically how OpenClaw hands messaging data to the model behind it. When the agent fetches content from the web, it wraps that content in an untrusted-content marker so the model knows to treat it with suspicion. Message objects get no such treatment. When the agent passes a shared contact, vCard, or location to the LLM, it flattens the object straight into the prompt text with no boundary marking it as untrusted.

That gap is the entire attack. A shared contact sends only the name field, serialized as <contact: name, number>. Angle brackets are legal characters in a name, so the model has no reliable way to tell where the real name ends and an injected instruction begins. The display layer makes it worse: the contact name gets truncated on screen, both in WhatsApp and in the receiving app, so the victim never sees the payload that the model reads in full. The same trick rides in through a vCard's full-name field, which WhatsApp supports natively, and through the label on a shared location pin.

In tests against a preview build of Gemini 3.1 Pro, the buried text instructed the agent to download and run a script from a server the researchers controlled. It complied. Notably, a plain image with instructions embedded in it failed, most likely because image-based injection has been reported so widely that models are now trained to resist it. The message-object route worked precisely because models have seen far fewer examples of it during training. That detail matters for anyone building defenses: attacks that are novel to a model's training data slip past, regardless of how conceptually simple they are.

With OpenClaw's memory feature on by default, Imperva warns, one widely shared piece of content carrying a hidden instruction could quietly compromise every agent that ingests it, assuming those agents are not sandboxed. The fix in OpenClaw 2026.4.23 moves contact names, vCard fields, and location labels out of the prompt body and into a separate untrusted-metadata channel. If you run OpenClaw, that update is the immediate action item. Imperva also found the same flattening pattern in other personal AI assistants, so the underlying mistake is not unique to this project.

A normal email was enough

Varonis Threat Labs came at the same target from the social side. In research led by Itay Yashar, the team built an agent called Pinchy on the platform, wired it to a Gmail inbox stocked with realistic but synthetic business clutter and mock secrets, and ran it through four phishing simulations across Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4.

The team draws a sharp line between prompt injection, which hides instructions inside data, and what they call agent phishing: a believable request that arrives through a normal channel and works because the agent acts before it checks who actually sent it. The first pretext was a message posing as a team lead named Dan, sent from an outside Gmail address, asking for staging access during a fake production incident. Pinchy located the credentials and forwarded mock AWS IAM access keys, database connection strings, and SSH credentials in plaintext. The second was softer, a routine-sounding request for the weekly customer export for a QBR deck. The agent shipped out a synthetic dataset of 247 enterprise customers with contacts and contract values attached.

Both failures happened under a strict profile that explicitly told the agent to verify senders first. The rule was there. Urgency beat it the first time, routine beat it the second. The agent's drive to be helpful is the attack surface.

The contrast with technical threats is revealing. Faced with a gift-card phishing page, the agent interacted with it but withheld real credentials and eventually flagged it, and the strict profile blocked the page outright. Presented with a malicious OAuth consent screen dressed up as a timesheet app, it inspected the redirect target, judged it suspicious, and stopped before granting access. The agent turns out to be better than many humans at spotting bad URLs and fake login portals, and worse at the social judgment that makes a person hesitate when a colleague suddenly asks for credentials at an odd hour. Varonis notes that Codex GPT-5.4 was more cautious than Gemini 3.1 Pro about sending data to outside sites without confirmation, but both models fell for the social pretexts.

The shared weak spot

Both attacks map onto what Simon Willison calls the lethal trifecta: an agent that can read private data, take in untrusted content, and send data back out. OpenClaw has all three capabilities, which is why a poisoned contact and a friendly email end at the same destination.

That trust boundary is not only a prompt-handling issue, it shows up in the code too. A separate InfoSec Write-ups analysis converted OpenClaw's past advisories into static-analysis rules and used them to find five more flaws across the Slack, Discord, Matrix, Zalo, and Microsoft Teams channel extensions. All five were the same bug: startup code resolved each channel's allowlist by mutable display name rather than a stable ID, so an attacker who renamed themselves to match an allowed user could slide onto the list and steer the agent. OpenClaw has patched those.

The broader risk profile is hard to ignore. OpenClaw ships with broad access to files, shells, and more than twenty messaging platforms, and it has attracted a steady run of prompt-injection and data-exfiltration warnings since launch. The Dutch data protection authority, the Autoriteit Persoonsgegevens, took the firmest stance, telling users and organisations not to run OpenClaw on systems holding sensitive data, citing data-breach and account-takeover risk.

What actually helps

The message-object fix is a patch you apply by updating to 2026.4.23 or later. Everything else is architecture, not prompt wording, and Varonis lays out four controls worth adopting regardless of which agent you run.

Treat the agent's instruction file as an enforced, version-controlled policy rather than a polite suggestion. Put a gate on outbound mail so there are no first-time sends to unfamiliar addresses without approval, which keeps a hijacked agent from relaying phishing through a trusted account. Tie connector access to the trust level of whatever triggered the task, so an inbox that handles outside email cannot also read the entire CRM. And route the riskiest actions, forwarding credentials or moving money, through a human checkpoint.

Both teams land on the same mental model. Varonis frames the agent as a junior employee with system access and no instinct for what looks off, not as a security tool you can point at a problem. Imperva reaches the same place from the other direction, describing it as an authenticated executor that trusts its inputs. The patches and guardrails on offer today are real and worth applying, but they address specific instances of a problem that stays open. An agent useful enough to act on your email and run your commands is, by design, one that trusts input and wants to help. Nobody has a general fix for that yet, and pretending otherwise is how the next plain-looking email gets through.

Comments

Loading comments...