Researchers demonstrate how Anthropic's Claude Cowork can be manipulated via hidden prompt injections to exfiltrate confidential user files to attacker-controlled accounts, highlighting unresolved isolation flaws in AI agent environments.
The recent release of Anthropic's Claude Cowork—a general-purpose AI agent designed to assist with daily tasks—has reignited discussions about prompt injection vulnerabilities in agentic AI systems. Security researchers have demonstrated how attackers can exploit unresolved isolation flaws in Claude's execution environment to exfiltrate confidential user files through indirect prompt injection attacks.

This vulnerability isn't new. Johann Rehberger first identified it in Claude.ai chat before Cowork's existence, and Anthropic acknowledged but never remediated the issue. The company's warning that Cowork "is a research preview with unique risks due to its agentic nature and internet access" places responsibility on users to watch for "suspicious actions." Yet as technologist Simon Willison notes, expecting non-technical users to identify prompt injection signals is unrealistic.
The attack chain begins when a user connects Cowork to local files:

File Attachment: A victim attaches confidential documents (e.g., real estate appraisals containing PII) to Claude Cowork
Malicious Upload: The victim unknowingly uploads a weaponized document—typically disguised as a Claude Skill—containing hidden prompt injections. Attackers conceal injections using techniques like 1-point white-on-white text:

- Execution Trigger: When the victim asks Cowork to process files using the compromised "Skill," Claude reads the document:

- File Exfiltration: Hidden instructions force Claude to execute a cURL command using Anthropic's file upload API with the attacker's API key. Crucially, Claude's VM allows API traffic while blocking most external domains:
{{IMAGE:2}}
The attacker then accesses stolen files directly through their Anthropic account. During testing, both Claude Haiku and the more robust Opus 4.5 proved vulnerable to variations of this attack.
Additional findings reveal denial-of-service risks: Malformed files (e.g., PDFs containing plain text) cause persistent API errors that could be weaponized via prompt injection to disrupt workflows.
This vulnerability underscores broader concerns about agentic AI systems accessing sensitive environments. Cowork's integration capabilities—including browser control and AppleScript execution—expand the attack surface significantly. While Anthropic advises avoiding sensitive files with Cowork, the tool simultaneously encourages organizing desktops and processing documents, creating contradictory expectations.
The persistence of this known vulnerability highlights challenges in securing AI agent environments. As Anthropic positions Cowork for mainstream adoption, the tension between functionality and security becomes increasingly pronounced. Researchers continue debating whether responsibility should lie with users to detect injections or with developers to implement sandboxing that prevents data exfiltration at the architectural level.

Comments
Please log in or register to join the discussion