Block red-teamed its own AI agent to run an infostealer

Block's CISO James Nettesheim explains why the company subjected its internal AI agent Goose to rigorous security testing, including a successful prompt injection attack that installed malware, and why AI systems must be provably safer than humans.

Block Chief Information Security Officer James Nettesheim draws a sharp comparison between AI agents and self-driving cars when discussing security requirements. "It's not enough for self-driving cars to be just as good as humans," Nettesheim said in an exclusive interview. "They have to be safer and better than humans - and provably so. We need that with our agentic use, too."

The statement reflects Block's aggressive approach to securing its AI infrastructure. The parent company of Square, Cash App, and Afterpay has positioned itself as an AI leader, co-designing the Model Context Protocol (MCP) with Anthropic and deploying Goose, its open-source AI agent, across nearly all 12,000 employees. Goose connects to Block's entire systems ecosystem, from Google accounts to Square payment infrastructure.

The Red Team Exercise That Proved the Point

To validate Goose's security posture, Block's security team conducted internal red team exercises that exposed critical vulnerabilities. In one documented case, they successfully used a prompt injection attack to compromise an employee's laptop with information-stealing malware.

Prompt injection represents a fundamental challenge in AI security. Attackers manipulate prompts to include malicious instructions that the AI executes, either through direct text input or indirect commands hidden within content. Block's exercise demonstrated how this could be weaponized through Goose's recipe system.

Goose uses "recipes" - reusable workflows that employees can share across the organization. The red team identified that this portability created an attack vector. They combined phishing with prompt injection, sending emails to the Goose development team about a supposed system "bug." The malicious payload was hidden using invisible Unicode characters within a recipe.

When a developer clicked the poisoned recipe during debugging, it downloaded and executed the infostealer. The attack succeeded because both the user and the agent were tricked by instructions that appeared legitimate but contained hidden malicious commands.

Applying Human Security Principles to AI Agents

Nettesheim emphasizes that AI agents should face the same security controls as human employees. "Software engineers also download and execute things they shouldn't," he noted. "Users do that regularly. We write bugs in our code to where it doesn't execute."

This philosophy translates into applying least-privilege access principles to both humans and machines. Block employees receive access only to data necessary for their specific roles, and the company extends this same restriction to AI agents. Customer data retention follows identical rules - information is kept only as long as needed for a specific purpose.

When an agent queries data on a user's behalf, such as retrieving store or account information, the system must ensure it accesses and returns only information specific to that user. This containment prevents lateral movement and data exfiltration.

Defensive Measures and Recipe Safeguards

Following the red team exercise, Block implemented multiple defensive layers into Goose:

Recipe Install Warnings: Before executing new recipes, users receive alerts advising them to proceed only if they trust the source. This adds transparency to workflow instructions.

Unicode Detection: Goose now scans for and removes invisible Unicode characters inserted into strings that could hide malicious commands. Desktop alerts warn users when recipes contain suspicious Unicode.

Input Validation and Output Monitoring: The system validates incoming data and monitors outputs for anomalous behavior patterns.

The Adversarial AI Approach

Block is experimenting with using AI to defend against AI attacks. The concept involves deploying one LLM or agent to check another agent's content and flag potentially malicious prompts.

"Let's use another LLM or another agent to check the content of the prompt and tell us if it thinks it's not good or bad, and then warn the user that it's bad," Nettesheim explained.

This adversarial approach is still in internal testing. Block faces implementation challenges around speed - sending agent inputs and outputs through additional security checks adds latency - and accuracy, to avoid overwhelming analysts with false positives.

Nettesheim believes adversarial AI will become a standard security tool. "It's gaining traction in the research community, and I think we are helping lead the way," he said. "If you have two agents that are competing or collaborating with each other, it leads to better code generation."

Balancing Innovation and Risk

The CISO role requires constant risk balancing, particularly in the rapidly evolving AI landscape. "What is a bigger risk right now? Not taking advantage of the technology enough? Or the security downsides of it?" Nettesheim asked.

This tension defines Block's approach: adopt AI aggressively while subjecting it to the same rigorous security standards applied to traditional systems. The company's open-source strategy with Goose allows broader security scrutiny while contributing to collective defense knowledge.

Block's experience demonstrates that securing AI agents requires more than prompt filtering or sandboxing. It demands comprehensive security engineering, adversarial testing, and acceptance that current defenses may be insufficient against determined attackers.

The red team exercise that successfully deployed an infostealer wasn't a failure - it was validation that Block's security testing can identify real vulnerabilities before attackers do. In the race between AI-powered offense and defense, Block is betting that rigorous self-testing provides the edge needed to stay ahead.

For organizations deploying AI agents, Nettesheim's advice is clear: assume prompt injection is possible, apply least-privilege principles, and test aggressively. The alternative - treating AI as inherently secure - creates blind spots that attackers will exploit.

As AI agents become more autonomous and connected to critical systems, Block's approach of red teaming their own infrastructure offers a blueprint for responsible deployment. The goal isn't perfect security, but provably better security than the alternatives - whether that's human operators or untested AI systems.

The broader lesson: AI security requires the same discipline as traditional security, plus new tools like adversarial AI, while accepting that some attacks will succeed and planning accordingly. In Nettesheim's view, that's the only way to make AI agents truly safer than humans.