Agentjacking: How a Fake Sentry Error Can Hijack Your AI Coding Agent
#Vulnerabilities

Agentjacking: How a Fake Sentry Error Can Hijack Your AI Coding Agent

Security Reporter
6 min read

Researchers at Tenet Security found a way to turn AI coding assistants like Claude Code and Cursor into code-execution vectors. The trick needs nothing but a public Sentry credential and a carefully formatted error report. Sentry says it won't fix the root cause.

Featured image

A fake error report can now talk your AI coding agent into running attacker-controlled commands on your own machine. That is the uncomfortable takeaway from research published by Tenet Security, which calls the technique Agentjacking and warns that it sidesteps nearly every defensive control most teams rely on.

The setup is deceptively simple. An attacker plants a malicious payload inside a Sentry error event. A developer later asks their AI agent to "fix unresolved Sentry issues." The agent fetches that event through the Model Context Protocol (MCP), reads the attacker's instructions as if they were trusted diagnostic guidance, and executes them with the developer's full privileges. No phishing email, no compromised server, no malware on disk.

What actually happens

Sentry is an open-source error-tracking and performance-monitoring platform used across a huge swath of web and mobile applications. To report errors, applications authenticate with a Data Source Name, or DSN. The DSN is a public, write-only credential, and it is routinely embedded directly in client-side websites and apps. Anyone who views source on a page using Sentry can often find it.

That write-only credential is the entire foothold. Here is the chain Tenet researchers Ron Bobrov, Barak Sternberg, and Nevo Poran laid out:

  1. The attacker scrapes a target's Sentry DSN from a public website or app bundle.
  2. Using that DSN, they send a crafted error event to Sentry's ingest endpoint with a normal POST request. Sentry accepts it, because accepting arbitrary error payloads from anyone holding the DSN is exactly what the endpoint is built to do.
  3. The malicious event carries carefully formatted markdown in its message field and context key names, structured to mimic Sentry's own system template.
  4. When the Sentry MCP server later returns that event to an AI agent, the injected content renders as structured output visually identical to legitimate Sentry guidance.
  5. A developer asks their agent to triage Sentry issues. The agent queries Sentry over MCP, receives the poisoned event, reads the embedded "Resolution" as trusted instructions, and runs the attacker's code.

"The attacker never touches the victim's infrastructure," the researchers wrote. "The malicious instruction arrives disguised as a legitimate 'Resolution' inside an ordinary error."

The stakes are what you would expect from code running with a developer's own permissions: environment variables, Git credentials, private repository URLs, and developer identities are all reachable. A laptop with cloud access tokens cached in environment variables is a particularly rich target.

The trust problem at the heart of MCP

The deeper issue is not a bug in any one product. It is architectural. MCP gives an AI agent a standardized way to pull in data from external services, and the agent treats what comes back as trustworthy system output. The flaw lives in the seam between two reasonable design choices: Sentry's ingestion accepts arbitrary payloads from anyone with the DSN, while the Sentry MCP server hands that same data to AI agents as trusted context.

An agent has no reliable way to tell a real application crash apart from a fabricated event an attacker injected. Both arrive through the same channel, formatted the same way. This is prompt injection wearing the costume of legitimate telemetry, and it is a pattern security teams will see again anywhere user-controllable data flows into an agent's context window through a connector that presents it as authoritative.

That distinction matters for defenders. Treating MCP connectors as trusted data sources is the assumption being exploited. Any external service that lets untrusted parties write data which later reaches an agent deserves the same scrutiny.

How widespread is the exposure

Tenet's numbers give the research weight. The company says it identified at least 2,388 organizations with valid, injectable DSNs exposed. It tested the attack in a controlled manner against more than 100 organizations and reported an 85 percent exploitation success rate across some of the most widely used AI coding assistants, including Claude Code and Cursor.

An 85 percent hit rate against production-grade agents is the part worth sitting with. This is not a fragile lab demonstration that breaks the moment a model updates. It worked reliably across multiple assistants.

Sentry's response: "technically not defensible"

Sentry acknowledged the issue but declined to fix the root cause, describing it as "technically not defensible." That position is defensible in its own way: the ingestion endpoint is doing precisely what it was designed to do, and the trust decision happens downstream in the agent. As a mitigation, the company activated a global content filter that blocks a specific payload string.

A single-string filter is a speed bump, not a wall. It stops the exact proof-of-concept payload and little beyond it. Markdown is flexible, and an attacker with motivation will find phrasings the filter does not catch.

Why your existing defenses miss it

The most sobering claim in the research is about detection. "The attack bypasses EDR, WAF, IAM, VPN, Cloudflare, and firewalls," Tenet said, "because there is nothing malicious to detect. Every action in the chain is authorized."

Think through each control. The DSN use is authorized, that is what DSNs are for. The POST to the ingest endpoint is a normal API call. The MCP query is the agent doing its job. The command execution happens under the developer's legitimate credentials on their own machine. There is no exploit payload crossing a network boundary, no signature to match, no anomalous login. Every link in the chain looks like sanctioned activity, which is exactly why the standard stack stays quiet.

Practical takeaways

If your developers run AI coding agents wired to MCP connectors, treat this as an active design risk rather than a future hypothetical. A few concrete moves:

  • Scope agent permissions. An agent that can read Sentry issues should not automatically be able to execute shell commands without a human confirming each one. Approval gates on tool execution blunt the entire attack, because the malicious instruction still has to clear a person.
  • Audit which MCP servers your agents trust. Every connector that surfaces externally writable data is a potential injection path. Inventory them the way you would inventory any other input source.
  • Rotate and restrict DSNs where you can. While DSNs are designed to be public, understanding which of yours are exposed and where helps you reason about who can write into your error stream.
  • Treat connector output as untrusted input. This is the mindset shift. Data arriving through an MCP server is not inherently safe just because the agent presents it as system context.
  • Run agents with least privilege. Cached cloud tokens and broad Git credentials in a developer's environment turn a code-execution bug into a serious breach. Tighten what an agent process can reach.

Agentjacking lands at a moment when enterprises are rushing AI coding agents into daily developer workflows. The research makes a pointed argument: the agent itself is now the attack surface, turned against the developers who trust it, using nothing more than data those organizations already publish about themselves. The convenience of "just ask the agent to fix it" carries an implicit trust boundary, and this is the first widely documented case of someone walking right through it. Teams adopting these tools should assume more attacks in this shape are coming, and design their guardrails accordingly.

Comments

Loading comments...