AI agents are a confused deputy with the keys to your kingdom
#Security

Backend Reporter
4 min read

Attackers hijacked more than 20,000 Instagram accounts through Meta's AI support assistant. The vulnerability wasn't the model itself. It was the missing authorization layer that should have verified who was asking.

Last month, attackers took over more than twenty thousand Instagram accounts, including the dormant Obama-era White House account, without writing an exploit or guessing a single password. They opened a chat with Meta's AI support assistant, asked it to attach an email address they controlled to an account they did not own, and requested a password reset to that address. Meta confirmed the assistant behaved exactly as designed. A separate system was supposed to verify the email belonged to the account. That check never ran.

Calling this an AI mistake misses what happened. The assistant carried out a valid sequence of permitted operations for whoever was talking to it. A support worker who saw a stranger rerouting a celebrity's recovery email, sensed something was wrong, and refused would have stopped the attack. That human judgment was never encoded as software.

A large share of real-world authorization lived in the discretion of whoever stood between a request and the system. Everything behind that person was built assuming the discretion would always be there. Put an agent in that seat and discretion vanishes. Nothing and nobody downstream notices.

The agent does not bypass your security model. It exposes the part of it that was a person.

A confused deputy with a chat window

Security has a precise term for what Meta hit: the confused deputy. A process holding real privileges gets talked by a less-privileged party into using those privileges on its behalf. The night guard who unlocks the vault for anyone who calls and says the boss sent them has the keys. The caller has a good story.

The canonical 1988 case was a compiler that could write to a protected billing file. A user who could not write there asked the compiler to do it for them, and it complied, because it had the authority and never asked whose request it was serving.

An LLM agent is one of these by construction. Its interface is natural language, which carries no notion of who is authorized to do what. The model's entire job is to turn a plausible-sounding sentence into a tool call. A direct API request brings the caller's identity along with it. A sentence does not. Unless that identity is reattached before the call fires, the agent acts on its own authority and the requester's permissions never enter the picture.
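
One way to picture "reattaching identity" is a dispatcher that takes the caller from the authenticated session and never from anything the model wrote. The sketch below is illustrative only: `Principal`, `ToolCall`, `dispatch`, and `update_recovery_email` are hypothetical names, and the ownership rule is a stand-in for whatever a real system would check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    """Identity established by the session layer (login, token), never by chat text."""
    user_id: str


@dataclass(frozen=True)
class ToolCall:
    """What the model emits: a tool name and arguments, with no identity attached."""
    name: str
    args: dict


def update_recovery_email(acting_as: Principal, account_id: str, new_email: str) -> str:
    # Hypothetical tool: it checks ownership against the caller's identity,
    # not against the agent's own (much broader) authority.
    if acting_as.user_id != account_id:
        raise PermissionError(f"{acting_as.user_id} does not own {account_id}")
    return f"recovery email for {account_id} set to {new_email}"


TOOLS = {"update_recovery_email": update_recovery_email}


def dispatch(call: ToolCall, session_principal: Principal) -> str:
    # Identity is reattached here, from the session, before the call fires.
    # Any "acting_as" value the model placed in the arguments is discarded.
    args = {k: v for k, v in call.args.items() if k != "acting_as"}
    return TOOLS[call.name](acting_as=session_principal, **args)
```

With this shape, the same tool call succeeds for the account owner and raises `PermissionError` for anyone else, no matter how persuasive the sentence that produced it.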

Agents also cannot reliably separate instructions from data. Everything in the context window reads as potential instruction: the user's message, a retrieved document, the body of an email the agent was asked to summarize. A support bot that resets a password because a convincing chat told it to will just as readily follow a command hidden inside a file it was handed to process.
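
Since the model itself cannot be trusted to tell the two apart, one common mitigation is to track where each piece of context came from and shrink the toolset whenever untrusted content is present. The following is a rough sketch under that assumption; `Source`, `ContextItem`, `SENSITIVE_TOOLS`, and the tool names are all hypothetical, and which tools count as sensitive is a policy decision, not something this code settles.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    USER = "user"            # typed by the authenticated session holder
    RETRIEVED = "retrieved"  # documents, emails, web pages the agent ingested


@dataclass(frozen=True)
class ContextItem:
    source: Source
    text: str


# Hypothetical tool names, chosen for illustration only.
SENSITIVE_TOOLS = {"reset_password", "send_payment"}


def allowed_tools(context: list[ContextItem], all_tools: set[str]) -> set[str]:
    """Drop sensitive tools whenever retrieved (untrusted) content is in the context."""
    tainted = any(item.source is Source.RETRIEVED for item in context)
    return all_tools - SENSITIVE_TOOLS if tainted else set(all_tools)
```

This does not teach the agent to separate instructions from data. It only limits what a hidden instruction can trigger once such content has entered the window.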

The VPN trick that defeated Meta's geolocation check is the crude version of this. The sophisticated versions, where the malicious instruction is smuggled through content the agent ingests, are already being documented as the dominant class of agent attack.

The blast radius is about to multiply

The Instagram bot could reset passwords, and that's a serious breach, but a bounded one. The agents shipping now are not bounded that way.

In the same week Meta disabled the support tool, it launched its Business Agent, which books appointments, qualifies leads, closes sales, takes payments, and connects to systems like Shopify and Zendesk to act on a company's behalf. Run the same confused-deputy logic through a payment API and a CRM and the failure is no longer a stolen account. It is a refund sent to the wrong party, an order rerouted, a price overridden, a customer record edited—each one a legitimate operation the agent was authorized to perform for whoever asked.

The market is outrunning the security model. Gartner projects that 40% of enterprise applications will include task-specific AI agents by the end of 2026, up from less than 5% in 2025. Most of those deployments will inherit the same assumption Meta's did: whatever sits at the far end of a privileged action has judgment.

Why a better model does not fix this

A more capable model behind the same workflow would have handed over the same accounts with better grammar. The model cannot be the place authorization lives, because what the model reads is the part an attacker controls. The decision to allow an action has to be made outside it, by a policy layer that checks who is actually behind the session before anything runs.
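
A policy layer of this kind can be small. The sketch below assumes a default-deny table keyed by action name and a `Session` whose `verified` flag is set only by a real authentication step; `Session`, `Action`, `POLICY`, `authorize`, and `run_tool` are hypothetical names, not any particular framework's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Session:
    user_id: str
    verified: bool  # True only after real authentication, never from a chat claim


@dataclass(frozen=True)
class Action:
    name: str
    target_account: str


def owns_target(session: Session, action: Action) -> bool:
    # Allow only a verified user acting on their own account.
    return session.verified and session.user_id == action.target_account


# Hypothetical policy table: action name -> rule. Unlisted actions are denied.
POLICY: dict[str, Callable[[Session, Action], bool]] = {
    "change_recovery_email": owns_target,
    "reset_password": owns_target,
}


def authorize(session: Session, action: Action) -> bool:
    rule = POLICY.get(action.name)
    return rule is not None and rule(session, action)


def run_tool(session: Session, action: Action, handler: Callable[[Action], str]) -> str:
    # The gate sits between the model's proposed action and the handler,
    # so nothing the model says can skip it.
    if not authorize(session, action):
        raise PermissionError(f"denied: {action.name} on {action.target_account}")
    return handler(action)
```

The model still proposes the action, but the allow-or-deny decision depends only on the session and the table, neither of which the conversation can rewrite.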

Meta's assistant never established that the person it was talking to owned the account before it rebound the account's recovery email. In a couple of lines of code, what shipped looked something like this:
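
What follows is a minimal sketch of that shape, not Meta's code: every name and data structure is hypothetical, and the in-memory `accounts` and `outbox` stand in for real services. The point is only what is absent, namely any comparison between the requester and the account owner.

```python
# Hypothetical in-memory stand-ins for an account store and a mail queue.
accounts = {"victim_account": {"owner": "staff-1", "recovery_email": "staff@example.org"}}
outbox = []


def handle_support_request(requester_id: str, account: str, new_email: str) -> None:
    # Missing step: nothing compares requester_id to accounts[account]["owner"].
    accounts[account]["recovery_email"] = new_email
    outbox.append((new_email, f"password reset link for {account}"))
```

Called with any `requester_id`, this rebinds the recovery address and queues the reset link for the new address, which is the sequence described above.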
