Uber’s production design gives AI agents short-lived, per-hop credentials that carry user context and agent provenance through internal tool calls.

Uber has described an internal identity architecture for multi-agent AI workflows that preserves user context, agent provenance and scoped access as agents delegate work and call tools. Auth0 has made the same case from the identity-provider side: production agents need delegated authority, short-lived credentials and human approval boundaries.
The pressure comes from agent behavior. A human user clicks through a session. A backend service follows code paths that engineers can inspect and audit. An AI agent can accept a goal, call tools, delegate work to another agent and act for a user across several steps. Security teams need to know which user started the task, which agent took action and which tool received the request.
Uber’s design extends its zero-trust architecture to agentic systems. Engineers tied together an Agent Registry, an AI Agent Mesh, a Security Token Service, a Model Context Protocol (MCP) Gateway, downstream systems and an AI Gateway/AI Guard. The Agent Registry maps an agent to the workload that may host it. The Security Token Service checks that relationship and issues short-lived JSON Web Tokens (JWTs) for the next hop. The MCP Gateway controls access to internal tools and can redact sensitive data before a tool receives a request.
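The registry check and token issuance can be sketched roughly as follows. This is a minimal illustration, not Uber's code: the registry contents, the SPIFFE-style IDs, the `issue_token` function, the `agent` claim name and the five-minute lifetime are all assumptions made for the example, and the token is shown as a plain claims dictionary, not a signed JWT.

```python
import time

# Hypothetical registry: agent name -> workload identities allowed to host it.
AGENT_REGISTRY = {
    "oncall-agent": {"spiffe://example.internal/workload/oncall"},
    "investigation-agent": {"spiffe://example.internal/workload/investigation"},
}

# Assumed value for illustration; the article gives no exact figure.
TOKEN_LIFETIME_SECONDS = 300


def issue_token(agent, workload_id, audience, user, now=None):
    """Return claims for a next-hop token, or raise if the registry check fails."""
    allowed = AGENT_REGISTRY.get(agent, set())
    if workload_id not in allowed:
        raise PermissionError(f"{workload_id} is not registered to host {agent}")
    issued_at = int(now if now is not None else time.time())
    return {
        "sub": user,          # the user the work is done for
        "agent": agent,       # hypothetical claim naming the acting agent
        "aud": audience,      # the only recipient that should accept the token
        "iat": issued_at,
        "exp": issued_at + TOKEN_LIFETIME_SECONDS,
    }
```

A real token service would also sign the claims and verify the workload's identity cryptographically before consulting the registry.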

Uber avoids a common shortcut: passing one user credential or one service account through the whole workflow. Each agent combines local metadata, inbound context, destination audience and a SPIRE workload identity, then requests a new token from the Security Token Service. Uber based the exchange pattern on OAuth 2.0 Token Exchange, then added agent identity and provenance claims for internal audit and latency needs.
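For readers unfamiliar with OAuth 2.0 Token Exchange (RFC 8693), the sketch below builds the form-encoded request body the standard defines. The parameter names and URNs come from the RFC; the function name, the placeholder token strings and the choice to send the workload identity as the `actor_token` are assumptions for illustration, since the details of Uber's internal API are not described here.

```python
from urllib.parse import urlencode

# Grant-type and token-type URNs defined by RFC 8693.
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_exchange_request(inbound_token, workload_token, audience):
    """Form-encode an RFC 8693 request for a token aimed at the next hop.

    inbound_token: the token this agent received (carries the user context).
    workload_token: proof of the calling workload's identity (assumed to be
        a JWT here; the article only says a SPIRE workload identity is used).
    audience: the agent or service the new token is meant for.
    """
    return urlencode({
        "grant_type": GRANT_TYPE,
        "subject_token": inbound_token,
        "subject_token_type": JWT_TYPE,
        "actor_token": workload_token,
        "actor_token_type": JWT_TYPE,
        "audience": audience,
        "requested_token_type": JWT_TYPE,
    })


# Placeholder values; real tokens would be signed JWTs.
body = build_exchange_request("inbound.jwt.value", "workload.jwt.value", "mcp-gateway")
```

The resulting `body` would be sent as an `application/x-www-form-urlencoded` POST to the token endpoint.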
The Security Token Service issues single-hop tokens with an audience claim and a lifetime measured in minutes. That choice limits the blast radius. A token that reaches the wrong tool should fail the audience check. A token that leaks should expire before an attacker can do much with it.
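The two checks that bound the blast radius, audience and expiry, might look like this on the receiving side. This is a sketch under assumptions: `accept_token` is a hypothetical helper, it works on already-decoded claims, and it omits the signature verification a real service would perform first.

```python
import time


def accept_token(claims, expected_audience, now=None):
    """Return True only if the token targets this service and has not expired."""
    current = now if now is not None else time.time()

    aud = claims.get("aud")
    # RFC 7519 allows "aud" to be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:
        return False  # token was minted for a different hop

    exp = claims.get("exp")
    if exp is None or current >= exp:
        return False  # no expiry claim, or the lifetime has passed

    return True
```

With claims of `{"aud": "mcp-gateway", "exp": 1300}`, the gateway accepts the token at time 1000, while a service expecting the audience `billing-tool` rejects it, and so does the gateway itself once the clock reaches 1300.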
Uber calls the provenance record an actor chain. In the company’s example, an on-call engineer asks an Oncall Agent to investigate an issue. That agent delegates to an Investigation Agent, which calls an internal tool through the MCP Gateway. The token that reaches the gateway carries the originating user and the acting agents. Downstream services can make authorization decisions from the full chain, not from the last caller alone.
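One way to represent such a chain is the nested `act` claim from RFC 8693, where the current actor sits at the top level and earlier actors are nested inside it. Whether Uber uses this exact layout is not stated, so treat the structure, the helper names and the allow-lists below as assumptions.

```python
def add_actor(claims, actor):
    """Return new claims where `actor` is the current actor and the
    previous actor (if any) is nested beneath it, as in RFC 8693."""
    new_act = {"sub": actor}
    if "act" in claims:
        new_act["act"] = claims["act"]
    return {**claims, "act": new_act}


def actor_chain(claims):
    """List actors from the first delegate to the most recent caller."""
    chain = []
    act = claims.get("act")
    while act is not None:
        chain.append(act["sub"])
        act = act.get("act")
    return list(reversed(chain))


def authorize(claims, allowed_users, allowed_agents):
    """Allow only if the user and every agent in the chain are permitted."""
    chain = actor_chain(claims)
    return (
        claims.get("sub") in allowed_users
        and bool(chain)
        and all(agent in allowed_agents for agent in chain)
    )


token = {"sub": "engineer@example.com"}          # originating user
token = add_actor(token, "oncall-agent")         # first hop
token = add_actor(token, "investigation-agent")  # second hop
print(actor_chain(token))  # ['oncall-agent', 'investigation-agent']
```

Because `authorize` inspects every entry, a request fails if any agent in the path is not on the allow-list, even when the last caller is.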

That chain gives security teams a more useful audit record. A log entry that says “Investigation Agent called service X” answers one question. A log entry that also includes the engineer, the Oncall Agent and the delegation path lets responders reconstruct the request path and check whether the task matched user intent.
Auth0’s model lines up with Uber’s implementation. Cameron Pavey argues that teams should give agents capability-scoped permissions, task-scoped credentials and enforcement at several layers. In practice, that means an identity provider issues narrow credentials, an agent runtime carries those credentials through execution and a tool layer checks policy before it runs an action.

Uber applies those controls with per-hop token exchange, audience scoping, registry-backed agent verification, gateway policy checks and redaction. The design keeps agent autonomy while giving security engineers defined points at which to inspect and stop a request. The gateway matters because agents can reach many tools through MCP-style integration. A central enforcement point can check identity, policy and data exposure before the tool runs.
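A central enforcement point of this kind could be sketched as below. Everything here is hypothetical: the policy table, the tool names, the email-only redaction rule and the `handle_tool_call` function are stand-ins chosen to show the order of operations (check policy, redact, then run), not a description of Uber's gateway.

```python
import re

# Hypothetical policy: which agents may call which tools.
TOOL_POLICY = {
    "log-search": {"investigation-agent"},
    "deploy-rollback": {"oncall-agent"},
}

# Deliberately simple pattern for illustration; real redaction covers far more.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text):
    """Replace email addresses with a fixed marker."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)


def handle_tool_call(agent, tool, arguments, run_tool):
    """Enforce policy, redact string arguments, then invoke the tool."""
    if agent not in TOOL_POLICY.get(tool, set()):
        raise PermissionError(f"{agent} may not call {tool}")
    safe_arguments = {
        key: redact(value) if isinstance(value, str) else value
        for key, value in arguments.items()
    }
    return run_tool(tool, safe_arguments)
```

The tool itself is passed in as `run_tool`, so the tool never sees the unredacted arguments and never runs when the policy check fails.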
Developers still need the identity path to fit normal agent code. Uber first considered an external proxy for agent-to-agent calls, according to the InfoQ report, but engineers found that application-layer support gave them better end-to-end context preservation. Uber built a standard agent-to-agent (A2A) client that automates token exchange and actor-chain propagation, so agent teams do not have to rebuild the identity path in each service.
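The idea of a client that hides the identity path can be illustrated with a thin wrapper. This is not Uber's client: the class name, its constructor arguments and the shapes of the two injected callables are assumptions, chosen so that the example runs without any network access.

```python
class A2AClient:
    """Hypothetical wrapper that exchanges a token before every outbound call."""

    def __init__(self, agent_name, exchange_token, send):
        self.agent_name = agent_name
        # exchange_token(inbound_token, audience, actor) -> new token
        self._exchange_token = exchange_token
        # send(destination, token, payload) -> response
        self._send = send

    def call(self, inbound_token, destination, payload):
        # Request a fresh token scoped to the destination; the token service
        # is expected to record this agent as the newest actor in the chain.
        token = self._exchange_token(inbound_token, destination, self.agent_name)
        return self._send(destination, token, payload)
```

Agent code then calls `client.call(inbound_token, "investigation-agent", payload)` and never handles the exchange or the actor chain directly.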
Latency determines whether that pattern works in production. Per-hop exchange adds a network call, and agent workflows can call many tools. Uber says thousands of internal agents use the system and that 99th-percentile (P99) latency for the Security Token Service token exchange API stays below 40 milliseconds. That number gives architects a concrete data point for sizing identity infrastructure around agent workloads.
Standards work has started to catch up. Uber says it tracks the IETF WIMSE working group and drafts on AI agent authentication and authorization. The broader field still lacks one settled model for agent identity across MCP, Agent2Agent and internal service meshes, but production systems have started to converge on several controls: workload identity, token exchange, explicit delegation chains and gateway enforcement.
The deployment lesson for platform teams is direct. Treating agents as plain users loses agent provenance. Treating agents as plain services loses user intent. A production access model needs both, plus tool-level authorization at the point where the agent takes action.
Teams building agent platforms should start with three questions. Which human or system authorized the task? Which agent and workload performed each step? Which gateway or service checked the requested tool action against policy? Uber’s design gives one production answer, and Auth0’s model gives identity teams a vocabulary for the same control plane.
