At the Code With Claude conference, Anthropic announced Managed Agents, auto‑mode classifiers, scheduled routines, and a new capability‑curve metric. The rollout includes sandboxed execution, credential scoping, and a tiered advisor‑executor model that promises lower token costs while preserving high‑quality output. Partners such as GitHub, Vercel, Datadog, and Bun demonstrated real‑world deployments, highlighting the impact on infrastructure economics and engineering processes.

Anthropic Unveils Managed Agents, Proactive Workflows, and the Capability Curve at Code With Claude 2026

Technical announcement

On May 6, 2026, Anthropic hosted Code With Claude in San Francisco and streamed the sessions on YouTube. The event focused on the operational impact of the latest Claude model generation. Key product releases included:

Managed Agents – a hosted service that abstracts sandboxed code execution, checkpointing, and credential scoping. Agents run in isolated containers with per‑session resource quotas.
Auto mode – a classifier that intercepts every model‑generated action, filtering destructive commands and prompt‑injection attempts before they reach the execution layer.
Routines – cron‑style triggers, GitHub webhook bindings, and HTTP endpoint callbacks that let developers schedule recurring Claude tasks.
Capability curve – a benchmark suite that tracks model performance on SWE‑bench, code‑generation accuracy, and token‑efficiency across releases. The curve is now part of Anthropic’s public roadmap.

The announcements were accompanied by live demos from partners that illustrated how the new primitives integrate with existing CI/CD pipelines.
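To make the Auto mode idea concrete, here is a minimal sketch of a gate that reviews each proposed command before it reaches an execution layer. It is an illustration only: the source describes Auto mode as a classifier, whereas this stand-in uses a short regex deny list, and every name in it (`DESTRUCTIVE_PATTERNS`, `review_action`, `run_with_gate`) is hypothetical rather than part of any Anthropic API.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical deny list. A production classifier would be a learned model;
# these three patterns only illustrate where the check sits in the flow.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-\w*r\w*\s+/(?:\s|$)"),           # recursive delete of the root directory
    re.compile(r"\bgit\s+push\b.*--force\b"),              # force push
    re.compile(r"\bdrop\s+table\b", re.IGNORECASE),        # SQL table drop
]


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


def review_action(command: str) -> Verdict:
    """Return a verdict for one model-generated command."""
    for pattern in DESTRUCTIVE_PATTERNS:
        if pattern.search(command):
            return Verdict(False, f"matched {pattern.pattern!r}")
    return Verdict(True, "no deny pattern matched")


def run_with_gate(command: str, execute: Callable[[str], str]) -> str:
    """Call `execute` only when the command passes review."""
    verdict = review_action(command)
    if not verdict.allowed:
        return f"blocked: {verdict.reason}"
    return execute(command)
```

The point of the design is that the executor never sees a command the gate rejected, so the check cannot be skipped by the code that generates actions.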

Specifications and benchmarks

Feature	Description	Limits and operating figures	Benchmark result
Managed Agents	Hosted container per agent, sandboxed filesystem, network egress control	2 GB RAM, 2 vCPU, 10 GB storage per agent	99.8 % isolation success in internal stress tests (10 k concurrent agents)
Auto‑mode classifier	Binary safety model (Claude‑Safe‑v2) that evaluates each generated command	1 ms latency per check, 99.97 % detection of known injection patterns	Reduced destructive‑action incidents by 92 % in beta customers
Routines scheduler	Cron syntax, GitHub webhook payload mapping, REST endpoint trigger	Up to 1 M scheduled executions per month per tenant	Average latency from trigger to first token: 150 ms
Advisor‑executor stack	Small executor (Haiku, 6 B parameters) forwards hard cases to large advisor (Opus, 175 B)	Advisor invoked on ~12 % of requests	Token cost per successful request dropped 38 % vs. single‑model baseline
Capability curve metrics	SWE‑bench pass rate, code‑gen BLEU, token‑per‑correct‑line	Baseline Opus 4.7: 87 % SWE‑bench pass	Opus 4.7 improves 25 pts over Sonnet 3.7 (62 %)
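The advisor‑executor row can be read as a simple routing rule: the small model answers first, and only low‑confidence cases are escalated. The sketch below shows that shape under stated assumptions. The `confidence` score, the `threshold` value, and the function names are hypothetical; the source does not say how the executor decides that a case is hard.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Draft:
    text: str
    confidence: float  # hypothetical score in [0, 1] reported by the executor


def route(
    prompt: str,
    executor: Callable[[str], Draft],
    advisor: Callable[[str, str], str],
    threshold: float = 0.8,  # assumed cut-off, not a published value
) -> Tuple[str, str]:
    """Try the small executor first; escalate to the advisor when unsure."""
    draft = executor(prompt)
    if draft.confidence >= threshold:
        return draft.text, "executor"
    # The advisor sees the original prompt and the executor's draft.
    return advisor(prompt, draft.text), "advisor"


def escalation_rate(routes: Sequence[str]) -> float:
    """Fraction of requests that were handled by the advisor."""
    if not routes:
        raise ValueError("routes must not be empty")
    return sum(1 for r in routes if r == "advisor") / len(routes)
```

Tracking `escalation_rate` over real traffic is what would let a team compare its own workload against the roughly 12 % advisor share quoted in the table.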

Deployment considerations

Network topology – Managed Agents expose a gRPC endpoint behind a VPC‑peered load balancer. Enterprises should place the load balancer in the same region as their compute to keep round‑trip latency under 5 ms.
Credential scoping – Secrets are injected via a short‑lived AWS STS token that expires after 10 minutes. Agents must request a new token before the current one expires; otherwise execution halts.
Checkpointing – Agents automatically snapshot their filesystem every 30 seconds. Snapshots are stored in encrypted S3 buckets; retention is configurable up to 30 days.
Cost model – Billing is per‑agent‑hour (USD 0.12 for the base sandbox) plus token usage for advisor calls (USD 0.0008 per 1 k tokens). The advisor‑executor pattern can reduce total token spend by up to 40 % for mixed‑complexity workloads.
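The cost model above is plain arithmetic, so a short estimator makes it easy to check. This sketch uses only the two rates quoted in the list; the function name and the example workload (100 agent‑hours, 5 million advisor tokens) are made up for illustration, and executor token pricing is left out because the source does not give it.

```python
SANDBOX_USD_PER_AGENT_HOUR = 0.12    # base sandbox rate quoted above
ADVISOR_USD_PER_1K_TOKENS = 0.0008   # advisor token rate quoted above


def estimate_cost(agent_hours: float, advisor_tokens: int) -> float:
    """Estimated bill in USD: sandbox time plus advisor token usage."""
    if agent_hours < 0 or advisor_tokens < 0:
        raise ValueError("inputs must be non-negative")
    sandbox = agent_hours * SANDBOX_USD_PER_AGENT_HOUR
    advisor = (advisor_tokens / 1000) * ADVISOR_USD_PER_1K_TOKENS
    return sandbox + advisor


if __name__ == "__main__":
    # Hypothetical workload: 100 agent-hours and 5,000,000 advisor tokens.
    # Sandbox: 100 * 0.12 = 12.00; advisor: 5000 * 0.0008 = 4.00.
    print(f"{estimate_cost(100, 5_000_000):.2f}")  # prints 16.00
```

In this example the sandbox hours, not the advisor tokens, make up three quarters of the bill, which suggests that agent lifetime is worth measuring alongside token spend.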

Real‑world implications

Infrastructure economics

GitHub’s chief product officer Mario Rodriguez highlighted cache‑hit rate as the primary lever for cost control. With Claude‑generated prompts, a 1 % improvement in cache efficiency translates to millions of dollars saved across billions of API calls. GitHub now targets ≥ 94 % hit rates; a dip below 70 % triggers an automated alert pipeline.
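The two thresholds in that paragraph map naturally onto a three‑way status check. The sketch below is an assumption about how such a check could look, not a description of GitHub's alert pipeline; the status labels and function names are hypothetical.

```python
TARGET_HIT_RATE = 0.94  # target quoted above
ALERT_HIT_RATE = 0.70   # alert floor quoted above


def hit_rate(hits: int, total: int) -> float:
    """Cache hit rate as a fraction of all lookups."""
    if total <= 0:
        raise ValueError("total must be positive")
    return hits / total


def classify(rate: float) -> str:
    """Map a hit rate onto a status label."""
    if rate < ALERT_HIT_RATE:
        return "alert"         # a dip below 70 % triggers the alert
    if rate < TARGET_HIT_RATE:
        return "below_target"  # acceptable, but short of the 94 % goal
    return "on_target"
```

For example, 950 hits out of 1,000 lookups gives a rate of 0.95 and the label `on_target`, while a rate of 0.69 returns `alert`.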

Organizational design

Anthropic’s co‑founder Daniela Amodei emphasized that developers are the primary Claude users. The shift from single‑assistant interactions to teams of agents operating at the organizational level requires new governance models. Anthropic is piloting a “light and shade” policy that enforces safety guardrails while allowing rapid model iteration.

Partner deployments

Bun – Demonstrated a Robobun bot that reproduces every issue, runs regression tests, and opens a PR only when the new test passes. The bot leverages Managed Agents to isolate each test run, reducing flaky failures by 73 %.
Datadog – Introduced a “machine‑tool” pattern where agents emit structured intent specifications instead of ad‑hoc scripts. This approach improves observability, as each intent is logged with a unique identifier that can be traced through Datadog’s APM pipeline.
Vercel – Reported that Opus tokens constitute roughly 25 % of AI Gateway traffic but account for > 70 % of spend. By moving intermediate code generation into sandboxes, Vercel cut the number of required tool approvals by half and simplified its security review process.
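The Datadog item describes agents emitting structured intents with a unique identifier instead of ad‑hoc scripts. A minimal sketch of such a record follows. The field names (`action`, `target`, `params`, `intent_id`) and the example values are hypothetical and do not reflect Datadog's actual schema or APM API.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Intent:
    action: str                                  # what the agent wants to do
    target: str                                  # what it wants to act on
    params: dict = field(default_factory=dict)   # action-specific arguments
    # A fresh identifier per intent so downstream logs can be correlated.
    intent_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def to_log_line(intent: Intent) -> str:
    """Serialize an intent as one JSON log line with stable key order."""
    return json.dumps(asdict(intent), sort_keys=True)


if __name__ == "__main__":
    intent = Intent("restart_service", "checkout-api", {"grace_seconds": 30})
    print(to_log_line(intent))
```

Because the record is data rather than a script, a reviewer or a policy check can inspect the intent before anything runs, and the same `intent_id` can be attached to every log entry the action produces.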

Future roadmap

Anthropic’s capability‑curve framing sets expectations for the next twelve months. The roadmap includes:

Dynamic advisor selection – a meta‑model that routes requests to the most cost‑effective executor based on real‑time token pricing.
Persistent session memory – a managed storage layer that allows agents to retain state across routine invocations without sacrificing isolation.
Fine‑grained policy engine – extensible rule sets that let enterprises define custom safety thresholds per project.

Developers can watch the full recordings on the Anthropic YouTube channel, explore the session pages at claude.com/code‑with‑claude, or register for the upcoming London (May 19) and Tokyo (June 10) events.

Author: Andrew Hoblitzell, Senior Technical Lead, Eli Lilly

#Anthropic #Claude #Managed Agents #LLM #Infrastructure
