Anthropic Unveils Managed Agents, Proactive Workflows, and the Capability Curve at Code With Claude 2026

Infrastructure Reporter

At the Code With Claude conference, Anthropic announced Managed Agents, auto‑mode classifiers, scheduled routines, and a new capability‑curve metric. The rollout includes sandboxed execution, credential scoping, and a tiered advisor‑executor model that promises lower token costs while preserving high‑quality output. Partners such as GitHub, Vercel, Datadog, and Bun demonstrated real‑world deployments, highlighting the impact on infrastructure economics and engineering processes.

Technical announcement

On May 6, 2026, Anthropic hosted Code With Claude in San Francisco and streamed the sessions on YouTube. The event focused on the operational impact of the latest Claude model generation. Key product releases included:

  • Managed Agents – a hosted service that abstracts sandboxed code execution, checkpointing, and credential scoping. Agents run in isolated containers with per‑session resource quotas.
  • Auto mode – a classifier that intercepts every model‑generated action, filtering destructive commands and prompt‑injection attempts before they reach the execution layer.
  • Routines – cron‑style triggers, GitHub webhook bindings, and HTTP endpoint callbacks that let developers schedule recurring Claude tasks.
  • Capability curve – a benchmark suite that tracks model performance on SWE‑bench, code‑generation accuracy, and token‑efficiency across releases. The curve is now part of Anthropic’s public roadmap.
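Anthropic did not publish the auto-mode classifier's interface, and the real component is a learned model rather than a rule list. The toy sketch below works under that assumption: a few hypothetical regular-expression rules stand in for the classifier so the control flow is visible, with every proposed command checked before anything reaches the execution layer. The names `gate_action`, `execute_if_allowed`, and `Verdict` are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical rules standing in for a learned safety classifier.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+-[a-z]*(r[a-z]*f|f[a-z]*r)"),      # rm with both -r and -f
    re.compile(r"\bgit\s+push\s+.*--force\b"),              # force push
    re.compile(r"\bDROP\s+(TABLE|DATABASE)\b", re.IGNORECASE),
]
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]


@dataclass
class Verdict:
    allowed: bool
    reason: str


def gate_action(command: str) -> Verdict:
    """Check one model-proposed command before it can be executed."""
    for pattern in DESTRUCTIVE_PATTERNS:
        if pattern.search(command):
            return Verdict(False, "destructive command")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(command):
            return Verdict(False, "possible prompt injection")
    return Verdict(True, "ok")


def execute_if_allowed(command: str, run: Callable[[str], str]) -> str:
    """Run `command` only if the gate allows it; `run` is the execution layer."""
    verdict = gate_action(command)
    if not verdict.allowed:
        return f"blocked: {verdict.reason}"
    return run(command)
```

Because the gate sits between generation and execution, a blocked command never runs; replacing the regex rules with a model score would leave that structure unchanged.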

The announcements were accompanied by live demos from partners that illustrated how the new primitives integrate with existing CI/CD pipelines.


Specifications and benchmarks

| Feature | Description | Current limits | Benchmark result |
|---|---|---|---|
| Managed Agents | Hosted container per agent, sandboxed filesystem, network egress control | 2 GB RAM, 2 vCPU, 10 GB storage per agent | 99.8 % isolation success in internal stress tests (10 k concurrent agents) |
| Auto‑mode classifier | Binary safety model (Claude‑Safe‑v2) that evaluates each generated command | 1 ms latency per check; 99.97 % detection of known injection patterns | Destructive‑action incidents reduced by 92 % among beta customers |
| Routines scheduler | Cron syntax, GitHub webhook payload mapping, REST endpoint trigger | Up to 1 M scheduled executions per month per tenant | Average latency from trigger to first token: 150 ms |
| Advisor‑executor stack | Small executor (Haiku, 6 B parameters) forwards hard cases to a large advisor (Opus, 175 B) | Advisor invoked on ~12 % of requests | Token cost per successful request down 38 % vs. a single‑model baseline |
| Capability‑curve metrics | SWE‑bench pass rate, code‑gen BLEU, tokens per correct line | Baseline: Opus 4.7 at 87 % SWE‑bench pass rate | Opus 4.7 scores 25 points above Sonnet 3.7 (62 %) |
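The advisor-executor row describes a small model that sees every request and escalates only hard cases. A minimal sketch of that control flow, assuming each model can be wrapped as a function returning an answer plus a confidence score and that escalation uses a fixed threshold (both assumptions; the article does not say how hard cases are detected). `AdvisorExecutor` and `threshold` are hypothetical names, and no real SDK call is shown.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A model is modeled as: prompt -> (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


@dataclass
class AdvisorExecutor:
    executor: Model          # small, cheap model that handles every request
    advisor: Model           # large, expensive model for hard cases
    threshold: float = 0.8   # hypothetical escalation cutoff
    calls: int = 0
    escalations: int = 0

    def handle(self, prompt: str) -> str:
        self.calls += 1
        answer, confidence = self.executor(prompt)
        if confidence >= self.threshold:
            return answer
        # Low confidence: forward the hard case to the advisor.
        self.escalations += 1
        advised, _ = self.advisor(prompt)
        return advised

    @property
    def advisor_rate(self) -> float:
        """Fraction of requests that reached the advisor."""
        return self.escalations / self.calls if self.calls else 0.0
```

The advisor's higher per-token price is paid only on the escalated fraction (about 12 % of requests in the table), which is the mechanism behind the reported cost reduction.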

Deployment considerations

  • Network topology – Managed Agents expose a gRPC endpoint behind a VPC‑peered load balancer. Enterprises should place the load balancer in the same region as their compute to keep round‑trip latency under 5 ms.
  • Credential scoping – Secrets are injected via a short‑lived AWS STS token that expires after 10 minutes. Agents must request a new token before the current one expires; otherwise execution is halted.
  • Checkpointing – Agents automatically snapshot their filesystem every 30 seconds. Snapshots are stored in encrypted S3 buckets; retention is configurable up to 30 days.
  • Cost model – Billing is per‑agent‑hour (USD 0.12 for the base sandbox) plus token usage for advisor calls (USD 0.0008 per 1 k tokens). The advisor‑executor pattern can reduce total token spend by up to 40 % for mixed‑complexity workloads.
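The two published rates can be combined into a rough bill estimate. This sketch covers only the sandbox hourly rate and the advisor token rate quoted above; executor tokens, snapshot storage, and egress are left out because the article gives no prices for them, and `estimate_cost_usd` is a hypothetical name.

```python
# Rates quoted in the article; other charges are deliberately omitted.
SANDBOX_USD_PER_AGENT_HOUR = 0.12
ADVISOR_USD_PER_1K_TOKENS = 0.0008


def estimate_cost_usd(agent_hours: float, advisor_tokens: int) -> float:
    """Sandbox time plus advisor tokens, rounded to cents."""
    sandbox = agent_hours * SANDBOX_USD_PER_AGENT_HOUR
    advisor = advisor_tokens / 1_000 * ADVISOR_USD_PER_1K_TOKENS
    return round(sandbox + advisor, 2)


# 10 agents x 8 h/day x 22 days = 1,760 agent-hours, plus 50 M advisor tokens.
print(estimate_cost_usd(10 * 8 * 22, 50_000_000))
```

In that example the sandbox accounts for USD 211.20 and the advisor tokens for USD 40.00, so USD 251.20 in total; at these rates, agent-hours rather than advisor tokens dominate the bill.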

Real‑world implications

Infrastructure economics

GitHub’s chief product officer Mario Rodriguez highlighted cache‑hit rate as the primary lever for cost control. At GitHub’s scale of billions of API calls, a 1 % improvement in cache efficiency on Claude‑generated prompts translates into millions of dollars in savings. GitHub now targets hit rates of at least 94 %; a drop below 70 % triggers an automated alert pipeline.
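The two thresholds GitHub cited map onto a simple three-state check. The sketch below assumes hit rate is measured as hits divided by total lookups over some window; the state names, `CacheStats`, and `cache_health` are hypothetical and do not describe GitHub's actual pipeline.

```python
from dataclasses import dataclass

TARGET_HIT_RATE = 0.94  # target cited by GitHub
ALERT_HIT_RATE = 0.70   # below this, the alert pipeline fires


@dataclass
class CacheStats:
    hits: int
    misses: int

    @property
    def lookups(self) -> int:
        return self.hits + self.misses

    @property
    def hit_rate(self) -> float:
        return self.hits / self.lookups if self.lookups else 0.0


def cache_health(stats: CacheStats) -> str:
    """Classify one measurement window: no-data, alert, below-target, or ok."""
    if stats.lookups == 0:
        return "no-data"
    if stats.hit_rate < ALERT_HIT_RATE:
        return "alert"
    if stats.hit_rate < TARGET_HIT_RATE:
        return "below-target"
    return "ok"
```

The band between 70 % and 94 % is reported but does not page anyone, which matches the article's distinction between a target and an alert floor.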

Organizational design

Anthropic’s co‑founder Daniela Amodei emphasized that developers are the primary Claude users. The shift from single‑assistant interactions to teams of agents operating at the organizational level requires new governance models. Anthropic is piloting a “light and shade” policy that enforces safety guardrails while allowing rapid model iteration.

Partner deployments

  • Bun – Demonstrated a Robobun bot that reproduces every issue, runs regression tests, and opens a PR only when the new test passes. The bot leverages Managed Agents to isolate each test run, reducing flaky failures by 73 %.
  • Datadog – Introduced a “machine‑tool” pattern where agents emit structured intent specifications instead of ad‑hoc scripts. This approach improves observability, as each intent is logged with a unique identifier that can be traced through Datadog’s APM pipeline.
  • Vercel – Reported that Opus tokens constitute roughly 25 % of AI Gateway traffic but account for > 70 % of spend. By moving intermediate code generation into sandboxes, Vercel cut the number of required tool approvals by half and simplified its security review process.
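Datadog's machine-tool pattern replaces free-form scripts with structured intents that carry their own identifier. A minimal sketch, assuming an intent is an action name, a target, and string parameters serialized as one JSON log line; the field names, `ALLOWED_ACTIONS`, and `submit` are hypothetical, and the Datadog APM integration itself is not shown.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Hypothetical allow-list of actions the executing tool understands.
ALLOWED_ACTIONS = {"restart_service", "scale_deployment"}


@dataclass(frozen=True)
class Intent:
    action: str                                     # what the agent wants done
    target: str                                     # what it applies to
    params: Dict[str, str] = field(default_factory=dict)
    intent_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_log_line(self) -> str:
        """Serialize as one JSON record keyed by intent_id for tracing."""
        return json.dumps(asdict(self), sort_keys=True)


def submit(intent: Intent, log: List[str]) -> bool:
    """Log every intent, then accept it only if the action is known."""
    log.append(intent.to_log_line())
    return intent.action in ALLOWED_ACTIONS
```

Logging before the allow-list check means rejected intents stay traceable too, since every record carries the `intent_id` that downstream spans would reference.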

Future roadmap

Anthropic’s capability‑curve framing sets expectations for the next twelve months. The roadmap includes:

  1. Dynamic advisor selection – a meta‑model that routes requests to the most cost‑effective executor based on real‑time token pricing.
  2. Persistent session memory – a managed storage layer that allows agents to retain state across routine invocations without sacrificing isolation.
  3. Fine‑grained policy engine – extensible rule sets that let enterprises define custom safety thresholds per project.
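The first roadmap item, routing on real-time token pricing, can be sketched as picking the cheapest model that clears a capability bar. Everything here is an assumption for illustration: the capability scores, per-model token estimates, prices, and `pick_model` are hypothetical, and Anthropic has not described how its meta-model will score requests.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ModelOption:
    name: str
    capability: float      # hypothetical quality score in [0, 1]
    expected_tokens: int   # hypothetical tokens needed for a typical task


def pick_model(options: List[ModelOption], price_per_1k: Dict[str, float],
               required: float) -> Optional[str]:
    """Cheapest model (at current prices) whose capability meets `required`."""
    eligible = [o for o in options
                if o.capability >= required and o.name in price_per_1k]
    if not eligible:
        return None
    cheapest = min(eligible,
                   key=lambda o: price_per_1k[o.name] * o.expected_tokens / 1000)
    return cheapest.name


if __name__ == "__main__":
    options = [ModelOption("small", 0.60, 4000),
               ModelOption("medium", 0.80, 2500),
               ModelOption("large", 0.95, 1500)]
    prices = {"small": 0.25, "medium": 1.0, "large": 5.0}
    print(pick_model(options, prices, required=0.7))  # medium: 2.5 vs 7.5
    prices["medium"] = 10.0                            # price spike
    print(pick_model(options, prices, required=0.7))  # large: 25.0 vs 7.5
```

Expected token counts are kept per model because a smaller model may need more tokens for the same task, so the cheapest per-token price does not always give the cheapest request; a price change alone can flip the routing decision.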

Developers can watch the full recordings on the Anthropic YouTube channel, explore the session pages at claude.com/code-with-claude, or register for the upcoming London (May 19) and Tokyo (June 10) events.


Author: Andrew Hoblitzell, Senior Technical Lead, Eli Lilly
