At Build 2026, Microsoft repositioned Foundry from a model catalog into a full agent runtime, adding sandboxed hosted sessions, a managed tool registry, procedural memory with measurable benchmark gains, and an SLA-backed retrieval layer. The interesting part is not the feature list but the infrastructure underneath: durable state, scheduled execution, and framework-agnostic tracing built to survive production traffic.
Microsoft used Build 2026 in San Francisco to reframe what Microsoft Foundry actually is. In a blog post, Nick Brady describes the platform as "the place where AI agents move from experiments to production systems," and the release reflects that ambition. Instead of shipping a fresh batch of model endpoints, Microsoft added the parts that production systems demand: a runtime, a tool layer, memory, grounding, observability, and governance. For anyone who has tried to take an agent prototype past the demo stage, that list reads like a punch list of everything that breaks first.

Microsoft Foundry is positioned as Microsoft's "AI app and agent factory," a unified Azure platform for building, grounding, and governing agents with shared observability and policy across every agent in an organization. The documentation emphasizes native integration with Azure services, Microsoft 365 data sources, and open protocols for tools and frameworks. That last point matters more than it sounds, because the recurring failure mode in agent platforms is lock-in at the tooling and observability layer, where switching frameworks means rebuilding your traces from scratch.
The hosted runtime
The centerpiece is the hosted agent capability in Foundry Agent Service. Each agent runs inside a managed, sandboxed session with its own state, filesystem access, and support for multiple frameworks. The runtime exposes two interfaces: a stateful Responses API for conversations that need to carry context, and a lighter-weight invocations protocol for passthrough calls where you do not want the overhead of session management. That split is a sensible piece of API design. Not every agent call needs durable state, and forcing a stateful contract onto a simple function-style invocation wastes resources on session bookkeeping.
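To make the split concrete, here is a minimal sketch of the two call shapes in plain Python. Every name in it (`SessionStore`, `stateful_respond`, `invoke`, `echo_agent`) is hypothetical and chosen for illustration; none of it is the Foundry Agent Service API, and the in-memory dictionary only stands in for whatever the platform actually persists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
import uuid

# Hypothetical names throughout: this is NOT the Foundry Agent Service API.
Handler = Callable[[List[str]], str]


def echo_agent(history: List[str]) -> str:
    """Toy agent: reports how many turns it can see and echoes the latest one."""
    return f"[{len(history)} turn(s) visible] {history[-1]}"


@dataclass
class SessionStore:
    """In-memory stand-in for platform-managed session state."""
    sessions: Dict[str, List[str]] = field(default_factory=dict)

    def create(self) -> str:
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = []
        return session_id


def stateful_respond(store: SessionStore, session_id: str,
                     message: str, agent: Handler) -> str:
    """Stateful path: append to the session history so later turns keep context."""
    history = store.sessions[session_id]
    history.append(message)
    return agent(history)


def invoke(message: str, agent: Handler) -> str:
    """Passthrough path: no session lookup, no history, nothing persisted."""
    return agent([message])


store = SessionStore()
sid = store.create()
first = stateful_respond(store, sid, "open ticket 12", echo_agent)
second = stateful_respond(store, sid, "assign it to me", echo_agent)
one_off = invoke("summarize this text", echo_agent)
```

The second stateful call sees both turns, while the passthrough call sees only its own input and leaves nothing behind to clean up, which is the trade-off the two interfaces expose.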
The same runtime backs long-running agents such as OpenClaw and Hermes, with durable state and files that survive across invocations. On top of that sit routines, currently in public preview, which run agents on a schedule. The examples Microsoft gives are overnight ticket triage and daily reporting, exactly the kind of unglamorous batch work that benefits from durable execution. An agent that has to resume cleanly after a restart needs its filesystem and conversation state checkpointed somewhere reliable, and pushing that into the platform spares teams from reimplementing checkpointing badly.
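The checkpoint-and-resume pattern behind that claim can be sketched in a few lines. This is an illustration under stated assumptions, not how Foundry stores state: the JSON file, the `triage` function, and the `crash_after` switch that simulates a restart are all invented for the example.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import List, Optional


def load_checkpoint(path: Path) -> dict:
    """Return saved progress, or a fresh state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"next_index": 0, "triaged": []}


def save_checkpoint(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    os.replace(tmp, path)


def triage(tickets: List[str], path: Path,
           crash_after: Optional[int] = None) -> dict:
    """Process tickets in order, checkpointing after each one."""
    state = load_checkpoint(path)
    processed_this_run = 0
    for i in range(state["next_index"], len(tickets)):
        if crash_after is not None and processed_this_run == crash_after:
            raise RuntimeError("simulated restart")  # hypothetical failure
        state["triaged"].append(f"{tickets[i]}: routed")
        state["next_index"] = i + 1
        save_checkpoint(path, state)
        processed_this_run += 1
    return state


checkpoint = Path(tempfile.mkdtemp()) / "triage.json"
tickets = ["T-1", "T-2", "T-3", "T-4"]
try:
    triage(tickets, checkpoint, crash_after=2)  # dies after two tickets
except RuntimeError:
    pass
final_state = triage(tickets, checkpoint)  # resumes at the third ticket
```

The second run picks up at the third ticket instead of reprocessing the first two. Getting the write-then-rename step and the resume index right is the kind of detail that is easy to get subtly wrong in a hand-rolled version.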
These additions build on the Azure AI Foundry Agent Service general availability release that InfoQ covered in 2025, which introduced multi-agent orchestration, agent-to-agent APIs, and support for frameworks including Semantic Kernel, AutoGen, and CrewAI. The trajectory is consistent: start with orchestration primitives, then add the execution substrate underneath them.
Toolboxes and distribution
The second area is tooling. Toolboxes in Foundry, now in public preview, give agents a single managed endpoint for tools, skills, Model Context Protocol (MCP) clients, and enterprise data integrations. The design principle is "register once, discover at runtime" rather than wiring every tool into every agent. Skills become versioned, project-scoped artifacts that can be exposed over MCP, and a tool search step helps the platform select a small relevant subset for each task instead of dumping the entire catalog into the model's context.
That tool-search detail is worth dwelling on. Exposing dozens of tools to a model tends to degrade selection accuracy and burns tokens on schema definitions the model never uses. Narrowing the candidate set before the model sees it is both a cost optimization and a reliability improvement, and it mirrors what teams have been hand-building with retrieval over tool descriptions. Folding it into the platform makes it a default rather than an afternoon hack.
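The hand-built version of this is small enough to show. The sketch below ranks tools by word overlap between the task and each tool description and keeps the top few. The registry contents and the `select_tools` helper are made up for illustration, and the scoring is a stand-in: Microsoft has not described how Foundry's tool search ranks candidates, and a production system would more plausibly use embeddings than keyword overlap.

```python
import re
from typing import Dict, List, Set

# Hypothetical registry: tool names and descriptions are invented.
TOOL_REGISTRY: Dict[str, str] = {
    "create_ticket": "create a new support ticket in the helpdesk system",
    "search_tickets": "search existing support tickets by keyword or status",
    "send_email": "send an email message to a recipient",
    "query_sales_db": "run a sql query against the sales database",
    "get_weather": "get the current weather forecast for a city",
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def select_tools(task: str, registry: Dict[str, str], k: int = 2) -> List[str]:
    """Return up to k tool names whose descriptions share words with the task."""
    task_tokens = tokenize(task)
    scored = []
    for name, description in registry.items():
        overlap = len(task_tokens & tokenize(description))
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken by name so the result is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


shortlist = select_tools("search support tickets with status open", TOOL_REGISTRY)
# shortlist == ["search_tickets", "create_ticket"]
```

Only the two shortlisted schemas would then be placed in the model's context, which is where the token savings and the smaller selection problem come from.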
Microsoft is also adding direct publishing from Foundry into Microsoft Teams and Microsoft 365 Copilot, with general availability planned for June 2026. Agents built in Foundry can surface where employees already work, with identity, permissions, and policy applied automatically. The governance angle is the point here: an agent that inherits Entra identity and existing access policy does not become a new permission-bypass surface.

Memory as a platform function
Foundry treats memory as a property of the platform rather than something each application reimplements. Memory in Foundry Agent Service, which entered public preview at the end of 2025, now offers procedural, user, and session memory. The new arrival at Build is procedural memory, designed to help agents learn how to carry out work across runs rather than just recalling what was said.
"Procedural memory helps agents learn how to do the work across runs, not just what was said, with early Tau bench results showing 7 to 14 percent absolute success rate gains at near baseline cost."
--Nick Brady
A gain of 7 to 14 percentage points on Tau-bench at near baseline cost is a meaningful number, particularly because it is framed as absolute rather than relative: on a hypothetical 50 percent baseline, that would mean 57 to 64 percent success, not 53.5 to 57 percent. The mechanism, per InfoQ's earlier coverage, has the service extract key facts and procedures from conversations, consolidate them, and retrieve them through a managed store scoped by identifiers such as Entra ID, with controls for retention and inspection. Scoping memory by identity and making retention inspectable addresses the obvious objection to persistent agent memory, which is that it quietly accumulates data nobody audits.
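A toy version of an identity-scoped store shows why those controls matter. Everything here is an assumption made for illustration: the `ScopedMemoryStore` class, the plain string scope IDs standing in for Entra identifiers, exact-match deduplication standing in for consolidation, and timestamps passed in by hand so the behavior is reproducible.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MemoryItem:
    text: str
    created_at: float  # seconds on an arbitrary clock supplied by the caller


class ScopedMemoryStore:
    """Hypothetical store: memories are partitioned by scope ID and expire."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._items: Dict[str, List[MemoryItem]] = {}

    def remember(self, scope_id: str, text: str, now: float) -> None:
        """Store a fact for one scope, skipping exact duplicates."""
        items = self._items.setdefault(scope_id, [])
        if all(item.text != text for item in items):
            items.append(MemoryItem(text, now))

    def recall(self, scope_id: str, now: float) -> List[str]:
        """Drop expired items, then return what remains for this scope only."""
        kept = [item for item in self._items.get(scope_id, [])
                if now - item.created_at <= self.retention_seconds]
        self._items[scope_id] = kept
        return [item.text for item in kept]

    def inspect(self, scope_id: str) -> List[MemoryItem]:
        """Expose the raw stored items so they can be audited."""
        return list(self._items.get(scope_id, []))


memory = ScopedMemoryStore(retention_seconds=3600)
memory.remember("user-a", "prefers CSV exports", now=0)
memory.remember("user-a", "prefers CSV exports", now=10)  # duplicate, ignored
memory.remember("user-a", "escalate billing issues to tier 2", now=1800)
memory.remember("user-b", "works in the Berlin office", now=20)

recent = memory.recall("user-a", now=3000)  # both facts still retained
later = memory.recall("user-a", now=4000)   # the first fact has expired
other = memory.recall("user-b", now=3000)   # never sees user-a's facts
```

Because every read goes through a scope ID and a retention check, one user's facts never surface in another user's recall, and stale items age out without anyone having to remember to delete them.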
Grounding through Foundry IQ
Grounding and retrieval are handled by Foundry IQ, presented as a knowledge layer that unifies Work IQ, Fabric IQ, Azure SQL, file search, and other sources behind a single SLA-backed retrieval endpoint. At Build, Microsoft announced Foundry IQ Serverless in public preview, multi-source knowledge bases in general availability, and Microsoft Web IQ for live web grounding with sub-200-millisecond responses and zero-data-retention guarantees, alongside security features for encryption, permission synchronization, and sensitivity-label governance.
Sub-200-millisecond live web grounding with no data retention is a specific engineering claim, and the latency target tells you something about the intended use. At that budget, web retrieval can sit inside an interactive agent loop without the user feeling the round trip. The permission-synchronization and sensitivity-label pieces are what make this usable in regulated environments, where retrieval that ignores document-level access control is a non-starter.
In a separate deep dive, Satyanarayana Padidapu describes Microsoft IQ, covering Work IQ, Fabric IQ, and Foundry IQ, as an intelligence layer meant to reduce duplicated retrieval-augmented-generation pipelines and make grounding a shared service across Copilot Studio, Microsoft 365, and Foundry agents. Consolidating RAG into one governed service rather than letting every team stand up its own pipeline is the kind of platform decision that pays off in maintenance cost more than in any single benchmark.
Models and compute
The model catalog gains four first-party MAI models in public preview: MAI Thinking 1 for chat and reasoning, MAI Image 2.5 for image generation and editing, MAI Transcribe 2 for speech-to-text with diarization, and MAI Voice 2 for multilingual text-to-speech with voice cloning. Fireworks AI on Foundry is now generally available, providing access to open models through a single Azure endpoint with enterprise SLAs, support for custom-weight models, and integration with Foundry's access controls and logging.
Vesa Nopanen's analysis of Claude Opus on Foundry calls the pattern a meaningful step forward for organizations that want frontier models under Azure governance, citing day-zero access, low latency, and integration with Foundry IQ and Work IQ for grounded agents. Managed Compute in Foundry Models targets the practical problem of regional GPU scarcity by routing workloads around capacity constraints, and supports fine-tuning and "Frontier Tuning," which Microsoft claims is significantly more cost-efficient than calling GPT 5.5 directly for tasks such as technical documentation generation. The GPU-routing claim is the one to watch in practice, since regional capacity is the constraint that most often turns a working deployment into a queue.
Observability without lock-in
The governance story closes with tracing and evaluations that work across frameworks.
"Tracing and evaluations for any agent framework mean no team has to choose between its stack and observability. You can keep LangChain, Semantic Kernel, or your own code, and still get production grade traces and evaluations in Foundry."
--Nick Brady
This is the right place to draw the line. Forcing teams onto a proprietary framework to get traces is how platforms lose adoption, and decoupling observability from the orchestration framework lets Foundry instrument LangChain, Semantic Kernel, or hand-rolled code equally.
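The decoupling is easier to see in code. The sketch below wraps arbitrary callables in a tracing decorator that knows nothing about the framework that produced them. The `traced` decorator, the `TRACE_LOG` list, and the two toy steps are hypothetical, and an in-process list stands in for a real trace backend; this is not Foundry's tracing SDK.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-process sink; a real system would export spans elsewhere.
TRACE_LOG: List[Dict[str, Any]] = []


def traced(span_name: str) -> Callable:
    """Record name, status, and duration for any callable, whatever built it."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # tracing must not swallow the failure
            finally:
                TRACE_LOG.append({
                    "span": span_name,
                    "status": status,
                    "duration_s": time.perf_counter() - start,
                })
        return wrapper
    return decorator


@traced("plan")
def plan_step(goal: str) -> List[str]:
    """Stand-in for a planning step from any orchestration framework."""
    return [f"look up {goal}", f"summarize {goal}"]


@traced("tool_call")
def flaky_tool(query: str) -> str:
    """Stand-in for a tool call that fails."""
    raise TimeoutError(f"no response for {query!r}")


steps = plan_step("ticket backlog")
try:
    flaky_tool("ticket backlog")
except TimeoutError:
    pass

span_summary = [(span["span"], span["status"]) for span in TRACE_LOG]
# span_summary == [("plan", "ok"), ("tool_call", "error")]
```

The instrumentation sits at the function boundary, so swapping the orchestration layer underneath leaves the trace shape unchanged.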
The surrounding ecosystem positions Foundry as the code-first tier. Szymon Bochniak's comparison frames Microsoft 365 Copilot Agent Builder and Copilot Studio as the visual, low-code options, with Foundry reserved for teams needing custom logic, advanced retrieval, and deeper developer-workflow integration. Microsoft's secure-agent guidance pushes teams to map where agents already touch build, test, and release, then apply the same discipline they use for microservices: clear scope, policy, tracing, and continuous evaluation. A Build 2026 recap from a DevOps perspective put it plainly, calling this the point where Foundry feels like a real production platform for agents rather than a place to wire up demos.
That framing is the honest one. None of these features is individually exotic, but durable runtime, a tool registry with search, identity-scoped memory, SLA-backed retrieval, and framework-neutral tracing are precisely the components that separate a demo from a system that survives contact with production traffic. Further information is available on Microsoft's Foundry site.
