Building AI Agents with Microsoft Foundry: A Progressive Lab from Hello World to Self‑Hosted
#AI

Cloud Reporter
6 min read

The Microsoft Foundry Agent Lab walks developers through nine incremental demos that introduce core agent capabilities—tool calling, UI decoupling, server‑side tools, code interpretation, RAG, MCP integration, toolbox governance and self‑hosting—while keeping the architecture simple with a single model‑router deployment and server‑side conversation state.

What changed?

Microsoft Foundry released a structured, open‑source Agent Lab that turns the traditionally chaotic process of building AI agents into a step‑by‑step curriculum. Instead of a monolithic example that mixes retrieval‑augmented generation, tool‑calling, streaming, and UI code, the lab provides nine self‑contained demos, each adding exactly one new primitive. All demos share the same Foundry SDK, a single model‑router deployment, and server‑side conversation management via the Responses API. This approach reduces the on‑ramp friction for engineers and delivers a reusable reference architecture for production‑grade agents.


Provider comparison – why Foundry’s model‑router matters

| Feature | Microsoft Foundry (Model‑Router) | OpenAI Direct Calls | Anthropic / Cohere |
| --- | --- | --- | --- |
| Routing logic | Automatic selection based on task complexity, cost and latency; zero code required | Developer writes custom logic or selects a single model per request | Typically static model selection; custom routing must be built by the user |
| Cost optimisation | Routes cheap factual queries to grok‑4‑1‑fast‑reasoning and reserves frontier models for code or tool‑heavy turns | All calls hit the same model; cost can spike when a heavy model is used for simple queries | Similar to OpenAI; no built‑in cost tiering |
| Latency | Fast models for simple turns keep response times low; heavy turns use more capable but slower models only when needed | Latency dictated by the single model chosen; may be higher than necessary for trivial requests | |
| Complexity | No routing code, only declare the task (e.g., need tool calling) | Must manage model selection manually, increasing boilerplate | |
| Integration | Works natively with Foundry’s Responses API, MCP, Toolbox, and built‑in tools (WebSearch, CodeInterpreter, FileSearch) | Requires separate SDKs or wrappers for each capability | |
| Security | Uses DefaultAzureCredential – no API keys in code, managed identity in production | API keys often stored in env files or secret managers; higher secret‑management overhead | |

The lab’s recorded routing data (see MODEL-ROUTER.md) shows which model the router chose for each demo, with cheaper models handling factual recall and frontier models handling code generation, so developers do not have to maintain their own model‑selection matrix.
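A quick way to reproduce that kind of routing summary for your own traffic is to tally the model name reported on each response. The sketch below assumes that responses served through a router deployment expose the underlying model in their `model` attribute; the model names in the sample data are placeholders, not figures from the lab.

```python
from collections import Counter
from types import SimpleNamespace


def tally_routed_models(responses):
    """Count how many turns each underlying model served.

    Assumption: every response object has a `.model` attribute naming the
    model that actually answered the turn.
    """
    return Counter(r.model for r in responses)


# Stand-in response objects; the model names are placeholders.
sample = [
    SimpleNamespace(model="small-fast-model"),
    SimpleNamespace(model="small-fast-model"),
    SimpleNamespace(model="frontier-model"),
]
counts = tally_routed_models(sample)
for name, turns in counts.most_common():
    print(f"{name}: {turns} turn(s)")
```

Feeding real response objects into `tally_routed_models` gives a per-model turn count that can be compared against a cost budget.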


Business impact – how the progressive demos translate to production decisions

1. Start with the minimum viable agent (Demo 0)

  • Code footprint: < 30 lines using the Foundry SDK.
  • Conversation state: Stored server‑side via the Responses API, removing the classic bug of lost history in multi‑instance deployments.
  • Authentication: DefaultAzureCredential works locally (az login) and in Azure (managed identity) – no secrets to rotate.
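The three points above can be sketched roughly as follows. This is not the lab's own code: it assumes the `azure-identity` and `azure-ai-projects` packages are installed and that `AIProjectClient.get_openai_client()` is available in the installed version, and the helper names `make_openai_client` and `ask` are made up for illustration.

```python
MODEL_DEPLOYMENT = "model-router"  # deployment name used throughout the article


def make_openai_client(project_endpoint):
    """Build an OpenAI-compatible client for a Foundry project.

    Assumption: azure-identity and azure-ai-projects are installed. The
    imports are deferred so the rest of the module loads without them.
    """
    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    project = AIProjectClient(
        endpoint=project_endpoint,
        credential=DefaultAzureCredential(),  # az login locally, managed identity in Azure
    )
    return project.get_openai_client()


def ask(client, conversation_id, text):
    """Send one turn; the history is kept in the server-side conversation."""
    response = client.responses.create(
        model=MODEL_DEPLOYMENT,
        conversation=conversation_id,
        input=text,
    )
    return response.output_text


# Usage (requires network access and a real project endpoint):
#   client = make_openai_client("<your-project-endpoint>")
#   conversation = client.conversations.create()
#   print(ask(client, conversation.id, "Hello"))
```

Because only `conversation.id` is passed on each turn, any instance behind a load balancer can serve the next message without a local history array.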

2. Add function tools only when needed (Demo 1)

  • Control: Function tools run in the client process, letting you enforce custom error handling, rate limits, or compliance checks.
  • Strict schema (strict=True) guarantees well‑formed JSON arguments, reducing production parsing errors.
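To make the client-side execution model concrete, here is a minimal sketch of a strict function tool and a local dispatcher. The tool declaration follows the function-tool shape of the OpenAI Responses API; the `get_weather` tool, its return value, and the `run_function_call` helper are hypothetical and not taken from the lab.

```python
import json

# Tool declaration in the Responses API function-tool shape.
WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",  # hypothetical tool name
    "description": "Return the current temperature for a city.",
    "strict": True,  # arguments must match the schema below
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
        "additionalProperties": False,
    },
}


def get_weather(city):
    """Hypothetical local implementation; a real one would call a weather API."""
    return {"city": city, "temperature_c": 21}


LOCAL_TOOLS = {"get_weather": get_weather}


def run_function_call(name, arguments_json, call_id):
    """Run one requested function locally and build the output item to send back."""
    try:
        args = json.loads(arguments_json)
        output = json.dumps(LOCAL_TOOLS[name](**args))
    except Exception as exc:  # error handling stays under the client's control
        output = json.dumps({"error": str(exc)})
    return {"type": "function_call_output", "call_id": call_id, "output": output}
```

The dispatcher is where rate limits or compliance checks would go, since nothing runs until the client decides to call the local function.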

3. Decouple UI from agent logic (Demo 2)

  • Portability: The same agent can be surfaced via a terminal, a Tkinter desktop app, or later a web UI without changing the agent definition.
  • Team alignment: Front‑end developers focus on UX, back‑end engineers on prompt engineering and tool integration.
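One way to picture the decoupling is to treat the agent as a plain callable and let each front end depend only on that callable. The sketch below is illustrative only; `echo_agent` stands in for the real agent call, and both front-end functions are hypothetical names.

```python
import io


def run_terminal_ui(agent, lines, out):
    """Terminal-style front end: echoes each user line and the agent's reply."""
    for line in lines:
        out.write(f"you> {line}\n")
        out.write(f"agent> {agent(line)}\n")


def collect_replies(agent, lines):
    """A second front end (e.g. a GUI callback) reusing the same agent callable."""
    return [agent(line) for line in lines]


def echo_agent(text):
    """Stand-in for the real agent call; hypothetical."""
    return f"You said: {text}"


buffer = io.StringIO()
run_terminal_ui(echo_agent, ["hello"], buffer)
print(buffer.getvalue(), end="")
```

Swapping `echo_agent` for a function that calls the hosted agent changes neither front end, which is the property the demo is meant to show.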

4. Leverage server‑side built‑in tools (Demos 3‑5)

  • WebSearchTool: Removes the client‑side loop; the model decides when to search and Foundry returns citations.
  • CodeInterpreterTool: Provides a sandboxed Python environment inside Foundry, ideal for data‑analysis or chart generation workloads.
  • FileSearchTool + vector store: Enables Retrieval‑Augmented Generation without managing an external vector DB; the vector store lives in Foundry and persists across sessions.
  • Business benefit: Faster time‑to‑market for RAG‑based support bots or analytics assistants, with lower operational overhead.
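As a rough illustration, server-side tools are declared rather than implemented. The sketch below uses the dict shapes documented for the OpenAI Responses API; it is an assumption that the lab's `WebSearchTool`, `CodeInterpreterTool` and `FileSearchTool` classes map onto these shapes, and the exact type names accepted by a given Foundry deployment may differ. The `builtin_tools` helper is hypothetical.

```python
def builtin_tools(vector_store_id=None):
    """Return declarations for server-side tools.

    Assumption: these dict shapes follow the OpenAI Responses API; check
    the SDK version you have installed before relying on them.
    """
    tools = [
        {"type": "web_search"},
        {"type": "code_interpreter", "container": {"type": "auto"}},
    ]
    if vector_store_id is not None:
        # File search needs the ID of an existing vector store.
        tools.append({"type": "file_search", "vector_store_ids": [vector_store_id]})
    return tools


print(builtin_tools("<your-vector-store-id>"))
```

The resulting list would be passed as the `tools` argument of a response request; no client-side tool loop is needed because the service executes these tools itself.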

5. Adopt Model Context Protocol (MCP) for external system integration (Demos 6‑7)

  • MCP servers expose tools (e.g., GitHub issues) over a standard wire protocol; agents discover and call them without hard‑coding function signatures.
  • Toolbox adds governance: versioned snapshots, central ownership, and permission scoping (allowed_tools).
  • Risk mitigation: Human‑in‑the‑loop approval before side‑effecting calls prevents accidental data changes.
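Permission scoping and the approval step can be sketched as below. The tool declaration and the `mcp_approval_request` / `mcp_approval_response` item shapes follow the OpenAI Responses API as publicly documented; whether the lab's `McpTool` class serialises identically is an assumption, and the helper names, server URL and tool names are hypothetical.

```python
def mcp_tool(server_label, server_url, allowed_tools):
    """MCP tool declaration limited to an explicit list of allowed tools."""
    return {
        "type": "mcp",
        "server_label": server_label,
        "server_url": server_url,
        "allowed_tools": list(allowed_tools),
        "require_approval": "always",  # every call waits for a human decision
    }


def review_approval_requests(output_items, approve):
    """Turn pending approval requests into approval responses.

    `output_items` are response output items as dicts; `approve` is a
    callback (tool_name, arguments) -> bool supplied by the UI.
    """
    replies = []
    for item in output_items:
        if item.get("type") != "mcp_approval_request":
            continue  # ignore messages and other item types
        replies.append({
            "type": "mcp_approval_response",
            "approval_request_id": item["id"],
            "approve": bool(approve(item["name"], item["arguments"])),
        })
    return replies
```

The list returned by `review_approval_requests` would be sent back as input on the next turn, so a side-effecting call only proceeds after an explicit approval.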

6. Self‑host agents when you need full control (Demo 8)

  • Custom inference path: Deploy a Dockerised server that implements the Responses protocol, allowing pre‑ or post‑processing that cannot be expressed in a system prompt.
  • Use cases: Compliance‑driven environments, A/B testing of prompt variants, or orchestrators that need to expose themselves as agents to other orchestrators.
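The kind of logic that motivates self-hosting is deterministic code wrapped around the inference call. The sketch below shows only that wrapping idea, not the Responses protocol or the lab's server; the email-redaction rule, the footer text and the `handle_turn` helper are hypothetical examples.

```python
import re

# Simple pattern for illustration; not a complete email matcher.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(text):
    """Pre-processing step: mask email addresses before they reach the model."""
    return EMAIL.sub("[redacted-email]", text)


def handle_turn(user_text, call_model, footer="(reviewed by policy filter)"):
    """Wrap an inference call with deterministic pre- and post-processing."""
    cleaned = redact(user_text)      # runs before the model sees the input
    answer = call_model(cleaned)     # any callable that returns the model's text
    return f"{answer}\n{footer}"     # runs after the model has answered
```

Unlike a system prompt, both steps execute unconditionally, which is the guarantee compliance-driven environments usually need.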

Migration considerations

| Migration Step | What to change | Impact on cost / ops |
| --- | --- | --- |
| From local API keys to DefaultAzureCredential | Replace openai.api_key with DefaultAzureCredential and configure managed identity on Azure resources. | Reduces secret‑management burden; slight increase in Azure AD token acquisition latency (negligible). |
| From client‑side tool execution to built‑in tools | Swap FunctionTool definitions with WebSearchTool, CodeInterpreterTool, or FileSearchTool. Remove the tool‑calling loop in client code. | Cuts compute cost for tool execution (Foundry runs them on shared infrastructure) and simplifies error handling. |
| From single‑model calls to Model‑Router | Set model=MODEL_DEPLOYMENT where MODEL_DEPLOYMENT = model-router. Remove any per‑request model selection logic. | Optimises spend automatically; latency improves for simple queries. |
| From ad‑hoc conversation storage to Responses API | Use openai.conversations.create() and pass conversation.id on each turn. Delete any local history arrays. | Enables horizontal scaling; eliminates state‑sync bugs. |
| From monolithic agent to toolbox‑governed tools | Register a Toolbox resource, pin agents to a specific version, and use McpTool(server_label="toolbox", ...). | Centralises tool updates, reduces deployment friction across multiple agents. |

Next steps for teams

  1. Clone the repo: git clone https://github.com/microsoft-foundry/Foundry-Agent-Lab.git
  2. Run the hello demo – validates credentials, project endpoint and the Responses API.
  3. Iterate through demos 1‑8 – add the next primitive only when your product requirement demands it.
  4. Evaluate model‑router logs: MODEL-ROUTER.md shows which model was chosen; use this data to set cost budgets.
  5. Plan production deployment
    • Use managed identity for authentication.
    • Store vector‑store IDs and toolbox version numbers in Azure Key Vault.
    • Enable human‑in‑the‑loop approvals for any side‑effecting MCP tools.
  6. Contribute – the lab is MIT‑licensed; submit improvements or new demos via GitHub Issues.

Bottom line

The Microsoft Foundry Agent Lab demonstrates that a production‑ready AI agent can be built with under 100 lines of code, a single model‑router deployment, and server‑side state management. By progressing through the nine demos, engineers gain a clear mental model of where to place tool logic, how to govern tool access, and when to take ownership of the inference path. The result is faster delivery, lower operational risk, and a cost‑optimised stack that scales from a simple “hello‑world” bot to a self‑hosted, multi‑tool orchestrator.
