Compliance Academy turns Microsoft Foundry Agent Service from a typical agent demo into a practical test of how enterprises might teach policy, prove grounding, and compare agent platforms.
What changed
Microsoft Reactor's June 10, 2026, Reasoning Agents Live Streaming Battle featured Compliance Academy, an open-source multi-agent cyber mystery built on Microsoft Foundry Agent Service, Azure OpenAI, and Azure AI Search. The project, available on GitHub, uses a fictional biotech breach at Helix Dynamics to teach compliance through investigation rather than a static quiz.

The important change is not that someone wrapped a game around a chatbot. The more useful signal is architectural. Compliance Academy separates work across specialized agents: a Game Master for scene flow, a Forensic Analyst for evidence reasoning, a Compliance Officer for policy-grounded verdicts, a Scenario Generator for new cases, and suspect agents with their own personas and disclosure rules. That is the difference between a demo that answers questions and a system that can stage a process.
For cloud buyers, this matters because agent platforms are now being judged on operational qualities: routing, retrieval, observability, identity, cost control, and migration paths. A training game is a good test case because it needs narrative flexibility, but it also needs policy accuracy. If an agent invents a SOC 2 or HIPAA citation, the experience becomes a liability. Compliance Academy addresses that by grounding the Forensic Analyst and Compliance Officer against an Azure AI Search index containing policy and control documents.
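The role split and the grounding rule described above can be pictured as a small registry. This is a minimal sketch, not the project's code: the `Role`, `AgentSpec`, `INTENT_TO_ROLE`, and `route` names are hypothetical, the intent strings are invented for illustration, and the `grounded` flag is set only for the two agents the article says are grounded against the policy index.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    GAME_MASTER = "game_master"
    FORENSIC_ANALYST = "forensic_analyst"
    COMPLIANCE_OFFICER = "compliance_officer"
    SCENARIO_GENERATOR = "scenario_generator"
    SUSPECT = "suspect"


@dataclass(frozen=True)
class AgentSpec:
    role: Role
    purpose: str
    grounded: bool  # True when answers must be backed by the policy index


# Hypothetical registry; purposes paraphrase the article, not the repository.
AGENTS = {
    spec.role: spec
    for spec in (
        AgentSpec(Role.GAME_MASTER, "scene flow", grounded=False),
        AgentSpec(Role.FORENSIC_ANALYST, "evidence reasoning", grounded=True),
        AgentSpec(Role.COMPLIANCE_OFFICER, "policy-grounded verdicts", grounded=True),
        AgentSpec(Role.SCENARIO_GENERATOR, "new cases", grounded=False),
        AgentSpec(Role.SUSPECT, "persona with disclosure rules", grounded=False),
    )
}

# Invented intent labels, used only to show routing by purpose.
INTENT_TO_ROLE = {
    "advance_scene": Role.GAME_MASTER,
    "analyze_evidence": Role.FORENSIC_ANALYST,
    "request_verdict": Role.COMPLIANCE_OFFICER,
    "new_case": Role.SCENARIO_GENERATOR,
    "interrogate": Role.SUSPECT,
}


def route(intent: str) -> AgentSpec:
    """Return the agent responsible for an intent, or fail loudly."""
    try:
        return AGENTS[INTENT_TO_ROLE[intent]]
    except KeyError:
        raise ValueError(f"unknown intent: {intent!r}") from None
```

Keeping the grounding requirement on the agent definition, rather than inside a prompt, makes it something a reviewer can check without reading model output.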
Microsoft's own Azure AI Search agentic retrieval documentation describes a pipeline that can break complex questions into subqueries, run retrieval in parallel, rerank results, and return grounding data plus source references. In Compliance Academy, that maps cleanly to the learning experience: the learner asks messy investigative questions, agents retrieve policy snippets, and the UI shows the sources being used.
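To make the shape of that pipeline concrete, here is a toy sketch in plain Python. It does not call Azure AI Search: the corpus is made-up sample text, `plan_subqueries` is a naive stand-in for model-driven query planning, and scoring is simple word overlap. Every name here is hypothetical. It only illustrates the four stages named above: split the question, retrieve in parallel, rerank, and return grounding text with source references.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str
    text: str


# Made-up sample policy text; a real system would query a search index.
CORPUS = [
    Chunk("access-policy.md", "mfa exceptions require written approval from the security lead"),
    Chunk("vendor-policy.md", "vendor credentials are rotated every 90 days"),
    Chunk("retention-policy.md", "audit logs are retained for one year"),
]


def plan_subqueries(question: str) -> list[str]:
    """Naive stand-in for query planning: split on ' and '."""
    parts = [part.strip() for part in question.lower().split(" and ")]
    return [part for part in parts if part]


def search(subquery: str) -> list[tuple[float, Chunk]]:
    """Score each chunk by the fraction of subquery words it contains."""
    terms = set(subquery.split())
    hits = []
    for chunk in CORPUS:
        overlap = len(terms & set(chunk.text.split()))
        if overlap:
            hits.append((overlap / len(terms), chunk))
    return hits


def retrieve(question: str, top_k: int = 2) -> dict:
    subqueries = plan_subqueries(question)
    # Run one search per subquery in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(search, subqueries))
    # Merge: keep the best score seen for each chunk, then rerank.
    best: dict[Chunk, float] = {}
    for hits in results:
        for score, chunk in hits:
            best[chunk] = max(score, best.get(chunk, 0.0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)[:top_k]
    return {
        "subqueries": subqueries,
        "grounding": [chunk.text for chunk, _ in ranked],
        "sources": [chunk.source for chunk, _ in ranked],
    }
```

Calling `retrieve("Who approved MFA exceptions and how are vendor credentials rotated")` plans two subqueries and returns the two best-matching chunks along with their source names, which is the part a learner-facing UI would display.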
The second notable piece is observability. The demo includes a Chainlit interface for the player and a live activity log that shows retrieval calls, model calls, source names, relevance scores, first-token latency, and token completion behavior. That may sound like demo theater, but it is exactly the kind of evidence enterprise teams need before they put agents into compliance, security, finance, or HR workflows. Agent systems are hard to trust when they look like a single text box. They become easier to govern when every retrieval and model step is visible.
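An activity log of this kind can be modeled as a list of structured events. The sketch below is an assumption about what such a record might hold, based only on the fields the article lists (retrieval calls, model calls, source names, relevance scores, first-token latency). `TraceEvent` and `ActivityLog` are hypothetical names and do not reflect the project's or Chainlit's actual data model.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TraceEvent:
    kind: str                      # "retrieval" or "model"
    agent: str                     # which agent made the call
    sources: list[str] = field(default_factory=list)
    relevance: list[float] = field(default_factory=list)
    first_token_ms: Optional[float] = None
    timestamp: float = field(default_factory=time.time)


class ActivityLog:
    def __init__(self) -> None:
        self.events: list[TraceEvent] = []

    def record(self, event: TraceEvent) -> None:
        self.events.append(event)

    def sources_used(self) -> list[str]:
        """Unique source names in first-seen order."""
        seen: dict[str, None] = {}
        for event in self.events:
            for source in event.sources:
                seen.setdefault(source, None)
        return list(seen)

    def to_jsonl(self) -> str:
        """One JSON object per line, suitable for export to an audit store."""
        return "\n".join(json.dumps(asdict(event)) for event in self.events)
```

The same records can feed both the live UI panel and a later audit, since the export is just serialized events rather than rendered text.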
Provider comparison
Microsoft's position with Foundry Agent Service is strongest when the buyer already runs on Azure, Microsoft Entra ID, Azure OpenAI, Azure AI Search, Microsoft 365, or a Microsoft security stack. Foundry Agent Service gives teams a managed agent runtime, model catalog access, tools, tracing, identity controls, and deployment paths. Its agent types include prompt agents and hosted agents, with hosted agents suited to custom code and multi-agent orchestration.
AWS takes a different route with Amazon Bedrock Agents. Bedrock Agents can orchestrate foundation models, data sources, action groups, APIs, memory, monitoring, encryption, permissions, and knowledge bases without requiring teams to manage infrastructure directly. The AWS model is attractive for organizations that already keep their data in S3, use Lambda and API Gateway heavily, and want agent actions tied into existing AWS application services.
Google's comparable offering now sits under Gemini Enterprise Agent Platform, formerly associated with Vertex AI Agent Builder. Google emphasizes Gemini models, Model Garden, Agent Development Kit support, Agent Studio, MLOps tooling, and tight connections to Google Cloud data and AI services. It is a strong fit when the enterprise center of gravity is BigQuery, Vertex AI, Gemini, Looker, or Google Workspace.
| Decision area | Microsoft Foundry Agent Service | Amazon Bedrock Agents | Google Gemini Enterprise Agent Platform |
|---|---|---|---|
| Best fit | Microsoft-centric enterprises that need Entra ID, Azure OpenAI, Azure AI Search, and Microsoft 365 integration | AWS-native teams that want agents to call business APIs and use Bedrock Knowledge Bases | Google Cloud teams using Gemini, BigQuery, Model Garden, and Vertex-style ML operations |
| Agent design | Prompt agents and hosted agents, plus Foundry tools and tracing | Bedrock agents with action groups, knowledge bases, traces, memory, and API invocation | Agent Studio, ADK, Gemini APIs, Model Garden, and managed AI app development |
| Retrieval pattern | Azure AI Search and Foundry IQ style grounding, including agentic retrieval patterns | Bedrock Knowledge Bases, usually tied to AWS data sources such as S3 and vector stores | Google Cloud grounding patterns across Gemini and enterprise data services |
| Pricing shape | Model tokens, tool usage, retrieval charges, and possibly hosted compute depending on architecture | Model pricing varies by provider and tier, plus knowledge base, guardrails, and related Bedrock usage | Pay-as-you-go model and platform usage, with model-specific token pricing |
| Migration concern | Check API versions, especially if using classic connected agents, because Microsoft documents retirement for classic agents on March 31, 2027 | Map action groups and knowledge bases carefully, since AWS patterns often assume Lambda-style integration | Map agent logic to ADK, Gemini model behavior, and Google Cloud IAM/data boundaries |
Pricing deserves a separate procurement conversation. On Azure, the cost is not only the model call. Agentic retrieval can add Azure AI Search token charges and Azure OpenAI token charges for query planning and answer generation. Microsoft's pricing pages for Azure OpenAI and Azure AI Search should be modeled together, especially if agents retrieve policy context on every turn.
AWS has a similar multi-part cost profile. Amazon Bedrock pricing depends on model provider, model tier, input and output tokens, knowledge base usage, guardrails, routing, and related services. A Bedrock version of Compliance Academy might look inexpensive in a short demo, then become materially different under classroom-scale use if every learner generates many investigative turns and retrieval calls.
Google's Agent Platform pricing also varies by model and service usage. The main planning point is the same: agent workloads are conversation-shaped, not request-shaped. A single learning session can contain dozens of turns, multiple retrieval passes, tool calls, validations, and generated summaries. FinOps teams need to price sessions and outcomes, not only tokens.
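Pricing a session rather than a request is simple arithmetic once the session shape is written down. The following is a minimal sketch under stated assumptions: the `SessionProfile` and `Prices` structures are hypothetical, the model treats every turn as identical, and the numbers in the usage note are placeholders, not published rates from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionProfile:
    turns: int
    retrievals_per_turn: int
    input_tokens_per_turn: int
    output_tokens_per_turn: int


@dataclass(frozen=True)
class Prices:
    per_1k_input_tokens: float
    per_1k_output_tokens: float
    per_retrieval: float


def session_cost(profile: SessionProfile, prices: Prices) -> float:
    """Cost of one session, assuming every turn has the same shape."""
    per_turn = (
        profile.input_tokens_per_turn / 1000 * prices.per_1k_input_tokens
        + profile.output_tokens_per_turn / 1000 * prices.per_1k_output_tokens
        + profile.retrievals_per_turn * prices.per_retrieval
    )
    return profile.turns * per_turn


def cost_per_completed_session(
    profile: SessionProfile, prices: Prices, completion_rate: float
) -> float:
    """Spread the spend on abandoned sessions across the completed ones."""
    if not 0 < completion_rate <= 1:
        raise ValueError("completion_rate must be in (0, 1]")
    return session_cost(profile, prices) / completion_rate
```

With placeholder inputs of 40 turns, 2 retrievals per turn, 2,000 input and 500 output tokens per turn, and illustrative prices of 0.001 and 0.002 per thousand tokens plus 0.0005 per retrieval, the session costs 0.16 in whatever currency the prices use. At an 80 percent completion rate that becomes 0.20 per completed session.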
Migration considerations
Compliance Academy is a useful migration case because it separates concerns well. The game surface is Chainlit. The orchestration layer manages agents and session state. Retrieval is handled through Azure AI Search. Model calls use Azure OpenAI. Scenario definitions live in JSON. That separation makes it easier to reason about portability.
A move from Azure to AWS would likely preserve the scenario model and UI while replacing Azure AI Search with Bedrock Knowledge Bases or another search layer, Azure OpenAI calls with Bedrock model invocation, and Foundry agent coordination with Bedrock Agents and action groups. The difficult part would not be rewriting the game. The difficult part would be preserving trace quality, citation behavior, and the exact interaction between suspect agents and compliance agents.
A move from Azure to Google Cloud would likely keep the same application shell but rework model access around Gemini APIs or Model Garden, retrieval around Google Cloud data services, and orchestration around Agent Platform and ADK. Teams already using BigQuery for policy analytics or Google Workspace for training delivery may find that attractive, but they still need to validate citation behavior and identity boundaries.
The cleanest multi-cloud strategy is not to pretend agent platforms are interchangeable. They are not. The better strategy is to keep domain assets portable: scenarios, policy chunks, evaluation tests, agent role definitions, and audit requirements. Then let each cloud use its native agent runtime, identity model, retrieval service, and observability stack. In practice, this means treating the cloud provider as the execution environment and treating the compliance curriculum, policy corpus, and evaluation harness as the durable intellectual property.
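One way to keep those domain assets portable is to define them, and the checks that run against them, without reference to any provider. The sketch below assumes a hypothetical scenario schema (`id`, `title`, `required_citations`) and made-up policy identifiers; it is not the project's JSON format. The `Retriever` protocol is the only seam a cloud-specific adapter would need to fill.

```python
import json
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Scenario:
    scenario_id: str
    title: str
    required_citations: tuple[str, ...]


class Retriever(Protocol):
    """Seam for a cloud-specific search adapter; returns policy chunk IDs."""

    def retrieve(self, query: str) -> list[str]: ...


class InMemoryRetriever:
    """Test double that matches on shared words, standing in for a real index."""

    def __init__(self, chunks: dict[str, str]) -> None:
        self._chunks = chunks

    def retrieve(self, query: str) -> list[str]:
        terms = set(query.lower().split())
        return [
            chunk_id
            for chunk_id, text in self._chunks.items()
            if terms & set(text.lower().split())
        ]


def load_scenario(raw: str) -> Scenario:
    data = json.loads(raw)
    return Scenario(data["id"], data["title"], tuple(data["required_citations"]))


def missing_citations(scenario: Scenario, retriever: Retriever, query: str) -> list[str]:
    """Portable evaluation check: which required policies did retrieval miss?"""
    found = set(retriever.retrieve(query))
    return [cid for cid in scenario.required_citations if cid not in found]
```

The same `missing_citations` check could then run against an Azure, AWS, or Google adapter, which turns the comparison of retrieval quality into a test result rather than an impression.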

There is also a timing issue for Microsoft adopters. The source project references the Connected Agents pattern, while Microsoft's current documentation flags classic connected agents as deprecated with retirement scheduled for March 31, 2027. That does not make the design invalid. It means production teams should confirm the current Foundry API path before building a long-lived platform. A live demo can use the pattern that works today. An enterprise rollout needs a migration path written into the architecture decision record.
Business impact
For compliance leaders, the business impact is straightforward: agentic training can make policy interpretation active. Instead of asking employees to memorize control language, the system puts them in a simulated incident where policy has operational consequences. That matters for SOC 2, HIPAA, ISO 27001, NIST 800-53, vendor access reviews, credential handling, data loss response, and exception management.
For CIOs and cloud architects, the bigger message is that agent platforms are becoming application platforms. Compliance Academy uses agents for workflow, retrieval, role play, validation, source display, and live generation. That is closer to a business application than a chatbot. Once agents become application components, the selection criteria become familiar: identity, logging, cost predictability, regional availability, lifecycle management, testing, rollback, and vendor exit strategy.
For CISOs, the key question is whether the agent can show its work. Compliance Academy's activity log is more than a developer convenience. It is an audit pattern. If a learner accuses the wrong suspect, if an agent cites the wrong policy, or if a generated scenario violates internal rules, the organization needs trace data. The same lesson applies outside training. Customer support agents, procurement agents, HR agents, and finance agents all need a record of tool calls, source documents, model choices, and decision paths.
For HR and learning teams, this project points to a better model for enterprise education. Compliance content often fails because it is abstract and passive. A cyber mystery gives employees a reason to ask for evidence, compare testimony, and care about controls. The strategic value is not entertainment. The value is retention, judgment, and context. Employees remember why MFA exceptions are risky when they have investigated a fictional breach caused by one.
For FinOps teams, the warning is that agentic learning has a different cost curve than video training or static LMS content. A video has fixed production cost and low marginal delivery cost. An agentic case has ongoing inference, retrieval, and tool execution cost. That does not make it too expensive, but it changes the unit economics. Price per learner, price per completed case, price per generated scenario, and price per successful assessment are the metrics to model.
My consultant view: Compliance Academy is best read as a reference pattern, not just a novelty build. It demonstrates how to combine specialized agents, grounded retrieval, scenario generation, validation retries, observability, and an interactive UI into a learning workflow that can be inspected. Microsoft has a credible advantage when the enterprise already depends on Azure identity and search. AWS has a strong case when action-oriented agents need to sit near AWS application infrastructure. Google is compelling where Gemini, BigQuery, and ML operations are already strategic.
The recommended path is a structured pilot. Pick one compliance domain, such as vendor access or incident response. Build a small policy index. Create three scenarios. Require citations for every compliance judgment. Log every retrieval and model call. Run the same scenario design on a second cloud (Azure, AWS, or Google) only if there is a real multi-cloud requirement, not as an academic exercise. Then compare cost per completed session, citation accuracy, latency, administrative effort, and audit readiness.
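The quantitative part of that comparison can be reduced to a small scorecard computed from logged pilot sessions. This is a sketch under assumptions: `SessionResult` and `scorecard` are hypothetical names, citation correctness is assumed to come from human review, and administrative effort and audit readiness are left out because they are not directly measurable from logs.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class SessionResult:
    completed: bool
    cost: float              # observed spend for the session
    latency_ms: float        # representative response latency for the session
    judgments: int           # compliance judgments the agents issued
    correct_citations: int   # judgments whose citation a reviewer accepted


def scorecard(results: list[SessionResult]) -> dict[str, float]:
    if not results:
        raise ValueError("no pilot sessions to score")
    completed = sum(r.completed for r in results)
    judgments = sum(r.judgments for r in results)
    total_cost = sum(r.cost for r in results)
    return {
        # Abandoned sessions still cost money, so they stay in the numerator.
        "cost_per_completed_session": total_cost / completed if completed else float("inf"),
        "citation_accuracy": (
            sum(r.correct_citations for r in results) / judgments if judgments else 0.0
        ),
        "median_latency_ms": median(r.latency_ms for r in results),
    }
```

Running the same function over each platform's pilot logs gives numbers that can be compared directly, provided the scenarios and review criteria were held constant.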
Compliance Academy shows that enterprise agents are moving from answer engines toward governed experiences. The winning cloud provider will not simply be the one with the strongest model. It will be the one that helps teams prove what happened, why the agent responded as it did, how much it cost, and whether the result can survive review.
