An AI-powered Azure SRE Agent leverages Model Context Protocol to autonomously discover resources, assess them against all 5 WAF pillars and organizational standards, and generate exact remediation commands with quantified impact.
Azure governance at scale is complex. Security teams manually review many resource types across multiple subscriptions. Finance can't track costs without tags. Compliance teams spend days cross-referencing WAF standards against actual infrastructure. And critical security gaps - like RDP open to 0.0.0.0/0 or customer-managed encryption disabled - slip through until discovered in audits. This workflow is broken.
Enter the Azure SRE Agent: an AI-powered compliance engine that discovers all resources, assesses them against all 5 WAF pillars AND your organization's specific standards in minutes, then generates exact remediation commands with quantified impact.

How it works
The agent leverages these capabilities to transform Azure governance:
Autonomous Resource Discovery via MCP - Azure MCP (Model Context Protocol) server exposes Azure Resource Graph and ARM APIs as discoverable tools. The agent automatically inventories all resources across subscriptions with metadata (types, locations, tags, security settings) in seconds.
Multi-Pillar WAF Assessment - For each discovered resource, the agent validates against all 5 WAF pillars (Reliability, Security, Cost Optimization, Operational Excellence, Performance) using Azure MCP tools, generating an assessment summary with pass/fail/partial/unknown counts.
Org Best Practices Cross-Check - The agent references your organization's compliance standards (stored in knowledge base as org-practices.md) to escalate WAF findings into actionable org policies. A Security policy violation becomes a critical finding. A cost optimization recommendation becomes a warning.
Automated Remediation Codegen - For every finding, the agent generates exact Azure CLI commands, Terraform snippets, and Portal steps with impact quantification (risk reduction, cost savings, compliance improvement).
Architecture
The Azure SRE Agent orchestrates WAF compliance checks by calling Azure MCP tools to discover and assess resources, then cross-references findings with organizational best practices from the knowledge base. End-to-end: subscription scope → resource discovery → 5-pillar assessment → org compliance → remediation commands.

Deployment
The Azure MCP server runs locally via Node.js in Stdio mode, communicating with the SRE Agent through standard input/output. Configured with Managed Identity credentials, it exposes Azure ARM and governance APIs as MPC tools. The agent uses this catalog plus the org practices knowledge base to validate compliance autonomously.
Scheduling Options:
- Ad-hoc: Run on-demand via agent prompt
- Scheduled: Use Azure SRE Agent's built-in task scheduler (daily/weekly/monthly compliance scans)
- Event-driven: Trigger on resource group changes or policy violations
Getting Started
Configure Azure MCP Connector:
- Connection Type: Stdio (local process)
- Command:
npx - Arguments (in order):
-y(auto-accept)azure/mcp(package name)server(run as server)start(start command)--mode(mode flag)all(enable all resources) - Environment Variables:
AZURE_CLIENT_ID: Your managed identity client IDAZURE_TOKEN_CREDENTIALS: managedidentitycredential - Managed Identity: Select your managed identity from dropdown
Configure Azure SRE Agent:
- Upload Knowledge Base from Builder > Knowledge Base using the org practices doc:
org-practices.md - Benefit: Agent validates Azure resources against both WAF standards AND your company-specific requirements
- Deploy the agent YAML:
Azure_WAF_Compliance_Agent.yaml, attach the Azure MCP and associate agent with the KB article. - Benefit: Autonomous resource discovery + 5-pillar assessment + org compliance in one execution
Run Your First Validation:
- Prompt the agent with your subscription or resource group:
Integrate with Operations:
- Schedule daily compliance scans via Azure SRE Agent's scheduled tasks
- Trigger assessments on resource deployments
- Export findings to Azure DevOps or ServiceNow for ticketing
Autonomous Compliance Workflow
Phase 1: Resource Discovery
Agent Task: The agent automatically discovers all resources in the target resource group, including VMs, storage accounts, NSGs, Key Vaults, App Services, and more - no manual resource IDs needed.
What the Agent Does:
- Calls Azure MCP tools in parallel:
list_virtual_machines()- Inventory computelist_storage_accounts()- Inventory storagelist_network_security_groups()- Inventory network securitylist_key_vaults()- Inventory secrets managementlist_web_apps()- Inventory App Services - Additional resource type queries as needed
- Builds comprehensive resource inventory with metadata (types, locations, tags, configurations)
Expected Output:
Phase 2: Multi-Pillar WAF Assessment
Agent Task: For each discovered resource, the agent validates against all 5 Azure Well-Architected Framework pillars using Azure MCP tools.
What the Agent Does:
Reliability Pillar:
- VM availability zones, backup configuration, health probes
- Storage redundancy (LRS vs GRS vs ZRS)
- App Service health checks and auto-healing
Security Pillar:
- NSG rules: Check for 0.0.0.0/0 sources on SSH/RDP
- Key Vault: Secret expiration dates, soft delete, purge protection
- Storage: Public blob access, shared key access, HTTPS enforcement, TLS version
- App Service: HTTPS-only, managed identity, VNet integration
Cost Optimization Pillar:
- VM right-sizing: CPU/memory utilization analysis
- Orphaned resources: Unattached disks, unused public IPs
- Resource tagging: cost-center, owner, environment tags
- Storage tiering opportunities
Operational Excellence Pillar:
- Monitoring: Azure Monitor diagnostics enabled
- Logging: Log Analytics workspace integration
- IaC: Resource tags indicating Terraform/Bicep management
Performance Pillar:
- VM series appropriateness for workload
- Storage account performance tier
- App Service plan scaling configuration
Expected Output:
Phase 3: Org Best Practices Cross-Check
Agent Task: Cross-reference all WAF findings with organizational requirements from org-practices.md knowledge base. Escalate violations of company-specific standards.
What the Agent Does:
- Reads org-practices.md from knowledge base
- For each WAF finding, checks if it intersects with org requirement
- Maps severity based on org standards:
- 🔴 Critical: NSG open to 0.0.0.0/0, Key Vault secrets expiring < 30 days, storage public access enabled
- 🟡 Warning: Missing required tags (cost-center, owner), idle VMs, orphaned disks
- 🔵 Info: Recommendations without org mandate
- Provides specific evidence and references to org-practices.md sections
Expected Output:
Phase 4: Remediation with Exact Commands
Agent Task: Generate copy-paste CLI commands, Terraform snippets, and Portal steps for every gap. Quantify impact (cost savings, risk reduction).
What the Agent Does:
- For each finding, generates precise fix commands
- Provides Azure CLI, PowerShell, or Terraform as appropriate
- Quantifies impact: cost reduction, security risk level, compliance improvement
Expected Output:
Remediation Roadmap
Immediate (Deploy Today - Critical Risk):
- Enable TLS 1.2 minimum on SQL Server prod-sql-server-01 (2 min)
- Fix NSG "nsg-aks-nodes" - Remove RDP from 0.0.0.0/0 (5 min)
- Enable SQL Server Auditing (10 min)
- Enable RBAC on AKS Cluster (recreate cluster, ~20 min downtime)
- Enable Customer-Managed Encryption on diagstorage01 (15 min)
Short-Term (Next 3 Days - High Risk):
- Install Azure Monitor VM extensions on ubuntu-prod-01 (10 min)
- Enable HTTPS-only on Function App data-processor-func (1 min)
- Enable Purge Protection on Key Vault kv-prod-01 (1 min)
- Deploy Azure Bastion for secure VM access (20 min)
- Enable AKS Calico network policies (10 min)
Short-Term (1 Week - Medium Risk):
- Delete orphaned disk disk-unattached-03 ($48/month savings)
- Configure storage lifecycle policy on logsstorage01 ($120/month savings)
- Add monitoring to Function App data-processor-func
- Add required tags to VM ubuntu-prod-01 and others
- Add NSG to unprotected subnets in vnet-prod
Medium-Term (2-4 Weeks - Low Risk):
- Implement automated secret rotation for Key Vault
- Configure Application Insights for all Function Apps
- Set up managed identity on win-dev-02 and other VMs
- Implement Azure Policy enforcement for ongoing compliance
Long-Term (Ongoing - Preventive):
- Schedule weekly compliance scans via Azure Logic Apps
- Integrate findings with Azure DevOps for backlog tracking
- Implement GitOps for IaC enforcement (Terraform, Bicep)
- Train teams on security best practices and compliance requirements
Real-World Results
The assessment workflow discovered 8 critical findings and 11 warnings across SQL, AKS, Storage, VMs, and NSGs. Each came with remediation commands and prioritized timelines. Total time: 6 minutes. Manual review would take 4-6 hours.
Key Benefits
Autonomous Discovery & Assessment ✅ Zero manual resource inventory - agent discovers everything ✅ Multi-pillar WAF validation in one execution ✅ Org-specific compliance enforcement via knowledge base
Risk & Cost Visibility 🔴 Immediate identification of critical security gaps (NSGs, secrets, public storage) 💰 Quantified cost savings from orphaned resources and idle VMs 📊 Evidence-based findings with direct references to org policies
Actionable Remediation 🛠️ Copy-paste CLI commands and Terraform for every gap ⚡ Prioritized roadmap (immediate/short/medium/long-term) 📈 Impact quantification (risk reduction, cost savings, compliance %)
Operational Impact
| Metric | Before | After | Improvement |
|---|---|---|---|
| WAF compliance review | 4-6 hours manual | 5-8 minutes autonomous | 95% |
| Critical security gap discovery | 2-3 days | Real-time | Immediate |
| Org policy violation tracking | Manual spreadsheet | Automated report | 100% |
| Orphaned resource cleanup | Quarterly review | Weekly automated scan | 4x frequency |
Scheduling Compliance Scans
Azure SRE Agent includes built-in task scheduling. From the Scheduled tasks menu, create a new task specifying:
- Frequency: Daily, Weekly, or Monthly
- Time: When to run scans
- Scope: Target resource group or subscription
- Autonomy: Autonomous (auto-remediate) or Review (approval required)
The agent runs on schedule, discovers resources, assesses WAF compliance, and executes or flags remediation based on your settings.
Conclusion
Azure SRE Agent transforms Azure governance by combining autonomous resource discovery, multi-pillar WAF assessment, and organization-specific compliance enforcement. The MCP integration provides:
- Continuous compliance monitoring across all 5 WAF pillars
- Org best practices enforcement via knowledge base integration
- Automated remediation with exact CLI commands and impact quantification
- Flexible scheduling for ad-hoc, scheduled, or event-driven scans
Result: Security teams maintain compliance effortlessly, finance tracks costs accurately, and platform teams remediate gaps with confidence.
Resources
🤖 Agent Configuration YAML 📋 Org Best Practices 📖 Azure SRE Agent Documentation 📰 Azure SRE Agent Blogs 📜 MCP Specification 🏗️ Azure Well-Architected Framework
Questions? Open an issue on GitHub or reach out to the Azure SRE team.
Updated Feb 12, 2026
VERSION 2.0 AZURE SRE AGENT BEST PRACTICES
Like 0 Comment
varghesejoji MICROSOFT Joined May 17, 2022 View Profile Apps on Azure Blog Follow this blog board to get notified when there's new activity

Comments
Please log in or register to join the discussion