Microsoft Unveils MDASH: A Multi‑Model Agentic Platform for Large‑Scale AI Vulnerability Research


Serverless Reporter

Microsoft’s new MDASH system combines over a hundred specialized AI agents to automate code auditing across Windows, Hyper‑V, Azure and other products. The platform delivers high recall on historic vulnerabilities, scores 88.45 % on the CyberGym benchmark, and introduces a model‑agnostic orchestration layer that reshapes how enterprises approach AI‑assisted security.



Microsoft announced MDASH, a multi‑model agentic security platform that automates vulnerability discovery across its flagship codebases. The system brings together more than 100 purpose‑built AI agents that scan, debate, validate, deduplicate and even generate proof‑of‑concept exploits. By treating the orchestration layer as the primary asset, MDASH shifts the focus from a single, ever‑larger model to a coordinated ecosystem of agents that can be swapped or upgraded without disrupting the surrounding workflow.
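
The scan, debate, validate and deduplicate flow can be illustrated with a minimal Python sketch. It is not MDASH code: the `Finding` record, the stage signatures and the toy grep-based scanners are hypothetical stand-ins. They are chosen only to show how treating each agent as a swappable callable lets one stage change without touching the rest of the pipeline.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

@dataclass(frozen=True)
class Finding:
    unit: str                # logical unit examined (here: one file per unit, an assumption)
    cwe: str                 # weakness class, e.g. "CWE-120"
    location: str            # "file:line" of the suspected bug
    confirmed: bool = False  # set by the validator stage

# Hypothetical agent signatures: every stage is a plain, swappable callable.
Scanner = Callable[[str, str], List[Finding]]  # (unit, source) -> findings
Debater = Callable[[Finding], bool]            # keep this finding?
Validator = Callable[[Finding], bool]          # exploitability confirmed?

def run_pipeline(files: Dict[str, str], scanners: List[Scanner],
                 debate: Debater, validate: Validator) -> List[Finding]:
    # 1. Scan: every scanner agent sees every unit.
    raw = [f for unit, src in files.items() for scan in scanners for f in scan(unit, src)]
    # 2. Debate: drop findings the debater rejects.
    kept = [f for f in raw if debate(f)]
    # 3. Validate: record whether each surviving finding is exploitable.
    checked = [replace(f, confirmed=validate(f)) for f in kept]
    # 4. Deduplicate: keep one report per (weakness, location) pair.
    unique: Dict[Tuple[str, str], Finding] = {}
    for f in checked:
        unique.setdefault((f.cwe, f.location), f)
    # 5. Report: confirmed findings first, then by location.
    return sorted(unique.values(), key=lambda f: (not f.confirmed, f.location))

def grep_scanner(pattern: str, cwe: str) -> Scanner:
    """Toy stand-in for a model-backed scanner agent: flags lines containing `pattern`."""
    def scan(unit: str, src: str) -> List[Finding]:
        return [Finding(unit, cwe, f"{unit}:{n}")
                for n, line in enumerate(src.splitlines(), start=1) if pattern in line]
    return scan

files = {"a.c": "strcpy(buf, s);\nputs(buf);", "b.c": "gets(line);"}
scanners = [grep_scanner("strcpy", "CWE-120"), grep_scanner("buf", "CWE-120"),
            grep_scanner("gets(", "CWE-242")]
report = run_pipeline(files, scanners, debate=lambda f: True,
                      validate=lambda f: f.cwe == "CWE-120")
for f in report:
    print(f.location, f.cwe, "confirmed" if f.confirmed else "needs review")
```

Two scanners flag `a.c:1`, but only one report survives deduplication. Replacing the toy scanners with model-backed ones would leave `run_pipeline` unchanged, which is the property an orchestration-first design is after.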


Service update

  • What MDASH does – It ingests a repository, breaks it into logical units, and assigns each unit to a scanner agent. The output feeds a debater agent that cross‑examines findings against threat‑intel feeds. A validator agent then runs symbolic execution or sandboxed exploits to confirm practical exploitability. Finally, a deduplication agent consolidates overlapping reports and a reporting agent formats the results for the Microsoft Security Response Center (MSRC) or external customers.
  • Performance metrics – On the public CyberGym benchmark (1,507 real‑world bugs) MDASH achieved an 88.45 % score, about five points ahead of the runner‑up. Internally, the platform recalled 96 % of historic clfs.sys bugs and 100 % of tcpip.sys cases.
  • Pricing & preview – MDASH is currently available through a private preview. Microsoft lists a base consumption price of $0.12 per thousand lines of code scanned, with tiered discounts for volumes above 10 million LOC. Sandbox execution time is billed separately at $0.03 per second. Interested parties can apply via the Microsoft Security preview portal.
  • Integration points – MDASH ships as a set of Azure‑hosted microservices with OpenAPI specifications. It can be invoked from Azure Pipelines, GitHub Actions, or directly via REST. The platform also publishes findings to Microsoft Sentinel (formerly Azure Sentinel) and Microsoft Defender for Cloud through built‑in connectors.
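
The quoted preview rates translate into a simple back-of-the-envelope estimator. The sketch below is an illustration built on assumptions, not a Microsoft calculator. It ignores the unpublished volume discount above 10 million LOC, so it overstates the scan charge for very large codebases. The `cap_seconds` parameter is a hypothetical client-side, per-run execution cap of the kind teams might set to keep sandbox spend predictable.

```python
from typing import List, Optional

SCAN_PRICE_PER_KLOC = 0.12       # USD per 1,000 lines of code scanned (preview list price)
SANDBOX_PRICE_PER_SECOND = 0.03  # USD per second of sandbox execution

def estimate_cost(loc: int, sandbox_seconds: List[float],
                  cap_seconds: Optional[float] = None) -> float:
    """Rough preview bill in USD.

    Assumptions: no volume discount is applied, and cap_seconds (if given)
    limits the billable time of each individual sandbox run.
    """
    scan = loc / 1000 * SCAN_PRICE_PER_KLOC
    if cap_seconds is not None:
        sandbox_seconds = [min(s, cap_seconds) for s in sandbox_seconds]
    sandbox = sum(sandbox_seconds) * SANDBOX_PRICE_PER_SECOND
    return round(scan + sandbox, 2)

# 2M LOC plus three validation runs, one of which emulates for ten minutes
print(estimate_cost(2_000_000, [45, 120, 600]))                   # 262.95
print(estimate_cost(2_000_000, [45, 120, 600], cap_seconds=180))  # 250.35
```

In this model the scan charge is a fixed $240 for 2 million lines, while capping the long emulation run at three minutes cuts the sandbox charge from $22.95 to $10.35.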

Use cases

  • Enterprise OS hardening – Continuous, automated audit of Windows kernel and driver code, with proof‑of‑concept generation for critical bugs. Example: a large OEM integrates MDASH into its nightly build, catching a race condition in ntoskrnl.exe before release.
  • Cloud service provider security – Scans the Azure hypervisor and service‑mesh components, validates exploits in isolated containers, and feeds results to Defender for Cloud. Example: an Azure partner runs MDASH weekly on its custom Hyper‑V extensions, discovering a privilege‑escalation path that static analysis alone would have missed.
  • Third‑party software supply chain – Agents can be configured with custom threat‑intel feeds, enabling targeted scans of open‑source dependencies used in Azure Marketplace images. Example: a SaaS vendor uses MDASH to audit its Docker base images, automatically rejecting builds that trigger a CVE‑linked exploit proof.
  • Research & red‑team automation – Researchers can swap the underlying LLM for a newer model without rewriting pipelines, focusing effort on novel validation strategies. Example: a university lab replaces the default model with an open‑source instruction‑tuned LLM, keeping the same validation agents to evaluate new vulnerability classes.
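
The supply‑chain example, rejecting builds that trigger a CVE‑linked exploit proof, boils down to a CI gate over the scan report. The announcement does not publish MDASH's report schema, so the JSON fields (`id`, `cve`, `poc_confirmed`) and the placeholder CVE identifiers below are hypothetical. The point is the gating logic: the job fails only when a finding is both tied to a CVE and backed by a confirmed proof of concept.

```python
import json
import sys
from typing import List

# Hypothetical report shape and placeholder CVE IDs; not the real MDASH schema.
SAMPLE_REPORT = """
[
  {"id": "F-1", "cve": "CVE-EXAMPLE-0001", "poc_confirmed": true},
  {"id": "F-2", "cve": null,               "poc_confirmed": true},
  {"id": "F-3", "cve": "CVE-EXAMPLE-0002", "poc_confirmed": false}
]
"""

def blocking_findings(report: List[dict]) -> List[str]:
    """IDs of findings that should fail the build: CVE-linked with a confirmed PoC."""
    return [f["id"] for f in report if f.get("cve") and f.get("poc_confirmed")]

def gate(report_json: str) -> int:
    """Return a process exit code: 1 blocks the build, 0 lets it through."""
    blockers = blocking_findings(json.loads(report_json))
    for fid in blockers:
        print(f"blocking build: finding {fid}", file=sys.stderr)
    return 1 if blockers else 0

exit_code = gate(SAMPLE_REPORT)  # in a CI step: sys.exit(exit_code)
print("exit code:", exit_code)   # exit code: 1
```

Confirmed findings without a CVE link, such as `F-2` here, pass the gate in this sketch; a stricter policy could block on any confirmed proof.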

Trade‑offs and architectural considerations

  1. Complexity vs. agility – The multi‑agent design introduces orchestration overhead. Teams must provision a Kubernetes cluster or Azure Container Apps environment capable of scaling dozens of agents in parallel. The benefit is modularity: a failure in the debater does not halt scanning, and individual agents can be upgraded independently.
  2. Model‑agnosticism vs. consistency – Because MDASH treats models as plug‑ins, swapping to a newer LLM can change the false‑positive profile of the scanner agent. Organizations need a regression suite to ensure that new models do not degrade overall recall.
  3. Security of the agents – As Sandesh KS highlighted, the orchestration layer becomes a high‑value target. Misconfigured IAM policies could let an agent act on privileged resources across Microsoft Entra ID (formerly Azure AD), potentially amplifying an attack. Microsoft recommends a zero‑trust posture: each agent runs under a dedicated managed identity with only the privileges it requires, and all inter‑agent traffic is encrypted with mutual TLS.
  4. Cost predictability – While per‑line pricing is transparent, sandbox execution can be unpredictable for complex exploits that require long‑running emulation. Budget‑aware teams should set execution‑time caps and monitor usage through Azure Cost Management.
  5. Proof‑generation reliability – Automated exploit generation works well for memory‑corruption bugs but struggles with logic‑flaw vulnerabilities that need domain‑specific knowledge. In such cases, the validator may flag a high‑confidence finding for manual review rather than produce a full PoC.
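
The regression suite suggested in point 2 can start as a recall check against a fixed set of historic bugs, run once with the current model and once with the candidate. The sketch assumes each run yields the set of known‑bug IDs it rediscovered; the bug IDs, counts and the two‑point tolerance are illustrative values, not MDASH defaults.

```python
from typing import Set, Tuple

def recall(found: Set[str], known: Set[str]) -> float:
    """Fraction of known historic bugs that a run rediscovered."""
    return len(found & known) / len(known) if known else 1.0

def model_swap_ok(known: Set[str], baseline_found: Set[str],
                  candidate_found: Set[str], tolerance: float = 0.0) -> Tuple[bool, float, float]:
    """Accept the new model only if its recall is within `tolerance` of the old one."""
    base = recall(baseline_found, known)
    cand = recall(candidate_found, known)
    return cand >= base - tolerance, base, cand

known = {f"BUG-{i}" for i in range(1, 21)}         # 20 historic bugs (hypothetical IDs)
baseline = known - {"BUG-7"}                       # current model misses one: 19/20
candidate = known - {"BUG-7", "BUG-12", "BUG-18"}  # new model misses three: 17/20
ok, base, cand = model_swap_ok(known, baseline, candidate, tolerance=0.02)
print(ok, base, cand)  # False 0.95 0.85
```

A fuller suite would also track false‑positive counts, since a model swap can shift that profile even when recall holds.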

Looking ahead

MDASH illustrates a shift toward orchestration‑centric AI security. The platform’s OpenAPI contract invites third‑party extensions, such as custom policy agents that enforce corporate compliance rules or integrations with external bug‑bounty platforms. As more organizations adopt agentic pipelines, we can expect a growing ecosystem of reusable security agents, much as serverless functions have become composable building blocks.

For teams that already rely on Azure DevOps or GitHub Actions, integrating MDASH can be as simple as adding a YAML step that calls the /scan endpoint and then consumes the /report artifact. The real value emerges when the output feeds downstream automation: auto‑creating tickets in Azure Boards, triggering hot‑patch builds, or feeding telemetry into a SIEM.
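
What such a YAML step might run can be sketched as a small Python script using only the standard library. Everything API‑specific here is an assumption: the host name and repository URL are placeholders, and the `/scan` payload fields, the bearer‑token header and the `/report/{scan_id}` path are hypothetical, since the announcement names the endpoints but not their contracts. The sketch only builds the requests; sending one would be a call to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "https://mdash.example.invalid"  # placeholder host, not a real MDASH endpoint

def build_scan_request(repo_url: str, commit: str, token: str) -> urllib.request.Request:
    """POST /scan with a hypothetical JSON body naming the repository and commit."""
    body = json.dumps({"repository": repo_url, "commit": commit}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/scan",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )

def build_report_request(scan_id: str, token: str) -> urllib.request.Request:
    """GET the report for a finished scan (path shape is an assumption)."""
    return urllib.request.Request(
        f"{BASE_URL}/report/{scan_id}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

# Placeholder repository and commit for illustration only.
scan_req = build_scan_request("https://github.com/contoso/app", "a1b2c3d", token="example-token")
print(scan_req.get_method(), scan_req.full_url)  # POST https://mdash.example.invalid/scan
# In CI, the request would be sent and its JSON body parsed with urlopen + json.load.
```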


Source: Microsoft Blog announcement

