Microsoft has shifted its strategy for securing the Windows ecosystem, moving away from reliance on single, massive AI models in favor of a “swarm” of specialized agents to identify software flaws. The company recently debuted a multi-model agentic scanning harness, known as MDASH, which utilizes more than 100 AI agents to hunt for vulnerabilities across the Windows networking and authentication stack.
The system recently identified 16 new vulnerabilities, including four critical remote code execution flaws. These high-severity bugs were located within the IKEv2 key management protocol and the Windows kernel TCP/IP stack. Microsoft has already patched these vulnerabilities as part of its standard monthly software update cycle.
Microsoft’s shift to an AI-driven bug-hunting system signals a broader industry move toward “agentic” AI—where multiple AI entities collaborate, debate, and verify each other’s work—rather than relying on a single prompt-and-response interaction with a frontier model.
Beyond the Single-Model Approach
For years, the race in AI security has been defined by the size and capability of individual models. However, Microsoft is now arguing that the “durable advantage” in cybersecurity lies in the orchestration system surrounding the model, not the model itself. By deploying an ensemble of both frontier and distilled models, MDASH can assign specific roles to different agents to mimic a human security team.
According to Taesoo Kim, vice president of security research at Microsoft, the system employs agents acting as an auditor, a debater, and a prover. This structure prevents the AI from simply “hallucinating” a bug or missing a subtle flaw due to a lack of perspective. The process relies on conflict; when an auditor flags a piece of code as suspect and the debater cannot successfully refute the claim, the credibility of the finding increases.
This methodology acknowledges the inherent limitations of current large language models. As Kim noted, the company does not expect a single prompt to handle the entire lifecycle of vulnerability discovery—from recognition to validation and exploitation—in one pass.
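The auditor/debater/prover loop described above can be illustrated with a minimal sketch. Everything here—agent names, the heuristic checks, and the confidence scoring—is an assumption for illustration only; Microsoft has not published MDASH’s internal design, and real agents would be LLM-backed rather than simple string matchers.

```python
from dataclasses import dataclass

# Illustrative sketch only: the roles mirror the auditor/debater/prover
# structure described in the article, but the heuristics and scoring
# below are placeholder assumptions, not Microsoft's implementation.

@dataclass
class Finding:
    location: str
    claim: str
    confidence: float

def auditor(code: str) -> list[Finding]:
    """Stub auditor: flags code it considers suspect (toy heuristic)."""
    findings = []
    if "memcpy" in code:
        findings.append(Finding("memcpy call", "possible buffer overflow", 0.5))
    return findings

def debater(finding: Finding, code: str) -> bool:
    """Stub debater: returns True if it can refute the auditor's claim."""
    # Toy refutation: claim fails if an explicit bounds check is present.
    return "bounds_check" in code

def prover(finding: Finding) -> bool:
    """Stub prover: attempts to validate a finding that survived debate."""
    return finding.confidence > 0.7

def debate_pipeline(code: str) -> list[Finding]:
    """Run each auditor finding through debate, then proof."""
    validated = []
    for finding in auditor(code):
        if debater(finding, code):
            continue  # claim successfully refuted; discard it
        finding.confidence += 0.3  # survived debate: credibility increases
        if prover(finding):
            validated.append(finding)
    return validated

sample = "void copy(char *d, char *s) { memcpy(d, s, 1024); }"
print(debate_pipeline(sample))
```

The key design idea the sketch captures is the article’s point about conflict: a finding only gains credibility by surviving an adversarial refutation step before a separate prover validates it.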
Benchmarking Performance Against Rivals
The effectiveness of the agentic approach has been measured using the CyberGym benchmark, a testing framework developed by the University of California, Berkeley to evaluate AI’s ability to find vulnerabilities in production software. In self-reported data, MDASH outperformed the flagship individual models from its primary competitors.
| AI System/Model | CyberGym Success Rate | Architecture Type |
|---|---|---|
| Microsoft MDASH | 88.4% | Multi-Agent Ensemble |
| Anthropic Mythos | 83.1% | Single Model |
| OpenAI GPT 5.5 | 81.8% | Single Model |
The performance gap highlights a strategic divergence in the AI arms race. While OpenAI and Anthropic have focused on the raw reasoning capabilities of models like GPT 5.5 and Mythos, Microsoft is investing in the “harness” that manages how those models interact. This move comes amid a fraying relationship between Microsoft and OpenAI, coinciding with Microsoft’s push to develop its own proprietary “MAI” series of models, including MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2.
Enterprise Scale and Production Defense
The deployment of MDASH represents a move from theoretical AI research into what Microsoft describes as “production-grade defense at enterprise scale.” Tom Gallagher, who leads the Microsoft Security Response Center, indicated that AI is significantly accelerating both the speed and the scale at which vulnerabilities can be discovered and mitigated.
Despite the results, Microsoft is not making the tool available to the general public. MDASH is currently restricted to internal use by Microsoft engineers and a small group of customers participating in a limited private preview. The company has positioned the system as a defensive shield for its own infrastructure rather than a commercial product.

The implications of this technology extend beyond Microsoft. As AI agents become more capable of finding and exploiting “zero-day” vulnerabilities, the window between the discovery of a bug and the deployment of a patch is shrinking. The use of an agentic system allows for a continuous, automated auditing process that can keep pace with the speed of modern software deployment.
Microsoft has not announced a date for a wider release of MDASH, but the company is expected to provide further updates on its agentic security research during its next quarterly security review.
Do you think multi-agent AI systems will eventually replace human security researchers? Share your thoughts in the comments.
