Microsoft has unveiled its newest AI accelerator, the Azure Maia 200. The in-house chip is the successor to the first-generation Maia 100, designed to dramatically speed up AI model inference and outperform rival accelerators from competitors like Amazon and Google.
A Performance Leap, With an Eye on Efficiency
The Maia 200 is being touted as Microsoft’s “most efficient inference system” to date, with the company emphasizing both its impressive performance gains and its commitment to environmental sustainability. Microsoft claims the new chip delivers 30% more performance per dollar than the first-generation Maia 100, despite a 50% increase in its Thermal Design Power (TDP).
What’s a TDP? Thermal Design Power is the maximum amount of heat a chip is designed to generate under sustained load, and therefore the amount its cooling system must be able to dissipate. A higher TDP generally means more power consumption and the need for more robust cooling.
Built on TSMC’s 3nm process node, the Maia 200 packs a staggering 140 billion transistors. Microsoft states the chip can achieve up to 10 petaflops of FP4 compute, three times the performance of Amazon’s Trainium3 accelerator. It also boasts 216 GB of HBM3e memory with 7 TB/s of bandwidth, alongside 272 MB of on-die SRAM.
| Specification | Azure Maia 200 | AWS Trainium3 | Nvidia Blackwell B300 Ultra |
| --- | --- | --- | --- |
| Process technology | N3P | N3P | 4NP |
| FP4 petaFLOPS | 10.14 | 2.517 | 15 |
| FP8 petaFLOPS | 5.072 | 2.517 | 5 |
| BF16 petaFLOPS | 1.268 | 0.671 | 2.5 |
| HBM memory size | 216 GB HBM3e | 144 GB HBM3e | 288 GB HBM3e |
| HBM memory bandwidth | 7 TB/s | 4.9 TB/s | 8 TB/s |
| TDP | 750 W | Unknown | 1400 W |
| Bi-directional bandwidth | 2.8 TB/s | 2.56 TB/s | 1.8 TB/s |
Focus on Inference, Not Just Training
While much of the AI hardware buzz centers on training large models, the Maia 200 is specifically optimized for inference—the process of *using* those models to generate results. This focus allows Microsoft to target applications like natural language processing and image recognition, where speed and efficiency are paramount. The chip’s design prioritizes FP4 and FP8 performance, catering to AI models that benefit from lower precision calculations.
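To show why lower-precision math matters for inference, here is a minimal NumPy sketch that quantizes a weight matrix and runs the matrix multiply against the compact representation. NumPy has no native FP8 or FP4 types, so 8-bit integer rounding stands in for them here; all function names and sizes are hypothetical, not Microsoft's.

```python
import numpy as np

# Illustrative sketch only: int8 rounding stands in for the FP8/FP4
# formats the Maia 200 targets, since NumPy has no sub-byte float types.

def quantize_weights(w, n_bits=8):
    """Map float32 weights onto a low-precision grid with a per-tensor scale."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def low_precision_matmul(x, q, scale):
    """Inference-time matmul against the compact weights, rescaled on the fly."""
    return x @ (q.astype(np.float32) * scale)

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)  # "model" weights
x = rng.normal(size=(8, 1024)).astype(np.float32)     # a batch of activations

q, scale = quantize_weights(w)
err = np.abs(x @ w - low_precision_matmul(x, q, scale)).mean()
print(f"weights shrink 4x (float32 -> int8); mean output error: {err:.4f}")
```

Shrinking the weights this way cuts their memory footprint and the bandwidth needed to stream them, which is the resource dedicated FP4/FP8 units are built to exploit, at the cost of a small and usually tolerable loss of numerical accuracy.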
Microsoft’s design choices also emphasize memory hierarchy, with 272 MB of high-efficiency SRAM partitioned into “multi-tier Cluster-level SRAM (CSRAM) and Tile-level SRAM (TSRAM).” This approach aims to distribute workloads evenly across all HBM and SRAM dies, maximizing operating efficiency.
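To make the idea behind a tiered memory hierarchy concrete, here is a minimal NumPy sketch of a blocked matrix multiply: each tile is sized to stay resident in fast local memory so that data pulled from the larger, slower pool (HBM, in an accelerator like this) is fetched once and reused many times. The tile size, names, and loop structure below are purely illustrative and are not taken from Microsoft’s design.

```python
import numpy as np

# Conceptual sketch of why tiered on-die SRAM helps: process data in tiles
# small enough to stay resident in fast local memory, so each block fetched
# from large, slower memory (HBM) is reused many times before eviction.
# The tile size and structure are illustrative, not Maia 200 specifics.

TILE = 256  # stand-in for a tile that fits in tile-level SRAM

def tiled_matmul(a, b, tile=TILE):
    m, k = a.shape
    _, n = b.shape
    out = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            # The accumulator stays in "fast" memory for the whole inner loop.
            acc = np.zeros((min(tile, m - i), min(tile, n - j)), dtype=a.dtype)
            for p in range(0, k, tile):
                # Each sub-block is staged once and reused for a full
                # tile-by-tile accumulation before being released.
                acc += a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
            out[i:i + tile, j:j + tile] = acc
    return out

rng = np.random.default_rng(0)
a = rng.random((512, 512), dtype=np.float32)
b = rng.random((512, 512), dtype=np.float32)
print("max difference vs. plain matmul:", np.abs(tiled_matmul(a, b) - a @ b).max())
```

The same reuse principle is what cluster- and tile-level SRAM are meant to exploit: the better the on-die reuse, the less HBM bandwidth each inference step consumes.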
Measuring the Maia 200’s improvements over the Maia 100 is difficult because Microsoft has released little comparative data. The concrete figures it has shared are the 50% higher TDP and the 30% performance-per-dollar improvement noted above.
Deployment and Future Plans
The Maia 200 has already been deployed in Microsoft’s US Central Azure data center, with plans for expansion to US West 3 in Phoenix, Arizona, and other locations as more chips become available. It will operate as part of a heterogeneous system, working alongside other AI accelerators.
Originally codenamed Braga, the Maia 200 faced significant development delays and missed its initially planned 2025 release. According to reports from October, Microsoft’s next-generation hardware is slated to be fabricated on Intel Foundry’s 18A process.
Microsoft’s emphasis on efficiency aligns with its broader messaging around responsible AI development. CEO Satya Nadella recently stated at the World Economic Forum that AI companies must demonstrate the benefits of their technology to the public, or risk losing “social permission” and triggering an AI bubble.
