AMD Instinct MI350P PCIe AI Accelerator Debuts With 144GB HBM3E, 40% FP16/FP8 Compute Lead Over Nvidia H200 NVL
#Chips

AMD Instinct MI350P PCIe AI Accelerator Debuts With 144GB HBM3E, 40% FP16/FP8 Compute Lead Over Nvidia H200 NVL

Chips Reporter
5 min read

AMD has launched the Instinct MI350P, a PCIe-form-factor AI accelerator built on TSMC 3nm and 6nm processes, delivering 144GB of HBM3E memory and up to 43% higher theoretical FP16 compute than Nvidia's H200 NVL for data center inference and RAG workloads.

Featured image

Launch Announcement

AMD has expanded its Instinct MI350 series of data center AI accelerators with the new MI350P, a PCIe-compatible model designed for drop-in deployment in existing air-cooled server infrastructure. The new card targets inference and retrieval-augmented generation (RAG) workloads across small, medium, and large-scale AI deployments, offering a direct competitor to Nvidia's H200 NVL PCIe accelerator. AMD Instinct MI350 Series

The MI350P is positioned as a cost-effective upgrade path for data center operators that have already invested in air-cooled PCIe server racks, avoiding the need for liquid cooling retrofits required by higher-performance OAM (Open Accelerator Module) form factor accelerators such as the MI350X or Nvidia H100 SXM.

Technical Specifications

The MI350P is built on AMD's CDNA4 architecture, fabricated using a combined TSMC 3nm and 6nm FinFET process node split. Compute dies use the 3nm node for higher transistor density and performance per watt, while I/O and memory controllers are built on the more mature 6nm process to reduce production costs. TSMC 3nm Technology tsmc

CDNA4 is AMD's fourth-generation data center GPU architecture, optimized for AI and high-performance computing workloads with dedicated matrix cores for accelerated low-precision compute, a key requirement for large language model inference. The MI350P includes 128 compute units (CUs), 8,192 stream processors, 512 matrix cores, and a maximum clock speed of 2.2GHz.

The card is equipped with 144GB of HBM3E (High Bandwidth Memory 3 Enhanced) memory, delivering 4TB/s of bandwidth, and includes a 128MB last-level cache to reduce memory latency for frequent data accesses. HBM3E is the latest iteration of stacked memory technology, offering higher bandwidth and lower power consumption than previous HBM3 generations, critical for feeding data to the accelerator's compute cores during intensive AI workloads.

Physically, the MI350P is a 10.5-inch dual-slot PCIe card with a passive cooling solution, meaning it relies on chassis fans in rack-mounted servers for thermal management. It has a default power envelope of 600W, but can be configured to run at 450W to fit into power or thermally constrained server chassis. AMD Instinct MI350P PCIe Card

Up to eight MI350P cards can be linked in a single server node for scaled performance, with native support for lower-precision MXFP6 and MXFP4 formats optimized for LLM inference. The table below outlines the MI350P's specifications relative to other members of the Instinct MI350 series:

Specification Instinct MI350P PCIe Instinct MI325X OAM Instinct MI350X OAM 8 x Instinct MI350X OAM Instinct MI355X OAM 8 x Instinct MI355X OAM
GPU Architecture CDNA 4 CDNA 3 CDNA 4 CDNA 4 CDNA 4 CDNA 4
Dedicated Memory Size 144 GB HBM3E 256 GB HBM3E 288 GB HBM3E 2.3 TB HBM3E 288 GB HBM3E 2.3 TB HBM3E
Memory Bandwidth 4 TB/s 6 TB/s 8 TB/s 8 TB/s per OAM 8 TB/s 8 TB/s per OAM
FP64 Performance 36 TFLOPs N/A 72 TFLOPs 577 TFLOPs 78.6 TFLOPS 628.8 TFLOPs
FP16 Performance 2.3 PFLOPS 2.61 PFLOPS 4.6 PFLOPS 36.8 PFLOPS 5 PFLOPS 40.2 PFLOPS
FP8 Performance 4.6 PFLOPS 5.22 PFLOPS 9.2 PFLOPs 73.82 PFLOPs 10.1 PFLOPs 80.5 PFLOPs
FP6 Performance N/A N/A 18.45 PFLOPS 147.6 PFLOPS 20.1 PFLOPS 161 PFLOPS
FP4 Performance* N/A N/A 18.45 PFLOPS 147.6 PFLOPS 20.1 PFLOPS 161 PFLOPS

Performance Comparisons

AMD positions the MI350P as a direct competitor to Nvidia's H200 NVL PCIe accelerator, the current leading PCIe-based AI accelerator from the market leader. Nvidia H200 NVL Internal testing from AMD claims the MI350P delivers 20% higher theoretical FP64 performance, 43% higher FP16 performance, and 39% higher FP8 performance than the H200 NVL. For MXFP4 workloads, AMD estimates the MI350P delivers 2,299 sustained TFLOPs and 4,600 peak TFLOPs, making it the highest-performance enterprise PCIe AI accelerator available as of its launch.

Nvidia has not announced a PCIe-compatible version of its latest B200 Blackwell GPUs, which use HBM3e memory, leaving AMD with a temporary lead in the PCIe HBM accelerator segment. AMD Instinct MI350P PCIe Card Press Deck

Supply Chain Context

The MI350P's production relies on TSMC's advanced node capacity, which has faced high demand from AI accelerator and smartphone chip clients over the past 24 months. TSMC's $165 billion U.S. manufacturing expansion, which includes new 3nm fabs in Arizona, is expected to increase supply chain resilience for AMD's accelerator production, reducing reliance on Taiwanese fabs for North American data center clients.

TSMC's ability to deliver stable 3nm and 6nm wafer volumes will be a key factor in AMD's ability to meet MI350P demand, as the accelerator market remains supply-constrained for advanced node chips.

Market Position and Adoption Outlook

AMD's primary challenge in the AI accelerator market remains Nvidia's dominant CUDA software ecosystem, which has become a standard for AI model training and inference across most data center operators. AMD has invested heavily in its ROCm software stack to close this gap, with company representatives noting at CES 2026 that ROCm now supports 90% of the top 100 open-source AI models with performance parity to CUDA in key inference workloads. AMD ROCm AMD Instinct MI350P PCIe Card Press Deck

The MI350P's drop-in compatibility with existing air-cooled servers lowers the barrier to adoption for data center operators that want to avoid costly infrastructure upgrades required for liquid-cooled OAM modules. Early interest has come from mid-sized cloud service providers and enterprise data center operators with existing PCIe server infrastructure, who want to add AI acceleration capacity without retrofitting for liquid cooling.

Pricing for the MI350P has not been announced, but AMD typically prices its Instinct accelerators 20-30% lower than comparable Nvidia models to incentivize adoption. Volume shipments are expected to begin in Q3 2026, with select hyperscale clients receiving early samples in Q2.

Comments

Loading comments...