The custom AI ASIC state of play (May 2026) – Broadcom, Google, Meta and the rest


Chips Reporter

Broadcom’s $73 billion AI backlog, Google’s Ironwood TPU, Meta’s MTIA chips and AWS Trainium3 illustrate how TSMC’s advanced nodes and packaging are reshaping the AI‑silicon market, with custom ASICs now poised to capture a majority of inference spend by 2027.

Announcement

TSMC now manufactures every major hyperscaler's custom AI chip, and the numbers are moving fast. Broadcom reported $8.4 billion of AI semiconductor revenue in Q1 FY2026 – a 106 % YoY rise – and is guiding $10.7 billion for Q2. The company says it has a $73 billion AI backlog and expects over $100 billion in annual AI‑chip revenue by 2027. At the same time, Google’s Ironwood TPU, Meta’s MTIA 500, AWS Trainium3 and Microsoft Maia 200 are all in volume or near‑volume production, each pushing the performance‑per‑watt envelope higher. TSMC’s CoWoS and SoIC packaging, meanwhile, are becoming the bottleneck for supply.
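
The growth figures above imply two numbers the text does not state: the year-ago quarter and the sequential growth built into the Q2 guide. A minimal sketch of that arithmetic, using only the figures quoted above (the function names are illustrative):

```python
# Figures quoted in the article, in USD billions.
Q1_FY2026 = 8.4      # reported AI semiconductor revenue
YOY_GROWTH = 1.06    # "106 % YoY rise", expressed as a fraction
Q2_GUIDANCE = 10.7   # guided AI semiconductor revenue


def implied_prior(current: float, growth: float) -> float:
    """Back out the year-ago value from a current value and fractional growth."""
    return current / (1.0 + growth)


def sequential_growth(previous: float, following: float) -> float:
    """Fractional change from one quarter to the next."""
    return following / previous - 1.0


prior_q1 = implied_prior(Q1_FY2026, YOY_GROWTH)
qoq = sequential_growth(Q1_FY2026, Q2_GUIDANCE)

print(f"Implied year-ago Q1 AI revenue: ${prior_q1:.2f}B")
print(f"Guided Q1 -> Q2 growth: {qoq:.1%}")
```

On these inputs the year-ago quarter works out to roughly $4.1 billion, and the guide implies sequential growth of about 27 %.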

Technical specs

Broadcom XDSiP platform

  • Stacking technology: face‑to‑face 3‑D via TSMC SoIC + 2.5‑D CoWoS
  • Die area: up to 6,000 mm² per package (vs. ~2,500 mm² for conventional 2‑D)
  • HBM: up to 12 stacks per package, each 8 GB → 96 GB total per XPU
  • First 2 nm compute SoC (Feb 2026): 4 N2 compute dies, 1 I/O die, 6 HBM modules
  • Networking: Tomahawk 6 switch (102.4 Tbps Ethernet) and Jericho 4 fabric (51.2 Tbps) to interconnect >1 million XPUs

Google TPU v7 – “Ironwood”

  • Process: TSMC N3P (3 nm) dual‑chiplet, co‑designed with Broadcom & MediaTek
  • Compute: 4,614 FP8 TFLOPS per chip, 2 TensorCores (256×256 MXU) + 4 SparseCores
  • Memory: 192 GB HBM3E, 7.37 TB/s bandwidth
  • System: 9,216‑chip SuperPod → 42.5 exaflops FP8, 1.77 PB HBM aggregate
  • Utilisation: SemiAnalysis estimates ~90 % sustained FLOP utilisation for transformers (vs. 70‑80 % for GPUs)
  • Cost: Google claims 44 % lower TCO vs. a GB200 server for the same workload
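
The SuperPod totals follow directly from the per-chip figures. The sketch below reproduces them and, as an assumption, applies the ~90 % utilisation estimate to peak FP8 throughput to get a rough sustained figure; decimal units (1 PB = 10^6 GB) are also assumed.

```python
# Per-chip and pod figures as quoted in the list above.
CHIPS_PER_POD = 9_216
FP8_TFLOPS_PER_CHIP = 4_614
HBM_GB_PER_CHIP = 192
EST_UTILISATION = 0.90   # third-party estimate quoted above, not a measured value

# Aggregate peak compute: TFLOPS -> exaFLOPS (1 EFLOPS = 1e6 TFLOPS).
pod_exaflops = CHIPS_PER_POD * FP8_TFLOPS_PER_CHIP / 1e6

# Aggregate HBM: GB -> PB, assuming decimal units (1 PB = 1e6 GB).
pod_hbm_pb = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1e6

# Rough sustained throughput, assuming utilisation applies uniformly to peak FP8.
sustained_exaflops = pod_exaflops * EST_UTILISATION

print(f"Peak pod compute:   {pod_exaflops:.1f} EFLOPS FP8")
print(f"Aggregate HBM:      {pod_hbm_pb:.2f} PB")
print(f"Sustained estimate: {sustained_exaflops:.1f} EFLOPS FP8")
```

The first two lines match the 42.5 exaflops and 1.77 PB quoted above; the sustained figure is only as good as the utilisation estimate.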

AWS Trainium3

  • Process: TSMC 3 nm (first 3 nm AI accelerator from AWS)
  • Compute: 2.517 PFLOPS FP8 per chip, 144 GB HBM3E, 4.9 TB/s bandwidth
  • UltraServer: 144 chips → 362 PFLOPS FP8, 20.7 TB memory (4.4× improvement over Trainium2)
  • Roadmap: Trainium4 (late 2026/early 2027) – 3× FP8 performance, 6× FP4 throughput, 4× memory bandwidth, 288 GB HBM, NVLink Fusion support
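
The UltraServer totals can be checked from the per-chip figures, and the roadmap multipliers can be turned into rough Trainium4 numbers. This is a sketch under one loud assumption: the 3× FP8 and 4× bandwidth multipliers are read as per-chip gains over Trainium3, which the roadmap line does not confirm. The `Chip` class is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    fp8_pflops: float
    hbm_gb: float
    hbm_tb_per_s: float


# Per-chip Trainium3 figures as quoted in the list above.
TRAINIUM3 = Chip(fp8_pflops=2.517, hbm_gb=144, hbm_tb_per_s=4.9)
ULTRASERVER_CHIPS = 144


def aggregate(chip: Chip, count: int) -> tuple[float, float]:
    """Return (total FP8 PFLOPS, total HBM in TB) for `count` chips."""
    return chip.fp8_pflops * count, chip.hbm_gb * count / 1000.0


server_pflops, server_hbm_tb = aggregate(TRAINIUM3, ULTRASERVER_CHIPS)

# ASSUMPTION: roadmap multipliers are per chip and relative to Trainium3.
TRAINIUM4_EST = Chip(
    fp8_pflops=TRAINIUM3.fp8_pflops * 3,
    hbm_gb=288,                                   # stated directly in the roadmap
    hbm_tb_per_s=TRAINIUM3.hbm_tb_per_s * 4,
)

print(f"UltraServer: {server_pflops:.0f} PFLOPS FP8, {server_hbm_tb:.1f} TB HBM")
print(f"Trainium4 (estimate): {TRAINIUM4_EST.fp8_pflops:.2f} PFLOPS FP8, "
      f"{TRAINIUM4_EST.hbm_tb_per_s:.1f} TB/s")
```

If the multipliers instead refer to system-level performance, the per-chip estimate would differ.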

Meta MTIA 400 / 500

  • Process: 3 nm CoWoS (MTIA 300‑series onward)
  • MTIA 400: 6 PFLOPS FP8, 18 PFLOPS MX4, 288 GB HBM, 9.2 TB/s bandwidth, 1,200 W
  • MTIA 500 (2027): 10 PFLOPS FP8, 30 PFLOPS MX4, up to 512 GB HBM, 27.6 TB/s bandwidth, 1,700 W
  • Scaling: HBM bandwidth +4.5× and compute +25× from MTIA 300 to MTIA 500, with a new chip every ~6 months
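
The generation-over-generation ratios are easy to read off the two spec lines. The sketch below computes them, plus the MTIA 300 HBM bandwidth implied by the 4.5× figure. The ratios are unit-independent, so the bandwidth values are used exactly as quoted; the dictionary keys are illustrative names.

```python
# Spec figures as quoted in the list above.
MTIA_400 = {"fp8_pflops": 6, "mx4_pflops": 18, "hbm_gb": 288, "hbm_bw": 9.2, "watts": 1200}
MTIA_500 = {"fp8_pflops": 10, "mx4_pflops": 30, "hbm_gb": 512, "hbm_bw": 27.6, "watts": 1700}

# MTIA 500 relative to MTIA 400, metric by metric.
ratios = {key: MTIA_500[key] / MTIA_400[key] for key in MTIA_400}

# FP8 throughput grows faster than power, so FP8 per watt improves by this factor.
fp8_per_watt_gain = ratios["fp8_pflops"] / ratios["watts"]

# HBM bandwidth implied for MTIA 300 by the quoted 4.5x gain (300 -> 500),
# in the same unit as the quoted bandwidth figures.
implied_300_bw = MTIA_500["hbm_bw"] / 4.5

for key, value in ratios.items():
    print(f"{key:>11}: {value:.2f}x")
print(f"FP8 per watt gain (400 -> 500): {fp8_per_watt_gain:.2f}x")
print(f"Implied MTIA 300 HBM bandwidth: {implied_300_bw:.1f}")
```

The 25× compute figure is not back-solved here because the list does not say which number format it refers to.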

Microsoft Maia 200

  • Process: TSMC 3 nm, ~140 billion transistors
  • Compute: >10 PFLOPS FP4, 5 PFLOPS FP8, 216 GB HBM3E, 7 TB/s bandwidth, 750 W
  • Use case: GPT‑5.2 training, Azure Copilot workloads
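
For the chips where this article quotes both peak FP8 throughput and power, a rough performance-per-watt ranking can be sketched as follows. These are peak vendor figures, not measured results, and the 1,200 W Ironwood figure comes from the market discussion later in the article rather than from its spec list.

```python
# (chip, peak FP8 PFLOPS, power in watts), as quoted in this article.
CHIPS = [
    ("Ironwood", 4.614, 1200),   # power quoted in the market discussion
    ("MTIA 400", 6.0, 1200),
    ("MTIA 500", 10.0, 1700),
    ("Maia 200", 5.0, 750),
]


def tflops_per_watt(pflops: float, watts: float) -> float:
    """Peak FP8 TFLOPS delivered per watt of quoted chip power."""
    return pflops * 1000.0 / watts


# Rank from most to least efficient on peak FP8.
ranked = sorted(
    ((name, tflops_per_watt(pflops, watts)) for name, pflops, watts in CHIPS),
    key=lambda row: row[1],
    reverse=True,
)

for name, efficiency in ranked:
    print(f"{name:<9} {efficiency:5.2f} TFLOPS/W")
```

Trainium3 is left out because no power figure is given for it here.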

Other notable efforts

  • Tesla AI6 (Samsung 3 nm) – inference‑only, still in early silicon
  • Intel Gaudi 3 – software maturity issues, shipments cut >30 % in 2024
  • Huawei Ascend 910C (SMIC 7 nm) – 800 TFLOPS FP16, 128 GB HBM, 20 % yield
  • Cambricon – targeting 500 k units in 2026

Market implications

  1. TSMC’s capacity is the new constraint – CoWoS/CoWoS‑L capacity rose from ~70 k wafers per month (WPM) in 2025 to a planned 120‑130 k WPM in 2026. Approximate allocation shares are:

    • Nvidia ≈ 60 % (≈ 595 k wafers)
    • Broadcom ≈ 15 % (≈ 150 k wafers)
    • AMD ≈ 11 % (≈ 105 k wafers)

    The remainder is split among AWS, Meta, Microsoft and emerging Chinese players. Once CoWoS capacity is fully allocated, any delay in bringing new lines online will directly throttle AI‑chip shipments.

  2. Inference spend dominates – Deloitte projects that two‑thirds of AI compute in 2026 will be inference. Custom‑ASIC vendors claim up to 65 % lower TCO versus GPUs at scale, which would make ASICs the default choice for large‑scale serving farms.

  3. Revenue shift – Broadcom and Marvell together control ~95 % of the ASIC co‑design market. Nvidia’s share of AI‑silicon revenue is already falling; with Broadcom targeting $100 billion in 2027, Nvidia could see double‑digit percentage point erosion of pricing power.

  4. Supply‑chain risk – The concentration of advanced‑node capacity in a single foundry means geopolitical or natural‑hazard disruptions (e.g., seismic activity in Taiwan) could ripple through every hyperscaler’s AI roadmap. Diversification efforts (Tesla’s AI6 at Samsung, Huawei’s Ascend at SMIC) are still small‑scale and face yield challenges.

  5. Performance‑per‑watt race – Ironwood’s 4,614 FP8 TFLOPS at 1,200 W is within 10 % of Nvidia’s Blackwell GPU, and TPU utilisation numbers push its effective performance higher. Meta’s MTIA 500 aims for 10 PFLOPS FP8 at 1,700 W, roughly 2.2× Ironwood’s per‑chip FP8 throughput and about 1.5× its FP8 throughput per watt (≈ 5.9 vs. ≈ 3.8 TFLOPS/W). Trainium4’s announced 3× FP8 boost would place it in the same league, and its NVLink Fusion support suggests a hybrid future in which custom ASICs and GPUs share workloads.

  6. Customer lock‑in – Multi‑year contracts (OpenAI‑Broadcom, Anthropic‑Google, Meta‑Meta, AWS‑OpenAI) lock in billions of dollars of silicon demand. The contracts also force hyperscalers to co‑design around the foundry’s roadmap, reinforcing TSMC’s bargaining position.
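
The CoWoS allocation figures in item 1 can be cross-checked against each other: dividing each wafer count by its share gives the total pool that vendor's figures imply. A minimal sketch, assuming the shares and wafer counts refer to the same pool and period (the article does not state the period):

```python
# (share of CoWoS allocation, wafers in thousands), as quoted in item 1.
ALLOCATIONS = {
    "Nvidia": (0.60, 595),
    "Broadcom": (0.15, 150),
    "AMD": (0.11, 105),
}


def implied_total(share: float, wafers_k: float) -> float:
    """Total pool (thousands of wafers) implied by one vendor's share and count."""
    return wafers_k / share


totals = {name: implied_total(share, wafers) for name, (share, wafers) in ALLOCATIONS.items()}

# Share left for AWS, Meta, Microsoft and others.
remainder_share = 1.0 - sum(share for share, _ in ALLOCATIONS.values())

# Annualised capacity at the planned 120-130 k wafers per month, for comparison.
annual_capacity_k = tuple(wpm * 12 for wpm in (120, 130))

for name, total in totals.items():
    print(f"{name:<9} implies a pool of ~{total:,.0f} k wafers")
print(f"Remainder share: {remainder_share:.0%}")
print(f"Annualised planned capacity: {annual_capacity_k[0]:,}-{annual_capacity_k[1]:,} k wafers")
```

The three vendors imply a pool of roughly 950 k to 1,000 k wafers, which is below the annualised planned run rate; one possible reading is that capacity ramps through the year, but the article does not say.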

Outlook to 2027

  • Volume growth: All five hyperscalers are slated to ship >10 million custom AI chips annually by 2027, a ten‑fold increase from 2023 levels.
  • Node migration: 2 nm production will become mainstream for high‑density inference chips (Broadcom, Meta) while 3 nm remains the sweet spot for most new designs.
  • Packaging pressure: CoWoS‑L and emerging fan‑out wafer‑level packaging (FOWLP) will be required to keep HBM bandwidth scaling with compute.
  • Competitive dynamics: Nvidia will retain a lead in frontier‑model training (FP8‑optimized GPUs), but its margin on inference will shrink as custom ASICs capture the bulk of serving workloads.
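
One way to see the packaging pressure is to compare how much HBM bandwidth each chip has per unit of peak compute. The sketch below computes bytes of HBM bandwidth per 1,000 peak FP8 FLOPs from the spec figures quoted earlier. It assumes all bandwidth figures are in TB/s (including Meta's) and uses peak rather than sustained throughput.

```python
# (chip, HBM bandwidth in TB/s, peak FP8 PFLOPS), from the spec lists in this article.
# ASSUMPTION: every bandwidth figure, including MTIA's, is read as TB/s.
CHIPS = [
    ("Ironwood", 7.37, 4.614),
    ("Trainium3", 4.9, 2.517),
    ("MTIA 400", 9.2, 6.0),
    ("MTIA 500", 27.6, 10.0),
    ("Maia 200", 7.0, 5.0),
]


def bytes_per_kflop(tb_per_s: float, pflops: float) -> float:
    """HBM bytes available per 1,000 peak FP8 FLOPs.

    (TB/s) / (PFLOP/s) = 1e-3 bytes per FLOP, i.e. numerically bytes per kFLOP.
    """
    return tb_per_s / pflops


balance = {name: bytes_per_kflop(bw, pflops) for name, bw, pflops in CHIPS}

for name, value in sorted(balance.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<10} {value:.2f} bytes per 1,000 FP8 FLOPs")
```

On these figures most chips sit between about 1.4 and 2.0 bytes per 1,000 FLOPs, while MTIA 500 is higher at about 2.8, which illustrates why more HBM stacks and larger packages are needed to keep bandwidth in step with compute.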

For a deeper dive into each chip’s benchmark data, see the Bench database linked in the sidebar.
