A new study details the first large-scale mixture-of-experts (MoE) pretraining run carried out entirely on AMD hardware, introducing ZAYA1, a model with 760M active parameters that rivals Qwen3-4B and Gemma3-12B. Packed with microbenchmarks of Pollara networking and MI300X-specific optimizations, it offers actionable guidance for developers eyeing non-NVIDIA AI training stacks. The work underscores AMD's readiness for competitive foundation-model development.